source: tags/arb-6.0.5/HELP_SOURCE/oldhelp/next_neighbours_common.hlp

Last change on this file was 10594, checked in by westram, 11 years ago
  • help unittest
    • at start: change to directory HELP_SOURCE
    • also perform output (unchecked; just to /dev/null)
  • recognize TODOs in comments
  • allow '*' as list indicator
  • checked and corrected affected help files
#Please insert up references in the next lines (line starts with keyword UP)
UP      arb.hlp
UP      glossary.hlp
UP      pt_server.hlp

#Please insert subtopic references  (line starts with keyword SUB)
SUB     next_neighbours.hlp
SUB     next_neighbours_listed.hlp
SUB     faligner.hlp

# Hypertext links in helptext can be added like this: LINK{ref.hlp|http://add|bla@domain}

#************* Title of helpfile !! and start of real helpfile ********
TITLE           Nearest relative search

OCCURRENCE      ARB_NT/Search/More search/Search Next Relatives of SELECTED Species in PT Server
                ARB_NT/Search/More search/Search Next Relatives of LISTED Species in PT Server
                ARB_EDIT4/Edit/Integrated Aligners

SECTION         ALGORITHM

                Splits the sequence(s) into short oligos of a given length.
                Each oligo is probe-matched against the PT_SERVER database.
                The more hits fall within the sequence of another species, the more closely
                related that species is assumed to be.
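
                The idea can be illustrated with a small Python sketch. It is not the ARB
                implementation: the function names are hypothetical, a plain substring test
                stands in for the probe match done by the PT_SERVER, oligos are assumed to be
                taken at every sequence position, and each distinct source oligo is counted
                at most once per target.

```python
def split_into_oligos(sequence, oligo_len=12):
    """Return every oligo of length oligo_len, one per start position."""
    return [sequence[i:i + oligo_len]
            for i in range(len(sequence) - oligo_len + 1)]


def rank_relatives(source, targets, oligo_len=12):
    """Rank targets by the number of distinct source oligos they contain.

    targets maps a species name to its sequence. The substring test is a
    naive stand-in for a probe match (assumption, not the PT_SERVER logic).
    """
    oligos = set(split_into_oligos(source, oligo_len))
    scores = {name: sum(1 for oligo in oligos if oligo in seq)
              for name, seq in targets.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    relatives = {"close": "TTACGTACGG", "far": "GGGGCCCCTT"}
    for name, hits in rank_relatives("ACGTACGTAC", relatives, oligo_len=4):
        print(name, hits)
```

                Sequences shorter than the oligo length yield no oligos and therefore no hits.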

SECTION         PARAMETERS

                PT-Server

                        Select the PT-Server to search.

                Oligo length

                        Length of the oligos used to perform the probe match against the PT server.
                        Default is 12.

                Mismatches

                        Number of mismatches allowed per oligo.
                        Default is 0.

                        Be careful: the search may become extremely slow as the number of mismatches increases.

                Search mode

                       Complete:        Match all possible oligos
                       Quick:           Only match oligos starting with 'A'

                       The 'Quick mode' works well for many sequence types and is approx. 4 times
                       faster than the 'Complete mode'. For some sequence types it fails completely,
                       e.g. if there are repetitive areas containing many 'AAAAA'.

                       In 'Quick mode', relative and absolute scores will be approx. 1/4 of the
                       scores obtained in 'Complete mode'.
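
                       A minimal Python sketch of the oligo selection, assuming oligos are taken
                       at every sequence position. The helper names select_oligos and
                       quick_fraction are hypothetical and only illustrate why the 1/4 estimate
                       depends on base composition.

```python
def select_oligos(sequence, oligo_len=12, quick=False):
    """Return the oligos to match; quick mode keeps only those starting with 'A'."""
    oligos = [sequence[i:i + oligo_len]
              for i in range(len(sequence) - oligo_len + 1)]
    if quick:
        oligos = [oligo for oligo in oligos if oligo.startswith("A")]
    return oligos


def quick_fraction(sequence, oligo_len=12):
    """Fraction of all oligos that quick mode would still match."""
    total = len(select_oligos(sequence, oligo_len))
    kept = len(select_oligos(sequence, oligo_len, quick=True))
    return kept / total if total else 0.0


if __name__ == "__main__":
    balanced = "ACGT" * 10      # every 4th oligo starts with 'A'
    a_rich = "AAAAAAAAAACG"     # repetitive, A-rich stretch
    print(quick_fraction(balanced, 4))
    print(quick_fraction(a_rich, 4))
```

                       For the balanced sequence roughly a quarter of the oligos remain, whereas
                       for the A-rich stretch every oligo starts with 'A', so the quick mode
                       neither saves work nor scales the score by 1/4.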

                Match score:

                       absolute:        returns the absolute number of hits
                       relative:        returns the number of hits relative to some maximum (see 'Relative score' below)

                       Absolute hits:

                                Absolute hits are the number of oligos which occur both in the source sequence
                                and in a target sequence (i.e. in a relative of the source sequence).

                                If an oligo occurs multiple times in the source or target sequence, it only
                                creates the minimum number of hits (e.g. if it occurs twice in the source and
                                three times in a target, only two hits will be counted for that target).

                                The theoretical maximum for absolute hits is

                                    maxhits = minimumBasecount(source, target) - oligolen + 1

                                In practice that value is rarely if ever reached, because several oligos
                                are skipped, namely all oligos containing IUPAC ambiguity codes, N's or dots.
                                Likewise, the PT-server will not report matches hitting ambiguous positions
                                or sequence ends.

                                The number of absolute hits is also affected by other parameters:

                                - using quick search produces only around 25% of the hits of a
                                  complete search (assuming that 25% of all oligos start with an 'A')
                                - searching for complement or reverse will double the number of possible
                                  hits. Searching for all 4 reverse/complement-combinations allows up to
                                  4 times as many hits as a plain forward search.
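
                                The counting rules for absolute hits can be sketched in Python as
                                follows. This is an illustration under stated assumptions, not the
                                ARB code: sequences are gap-free uppercase strings, matching is
                                exact (no mismatches), only forward oligos are used, and the
                                function names are hypothetical.

```python
from collections import Counter

PLAIN_BASES = set("ACGTU")  # anything else (N, IUPAC ambiguity codes, dots) is skipped


def count_oligos(sequence, oligo_len):
    """Count each oligo in sequence, skipping oligos with non-plain characters."""
    counts = Counter()
    for i in range(len(sequence) - oligo_len + 1):
        oligo = sequence[i:i + oligo_len]
        if set(oligo) <= PLAIN_BASES:
            counts[oligo] += 1
    return counts


def absolute_hits(source, target, oligo_len):
    """Sum, per oligo, the smaller of its source and target occurrence counts."""
    shared = count_oligos(source, oligo_len) & count_oligos(target, oligo_len)
    return sum(shared.values())


def max_hits(source, target, oligo_len):
    """Theoretical maximum: minimumBasecount(source, target) - oligolen + 1."""
    return max(min(len(source), len(target)) - oligo_len + 1, 0)


if __name__ == "__main__":
    print(absolute_hits("ACGACGTT", "ACGACGACG", 3), max_hits("ACGACGTT", "ACGACGACG", 3))
    print(absolute_hits("ACGNCGTT", "ACGACGACG", 3))
```

                                In the first call the oligo ACG occurs twice in the source and three
                                times in the target and therefore contributes two hits. In the second
                                call the single N removes three source oligos from the comparison.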

                       Relative score:

                                The relative score is the number of absolute hits scaled against a maximum
                                POC (possible oligo count). You can specify which maximum POC to use with
                                the selection button next to the score selection button:

                                        to source POC         maximum possible oligos in source
                                        to target POC         maximum possible oligos in target
                                        to minimum POC        smaller of the source and target POCs
                                        to maximum POC        larger of the source and target POCs

                                'to source POC' will report a score of ~100% for a partial source sequence
                                versus all full sequences containing that part.

                                'to target POC' will report a score of ~100% for all partial target sequences
                                which are contained in the source sequence.

                                'to minimum POC' will report a score of ~100% if the source is part of the
                                target or vice versa (this was the default method in previous ARB versions).

                                'to maximum POC' will report a score of ~100% if source and target contain each
                                other, i.e. if they have an identical oligo distribution. If either source or
                                target is missing some bases, the score will drop.
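
                                The four scaling choices can be compared with a short Python sketch.
                                It assumes that the POC of a sequence is basecount - oligolen + 1,
                                in line with the maxhits formula for absolute hits; the function
                                names and the scaling keywords are hypothetical.

```python
def possible_oligo_count(base_count, oligo_len):
    """Assumed POC of one sequence: basecount - oligolen + 1 (never negative)."""
    return max(base_count - oligo_len + 1, 0)


def relative_score(hits, source_bases, target_bases, oligo_len, scaling="minimum"):
    """Scale absolute hits to the chosen POC and return a percentage."""
    source_poc = possible_oligo_count(source_bases, oligo_len)
    target_poc = possible_oligo_count(target_bases, oligo_len)
    poc = {
        "source": source_poc,
        "target": target_poc,
        "minimum": min(source_poc, target_poc),
        "maximum": max(source_poc, target_poc),
    }[scaling]
    return 100.0 * hits / poc if poc else 0.0


if __name__ == "__main__":
    # A 100-base partial source fully contained in a 1500-base target:
    # all 89 source oligos of length 12 are found in the target.
    for scaling in ("source", "target", "minimum", "maximum"):
        print(scaling, round(relative_score(89, 100, 1500, 12, scaling), 2))
```

                                For this partial source, 'source' and 'minimum' scaling give 100%,
                                while 'target' and 'maximum' scaling give about 6%.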


                                When using 'quick search mode' the max. relative score will be 25% (if 25% of
                                the oligos start with 'A').

                                When searching for forward and reverse-complement, the theoretical max. relative
                                score will be 200%. In practice there won't be many hits on the reverse-complement
                                strand, so you'll get scores similar to those without reverse-complement. However,
                                especially if you lower the oligo length, you'll probably reach scores above 100%.


                                The EDIT4 aligner currently always uses 'to minimum POC'.


                Complement:

                       forward:             Match only forward oligos
                       reverse:             Match only reverse oligos
                       complement:          Match only complement oligos
                       reverse-complement:  Match only reverse-complement oligos

                       The remaining options are combinations of the above.

                       The combinations will affect the score, especially for shorter oligos.
                       Please read the section about 'Relative score' above to avoid confusion.

                       Note: Not available for EDIT4 aligner.
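
                       The four basic orientations of an oligo can be derived as in the following
                       Python sketch. It assumes DNA letters (A, C, G, T) and a hypothetical
                       helper name; how ARB derives and combines the orientations internally is
                       not described here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")  # DNA alphabet assumed


def oligo_variants(oligo):
    """Return the four basic orientations of an oligo, keyed by option name."""
    complement = oligo.translate(COMPLEMENT)
    return {
        "forward": oligo,
        "reverse": oligo[::-1],
        "complement": complement,
        "reverse-complement": complement[::-1],
    }


if __name__ == "__main__":
    for name, variant in oligo_variants("AACG").items():
        print(name, variant)
```

                       Matching several of these variants at once increases the number of
                       possible hits accordingly.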

                Target range:

                       Restrict the alignment range in which oligos may match.
                       Hits outside that range will not be considered.

NOTES           Special effort is taken to eliminate multi-matches. Past versions did not do so,
                which resulted in relative scores far beyond 100%, especially for small oligo lengths.

                Now e.g. an oligo occurring 3 times in the source sequence will give at most 3 absolute
                hit points to any target sequence, even if it occurs there far more often.

EXAMPLES        None

WARNINGS        Use mismatches with care!

BUGS            Relative score is not scaled to the maximum possible hits in the target range.