source: branches/lib/HELP_SOURCE/source/next_neighbours_common.hlp

Last change on this file was 19575, checked in by westram, 3 weeks ago
  • reintegrates 'help' into 'trunk'
    • preformatted text gets checked for width now (to enforce it fits into the arb help window).
    • fixed help following these checks, using the following steps:
      • ignore problems in foreign documentation.
      • increase default help window width.
      • introduce control comments to
        • accept oversized preformatted sections.
        • enforce preformatted style for whole sections.
        • simply define single-line preformatted sections
          Used intensively for the definition of internal script languages.
    • fixed several non-related problems found in documentation.
    • minor layout changes for HTML version of arb help (more compacted; highlight anchored/all sections).
    • refactor system interface (GUI version) and use it from help module.
  • adds: log:branches/help@19532:19574
#       main topics:
UP      arb.hlp
UP      glossary.hlp
UP      pt_server.hlp

#       sub topics:
SUB     next_neighbours.hlp
SUB     next_neighbours_listed.hlp
SUB     faligner.hlp

# format described in ../help.readme


TITLE           Nearest relative search

OCCURRENCE      ARB_NT/Search/More search/Search Next Relatives of SELECTED Species in PT Server
                ARB_NT/Search/More search/Search Next Relatives of LISTED Species in PT Server
                ARB_EDIT4/Edit/Integrated Aligners

SECTION         ALGORITHM

                Splits the sequence(s) into short oligos of a given length.
                These oligos are 'Probe Matched' against the PT-Server database.
                The more of these oligos hit the sequence of another species, the more closely related that species is.
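
                The idea can be illustrated with a small local sketch. It is NOT how ARB works internally:
                a plain dictionary of sequences stands in for the PT-Server, matches are exact, and the
                names oligos() and rank_relatives() are hypothetical. Hits are counted per oligo as the
                smaller of the two occurrence counts, as described under 'Absolute hits' below.

```python
from collections import Counter


def oligos(sequence: str, oligo_len: int) -> list[str]:
    """All overlapping oligos of length oligo_len, read left to right."""
    return [sequence[i:i + oligo_len] for i in range(len(sequence) - oligo_len + 1)]


def rank_relatives(source: str, species: dict[str, str], oligo_len: int = 12) -> list[tuple[str, int]]:
    """Hypothetical stand-in for the PT-Server search: rank species by shared-oligo hits."""
    source_counts = Counter(oligos(source, oligo_len))
    ranking = []
    for name, sequence in species.items():
        # '&' keeps, per oligo, the smaller of the two occurrence counts
        shared = source_counts & Counter(oligos(sequence, oligo_len))
        ranking.append((name, sum(shared.values())))
    # more hits -> more closely related
    return sorted(ranking, key=lambda entry: entry[1], reverse=True)


if __name__ == "__main__":
    species = {"a": "ACGTACGTAC", "b": "TTTTTTTTTT", "c": "ACGTACGAAA"}
    print(rank_relatives("ACGTACGTAC", species, oligo_len=4))
    # [('a', 7), ('c', 4), ('b', 0)]
```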

SECTION         PARAMETERS

                PT-Server

                        Select the PT-Server to search.

                Oligo length

                        Length of the oligos used for the probe match against the PT-Server.
                        Default is 12.

                Mismatches

                        Number of mismatches allowed per oligo.
                        Default is 0.

                        Be careful: the search may become extremely slow when the number of mismatches is increased.
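
                        One way to see why: the number of distinct oligos that differ from a given oligo of
                        length n in at most k positions is sum(C(n, i) * 3^i) for i = 0..k (substitutions only,
                        4-letter alphabet). This sketch just evaluates that count to show the growth; it is not
                        a model of how the PT-Server actually performs mismatch searches.

```python
from math import comb


def oligo_variants(oligo_len: int, mismatches: int, alphabet_size: int = 4) -> int:
    """Count distinct oligos within `mismatches` substitutions of a given oligo.

    comb() chooses the mismatching positions; each of them can take any of the
    (alphabet_size - 1) other bases. Insertions/deletions are not considered.
    """
    return sum(comb(oligo_len, k) * (alphabet_size - 1) ** k for k in range(mismatches + 1))


if __name__ == "__main__":
    for mm in range(4):
        print(mm, oligo_variants(12, mm))  # default oligo length 12
    # 0 1
    # 1 37
    # 2 631
    # 3 6571
```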

                Search mode

                       Complete:        Match all possible oligos
                       Quick:           Only match oligos starting with 'A'

                       The 'Quick mode' works well for many sequence types and is approx. 4 times
                       faster than the 'Complete mode'. For some sequence types it fails completely,
                       e.g. if there are repetitive regions containing many 'AAAAA'.

                       Relative and absolute scores will be approx. 1/4 of those in 'Complete mode'.
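
                       A minimal sketch of the oligo selection, assuming 'Quick' simply drops every oligo whose
                       first base is not 'A' (select_oligos() is a hypothetical name). On a sequence with an even
                       base composition this keeps roughly a quarter of the oligos, matching the factor of about 4.

```python
def select_oligos(sequence: str, oligo_len: int, quick: bool = False) -> list[str]:
    """Return the oligos that would be probe matched in the given search mode."""
    selected = []
    for start in range(len(sequence) - oligo_len + 1):
        oligo = sequence[start:start + oligo_len]
        if quick and not oligo.startswith("A"):
            continue  # 'Quick' mode: only oligos starting with 'A'
        selected.append(oligo)
    return selected


if __name__ == "__main__":
    seq = "ACGT" * 10  # 40 bases, each base equally frequent
    complete = select_oligos(seq, 4)
    quick = select_oligos(seq, 4, quick=True)
    print(len(complete), len(quick))  # 37 10
```

                       On a poly-A stretch such as "AAAAAAA" every oligo starts with 'A', so 'Quick' keeps them
                       all, and they are all the same unspecific oligo.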

                Match score:

                       absolute:        returns the absolute number of hits
                       relative:        returns the number of hits relative to some maximum
                                        (see details below)

                       Absolute hits:

                                Absolute hits are the number of oligos which occur both in the source sequence
                                and in a target sequence (i.e. in a potential relative of the source sequence).

                                If an oligo occurs multiple times in the source or target sequence, it only creates
                                as many hits as the smaller of its two occurrence counts (e.g. if it occurs twice in
                                the source and three times in a target, only two hits will be counted for that target).
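
                                This multiplicity rule is a multiset intersection. A minimal sketch, assuming the oligo
                                lists have already been extracted from both sequences (absolute_hits() is a hypothetical
                                name):

```python
from collections import Counter


def absolute_hits(source_oligos: list[str], target_oligos: list[str]) -> int:
    """Sum, over all shared oligos, the smaller of the two occurrence counts."""
    shared = Counter(source_oligos) & Counter(target_oligos)  # per-oligo minimum
    return sum(shared.values())


if __name__ == "__main__":
    # Example from the text: twice in the source, three times in the target.
    print(absolute_hits(["ACGT", "ACGT"], ["ACGT", "ACGT", "ACGT"]))  # 2
```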

                                The theoretical maximum for absolute hits is

                                    maxhits = minimumBasecount(source, target) - oligolen + 1

                                In practice this value is rarely, if ever, reached, because some oligos
                                are skipped, namely all oligos containing IUPAC codes, N's or dots.
                                The PT-Server also does not report matches hitting ambiguous positions
                                or sequence ends.
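
                                A sketch of both effects, assuming ungapped sequences as input (alignment gaps removed)
                                and treating every character other than A, C, G, T and U as ambiguous. max_hits() and
                                usable_oligos() are hypothetical helpers; the PT-Server's own filtering is not modelled.

```python
PLAIN_BASES = frozenset("ACGTU")


def max_hits(source_bases: int, target_bases: int, oligo_len: int) -> int:
    """Theoretical maximum: minimumBasecount(source, target) - oligolen + 1."""
    return max(0, min(source_bases, target_bases) - oligo_len + 1)


def usable_oligos(sequence: str, oligo_len: int) -> list[str]:
    """Oligos that are not skipped, i.e. contain no IUPAC codes, N's or dots."""
    result = []
    for start in range(len(sequence) - oligo_len + 1):
        oligo = sequence[start:start + oligo_len].upper()
        if set(oligo) <= PLAIN_BASES:
            result.append(oligo)
    return result


if __name__ == "__main__":
    seq = "ACGTNACGTA"  # a single 'N' at position 4
    print(max_hits(len(seq), len(seq), 4))  # 7
    print(len(usable_oligos(seq, 4)))       # 3: every oligo touching the 'N' is skipped
```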

                                The number of absolute hits is also affected by other parameters:

                                - using quick search produces only around 25% of the hits of a
                                  complete search (assuming that 25% of all oligos start with an 'A')
                                - additionally searching for the complement or reverse doubles the number of
                                  possible hits. Searching all 4 reverse/complement combinations produces
                                  4 times as many possible hits as a plain forward search.

                       Relative score:

                                The relative score is the number of absolute hits scaled against a maximum POC (possible
                                oligo count). You can specify which maximum POC to use with the selection button next to
                                the score selection button:

                                        to source POC         maximum possible oligos in source
                                        to target POC         maximum possible oligos in target
                                        to minimum POC        minimum possible oligos in source or target
                                        to maximum POC        maximum possible oligos in source or target

                                'to source POC' will report a ~100% score when a partial source sequence is compared with
                                any full sequence containing that part.

                                'to target POC' will report a ~100% score for all partial target sequences
                                which are contained in the source sequence.

                                'to minimum POC' will report a ~100% score if the source is part of the target or vice versa
                                (this was the default method in previous ARB versions).

                                'to maximum POC' will report a ~100% score only if source and target contain each other, i.e.
                                if they have an identical oligo distribution. If either source or target is missing
                                some bases, the score will be lower.
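
                                The four scaling modes can be sketched as follows. A POC is assumed here to be the number
                                of oligos a sequence could contribute (base count - oligo length + 1); relative_score()
                                and the mode names are hypothetical, not ARB identifiers.

```python
def relative_score(abs_hits: int, source_poc: int, target_poc: int, mode: str) -> float:
    """Absolute hits as a percentage of the selected maximum POC."""
    denominators = {
        "source": source_poc,
        "target": target_poc,
        "minimum": min(source_poc, target_poc),
        "maximum": max(source_poc, target_poc),
    }
    denominator = denominators[mode]  # raises KeyError for unknown modes
    return 100.0 * abs_hits / denominator if denominator > 0 else 0.0


if __name__ == "__main__":
    # A partial source (500 possible oligos) fully contained in a longer target (1000).
    for mode in ("source", "target", "minimum", "maximum"):
        print(mode, relative_score(500, 500, 1000, mode))
    # source 100.0
    # target 50.0
    # minimum 100.0
    # maximum 50.0
```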


                                When using the 'Quick' search mode, the max. relative score will be 25% (if 25% of
                                the oligos start with 'A').

                                When searching for forward and reverse-complement, the theoretical max. relative
                                score will be 200%. In practice few hits are found on the reverse-complement strand,
                                so you'll get scores similar to those without reverse-complement; but especially
                                if you lower the oligo length, you'll probably reach scores above 100%.


                                The EDIT4 aligner currently always uses 'to minimum POC'.


                Complement:

                       forward:             Match only forward oligos
                       reverse:             Match only reverse oligos
                       complement:          Match only complement oligos
                       reverse-complement:  Match only reverse-complement oligos

                       The remaining options are combinations of the above.
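
                       For illustration, the four basic strand variants of an oligo, assuming a DNA alphabet
                       (A<->T, C<->G); strand_variants() is a hypothetical helper. Each selected variant is probe
                       matched separately, which is why every additional option adds possible hits.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def strand_variants(oligo: str, forward: bool = True, reverse: bool = False,
                    complement: bool = False, reverse_complement: bool = False) -> list[str]:
    """Return the oligo variants selected by the 'Complement' options."""
    variants = []
    if forward:
        variants.append(oligo)
    if reverse:
        variants.append(oligo[::-1])  # read backwards, bases unchanged
    if complement:
        variants.append(oligo.translate(COMPLEMENT))  # bases swapped, same direction
    if reverse_complement:
        variants.append(oligo.translate(COMPLEMENT)[::-1])  # both
    return variants


if __name__ == "__main__":
    print(strand_variants("AACG", reverse=True, complement=True, reverse_complement=True))
    # ['AACG', 'GCAA', 'TTGC', 'CGTT']
```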

                       The combinations will affect the score, especially for shorter oligos.
                       Please read the section about 'Relative score' above to avoid confusion.

                       Note: Not available for the EDIT4 aligner.

                Target range:

                       Restrict the alignment range in which oligos may match.
                       Hits outside that range will not be considered.
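
                       A minimal sketch of the filtering, assuming each hit is reported with the alignment position
                       where its oligo starts and that the range bounds are inclusive (both are assumptions, and
                       hits_in_range() is a hypothetical name).

```python
def hits_in_range(hit_positions: list[int], range_start: int, range_end: int) -> list[int]:
    """Keep only hits whose start position lies within [range_start, range_end]."""
    return [pos for pos in hit_positions if range_start <= pos <= range_end]


if __name__ == "__main__":
    print(hits_in_range([5, 120, 300, 1500], 100, 1000))  # [120, 300]
```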

NOTES           Special care is taken to eliminate multi-matches, which were ignored in past versions.
                Ignoring them resulted in relative scores far beyond 100%, especially for short oligo lengths.

                Now, e.g., an oligo occurring 3 times in the source sequence will give at most 3 absolute
                hits to any target sequence, even if it occurs there far more often.

EXAMPLES        None

WARNINGS        Use mismatches with care!

BUGS            The relative score is not scaled to the maximum possible hits within the target range.