1 | #Please insert up references in the next lines (line starts with keyword UP) |
---|
2 | UP arb.hlp |
---|
3 | UP glossary.hlp |
---|
4 | UP pt_server.hlp |
---|
5 | |
---|
6 | #Please insert subtopic references (line starts with keyword SUB) |
---|
7 | SUB next_neighbours.hlp |
---|
8 | SUB next_neighbours_listed.hlp |
---|
9 | SUB faligner.hlp |
---|
10 | |
---|
11 | # Hypertext links in helptext can be added like this: LINK{ref.hlp|http://add|bla@domain} |
---|
12 | |
---|
13 | #************* Title of helpfile !! and start of real helpfile ******** |
---|
14 | TITLE Nearest relative search |
---|
15 | |
---|
16 | OCCURRENCE ARB_NT/Search/More search/Search Next Relatives of SELECTED Species in PT Server |
---|
17 | ARB_NT/Search/More search/Search Next Relatives of LISTED Species in PT Server |
---|
18 | ARB_EDIT4/Edit/Integrated Aligners |
---|
19 | |
---|
20 | SECTION ALGORITHM |
---|
21 | |
---|
22 | Splits the sequence(s) into short oligos of a given size. |
---|
23 | These oligos are 'Probe Matched' against the PT_SERVER database. |
---|
24 | The more hits within the sequence of another species, the more related the other species is. |
---|
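The idea above can be illustrated with a minimal sketch. It is not the PT-server implementation: the function names and the in-memory 'database' dictionary are assumptions made for illustration, and only exact matches of distinct oligos are counted (repeated oligos are treated under 'Absolute hits' below).

```python
def split_into_oligos(sequence, oligo_len=12):
    """Return every overlapping oligo of length oligo_len in sequence."""
    return [sequence[i:i + oligo_len]
            for i in range(len(sequence) - oligo_len + 1)]


def rank_relatives(source, database, oligo_len=12):
    """Rank database entries by the number of distinct source oligos they share.

    'database' is a hypothetical dict {species name: sequence} standing in
    for the PT-server; the real server uses an index, not a linear scan.
    """
    source_oligos = set(split_into_oligos(source, oligo_len))
    scores = {}
    for name, target in database.items():
        target_oligos = set(split_into_oligos(target, oligo_len))
        scores[name] = len(source_oligos & target_oligos)
    # More shared oligos = more closely related, so sort descending.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Called with a short oligo length on toy sequences, e.g. rank_relatives("ACGUACGGUC", {"close": "ACGUACGGUA", "far": "UUUUUUUUUU"}, oligo_len=4), the entry sharing most oligos with the source is listed first.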
25 | |
---|
26 | SECTION PARAMETERS |
---|
27 | |
---|
28 | PT-Server |
---|
29 | |
---|
30 | Select the PT-Server to search |
---|
31 | |
---|
32 | Oligo length |
---|
33 | |
---|
34 | Length of oligos used to perform probe match against the PT server. |
---|
35 | Default is 12. |
---|
36 | |
---|
37 | Mismatches |
---|
38 | |
---|
39 | Number of mismatches allowed per oligo. |
---|
40 | Default is 0. |
---|
41 | |
---|
42 | Be careful: The search may become extremely slow when the number of mismatches is raised. |
---|
43 | |
---|
44 | Search mode |
---|
45 | |
---|
46 | Complete: Match all possible oligos |
---|
47 | Quick: Only match oligos starting with 'A' |
---|
48 | |
---|
49 | The 'Quick mode' works well for many sequence types and is approx. 4 times |
---|
50 | faster than the 'Complete mode'. For some sequence types it completely fails, |
---|
51 | e.g. if there are repetitive areas containing many 'AAAAA'. |
---|
52 | |
---|
53 | In quick mode, relative and absolute scores will be approx. 1/4 of those obtained in complete mode. |
---|
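The effect of 'Quick mode' on the number of oligos can be sketched as follows. This only illustrates the filter described above; the function names are assumptions and the code does not talk to a PT-server.

```python
def split_into_oligos(sequence, oligo_len):
    """Return every overlapping oligo of length oligo_len in sequence."""
    return [sequence[i:i + oligo_len]
            for i in range(len(sequence) - oligo_len + 1)]


def quick_mode_oligos(sequence, oligo_len):
    """Keep only the oligos starting with 'A', as 'Quick mode' does."""
    return [oligo for oligo in split_into_oligos(sequence, oligo_len)
            if oligo.startswith("A")]


def quick_fraction(sequence, oligo_len):
    """Fraction of all oligos that 'Quick mode' would actually match."""
    all_oligos = split_into_oligos(sequence, oligo_len)
    if not all_oligos:
        return 0.0
    return len(quick_mode_oligos(sequence, oligo_len)) / len(all_oligos)
```

For a sequence with an even base distribution the fraction is close to 1/4. For a repetitive 'AAAA...' stretch it approaches 1, and for a sequence without any 'A' it is 0, so the scores are no longer comparable to 1/4 of the complete-mode scores.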
54 | |
---|
55 | Match score: |
---|
56 | |
---|
57 | absolute: returns the absolute number of hits |
---|
58 | relative: returns the number of hits relative to some maximum (see score-scaling) |
---|
59 | |
---|
60 | Absolute hits: |
---|
61 | |
---|
62 | Absolute hits are the number of oligos which occur in the source sequence |
---|
63 | and in the targeted sequences (i.e. in the relatives of the source sequence). |
---|
64 | |
---|
65 | If an oligo occurs multiple times in source or target sequence, it only |
---|
66 | creates the minimum number of hits (e.g. if it occurs twice in source and |
---|
67 | three times in a target, only two hits will be counted for that target). |
---|
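The counting rule for repeated oligos can be sketched like this. It is a minimal illustration with assumed function names, not the code used by ARB; mismatches, ambiguity codes and the PT-server itself are left out.

```python
from collections import Counter


def count_oligos(sequence, oligo_len):
    """Count how often each oligo of length oligo_len occurs in sequence."""
    return Counter(sequence[i:i + oligo_len]
                   for i in range(len(sequence) - oligo_len + 1))


def absolute_hits(source, target, oligo_len):
    """Sum, over all source oligos, the minimum of source and target count."""
    source_counts = count_oligos(source, oligo_len)
    target_counts = count_oligos(target, oligo_len)
    # A Counter returns 0 for oligos missing in the target.
    return sum(min(count, target_counts[oligo])
               for oligo, count in source_counts.items())
```

With source "ACGUCCACGU" and target "ACGUGACGUGACGU" and an oligo length of 4, 'ACGU' occurs twice in the source and three times in the target, and no other oligo is shared, so two hits are counted.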
68 | |
---|
69 | The theoretical maximum for absolute hits is |
---|
70 | |
---|
71 | maxhits = minimumBasecount(source, target) - oligolen + 1 |
---|
72 | |
---|
73 | In practice that value is rarely or never reached because several oligos |
---|
74 | are skipped, namely all oligos containing IUPAC codes, N's or dots. |
---|
75 | The PT-server likewise does not report matches hitting ambiguous positions |
---|
76 | or sequence endings. |
---|
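A small sketch of the maximum above and of the oligo skipping. The function names are assumptions, and so is the exact set of characters treated as unambiguous (A, C, G, T, U); the base counts are passed in directly because this help text does not define how 'minimumBasecount' counts bases.

```python
UNAMBIGUOUS_BASES = set("ACGTU")  # assumption: everything else is skipped


def max_hits(source_basecount, target_basecount, oligo_len):
    """Theoretical maximum: minimumBasecount(source, target) - oligolen + 1."""
    return max(0, min(source_basecount, target_basecount) - oligo_len + 1)


def usable_oligos(sequence, oligo_len):
    """Return the oligos that contain no IUPAC code, 'N' or dot."""
    oligos = (sequence[i:i + oligo_len]
              for i in range(len(sequence) - oligo_len + 1))
    return [oligo for oligo in oligos if set(oligo) <= UNAMBIGUOUS_BASES]
```

For a source with 1500 bases, a target with 1400 bases and oligo length 12 the maximum is 1400 - 12 + 1 = 1389. A single 'N' removes every oligo overlapping it: in "ACGNACGU" with oligo length 4, only one of the five oligos remains usable.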
77 | |
---|
78 | The number of absolute hits is also affected by other parameters: |
---|
79 | |
---|
80 | - using quick search will only produce around 25% of the hits of a |
---|
81 | complete search (assuming that 25% of all oligos start with an 'A') |
---|
82 | - additionally searching for complement or reverse will double the number of possible |
---|
83 | hits. Searching for all 4 reverse/complement-combinations will produce |
---|
84 | 4 times as many hits as a plain forward search. |
---|
85 | |
---|
86 | Relative score: |
---|
87 | |
---|
88 | The relative score is absolute hits scaled versus a maximum POC (possible oligo count). |
---|
89 | You can specify which maximum POC to use with the selection button next to |
---|
90 | the score selection button: |
---|
91 | |
---|
92 | to source POC     maximum possible oligos in source |
---|
93 | to target POC     maximum possible oligos in target |
---|
94 | to minimum POC    minimum possible oligos in source or target |
---|
95 | to maximum POC    maximum possible oligos in source or target |
---|
96 | |
---|
97 | 'to source POC' will report ~100% score for partial source versus |
---|
98 | all full sequences containing the part. |
---|
99 | |
---|
100 | 'to target POC' will report ~100% score for all partial target sequences |
---|
101 | which are contained in the source sequence. |
---|
102 | |
---|
103 | 'to minimum POC' will report ~100% score if source is part of target or vice versa |
---|
104 | (this was the default method in previous ARB versions). |
---|
105 | |
---|
106 | 'to maximum POC' will report ~100% score if source and target contain each other, i.e. |
---|
107 | if they have an identical oligo distribution. If either source or target is missing |
---|
108 | some bases, the score will be lower. |
---|
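The four scalings can be compared with a short sketch. It assumes that the POC of a sequence is 'basecount - oligolen + 1', in line with the formula for the theoretical maximum of absolute hits; the function names and the scaling keywords are made up for this example.

```python
def possible_oligo_count(basecount, oligo_len):
    """POC under the stated assumption: basecount - oligolen + 1."""
    return max(0, basecount - oligo_len + 1)


def relative_score(hits, source_basecount, target_basecount, oligo_len,
                   scaling="min"):
    """Scale absolute hits to the selected POC and return a percentage."""
    source_poc = possible_oligo_count(source_basecount, oligo_len)
    target_poc = possible_oligo_count(target_basecount, oligo_len)
    poc = {
        "source": source_poc,                # 'to source POC'
        "target": target_poc,                # 'to target POC'
        "min": min(source_poc, target_poc),  # 'to minimum POC'
        "max": max(source_poc, target_poc),  # 'to maximum POC'
    }[scaling]
    return 100.0 * hits / poc if poc else 0.0
```

Example: a partial source of 500 bases that is completely contained in a target of 1500 bases yields 489 hits at oligo length 12. Scaled 'to source POC' or 'to minimum POC' this gives 100%, scaled 'to target POC' or 'to maximum POC' only about 33%.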
109 | |
---|
110 | |
---|
111 | When using 'quick search mode' the max. relative score will be 25% (if 25% of |
---|
112 | the oligos start with 'A'). |
---|
113 | |
---|
114 | When searching for forward and reverse-complement, the theoretical max. relative |
---|
115 | score will be 200%. In practice it won't find many hits on the reverse-complement |
---|
116 | strand. So you'll get similar scores as without reverse-complement, but especially |
---|
117 | if you lower the oligo size, you'll probably reach scores above 100%. |
---|
118 | |
---|
119 | |
---|
120 | The EDIT4 aligner currently always uses 'to minimum POC'. |
---|
121 | |
---|
122 | |
---|
123 | Complement: |
---|
124 | |
---|
125 | forward: Match only forward oligos |
---|
126 | reverse: Match only reverse oligos |
---|
127 | complement: Match only complement oligos |
---|
128 | reverse-complement: Match only reverse-complement oligos |
---|
129 | |
---|
130 | The remaining options are combinations of the above. |
---|
131 | |
---|
132 | The combinations will affect the score, especially for shorter oligos. |
---|
133 | Please read the section about 'Relative score' above to avoid confusion. |
---|
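The four basic orientations of an oligo can be sketched as follows. The sketch assumes an RNA alphabet (A, C, G, U) without ambiguity codes; the function names are chosen for this example only.

```python
# Assumption: plain RNA bases only; A pairs with U, C pairs with G.
COMPLEMENT_TABLE = str.maketrans("ACGU", "UGCA")


def reverse(oligo):
    """Read the oligo backwards."""
    return oligo[::-1]


def complement(oligo):
    """Replace every base by its pairing partner."""
    return oligo.translate(COMPLEMENT_TABLE)


def reverse_complement(oligo):
    """Complement the oligo and read it backwards."""
    return reverse(complement(oligo))


def orientations(oligo):
    """Return the oligo in all four orientations."""
    return {
        "forward": oligo,
        "reverse": reverse(oligo),
        "complement": complement(oligo),
        "reverse-complement": reverse_complement(oligo),
    }
```

For the oligo 'AACG' this yields 'GCAA' (reverse), 'UUGC' (complement) and 'CGUU' (reverse-complement). Each additional orientation adds another set of oligos that may hit, which is why combined searches can raise the score.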
134 | |
---|
135 | Note: Not available for EDIT4 aligner. |
---|
136 | |
---|
137 | Target range: |
---|
138 | |
---|
139 | Restrict the alignment range in which oligos may match. |
---|
140 | Hits outside that range will not be considered. |
---|
141 | |
---|
142 | NOTES Special effort is taken to eliminate multi-matches, which past versions did not handle. |
---|
143 | Ignoring them resulted in relative scores far beyond 100%, especially for small oligo lengths. |
---|
144 | |
---|
145 | Now e.g. an oligo occurring 3 times in the source sequence will give at most 3 absolute |
---|
146 | hitpoints to any target sequence - even if it occurs there far more often. |
---|
147 | |
---|
148 | EXAMPLES None |
---|
149 | |
---|
150 | WARNINGS Use mismatches with care! |
---|
151 | |
---|
152 | BUGS Relative score is not scaled to the maximum possible hits in the target range. |
---|