| 1 | # main topics: |
|---|
| 2 | UP arb.hlp |
|---|
| 3 | UP glossary.hlp |
|---|
| 4 | UP pt_server.hlp |
|---|
| 5 | |
|---|
| 6 | # sub topics: |
|---|
| 7 | SUB next_neighbours.hlp |
|---|
| 8 | SUB next_neighbours_listed.hlp |
|---|
| 9 | SUB faligner.hlp |
|---|
| 10 | |
|---|
| 11 | # format described in ../help.readme |
|---|
| 12 | |
|---|
| 13 | |
|---|
| 14 | TITLE Nearest relative search |
|---|
| 15 | |
|---|
| 16 | OCCURRENCE ARB_NT/Search/More search/Search Next Relatives of SELECTED Species in PT Server |
|---|
| 17 | ARB_NT/Search/More search/Search Next Relatives of LISTED Species in PT Server |
|---|
| 18 | ARB_EDIT4/Edit/Integrated Aligners |
|---|
| 19 | |
|---|
| 20 | SECTION ALGORITHM |
|---|
| 21 | |
|---|
| 22 | Splits the sequence(s) into short oligos of a given size. |
|---|
| 23 | These oligos are 'Probe Matched' against the PT_SERVER database. |
|---|
| 24 | The more hits within the sequence of another species, the more related the other species is. |
|---|
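The steps above can be sketched roughly as follows. This is only an illustration, not the PT-server implementation: the function names `split_into_oligos` and `rank_relatives` are hypothetical, matching is exact (no mismatches), and shared oligos are counted once each regardless of how often they occur.

```python
def split_into_oligos(sequence, oligo_len=12):
    """Return every overlapping oligo of length oligo_len in sequence."""
    return [sequence[i:i + oligo_len]
            for i in range(len(sequence) - oligo_len + 1)]


def rank_relatives(source, targets, oligo_len=12):
    """Rank target sequences by the number of distinct oligos shared with source.

    targets maps a species name to its sequence (hypothetical input format).
    """
    source_oligos = set(split_into_oligos(source, oligo_len))
    hits = {}
    for name, seq in targets.items():
        target_oligos = set(split_into_oligos(seq, oligo_len))
        hits[name] = len(source_oligos & target_oligos)
    # Most hits first: more shared oligos means a closer relative.
    return sorted(hits.items(), key=lambda item: item[1], reverse=True)
```

With an oligo length of 4, a source of "ACGTACGTACGTAC" ranks a target "ACGTACGA" above a target "TTTTTTTT", because only the former shares oligos with the source.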
| 25 | |
|---|
| 26 | SECTION PARAMETERS |
|---|
| 27 | |
|---|
| 28 | PT-Server |
|---|
| 29 | |
|---|
| 30 | Select the PT-Server to search |
|---|
| 31 | |
|---|
| 32 | Oligo length |
|---|
| 33 | |
|---|
| 34 | Length of oligos used to perform probe match against the PT server. |
|---|
| 35 | Default is 12. |
|---|
| 36 | |
|---|
| 37 | Mismatches |
|---|
| 38 | |
|---|
| 39 | Number of mismatches allowed per oligo. |
|---|
| 40 | Default is 0. |
|---|
| 41 | |
|---|
| 42 | Be careful: The search may get incredibly slow when raising the number of mismatches. |
|---|
| 43 | |
|---|
| 44 | Search mode |
|---|
| 45 | |
|---|
| 46 | Complete: Match all possible oligos |
|---|
| 47 | Quick: Only match oligos starting with 'A' |
|---|
| 48 | |
|---|
| 49 | The 'Quick mode' works well for many sequence types and is approx. 4 times |
|---|
| 50 | faster than the 'Complete mode'. For some sequence types it completely fails, |
|---|
| 51 | e.g. if there are repetitive areas containing many 'AAAAA'. |
|---|
| 52 | |
|---|
| 53 | In quick mode, relative and absolute scores will be approx. 1/4 of those in complete mode. |
|---|
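A minimal sketch of the difference between the two search modes, assuming upper-case sequences. The helper name `oligos_for_mode` is hypothetical and only mirrors the description above:

```python
def oligos_for_mode(sequence, oligo_len, quick=False):
    """Return the oligos that would be matched in complete or quick mode."""
    oligos = [sequence[i:i + oligo_len]
              for i in range(len(sequence) - oligo_len + 1)]
    if quick:
        # Quick mode: only oligos starting with 'A' are matched.
        oligos = [oligo for oligo in oligos if oligo.startswith("A")]
    return oligos
```

For "ACGTAACG" and an oligo length of 3, complete mode yields 6 oligos, quick mode only the 3 that start with 'A'. With an even base distribution the share would be about one quarter, which is where the factor of 4 comes from.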
| 54 | |
|---|
| 55 | Match score: |
|---|
| 56 | |
|---|
| 57 | absolute: returns the absolute number of hits |
|---|
| 58 | relative: returns the number of hits relative to some maximum |
|---|
| 59 | (for details read below) |
|---|
| 60 | |
|---|
| 61 | Absolute hits: |
|---|
| 62 | |
|---|
| 63 | Absolute hits are the number of oligos which occur in the source sequence |
|---|
| 64 | and in the targeted sequences (i.e. in the relatives of the source sequence). |
|---|
| 65 | |
|---|
| 66 | If an oligo occurs multiple times in source or target sequence, it only |
|---|
| 67 | creates the minimum number of hits (e.g. if it occurs twice in source and |
|---|
| 68 | three times in a target, only two hits will be counted for that target). |
|---|
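The minimum rule can be illustrated with a small sketch. The function names are hypothetical and matching is exact; the intersection of two `collections.Counter` objects keeps the smaller count for each oligo:

```python
from collections import Counter


def oligo_counts(sequence, oligo_len):
    """Count how often each oligo of length oligo_len occurs in sequence."""
    return Counter(sequence[i:i + oligo_len]
                   for i in range(len(sequence) - oligo_len + 1))


def absolute_hits(source, target, oligo_len):
    """Sum, over all shared oligos, the smaller of the two occurrence counts."""
    shared = oligo_counts(source, oligo_len) & oligo_counts(target, oligo_len)
    return sum(shared.values())
```

For example, with an oligo length of 4, "ACGT" occurs twice in the source "ACGTTACGT" and three times in the target "ACGTACGTACGT", so it contributes two hits. "TACG" contributes one more, giving 3 absolute hits in total.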
| 69 | |
|---|
| 70 | The theoretical maximum for absolute hits is |
|---|
| 71 | |
|---|
| 72 | maxhits = minimumBasecount(source, target) - oligolen + 1 |
|---|
| 73 | |
|---|
| 74 | In practice that value is rarely or never reached because several oligos |
|---|
| 75 | are skipped, namely all oligos containing IUPAC codes, N's or dots. |
|---|
| 76 | Likewise, the PT-server will not report matches hitting ambiguous positions |
|---|
| 77 | or sequence endings. |
|---|
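As a rough illustration of the formula and of why it is rarely reached, the sketch below computes `maxhits` and drops oligos containing anything other than plain bases. Both function names are hypothetical. Clamping at 0 for very short sequences and treating A, C, G, T and U (upper-case, gaps already removed) as the only plain bases are assumptions made here:

```python
PLAIN_BASES = set("ACGTU")  # assumption: everything else counts as ambiguous


def max_hits(source_bases, target_bases, oligo_len):
    """Theoretical maximum of absolute hits for the given base counts."""
    return max(0, min(source_bases, target_bases) - oligo_len + 1)


def usable_oligos(sequence, oligo_len):
    """Return only oligos free of IUPAC ambiguity codes, N's and dots."""
    oligos = (sequence[i:i + oligo_len]
              for i in range(len(sequence) - oligo_len + 1))
    return [oligo for oligo in oligos if set(oligo) <= PLAIN_BASES]
```

A single 'N' removes every oligo overlapping it: "ACGNACGT" with an oligo length of 4 has 5 possible oligos, but only "ACGT" remains usable.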
| 78 | |
|---|
| 79 | The number of absolute hits is also affected by other parameters: |
|---|
| 80 | |
|---|
| 81 | - using quick search will only produce around 25% of the hits of a |
|---|
| 82 | complete search (assuming that 25% of all oligos start with an 'A') |
|---|
| 83 | - searching for complement or reverse will double the number of possible |
|---|
| 84 | hits. Searching for all 4 reverse/complement-combinations will produce |
|---|
| 85 | 4 times as many hits as a plain forward search. |
|---|
| 86 | |
|---|
| 87 | Relative score: |
|---|
| 88 | |
|---|
| 89 | The relative score is absolute hits scaled versus a maximum POC (possible oligo count). |
|---|
| 90 | You can specify which maximum POC to use with the selection button next to |
|---|
| 91 | the score selection button: |
|---|
| 92 | |
|---|
| 93 | to source POC maximum possible oligos in source |
|---|
| 94 | to target POC maximum possible oligos in target |
|---|
| 95 | to minimum POC minimum possible oligos in source or target |
|---|
| 96 | to maximum POC maximum possible oligos in source or target |
|---|
| 97 | |
|---|
| 98 | 'to source POC' will report ~100% score for a partial source sequence versus |
|---|
| 99 | all full sequences containing that part. |
|---|
| 100 | |
|---|
| 101 | 'to target POC' will report ~100% score for all partial target sequences |
|---|
| 102 | which are contained in the source sequence. |
|---|
| 103 | |
|---|
| 104 | 'to minimum POC' will report ~100% score if source is part of target or vice versa |
|---|
| 105 | (this was the default method in previous ARB versions). |
|---|
| 106 | |
|---|
| 107 | 'to maximum POC' will report ~100% score if source and target contain each other, i.e. |
|---|
| 108 | if they have an identical oligo distribution. If either source or target is missing |
|---|
| 109 | some bases, the score will be lower. |
|---|
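The four scaling options can be compared with a small sketch. It assumes that the POC of a sequence is `basecount - oligolen + 1`, in line with the `maxhits` formula above; the function names and the `scale` keywords are hypothetical:

```python
def possible_oligo_count(base_count, oligo_len):
    """Assumed POC: number of oligos that fit into base_count bases."""
    return max(0, base_count - oligo_len + 1)


def relative_score(abs_hits, source_bases, target_bases, oligo_len, scale="min"):
    """Scale absolute hits to a percentage of the selected POC."""
    src_poc = possible_oligo_count(source_bases, oligo_len)
    tgt_poc = possible_oligo_count(target_bases, oligo_len)
    poc = {
        "source": src_poc,                # to source POC
        "target": tgt_poc,                # to target POC
        "min": min(src_poc, tgt_poc),     # to minimum POC
        "max": max(src_poc, tgt_poc),     # to maximum POC
    }[scale]
    return 100.0 * abs_hits / poc if poc else 0.0
```

Take a partial source of 500 bases that is fully contained in a target of 1500 bases, with an oligo length of 12 and therefore 489 absolute hits. Scaling to source or minimum POC gives 100%, while scaling to target or maximum POC gives roughly 33%.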
| 110 | |
|---|
| 111 | |
|---|
| 112 | When using 'quick search mode' the max. relative score will be 25% (if 25% of |
|---|
| 113 | the oligos start with 'A'). |
|---|
| 114 | |
|---|
| 115 | When searching for forward and reverse-complement, the theoretical max. relative |
|---|
| 116 | score will be 200%. In practice it won't find many hits on the reverse-complement |
|---|
| 117 | strand. So you'll get similar scores as without reverse-complement, but especially |
|---|
| 118 | if you lower the oligo size, you'll probably reach scores above 100%. |
|---|
| 119 | |
|---|
| 120 | |
|---|
| 121 | The EDIT4 aligner currently always uses 'to minimum POC'. |
|---|
| 122 | |
|---|
| 123 | |
|---|
| 124 | Complement: |
|---|
| 125 | |
|---|
| 126 | forward: Match only forward oligos |
|---|
| 127 | reverse: Match only reverse oligos |
|---|
| 128 | complement: Match only complement oligos |
|---|
| 129 | reverse-complement: Match only reverse-complement oligos |
|---|
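For reference, the three non-forward variants of an oligo can be derived as in the sketch below. It assumes an upper-case DNA alphabet (A, C, G, T); the function names are hypothetical and not taken from ARB:

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")  # assumption: plain DNA bases only


def reverse(oligo):
    """Read the oligo back to front."""
    return oligo[::-1]


def complement(oligo):
    """Replace each base by its pairing partner, keeping the order."""
    return oligo.translate(_COMPLEMENT)


def reverse_complement(oligo):
    """Complement the oligo and read it back to front."""
    return complement(oligo)[::-1]
```

For the forward oligo "AACG" this gives "GCAA" (reverse), "TTGC" (complement) and "CGTT" (reverse-complement).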
| 130 | |
|---|
| 131 | The remaining options are combinations of the above. |
|---|
| 132 | |
|---|
| 133 | The combinations will affect the score, especially for shorter oligos. |
|---|
| 134 | Please read the section about 'Relative score' above to avoid confusion. |
|---|
| 135 | |
|---|
| 136 | Note: Not available for EDIT4 aligner. |
|---|
| 137 | |
|---|
| 138 | Target range: |
|---|
| 139 | |
|---|
| 140 | Restrict the alignment range in which oligos may match. |
|---|
| 141 | Hits outside that range will not be considered. |
|---|
| 142 | |
|---|
| 143 | NOTES Special effort is taken to eliminate multi-matches, which were ignored in past versions. |
|---|
| 144 | That resulted in relative scores far beyond 100%, especially for small oligo-lengths. |
|---|
| 145 | |
|---|
| 146 | Now e.g. an oligo occurring 3 times in the source sequence will give at most 3 absolute |
|---|
| 147 | hitpoints to any target sequence - even if it occurs there far more often. |
|---|
| 148 | |
|---|
| 149 | EXAMPLES None |
|---|
| 150 | |
|---|
| 151 | WARNINGS Use mismatches with care! |
|---|
| 152 | |
|---|
| 153 | BUGS Relative score is not scaled to the maximum possible hits in the target range. |
|---|