#Please insert up references in the next lines (line starts with keyword UP)
UP      arb.hlp
UP      glossary.hlp

#Please insert subtopic references (line starts with keyword SUB)
#SUB    subtopic.hlp

# Hypertext links in helptext can be added like this: LINK{ref.hlp|http://add|bla@domain}

#************* Title of helpfile !! and start of real helpfile ********
TITLE           Column statistic

OCCURRENCE      ARB_NT/SAI/Create SAI from Sequences/Positional Variability ...

DESCRIPTION     Calculates the base frequencies and the positional variability
                for each column independently.

                It uses the parsimony method to find the minimum number of
                mutations for each site, as determined by the specified
                topology.

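As an illustration of "minimum number of mutations for each site on a given topology", the following minimal sketch counts mutations for one column with Fitch's small-parsimony algorithm. It is not taken from ARB's implementation: the nested-tuple tree encoding and the function name `fitch_mutations` are assumptions made for this example, and gaps or other invalid characters are not handled.

```python
def fitch_mutations(tree, column):
    """Return the minimum number of mutations for one alignment column.

    tree:   a leaf name (str) or a 2-tuple (left_subtree, right_subtree)
            (hypothetical encoding, chosen only for this sketch)
    column: dict mapping leaf name -> character found at this column
    """
    def visit(node):
        if isinstance(node, str):            # leaf: state set is its own character
            return {column[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        common = left_set & right_set
        if common:                           # children agree: no extra mutation
            return common, left_cost + right_cost
        # children disagree: keep the union and count one mutation
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


tree = (("a", "b"), ("c", "d"))
print(fitch_mutations(tree, {"a": "A", "b": "A", "c": "G", "d": "G"}))  # 1
print(fitch_mutations(tree, {"a": "A", "b": "G", "c": "A", "d": "G"}))  # 2
```

The same four characters need one mutation when they follow the topology and two when they conflict with it, which is why the result depends on the tree that is supplied.
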
                The calculation is performed for the sequences of all species
                in the tree. For best results, use one of the largest trees
                available. The tree should have been optimized using
                ARB_PARSIMONY.

                The result can be used by:

                    - Parsimony, to weight the characters properly
                    - Neighbour joining, to estimate the distances more accurately
                    - Filter (read the notes below)

                The resulting SAI will contain the following character codes:

                    '.'                     Less than 10% valid characters
                    '-'                     No mutations
                    '0123456789ABCDE...'    Mutation rate category

                The higher the digit/character of the mutation rate category,
                the more conserved the site is. Stepping 2 positions rightwards
                in the list of given characters approximately halves the
                mutation rate (see below for the explicit mappings).

                Valid characters are "ACGTUacgtu" for DNA/RNA (or all amino
                acid codes for AA sequences).

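The '.' code can be pictured with a small sketch that measures the share of valid characters in one nucleotide column. The helper names `valid_fraction` and `is_masked` are hypothetical, and reading "less than 10%" as a strict comparison against the fraction of all characters in the column is an assumption.

```python
VALID_NUCLEOTIDES = set("ACGTUacgtu")   # valid characters for DNA/RNA, as listed above


def valid_fraction(column_chars, valid=VALID_NUCLEOTIDES):
    """Fraction of the characters in one column that count as valid."""
    if not column_chars:
        return 0.0
    return sum(ch in valid for ch in column_chars) / len(column_chars)


def is_masked(column_chars, threshold=0.10):
    """True if the column would receive '.' (fewer than 10% valid characters)."""
    return valid_fraction(column_chars) < threshold


print(is_masked("A" + "-" * 19))    # 1 of 20 valid (5%)  -> True
print(is_masked("AC" + "-" * 18))   # 2 of 20 valid (10%) -> False
```
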
NOTES           In contrast to consensus and max-frequency SAIs, the positional
                variability SAI is calculated based on the specified topology.

                Later, that PVP-SAI might be used as a filter to further
                optimize that topology. When you filter out columns with high
                variability, topology changes that imply an increased number
                of mutations in these columns will receive no penalty.

                Repeating several iterations of these 2 steps might lead to a
                systematic error:

                    * variable columns will tend to become even more variable and
                    * conserved columns will tend to become even more conserved.

                The systematic error caused by this effect will probably mostly
                emphasize topological errors of the initial tree.
                To avoid that problem, a tree should also be optimized using
                other filters (e.g. max-frequency). This is especially true for
                the initial tree optimization.

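To make the filtering step concrete, here is a minimal sketch that turns a SAI string into a list of columns to keep and applies it to a sequence. The function names and the default set of excluded codes (".0123", i.e. sparsely populated and highly variable columns) are assumptions chosen for illustration, not ARB defaults.

```python
def columns_to_keep(sai, excluded=".0123"):
    """Return the indices of columns whose SAI code is not in `excluded`."""
    return [i for i, code in enumerate(sai) if code not in excluded]


def apply_filter(sequence, keep):
    """Return only the characters of `sequence` at the kept column indices."""
    return "".join(sequence[i] for i in keep)


sai = "0.9A-3Z"
keep = columns_to_keep(sai)
print(keep)                            # [2, 3, 4, 6]
print(apply_filter("ACGUACG", keep))   # GUAG
```

Columns dropped this way no longer contribute to the parsimony score, which is the mechanism behind the feedback effect described above.
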
WARNINGS        If you only have small trees (<100 species),
                using this function does not make much sense.

SECTION         Mapping of site mutation rate to categories:

                    mutation rate       category

                    45.8% .. 75%        0       (max. possible mutation rate is ~75%)
                    36.5% .. 45.8%      1
                    28.2% .. 36.5%      2
                    21.3% .. 28.2%      3
                    15.7% .. 21.3%      4
                    11.5% .. 15.7%      5
                     8.3% .. 11.5%      6
                     6.0% ..  8.3%      7
                     4.3% ..  6.0%      8
                     3.1% ..  4.3%      9
                     2.2% ..  3.1%      A
                     1.5% ..  2.2%      B
                     1.1% ..  1.5%      C
                    0.78% ..  1.1%      D
                    0.55% .. 0.78%      E
                    0.39% .. 0.55%      F
                    0.28% .. 0.39%      G
                    0.20% .. 0.28%      H
                    0.14% .. 0.20%      I

                    mutations/million   category

                    976 .. 1400         J
                    691 ..  975         K
                    489 ..  690         L
                    346 ..  488         M
                    245 ..  345         N
                    173 ..  244         O
                    123 ..  172         P
                     87 ..  122         Q
                     62 ..   86         R
                     44 ..   61         S
                     31 ..   43         T
                     22 ..   30         U
                     16 ..   21         V
                     11 ..   15         W
                      8 ..   10         X
                      6 ..    7         Y
                      1 ..    5         Z


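The two tables above can be read as a single lookup from mutation rate to category code. The sketch below copies the tabulated lower bounds and expresses all of them in mutations per million. The function name `rate_category` is hypothetical, and treating each lower bound as inclusive, as well as assigning positive rates below 1 per million to 'Z', are assumptions the tables do not settle.

```python
# Lower bounds in percent for categories '0'..'I' (first table).
PERCENT_LOWER = [45.8, 36.5, 28.2, 21.3, 15.7, 11.5, 8.3, 6.0, 4.3, 3.1,
                 2.2, 1.5, 1.1, 0.78, 0.55, 0.39, 0.28, 0.20, 0.14]
# Lower bounds in mutations per million for categories 'J'..'Z' (second table).
PER_MILLION_LOWER = [976, 691, 489, 346, 245, 173, 123, 87, 62, 44,
                     31, 22, 16, 11, 8, 6, 1]
CODES = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# 1% corresponds to 10,000 mutations per million.
LOWER_PER_MILLION = [round(p * 10_000) for p in PERCENT_LOWER] + PER_MILLION_LOWER


def rate_category(mutations_per_million):
    """Map a site mutation rate (mutations per million) to its SAI code."""
    if mutations_per_million <= 0:
        return "-"                       # no mutations
    for code, lower in zip(CODES, LOWER_PER_MILLION):
        if mutations_per_million >= lower:
            return code
    return "Z"                           # assumption: below the last tabulated bound


print(rate_category(500_000))   # 50%   -> 0
print(rate_category(100_000))   # 10%   -> 6
print(rate_category(1_000))     # 0.1%  -> J
print(rate_category(3))         #       -> Z
```

Comparing the lower bounds of categories '6' and '8' (8.3% and 4.3%) shows the "two steps roughly halve the rate" rule from the description.
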
BUGS            No bugs known