| 1 | #Please insert up references in the next lines (line starts with keyword UP) |
|---|
| 2 | UP arb.hlp |
|---|
| 3 | UP arb_pars.hlp |
|---|
| 4 | UP glossary.hlp |
|---|
| 5 | |
|---|
| 6 | #Please insert subtopic references (line starts with keyword SUB) |
|---|
| 7 | #SUB subtopic.hlp |
|---|
| 8 | |
|---|
| 9 | # Hypertext links in helptext can be added like this: LINK{ref.hlp|http://add|bla@domain} |
|---|
| 10 | |
|---|
| 11 | #************* Title of helpfile !! and start of real helpfile ******** |
|---|
| 12 | TITLE Parsimony value |
|---|
| 13 | |
|---|
| 14 | OCCURRENCE displayed in the top area of the ARB_PARSIMONY main window |
|---|
| 15 | |
|---|
| 16 | DESCRIPTION The parsimony value indicates the |
|---|
| 17 | quality of a tree's topology. |
|---|
| 18 | |
|---|
| 19 | Basically it counts the minimum number of base mutations that |
|---|
| 20 | must have occurred, |
|---|
| 21 | if we assume that the current topology represents the path that evolution took. |
|---|
| 22 | Therefore smaller values indicate better topologies. |
|---|
| 23 | |
|---|
| 24 | Several parameters influence the absolute parsimony value: |
|---|
| 25 | |
|---|
| 26 | - if you specify a filter (in parsimony startup window) only mutations |
|---|
| 27 | in the remaining, unfiltered alignment columns are counted, i.e. filtering |
|---|
| 28 | will normally lower the resulting parsimony value. |
|---|
| 29 | - if you specify a weighting mask (in parsimony startup window) higher |
|---|
| 30 | weighted sites will count more strongly and raise the absolute parsimony value. |
|---|
| 31 | - adding more species to a tree will normally raise the number of mutations, |
|---|
| 32 | i.e. a tree with many species has a higher parsimony value than a tree |
|---|
| 33 | with fewer species (see also LINK{pa_reset.hlp}). |
|---|
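The effect of a filter and a weighting mask on the absolute value can be sketched as a weighted sum over alignment columns. This is only an illustration, not ARB's implementation: the function name weighted_parsimony, its parameters and the per-column mutation counts are hypothetical, and it assumes that weights act as simple multipliers.

```python
def weighted_parsimony(mutations, weights=None, keep=None):
    """Sum per-column mutation counts (hypothetical helper, not an ARB function).

    mutations: minimum number of mutations per alignment column
    weights:   assumed multiplier per column (default 1)
    keep:      assumed filter, True for columns that remain unfiltered
    """
    n = len(mutations)
    weights = [1] * n if weights is None else weights
    keep = [True] * n if keep is None else keep
    # Filtered columns contribute nothing; the others count weight * mutations.
    return sum(m * w for m, w, k in zip(mutations, weights, keep) if k)


per_column = [2, 0, 1, 3]  # made-up counts for four columns
print(weighted_parsimony(per_column))                                  # 6
print(weighted_parsimony(per_column, keep=[True, True, False, True]))  # 5
print(weighted_parsimony(per_column, weights=[1, 1, 1, 2]))            # 9
```

The three calls give different values for the same tree, which is why values are only comparable under identical settings.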
| 34 | |
|---|
| 35 | If you compare parsimony values of different topologies you need to use |
|---|
| 36 | the same alignment, the same filter, the same weighting mask and the |
|---|
| 37 | same set of species. |
|---|
| 38 | |
|---|
| 39 | SECTION Dots |
|---|
| 40 | |
|---|
| 41 | ARB uses dots ('.') as a special gap type. |
|---|
| 42 | The meaning of a dot is "might be a gap or a nucleotide/aa". |
|---|
| 43 | It indicates the lack of any information about the sequence data at the |
|---|
| 44 | position where it is used. |
|---|
| 45 | |
|---|
| 46 | In contrast, a normal gap ('-') clearly states that it is KNOWN that the |
|---|
| 47 | sequence does NOT CONTAIN any bases at the positions of the gaps - the gaps have |
|---|
| 48 | only been inserted for alignment purposes. |
|---|
| 49 | |
|---|
| 50 | And, in contrast to a gap, an 'N' (or 'X' for amino acid sequences) clearly states |
|---|
| 51 | that it is KNOWN that the sequence CONTAINS some nucleotide/aa at that position. |
|---|
| 52 | |
|---|
| 53 | In ARB databases you should use dots at both ends of the alignment. |
|---|
| 54 | Doing so means: you know that the sequence continues in both directions - it just |
|---|
| 55 | has not been sequenced completely. |
|---|
| 56 | |
|---|
| 57 | You may also use dots in the middle of the alignment whenever you have |
|---|
| 58 | strong indications that a gap might in fact be a sequencing error. |
|---|
| 59 | |
|---|
| 60 | SECTION Mutations against dots |
|---|
| 61 | |
|---|
| 62 | When the parsimony value is calculated, dots do not cause mutations. |
|---|
| 63 | They will match any base or gap or other dot. |
|---|
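A minimal sketch of how a dot can match anything while mutations are counted for one alignment column. It uses Fitch's small-parsimony algorithm as a stand-in; whether ARB_PARSIMONY works exactly this way is an assumption, and the names STATES and fitch are hypothetical.

```python
STATES = frozenset("ACGT-")  # assumed state alphabet: four bases plus gap


def fitch(tree):
    """Return (possible_states, mutation_count) for one alignment column.

    A tree is either a leaf character or a (left, right) tuple.
    """
    if isinstance(tree, str):
        # A dot carries no information, so it is compatible with every state.
        return (STATES if tree == "." else frozenset(tree)), 0
    (left_states, left_cost) = fitch(tree[0])
    (right_states, right_cost) = fitch(tree[1])
    common = left_states & right_states
    if common:
        # Both subtrees agree on at least one state: no mutation needed here.
        return common, left_cost + right_cost
    # No shared state: one mutation is required at this node.
    return left_states | right_states, left_cost + right_cost + 1


print(fitch((("A", "."), ("A", "G")))[1])  # 1
print(fitch((("A", "C"), ("A", "G")))[1])  # 2
```

Replacing the 'C' by a dot removes one mutation, because the dot matches the 'A' of its neighbour.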
| 64 | |
|---|
| 65 | That means dots at both sequence ends will compensate for some of the |
|---|
| 66 | negative effects that are normally caused by using sequences of different |
|---|
| 67 | lengths (e.g. clustering of LINK{partial_sequences.hlp}). |
|---|
| 68 | |
|---|
| 69 | |
|---|
| 70 | SECTION Differences between sequence types |
|---|
| 71 | |
|---|
| 72 | For nucleotide sequences: |
|---|
| 73 | |
|---|
| 74 | Mutations are simply counted for single nucleotides. |
|---|
| 75 | |
|---|
| 76 | For amino acid sequences: |
|---|
| 77 | |
|---|
| 78 | Mutations are determined on an amino acid basis. This differs |
|---|
| 79 | from what would be done when using the corresponding |
|---|
| 80 | DNA alignment: |
|---|
| 81 | |
|---|
| 82 | - in DNA several different codons (combinations of 3 nucleotides) |
|---|
| 83 | may represent the same amino acid. Therefore a mutation would |
|---|
| 84 | be counted for DNA, where no mutation is counted for AA. |
|---|
| 85 | - the parsimony value for amino acid alignments does not count |
|---|
| 86 | the number of amino-acid-mutations. It counts the minimum |
|---|
| 87 | number of nucleotide(!) mutations needed to mutate from one |
|---|
| 88 | amino acid to another, while assuming that there is no |
|---|
| 89 | selection pressure when mutating a codon into another codon that |
|---|
| 90 | translates into the same amino acid (see also EXAMPLES below). |
|---|
| 91 | |
|---|
| 92 | ARB generally uses the "Standard code" to calculate the |
|---|
| 93 | mutations between different amino acids, when determining |
|---|
| 94 | the parsimony value. |
|---|
| 95 | |
|---|
| 96 | NOTES The parsimony value is also used to LINK{pa_branchlengths.hlp}. |
|---|
| 97 | |
|---|
| 98 | EXAMPLES for an amino acid mutation |
|---|
| 99 | |
|---|
| 100 | Imagine an alignment position P and three species, where |
|---|
| 101 | |
|---|
| 102 | - species F has an 'F' (Phenylalanine) at position P, |
|---|
| 103 | - species Q has a 'Q' (Glutamine) at position P and |
|---|
| 104 | - species L has an 'L' (Leucine) at position P. |
|---|
| 105 | |
|---|
| 106 | These amino acids may be represented by the following codons: |
|---|
| 107 | |
|---|
| 108 | - F = TTT | TTC |
|---|
| 109 | - Q = CAA | CAG |
|---|
| 110 | - L = TTA | TTG | CTN |
|---|
| 111 | |
|---|
| 112 | Based on the minimum codon distances, the mutation costs used |
|---|
| 113 | in ARB_PARSIMONY are: |
|---|
| 114 | |
|---|
| 115 | - F -> Q = 3 mutations |
|---|
| 116 | - F -> L = 1 mutation (e.g. TTT -> TTA) |
|---|
| 117 | - L -> Q = 1 mutation (e.g. CTA -> CAA) |
|---|
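The three costs above can be reproduced by comparing every codon of one amino acid with every codon of the other and taking the smallest number of differing nucleotides. The sketch below builds the Standard code table in the usual TCAG order; the names CODONS and min_codon_distance are hypothetical, and the code illustrates the idea only, not ARB's implementation.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated as TTT, TTC, TTA, TTG, TCT, ...
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")

# Map each amino acid to the list of codons that encode it.
CODONS = {}
for triplet, aa in zip(product(BASES, repeat=3), AMINO):
    CODONS.setdefault(aa, []).append("".join(triplet))


def min_codon_distance(aa1, aa2):
    """Smallest number of nucleotide differences between any codon pair."""
    return min(
        sum(x != y for x, y in zip(c1, c2))
        for c1 in CODONS[aa1]
        for c2 in CODONS[aa2]
    )


print(min_codon_distance("F", "Q"))  # 3
print(min_codon_distance("F", "L"))  # 1
print(min_codon_distance("L", "Q"))  # 1
```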
| 118 | |
|---|
| 119 | This results in the following parsimony values for the |
|---|
| 120 | possible subtree-rearrangements (R=Rest of whole tree): |
|---|
| 121 | |
|---|
| 122 | R |
|---|
| 123 | | |
|---|
| 124 | | |
|---|
| 125 | F pars value = 4 |
|---|
| 126 | / \ |
|---|
| 127 | / \ |
|---|
| 128 | Q L |
|---|
| 129 | |
|---|
| 130 | |
|---|
| 131 | R |
|---|
| 132 | | |
|---|
| 133 | | |
|---|
| 134 | Q pars value = 4 |
|---|
| 135 | / \ |
|---|
| 136 | / \ |
|---|
| 137 | F L |
|---|
| 138 | |
|---|
| 139 | |
|---|
| 140 | R |
|---|
| 141 | | |
|---|
| 142 | | |
|---|
| 143 | L pars value = 2 (!) |
|---|
| 144 | / \ |
|---|
| 145 | / \ |
|---|
| 146 | Q F |
|---|
| 147 | |
|---|
| 148 | |
|---|
| 149 | Assuming the third topology (which is the "best" according to the parsimony |
|---|
| 150 | value) means assuming that the ancestor of Q and F had an L at position P. |
|---|
| 151 | As no selection pressure is assumed for mutating that 'L'-codon (e.g. |
|---|
| 152 | from 'TTA' into 'CTA'), no mutation penalty is counted when calculating the |
|---|
| 153 | parsimony value. |
|---|
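The pars values in the three diagrams above follow from adding the mutation costs of the two edges below the labelled node. A small sketch, with the costs copied from the list above; the connection to R is ignored here, and the names COST and subtree_cost are hypothetical.

```python
# Mutation costs taken from the example text (symmetric, so keyed by pair).
COST = {
    frozenset("FQ"): 3,
    frozenset("FL"): 1,
    frozenset("LQ"): 1,
}


def subtree_cost(ancestor, children):
    """Sum the costs from the assumed ancestor state to each child."""
    return sum(COST[frozenset((ancestor, child))] for child in children)


for ancestor in "FQL":
    children = [aa for aa in "FQL" if aa != ancestor]
    print(ancestor, subtree_cost(ancestor, children))
# F 4
# Q 4
# L 2
```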
| 154 | |
|---|
| 155 | WARNINGS None |
|---|
| 156 | |
|---|
| 157 | BUGS No bugs known |
|---|