1 | #Please insert up references in the next lines (line starts with keyword UP) |
---|
2 | UP arb.hlp |
---|
3 | UP arb_pars.hlp |
---|
4 | UP glossary.hlp |
---|
5 | |
---|
6 | #Please insert subtopic references (line starts with keyword SUB) |
---|
7 | #SUB subtopic.hlp |
---|
8 | |
---|
9 | # Hypertext links in helptext can be added like this: LINK{ref.hlp|http://add|bla@domain} |
---|
10 | |
---|
11 | #************* Title of helpfile !! and start of real helpfile ******** |
---|
12 | TITLE Parsimony value |
---|
13 | |
---|
14 | OCCURRENCE displayed in the top area of the ARB_PARSIMONY main window |
---|
15 | |
---|
16 | DESCRIPTION The parsimony value indicates the |
---|
17 | quality of a tree's topology. |
---|
18 | |
---|
19 | Basically it counts the minimum number of base mutations that |
---|
20 | necessarily needed to occur, |
---|
21 | if we assume that the current topology represents the path evolution took. |
---|
22 | Therefore smaller values indicate better topologies. |
---|
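The counting described above can be sketched with Fitch's algorithm, a standard way to find the minimum number of mutations for a fixed topology. This is a minimal illustration and not necessarily how ARB_PARSIMONY implements it. The names fitch_column and parsimony and the nested-tuple tree representation are assumptions made for this example.

```python
def fitch_column(tree, states):
    """Return (possible ancestral states, minimum mutations) for one column.

    tree:   a leaf name (str) or a pair (left, right) of subtrees
    states: dict mapping leaf name -> character in this alignment column
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_column(tree[0], states)
    right_set, right_cost = fitch_column(tree[1], states)
    common = left_set & right_set
    if common:
        # both subtrees can agree on a state: no extra mutation needed
        return common, left_cost + right_cost
    # subtrees disagree: one mutation is unavoidable at this node
    return left_set | right_set, left_cost + right_cost + 1


def parsimony(tree, alignment):
    """Sum the minimum mutation counts over all alignment columns."""
    length = len(next(iter(alignment.values())))
    total = 0
    for i in range(length):
        column = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_column(tree, column)[1]
    return total


alignment = {"a": "AAG", "b": "AAC", "c": "ACC", "d": "ACG"}
tree1 = (("a", "b"), ("c", "d"))
tree2 = (("a", "c"), ("b", "d"))
```

With this toy alignment, tree1 needs fewer mutations than tree2, so tree1 would be considered the better topology.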
23 | |
---|
24 | Several parameters influence the absolute parsimony value: |
---|
25 | |
---|
26 | - if you specify a filter (in parsimony startup window) only mutations |
---|
27 | in the remaining, unfiltered alignment columns are counted, i.e. filtering |
---|
28 | will normally lower the resulting parsimony value. |
---|
29 | - if you specify a weighting mask (in parsimony startup window), sites with |
---|
30 | higher weights count more strongly and raise the absolute parsimony value. |
---|
31 | - adding more species to a tree will normally raise the number of mutations, |
---|
32 | i.e. a tree with many species has a higher parsimony value than a tree |
---|
33 | with fewer species (see also LINK{pa_reset.hlp}). |
---|
34 | |
---|
35 | If you compare parsimony values of different topologies you need to use |
---|
36 | the same alignment, the same filter, the same weighting mask and the |
---|
37 | same set of species. |
---|
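The effect of filter and weighting mask on the absolute value can be sketched as follows. This is a simplified illustration only: the function name weighted_parsimony, the boolean filter list, and the plain multiplication by the weight are assumptions made for this example, not a description of ARB internals.

```python
def weighted_parsimony(column_mutations, column_filter=None, weights=None):
    """Combine per-column mutation counts into one parsimony value.

    column_mutations: minimum number of mutations per alignment column
    column_filter:    True for columns that are kept, False for filtered ones
    weights:          weight per column (1 means unweighted)
    """
    n = len(column_mutations)
    if column_filter is None:
        column_filter = [True] * n
    if weights is None:
        weights = [1] * n
    return sum(
        mutations * weight
        for mutations, keep, weight in zip(column_mutations, column_filter, weights)
        if keep
    )


mutations = [0, 1, 2, 1]
unfiltered = weighted_parsimony(mutations)
filtered = weighted_parsimony(mutations, column_filter=[True, True, False, True])
weighted = weighted_parsimony(mutations, weights=[1, 3, 1, 1])
```

The same topology yields three different values here, which is why values are only comparable if filter, weighting mask, alignment and species set are identical.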
38 | |
---|
39 | SECTION Dots |
---|
40 | |
---|
41 | ARB uses dots ('.') as a special gap type. |
---|
42 | The meaning of a dot is "might be a gap or a nucleotide/aa". |
---|
43 | It indicates the lack of any information about the sequence data at the |
---|
44 | position where it is used. |
---|
45 | |
---|
46 | In contrast, a normal gap ('-') clearly states that it is KNOWN that the |
---|
47 | sequence does NOT CONTAIN any bases at the positions of the gaps - the gaps have |
---|
48 | only been inserted for alignment purposes. |
---|
49 | |
---|
50 | And, in contrast to a gap, an 'N' (or 'X' for amino acid sequences) clearly states |
---|
51 | that it is KNOWN that the sequence CONTAINS some nucleotide/aa at that position. |
---|
52 | |
---|
53 | In ARB databases you should use dots at both ends of the alignment. |
---|
54 | Doing so means: you know that the sequence continues in both directions - it just |
---|
55 | has not been sequenced completely. |
---|
56 | |
---|
57 | You may also use dots in the middle of the alignment whenever you have |
---|
58 | strong indications that some gap might in fact be a sequencing error. |
---|
59 | |
---|
60 | SECTION Mutations against dots |
---|
61 | |
---|
62 | When the parsimony value is calculated, dots do not cause mutations. |
---|
63 | They will match any base or gap or other dot. |
---|
64 | |
---|
65 | That means dots at both sequence ends will compensate for some of the |
---|
66 | negative effects that are normally caused by using sequences of different |
---|
67 | lengths (e.g. clustering of LINK{partial_sequences.hlp}). |
---|
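The difference between dots and normal gaps at the sequence ends can be illustrated with a simple pairwise comparison. This is a sketch only: the function name mismatches and the toy sequences are made up for this example, and a real parsimony calculation works on the tree, not on sequence pairs.

```python
def mismatches(seq_a, seq_b):
    """Count aligned positions that differ; a dot matches anything."""
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a != "." and b != "."
    )


full = "ACGTACGT"
partial_with_gaps = "--GTAC--"   # claims: no bases exist at both ends
partial_with_dots = "..GTAC.."   # claims: nothing is known about both ends
```

Here the partial sequence padded with gaps differs from the full sequence at four positions, while the same sequence padded with dots differs at none.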
68 | |
---|
69 | |
---|
70 | SECTION Differences between sequence types |
---|
71 | |
---|
72 | For nucleotide sequences: |
---|
73 | |
---|
74 | Mutations are simply counted for single nucleotides. |
---|
75 | |
---|
76 | For amino acid sequences: |
---|
77 | |
---|
78 | Mutations are determined on an amino acid basis. This differs |
---|
79 | from what would be done when using the corresponding |
---|
80 | DNA alignment: |
---|
81 | |
---|
82 | - in DNA several different codons (combinations of 3 nucleotides) |
---|
83 | may represent the same amino acid. Therefore a mutation would |
---|
84 | be counted for DNA, where no mutation is counted for AA. |
---|
85 | - the parsimony value for amino acid alignments does not count |
---|
86 | the number of amino-acid-mutations. It counts the minimum |
---|
87 | number of nucleotide(!) mutations needed to mutate from one |
---|
88 | amino acid to another, while assuming that there is no |
---|
89 | selection pressure when mutating a codon into another codon that |
---|
90 | translates into the same amino acid (see also EXAMPLES below). |
---|
91 | |
---|
92 | ARB generally uses the "Standard code" to calculate the |
---|
93 | mutations between different amino acids, when determining |
---|
94 | the parsimony value. |
---|
95 | |
---|
96 | NOTES The parsimony value is also used to LINK{pa_branchlengths.hlp}. |
---|
97 | |
---|
98 | EXAMPLES for an amino acid mutation |
---|
99 | |
---|
100 | Imagine an alignment position P and three species, where |
---|
101 | |
---|
102 | - species F has an 'F' (Phenylalanine) at position P, |
---|
103 | - species Q has a 'Q' (Glutamine) at position P and |
---|
104 | - species L has an 'L' (Leucine) at position P. |
---|
105 | |
---|
106 | These amino acids may be represented by the following codons: |
---|
107 | |
---|
108 | - F = TTT | TTC |
---|
109 | - Q = CAA | CAG |
---|
110 | - L = TTA | TTG | CTN |
---|
111 | |
---|
112 | Based on the minimum codon distances, the mutation costs used |
---|
113 | in ARB_PARSIMONY are: |
---|
114 | |
---|
115 | - F -> Q = 3 mutations |
---|
116 | - F -> L = 1 mutation (e.g. TTT -> TTA) |
---|
117 | - L -> Q = 1 mutation (e.g. CTA -> CAA) |
---|
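The costs listed above can be reproduced by taking the smallest number of differing nucleotides over all codon pairs of two amino acids. The following sketch assumes the standard genetic code written as the usual 64-letter string in T, C, A, G order. The names CODONS and min_codon_distance are made up for this example and are not ARB identifiers.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid (or '*' for stop) to the list of its codons.
CODONS = {}
for codon, aa in zip(product(BASES, repeat=3), AMINO):
    CODONS.setdefault(aa, []).append("".join(codon))


def min_codon_distance(aa1, aa2):
    """Minimum number of nucleotide changes between any codon of aa1
    and any codon of aa2."""
    return min(
        sum(x != y for x, y in zip(c1, c2))
        for c1 in CODONS[aa1]
        for c2 in CODONS[aa2]
    )
```

Because the minimum is taken over all codon pairs, changes between codons of the same amino acid cost nothing, which matches the "no selection pressure" assumption described in this helpfile.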
118 | |
---|
119 | This results in the following parsimony values for the |
---|
120 | possible subtree-rearrangements (R=Rest of whole tree): |
---|
121 | |
---|
122 |     R |
---|
123 |     | |
---|
124 |     | |
---|
125 |     F        pars value = 4 |
---|
126 |    / \ |
---|
127 |   /   \ |
---|
128 |  Q     L |
---|
129 | |
---|
130 | |
---|
131 |     R |
---|
132 |     | |
---|
133 |     | |
---|
134 |     Q        pars value = 4 |
---|
135 |    / \ |
---|
136 |   /   \ |
---|
137 |  F     L |
---|
138 | |
---|
139 | |
---|
140 |     R |
---|
141 |     | |
---|
142 |     | |
---|
143 |     L        pars value = 2 (!) |
---|
144 |    / \ |
---|
145 |   /   \ |
---|
146 |  Q     F |
---|
147 | |
---|
148 | |
---|
149 | Choosing the third topology (which is the "best" according to the parsimony |
---|
150 | value) means assuming that the ancestor of Q and F had an L at position P. |
---|
151 | As no selection pressure is assumed for mutating that 'L'-codon (e.g. |
---|
152 | from 'TTA' into 'CTA') no mutation penalty is counted when calculating the |
---|
153 | parsimony value. |
---|
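The three parsimony values shown in the diagrams can be checked by summing the mutation costs along the two lower branches of each arrangement. This is only an arithmetic sketch of the example; the costs are copied from the list above, and the names COST and pars_value are made up for this illustration.

```python
# Mutation costs from the example (symmetric, so stored as unordered pairs).
COST = {
    frozenset("FQ"): 3,
    frozenset("FL"): 1,
    frozenset("LQ"): 1,
}


def pars_value(ancestor, children):
    """Sum the costs between the upper node and its two lower nodes."""
    return sum(COST[frozenset((ancestor, child))] for child in children)


first = pars_value("F", ("Q", "L"))    # F above Q and L
second = pars_value("Q", ("F", "L"))   # Q above F and L
third = pars_value("L", ("Q", "F"))    # L above Q and F
```

The third arrangement yields the smallest sum because both of its branches need only a single nucleotide mutation.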
154 | |
---|
155 | WARNINGS None |
---|
156 | |
---|
157 | BUGS No bugs known |
---|