1 | #Please insert up references in the next lines (line starts with keyword UP) |
---|
2 | UP arb.hlp |
---|
3 | UP glossary.hlp |
---|
4 | |
---|
5 | #Please insert subtopic references (line starts with keyword SUB) |
---|
6 | SUB consense_tree.hlp |
---|
7 | SUB bootstrap.hlp |
---|
8 | |
---|
9 | # Hypertext links in helptext can be added like this: LINK{ref.hlp|http://add|bla@domain} |
---|
10 | |
---|
11 | #************* Title of helpfile !! and start of real helpfile ******** |
---|
12 | TITLE Algorithm used for consensus tree |
---|
13 | |
---|
14 | OCCURRENCE ARB_DIST/NJ bootstrap |
---|
15 | ARB_NT/Tree/Build consensus tree |
---|
16 | |
---|
17 | DESCRIPTION ARB has its own library for calculating consensus trees, which is used |
---|
18 | when |
---|
19 | - calculating bootstrap trees with NJ (ARB_DIST), |
---|
20 | - directly calculating consensus trees using |
---|
21 | ARB_NT/Tree/Build consensus tree and |
---|
22 | - from the command line tool arb_consensus_tree. |
---|
23 | |
---|
24 | The algorithm works as follows: |
---|
25 | |
---|
26 | - all trees are read and all occurring branches are stored in a branch-pool |
---|
27 | - the consensus tree is constructed iteratively by |
---|
28 | - picking the "best" branch and adding it to the tree. |
---|
29 | - deleting all now impossible branches from the branch pool. |
---|
30 | |
---|
31 | The best branch (at each time of the iteration) is determined by the following |
---|
32 | criteria (listed in order of significance): |
---|
33 | |
---|
34 | - inner branches are picked before leaf branches |
---|
35 | - branches occurring more often (i.e. branches with higher bootstrap values) are picked first |
---|
36 | - branches nearer to the center of the tree are preferred over more distant branches |
---|
37 | - longer branches are preferred over shorter branches |
---|
38 | |
---|
39 | If all of the above criteria are really equal for two branches, the pick-order |
---|
40 | depends on their appearance in the source trees (to make results reproducible). |
---|
41 | |
---|
42 | The distance of each branch to the center of the tree is defined by the difference |
---|
43 | between the number of species on each side of the branch. Branches with an equal |
---|
44 | number of species on each side are considered "at the center" of the tree. |
---|
45 | |
---|
46 | SECTION Partial trees |
---|
47 | |
---|
48 | If a tree doesn't contain the full set of species (i.e. if the tree is partial), |
---|
49 | all branches of that tree differ from branches of full trees. |
---|
50 | |
---|
51 | To allow merging such trees with other trees, the set of missing species is added |
---|
52 | once to each side of each branch, adding two hypothetical branches to the branch pool |
---|
53 | (hypothetical in that way, that they do not necessarily occur in any tree). |
---|
54 | |
---|
55 | These two hypothetical branches are added with different probabilities: |
---|
56 | |
---|
57 | * adding missing species to the bigger side of a branch is done with a probability of 1.0 |
---|
58 | * adding missing species to the smaller side of a branch is done with a |
---|
59 | probability P calculated as follows: |
---|
60 | |
---|
61 | P = ( S / B ) ^ M |
---|
62 | |
---|
63 | where |
---|
64 | |
---|
65 | S = number of edges on the SMALL side of the branch + 1 |
---|
66 | B = number of edges on the BIG side of the branch + 1 |
---|
67 | M = number of missing species |
---|
68 | |
---|
69 | That probability tends towards zero for leaf branches and towards 1.0 for |
---|
70 | center branches. |
---|
71 | |
---|
72 | If you merge nearly identical topologies, which only differ by some species |
---|
73 | removed from some of the topologies, the resulting bootstrap values reflect |
---|
74 | the number of trees the species occurred in (e.g. 67% if species occurred |
---|
75 | in 2 of 3 trees). |
---|
76 | |
---|
77 | Another kind of branches is completely artificial: a branch with the missing |
---|
78 | species on one side and all species from the partial tree on the other side. |
---|
79 | These artificial branches will ONLY be considered for the output topology, if no |
---|
80 | other tree defines such a branch and if it's absolutely necessary to build a |
---|
81 | connected topology. |
---|
82 | |
---|
83 | Since nothing is known about the length of such an artificial branch, a distance |
---|
84 | of 1.0 will be assumed (e.g. happens when you merge disjunct trees) and a zero |
---|
85 | bootstrap value will be written to the output tree. |
---|
86 | |
---|
87 | |
---|
88 | SECTION Branch lengths |
---|
89 | |
---|
90 | The branch lengths of the generated consensus tree are averaged from the input |
---|
91 | branch lengths. When merging full trees, they should be quite accurate. |
---|
92 | |
---|
93 | When merging partial trees, the lengths will be hypothetical, because all branches |
---|
94 | from partial trees are hypothetical and nothing is known about how the length of |
---|
95 | each branch will change by really adding the set of missing species when you use |
---|
96 | a tree reconstruction tool. |
---|
97 | |
---|
98 | As described above, hypothetical branches are inserted with a lowered |
---|
99 | weight (probability) in case the missing species are estimated on the less probable |
---|
100 | side of that branch. In that case the branch length of the hypothetical branch is |
---|
101 | only added weighted proportionally to the assumed probability. |
---|
102 | |
---|
103 | Example: |
---|
104 | |
---|
105 | Imagine merging 3 trees, where species C is missing in one tree and length of |
---|
106 | the branch 'AB---CD' is 0.3 in the two full trees and the length of the |
---|
107 | branch 'AB---D' in the partial tree is 0.6. |
---|
108 | |
---|
109 | The probability of adding C to the D-side of 'AB---D' is P=2/3. |
---|
110 | The summarized weight for that branch is 8/3, the summarized length will |
---|
111 | be 2*0.3 + 2/3*0.6 = 1.0. Then the average length calculates |
---|
112 | to 1.0 / (8/3) = 0.375, i.e a bit more than 0.3, but far from 0.6. |
---|
113 | |
---|
114 | |
---|
115 | NOTES Please consider using a tree reconstruction tool (e.g ARB_PARSIMONY) to recalculate |
---|
116 | the branch lengths of the generated consensus tree. |
---|
117 | |
---|
118 | EXAMPLES Example for 2 trees: |
---|
119 | |
---|
120 | |
---|
121 | A C A D |
---|
122 | \ / \ / |
---|
123 | +--+ +--+ |
---|
124 | / \ / \ |
---|
125 | B +--E B +--F |
---|
126 | | | |
---|
127 | D E |
---|
128 | |
---|
129 | |
---|
130 | These trees contain the following branches: |
---|
131 | |
---|
132 | A---BCDE A---BDEF |
---|
133 | B---ACDE B---ADEF |
---|
134 | C---ABDE D---ABEF |
---|
135 | D---ABCE E---ABDF |
---|
136 | E---ABCD F---ABDE |
---|
137 | AB---CDE AB---DEF |
---|
138 | ABC---DE ABD---EF |
---|
139 | |
---|
140 | Consensus tree is build upon the following branch pool: |
---|
141 | |
---|
142 | A --- BCDE[F] A[F] --- BCDE |
---|
143 | B --- ACDE[F] B[F] --- ACDE |
---|
144 | C --- ABDE[F] C[F] --- ABDE |
---|
145 | D --- ABCE[F] D[F] --- ABCE |
---|
146 | E --- ABCD[F] E[F] --- ABCD |
---|
147 | A --- B[C]DEF A[C] --- BDEF |
---|
148 | B --- A[C]DEF B[C] --- ADEF |
---|
149 | D --- A[C]BEF D[C] --- ABEF |
---|
150 | E --- A[C]BDF E[C] --- ABDF |
---|
151 | F --- A[C]BDE F[C] --- ABDE |
---|
152 | AB --- CDE[F] AB[F] --- CDE |
---|
153 | ABC[F] --- DE ABC --- DE[F] |
---|
154 | AB --- [C]DEF AB[C] --- DEF |
---|
155 | AB[C]D --- EF ABD --- [C]EF |
---|
156 | |
---|
157 | (added missing species are shown in brackets; in left column species were added to |
---|
158 | more likely side; in right column to less likely side) |
---|
159 | |
---|
160 | Bootstrap values are calculated for branches: |
---|
161 | |
---|
162 | AB --- CDEF 100% #1 |
---|
163 | EF --- ABCD 56.2% #2 |
---|
164 | DE --- ABCF 50% #3 |
---|
165 | ABC --- DEF 33.3% #4 |
---|
166 | ABF --- CDE 16.7% #5 |
---|
167 | ABD --- CEF 16.7% #6 |
---|
168 | ABDE --- CF 12.5% #7 |
---|
169 | AF --- BCDE 6.25% #8 |
---|
170 | ACDE --- BF 6.25% #9 |
---|
171 | ABCE --- DF 6.25% #10 |
---|
172 | AC --- BDEF 6.25% #11 |
---|
173 | ADEF --- BC 6.25% #12 |
---|
174 | ABDF --- CE 6.25% #13 |
---|
175 | ABEF --- CD 6.25% #14 |
---|
176 | A --- BCDEF 100% #15 |
---|
177 | B --- ACDEF 100% #16 |
---|
178 | D --- ABCEF 100% #17 |
---|
179 | E --- ABCDF 100% #18 |
---|
180 | C --- ABDEF 50% #19 |
---|
181 | F --- ACBDE 50% #20 |
---|
182 | |
---|
183 | Branches were listed in the order they will be picked: first picking the |
---|
184 | inner branches (#1 .. #14) in order of their bootstrap values, then the |
---|
185 | leaf branches (other criteria not shown in this example). |
---|
186 | |
---|
187 | That results in the following tree building steps: |
---|
188 | |
---|
189 | AB---CDEF (add branch #1) |
---|
190 | |
---|
191 | |
---|
192 | CD |
---|
193 | / |
---|
194 | / |
---|
195 | AB--+ (add branch #2) |
---|
196 | \56% |
---|
197 | \ |
---|
198 | EF |
---|
199 | |
---|
200 | |
---|
201 | Now it's impossible to insert branch #3 => branch is dropped! |
---|
202 | |
---|
203 | |
---|
204 | C |
---|
205 | / |
---|
206 | / |
---|
207 | AB--+ (add branch #4) |
---|
208 | \33% |
---|
209 | \ |
---|
210 | +---D |
---|
211 | / |
---|
212 | /56% |
---|
213 | EF |
---|
214 | |
---|
215 | |
---|
216 | Branches #5 to #14 are now impossible to insert => drop them |
---|
217 | |
---|
218 | Now only leaf branches (#15 to #20) have to be added, which leads to |
---|
219 | the following final topology: |
---|
220 | |
---|
221 | A C |
---|
222 | \ / |
---|
223 | \ / |
---|
224 | +---+ |
---|
225 | / \33% |
---|
226 | / \ |
---|
227 | B +---D |
---|
228 | / |
---|
229 | /56% |
---|
230 | E---+ |
---|
231 | | |
---|
232 | | |
---|
233 | F |
---|
234 | |
---|
235 | |
---|
236 | WARNINGS None |
---|
237 | |
---|
238 | BUGS No bugs known |
---|