| 1 | #Please insert up references in the next lines (line starts with keyword UP) |
|---|
| 2 | UP arb.hlp |
|---|
| 3 | UP glossary.hlp |
|---|
| 4 | |
|---|
| 5 | #Please insert subtopic references (line starts with keyword SUB) |
|---|
| 6 | SUB consense_tree.hlp |
|---|
| 7 | SUB bootstrap.hlp |
|---|
| 8 | |
|---|
| 9 | # Hypertext links in helptext can be added like this: LINK{ref.hlp|http://add|bla@domain} |
|---|
| 10 | |
|---|
| 11 | #************* Title of helpfile !! and start of real helpfile ******** |
|---|
| 12 | TITLE Algorithm used for consensus tree |
|---|
| 13 | |
|---|
| 14 | OCCURRENCE ARB_DIST/NJ bootstrap |
|---|
| 15 | ARB_NT/Tree/Build consensus tree |
|---|
| 16 | |
|---|
| 17 | DESCRIPTION ARB has its own library for calculating consensus trees, which is used |
|---|
| 18 | when |
|---|
| 19 | - calculating bootstrap trees with NJ (ARB_DIST), |
|---|
| 20 | - directly calculating consensus trees using |
|---|
| 21 | ARB_NT/Tree/Build consensus tree and |
|---|
| 22 | - from the command line tool arb_consensus_tree. |
|---|
| 23 | |
|---|
| 24 | The algorithm works as follows: |
|---|
| 25 | |
|---|
| 26 | - all trees are read and all occurring branches are stored in a branch-pool |
|---|
| 27 | - the consensus tree is constructed iteratively by |
|---|
| 28 | - picking the "best" branch and adding it to the tree. |
|---|
| 29 | - deleting all now impossible branches from the branch pool. |
|---|
| 30 | |
|---|
| 31 | The best branch (at each time of the iteration) is determined by the following |
|---|
| 32 | criteria (listed in order of significance): |
|---|
| 33 | |
|---|
| 34 | - inner branches are picked before leaf branches |
|---|
| 35 | - branches occurring more often (i.e. branches with higher bootstrap values) are picked first |
|---|
| 36 | - branches nearer to the center of the tree are preferred over more distant branches |
|---|
| 37 | - longer branches are preferred over shorter branches |
|---|
| 38 | |
|---|
| 39 | If all of the above criteria are really equal for two branches, the pick-order |
|---|
| 40 | depends on their appearance in the source trees (to make results reproducible). |
|---|
| 41 | |
|---|
| 42 | The distance of each branch to the center of the tree is defined by the difference |
|---|
| 43 | between the number of species on each side of the branch. Branches with an equal |
|---|
| 44 | number of species on each side are considered "at the center" of the tree. |
|---|
| 45 | |
|---|
| 46 | SECTION Partial trees |
|---|
| 47 | |
|---|
| 48 | If a tree doesn't contain the full set of species (i.e. if the tree is partial), |
|---|
| 49 | all branches of that tree differ from branches of full trees. |
|---|
| 50 | |
|---|
| 51 | To allow merging such trees with other trees, the set of missing species is added |
|---|
| 52 | once to each side of each branch, adding two hypothetical branches to the branch pool |
|---|
| 53 | (hypothetical in that way, that they do not necessarily occur in any tree). |
|---|
| 54 | |
|---|
| 55 | These two hypothetical branches are added with different probabilities: |
|---|
| 56 | |
|---|
| 57 | * adding missing species to the bigger side of a branch is done with a probability of 1.0 |
|---|
| 58 | * adding missing species to the smaller side of a branch is done with a |
|---|
| 59 | probability P calculated as follows: |
|---|
| 60 | |
|---|
| 61 | P = ( S / B ) ^ M |
|---|
| 62 | |
|---|
| 63 | where |
|---|
| 64 | |
|---|
| 65 | S = number of edges on the SMALL side of the branch + 1 |
|---|
| 66 | B = number of edges on the BIG side of the branch + 1 |
|---|
| 67 | M = number of missing species |
|---|
| 68 | |
|---|
| 69 | That probability tends towards zero for leaf branches and towards 1.0 for |
|---|
| 70 | center branches. |
|---|
| 71 | |
|---|
| 72 | If you merge nearly identical topologies, which only differ by some species |
|---|
| 73 | removed from some of the topologies, the resulting bootstrap values reflect |
|---|
| 74 | the number of trees the species occurred in (e.g. 67% if species occurred |
|---|
| 75 | in 2 of 3 trees). |
|---|
| 76 | |
|---|
| 77 | Another kind of branches is completely artificial: a branch with the missing |
|---|
| 78 | species on one side and all species from the partial tree on the other side. |
|---|
| 79 | These artificial branches will ONLY be considered for the output topology, if no |
|---|
| 80 | other tree defines such a branch and if it's absolutely necessary to build a |
|---|
| 81 | connected topology. |
|---|
| 82 | |
|---|
| 83 | Since nothing is known about the length of such an artificial branch, a distance |
|---|
| 84 | of 1.0 will be assumed (e.g. happens when you merge disjunct trees) and a zero |
|---|
| 85 | bootstrap value will be written to the output tree. |
|---|
| 86 | |
|---|
| 87 | |
|---|
| 88 | SECTION Branch lengths |
|---|
| 89 | |
|---|
| 90 | The branch lengths of the generated consensus tree are averaged from the input |
|---|
| 91 | branch lengths. When merging full trees, they should be quite accurate. |
|---|
| 92 | |
|---|
| 93 | When merging partial trees, the lengths will be hypothetical, because all branches |
|---|
| 94 | from partial trees are hypothetical and nothing is known about how the length of |
|---|
| 95 | each branch will change by really adding the set of missing species when you use |
|---|
| 96 | a tree reconstruction tool. |
|---|
| 97 | |
|---|
| 98 | As described above, hypothetical branches are inserted with a lowered |
|---|
| 99 | weight (probability) in case the missing species are estimated on the less probable |
|---|
| 100 | side of that branch. In that case the branch length of the hypothetical branch is |
|---|
| 101 | only added weighted proportionally to the assumed probability. |
|---|
| 102 | |
|---|
| 103 | Example: |
|---|
| 104 | |
|---|
| 105 | Imagine merging 3 trees, where species C is missing in one tree and length of |
|---|
| 106 | the branch 'AB---CD' is 0.3 in the two full trees and the length of the |
|---|
| 107 | branch 'AB---D' in the partial tree is 0.6. |
|---|
| 108 | |
|---|
| 109 | The probability of adding C to the D-side of 'AB---D' is P=2/3. |
|---|
| 110 | The summarized weight for that branch is 8/3, the summarized length will |
|---|
| 111 | be 2*0.3 + 2/3*0.6 = 1.0. Then the average length calculates |
|---|
| 112 | to 1.0 / (8/3) = 0.375, i.e a bit more than 0.3, but far from 0.6. |
|---|
| 113 | |
|---|
| 114 | |
|---|
| 115 | NOTES Please consider using a tree reconstruction tool (e.g ARB_PARSIMONY) to recalculate |
|---|
| 116 | the branch lengths of the generated consensus tree. |
|---|
| 117 | |
|---|
| 118 | EXAMPLES Example for 2 trees: |
|---|
| 119 | |
|---|
| 120 | |
|---|
| 121 | A C A D |
|---|
| 122 | \ / \ / |
|---|
| 123 | +--+ +--+ |
|---|
| 124 | / \ / \ |
|---|
| 125 | B +--E B +--F |
|---|
| 126 | | | |
|---|
| 127 | D E |
|---|
| 128 | |
|---|
| 129 | |
|---|
| 130 | These trees contain the following branches: |
|---|
| 131 | |
|---|
| 132 | A---BCDE A---BDEF |
|---|
| 133 | B---ACDE B---ADEF |
|---|
| 134 | C---ABDE D---ABEF |
|---|
| 135 | D---ABCE E---ABDF |
|---|
| 136 | E---ABCD F---ABDE |
|---|
| 137 | AB---CDE AB---DEF |
|---|
| 138 | ABC---DE ABD---EF |
|---|
| 139 | |
|---|
| 140 | Consensus tree is build upon the following branch pool: |
|---|
| 141 | |
|---|
| 142 | A --- BCDE[F] A[F] --- BCDE |
|---|
| 143 | B --- ACDE[F] B[F] --- ACDE |
|---|
| 144 | C --- ABDE[F] C[F] --- ABDE |
|---|
| 145 | D --- ABCE[F] D[F] --- ABCE |
|---|
| 146 | E --- ABCD[F] E[F] --- ABCD |
|---|
| 147 | A --- B[C]DEF A[C] --- BDEF |
|---|
| 148 | B --- A[C]DEF B[C] --- ADEF |
|---|
| 149 | D --- A[C]BEF D[C] --- ABEF |
|---|
| 150 | E --- A[C]BDF E[C] --- ABDF |
|---|
| 151 | F --- A[C]BDE F[C] --- ABDE |
|---|
| 152 | AB --- CDE[F] AB[F] --- CDE |
|---|
| 153 | ABC[F] --- DE ABC --- DE[F] |
|---|
| 154 | AB --- [C]DEF AB[C] --- DEF |
|---|
| 155 | AB[C]D --- EF ABD --- [C]EF |
|---|
| 156 | |
|---|
| 157 | (added missing species are shown in brackets; in left column species were added to |
|---|
| 158 | more likely side; in right column to less likely side) |
|---|
| 159 | |
|---|
| 160 | Bootstrap values are calculated for branches: |
|---|
| 161 | |
|---|
| 162 | AB --- CDEF 100% #1 |
|---|
| 163 | EF --- ABCD 56.2% #2 |
|---|
| 164 | DE --- ABCF 50% #3 |
|---|
| 165 | ABC --- DEF 33.3% #4 |
|---|
| 166 | ABF --- CDE 16.7% #5 |
|---|
| 167 | ABD --- CEF 16.7% #6 |
|---|
| 168 | ABDE --- CF 12.5% #7 |
|---|
| 169 | AF --- BCDE 6.25% #8 |
|---|
| 170 | ACDE --- BF 6.25% #9 |
|---|
| 171 | ABCE --- DF 6.25% #10 |
|---|
| 172 | AC --- BDEF 6.25% #11 |
|---|
| 173 | ADEF --- BC 6.25% #12 |
|---|
| 174 | ABDF --- CE 6.25% #13 |
|---|
| 175 | ABEF --- CD 6.25% #14 |
|---|
| 176 | A --- BCDEF 100% #15 |
|---|
| 177 | B --- ACDEF 100% #16 |
|---|
| 178 | D --- ABCEF 100% #17 |
|---|
| 179 | E --- ABCDF 100% #18 |
|---|
| 180 | C --- ABDEF 50% #19 |
|---|
| 181 | F --- ACBDE 50% #20 |
|---|
| 182 | |
|---|
| 183 | Branches were listed in the order they will be picked: first picking the |
|---|
| 184 | inner branches (#1 .. #14) in order of their bootstrap values, then the |
|---|
| 185 | leaf branches (other criteria not shown in this example). |
|---|
| 186 | |
|---|
| 187 | That results in the following tree building steps: |
|---|
| 188 | |
|---|
| 189 | AB---CDEF (add branch #1) |
|---|
| 190 | |
|---|
| 191 | |
|---|
| 192 | CD |
|---|
| 193 | / |
|---|
| 194 | / |
|---|
| 195 | AB--+ (add branch #2) |
|---|
| 196 | \56% |
|---|
| 197 | \ |
|---|
| 198 | EF |
|---|
| 199 | |
|---|
| 200 | |
|---|
| 201 | Now it's impossible to insert branch #3 => branch is dropped! |
|---|
| 202 | |
|---|
| 203 | |
|---|
| 204 | C |
|---|
| 205 | / |
|---|
| 206 | / |
|---|
| 207 | AB--+ (add branch #4) |
|---|
| 208 | \33% |
|---|
| 209 | \ |
|---|
| 210 | +---D |
|---|
| 211 | / |
|---|
| 212 | /56% |
|---|
| 213 | EF |
|---|
| 214 | |
|---|
| 215 | |
|---|
| 216 | Branches #5 to #14 are now impossible to insert => drop them |
|---|
| 217 | |
|---|
| 218 | Now only leaf branches (#15 to #20) have to be added, which leads to |
|---|
| 219 | the following final topology: |
|---|
| 220 | |
|---|
| 221 | A C |
|---|
| 222 | \ / |
|---|
| 223 | \ / |
|---|
| 224 | +---+ |
|---|
| 225 | / \33% |
|---|
| 226 | / \ |
|---|
| 227 | B +---D |
|---|
| 228 | / |
|---|
| 229 | /56% |
|---|
| 230 | E---+ |
|---|
| 231 | | |
|---|
| 232 | | |
|---|
| 233 | F |
|---|
| 234 | |
|---|
| 235 | |
|---|
| 236 | WARNINGS None |
|---|
| 237 | |
|---|
| 238 | BUGS No bugs known |
|---|