source: branches/stable/HELP_SOURCE/oldhelp/consense_algo.hlp

Last change on this file was 17742, checked in by westram, 6 years ago
  • fix unnecessary typos.
1#Please insert up references in the next lines (line starts with keyword UP)
2UP      arb.hlp
3UP      glossary.hlp
4
5#Please insert subtopic references  (line starts with keyword SUB)
6SUB     consense_tree.hlp
7SUB     bootstrap.hlp
8
9# Hypertext links in helptext can be added like this: LINK{ref.hlp|http://add|bla@domain}
10
11#************* Title of helpfile !! and start of real helpfile ********
12TITLE           Algorithm used for consensus tree
13
14OCCURRENCE      ARB_DIST/NJ bootstrap
15                ARB_NT/Tree/Build consensus tree
16
17DESCRIPTION     ARB has its own library for calculating consensus trees, which is used
18                when
19                 - calculating bootstrap trees with NJ (ARB_DIST),
20                 - directly calculating consensus trees using
21                   ARB_NT/Tree/Build consensus tree and
22                 - from the command line tool arb_consensus_tree.
23
24                The algorithm works as follows:
25
26                 - all trees are read and all occurring branches are stored in a branch-pool
27                 - the consensus tree is constructed iteratively by
28                   - picking the "best" branch and adding it to the tree.
29                   - deleting all now impossible branches from the branch pool.
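The loop described above can be sketched in a few lines. This is an illustration only, not ARB's implementation: a branch is represented by the species set on one of its sides, "best" is reduced to "highest weight", and the names `compatible` and `build_consensus` as well as the toy branch pool are made up for this sketch.

```python
from typing import FrozenSet, List, Tuple

Split = FrozenSet[str]  # species on one side of a branch; the other side is implied


def compatible(a: Split, b: Split, species: FrozenSet[str]) -> bool:
    """Two branches fit into the same tree iff one pairing of their sides is disjoint."""
    a_other, b_other = species - a, species - b
    return (not (a & b) or not (a & b_other)
            or not (a_other & b) or not (a_other & b_other))


def build_consensus(pool: List[Tuple[Split, float]],
                    species: FrozenSet[str]) -> List[Split]:
    """Pick the best remaining branch, then delete the branches it makes impossible."""
    remaining = sorted(pool, key=lambda item: -item[1])  # simplified notion of "best"
    picked: List[Split] = []
    while remaining:
        best, _ = remaining.pop(0)
        picked.append(best)
        remaining = [(s, w) for s, w in remaining if compatible(s, best, species)]
    return picked


# Toy branch pool for five species (weights are invented).
toy_species = frozenset("ABCDE")
toy_pool = [(frozenset("CD"), 0.4), (frozenset("AB"), 1.0), (frozenset("DE"), 0.6)]
print(build_consensus(toy_pool, toy_species))
```

In the toy pool, 'CD---ABE' is dropped once 'DE---ABC' has been picked, because no tree can contain both branches.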
30
31                The best branch (at each step of the iteration) is determined by the following
32                criteria (listed in order of significance):
33
34                 - inner branches are picked before leaf branches
35                 - branches occurring more often (i.e. branches with higher bootstrap values) are picked first
36                 - branches nearer to the center of the tree are preferred over more distant branches
37                 - longer branches are preferred over shorter branches
38
39                If all of the above criteria are equal for two branches, the pick order
40                depends on the order in which they appear in the source trees (to make results reproducible).
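One way to picture this ordering is as a sort key whose components follow the criteria in order of significance. The `Branch` record, its field names and the sample values are assumptions made for this sketch; in particular, `first_seen` is only a stand-in for "appearance in the source trees".

```python
from dataclasses import dataclass


@dataclass
class Branch:
    name: str
    is_leaf: bool          # leaf branches are picked after inner branches
    occurrences: float     # summed weight over all input trees
    center_distance: int   # 0 means "at the center" of the tree
    length: float
    first_seen: int        # hypothetical tie-breaker: position in the input


def pick_order_key(b: Branch):
    """Smaller keys are picked first; negation turns 'higher is better' around."""
    return (b.is_leaf, -b.occurrences, b.center_distance, -b.length, b.first_seen)


sample = [
    Branch("leaf", True, 2.0, 4, 0.1, 0),
    Branch("inner, rare", False, 1.0, 0, 0.5, 1),
    Branch("inner, frequent, off-center", False, 2.0, 2, 0.2, 2),
    Branch("inner, frequent, central, short", False, 2.0, 0, 0.2, 3),
    Branch("inner, frequent, central, long", False, 2.0, 0, 0.4, 4),
]
pick_order = [b.name for b in sorted(sample, key=pick_order_key)]
print(pick_order)
```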
41
42                The distance of each branch to the center of the tree is defined by the difference
43                between the number of species on each side of the branch. Branches with an equal
44                number of species on each side are considered "at the center" of the tree.
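The distance definition above translates directly into code. The function name `center_distance` is chosen for this sketch; each side of a branch is passed as a string of one-letter species names.

```python
def center_distance(side_a: str, side_b: str) -> int:
    """Difference between the number of species on the two sides of a branch."""
    return abs(len(set(side_a)) - len(set(side_b)))


print(center_distance("ABC", "DEF"))    # equal sides: at the center
print(center_distance("AB", "CDEF"))
print(center_distance("A", "BCDEF"))    # leaf branch: farthest from the center
```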
45
46SECTION         Partial trees
47
48                If a tree doesn't contain the full set of species (i.e. if the tree is partial),
49                all branches of that tree differ from branches of full trees.
50
51                To allow merging such trees with other trees, the set of missing species is added
52                once to each side of each branch, adding two hypothetical branches to the branch pool
53                (hypothetical in the sense that they do not necessarily occur in any tree).
54
55                These two hypothetical branches are added with different probabilities:
56
57                 * adding missing species to the bigger side of a branch is done with a probability of 1.0
58                 * adding missing species to the smaller side of a branch is done with a
59                   probability P calculated as follows:
60
61                      P  =  ( S / B ) ^ M
62
63                   where
64
65                      S  =  number of edges on the SMALL side of the branch + 1
66                      B  =  number of edges on the BIG side of the branch + 1
67                      M  =  number of missing species
68
69                That probability tends towards zero for leaf branches and towards 1.0 for
70                center branches.
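A literal transcription of the formula shows this behaviour. The function name and the edge counts used below are invented inputs for illustration; how ARB counts the edges on each side is not specified here, so the results should not be expected to match values reported by ARB.

```python
def smaller_side_probability(small_edges: int, big_edges: int, missing: int) -> float:
    """P = (S / B) ^ M with S = small_edges + 1 and B = big_edges + 1."""
    if small_edges > big_edges:
        raise ValueError("small_edges must not exceed big_edges")
    if missing < 0:
        raise ValueError("missing must not be negative")
    s = small_edges + 1
    b = big_edges + 1
    return (s / b) ** missing


# Leaf-like branch: nothing on the small side, many edges on the big side.
print(smaller_side_probability(0, 6, 1))
# Center branch: both sides equally large, so P is 1.0 whatever M is.
print(smaller_side_probability(3, 3, 3))
# More missing species lower P for the same branch.
print(smaller_side_probability(2, 4, 1), smaller_side_probability(2, 4, 2))
```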
71
72                If you merge nearly identical topologies, which only differ by some species
73                removed from some of the topologies, the resulting bootstrap values reflect
74                the number of trees the species occurred in (e.g. 67% if species occurred
75                in 2 of 3 trees).
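The 67% figure can be reproduced under the assumption that a bootstrap value is the summed weight of a branch divided by the number of merged trees. That reading is consistent with the example in this paragraph, but the function below is only a sketch and its name is invented.

```python
def bootstrap_percent(weights, tree_count: int) -> float:
    """Summed weight of one branch, as a percentage of the number of input trees."""
    return 100.0 * sum(weights) / tree_count


# A branch found with full weight in 2 of 3 trees.
print(round(bootstrap_percent([1.0, 1.0], 3)))
```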
76
77                Another kind of branch is completely artificial: a branch with the missing
78                species on one side and all species from the partial tree on the other side.
79                These artificial branches are ONLY considered for the output topology if no
80                other tree defines such a branch and if it is absolutely necessary to build a
81                connected topology.
82
83                Since nothing is known about the length of such an artificial branch, a distance
84                of 1.0 is assumed (this happens e.g. when you merge disjoint trees) and a zero
85                bootstrap value is written to the output tree.
86
87
88SECTION         Branch lengths
89
90                The branch lengths of the generated consensus tree are averaged from the input
91                branch lengths. When merging full trees, they should be quite accurate.
92
93                When merging partial trees, the lengths are hypothetical: all branches from
94                partial trees are hypothetical, and nothing is known about how the length of
95                each branch would change if the missing species were actually added by
96                a tree reconstruction tool.
97
98                As described above, a hypothetical branch is inserted with a lowered
99                weight (probability) if the missing species were placed on the less probable
100                side of that branch. In that case the length of the hypothetical branch
101                contributes to the average only in proportion to that probability.
102
103                Example:
104
105                   Imagine merging 3 trees, where species C is missing in one tree, the length of
106                   the branch 'AB---CD' is 0.3 in the two full trees and the length of the
107                   branch 'AB---D' in the partial tree is 0.6.
108
109                   The probability of adding C to the D-side of 'AB---D' is P=2/3.
110                   The summed weight for that branch is 2 + 2/3 = 8/3 and the summed length is
111                   2*0.3 + 2/3*0.6 = 1.0. The average length is therefore
112                   1.0 / (8/3) = 0.375, i.e. a bit more than 0.3, but far from 0.6.
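The arithmetic of this example can be checked with a small weighted average. The function name and the (weight, length) pair representation are choices made for this sketch; exact fractions are used so that the result is not blurred by floating-point rounding.

```python
from fractions import Fraction


def weighted_average_length(contributions):
    """contributions: iterable of (weight, length) pairs for one branch."""
    contributions = list(contributions)
    total_weight = sum(w for w, _ in contributions)
    total_length = sum(w * length for w, length in contributions)
    return total_length / total_weight


example = [
    (Fraction(1), Fraction(3, 10)),     # first full tree: weight 1, length 0.3
    (Fraction(1), Fraction(3, 10)),     # second full tree: weight 1, length 0.3
    (Fraction(2, 3), Fraction(6, 10)),  # partial tree: weight P = 2/3, length 0.6
]
average = weighted_average_length(example)
print(average, float(average))
```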
113
114
115NOTES           Please consider using a tree reconstruction tool (e.g. ARB_PARSIMONY) to recalculate
116                the branch lengths of the generated consensus tree.
117
118EXAMPLES        Example for 2 trees:
119
120
121                          A      C                  A      D
122                           \    /                    \    /
123                            +--+                      +--+
124                           /    \                    /    \
125                          B      +--E               B      +--F
126                                 |                         |
127                                 D                         E
128
129
130                These trees contain the following branches:
131
132                           A---BCDE                  A---BDEF
133                           B---ACDE                  B---ADEF
134                           C---ABDE                  D---ABEF
135                           D---ABCE                  E---ABDF
136                           E---ABCD                  F---ABDE
137                           AB---CDE                  AB---DEF
138                           ABC---DE                  ABD---EF
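The branch list of the first tree can be derived mechanically. The sketch below assumes the tree is written as nested tuples, which is a representation chosen here and not an ARB format, and reports every branch as a pair of species sets. The helper names are invented.

```python
def leaves(node):
    """All species below a node; a plain string is a single species."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def branches(tree):
    """Every branch of the tree as an unordered pair of species sets."""
    all_species = leaves(tree)
    found = set()

    def visit(node):
        side = leaves(node)
        if side != all_species:
            found.add(frozenset([side, all_species - side]))
        if not isinstance(node, str):
            for child in node:
                visit(child)

    visit(tree)
    return found


def label(branch):
    """Text form with the smaller side first, e.g. 'AB---CDE'."""
    small, big = sorted(branch, key=lambda side: (len(side), sorted(side)))
    return "".join(sorted(small)) + "---" + "".join(sorted(big))


# First example tree: A and B on one side, C next to the pair (D, E).
tree1 = (("A", "B"), "C", ("D", "E"))
tree1_labels = sorted(label(b) for b in branches(tree1))
print(tree1_labels)
```

Note that `label` writes 'DE---ABC' where the list above writes 'ABC---DE'; both denote the same branch.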
139
140                The consensus tree is built from the following branch pool:
141
142                           A      --- BCDE[F]              A[F]      --- BCDE
143                           B      --- ACDE[F]              B[F]      --- ACDE
144                           C      --- ABDE[F]              C[F]      --- ABDE
145                           D      --- ABCE[F]              D[F]      --- ABCE
146                           E      --- ABCD[F]              E[F]      --- ABCD
147                           A      --- B[C]DEF              A[C]      --- BDEF
148                           B      --- A[C]DEF              B[C]      --- ADEF
149                           D      --- A[C]BEF              D[C]      --- ABEF
150                           E      --- A[C]BDF              E[C]      --- ABDF
151                           F      --- A[C]BDE              F[C]      --- ABDE
152                           AB     --- CDE[F]               AB[F]     --- CDE
153                           ABC[F] --- DE                   ABC       --- DE[F]
154                           AB     --- [C]DEF               AB[C]     --- DEF
155                           AB[C]D --- EF                   ABD       --- [C]EF
156
157                (added missing species are shown in brackets; in the left column species were added to
158                the more likely side, in the right column to the less likely side)
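How one branch of a partial tree yields the two pool entries of a row can be sketched as follows. Two assumptions are made here: the "bigger" side is taken to be the side with more species, and the probability for the less likely placement is passed in as a parameter (`p_small`, set to an arbitrary value below) instead of being computed. The function name is invented.

```python
def hypothetical_branches(side_a, side_b, missing, p_small):
    """Return [(side, other_side, weight), ...] for both placements of the missing species."""
    small, big = sorted((frozenset(side_a), frozenset(side_b)), key=len)
    missing = frozenset(missing)
    likely = (small, big | missing, 1.0)        # left column: added to the bigger side
    unlikely = (small | missing, big, p_small)  # right column: added to the smaller side
    return [likely, unlikely]


# Branch AB---CDE of the first tree, which lacks species F.
pair = hypothetical_branches("AB", "CDE", "F", p_small=0.5)
for side, other, weight in pair:
    print("".join(sorted(side)), "---", "".join(sorted(other)), weight)
```

Branches whose two sides contain the same number of species are not handled specially in this sketch.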
159
160                Bootstrap values are calculated for branches:
161
162                           AB --- CDEF          100%            #1
163                           EF --- ABCD           56.2%          #2
164                           DE --- ABCF           50%            #3
165                           ABC --- DEF           33.3%          #4
166                           ABF --- CDE           16.7%          #5
167                           ABD --- CEF           16.7%          #6
168                           ABDE --- CF           12.5%          #7
169                           AF --- BCDE            6.25%         #8
170                           ACDE --- BF            6.25%         #9
171                           ABCE --- DF            6.25%         #10
172                           AC --- BDEF            6.25%         #11
173                           ADEF --- BC            6.25%         #12
174                           ABDF --- CE            6.25%         #13
175                           ABEF --- CD            6.25%         #14
176                           A --- BCDEF          100%            #15
177                           B --- ACDEF          100%            #16
178                           D --- ABCEF          100%            #17
179                           E --- ABCDF          100%            #18
180                           C --- ABDEF           50%            #19
181                           F --- ABCDE           50%            #20
182
183                Branches are listed in the order in which they are picked: first the
184                inner branches (#1 .. #14) in order of their bootstrap values, then the
185                leaf branches (other criteria are not shown in this example).
186
187                That results in the following tree building steps:
188
189                              AB---CDEF                    (add branch #1)
190
191
192                                     CD
193                                    /
194                                   /
195                              AB--+                        (add branch #2)
196                                   \56%
197                                    \
198                                     EF
199
200
201                              Now it's impossible to insert branch #3 => branch is dropped!
202
203
204                                     C
205                                    /
206                                   /
207                              AB--+                        (add branch #4)
208                                   \33%
209                                    \
210                                     +---D
211                                    /
212                                   /56%
213                                 EF
214
215
216                              Branches #5 to #14 are now impossible to insert => drop them
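The steps above can be replayed with the inner branches #1 to #14 and their listed bootstrap values. The sketch records, for every picked branch, which remaining branches it makes impossible. It is a simplified illustration that ranks by bootstrap value only; the names `fits` and `replay` are invented.

```python
SPECIES = frozenset("ABCDEF")

# (number, species on one side of the branch, bootstrap value in %)
INNER_BRANCHES = [
    (1, "AB", 100.0), (2, "EF", 56.2), (3, "DE", 50.0), (4, "ABC", 33.3),
    (5, "ABF", 16.7), (6, "ABD", 16.7), (7, "CF", 12.5), (8, "AF", 6.25),
    (9, "BF", 6.25), (10, "DF", 6.25), (11, "AC", 6.25), (12, "BC", 6.25),
    (13, "CE", 6.25), (14, "CD", 6.25),
]


def fits(side_a, side_b):
    """True if both branches can occur in the same tree."""
    a, b = frozenset(side_a), frozenset(side_b)
    return not (a & b) or not (a - b) or not (b - a) or not (SPECIES - a - b)


def replay():
    """Return [(picked branch number, [numbers dropped because of it]), ...]."""
    remaining = sorted(INNER_BRANCHES, key=lambda item: -item[2])
    steps = []
    while remaining:
        number, side, _ = remaining.pop(0)
        dropped = [n for n, s, _ in remaining if not fits(side, s)]
        remaining = [item for item in remaining if item[0] not in dropped]
        steps.append((number, dropped))
    return steps


steps = replay()
for number, dropped in steps:
    print("add branch #%d, drop %s" % (number, dropped))
```

Only #1, #2 and #4 are picked, matching the topology shown. In this replay each impossible branch is removed directly after the pick that rules it out, so some of #5 to #14 disappear before #4 is added.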
217
218                              Now only leaf branches (#15 to #20) have to be added, which leads to
219                              the following final topology:
220
221                              A         C
222                               \       /
223                                \     /
224                                 +---+
225                                /     \33%
226                               /       \
227                              B         +---D
228                                       /
229                                      /56%
230                                 E---+
231                                     |
232                                     |
233                                     F
234
235
236WARNINGS        None
237
238BUGS            No bugs known