| 1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> |
|---|
| 2 | <HTML> |
|---|
| 3 | <HEAD> |
|---|
| 4 | <TITLE>dollop</TITLE> |
|---|
| 5 | <META NAME="description" CONTENT="dollop"> |
|---|
| 6 | <META NAME="keywords" CONTENT="dollop"> |
|---|
| 7 | <META NAME="resource-type" CONTENT="document"> |
|---|
| 8 | <META NAME="distribution" CONTENT="global"> |
|---|
| 9 | <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1"> |
|---|
| 10 | </HEAD> |
|---|
| 11 | <BODY BGCOLOR="#ccffff"> |
|---|
| 12 | <DIV ALIGN=RIGHT> |
|---|
| 13 | version 3.6 |
|---|
| 14 | </DIV> |
|---|
| 15 | <P> |
|---|
| 16 | <DIV ALIGN=CENTER> |
|---|
| 17 | <H1>DOLLOP -- Dollo and Polymorphism Parsimony Program</H1> |
|---|
| 18 | </DIV> |
|---|
| 19 | <P> |
|---|
| 20 | © Copyright 1986-2002 by the University of |
|---|
| 21 | Washington. Written by Joseph Felsenstein. Permission is granted to copy |
|---|
| 22 | this document provided that no fee is charged for it and that this copyright |
|---|
| 23 | notice is not removed. |
|---|
| 24 | <P> |
|---|
| 25 | This program carries out the Dollo and polymorphism parsimony methods. The |
|---|
| 26 | Dollo parsimony method was |
|---|
| 27 | first suggested in print in verbal form by Le Quesne (1974) and was |
|---|
| 28 | first well-specified by Farris (1977). The method is named after Louis |
|---|
| 29 | Dollo since he was one of the first to assert that in evolution it is |
|---|
| 30 | harder to gain a complex feature than to lose it. The algorithm |
|---|
| 31 | explains the presence of the state 1 by allowing up to one forward |
|---|
| 32 | change 0-->1 and as many reversions 1-->0 as are necessary to explain |
|---|
| 33 | the pattern of states seen. The program attempts to minimize the number |
|---|
| 34 | of 1-->0 reversions necessary. |
|---|
| 35 | <P> |
|---|
| 36 | The assumptions of this method are in effect: |
|---|
| 37 | <OL> |
|---|
| 38 | <LI>We know which state is the ancestral one (state 0). |
|---|
| 39 | <LI>The characters are evolving independently. |
|---|
| 40 | <LI>Different lineages evolve independently. |
|---|
| 41 | <LI>The probability of a forward change (0-->1) is small over the |
|---|
| 42 | evolutionary times involved. |
|---|
| 43 | <LI>The probability of a reversion (1-->0) is also small, but |
|---|
| 44 | still far larger than the probability of a forward change, so |
|---|
| 45 | that many reversions are easier to envisage than even one |
|---|
| 46 | extra forward change. |
|---|
| 47 | <LI>Retention of polymorphism for both states (0 and 1) is highly |
|---|
| 48 | improbable. |
|---|
| 49 | <LI>The lengths of the segments of the true tree are not so |
|---|
| 50 | unequal that two changes in a long segment are as probable as |
|---|
| 51 | one in a short segment. |
|---|
| 52 | </OL> |
|---|
| 53 | <P> |
|---|
| 54 | One problem can arise when using additive binary recoding to |
|---|
| 55 | represent a multistate character as a series of two-state characters. Unlike |
|---|
| 56 | the Camin-Sokal, Wagner, and Polymorphism methods, the Dollo |
|---|
| 57 | method can reconstruct ancestral states which do not exist. An example |
|---|
| 58 | is given in my 1979 paper. It will be necessary to check the output to |
|---|
| 59 | make sure that this has not occurred. |
|---|
| 60 | <P> |
|---|
| 61 | The polymorphism parsimony method was first used by me, and the results |
|---|
| 62 | published (without a clear |
|---|
| 63 | specification of the method) by Inger (1967). The method was |
|---|
| 64 | independently published by Farris (1978a) and by me (1979). The method |
|---|
| 65 | assumes that we can explain the pattern of states by no more than one |
|---|
| 66 | origination (0-->1) of state 1, followed by retention of polymorphism |
|---|
| 67 | along as many segments of the tree as are necessary, followed by loss of |
|---|
| 68 | state 0 or of state 1 where necessary. The program tries to minimize |
|---|
| 69 | the total number of polymorphic characters, where each polymorphism is |
|---|
| 70 | counted once for each segment of the tree in which it is retained. |
|---|
| 71 | <P> |
|---|
| 72 | The assumptions of the polymorphism parsimony method are in effect: |
|---|
| 73 | <OL> |
|---|
| 74 | <LI>The ancestral state (state 0) is known in each character. |
|---|
| 75 | <LI>The characters are evolving independently of each other. |
|---|
| 76 | <LI>Different lineages are evolving independently. |
|---|
| 77 | <LI>Forward change (0-->1) is highly improbable over the length of |
|---|
| 78 | time involved in the evolution of the group. |
|---|
| 79 | <LI>Retention of polymorphism is also improbable, but far more |
|---|
| 80 | probable that forward change, so that we can more easily |
|---|
| 81 | envisage much polymorhism than even one additional forward |
|---|
| 82 | change. |
|---|
| 83 | <LI>Once state 1 is reached, reoccurrence of state 0 is very |
|---|
| 84 | improbable, much less probable than multiple retentions of |
|---|
| 85 | polymorphism. |
|---|
| 86 | <LI>The lengths of segments in the true tree are not so unequal |
|---|
| 87 | that we can more easily envisage retention events occurring in |
|---|
| 88 | both of two long segments than one retention in a short |
|---|
| 89 | segment. |
|---|
| 90 | </OL> |
|---|
| 91 | <P> |
|---|
| 92 | That these are the assumptions of parsimony methods has been documented |
|---|
| 93 | in a series of papers of mine: (1973a, 1978b, 1979, 1981b, |
|---|
| 94 | 1983b, 1988b). For an opposing view arguing that the parsimony methods |
|---|
| 95 | make no substantive |
|---|
| 96 | assumptions such as these, see the papers by Farris (1983) and Sober (1983a, |
|---|
| 97 | 1983b), but also read the exchange between Felsenstein and Sober (1986). |
|---|
| 98 | <P> |
|---|
| 99 | The input format is the standard one, with "?", "P", "B" states |
|---|
| 100 | allowed. The options are selected using a menu: |
|---|
| 101 | <P> |
|---|
| 102 | <TABLE><TR><TD BGCOLOR=white> |
|---|
| 103 | <PRE> |
|---|
| 104 | |
|---|
| 105 | Dollo and polymorphism parsimony algorithm, version 3.6a3 |
|---|
| 106 | |
|---|
| 107 | Settings for this run: |
|---|
| 108 | U Search for best tree? Yes |
|---|
| 109 | P Parsimony method? Dollo |
|---|
| 110 | J Randomize input order of species? No. Use input order |
|---|
| 111 | T Use Threshold parsimony? No, use ordinary parsimony |
|---|
| 112 | A Use ancestral states in input file? No |
|---|
| 113 | W Sites weighted? No |
|---|
| 114 | M Analyze multiple data sets? No |
|---|
| 115 | 0 Terminal type (IBM PC, ANSI, none)? (none) |
|---|
| 116 | 1 Print out the data at start of run No |
|---|
| 117 | 2 Print indications of progress of run Yes |
|---|
| 118 | 3 Print out tree Yes |
|---|
| 119 | 4 Print out steps in each character No |
|---|
| 120 | 5 Print states at all nodes of tree No |
|---|
| 121 | 6 Write out trees onto tree file? Yes |
|---|
| 122 | |
|---|
| 123 | Are these settings correct? (type Y or the letter for one to change) |
|---|
| 124 | </PRE> |
|---|
| 125 | </TD></TR></TABLE> |
|---|
| 126 | <P> |
|---|
| 127 | The options U, J, T, A, and M are the usual User Tree, Jumble, |
|---|
| 128 | Ancestral States, and Multiple Data Sets options, described either |
|---|
| 129 | in the main documentation file or in the Discrete Characters Programs |
|---|
| 130 | documentation file. The A (Ancestral States) |
|---|
| 131 | option allows implementation of the unordered Dollo parsimony and unordered |
|---|
| 132 | polymorphism parsimony methods which I have |
|---|
| 133 | described elsewhere (1984b). When the A option is used the ancestor is |
|---|
| 134 | not to be counted as one of the species. The O (outgroup) option is not |
|---|
| 135 | available since the tree produced is already rooted. Since the Dollo and |
|---|
| 136 | polymorphism methods produce a rooted |
|---|
| 137 | tree, the user-defined trees required by the U option have two-way forks |
|---|
| 138 | at each level. |
|---|
| 139 | <P> |
|---|
| 140 | The P (Parsimony Method) option is the one that toggles between polymorphism |
|---|
| 141 | parsimony and Dollo parsimony. The program defaults to Dollo parsimony. |
|---|
| 142 | <P> |
|---|
| 143 | The T (Threshold) option has already been described in |
|---|
| 144 | the Discrete Characters programs documentation file. Setting T at or below |
|---|
| 145 | 1.0 but above 0 causes the criterion to become compatibility rather than |
|---|
| 146 | polymorphism parsimony, although there is no advantage to using this |
|---|
| 147 | program instead of MIX to do a compatibility method. Setting the |
|---|
| 148 | threshold value higher brings about an intermediate between |
|---|
| 149 | the Dollo or polymorphism parsimony methods and the compatibility method, |
|---|
| 150 | so that there is some rationale for doing that. Since the Dollo and |
|---|
| 151 | polymorphism methods produces a rooted |
|---|
| 152 | tree, the user-defined trees required by the U option have two-way forks |
|---|
| 153 | at each level. |
|---|
| 154 | <P> |
|---|
| 155 | Using a threshold value of 1.0 or lower, but above 0, one can |
|---|
| 156 | obtain a rooted (or, if the A option is used with ancestral states of |
|---|
| 157 | "?", unrooted) compatibility criterion, but there is no particular |
|---|
| 158 | advantage to using this program for that instead of MIX. Higher |
|---|
| 159 | threshold values are of course meaningful and provide |
|---|
| 160 | intermediates between Dollo and compatibility methods. |
|---|
| 161 | <P> |
|---|
| 162 | The X |
|---|
| 163 | (Mixed parsimony methods) option is not available in this program. The |
|---|
| 164 | Factors option is also not available in this program, as it would have no |
|---|
| 165 | effect on the result even if that information were provided in the input file. |
|---|
| 166 | <P> |
|---|
| 167 | Output is standard: a list of equally parsimonious trees, and, if the |
|---|
| 168 | user selects menu option 4, a table |
|---|
| 169 | of the numbers of reversions or retentions of polymorphism necessary |
|---|
| 170 | in each character. If any of the |
|---|
| 171 | ancestral states has been specified to be unknown, a table of |
|---|
| 172 | reconstructed ancestral states is also provided. When reconstructing |
|---|
| 173 | the placement of forward changes and reversions under the Dollo method, |
|---|
| 174 | keep in mind that each |
|---|
| 175 | polymorphic state in the input data will require one "last minute" |
|---|
| 176 | reversion. This is included in the tabulated counts. Thus if we have |
|---|
| 177 | both states 0 and 1 at a tip of the tree the program will assume that |
|---|
| 178 | the lineage had state 1 up to the last minute, and then state 0 arose in |
|---|
| 179 | that population by reversion, without loss of state 1. |
|---|
| 180 | <P> |
|---|
| 181 | If the user selects menu option 5, a table is printed out after each |
|---|
| 182 | tree, showing for each branch whether |
|---|
| 183 | there are known to be changes in the branch, and what the states are inferred |
|---|
| 184 | to have been at the top end of the branch. If the inferred state is a "?" |
|---|
| 185 | there may be multiple equally-parsimonious assignments of states; the user |
|---|
| 186 | must work these out for themselves by hand. |
|---|
| 187 | <P> |
|---|
| 188 | If the A option is used, then the program will |
|---|
| 189 | infer, for any character whose ancestral state is unknown ("?") whether the |
|---|
| 190 | ancestral state 0 or 1 will give the best tree. If these are |
|---|
| 191 | tied, then it may not be possible for the program to infer the |
|---|
| 192 | state in the internal nodes, and these will all be printed as ".". If this |
|---|
| 193 | has happened and you want to know more about the states at the internal |
|---|
| 194 | nodes, you will find helpful to use DOLMOVE to display the tree and examine |
|---|
| 195 | its interior states, as the algorithm in DOLMOVE shows all that can be known |
|---|
| 196 | in this case about the interior states, including where there is and is not |
|---|
| 197 | amibiguity. The algorithm in DOLLOP gives up more easily on displaying these |
|---|
| 198 | states. |
|---|
| 199 | <P> |
|---|
| 200 | If the U (User Tree) option is used and more than one tree is supplied, the |
|---|
| 201 | program also performs a statistical test of each of these trees against the |
|---|
| 202 | best tree. This test, which is a version of the test proposed by |
|---|
| 203 | Alan Templeton (1983) and evaluated in a test case by me (1985a). It is |
|---|
| 204 | closely parallel to a test using log likelihood differences |
|---|
| 205 | invented by Kishino and Hasegawa (1989), and uses the mean and variance of |
|---|
| 206 | step differences between trees, taken across characters. If the mean |
|---|
| 207 | is more than 1.96 standard deviations different then the trees are declared |
|---|
| 208 | significantly different. The program |
|---|
| 209 | prints out a table of the steps for each tree, the differences of |
|---|
| 210 | each from the highest one, the variance of that quantity as determined by |
|---|
| 211 | the step differences at individual characters, and a conclusion as to |
|---|
| 212 | whether that tree is or is not significantly worse than the best one. It |
|---|
| 213 | is important to understand that the test assumes that all the binary |
|---|
| 214 | characters are evolving independently, which is unlikely to be true for |
|---|
| 215 | many suites of morphological characters. |
|---|
| 216 | <P> |
|---|
| 217 | If there are more than two trees, the test done is an extension of |
|---|
| 218 | the KHT test, due to Shimodaira and Hasegawa (1999). They pointed out |
|---|
| 219 | that a correction for the number of trees was necessary, and they |
|---|
| 220 | introduced a resampling method to make this correction. In the version |
|---|
| 221 | used here the variances and covariances of the sums of steps across |
|---|
| 222 | characters are computed for all pairs of trees. To test whether the |
|---|
| 223 | difference between each tree and the best one is larger than could have |
|---|
| 224 | been expected if they all had the same expected number of steps, |
|---|
| 225 | numbers of steps for all trees are sampled with these covariances and equal |
|---|
| 226 | means (Shimodaira and Hasegawa's "least favorable hypothesis"), |
|---|
| 227 | and a P value is computed from the fraction of times the difference between |
|---|
| 228 | the tree's value and the lowest number of steps exceeds that actually |
|---|
| 229 | observed. Note that this sampling needs random numbers, and so the |
|---|
| 230 | program will prompt the user for a random number seed if one has not |
|---|
| 231 | already been supplied. With the two-tree KHT test no random numbers |
|---|
| 232 | are used. |
|---|
| 233 | <P> |
|---|
| 234 | In either the KHT or the SH test the program |
|---|
| 235 | prints out a table of the number of steps for each tree, the differences of |
|---|
| 236 | each from the lowest one, the variance of that quantity as determined by |
|---|
| 237 | the differences of the numbers of steps at individual characters, |
|---|
| 238 | and a conclusion as to |
|---|
| 239 | whether that tree is or is not significantly worse than the best one. |
|---|
| 240 | <P> |
|---|
| 241 | At the beginning of the program is the constant |
|---|
| 242 | "maxtrees", the maximum number of trees which the program will store for |
|---|
| 243 | output. |
|---|
| 244 | <P> |
|---|
| 245 | The algorithm is a fairly simple adaptation of the one used in |
|---|
| 246 | the program SOKAL, which was formerly in this package and has been |
|---|
| 247 | superseded by MIX. It requires two passes through each tree to count the |
|---|
| 248 | numbers of reversions. |
|---|
| 249 | <P> |
|---|
| 250 | <HR> |
|---|
| 251 | <P> |
|---|
| 252 | <H3>TEST DATA SET</H3> |
|---|
| 253 | <P> |
|---|
| 254 | <TABLE><TR><TD BGCOLOR=white> |
|---|
| 255 | <PRE> |
|---|
| 256 | 5 6 |
|---|
| 257 | Alpha 110110 |
|---|
| 258 | Beta 110000 |
|---|
| 259 | Gamma 100110 |
|---|
| 260 | Delta 001001 |
|---|
| 261 | Epsilon 001110 |
|---|
| 262 | </PRE> |
|---|
| 263 | </TD></TR></TABLE> |
|---|
| 264 | <P> |
|---|
| 265 | <HR> |
|---|
| 266 | <P> |
|---|
| 267 | <H3>TEST SET OUTPUT (with all numerical options on)</H3> |
|---|
| 268 | <P> |
|---|
| 269 | <TABLE><TR><TD BGCOLOR=white> |
|---|
| 270 | <PRE> |
|---|
| 271 | |
|---|
| 272 | Dollo and polymorphism parsimony algorithm, version 3.6a3 |
|---|
| 273 | |
|---|
| 274 | 5 species, 6 characters |
|---|
| 275 | |
|---|
| 276 | Dollo parsimony method |
|---|
| 277 | |
|---|
| 278 | |
|---|
| 279 | Name Characters |
|---|
| 280 | ---- ---------- |
|---|
| 281 | |
|---|
| 282 | Alpha 11011 0 |
|---|
| 283 | Beta 11000 0 |
|---|
| 284 | Gamma 10011 0 |
|---|
| 285 | Delta 00100 1 |
|---|
| 286 | Epsilon 00111 0 |
|---|
| 287 | |
|---|
| 288 | |
|---|
| 289 | |
|---|
| 290 | One most parsimonious tree found: |
|---|
| 291 | |
|---|
| 292 | |
|---|
| 293 | |
|---|
| 294 | |
|---|
| 295 | +-----------Delta |
|---|
| 296 | --3 |
|---|
| 297 | ! +--------Epsilon |
|---|
| 298 | +--4 |
|---|
| 299 | ! +-----Gamma |
|---|
| 300 | +--2 |
|---|
| 301 | ! +--Beta |
|---|
| 302 | +--1 |
|---|
| 303 | +--Alpha |
|---|
| 304 | |
|---|
| 305 | |
|---|
| 306 | requires a total of 3.000 |
|---|
| 307 | |
|---|
| 308 | reversions in each character: |
|---|
| 309 | 0 1 2 3 4 5 6 7 8 9 |
|---|
| 310 | *----------------------------------------- |
|---|
| 311 | 0! 0 0 1 1 1 0 |
|---|
| 312 | |
|---|
| 313 | From To Any Steps? State at upper node |
|---|
| 314 | ( . means same as in the node below it on tree) |
|---|
| 315 | |
|---|
| 316 | root 3 yes ..1.. . |
|---|
| 317 | 3 Delta yes ..... 1 |
|---|
| 318 | 3 4 yes ...11 . |
|---|
| 319 | 4 Epsilon no ..... . |
|---|
| 320 | 4 2 yes 1.0.. . |
|---|
| 321 | 2 Gamma no ..... . |
|---|
| 322 | 2 1 yes .1... . |
|---|
| 323 | 1 Beta yes ...00 . |
|---|
| 324 | 1 Alpha no ..... . |
|---|
| 325 | |
|---|
| 326 | |
|---|
| 327 | </PRE> |
|---|
| 328 | </TD></TR></TABLE> |
|---|
| 329 | </BODY> |
|---|
| 330 | </HTML> |
|---|