1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> |
---|
2 | <HTML> |
---|
3 | <HEAD> |
---|
4 | <TITLE>dollop</TITLE> |
---|
5 | <META NAME="description" CONTENT="dollop"> |
---|
6 | <META NAME="keywords" CONTENT="dollop"> |
---|
7 | <META NAME="resource-type" CONTENT="document"> |
---|
8 | <META NAME="distribution" CONTENT="global"> |
---|
9 | <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1"> |
---|
10 | </HEAD> |
---|
11 | <BODY BGCOLOR="#ccffff"> |
---|
12 | <DIV ALIGN=RIGHT> |
---|
13 | version 3.6 |
---|
14 | </DIV> |
---|
15 | <P> |
---|
16 | <DIV ALIGN=CENTER> |
---|
17 | <H1>DOLLOP -- Dollo and Polymorphism Parsimony Program</H1> |
---|
18 | </DIV> |
---|
19 | <P> |
---|
20 | © Copyright 1986-2002 by the University of |
---|
21 | Washington. Written by Joseph Felsenstein. Permission is granted to copy |
---|
22 | this document provided that no fee is charged for it and that this copyright |
---|
23 | notice is not removed. |
---|
24 | <P> |
---|
25 | This program carries out the Dollo and polymorphism parsimony methods. The |
---|
26 | Dollo parsimony method was |
---|
27 | first suggested in print in verbal form by Le Quesne (1974) and was |
---|
28 | first well-specified by Farris (1977). The method is named after Louis |
---|
29 | Dollo since he was one of the first to assert that in evolution it is |
---|
30 | harder to gain a complex feature than to lose it. The algorithm |
---|
31 | explains the presence of the state 1 by allowing up to one forward |
---|
32 | change 0-->1 and as many reversions 1-->0 as are necessary to explain |
---|
33 | the pattern of states seen. The program attempts to minimize the number |
---|
34 | of 1-->0 reversions necessary. |
---|
35 | <P> |
---|
36 | The assumptions of this method are in effect: |
---|
37 | <OL> |
---|
38 | <LI>We know which state is the ancestral one (state 0). |
---|
39 | <LI>The characters are evolving independently. |
---|
40 | <LI>Different lineages evolve independently. |
---|
41 | <LI>The probability of a forward change (0-->1) is small over the |
---|
42 | evolutionary times involved. |
---|
43 | <LI>The probability of a reversion (1-->0) is also small, but |
---|
44 | still far larger than the probability of a forward change, so |
---|
45 | that many reversions are easier to envisage than even one |
---|
46 | extra forward change. |
---|
47 | <LI>Retention of polymorphism for both states (0 and 1) is highly |
---|
48 | improbable. |
---|
49 | <LI>The lengths of the segments of the true tree are not so |
---|
50 | unequal that two changes in a long segment are as probable as |
---|
51 | one in a short segment. |
---|
52 | </OL> |
---|
53 | <P> |
---|
54 | One problem can arise when using additive binary recoding to |
---|
55 | represent a multistate character as a series of two-state characters. Unlike |
---|
56 | the Camin-Sokal, Wagner, and Polymorphism methods, the Dollo |
---|
57 | method can reconstruct ancestral states which do not exist. An example |
---|
58 | is given in my 1979 paper. It will be necessary to check the output to |
---|
59 | make sure that this has not occurred. |
---|
60 | <P> |
---|
61 | The polymorphism parsimony method was first used by me, and the results |
---|
62 | published (without a clear |
---|
63 | specification of the method) by Inger (1967). The method was |
---|
64 | independently published by Farris (1978a) and by me (1979). The method |
---|
65 | assumes that we can explain the pattern of states by no more than one |
---|
66 | origination (0-->1) of state 1, followed by retention of polymorphism |
---|
67 | along as many segments of the tree as are necessary, followed by loss of |
---|
68 | state 0 or of state 1 where necessary. The program tries to minimize |
---|
69 | the total number of polymorphic characters, where each polymorphism is |
---|
70 | counted once for each segment of the tree in which it is retained. |
---|
71 | <P> |
---|
72 | The assumptions of the polymorphism parsimony method are in effect: |
---|
73 | <OL> |
---|
74 | <LI>The ancestral state (state 0) is known in each character. |
---|
75 | <LI>The characters are evolving independently of each other. |
---|
76 | <LI>Different lineages are evolving independently. |
---|
77 | <LI>Forward change (0-->1) is highly improbable over the length of |
---|
78 | time involved in the evolution of the group. |
---|
79 | <LI>Retention of polymorphism is also improbable, but far more |
---|
80 | probable that forward change, so that we can more easily |
---|
81 | envisage much polymorhism than even one additional forward |
---|
82 | change. |
---|
83 | <LI>Once state 1 is reached, reoccurrence of state 0 is very |
---|
84 | improbable, much less probable than multiple retentions of |
---|
85 | polymorphism. |
---|
86 | <LI>The lengths of segments in the true tree are not so unequal |
---|
87 | that we can more easily envisage retention events occurring in |
---|
88 | both of two long segments than one retention in a short |
---|
89 | segment. |
---|
90 | </OL> |
---|
91 | <P> |
---|
92 | That these are the assumptions of parsimony methods has been documented |
---|
93 | in a series of papers of mine: (1973a, 1978b, 1979, 1981b, |
---|
94 | 1983b, 1988b). For an opposing view arguing that the parsimony methods |
---|
95 | make no substantive |
---|
96 | assumptions such as these, see the papers by Farris (1983) and Sober (1983a, |
---|
97 | 1983b), but also read the exchange between Felsenstein and Sober (1986). |
---|
98 | <P> |
---|
99 | The input format is the standard one, with "?", "P", "B" states |
---|
100 | allowed. The options are selected using a menu: |
---|
101 | <P> |
---|
102 | <TABLE><TR><TD BGCOLOR=white> |
---|
103 | <PRE> |
---|
104 | |
---|
105 | Dollo and polymorphism parsimony algorithm, version 3.6a3 |
---|
106 | |
---|
107 | Settings for this run: |
---|
108 | U Search for best tree? Yes |
---|
109 | P Parsimony method? Dollo |
---|
110 | J Randomize input order of species? No. Use input order |
---|
111 | T Use Threshold parsimony? No, use ordinary parsimony |
---|
112 | A Use ancestral states in input file? No |
---|
113 | W Sites weighted? No |
---|
114 | M Analyze multiple data sets? No |
---|
115 | 0 Terminal type (IBM PC, ANSI, none)? (none) |
---|
116 | 1 Print out the data at start of run No |
---|
117 | 2 Print indications of progress of run Yes |
---|
118 | 3 Print out tree Yes |
---|
119 | 4 Print out steps in each character No |
---|
120 | 5 Print states at all nodes of tree No |
---|
121 | 6 Write out trees onto tree file? Yes |
---|
122 | |
---|
123 | Are these settings correct? (type Y or the letter for one to change) |
---|
124 | </PRE> |
---|
125 | </TD></TR></TABLE> |
---|
126 | <P> |
---|
127 | The options U, J, T, A, and M are the usual User Tree, Jumble, |
---|
128 | Ancestral States, and Multiple Data Sets options, described either |
---|
129 | in the main documentation file or in the Discrete Characters Programs |
---|
130 | documentation file. The A (Ancestral States) |
---|
131 | option allows implementation of the unordered Dollo parsimony and unordered |
---|
132 | polymorphism parsimony methods which I have |
---|
133 | described elsewhere (1984b). When the A option is used the ancestor is |
---|
134 | not to be counted as one of the species. The O (outgroup) option is not |
---|
135 | available since the tree produced is already rooted. Since the Dollo and |
---|
136 | polymorphism methods produce a rooted |
---|
137 | tree, the user-defined trees required by the U option have two-way forks |
---|
138 | at each level. |
---|
139 | <P> |
---|
140 | The P (Parsimony Method) option is the one that toggles between polymorphism |
---|
141 | parsimony and Dollo parsimony. The program defaults to Dollo parsimony. |
---|
142 | <P> |
---|
143 | The T (Threshold) option has already been described in |
---|
144 | the Discrete Characters programs documentation file. Setting T at or below |
---|
145 | 1.0 but above 0 causes the criterion to become compatibility rather than |
---|
146 | polymorphism parsimony, although there is no advantage to using this |
---|
147 | program instead of MIX to do a compatibility method. Setting the |
---|
148 | threshold value higher brings about an intermediate between |
---|
149 | the Dollo or polymorphism parsimony methods and the compatibility method, |
---|
150 | so that there is some rationale for doing that. Since the Dollo and |
---|
151 | polymorphism methods produces a rooted |
---|
152 | tree, the user-defined trees required by the U option have two-way forks |
---|
153 | at each level. |
---|
154 | <P> |
---|
155 | Using a threshold value of 1.0 or lower, but above 0, one can |
---|
156 | obtain a rooted (or, if the A option is used with ancestral states of |
---|
157 | "?", unrooted) compatibility criterion, but there is no particular |
---|
158 | advantage to using this program for that instead of MIX. Higher |
---|
159 | threshold values are of course meaningful and provide |
---|
160 | intermediates between Dollo and compatibility methods. |
---|
161 | <P> |
---|
162 | The X |
---|
163 | (Mixed parsimony methods) option is not available in this program. The |
---|
164 | Factors option is also not available in this program, as it would have no |
---|
165 | effect on the result even if that information were provided in the input file. |
---|
166 | <P> |
---|
167 | Output is standard: a list of equally parsimonious trees, and, if the |
---|
168 | user selects menu option 4, a table |
---|
169 | of the numbers of reversions or retentions of polymorphism necessary |
---|
170 | in each character. If any of the |
---|
171 | ancestral states has been specified to be unknown, a table of |
---|
172 | reconstructed ancestral states is also provided. When reconstructing |
---|
173 | the placement of forward changes and reversions under the Dollo method, |
---|
174 | keep in mind that each |
---|
175 | polymorphic state in the input data will require one "last minute" |
---|
176 | reversion. This is included in the tabulated counts. Thus if we have |
---|
177 | both states 0 and 1 at a tip of the tree the program will assume that |
---|
178 | the lineage had state 1 up to the last minute, and then state 0 arose in |
---|
179 | that population by reversion, without loss of state 1. |
---|
180 | <P> |
---|
181 | If the user selects menu option 5, a table is printed out after each |
---|
182 | tree, showing for each branch whether |
---|
183 | there are known to be changes in the branch, and what the states are inferred |
---|
184 | to have been at the top end of the branch. If the inferred state is a "?" |
---|
185 | there may be multiple equally-parsimonious assignments of states; the user |
---|
186 | must work these out for themselves by hand. |
---|
187 | <P> |
---|
188 | If the A option is used, then the program will |
---|
189 | infer, for any character whose ancestral state is unknown ("?") whether the |
---|
190 | ancestral state 0 or 1 will give the best tree. If these are |
---|
191 | tied, then it may not be possible for the program to infer the |
---|
192 | state in the internal nodes, and these will all be printed as ".". If this |
---|
193 | has happened and you want to know more about the states at the internal |
---|
194 | nodes, you will find helpful to use DOLMOVE to display the tree and examine |
---|
195 | its interior states, as the algorithm in DOLMOVE shows all that can be known |
---|
196 | in this case about the interior states, including where there is and is not |
---|
197 | amibiguity. The algorithm in DOLLOP gives up more easily on displaying these |
---|
198 | states. |
---|
199 | <P> |
---|
200 | If the U (User Tree) option is used and more than one tree is supplied, the |
---|
201 | program also performs a statistical test of each of these trees against the |
---|
202 | best tree. This test, which is a version of the test proposed by |
---|
203 | Alan Templeton (1983) and evaluated in a test case by me (1985a). It is |
---|
204 | closely parallel to a test using log likelihood differences |
---|
205 | invented by Kishino and Hasegawa (1989), and uses the mean and variance of |
---|
206 | step differences between trees, taken across characters. If the mean |
---|
207 | is more than 1.96 standard deviations different then the trees are declared |
---|
208 | significantly different. The program |
---|
209 | prints out a table of the steps for each tree, the differences of |
---|
210 | each from the highest one, the variance of that quantity as determined by |
---|
211 | the step differences at individual characters, and a conclusion as to |
---|
212 | whether that tree is or is not significantly worse than the best one. It |
---|
213 | is important to understand that the test assumes that all the binary |
---|
214 | characters are evolving independently, which is unlikely to be true for |
---|
215 | many suites of morphological characters. |
---|
216 | <P> |
---|
217 | If there are more than two trees, the test done is an extension of |
---|
218 | the KHT test, due to Shimodaira and Hasegawa (1999). They pointed out |
---|
219 | that a correction for the number of trees was necessary, and they |
---|
220 | introduced a resampling method to make this correction. In the version |
---|
221 | used here the variances and covariances of the sums of steps across |
---|
222 | characters are computed for all pairs of trees. To test whether the |
---|
223 | difference between each tree and the best one is larger than could have |
---|
224 | been expected if they all had the same expected number of steps, |
---|
225 | numbers of steps for all trees are sampled with these covariances and equal |
---|
226 | means (Shimodaira and Hasegawa's "least favorable hypothesis"), |
---|
227 | and a P value is computed from the fraction of times the difference between |
---|
228 | the tree's value and the lowest number of steps exceeds that actually |
---|
229 | observed. Note that this sampling needs random numbers, and so the |
---|
230 | program will prompt the user for a random number seed if one has not |
---|
231 | already been supplied. With the two-tree KHT test no random numbers |
---|
232 | are used. |
---|
233 | <P> |
---|
234 | In either the KHT or the SH test the program |
---|
235 | prints out a table of the number of steps for each tree, the differences of |
---|
236 | each from the lowest one, the variance of that quantity as determined by |
---|
237 | the differences of the numbers of steps at individual characters, |
---|
238 | and a conclusion as to |
---|
239 | whether that tree is or is not significantly worse than the best one. |
---|
240 | <P> |
---|
241 | At the beginning of the program is the constant |
---|
242 | "maxtrees", the maximum number of trees which the program will store for |
---|
243 | output. |
---|
244 | <P> |
---|
245 | The algorithm is a fairly simple adaptation of the one used in |
---|
246 | the program SOKAL, which was formerly in this package and has been |
---|
247 | superseded by MIX. It requires two passes through each tree to count the |
---|
248 | numbers of reversions. |
---|
249 | <P> |
---|
250 | <HR> |
---|
251 | <P> |
---|
252 | <H3>TEST DATA SET</H3> |
---|
253 | <P> |
---|
254 | <TABLE><TR><TD BGCOLOR=white> |
---|
255 | <PRE> |
---|
256 | 5 6 |
---|
257 | Alpha 110110 |
---|
258 | Beta 110000 |
---|
259 | Gamma 100110 |
---|
260 | Delta 001001 |
---|
261 | Epsilon 001110 |
---|
262 | </PRE> |
---|
263 | </TD></TR></TABLE> |
---|
264 | <P> |
---|
265 | <HR> |
---|
266 | <P> |
---|
267 | <H3>TEST SET OUTPUT (with all numerical options on)</H3> |
---|
268 | <P> |
---|
269 | <TABLE><TR><TD BGCOLOR=white> |
---|
270 | <PRE> |
---|
271 | |
---|
272 | Dollo and polymorphism parsimony algorithm, version 3.6a3 |
---|
273 | |
---|
274 | 5 species, 6 characters |
---|
275 | |
---|
276 | Dollo parsimony method |
---|
277 | |
---|
278 | |
---|
279 | Name Characters |
---|
280 | ---- ---------- |
---|
281 | |
---|
282 | Alpha 11011 0 |
---|
283 | Beta 11000 0 |
---|
284 | Gamma 10011 0 |
---|
285 | Delta 00100 1 |
---|
286 | Epsilon 00111 0 |
---|
287 | |
---|
288 | |
---|
289 | |
---|
290 | One most parsimonious tree found: |
---|
291 | |
---|
292 | |
---|
293 | |
---|
294 | |
---|
295 | +-----------Delta |
---|
296 | --3 |
---|
297 | ! +--------Epsilon |
---|
298 | +--4 |
---|
299 | ! +-----Gamma |
---|
300 | +--2 |
---|
301 | ! +--Beta |
---|
302 | +--1 |
---|
303 | +--Alpha |
---|
304 | |
---|
305 | |
---|
306 | requires a total of 3.000 |
---|
307 | |
---|
308 | reversions in each character: |
---|
309 | 0 1 2 3 4 5 6 7 8 9 |
---|
310 | *----------------------------------------- |
---|
311 | 0! 0 0 1 1 1 0 |
---|
312 | |
---|
313 | From To Any Steps? State at upper node |
---|
314 | ( . means same as in the node below it on tree) |
---|
315 | |
---|
316 | root 3 yes ..1.. . |
---|
317 | 3 Delta yes ..... 1 |
---|
318 | 3 4 yes ...11 . |
---|
319 | 4 Epsilon no ..... . |
---|
320 | 4 2 yes 1.0.. . |
---|
321 | 2 Gamma no ..... . |
---|
322 | 2 1 yes .1... . |
---|
323 | 1 Beta yes ...00 . |
---|
324 | 1 Alpha no ..... . |
---|
325 | |
---|
326 | |
---|
327 | </PRE> |
---|
328 | </TD></TR></TABLE> |
---|
329 | </BODY> |
---|
330 | </HTML> |
---|