source: trunk/GDE/PHYLIP/doc/dollop.html

Last change on this file was 2176, checked in by westram, 21 years ago

* empty log message *

  • Property svn:eol-style set to native
  • Property svn:keywords set to Author Date Id Revision
File size: 13.5 KB
Line 
1<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
2<HTML>
3<HEAD>
4<TITLE>dollop</TITLE>
5<META NAME="description" CONTENT="dollop">
6<META NAME="keywords" CONTENT="dollop">
7<META NAME="resource-type" CONTENT="document">
8<META NAME="distribution" CONTENT="global">
9<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
10</HEAD>
11<BODY BGCOLOR="#ccffff">
12<DIV ALIGN=RIGHT>
13version 3.6
14</DIV>
15<P> 
16<DIV ALIGN=CENTER>
17<H1>DOLLOP  -- Dollo and Polymorphism Parsimony Program</H1>
18</DIV>
19<P>
20&#169; Copyright 1986-2002 by the University of
21Washington.  Written by Joseph Felsenstein.  Permission is granted to copy
22this document provided that no fee is charged for it and that this copyright
23notice is not removed.
24<P>
25This program carries out the Dollo and polymorphism parsimony methods.  The
26Dollo parsimony method was
27first suggested in print in verbal form by Le Quesne (1974) and was
28first well-specified by Farris (1977).  The method is named after Louis
29Dollo since he was one of the first to assert that in evolution it is
30harder to gain a complex feature than to lose it.  The algorithm
31explains the presence of the state 1 by allowing up to one forward
32change 0-->1 and as many reversions 1-->0 as are necessary to explain
33the pattern of states seen.  The program attempts to minimize the number
34of 1-->0 reversions necessary.
35<P>
36The assumptions of this method are in effect:
37<OL>
38<LI>We know which state is the ancestral one (state 0).
39<LI>The characters are evolving independently.
40<LI>Different lineages evolve independently.
41<LI>The probability of a forward change (0-->1) is small over the
42evolutionary times involved.
43<LI>The probability of a reversion (1-->0) is also small, but
44still far larger than the probability of a forward change, so
45that many reversions are easier to envisage than even one
46extra forward change.
47<LI>Retention of polymorphism for both states (0 and 1) is highly
48improbable.
49<LI>The lengths of the segments of the true tree are not so
50unequal that two changes in a long segment are as probable as
51one in a short segment.
52</OL>
53<P>
54One problem can arise when using additive binary recoding to
55represent a multistate character as a series of two-state characters.  Unlike
56the Camin-Sokal, Wagner, and Polymorphism methods, the Dollo
57method can reconstruct ancestral states which do not exist.  An example
58is given in my 1979 paper.  It will be necessary to check the output to
59make sure that this has not occurred.
60<P>
61The polymorphism parsimony method was first used by me, and the results
62published (without a clear
63specification of the method) by Inger (1967).  The method was
64independently published by Farris (1978a) and by me (1979).  The method
65assumes that we can explain the pattern of states by no more than one
66origination (0-->1) of state 1, followed by retention of polymorphism
67along as many segments of the tree as are necessary, followed by loss of
68state 0 or of state 1 where necessary.  The program tries to minimize
69the total number of polymorphic characters, where each polymorphism is
70counted once for each segment of the tree in which it is retained.
71<P>
72The assumptions of the polymorphism parsimony method are in effect:
73<OL>
74<LI>The ancestral state (state 0) is known in each character.
75<LI>The characters are evolving independently of each other.
76<LI>Different lineages are evolving independently.
77<LI>Forward change (0-->1) is highly improbable over the length of
78time involved in the evolution of the group.
79<LI>Retention of polymorphism is also improbable, but far more
80probable that forward change, so that we can more easily
81envisage much polymorhism than even one additional forward
82change.
83<LI>Once state 1 is reached, reoccurrence of state 0 is very
84improbable, much less probable than multiple retentions of
85polymorphism.
86<LI>The lengths of segments in the true tree are not so unequal
87that we can more easily envisage retention events occurring in
88both of two long segments than one retention in a short
89segment.
90</OL>
91<P>
92That these are the assumptions of parsimony methods has been documented
93in a series of papers of mine: (1973a, 1978b, 1979, 1981b,
941983b, 1988b).  For an opposing view arguing that the parsimony methods
95make no substantive
96assumptions such as these, see the papers by Farris (1983) and Sober (1983a,
971983b), but also read the exchange between Felsenstein and Sober (1986). 
98<P>
99The input format is the standard one, with "?", "P", "B" states
100allowed.  The options are selected using a menu:
101<P>
102<TABLE><TR><TD BGCOLOR=white>
103<PRE>
104
105Dollo and polymorphism parsimony algorithm, version 3.6a3
106
107Settings for this run:
108  U                 Search for best tree?  Yes
109  P                     Parsimony method?  Dollo
110  J     Randomize input order of species?  No. Use input order
111  T              Use Threshold parsimony?  No, use ordinary parsimony
112  A   Use ancestral states in input file?  No
113  W                       Sites weighted?  No
114  M           Analyze multiple data sets?  No
115  0   Terminal type (IBM PC, ANSI, none)?  (none)
116  1    Print out the data at start of run  No
117  2  Print indications of progress of run  Yes
118  3                        Print out tree  Yes
119  4     Print out steps in each character  No
120  5     Print states at all nodes of tree  No
121  6       Write out trees onto tree file?  Yes
122
123Are these settings correct? (type Y or the letter for one to change)
124</PRE>
125</TD></TR></TABLE>
126<P>
127The options U, J, T, A, and M are the usual User Tree, Jumble,
128Ancestral States, and Multiple Data Sets options, described either
129in the main documentation file or in the Discrete Characters Programs
130documentation file.  The A (Ancestral States)
131option allows implementation of the unordered Dollo parsimony and unordered
132polymorphism parsimony methods which I have
133described elsewhere (1984b).  When the A option is used the ancestor is
134not to be counted as one of the species.  The O (outgroup) option is not
135available since the tree produced is already rooted.  Since the Dollo and
136polymorphism methods produce a rooted
137tree, the user-defined trees required by the U option have two-way forks
138at each level. 
139<P>
140The P (Parsimony Method) option is the one that toggles between polymorphism
141parsimony and Dollo parsimony.  The program defaults to Dollo parsimony.
142<P>
143The T (Threshold) option has already been described in
144the Discrete Characters programs documentation file.  Setting T at or below
1451.0 but above 0 causes the criterion to become compatibility rather than
146polymorphism parsimony, although there is no advantage to using this
147program instead of MIX to do a compatibility method.  Setting the
148threshold value higher brings about an intermediate between
149the Dollo or polymorphism parsimony methods and the compatibility method,
150so that there is some rationale for doing that.  Since the Dollo and
151polymorphism methods produces a rooted
152tree, the user-defined trees required by the U option have two-way forks
153at each level. 
154<P>
155Using a threshold value of 1.0 or lower, but above 0, one can
156obtain a rooted (or, if the A option is used with ancestral states of
157"?", unrooted) compatibility criterion, but there is no particular
158advantage to using this program for that instead of MIX.  Higher
159threshold values are of course meaningful and provide
160intermediates between Dollo and compatibility methods.
161<P>
162The X
163(Mixed parsimony methods) option is not available in this program.  The
164Factors option is also not available in this program, as it would have no
165effect on the result even if that information were provided in the input file.
166<P>
167Output is standard: a list of equally parsimonious trees, and, if the
168user selects menu option 4, a table
169of the numbers of reversions or retentions of polymorphism necessary
170in each character.  If any of the
171ancestral states has been specified to be unknown, a table of
172reconstructed ancestral states is also provided.  When reconstructing
173the placement of forward changes and reversions under the Dollo method,
174keep in mind that each
175polymorphic state in the input data will require one "last minute"
176reversion.  This is included in the tabulated counts.  Thus if we have
177both states 0 and 1 at a tip of the tree the program will assume that
178the lineage had state 1 up to the last minute, and then state 0 arose in
179that population by reversion, without loss of state 1.
180<P>
181If the user selects menu option 5, a table is printed out after each
182tree, showing for each branch whether
183there are known to be changes in the branch, and what the states are inferred
184to have been at the top end of the branch.  If the inferred state is a "?"
185there may be multiple equally-parsimonious assignments of states; the user
186must work these out for themselves by hand. 
187<P>
188If the A option is used, then the program will
189infer, for any character whose ancestral state is unknown ("?") whether the
190ancestral state 0 or 1 will give the best tree.  If these are
191tied, then it may not be possible for the program to infer the
192state in the internal nodes, and these will all be printed as ".".  If this
193has happened and you want to know more about the states at the internal
194nodes, you will find helpful to use DOLMOVE to display the tree and examine
195its interior states, as the algorithm in DOLMOVE shows all that can be known
196in this case about the interior states, including where there is and is not
197amibiguity.  The algorithm in DOLLOP gives up more easily on displaying these
198states.
199<P>
200If the U (User Tree) option is used and more than one tree is supplied, the
201program also performs a statistical test of each of these trees against the
202best tree.  This test, which is a version of the test proposed by
203Alan Templeton (1983) and evaluated in a test case by me (1985a).  It is
204closely parallel to a test using log likelihood differences
205invented by Kishino and Hasegawa (1989), and uses the mean and variance of
206step differences between trees, taken across characters.  If the mean
207is more than 1.96 standard deviations different then the trees are declared
208significantly different.  The program
209prints out a table of the steps for each tree, the differences of
210each from the highest one, the variance of that quantity as determined by
211the step differences at individual characters, and a conclusion as to
212whether that tree is or is not significantly worse than the best one. It
213is important to understand that the test assumes that all the binary
214characters are evolving independently, which is unlikely to be true for
215many suites of morphological characters.
216<P>
217If there are more than two trees, the test done is an extension of
218the KHT test, due to Shimodaira and Hasegawa (1999).  They pointed out
219that a correction for the number of trees was necessary, and they
220introduced a resampling method to make this correction.  In the version
221used here the variances and covariances of the sums of steps across
222characters are computed for all pairs of trees.  To test whether the
223difference between each tree and the best one is larger than could have
224been expected if they all had the same expected number of steps,
225numbers of steps for all trees are sampled with these covariances and equal
226means (Shimodaira and Hasegawa's "least favorable hypothesis"),
227and a P value is computed from the fraction of times the difference between
228the tree's value and the lowest number of steps exceeds that actually
229observed.  Note that this sampling needs random numbers, and so the
230program will prompt the user for a random number seed if one has not
231already been supplied.  With the two-tree KHT test no random numbers
232are used.
233<P>
234In either the KHT or the SH test the program
235prints out a table of the number of steps for each tree, the differences of
236each from the lowest one, the variance of that quantity as determined by
237the differences of the numbers of steps at individual characters,
238and a conclusion as to
239whether that tree is or is not significantly worse than the best one.
240<P>
241At the beginning of the program is the constant
242"maxtrees", the maximum number of trees which the program will store for
243output.
244<P>
245The algorithm is a fairly simple adaptation of the one used in
246the program SOKAL, which was formerly in this package and has been
247superseded by MIX.  It requires two passes through each tree to count the
248numbers of reversions.
249<P>
250<HR>
251<P>
252<H3>TEST DATA SET</H3>
253<P>
254<TABLE><TR><TD BGCOLOR=white>
255<PRE>
256     5    6
257Alpha     110110
258Beta      110000
259Gamma     100110
260Delta     001001
261Epsilon   001110
262</PRE>
263</TD></TR></TABLE>
264<P>
265<HR>
266<P>
267<H3>TEST SET OUTPUT (with all numerical options on)</H3>
268<P>
269<TABLE><TR><TD BGCOLOR=white>
270<PRE>
271
272Dollo and polymorphism parsimony algorithm, version 3.6a3
273
274 5 species,   6  characters
275
276Dollo parsimony method
277
278
279Name         Characters
280----         ----------
281
282Alpha        11011 0
283Beta         11000 0
284Gamma        10011 0
285Delta        00100 1
286Epsilon      00111 0
287
288
289
290One most parsimonious tree found:
291
292
293
294
295  +-----------Delta     
296--3 
297  !  +--------Epsilon   
298  +--4 
299     !  +-----Gamma     
300     +--2 
301        !  +--Beta     
302        +--1 
303           +--Alpha     
304
305
306requires a total of      3.000
307
308 reversions in each character:
309         0   1   2   3   4   5   6   7   8   9
310     *-----------------------------------------
311    0!       0   0   1   1   1   0           
312
313From    To     Any Steps?    State at upper node
314                             ( . means same as in the node below it on tree)
315
316root      3         yes    ..1.. .
317  3    Delta        yes    ..... 1
318  3       4         yes    ...11 .
319  4    Epsilon      no     ..... .
320  4       2         yes    1.0.. .
321  2    Gamma        no     ..... .
322  2       1         yes    .1... .
323  1    Beta         yes    ...00 .
324  1    Alpha        no     ..... .
325
326
327</PRE>
328</TD></TR></TABLE>
329</BODY>
330</HTML>
Note: See TracBrowser for help on using the repository browser.