Context Navigation

dnapars.html

Visit:

Last change on this file was 2176, checked in by westram, 22 years ago
* empty log message *
Property svn:eol-style set to `native` Property svn:keywords set to `Author Date Id Revision`
File size: 13.6 KB

Line
1	<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
2	<HTML>
3	<HEAD>
4	<TITLE>main</TITLE>
5	<META NAME="description" CONTENT="dnapars">
6	<META NAME="keywords" CONTENT="dnapars">
7	<META NAME="resource-type" CONTENT="document">
8	<META NAME="distribution" CONTENT="global">
9	<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
10	</HEAD>
11	<BODY BGCOLOR="#ccffff">
12	<P>
13	<DIV ALIGN=RIGHT>
14	version 3.6
15	</DIV>
16	<P>
17	<DIV ALIGN=CENTER>
18	<H1>DNAPARS -- DNA Parsimony Program</H1>
19	</DIV>
20	<P>
21	© Copyright 1986-2002 by The University of
22	Washington. Written by Joseph Felsenstein. Permission is granted to copy
23	this document provided that no fee is charged for it and that this copyright
24	notice is not removed.
25	<P>
26	This program carries out unrooted parsimony (analogous to Wagner
27	trees) (Eck and Dayhoff, 1966; Kluge and Farris, 1969) on DNA
28	sequences. The method of Fitch (1971) is used to count the number of
29	changes of base needed on a given tree.
30	The assumptions of this method are analogous to those of MIX:
31	<OL>
32	<LI>Each site evolves independently.
33	<LI>Different lineages evolve independently.
34	<LI>The probability of a base substitution at a given site is
35	small over the lengths of time involved in
36	a branch of the phylogeny.
37	<LI>The expected amounts of change in different branches of the phylogeny
38	do not vary by so much that two changes in a high-rate branch
39	are more probable than one change in a low-rate branch.
40	<LI>The expected amounts of change do not vary enough among sites that two
41	changes in one site are more probable than one change in another.
42	</OL>
43	<P>
44	That these are the assumptions of parsimony methods has been documented
45	in a series of papers of mine: (1973a, 1978b, 1979, 1981b, 1983b, 1988b). For
46	an opposing view arguing that the parsimony methods make no substantive
47	assumptions such as these, see the papers by Farris (1983) and Sober (1983a,
48	1983b, 1988), but also read the exchange between Felsenstein and Sober (1986).
49	<P>
50	Change from an occupied site to a deletion is counted as one
51	change. Reversion from a deletion to an occupied site is allowed and is also
52	counted as one change. Note that this in effect assumes that a deletion
53	N bases long is N separate events.
54	<P>
55	Dnapars can handle both bifurcating and multifurcating trees. In doing its
56	search for most parsimonious trees, it adds species not only by creating new
57	forks in the middle of existing branches, but it also tries putting them at
58	the end of new branches which are added to existing forks. Thus it searches
59	among both bifurcating and multifurcating trees. If a branch in a tree
60	does not have any characters which might change in that branch in the most
61	parsimonious tree, it does not save that tree. Thus in any tree that
62	results, a branch exists only if some character has a most parsimonious
63	reconstruction that would involve change in that branch.
64	<P>
65	It also saves a number of trees tied for best (you can alter the number
66	it saves using the V option in the menu). When rearranging trees, it
67	tries rearrangements of all of the saved trees. This makes the algorithm
68	slower than earlier versions of Dnapars.
69	<P>
70	The input data is standard. The first line of the input file contains the
71	number of species and the number of sites.
72	<P>
73	Next come the species data. Each
74	sequence starts on a new line, has a ten-character species name
75	that must be blank-filled to be of that length, followed immediately
76	by the species data in the one-letter code. The sequences must either
77	be in the "interleaved" or "sequential" formats
78	described in the Molecular Sequence Programs document. The I option
79	selects between them. The sequences can have internal
80	blanks in the sequence but there must be no extra blanks at the end of the
81	terminated line. Note that a blank is not a valid symbol for a deletion.
82	<P>
83	The options are selected using an interactive menu. The menu looks like this:
84	<P>
85	<TABLE><TR><TD BGCOLOR=white>
86	<PRE>
87
88	DNA parsimony algorithm, version 3.6a3
89
90	Setting for this run:
91	U Search for best tree? Yes
92	S Search option? More thorough search
93	V Number of trees to save? 100
94	J Randomize input order of sequences? No. Use input order
95	O Outgroup root? No, use as outgroup species 1
96	T Use Threshold parsimony? No, use ordinary parsimony
97	N Use Transversion parsimony? No, count all steps
98	W Sites weighted? No
99	M Analyze multiple data sets? No
100	I Input sequences interleaved? Yes
101	0 Terminal type (IBM PC, ANSI, none)? (none)
102	1 Print out the data at start of run No
103	2 Print indications of progress of run Yes
104	3 Print out tree Yes
105	4 Print out steps in each site No
106	5 Print sequences at all nodes of tree No
107	6 Write out trees onto tree file? Yes
108
109	Y to accept these or type the letter for one to change
110	</PRE>
111	</TD></TR></TABLE>
112	<P>
113	The user either types "Y" (followed, of course, by a carriage-return)
114	if the settings shown are to be accepted, or the letter or digit corresponding
115	to an option that is to be changed.
116	<P>
117	The N option allows you to choose transversion parsimony, which counts only
118	transversions (changes between one of the purines A or G and one of the
119	pyrimidines C or T). This setting is turned off by default.
120	<P>
121	The Weights (W) option
122	takes the weights from a file whose default name is "weights". The weights
123	follow the format described in the main documentation file, with integer
124	weights from 0 to 35 allowed by using the characters 0, 1, 2, ..., 9 and
125	A, B, ... Z.
126	<P>
127	The User tree (option U) is read from a file whose default name is <TT>intree</TT>.
128	The trees can be multifurcating. They must be preceded in the file by a
129	line giving the number of trees in the file.
130	<P>
131	The options J, O, T, M, and 0 are the usual ones. They are described in the
132	main documentation file of this package. Option I is the same as in
133	other molecular sequence programs and is described in the documentation file
134	for the sequence programs.
135	<P>
136	The M (multiple data sets option) will ask you whether you want to
137	use multiple sets of weights (from the weights file) or multiple data sets.
138	The ability to use a single data set with multiple weights means that
139	much less disk space will be used for this input data. The bootstrapping
140	and jackknifing tool Seqboot has the ability to create a weights file with
141	multiple weights.
142	<P>
143	The O (outgroup) option will have no effect if the U (user-defined tree)
144	option is in effect.
145	The T (threshold) option allows a continuum of methods
146	between parsimony and compatibility. Thresholds less than or equal to 1.0 do
147	not have any meaning and should
148	not be used: they will result in a tree dependent only on the input
149	order of species and not at all on the data!
150	<P>
151	Output is standard: if option 1 is toggled on, the data is printed out,
152	with the convention that "." means "the same as in the first species".
153	Then comes a list of equally parsimonious trees.
154	Each tree has branch lengths. These are computed using an algorithm
155	published by Hochbaum and Pathria (1997) which I first heard of from
156	Wayne Maddison who invented it independently of them. This algorithm
157	averages the number of reconstructed changes of state over all sites a
158	over all possible most parsimonious placements of the changes of state
159	among branches. Note that it does not correct in any way for multiple
160	changes that overlay each other.
161	<P>
162	If option 2 is
163	toggled on a table of the
164	number of changes of state required in each character is also
165	printed. If option 5 is toggled
166	on, a table is printed
167	out after each tree, showing for each branch whether there are known to be
168	changes in the branch, and what the states are inferred to have been at the
169	top end of the branch. This is a reconstruction of the ancestral sequences
170	in the tree. If you choose option 5, a menu item D appears which gives you
171	the opportunity to turn off dot-differencing so that complete ancestral
172	sequences are shown. If the inferred state is a "?" or one of the IUB
173	ambiguity symbols, there will be multiple
174	equally-parsimonious assignments of states; the user must work these out for
175	themselves by hand. A "?" in the reconstructed states means that in
176	addition to one or more bases, a deletion may or may not be present. If
177	option 6 is left in its default state the trees
178	found will be written to a tree file, so that they are available to be used
179	in other programs.
180	<P>
181	If the U (User Tree) option is used and more than one tree is supplied, the
182	program also performs a statistical test of each of these trees against the
183	best tree. This test, which is a version of the test proposed by
184	Alan Templeton (1983) and evaluated in a test case by me (1985a). It is
185	closely parallel to a test using log likelihood differences
186	due to Kishino and Hasegawa (1989), and
187	uses the mean and variance of
188	step differences between trees, taken across sites. If the mean
189	is more than 1.96 standard deviations different then the trees are declared
190	significantly different. The program
191	prints out a table of the steps for each tree, the differences of
192	each from the best one, the variance of that quantity as determined by
193	the step differences at individual sites, and a conclusion as to
194	whether that tree is or is not significantly worse than the best one.
195	If the U (User Tree) option is used and more than one tree is supplied,
196	and the program is not told to assume autocorrelation between the
197	rates at different sites, the
198	program also performs a statistical test of each of these trees against the
199	one with highest likelihood. If there are two user trees, this
200	is a version of the test proposed by
201	Alan Templeton (1983) and evaluated in a test case by me (1985a). It is
202	closely parallel to a test using log likelihood differences
203	due to Kishino and Hasegawa (1989)
204	It uses the mean and variance of the differences in the number
205	of steps between trees, taken across sites. If the two
206	trees' means are more than 1.96 standard deviations different,
207	then the trees are
208	declared significantly different.
209	<P>
210	If there are more than two trees, the test done is an extension of
211	the KHT test, due to Shimodaira and Hasegawa (1999). They pointed out
212	that a correction for the number of trees was necessary, and they
213	introduced a resampling method to make this correction. In the version
214	used here the variances and covariances of the sums of steps across
215	sites are computed for all pairs of trees. To test whether the
216	difference between each tree and the best one is larger than could have
217	been expected if they all had the same expected number of steps,
218	numbers of steps for all trees are sampled with these covariances and equal
219	means (Shimodaira and Hasegawa's "least favorable hypothesis"),
220	and a P value is computed from the fraction of times the difference between
221	the tree's value and the lowest number of steps exceeds that actually
222	observed. Note that this sampling needs random numbers, and so the
223	program will prompt the user for a random number seed if one has not
224	already been supplied. With the two-tree KHT test no random numbers
225	are used.
226	<P>
227	In either the KHT or the SH test the program
228	prints out a table of the number of steps for each tree, the differences of
229	each from the lowest one, the variance of that quantity as determined by
230	the differences of the numbers of steps at individual sites,
231	and a conclusion as to
232	whether that tree is or is not significantly worse than the best one.
233	<P>
234	Option 6 in the menu controls whether the tree estimated by the program
235	is written onto a tree file. The default name of this output tree file
236	is "outtree". If the U option is in effect, all the user-defined
237	trees are written to the output tree file.
238	<P>
239	The program is a straightforward relative of MIX
240	and runs reasonably quickly, especially with many sites and few species.
241	<P>
242	<HR>
243	<P>
244	<H3>TEST DATA SET</H3>
245	<P>
246	<TABLE><TR><TD BGCOLOR=white>
247	<PRE>
248	5 13
249	Alpha AACGUGGCCAAAU
250	Beta AAGGUCGCCAAAC
251	Gamma CAUUUCGUCACAA
252	Delta GGUAUUUCGGCCU
253	Epsilon GGGAUCUCGGCCC
254	</PRE>
255	</TD></TR></TABLE>
256	<P>
257	<HR>
258	<P>
259	<H3>CONTENTS OF OUTPUT FILE (if all numerical options are on)</H3>
260	<P>
261	<TABLE><TR><TD BGCOLOR=white>
262	<PRE>
263
264	DNA parsimony algorithm, version 3.6a3
265
266	5 species, 13 sites
267
268
269	Name Sequences
270	---- ---------
271
272	Alpha AACGUGGCCA AAU
273	Beta ..G..C.... ..C
274	Gamma C.UU.C.U.. C.A
275	Delta GGUA.UU.GG CC.
276	Epsilon GGGA.CU.GG CCC
277
278
279
280	One most parsimonious tree found:
281
282
283	+-----Epsilon
284	+----------------------------3
285	+------------2 +-------Delta
286	\| \|
287	\| +----------------Gamma
288	\|
289	1----Beta
290	\|
291	+---------Alpha
292
293
294	requires a total of 19.000
295
296	between and length
297	------- --- ------
298	1 2 0.217949
299	2 3 0.487179
300	3 Epsilon 0.096154
301	3 Delta 0.134615
302	2 Gamma 0.275641
303	1 Beta 0.076923
304	1 Alpha 0.173077
305
306	steps in each site:
307	0 1 2 3 4 5 6 7 8 9
308	*-----------------------------------------
309	0\| 2 1 3 2 0 2 1 1 1
310	10\| 1 1 1 3
311
312	From To Any Steps? State at upper node
313	( . means same as in the node below it on tree)
314
315	1 AABGTCGCCA AAY
316	1 2 yes V.KD...... C..
317	2 3 yes GG.A..T.GG .C.
318	3 Epsilon maybe ..G....... ..C
319	3 Delta yes ..T..T.... ..T
320	2 Gamma yes C.TT...T.. ..A
321	1 Beta maybe ..G....... ..C
322	1 Alpha yes ..C..G.... ..T
323
324
325	</PRE>
326	<P>
327	</TD></TR></TABLE>
328	</BODY>
329	</HTML>
330

Note: See TracBrowser for help on using the repository browser.

Context Navigation

source: trunk/GDE/PHYLIP/doc/dnapars.html

Download in other formats: