1	fastDNAml 1.2
2
3
4	Gary J. Olsen, Department of Microbiology
5	University of Illinois, Urbana, IL
6	gary@phylo.life.uiuc.edu
7
8	Ross Overbeek, Mathematics and Computer Science
9	Argonne National Laboratory, Argonne, IL
10	overbeek@mcs.anl.gov
11
12
13
14	Citing fastDNAml
15
16	If you publish work using fastDNAml, please cite the following publications:
17
18	Olsen, G. J., Matsuda, H., Hagstrom, R., and Overbeek, R. 1994. fastDNAml:
19	A tool for construction of phylogenetic trees of DNA sequences using maximum
20	likelihood. Comput. Appl. Biosci. 10: 41-48.
21
22	Felsenstein, J. 1981. Evolutionary trees from DNA sequences:
23	A maximum likelihood approach. J. Mol. Evol. 17: 368-376.
24
25
26
27	What is fastDNAml
28
29	fastDNAml is a program derived from Joseph Felsenstein's version 3.3 DNAML
30	(part of his PHYLIP package). Users should consult the documentation for
31	DNAML before using this program.
32
33	fastDNAml is an attempt to solve the same problem as DNAML, but to do so
34	faster and using less memory, so that larger trees and/or more bootstrap
35	replicates become tractable. Much of fastDNAml is merely a recoding of the
36	PHYLIP 3.3 DNAML program from PASCAL to C.
37
38	DNAML includes the following notice:
39
40	version 3.3. (c) Copyright 1986, 1990 by the University of Washington and
41	Joseph Felsenstein. Written by Joseph Felsenstein. Permission is granted to
42	copy and use this program provided no fee is charged for it and provided that
43	this copyright notice is not removed.
44
45
46
47	Why is fastDNAml faster?
48
49	Some recomputation of values has been eliminated (Joe Felsenstein has done
50	much of this in version 3.4 DNAML).
51
52	The optimization of branch lengths has been accelerated by changing from an EM
53	method to Newton's method (Joe Felsenstein has done much of this in version 3.4
54	DNAML).
55
56	The strategy for simultaneously optimizing all of the branches on the tree has
57	been modified to spend less time getting an individual branch right before
58	improving the other branches.
59
60
61
62	Other new features in fastDNAml
63
64	fastDNAml includes a checkpoint feature to regularly save its progress toward
65	finding a large tree. If the program is interrupted, a minor change to the
66	input file and adding the R (restart) option permits the work to be resumed
67	from the last checkpoint.
68
69	The new R (restart) option can also be used for more rapid addition of new
70	sequences to a previously computed tree (when new sequences are added to the
71	alignment, it is best if the relative alignment of the previous sequences is
72	not altered).
73
74	The G (global) option has been generalized to permit crossing any number of
75	branches during tree rearrangements. In addition, it is possible to modify
76	the extent of rearrangement explored during the sequential addition phase of
77	tree building.
78
79	The G U (global and user tree) option combination instructs the program to
80	find the best of the user trees, and then look for rearrangements that are
81	better still.
82
83	The number of available rate categories has been raised from 9 to 35.
84
85	The weighting mask accepts values from 0 through 35.
86
87	The new B (bootstrap) option causes generation of a bootstrap sample, drawn
88	from the input data.
89
90	The program includes "P4" code for distributing the problem over multiple
91	processors (either within one machine, or across multiple machines).
92
93
94
95	Do DNAML and fastDNAml give the same answer?
96
97	Generally yes, though there are some reservations:
98
99	One or the other might find a better tree due to minor changes in the ways
100	trees are searched. When sequence addition is replicated with different
101	values of the jumble random number seed, they have about the same probability
102	of finding the best tree, but any given seed might give different trees.
103
104	The likelihoods and branch lengths sometimes differ very slightly due to
105	different criteria for stopping the optimization process.
106
107	Little has been done to check the confidence limits on branch lengths. There
108	seem to be some instances in which they disagree, and we think that fastDNAml
109	is correct. However, do not take the "significantly greater than zero" too
110	seriously.
111
112	If you are concerned, you can supply a tree inferred by fastDNAml as a user
113	tree to DNAML and let it (1) reoptimize branch lengths, (2) tell you
114	the confidence limits and (3) tell you the tree likelihood.
115
116
117
118	Changes and new features in version 1.2
119
120	The program can now calculate the likelihood of extremely large user trees.
121	The largest tree we have tested had 3200 taxa. Generally, you will run out
122	of computer memory before you exceed an intrinsic limitation. (With this,
123	it is possible to compare trees found by whatever your favorite methods are
124	under the likelihood criterion.)
125
126	The computation has been changed to make it easier to implement new models
127	of evolution and analysis of amino acid sequences (though these have not yet
128	been done). This has slowed down the program by 5-10%.
129
130
131
132	Changes and new features in version 1.1
133
134	The quickadd option is now the default. This has the ugly effect of reversing
135	the meaning of putting a Q on the option line. (Sorry about this, and the
136	next note, but in the long run it is the better behavior.)
137
138	Use of empirical base frequencies is now the default. This reverses the
139	meaning of the F option, making the default behavior more like that of PHYLIP.
140
141	The tree output file is now generated by default and should be more compatible
142	with the files written and read by the PHYLIP programs. In particular, the
143	comments with information about the tree, its likelihood, etc. are removed, and
144	there are no quotation marks around names unless there are unusual characters
145	within the name. (There are two things to be very careful about in names:
146	there is no completely consistent way to handle both blanks and underscores in
147	names without quotation marks, and when a name is spaced in from the margin in
148	the input file, there are leading blank spaces in the name, which can be very
149	hard to make compatible with some programs.)
150
151	Maintaining a list of the several best trees, not just the (single) best. In
152	particular, when evaluating user-supplied trees, the program tries to save
153	information about all of the trees and provides a Hasegawa and Kishino type test
154	of whether each tree is significantly worse than the best. Note, the current
155	version of the program prints the report in the order of tree likelihood, NOT in
156	the order the trees are supplied to the program. The best way (at present) to
157	figure out which tree is which is to look at the likelihoods. This is the same
158	test used in PHYLIP, but I had removed access in version 1.0 of fastDNAml
159	due to differences in how the programs handle multiple trees. The difference
160	is that fastDNAml can maintain nearly optimal trees all the time, so you can
161	get a list of the N best trees found by using the new K option (below).
162
163	The program should accept rooted trees (strictly bifurcating), as well as
164	unrooted trees (with a trifurcation at the deepest level). This is not fully
165	tested, but it seems to work.
166
167
168
169	Features in the works
170
171	Test subtree exchanges (as well as moving a single subtree) in the search for
172	better trees.
173
174	Allowing the program to optimize any user-defined subset of branches when user
175	lengths are supplied.
176
177
178
179	Input and Options
180
181
182	Basics
183
184	The input to fastDNAml is similar to that used by DNAML (and the other PHYLIP
185	programs). The user should consult the PHYLIP documentation for a basic
186	description of the format.
187
188	This version of fastDNAml expects to get its input from stdin (standard input)
189	and writes its output to stdout (standard output). (There are compile time
190	options to modify this, for those who care to get into such things.)
191
192	On a UNIX or DOS system, it is a simple matter to redirect input from a file
193	and output to a file:
194
195	fastDNAml < infile > outfile
196
197	On a VMS system it is only slightly more difficult. Immediately before
198	running the program, one includes two commands that define the input and
199	output files:
200
201	$ Define/User Sys$Input infile
202	$ Define/User Sys$Output outfile
203	$ Run fastDNAml
204
205	The default input data format is Interleaved (see I option). To help get data
206	from a GenBank or similar format, interleaving can be switched off with the I
207	option. Sequence position numbers are ignored, so they need not be removed.
208
209	(Note that the program also writes a file called checkpoint.PID. See the R
210	option below for more description.)
211
212
213	1 -- Print Data
214
215	By default, fastDNAml does not echo the sequence data to the output file.
216	Option 1 reverses this.
217
218
219	3 -- Do Not Print Tree
220
221	By default, fastDNAml prints the final tree to the output file. Option 3
222	reverses this.
223
224
225	4 -- Do Not Write Tree to File (*** Changed in version 1.1 ***)
226
227	By default, fastDNAml versions 1.1 and 1.2 write a machine readable (Newick
228	format) copy of the final tree to an output file. Option 4 reverses this.
229	The tree output file will be called treefile.PID (where PID is the process ID
230	under which fastDNAml is running). Look at the Y option below for more
231	information on alternative tree formats.
232
233
234	B -- Bootstrap
235
236	Generates a bootstrap sample of the input data. Requires auxiliary data line
237	of the form:
238
239	B random_number_seed
240
241	Example:
242
243	5 114 B
244	B 137
245	Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
246	...
247
248	If the W option is used, only positions that have nonzero weights are used in
249	computing the bootstrap sample. Warning: For a given random number seed, the
250	sample will always be the same.
251
252	PHYLIP DNAML does not include a bootstrap option. (Use the SEQBOOT program.)
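
One way to add an option letter and its auxiliary data line to an existing
input file is a small shell filter in the style of the restart script shown
under the R option. This is only a sketch: the name add_option is made up for
this example, and it assumes that the first line of the file is the options
line and that the new auxiliary line may go directly after it.

```sh
#!/bin/sh
# add_option LETTER [AUX_LINE]   (hypothetical helper, not part of fastDNAml)
# Reads a fastDNAml input file on stdin and writes a copy to stdout with
# LETTER appended to the first (options) line and, if given, AUX_LINE
# inserted as the first auxiliary data line.
add_option() {
    letter=$1
    aux=${2:-}
    IFS= read -r first_line || return 1       # options line of the data file
    printf '%s %s\n' "$first_line" "$letter"  # add the option letter
    if [ -n "$aux" ]; then
        printf '%s\n' "$aux"                  # add the auxiliary data line
    fi
    cat                                       # rest of the data file
}
```

With this function defined, a bootstrap sample with seed 137 could be requested
with:

    add_option B "B 137" < infile | fastDNAml > outfile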
253
254
255	C -- Categories
256
257	Requires auxiliary data of the form:
258
259	C number_of_categories list_of_category_rates
260
261	The maximum number of categories is 35. This line is followed by a list of
262	the rates for each site:
263
264	Categories list_of_categories [per site, one or more lines]
265
266	Category "numbers" are ordered: 1, 2, 3, ..., 9, A, B, ..., Y, Z. Category
267	zero (undefined rate) is permitted at sites with a zero in a user-supplied
268	weighting mask.
269
270	Example:
271
272	5 114 C
273	C 12 0.0625 0.125 0.25 0.5 1 2 4 8 16 32 64 128
274	Categories 5111136343678975AAA8949995566778888889AAAAAA9239898629AAAAA9
275	633792246624457364222574877188898132984963499AA9899975
276	Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
277	...
278
279	PHYLIP DNAML is limited to categories 1 through 9. Also, in PHYLIP version
280	3.3, the categories data came after all the other auxiliary data, but before
281	the user-supplied base frequencies and sequence data. If you make the C line
282	your last auxiliary data line, the programs will behave the same.
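
Before a long run it can be worth checking that the C line is self-consistent,
that is, that number_of_categories is between 1 and 35 and matches the number
of rates that follow. The function below is a minimal sketch of such a check.
Its name, check_c_line, is invented for this example, and it assumes the line
has exactly the form shown above.

```sh
#!/bin/sh
# check_c_line "C n rate1 rate2 ..."   (hypothetical helper)
# Succeeds (exit status 0) if the line starts with C, n is in 1..35,
# and exactly n rates follow. Fails otherwise.
check_c_line() {
    set -- $1                       # split the line on whitespace
    [ "$#" -ge 2 ] || return 1      # need at least "C" and a count
    [ "$1" = "C" ] || return 1
    ncat=$2
    shift 2                         # what remains is the list of rates
    [ "$ncat" -ge 1 ] && [ "$ncat" -le 35 ] && [ "$#" -eq "$ncat" ]
}
```

For the example above, check_c_line "C 12 0.0625 0.125 0.25 0.5 1 2 4 8 16 32
64 128" succeeds, because twelve rates follow the count of 12.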
283
284
285	F -- Empirical Frequencies (*** Changed in version 1.1 ***)
286
287	By default (starting with version 1.1), the program uses base frequencies
288	derived from the sequence data (called empirical base frequencies). Therefore
289	the input file should normally NOT include a base frequencies line preceding
290	the data. If you want to include your own base frequency data, it is now
291	necessary to use the F option, and add a line to the input file that supplies
292	the frequency data.
293
294	The F option instructs the program to use user-supplied base frequencies
295	instead of those derived from the sequence data. The input file must then
296	include a base frequencies line IMMEDIATELY preceding the data:
297
298	5 114 F
299	0.25 0.30 0.20 0.25
300	Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
301	...
302
303	There is an alternative format: the frequencies can be anywhere in the list of
304	auxiliary data lines if they are preceded by an F in the first column:
305
306	5 114 F C W
307	F 0.25 0.30 0.20 0.25
308	C ...
309	...
310	W ...
311	Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
312	...
313
314
315	G -- Global
316
317	If the global option is specified, there may also be an [optional] auxiliary
318	data line of form:
319
320	G N1
321
322	or
323
324	G N1 N2
325
326	N1 is the number of branches to cross in rearrangements of the completed tree.
327	The value of N2 is the number of branches to cross in testing rearrangements
328	during the sequential addition phase of tree inference.
329
330	N1 = 1: local rearrangement (default without G option)
331
332	1 < N1 < numsp-3: regional rearrangements (crossing N1 branches)
333
334	N1 >= numsp-3: global rearrangements (default with G option)
335
336
337
338	N2 <= N1: the default N2 is 1 (local rearrangements).
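
The ranges above can be written out as a small shell function. This is a
sketch under two assumptions: numsp is taken to be the number of sequences in
the data set (the text does not define it), and a value of 0 is read as "no
rearrangements", as in the "G 1 0" example below. The name classify_n1 is
invented for this example.

```sh
#!/bin/sh
# classify_n1 NUMSP N1   (hypothetical helper)
# Prints the kind of rearrangement that N1 selects for NUMSP sequences.
classify_n1() {
    numsp=$1
    n1=$2
    if [ "$n1" -eq 0 ]; then
        echo "none"
    elif [ "$n1" -eq 1 ]; then
        echo "local"
    elif [ "$n1" -ge $((numsp - 3)) ]; then
        echo "global"
    else
        echo "regional"
    fi
}
```

With 20 sequences, classify_n1 20 4 prints "regional" and classify_n1 20 17
prints "global". The same ranges would apply to N2, subject to N2 <= N1.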
339
340	The G option can also be used to force branch swapping on user trees, that is,
341	a combination of G and U options.
342
343	If the auxiliary line is supplied, it cannot be the last line of auxiliary
344	data. (It may be necessary to add the T option with an auxiliary data line of
345
346	T 2.0
347
348	if no other auxiliary data are used.)
349
350	Examples:
351
352	Do local rearrangements after each addition, and global after last addition:
353
354	5 114 G
355	Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
356	...
357
358	Do local rearrangements after each addition, and regional (crossing 4
359	branches) after last addition:
360
361	5 114 G T
362	G 4
363	T 2.0
364	Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
365	...
366
367	Do no rearrangements after each addition, and local after last addition:
368
369	5 114 G T
370	G 1 0
371	T 2.0
372	Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
373	...
374
375	PHYLIP DNAML does not support the auxiliary data line or branch swapping on a
376	user tree.
377
378
379	I -- Not Interleaved
380
381	By default, fastDNAml 1.2 expects data lines for the various sequences in an
382	interleaved format (as did PHYLIP 3.3 DNAML). The I option reverses the
383	expected format (to non-interleaved data, in which all the data lines for one
384	sequence come before the next sequence begins). This is particularly useful for
385	editing a GenBank or equivalent format into a valid input file (note that
386	numbers within the sequence data are ignored, so it is not necessary to remove
387	them).
388
389	If all the data for each sequence are on one line, then the interleaved and
390	non-interleaved formats are degenerate. (This is the way David Swofford's
391	PAUP program writes PHYLIP format output files.) The drawback is that many
392	programs do not handle long lines of text. This includes the vi and EDT text
393	editors, many electronic mail programs, and some versions of FTP for VAX/VMS
394	systems.
395
396	PHYLIP 3.3 DNAML expects interleaved data, and does not include an I option to
397	alter this. PHYLIP 3.4 DNAML accepts an I option, but the default format is
398	reversed.
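
To make the difference between the two layouts concrete, the following sketch
regroups an interleaved file so that all lines of one sequence come before the
next sequence, and adds the I option to the first line. It assumes an input
file without auxiliary data lines, with blank lines (if any) only between
blocks, and with the sequences in the same order in every block. The name
deinterleave is invented here, and the result has not been tested against
fastDNAml itself.

```sh
#!/bin/sh
# deinterleave   (hypothetical helper)
# Reads an interleaved input file on stdin and writes a non-interleaved
# copy to stdout. Data lines are kept as they are and only reordered.
deinterleave() {
    awk '
        NR == 1 { nseq = $1; print $0 " I"; next }  # options line: add I
        NF == 0 { next }                            # drop blank separator lines
        {
            i = (n % nseq) + 1      # sequence that this data line belongs to
            n++
            block[i] = block[i] $0 "\n"
        }
        END { for (i = 1; i <= nseq; i++) printf "%s", block[i] }
    '
}
```

Typical use would be: deinterleave < infile > infile.noninterleaved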
399
400
401	J -- Jumble
402
403	Randomize the sequence addition order. Requires an auxiliary input line of
404	the form:
405
406	J random_number_seed
407
408	Example:
409
410	5 114 J
411	J 137
412	Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
413	...
414
415	Note that fastDNAml explores a very small number of alternative tree
416	topologies relative to a typical parsimony program. There is a very real
417	chance that the search procedure will not find the tree topology with the
418	highest likelihood. Altering the order of taxon addition and comparing the
419	trees found is a fairly efficient method for testing convergence. Typically,
420	it would be nice to find the same best tree at least twice (if not three
421	times), as opposed to simply performing some fixed number of jumbles and
422	hoping that at least one of them will be the optimum.
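
Preparing several jumbled runs by hand is tedious, so the input files can be
generated in a loop. The sketch below writes one copy of the data file per
seed, each with the J option and its auxiliary line added. The function name
and the output file naming (infile.J followed by the seed) are choices made for
this example only, and the data file is assumed to have its options on the
first line.

```sh
#!/bin/sh
# make_jumble_inputs INFILE OUTDIR SEED...   (hypothetical helper)
# Writes OUTDIR/infile.J<seed> for every seed given.
make_jumble_inputs() {
    infile=$1
    outdir=$2
    shift 2
    for seed in "$@"; do
        awk -v seed="$seed" '
            NR == 1 { print $0 " J"; print "J " seed; next }  # add J and seed
            { print }                                         # rest unchanged
        ' "$infile" > "$outdir/infile.J$seed"
    done
}
```

Each generated file can then be run in the usual way, for example
fastDNAml < infile.J137 > outfile.J137, and the resulting trees and
likelihoods compared across seeds.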
423
424
425	K -- Keep multiple best trees (*** New in version 1.1 ***)
426
427	The program can keep a list of the best trees that it has found. When the
428	program is done, it prints a list of these, from best to worst, and prints
429	a Hasegawa and Kishino type test as to which trees are significantly worse
430	than the best tree found. When evaluating user-supplied trees, the program
431	automatically keeps all trees. In other situations, the program keeps only
432	the best tree that it has found. The K option, and associated auxiliary data
433	line, can be used to define an alternative number:
434
435	Example, to keep the 15 best trees found:
436
437	5 114 K
438	K 15
439	Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
440	...
441
442	Example, to keep only the one best tree of possibly numerous user-supplied
443	trees:
444
445	5 114 K U
446	K 1
447	Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
448	...
449
450
451
452	L -- User Lengths
453
454	Causes user trees to be read with branch lengths (and it is an error to omit
455	any of them). Without the L option, branch lengths in user trees are not
456	required, and are ignored if present.
457
458	Example:
459
460	5 114 U L
461	Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
462	...
463
464	(The U is for user tree and the L for user lengths)
465
466
467	O -- Outgroup
468
469	Use the specified sequence number for the outgroup. Requires an auxiliary
470	data line of the form:
471
472	O outgroup_number
473
474	Example:
475
476	5 114 O
477	O 5
478	Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
479	...
480
481	This option only affects the way the tree is drawn (and written to the
482	treefile).
483
484
485
486	Q -- Quickadd (*** Changed in version 1.1 ***)
487
488	The quickadd feature greatly decreases the time in initially placing a new
489	sequence in the growing tree (but does not change the time required to
490	subsequently test rearrangements). The overall time savings seems to be about
491	30%, based on a number of test cases. Its downside, if any, is unknown. This
492	is now (starting in version 1.1) the default program behavior.
493
494	If the analysis is run with a global option of "G 0 0", so that no
495	rearrangements are permitted, the tree is built very approximately, but very
496	quickly. This may be of greatest interest if the question is, "Where does
497	this one new sequence fit into this known tree?" The known tree is provided
498	with the restart option (below).
499
500	PHYLIP DNAML does not include anything comparable to the quickadd feature.
501
502	The quickadd feature can be turned OFF by adding a Q to the first line of the
503	input file.
504
505
506
507	R -- Restart
508
509	The R option causes the program to read a user-supplied tree with less than
510	the full number of taxa as the starting point for sequential addition of the
511	remaining taxa. Thus, the sequence data must be followed by a valid (Newick
512	format) tree. (The phylip_tree/2, prolog fact format, is now also supported.)
513
514	The restart option can also be used to increase the range of the search for
515	alternative (better) trees. For example, you can take a tree produced with
516	only "local" tree rearrangements, and increase the rearrangements to
517	"regional" or "global" by combining the appropriate global option with the
518	restart option. If the starting tree was written by fastDNAml, then the
519	extent of rearrangements is saved with the tree, and will be used as the
520	starting point for the additional search. If the tree was already globally
521	optimized, then no additional searching will be performed.
522
523	To support the R option, after each taxon is added to the growing tree, and
524	after each round of rearrangements, the program appends a checkpoint tree to a
525	file called checkpoint.PID, where PID is the process number of the running
526	fastDNAml program. The last line of this file needs to be appended to the
527	input file when the R option is used. (This should not be confused with the U
528	(user tree) option, which expects a number followed by that number of trees.
529	No additional taxa are added to user trees.)
530
531	The UNIX utility tail can be used to extract the last tree from the checkpoint
532	file, and the utility cat can be used to append it to the input. For example,
533	the following script can be used to add a starting tree and the R option to a
534	data file, and restart fastDNAml:
535
536	#! /bin/sh
537	if test $# -ne 1
538	then echo "Usage: restart checkpoint_file" >&2
539	     exit 1                     # report the usage error to the caller
540	fi
541	IFS= read -r first_line         # first line of data file
542	printf '%s R\n' "$first_line"   # add restart option
543	cat -                           # rest of data file
544	tail -n 1 "$1"                  # append last tree in checkpoint file
545
546	If this shell script is in the file called restart, then one might use the
547	command:
548
549	restart checkpoint.21312 < infile | fastDNAml    > new_outfile
550	^script ^checkpoint tree   ^data    ^dnaml program ^output_file
551
552	If this is too opaque, don't worry about it, or talk with your local unix
553	wizard. In the meantime, this and other useful shell scripts are provided
554	with the program.
555
556	PHYLIP DNAML does not write checkpoint trees and does not have a restart
557	option.
558
559
560
561	T -- Transition/transversion ratio
562
563	Use a user-specified ratio of transition to transversion type substitutions.
564	Without the T option, a value of 2.0 is used. Requires an auxiliary data line
565	of the form:
566
567	T ratio
568
569	Example:
570
571	5 114 T
572	T 1.0
573	Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
574	...
575
576	(Note that a T option with a value of 2.0 does nothing, but it can provide
577	a last auxiliary data line following optional auxiliary data. See the
578	examples for G and Y.)
579
580
581
582	U -- User Tree(s)
583
584	Read an input line with the number of user-specified trees, followed by the
585	specified number of trees. These data immediately follow the sequence data.
586
587	The trees must be in Newick format, and terminated with a semicolon. (The
588	program also accepts a pseudo_newick format, which is a valid prolog fact.)
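
The layout described above (sequence data, then the number of trees, then the
trees) can be assembled with a short shell function. This sketch assumes that
the tree file holds one Newick tree per line, that no semicolon appears inside
a quoted name or comment, and that the data file ends with a newline. The name
with_user_trees is invented for this example.

```sh
#!/bin/sh
# with_user_trees TREEFILE   (hypothetical helper)
# Reads a fastDNAml data file on stdin and writes it to stdout with the U
# option added, followed by the number of trees and the trees themselves.
with_user_trees() {
    treefile=$1
    ntrees=$(grep -c ';' "$treefile")   # one tree per line, each ending in ;
    IFS= read -r first_line || return 1
    printf '%s U\n' "$first_line"       # add the user tree option
    cat                                 # sequence data
    printf '%s\n' "$ntrees"             # number of user trees
    cat "$treefile"                     # the trees
}
```

Typical use would be: with_user_trees trees.txt < infile | fastDNAml > outfile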
589
590	The tree reader in this program is more powerful than that in PHYLIP 3.3. In
591	particular, material enclosed in square brackets, [ like this ], is ignored as
592	comments; taxa names can be wrapped in single quotation marks to support the
593	inclusion of characters that would otherwise end the name (i.e., '(', ')',
594	':', ';', '[', ']', ',' and ' '); names of internal nodes are properly
595	ignored; and exponential notation (such as 1.0E-6) for branch lengths is
596	supported.
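
When user trees are written by a script, names that contain any of the
characters listed above need to be wrapped in single quotation marks. A
minimal sketch of that rule follows. The name newick_label is invented here,
and the function does not handle a name that itself contains a single
quotation mark.

```sh
#!/bin/sh
# newick_label NAME   (hypothetical helper)
# Prints NAME, wrapped in single quotation marks if it contains a
# character that would otherwise end the name.
newick_label() {
    case $1 in
        *"("* | *")"* | *":"* | *";"* | *"["* | *"]"* | *","* | *" "*)
            printf "'%s'\n" "$1" ;;
        *)
            printf '%s\n' "$1" ;;
    esac
}
```

For example, newick_label Sequence1 prints the name unchanged, while
newick_label "E. coli" prints it in single quotation marks.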
597
598
599
600	W -- Weights
601
602	Read user-specified column weighting information. This option requires
603	auxiliary data of the form:
604
605	Weights list_of_weight_values [per site, one or more lines]
606
607	Example:
608
609	5 114 W
610	Weights 111111111111001100000100011111100000000000000110000110000000
611	111101111111111111111111011100000111001011100000000011
612	Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
613	...
614
615	It is necessary that the weight values not start before the 11th character in
616	the line, or some of them will be lost. Weights from 0 to 35 are indicated by
617	the series: 0, 1, 2, 3, ..., 9, A, B, ..., Y, Z.
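
The column rule is easy to get wrong when a mask is typed by hand, so it may
help to let a script lay out the lines. The sketch below pads the Weights
label to 10 characters and wraps the mask at 60 values per line. The line
length of 60 is taken from the examples in this document, and starting
continuation lines at column 11 as well is an assumption. The name
format_weights is invented for this example.

```sh
#!/bin/sh
# format_weights MASK   (hypothetical helper)
# Prints MASK as Weights auxiliary data, 60 values per line, with the
# values on every line starting at the 11th character.
format_weights() {
    printf '%s\n' "$1" | awk '
        {
            label = "Weights"
            for (pos = 1; pos <= length($0); pos += 60) {
                printf "%-10s%s\n", label, substr($0, pos, 60)
                label = ""          # continuation lines get 10 blanks
            }
        }'
}
```

A 114-character mask therefore comes out as one line of 60 values and one line
of 54 values, matching the layout of the example above.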
618
619	PHYLIP DNAML does not support user weights with values other than 1 or 0.
620	This limit has been removed in fastDNAml to permit the use of user weights
621	as a mechanism for representing a bootstrap sample (that is, only the
622	auxiliary data lines change, not the body of the data file).
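
As an illustration of weights standing in for a bootstrap sample, the sketch
below draws as many columns as there are sites, with replacement, and writes
how often each column was drawn as a weight character (0-9, then A-Z). It is
not the sampling code used by the B option: the random numbers come from awk,
an existing weighting mask is not taken into account, and the name
bootstrap_weights is invented for this example.

```sh
#!/bin/sh
# bootstrap_weights NSITES SEED   (hypothetical helper)
# Prints a mask of NSITES weight characters whose values sum to NSITES.
bootstrap_weights() {
    awk -v n="$1" -v seed="$2" '
        BEGIN {
            codes = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            srand(seed)
            # draw n columns with replacement and count the hits per column
            for (i = 0; i < n; i++) count[int(rand() * n) + 1]++
            for (site = 1; site <= n; site++) {
                w = count[site] + 0
                if (w > 35) {       # no weight character beyond Z
                    print "weight exceeds 35" > "/dev/stderr"
                    exit 1
                }
                mask = mask substr(codes, w + 1, 1)
            }
            print mask
        }'
}
```

The resulting mask can be placed on Weights lines in the auxiliary data, so
that each replicate differs only in those lines and not in the sequence data.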
623
624
625
626	Y -- Write Tree (*** Changed in version 1.1 ***)
627
628	fastDNAml writes the final tree to an output file called treefile.PID. By
629	default the tree is in PHYLIP format. The Y option allows turning this off,
630	or changing the format of the tree.
631
632	The Y option by itself toggles the saving of the tree, on or off. If there
633	is also an auxiliary input line of the form:
634
635	Y number
636
637	where number can be 1, 2, or 3, the number selects one of three tree output
638	formats:
639
640	1 Newick
641	2 Prolog
642	3 PHYLIP (default)
643
644	Newick is the tree standard used by PAUP, MacClade, and several other
645	programs. The tree includes a comment about the analysis that the tree is
646	based upon. fastDNAml uses this comment when it reads a tree. In addition,
647	the names of the taxa are enclosed in quotation marks. Both of these
648	features of the file make it incompatible with the PHYLIP package.
649
650	PHYLIP is the subset of the Newick tree standard used by programs in the
651	PHYLIP package. There are no comments and no quotation marks around names.
652	(If a name includes unusual characters, such as a comma, fastDNAml will put
653	it in quotation marks, making it a valid tree, but it cannot be read by the
654	PHYLIP programs.)
655
656	The Prolog format is very similar to the Newick format, but it is a valid prolog
657	fact that permits direct loading into some sequence analysis tools that we
658	use. The structure of the term is:
659
660	pseudo_newick([Comment], (Subtree1, Subtree2, Subtree3): Length).
661
662	where each subtree is either
663
664	(Subtree1,Subtree2): Length
665
666	or
667
668	Label: Length
669
670	The comment is a valid prolog term when && is defined as a unary operator.
671	Label is a prolog atom (it is a valid Newick label, with single quotation
672	marks). Length is a number.
673
674	Because the Y auxiliary input line is optional, it cannot be the last auxiliary
675	data line.
676
677	Examples. To turn off the saving of the tree,
678
679	5 114 Y
680	Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
681	...
682
683	or, to change the output to the full Newick format,
684
685	5 114 Y T
686	Y 1
687	T 2.0
688	Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
689	...
690
691	PHYLIP DNAML does not append the PID (process ID) to the tree file name and
692	does not support the full Newick standard or the prolog format output.
693
694	=============================================================================
695
696	Acknowledgements:
697
698	The origin and development of fastDNAml as a program to extend the use of
699	maximum likelihood phylogenetic inference to larger sets of DNA sequences
700	was encouraged by Carl Woese. Through the development and evolution of the
701	program, Joseph Felsenstein has been extremely helpful and encouraging.
702
703	Numerous users have made suggestions and/or reported program bugs:
704
705	Gary Nunn
706	Tom Schmidt
707	Ross Overbeek
708	Hideo Matsuda
709	Mitchell Sogin
710	Brenden Rielly
711
712	=============================================================================
713
714	Examples:
715
716	Data file with empirical frequencies (generic analysis) (notice that blank
717	lines are permitted in the data):
718
719	5 114
720	Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
721	Sequence2 ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG
722	Sequence3 ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG
723	Sequence4 ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG
724	Sequence5 ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG
725
726	AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG
727	AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG
728	AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG
729	ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG
730	ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG
731
732
733	Data file with empirical frequencies and a random addition order:
734
735	5 114 J
736	J 137
737	Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
738	Sequence2 ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG
739	Sequence3 ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG
740	Sequence4 ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG
741	Sequence5 ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG
742
743	AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG
744	AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG
745	AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG
746	ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG
747	ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG
748
749
750	Data file with empirical frequencies and a bootstrap resampling:
751
752	5 114 B
753	B 137
754	Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
755	Sequence2 ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG
756	Sequence3 ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG
757	Sequence4 ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG
758	Sequence5 ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG
759
760	AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG
761	AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG
762	AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG
763	ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG
764	ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG
765
766
767	Data with weighting mask and rate categories:
768
769	5 114 W C
770	Weights 111111111111001100000100011111100000000000000110000110000000
771	111101111111111111111111011100000111001011100000000011
772	C 10 0.0625 0.125 0.25 0.5 1 2 4 8 16 32
773	Categories 5111136343678975AAA8949995566778888889AAAAAA9239898629AAAAA9
774	633792246624457364222574877188898132984963499AA9899975
775	Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
776	Sequence2 ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG
777	Sequence3 ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG
778	Sequence4 ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG
779	Sequence5 ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG
780
781	AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG
782	AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG
783	AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG
784	ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG
785	ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG


Data with three user-specified tree branching orders:

5 114 U
Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
Sequence2 ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG
Sequence3 ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG
Sequence4 ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG
Sequence5 ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG

AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG
AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG
AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG
ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG
ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG
3
(Sequence1,(Sequence2,Sequence3),(Sequence4,Sequence5));
(Sequence2,(Sequence1,Sequence3),(Sequence4,Sequence5));
(Sequence3,(Sequence1,Sequence2),(Sequence4,Sequence5));
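Before running a user-tree analysis it can be worth checking that the tree count matches the number of trees that follow and that every tree names each sequence exactly once. The Python sketch below is a hypothetical pre-check, not something fastDNAml provides. It assumes simple Newick strings without quoted labels or comments, as in the example above.

```python
import re


def leaf_names(newick):
    """Return leaf labels: tokens that directly follow '(' or ','.

    Simplification: quoted labels and bracketed comments are not handled.
    """
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


def check_user_trees(lines, taxa):
    """Check 'count' plus that many trees, each naming every taxon once."""
    count = int(lines[0])
    trees = lines[1:1 + count]
    if len(trees) != count:
        raise ValueError("expected %d trees, found %d" % (count, len(trees)))
    for tree in trees:
        if sorted(leaf_names(tree)) != sorted(taxa):
            raise ValueError("tree does not match the taxon list: " + tree)
    return trees


taxa = ["Sequence%d" % i for i in range(1, 6)]
tail = [
    "3",
    "(Sequence1,(Sequence2,Sequence3),(Sequence4,Sequence5));",
    "(Sequence2,(Sequence1,Sequence3),(Sequence4,Sequence5));",
    "(Sequence3,(Sequence1,Sequence2),(Sequence4,Sequence5));",
]
trees = check_user_trees(tail, taxa)
print(len(trees), "user trees look consistent with the taxon list")
```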


Data with transition/transversion ratio and base frequencies to
simulate the Jukes & Cantor model:

5 114 T F
T 0.501
F 0.25 0.25 0.25 0.25
Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
Sequence2 ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG
Sequence3 ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG
Sequence4 ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG
Sequence5 ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG

AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG
AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG
AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG
ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG
ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG
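The value T 0.501 in this example is close to 0.5, which is the ratio expected when all substitutions are equally likely: each base has one possible transition and two possible transversions. The Python sketch below simply counts the two kinds of change to show where 0.5 comes from. It is an illustration of that arithmetic only and makes no claim about how fastDNAml treats the T value internally.

```python
from itertools import permutations

PURINES = {"A", "G"}


def is_transition(old, new):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    return (old in PURINES) == (new in PURINES)


# All 12 ordered substitutions between distinct bases.
changes = list(permutations("ACGT", 2))
transitions = sum(1 for old, new in changes if is_transition(old, new))
transversions = len(changes) - transitions

# With equal base frequencies and equal rates for every change, the
# expected transition/transversion ratio is the ratio of these counts.
ratio = transitions / transversions
print(transitions, transversions, ratio)
```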


Non-interleaved data:

5 114 I
Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG
Sequence2 ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG
AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG
Sequence3 ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG
AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG
Sequence4 ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG
ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG
Sequence5 ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG
ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG
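The non-interleaved layout above holds the same data as the interleaved examples, with each sequence written out in full before the next one starts. The Python sketch below shows one way to convert the data lines from the interleaved layout to this one. It is a hypothetical helper, not part of fastDNAml, and it assumes that names occupy the first ten characters of each line in the first block, that blocks are separated by blank lines, and that the header line has already been removed.

```python
import re


def deinterleave(text, name_width=10):
    """Join interleaved blocks into one (name, sequence) pair per taxon.

    Assumptions: the header line is not included in `text`; the first
    block carries the names in its first `name_width` characters; later
    blocks list continuation data in the same taxon order.
    """
    blocks = [b.splitlines() for b in re.split(r"\n\s*\n", text.strip())]
    names = [line[:name_width] for line in blocks[0]]
    seqs = [line[name_width:].replace(" ", "") for line in blocks[0]]
    for block in blocks[1:]:
        for i, line in enumerate(block):
            seqs[i] += line.replace(" ", "")
    return list(zip(names, seqs))


# Tiny made-up input, used only to show the shape of the result.
demo = "Taxon_A   ACGT\nTaxon_B   ACGA\n\nGG\nTT\n"
for name, seq in deinterleave(demo):
    print(name + seq)
```

When writing the result back out, the header line would also need the I option, as in the example above.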


Non-interleaved data made by editing GenBank-format entries (make sure that
the names are padded to at least ten characters with blanks):

5 114 I
Sequence1
1 ACACGGTGTC GTATCATGCT GCAGGATGCT AGACTGCGTC ANATGTTCGT ACTAACTGTG
61 AGCTCGATGA TCGGTGACGT AGACTCAGGG GCCATGCCGC GAGTTTGCGA TGCG
Sequence2
1 ACGCGGTGTC GTGTCATGCT ACATTATGCT AGACTGCGTC GGATGCTCGT ATTGACTGCG
61 AGCACGGTGA TCAATGACGT AGNCTCAGGR TCCACGCCGT GACTTTGTGA TNCG
Sequence3
1 ACGCGGTGCC GTGTNATGCT GCATTATGCT CGACTGCGRC GGATGCTAGT ATTGACTGCG
61 AGCACGATGA CCGATGACGT AGACTGAGGG TCCGTGCCGC GACTTTGTGA TGCG
Sequence4
1 ACGCGCTGCC GTGTCATCCT ACACGATGCY AGACAGCGTC AGCTGCTAGT ACTGGCTGAG
61 ACCTCGGTGA TTGATGACGT AGACTGCGGG TCCATGCCGC GATTTTGCGR TGCG
Sequence5
1 ACGCGCTGTC GTGTCATACT GCAGGATGCT AGACTGCGTC AGCTGCTAGT ACTGGCTGAG
61 ACCTCGATGC TCGATGACGT AGACTGCGGG TCCATGCCGT GATTTTGCGA TGCG
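The GenBank-style layout above keeps the position numbers and the blanks between groups of ten bases. The Python sketch below shows the sequence that remains once those are set aside, and pads the name to ten characters as the note above requires. It is a hypothetical helper for preparing or checking such a file, and it assumes that everything other than letters in the sequence lines can be discarded.

```python
def genbank_entry_to_record(lines, name_width=10):
    """Return (padded_name, sequence) for one name line plus its data lines.

    Assumption: digits and blanks in the data lines are layout only, so
    only alphabetic characters are kept as sequence data.
    """
    name = lines[0].strip().ljust(name_width)
    sequence = "".join(ch for line in lines[1:] for ch in line if ch.isalpha())
    return name, sequence


entry = [
    "Sequence1",
    "1 ACACGGTGTC GTATCATGCT GCAGGATGCT AGACTGCGTC ANATGTTCGT ACTAACTGTG",
    "61 AGCTCGATGA TCGGTGACGT AGACTCAGGG GCCATGCCGC GAGTTTGCGA TGCG",
]
name, sequence = genbank_entry_to_record(entry)
print(repr(name), len(sequence))
```

Comparing the resulting length with the site count in the header line (114 here) is a quick way to catch a dropped or duplicated line.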


Data analysis restarted from a four-taxon tree (which happens to be wrong,
but it will be corrected by local rearrangements after the tree is read):

5 114 R
Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
Sequence2 ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG
Sequence3 ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG
Sequence4 ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG
Sequence5 ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG

AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG
AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG
AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG
ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG
ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG
(Sequence4:0.1,Sequence2:0.1,(Sequence1:0.1,Sequence5:0.1):0.1):0.0;
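The restart tree in this example contains four of the five sequences, so one sequence is still to be placed when the analysis resumes. The Python sketch below lists which sequences a restart tree leaves out. It is an illustrative check only, and it assumes simple Newick strings with optional branch lengths and no quoted labels or comments.

```python
import re


def leaf_names(newick):
    """Return leaf labels, ignoring ':length' suffixes.

    Simplification: quoted labels and bracketed comments are not handled.
    """
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


taxa = ["Sequence%d" % i for i in range(1, 6)]
restart_tree = (
    "(Sequence4:0.1,Sequence2:0.1,(Sequence1:0.1,Sequence5:0.1):0.1):0.0;"
)

in_tree = set(leaf_names(restart_tree))
missing = [name for name in taxa if name not in in_tree]
print("in tree:", sorted(in_tree))
print("still to be added:", missing)
```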


Data analysis restarted from a four-taxon tree (which is wrong, and which
will not be corrected after the tree is read due to the suppression of all
rearrangements by the global 0 0 option):

5 114 R G T
G 0 0
T 2.0
Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
Sequence2 ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG
Sequence3 ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG
Sequence4 ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG
Sequence5 ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG

AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG
AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG
AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG
ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG
ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG
(Sequence4:0.1,Sequence2:0.1,(Sequence1:0.1,Sequence5:0.1):0.1):0.0;
