                             fastDNAml 1.2


Gary J. Olsen, Department of Microbiology
University of Illinois, Urbana, IL
gary@phylo.life.uiuc.edu

Ross Overbeek, Mathematics and Computer Science
Argonne National Laboratory, Argonne, IL
overbeek@mcs.anl.gov



Citing fastDNAml

If you publish work using fastDNAml, please cite the following publications:

   Olsen, G. J., Matsuda, H., Hagstrom, R., and Overbeek, R.  1994.  fastDNAml:
   A tool for construction of phylogenetic trees of DNA sequences using maximum
   likelihood.  Comput. Appl. Biosci. 10: 41-48.

   Felsenstein, J.  1981.  Evolutionary trees from DNA sequences:
   A maximum likelihood approach.  J. Mol. Evol. 17: 368-376.



What is fastDNAml

fastDNAml is a program derived from Joseph Felsenstein's version 3.3 DNAML
(part of his PHYLIP package).  Users should consult the documentation for
DNAML before using this program.

fastDNAml is an attempt to solve the same problem as DNAML, but to do so
faster and using less memory, so that larger trees and/or more bootstrap
replicates become tractable.  Much of fastDNAml is merely a recoding of the
PHYLIP 3.3 DNAML program from PASCAL to C.

DNAML includes the following notice:

version 3.3. (c) Copyright 1986, 1990 by the University of Washington and
Joseph Felsenstein.  Written by Joseph Felsenstein.  Permission is granted to
copy and use this program provided no fee is charged for it and provided that
this copyright notice is not removed.



Why is fastDNAml faster?

Some recomputation of values has been eliminated (Joe Felsenstein has done
much of this in version 3.4 DNAML).

The optimization of branch lengths has been accelerated by changing from an EM
method to Newton's method (Joe Felsenstein has done much of this in version 3.4
DNAML).

The strategy for simultaneously optimizing all of the branches on the tree has
been modified to spend less time getting an individual branch right before
improving the other branches.



Other new features in fastDNAml

fastDNAml includes a checkpoint feature to regularly save its progress toward
finding a large tree.  If the program is interrupted, a minor change to the
input file and adding the R (restart) option permits the work to be resumed
from the last checkpoint.

The new R (restart) option can also be used for more rapid addition of new
sequences to a previously computed tree (when new sequences are added to the
alignment, it is best if the relative alignment of the previous sequences is
not altered).

The G (global) option has been generalized to permit crossing any number of
branches during tree rearrangements.  In addition, it is possible to modify
the extent of rearrangement explored during the sequential addition phase of
tree building.

The G U (global and user tree) option combination instructs the program to
find the best of the user trees, and then look for rearrangements that are
better still.

The number of available rate categories has been raised from 9 to 35.

The weighting mask accepts values from 0 through 35.

The new B (bootstrap) option causes generation of a bootstrap sample, drawn
from the input data.

The program includes "P4" code for distributing the problem over multiple
processors (either within one machine, or across multiple machines).



Do DNAML and fastDNAml give the same answer?

Generally yes, though there are some reservations:

One or the other might find a better tree due to minor changes in the ways
trees are searched.  When sequence addition is replicated with different
values of the jumble random number seed, they have about the same probability
of finding the best tree, but any given seed might give different trees.

The likelihoods and branch lengths sometimes differ very slightly due to
different criteria for stopping the optimization process.

Little has been done to check the confidence limits on branch lengths.  There
seem to be some instances in which they disagree, and we think that fastDNAml
is correct.  However, do not take the "significantly greater than zero" too
seriously.

If you are concerned, you can supply a tree inferred by fastDNAml as a user
tree to DNAML and let it (1) reoptimize branch lengths, (2) tell you
the confidence limits and (3) tell you the tree likelihood.



Changes and new features in version 1.2

The program can now calculate the likelihood of extremely large user trees.
The largest tree we have tested had 3200 taxa.  Generally, you will run out
of computer memory before you exceed an intrinsic limitation.  (With this,
it is possible to compare trees found by whatever your favorite methods are
under the likelihood criterion.)

The computation has been changed to permit ease of implementing new models
of evolution and analysis of amino acid sequences (though these have not yet
been done).  This has slowed down the program 5-10%.



Changes and new features in version 1.1

The quickadd option is now the default.  This has the ugly effect of reversing
the meaning of putting a Q on the option line.  (Sorry about this and the
next note, but in the long run it is the better behavior.)

Use of empirical base frequencies is now the default.  This reverses the
meaning of the F option, making the default behavior more like that of PHYLIP.

The tree output file is now generated by default and should be more compatible
with the files written and read by the PHYLIP programs.  In particular, the
comments with information about the tree, its likelihood, etc. are removed, and
there are no quotation marks around names unless there are unusual characters
within the name.  (There are two things to be very careful about in names:
there is no completely consistent way to handle both blanks and underscores in
names without quotation marks, and when a name is spaced in from the margin in
the input file, there are leading blank spaces in the name, which can be very
hard to make compatible with some programs.)

The program now maintains a list of the several best trees, not just the
(single) best.  In particular, when evaluating user-supplied trees, the
program tries to save information about all of the trees and provides a
Hasegawa and Kishino type test of whether each tree is significantly worse
than the best tree found.  Note that the current version of the program prints
the report in the order of tree likelihood, NOT in the order the trees are
supplied to the program.  The best way (at present) to figure out which tree
is which is to look at the likelihoods.  This is the same test used in PHYLIP,
but I had removed access in version 1.0 of fastDNAml due to differences in how
the programs handle multiple trees.  The difference is that fastDNAml can
maintain nearly optimal trees all the time, so you can get a list of the N
best trees found by using the new K option (below).

The program should accept rooted trees (strictly bifurcating), as well as
unrooted trees (with a trifurcation at the deepest level).  This is not fully
tested, but it seems to work.



Features in the works

Test subtree exchanges (as well as moving a single subtree) in the search for
better trees.

Allowing the program to optimize any user-defined subset of branches when user
lengths are supplied.



Input and Options


Basics

The input to fastDNAml is similar to that used by DNAML (and the other PHYLIP
programs).  The user should consult the PHYLIP documentation for a basic
description of the format.

This version of fastDNAml expects to get its input from stdin (standard input)
and writes its output to stdout (standard output).  (There are compile time
options to modify this, for those who care to get into such things.)

On a UNIX or DOS system, it is a simple matter to redirect input from a file
and output to a file:

  fastDNAml < infile > outfile

On a VMS system it is only slightly more difficult.  Immediately before
running the program, one includes two commands that define the input and
output files:

  $ Define/User  Sys$Input   infile
  $ Define/User  Sys$Output  outfile
  $ Run fastDNAml

The default input data format is interleaved (see the I option).  To help get
data from a GenBank or similar format, the interleaved format can be switched
off with the I option.  Numbers in the sequence data (i.e., sequence position
numbers) will be ignored, so they need not be stripped out.

(Note that the program also writes a file called checkpoint.PID.  See the R
option below for more description.)


1 -- Print Data

By default, fastDNAml does not echo the sequence data to the output file.
Option 1 reverses this.


3 -- Do Not Print Tree

By default, fastDNAml prints the final tree to the output file.  Option 3
reverses this.


4 -- Do Not Write Tree to File  (*****  Changed in version 1.1 *****)

By default, fastDNAml versions 1.1 and 1.2 write a machine readable (Newick
format) copy of the final tree to an output file.  Option 4 reverses this.
The tree output file will be called treefile.PID (where PID is the process ID
under which fastDNAml is running).  Look at the Y option below for more
information on alternative tree formats.


B -- Bootstrap

Generates a bootstrap sample of the input data.  Requires auxiliary data line
of the form:

  B  random_number_seed

Example:

  5  114  B
  B  137
  Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
  ...

If the W option is used, only positions that have nonzero weights are used in
computing the bootstrap sample.  Warning:  For a given random number seed, the
sample will always be the same.

PHYLIP DNAML does not include a bootstrap option.  (Use the SEQBOOT program.)


C -- Categories

Requires auxiliary data of the form:

  C  number_of_categories  list_of_category_rates

The maximum number of categories is 35.  This line is followed by a list of
the rates for each site:

  Categories  list_of_categories  [per site, one or more lines]

Category "numbers" are ordered: 1, 2, 3, ..., 9, A, B, ..., Y, Z.  Category
zero (undefined rate) is permitted at sites with a zero in a user-supplied
weighting mask.

Example:

  5  114  C
  C  12  0.0625  0.125  0.25  0.5  1  2  4  8  16  32  64  128
  Categories  5111136343678975AAA8949995566778888889AAAAAA9239898629AAAAA9
              633792246624457364222574877188898132984963499AA9899975
  Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
  ...

PHYLIP DNAML is limited to categories 1 through 9.  Also, in PHYLIP version
3.3, the categories data came after all the other auxiliary data, but before
the user-supplied base frequencies and sequence data.  If you make the C line
your last auxiliary data line, the programs will behave the same.

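The category ordering described above can be sketched as a small shell
function.  This is only an illustration and is not part of fastDNAml; the
function name category_symbol is hypothetical.  It prints the one-character
symbol that would stand for a given category number on a Categories line.

```shell
#!/bin/sh
# Hypothetical helper (not part of fastDNAml): print the one-character
# symbol for a rate category number, following the ordering
# 1, 2, ..., 9, A, B, ..., Y, Z (0 stands for an undefined rate).
category_symbol() {
    case $1 in
        ''|*[!0-9]*)
            echo "category_symbol: not a number: $1" >&2
            return 1 ;;
    esac
    if [ "$1" -gt 35 ]; then
        echo "category_symbol: category must be 0 through 35" >&2
        return 1
    fi
    # substr() counts characters from 1, so category N is character N+1.
    awk -v n="$1" 'BEGIN {
        print substr("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ", n + 1, 1)
    }'
}
```

For example, "category_symbol 12" prints C, the symbol for the twelfth
category.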

F -- Empirical Frequencies  (*****  Changed in version 1.1 *****)

By default (starting with version 1.1), the program uses base frequencies
derived from the sequence data (called empirical base frequencies).  Therefore
the input file should normally NOT include a base frequencies line preceding
the data.  If you want to supply your own base frequency data, it is now
necessary to use the F option and add a line to the input file that supplies
the frequency data.

The F option instructs the program to use user-supplied base frequencies
rather than frequencies derived from the sequence data.  The input file should
then include a base frequencies line IMMEDIATELY preceding the data:

  5  114  F
  0.25  0.30  0.20  0.25
  Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
  ...

There is an alternative format: the frequencies can be anywhere in the list of
auxiliary data lines if they are preceded by an F in the first column:

  5  114  F C W
  F 0.25  0.30  0.20  0.25
  C ...
    ...
  W ...
  Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
  ...


G -- Global

If the global option is specified, there may also be an [optional] auxiliary
data line of form:

  G  N1

or

  G  N1  N2

N1 is the number of branches to cross in rearrangements of the completed tree.
The value of N2 is the number of branches to cross in testing rearrangements
during the sequential addition phase of tree inference.

  N1 = 1:            local rearrangement (default without G option)

  1 < N1 < numsp-3:  regional rearrangements (crossing N1 branches)

  N1 >= numsp-3:     global rearrangements (default with G option)

  N2 <= N1           the default N2 is 1, local rearrangements.

The G option can also be used to force branch swapping on user trees, that is,
a combination of G and U options.

If the auxiliary line is supplied, it cannot be the last line of auxiliary
data.  (It may be necessary to add the T option with an auxiliary data line of

  T 2.0

if no other auxiliary data are used.)

Examples:

Do local rearrangements after each addition, and global after last addition:

  5  114  G
  Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
  ...

Do local rearrangements after each addition, and regional (crossing 4
branches) after last addition:

  5  114  G T
  G  4
  T  2.0
  Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
  ...

Do no rearrangements after each addition, and local after last addition:

  5  114  G T
  G  1 0
  T  2.0
  Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
  ...

PHYLIP DNAML does not support the auxiliary data line or branch swapping on a
user tree.

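The N1 ranges listed in this section can be written out as a small shell
function.  This is a sketch for illustration and is not part of fastDNAml.
The function name rearrangement_kind is hypothetical, numsp is assumed to be
the number of sequences in the data set, and a value of 0 is taken to mean "no
rearrangements", as in the "G 1 0" example above.

```shell
#!/bin/sh
# Hypothetical helper (not part of fastDNAml): name the kind of
# rearrangement selected by a number of branches to cross (N1)
# for a data set of NUMSP sequences.
# Usage: rearrangement_kind N1 NUMSP
rearrangement_kind() {
    n1=$1
    numsp=$2
    if [ "$n1" -le 0 ]; then
        echo none          # assumption: 0 means no rearrangements
    elif [ "$n1" -eq 1 ]; then
        echo local         # N1 = 1
    elif [ "$n1" -lt $(( numsp - 3 )) ]; then
        echo regional      # 1 < N1 < numsp-3
    else
        echo global        # N1 >= numsp-3
    fi
}
```

With 10 sequences, "rearrangement_kind 4 10" prints regional and
"rearrangement_kind 7 10" prints global.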

I -- Not Interleaved

By default, fastDNAml 1.2 expects data lines for the various sequences in an
interleaved format (as did PHYLIP 3.3 DNAML).  The I option reverses the
expected format (to non-interleaved data, in which all the data lines for one
sequence appear before the next sequence begins).  This is particularly useful
for editing a GenBank or equivalent format into a valid input file (note that
numbers within the sequence data are ignored, so it is not necessary to remove
them).

If all the data for each sequence are on one line, then the interleaved and
non-interleaved formats are equivalent.  (This is the way David Swofford's
PAUP program writes PHYLIP format output files.)  The drawback is that many
programs do not handle long lines of text.  This includes the vi and EDT text
editors, many electronic mail programs, and some versions of FTP for VAX/VMS
systems.

PHYLIP 3.3 DNAML expects interleaved data, and does not include an I option to
alter this.  PHYLIP 3.4 DNAML accepts an I option, but the default format is
reversed.


J -- Jumble

Randomize the sequence addition order.  Requires an auxiliary input line of
the form:

  J  random_number_seed

Example:

  5  114  J
  J  137
  Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
  ...

Note that fastDNAml explores a very small number of alternative tree
topologies relative to a typical parsimony program.  There is a very real
chance that the search procedure will not find the tree topology with the
highest likelihood.  Altering the order of taxon addition and comparing the
trees found is a fairly efficient method for testing convergence.  Typically,
it would be nice to find the same best tree at least twice (if not three
times), as opposed to simply performing some fixed number of jumbles and
hoping that at least one of them will be the optimum.

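Repeating the analysis with several jumble seeds can be scripted.  The shell
function below is a minimal sketch and is not one of the scripts supplied with
the program; the name add_jumble is hypothetical.  It assumes that the input
file does not already use the J option and that the J auxiliary data line may
come first among the auxiliary data lines.

```shell
#!/bin/sh
# Hypothetical helper (not part of fastDNAml): read an input file on
# stdin and write it to stdout with the J option added to the first
# line and a "J seed" auxiliary data line inserted directly after it.
# Usage: add_jumble SEED < infile
add_jumble() {
    seed=$1
    IFS= read -r first_line || return 1   # first line of data file
    printf '%s J\n' "$first_line"         # add jumble option
    printf 'J %s\n' "$seed"               # auxiliary data line
    cat -                                 # rest of data file
}
```

With this function defined, a loop such as the following would run one
analysis per seed (the seeds and the output file names are arbitrary):

  for seed in 137 141 145
  do
    add_jumble $seed < infile | fastDNAml > outfile.$seed
  done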

K -- Keep multiple best trees  (***** New in version 1.1 *****)

The program can keep a list of the best trees that it has found.  When the
program is done, it prints a list of these, from best to worst, and prints
a Hasegawa and Kishino type test as to which trees are significantly worse
than the best tree found.  When evaluating user-supplied trees, the program
automatically keeps all trees.  In other situations, the program keeps only
the best tree that it has found.  The K option, and associated auxiliary data
line, can be used to define an alternative number.

Example, to keep the 15 best trees found:

  5  114  K
  K  15
  Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
  ...

Example, to keep only the one best tree of possibly numerous user-supplied
trees:

  5  114  K  U
  K  1
  Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
  ...



L -- User Lengths

Causes user trees to be read with branch lengths (and it is an error to omit
any of them).  Without the L option, branch lengths in user trees are not
required, and are ignored if present.

Example:

  5  114  U L
  Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
  ...

(The U is for user tree and the L for user lengths)


O -- Outgroup

Use the specified sequence number for the outgroup.  Requires an auxiliary
data line of the form:

  O  outgroup_number

Example:

  5  114  O
  O 5
  Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
  ...

This option only affects the way the tree is drawn (and written to the
treefile).



Q -- Quickadd  (***** Changed in version 1.1 *****)

The quickadd feature greatly decreases the time spent initially placing a new
sequence in the growing tree (but does not change the time required to
subsequently test rearrangements).  The overall time saving seems to be about
30%, based on a number of test cases.  Its downside, if any, is unknown.  This
is now (starting in version 1.1) the default program behavior.

If the analysis is run with a global option of "G 0 0", so that no
rearrangements are permitted, the tree is built very approximately, but very
quickly.  This may be of greatest interest if the question is, "Where does
this one new sequence fit into this known tree?"  The known tree is provided
with the restart option (below).

PHYLIP DNAML does not include anything comparable to the quickadd feature.

The quickadd feature can be turned OFF by adding a Q to the first line of the
input file.



R -- Restart

The R option causes the program to read a user-supplied tree with less than
the full number of taxa as the starting point for sequential addition of the
remaining taxa.  Thus, the sequence data must be followed by a valid (Newick
format) tree.  (The phylip_tree/2, prolog fact format, is now also supported.)

The restart option can also be used to increase the range of the search for
alternative (better) trees.  For example, you can take a tree produced with
only "local" tree rearrangements, and increase the rearrangements to
"regional" or "global" by combining the appropriate global option with the
restart option.  If the starting tree was written by fastDNAml, then the
extent of rearrangements is saved with the tree, and will be used as the
starting point for the additional search.  If the tree was already globally
optimized, then no additional searching will be performed.

To support the R option, after each taxon is added to the growing tree, and
after each round of rearrangements, the program appends a checkpoint tree to a
file called checkpoint.PID, where PID is the process number of the running
fastDNAml program.  The last line of this file needs to be appended to the
input file when the R option is used.  (This should not be confused with the U
(user tree) option, which expects a number followed by that number of trees.
No additional taxa are added to user trees.)

The UNIX utility tail can be used to extract the last tree from the checkpoint
file, and the utility cat can be used to append it to the input.  For example,
the following script can be used to add a starting tree and the R option to a
data file, and restart fastDNAml:

  #! /bin/sh
  if test $# -ne 1
    then echo "Usage:  restart checkpoint_file" >&2
    exit 1
  fi
  IFS= read -r first_line     # first line of data file
  echo "$first_line R"        # add restart option
  cat -                       # rest of data file
  tail -n 1 "$1"              # append last tree in checkpoint file

If this shell script is in the file called restart, then one might use the
command:

  restart  checkpoint.21312  < infile  | fastDNAml  > new_outfile
   ^script  ^checkpoint tree    ^data     ^dnaml program  ^output_file

If this is too opaque, don't worry about it, or talk with your local unix
wizard.  In the meantime, this and other useful shell scripts are provided
with the program.

PHYLIP DNAML does not write checkpoint trees and does not have a restart
option.



T -- Transition/transversion ratio

Use a user-specified ratio of transition to transversion type substitutions.
Without the T option, a value of 2.0 is used.  Requires an auxiliary data line
of the form:

  T  ratio

Example:

  5  114  T
  T  1.0
  Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
  ...

(Note that a T option with a value of 2.0 does nothing, but it can provide
a last auxiliary data line following optional auxiliary data.  See the
examples for G and Y.)



U -- User Tree(s)

Read an input line with the number of user-specified trees, followed by the
specified number of trees.  These data immediately follow the sequence data.

The trees must be in Newick format, and terminated with a semicolon.  (The
program also accepts a pseudo_newick format, which is a valid prolog fact.)

The tree reader in this program is more powerful than that in PHYLIP 3.3.  In
particular, material enclosed in square brackets, [ like this ], is ignored as
comments; taxa names can be wrapped in single quotation marks to support the
inclusion of characters that would otherwise end the name (i.e., '(', ')',
':', ';', '[', ']', ',' and ' '); names of internal nodes are properly
ignored; and exponential notation (such as 1.0E-6) for branch lengths is
supported.

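Appending the tree count and the trees to a data file can be done in the same
style as the restart script.  The function below is a sketch for illustration
and is not one of the scripts supplied with the program; the name
add_user_trees is hypothetical.  It assumes that the tree file holds exactly
one complete tree per non-blank line and that the data file does not already
use the U option.

```shell
#!/bin/sh
# Hypothetical helper (not part of fastDNAml): read an input file on
# stdin and write it to stdout with the U option added to the first
# line, followed by the number of trees and the trees themselves.
# Assumes one complete tree per non-blank line of the tree file.
# Usage: add_user_trees TREE_FILE < infile
add_user_trees() {
    tree_file=$1
    IFS= read -r first_line || return 1   # first line of data file
    printf '%s U\n' "$first_line"         # add user tree option
    cat -                                 # rest of data file
    # Count the non-blank lines, then copy them after the count.
    awk 'NF { n++ } END { print n + 0 }' "$tree_file"
    awk 'NF' "$tree_file"
}
```

If the trees are in a file called trees, one might use:

  add_user_trees trees < infile | fastDNAml > outfile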


W -- Weights

Read user-specified column weighting information.  This option requires
auxiliary data of the form:

  Weights     list_of_weight_values    [per site, one or more lines]

Example:

  5  114  W
  Weights     111111111111001100000100011111100000000000000110000110000000
              111101111111111111111111011100000111001011100000000011
  Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
  ...

It is necessary that the weight values not start before the 11th character in
the line, or some of them will be lost.  Weights from 0 to 35 are indicated by
the series: 0, 1, 2, 3, ..., 9, A, B, ..., Y, Z.

PHYLIP DNAML does not support user weights with values other than 1 or 0.
This limit has been removed in fastDNAml to permit the use of user weights
as a mechanism for representing a bootstrap sample (that is, only the
auxiliary data lines change, not the body of the data file).

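Representing a bootstrap sample as a weighting mask can be sketched with awk.
The function below is an illustration only and is not part of fastDNAml; the
name bootstrap_weights is hypothetical.  It uses awk's own random number
generator, so a given seed will not reproduce the sample drawn by the B
option, and the sample may differ between awk implementations.  Each weight
is the number of times the site was drawn, written with the symbols
0, 1, ..., 9, A, ..., Z, 60 sites per line as in the example above.

```shell
#!/bin/sh
# Hypothetical helper (not part of fastDNAml): print a Weights auxiliary
# data block in which the weight of each site is the number of times the
# site was drawn in a bootstrap sample of NSITES draws with replacement.
# Usage: bootstrap_weights NSITES SEED
bootstrap_weights() {
    awk -v nsites="$1" -v seed="$2" 'BEGIN {
        symbols = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        srand(seed)
        # Draw nsites sites with replacement and count each draw.
        for (i = 1; i <= nsites; i++)
            count[int(rand() * nsites) + 1]++
        # A weight above 35 has no symbol, so give up in that case.
        for (i = 1; i <= nsites; i++)
            if (count[i] > 35) {
                print "bootstrap_weights: weight above 35" > "/dev/stderr"
                exit 1
            }
        # Write 60 weights per line after a 12-character label field.
        line = sprintf("%-12s", "Weights")
        for (i = 1; i <= nsites; i++) {
            line = line substr(symbols, count[i] + 1, 1)
            if (i % 60 == 0 || i == nsites) {
                print line
                line = sprintf("%-12s", "")
            }
        }
    }'
}
```

The output of, for example, "bootstrap_weights 114 137" would be placed among
the auxiliary data lines of an input file whose first line includes the W
option.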


Y -- Write Tree  (*****  Changed in version 1.1  *****)

fastDNAml writes the final tree to an output file called treefile.PID.  By
default the tree is in PHYLIP format.  The Y option allows turning this off,
or changing the format of the tree.

The Y option by itself toggles the saving of the tree, on or off.  If there
is also an auxiliary input line of the form:

  Y number

where number can be 1, 2, or 3, the number selects one of three tree output
formats:

  1  Newick
  2  Prolog
  3  PHYLIP (default)

Newick is the tree standard used by PAUP, MacClade, and several other
programs.  The tree includes a comment about the analysis that the tree is
based upon.  fastDNAml uses this comment when it reads a tree.  In addition,
the names of the taxa are enclosed in quotation marks.  Both of these
features of the file make it incompatible with the PHYLIP package.

PHYLIP is the subset of the Newick tree standard used by programs in the
PHYLIP package.  There are no comments and no quotation marks around names.
(If a name includes unusual characters, such as a comma, fastDNAml will put
it in quotation marks, making it a valid tree, but it cannot be read by the
PHYLIP programs.)

The Prolog format is very similar to the Newick format, but it is a valid
prolog fact that permits direct loading into some sequence analysis tools that
we use.  The structure of the term is:

  pseudo_newick([Comment], (Subtree1, Subtree2, Subtree3): Length).

where each subtree is either

  (Subtree1,Subtree2): Length

or

  Label: Length

The comment is a valid prolog term when && is defined as a unary operator.
Label is a prolog atom (it is a valid Newick label, with single quotation
marks).  Length is a number.

Because the Y auxiliary input line is optional, it cannot be the last auxiliary
data line.

Examples.  To turn off the saving of the tree,

  5  114  Y
  Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
  ...

or, to change the output to the full Newick format,

  5  114  Y T
  Y 1
  T 2.0
  Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
  ...

PHYLIP DNAML does not append the PID (process ID) to the tree file name and
does not support the full Newick standard or the prolog format output.

=============================================================================

Acknowledgements:

The origin and development of fastDNAml as a program to extend the use of
maximum likelihood phylogenetic inference to larger sets of DNA sequences
was encouraged by Carl Woese.  Through the development and evolution of the
program, Joseph Felsenstein has been extremely helpful and encouraging.

Numerous users have made suggestions and/or reported program bugs:

   Gary Nunn
   Tom Schmidt
   Ross Overbeek
   Hideo Matsuda
   Mitchell Sogin
   Brenden Rielly

=============================================================================

Examples:

Data file with empirical frequencies (generic analysis) (notice that blank
lines are permitted in the data):

5  114
Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
Sequence2   ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG
Sequence3   ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG
Sequence4   ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG
Sequence5   ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG

            AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG
            AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG
            AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG
            ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG
            ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG


Data file with empirical frequencies and a random addition order:

5  114  J
J 137
Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
Sequence2   ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG
Sequence3   ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG
Sequence4   ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG
Sequence5   ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG

            AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG
            AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG
            AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG
            ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG
            ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG


750Data file with empirical frequencies and a bootstrap resampling:
751
7525  114  B
753B 137
754Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
755Sequence2   ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG
756Sequence3   ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG
757Sequence4   ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG
758Sequence5   ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG
759
760            AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG
761            AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG
762            AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG
763            ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG
764            ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG
765
766
767Data with weighting mask and rate categories:
768
7695  114  W C
770Weights     111111111111001100000100011111100000000000000110000110000000
771            111101111111111111111111011100000111001011100000000011
772C  10  0.0625  0.125  0.25  0.5  1  2  4  8  16  32
773Categories  5111136343678975AAA8949995566778888889AAAAAA9239898629AAAAA9
774            633792246624457364222574877188898132984963499AA9899975
775Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
776Sequence2   ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG
777Sequence3   ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG
778Sequence4   ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG
779Sequence5   ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG
780
781            AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG
782            AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG
783            AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG
784            ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG
785            ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG
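
A weighting mask or category list that does not supply one character per site
is an easy mistake to make when editing such a file by hand.  The following is
a minimal Python sketch for checking this before running the program.  It is
not part of fastDNAml: the helper names collect_mask and site_rates are
hypothetical, the keyword is assumed to occupy the first ten columns, and the
category characters are assumed to run 1-9 and then A = 10, as the example
above suggests.

```python
def collect_mask(lines, keyword, n_sites):
    """Join the mask that begins on the line starting with `keyword`.

    Assumption: the first 10 columns hold the keyword, and continuation
    lines follow directly until n_sites characters have been read.
    """
    mask = ""
    for line in lines:
        if not mask and not line.startswith(keyword):
            continue                      # still looking for the keyword line
        body = line[10:] if not mask else line
        mask += "".join(body.split())     # drop blanks used for alignment
        if len(mask) >= n_sites:
            break
    return mask


def site_rates(categories, rates):
    """Map category characters to rates (assumed: 1-9, then A = 10, ...)."""
    out = []
    for ch in categories:
        index = int(ch, 36)               # '1' -> 1, ..., '9' -> 9, 'A' -> 10
        # Category 0 is treated here as "no rate defined" (an assumption).
        out.append(rates[index - 1] if index > 0 else None)
    return out


header = """\
5  114  W C
Weights     111111111111001100000100011111100000000000000110000110000000
            111101111111111111111111011100000111001011100000000011
C  10  0.0625  0.125  0.25  0.5  1  2  4  8  16  32
Categories  5111136343678975AAA8949995566778888889AAAAAA9239898629AAAAA9
            633792246624457364222574877188898132984963499AA9899975
""".splitlines()

n_sites = int(header[0].split()[1])
weights = collect_mask(header, "Weights", n_sites)
categories = collect_mask(header, "Categories", n_sites)
rates = [float(x) for x in header[3].split()[2:]]   # skip "C" and the count

print("sites declared:", n_sites)
print("weights read:", len(weights), "categories read:", len(categories))
print("rate at first site:", site_rates(categories, rates)[0])
```

If either count printed differs from the number of sites declared on the first
line, the mask needs fixing before the file is used.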


Data with three user-specified tree branching orders:

5  114  U
Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
Sequence2   ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG
Sequence3   ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG
Sequence4   ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG
Sequence5   ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG

            AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG
            AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG
            AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG
            ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG
            ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG
3
(Sequence1,(Sequence2,Sequence3),(Sequence4,Sequence5));
(Sequence2,(Sequence1,Sequence3),(Sequence4,Sequence5));
(Sequence3,(Sequence1,Sequence2),(Sequence4,Sequence5));


Data with transition/transversion ratio and base frequencies to
simulate the Jukes & Cantor model:

5  114  T F
T 0.501
F 0.25 0.25 0.25 0.25
Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
Sequence2   ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG
Sequence3   ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG
Sequence4   ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG
Sequence5   ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG

            AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG
            AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG
            AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG
            ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG
            ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG


Non-interleaved data:

5  114  I
Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
            AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG
Sequence2   ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG
            AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG
Sequence3   ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG
            AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG
Sequence4   ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG
            ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG
Sequence5   ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG
            ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG


Non-interleaved data made by editing GenBank-format entries (make sure that
the names are padded to at least ten characters with blanks):

5  114  I
Sequence1
        1 ACACGGTGTC GTATCATGCT GCAGGATGCT AGACTGCGTC ANATGTTCGT ACTAACTGTG
       61 AGCTCGATGA TCGGTGACGT AGACTCAGGG GCCATGCCGC GAGTTTGCGA TGCG
Sequence2
        1 ACGCGGTGTC GTGTCATGCT ACATTATGCT AGACTGCGTC GGATGCTCGT ATTGACTGCG
       61 AGCACGGTGA TCAATGACGT AGNCTCAGGR TCCACGCCGT GACTTTGTGA TNCG
Sequence3
        1 ACGCGGTGCC GTGTNATGCT GCATTATGCT CGACTGCGRC GGATGCTAGT ATTGACTGCG
       61 AGCACGATGA CCGATGACGT AGACTGAGGG TCCGTGCCGC GACTTTGTGA TGCG
Sequence4
        1 ACGCGCTGCC GTGTCATCCT ACACGATGCY AGACAGCGTC AGCTGCTAGT ACTGGCTGAG
       61 ACCTCGGTGA TTGATGACGT AGACTGCGGG TCCATGCCGC GATTTTGCGR TGCG
Sequence5
        1 ACGCGCTGTC GTGTCATACT GCAGGATGCT AGACTGCGTC AGCTGCTAGT ACTGGCTGAG
       61 ACCTCGATGC TCGATGACGT AGACTGCGGG TCCATGCCGT GATTTTGCGA TGCG
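
Trailing blanks are invisible and are easily lost when a file is edited or
copied, so the padding of the names in the example above may be worth applying
mechanically.  The following is a minimal Python sketch, not part of
fastDNAml.  The helper name pad_names is hypothetical, and it assumes that the
first line is the header, that every GenBank-style sequence row starts with a
position number, and that no sequence name consists only of digits.

```python
def pad_names(lines, width=10):
    """Pad name lines with blanks to at least `width` characters.

    Assumptions: lines[0] is the "taxa sites options" header, and any other
    non-blank line whose first field is not a number is a sequence name.
    """
    result = [lines[0]]
    for line in lines[1:]:
        fields = line.split()
        if fields and not fields[0].isdigit():
            # Name line: strip stray trailing blanks, then pad with blanks.
            line = line.rstrip().ljust(width)
        result.append(line)
    return result


edited = [
    "2  24  I",
    "Sequence1",
    "        1 ACACGGTGTC GTATCATGCT GCAG",
    "Sequence2",
    "        1 ACGCGGTGTC GTGTCATGCT ACAT",
]

for line in pad_names(edited):
    print(repr(line))   # repr() makes the trailing blanks visible
```

Names that are already ten characters or longer are left unchanged, since
ljust only adds blanks when the line is shorter than the requested width.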


Data analysis restarted from a four-taxon tree (which happens to be wrong,
but which will be corrected by local rearrangements after the tree is read):

5  114  R
Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
Sequence2   ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG
Sequence3   ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG
Sequence4   ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG
Sequence5   ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG

            AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG
            AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG
            AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG
            ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG
            ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG
(Sequence4:0.1,Sequence2:0.1,(Sequence1:0.1,Sequence5:0.1):0.1):0.0;


Data analysis restarted from a four-taxon tree (which is wrong, and which
will not be corrected after the tree is read, because the global option
G 0 0 suppresses all rearrangements):

5  114  R G T
G 0 0
T 2.0
Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
Sequence2   ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG
Sequence3   ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG
Sequence4   ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG
Sequence5   ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG

            AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG
            AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG
            AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG
            ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG
            ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG
(Sequence4:0.1,Sequence2:0.1,(Sequence1:0.1,Sequence5:0.1):0.1):0.0;