source: branches/items/GDE/TREEPUZZLE/doc/manual.html

Last change on this file was 10842, checked in by westram, 11 years ago
1<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
2
3<HTML>
4<!-- To view this document properly please use a HTML browser -->
5
6<HEAD>
7   <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
8   <TITLE>Documentation of TREE-PUZZLE 5.0</TITLE>
9</HEAD>
10<BODY BGCOLOR="#FFFFFF">
11
12<H1>
13<img ALT="PUZZLE Logo" SRC="puzzle.gif" HSPACE=10 BORDER=0 height=32 width=32 align=LEFT>
14<b><font size="+3">TREE-PUZZLE Manual</font></b>
15<img ALT="PPUZZLE Logo" SRC="ppuzzle.gif" HSPACE=10 BORDER=0 height=32 width=32>
16</H1>
17<B>Maximum likelihood analysis for nucleotide, amino acid, and two-state data</B>
18
19
20<P>Version 5.0
21<BR>October 2000
22<BR>Copyright 1999-2000 by Heiko A. Schmidt, Korbinian Strimmer, Martin Vingron, and Arndt von Haeseler
23<BR>Copyright 1995-1999 by Korbinian Strimmer and Arndt von Haeseler
24
25<P><b>Heiko A. Schmidt</b>
26email: h.schmidt@dkfz-heidelberg.de,
27<A HREF="http://www.dkfz-heidelberg.de/tbi/">Theoretical Bioinformatics</A>,
28<A HREF="http://www.dkfz-heidelberg.de/">DKFZ</A>,
29Im Neuenheimer Feld 280, D-69124 Heidelberg, Germany.
30
31<P><b>Korbinian Strimmer</b>,
32email: korbinian.strimmer@zoo.ox.ac.uk,
33<A HREF="http://www.zoo.ox.ac.uk/">Department of Zoology</A>,
34<A HREF="http://www.ox.ac.uk/">University of Oxford</A>,
35South Parks Road, Oxford OX1 3PS, UK.
36
37<P><b>Martin Vingron</b>
38email: vingron@dkfz-heidelberg.de,
39<A HREF="http://www.dkfz-heidelberg.de/tbi/">Theoretical Bioinformatics</A>,
40<A HREF="http://www.dkfz-heidelberg.de/">DKFZ</A>,
41Im Neuenheimer Feld 280, D-69124 Heidelberg, Germany.
42
43<P><b>Arndt von Haeseler</b>,
44email: haeseler@eva.mpg.de,
45<A HREF="http://www.eva.mpg.de/">Max-Planck-Institute for Evolutionary Anthropology</A>,
46Inselstr. 22, D-04103 Leipzig, Germany.
47
48<p><font size ="-1" color ="brown">The official name of the program has been
49   changed to TREE-PUZZLE to avoid legal conflict with the Fraunhofer
50   Gesellschaft. We are sorry for any inconvenience this may cause to you.
51   Any reference to PUZZLE in this package is only colloquial and refers
52   to TREE-PUZZLE.
53</font>
54
55<P>TREE-PUZZLE is a computer program to reconstruct phylogenetic trees from
56molecular sequence data by maximum likelihood. It implements a fast tree
57search algorithm, quartet puzzling, that allows analysis of large data
58sets and automatically assigns estimations of support to each internal
59branch. TREE-PUZZLE also computes pairwise maximum likelihood distances as well
60as branch lengths for user specified trees. Branch lengths can also be
61calculated under the clock-assumption. In addition, TREE-PUZZLE offers a novel
62method, likelihood mapping, to investigate the support of a hypothesized
63internal branch without computing an overall tree and to visualize the
64phylogenetic content of a sequence alignment. TREE-PUZZLE also conducts a number
65of statistical tests on the data set (chi-square test for homogeneity of
66base composition, likelihood ratio to test the clock hypothesis, Kishino-Hasegawa
67test). The models of substitution provided by TREE-PUZZLE are TN, HKY, F84,
68SH for nucleotides, Dayhoff, JTT, mtREV24, BLOSUM 62, VT, WAG for amino acids, and
69F81 for two-state data. Rate heterogeneity is modelled by a discrete Gamma
70distribution and by allowing invariable sites. The corresponding parameters
71can be inferred from the data set.
72
73<P>TREE-PUZZLE is available free of charge from
74<DL>
75<DD>
76<A HREF="http://www.tree-puzzle.de/">http://www.tree-puzzle.de/</A> (TREE-PUZZLE home page)
77</DD>
78
79<DD>
80<A HREF="http://www.dkfz-heidelberg.de/tbi/tree-puzzle/">http://www.dkfz-heidelberg.de/tbi/tree-puzzle/</A> (TREE-PUZZLE home page mirror at DKFZ)
81</DD>
82
83<DD>
84<A HREF="http://iubio.bio.indiana.edu/soft/molbio/evolve">http://iubio.bio.indiana.edu/soft/molbio/evolve</A>
85(IUBio archive www, USA)
86</DD>
87
88<DD>
89<A HREF="ftp://iubio.bio.indiana.edu/molbio/evolve">ftp://iubio.bio.indiana.edu/molbio/evolve</A>
90(IUBio archive ftp, USA)
91</DD>
92
93<DD>
94<A HREF="ftp://ftp.ebi.ac.uk/pub/software">ftp://ftp.ebi.ac.uk/pub/software</A>
95(European Bioinformatics Institute, UK)
96</DD>
97
98<DD>
99<A HREF="ftp://ftp.pasteur.fr/pub/GenSoft">ftp://ftp.pasteur.fr/pub/GenSoft</A>
100(Institut Pasteur, France)
101</DD>
102
103</DL>
104TREE-PUZZLE is written in ANSI C. It will run on most personal computers and
105workstations if compiled by an appropriate C compiler.
106The tree reconstruction part of TREE-PUZZLE has been parallelized using
107the Message Passing Interface (MPI)
108library standard (<A HREF="#snir1998">Snir et al.</A>, 1998 and
109<A HREF="#gropp1998">Gropp et al.</A>, 1998). To run
110TREE-PUZZLE in parallel you also need an implementation of the MPI library on your
111system.
112
113<P>Please read the <A HREF="#Installation">installation section</A>
114for more details.
115
116<P>We suggest that this documentation should be read before using TREE-PUZZLE
117the first time. If you do not have the time to read this manual completely
118please do read at least the sections <A HREF="#Input/Output Conventions">Input/Output
119Conventions</A> and <A HREF="#Quick Start">Quick Start </A>below. Then
120you should be able to use the TREE-PUZZLE program, especially if you have some
121experience with the PHYLIP programs. The other sections should then be read
122at a later time.
123
124<P>To find out what's new in version 5.0 please read the
125<A HREF="#Version History">Version History</A>.
126
127<P>
128<HR ALIGN=center WIDTH="100%" SIZE=2>
129<CENTER><H2>Contents</H2></CENTER><P>
130
131<UL>
132<LI><A HREF="#Legal Stuff">Legal Stuff</A></LI>
133
134<LI><A HREF="#Installation">Installation</A>
135    <UL>
136    <LI><A HREF="#Unix">UNIX</A></LI>
137    <LI><A HREF="#MacOS">MacOS</A></LI>
138    <LI><A HREF="#Win32">Windows 95/98/NT</A></LI>
139    <LI><A HREF="#VMS">VMS</A></LI>
140    <LI><A HREF="#MPI">Parallel TREE-PUZZLE</A></LI>
141    </UL>
142</LI>
143
144<LI><A HREF="#Introduction">Introduction</A></LI>
145<LI><A HREF="#Input/Output Conventions">Input/Output Conventions</A>
146    <UL>
147    <LI><A HREF="#Sequence Input">Sequence Input</A></LI>
148    <LI><A HREF="#General Output">General Output</A></LI>
149    <LI><A HREF="#Distance Output">Distance Output</A></LI>
150    <LI><A HREF="#Tree Output">Tree Output</A></LI>
151    <LI><A HREF="#Tree Input">Tree Input</A></LI>
152    <LI><A HREF="#Likelihood Mapping Output">Likelihood Mapping Output</A></LI>
153    </UL>
154</LI>
155
156<LI><A HREF="#Quick Start">Quick Start</A></LI>
157<LI><A HREF="#Models of Sequence Evolution">Models of Sequence Evolution</A>
158    <UL>
159    <LI><A HREF="#Models of Substitution">Models of Substitution</A></LI>
160    <LI><A HREF="#Models of Rate Heterogeneity">Models of Rate Heterogeneity</A></LI>
161    </UL>
162</LI>
163
164<LI><A HREF="#Options Available">Available Options</A></LI>
165<LI><A HREF="#Other Features">Other Features</A></LI>
166<LI><A HREF="#Interpretation and Hints">Interpretation and Hints</A>
167    <UL>
168    <LI><A HREF="#Quartet Puzzling Support Values">Quartet Puzzling Support Values</A></LI>
169    <LI><A HREF="#Percentage of Unresolved Quartets">Percentage of Unresolved Quartets</A></LI>
170    <LI><A HREF="#Automatic Parameter Estimation">Automatic Parameter Estimation</A></LI>
171    <LI><A HREF="#Likelihood Mapping">Likelihood Mapping</A></LI>
172    <LI><A HREF="#Batch Mode">Batch Mode</A></LI>
173    </UL>
174</LI>
175<LI><A HREF="#Limits and Error Messages">Limits and Error Messages</A></LI>
176<LI><A HREF="#Are Quartets Reliable">Are Quartets Reliable?</A></LI>
177<LI><A HREF="#Other Programs">Other Programs</A></LI>
178<LI><A HREF="#Acknowledgements">Acknowledgements</A></LI>
179<LI><A HREF="#References">References</A></LI>
180<LI><A HREF="#Known Bugs">Known Bugs</A></LI>
181<LI><A HREF="#Version History">Version History</A></LI>
182</UL>
183
184<HR>
185<H2>
186<A NAME="Legal Stuff"></A>Legal Stuff</H2>
187TREE-PUZZLE 5.0 is (c) 1999-2000 Heiko A. Schmidt, Korbinian Strimmer, Martin Vingron, and Arndt von Haeseler.<BR>
188Earlier PUZZLE versions were (c) 1995-1999 by Korbinian Strimmer and Arndt von Haeseler.<BR>
189The software and its accompanying documentation are provided as
190is, without guarantee of support or maintenance. The whole package is
191licensed under the GNU public license, except for the parts indicated in
192the sources where the copyright of the authors does not apply.  Please see
193<A
194HREF="http://www.opensource.org/licenses/gpl-license.html">http://www.opensource.org/licenses/gpl-license.html</A> for details.
195
196<H2>
197<A NAME="Installation"></A>Installation</H2>
198The source code of the TREE-PUZZLE software is 100% identical across platforms.
199However, installation procedures differ.
200
201<H3>
202<A NAME="Unix"></A>UNIX</H3>
203Get the file <B>tree-puzzle-5.0.tar</B>. If you received a compressed tar file
204(<B>tree-puzzle-5.0.tar.Z</B> or <B>tree-puzzle-5.0.tar.gz</B>) you have to decompress
205it first (using the "uncompress" or "gunzip" command). Then untar the file
206with
207<PRE>        tar xvf tree-puzzle-5.0.tar</PRE>
208The newly created directory "tree-puzzle-5.0" contains four subdirectories called
209"doc", "data", "bin", and "src". The "doc" directory
210contains this manual in HTML format. The "data"
211directory contains example input files. The "src" directory contains the
212ANSI C sources of TREE-PUZZLE. Switch to the top-level directory by typing
213<PRE>        cd tree-puzzle-5.0</PRE>
214To compile we recommend the GNU gcc (or GNU egcs) compiler. If gcc is installed
215just type
216<PRE>        sh ./configure</PRE>
217<PRE>        make</PRE>
218<PRE>        make install</PRE>
219and the executable <TT>puzzle</TT> is compiled and put into the <TT>/usr/local/bin</TT> directory.
220If you want to have <TT>puzzle</TT> installed into another directory you can do so
221by passing the <TT>--prefix=/name/of/the/wanted/directory</TT> option on the
222<TT>sh ./configure</TT> command line.
223The parallel version should have been built and installed as well, if <TT>configure</TT> 
224found a known MPI compiler (cf. <A HREF="#MPI">Parallel TREE-PUZZLE</A> section).
225
226
227Then type
228<PRE>        make clean</PRE>
229and everything will be nicely cleaned up.
230
231If your compiler is not the GNU gcc compiler and not found by <TT>configure</TT> you will have to
232specify it by setting the <TT>CC</TT> variable (e.g. <TT>setenv CC cc</TT> under <TT>csh</TT> or
233 <TT>CC=cc; export CC</TT> under <TT>sh</TT>) before running <TT>sh ./configure</TT>.
234If you still cannot compile properly then your compiler or its runtime library
235is most probably not ANSI compliant (e.g., old SUN compilers). In most
236cases, however, you will be able to compile after changing some parameters
237in the "makefile". Ask your local Unix expert for help.
238
239<H3>
240<A NAME="MacOS"></A>MacOS</H3>
241Get the file <B>tree-puzzle-5.0.hqx</B>. After decoding this BinHex file (this
242is done automatically on a properly installed system, otherwise use programs
243like "StuffIt Expander" or ask your local Mac expert) you will find a folder
244called "tree-puzzle-5.0" on your hard disk. This folder contains the four subfolders
245"doc", "data", "bin", and "src". The "doc" folder contains
246this manual in HTML format. The "data" folder contains
247example input files. The "bin" folder contains a Macintosh PPC executable
248with a default memory partition of 3000K.
249There is no 68k executable. <u>If you get a memory allocation error while running
250TREE-PUZZLE you have to increase TREE-PUZZLE's memory partition with the "Get Info" command
251of the Macintosh Finder</u>. The "src" folder contains the ANSI C sources of TREE-PUZZLE.
252
253<P>The MacOS executables have been compiled for the PowerMac using Metrowerks CodeWarrior.
254
255<P>Note: It is probably a good idea to install PPC Linux (or MkLinux) on your Macintosh.
256TREE-PUZZLE (as any other program) runs 20-50% faster under Linux compared to the
257same program under MacOS (on the same machine!), and the Mac does not freeze
258during execution because of Linux's multitasking capabilities (maybe this changes in MacOS X). 
259
260
261<H3>
262<A NAME="Win32"></A>Windows 95/98/NT</H3>
263
264Get the file <B>tree-puzzle-5.0.zip</B>. After uncompressing (using, e.g., WinZip
265or a similar tool) a directory "tree-puzzle-5.0" is created containing
266four subdirectories called "doc", "data", "bin", and "src". The "doc" directory
267contains this manual in HTML format. The "data"
268directory contains example input files. The "src" directory contains the
269ANSI C sources of TREE-PUZZLE. The "bin" directory contains the executable
270<TT>puzzle.exe</TT>. To use TREE-PUZZLE the system path to the executable
271needs to be set correctly.  Ask your local Windows expert for help.
272
273<P>The executable has been compiled using
274Microsoft Visual C++ and the "makefile.w32" (contained in "src").
275
276<P>If you have a Linux partition on your PC we recommend
277installing and using TREE-PUZZLE under Linux (see <A HREF="#Unix">Unix</A> section) because
278TREE-PUZZLE runs significantly faster there than under Windows.
279
280<H3>
281<A NAME="VMS"></A>VMS</H3>
282
283
284<P>Get the Unix sources and install the package on your computer
285(ask your local VMS expert for help).  Go to the subdirectory
286"src" and compile TREE-PUZZLE using the command file "makefile.com".
287
288<H3>
289<A NAME="MPI"></A>Parallel TREE-PUZZLE</H3>
290
291
292<P>To compile and run the parallelized TREE-PUZZLE you need an implementation
293of the Message Passing Interface (MPI) library, a widely used
294message passing library standard. Implementations of the MPI libraries
295are available for almost all parallel platforms and computer systems,
296and there are free implementations for most platforms as well.
297
298<P>To find an MPI implementation suitable for your platform visit
299the following web sites:
300<UL>
301   <LI><A HREF="http://www-unix.mcs.anl.gov/mpi/implementations.html">http://www-unix.mcs.anl.gov/mpi/implementations.html</A>
302   <LI><A HREF="http://WWW.ERC.MsState.Edu/labs/hpcl/projects/mpi/implementations.html">http://WWW.ERC.MsState.Edu/labs/hpcl/projects/mpi/implementations.html</A>
303   <LI><A HREF="http://www.mpi.nd.edu/MPI/">http://www.mpi.nd.edu/MPI/</A>
304</UL>
305
306Although MPI is also available on Macintosh and Windows systems,
307the developers never ran the parallel version on those
308platforms.
309
310<P>To install the parallel version of TREE-PUZZLE you need the
311Unix sources for TREE-PUZZLE and install the package on your computer
312as described above.
313The <TT>configure</TT> script should configure the Makefiles appropriately.
314If there is no known MPI compiler found on the system the parallel
315version is not configured.
316(If problems occur ask your local system administrator for help.)
317
318<P>Then you should be able to compile the parallel version of TREE-PUZZLE
319using the following commands:
320<PRE>        sh ./configure</PRE>
321<PRE>        make</PRE>
322<PRE>        make install</PRE>
323and the executable <TT>ppuzzle</TT> is compiled and put into the <TT>/usr/local/bin</TT> directory.
324If you want to have the executable installed into another directory please proceed as
325described in the <A HREF="#Unix">Unix</A> section.
326
327If your compiler is not one of <TT>mpcc</TT> (IBM), <TT>hcc</TT> (LAM),
328<TT>mpicc_lam</TT> (LAM under LINUX), <TT>mpicc_mpich</TT> (MPICH under LINUX),
329and <TT>mpicc</TT> (LAM, MPICH, HP-UX, etc.) and is not found by <TT>configure</TT>, you will have to
330specify it by setting the <TT>MPICC</TT> variable (e.g. <TT>setenv MPICC /another/mpicc</TT> 
331under <TT>csh</TT> or <TT>MPICC=/another/mpicc; export MPICC</TT> under <TT>sh</TT>)
332before running <TT>sh ./configure</TT>.
333
334The way you have to start <TT>ppuzzle</TT> depends on the MPI implementation
335installed. So please refer to your MPI manual or ask your local MPI expert
336for help.
337
338<P><B>Note:</B>
339<BR>The parallelization of the tree reconstruction method follows a
340master-worker-concept, i.e., a master process handles the scheduling of
341the computation to the <em>n</em> worker processes, while the worker processes are
342doing almost all the computation work of evaluating the quartets and
343constructing the puzzling step trees.
344
345<BR>Since the master process does not require a lot of CPU time,
346it can be scheduled sharing one processor with a worker process.
347Thus, you can run <TT>ppuzzle</TT> by assigning <em>n+1</em> processes to <em>n</em> processors.
348
349<BR>If you want to evaluate a usertree or perform likelihood
350mapping analysis it is not recommended to do a parallel run, because all
351the computation will be done by the master process. Hence a run of the
352sequential version of TREE-PUZZLE is more appropriate for usertree or likelihood
353mapping analysis.
354
355<H2>
356<A NAME="Introduction"></A>Introduction</H2>
357TREE-PUZZLE is an ANSI C application to reconstruct phylogenetic trees from
358molecular sequence data by maximum likelihood. It implements a fast tree
359search algorithm, quartet puzzling, that allows analysis of large data
360sets and automatically assigns estimations of support to each internal
361branch. Rate heterogeneity (invariable sites plus Gamma distributed rates)
362is incorporated in all models of substitution available (nucleotides: SH,
363TN, HKY, F84, and submodels; amino acids: Dayhoff, JTT, mtREV24, BLOSUM
36462, VT, and WAG; two-state data: F81). All parameters including rate heterogeneity can
365be estimated from the data by maximum likelihood approaches. TREE-PUZZLE also
366computes pairwise maximum likelihood distances as well as branch lengths
367for user specified trees. In addition, TREE-PUZZLE offers a novel method, likelihood
368mapping, to investigate the support of internal branches without computing
369an overall tree.
370<H2>
371<A NAME="Input/Output Conventions"></A>Input/Output Conventions</H2>
372
373Some naming conventions have changed compared to
374earlier (&lt; 5.0) PUZZLE releases. From version 5.0 onwards the
375names of the sequence input file and the usertree file can be specified 
376at the command line (e.g. '<TT>puzzle infilename intreename</TT>',
377where <TT>infilename</TT> is the name of the sequence file and <TT>intreename</TT>
378is the name of the usertree file).
379If only the input filename or no
380filename is given at the command line the TREE-PUZZLE software searches
381for input files named "<TT>infile</TT>" and/or "<TT>intree</TT>" respectively.
382
383<P>The naming conventions of the output files have changed as well.
384As prefix of the output filenames the name of the sequence input file
385(or the usertree file in the usertree analysis case) is used and an
386extension added to denote the content of the file. If no input filename
387is given at the command line the default filenames of the earlier
388versions are used.
389
390The following extensions/default filenames are possible:
391<DL><DT><DD>
392<TABLE><TR><TD><B>Extension</B></TD><TD><B>default filename</B></TD><TD><B>file content</B></TD></TR>
393       <TR><TD><TT>.puzzle </TT></TD><TD><TT>outfile       </TT></TD><TD>for the TREE-PUZZLE report</TD></TR>
394       <TR><TD><TT>.dist   </TT></TD><TD><TT>outdist       </TT></TD><TD>for the ML distances</TD></TR> 
395       <TR><TD><TT>.tree   </TT></TD><TD><TT>outtree       </TT></TD><TD>for the final tree(s)</TD></TR> 
396       <TR><TD><TT>.qlist  </TT></TD><TD><TT>outqlist      </TT></TD><TD>for the list of unresolved quartets</TD></TR> 
397       <TR><TD><TT>.ptorder</TT></TD><TD><TT>outptorder    </TT></TD><TD>for the list of unique puzzling step tree topologies</TD></TR> 
398       <TR><TD><TT>.pstep  </TT></TD><TD><TT>outpstep      </TT></TD><TD>for the list of puzzling step tree topologies in chronological order</TD></TR> 
399       <TR><TD><TT>.eps    </TT></TD><TD><TT>outlm.eps     </TT></TD><TD>for the EPS file generated in the likelihood mapping analysis</TD></TR> 
400</TABLE></DL>
401
402The file types are described in detail below. In the following
403"INFILENAME" denotes the prefix, which is the sequence input filename
404or the usertree filename respectively.
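<P>The naming rule in the table above can be summarised as a small lookup.
The following Python sketch is only an illustration: the function name
<TT>output_filename</TT> and the dictionary keys are hypothetical and are not
part of TREE-PUZZLE. It assumes that the extension is simply appended to the
full input filename, as the text describes.

```python
# Hypothetical helper mirroring the extension / default-filename table.
# Keys are made-up labels; the values are taken from the table above.
OUTPUT_NAMES = {
    "report": (".puzzle", "outfile"),
    "distances": (".dist", "outdist"),
    "tree": (".tree", "outtree"),
    "unresolved_quartets": (".qlist", "outqlist"),
    "unique_topologies": (".ptorder", "outptorder"),
    "puzzling_steps": (".pstep", "outpstep"),
    "likelihood_mapping": (".eps", "outlm.eps"),
}


def output_filename(kind, infilename=None):
    """Return the expected output file name for one kind of result.

    With an input filename the extension is appended to it; without one
    the default name of the earlier PUZZLE versions is used.
    """
    extension, default_name = OUTPUT_NAMES[kind]
    if infilename is None:
        return default_name
    return infilename + extension
```

For example, under these assumptions <TT>output_filename("tree", "globin.a")</TT>
gives <TT>globin.a.tree</TT>, while <TT>output_filename("tree")</TT> gives <TT>outtree</TT>.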
405
406<H3>
407<A NAME="Sequence Input"></A>Sequence Input</H3>
408TREE-PUZZLE requests sequence input in PHYLIP INTERLEAVED format (sometimes
409also called PHYLIP 3.4 format). Many sequence editors and alignment programs
410(e.g., CLUSTAL W) output data in this format.  The "data" directory
411contains four example input files ("globin.a", "marswolf.n", "atp6.a",
412"primates.b") that can be used as templates for your own data files.
413The default name of the sequence input file is "infile", if no
414input filename is given at the command line.
415If an "infile" or a file with the given name is not present TREE-PUZZLE
416will request an alternative file name. Sequence names in the
417input file are allowed to contain blanks but all blanks will internally
418be converted to underscores "_". Sequences can be in upper or lower case,
419any spaces or control characters are ignored. The dot "." is recognized
420as a character matching the first sequence; it can be used in all sequences except the
421first sequence. Valid symbols for nucleotides are A, C, G, T and
422U, and for amino acids A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S,
423T, V, W, and Y. All other visible characters (including gaps, question
424marks etc.) are treated as N (DNA/RNA) or X (amino acids). For two-state
425data the symbols 0 and 1 are allowed. The first sequence in the data set is
426considered the default outgroup.
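<P>The character rules above can be sketched in a few lines of Python. This is
not the TREE-PUZZLE source code; the function <TT>normalize_sequence</TT> and its
parameters are hypothetical, two-state data are not covered, and the sketch
assumes that the first sequence has already been normalized and that all
sequences are aligned to the same length.

```python
NUCLEOTIDES = set("ACGTU")
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def normalize_sequence(raw, first=None, amino_acids=False):
    """Apply the input rules described above to one sequence.

    raw         -- sequence text as read from the file
    first       -- the normalized first sequence, or None when `raw`
                   is itself the first sequence
    amino_acids -- True for protein data, False for DNA/RNA
    """
    valid = AMINO_ACIDS if amino_acids else NUCLEOTIDES
    unknown = "X" if amino_acids else "N"
    cleaned = []
    for ch in raw:
        # Spaces and control characters are ignored.
        if ch.isspace() or not ch.isprintable():
            continue
        ch = ch.upper()
        if ch == "." and first is not None:
            # A dot copies the character of the first sequence at this site.
            ch = first[len(cleaned)]
        elif ch not in valid:
            # Gaps, question marks and other symbols become N or X.
            ch = unknown
        cleaned.append(ch)
    return "".join(cleaned)
```

A short usage example: <TT>normalize_sequence("ac-g t?")</TT> yields <TT>ACNGTN</TT>,
and a second sequence written as <TT>..a.c.</TT> is expanded against it to <TT>ACAGCN</TT>.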
427<H3>
428<A NAME="General Output"></A>General Output</H3>
429All results are written to the TREE-PUZZLE report file (INFILENAME.puzzle or
430outfile). If the option "List all unresolved quartets" is invoked a file
431called "INFILENAME.qlist"/"outqlist" is created showing all these quartets.
432If the option "List puzzling step trees" is set accordingly the files
433"INFILENAME.pstep"/"outpstep" and/or "INFILENAME.ptorder"/"outptorder" are
434generated.
435
436<P>The "INFILENAME.ptorder"/"outptorder" file contains the unique tree
437topologies in PHYLIP format, each preceded by a PHYLIP-format comment (in square brackets).
438A typical line in the ptorder file looks like this:
439
440<P><TT>[ 2. 60 6.00 2 5 1000 ](chicken,((cat,(horse,(mouse,rat))),(opossum,platypus)));</TT></P>
441
442The entries (separated by single blanks) in the brackets mean the following:
443<UL>
444  <LI><B>2.</B>   - Topology occurs second-most among all
445                    intermediate tree topologies (= order number).
446  <LI><B>60</B>   - Topology occurs 60 times.
447  <LI><B>6.00</B> - Topology occurs in 6.00 % of the intermediate tree topologies.
448  <LI><B>2</B>    - unique topology ID (needed for the pstep file)
449  <LI><B>5</B>    - Total number of unique topologies that occurred.
450  <LI><B>1000</B> - Sum of intermediate trees estimated during the analysis.
451</UL>
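<P>A line of the ptorder file can be split into these fields with a short
script. The Python sketch below is an illustration only: the names
<TT>PtorderEntry</TT> and <TT>parse_ptorder_line</TT> are hypothetical, and the
regular expression assumes exactly the layout of the example line shown above.

```python
import re
from typing import NamedTuple


class PtorderEntry(NamedTuple):
    order: int              # rank of the topology by frequency
    count: int              # how often the topology occurred
    percent: float          # share of all intermediate trees
    topology_id: int        # unique ID, also used in the pstep file
    unique_topologies: int  # number of distinct topologies
    total_trees: int        # number of intermediate trees
    tree: str               # the topology itself


# Comment in square brackets, followed by the tree string.
_PTORDER_LINE = re.compile(
    r"^\[\s*(\d+)\.\s+(\d+)\s+([\d.]+)\s+(\d+)\s+(\d+)\s+(\d+)\s*\](.*)$"
)


def parse_ptorder_line(line):
    """Parse one ptorder line into a PtorderEntry."""
    match = _PTORDER_LINE.match(line.strip())
    if match is None:
        raise ValueError("not a ptorder line: %r" % line)
    order, count, percent, topo_id, unique, total, tree = match.groups()
    return PtorderEntry(int(order), int(count), float(percent),
                        int(topo_id), int(unique), int(total), tree.strip())
```

Applied to the example line, the sketch returns order 2, count 60 and 1000
intermediate trees, which is consistent with the stated 6.00 %.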
452 
453<P>The "INFILENAME.pstep"/"outpstep" file contains a log of the
454puzzling steps performed and the occurring tree topologies.
455
456A typical line in the pstep file contains the following entries
457(separated by tabstops):
458
459<P><TT>"6.      55      698     3       5       828"</TT></P>
460
461The entries in the rows mean the following:
462<UL>
463  <LI><B>6.</B>  - 6th block of intermediate trees performed.
464  <LI><B>55</B>  - number of intermediate trees inferred in this block.
465  <LI><B>698</B> - occurrences of this topology so far.
466  <LI><B>3</B>   - unique topology ID (for lookup in the ptorder file).
467  <LI><B>5</B>   - number of unique topologies that have occurred so far.
468  <LI><B>828</B> - number of puzzling steps performed so far.
469</UL>
470In the case of a sequential run (<TT>puzzle</TT>) the entries of this
471file are more resolved, because every block consists of one intermediate tree.
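<P>The pstep entries can be read in the same way. The following Python sketch
is hypothetical (the names <TT>PstepEntry</TT> and <TT>parse_pstep_line</TT> are not
part of TREE-PUZZLE) and splits on any whitespace, so it accepts tab stops as
well as blanks.

```python
from typing import NamedTuple


class PstepEntry(NamedTuple):
    block: int              # number of the block of intermediate trees
    trees_in_block: int     # intermediate trees inferred in this block
    topology_count: int     # occurrences of this topology so far
    topology_id: int        # unique ID, for lookup in the ptorder file
    unique_topologies: int  # distinct topologies seen so far
    steps_done: int         # puzzling steps performed so far


def parse_pstep_line(line):
    """Parse one line of the pstep log into a PstepEntry."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError("expected 6 fields, got %d" % len(fields))
    block = int(fields[0].rstrip("."))  # the block number ends with a dot
    rest = [int(field) for field in fields[1:]]
    return PstepEntry(block, *rest)
```

For the example line this gives block 6 with 55 trees, and topology 3 seen
698 times after 828 puzzling steps.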
472
473<H3>
474<A NAME="Distance Output"></A>Distance Output</H3>
475TREE-PUZZLE automatically computes pairwise maximum likelihood distances for
476all the sequences in the data file. They are written in the TREE-PUZZLE report
477file "INFILENAME.puzzle"/"outfile" and in the separate file
478"INFILENAME.dist"/"outdist". The format of the distance file is PHYLIP compatible
479(i.e. it can directly be used as input for PHYLIP distance-based programs
480such as "neighbor").
481<H3>
482<A NAME="Tree Output"></A>Tree Output</H3>
483The quartet puzzling tree with its support values
484and with maximum likelihood branch lengths is displayed as ASCII drawing
485in the TREE-PUZZLE report in "INFILENAME.puzzle"/"outfile". The same tree
486is written into the "INFILENAME.tree"/"outtree" file in CLUSTAL W format.
487If clock-like maximum-likelihood branch lengths are computed
488there will be both an unrooted and a rooted tree in the
489"INFILENAME.puzzle"/"outfile". The tree convention follows the NEWICK format
490(as implemented in PHYLIP or CLUSTAL W): the tree topology is described
491by the usual round brackets
492<TT>(a,b,(c,d));</TT>
493where branch lengths are written after the colon a:0.22,b:0.33.
494Support values for each branch
495are displayed as internal node labels, i.e., they follow directly after each
496node before the branch length to each node. Here is an example:
497
498<P>(Gibbon:0.1393, ((Human:0.0414, Chimpanzee:0.0538)99:0.0175, Gorilla:0.0577)98:0.0531,
499Orangutan:0.1003);
500
501<P>The likelihood value of each tree is added in square brackets before
502the tree string (e.g. "[ lh=-1621.201605 ]"). Square brackets mark comments
503in the Newick or PHYLIP tree format. In some cases the
504comment has to be removed before using the trees with other programs.
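<P>Removing the comment is a simple text operation. The Python sketch below
shows one possible way; the function name <TT>strip_newick_comments</TT> is
hypothetical, and the sketch assumes that comments are not nested.

```python
import re

# A comment is everything from "[" up to the next "]".
_COMMENT = re.compile(r"\[[^\]]*\]")


def strip_newick_comments(tree_text):
    """Return the tree string without bracketed comments."""
    return _COMMENT.sub("", tree_text).strip()
```

Note that this removes every bracketed comment, not only the likelihood value,
so it should not be used if other comments in the file have to be kept.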
505
506<P>With the programs
507<a href="http://taxonomy.zoology.gla.ac.uk/rod/treeview.html">TreeView</a> and
508<a href="ftp://rdp.life.uiuc.edu/pub/RDP/programs/TreeTool/">TreeTool</a> 
509it is possible to view a tree both
510with its branch lengths and simultaneously with the support values for the internal
511branches (here 98% and 99%). Note, the PHYLIP programs DRAWTREE and DRAWGRAM may
512also be used with the CLUSTAL W treefile format. However, in the current version
513(3.5) they ignore the internal labels and simply print the tree
514topology along with branch lengths.
515
516<H3>
517<A NAME="Tree Input"></A>Tree Input</H3>
518TREE-PUZZLE optionally also reads input trees. The default name for the file
519containing the input tree is "intree", if not given at the command line,
520but if you choose the input tree option and there is no file with the
521given name or "intree" present you will be prompted for an alternative
522name. The format of the input trees is identical to the trees in the
523"INFILENAME.tree"/"outtree" file.
524However, it is sufficient to provide the tree topology only, you
525don't need to specify branch lengths (that are ignored anyway) or
526internal labels (that are read, stored, and written back to the
527"INFILENAME.tree"/"outtree" file).
528The input trees need not be unrooted; they can also be rooted. It is
529important that sequence names in the input tree file do not contain blanks
530(use underscores!). The trees can be multifurcating.
531The format of the tree input file is easy: just put the
532trees into the file. TREE-PUZZLE counts the ';' at the end of each tree description
533to determine how many input trees there are. Any header (e.g., with the
534number of trees) is ignored (this is useful in conjunction with programs
535like MOLPHY that need this header). If there is more than one tree TREE-PUZZLE
536performs the Kishino-Hasegawa test.
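<P>The way trees are delimited by ';' can be illustrated with a short Python
sketch. It is not the parser used by TREE-PUZZLE: the function name
<TT>split_input_trees</TT> is hypothetical, and the sketch assumes that a header
never contains an opening round bracket and that text without a tree is skipped.

```python
def split_input_trees(text):
    """Split the content of a tree input file into single tree strings.

    Every ';' ends one tree. Anything before the first '(' of a tree,
    such as a header with the number of trees, is dropped.
    """
    trees = []
    # The part after the last ';' contains no complete tree.
    for chunk in text.split(";")[:-1]:
        start = chunk.find("(")
        if start == -1:
            continue  # no tree in this chunk
        # Names must not contain blanks, so all whitespace can be removed.
        trees.append("".join(chunk[start:].split()) + ";")
    return trees
```

With more than one tree in the returned list, a run on such a file would
include the Kishino-Hasegawa test mentioned above.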
537<H3>
538<A NAME="Likelihood Mapping Output"></A>Likelihood Mapping Output</H3>
539TREE-PUZZLE also offers likelihood mapping analysis, a method to investigate
540support for internal branches of a tree without computing an overall tree
541 and to graphically visualize
542phylogenetic content of a sequence alignment. The results of likelihood
543mapping are written in ASCII to the "INFILENAME.puzzle"/"outfile" as well
544as to a file called "INFILENAME.eps" or "outlm.eps" respectively.
545This file contains in encapsulated Postscript format (EPSF)
546a picture of the triangle that forms the basis of the likelihood mapping
547analysis. You may print it out on a Postscript capable printer or view
548it with a suitable program. The "INFILENAME.eps"/"outlm.eps" file can be
549edited by hand (it is plain ASCII text!) or by drawing programs that
550understand the PostScript language (e.g., Adobe Illustrator).
551<H2>
552<A NAME="Quick Start"></A>Quick Start</H2>
553Prepare your sequence input file and, optionally, your tree input
554file. Then start the TREE-PUZZLE program. TREE-PUZZLE will choose
555automatically the nucleotide or the amino acid mode. If more than 85% of
556the characters (not counting the - and ?) in the sequences are A, C, G,
557T, U or N, it will be assumed that the sequences consist of nucleotides.
558If your data set contains amino acids TREE-PUZZLE suggests whether you have
559amino acids encoded on mtDNA or on nuclear DNA, and selects the appropriate
560model of amino acid evolution. If your data set contains nucleotides the
561default model of sequence evolution chosen is the HKY model. Parameters
562need not be specified; they will be estimated by a maximum likelihood
563procedure from the data. If TREE-PUZZLE detects a usertree file stated at the
564command line or one called "intree" it automatically switches to the input
565tree mode.
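<P>The 85% rule for choosing the nucleotide mode can be written down as a
short check. The Python sketch below only illustrates the rule as stated
above; the function name <TT>looks_like_nucleotides</TT> is hypothetical,
whitespace is skipped as an additional assumption, and two-state data are not
considered.

```python
def looks_like_nucleotides(sequences, threshold=0.85):
    """Return True if more than 85% of the counted characters are
    A, C, G, T, U or N. The characters '-' and '?' are not counted."""
    nucleotide_like = set("ACGTUN")
    counted = 0
    matching = 0
    for seq in sequences:
        for ch in seq.upper():
            if ch in "-?" or ch.isspace():
                continue
            counted += 1
            if ch in nucleotide_like:
                matching += 1
    return counted > 0 and matching / counted > threshold
```

A DNA alignment with gaps passes this check, whereas a typical protein
sequence falls far below the threshold because only A, C, G, T and N overlap
with the amino acid alphabet.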
566
567<P>Then, a menu (PHYLIP "look and feel") appears with default options set.
568It is possible to change all available options. For example, if you want
569to incorporate rate heterogeneity you have to select option "w" as rate
570heterogeneity is switched off by default. Then type "y" at the input prompt
571and start the analysis. You will see a number of status messages on the
572screen during computation. When the analysis is finished all output files
573(e.g., "outfile", "outtree", "outdist", "outqlist", "outlm.eps", "outpstep",
574"outptorder" or "INFILENAME.puzzle", "INFILENAME.tree", "INFILENAME.dist",
575"INFILENAME.qlist", "INFILENAME.eps", "INFILENAME.pstep", "INFILENAME.ptorder")
576will be in the same directory as the input files.
577
578<P>To obtain a high quality picture of the output tree (including node labels)
579you might want to use the TreeView program by Roderic Page. It is
580available free of charge and runs on MacOS and MS-Windows. It can be retrieved
581from <A HREF="http://taxonomy.zoology.gla.ac.uk/rod/treeview.html">http://taxonomy.zoology.gla.ac.uk/rod/treeview.html</A>.
582TreeView understands the CLUSTAL W treefile conventions, reads multifurcating
583trees and is able to simultaneously display branch lengths and support values
584for each branch. Open the "INFILENAME.tree"/"outtree" file with TreeView,
585choose "Phylogram" to draw branch lengths, and select "Show internal edge
586labels".
587
588<P>On a Unix system you can use the TreeTool program to display and
589manipulate TREE-PUZZLE trees (See <A HREF="ftp://rdp.life.uiuc.edu/pub/RDP/programs/TreeTool/">ftp://rdp.life.uiuc.edu/pub/RDP/programs/TreeTool</A>
590for precompiled Sun executables.  A version that runs on Linux has been prepared by
591<A HREF="mailto:cato@biochem.kth.se">Anders Holmberg</A> from the Dept. of Biochemistry at
592the Royal Institute of Technology, Stockholm).
593
594<H2>
595<A NAME="Models of Sequence Evolution"></A>Models of Sequence Evolution</H2>
596Here we give a brief overview of the models implemented in TREE-PUZZLE. Formulas
597are written in TeX style.
598<H3>
599<A NAME="Models of Substitution"></A>Models of Substitution</H3>
The substitution process is modelled as a reversible, time-homogeneous, stationary
Markov process. If the corresponding stationary nucleotide (amino acid)
frequencies are denoted pi_i, the most general rate matrix for the transition
from nucleotide (amino acid) i to j can be written as
604<PRE>
605                |   Q_{ij} pi_j               for i != j
606       R_{ij} = |
607                | - Sum_m Q_{im} pi_m         for i == j
608</PRE>
609The matrix Q_{ij} is symmetric with Q_{ii} == 0 (diagonals are zero). For
610nucleotides the most general model built into TREE-PUZZLE is the Tamura-Nei
611model (TN, <A HREF="#tamura1993">Tamura and Nei</A>, 1993).
612The matrix Q_{ij} for this model equals
613<PRE>
614                | 4*t*gamma/(gamma+1)         for i -> j pyrimidine transition
615                |
616       Q_{ij} = | 4*t/(gamma+1)               for i -> j purine transition
617                |
618                | 1                           for i -> j transversion
619</PRE>
620The parameter gamma is called the "Y/R transition parameter" whereas t
621is the "Transition/transversion parameter". If gamma is equal to 1 we
622get the HKY model (<A HREF="#hasegawa1985">Hasegawa et al.</A>, 1985).
Note that the ratio of the transition and transversion
rates (without frequencies) is kappa = 2*t. There is a subtle but important
difference between the <I>transition-transversion parameter</I>, the
<I>expected transition-transversion ratio</I>, and the <I>observed
transition-transversion ratio</I>.
The <I>transition-transversion parameter</I> is simply a parameter in the
rate matrix. The <I>expected transition-transversion ratio</I> is the ratio of
actually occurring transitions to actually occurring transversions, taking
into account nucleotide frequencies in the alignment. Due to saturation
and multiple hits not all substitutions are observable. Thus, the <I>observed
transition-transversion ratio</I> counts observable transitions and transversions
only. If the base frequencies in the HKY model are all equal (pi_i =
0.25), HKY further reduces to the Kimura model. In this case t is identical
to the expected transition/transversion ratio. If, in addition, t is set to 0.5 the Jukes-Cantor
637model is obtained. The F84 model (as implemented in the various PHYLIP
638programs, <A HREF="#felsenstein1984">Felsenstein</A>, 1984)
639is a special case of the Tamura-Nei model.
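
<P>The formulas above can be illustrated with a minimal Python sketch. It is
not taken from the TREE-PUZZLE sources: the function names, the base order
A, C, G, T, and the definition of the expected ratio as a frequency-weighted
ratio of transition to transversion rates are assumptions made for illustration only.

```python
PURINES = {"A", "G"}          # C and T are the pyrimidines
BASES = ["A", "C", "G", "T"]  # assumed ordering, for illustration only


def tn_q(t, gamma):
    """Symmetric matrix Q_{ij} of the Tamura-Nei model (zero diagonal)."""
    q = {}
    for i in BASES:
        for j in BASES:
            if i == j:
                q[i, j] = 0.0
            elif i in PURINES and j in PURINES:
                q[i, j] = 4.0 * t / (gamma + 1.0)           # purine transition
            elif i not in PURINES and j not in PURINES:
                q[i, j] = 4.0 * t * gamma / (gamma + 1.0)   # pyrimidine transition
            else:
                q[i, j] = 1.0                               # transversion
    return q


def rate_matrix(q, pi):
    """R_{ij} = Q_{ij} pi_j for i != j, R_{ii} = -Sum_m Q_{im} pi_m."""
    r = {}
    for i in BASES:
        for j in BASES:
            if i != j:
                r[i, j] = q[i, j] * pi[j]
        r[i, i] = -sum(q[i, m] * pi[m] for m in BASES)
    return r


def expected_ts_tv(r, pi):
    """Frequency-weighted transition rate divided by transversion rate."""
    ts = tv = 0.0
    for i in BASES:
        for j in BASES:
            if i == j:
                continue
            flux = pi[i] * r[i, j]
            if (i in PURINES) == (j in PURINES):
                ts += flux
            else:
                tv += flux
    return ts / tv


# gamma = 1 and equal base frequencies: the Kimura model with t = 2
uniform = {b: 0.25 for b in BASES}
kimura = rate_matrix(tn_q(2.0, 1.0), uniform)
ratio = expected_ts_tv(kimura, uniform)
```

In this sketch, gamma = 1 with equal base frequencies gives an expected ratio
equal to t, and t = 0.5 makes all off-diagonal Q_{ij} equal to 1, in line with
the special cases described above.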
640
641<P>For amino acids the matrix Q_{ij} is fixed and does not contain any free
parameters. Depending on the type of input data six different Q_{ij} matrices
643are available in TREE-PUZZLE.
644The Dayhoff (<A HREF="#dayhoff1978">Dayhoff et al.</A>, 1978) and
645JTT (<A HREF="#jones1992">Jones et al.</A>, 1992) matrices are for use with
646proteins encoded on nuclear DNA, the mtREV24 (<A HREF="#adachi1996">Adachi
647and Hasegawa</A>, 1996) matrix is for use with proteins encoded on mtDNA,
and the BLOSUM 62 (<A HREF="#henikoff1992">Henikoff and Henikoff</A>,
1992) matrix and the WAG model (<A HREF="#whelan2000">Whelan and Goldman</A>)
are for more distantly related amino acid sequences.
The WAG matrix has been inferred from a database of 3905 globular protein
sequences, forming 182 distinct gene families spanning a broad range of
evolutionary distances (<A HREF="#whelan2000">Whelan and Goldman</A>).
654
<P>The VT model is based on a new estimator for amino acid replacement rates,
the resolvent method. The VT matrix has been computed from a large set
of alignments of varying degrees of divergence. Hence VT is for use with
proteins of distant relatedness as well (<A HREF="#mueller2000">Mueller and Vingron</A>, 2000).
659
660<P>For doublets (pairs of dependent nucleotides) the SH model
661(<A HREF="#schoeniger1994">Schoeniger and von Haeseler</A>, 1994) is
662implemented in TREE-PUZZLE. The corresponding matrix Q_{ij} reads
663<PRE>
664                | 2*t         for i -> j transition substitution
665                |
666       Q_{ij} = | 1           for i -> j transversion substitution
667                |
668                | 0           for i -> j two substitutions
669</PRE>
The SH model is essentially an F81 model
671(<A HREF="#felsenstein1981">Felsenstein</A>, 1981) for single substitutions
672in doublets.
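
<P>As a small illustration of the doublet matrix above, the following Python
sketch returns Q_{ij} for two doublets written as two-letter strings. The
helper names are hypothetical and the code is not taken from TREE-PUZZLE;
it only restates the three cases of the definition.

```python
PURINES = {"A", "G"}  # C and T are the pyrimidines


def is_transition(x, y):
    """True if x -> y stays within the purines or within the pyrimidines."""
    return x != y and (x in PURINES) == (y in PURINES)


def sh_q(d1, d2, t):
    """Q_{ij} between two doublets such as "AC" and "GC"."""
    diffs = [(x, y) for x, y in zip(d1, d2) if x != y]
    if len(diffs) != 1:
        # identical doublets (zero diagonal) or two substitutions
        return 0.0
    x, y = diffs[0]
    return 2.0 * t if is_transition(x, y) else 1.0


single_transition = sh_q("AC", "GC", 3.0)    # A -> G in the first position
single_transversion = sh_q("AC", "AG", 3.0)  # C -> G in the second position
double_change = sh_q("AC", "GT", 3.0)        # both positions differ
```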
673<H3>
674<A NAME="Models of Rate Heterogeneity"></A>Models of Rate Heterogeneity</H3>
675Rate heterogeneity is taken into account by considering invariable sites
676and by introducing Gamma-distributed rates for the variable sites.
677
678<P>For invariable sites the parameter theta ("Fraction of invariable sites")
679determines the probability of a given site to be invariable. If a site
680is invariable the probability for the constant site patterns is pi_i, the
681frequency of each nucleotide (amino acid).
682
683<P>The rates r for variable sites are determined by a discrete Gamma
684distribution that approximates the continuous Gamma distribution
685<PRE>
686                    alpha     alpha-1
687               alpha         r
688       g(r) = ------------------------
689                alpha r
690               e        Gamma(alpha)
691</PRE>
692where the parameter alpha ranges from alpha = infinity (no rate heterogeneity)
to alpha &lt; 1 (strong heterogeneity). The mean of r under this
distribution is 1.
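
<P>The properties of g(r) can be checked numerically. The Python sketch below
is an illustration only (it does not reproduce how TREE-PUZZLE builds its
discrete rate categories): it evaluates the density given above and
approximates its total mass, mean and variance with a simple midpoint rule.
The integration limit and step count are arbitrary choices.

```python
import math


def gamma_density(r, alpha):
    """g(r) = alpha^alpha r^(alpha-1) / (e^(alpha r) Gamma(alpha)), r > 0."""
    log_g = (alpha * math.log(alpha) + (alpha - 1.0) * math.log(r)
             - alpha * r - math.lgamma(alpha))
    return math.exp(log_g)


def moment(k, alpha, upper=60.0, steps=100_000):
    """Midpoint-rule approximation of the integral of r^k g(r) on (0, upper)."""
    h = upper / steps
    total = 0.0
    for n in range(steps):
        r = (n + 0.5) * h
        total += (r ** k) * gamma_density(r, alpha)
    return total * h


alpha = 2.0
mass = moment(0, alpha)                  # close to 1
mean = moment(1, alpha)                  # close to 1
variance = moment(2, alpha) - mean ** 2  # close to 1/alpha
```

The variance of this distribution is 1/alpha, which is why small values of
alpha correspond to strong rate heterogeneity.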
695
696<P>A mixed model of rate heterogeneity (Gamma plus invariable sites)
697is also available.  In this case the total rate heterogeneity rho
(as defined by <A HREF="#gu1995">Gu et al.</A>, 1995) is computed as
rho = (1 + theta*alpha)/(1 + alpha).
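
<P>A one-line Python helper (the function name is hypothetical) makes the
behaviour of rho easy to explore. For theta = 0 it reduces to 1/(1 + alpha),
which approaches 0 as alpha grows (no rate heterogeneity).

```python
def total_rate_heterogeneity(theta, alpha):
    """rho = (1 + theta*alpha) / (1 + alpha) for the Gamma + invariable model."""
    return (1.0 + theta * alpha) / (1.0 + alpha)


gamma_only = total_rate_heterogeneity(0.0, 1.0)  # theta = 0: 1/(1 + alpha)
mixed = total_rate_heterogeneity(0.2, 0.5)       # both components present
```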
700
701<H2>
702<A NAME="Options Available"></A>Available Options</H2>
703All options can be selected and changed after TREE-PUZZLE has read the input
704file. Depending on the input files options are preselected and displayed
705in a menu ("PHYLIP look and feel"):
706<PRE>
707GENERAL OPTIONS
708 b                     Type of analysis?  Tree reconstruction
709 k                Tree search procedure?  Quartet puzzling
710 v       Approximate quartet likelihood?  No
711 u             List unresolved quartets?  No
712 n             Number of puzzling steps?  1000
713 j             List puzzling step trees?  No
714 o                  Display as outgroup?  Gibbon
715 z     Compute clocklike branch lengths?  No
716 e                  Parameter estimates?  Approximate (faster)
717 x            Parameter estimation uses?  Neighbor-joining tree
718SUBSTITUTION PROCESS
719 d          Type of sequence input data?  Nucleotides
720 m                Model of substitution?  HKY (Hasegawa et al. 1985)
721 t    Transition/transversion parameter?  Estimate from data set
722 f               Nucleotide frequencies?  Estimate from data set
723RATE HETEROGENEITY
724 w          Model of rate heterogeneity?  Uniform rate
725
726Quit [q], confirm [y], or change [menu] settings:
727</PRE>
728By typing the letters shown in the menu you can either change settings
729or enter new parameters. Some options (for example "m" and "w") can be
730invoked several times to switch through a number of different settings.
731The parameters of the models of sequence evolution can be estimated from
732the data by a variety of procedures based on maximum likelihood. The analysis
733is started by typing "y" at the input prompt. To quit the program
734type "q".
735
<P>The following table lists all TREE-PUZZLE options in alphabetical order.
Note, however, that not all of them are accessible at the same time:
738<TABLE CELLPADDING=2 >
739<TR VALIGN=TOP>
740<TD>
741<CENTER><B>Option</B></CENTER>
742</TD>
743<TD>
744<CENTER><B>Description</B></CENTER>
745</TD>
746</TR>
747
748<TR VALIGN=TOP>
749<TD>
750<CENTER>a</CENTER>
751</TD>
752<TD>Gamma rate heterogeneity parameter alpha. This is the so-called shape
753parameter of the Gamma distribution.</TD>
754</TR>
755
756<TR VALIGN=TOP>
757<TD>
758<CENTER>b</CENTER>
759</TD>
<TD>Type of analysis. Switches between tree reconstruction by maximum
likelihood and likelihood mapping.</TD>
762</TR>
763
764<TR VALIGN=TOP>
765<TD>
766<CENTER>c</CENTER>
767</TD>
768<TD>Number of rate categories (4-16) for the discrete Gamma distribution
769(rate heterogeneity).</TD>
770</TR>
771
772<TR VALIGN=TOP>
773<TD>
774<CENTER>d</CENTER>
775</TD>
776<TD>Data type. Specifies whether nucleotide, amino acid sequences, or
777two-state data serve as input. The default is automatically set by
778inspection of the input data.
779After TREE-PUZZLE has selected an appropriate data type (marked by 'Auto:')
780the 'd'-option changes the type in the following order:
781selected type -> Nucleotides -> Amino acids -> automatically selected type.</TD>
782</TR>
783
784<TR VALIGN=TOP>
785<TD>
786<CENTER>e</CENTER>
787</TD>
788<TD>Approximation option. Determines whether an approximate or the exact
789likelihood function is used to estimate parameters of the models of sequence
790evolution. The approximate likelihood function is in most cases sufficient
791and is faster.</TD>
792</TR>
793
794<TR VALIGN=TOP>
795<TD>
796<CENTER>f</CENTER>
797</TD>
798<TD>Base frequencies. The maximum likelihood calculation needs the frequency
799of each nucleotide (amino acid, doublet) as input. TREE-PUZZLE estimates these
800values from the sequence input data. This option allows specification of
801other values.</TD>
802</TR>
803
804<TR VALIGN=TOP>
805<TD>
806<CENTER>g</CENTER>
807</TD>
<TD>Group sequences in clusters. Defines the clusters of sequences
needed for the likelihood mapping analysis. Only available when likelihood
mapping is selected ("b" option).</TD>
811</TR>
812
813<TR VALIGN=TOP>
814<TD>
815<CENTER>h</CENTER>
816</TD>
817<TD>Codon positions or definition of doublets. For nucleotide data only.
818If the TN or HKY model of substitution is used and the number of sites
819in the alignment is a multiple of three the analysis can be restricted
820to each of the three codon positions and to the 1st and 2nd positions.
If the SH model is used this option specifies that the 1st and
8222nd codon positions in the alignment define a doublet.</TD>
823</TR>
824
825<TR VALIGN=TOP>
826<TD>
827<CENTER>i</CENTER>
828</TD>
829<TD>Fraction of invariable sites. Probability of a site to be invariable.
830This parameter can be estimated from the data by TREE-PUZZLE
831(only if the approximation option for the likelihood function is
832turned off).</TD>
833</TR>
834
835<TR VALIGN=TOP>
836<TD>
837<CENTER>j</CENTER>
838</TD>
<TD>List puzzling step trees. Writes all intermediate trees (puzzling
step trees) used to compute the quartet puzzling tree into a file, either
as a list of topologies ordered by number of occurrences (*.ptorder), or
as a list of the topologies in chronological order of occurrence (*.pstep), or
both.</TD>
844</TR>
845
846<TR VALIGN=TOP>
847<TD>
848<CENTER>k</CENTER>
849</TD>
850<TD>Tree search. Determines how the overall tree is obtained. The topology
851is either computed with the quartet puzzling algorithm or is defined by
852the user. Maximum likelihood branch lengths will be computed for this tree.
Alternatively, only a maximum likelihood distance matrix can be computed
854(no overall tree). </TD>
855</TR>
856
857<TR VALIGN=TOP>
858<TD>
859<CENTER>l</CENTER>
860</TD>
<TD>Location of root. Only for computation of clock-like maximum likelihood
branch lengths. Specifies the branch where the root should be placed
in an unrooted tree topology. For example, in the tree (a,b,(c,d)) l = 1
places the root at the branch leading to sequence a whereas l = 5 places
the root at the internal branch.</TD>
866</TR>
867
868<TR VALIGN=TOP>
869<TD>
870<CENTER>m</CENTER>
871</TD>
872<TD>Model of substitution. The following models are implemented for nucleotides:
873the <A HREF="#tamura1993">Tamura-Nei</A> (TN) model,
874the <A HREF="#hasegawa1985">Hasegawa et al.</A> (HKY) model, and
875the <A HREF="#schoeniger1994">Schoeniger &amp; von Haeseler</A> (SH) model.
876The SH model describes the evolution of
877pairs of dependent nucleotides (pairs are the first and the second nucleotide,
878the third and the fourth nucleotide and so on). It allows for specification
879of the transition-transversion ratio. The original model
880(<A HREF="#schoeniger1994">Schoeniger &amp; von Haeseler</A>, 1994)
881is obtained by setting the transition-transversion parameter to 0.5.
882The <A HREF="#jukes1969">Jukes-Cantor</A> (1969),
883the <A HREF="#felsenstein1981">Felsenstein</A> (1981), and
884the <A HREF="#kimura1980">Kimura</A> (1980) model are all special cases of
885the HKY model.
886<BR>For amino acid sequence data
887the <A HREF="#dayhoff1978">Dayhoff et al.</A> (Dayhoff) model,
888the <A HREF="#jones1992">Jones et al.</A> (JTT) model,
889the <A HREF="#adachi1996">Adachi and Hasegawa</A> (mtREV24) model,
890the <A HREF="#henikoff1992">Henikoff and Henikoff</A> (BLOSUM 62),
891the <A HREF="#mueller2000">Mueller and Vingron</A> (VT), and
892the <A HREF="#whelan2000">Whelan and Goldman</A> (WAG) substitution
893model are implemented in TREE-PUZZLE.
894The mtREV24 model describes the evolution of amino acids encoded on mtDNA,
and BLOSUM 62 is for distantly related amino acid sequences, as is the
VT model.
897After TREE-PUZZLE has selected an appropriate amino acid substitution model
898(marked by 'Auto:') the 'm'-option changes the model in the following order:
899selected model -> Dayhoff -> JTT -> mtREV24 -> BLOSUM62 -> VT -> WAG ->
900automatically selected model
901<BR>For more information
902please read the section in this manual about models of sequence evolution.
903See also option "w" (model of rate heterogeneity).</TD>
904</TR>
905
906<TR VALIGN=TOP>
907<TD>
908<CENTER>n</CENTER>
909</TD>
910<TD>If tree reconstruction is selected: number of puzzling steps. Parameter
911of the quartet puzzling tree search. Generally,
912the more sequences are used the more puzzling steps are advised. The default
913value varies depending on the number of sequences (at least 1000).<br>
914
915If likelihood mapping is selected: number of quartets in a likelihood mapping analysis. Equal to the number
916of dots in the likelihood mapping diagram. By default 10000 dots/quartets
917are assumed. To use all possible quartets in clustered likelihood mapping
918you have to specify a value of n=0.
919</TD>
920</TR>
921
922<TR VALIGN=TOP>
923<TD>
924<CENTER>o</CENTER>
925</TD>
<TD>Outgroup. Used only for displaying the unrooted quartet puzzling
tree. The default outgroup is the first sequence of the data set.</TD>
928</TR>
929
930<TR VALIGN=TOP>
931<TD>
932<CENTER>p</CENTER>
933</TD>
<TD>Constrain the TN model to the F84 model. This option is only available
for the Tamura-Nei model. With this option the expected (!) transition-transversion
ratio for the F84 model has to be entered, and TREE-PUZZLE computes the corresponding
parameters of the TN model (this depends on the base frequencies of the data).
This makes it possible to compare the results of TREE-PUZZLE and the PHYLIP maximum likelihood
programs, which use the F84 model.
940</TD>
941</TR>
942
943<TR VALIGN=TOP>
944<TD>
945<CENTER>q</CENTER>
946</TD>
947<TD>Quits analysis.</TD>
948</TR>
949
950<TR VALIGN=TOP>
951<TD>
952<CENTER>r</CENTER>
953</TD>
954<TD>Y/R transition parameter. This option is only available for the TN
955model. This parameter is the ratio of the rates for pyrimidine transitions
956and purine transitions. You do not need to specify this parameter as TREE-PUZZLE
estimates it from the data. For a precise definition please read the section
958in this manual about models of sequence evolution.</TD>
959</TR>
960
961<TR VALIGN=TOP>
962<TD>
963<CENTER>s</CENTER>
964</TD>
965<TD>Symmetrize doublet frequencies. This option is only available for the
966SH model. With this option the doublet frequencies are symmetrized. For
967example, the frequencies of "AT" and "TA" are then set to the average of both
968frequencies.</TD>
969</TR>
970
971<TR VALIGN=TOP>
972<TD>
973<CENTER>t</CENTER>
974</TD>
975<TD>Transition/transversion parameter. For nucleotide data only. You do not
976need to specify this parameter as TREE-PUZZLE estimates it from the data. The
977precise definition of this parameter is given in the section on models
978of sequence evolution in this manual.</TD>
979</TR>
980
981<TR VALIGN=TOP>
982<TD>
983<CENTER>u</CENTER>
984</TD>
985<TD>Show unresolved quartets. During the quartet puzzling tree search TREE-PUZZLE
986counts the number of unresolved quartet trees. An unresolved quartet is
987a quartet where the maximum likelihood values for each of the three possible
988quartet topologies are so similar that it is not possible to prefer one
989of them (<A HREF="#strimmer1997">Strimmer, Goldman, and von Haeseler</A>, 1997).
990If this option is selected you will get a detailed list of all starlike
991quartets. Note, for some data
992sets there may be a lot of unresolved quartets. In this case a list of
993all unresolved quartets is probably not very useful and also needs a lot
994of disk space.</TD>
995</TR>
996
997<TR VALIGN=TOP>
998<TD>
999<CENTER>v</CENTER>
1000</TD>
<TD>Approximate quartet likelihood. For the quartet puzzling tree search
only. It is only necessary to compute an exact maximum likelihood for very
small data sets. For larger data sets this option should always be turned
on.</TD>
1005</TR>
1006
1007<TR VALIGN=TOP>
1008<TD>
1009<CENTER>w</CENTER>
1010</TD>
1011<TD>Model of rate heterogeneity. TREE-PUZZLE provides several different models
1012of rate heterogeneity: uniform rate over all sites (rate homogeneity),
1013Gamma distributed rates, two rates (1 invariable + 1 variable), and a mixed
1014model (1 invariable rate + Gamma distributed rates). All necessary parameters
1015can be estimated by TREE-PUZZLE. Note that whenever invariable sites are taken
1016into account the parameter estimation will invoke the "e" option to use
1017an exact likelihood function. For more detailed information please read
1018the section in this manual about models of sequence evolution. See also
1019option "m" (model of substitution).</TD>
1020</TR>
1021
1022<TR VALIGN=TOP>
1023<TD>
1024<CENTER>x</CENTER>
1025</TD>
1026<TD>Selects the methods used in the estimation of the model parameters.
1027Neighbor-joining tree means that a NJ tree is used to estimate the parameters.
1028Quartet sampling means that a number of random sets of four sequences are
1029selected to estimate parameters.</TD>
1030</TR>
1031
1032<TR VALIGN=TOP>
1033<TD>
1034<CENTER>y</CENTER>
1035</TD>
1036<TD>Starts analysis.</TD>
1037</TR>
1038
1039<TR VALIGN=TOP>
1040<TD>
1041<CENTER>z</CENTER>
1042</TD>
1043<TD>Computation of clock-like maximum likelihood branch lengths. This option
1044also invokes the likelihood ratio clock test.</TD>
1045</TR>
1046</TABLE>
1047
1048<H2>
1049<A NAME="Other Features"></A>Other Features</H2>
1050For nucleotide data TREE-PUZZLE computes the expected transition/transversion
1051ratio and the expected pyrimidine transition/purine transition ratio
1052corresponding to the selected model. Base frequencies play an important
1053role in the calculation of both numbers.
1054
<P>TREE-PUZZLE also uses a chi-square test at the 5% level to check whether the base composition
of each sequence is identical to the average base composition of the whole
alignment. All sequences with deviating composition are listed in the TREE-PUZZLE
report file. Ideally, no sequence (possibly except for the outgroup)
has a deviating base composition. Otherwise a basic assumption implicit
in the maximum likelihood calculation is violated.
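
<P>The idea of such a composition test can be sketched in Python as follows.
This is an illustration under stated assumptions, not TREE-PUZZLE's own
implementation: it handles nucleotide data only, ignores all characters other
than A, C, G and T, uses 3 degrees of freedom, and compares the statistic with
the 5% critical value 7.815. The function names and the toy alignment are
hypothetical.

```python
from collections import Counter

BASES = "ACGT"
CRITICAL_5_PERCENT_DF3 = 7.815  # chi-square 5% quantile, 3 degrees of freedom


def base_counts(seq):
    """Counts of A, C, G, T in a sequence; other characters are ignored."""
    counts = Counter(ch for ch in seq.upper() if ch in BASES)
    return [counts[b] for b in BASES]


def deviating_sequences(alignment):
    """Names of sequences whose composition deviates from the alignment average."""
    per_seq = {name: base_counts(seq) for name, seq in alignment.items()}
    totals = [sum(c[k] for c in per_seq.values()) for k in range(4)]
    grand_total = sum(totals)
    average = [x / grand_total for x in totals]
    flagged = []
    for name, counts in per_seq.items():
        n = sum(counts)
        if n == 0:
            continue  # nothing to test
        chi2 = sum((obs - n * p) ** 2 / (n * p)
                   for obs, p in zip(counts, average) if p > 0)
        if chi2 > CRITICAL_5_PERCENT_DF3:
            flagged.append(name)
    return flagged


# toy data: nine balanced sequences and one consisting of A only
alignment = {"seq%d" % k: "ACGT" * 25 for k in range(9)}
alignment["odd"] = "A" * 100
flagged = deviating_sequences(alignment)
```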
1061
1062<P>A hidden feature of TREE-PUZZLE (since version 2.5) is the employment of
1063a weighting scheme of quartets (<A HREF="#strimmer1997">Strimmer, Goldman,
1064and von Haeseler</A>, 1997) in the quartet puzzling tree search.
1065
1066<P>TREE-PUZZLE also computes the average distance between all pairs of sequences
1067(maximum likelihood distances). The average distances can be viewed as
1068a rough measure for the overall sequence divergence.
1069
1070<P>If more than one input tree is provided TREE-PUZZLE uses the
1071<A HREF="#kishino1989">Kishino-Hasegawa</A> test (1989) to check which
1072trees are significantly worse than the best tree.
1073
1074<P>If clock-like maximum-likelihood branch lengths are computed TREE-PUZZLE
1075checks with the help of a likelihood-ratio test
1076(<A HREF="#felsenstein1988">Felsenstein</A>, 1988) whether
1077the data set is clock-like.
1078
1079<P>TREE-PUZZLE also detects sequences that occur more than once in the data
1080and that therefore can be removed from the data set to speed up analysis.
1081
1082<P>If rate heterogeneity is taken into account in the analysis TREE-PUZZLE also
1083computes the most probable assignment of rate categories to sequence positions,
according to <A HREF="#felsenstein1996">Felsenstein and Churchill</A> (1996).
1085
1086<H2>
1087<A NAME="Interpretation and Hints"></A>Interpretation and Hints</H2>
1088
1089<H3>
1090<A NAME="Quartet Puzzling Support Values"></A>Quartet Puzzling Support
1091Values</H3>
1092The quartet puzzling (QP) tree search estimates support values for each
1093internal branch. They can be interpreted in much the same way as
1094bootstrap values (though they should not be confused with them).
1095Branches showing a QP reliability from 90% to 100% can be considered
very strongly supported. Branches with lower reliability (&gt; 70%) can
in principle also be trusted, but in this case it is advisable to
check how well the respective internal branch does in comparison to other
branches in the tree (i.e., check relative reliability).
1100If you are interested in a branch with a low confidence it is also
1101important to check the alternative groupings that are not included
1102in the QP tree (they are listed in the TREE-PUZZLE report file in *.** format).
1103There should be a substantial gap between the lowest reliability
1104value of the QP tree and
1105the most frequent grouping that is not included in the QP tree.
1106<H3>
1107<A NAME="Percentage of Unresolved Quartets"></A>Percentage of Unresolved
1108Quartets</H3>
1109TREE-PUZZLE computes the number and the percentage of completely unresolved
1110maximum likelihood quartets. An unresolved quartet is a quartet where the
1111maximum likelihood values for each of the three possible quartet topologies
1112are so similar that it is not possible to prefer one of them
1113(<A HREF="#strimmer1997">Strimmer, Goldman, and von Haeseler</A>, 1997).
1114The percentage of the unresolved quartets
1115among all possible quartets is an indicator of the suitability of the data
1116for phylogenetic analysis. A high percentage usually results in a highly
1117multifurcating quartet puzzling tree. If you only have a few unresolved
quartets we recommend invoking option "u" to get a list of all these quartets.
1119In a likelihood mapping analysis the percentage of completely unresolved
1120quartets is shown in the central region of the triangle diagram.
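
<P>For orientation, the number of possible quartets for n sequences is the
binomial coefficient "n choose 4". The Python sketch below (function name
hypothetical) turns a count of unresolved quartets into a percentage of all
possible quartets.

```python
import math


def unresolved_percentage(n_sequences, n_unresolved):
    """Percentage of unresolved quartets among all possible quartets."""
    total = math.comb(n_sequences, 4)  # number of 4-sequence subsets
    return 100.0 * n_unresolved / total


quartets_for_50 = math.comb(50, 4)           # all quartets of 50 sequences
one_in_five = unresolved_percentage(5, 1)    # 5 sequences give 5 quartets
```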
1121<H3>
1122<A NAME="Automatic Parameter Estimation"></A>Automatic Parameter Estimation</H3>
1123TREE-PUZZLE estimates both the parameters of the models of substitution (TN,
1124HKY) and of the model of rate variation (Gamma distribution, fraction of
1125invariable sites) without prior knowledge of an overall tree by a number
1126of different strategies based on maximum likelihood. For all estimated
1127parameters a corresponding standard error (S.E.) is computed. If you have
1128good arguments to choose a different set of parameters than the values
1129obtained by TREE-PUZZLE don't hesitate to use them. If sequences are extremely
similar it is very hard for any algorithm to extract information about
1131the model of substitution from the data set. Also, be careful if the
1132estimated parameter values
1133are very close to the internal upper and lower bounds:
1134<TABLE CELLPADDING=2 >
1135<TR VALIGN=TOP>
1136<TD><B>Parameter (Symbol)</B> </TD>
1137
1138<TD><B>Minimal Value</B> </TD>
1139
1140<TD><B>Maximal Value</B> </TD>
1141</TR>
1142
1143<TR VALIGN=TOP>
1144<TD>Transition/transversion parameter (t) </TD>
1145
1146<TD>0.20 </TD>
1147
1148<TD>30.00 </TD>
1149</TR>
1150
1151<TR VALIGN=TOP>
1152<TD>Y/R transition parameter (gamma) </TD>
1153
1154<TD>0.10 </TD>
1155
1156<TD>6.00 </TD>
1157</TR>
1158
1159<TR VALIGN=TOP>
1160<TD>Fraction of invariable sites (theta) </TD>
1161
1162<TD>0.00 </TD>
1163
1164<TD>0.99 </TD>
1165</TR>
1166
1167<TR VALIGN=TOP>
1168<TD>Gamma rate heterogeneity parameter (alpha) </TD>
1169
1170<TD>0.01 </TD>
1171
1172<TD>99 </TD>
1173</TR>
1174</TABLE>
1175
1176<H3>
1177<A NAME="Likelihood Mapping"></A>Likelihood Mapping</H3>
Likelihood mapping (<A HREF="#strimmer1997">Strimmer and von Haeseler</A>,
1997) is a method to analyze the support for internal branches in a tree
without having to compute an overall tree.
Every internal branch in a completely resolved tree defines
up to four clusters of sequences. Sometimes only the relationships among these
groups are of interest and not the details of the structure of the clusters
themselves. Then a likelihood mapping analysis is sufficient.
The corresponding likelihood mapping triangle diagrams (as contained in
various output files generated by TREE-PUZZLE) will
elucidate the possible relationships in detail.
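
<P>This manual does not spell out how a quartet becomes a dot in the triangle
diagram. A common way to describe it, used here as an assumption, is to
normalise the likelihoods of the three possible quartet topologies to weights
that sum to 1 and to read these weights as barycentric coordinates. The Python
sketch below illustrates that idea; the function names and the placement of
the triangle corners are hypothetical.

```python
import math


def posterior_weights(log_likelihoods):
    """Normalise three log-likelihoods into weights that sum to 1."""
    m = max(log_likelihoods)                  # subtract the maximum for stability
    scaled = [math.exp(x - m) for x in log_likelihoods]
    s = sum(scaled)
    return [x / s for x in scaled]


def triangle_point(weights):
    """Map barycentric weights to (x, y) in an equilateral triangle of side 1."""
    corners = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3.0) / 2.0)]
    x = sum(w * cx for w, (cx, _) in zip(weights, corners))
    y = sum(w * cy for w, (_, cy) in zip(weights, corners))
    return x, y


# three equally likely topologies land in the centre of the triangle
centre = triangle_point(posterior_weights([-1234.5, -1234.5, -1234.5]))
# one clearly preferred topology lands at its corner
corner = triangle_point(posterior_weights([-1000.0, -2000.0, -2000.0]))
```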
1188
1189<H3><A NAME="Batch Mode"></A>Batch Mode</H3>
1190Running TREE-PUZZLE from a Unix batch file is straightforward despite the lack
of command switches. For example, to run TREE-PUZZLE with the transition/transversion
1192parameter equal to 10 the following lines in a batch file are sufficient:
1193<PRE>
1194puzzle &lt;&lt; !
1195t
119610
1197y
1198!
1199</PRE>
1200All other parameters can also be accessed the same way.
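
<P>The same keystrokes can also be sent from a script. The following Python
sketch is a hypothetical wrapper, not part of TREE-PUZZLE: it assumes that the
executable is called "puzzle", that it is on the search path, and that the
input file is where the program expects it. Options that cycle through several
settings (such as "m" or "w") would have to be listed once per key press.

```python
import subprocess


def menu_script(keystrokes):
    """Join menu entries with newlines and append "y" to start the analysis."""
    return "\n".join(list(keystrokes) + ["y"]) + "\n"


def run_puzzle(keystrokes, executable="puzzle"):
    """Feed the menu entries to the program on standard input."""
    return subprocess.run([executable], input=menu_script(keystrokes),
                          text=True, capture_output=True, check=False)


# same input as the batch file above: option "t", value 10, then "y"
script = menu_script(["t", "10"])
```

Calling run_puzzle(["t", "10"]) would then start the program with this input
and return its captured screen output.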
1201
1202<H2>
1203<A NAME="Limits and Error Messages"></A>Limits and Error Messages</H2>
TREE-PUZZLE has a built-in limit of 257 sequences per data set
in order to avoid overflow of internal integer variables. At least 32767
1206sites should be possible depending on the compiler used. Computation time
1207will be the largest constraint even if sufficient computer memory is available.
1208If rate heterogeneity is taken into account every additional category slows
1209down the overall computation by the amount of time needed for one complete
1210run assuming rate homogeneity.
1211
1212<P>If problems are encountered TREE-PUZZLE terminates program execution and
1213returns a plain text error message. Depending on the severity errors can be
1214classified into three groups:
1215<TABLE CELLPADDING=2 >
1216<TR VALIGN=TOP>
1217<TD>"HALT " errors: </TD>
1218
1219<TD>Very severe. You should never ever see one of these messages. If so,
1220please contact the developers! </TD>
1221</TR>
1222
1223<TR VALIGN=TOP>
1224<TD>"Unable to proceed" errors: </TD>
1225
1226<TD>Harmless but annoying. Mostly memory errors (not enough RAM) or problems
1227with the format of the input files. </TD>
1228</TR>
1229
1230<TR VALIGN=TOP>
1231<TD>Other errors: </TD>
1232
1233<TD>Completely uncritical. Occur mostly when options of TREE-PUZZLE are being
1234set. </TD>
1235</TR>
1236</TABLE>
<P>On a standard machine (1996 Unix workstation) with 32 to 64 MB RAM TREE-PUZZLE
1238can easily do maximum likelihood tree searches including estimation of
1239support values for data sets with 50-100 sequences. As likelihood mapping
1240is not memory consuming and computationally quite fast it can be applied
1241to large data sets as well.
1242<H2>
1243<A NAME="Are Quartets Reliable"></A>Are Quartets Reliable?</H2>
1244Quartets may be intrinsically one of the most difficult phylogenies to
1245resolve accurately (cf. <A HREF="#hillis1996">Hillis</A>, 1996).
1246It has been asked whether this is
1247a problem for quartet puzzling because it works with quartets.
1248
<P>However, this is not a problem. According to Hillis' findings
1250(<A HREF="#hillis1996">Hillis</A>, 1996),
1251quartets can be hard, but extra information helps. That is, if all you
1252have are data on species (A, B, C, D) then it might be relatively difficult
1253to find the correct tree for them. But if you have additional data (species
1254E, F, G, ...) and try to find a tree for all the species, then that part
1255of the tree relating (A, B, C, D) will more likely be correct than if you
1256had just the data for (A, B, C, D). In Hillis' big 'model' tree, there
1257are many examples of subsets of 4 species which in themselves might be
1258hard to resolve correctly, but which are correctly resolved thanks to the
1259(...large amount of...) additional data. TREE-PUZZLE (quartet puzzling) also
gains advantage from extra data in the same way. Its 'understanding' or
1261resolution of the quartet (A, B, C, D) might be incorrect, but the information
1262on the relationships of (A, B, C, D) implicit in its treatment of (A, B,
1263C, E), (A, B, E, D), (A, E, C, D), (E, B, C, D), (A, B, C, F), (A, B, F,
1264D), (A, F, C, D), (F, B, C, D), (A, B, C, G), etc. etc. should overcome
1265this problem.
1266
1267<P>The facts about how well TREE-PUZZLE actually works have been investigated
1268in the <A HREF="#strimmer1996">Strimmer and von Haeseler</A> (1996) and
1269<A HREF="#strimmer1997">Strimmer, Goldman, and von Haeseler</A> (1997) papers.
Hillis' findings do not alter their results.
1271Considered as a heuristic search for maximum likelihood trees, quartet
1272puzzling works very well.
1273
1274<P>(This section follows N. Goldman, personal communication).
1275<H2>
1276<A NAME="Other Programs"></A>Other Programs</H2>
1277There are a number of other very useful and widespread programs to reconstruct
1278phylogenetic relationships and to analyse molecular sequence data that
are available free of charge. Here are the URLs of some web pages that
1280provide links to most of them (including the PHYLIP package and
1281the MOLPHY and PAML maximum likelihood programs):
1282<DL>
1283
1284<DD>
1285Joe Felsenstein's list of programs (well-organized and pretty exhaustive):<br>
1286<A
1287HREF="http://evolution.genetics.washington.edu/phylip/software.html">http://evolution.genetics.washington.edu/phylip/software.html</A></DD>
1288
1289
1290<DD>
1291"Tree of Life" software page:<br>
1292<A HREF="http://phylogeny.arizona.edu/tree/programs/programs.html">http://phylogeny.arizona.edu/tree/programs/programs.html</A></DD>
1293
1294
1295<DD>
1296European Bioinformatics Institute:<br>
1297<A HREF="http://www.ebi.ac.uk/biocat/biocat.html">http://www.ebi.ac.uk/biocat/biocat.html</A></DD>
1298
1299</DL>
1300
1301<H2>
1302<A NAME="Acknowledgements"></A>Acknowledgements</H2>
1303The maximum likelihood kernel of TREE-PUZZLE is an offspring of the program
1304NucML/ProtML version 2.2 by Jun Adachi and Masami Hasegawa (<A HREF="ftp://sunmh.ism.ac.jp/pub/molphy">ftp://sunmh.ism.ac.jp/pub/molphy</A>).
1305We thank them for generously allowing us to use the source code of their
1306program.
1307We would also like to thank
1308the <A HREF="http://www.ebi.ac.uk">European Bioinformatics Institute (EBI)</A>,
1309the <A HREF="http://www.pasteur.fr">Institut Pasteur</A>,
and <A HREF="http://www.indiana.edu">Indiana University</A> 
1311(i.e. Don Gilbert)
1312for kindly distributing the TREE-PUZZLE program.
1313
<P>We thank Stephane Bortzmeyer for his help with debugging
<EM>floating point exception</EM> errors.

<P>We also thank Peter Foster for pointing out the inconsistency
in the invariable site models with respect to other programs.

<P>Finally we thank the
1321<A HREF="http://www.dfg.de">Deutsche Forschungsgemeinschaft</A> 
1322(VI 160/3-1 and Ha 1628/4-1) and the Max-Planck-Society
1323for financial support.
1324
<H2><A NAME="References"></A>References</H2>

<P><A NAME="adachi1996"></A>
Adachi, J., and M. Hasegawa. 1996. MOLPHY: programs for molecular phylogenetics,
version 2.3. Institute of Statistical Mathematics, Tokyo.

<P><A NAME="adachi1996"></A>
Adachi, J., and M. Hasegawa. 1996. Model of amino acid substitution
in proteins encoded by mitochondrial DNA. <I>J. Mol. Evol.</I> <B>42</B>:
459-468.

<P><A NAME="dayhoff1978"></A>
Dayhoff, M. O., R. M. Schwartz, and B. C. Orcutt. 1978. A model of evolutionary
change in proteins. In: Dayhoff, M. O. (ed.) Atlas of Protein Sequence
and Structure, Vol. 5, Suppl. 3. National Biomedical Research Foundation, Washington
DC, pp. 345-352.

<P><A NAME="felsenstein1981"></A>
Felsenstein, J. 1981. Evolutionary trees from DNA sequences: A maximum
likelihood approach. <I>J. Mol. Evol.</I> <B>17</B>: 368-376.

<P><A NAME="felsenstein1984"></A>
Felsenstein, J. 1984. Distance methods for inferring phylogenies:
A justification. <I>Evolution</I> <B>38</B>: 16-24.

<P><A NAME="felsenstein1988"></A>
Felsenstein, J. 1988. Phylogenies from molecular sequences: Inference
and reliability. <I>Annu. Rev. Genet.</I> <B>22</B>: 521-565.

<P><A NAME="felsenstein1993"></A>
Felsenstein, J. 1993. PHYLIP (Phylogeny Inference Package) version 3.5c.
Distributed by the author. Department of Genetics, University of Washington,
Seattle.

<P><A NAME="felsenstein1996"></A>
Felsenstein, J., and G. A. Churchill. 1996. A hidden Markov model approach
to variation among sites in rate of evolution. <I>Mol. Biol. Evol.</I>
<B>13</B>: 93-104.

<P><A NAME="gropp1998"></A>
Gropp, W., S. Huss-Lederman, A. Lumsdaine, E. Lusk, B. Nitzberg,
W. Saphir, and M. Snir. 1998. MPI - The Complete Reference: Volume 2,
The MPI Extensions. 2nd Edition, The MIT Press, Cambridge, MA.

<P><A NAME="gu1995"></A>
Gu, X., Y.-X. Fu, and W.-H. Li. 1995. Maximum likelihood estimation
of the heterogeneity of substitution rate among nucleotide sites. <I>Mol.
Biol. Evol.</I> <B>12</B>: 546-557.

<P><A NAME="hasegawa1985"></A>
Hasegawa, M., H. Kishino, and K. Yano. 1985. Dating of the human-ape
splitting by a molecular clock of mitochondrial DNA. <I>J. Mol. Evol.</I>
<B>22</B>: 160-174.

<P><A NAME="henikoff1992"></A>
Henikoff, S., and J. G. Henikoff. 1992. Amino acid substitution matrices
from protein blocks. <I>PNAS (USA)</I> <B>89</B>: 10915-10919.

<P><A NAME="hillis1996"></A>
Hillis, D. M. 1996. Inferring complex phylogenies. <I>Nature</I>
<B>383</B>: 130-131.

<P><A NAME="jukes1969"></A>
Jukes, T. H., and C. R. Cantor. 1969. Evolution of protein molecules.
In: Munro, H. N. (ed.) Mammalian Protein Metabolism, New York: Academic
Press, pp. 21-132.

<P><A NAME="jones1992"></A>
Jones, D. T., W. R. Taylor, and J. M. Thornton. 1992. The rapid generation
of mutation data matrices from protein sequences. <I>CABIOS</I> <B>8</B>:
275-282.

<P><A NAME="kimura1980"></A>
Kimura, M. 1980. A simple method for estimating evolutionary rates of
base substitutions through comparative studies of nucleotide sequences.
<I>J. Mol. Evol.</I> <B>16</B>: 111-120.

<P><A NAME="kishino1989"></A>
Kishino, H., and M. Hasegawa. 1989. Evaluation of the maximum likelihood
estimate of the evolutionary tree topologies from DNA sequence data, and
the branching order in Hominoidea. <I>J. Mol. Evol.</I> <B>29</B>: 170-179.

<P><A NAME="mueller2000"></A>
Mueller, T., and M. Vingron. 2000. Modeling Amino Acid Replacement.
<I>J. Comp. Biol.</I>, to appear
(<A HREF="http://www.dkfz-heidelberg.de/tbi/people/tmueller/paper/paper.ps">preprint of the article</A>)

<P><A NAME="saitou1987"></A>
Saitou, N., and M. Nei. 1987. The neighbor-joining method: a new method
for reconstructing phylogenetic trees. <I>Mol. Biol. Evol.</I> <B>4</B>:
406-425.

<P><A NAME="schoeniger1994"></A>
Schoeniger, M., and A. von Haeseler. 1994. A stochastic model for
the evolution of autocorrelated DNA sequences. <I>Mol. Phyl. Evol.</I>
<B>3</B>: 240-247.

<P><A NAME="snir1998"></A>
Snir, M., S. Otto, S. Huss-Lederman, D. Walker, and J. Dongarra.
1998. MPI - The Complete Reference: Volume 1, The MPI Core. 2nd Edition,
The MIT Press, Cambridge, MA.

<P><A NAME="strimmer1996"></A>
Strimmer, K., and A. von Haeseler. 1996. Quartet puzzling: a quartet
maximum likelihood method for reconstructing tree topologies. <I>Mol. Biol.
Evol.</I> <B>13</B>: 964-969.

<P><A NAME="strimmer1997"></A>
Strimmer, K., N. Goldman, and A. von Haeseler. 1997. Bayesian probabilities
and quartet puzzling. <I>Mol. Biol. Evol.</I> <B>14</B>: 210-211.

<P><A NAME="strimmer1997"></A>
Strimmer, K., and A. von Haeseler. 1997. Likelihood-mapping: a simple
method to visualize phylogenetic content of a sequence alignment. <I>PNAS
(USA)</I> <B>94</B>: 6815-6819.

<P><A NAME="tamura1993"></A>
Tamura, K., and M. Nei. 1993. Estimation of the number of nucleotide
substitutions in the control region of mitochondrial DNA in humans and
chimpanzees. <I>Mol. Biol. Evol.</I> <B>10</B>: 512-526.

<P><A NAME="tamura1994"></A>
Tamura, K. 1994. Model selection in the estimation of the number of
nucleotide substitutions. <I>Mol. Biol. Evol.</I> <B>11</B>: 154-157.

<P><A NAME="thompson1994"></A>
Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: Improving
the sensitivity of progressive multiple sequence alignment through sequence
weighting, position-specific gap penalties and weight matrix choice. <I>Nucl.
Acids Res.</I> <B>22</B>: 4673-4680.

<P><A NAME="whelan2000"></A>
Whelan, S., and N. Goldman. 2000. A new empirical model of
amino acid evolution. <I>Manuscript in prep.</I>

<P><A NAME="yang1994"></A>
Yang, Z. 1994. Maximum likelihood phylogenetic estimation from DNA sequences
with variable rates over sites: approximate methods. <I>J. Mol. Evol.</I>
<B>39</B>: 306-314.


<H2>
<A NAME="Known Bugs"></A>Known Bugs</H2>

On Alpha-based computers, <EM>floating point exception</EM>
errors sometimes occur. Some of them result from a bug in the malloc routine
among the system routines of the Compaq operating system. We recommend
using the GNU cc compiler
(<TT><A HREF="http://egcs.gnu.org">http://egcs.gnu.org</A></TT>),
which does not use the system malloc routine.

<P>For other occurrences of the <EM>floating point exception</EM>,
we need datasets and information about the operating system
to reproduce and debug those errors.

<H2>
<A NAME="Version History"></A>Version History</H2>
The TREE-PUZZLE program was first distributed in 1995 under the name
PUZZLE. Since then it has
been continually improved. Here is a list of the most important changes.
<TABLE CELLPADDING=2 >

<TR VALIGN=TOP>
<TD>5.0</TD>

<TD>Puzzle tree reconstruction part parallelized using the MPI standard
(Message Passing Interface).
<BR>Possibility added to give input file and user tree file at the command line.
Output files renamed to the form PREFIX.EXTENSION, where PREFIX is the
input file name or, if used, the user tree file name.
The EXTENSION can be one of the following: puzzle (PUZZLE report),
tree (tree file), dist (ML distance file), eps (likelihood mapping output
in eps format), qlist (bad quartets), qstep (puzzling step tree IDs as they
occur in the analysis), or qtorder (sorted unique list of puzzling step trees).
<BR>The likelihood value is prepended to the tree string in the treefile
as a leading comment ("[ lh=x.xxx ]").
<BR>VT (variable time) matrix (<A HREF="#mueller2000">Mueller and
Vingron</A>, 2000) and WAG matrix (<A HREF="#whelan2000">Whelan and
Goldman</A>, 2000)
added to the AA substitution models.
<BR>The Data type and AA-model options in the menu now show the
automatically set type/model first. These can now be changed using the 'd' or
'm' key in an order independent of the type/model selected. This makes
it possible to select a desired AA substitution model or data type by
piping letters to the standard input without knowing PUZZLE's preselection.
<BR>Parameters are written to file when estimated before evaluation of
the quartets.
<BR>The inconsistency with respect to other programs in handling
invariable sites has been fixed.
<BR>Some minor bug fixes (e.g. the clockbug and another bug in the optimization
routine).
</TD>
</TR>

<TR VALIGN=TOP>
<TD>4.0.2</TD>

<TD>Update to provide precompiled Windows 95/98/NT executables. In addition:
Internal rearrangement of rate matrices.
Improved BLOSUM 62 matrix.  Endless input loop for input
files restricted to 10 trials.
Source code clean up to remove compile time warnings.
Explicit quit option in menu. Changes in NJ tree code.
Updates of documentation (address changes, correction of errors).
</TD>
</TR>

<TR VALIGN=TOP>
<TD>4.0.1</TD>

<TD>Maintenance release. Correction of mtREV matrix. Fix of the "intree bug".
Removal of stringent runtime-compatibility check to allow out-of-the-box compile
on alpha. More accurate gamma distribution allowing 16 instead of 8 categories
and ensuring a better alpha > 1.0. Update of documentation (mainly address changes).
More Unix-like file layout, and change of license to GPL.
</TD>
</TR>

<TR VALIGN=TOP>
<TD>4.0 </TD>

<TD>Executables for Windows 95/NT and OS/2 instead of MS-DOS. Computation
of clock-like branch lengths (also for amino acids and for non-binary trees).
Automatic likelihood ratio clock test. Model for two-state sequence data
(0,1) included. Display of most probable assignment of rates to sites.
Identification of groups of identical sequences. Possibility to read multiple
input trees. Kishino-Hasegawa test to check whether trees are significantly
different. BLOSUM 62 model of amino acid substitution
(<A HREF="#henikoff1992">Henikoff-Henikoff</A>, 1992).
Use of parameter alpha instead of eta = 1/(1+alpha) (for rate heterogeneity).

Improvements to user interface. SH model can be applied to 1st and 2nd
codon positions. Automatic check for compatible compiler settings. Workaround
for severe runtime problem when the gcc compiler was used.</TD>
</TR>

<TR VALIGN=TOP>
<TD>3.1 </TD>

<TD>Much improved user interface to rate heterogeneity (less confusing
menu, rearranged outfile, additional out-of-range check). Possibility to
read rooted input trees (automatic removal of basal bifurcation). Computation
of average distance between all pairs of sequences. Fix of a bug that caused
PUZZLE 3.0 to crash on some systems (DEC Alpha). Cosmetic changes in program
and documentation. </TD>
</TR>

<TR VALIGN=TOP>
<TD>3.0 </TD>

<TD>Rate heterogeneity included in all models of substitution (Gamma distribution
plus invariable sites). Likelihood mapping analysis with Postscript output
added. Much more sophisticated maximum likelihood parameter estimation
for all model parameters including those of rate heterogeneity. Codon positions
selectable. Update to mtREV24. New icon. Less verbose runtime messages.
HTML documentation. Better internal error classification. More information
in outfile (number of constant positions etc.). </TD>
</TR>

<TR VALIGN=TOP>
<TD>2.5.1 </TD>

<TD>Fix of a bug (present only in version 2.5) related to computation of
the variance of the maximum likelihood branch lengths that caused occasional
crashes of PUZZLE on some systems when applied to data sets containing many
very similar sequences. Drop of support for non-FPU Macintosh version.
Corrections in manual. </TD>
</TR>

<TR VALIGN=TOP>
<TD>2.5 </TD>

<TD>Improved QP algorithm (<A HREF="#strimmer1997">Strimmer, Goldman, and
von Haeseler</A>, 1997). Bug
fixes in ML engine, computation of ML distances and ML branch lengths,
optional input of a user tree, F84 model added, estimation of all TN model
parameters and corresponding standard errors, CLUSTAL W treefile convention
adopted to allow branch lengths and QP support values to be shown simultaneously,
display of unresolved quartets, update of mtREV matrix, source code more
compatible with some almost-ANSI compilers, more safety checks in the code. </TD>
</TR>

<TR VALIGN=TOP>
<TD>2.4 </TD>

<TD>Automatic data type recognition, chi-square-test on base composition,
automatic selection of best amino acid model, estimation of transition-transversion
parameter, ASCII plot of quartet puzzling tree into the outfile. </TD>
</TR>

<TR VALIGN=TOP>
<TD>2.3 </TD>

<TD>More models, many usability improvements, built-in consensus tree routines,
more supported systems, bug fixes, no more dependence on input order.
First EBI distributed version. </TD>
</TR>

<TR VALIGN=TOP>
<TD>2.2 </TD>

<TD>Optimized internal data structure requiring much less computer memory.
Bug fixes. </TD>
</TR>

<TR VALIGN=TOP>
<TD>2.1 </TD>

<TD>Bug fixes concerning algorithm and transition/transversion parameter. </TD>
</TR>

<TR VALIGN=TOP>
<TD>2.0 </TD>

<TD>Complete revision merging the maximum likelihood and the quartet puzzling
routines into one user friendly program. First electronic distribution. </TD>
</TR>

<TR VALIGN=TOP>
<TD>1.0 </TD>

<TD>First public release, presented at the 1995 phylogenetic workshop (15-17
June 1995) at the University of Bielefeld, Germany. </TD>
</TR>
</TABLE>

</BODY>
</HTML>