<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">

<HTML>
<!-- To view this document properly please use an HTML browser -->

<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<TITLE>Documentation of TREE-PUZZLE 5.0</TITLE>
</HEAD>
<BODY BGCOLOR="#FFFFFF">

<H1>
<img ALT="PUZZLE Logo" SRC="puzzle.gif" HSPACE=10 BORDER=0 height=32 width=32 align=LEFT>
<b><font size="+3">TREE-PUZZLE Manual</font></b>
<img ALT="PPUZZLE Logo" SRC="ppuzzle.gif" HSPACE=10 BORDER=0 height=32 width=32>
</H1>
<B>Maximum likelihood analysis for nucleotide, amino acid, and two-state data</B>


<P>Version 5.0
<BR>October 2000
<BR>Copyright 1999-2000 by Heiko A. Schmidt, Korbinian Strimmer, Martin Vingron, and Arndt von Haeseler
<BR>Copyright 1995-1999 by Korbinian Strimmer and Arndt von Haeseler

<P><b>Heiko A. Schmidt</b>,
email: h.schmidt@dkfz-heidelberg.de,
<A HREF="http://www.dkfz-heidelberg.de/tbi/">Theoretical Bioinformatics</A>,
<A HREF="http://www.dkfz-heidelberg.de/">DKFZ</A>,
Im Neuenheimer Feld 280, D-69124 Heidelberg, Germany.

<P><b>Korbinian Strimmer</b>,
email: korbinian.strimmer@zoo.ox.ac.uk,
<A HREF="http://www.zoo.ox.ac.uk/">Department of Zoology</A>,
<A HREF="http://www.ox.ac.uk/">University of Oxford</A>,
South Parks Road, Oxford OX1 3PS, UK.

<P><b>Martin Vingron</b>,
email: vingron@dkfz-heidelberg.de,
<A HREF="http://www.dkfz-heidelberg.de/tbi/">Theoretical Bioinformatics</A>,
<A HREF="http://www.dkfz-heidelberg.de/">DKFZ</A>,
Im Neuenheimer Feld 280, D-69124 Heidelberg, Germany.

<P><b>Arndt von Haeseler</b>,
email: haeseler@eva.mpg.de,
<A HREF="http://www.eva.mpg.de/">Max-Planck-Institute for Evolutionary Anthropology</A>,
Inselstr. 22, D-04103 Leipzig, Germany.

<p><font size ="-1" color ="brown">The official name of the program has been
changed to TREE-PUZZLE to avoid a legal conflict with the Fraunhofer
Gesellschaft. We are sorry for any inconvenience this may cause you.
Any reference to PUZZLE in this package is only colloquial and refers
to TREE-PUZZLE.
</font>

<P>TREE-PUZZLE is a computer program to reconstruct phylogenetic trees from
molecular sequence data by maximum likelihood. It implements a fast tree
search algorithm, quartet puzzling, that allows analysis of large data
sets and automatically assigns estimates of support to each internal
branch. TREE-PUZZLE also computes pairwise maximum likelihood distances as well
as branch lengths for user-specified trees. Branch lengths can also be
calculated under the clock assumption. In addition, TREE-PUZZLE offers a novel
method, likelihood mapping, to investigate the support of a hypothesized
internal branch without computing an overall tree and to visualize the
phylogenetic content of a sequence alignment. TREE-PUZZLE also conducts a number
of statistical tests on the data set (chi-square test for homogeneity of
base composition, likelihood ratio test of the clock hypothesis, Kishino-Hasegawa
test). The models of substitution provided by TREE-PUZZLE are TN, HKY, F84,
and SH for nucleotides; Dayhoff, JTT, mtREV24, BLOSUM 62, VT, and WAG for amino acids; and
F81 for two-state data. Rate heterogeneity is modelled by a discrete Gamma
distribution and by allowing invariable sites. The corresponding parameters
can be inferred from the data set.

<P>TREE-PUZZLE is available free of charge from
<DL>
<DD>
<A HREF="http://www.tree-puzzle.de/">http://www.tree-puzzle.de/</A> (TREE-PUZZLE home page)
</DD>

<DD>
<A HREF="http://www.dkfz-heidelberg.de/tbi/tree-puzzle/">http://www.dkfz-heidelberg.de/tbi/tree-puzzle/</A> (TREE-PUZZLE home page mirror at DKFZ)
</DD>

<DD>
<A HREF="http://iubio.bio.indiana.edu/soft/molbio/evolve">http://iubio.bio.indiana.edu/soft/molbio/evolve</A>
(IUBio archive www, USA)
</DD>

<DD>
<A HREF="ftp://iubio.bio.indiana.edu/molbio/evolve">ftp://iubio.bio.indiana.edu/molbio/evolve</A>
(IUBio archive ftp, USA)
</DD>

<DD>
<A HREF="ftp://ftp.ebi.ac.uk/pub/software">ftp://ftp.ebi.ac.uk/pub/software</A>
(European Bioinformatics Institute, UK)
</DD>

<DD>
<A HREF="ftp://ftp.pasteur.fr/pub/GenSoft">ftp://ftp.pasteur.fr/pub/GenSoft</A>
(Institut Pasteur, France)
</DD>

</DL>
TREE-PUZZLE is written in ANSI C. It will run on most personal computers and
workstations if compiled with an appropriate C compiler.
The tree reconstruction part of TREE-PUZZLE has been parallelized using
the Message Passing Interface (MPI)
library standard (<A HREF="#snir1998">Snir et al.</A>, 1998 and
<A HREF="#gropp1998">Gropp et al.</A>, 1998). To run
TREE-PUZZLE in parallel you also need an implementation of the MPI library on your
system.

<P>Please read the <A HREF="#Installation">installation section</A>
for more details.

<P>We suggest reading this documentation before using TREE-PUZZLE
for the first time. If you do not have the time to read this manual completely,
please read at least the sections <A HREF="#Input/Output Conventions">Input/Output
Conventions</A> and <A HREF="#Quick Start">Quick Start</A> below. Then
you should be able to use the TREE-PUZZLE program, especially if you have some
experience with the PHYLIP programs. The other sections can be read
at a later time.

<P>To find out what's new in version 5.0 please read the
<A HREF="#Version History">Version History</A>.

<P>
<HR ALIGN=center WIDTH="100%" SIZE=2>
<CENTER><H2>Contents</H2></CENTER><P>

<UL>
<LI><A HREF="#Legal Stuff">Legal Stuff</A></LI>

<LI><A HREF="#Installation">Installation</A>
<UL>
<LI><A HREF="#Unix">UNIX</A></LI>
<LI><A HREF="#MacOS">MacOS</A></LI>
<LI><A HREF="#Win32">Windows 95/98/NT</A></LI>
<LI><A HREF="#VMS">VMS</A></LI>
<LI><A HREF="#MPI">Parallel TREE-PUZZLE</A></LI>
</UL>
</LI>

<LI><A HREF="#Introduction">Introduction</A></LI>
<LI><A HREF="#Input/Output Conventions">Input/Output Conventions</A>
<UL>
<LI><A HREF="#Sequence Input">Sequence Input</A></LI>
<LI><A HREF="#General Output">General Output</A></LI>
<LI><A HREF="#Distance Output">Distance Output</A></LI>
<LI><A HREF="#Tree Output">Tree Output</A></LI>
<LI><A HREF="#Tree Input">Tree Input</A></LI>
<LI><A HREF="#Likelihood Mapping Output">Likelihood Mapping Output</A></LI>
</UL>
</LI>

<LI><A HREF="#Quick Start">Quick Start</A></LI>
<LI><A HREF="#Models of Sequence Evolution">Models of Sequence Evolution</A>
<UL>
<LI><A HREF="#Models of Substitution">Models of Substitution</A></LI>
<LI><A HREF="#Models of Rate Heterogeneity">Models of Rate Heterogeneity</A></LI>
</UL>
</LI>

<LI><A HREF="#Options Available">Available Options</A></LI>
<LI><A HREF="#Other Features">Other Features</A></LI>
<LI><A HREF="#Interpretation and Hints">Interpretation and Hints</A>
<UL>
<LI><A HREF="#Quartet Puzzling Support Values">Quartet Puzzling Support Values</A></LI>
<LI><A HREF="#Percentage of Unresolved Quartets">Percentage of Unresolved Quartets</A></LI>
<LI><A HREF="#Automatic Parameter Estimation">Automatic Parameter Estimation</A></LI>
<LI><A HREF="#Likelihood Mapping">Likelihood Mapping</A></LI>
<LI><A HREF="#Batch Mode">Batch Mode</A></LI>
</UL>
</LI>
<LI><A HREF="#Limits and Error Messages">Limits and Error Messages</A></LI>
<LI><A HREF="#Are Quartets Reliable">Are Quartets Reliable?</A></LI>
<LI><A HREF="#Other Programs">Other Programs</A></LI>
<LI><A HREF="#Acknowledgements">Acknowledgements</A></LI>
<LI><A HREF="#References">References</A></LI>
<LI><A HREF="#Known Bugs">Known Bugs</A></LI>
<LI><A HREF="#Version History">Version History</A></LI>
</UL>

<HR>
<H2>
<A NAME="Legal Stuff"></A>Legal Stuff</H2>
TREE-PUZZLE 5.0 is (c) 1999-2000 Heiko A. Schmidt, Korbinian Strimmer, Martin Vingron, and Arndt von Haeseler.<BR>
Earlier PUZZLE versions were (c) 1995-1999 by Korbinian Strimmer and Arndt von Haeseler.<BR>
The software and its accompanying documentation are provided
"as is", without guarantee of support or maintenance. The whole package is
licensed under the GNU General Public License, except for the parts indicated in
the sources where the copyright of the authors does not apply. Please see
<A
HREF="http://www.opensource.org/licenses/gpl-license.html">http://www.opensource.org/licenses/gpl-license.html</A> for details.

<H2>
<A NAME="Installation"></A>Installation</H2>
The source code of the TREE-PUZZLE software is 100% identical across platforms.
However, the installation procedures differ.

<H3>
<A NAME="Unix"></A>UNIX</H3>
Get the file <B>tree-puzzle-5.0.tar</B>. If you received a compressed tar file
(<B>tree-puzzle-5.0.tar.Z</B> or <B>tree-puzzle-5.0.tar.gz</B>), decompress
it first (using the "uncompress" or "gunzip" command). Then unpack the archive
with
<PRE> tar xvf tree-puzzle-5.0.tar</PRE>
The newly created directory "tree-puzzle-5.0" contains four subdirectories called
"doc", "data", "bin", and "src". The "doc" directory
contains this manual in HTML format. The "data"
directory contains example input files. The "src" directory contains the
ANSI C sources of TREE-PUZZLE. Change into the newly created top-level directory by typing
<PRE> cd tree-puzzle-5.0</PRE>
To compile, we recommend the GNU gcc (or GNU egcs) compiler. If gcc is installed,
just type
<PRE> sh ./configure</PRE>
<PRE> make</PRE>
<PRE> make install</PRE>
and the executable <TT>puzzle</TT> is compiled and installed into the <TT>/usr/local/bin</TT> directory.
To install <TT>puzzle</TT> into a different directory, add the
<TT>--prefix=/name/of/the/wanted/directory</TT> option to the
<TT>sh ./configure</TT> command line.
The parallel version should have been built and installed as well, if <TT>configure</TT>
found a known MPI compiler (cf. <A HREF="#MPI">Parallel TREE-PUZZLE</A> section).


Then type
<PRE> make clean</PRE>
and everything will be nicely cleaned up.

If you are not using the GNU gcc compiler and <TT>configure</TT> does not find your compiler, you have to
specify it by setting the <TT>CC</TT> variable (e.g. <TT>setenv CC cc</TT> under <TT>csh</TT> or
<TT>CC=cc; export CC</TT> under <TT>sh</TT>) before running <TT>sh ./configure</TT>.
If you still cannot compile properly, your compiler or its runtime library
is most probably not ANSI compliant (e.g., old SUN compilers). In most
cases, however, you will succeed in compiling by changing some parameters
in the "makefile". Ask your local Unix expert for help.

<H3>
<A NAME="MacOS"></A>MacOS</H3>
Get the file <B>tree-puzzle-5.0.hqx</B>. After decoding this BinHex file (this
is done automatically on a properly installed system; otherwise use programs
like "StuffIt Expander" or ask your local Mac expert) you will find a folder
called "tree-puzzle-5.0" on your hard disk. This folder contains the four subfolders
"doc", "data", "bin", and "src". The "doc" folder contains
this manual in HTML format. The "data" folder contains
example input files. The "bin" folder contains a Macintosh PPC executable
with a default memory partition of 3000K.
There is no 68k executable. <u>If you get a memory allocation error while running
TREE-PUZZLE, increase TREE-PUZZLE's memory partition with the "Get Info" command
of the Macintosh Finder</u>. The "src" folder contains the ANSI C sources of TREE-PUZZLE.

<P>The MacOS executables have been compiled for the PowerMac using Metrowerks CodeWarrior.

<P>Note: It is probably a good idea to install PPC Linux (or MkLinux) on your Macintosh.
TREE-PUZZLE (like any other program) runs 20-50% faster under Linux than
under MacOS (on the same machine!), and the Mac does not freeze
during execution because of Linux's multitasking capabilities (this may change with MacOS X).


<H3>
<A NAME="Win32"></A>Windows 95/98/NT</H3>

Get the file <B>tree-puzzle-5.0.zip</B>. After uncompressing it (using, e.g., WinZip
or a similar tool), a directory "tree-puzzle-5.0" is created containing
four subdirectories called "doc", "data", "bin", and "src". The "doc" directory
contains this manual in HTML format. The "data"
directory contains example input files. The "src" directory contains the
ANSI C sources of TREE-PUZZLE. The "bin" directory contains the executable
<TT>puzzle.exe</TT>. To use TREE-PUZZLE, the system path to the executable
needs to be set correctly. Ask your local Windows expert for help.

<P>The executable has been compiled using
Microsoft Visual C++ and the "makefile.w32" (contained in "src").

<P>If you have a Linux partition on your PC, we recommend
installing and using TREE-PUZZLE under Linux (see <A HREF="#Unix">Unix</A> section), because
TREE-PUZZLE runs significantly faster there than under Windows.

<H3>
<A NAME="VMS"></A>VMS</H3>


<P>Get the Unix sources and install the package on your computer
(ask your local VMS expert for help). Go to the subdirectory
"src" and compile TREE-PUZZLE using the command file "makefile.com".

<H3>
<A NAME="MPI"></A>Parallel TREE-PUZZLE</H3>


<P>To compile and run the parallelized TREE-PUZZLE you need an implementation
of the Message Passing Interface (MPI) library, a widely used
message passing library standard. Implementations of the MPI libraries
are available for almost all parallel platforms and computer systems,
and there are free implementations for most platforms as well.

<P>To find an MPI implementation suitable for your platform visit
the following web sites:
<UL>
<LI><A HREF="http://www-unix.mcs.anl.gov/mpi/implementations.html">http://www-unix.mcs.anl.gov/mpi/implementations.html</A>
<LI><A HREF="http://WWW.ERC.MsState.Edu/labs/hpcl/projects/mpi/implementations.html">http://WWW.ERC.MsState.Edu/labs/hpcl/projects/mpi/implementations.html</A>
<LI><A HREF="http://www.mpi.nd.edu/MPI/">http://www.mpi.nd.edu/MPI/</A>
</UL>

Although MPI is also available on Macintosh and Windows systems,
the developers have never run the parallel version on those
platforms.

<P>To install the parallel version of TREE-PUZZLE you need the
Unix sources of TREE-PUZZLE, installed on your computer
as described above.
The <TT>configure</TT> script should set up the Makefiles appropriately.
If no known MPI compiler is found on the system, the parallel
version is not configured.
(If problems occur, ask your local system administrator for help.)

<P>Then you should be able to compile the parallel version of TREE-PUZZLE
using the following commands:
<PRE> sh ./configure</PRE>
<PRE> make</PRE>
<PRE> make install</PRE>
and the executable <TT>ppuzzle</TT> is compiled and installed into the <TT>/usr/local/bin</TT> directory.
If you want to have the executable installed into another directory, please proceed as
described in the <A HREF="#Unix">Unix</A> section.

If your MPI compiler is none of <TT>mpcc</TT> (IBM), <TT>hcc</TT> (LAM),
<TT>mpicc_lam</TT> (LAM under LINUX), <TT>mpicc_mpich</TT> (MPICH under LINUX),
or <TT>mpicc</TT> (LAM, MPICH, HP-UX, etc.) and is not found by <TT>configure</TT>, you have to
specify it by setting the <TT>MPICC</TT> variable (e.g. <TT>setenv MPICC /another/mpicc</TT>
under <TT>csh</TT> or <TT>MPICC=/another/mpicc; export MPICC</TT> under <TT>sh</TT>)
before running <TT>sh ./configure</TT>.

How you have to start <TT>ppuzzle</TT> depends on the MPI implementation
installed, so please refer to your MPI manual or ask your local MPI expert
for help.

<P><B>Note:</B>
<BR>The parallelization of the tree reconstruction method follows a
master-worker concept, i.e., a master process schedules
the computation among the <em>n</em> worker processes, while the worker processes
do almost all the computational work of evaluating the quartets and
constructing the puzzling step trees.

<BR>Since the master process does not require much CPU time,
it can share one processor with a worker process.
Thus, you can run <TT>ppuzzle</TT> with <em>n+1</em> processes on <em>n</em> processors.

<BR>If you want to evaluate a usertree or perform a likelihood
mapping analysis, a parallel run is not recommended, because all
the computation will be done by the master process. Hence a run of the
sequential version of TREE-PUZZLE is more appropriate for usertree or likelihood
mapping analysis.

<H2>
<A NAME="Introduction"></A>Introduction</H2>
TREE-PUZZLE is an ANSI C application to reconstruct phylogenetic trees from
molecular sequence data by maximum likelihood. It implements a fast tree
search algorithm, quartet puzzling, that allows analysis of large data
sets and automatically assigns estimates of support to each internal
branch. Rate heterogeneity (invariable sites plus Gamma distributed rates)
is incorporated in all available models of substitution (nucleotides: SH,
TN, HKY, F84, and submodels; amino acids: Dayhoff, JTT, mtREV24, BLOSUM
62, VT, and WAG; two-state data: F81). All parameters, including those of rate heterogeneity, can
be estimated from the data by maximum likelihood. TREE-PUZZLE also
computes pairwise maximum likelihood distances as well as branch lengths
for user-specified trees. In addition, TREE-PUZZLE offers a novel method, likelihood
mapping, to investigate the support of internal branches without computing
an overall tree.
<H2>
<A NAME="Input/Output Conventions"></A>Input/Output Conventions</H2>

A few naming conventions have changed compared to
earlier PUZZLE releases (before version 5.0). From version 5.0 onwards
the names of the sequence input file and the usertree file can be specified
on the command line (e.g. '<TT>puzzle infilename intreename</TT>',
where <TT>infilename</TT> is the name of the sequence file and <TT>intreename</TT>
is the name of the usertree file).
If only the sequence input filename, or no filename at all, is given on the
command line, TREE-PUZZLE looks for the missing input files under the default names
"<TT>infile</TT>" and/or "<TT>intree</TT>", respectively.

<P>The naming conventions of the output files have changed as well.
The name of the sequence input file (or of the usertree file in a usertree
analysis) is used as the prefix of every output filename, and an
extension is added to denote the content of the file. If no input filename
is given on the command line, the default filenames of the earlier
versions are used.

The following extensions/default filenames are possible:
<DL><DT><DD>
<TABLE><TR><TD><B>Extension</B></TD><TD><B>Default filename</B></TD><TD><B>File content</B></TD></TR>
<TR><TD><TT>.puzzle </TT></TD><TD><TT>outfile </TT></TD><TD>the TREE-PUZZLE report</TD></TR>
<TR><TD><TT>.dist </TT></TD><TD><TT>outdist </TT></TD><TD>the ML distances</TD></TR>
<TR><TD><TT>.tree </TT></TD><TD><TT>outtree </TT></TD><TD>the final tree(s)</TD></TR>
<TR><TD><TT>.qlist </TT></TD><TD><TT>outqlist </TT></TD><TD>the list of unresolved quartets</TD></TR>
<TR><TD><TT>.ptorder</TT></TD><TD><TT>outptorder </TT></TD><TD>the list of unique puzzling step tree topologies</TD></TR>
<TR><TD><TT>.pstep </TT></TD><TD><TT>outpstep </TT></TD><TD>the list of puzzling step tree topologies in chronological order</TD></TR>
<TR><TD><TT>.eps </TT></TD><TD><TT>outlm.eps </TT></TD><TD>the EPS file generated in the likelihood mapping analysis</TD></TR>
</TABLE></DL>

The file types are described in detail below. In the following,
"INFILENAME" denotes the prefix, which is the sequence input filename
or the usertree filename, respectively.
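<P>The naming rule and the table above can be summarized in a short Python sketch. The helper <TT>output_filenames</TT> and its dictionary keys are hypothetical names chosen for illustration. They are not part of TREE-PUZZLE, which is written in C. The sketch assumes that the prefix is the full input filename with the extension simply appended.

```python
# Hypothetical helper mirroring the extension/default-filename table.
# Sketch only: not taken from the TREE-PUZZLE sources.
from typing import Dict, Optional

# file kind -> (extension, default filename used when no name was given)
OUTPUT_FILES = {
    "report": (".puzzle", "outfile"),
    "distances": (".dist", "outdist"),
    "trees": (".tree", "outtree"),
    "unresolved_quartets": (".qlist", "outqlist"),
    "unique_topologies": (".ptorder", "outptorder"),
    "puzzling_steps": (".pstep", "outpstep"),
    "likelihood_mapping": (".eps", "outlm.eps"),
}


def output_filenames(prefix: Optional[str]) -> Dict[str, str]:
    """Return the output filename for each kind of output file.

    prefix is INFILENAME: the sequence input filename, or the usertree
    filename in a usertree analysis. None means that no name was given on
    the command line, so the pre-5.0 default names are used instead.
    """
    if prefix is None:
        return {kind: default for kind, (_, default) in OUTPUT_FILES.items()}
    return {kind: prefix + ext for kind, (ext, _) in OUTPUT_FILES.items()}


if __name__ == "__main__":
    print(output_filenames("globin.a")["report"])        # globin.a.puzzle
    print(output_filenames(None)["likelihood_mapping"])  # outlm.eps
```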

<H3>
<A NAME="Sequence Input"></A>Sequence Input</H3>
TREE-PUZZLE requires sequence input in PHYLIP INTERLEAVED format (sometimes
also called PHYLIP 3.4 format). Many sequence editors and alignment programs
(e.g., CLUSTAL W) output data in this format. The "data" directory
contains four example input files ("globin.a", "marswolf.n", "atp6.a",
"primates.b") that can be used as templates for your own data files.
The default name of the sequence input file is "infile", if no
input filename is given on the command line.
If neither the named file nor "infile" is present, TREE-PUZZLE
will request an alternative file name. Sequence names in the
input file may contain blanks, but all blanks are internally
converted to underscores "_". Sequences can be in upper or lower case;
any spaces or control characters are ignored. The dot "." denotes
a character identical to the one at the same position in the first sequence; it can be used in all sequences except
the first one. Valid symbols for nucleotides are A, C, G, T, and
U, and for amino acids A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S,
T, V, W, and Y. All other visible characters (including gaps, question
marks, etc.) are treated as N (DNA/RNA) or X (amino acids). For two-state
data the symbols 0 and 1 are allowed. The first sequence in the data set is
considered the default outgroup.
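<P>The following Python sketch shows how these input rules could be applied to (name, sequence) pairs that have already been read from the file. It illustrates the rules only and is not the TREE-PUZZLE reader. The function names are hypothetical, two-state data are not handled, and parsing the PHYLIP interleaved layout itself is left out.

```python
# Sketch of the sequence-cleaning rules described above (assumed behaviour,
# not TREE-PUZZLE code): blanks in names become "_", residues are
# upper-cased, whitespace/control characters are dropped, "." copies the
# residue of the first sequence, and unknown symbols become N or X.
from typing import List, Tuple

NUCLEOTIDES = set("ACGTU")
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def clean_name(name: str) -> str:
    """Blanks in sequence names are converted to underscores."""
    return name.strip().replace(" ", "_")


def clean_sequences(records: List[Tuple[str, str]],
                    data_type: str) -> List[Tuple[str, str]]:
    """records: (name, raw sequence) pairs in input order.
    data_type: "nucleotide" or "amino acid" (two-state data not handled)."""
    if data_type == "nucleotide":
        valid, unknown = NUCLEOTIDES, "N"
    else:
        valid, unknown = AMINO_ACIDS, "X"
    cleaned = []
    first = ""
    for index, (name, raw) in enumerate(records):
        # Accept lower case; ignore spaces and control characters.
        residues = [c.upper() for c in raw if c.isprintable() and not c.isspace()]
        out = []
        for pos, c in enumerate(residues):
            if c == ".":
                if index == 0:
                    raise ValueError("'.' is not allowed in the first sequence")
                c = first[pos]  # same character as in the first sequence
            out.append(c if c in valid else unknown)
        seq = "".join(out)
        if index == 0:
            first = seq
        cleaned.append((clean_name(name), seq))
    return cleaned
```

For example, the pairs ("Homo sapiens", "acg t-") and ("Pan", "..A?.") become ("Homo_sapiens", "ACGTN") and ("Pan", "ACANN").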
<H3>
<A NAME="General Output"></A>General Output</H3>
All results are written to the TREE-PUZZLE report file (INFILENAME.puzzle or
outfile). If the option "List all unresolved quartets" is selected, a file
called "INFILENAME.qlist"/"outqlist" is created that lists all these quartets.
Depending on the setting of the option "List puzzling step trees", the files
"INFILENAME.pstep"/"outpstep" and/or "INFILENAME.ptorder"/"outptorder" are
generated.

<P>The "INFILENAME.ptorder"/"outptorder" file contains the unique tree
topologies in PHYLIP format, each preceded by a PHYLIP-format comment (in square brackets).
A typical line in the ptorder file looks like this:

<P><TT>[ 2. 60 6.00 2 5 1000 ](chicken,((cat,(horse,(mouse,rat))),(opossum,platypus)));</TT></P>

The entries in the brackets (separated by single blanks) mean the following:
<UL>
<LI><B>2.</B> - The topology is the second most frequent among all
intermediate tree topologies (order number).
<LI><B>60</B> - The topology occurs 60 times.
<LI><B>6.00</B> - The topology occurs in 6.00 % of the intermediate trees.
<LI><B>2</B> - Unique topology ID (needed for the pstep file).
<LI><B>5</B> - Number of unique topologies that occurred.
<LI><B>1000</B> - Total number of intermediate trees estimated during the analysis.
</UL>
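<P>A line of the ptorder file can be split into the fields listed above with a few lines of Python. This minimal sketch assumes exactly the layout of the example: one bracketed comment with six blank-separated fields, followed by the tree. <TT>PtorderEntry</TT> and <TT>parse_ptorder_line</TT> are hypothetical names.

```python
# Minimal parser for one line of INFILENAME.ptorder / outptorder,
# assuming the "[ order. count percent id n_unique n_trees ]tree" layout.
from dataclasses import dataclass


@dataclass
class PtorderEntry:
    order: int          # rank of the topology by frequency (the "2.")
    count: int          # how often the topology occurred
    percent: float      # percentage of all intermediate trees
    topology_id: int    # unique ID, also used in the pstep file
    n_unique: int       # number of unique topologies
    n_trees: int        # total number of intermediate trees
    tree: str           # the topology string, including the final ';'


def parse_ptorder_line(line: str) -> PtorderEntry:
    line = line.strip()
    if not line.startswith("["):
        raise ValueError("expected a leading [ ... ] comment")
    close = line.index("]")
    fields = line[1:close].split()
    if len(fields) != 6:
        raise ValueError("expected six fields in the comment")
    return PtorderEntry(
        order=int(fields[0].rstrip(".")),
        count=int(fields[1]),
        percent=float(fields[2]),
        topology_id=int(fields[3]),
        n_unique=int(fields[4]),
        n_trees=int(fields[5]),
        tree=line[close + 1:].strip(),
    )
```

For the example line, <TT>percent</TT> equals 100 &times; <TT>count</TT> / <TT>n_trees</TT> (100 &times; 60 / 1000 = 6.00).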

<P>The "INFILENAME.pstep"/"outpstep" file contains a log of the
puzzling steps performed and the tree topologies that occurred.

A typical line in the pstep file contains the following entries
(separated by tabs):

<P><TT>"6. 55 698 3 5 828"</TT></P>

The entries mean the following:
<UL>
<LI><B>6.</B> - 6th block of intermediate trees performed.
<LI><B>55</B> - Number of intermediate trees inferred in this block.
<LI><B>698</B> - Occurrences of this topology so far.
<LI><B>3</B> - Unique topology ID (for lookup in the ptorder file).
<LI><B>5</B> - Number of unique topologies that occurred so far.
<LI><B>828</B> - Number of puzzling steps performed so far.
</UL>
In a sequential run (<TT>puzzle</TT>) the entries of this
file are more finely resolved, because every block consists of one intermediate tree.
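<P>A row of the pstep file can be read with the Python sketch below. It assumes the six fields described above and accepts any whitespace (tabs or blanks) as separator. The names are hypothetical. The <TT>topology_id</TT> field is the key that links a pstep row to the matching entry in the ptorder file.

```python
# Minimal parser for one row of INFILENAME.pstep / outpstep, assuming the
# six fields described above (tabs in the file; any whitespace accepted).
from typing import NamedTuple


class PstepRow(NamedTuple):
    block: int             # number of the block of intermediate trees
    trees_in_block: int    # intermediate trees inferred in this block
    occurrences: int       # occurrences of this topology so far
    topology_id: int       # unique ID, look up in the ptorder file
    n_unique: int          # unique topologies seen so far
    steps_so_far: int      # puzzling steps performed so far


def parse_pstep_line(line: str) -> PstepRow:
    fields = line.split()
    if len(fields) != 6:
        raise ValueError("expected six fields, got %d" % len(fields))
    fields[0] = fields[0].rstrip(".")  # block number is written as "6."
    return PstepRow(*(int(f) for f in fields))
```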

<H3>
<A NAME="Distance Output"></A>Distance Output</H3>
TREE-PUZZLE automatically computes pairwise maximum likelihood distances for
all the sequences in the data file. They are written to the TREE-PUZZLE report
file "INFILENAME.puzzle"/"outfile" and to the separate file
"INFILENAME.dist"/"outdist". The format of the distance file is PHYLIP compatible
(i.e., it can be used directly as input for PHYLIP distance-based programs
such as "neighbor").
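<P>The Python sketch below writes a square distance matrix in the classic PHYLIP layout: the number of sequences on the first line, then one row per sequence that starts with the name padded or truncated to ten characters. This layout is our assumption of what "PHYLIP compatible" means here. The number formatting and line wrapping used by TREE-PUZZLE itself may differ, and <TT>phylip_distance_matrix</TT> is a hypothetical name.

```python
# Sketch of writing a square distance matrix in the classic PHYLIP layout
# (assumption: species count on line 1, then each row starts with the name
# padded/truncated to 10 characters). TREE-PUZZLE's own number formatting
# and line wrapping may differ.
from typing import List, Sequence


def phylip_distance_matrix(names: Sequence[str],
                           dist: Sequence[Sequence[float]]) -> str:
    n = len(names)
    if len(dist) != n or any(len(row) != n for row in dist):
        raise ValueError("distance matrix must be n x n")
    lines: List[str] = ["%5d" % n]
    for name, row in zip(names, dist):
        label = name[:10].ljust(10)  # fixed-width name field
        lines.append(label + " " + " ".join("%.5f" % d for d in row))
    return "\n".join(lines) + "\n"
```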
<H3>
<A NAME="Tree Output"></A>Tree Output</H3>
The quartet puzzling tree with its support values
and with maximum likelihood branch lengths is displayed as an ASCII drawing
in the TREE-PUZZLE report in "INFILENAME.puzzle"/"outfile". The same tree
is written to the "INFILENAME.tree"/"outtree" file in CLUSTAL W format.
If clock-like maximum likelihood branch lengths are computed,
there will be both an unrooted and a rooted tree in
"INFILENAME.puzzle"/"outfile". The tree convention follows the Newick format
(as implemented in PHYLIP or CLUSTAL W): the tree topology is described
by the usual round brackets
<TT>(a,b,(c,d));</TT>
and branch lengths are written after a colon, e.g. <TT>a:0.22,b:0.33</TT>.
Support values for each branch
are displayed as internal node labels, i.e., each value directly follows the
closing bracket of its node and precedes the branch length leading to that node. Here is an example:

<P>(Gibbon:0.1393, ((Human:0.0414, Chimpanzee:0.0538)99:0.0175, Gorilla:0.0577)98:0.0531,
Orangutan:0.1003);

<P>The likelihood value of each tree is added in square brackets before
the tree string (e.g. "[ lh=-1621.201605 ]"). Square brackets mark comments
in the Newick or PHYLIP tree format. In some cases the
comment has to be removed before the tree can be used with other programs.
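<P>The Python sketch below shows one way to remove the bracketed comment and to read the support value and branch length that follow each internal node, using the Gibbon/Human example above. It relies on simple regular expressions and assumes that labels are plain numbers. It is not a general Newick parser, and the function names are hypothetical.

```python
# Sketch (not TREE-PUZZLE code): drop "[ ... ]" comments such as
# "[ lh=-1621.201605 ]" and list the ")support:length" pairs that follow
# each labelled internal node.
import re
from typing import List, Optional, Tuple

COMMENT = re.compile(r"\[[^\]]*\]")
INTERNAL_LABEL = re.compile(r"\)\s*([0-9.]+)\s*:\s*([0-9.eE+-]+)")


def strip_comments(tree: str) -> str:
    """Square brackets mark comments in Newick/PHYLIP trees."""
    return COMMENT.sub("", tree).strip()


def log_likelihood(tree: str) -> Optional[float]:
    """Value of the "lh=" comment, or None if there is none."""
    match = re.search(r"lh=\s*(-?[0-9.]+)", tree)
    return float(match.group(1)) if match else None


def internal_support(tree: str) -> List[Tuple[float, float]]:
    """(support value, branch length) for each labelled internal node."""
    return [(float(s), float(b))
            for s, b in INTERNAL_LABEL.findall(strip_comments(tree))]
```

For the example tree, <TT>internal_support</TT> returns the pairs (99, 0.0175) and (98, 0.0531).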

<P>With the programs
<a href="http://taxonomy.zoology.gla.ac.uk/rod/treeview.html">TreeView</a> and
<a href="ftp://rdp.life.uiuc.edu/pub/RDP/programs/TreeTool/">TreeTool</a>
it is possible to view a tree
with its branch lengths and, simultaneously, the support values for the internal
branches (here 98% and 99%). Note that the PHYLIP programs DRAWTREE and DRAWGRAM may
also be used with the CLUSTAL W treefile format. However, in the current version
(3.5) they ignore the internal labels and simply print the tree
topology along with branch lengths.

<H3>
<A NAME="Tree Input"></A>Tree Input</H3>
TREE-PUZZLE can optionally also read input trees. The default name of the file
containing the input tree is "intree", if no name is given on the command line.
If you choose the input tree option and neither the named file nor
"intree" is present, you will be prompted for an alternative
name. The format of the input trees is identical to that of the trees in the
"INFILENAME.tree"/"outtree" file.
However, it is sufficient to provide the tree topology only; you
don't need to specify branch lengths (which are ignored anyway) or
internal labels (which are read, stored, and written back to the
"INFILENAME.tree"/"outtree" file).
The input trees need not be unrooted; they can also be rooted. It is
important that sequence names in the input tree file do not contain blanks
(use underscores!). The trees can be multifurcating.
The format of the tree input file is simple: just put the
trees into the file. TREE-PUZZLE counts the ';' at the end of each tree description
to determine how many input trees there are. Any header (e.g., with the
number of trees) is ignored (this is useful in conjunction with programs
like MOLPHY that need this header). If there is more than one tree, TREE-PUZZLE
performs the Kishino-Hasegawa test.
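<P>The counting rule can be illustrated with a small Python sketch (an illustration, not TREE-PUZZLE's parser). The text is split at each ';', and anything before the first '(' of a tree is skipped, such as a MOLPHY-style tree count or a "[ lh=... ]" comment. The sketch assumes that such comments contain no '(' or ';'. <TT>split_usertrees</TT> is a hypothetical name.

```python
# Sketch of the counting rule: every tree ends with ';', and text before
# the first '(' of a tree (e.g. a tree-count header) is ignored.
from typing import List


def split_usertrees(text: str) -> List[str]:
    trees = []
    for chunk in text.split(";")[:-1]:  # text after the last ';' is no tree
        start = chunk.find("(")
        if start == -1:
            raise ValueError("tree description without '(': %r" % chunk.strip())
        trees.append(chunk[start:].strip() + ";")
    return trees
```

For a file containing "2", "(a,b,(c,d));" and "(a,(b,c),d);" on separate lines, the sketch finds two trees. As described above, TREE-PUZZLE would then perform the Kishino-Hasegawa test.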
<H3>
<A NAME="Likelihood Mapping Output"></A>Likelihood Mapping Output</H3>
TREE-PUZZLE also offers likelihood mapping analysis, a method to investigate
support for internal branches of a tree without computing an overall tree
and to graphically visualize the
phylogenetic content of a sequence alignment. The results of likelihood
mapping are written in ASCII to "INFILENAME.puzzle"/"outfile" as well
as to a file called "INFILENAME.eps" or "outlm.eps", respectively.
This file contains, in Encapsulated PostScript format (EPSF),
a picture of the triangle that forms the basis of the likelihood mapping
analysis. You may print it on a PostScript-capable printer or view
it with a suitable program. The "INFILENAME.eps"/"outlm.eps" file can be
edited by hand (it is plain ASCII text!) or with drawing programs that
understand the PostScript language (e.g., Adobe Illustrator).
<H2>
<A NAME="Quick Start"></A>Quick Start</H2>
Prepare your sequence input file and, optionally, your tree input
file. Then start the TREE-PUZZLE program. TREE-PUZZLE automatically
chooses the nucleotide or the amino acid mode. If more than 85% of
the characters (not counting - and ?) in the sequences are A, C, G,
T, U, or N, the sequences are assumed to consist of nucleotides.
If your data set contains amino acids, TREE-PUZZLE suggests whether you have
amino acids encoded on mtDNA or on nuclear DNA, and selects the appropriate
model of amino acid evolution. If your data set contains nucleotides, the
default model of sequence evolution is the HKY model. Parameters
need not be specified; they will be estimated from the data by a maximum likelihood
procedure. If TREE-PUZZLE detects a usertree file given on the
command line or one called "intree", it automatically switches to the input
tree mode.
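<P>The 85% rule for choosing between nucleotide and amino acid mode can be sketched in Python as follows. Several details are assumptions of this sketch: counting is case-insensitive, whitespace is also skipped, and input without any counted characters returns <TT>False</TT>. <TT>looks_like_nucleotides</TT> is a hypothetical name.

```python
# Sketch of the nucleotide/amino acid guess: more than 85% of the counted
# characters (ignoring '-' and '?') must be A, C, G, T, U or N.
from typing import Iterable

NUCLEOTIDE_LIKE = set("ACGTUN")
IGNORED = set("-?")


def looks_like_nucleotides(sequences: Iterable[str],
                           threshold: float = 0.85) -> bool:
    counted = 0
    nucleotide_like = 0
    for seq in sequences:
        for c in seq.upper():
            if c in IGNORED or c.isspace():
                continue  # gaps, question marks and blanks are not counted
            counted += 1
            if c in NUCLEOTIDE_LIKE:
                nucleotide_like += 1
    return counted > 0 and nucleotide_like > threshold * counted
```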
566
567	<P>Then, a menu (PHYLIP "look and feel") appears with default options set.
568	It is possible to change all available options. For example, if you want
569	to incorporate rate heterogeneity you have to select option "w" as rate
570	heterogeneity is switched off by default. Then type "y" at the input prompt
571	and start the analysis. You will see a number of status messages on the
572	screen during computation. When the analysis is finished all output files
573	(e.g., "outfile", "outtree", "outdist", "outqlist", "outlm.eps", "outpstep",
574	"outptlist" or "INFILENAME.puzzle", "INFILENAME.tree", "INFILENAME.dist",
575	"INFILENAME.qlist", "INFILENAME.eps", "INFILENAME.pstep", "INFILENAME.ptorder")
576	will be in the same directory as the input files.
577
578	<P>To obtain a high quality picture of the output tree (including node labels)
579	you might want to use use the TreeView program by Roderic Page. It is
580	available free of charge and runs on MacOS and MS-Windows. It can be retrieved
581	from <A HREF="http://taxonomy.zoology.gla.ac.uk/rod/treeview.html">http://taxonomy.zoology.gla.ac.uk/rod/treeview.html</A>.
582	TreeView understands the CLUSTAL W treefile conventions, reads multifurcating
583	trees and is able to simultaneously display branch lengths and support values
584	for each branch. Open the "INFILENAME.tree"/"outtree" file with TreeView,
585	choose "Phylogram" to draw branch lengths, and select "Show internal edge
586	labels".
587
588	<P>On a Unix you can use the TreeTool program to display and
589	manipulate TREE-PUZZLE trees (See <A HREF="ftp://rdp.life.uiuc.edu/pub/RDP/programs/TreeTool/">ftp://rdp.life.uiuc.edu/pub/RDP/programs/TreeTool</A>
590	for precompiled Sun executables. A version that runs on Linux has been prepared by
591	<A HREF="mailto:cato@biochem.kth.se">Anders Holmberg</A> from the Dept. of Biochemistry at
592	the Royal Institute of Technology, Stockholm).
593
594	<H2>
595	<A NAME="Models of Sequence Evolution"></A>Models of Sequence Evolution</H2>
Here we give a brief overview of the models implemented in TREE-PUZZLE. Formulas
597	are written in TeX style.
598	<H3>
599	<A NAME="Models of Substitution"></A>Models of Substitution</H3>
The substitution process is modelled as a reversible, time-homogeneous, stationary
Markov process. If the corresponding stationary nucleotide (amino acid)
frequencies are denoted pi_i, the most general rate matrix for the transition
from nucleotide (amino acid) i to j can be written as
604	<PRE>
R_{ij} = \begin{cases}
  Q_{ij} \pi_j             & \text{for } i \neq j \\
  -\sum_{m} Q_{im} \pi_m   & \text{for } i = j
\end{cases}
608	</PRE>
609	The matrix Q_{ij} is symmetric with Q_{ii} == 0 (diagonals are zero). For
610	nucleotides the most general model built into TREE-PUZZLE is the Tamura-Nei
611	model (TN, <A HREF="#tamura1993">Tamura and Nei</A>, 1993).
612	The matrix Q_{ij} for this model equals
613	<PRE>
Q_{ij} = \begin{cases}
  \frac{4 t \gamma}{\gamma + 1}  & \text{for } i \to j \text{ pyrimidine transition} \\
  \frac{4 t}{\gamma + 1}         & \text{for } i \to j \text{ purine transition} \\
  1                              & \text{for } i \to j \text{ transversion}
\end{cases}
619	</PRE>
The parameter gamma is called the "Y/R transition parameter", whereas t
is the "Transition/transversion parameter". If gamma is equal to 1, we
get the HKY model (<A HREF="#hasegawa1985">Hasegawa et al.</A>, 1985).
Note that the ratio of the transition and transversion
rates (ignoring base frequencies) is kappa = 2t. There is a subtle but important
difference between the <I>transition-transversion parameter</I>, the
<I>expected transition-transversion ratio</I>, and the <I>observed
transition-transversion ratio</I>.
The <I>transition-transversion parameter</I> is simply a parameter of the
rate matrix. The <I>expected transition-transversion ratio</I> is the ratio of
actually occurring transitions to actually occurring transversions, taking
into account the nucleotide frequencies in the alignment. Due to saturation
and multiple hits, not all substitutions are observable; the <I>observed
transition-transversion ratio</I> therefore counts observable transitions and transversions
only. If the base frequencies in the HKY model are equal (pi_i =
0.25), HKY further reduces to the Kimura model. In this case t is identical
to the expected transition/transversion ratio. If t is additionally set to 0.5, the Jukes-Cantor
637	model is obtained. The F84 model (as implemented in the various PHYLIP
638	programs, <A HREF="#felsenstein1984">Felsenstein</A>, 1984)
639	is a special case of the Tamura-Nei model.
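<P>The following Python sketch (an illustration, not TREE-PUZZLE's source code) assembles R_{ij} from the Tamura-Nei Q_{ij} above and computes the expected transition/transversion ratio. As an assumption, that ratio is taken here as the stationary flow pi_i R_{ij} summed over all transitions divided by the same sum over all transversions. The nucleotide order A, C, G, T and all function names are hypothetical. With equal base frequencies and gamma = 1 the result equals t, as stated above, and t = 0.5 makes every off-diagonal Q_{ij} equal to 1.

```python
BASES = "ACGT"          # assumed order; purines are A and G, pyrimidines C and T
PURINES = set("AG")


def tn_q(t, gamma):
    """Q_ij of the Tamura-Nei model; gamma = 1 gives the HKY model."""
    Q = [[0.0] * 4 for _ in range(4)]
    for i, a in enumerate(BASES):
        for j, b in enumerate(BASES):
            if a == b:
                continue                                    # Q_ii = 0
            if (a in PURINES) != (b in PURINES):
                Q[i][j] = 1.0                               # transversion
            elif a in PURINES:
                Q[i][j] = 4.0 * t / (gamma + 1.0)           # purine transition A <-> G
            else:
                Q[i][j] = 4.0 * t * gamma / (gamma + 1.0)   # pyrimidine transition C <-> T
    return Q


def rate_matrix(Q, pi):
    """R_ij = Q_ij pi_j for i != j and R_ii = -sum_m Q_im pi_m."""
    n = len(pi)
    R = [[Q[i][j] * pi[j] for j in range(n)] for i in range(n)]
    for i in range(n):
        # Q_ii = 0, so this sum effectively runs over m != i
        R[i][i] = -sum(Q[i][m] * pi[m] for m in range(n))
    return R


def expected_ts_tv(t, gamma, pi):
    """Sum of pi_i R_ij over transitions divided by the sum over transversions."""
    R = rate_matrix(tn_q(t, gamma), pi)
    ts = tv = 0.0
    for i, a in enumerate(BASES):
        for j, b in enumerate(BASES):
            if i == j:
                continue
            if (a in PURINES) == (b in PURINES):
                ts += pi[i] * R[i][j]
            else:
                tv += pi[i] * R[i][j]
    return ts / tv


if __name__ == "__main__":
    print(expected_ts_tv(2.0, 1.0, [0.25] * 4))            # Kimura case: equals t
    print(expected_ts_tv(2.0, 1.0, [0.1, 0.4, 0.4, 0.1]))  # base frequencies matter
```

Because Q_{ij} is symmetric, every row of R sums to zero and pi_i R_{ij} = pi_j R_{ji} holds, which is the reversibility assumed for the substitution process.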
640
641	<P>For amino acids the matrix Q_{ij} is fixed and does not contain any free
parameters. Depending on the type of input data, six different Q_{ij} matrices
are available in TREE-PUZZLE.
644	The Dayhoff (<A HREF="#dayhoff1978">Dayhoff et al.</A>, 1978) and
645	JTT (<A HREF="#jones1992">Jones et al.</A>, 1992) matrices are for use with
646	proteins encoded on nuclear DNA, the mtREV24 (<A HREF="#adachi1996">Adachi
647	and Hasegawa</A>, 1996) matrix is for use with proteins encoded on mtDNA,
648	and the BLOSUM 62 (<A HREF="#henikoff1992">Henikoff and Henikoff</A>,
649	1992) and the WAG model (<A HREF="#whelan2000">Whelan and Goldman</A>)
650	are for more distantly related amino acid sequences.
The WAG matrix has been inferred from a database of 3905 globular protein
652	sequences, forming 182 distinct gene families spanning a broad range of
653	evolutionary distances (<A HREF="#whelan2000">Whelan and Goldman</A>).
654
<P>The VT model is based on a new estimator for amino acid replacement rates,
the resolvent method. The VT matrix has been computed from a large set of
alignments of varying degrees of divergence. Hence VT is also suitable for
distantly related proteins (<A HREF="#mueller2000">Mueller and Vingron</A>, 2000).
659
660	<P>For doublets (pairs of dependent nucleotides) the SH model
661	(<A HREF="#schoeniger1994">Schoeniger and von Haeseler</A>, 1994) is
662	implemented in TREE-PUZZLE. The corresponding matrix Q_{ij} reads
663	<PRE>
Q_{ij} = \begin{cases}
  2t  & \text{for } i \to j \text{ transition substitution} \\
  1   & \text{for } i \to j \text{ transversion substitution} \\
  0   & \text{for } i \to j \text{ two substitutions}
\end{cases}
669	</PRE>
The SH model is basically an F81 model
671	(<A HREF="#felsenstein1981">Felsenstein</A>, 1981) for single substitutions
672	in doublets.
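<P>For illustration, the Python sketch below enumerates the 16 doublets and fills in Q_{ij} according to the three cases of the SH model. It is not TREE-PUZZLE's code; the doublet order AA, AC, ..., TT and the function names are hypothetical. Every doublet has six neighbours that differ by a single substitution (two transitions and four transversions) and nine neighbours that would require two substitutions and therefore get rate 0.

```python
from itertools import product

BASES = "ACGT"
PURINES = set("AG")


def is_transition(a, b):
    """True for A <-> G and C <-> T."""
    return a != b and (a in PURINES) == (b in PURINES)


def sh_q(t):
    """Return (doublets, Q) for the SH doublet model with parameter t."""
    doublets = ["".join(p) for p in product(BASES, repeat=2)]
    Q = [[0.0] * 16 for _ in range(16)]
    for i, x in enumerate(doublets):
        for j, y in enumerate(doublets):
            diffs = [(a, b) for a, b in zip(x, y) if a != b]
            if len(diffs) != 1:
                continue  # i == j, or two simultaneous substitutions: rate 0
            a, b = diffs[0]
            Q[i][j] = 2.0 * t if is_transition(a, b) else 1.0
    return doublets, Q


if __name__ == "__main__":
    doublets, Q = sh_q(2.0)
    row = dict(zip(doublets, Q[doublets.index("AC")]))
    print({d: r for d, r in row.items() if r > 0})
```

With t = 0.5 all six single-substitution rates equal 1, which is the F81-like case described above.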
673	<H3>
674	<A NAME="Models of Rate Heterogeneity"></A>Models of Rate Heterogeneity</H3>
675	Rate heterogeneity is taken into account by considering invariable sites
676	and by introducing Gamma-distributed rates for the variable sites.
677
678	<P>For invariable sites the parameter theta ("Fraction of invariable sites")
is the probability that a given site is invariable. If a site
is invariable, the probability of the constant site pattern consisting of nucleotide
(amino acid) i is pi_i, the frequency of that nucleotide (amino acid).
682
683	<P>The rates r for variable sites are determined by a discrete Gamma
684	distribution that approximates the continuous Gamma distribution
685	<PRE>
g(r) = \frac{\alpha^{\alpha} \, r^{\alpha - 1}}{e^{\alpha r} \, \Gamma(\alpha)}
691	</PRE>
692	where the parameter alpha ranges from alpha = infinity (no rate heterogeneity)
to alpha < 1 (strong heterogeneity). The expected value of r under this
694	distribution is 1.
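<P>How the discrete categories are chosen is not spelled out here. The following Python sketch assumes equal-probability categories, each represented by the mean rate within it, and approximates them by Monte Carlo sampling with the standard library's <TT>random.gammavariate</TT> (shape alpha, scale 1/alpha, hence mean 1). It only illustrates the idea of a discrete Gamma approximation and is not TREE-PUZZLE's procedure; the function name and sample size are hypothetical.

```python
import random


def discrete_gamma_rates(alpha, categories=4, samples=200_000, seed=1):
    """Approximate equal-probability Gamma rate categories by sampling.

    Draws rates from Gamma(shape=alpha, scale=1/alpha), which has mean 1,
    sorts them, splits them into equally sized groups and returns the
    mean rate of each group (slowest category first).
    """
    if samples % categories:
        raise ValueError("samples must be a multiple of categories")
    rng = random.Random(seed)
    draws = sorted(rng.gammavariate(alpha, 1.0 / alpha) for _ in range(samples))
    size = samples // categories
    return [sum(draws[k * size:(k + 1) * size]) / size for k in range(categories)]


if __name__ == "__main__":
    for alpha in (0.5, 1.0, 10.0):
        print(alpha, [round(r, 3) for r in discrete_gamma_rates(alpha)])
```

Smaller values of alpha spread the category rates further apart, while the average of the category rates stays close to 1.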
695
696	<P>A mixed model of rate heterogeneity (Gamma plus invariable sites)
697	is also available. In this case the total rate heterogeneity rho
(as defined by <A HREF="#gu1995">Gu et al.</A>, 1995) is computed as rho = (1+theta
699	alpha)/(1+alpha).
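<P>As a quick numerical check of this formula, here is a minimal Python sketch (the function name is hypothetical). Without invariable sites (theta = 0) it reduces to 1/(1 + alpha).

```python
def total_rate_heterogeneity(theta, alpha):
    """rho = (1 + theta * alpha) / (1 + alpha) for Gamma plus invariable sites."""
    if not (0.0 <= theta < 1.0) or alpha <= 0.0:
        raise ValueError("need 0 <= theta < 1 and alpha > 0")
    return (1.0 + theta * alpha) / (1.0 + alpha)


if __name__ == "__main__":
    print(total_rate_heterogeneity(0.5, 1.0))  # 0.75
    print(total_rate_heterogeneity(0.0, 3.0))  # 0.25
```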
700
701	<H2>
702	<A NAME="Options Available"></A>Available Options</H2>
703	All options can be selected and changed after TREE-PUZZLE has read the input
file. Depending on the input files, options are preselected and displayed
705	in a menu ("PHYLIP look and feel"):
706	<PRE>
GENERAL OPTIONS
 b  Type of analysis?                  Tree reconstruction
 k  Tree search procedure?             Quartet puzzling
 v  Approximate quartet likelihood?    No
 u  List unresolved quartets?          No
 n  Number of puzzling steps?          1000
 j  List puzzling step trees?          No
 o  Display as outgroup?               Gibbon
 z  Compute clocklike branch lengths?  No
 e  Parameter estimates?               Approximate (faster)
 x  Parameter estimation uses?         Neighbor-joining tree
SUBSTITUTION PROCESS
 d  Type of sequence input data?       Nucleotides
 m  Model of substitution?             HKY (Hasegawa et al. 1985)
 t  Transition/transversion parameter? Estimate from data set
 f  Nucleotide frequencies?            Estimate from data set
RATE HETEROGENEITY
 w  Model of rate heterogeneity?       Uniform rate

Quit [q], confirm [y], or change [menu] settings:
727	</PRE>
728	By typing the letters shown in the menu you can either change settings
729	or enter new parameters. Some options (for example "m" and "w") can be
invoked several times to cycle through a number of different settings.
731	The parameters of the models of sequence evolution can be estimated from
732	the data by a variety of procedures based on maximum likelihood. The analysis
733	is started by typing "y" at the input prompt. To quit the program
734	type "q".
735
<P>The following table lists all TREE-PUZZLE options in alphabetical order.
Be aware, however, that not all of them are available at the same time:
738	<TABLE CELLPADDING=2 >
739	<TR VALIGN=TOP>
740	<TD>
741	<CENTER><B>Option</B></CENTER>
742	</TD>
743	<TD>
744	<CENTER><B>Description</B></CENTER>
745	</TD>
746	</TR>
747
748	<TR VALIGN=TOP>
749	<TD>
750	<CENTER>a</CENTER>
751	</TD>
752	<TD>Gamma rate heterogeneity parameter alpha. This is the so-called shape
753	parameter of the Gamma distribution.</TD>
754	</TR>
755
756	<TR VALIGN=TOP>
757	<TD>
758	<CENTER>b</CENTER>
759	</TD>
<TD>Type of analysis. Switches between tree reconstruction by maximum
likelihood and likelihood mapping.</TD>
762	</TR>
763
764	<TR VALIGN=TOP>
765	<TD>
766	<CENTER>c</CENTER>
767	</TD>
768	<TD>Number of rate categories (4-16) for the discrete Gamma distribution
769	(rate heterogeneity).</TD>
770	</TR>
771
772	<TR VALIGN=TOP>
773	<TD>
774	<CENTER>d</CENTER>
775	</TD>
<TD>Data type. Specifies whether nucleotide sequences, amino acid sequences, or
two-state data serve as input. The default is set automatically by
inspection of the input data.
After TREE-PUZZLE has selected an appropriate data type (marked by 'Auto:'),
the 'd' option cycles through the types in the following order:
selected type -> Nucleotides -> Amino acids -> automatically selected type.</TD>
782	</TR>
783
784	<TR VALIGN=TOP>
785	<TD>
786	<CENTER>e</CENTER>
787	</TD>
788	<TD>Approximation option. Determines whether an approximate or the exact
789	likelihood function is used to estimate parameters of the models of sequence
790	evolution. The approximate likelihood function is in most cases sufficient
791	and is faster.</TD>
792	</TR>
793
794	<TR VALIGN=TOP>
795	<TD>
796	<CENTER>f</CENTER>
797	</TD>
798	<TD>Base frequencies. The maximum likelihood calculation needs the frequency
799	of each nucleotide (amino acid, doublet) as input. TREE-PUZZLE estimates these
800	values from the sequence input data. This option allows specification of
801	other values.</TD>
802	</TR>
803
804	<TR VALIGN=TOP>
805	<TD>
806	<CENTER>g</CENTER>
807	</TD>
<TD>Group sequences in clusters. Defines the clusters of sequences
809	as needed for the likelihood mapping analysis. Only available when likelihood
810	mapping is selected ("b" option).</TD>
811	</TR>
812
813	<TR VALIGN=TOP>
814	<TD>
815	<CENTER>h</CENTER>
816	</TD>
817	<TD>Codon positions or definition of doublets. For nucleotide data only.
818	If the TN or HKY model of substitution is used and the number of sites
in the alignment is a multiple of three, the analysis can be restricted
to any one of the three codon positions or to the 1st and 2nd positions together.
If the SH model is used, this option specifies that the 1st and
2nd codon positions in the alignment define a doublet.</TD>
823	</TR>
824
825	<TR VALIGN=TOP>
826	<TD>
827	<CENTER>i</CENTER>
828	</TD>
829	<TD>Fraction of invariable sites. Probability of a site to be invariable.
830	This parameter can be estimated from the data by TREE-PUZZLE
831	(only if the approximation option for the likelihood function is
832	turned off).</TD>
833	</TR>
834
835	<TR VALIGN=TOP>
836	<TD>
837	<CENTER>j</CENTER>
838	</TD>
<TD>List puzzling step trees. Writes all intermediate trees (puzzling
step trees) used to compute the quartet puzzling tree to a file, either
as a list of topologies ordered by number of occurrences (*.ptorder), or
as a list of the topologies in the chronological order of their occurrence (*.pstep), or
both.</TD>
844	</TR>
845
846	<TR VALIGN=TOP>
847	<TD>
848	<CENTER>k</CENTER>
849	</TD>
850	<TD>Tree search. Determines how the overall tree is obtained. The topology
851	is either computed with the quartet puzzling algorithm or is defined by
852	the user. Maximum likelihood branch lengths will be computed for this tree.
Alternatively, only a maximum likelihood distance matrix can be computed
(no overall tree).</TD>
855	</TR>
856
857	<TR VALIGN=TOP>
858	<TD>
859	<CENTER>l</CENTER>
860	</TD>
861	<TD>Location of root. Only for computation of clock-like maximum likelihood
branch lengths. Specifies the branch on which the root should be placed
in an unrooted tree topology. For example, in the tree (a,b,(c,d)), l=1
places the root on the branch leading to sequence a, whereas l=5 places
the root on the internal branch.</TD>
866	</TR>
867
868	<TR VALIGN=TOP>
869	<TD>
870	<CENTER>m</CENTER>
871	</TD>
872	<TD>Model of substitution. The following models are implemented for nucleotides:
873	the <A HREF="#tamura1993">Tamura-Nei</A> (TN) model,
874	the <A HREF="#hasegawa1985">Hasegawa et al.</A> (HKY) model, and
875	the <A HREF="#schoeniger1994">Schoeniger & von Haeseler</A> (SH) model.
876	The SH model describes the evolution of
877	pairs of dependent nucleotides (pairs are the first and the second nucleotide,
878	the third and the fourth nucleotide and so on). It allows for specification
879	of the transition-transversion ratio. The original model
880	(<A HREF="#schoeniger1994">Schoeniger & von Haeseler</A>, 1994)
881	is obtained by setting the transition-transversion parameter to 0.5.
882	The <A HREF="#jukes1969">Jukes-Cantor</A> (1969),
883	the <A HREF="#felsenstein1981">Felsenstein</A> (1981), and
884	the <A HREF="#kimura1980">Kimura</A> (1980) model are all special cases of
885	the HKY model.
886	<BR>For amino acid sequence data
887	the <A HREF="#dayhoff1978">Dayhoff et al.</A> (Dayhoff) model,
888	the <A HREF="#jones1992">Jones et al.</A> (JTT) model,
889	the <A HREF="#adachi1996">Adachi and Hasegawa</A> (mtREV24) model,
890	the <A HREF="#henikoff1992">Henikoff and Henikoff</A> (BLOSUM 62),
891	the <A HREF="#mueller2000">Mueller and Vingron</A> (VT), and
892	the <A HREF="#whelan2000">Whelan and Goldman</A> (WAG) substitution
893	model are implemented in TREE-PUZZLE.
The mtREV24 model describes the evolution of amino acids encoded on mtDNA,
whereas BLOSUM 62 and VT are intended for distantly related amino acid
sequences.
After TREE-PUZZLE has selected an appropriate amino acid substitution model
(marked by 'Auto:'), the 'm' option cycles through the models in the following order:
selected model -> Dayhoff -> JTT -> mtREV24 -> BLOSUM62 -> VT -> WAG ->
automatically selected model.
901	<BR>For more information
902	please read the section in this manual about models of sequence evolution.
903	See also option "w" (model of rate heterogeneity).</TD>
904	</TR>
905
906	<TR VALIGN=TOP>
907	<TD>
908	<CENTER>n</CENTER>
909	</TD>
910	<TD>If tree reconstruction is selected: number of puzzling steps. Parameter
of the quartet puzzling tree search. In general,
the more sequences are analysed, the more puzzling steps are advisable. The default
913	value varies depending on the number of sequences (at least 1000).<br>
914
915	If likelihood mapping is selected: number of quartets in a likelihood mapping analysis. Equal to the number
916	of dots in the likelihood mapping diagram. By default 10000 dots/quartets
917	are assumed. To use all possible quartets in clustered likelihood mapping
918	you have to specify a value of n=0.
919	</TD>
920	</TR>
921
922	<TR VALIGN=TOP>
923	<TD>
924	<CENTER>o</CENTER>
925	</TD>
<TD>Outgroup. Used only for displaying the unrooted quartet puzzling
tree. The default outgroup is the first sequence of the data set.</TD>
928	</TR>
929
930	<TR VALIGN=TOP>
931	<TD>
932	<CENTER>p</CENTER>
933	</TD>
934	<TD>Constrain the TN model to the F84 model. This option is only available
for the Tamura-Nei model. With this option the expected (!) transition-transversion
ratio for the F84 model has to be entered, and TREE-PUZZLE computes the corresponding
parameters of the TN model (these depend on the base frequencies of the data).
This makes it possible to compare the results of TREE-PUZZLE with those of the PHYLIP maximum likelihood
programs, which use the F84 model.
940	</TD>
941	</TR>
942
943	<TR VALIGN=TOP>
944	<TD>
945	<CENTER>q</CENTER>
946	</TD>
947	<TD>Quits analysis.</TD>
948	</TR>
949
950	<TR VALIGN=TOP>
951	<TD>
952	<CENTER>r</CENTER>
953	</TD>
954	<TD>Y/R transition parameter. This option is only available for the TN
955	model. This parameter is the ratio of the rates for pyrimidine transitions
956	and purine transitions. You do not need to specify this parameter as TREE-PUZZLE
estimates it from the data. For a precise definition, please read the section
958	in this manual about models of sequence evolution.</TD>
959	</TR>
960
961	<TR VALIGN=TOP>
962	<TD>
963	<CENTER>s</CENTER>
964	</TD>
965	<TD>Symmetrize doublet frequencies. This option is only available for the
966	SH model. With this option the doublet frequencies are symmetrized. For
967	example, the frequencies of "AT" and "TA" are then set to the average of both
968	frequencies.</TD>
969	</TR>
970
971	<TR VALIGN=TOP>
972	<TD>
973	<CENTER>t</CENTER>
974	</TD>
975	<TD>Transition/transversion parameter. For nucleotide data only. You do not
976	need to specify this parameter as TREE-PUZZLE estimates it from the data. The
977	precise definition of this parameter is given in the section on models
978	of sequence evolution in this manual.</TD>
979	</TR>
980
981	<TR VALIGN=TOP>
982	<TD>
983	<CENTER>u</CENTER>
984	</TD>
985	<TD>Show unresolved quartets. During the quartet puzzling tree search TREE-PUZZLE
986	counts the number of unresolved quartet trees. An unresolved quartet is
987	a quartet where the maximum likelihood values for each of the three possible
988	quartet topologies are so similar that it is not possible to prefer one
989	of them (<A HREF="#strimmer1997">Strimmer, Goldman, and von Haeseler</A>, 1997).
990	If this option is selected you will get a detailed list of all starlike
991	quartets. Note, for some data
992	sets there may be a lot of unresolved quartets. In this case a list of
993	all unresolved quartets is probably not very useful and also needs a lot
994	of disk space.</TD>
995	</TR>
996
997	<TR VALIGN=TOP>
998	<TD>
999	<CENTER>v</CENTER>
1000	</TD>
1001	<TD>Approximate quartet likelihood. For the quartet puzzling tree search
only. Computing exact maximum likelihoods is necessary only for very small
data sets. For larger data sets this option should always be turned
on.</TD>
1005	</TR>
1006
1007	<TR VALIGN=TOP>
1008	<TD>
1009	<CENTER>w</CENTER>
1010	</TD>
1011	<TD>Model of rate heterogeneity. TREE-PUZZLE provides several different models
1012	of rate heterogeneity: uniform rate over all sites (rate homogeneity),
1013	Gamma distributed rates, two rates (1 invariable + 1 variable), and a mixed
1014	model (1 invariable rate + Gamma distributed rates). All necessary parameters
1015	can be estimated by TREE-PUZZLE. Note that whenever invariable sites are taken
1016	into account the parameter estimation will invoke the "e" option to use
1017	an exact likelihood function. For more detailed information please read
1018	the section in this manual about models of sequence evolution. See also
1019	option "m" (model of substitution).</TD>
1020	</TR>
1021
1022	<TR VALIGN=TOP>
1023	<TD>
1024	<CENTER>x</CENTER>
1025	</TD>
1026	<TD>Selects the methods used in the estimation of the model parameters.
1027	Neighbor-joining tree means that a NJ tree is used to estimate the parameters.
1028	Quartet sampling means that a number of random sets of four sequences are
1029	selected to estimate parameters.</TD>
1030	</TR>
1031
1032	<TR VALIGN=TOP>
1033	<TD>
1034	<CENTER>y</CENTER>
1035	</TD>
1036	<TD>Starts analysis.</TD>
1037	</TR>
1038
1039	<TR VALIGN=TOP>
1040	<TD>
1041	<CENTER>z</CENTER>
1042	</TD>
1043	<TD>Computation of clock-like maximum likelihood branch lengths. This option
1044	also invokes the likelihood ratio clock test.</TD>
1045	</TR>
1046	</TABLE>
1047
1048	<H2>
1049	<A NAME="Other Features"></A>Other Features</H2>
1050	For nucleotide data TREE-PUZZLE computes the expected transition/transversion
1051	ratio and the expected pyrimidine transition/purine transition ratio
1052	corresponding to the selected model. Base frequencies play an important
1053	role in the calculation of both numbers.
1054
<P>TREE-PUZZLE also uses a chi-square test at the 5% level to check whether the base composition
of each sequence is identical to the average base composition of the whole
alignment. All sequences with deviating composition are listed in the TREE-PUZZLE
report file. Ideally, no sequence (except possibly the outgroup)
should have a deviating base composition; otherwise a basic assumption implicit
1060	in the maximum likelihood calculation is violated.
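<P>The following Python sketch shows one way such a test can be carried out. The exact procedure used by TREE-PUZZLE is not described here, so the details are assumptions: only A, C, G and T are counted (gaps and ambiguity codes are ignored), a Pearson chi-square statistic is computed against the alignment-wide frequencies, and it is compared with 7.815, the 5% critical value for 3 degrees of freedom. The function name and the toy alignment are hypothetical.

```python
from collections import Counter

STATES = "ACGT"
# 5% critical value of the chi-square distribution with 3 degrees of freedom
CHI2_CRIT_DF3_5PCT = 7.815


def composition_outliers(alignment):
    """Names of sequences whose A/C/G/T composition deviates from the
    alignment-wide composition (Pearson chi-square test at the 5% level)."""
    total = Counter()
    for seq in alignment.values():
        total.update(c for c in seq.upper() if c in STATES)
    n_all = sum(total.values())
    freqs = {s: total[s] / n_all for s in STATES}

    outliers = []
    for name, seq in alignment.items():
        counts = Counter(c for c in seq.upper() if c in STATES)
        n = sum(counts.values())
        if n == 0:
            continue  # nothing to test, e.g. a sequence consisting of gaps only
        chi2 = sum((counts[s] - n * freqs[s]) ** 2 / (n * freqs[s])
                   for s in STATES if freqs[s] > 0)
        if chi2 > CHI2_CRIT_DF3_5PCT:
            outliers.append(name)
    return outliers


if __name__ == "__main__":
    toy = {"s1": "ACGT" * 10, "s2": "ACGT" * 10,
           "s3": "ACGT" * 10, "s4": "GCGC" * 10}
    print(composition_outliers(toy))  # ['s4']
```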
1061
<P>A hidden feature of TREE-PUZZLE (since version 2.5) is the use of
1063	a weighting scheme of quartets (<A HREF="#strimmer1997">Strimmer, Goldman,
1064	and von Haeseler</A>, 1997) in the quartet puzzling tree search.
1065
1066	<P>TREE-PUZZLE also computes the average distance between all pairs of sequences
1067	(maximum likelihood distances). The average distances can be viewed as
1068	a rough measure for the overall sequence divergence.
1069
1070	<P>If more than one input tree is provided TREE-PUZZLE uses the
1071	<A HREF="#kishino1989">Kishino-Hasegawa</A> test (1989) to check which
1072	trees are significantly worse than the best tree.
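<P>A minimal Python sketch of the Kishino-Hasegawa idea, assuming that per-site log-likelihoods are available for the best tree and for a competing tree: the summed difference is compared with its standard error, which is estimated from the variance of the site-wise differences, using a normal approximation with critical value 1.96. The function name and the one-sided "significantly worse" comparison are illustrative choices, not a description of TREE-PUZZLE's implementation.

```python
import math


def kishino_hasegawa(site_lnl_best, site_lnl_other, z_crit=1.96):
    """Return (delta, se, significantly_worse) for two trees on the same sites."""
    if len(site_lnl_best) != len(site_lnl_other) or len(site_lnl_best) < 2:
        raise ValueError("need at least two sites, evaluated for both trees")
    d = [a - b for a, b in zip(site_lnl_best, site_lnl_other)]
    n = len(d)
    delta = sum(d)                      # difference in total log-likelihood
    mean = delta / n
    # Estimated variance of the summed difference: n/(n-1) * sum (d_i - mean)^2
    var = n / (n - 1) * sum((x - mean) ** 2 for x in d)
    se = math.sqrt(var)
    worse = se > 0 and delta / se > z_crit
    return delta, se, worse


if __name__ == "__main__":
    best = [-10.0, -12.0, -9.0, -11.0] * 25
    other = [-10.5, -12.6, -9.4, -11.5] * 25
    print(kishino_hasegawa(best, other))
```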
1073
1074	<P>If clock-like maximum-likelihood branch lengths are computed TREE-PUZZLE
1075	checks with the help of a likelihood-ratio test
1076	(<A HREF="#felsenstein1988">Felsenstein</A>, 1988) whether
1077	the data set is clock-like.
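<P>The likelihood-ratio statistic can be sketched in Python as follows, assuming the usual setup: twice the log-likelihood difference between the unconstrained and the clock-like tree is compared with a chi-square distribution with n - 2 degrees of freedom for n sequences (2n - 3 free branch lengths versus n - 1 node heights). The helper <TT>chi2_sf</TT> evaluates the chi-square tail probability from a series expansion; all names are hypothetical and this is not TREE-PUZZLE's code.

```python
import math


def chi2_sf(x, df):
    """P(X > x) for a chi-square variable with df degrees of freedom, via the
    series for the regularized lower incomplete gamma function."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    k = a
    for _ in range(10_000):
        k += 1.0
        term *= z / k
        total += term
        if term < total * 1e-15:
            break
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def clock_test(lnl_free, lnl_clock, n_seqs, level=0.05):
    """Return (statistic, df, p_value, clock_rejected); df = n_seqs - 2 is assumed."""
    stat = 2.0 * (lnl_free - lnl_clock)
    df = n_seqs - 2
    p = chi2_sf(stat, df)
    return stat, df, p, p < level


if __name__ == "__main__":
    print(clock_test(-1000.0, -1010.0, 8))  # statistic 20 on 6 df
```

For the example values the statistic of 20 on 6 degrees of freedom gives p of about 0.0028, so a clock-like tree would be rejected at the 5% level.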
1078
1079	<P>TREE-PUZZLE also detects sequences that occur more than once in the data
1080	and that therefore can be removed from the data set to speed up analysis.
1081
1082	<P>If rate heterogeneity is taken into account in the analysis TREE-PUZZLE also
1083	computes the most probable assignment of rate categories to sequence positions,
according to <A HREF="#felsenstein1996">Felsenstein and Churchill</A> (1996).
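<P>As a simplified Python sketch, assuming independent sites (that is, ignoring the autocorrelation between neighbouring sites that the hidden Markov model of Felsenstein and Churchill can include), the most probable category of a site is the one that maximises prior weight times site likelihood. The input layout and the function name are hypothetical.

```python
def most_probable_categories(site_likelihoods, weights=None):
    """site_likelihoods[s][c]: likelihood of site s under rate category c.

    Returns, for every site, the index of the category with the highest
    posterior probability (prior weight * likelihood; equal weights by default).
    """
    assignment = []
    for row in site_likelihoods:
        w = weights if weights is not None else [1.0 / len(row)] * len(row)
        scores = [p * l for p, l in zip(w, row)]
        assignment.append(max(range(len(scores)), key=scores.__getitem__))
    return assignment


if __name__ == "__main__":
    L = [[0.1, 0.3, 0.2], [0.5, 0.1, 0.1]]
    print(most_probable_categories(L))                   # [1, 0]
    print(most_probable_categories(L, [0.1, 0.1, 0.8]))  # [2, 2]
```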
1085
1086	<H2>
1087	<A NAME="Interpretation and Hints"></A>Interpretation and Hints</H2>
1088
1089	<H3>
1090	<A NAME="Quartet Puzzling Support Values"></A>Quartet Puzzling Support
1091	Values</H3>
1092	The quartet puzzling (QP) tree search estimates support values for each
1093	internal branch. They can be interpreted in much the same way as
1094	bootstrap values (though they should not be confused with them).
1095	Branches showing a QP reliability from 90% to 100% can be considered
very strongly supported. Branches with lower reliability (> 70%) can
in principle also be trusted, but in this case it is advisable to
check how well the respective internal branch does in comparison with other
branches in the tree (i.e., check its relative reliability).
If you are interested in a branch with low support, it is also
important to check the alternative groupings that are not included
in the QP tree (they are listed in the TREE-PUZZLE report file in ".*" notation).
There should be a substantial gap between the lowest reliability
value in the QP tree and the frequency of
1105	the most frequent grouping that is not included in the QP tree.
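<P>The support values can be sketched in Python, assuming that the support of a branch is the percentage of puzzling step trees in which the corresponding split (bipartition of the sequences) occurs. Representing each split by a set of taxon names, and the function names, are hypothetical choices made for this illustration.

```python
from collections import Counter


def canonical(side, taxa):
    """Represent a split by the side that does not contain the alphabetically
    first taxon, so that both sides of the same branch compare equal."""
    side = frozenset(side)
    return frozenset(taxa) - side if min(taxa) in side else side


def split_support(step_trees, taxa):
    """Percentage of puzzling step trees containing each split.

    step_trees: one collection of splits per tree; each split is given as
    the set of taxa on one side of an internal branch.
    """
    counts = Counter()
    for splits in step_trees:
        counts.update({canonical(s, taxa) for s in splits})
    return {split: 100.0 * k / len(step_trees) for split, k in counts.items()}


if __name__ == "__main__":
    taxa = "ABCDE"
    trees = [["AB", "DE"], ["AB", "ABC"], ["AC", "DE"], ["CDE", "DE"]]
    support = split_support(trees, taxa)
    for split, pct in sorted(support.items(), key=lambda kv: -kv[1]):
        print("".join(sorted(split)), pct)
```

For the four toy trees this prints DE 100.0, CDE 75.0 and BDE 25.0; the split {D, E} would be very strongly supported, while the competing groupings show how large the gap to the alternatives is.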
1106	<H3>
1107	<A NAME="Percentage of Unresolved Quartets"></A>Percentage of Unresolved
1108	Quartets</H3>
1109	TREE-PUZZLE computes the number and the percentage of completely unresolved
1110	maximum likelihood quartets. An unresolved quartet is a quartet where the
1111	maximum likelihood values for each of the three possible quartet topologies
1112	are so similar that it is not possible to prefer one of them
1113	(<A HREF="#strimmer1997">Strimmer, Goldman, and von Haeseler</A>, 1997).
1114	The percentage of the unresolved quartets
1115	among all possible quartets is an indicator of the suitability of the data
1116	for phylogenetic analysis. A high percentage usually results in a highly
multifurcating quartet puzzling tree. If you have only a few unresolved
quartets, we recommend invoking option "u" to get a list of all of them.
1119	In a likelihood mapping analysis the percentage of completely unresolved
1120	quartets is shown in the central region of the triangle diagram.
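<P>Because the percentage refers to all possible quartets, its denominator is the binomial coefficient C(n, 4) for n sequences (210 quartets for 10 sequences). A small Python sketch, with a hypothetical function name:

```python
from math import comb


def unresolved_percentage(n_sequences, n_unresolved):
    """Percentage of completely unresolved quartets among all C(n, 4) quartets."""
    total = comb(n_sequences, 4)
    if total == 0 or not 0 <= n_unresolved <= total:
        raise ValueError("need n >= 4 and 0 <= n_unresolved <= C(n, 4)")
    return 100.0 * n_unresolved / total


if __name__ == "__main__":
    print(comb(10, 4), unresolved_percentage(10, 21))  # 210 10.0
```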
1121	<H3>
1122	<A NAME="Automatic Parameter Estimation"></A>Automatic Parameter Estimation</H3>
1123	TREE-PUZZLE estimates both the parameters of the models of substitution (TN,
1124	HKY) and of the model of rate variation (Gamma distribution, fraction of
1125	invariable sites) without prior knowledge of an overall tree by a number
1126	of different strategies based on maximum likelihood. For all estimated
1127	parameters a corresponding standard error (S.E.) is computed. If you have
good reasons to prefer parameter values other than those
estimated by TREE-PUZZLE, don't hesitate to use them. If sequences are extremely
similar, it is very hard for any algorithm to extract information about
1131	the model of substitution from the data set. Also, be careful if the
1132	estimated parameter values
1133	are very close to the internal upper and lower bounds:
1134	<TABLE CELLPADDING=2 >
1135	<TR VALIGN=TOP>
1136	<TD><B>Parameter (Symbol)</B> </TD>
1137
1138	<TD><B>Minimal Value</B> </TD>
1139
1140	<TD><B>Maximal Value</B> </TD>
1141	</TR>
1142
1143	<TR VALIGN=TOP>
1144	<TD>Transition/transversion parameter (t) </TD>
1145
1146	<TD>0.20 </TD>
1147
1148	<TD>30.00 </TD>
1149	</TR>
1150
1151	<TR VALIGN=TOP>
1152	<TD>Y/R transition parameter (gamma) </TD>
1153
1154	<TD>0.10 </TD>
1155
1156	<TD>6.00 </TD>
1157	</TR>
1158
1159	<TR VALIGN=TOP>
1160	<TD>Fraction of invariable sites (theta) </TD>
1161
1162	<TD>0.00 </TD>
1163
1164	<TD>0.99 </TD>
1165	</TR>
1166
1167	<TR VALIGN=TOP>
1168	<TD>Gamma rate heterogeneity parameter (alpha) </TD>
1169
1170	<TD>0.01 </TD>
1171
1172	<TD>99 </TD>
1173	</TR>
1174	</TABLE>
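<P>Since a standard error is reported for every estimated parameter, one simple and purely illustrative rule of thumb is to flag estimates that lie within two standard errors of one of the bounds in the table above. The Python sketch below encodes that table; the two-S.E. threshold and all names are assumptions, not TREE-PUZZLE behaviour.

```python
# Internal (minimum, maximum) bounds, copied from the table above
BOUNDS = {
    "t": (0.20, 30.00),      # transition/transversion parameter
    "gamma": (0.10, 6.00),   # Y/R transition parameter
    "theta": (0.00, 0.99),   # fraction of invariable sites
    "alpha": (0.01, 99.0),   # Gamma rate heterogeneity parameter
}


def near_bounds(estimates, n_se=2.0):
    """estimates: {name: (value, standard_error)}.

    Returns the names whose value lies within n_se standard errors of a bound.
    """
    flagged = []
    for name, (value, se) in estimates.items():
        lo, hi = BOUNDS[name]
        if value - lo <= n_se * se or hi - value <= n_se * se:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    est = {"t": (29.5, 0.4), "alpha": (0.6, 0.1), "theta": (0.03, 0.02)}
    print(near_bounds(est))  # ['t', 'theta']
```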
1175
1176	<H3>
1177	<A NAME="Likelihood Mapping"></A>Likelihood Mapping</H3>
1178	Likelihood mapping (<A HREF="#strimmer1997">Strimmer and von Haeseler</A>,
1997) is a method to analyze the support for internal branches in a tree
without having to compute an overall tree.
Every internal branch in a completely resolved tree defines
up to four clusters of sequences. Sometimes only the relationships among these
groups are of interest, not the details of the structure of the clusters
themselves. In that case a likelihood mapping analysis is sufficient.
The corresponding likelihood mapping triangle diagrams (contained in
various output files generated by TREE-PUZZLE)
elucidate the possible relationships in detail.
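<P>The position of a single quartet in the triangle can be sketched in Python. The sketch assumes, following the usual description of likelihood mapping, that each quartet is represented by the weights p_i = L_i/(L_1 + L_2 + L_3) of its three possible topologies, used as barycentric coordinates. The corner placement, the function names, and the simple three-region assignment (largest weight wins) are illustrative choices; the finer partition that also separates out the central, unresolved region is not reproduced here.

```python
import math

# Hypothetical corner placement: topologies 1, 2 and 3 at these (x, y) positions
CORNERS = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3.0) / 2.0)]


def quartet_weights(lnl):
    """Weights L_i / sum(L) of the three quartet topologies, computed from
    their log-likelihoods (shifted by the maximum to avoid underflow)."""
    m = max(lnl)
    e = [math.exp(x - m) for x in lnl]
    s = sum(e)
    return [v / s for v in e]


def triangle_point(p):
    """Barycentric weights -> Cartesian position inside the triangle."""
    x = sum(w * cx for w, (cx, _) in zip(p, CORNERS))
    y = sum(w * cy for w, (_, cy) in zip(p, CORNERS))
    return x, y


def dominant_topology(p):
    """Three-region partition: number (1-3) of the topology with the largest weight."""
    return max(range(3), key=lambda i: p[i]) + 1


if __name__ == "__main__":
    for lnl in ([-100.0, -100.0, -100.0], [-100.0, -110.0, -112.0]):
        p = quartet_weights(lnl)
        print([round(w, 4) for w in p], triangle_point(p), dominant_topology(p))
```

A quartet whose three topologies are equally likely lands in the centre of the triangle, while a clearly resolved quartet lands near the corner of its best topology.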
1188
1189	<H3><A NAME="Batch Mode"></A>Batch Mode</H3>
Running TREE-PUZZLE from a Unix batch file is straightforward despite the lack
of command-line switches. For example, to run TREE-PUZZLE with the transition/transversion
parameter set to 10, the following lines in a batch file are sufficient:
1193	<PRE>
1194	puzzle << !
1195	t
1196	10
1197	y
1198	!
1199	</PRE>
1200	All other parameters can also be accessed the same way.
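<P>The same keystrokes can also be sent from a Python script. This is a sketch assuming, as in the shell example, that the executable is called <TT>puzzle</TT>, is on the search path, and finds its input files in the working directory; the helper names are hypothetical. Each menu letter is followed by its value if the option asks for one, and the final "y" starts the analysis.

```python
import subprocess


def menu_answers(settings):
    """Turn (menu letter, value or None) pairs into the keystrokes typed at
    the TREE-PUZZLE menu, ending with 'y' to start the analysis."""
    lines = []
    for letter, value in settings:
        lines.append(letter)
        if value is not None:
            lines.append(str(value))
    lines.append("y")  # confirm the settings and start the analysis
    return "\n".join(lines) + "\n"


def run_puzzle(settings, executable="puzzle"):
    """Run TREE-PUZZLE non-interactively, feeding the answers via stdin."""
    return subprocess.run([executable], input=menu_answers(settings),
                          text=True, capture_output=True, check=True)


if __name__ == "__main__":
    # Same keystrokes as the shell example: t, 10, y
    print(menu_answers([("t", 10)]), end="")
```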
1201
1202	<H2>
1203	<A NAME="Limits and Error Messages"></A>Limits and Error Messages</H2>
TREE-PUZZLE has a built-in limit that restricts data sets to at most 257 sequences
in order to avoid overflow of internal integer variables. At least 32767
sites should be possible, depending on the compiler used. Computation time
1207	will be the largest constraint even if sufficient computer memory is available.
1208	If rate heterogeneity is taken into account every additional category slows
1209	down the overall computation by the amount of time needed for one complete
1210	run assuming rate homogeneity.
1211
1212	<P>If problems are encountered TREE-PUZZLE terminates program execution and
1213	returns a plain text error message. Depending on the severity errors can be
1214	classified into three groups:
1215	<TABLE CELLPADDING=2 >
1216	<TR VALIGN=TOP>
1217	<TD>"HALT " errors: </TD>
1218
1219	<TD>Very severe. You should never ever see one of these messages. If so,
1220	please contact the developers! </TD>
1221	</TR>
1222
1223	<TR VALIGN=TOP>
1224	<TD>"Unable to proceed" errors: </TD>
1225
1226	<TD>Harmless but annoying. Mostly memory errors (not enough RAM) or problems
1227	with the format of the input files. </TD>
1228	</TR>
1229
1230	<TR VALIGN=TOP>
1231	<TD>Other errors: </TD>
1232
1233	<TD>Completely uncritical. Occur mostly when options of TREE-PUZZLE are being
1234	set. </TD>
1235	</TR>
1236	</TABLE>
<P>On a standard machine (a 1996 Unix workstation) with 32 to 64 MB RAM, TREE-PUZZLE
can easily perform maximum likelihood tree searches, including estimation of
support values, for data sets with 50-100 sequences. As likelihood mapping
is not memory-consuming and is computationally quite fast, it can be applied
to large data sets as well.
1242	<H2>
1243	<A NAME="Are Quartets Reliable"></A>Are Quartets Reliable?</H2>
Quartets may intrinsically be among the most difficult phylogenies to
resolve accurately (cf. <A HREF="#hillis1996">Hillis</A>, 1996).
It has been asked whether this is
a problem for quartet puzzling, since the method is based on quartets.
1248
<P>However, this is not the case. According to Hillis' findings
1250	(<A HREF="#hillis1996">Hillis</A>, 1996),
1251	quartets can be hard, but extra information helps. That is, if all you
1252	have are data on species (A, B, C, D) then it might be relatively difficult
1253	to find the correct tree for them. But if you have additional data (species
1254	E, F, G, ...) and try to find a tree for all the species, then that part
1255	of the tree relating (A, B, C, D) will more likely be correct than if you
1256	had just the data for (A, B, C, D). In Hillis' big 'model' tree, there
1257	are many examples of subsets of 4 species which in themselves might be
1258	hard to resolve correctly, but which are correctly resolved thanks to the
1259	(...large amount of...) additional data. TREE-PUZZLE (quartet puzzling) also
gains advantage from extra data in the same way. Its 'understanding' or
1261	resolution of the quartet (A, B, C, D) might be incorrect, but the information
1262	on the relationships of (A, B, C, D) implicit in its treatment of (A, B,
1263	C, E), (A, B, E, D), (A, E, C, D), (E, B, C, D), (A, B, C, F), (A, B, F,
1264	D), (A, F, C, D), (F, B, C, D), (A, B, C, G), etc. etc. should overcome
1265	this problem.
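<P>To make the argument concrete, the quartets that share three of the four sequences A, B, C, D with one additional sequence can be enumerated; for n sequences there are 4(n - 4) of them, each carrying information on the relationships within (A, B, C, D). A small Python sketch with a hypothetical function name:

```python
from itertools import combinations


def overlapping_quartets(focal, taxa):
    """Quartets sharing exactly three sequences with the focal quartet,
    e.g. (A, B, C, E), (A, B, D, E), ... for every additional sequence."""
    focal = tuple(focal)
    others = [t for t in taxa if t not in focal]
    return [kept + (extra,) for extra in others
            for kept in combinations(focal, 3)]


if __name__ == "__main__":
    quartets = overlapping_quartets("ABCD", "ABCDEFG")
    print(len(quartets))  # 12 = 4 * (7 - 4)
    print(quartets[:4])
```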
1266
<P>How well TREE-PUZZLE actually works has been investigated
in the <A HREF="#strimmer1996">Strimmer and von Haeseler</A> (1996) and
<A HREF="#strimmer1997">Strimmer, Goldman, and von Haeseler</A> (1997) papers.
Their results are not affected by Hillis' findings.
1271	Considered as a heuristic search for maximum likelihood trees, quartet
1272	puzzling works very well.
1273
1274	<P>(This section follows N. Goldman, personal communication).
1275	<H2>
1276	<A NAME="Other Programs"></A>Other Programs</H2>
1277	There are a number of other very useful and widespread programs to reconstruct
1278	phylogenetic relationships and to analyse molecular sequence data that
are available free of charge. Here are the URLs of some web pages that
1280	provide links to most of them (including the PHYLIP package and
1281	the MOLPHY and PAML maximum likelihood programs):
1282	<DL>
1283
1284	<DD>
1285	Joe Felsenstein's list of programs (well-organized and pretty exhaustive):<br>
1286	<A
1287	HREF="http://evolution.genetics.washington.edu/phylip/software.html">http://evolution.genetics.washington.edu/phylip/software.html</A></DD>
1288
1289
1290	<DD>
1291	"Tree of Life" software page:<br>
1292	<A HREF="http://phylogeny.arizona.edu/tree/programs/programs.html">http://phylogeny.arizona.edu/tree/programs/programs.html</A></DD>
1293
1294
1295	<DD>
1296	European Bioinformatics Institute:<br>
1297	<A HREF="http://www.ebi.ac.uk/biocat/biocat.html">http://www.ebi.ac.uk/biocat/biocat.html</A></DD>
1298
1299	</DL>
1300
1301	<H2>
1302	<A NAME="Acknowledgements"></A>Acknowledgements</H2>
1303	The maximum likelihood kernel of TREE-PUZZLE is an offspring of the program
1304	NucML/ProtML version 2.2 by Jun Adachi and Masami Hasegawa (<A HREF="ftp://sunmh.ism.ac.jp/pub/molphy">ftp://sunmh.ism.ac.jp/pub/molphy</A>).
1305	We thank them for generously allowing us to use the source code of their
1306	program.
1307	We would also like to thank
1308	the <A HREF="http://www.ebi.ac.uk">European Bioinformatics Institute (EBI)</A>,
1309	the <A HREF="http://www.pasteur.fr">Institut Pasteur</A>,
1310	and the <A HREF="http://www.indiana.edu">University of Indiana</A>
1311	(i.e. Don Gilbert)
1312	for kindly distributing the TREE-PUZZLE program.
1313
1314	<P>We thank Stephane Bortzmeyer for his help with debugging
1315	<EM>floating point exception</EM> errors.
1316
1317	<P>We also thank Peter Foster for pointing out the inconsistency
1318	in the invariable-site models with respect to other programs.
1319
1320	<P>Finally, we thank the
1321	<A HREF="http://www.dfg.de">Deutsche Forschungsgemeinschaft</A>
1322	(VI 160/3-1 and Ha 1628/4-1) and the Max-Planck-Society
1323	for financial support.
1324
1325	<H2><A NAME="References"></A>References</H2>
1326
1327	<A NAME="adachi1996"></A>
1328	Adachi, J., and M. Hasegawa. 1996. MOLPHY: programs for molecular phylogenetics,
1329	version 2.3. Institute of Statistical Mathematics, Tokyo.
1330
1331	<P><A NAME="adachi1996"></A>
1332	Adachi, J., and M. Hasegawa. 1996. Model of amino acid substitution
1333	in proteins encoded by mitochondrial DNA. <I>J. Mol. Evol.</I> <B>42</B>:
1334	459-468.
1335
1336	<P><A NAME="dayhoff1978"></A>
1337	Dayhoff, M. O., R. M. Schwartz, and B. C. Orcutt. 1978. A model of evolutionary
1338	change in proteins. In: Dayhoff, M. O. (ed.) Atlas of Protein Sequence and
1339	Structure, Vol. 5, Suppl. 3. National Biomedical Research Foundation, Washington
1340	DC, pp. 345-352.
1341
1342	<P><A NAME="felsenstein1981"></A>
1343	Felsenstein, J. 1981. Evolutionary trees from DNA sequences: A maximum
1344	likelihood approach. <I>J. Mol. Evol.</I> <B>17</B>: 368-376.
1345
1346	<P><A NAME="felsenstein1984"></A>
1347	Felsenstein, J. 1984. Distance methods for inferring phylogenies:
1348	A Justification. <I>Evolution</I> <B>38</B>: 16-24.
1349
1350	<P><A NAME="felsenstein1988"></A>
1351	Felsenstein, J. 1988. Phylogenies from molecular sequences: Inference
1352	and reliability. <I>Annu. Rev. Genet.</I> <B>22</B>: 521-565.
1353
1354	<P><A NAME="felsenstein1993"></A>
1355	Felsenstein, J. 1993. PHYLIP (Phylogeny Inference Package) version 3.5c.
1356	Distributed by the author. Department of Genetics, University of Washington,
1357	Seattle.
1358
1359	<P><A NAME="felsenstein1996"></A>
1360	Felsenstein, J., and G.A. Churchill. 1996. A hidden Markov model approach
1361	to variation among sites in rate of evolution. <I>Mol. Biol. Evol.</I>
1362	<B>13</B>: 93-104.
1363
1364	<P><A NAME="gropp1998"></A>
1365	Gropp, W., S. Huss-Lederman, A. Lumsdaine, E. Lusk, B. Nitzberg,
1366	W. Saphir, and M. Snir. 1998. MPI - The Complete Reference: Volume 2,
1367	The MPI Extensions. 2nd Edition, The MIT Press, Cambridge, MA.
1368
1369	<P><A NAME="gu1995"></A>
1370	Gu, X., Y.-X. Fu, and W.-H. Li. 1995. Maximum likelihood estimation
1371	of the heterogeneity of substitution rate among nucleotide sites. <I>Mol.
1372	Biol. Evol.</I> <B>12</B>: 546-557.
1373
1374	<P><A NAME="hasegawa1985"></A>
1375	Hasegawa, M., H. Kishino, and K. Yano. 1985. Dating of the human-ape
1376	splitting by a molecular clock of mitochondrial DNA. <I>J. Mol. Evol.</I>
1377	<B>22</B>: 160-174.
1378
1379	<P><A NAME="henikoff1992"></A>
1380	Henikoff, S., and J. G. Henikoff. 1992. Amino acid substitution matrices
1381	from protein blocks. <I>PNAS (USA)</I> <B>89</B>: 10915-10919.
1382
1383	<P><A NAME="hillis1996"></A>
1384	Hillis, D. M. 1996. Inferring complex phylogenies. <I>Nature</I>
1385	<B>383</B>: 130-131.
1386
1387	<P><A NAME="jukes1969"></A>
1388	Jukes, T. H., and C. R. Cantor. 1969. Evolution of protein molecules.
1389	In: Munro, H. N. (ed.) Mammalian Protein Metabolism, New York: Academic
1390	Press, pp. 21-132.
1391
1392	<P><A NAME="jones1992"></A>
1393	Jones, D. T., W. R. Taylor, and J. M. Thornton. 1992. The rapid generation
1394	of mutation data matrices from protein sequences. <I>CABIOS</I> <B>8</B>:
1395	275-282.
1396
1397	<P><A NAME="kimura1980"></A>
1398	Kimura, M. 1980. A simple method for estimating evolutionary rates of
1399	base substitutions through comparative studies of nucleotide sequences.
1400	<I>J. Mol. Evol.</I> <B>16</B>: 111-120.
1401
1402	<P><A NAME="kishino1989"></A>
1403	Kishino, H., and M. Hasegawa. 1989. Evaluation of the maximum likelihood
1404	estimate of the evolutionary tree topologies from DNA sequence data, and
1405	the branching order in Hominoidea. <I>J. Mol. Evol.</I> <B>29</B>: 170-179.
1406
1407	<P><A NAME="mueller2000"></A>
1408	Mueller, T., and M. Vingron. 2000. Modeling Amino Acid Replacement.
1409	<I>J. Comp. Biol.</I>, to appear
1410	(<A HREF="http://www.dkfz-heidelberg.de/tbi/people/tmueller/paper/paper.ps">preprint of the article</A>)
1411
1412	<P><A NAME="saitou1987"></A>
1413	Saitou, N., and M. Nei. 1987. The neighbor-joining method: a new method
1414	for reconstructing phylogenetic trees. <I>Mol. Biol. Evol.</I> <B>4</B>:
1415	406-425.
1416
1417	<P><A NAME="schoeniger1994"></A>
1418	Schoeniger, M., and A. von Haeseler. 1994. A stochastic model for
1419	the evolution of autocorrelated DNA sequences. <I>Mol. Phyl. Evol.</I>
1420	<B>3</B>: 240-247.
1421
1422	<P><A NAME="snir1998"></A>
1423	Snir, M., S. Otto, S. Huss-Lederman, D. Walker, and J. Dongarra.
1424	1998. MPI - The Complete Reference: Volume 1, The MPI Core. 2nd Edition,
1425	The MIT Press, Cambridge, MA.
1426
1427	<P><A NAME="strimmer1996"></A>
1428	Strimmer, K., and A. von Haeseler. 1996. Quartet puzzling: a quartet
1429	maximum likelihood method for reconstructing tree topologies. <I>Mol. Biol.
1430	Evol.</I> <B>13</B>: 964-969.
1431
1432	<P><A NAME="strimmer1997"></A>
1433	Strimmer, K., N. Goldman, and A. von Haeseler. 1997. Bayesian probabilities
1434	and quartet puzzling. <I>Mol. Biol. Evol.</I> <B>14</B>: 210-211.
1435
1436	<P><A NAME="strimmer1997"></A>
1437	Strimmer, K., and A. von Haeseler. 1997. Likelihood-mapping: a simple
1438	method to visualize phylogenetic content of a sequence alignment. <I>PNAS
1439	(USA)</I> <B>94</B>: 6815-6819.
1440
1441	<P><A NAME="tamura1993"></A>
1442	Tamura, K., and M. Nei. 1993. Estimation of the number of nucleotide
1443	substitutions in the control region of mitochondrial DNA in humans and
1444	chimpanzees. <I>Mol. Biol. Evol.</I> <B>10</B>: 512-526.
1445
1446	<P><A NAME="tamura1994"></A>
1447	Tamura, K. 1994. Model selection in the estimation of the number of
1448	nucleotide substitutions. <I>Mol. Biol. Evol.</I> <B>11</B>: 154-157.
1449
1450	<P><A NAME="thompson1994"></A>
1451	Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: Improving
1452	the sensitivity of progressive multiple sequence alignment through sequence
1453	weighting, position-specific gap penalties and weight matrix choice. <I>Nucl.
1454	Acids Res.</I> <B>22</B>: 4673-4680.
1455
1456	<P><A NAME="whelan2000"></A>
1457	Whelan, S., and N. Goldman. 2000. A new empirical model of
1458	amino acid evolution. <I>Manuscript in prep.</I>
1459
1460	<P><A NAME="yang1994"></A>
1461	Yang, Z. 1994. Maximum likelihood phylogenetic estimation from DNA sequences
1462	with variable rates over sites: approximate methods. <I>J. Mol. Evol.</I>
1463	<B>39</B>: 306-314.
1464
1465
1466	<H2>
1467	<A NAME="Known Bugs"></A>Known Bugs</H2>
1468
1469	On Alpha-based computers, <EM>floating point exception</EM>
1470	errors sometimes occur. Some of these result from a bug in the malloc routine
1471	of the system libraries of the Compaq operating system. We recommend
1472	using the GNU C compiler
1473	(<TT><A HREF="http://egcs.gnu.org">http://egcs.gnu.org</A></TT>),
1474	which does not use the system malloc routine.
1475
1476	For other occurrences of <EM>floating point exception</EM> errors,
1477	we need the datasets and information about the operating system
1478	to reproduce and debug them.
1479
1480	<H2>
1481	<A NAME="Version History"></A>Version History</H2>
1482	The TREE-PUZZLE program was first distributed in 1995 under the name
1483	PUZZLE. Since then it has
1484	been continually improved. Here is a list of the most important changes.
1485	<TABLE CELLPADDING=2 >
1486
1487	<TR VALIGN=TOP>
1488	<TD>5.0</TD>
1489
1490	<TD>Puzzle tree reconstruction part parallelized using the MPI standard
1491	(Message Passing Interface).
1492	<BR>Possibility added to specify the input file and user tree file on the command line.
1493	Output files renamed to the form PREFIX.EXTENSION, where PREFIX is the
1494	input file name or, if used, the user tree file name.
1495	The EXTENSION can be one of the following: puzzle (PUZZLE report),
1496	tree (tree file), dist (ML distance file), eps (likelihood mapping output
1497	in eps format), qlist (bad quartets), qstep (puzzling step tree IDs as they
1498	occur in the analysis), or qtorder (sorted unique list of puzzling step trees).
1499	<BR>The likelihood value is added to the tree string in the treefile as a leading
1500	comment ("[ lh=x.xxx ]").
1501	<BR>VT (variable time) matrix (<A HREF="#mueller2000">Mueller and
1502	Vingron</A>, 2000) and WAG matrix (<A HREF="#whelan2000">Whelan and
1503	Goldman</A>, 2000)
1504	added to the AA substitution models.
1505	<BR>The data type and AA-model options in the menu now show the
1506	automatically set type/model first. These can now be changed with the 'd' or
1507	'm' key in an order independent of the preselected type/model. This makes
1508	it possible to select a desired AA substitution model or data type by
1509	piping letters to the standard input without knowing PUZZLE's preselection.
1510	<BR>When parameters are estimated, they are now written to file before the
1511	quartets are evaluated.
1512	<BR>The inconsistency with respect to other programs in handling
1513	invariable sites has been fixed.
1514	<BR>Some minor bug fixes (e.g., the clock bug and a bug in the optimization
1515	routine).
1516	</TD>
1517	</TR>
1518
1519	<TR VALIGN=TOP>
1520	<TD>4.0.2</TD>
1521
1522	<TD>Update to provide precompiled Windows 95/98/NT executables. In addition:
1523	Internal rearrangement of rate matrices.
1524	Improved BLOSUM 62 matrix. Endless input loop for input
1525	files restricted to 10 trials.
1526	Source code clean up to remove compile time warnings.
1527	Explicit quit option in menu. Changes in NJ tree code.
1528	Updates of documentation (address changes, correction of errors).
1529	</TD>
1530	</TR>
1531
1532	<TR VALIGN=TOP>
1533	<TD>4.0.1</TD>
1534
1535	<TD>Maintenance release. Correction of mtREV matrix. Fix of the "intree bug".
1536	Removal of stringent runtime-compatibility check to allow out-of-the-box compilation
1537	on Alpha. More accurate gamma distribution allowing 16 instead of 8 categories
1538	and ensuring better accuracy for alpha > 1.0. Update of documentation (mainly address changes).
1539	More Unix-like file layout, and change of license to GPL.
1540	</TD>
1541	</TR>
1542
1543	<TR VALIGN=TOP>
1544	<TD>4.0 </TD>
1545
1546	<TD>Executables for Windows 95/NT and OS/2 instead of MS-DOS. Computation
1547	of clock-like branch lengths (also for amino acids and for non-binary trees).
1548	Automatic likelihood ratio clock test. Model for two-state sequence data
1549	(0,1) included. Display of most probable assignment of rates to sites.
1550	Identification of groups of identical sequences. Possibility to read multiple
1551	input trees. Kishino-Hasegawa test to check whether trees are significantly
1552	different. BLOSUM 62 model of amino acid substitution
1553	(<A HREF="#henikoff1992">Henikoff-Henikoff</A>, 1992).
1554	Use of parameter alpha instead of eta = 1/(1+alpha) (for rate heterogeneity).
1555
1556	Improvements to user interface. SH model can be applied to 1st and 2nd
1557	codon positions. Automatic check for compatible compiler settings. Workaround
1558	for severe runtime problem when the gcc compiler was used.</TD>
1559	</TR>
1560
1561	<TR VALIGN=TOP>
1562	<TD>3.1 </TD>
1563
1564	<TD>Much improved user interface to rate heterogeneity (less confusing
1565	menu, rearranged outfile, additional out-of-range check). Possibility to
1566	read rooted input trees (automatic removal of basal bifurcation). Computation
1567	of average distance between all pairs of sequences. Fix of a bug that caused
1568	PUZZLE 3.0 to crash on some systems (DEC Alpha). Cosmetic changes in program
1569	and documentation. </TD>
1570	</TR>
1571
1572	<TR VALIGN=TOP>
1573	<TD>3.0 </TD>
1574
1575	<TD>Rate heterogeneity included in all models of substitution (Gamma distribution
1576	plus invariable sites). Likelihood mapping analysis with Postscript output
1577	added. Much more sophisticated maximum likelihood parameter estimation
1578	for all model parameters including those of rate heterogeneity. Codon positions
1579	selectable. Update to mtREV24. New icon. Less verbose runtime messages.
1580	HTML documentation. Better internal error classification. More information
1581	in outfile (number of constant positions etc.). </TD>
1582	</TR>
1583
1584	<TR VALIGN=TOP>
1585	<TD>2.5.1 </TD>
1586
1587	<TD>Fix of a bug (present only in version 2.5) related to computation of
1588	the variance of the maximum likelihood branch lengths that caused occasional
1589	crashes of PUZZLE on some systems when applied to data sets containing many
1590	very similar sequences. Drop of support for non-FPU Macintosh version.
1591	Corrections in manual. </TD>
1592	</TR>
1593
1594	<TR VALIGN=TOP>
1595	<TD>2.5 </TD>
1596
1597	<TD>Improved QP algorithm (<A HREF="#strimmer1997">Strimmer, Goldman, and
1598	von Haeseler</A>, 1997). Bug
1599	fixes in ML engine, computation of ML distances and ML branch lengths,
1600	optional input of a user tree, F84 model added, estimation of all TN model
1601	parameters and corresponding standard errors, CLUSTAL W treefile convention
1602	adopted to allow showing branch lengths and QP support values simultaneously,
1603	display of unresolved quartets, update of mtREV matrix, source code more
1604	compatible with some almost-ANSI compilers, more safety checks in the code. </TD>
1605	</TR>
1606
1607	<TR VALIGN=TOP>
1608	<TD>2.4 </TD>
1609
1610	<TD>Automatic data type recognition, chi-square-test on base composition,
1611	automatic selection of best amino acid model, estimation of transition-transversion
1612	parameter, ASCII plot of quartet puzzling tree into the outfile. </TD>
1613	</TR>
1614
1615	<TR VALIGN=TOP>
1616	<TD>2.3 </TD>
1617
1618	<TD>More models, many usability improvements, built-in consensus tree routines,
1619	more supported systems, bug fixes, no more dependence on input order.
1620	First EBI distributed version. </TD>
1621	</TR>
1622
1623	<TR VALIGN=TOP>
1624	<TD>2.2 </TD>
1625
1626	<TD>Optimized internal data structure requiring much less computer memory.
1627	Bug fixes. </TD>
1628	</TR>
1629
1630	<TR VALIGN=TOP>
1631	<TD>2.1 </TD>
1632
1633	<TD>Bug fixes concerning algorithm and transition/transversion parameter. </TD>
1634	</TR>
1635
1636	<TR VALIGN=TOP>
1637	<TD>2.0 </TD>
1638
1639	<TD>Complete revision merging the maximum likelihood and the quartet puzzling
1640	routines into one user friendly program. First electronic distribution. </TD>
1641	</TR>
1642
1643	<TR VALIGN=TOP>
1644	<TD>1.0 </TD>
1645
1646	<TD>First public release, presented at the 1995 phylogenetic workshop (15-17
1647	June 1995) at the University of Bielefeld, Germany. </TD>
1648	</TR>
1649	</TABLE>
1650
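<P>The 4.0 entry notes that rate heterogeneity is reported through the Gamma shape parameter alpha instead of eta = 1/(1+alpha); that earlier PUZZLE versions reported eta is an inference from that entry. When comparing values across versions, the relation can be inverted as follows (for alpha &gt; 0, i.e. 0 &lt; eta &lt; 1), with two worked values:

```latex
\eta = \frac{1}{1+\alpha}
\quad\Longleftrightarrow\quad
\alpha = \frac{1-\eta}{\eta},
\qquad \text{e.g. } \alpha = 0.5 \;\Rightarrow\; \eta = \tfrac{2}{3} \approx 0.667,
\quad \eta = 0.25 \;\Rightarrow\; \alpha = 3.
```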
1651	</BODY>
1652	</HTML>
1653
