<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">

<HTML>
<!-- To view this document properly please use an HTML browser -->

<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<TITLE>Documentation of TREE-PUZZLE 5.0</TITLE>
</HEAD>
<BODY BGCOLOR="#FFFFFF">

<H1>
<img ALT="PUZZLE Logo" SRC="puzzle.gif" HSPACE=10 BORDER=0 height=32 width=32 align=LEFT>
<b><font size="+3">TREE-PUZZLE Manual</font></b>
<img ALT="PPUZZLE Logo" SRC="ppuzzle.gif" HSPACE=10 BORDER=0 height=32 width=32>
</H1>
<B>Maximum likelihood analysis for nucleotide, amino acid, and two-state data</B>


<P>Version 5.0
<BR>October 2000
<BR>Copyright 1999-2000 by Heiko A. Schmidt, Korbinian Strimmer, Martin Vingron, and Arndt von Haeseler
<BR>Copyright 1995-1999 by Korbinian Strimmer and Arndt von Haeseler

<P><b>Heiko A. Schmidt</b>,
email: h.schmidt@dkfz-heidelberg.de,
<A HREF="http://www.dkfz-heidelberg.de/tbi/">Theoretical Bioinformatics</A>,
<A HREF="http://www.dkfz-heidelberg.de/">DKFZ</A>,
Im Neuenheimer Feld 280, D-69124 Heidelberg, Germany.

<P><b>Korbinian Strimmer</b>,
email: korbinian.strimmer@zoo.ox.ac.uk,
<A HREF="http://www.zoo.ox.ac.uk/">Department of Zoology</A>,
<A HREF="http://www.ox.ac.uk/">University of Oxford</A>,
South Parks Road, Oxford OX1 3PS, UK.

<P><b>Martin Vingron</b>,
email: vingron@dkfz-heidelberg.de,
<A HREF="http://www.dkfz-heidelberg.de/tbi/">Theoretical Bioinformatics</A>,
<A HREF="http://www.dkfz-heidelberg.de/">DKFZ</A>,
Im Neuenheimer Feld 280, D-69124 Heidelberg, Germany.

<P><b>Arndt von Haeseler</b>,
email: haeseler@eva.mpg.de,
<A HREF="http://www.eva.mpg.de/">Max-Planck-Institute for Evolutionary Anthropology</A>,
Inselstr. 22, D-04103 Leipzig, Germany.

<p><font size="-1" color="brown">The official name of the program has been
changed to TREE-PUZZLE to avoid a legal conflict with the Fraunhofer
Gesellschaft. We are sorry for any inconvenience this may cause you.
Any reference to PUZZLE in this package is only colloquial and refers
to TREE-PUZZLE.
</font>

<P>TREE-PUZZLE is a computer program to reconstruct phylogenetic trees from
molecular sequence data by maximum likelihood. It implements a fast tree
search algorithm, quartet puzzling, that allows analysis of large data
sets and automatically assigns estimates of support to each internal
branch. TREE-PUZZLE also computes pairwise maximum likelihood distances as well
as branch lengths for user-specified trees. Branch lengths can also be
calculated under the clock assumption. In addition, TREE-PUZZLE offers a novel
method, likelihood mapping, to investigate the support of a hypothesized
internal branch without computing an overall tree and to visualize the
phylogenetic content of a sequence alignment. TREE-PUZZLE also conducts a number
of statistical tests on the data set (chi-square test for homogeneity of
base composition, likelihood ratio test of the clock hypothesis, Kishino-Hasegawa
test). The models of substitution provided by TREE-PUZZLE are TN, HKY, F84, and
SH for nucleotides; Dayhoff, JTT, mtREV24, BLOSUM 62, VT, and WAG for amino acids; and
F81 for two-state data. Rate heterogeneity is modelled by a discrete Gamma
distribution and by allowing invariable sites. The corresponding parameters
can be inferred from the data set.

<P>TREE-PUZZLE is available free of charge from:
<DL>
<DD>
<A HREF="http://www.tree-puzzle.de/">http://www.tree-puzzle.de/</A> (TREE-PUZZLE home page)
</DD>

<DD>
<A HREF="http://www.dkfz-heidelberg.de/tbi/tree-puzzle/">http://www.dkfz-heidelberg.de/tbi/tree-puzzle/</A> (TREE-PUZZLE home page mirror at DKFZ)
</DD>

<DD>
<A HREF="http://iubio.bio.indiana.edu/soft/molbio/evolve">http://iubio.bio.indiana.edu/soft/molbio/evolve</A>
(IUBio archive www, USA)
</DD>

<DD>
<A HREF="ftp://iubio.bio.indiana.edu/molbio/evolve">ftp://iubio.bio.indiana.edu/molbio/evolve</A>
(IUBio archive ftp, USA)
</DD>

<DD>
<A HREF="ftp://ftp.ebi.ac.uk/pub/software">ftp://ftp.ebi.ac.uk/pub/software</A>
(European Bioinformatics Institute, UK)
</DD>

<DD>
<A HREF="ftp://ftp.pasteur.fr/pub/GenSoft">ftp://ftp.pasteur.fr/pub/GenSoft</A>
(Institut Pasteur, France)
</DD>

</DL>
TREE-PUZZLE is written in ANSI C. It will run on most personal computers and
workstations if compiled by an appropriate C compiler.
The tree reconstruction part of TREE-PUZZLE has been parallelized using
the Message Passing Interface (MPI)
library standard (<A HREF="#snir1998">Snir et al.</A>, 1998 and
<A HREF="#gropp1998">Gropp et al.</A>, 1998). If you want to run
TREE-PUZZLE in parallel, you also need an implementation of the MPI library on your
system.

<P>Please read the <A HREF="#Installation">installation section</A>
for more details.

<P>We suggest reading this documentation before using TREE-PUZZLE
for the first time. If you do not have the time to read this manual completely,
please read at least the sections <A HREF="#Input/Output Conventions">Input/Output
Conventions</A> and <A HREF="#Quick Start">Quick Start</A> below. Then
you should be able to use the TREE-PUZZLE program, especially if you have some
experience with the PHYLIP programs. The other sections can then be read
at a later time.

<P>To find out what's new in version 5.0, please read the
<A HREF="#Version History">Version History</A>.

<P>
<HR ALIGN=center WIDTH="100%" SIZE=2>
<CENTER><H2>Contents</H2></CENTER><P>

<UL>
<LI><A HREF="#Legal Stuff">Legal Stuff</A></LI>

<LI><A HREF="#Installation">Installation</A>
<UL>
<LI><A HREF="#Unix">UNIX</A></LI>
<LI><A HREF="#MacOS">MacOS</A></LI>
<LI><A HREF="#Win32">Windows 95/98/NT</A></LI>
<LI><A HREF="#VMS">VMS</A></LI>
<LI><A HREF="#MPI">Parallel TREE-PUZZLE</A></LI>
</UL>
</LI>

<LI><A HREF="#Introduction">Introduction</A></LI>
<LI><A HREF="#Input/Output Conventions">Input/Output Conventions</A>
<UL>
<LI><A HREF="#Sequence Input">Sequence Input</A></LI>
<LI><A HREF="#General Output">General Output</A></LI>
<LI><A HREF="#Distance Output">Distance Output</A></LI>
<LI><A HREF="#Tree Output">Tree Output</A></LI>
<LI><A HREF="#Tree Input">Tree Input</A></LI>
<LI><A HREF="#Likelihood Mapping Output">Likelihood Mapping Output</A></LI>
</UL>
</LI>

<LI><A HREF="#Quick Start">Quick Start</A></LI>
<LI><A HREF="#Models of Sequence Evolution">Models of Sequence Evolution</A>
<UL>
<LI><A HREF="#Models of Substitution">Models of Substitution</A></LI>
<LI><A HREF="#Models of Rate Heterogeneity">Models of Rate Heterogeneity</A></LI>
</UL>
</LI>

<LI><A HREF="#Options Available">Available Options</A></LI>
<LI><A HREF="#Other Features">Other Features</A></LI>
<LI><A HREF="#Interpretation and Hints">Interpretation and Hints</A>
<UL>
<LI><A HREF="#Quartet Puzzling Support Values">Quartet Puzzling Support Values</A></LI>
<LI><A HREF="#Percentage of Unresolved Quartets">Percentage of Unresolved Quartets</A></LI>
<LI><A HREF="#Automatic Parameter Estimation">Automatic Parameter Estimation</A></LI>
<LI><A HREF="#Likelihood Mapping">Likelihood Mapping</A></LI>
<LI><A HREF="#Batch Mode">Batch Mode</A></LI>
</UL>
</LI>
<LI><A HREF="#Limits and Error Messages">Limits and Error Messages</A></LI>
<LI><A HREF="#Are Quartets Reliable">Are Quartets Reliable?</A></LI>
<LI><A HREF="#Other Programs">Other Programs</A></LI>
<LI><A HREF="#Acknowledgements">Acknowledgements</A></LI>
<LI><A HREF="#References">References</A></LI>
<LI><A HREF="#Known Bugs">Known Bugs</A></LI>
<LI><A HREF="#Version History">Version History</A></LI>
</UL>

<HR>
<H2>
<A NAME="Legal Stuff"></A>Legal Stuff</H2>
TREE-PUZZLE 5.0 is (c) 1999-2000 Heiko A. Schmidt, Korbinian Strimmer, Martin Vingron, and Arndt von Haeseler.<BR>
Earlier PUZZLE versions were (c) 1995-1999 by Korbinian Strimmer and Arndt von Haeseler.<BR>
The software and its accompanying documentation are provided "as
is", without guarantee of support or maintenance. The whole package is
licensed under the GNU General Public License, except for the parts indicated in
the sources where the copyright of the authors does not apply. Please see
<A HREF="http://www.opensource.org/licenses/gpl-license.html">http://www.opensource.org/licenses/gpl-license.html</A> for details.

<H2>
<A NAME="Installation"></A>Installation</H2>
The source code of the TREE-PUZZLE software is 100% identical across platforms.
However, installation procedures differ.

<H3>
<A NAME="Unix"></A>UNIX</H3>
Get the file <B>tree-puzzle-5.0.tar</B>. If you received a compressed tar file
(<B>tree-puzzle-5.0.tar.Z</B> or <B>tree-puzzle-5.0.tar.gz</B>), you have to decompress
it first (using the "uncompress" or "gunzip" command). Then untar the file
with
<PRE> tar xvf tree-puzzle-5.0.tar</PRE>
The newly created directory "tree-puzzle-5.0" contains four subdirectories called
"doc", "data", "bin", and "src". The "doc" directory
contains this manual in HTML format. The "data"
directory contains example input files. The "src" directory contains the
ANSI C sources of TREE-PUZZLE. Switch to the newly created directory by typing
<PRE> cd tree-puzzle-5.0</PRE>
To compile, we recommend the GNU gcc (or GNU egcs) compiler. If gcc is installed,
just type
<PRE> sh ./configure</PRE>
<PRE> make</PRE>
<PRE> make install</PRE>
and the executable <TT>puzzle</TT> is compiled and put into the <TT>/usr/local/bin</TT> directory.
If you want <TT>puzzle</TT> to be installed elsewhere, add the
<TT>--prefix=/name/of/the/wanted/directory</TT> option to the
<TT>sh ./configure</TT> command line (the executable then goes into the <TT>bin</TT>
subdirectory of that directory, just as the default prefix <TT>/usr/local</TT> leads to <TT>/usr/local/bin</TT>).
The parallel version should have been built and installed as well if <TT>configure</TT>
found a known MPI compiler (cf. <A HREF="#MPI">Parallel TREE-PUZZLE</A> section).


Then type
<PRE> make clean</PRE>
to remove the files generated during compilation.

If your compiler is not the GNU gcc compiler and is not found by <TT>configure</TT>, you have to
specify it by setting the <TT>CC</TT> variable (e.g. <TT>setenv CC cc</TT> under <TT>csh</TT> or
<TT>CC=cc; export CC</TT> under <TT>sh</TT>) before running <TT>sh ./configure</TT>.
If you still cannot compile properly, your compiler or its runtime library
is most probably not ANSI compliant (e.g., old SUN compilers). In most
cases, however, you will be able to compile by changing some parameters
in the "makefile". Ask your local Unix expert for help.

<H3>
<A NAME="MacOS"></A>MacOS</H3>
Get the file <B>tree-puzzle-5.0.hqx</B>. After decoding this BinHex file (this
is done automatically on a properly installed system; otherwise use programs
like "StuffIt Expander" or ask your local Mac expert), you will find a folder
called "tree-puzzle-5.0" on your hard disk. This folder contains the four subfolders
"doc", "data", "bin", and "src". The "doc" folder contains
this manual in HTML format. The "data" folder contains
example input files. The "bin" folder contains a Macintosh PPC executable
with a default memory partition of 3000K.
There is no 68k executable. <u>If you get a memory allocation error while running
TREE-PUZZLE, you have to increase TREE-PUZZLE's memory partition with the "Get Info" command
of the Macintosh Finder</u>. The "src" folder contains the ANSI C sources of TREE-PUZZLE.

<P>The MacOS executables have been compiled for the PowerMac using Metrowerks CodeWarrior.

<P>Note: It is probably a good idea to install PPC Linux (or MkLinux) on your Macintosh.
TREE-PUZZLE (like any other program) runs 20-50% faster under Linux than the
same program under MacOS (on the same machine!), and the Mac does not freeze
during execution because of Linux's multitasking capabilities (maybe this changes in MacOS X).


<H3>
<A NAME="Win32"></A>Windows 95/98/NT</H3>

Get the file <B>tree-puzzle-5.0.zip</B>. After uncompressing it (using, e.g., WinZip
or a similar tool), a directory "tree-puzzle-5.0" is created containing
four subdirectories called "doc", "data", "bin", and "src". The "doc" directory
contains this manual in HTML format. The "data"
directory contains example input files. The "src" directory contains the
ANSI C sources of TREE-PUZZLE. The "bin" directory contains the executable
<TT>puzzle.exe</TT>. To use TREE-PUZZLE, the system path must include the directory
containing the executable. Ask your local Windows expert for help.

<P>The executable has been compiled using
Microsoft Visual C++ and the "makefile.w32" (contained in "src").

<P>If you have a Linux partition on your PC, we recommend
installing and using TREE-PUZZLE under Linux (see <A HREF="#Unix">Unix</A> section) because
TREE-PUZZLE runs significantly faster there than under Windows.

<H3>
<A NAME="VMS"></A>VMS</H3>


<P>Get the Unix sources and install the package on your computer
(ask your local VMS expert for help). Go to the subdirectory
"src" and compile TREE-PUZZLE using the command file "makefile.com".

<H3>
<A NAME="MPI"></A>Parallel TREE-PUZZLE</H3>


<P>To compile and run the parallelized TREE-PUZZLE you need an implementation
of the Message Passing Interface (MPI) library, a widely used
message passing library standard. Implementations of the MPI libraries
are available for almost all parallel platforms and computer systems,
and there are free implementations for most platforms as well.

<P>To find an MPI implementation suitable for your platform, visit
the following web sites:
<UL>
<LI><A HREF="http://www-unix.mcs.anl.gov/mpi/implementations.html">http://www-unix.mcs.anl.gov/mpi/implementations.html</A>
<LI><A HREF="http://WWW.ERC.MsState.Edu/labs/hpcl/projects/mpi/implementations.html">http://WWW.ERC.MsState.Edu/labs/hpcl/projects/mpi/implementations.html</A>
<LI><A HREF="http://www.mpi.nd.edu/MPI/">http://www.mpi.nd.edu/MPI/</A>
</UL>

Although MPI is also available on Macintosh and Windows systems,
the developers never ran the parallel version on those
platforms.

<P>To install the parallel version of TREE-PUZZLE, you need the
Unix sources of TREE-PUZZLE; install the package on your computer
as described above.
The <TT>configure</TT> script should set up the Makefiles appropriately.
If no known MPI compiler is found on the system, the parallel
version is not configured.
(If problems occur, ask your local system administrator for help.)

<P>Then you should be able to compile the parallel version of TREE-PUZZLE
using the following commands:
<PRE> sh ./configure</PRE>
<PRE> make</PRE>
<PRE> make install</PRE>
and the executable <TT>ppuzzle</TT> is compiled and put into the <TT>/usr/local/bin</TT> directory.
If you want to have the executable installed into another directory, please proceed as
described in the <A HREF="#Unix">Unix</A> section.

If your MPI compiler is none of <TT>mpcc</TT> (IBM), <TT>hcc</TT> (LAM),
<TT>mpicc_lam</TT> (LAM under LINUX), <TT>mpicc_mpich</TT> (MPICH under LINUX),
or <TT>mpicc</TT> (LAM, MPICH, HP-UX, etc.) and is not found by <TT>configure</TT>, you have to
specify it by setting the <TT>MPICC</TT> variable (e.g. <TT>setenv MPICC /another/mpicc</TT>
under <TT>csh</TT> or <TT>MPICC=/another/mpicc; export MPICC</TT> under <TT>sh</TT>)
before running <TT>sh ./configure</TT>.

How you have to start <TT>ppuzzle</TT> depends on the MPI implementation
installed, so please refer to your MPI manual or ask your local MPI expert
for help.

<P><B>Note:</B>
<BR>The parallelization of the tree reconstruction method follows a
master-worker concept, i.e., a master process handles the scheduling of
the computation to the <em>n</em> worker processes, while the worker processes
do almost all the computational work of evaluating the quartets and
constructing the puzzling step trees.

<BR>Since the master process does not require much CPU time,
it can share one processor with a worker process.
Thus, on <em>n</em> processors you can run <TT>ppuzzle</TT> with <em>n+1</em> processes
(one master and <em>n</em> workers).

<BR>If you want to evaluate a usertree or perform likelihood
mapping analysis, a parallel run is not recommended, because all
the computation will be done by the master process. Hence the
sequential version of TREE-PUZZLE is more appropriate for usertree or likelihood
mapping analysis.

<H2>
<A NAME="Introduction"></A>Introduction</H2>
TREE-PUZZLE is an ANSI C application to reconstruct phylogenetic trees from
molecular sequence data by maximum likelihood. It implements a fast tree
search algorithm, quartet puzzling, that allows analysis of large data
sets and automatically assigns estimates of support to each internal
branch. Rate heterogeneity (invariable sites plus Gamma distributed rates)
is incorporated in all available models of substitution (nucleotides: SH,
TN, HKY, F84, and submodels; amino acids: Dayhoff, JTT, mtREV24, BLOSUM
62, VT, and WAG; two-state data: F81). All parameters, including rate heterogeneity, can
be estimated from the data by maximum likelihood approaches. TREE-PUZZLE also
computes pairwise maximum likelihood distances as well as branch lengths
for user-specified trees. In addition, TREE-PUZZLE offers a novel method, likelihood
mapping, to investigate the support of internal branches without computing
an overall tree.
<H2>
<A NAME="Input/Output Conventions"></A>Input/Output Conventions</H2>

A few naming conventions have changed compared to
earlier PUZZLE releases (&lt; 5.0). From version 5.0 onwards,
the names of the sequence input file and the usertree file can be specified
at the command line (e.g. '<TT>puzzle infilename intreename</TT>',
where <TT>infilename</TT> is the name of the sequence file and <TT>intreename</TT>
is the name of the usertree file).
If no filename is given at the command line, TREE-PUZZLE searches
for input files named "<TT>infile</TT>" and "<TT>intree</TT>"; if only the
sequence input filename is given, it searches for a usertree file named "<TT>intree</TT>".

<P>The naming conventions of the output files have changed as well.
The name of the sequence input file (or of the usertree file in a
usertree analysis) is used as the prefix of the output filenames, and an
extension is added to denote the content of the file. If no input filename
is given at the command line, the default filenames of the earlier
versions are used.

The following extensions/default filenames are possible:
<DL><DT><DD>
<TABLE><TR><TD><B>Extension</B></TD><TD><B>Default filename</B></TD><TD><B>File content</B></TD></TR>
<TR><TD><TT>.puzzle</TT></TD><TD><TT>outfile</TT></TD><TD>the TREE-PUZZLE report</TD></TR>
<TR><TD><TT>.dist</TT></TD><TD><TT>outdist</TT></TD><TD>the ML distances</TD></TR>
<TR><TD><TT>.tree</TT></TD><TD><TT>outtree</TT></TD><TD>the final tree(s)</TD></TR>
<TR><TD><TT>.qlist</TT></TD><TD><TT>outqlist</TT></TD><TD>the list of unresolved quartets</TD></TR>
<TR><TD><TT>.ptorder</TT></TD><TD><TT>outptorder</TT></TD><TD>the list of unique puzzling step tree topologies</TD></TR>
<TR><TD><TT>.pstep</TT></TD><TD><TT>outpstep</TT></TD><TD>the list of puzzling step tree topologies in chronological order</TD></TR>
<TR><TD><TT>.eps</TT></TD><TD><TT>outlm.eps</TT></TD><TD>the EPS file generated in the likelihood mapping analysis</TD></TR>
</TABLE></DL>

The file types are described in detail below. In the following,
"INFILENAME" denotes the prefix, which is the sequence input filename
or the usertree filename, respectively.
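<P>The naming rules above can be condensed into a small Python sketch. It is only an illustration: the helper <TT>output_filename</TT> and the keys of <TT>OUTPUT_NAMES</TT> are hypothetical names, not part of TREE-PUZZLE; only the extensions and default filenames are taken from the table.

```python
from typing import Optional

# Hypothetical illustration of the TREE-PUZZLE 5.0 output naming rules;
# this helper is not part of the TREE-PUZZLE distribution.
# file type -> (extension appended to INFILENAME, pre-5.0 default filename)
OUTPUT_NAMES = {
    "report": (".puzzle", "outfile"),
    "distances": (".dist", "outdist"),
    "trees": (".tree", "outtree"),
    "unresolved_quartets": (".qlist", "outqlist"),
    "unique_topologies": (".ptorder", "outptorder"),
    "puzzling_steps": (".pstep", "outpstep"),
    "likelihood_mapping": (".eps", "outlm.eps"),
}


def output_filename(kind: str, infilename: Optional[str]) -> str:
    """Return the name of the output file of type `kind`.

    `infilename` is the prefix INFILENAME (the sequence file name, or the
    usertree file name in a usertree analysis) as given on the command line,
    or None if no name was given, in which case the default name is used.
    """
    extension, default_name = OUTPUT_NAMES[kind]
    if infilename is None:
        return default_name
    return infilename + extension


print(output_filename("report", "globin.a"))  # globin.a.puzzle
print(output_filename("distances", None))     # outdist
```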

<H3>
<A NAME="Sequence Input"></A>Sequence Input</H3>
TREE-PUZZLE requires sequence input in PHYLIP INTERLEAVED format (sometimes
also called PHYLIP 3.4 format). Many sequence editors and alignment programs
(e.g., CLUSTAL W) output data in this format. The "data" directory
contains four example input files ("globin.a", "marswolf.n", "atp6.a",
"primates.b") that can be used as templates for your own data files.
The default name of the sequence input file is "infile" if no
input filename is given at the command line.
If neither the given file nor "infile" is present, TREE-PUZZLE
will request an alternative file name. Sequence names in the
input file are allowed to contain blanks, but all blanks will internally
be converted to underscores "_". Sequences can be in upper or lower case;
any spaces or control characters are ignored. The dot "." denotes the same
character as at the corresponding position of the first sequence; it can therefore be used
in all sequences except the first one. Valid symbols for nucleotides are A, C, G, T, and
U, and for amino acids A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S,
T, V, W, and Y. All other visible characters (including gaps, question
marks etc.) are treated as N (DNA/RNA) or X (amino acids). For two-state
data the symbols 0 and 1 are allowed. The first sequence in the data set is
considered the default outgroup.
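<P>The symbol rules of this section can be sketched in Python as follows. This is not TREE-PUZZLE's parser: it assumes the sequences have already been read from the PHYLIP file into aligned strings of equal length, the function name <TT>normalize</TT> is hypothetical, and two-state data are not covered.

```python
# Sketch of the sequence-input rules described above (not TREE-PUZZLE's code).
# Assumes the PHYLIP file has already been split into names and aligned
# sequence strings; two-state data (0/1) are not handled here.
NUCLEOTIDES = set("ACGTU")
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def normalize(names, sequences, data_type="nucleotide"):
    """Apply the naming and symbol conventions to one alignment."""
    if data_type == "nucleotide":
        valid, unknown = NUCLEOTIDES, "N"
    else:
        valid, unknown = AMINO_ACIDS, "X"

    # Blanks in sequence names become underscores.
    clean_names = [name.strip().replace(" ", "_") for name in names]

    # Case is irrelevant; spaces and control characters are dropped.
    stripped = ["".join(c for c in seq.upper() if c.isprintable() and not c.isspace())
                for seq in sequences]

    first = stripped[0]
    if "." in first:
        raise ValueError('"." is not allowed in the first sequence')

    result = []
    for seq in stripped:
        out = []
        for i, c in enumerate(seq):
            if c == ".":
                c = first[i]  # "." = same character as in the first sequence
            # Gaps, "?" and any other symbol become N (DNA/RNA) or X (protein).
            out.append(c if c in valid else unknown)
        result.append("".join(out))
    return clean_names, result
```

For example, <TT>normalize(["Human seq", "Chimp"], ["ACGT-a", "..g.?A"])</TT> yields the names <TT>Human_seq</TT> and <TT>Chimp</TT> and the sequence <TT>ACGTNA</TT> for both entries.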
<H3>
<A NAME="General Output"></A>General Output</H3>
All results are written to the TREE-PUZZLE report file (INFILENAME.puzzle or
outfile). If the option "List all unresolved quartets" is invoked, a file
called "INFILENAME.qlist"/"outqlist" is created showing all these quartets.
If the option "List puzzling step trees" is set accordingly, the files
"INFILENAME.pstep"/"outpstep" and/or "INFILENAME.ptorder"/"outptorder" are
generated.

<P>The "INFILENAME.ptorder"/"outptorder" file contains the unique tree
topologies in PHYLIP format, each preceded by a PHYLIP-format comment (in square brackets).
A typical line in the ptorder file looks like this:

<P><TT>[ 2. 60 6.00 2 5 1000 ](chicken,((cat,(horse,(mouse,rat))),(opossum,platypus)));</TT></P>

The entries (separated by single blanks) in the square brackets mean the following:
<UL>
<LI><B>2.</B> - Topology is the second most frequent among all
intermediate tree topologies (= order number).
<LI><B>60</B> - Topology occurs 60 times.
<LI><B>6.00</B> - Topology occurs in 6.00 % of the intermediate tree topologies.
<LI><B>2</B> - Unique topology ID (needed for the pstep file).
<LI><B>5</B> - Number of unique topologies found.
<LI><B>1000</B> - Total number of intermediate trees estimated during the analysis.
</UL>
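<P>A ptorder line can be split into these six fields and the tree string with a regular expression. The following Python sketch assumes exactly the layout of the example line above; <TT>parse_ptorder_line</TT> is a hypothetical helper, not part of TREE-PUZZLE. It also checks that the percentage equals the count divided by the total number of intermediate trees (60 / 1000 = 6.00 %).

```python
import math
import re

# "[ order. count percent id n_unique n_trees ]tree;" as in the example above.
PTORDER_LINE = re.compile(
    r"^\[\s*(\d+)\.\s+(\d+)\s+([\d.]+)\s+(\d+)\s+(\d+)\s+(\d+)\s*\]\s*(.*;)\s*$"
)


def parse_ptorder_line(line):
    """Split one line of an INFILENAME.ptorder/outptorder file into its fields."""
    match = PTORDER_LINE.match(line)
    if match is None:
        raise ValueError("not a ptorder line: " + line)
    order, count, percent, topology_id, n_unique, n_trees, tree = match.groups()
    return {
        "order": int(order),                # rank by frequency
        "count": int(count),                # occurrences of this topology
        "percent": float(percent),          # share of intermediate trees
        "topology_id": int(topology_id),    # key used in the pstep file
        "unique_topologies": int(n_unique),
        "intermediate_trees": int(n_trees),
        "tree": tree,                       # Newick string without the comment
    }


line = "[ 2. 60 6.00 2 5 1000 ](chicken,((cat,(horse,(mouse,rat))),(opossum,platypus)));"
entry = parse_ptorder_line(line)
expected_percent = 100.0 * entry["count"] / entry["intermediate_trees"]
assert math.isclose(entry["percent"], expected_percent, abs_tol=0.005)
```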

<P>The "INFILENAME.pstep"/"outpstep" file contains a log of the
puzzling steps performed and the tree topologies that occurred.

A typical line in the pstep file contains the following entries
(separated by tab stops):

<P><TT>6. 55 698 3 5 828</TT></P>

The entries mean the following:
<UL>
<LI><B>6.</B> - 6th block of intermediate trees performed.
<LI><B>55</B> - Number of intermediate trees inferred in this block.
<LI><B>698</B> - Occurrences of this topology so far.
<LI><B>3</B> - Unique topology ID (for lookup in the ptorder file).
<LI><B>5</B> - Number of unique topologies found so far.
<LI><B>828</B> - Number of puzzling steps performed so far.
</UL>
In the case of a sequential run (<TT>puzzle</TT>), the entries of this
file are more finely resolved, because every block consists of one intermediate tree.
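<P>Because the pstep entries are tab-separated, a line can be read by splitting on whitespace. In this Python sketch the helper <TT>parse_pstep_line</TT> is hypothetical; its <TT>topology_id</TT> field can be used to look up the corresponding tree in the ptorder file.

```python
def parse_pstep_line(line):
    """Split one line of an INFILENAME.pstep/outpstep file into its six fields."""
    fields = line.split()  # entries are separated by tabs
    if len(fields) != 6:
        raise ValueError("expected 6 fields, got %d" % len(fields))
    block, trees_in_block, occurrences, topology_id, n_unique, n_steps = fields
    return {
        "block": int(block.rstrip(".")),         # "6." becomes 6
        "trees_in_block": int(trees_in_block),
        "occurrences_so_far": int(occurrences),  # of this topology
        "topology_id": int(topology_id),         # look up in the ptorder file
        "unique_topologies_so_far": int(n_unique),
        "puzzling_steps_so_far": int(n_steps),
    }


step = parse_pstep_line("6.\t55\t698\t3\t5\t828")
# Fraction of all puzzling steps so far that produced topology 3.
share = step["occurrences_so_far"] / step["puzzling_steps_so_far"]
```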

<H3>
<A NAME="Distance Output"></A>Distance Output</H3>
TREE-PUZZLE automatically computes pairwise maximum likelihood distances for
all the sequences in the data file. They are written to the TREE-PUZZLE report
file "INFILENAME.puzzle"/"outfile" and to the separate file
"INFILENAME.dist"/"outdist". The format of the distance file is PHYLIP compatible
(i.e. it can directly be used as input for PHYLIP distance-based programs
such as "neighbor").
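<P>Since the distance file is meant for PHYLIP programs, it can also be read by simple scripts. The Python sketch below assumes a square PHYLIP matrix in which the first token is the number of sequences and each row starts with a blank-free name (TREE-PUZZLE converts blanks in names to underscores), followed by the distances, possibly wrapped over several lines. <TT>read_phylip_distances</TT> is a hypothetical helper, and the sample matrix is made up for illustration.

```python
def read_phylip_distances(text):
    """Parse a square PHYLIP distance matrix into (names, matrix).

    Assumes names contain no blanks and are separated from the first
    distance by whitespace; a row may continue on following lines.
    """
    tokens = text.split()
    n = int(tokens[0])
    pos = 1
    names, matrix = [], []
    for _ in range(n):
        names.append(tokens[pos])
        row = [float(t) for t in tokens[pos + 1:pos + 1 + n]]
        if len(row) != n:
            raise ValueError("distance matrix is truncated")
        matrix.append(row)
        pos += 1 + n
    return names, matrix


example = """3
Human      0.00000 0.10000 0.20000
Chimp      0.10000 0.00000
           0.15000
Gorilla    0.20000 0.15000 0.00000
"""
names, matrix = read_phylip_distances(example)
```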
<H3>
<A NAME="Tree Output"></A>Tree Output</H3>
The quartet puzzling tree with its support values
and with maximum likelihood branch lengths is displayed as an ASCII drawing
in the TREE-PUZZLE report "INFILENAME.puzzle"/"outfile". The same tree
is written to the "INFILENAME.tree"/"outtree" file in CLUSTAL W format.
If clock-like maximum likelihood branch lengths are computed,
there will be both an unrooted and a rooted tree in
"INFILENAME.puzzle"/"outfile". The tree convention follows the NEWICK format
(as implemented in PHYLIP or CLUSTAL W): the tree topology is described
by the usual round brackets, e.g.
<TT>(a,b,(c,d));</TT>,
and branch lengths are written after a colon, e.g. <TT>a:0.22,b:0.33</TT>.
Support values for each branch
are displayed as internal node labels, i.e., they follow directly after the closing
bracket of each internal node, before the branch length leading to that node. Here is an example:

<P>(Gibbon:0.1393, ((Human:0.0414, Chimpanzee:0.0538)99:0.0175, Gorilla:0.0577)98:0.0531,
Orangutan:0.1003);
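<P>To see how support values and branch lengths are encoded, the example tree can be read with a small recursive-descent parser. This Python sketch is an illustration only: it assumes that bracketed comments have already been removed, that labels contain no blanks or punctuation, and that internal labels are numeric support values; <TT>parse_newick</TT> is a hypothetical helper. Each clade is identified by the set of sequence names below it.

```python
import re

PUNCTUATION = set("(),:;")
# A token is one punctuation character or a run of other non-blank characters.
TOKEN = re.compile(r"\s*([(),:;]|[^(),:;\s]+)")


def parse_newick(newick):
    """Return (support, length) dicts keyed by frozensets of leaf names."""
    tokens = TOKEN.findall(newick)
    pos = 0
    support, length = {}, {}

    def clade():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1
            leaves = set()
            while True:
                leaves |= clade()
                if tokens[pos] != ",":
                    break
                pos += 1
            if tokens[pos] != ")":
                raise ValueError("expected ')' at token %d" % pos)
            pos += 1
            node = frozenset(leaves)
            # Internal node label directly after ')' = support value.
            if tokens[pos] not in PUNCTUATION:
                support[node] = float(tokens[pos])
                pos += 1
        else:
            node = frozenset([tokens[pos]])  # leaf: sequence name
            pos += 1
        if tokens[pos] == ":":
            length[node] = float(tokens[pos + 1])  # branch length to this node
            pos += 2
        return node

    clade()
    if tokens[pos] != ";":
        raise ValueError("expected ';' at end of tree")
    return support, length


tree = ("(Gibbon:0.1393, ((Human:0.0414, Chimpanzee:0.0538)99:0.0175, "
        "Gorilla:0.0577)98:0.0531, Orangutan:0.1003);")
support, length = parse_newick(tree)
```

For the example tree, the clade {Human, Chimpanzee} gets support 99 and the clade {Human, Chimpanzee, Gorilla} gets support 98 with branch length 0.0531.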

<P>The likelihood value of each tree is added in square brackets before
the tree string (e.g. "[ lh=-1621.201605 ]"). Square brackets mark comments
in the Newick or PHYLIP tree format. In some cases the
comment has to be removed before the tree can be used with other programs.
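<P>Such comments can be removed with a regular expression. The following Python sketch (the helper <TT>split_tree_comment</TT> is hypothetical) extracts the value reported as <TT>lh=</TT> and returns the bare tree string; it assumes comments are not nested.

```python
import re

COMMENT = re.compile(r"\[[^\]]*\]")                   # any [ ... ] comment
LH_COMMENT = re.compile(r"\[\s*lh=(-?[0-9.]+)\s*\]")  # e.g. "[ lh=-1621.201605 ]"


def split_tree_comment(tree_line):
    """Return (lh value or None, tree string with all [...] comments removed)."""
    match = LH_COMMENT.search(tree_line)
    lh = float(match.group(1)) if match else None
    return lh, COMMENT.sub("", tree_line).strip()


lh, tree = split_tree_comment("[ lh=-1621.201605 ](a,b,(c,d));")
```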
---|
505 | |
---|
506 | <P>With the programs |
---|
507 | <a href="http://taxonomy.zoology.gla.ac.uk/rod/treeview.html">TreeView</a> and |
---|
508 | <a href="ftp://rdp.life.uiuc.edu/pub/RDP/programs/TreeTool/">TreeTool</a> |
---|
509 | it is possible to view a tree both |
---|
510 | with its branch lengths and simultaneously with the support values for the internal |
---|
511 | branches (here 98% and 99%). Note, the PHYLIP programs DRAWTREE and DRAWGRAM may |
---|
512 | also be used with the CLUSTAL W treefile format. However, in the current version |
---|
513 | (3.5) they ignore the internal labels and simply print the tree |
---|
514 | topology along with branch lengths. |

<H3>
<A NAME="Tree Input"></A>Tree Input</H3>
TREE-PUZZLE can optionally read input trees. If no file name is given on the
command line, the file containing the input trees is expected to be named "intree".
If you choose the input tree option and neither the given file nor "intree" exists,
you will be prompted for an alternative name. The format of the input trees is
identical to that of the trees in the "INFILENAME.tree"/"outtree" file.
However, it is sufficient to provide the tree topology only: you
do not need to specify branch lengths (they are ignored anyway) or
internal labels (these are read, stored, and written back to the
"INFILENAME.tree"/"outtree" file).
The input trees need not be unrooted; rooted trees are accepted as well. It is
important that sequence names in the input tree file do not contain blanks
(use underscores!). The trees may be multifurcating.
The format of the tree input file is simple: just put the
trees into the file. TREE-PUZZLE counts the ';' at the end of each tree description
to determine how many input trees there are. Any header (e.g., one giving the
number of trees) is ignored; this is useful in conjunction with programs
like MOLPHY that need such a header. If there is more than one tree, TREE-PUZZLE
performs the Kishino-Hasegawa test.
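<P>The splitting rule described above can be sketched in Python. This is a hypothetical helper, not TREE-PUZZLE's reader: it treats every ';' as the end of a tree, ignores any text before a tree's first '(' (such as a MOLPHY-style tree count), and, as an extra assumption, removes bracketed comments first so that a ';' inside a comment is not counted. A second helper flags labels that contain blanks.

```python
import re


def split_input_trees(text):
    """Split the contents of an input tree file into single tree strings.

    Each ';' ends one tree; text before a tree's first '(' is treated as a
    header and ignored. Bracketed comments are removed first.
    """
    text = re.sub(r"\[[^\]]*\]", "", text)
    trees = []
    for chunk in text.split(";")[:-1]:   # text after the last ';' is not a tree
        start = chunk.find("(")
        if start == -1:
            raise ValueError("no tree before ';': %r" % chunk.strip())
        trees.append(chunk[start:].strip() + ";")
    return trees


def labels_with_blanks(tree):
    """Return labels containing blanks; such names should use underscores instead."""
    labels = (token.split(":")[0].strip() for token in re.split(r"[(),;]", tree))
    return [label for label in labels if " " in label]


example = "2\n((Human,Chimpanzee),Gorilla,Gibbon);\n(Human,(Chimpanzee,Gorilla),Gibbon);\n"
print(len(split_input_trees(example)))                    # 2
print(labels_with_blanks("(Homo sapiens,Pan,Gorilla);"))  # ['Homo sapiens']
```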
<H3>
<A NAME="Likelihood Mapping Output"></A>Likelihood Mapping Output</H3>
TREE-PUZZLE also offers likelihood mapping analysis, a method to investigate the
support for internal branches of a tree without computing an overall tree,
and to visualize graphically the
phylogenetic content of a sequence alignment. The results of likelihood
mapping are written in ASCII to "INFILENAME.puzzle"/"outfile" as well
as to a file called "INFILENAME.eps" or "outlm.eps", respectively.
This file contains, in Encapsulated PostScript format (EPSF),
a picture of the triangle that forms the basis of the likelihood mapping
analysis. You may print it on a PostScript-capable printer or view
it with a suitable program. The "INFILENAME.eps"/"outlm.eps" file can be
edited by hand (it is plain ASCII text!) or with drawing programs that
understand the PostScript language (e.g., Adobe Illustrator).
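<P>The construction of the triangle is not spelled out in this section. As a hedged illustration, assume the usual likelihood mapping scheme: the three possible topologies of a quartet receive weights proportional to their likelihoods, and these weights serve as barycentric coordinates of a point in the triangle. Under that assumption one quartet is mapped as follows. The corner positions and function names are choices made for this Python sketch only.

```python
import math

# Corners of an equilateral triangle; corner k represents quartet topology k.
CORNERS = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3.0) / 2.0)]


def topology_weights(log_likelihoods):
    """Convert the three quartet log-likelihoods into weights that sum to 1.

    The maximum is subtracted before exponentiating because log-likelihoods
    are large negative numbers and exp() would otherwise underflow to 0.
    """
    top = max(log_likelihoods)
    scaled = [math.exp(value - top) for value in log_likelihoods]
    total = sum(scaled)
    return [s / total for s in scaled]


def map_quartet(log_likelihoods):
    """Return the (x, y) point of one quartet inside the triangle."""
    weights = topology_weights(log_likelihoods)
    x = sum(w * cx for w, (cx, _) in zip(weights, CORNERS))
    y = sum(w * cy for w, (_, cy) in zip(weights, CORNERS))
    return x, y


# Equal support for all three topologies lands in the centre (unresolved);
# one clearly best topology lands near its corner.
print(map_quartet([-1500.0, -1500.0, -1500.0]))
print(map_quartet([-1500.0, -1540.0, -1545.0]))
```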
<H2>
<A NAME="Quick Start"></A>Quick Start</H2>
Prepare your sequence input file and, optionally, your tree input
file. Then start the TREE-PUZZLE program. TREE-PUZZLE automatically chooses
the nucleotide or the amino acid mode: if more than 85% of
the characters in the sequences (not counting '-' and '?') are A, C, G,
T, U, or N, the sequences are assumed to consist of nucleotides.
If your data set contains amino acids, TREE-PUZZLE suggests whether they are
encoded on mtDNA or on nuclear DNA and selects the appropriate
model of amino acid evolution. If your data set contains nucleotides, the
default model of sequence evolution is the HKY model. Parameters
need not be specified; they are estimated from the data by a maximum likelihood
procedure. If TREE-PUZZLE detects a user tree file given on the
command line or a file called "intree", it automatically switches to the input
tree mode.
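<P>The automatic choice between nucleotide and amino acid mode can be sketched in a few lines of Python. This is a minimal reading of the 85% rule stated above, not TREE-PUZZLE's own code; the function name and the treatment of lower-case letters are assumptions.

```python
NUCLEOTIDE_SYMBOLS = set("ACGTUN")
IGNORED = set("-?")


def looks_like_nucleotides(sequences, threshold=0.85):
    """True if more than 85% of the counted characters are A, C, G, T, U or N.

    Gaps ('-') and unknown characters ('?') are skipped. Treating lower-case
    letters like upper-case ones is an assumption of this sketch.
    """
    counted = nucleotide_like = 0
    for seq in sequences:
        for ch in seq.upper():
            if ch in IGNORED:
                continue
            counted += 1
            if ch in NUCLEOTIDE_SYMBOLS:
                nucleotide_like += 1
    return counted > 0 and nucleotide_like / counted > threshold


print(looks_like_nucleotides(["ACGT-ACGTN", "acgu?acgt"]))   # True
print(looks_like_nucleotides(["MKVLAAGIVG", "MKVLSAGLVG"]))  # False
```

The rule needs a high threshold rather than a simple majority because A, C, G, T and N are also valid amino acid symbols.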

<P>Then a menu (PHYLIP "look and feel") appears with the default options set.
All available options can be changed. For example, if you want
to incorporate rate heterogeneity you have to select option "w", because rate
heterogeneity is switched off by default. Then type "y" at the input prompt
to start the analysis. A number of status messages are shown on the
screen during the computation. When the analysis is finished, all output files
(e.g., "outfile", "outtree", "outdist", "outqlist", "outlm.eps", "outpstep",
"outptlist" or "INFILENAME.puzzle", "INFILENAME.tree", "INFILENAME.dist",
"INFILENAME.qlist", "INFILENAME.eps", "INFILENAME.pstep", "INFILENAME.ptorder")
will be in the same directory as the input files.

<P>To obtain a high-quality picture of the output tree (including node labels)
you may want to use the TreeView program by Roderic Page. It is
available free of charge and runs on MacOS and MS-Windows. It can be retrieved
from <A HREF="http://taxonomy.zoology.gla.ac.uk/rod/treeview.html">http://taxonomy.zoology.gla.ac.uk/rod/treeview.html</A>.
TreeView understands the CLUSTAL W treefile conventions, reads multifurcating
trees, and can display branch lengths and support values
for each branch simultaneously. Open the "INFILENAME.tree"/"outtree" file with TreeView,
choose "Phylogram" to draw branch lengths, and select "Show internal edge
labels".

<P>On Unix systems you can use the TreeTool program to display and
manipulate TREE-PUZZLE trees (see <A HREF="ftp://rdp.life.uiuc.edu/pub/RDP/programs/TreeTool/">ftp://rdp.life.uiuc.edu/pub/RDP/programs/TreeTool</A>
for precompiled Sun executables; a version that runs on Linux has been prepared by
<A HREF="mailto:cato@biochem.kth.se">Anders Holmberg</A> from the Dept. of Biochemistry at
the Royal Institute of Technology, Stockholm).

<H2>
<A NAME="Models of Sequence Evolution"></A>Models of Sequence Evolution</H2>
Here we give a brief overview of the models implemented in TREE-PUZZLE. Formulas
are written in TeX style.
<H3>
<A NAME="Models of Substitution"></A>Models of Substitution</H3>
The substitution process is modelled as a reversible, time-homogeneous, stationary
Markov process. If the corresponding stationary nucleotide (amino acid)
frequencies are denoted pi_i, the most general rate matrix for the transition
from nucleotide (amino acid) i to j can be written as
<PRE>
R_{ij} = Q_{ij} \pi_j             for i \neq j
R_{ij} = - \sum_m Q_{im} \pi_m    for i = j
</PRE>
The matrix Q_{ij} is symmetric with Q_{ii} = 0 (diagonals are zero). For
nucleotides the most general model built into TREE-PUZZLE is the Tamura-Nei
model (TN, <A HREF="#tamura1993">Tamura and Nei</A>, 1993).
The matrix Q_{ij} for this model equals
<PRE>
Q_{ij} = 4 t \gamma / (\gamma + 1)   for a pyrimidine transition i \to j
Q_{ij} = 4 t / (\gamma + 1)          for a purine transition i \to j
Q_{ij} = 1                           for a transversion i \to j
</PRE>
The parameter gamma is called the "Y/R transition parameter", whereas t
is the "transition/transversion parameter". If gamma is equal to 1 we
get the HKY model (<A HREF="#hasegawa1985">Hasegawa et al.</A>, 1985).
Note that the ratio of the transition and transversion
rates (without frequencies) is kappa = 2*t. There is a subtle but important
difference between the <I>transition-transversion parameter</I>, the
<I>expected transition-transversion ratio</I>, and the <I>observed
transition-transversion ratio</I>.
The <I>transition-transversion parameter</I> is simply a parameter in the
rate matrix. The <I>expected transition-transversion ratio</I> is the ratio of
actually occurring transitions to actually occurring transversions, taking
into account the nucleotide frequencies in the alignment. Due to saturation
and multiple hits not all substitutions are observable. Thus, the <I>observed
transition-transversion ratio</I> counts observable transitions and transversions
only. If the base frequencies in the HKY model are homogeneous (pi_i =
0.25), HKY further reduces to the Kimura model. In this case t is identical
to the expected transition/transversion ratio. If t is additionally set to 0.5,
the Jukes-Cantor model is obtained. The F84 model (as implemented in the various PHYLIP
programs, <A HREF="#felsenstein1984">Felsenstein</A>, 1984)
is a special case of the Tamura-Nei model.
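<P>The relation between t, gamma, the rate matrix, and the expected ratios can be checked numerically. The Python sketch below (nucleotide order A, C, G, T; function names are hypothetical) builds Q_{ij} for the TN model, forms R_{ij} as defined above, and computes each expected ratio from the expected number of substitutions of each kind, pi_i R_{ij} summed over the relevant pairs. That TREE-PUZZLE reports its expected ratios in exactly this way is an assumption of this sketch.

```python
NUCLEOTIDES = "ACGT"          # order used for all matrices below
PURINES = set("AG")


def tn_exchangeabilities(t, gamma):
    """Symmetric matrix Q_ij of the Tamura-Nei model, as defined above."""
    q = [[0.0] * 4 for _ in range(4)]
    for i, a in enumerate(NUCLEOTIDES):
        for j, b in enumerate(NUCLEOTIDES):
            if i == j:
                continue                                  # Q_ii = 0
            if a in PURINES and b in PURINES:             # purine transition A <-> G
                q[i][j] = 4.0 * t / (gamma + 1.0)
            elif a not in PURINES and b not in PURINES:   # pyrimidine transition C <-> T
                q[i][j] = 4.0 * t * gamma / (gamma + 1.0)
            else:                                         # transversion
                q[i][j] = 1.0
    return q


def rate_matrix(q, pi):
    """R_ij = Q_ij pi_j for i != j, and R_ii = -sum_m Q_im pi_m (rows sum to zero)."""
    n = len(pi)
    r = [[q[i][j] * pi[j] for j in range(n)] for i in range(n)]   # q[i][i] is 0
    for i in range(n):
        r[i][i] = -sum(q[i][m] * pi[m] for m in range(n))
    return r


def expected_ratios(t, gamma, pi):
    """(expected transition/transversion ratio, expected pyrimidine/purine transition ratio)."""
    r = rate_matrix(tn_exchangeabilities(t, gamma), pi)
    purine_ts = pyrimidine_ts = tv = 0.0
    for i, a in enumerate(NUCLEOTIDES):
        for j, b in enumerate(NUCLEOTIDES):
            if i == j:
                continue
            flux = pi[i] * r[i][j]        # expected i -> j substitutions per unit time
            if a in PURINES and b in PURINES:
                purine_ts += flux
            elif a not in PURINES and b not in PURINES:
                pyrimidine_ts += flux
            else:
                tv += flux
    return (purine_ts + pyrimidine_ts) / tv, pyrimidine_ts / purine_ts


# Kimura case (HKY with pi_i = 0.25): the expected ratio equals t.
print(expected_ratios(2.0, 1.0, [0.25] * 4))   # (2.0, 1.0)
```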

<P>For amino acids the matrix Q_{ij} is fixed and does not contain any free
parameters. Depending on the type of input data, six different Q_{ij} matrices
are available in TREE-PUZZLE.
The Dayhoff (<A HREF="#dayhoff1978">Dayhoff et al.</A>, 1978) and
JTT (<A HREF="#jones1992">Jones et al.</A>, 1992) matrices are for use with
proteins encoded on nuclear DNA, the mtREV24 (<A HREF="#adachi1996">Adachi
and Hasegawa</A>, 1996) matrix is for use with proteins encoded on mtDNA,
and the BLOSUM 62 (<A HREF="#henikoff1992">Henikoff and Henikoff</A>,
1992) and the WAG (<A HREF="#whelan2000">Whelan and Goldman</A>) matrices
are for more distantly related amino acid sequences.
The WAG matrix has been inferred from a database of 3905 globular protein
sequences, forming 182 distinct gene families spanning a broad range of
evolutionary distances (<A HREF="#whelan2000">Whelan and Goldman</A>).

<P>The VT model is based on a new estimator for amino acid replacement rates,
the resolvent method. The VT matrix has been computed from a large set of
alignments of varying degrees of divergence. Hence VT is also suitable for
distantly related proteins (<A HREF="#mueller2000">Mueller and Vingron</A>, 2000).

<P>For doublets (pairs of dependent nucleotides) the SH model
(<A HREF="#schoeniger1994">Schoeniger and von Haeseler</A>, 1994) is
implemented in TREE-PUZZLE. The corresponding matrix Q_{ij} reads
<PRE>
Q_{ij} = 2 t    for a single transition i \to j
Q_{ij} = 1      for a single transversion i \to j
Q_{ij} = 0      if i \to j requires two substitutions
</PRE>
The SH model is basically an F81 model
(<A HREF="#felsenstein1981">Felsenstein</A>, 1981) for single substitutions
in doublets.
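<P>To make the three cases concrete, the following Python sketch (a hypothetical helper, not TREE-PUZZLE code) enumerates the 16 doublets and fills Q_{ij} by counting how many positions differ between doublet i and doublet j. Transitions are taken to be A/G and C/T changes.

```python
from itertools import product

NUCLEOTIDES = "ACGT"
PURINES = set("AG")
DOUBLETS = ["".join(pair) for pair in product(NUCLEOTIDES, repeat=2)]  # AA, AC, ..., TT


def is_transition(a, b):
    """True for A <-> G and C <-> T, i.e. a change within purines or within pyrimidines."""
    return a != b and (a in PURINES) == (b in PURINES)


def sh_exchangeabilities(t):
    """16 x 16 matrix Q_ij of the SH doublet model, as defined above."""
    q = [[0.0] * len(DOUBLETS) for _ in DOUBLETS]
    for i, x in enumerate(DOUBLETS):
        for j, y in enumerate(DOUBLETS):
            changes = [(a, b) for a, b in zip(x, y) if a != b]
            if len(changes) != 1:
                continue          # diagonal (no change) or two substitutions: Q_ij = 0
            a, b = changes[0]
            q[i][j] = 2.0 * t if is_transition(a, b) else 1.0
    return q


q = sh_exchangeabilities(t=0.5)
# With t = 0.5 every single substitution has Q_ij = 1, as in an F81-type model.
print(sorted(set(v for row in q for v in row)))  # [0.0, 1.0]
```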
<H3>
<A NAME="Models of Rate Heterogeneity"></A>Models of Rate Heterogeneity</H3>
Rate heterogeneity is taken into account by considering invariable sites
and by introducing Gamma-distributed rates for the variable sites.

<P>For invariable sites the parameter theta ("Fraction of invariable sites")
determines the probability of a given site being invariable. If a site
is invariable, the probability of the constant site pattern consisting of
character i is pi_i, the frequency of nucleotide (amino acid) i.

<P>The rates r for variable sites are determined by a discrete Gamma
distribution that approximates the continuous Gamma distribution
<PRE>
g(r) = \frac{\alpha^{\alpha} r^{\alpha - 1}}{e^{\alpha r} \Gamma(\alpha)}
</PRE>
where the parameter alpha ranges from alpha = infinity (no rate heterogeneity)
to alpha < 1 (strong heterogeneity). The expected value of r under this
distribution is 1.
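<P>This section does not say how the continuous density is cut into discrete categories. The Python sketch below assumes one common scheme: the range of r is split into c slices of equal probability (compare option "c"), and each slice is represented by its mean rate, which keeps the mean rate at exactly 1. Whether TREE-PUZZLE uses precisely this discretization is an assumption; the function names are hypothetical.

```python
import math


def lower_gamma_p(a, x):
    """Regularized lower incomplete gamma function P(a, x), by its power series."""
    if x <= 0.0:
        return 0.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-16 * total and n < 100000:
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))


def gamma_quantile(alpha, p):
    """Rate r below which a fraction p of g(r) lies (bisection on P(alpha, alpha r))."""
    lo, hi = 0.0, 1.0
    while lower_gamma_p(alpha, alpha * hi) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if lower_gamma_p(alpha, alpha * mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_gamma_rates(alpha, categories=4):
    """Mean rate of each of `categories` equally probable slices of g(r).

    Uses the identity: the integral of r g(r) from 0 to b equals P(alpha + 1, alpha b).
    """
    cuts = [gamma_quantile(alpha, k / categories) for k in range(1, categories)]
    cdf1 = [0.0] + [lower_gamma_p(alpha + 1.0, alpha * c) for c in cuts] + [1.0]
    return [categories * (cdf1[k + 1] - cdf1[k]) for k in range(categories)]


rates = discrete_gamma_rates(alpha=0.5, categories=4)
print([round(r, 4) for r in rates])
print(sum(rates) / len(rates))   # mean rate, equal to 1 up to floating-point rounding
```

Small alpha spreads the category rates far apart; for large alpha they all approach 1, matching the limit of no rate heterogeneity described above.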

<P>A mixed model of rate heterogeneity (Gamma plus invariable sites)
is also available. In this case the total rate heterogeneity rho
(as defined by <A HREF="#gu1995">Gu et al.</A>, 1995) is
rho = (1 + theta*alpha)/(1 + alpha).

<H2>
<A NAME="Options Available"></A>Available Options</H2>
All options can be selected and changed after TREE-PUZZLE has read the input
file. Depending on the input files, options are preselected and displayed
in a menu ("PHYLIP look and feel"):
<PRE>
GENERAL OPTIONS
 b                     Type of analysis?  Tree reconstruction
 k                Tree search procedure?  Quartet puzzling
 v       Approximate quartet likelihood?  No
 u             List unresolved quartets?  No
 n             Number of puzzling steps?  1000
 j             List puzzling step trees?  No
 o                  Display as outgroup?  Gibbon
 z     Compute clocklike branch lengths?  No
 e                  Parameter estimates?  Approximate (faster)
 x            Parameter estimation uses?  Neighbor-joining tree
SUBSTITUTION PROCESS
 d          Type of sequence input data?  Nucleotides
 m                Model of substitution?  HKY (Hasegawa et al. 1985)
 t    Transition/transversion parameter?  Estimate from data set
 f               Nucleotide frequencies?  Estimate from data set
RATE HETEROGENEITY
 w          Model of rate heterogeneity?  Uniform rate

Quit [q], confirm [y], or change [menu] settings:
</PRE>
By typing the letters shown in the menu you can either change settings
or enter new parameters. Some options (for example "m" and "w") can be
invoked several times to switch through a number of different settings.
The parameters of the models of sequence evolution can be estimated from
the data by a variety of procedures based on maximum likelihood. The analysis
is started by typing "y" at the input prompt. To quit the program,
type "q".

<P>The following table lists all TREE-PUZZLE options in alphabetical order.
Note, however, that not all of them are available at the same time:
<TABLE CELLPADDING=2 >
<TR VALIGN=TOP>
<TD>
<CENTER><B>Option</B></CENTER>
</TD>
<TD>
<CENTER><B>Description</B></CENTER>
</TD>
</TR>

<TR VALIGN=TOP>
<TD>
<CENTER>a</CENTER>
</TD>
<TD>Gamma rate heterogeneity parameter alpha. This is the so-called shape
parameter of the Gamma distribution.</TD>
</TR>

<TR VALIGN=TOP>
<TD>
<CENTER>b</CENTER>
</TD>
<TD>Type of analysis. Switches between tree reconstruction by maximum
likelihood and likelihood mapping.</TD>
</TR>

<TR VALIGN=TOP>
<TD>
<CENTER>c</CENTER>
</TD>
<TD>Number of rate categories (4-16) for the discrete Gamma distribution
(rate heterogeneity).</TD>
</TR>

<TR VALIGN=TOP>
<TD>
<CENTER>d</CENTER>
</TD>
<TD>Data type. Specifies whether nucleotide sequences, amino acid sequences, or
two-state data serve as input. The default is set automatically by
inspection of the input data.
After TREE-PUZZLE has selected an appropriate data type (marked by 'Auto:'),
the 'd' option cycles through the types in the following order:
selected type -> Nucleotides -> Amino acids -> automatically selected type.</TD>
</TR>

<TR VALIGN=TOP>
<TD>
<CENTER>e</CENTER>
</TD>
<TD>Approximation option. Determines whether an approximate or the exact
likelihood function is used to estimate the parameters of the models of sequence
evolution. The approximate likelihood function is sufficient in most cases
and is faster.</TD>
</TR>

<TR VALIGN=TOP>
<TD>
<CENTER>f</CENTER>
</TD>
<TD>Base frequencies. The maximum likelihood calculation needs the frequency
of each nucleotide (amino acid, doublet) as input. TREE-PUZZLE estimates these
values from the sequence input data. This option allows you to specify
other values.</TD>
</TR>

<TR VALIGN=TOP>
<TD>
<CENTER>g</CENTER>
</TD>
<TD>Group sequences in clusters. Defines clusters of sequences
as needed for the likelihood mapping analysis. Only available when likelihood
mapping is selected ("b" option).</TD>
</TR>

<TR VALIGN=TOP>
<TD>
<CENTER>h</CENTER>
</TD>
<TD>Codon positions or definition of doublets. For nucleotide data only.
If the TN or HKY model of substitution is used and the number of sites
in the alignment is a multiple of three, the analysis can be restricted
to each of the three codon positions or to the 1st and 2nd positions.
If the SH model is used, this option allows you to specify that the 1st and
2nd codon positions in the alignment define a doublet.</TD>
</TR>

<TR VALIGN=TOP>
<TD>
<CENTER>i</CENTER>
</TD>
<TD>Fraction of invariable sites. Probability of a site being invariable.
This parameter can be estimated from the data by TREE-PUZZLE
(only if the approximation option for the likelihood function is
turned off).</TD>
</TR>

<TR VALIGN=TOP>
<TD>
<CENTER>j</CENTER>
</TD>
<TD>List puzzling step trees. Writes all intermediate trees (puzzling
step trees) used to compute the quartet puzzling tree into a file, either
as a list of topologies ordered by number of occurrences (*.ptorder), or
as a list of the topologies in the chronological order of their occurrence
(*.pstep), or both.</TD>
</TR>

<TR VALIGN=TOP>
<TD>
<CENTER>k</CENTER>
</TD>
<TD>Tree search. Determines how the overall tree is obtained. The topology
is either computed with the quartet puzzling algorithm or defined by
the user. Maximum likelihood branch lengths are computed for this tree.
Alternatively, only a maximum likelihood distance matrix can be computed
(no overall tree). </TD>
</TR>

<TR VALIGN=TOP>
<TD>
<CENTER>l</CENTER>
</TD>
<TD>Location of root. Only for the computation of clock-like maximum likelihood
branch lengths. Specifies the branch where the root should be placed
in an unrooted tree topology. For example, in the tree (a,b,(c,d)),
l = 1 places the root on the branch leading to sequence a, whereas l = 5 places
the root on the internal branch.</TD>
</TR>

<TR VALIGN=TOP>
<TD>
<CENTER>m</CENTER>
</TD>
<TD>Model of substitution. The following models are implemented for nucleotides:
the <A HREF="#tamura1993">Tamura-Nei</A> (TN) model,
the <A HREF="#hasegawa1985">Hasegawa et al.</A> (HKY) model, and
the <A HREF="#schoeniger1994">Schoeniger & von Haeseler</A> (SH) model.
The SH model describes the evolution of
pairs of dependent nucleotides (pairs are the first and the second nucleotide,
the third and the fourth nucleotide, and so on). It allows the
transition-transversion parameter to be specified. The original model
(<A HREF="#schoeniger1994">Schoeniger & von Haeseler</A>, 1994)
is obtained by setting the transition-transversion parameter to 0.5.
The <A HREF="#jukes1969">Jukes-Cantor</A> (1969),
the <A HREF="#felsenstein1981">Felsenstein</A> (1981), and
the <A HREF="#kimura1980">Kimura</A> (1980) models are all special cases of
the HKY model.
<BR>For amino acid sequence data
the <A HREF="#dayhoff1978">Dayhoff et al.</A> (Dayhoff) model,
the <A HREF="#jones1992">Jones et al.</A> (JTT) model,
the <A HREF="#adachi1996">Adachi and Hasegawa</A> (mtREV24) model,
the <A HREF="#henikoff1992">Henikoff and Henikoff</A> (BLOSUM 62),
the <A HREF="#mueller2000">Mueller and Vingron</A> (VT), and
the <A HREF="#whelan2000">Whelan and Goldman</A> (WAG) substitution
models are implemented in TREE-PUZZLE.
The mtREV24 model describes the evolution of amino acids encoded on mtDNA,
while BLOSUM 62, like the VT model, is intended for distantly related amino
acid sequences.
After TREE-PUZZLE has selected an appropriate amino acid substitution model
(marked by 'Auto:'), the 'm' option cycles through the models in the following order:
selected model -> Dayhoff -> JTT -> mtREV24 -> BLOSUM62 -> VT -> WAG ->
automatically selected model.
<BR>For more information
please read the section in this manual about models of sequence evolution.
See also option "w" (model of rate heterogeneity).</TD>
</TR>

<TR VALIGN=TOP>
<TD>
<CENTER>n</CENTER>
</TD>
<TD>If tree reconstruction is selected: number of puzzling steps, a parameter
of the quartet puzzling tree search. Generally,
the more sequences are analysed, the more puzzling steps are advisable. The default
value depends on the number of sequences (at least 1000).<br>

If likelihood mapping is selected: number of quartets in a likelihood mapping
analysis, equal to the number of dots in the likelihood mapping diagram. By
default 10000 quartets (dots) are used. To use all possible quartets in clustered
likelihood mapping, set n = 0.
</TD>
</TR>

<TR VALIGN=TOP>
<TD>
<CENTER>o</CENTER>
</TD>
<TD>Outgroup. Used only for displaying the unrooted quartet puzzling
tree. The default outgroup is the first sequence of the data set.</TD>
</TR>

<TR VALIGN=TOP>
<TD>
<CENTER>p</CENTER>
</TD>
<TD>Constrain the TN model to the F84 model. This option is only available
for the Tamura-Nei model. With this option the expected (!) transition-transversion
ratio for the F84 model has to be entered, and TREE-PUZZLE computes the corresponding
parameters of the TN model (these depend on the base frequencies of the data).
This makes it possible to compare the results of TREE-PUZZLE with those of the PHYLIP
maximum likelihood programs, which use the F84 model.
</TD>
</TR>

<TR VALIGN=TOP>
<TD>
<CENTER>q</CENTER>
</TD>
<TD>Quits analysis.</TD>
</TR>

<TR VALIGN=TOP>
<TD>
<CENTER>r</CENTER>
</TD>
<TD>Y/R transition parameter. This option is only available for the TN
model. This parameter is the ratio of the rates of pyrimidine transitions
and purine transitions. You do not need to specify this parameter, as TREE-PUZZLE
estimates it from the data. For the precise definition please read the section
in this manual about models of sequence evolution.</TD>
</TR>

<TR VALIGN=TOP>
<TD>
<CENTER>s</CENTER>
</TD>
<TD>Symmetrize doublet frequencies. This option is only available for the
SH model. With this option the doublet frequencies are symmetrized. For
example, the frequencies of "AT" and "TA" are then both set to the average of
the two frequencies.</TD>
</TR>

<TR VALIGN=TOP>
<TD>
<CENTER>t</CENTER>
</TD>
<TD>Transition/transversion parameter. For nucleotide data only. You do not
need to specify this parameter, as TREE-PUZZLE estimates it from the data. The
precise definition of this parameter is given in the section on models
of sequence evolution in this manual.</TD>
</TR>

<TR VALIGN=TOP>
<TD>
<CENTER>u</CENTER>
</TD>
<TD>Show unresolved quartets. During the quartet puzzling tree search TREE-PUZZLE
counts the number of unresolved quartet trees. An unresolved quartet is
a quartet for which the maximum likelihood values of the three possible
quartet topologies are so similar that none of them can be preferred
(<A HREF="#strimmer1997">Strimmer, Goldman, and von Haeseler</A>, 1997).
If this option is selected, you will get a detailed list of all starlike
quartets. Note that some data sets contain a large number of unresolved
quartets. In this case a list of all unresolved quartets is probably not
very useful and also needs a lot of disk space.</TD>
</TR>

<TR VALIGN=TOP>
<TD>
<CENTER>v</CENTER>
</TD>
<TD>Approximate quartet likelihood. For the quartet puzzling tree search
only. Computing the exact maximum likelihood is necessary only for very small
data sets. For larger data sets this option should always be turned
on.</TD>
</TR>

<TR VALIGN=TOP>
<TD>
<CENTER>w</CENTER>
</TD>
<TD>Model of rate heterogeneity. TREE-PUZZLE provides several different models
of rate heterogeneity: uniform rate over all sites (rate homogeneity),
Gamma-distributed rates, two rates (1 invariable + 1 variable), and a mixed
model (1 invariable rate + Gamma-distributed rates). All necessary parameters
can be estimated by TREE-PUZZLE. Note that whenever invariable sites are taken
into account, the parameter estimation will invoke the "e" option to use
an exact likelihood function. For more detailed information please read
the section in this manual about models of sequence evolution. See also
option "m" (model of substitution).</TD>
</TR>

<TR VALIGN=TOP>
<TD>
<CENTER>x</CENTER>
</TD>
<TD>Selects the method used to estimate the model parameters.
Neighbor-joining tree means that an NJ tree is used to estimate the parameters.
Quartet sampling means that a number of random sets of four sequences are
selected to estimate the parameters.</TD>
</TR>

<TR VALIGN=TOP>
<TD>
<CENTER>y</CENTER>
</TD>
<TD>Starts analysis.</TD>
</TR>

<TR VALIGN=TOP>
<TD>
<CENTER>z</CENTER>
</TD>
<TD>Computation of clock-like maximum likelihood branch lengths. This option
also invokes the likelihood ratio clock test.</TD>
</TR>
</TABLE>

<H2>
<A NAME="Other Features"></A>Other Features</H2>
For nucleotide data TREE-PUZZLE computes the expected transition/transversion
ratio and the expected pyrimidine transition/purine transition ratio
corresponding to the selected model. Base frequencies play an important
role in the calculation of both numbers.

<P>TREE-PUZZLE also uses a chi-square test at the 5% level to check whether the
base composition of each sequence is identical to the average base composition of
the whole alignment. All sequences with a deviating composition are listed in the
TREE-PUZZLE report file. Ideally, no sequence (except possibly the outgroup)
has a deviating base composition; otherwise a basic assumption implicit
in the maximum likelihood calculation is violated.
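<P>The exact form of the test is not given here. A minimal Python sketch, assuming a standard chi-square goodness-of-fit statistic that compares each sequence's character counts with the counts expected from the alignment-wide composition (k - 1 degrees of freedom for k character states), could look like this. The function name and the toy alignment are hypothetical.

```python
from collections import Counter

# 95% quantiles of the chi-square distribution: 3 df for 4 nucleotides,
# 19 df for 20 amino acids (i.e. a test at the 5% level).
CHI2_CRITICAL_5_PERCENT = {3: 7.8147, 19: 30.1435}


def composition_deviates(sequence, alignment, alphabet="ACGT"):
    """Chi-square goodness-of-fit test of one sequence against the whole alignment.

    Returns (statistic, True if the composition deviates at the 5% level).
    Characters outside `alphabet` (gaps, ambiguity codes) are ignored.
    """
    overall = Counter(ch for seq in alignment for ch in seq.upper() if ch in alphabet)
    total = sum(overall.values())
    own = Counter(ch for ch in sequence.upper() if ch in alphabet)
    n = sum(own.values())
    statistic = 0.0
    for ch in alphabet:
        expected = n * overall[ch] / total
        if expected > 0:
            statistic += (own[ch] - expected) ** 2 / expected
    return statistic, statistic > CHI2_CRITICAL_5_PERCENT[len(alphabet) - 1]


alignment = ["ACGT" * 10] * 9 + ["A" * 40]
print(composition_deviates("ACGT" * 10, alignment)[1])  # False
print(composition_deviates("A" * 40, alignment)[1])     # True
```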

<P>A hidden feature of TREE-PUZZLE (since version 2.5) is the use of
a weighting scheme for quartets (<A HREF="#strimmer1997">Strimmer, Goldman,
and von Haeseler</A>, 1997) in the quartet puzzling tree search.

<P>TREE-PUZZLE also computes the average distance between all pairs of sequences
(maximum likelihood distances). The average distance can be viewed as
a rough measure of the overall sequence divergence.

<P>If more than one input tree is provided, TREE-PUZZLE uses the
<A HREF="#kishino1989">Kishino-Hasegawa</A> test (1989) to check which
trees are significantly worse than the best tree.
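<P>The idea of the Kishino-Hasegawa test can be sketched from per-site log-likelihoods: the total difference between the best tree and another tree is compared with its standard error, estimated from the variation of the per-site differences. The Python sketch below is a minimal version under stated assumptions: the per-site values are hypothetical inputs, and the two-sided 5% normal threshold (1.96) is a choice of this sketch, not necessarily TREE-PUZZLE's.

```python
import math


def kishino_hasegawa(site_lnl_best, site_lnl_other, z_critical=1.96):
    """Compare another tree with the best tree using per-site log-likelihoods.

    Returns (delta, standard error, significantly worse?). delta is the total
    log-likelihood difference (best - other); its variance is estimated from the
    per-site differences d_i as n/(n-1) * sum (d_i - mean(d))^2.
    """
    d = [best - other for best, other in zip(site_lnl_best, site_lnl_other)]
    n = len(d)
    mean = sum(d) / n
    variance = n / (n - 1) * sum((x - mean) ** 2 for x in d)
    se = math.sqrt(variance)
    delta = sum(d)
    return delta, se, delta > z_critical * se


# Hypothetical per-site log-likelihoods of two trees for six alignment columns.
best = [-10.2, -8.1, -12.5, -9.9, -11.0, -7.4]
other = [-10.9, -8.0, -13.6, -10.3, -11.8, -7.6]
print(kishino_hasegawa(best, other))
```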

<P>If clock-like maximum likelihood branch lengths are computed, TREE-PUZZLE
uses a likelihood ratio test
(<A HREF="#felsenstein1988">Felsenstein</A>, 1988) to check whether
the data set is clock-like.
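<P>A likelihood ratio clock test compares the log-likelihood of a tree with unconstrained branch lengths to that of the same topology with clock-like branch lengths. The Python sketch below assumes the usual chi-square approximation with n - 2 degrees of freedom for n sequences; it is an illustration, not TREE-PUZZLE's implementation, and the function names are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable: 1 - P(df/2, x/2)."""
    a, y = df / 2.0, x / 2.0
    if y <= 0.0:
        return 1.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-16 * total and n < 100000:
        n += 1
        term *= y / (a + n)
        total += term
    lower = total * math.exp(a * math.log(y) - y - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def clock_test(lnl_free, lnl_clock, n_sequences, level=0.05):
    """Likelihood ratio test of the molecular clock.

    lnl_free: log-likelihood with unconstrained branch lengths;
    lnl_clock: log-likelihood with clock-like branch lengths (same topology).
    Assumes n - 2 degrees of freedom for n sequences (2n - 3 branch lengths
    without the clock versus n - 1 node heights with it).
    Returns (statistic, df, p-value, clock rejected?).
    """
    statistic = 2.0 * (lnl_free - lnl_clock)
    df = n_sequences - 2
    p_value = chi2_sf(statistic, df)
    return statistic, df, p_value, p_value < level


print(clock_test(-1621.2, -1622.2, 5))
```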

<P>TREE-PUZZLE also detects sequences that occur more than once in the data
and can therefore be removed from the data set to speed up the analysis.
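<P>Detecting repeated sequences amounts to grouping identical strings. A minimal Python sketch, assuming that sequences are compared character by character and that upper and lower case are equivalent (both assumptions of this sketch, with hypothetical names and data):

```python
def identical_sequence_groups(names, sequences):
    """Group sequences that are identical (ignoring case).

    Every returned group lists the names of identical sequences; all but one
    member of a group could be removed before the analysis.
    """
    groups = {}
    for name, seq in zip(names, sequences):
        groups.setdefault(seq.upper(), []).append(name)
    return [members for members in groups.values() if len(members) > 1]


names = ["Human", "Chimpanzee", "Gorilla", "Human_copy"]
seqs = ["ACGTTGCA", "ACGTTGCC", "ACGTAGCA", "acgttgca"]
print(identical_sequence_groups(names, seqs))  # [['Human', 'Human_copy']]
```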

<P>If rate heterogeneity is taken into account in the analysis, TREE-PUZZLE also
computes the most probable assignment of rate categories to sequence positions,
according to <A HREF="#felsenstein1996">Felsenstein and Churchill</A> (1996).
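<P>As a simplified illustration, assume that the rates of different sites are independent (this sketch ignores any correlation of rates between neighbouring sites). Then the most probable category of a site is the one with the largest posterior probability, prior times site likelihood under that category's rate. In the Python sketch below the per-site, per-category likelihoods are hypothetical inputs that would come from the tree and model.

```python
def most_probable_categories(site_likelihoods, priors=None):
    """Index of the most probable rate category for every site.

    site_likelihoods[s][k] is the likelihood of site s when it evolves with the
    rate of category k. Sites are treated as independent, so the posterior of
    category k is proportional to prior_k * likelihood; equal priors are used
    when none are given.
    """
    n_categories = len(site_likelihoods[0])
    if priors is None:
        priors = [1.0 / n_categories] * n_categories
    assignment = []
    for likelihoods in site_likelihoods:
        posterior = [p * l for p, l in zip(priors, likelihoods)]
        assignment.append(max(range(n_categories), key=posterior.__getitem__))
    return assignment


# Two hypothetical sites with four rate categories.
print(most_probable_categories([[0.01, 0.05, 0.02, 0.02],
                                [0.09, 0.05, 0.03, 0.02]]))  # [1, 0]
```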

<H2>
<A NAME="Interpretation and Hints"></A>Interpretation and Hints</H2>

<H3>
<A NAME="Quartet Puzzling Support Values"></A>Quartet Puzzling Support
Values</H3>
The quartet puzzling (QP) tree search estimates support values for each
internal branch. They can be interpreted in much the same way as
bootstrap values (though they should not be confused with them).
Branches with a QP reliability from 90% to 100% can be considered
very strongly supported. Branches with lower reliability (> 70%) can
in principle also be trusted, but in this case it is advisable to
check how well the respective internal branch does in comparison to other
branches in the tree (i.e., check the relative reliability).
If you are interested in a branch with low support, it is also
important to check the alternative groupings that are not included
in the QP tree (they are listed in the TREE-PUZZLE report file in *.** format).
There should be a substantial gap between the lowest reliability
value in the QP tree and
the most frequent grouping that is not included in the QP tree.
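<P>Assuming that the QP tree is the majority-rule consensus of the puzzling step trees, and that a branch's support value is the percentage of step trees containing the corresponding split, the gap check recommended above can be sketched in Python as follows. The input format and function names are hypothetical; in practice the numbers come from the TREE-PUZZLE report file.

```python
from collections import Counter


def canonical(split, taxa):
    """Represent a bipartition by the side that does not contain the first taxon."""
    side = frozenset(split)
    return side if taxa[0] not in side else frozenset(taxa) - side


def split_support(step_trees, taxa):
    """Percentage of puzzling step trees that contain each split.

    Each step tree is given as a list of its internal splits, each split as
    the set of taxa on one side (a hypothetical input format).
    """
    counts = Counter()
    for splits in step_trees:
        counts.update({canonical(s, taxa) for s in splits})
    return {split: 100.0 * c / len(step_trees) for split, c in counts.items()}


def support_gap(support, threshold=50.0):
    """(lowest support among splits above threshold, highest support among the rest)."""
    kept = [v for v in support.values() if v > threshold]
    left_out = [v for v in support.values() if v <= threshold]
    return (min(kept) if kept else None, max(left_out) if left_out else None)


taxa = ["Gibbon", "Orangutan", "Gorilla", "Human", "Chimpanzee"]
steps = ([[{"Human", "Chimpanzee"}, {"Human", "Chimpanzee", "Gorilla"}]] * 7
         + [[{"Human", "Gorilla"}, {"Human", "Chimpanzee", "Gorilla"}]] * 3)
print(support_gap(split_support(steps, taxa)))  # (70.0, 30.0)
```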
<H3>
<A NAME="Percentage of Unresolved Quartets"></A>Percentage of Unresolved
Quartets</H3>
TREE-PUZZLE computes the number and the percentage of completely unresolved
maximum likelihood quartets. An unresolved quartet is a quartet for which the
maximum likelihood values of the three possible quartet topologies
are so similar that none of them can be preferred
(<A HREF="#strimmer1997">Strimmer, Goldman, and von Haeseler</A>, 1997).
The percentage of unresolved quartets
among all possible quartets is an indicator of how suitable the data are
for phylogenetic analysis. A high percentage usually results in a highly
multifurcating quartet puzzling tree. If you only have a few unresolved
quartets, we recommend invoking option "u" to get a list of all these quartets.
In a likelihood mapping analysis the percentage of completely unresolved
quartets is shown in the central region of the triangle diagram.
---|
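<P>The number of possible quartets for n sequences is the binomial coefficient C(n, 4), so it grows quickly with n. The short Python sketch below converts a count of unresolved quartets into the percentage discussed above. The function name and the counts are hypothetical.

```python
from math import comb


def unresolved_percentage(n_sequences: int, n_unresolved: int) -> float:
    """Percentage of completely unresolved quartets among all C(n, 4) quartets."""
    total = comb(n_sequences, 4)
    if not 0 <= n_unresolved <= total:
        raise ValueError("unresolved count must lie between 0 and C(n, 4)")
    return 100.0 * n_unresolved / total


print(comb(10, 4))                   # 210 quartets for 10 sequences
print(unresolved_percentage(10, 21))  # 10.0
```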
<H3>
<A NAME="Automatic Parameter Estimation"></A>Automatic Parameter Estimation</H3>
TREE-PUZZLE estimates the parameters of the models of substitution (TN,
HKY) and those of the model of rate variation (Gamma distribution, fraction of
invariable sites) without prior knowledge of an overall tree. It does so using a number
of different strategies based on maximum likelihood. For all estimated
parameters a corresponding standard error (S.E.) is computed. If you have
good reasons to choose a different set of parameters than the values
obtained by TREE-PUZZLE, don't hesitate to use them. If sequences are extremely
similar, it is very hard for any algorithm to extract information about
the model of substitution from the data set. Also, be careful if the
estimated parameter values
are very close to the internal upper and lower bounds:
<TABLE CELLPADDING=2 >
<TR VALIGN=TOP>
<TD><B>Parameter (Symbol)</B> </TD>

<TD><B>Minimal Value</B> </TD>

<TD><B>Maximal Value</B> </TD>
</TR>

<TR VALIGN=TOP>
<TD>Transition/transversion parameter (t) </TD>

<TD>0.20 </TD>

<TD>30.00 </TD>
</TR>

<TR VALIGN=TOP>
<TD>Y/R transition parameter (gamma) </TD>

<TD>0.10 </TD>

<TD>6.00 </TD>
</TR>

<TR VALIGN=TOP>
<TD>Fraction of invariable sites (theta) </TD>

<TD>0.00 </TD>

<TD>0.99 </TD>
</TR>

<TR VALIGN=TOP>
<TD>Gamma rate heterogeneity parameter (alpha) </TD>

<TD>0.01 </TD>

<TD>99.00 </TD>
</TR>
</TABLE>
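<P>The bounds in the table can be used to flag suspicious estimates automatically. The Python sketch below copies the bounds from the table and flags an estimate that lies within k standard errors of a bound. The k&nbsp;&times;&nbsp;S.E. rule and all names are assumptions made for illustration. TREE-PUZZLE itself does not use this criterion.

```python
from __future__ import annotations

# Internal bounds as listed in the table above: name -> (minimum, maximum)
BOUNDS = {
    "t": (0.20, 30.00),      # transition/transversion parameter
    "gamma": (0.10, 6.00),   # Y/R transition parameter
    "theta": (0.00, 0.99),   # fraction of invariable sites
    "alpha": (0.01, 99.00),  # Gamma rate heterogeneity parameter
}


def boundary_warnings(name: str, estimate: float, se: float, k: float = 2.0) -> list[str]:
    """Warn if an estimate lies within k standard errors of an internal bound.

    The k*S.E. rule is a heuristic assumed for illustration only.
    """
    lo, hi = BOUNDS[name]
    warnings = []
    if estimate - k * se <= lo:
        warnings.append(f"{name}={estimate} is close to the lower bound {lo}")
    if estimate + k * se >= hi:
        warnings.append(f"{name}={estimate} is close to the upper bound {hi}")
    return warnings


print(boundary_warnings("alpha", 0.02, 0.01))  # hypothetical estimate and S.E.
print(boundary_warnings("t", 2.5, 0.1))
```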

<H3>
<A NAME="Likelihood Mapping"></A>Likelihood Mapping</H3>
Likelihood mapping (<A HREF="#strimmer1997">Strimmer and von Haeseler</A>,
1997) is a method to analyze the support for internal branches in a tree
without having to compute an overall tree.
Every internal branch in a completely resolved tree defines
up to four clusters of sequences. Sometimes only the relationships among these
groups are of interest, not the details of the structure within the clusters
themselves. In that case a likelihood mapping analysis is sufficient.
The corresponding likelihood mapping triangle diagrams (contained in
various output files generated by TREE-PUZZLE)
elucidate the possible relationships in detail.

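<P>The triangle diagrams rest on a simple transformation. The three maximum likelihoods of a quartet are normalized into weights p<SUB>1</SUB>, p<SUB>2</SUB>, p<SUB>3</SUB>, each proportional to the corresponding likelihood and summing to 1. These weights serve as barycentric coordinates of a point in an equilateral triangle whose corners represent the three quartet topologies. The Python sketch below shows this mapping. The corner placement and function names are hypothetical. The sketch only assigns each point to its nearest corner, which in an equilateral triangle is the corner with the largest weight. The finer partition used in TREE-PUZZLE's diagrams, including the central region for unresolved quartets, is not reproduced here.

```python
from __future__ import annotations

import math

# Hypothetical corners of an equilateral triangle with side length 1;
# corner i stands for quartet topology i.
CORNERS = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3.0) / 2.0)]


def quartet_weights(logliks: tuple[float, float, float]) -> list[float]:
    """Normalize three quartet log-likelihoods to weights L_i / (L_1 + L_2 + L_3)."""
    top = max(logliks)
    scaled = [math.exp(ll - top) for ll in logliks]  # shift to avoid underflow
    total = sum(scaled)
    return [s / total for s in scaled]


def map_to_triangle(weights: list[float]) -> tuple[float, float]:
    """Use the weights as barycentric coordinates inside the triangle."""
    x = sum(w * cx for w, (cx, _) in zip(weights, CORNERS))
    y = sum(w * cy for w, (_, cy) in zip(weights, CORNERS))
    return x, y


def nearest_corner(weights: list[float]) -> int:
    """Index of the closest corner; for an equilateral triangle, the largest weight."""
    return max(range(3), key=lambda i: weights[i])


w = quartet_weights((-1520.3, -1527.9, -1528.4))  # hypothetical log-likelihoods
print([round(v, 4) for v in w], map_to_triangle(w), nearest_corner(w))
```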
<H3><A NAME="Batch Mode"></A>Batch Mode</H3>
Running TREE-PUZZLE from a Unix batch file is straightforward despite the lack
of command-line switches. For example, to run TREE-PUZZLE with the transition/transversion
parameter set to 10, the following lines in a batch file are sufficient:
<PRE>
#!/bin/sh
# Feed menu keystrokes to puzzle through a here-document:
# 't' selects the transition/transversion parameter, '10' is its value, 'y' accepts.
puzzle << !
t
10
y
!
</PRE>
All other menu options can be set in the same way.

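<P>The same keystroke technique can be driven from a script. The Python sketch below is a hypothetical wrapper, not a tool shipped with TREE-PUZZLE. It builds the keystroke sequence (each menu key, optionally followed by the value TREE-PUZZLE prompts for, and finally <TT>y</TT> to accept) and pipes it to the <TT>puzzle</TT> executable on standard input. It assumes that <TT>puzzle</TT> is on the search path. Since version 5.0 accepts the input file and a user tree file on the command line, the optional <TT>args</TT> parameter passes such file names through. Only the keys <TT>t</TT>, <TT>u</TT> and <TT>y</TT> are taken from this manual. Check any other key against the program's menu.

```python
from __future__ import annotations

import shutil
import subprocess


def menu_keystrokes(settings: list[tuple[str, object]]) -> str:
    """Build stdin text: each menu key, its value (if any), then 'y' to accept."""
    lines = []
    for key, value in settings:
        lines.append(key)
        if value is not None:
            lines.append(str(value))
    lines.append("y")
    return "\n".join(lines) + "\n"


def run_puzzle(settings, args=()):
    """Run puzzle non-interactively; inspect returncode and stdout of the result."""
    exe = shutil.which("puzzle")
    if exe is None:
        raise FileNotFoundError("puzzle executable not found on PATH")
    return subprocess.run([exe, *args], input=menu_keystrokes(settings),
                          text=True, capture_output=True)


# Equivalent of the batch file above: transition/transversion parameter set to 10
print(menu_keystrokes([("t", 10)]), end="")
```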
<H2>
<A NAME="Limits and Error Messages"></A>Limits and Error Messages</H2>
TREE-PUZZLE has a built-in limit that allows data sets of at most 257 sequences,
in order to avoid overflow of internal integer variables. At least 32767
sites should be possible, depending on the compiler used. Even if sufficient
computer memory is available, computation time will be the main constraint.
If rate heterogeneity is taken into account, every additional rate category slows
down the overall computation by the time needed for one complete
run assuming rate homogeneity.

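<P>These statements can be turned into a rough, back-of-the-envelope estimate. The Python sketch below makes two assumptions. First, the quartet-evaluation stage dominates the running time and is proportional to the number of quartets C(n, 4). Second, as stated above, time grows linearly with the number of rate categories. The function name is hypothetical. The result is only a relative factor compared with a reference analysis, not a prediction in seconds. The 257-sequence limit is taken from the text above.

```python
from math import comb

MAX_SEQUENCES = 257  # built-in limit stated in this manual


def relative_cost(n_sequences: int, categories: int = 1,
                  ref_sequences: int = 10, ref_categories: int = 1) -> float:
    """Rough runtime factor relative to a reference analysis.

    Assumes time ~ C(n, 4) quartets x number of rate categories.
    """
    for n in (n_sequences, ref_sequences):
        if not 4 <= n <= MAX_SEQUENCES:
            raise ValueError(f"number of sequences must lie between 4 and {MAX_SEQUENCES}")
    return (comb(n_sequences, 4) * categories) / (comb(ref_sequences, 4) * ref_categories)


print(relative_cost(20))                # twice as many sequences, rate homogeneity
print(relative_cost(20, categories=4))  # the same with 4 rate categories
```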
<P>If problems are encountered, TREE-PUZZLE terminates program execution and
returns a plain text error message. Depending on their severity, errors can be
classified into three groups:
<TABLE CELLPADDING=2 >
<TR VALIGN=TOP>
<TD>"HALT " errors: </TD>

<TD>Very severe. You should never see one of these messages. If you do,
please contact the developers! </TD>
</TR>

<TR VALIGN=TOP>
<TD>"Unable to proceed" errors: </TD>

<TD>Harmless but annoying. Mostly memory errors (not enough RAM) or problems
with the format of the input files. </TD>
</TR>

<TR VALIGN=TOP>
<TD>Other errors: </TD>

<TD>Completely uncritical. These occur mostly while TREE-PUZZLE options are being
set. </TD>
</TR>
</TABLE>
<P>On a standard machine (a 1996 Unix workstation) with 32 to 64 MB RAM, TREE-PUZZLE
can easily perform maximum likelihood tree searches, including estimation of
support values, for data sets of 50 to 100 sequences. Likelihood mapping
requires little memory and is computationally fast, so it can be applied
to large data sets as well.
<H2>
<A NAME="Are Quartets Reliable"></A>Are Quartets Reliable?</H2>
Quartets may intrinsically be among the most difficult phylogenies to
resolve accurately (cf. <A HREF="#hillis1996">Hillis</A>, 1996).
It has been asked whether this is
a problem for quartet puzzling, since the method works with quartets.

<P>However, this is not the case. According to Hillis' findings
(<A HREF="#hillis1996">Hillis</A>, 1996),
quartets can be hard, but extra information helps. That is, if all you
have are data on species (A, B, C, D), it might be relatively difficult
to find the correct tree for them. But if you have additional data (species
E, F, G, ...) and try to find a tree for all the species, then the part
of the tree relating (A, B, C, D) is more likely to be correct than if you
had just the data for (A, B, C, D). In Hillis' large 'model' tree, there
are many examples of subsets of 4 species which by themselves might be
hard to resolve correctly, but which are correctly resolved thanks to the
(large amount of) additional data. TREE-PUZZLE (quartet puzzling)
benefits from extra data in the same way. Its 'understanding' or
resolution of the quartet (A, B, C, D) might be incorrect, but the information
on the relationships of (A, B, C, D) implicit in its treatment of (A, B,
C, E), (A, B, E, D), (A, E, C, D), (E, B, C, D), (A, B, C, F), (A, B, F,
D), (A, F, C, D), (F, B, C, D), (A, B, C, G), etc. should overcome
this problem.

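<P>The argument can be made concrete by counting. For a focal quartet (A, B, C, D) among n species, every other species can replace each of A, B, C and D in turn. This gives 4(n - 4) quartets that share three species with the focal quartet, exactly the pattern of the list above. The Python sketch below enumerates all quartets and tallies them by how many species they share with the focal one. The names are hypothetical, and the sketch shows only how much overlapping information exists, not how TREE-PUZZLE weighs it.

```python
from __future__ import annotations

from collections import Counter
from itertools import combinations


def overlap_profile(species: list[str], focal: set[str]) -> Counter:
    """Count quartets by the number of species they share with a focal quartet."""
    profile = Counter()
    for quartet in combinations(species, 4):
        profile[len(focal.intersection(quartet))] += 1
    return profile


taxa = ["A", "B", "C", "D", "E", "F", "G"]
profile = overlap_profile(taxa, {"A", "B", "C", "D"})
print(dict(sorted(profile.items())))  # {1: 4, 2: 18, 3: 12, 4: 1}
```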
<P>How well TREE-PUZZLE actually works has been investigated
in the <A HREF="#strimmer1996">Strimmer and von Haeseler</A> (1996) and
<A HREF="#strimmer1997">Strimmer, Goldman, and von Haeseler</A> (1997) papers.
Hillis' findings do not change their results.
Considered as a heuristic search for maximum likelihood trees, quartet
puzzling works very well.

<P>(This section follows N. Goldman, personal communication.)
<H2>
<A NAME="Other Programs"></A>Other Programs</H2>
A number of other very useful and widely used programs for reconstructing
phylogenetic relationships and analysing molecular sequence data
are available free of charge. Here are the URLs of some web pages that
provide links to most of them (including the PHYLIP package and
the MOLPHY and PAML maximum likelihood programs):
<DL>

<DD>
Joe Felsenstein's list of programs (well organized and pretty exhaustive):<br>
<A
HREF="http://evolution.genetics.washington.edu/phylip/software.html">http://evolution.genetics.washington.edu/phylip/software.html</A></DD>


<DD>
"Tree of Life" software page:<br>
<A HREF="http://phylogeny.arizona.edu/tree/programs/programs.html">http://phylogeny.arizona.edu/tree/programs/programs.html</A></DD>


<DD>
European Bioinformatics Institute:<br>
<A HREF="http://www.ebi.ac.uk/biocat/biocat.html">http://www.ebi.ac.uk/biocat/biocat.html</A></DD>

</DL>

<H2>
<A NAME="Acknowledgements"></A>Acknowledgements</H2>
The maximum likelihood kernel of TREE-PUZZLE is an offspring of the program
NucML/ProtML version 2.2 by Jun Adachi and Masami Hasegawa (<A HREF="ftp://sunmh.ism.ac.jp/pub/molphy">ftp://sunmh.ism.ac.jp/pub/molphy</A>).
We thank them for generously allowing us to use the source code of their
program.
We would also like to thank
the <A HREF="http://www.ebi.ac.uk">European Bioinformatics Institute (EBI)</A>,
the <A HREF="http://www.pasteur.fr">Institut Pasteur</A>,
and the <A HREF="http://www.indiana.edu">University of Indiana</A>
(in particular Don Gilbert)
for kindly distributing the TREE-PUZZLE program.

<P>We thank Stephane Bortzmeyer for his help with debugging
<EM>floating point exception</EM> errors.

<P>We also thank Peter Foster for pointing out the inconsistency
with respect to other programs in the invariable site models.

<P>Finally, we thank the
<A HREF="http://www.dfg.de">Deutsche Forschungsgemeinschaft</A>
(VI 160/3-1 and Ha 1628/4-1) and the Max-Planck-Society
for financial support.

<H2><A NAME="References"></A>References</H2>

<A NAME="adachi1996"></A>
Adachi, J., and M. Hasegawa. 1996. MOLPHY: programs for molecular phylogenetics,
version 2.3. Institute of Statistical Mathematics, Tokyo.

<P><A NAME="adachi1996"></A>
Adachi, J., and M. Hasegawa. 1996. Model of amino acid substitution
in proteins encoded by mitochondrial DNA. <I>J. Mol. Evol.</I> <B>42</B>:
459-468.

<P><A NAME="dayhoff1978"></A>
Dayhoff, M. O., R. M. Schwartz, and B. C. Orcutt. 1978. A model of evolutionary
change in proteins. In: Dayhoff, M. O. (ed.) Atlas of Protein Sequence and
Structure, Vol. 5, Suppl. 3. National Biomedical Research Foundation, Washington
DC, pp. 345-352.

<P><A NAME="felsenstein1981"></A>
Felsenstein, J. 1981. Evolutionary trees from DNA sequences: A maximum
likelihood approach. <I>J. Mol. Evol.</I> <B>17</B>: 368-376.

<P><A NAME="felsenstein1984"></A>
Felsenstein, J. 1984. Distance methods for inferring phylogenies:
A Justification. <I>Evolution</I> <B>38</B>: 16-24.

<P><A NAME="felsenstein1988"></A>
Felsenstein, J. 1988. Phylogenies from molecular sequences: Inference
and reliability. <I>Annu. Rev. Genet.</I> <B>22</B>: 521-565.

<P><A NAME="felsenstein1993"></A>
Felsenstein, J. 1993. PHYLIP (Phylogeny Inference Package) version 3.5c.
Distributed by the author. Department of Genetics, University of Washington,
Seattle.

<P><A NAME="felsenstein1996"></A>
Felsenstein, J., and G.A. Churchill. 1996. A hidden Markov model approach
to variation among sites in rate of evolution. <I>Mol. Biol. Evol.</I>
<B>13</B>: 93-104.

<P><A NAME="gropp1998"></A>
Gropp, W., S. Huss-Lederman, A. Lumsdaine, E. Lusk, B. Nitzberg,
W. Saphir, and M. Snir. 1998. MPI - The Complete Reference: Volume 2,
The MPI Extensions. 2nd Edition, The MIT Press, Cambridge, MA.

<P><A NAME="gu1995"></A>
Gu, X., Y.-X. Fu, and W.-H. Li. 1995. Maximum likelihood estimation
of the heterogeneity of substitution rate among nucleotide sites. <I>Mol.
Biol. Evol.</I> <B>12</B>: 546-557.

<P><A NAME="hasegawa1985"></A>
Hasegawa, M., H. Kishino, and K. Yano. 1985. Dating of the human-ape
splitting by a molecular clock of mitochondrial DNA. <I>J. Mol. Evol.</I>
<B>22</B>: 160-174.

<P><A NAME="henikoff1992"></A>
Henikoff, S., and J. G. Henikoff. 1992. Amino acid substitution matrices
from protein blocks. <I>PNAS (USA)</I> <B>89</B>: 10915-10919.

<P><A NAME="hillis1996"></A>
Hillis, D. M. 1996. Inferring complex phylogenies. <I>Nature</I>
<B>383</B>: 130-131.

<P><A NAME="jukes1969"></A>
Jukes, T. H., and C. R. Cantor. 1969. Evolution of protein molecules.
In: Munro, H. N. (ed.) Mammalian Protein Metabolism, New York: Academic
Press, pp. 21-132.

<P><A NAME="jones1992"></A>
Jones, D. T., W. R. Taylor, and J. M. Thornton. 1992. The rapid generation
of mutation data matrices from protein sequences. <I>CABIOS</I> <B>8</B>:
275-282.

<P><A NAME="kimura1980"></A>
Kimura, M. 1980. A simple method for estimating evolutionary rates of
base substitutions through comparative studies of nucleotide sequences.
<I>J. Mol. Evol.</I> <B>16</B>: 111-120.

<P><A NAME="kishino1989"></A>
Kishino, H., and M. Hasegawa. 1989. Evaluation of the maximum likelihood
estimate of the evolutionary tree topologies from DNA sequence data, and
the branching order in Hominoidea. <I>J. Mol. Evol.</I> <B>29</B>: 170-179.

<P><A NAME="mueller2000"></A>
Mueller, T., and M. Vingron. 2000. Modeling amino acid replacement.
<I>J. Comp. Biol.</I>, to appear
(<A HREF="http://www.dkfz-heidelberg.de/tbi/people/tmueller/paper/paper.ps">preprint of the article</A>).

<P><A NAME="saitou1987"></A>
Saitou, N., and M. Nei. 1987. The neighbor-joining method: a new method
for reconstructing phylogenetic trees. <I>Mol. Biol. Evol.</I> <B>4</B>:
406-425.

<P><A NAME="schoeniger1994"></A>
Schoeniger, M., and A. von Haeseler. 1994. A stochastic model for
the evolution of autocorrelated DNA sequences. <I>Mol. Phyl. Evol.</I>
<B>3</B>: 240-247.

<P><A NAME="snir1998"></A>
Snir, M., S. Otto, S. Huss-Lederman, D. Walker, and J. Dongarra.
1998. MPI - The Complete Reference: Volume 1, The MPI Core. 2nd Edition,
The MIT Press, Cambridge, MA.

<P><A NAME="strimmer1996"></A>
Strimmer, K., and A. von Haeseler. 1996. Quartet puzzling: a quartet
maximum likelihood method for reconstructing tree topologies. <I>Mol. Biol.
Evol.</I> <B>13</B>: 964-969.

<P><A NAME="strimmer1997"></A>
Strimmer, K., N. Goldman, and A. von Haeseler. 1997. Bayesian probabilities
and quartet puzzling. <I>Mol. Biol. Evol.</I> <B>14</B>: 210-211.

<P><A NAME="strimmer1997"></A>
Strimmer, K., and A. von Haeseler. 1997. Likelihood-mapping: a simple
method to visualize phylogenetic content of a sequence alignment. <I>PNAS
(USA)</I> <B>94</B>: 6815-6819.

<P><A NAME="tamura1993"></A>
Tamura, K., and M. Nei. 1993. Estimation of the number of nucleotide
substitutions in the control region of mitochondrial DNA in humans and
chimpanzees. <I>Mol. Biol. Evol.</I> <B>10</B>: 512-526.

<P><A NAME="tamura1994"></A>
Tamura, K. 1994. Model selection in the estimation of the number of
nucleotide substitutions. <I>Mol. Biol. Evol.</I> <B>11</B>: 154-157.

<P><A NAME="thompson1994"></A>
Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving
the sensitivity of progressive multiple sequence alignment through sequence
weighting, position-specific gap penalties and weight matrix choice. <I>Nucl.
Acids Res.</I> <B>22</B>: 4673-4680.

<P><A NAME="whelan2000"></A>
Whelan, S., and N. Goldman. 2000. A new empirical model of
amino acid evolution. <I>Manuscript in prep.</I>

<P><A NAME="yang1994"></A>
Yang, Z. 1994. Maximum likelihood phylogenetic estimation from DNA sequences
with variable rates over sites: approximate methods. <I>J. Mol. Evol.</I>
<B>39</B>: 306-314.

<H2>
<A NAME="Known Bugs"></A>Known Bugs</H2>

On Alpha-based computers, <EM>floating point exception</EM>
errors sometimes occur. Some of them result from a bug in the malloc routine
of the system library of the Compaq operating system. We recommend
using the GNU C compiler
(<TT><A HREF="http://egcs.gnu.org">http://egcs.gnu.org</A></TT>),
which does not use the system malloc routine.

<P>For other occurrences of the <EM>floating point exception</EM>,
we need data sets and information about the operating system
to reproduce and debug those errors.

<H2>
<A NAME="Version History"></A>Version History</H2>
The TREE-PUZZLE program was first distributed in 1995 under the name
PUZZLE. Since then it has
been continually improved. Here is a list of the most important changes.
<TABLE CELLPADDING=2 >

<TR VALIGN=TOP>
<TD>5.0</TD>

<TD>Puzzle tree reconstruction part parallelized using the MPI standard
(Message Passing Interface).
<BR>Input file and user tree file can now be given on the command line.
Output files renamed to the form PREFIX.EXTENSION, where PREFIX is the
input file name or, if used, the user tree file name.
EXTENSION is one of the following: puzzle (PUZZLE report),
tree (tree file), dist (ML distance file), eps (likelihood mapping output
in EPS format), qlist (bad quartets), qstep (puzzling step tree IDs as they
occur in the analysis), or qtorder (sorted unique list of puzzling step trees).
<BR>The likelihood value is added to the tree file as a comment
("[ lh=x.xxx ]") preceding the tree string.
<BR>VT (variable time) matrix (<A HREF="#mueller2000">Mueller and
Vingron</A>, 2000) and WAG matrix (<A HREF="#whelan2000">Whelan and
Goldman</A>, 2000)
added to the AA substitution models.
<BR>The data type and AA model options in the menu now show the
automatically selected type/model first. These can be changed using the 'd' or
'm' key, in an order that does not depend on the preselected type/model. This makes
it possible to select a desired AA substitution model or data type by
piping letters to the standard input without knowing PUZZLE's preselection.
<BR>Parameters are written to file as soon as they are estimated, before
the quartets are evaluated.
<BR>The inconsistency with respect to other programs in handling
invariable sites has been fixed.
<BR>Some minor bugs have been fixed (e.g. the clock bug and a bug in the optimization
routine).
</TD>
</TR>

<TR VALIGN=TOP>
<TD>4.0.2</TD>

<TD>Update to provide precompiled Windows 95/98/NT executables. In addition:
internal rearrangement of rate matrices.
Improved BLOSUM 62 matrix. Endless input loop for input
files restricted to 10 attempts.
Source code cleanup to remove compile-time warnings.
Explicit quit option in menu. Changes in NJ tree code.
Updates of documentation (address changes, correction of errors).
</TD>
</TR>

<TR VALIGN=TOP>
<TD>4.0.1</TD>

<TD>Maintenance release. Correction of mtREV matrix. Fix of the "intree bug".
Removal of stringent runtime-compatibility check to allow out-of-the-box compilation
on Alpha. More accurate gamma distribution allowing 16 instead of 8 categories
and ensuring better results for alpha > 1.0. Update of documentation (mainly address changes).
More Unix-like file layout, and change of license to GPL.
</TD>
</TR>

<TR VALIGN=TOP>
<TD>4.0 </TD>

<TD>Executables for Windows 95/NT and OS/2 instead of MS-DOS. Computation
of clock-like branch lengths (also for amino acids and for non-binary trees).
Automatic likelihood ratio clock test. Model for two-state sequence data
(0,1) included. Display of most probable assignment of rates to sites.
Identification of groups of identical sequences. Possibility to read multiple
input trees. Kishino-Hasegawa test to check whether trees are significantly
different. BLOSUM 62 model of amino acid substitution
(<A HREF="#henikoff1992">Henikoff and Henikoff</A>, 1992).
Use of parameter alpha instead of eta = 1/(1+alpha) (for rate heterogeneity).

Improvements to user interface. SH model can be applied to 1st and 2nd
codon positions. Automatic check for compatible compiler settings. Workaround
for severe runtime problem when the gcc compiler was used.</TD>
</TR>

<TR VALIGN=TOP>
<TD>3.1 </TD>

<TD>Much improved user interface to rate heterogeneity (less confusing
menu, rearranged outfile, additional out-of-range check). Possibility to
read rooted input trees (automatic removal of basal bifurcation). Computation
of average distance between all pairs of sequences. Fix of a bug that caused
PUZZLE 3.0 to crash on some systems (DEC Alpha). Cosmetic changes in program
and documentation. </TD>
</TR>

<TR VALIGN=TOP>
<TD>3.0 </TD>

<TD>Rate heterogeneity included in all models of substitution (Gamma distribution
plus invariable sites). Likelihood mapping analysis with PostScript output
added. Much more sophisticated maximum likelihood parameter estimation
for all model parameters including those of rate heterogeneity. Codon positions
selectable. Update to mtREV24. New icon. Less verbose runtime messages.
HTML documentation. Better internal error classification. More information
in outfile (number of constant positions etc.). </TD>
</TR>

<TR VALIGN=TOP>
<TD>2.5.1 </TD>

<TD>Fix of a bug (present only in version 2.5) related to computation of
the variance of the maximum likelihood branch lengths that caused occasional
crashes of PUZZLE on some systems when applied to data sets containing many
very similar sequences. Drop of support for non-FPU Macintosh version.
Corrections in manual. </TD>
</TR>

<TR VALIGN=TOP>
<TD>2.5 </TD>

<TD>Improved QP algorithm (<A HREF="#strimmer1997">Strimmer, Goldman, and
von Haeseler</A>, 1997). Bug
fixes in ML engine, computation of ML distances and ML branch lengths,
optional input of a user tree, F84 model added, estimation of all TN model
parameters and corresponding standard errors, CLUSTAL W treefile convention
adopted to show branch lengths and QP support values simultaneously,
display of unresolved quartets, update of mtREV matrix, source code more
compatible with some almost-ANSI compilers, more safety checks in the code. </TD>
</TR>

<TR VALIGN=TOP>
<TD>2.4 </TD>

<TD>Automatic data type recognition, chi-square test on base composition,
automatic selection of best amino acid model, estimation of transition/transversion
parameter, ASCII plot of quartet puzzling tree into the outfile. </TD>
</TR>

<TR VALIGN=TOP>
<TD>2.3 </TD>

<TD>More models, many usability improvements, built-in consensus tree routines,
more supported systems, bug fixes, results no longer depend on input order.
First EBI distributed version. </TD>
</TR>

<TR VALIGN=TOP>
<TD>2.2 </TD>

<TD>Optimized internal data structure requiring much less computer memory.
Bug fixes. </TD>
</TR>

<TR VALIGN=TOP>
<TD>2.1 </TD>

<TD>Bug fixes concerning algorithm and transition/transversion parameter. </TD>
</TR>

<TR VALIGN=TOP>
<TD>2.0 </TD>

<TD>Complete revision merging the maximum likelihood and the quartet puzzling
routines into one user friendly program. First electronic distribution. </TD>
</TR>

<TR VALIGN=TOP>
<TD>1.0 </TD>

<TD>First public release, presented at the 1995 phylogenetic workshop (15-17
June 1995) at the University of Bielefeld, Germany. </TD>
</TR>
</TABLE>
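<P>Version 5.0 names output files PREFIX.EXTENSION and writes the likelihood into the tree file as a leading comment of the form <TT>[ lh=x.xxx ]</TT>. The Python sketch below is a hypothetical post-processing helper based only on that description. It lists the possible output file names for a given prefix (presumably not every file is written in every analysis) and extracts the likelihood value from a tree string. The example tree string is made up.

```python
from __future__ import annotations

import re

# Extensions listed in the version 5.0 notes
EXTENSIONS = ["puzzle", "tree", "dist", "eps", "qlist", "qstep", "qtorder"]

# Leading comment such as "[ lh=-2512.437 ]"; the number may carry a sign or exponent
LH_COMMENT = re.compile(r"\[\s*lh=([-+]?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?)\s*\]")


def output_names(prefix: str) -> list[str]:
    """Possible output file names PREFIX.EXTENSION for a given prefix."""
    return [f"{prefix}.{ext}" for ext in EXTENSIONS]


def tree_likelihood(tree_text: str):
    """Return the value of a leading '[ lh=... ]' comment as a float, or None."""
    match = LH_COMMENT.match(tree_text.lstrip())
    return float(match.group(1)) if match else None


example = "[ lh=-2512.437 ](Human:0.01,Chimp:0.02,Gorilla:0.03);"
print(output_names("primates.phy")[:2], tree_likelihood(example))
```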

</BODY>
</HTML>