| 1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN"> |
|---|
| 2 | |
|---|
| 3 | <HTML> |
|---|
| 4 | <!-- To view this document properly please use a HTML browser --> |
|---|
| 5 | |
|---|
| 6 | <HEAD> |
|---|
| 7 | <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1"> |
|---|
| 8 | <TITLE>Documentation of TREE-PUZZLE 5.0</TITLE> |
|---|
| 9 | </HEAD> |
|---|
| 10 | <BODY BGCOLOR="#FFFFFF"> |
|---|
| 11 | |
|---|
| 12 | <H1> |
|---|
| 13 | <img ALT="PUZZLE Logo" SRC="puzzle.gif" HSPACE=10 BORDER=0 height=32 width=32 align=LEFT> |
|---|
| 14 | <b><font size="+3">TREE-PUZZLE Manual</font></b> |
|---|
| 15 | <img ALT="PPUZZLE Logo" SRC="ppuzzle.gif" HSPACE=10 BORDER=0 height=32 width=32> |
|---|
| 16 | </H1> |
|---|
| 17 | <B>Maximum likelihood analysis for nucleotide, amino acid, and two-state data</B> |
|---|
| 18 | |
|---|
| 19 | |
|---|
| 20 | <P>Version 5.0 |
|---|
| 21 | <BR>October 2000 |
|---|
| 22 | <BR>Copyright 1999-2000 by Heiko A. Schmidt, Korbinian Strimmer, Martin Vingron, and Arndt von Haeseler |
|---|
| 23 | <BR>Copyright 1995-1999 by Korbinian Strimmer and Arndt von Haeseler |
|---|
| 24 | |
|---|
| 25 | <P><b>Heiko A. Schmidt</b>, |
|---|
| 26 | email: h.schmidt@dkfz-heidelberg.de, |
|---|
| 27 | <A HREF="http://www.dkfz-heidelberg.de/tbi/">Theoretical Bioinformatics</A>, |
|---|
| 28 | <A HREF="http://www.dkfz-heidelberg.de/">DKFZ</A>, |
|---|
| 29 | Im Neuenheimer Feld 280, D-69124 Heidelberg, Germany. |
|---|
| 30 | |
|---|
| 31 | <P><b>Korbinian Strimmer</b>, |
|---|
| 32 | email: korbinian.strimmer@zoo.ox.ac.uk, |
|---|
| 33 | <A HREF="http://www.zoo.ox.ac.uk/">Department of Zoology</A>, |
|---|
| 34 | <A HREF="http://www.ox.ac.uk/">University of Oxford</A>, |
|---|
| 35 | South Parks Road, Oxford OX1 3PS, UK. |
|---|
| 36 | |
|---|
| 37 | <P><b>Martin Vingron</b>, |
|---|
| 38 | email: vingron@dkfz-heidelberg.de, |
|---|
| 39 | <A HREF="http://www.dkfz-heidelberg.de/tbi/">Theoretical Bioinformatics</A>, |
|---|
| 40 | <A HREF="http://www.dkfz-heidelberg.de/">DKFZ</A>, |
|---|
| 41 | Im Neuenheimer Feld 280, D-69124 Heidelberg, Germany. |
|---|
| 42 | |
|---|
| 43 | <P><b>Arndt von Haeseler</b>, |
|---|
| 44 | email: haeseler@eva.mpg.de, |
|---|
| 45 | <A HREF="http://www.eva.mpg.de/">Max-Planck-Institute for Evolutionary Anthropology</A>, |
|---|
| 46 | Inselstr. 22, D-04103 Leipzig, Germany. |
|---|
| 47 | |
|---|
| 48 | <p><font size ="-1" color ="brown">The official name of the program has been |
|---|
| 49 | changed to TREE-PUZZLE to avoid legal conflict with the Fraunhofer |
|---|
| 50 | Gesellschaft. We are sorry for any inconvenience this may cause to you. |
|---|
| 51 | Any reference to PUZZLE in this package is only colloquial and refers |
|---|
| 52 | to TREE-PUZZLE. |
|---|
| 53 | </font> |
|---|
| 54 | |
|---|
| 55 | <P>TREE-PUZZLE is a computer program to reconstruct phylogenetic trees from |
|---|
| 56 | molecular sequence data by maximum likelihood. It implements a fast tree |
|---|
| 57 | search algorithm, quartet puzzling, that allows analysis of large data |
|---|
| 58 | sets and automatically assigns estimations of support to each internal |
|---|
| 59 | branch. TREE-PUZZLE also computes pairwise maximum likelihood distances as well |
|---|
| 60 | as branch lengths for user specified trees. Branch lengths can also be |
|---|
| 61 | calculated under the clock-assumption. In addition, TREE-PUZZLE offers a novel |
|---|
| 62 | method, likelihood mapping, to investigate the support of a hypothesized |
|---|
| 63 | internal branch without computing an overall tree and to visualize the |
|---|
| 64 | phylogenetic content of a sequence alignment. TREE-PUZZLE also conducts a number |
|---|
| 65 | of statistical tests on the data set (chi-square test for homogeneity of |
|---|
| 66 | base composition, likelihood ratio to test the clock hypothesis, Kishino-Hasegawa |
|---|
| 67 | test). The models of substitution provided by TREE-PUZZLE are TN, HKY, F84, |
|---|
| 68 | SH for nucleotides, Dayhoff, JTT, mtREV24, BLOSUM 62, VT, WAG for amino acids, and |
|---|
| 69 | F81 for two-state data. Rate heterogeneity is modelled by a discrete Gamma |
|---|
| 70 | distribution and by allowing invariable sites. The corresponding parameters |
|---|
| 71 | can be inferred from the data set. |
|---|
| 72 | |
|---|
| 73 | <P>TREE-PUZZLE is available free of charge from |
|---|
| 74 | <DL> |
|---|
| 75 | <DD> |
|---|
| 76 | <A HREF="http://www.tree-puzzle.de/">http://www.tree-puzzle.de/</A> (TREE-PUZZLE home page) |
|---|
| 77 | </DD> |
|---|
| 78 | |
|---|
| 79 | <DD> |
|---|
| 80 | <A HREF="http://www.dkfz-heidelberg.de/tbi/tree-puzzle/">http://www.dkfz-heidelberg.de/tbi/tree-puzzle/</A> (TREE-PUZZLE home page mirror at DKFZ) |
|---|
| 81 | </DD> |
|---|
| 82 | |
|---|
| 83 | <DD> |
|---|
| 84 | <A HREF="http://iubio.bio.indiana.edu/soft/molbio/evolve">http://iubio.bio.indiana.edu/soft/molbio/evolve</A> |
|---|
| 85 | (IUBio archive www, USA) |
|---|
| 86 | </DD> |
|---|
| 87 | |
|---|
| 88 | <DD> |
|---|
| 89 | <A HREF="ftp://iubio.bio.indiana.edu/molbio/evolve">ftp://iubio.bio.indiana.edu/molbio/evolve</A> |
|---|
| 90 | (IUBio archive ftp, USA) |
|---|
| 91 | </DD> |
|---|
| 92 | |
|---|
| 93 | <DD> |
|---|
| 94 | <A HREF="ftp://ftp.ebi.ac.uk/pub/software">ftp://ftp.ebi.ac.uk/pub/software</A> |
|---|
| 95 | (European Bioinformatics Institute, UK) |
|---|
| 96 | </DD> |
|---|
| 97 | |
|---|
| 98 | <DD> |
|---|
| 99 | <A HREF="ftp://ftp.pasteur.fr/pub/GenSoft">ftp://ftp.pasteur.fr/pub/GenSoft</A> |
|---|
| 100 | (Institut Pasteur, France) |
|---|
| 101 | </DD> |
|---|
| 102 | |
|---|
| 103 | </DL> |
|---|
| 104 | TREE-PUZZLE is written in ANSI C. It will run on most personal computers and |
|---|
| 105 | workstations if compiled by an appropriate C compiler. |
|---|
| 106 | The tree reconstruction part of TREE-PUZZLE has been parallelized using |
|---|
| 107 | the Message Passing Interface (MPI) |
|---|
| 108 | library standard (<A HREF="#snir1998">Snir et al.</A>, 1998 and |
|---|
| 109 | <A HREF="#gropp1998">Gropp et al.</A>, 1998). To run |
|---|
| 110 | TREE-PUZZLE in parallel you also need an implementation of the MPI library on your |
|---|
| 111 | system. |
|---|
| 112 | |
|---|
| 113 | <P>Please read the <A HREF="#Installation">installation section</A> |
|---|
| 114 | for more details. |
|---|
| 115 | |
|---|
| 116 | <P>We suggest that this documentation should be read before using TREE-PUZZLE |
|---|
| 117 | the first time. If you do not have the time to read this manual completely |
|---|
| 118 | please do read at least the sections <A HREF="#Input/Output Conventions">Input/Output |
|---|
| 119 | Conventions</A> and <A HREF="#Quick Start">Quick Start </A>below. Then |
|---|
| 120 | you should be able to use the TREE-PUZZLE program, especially if you have some |
|---|
| 121 | experience with the PHYLIP programs. The other sections should then be read |
|---|
| 122 | at a later time. |
|---|
| 123 | |
|---|
| 124 | <P>To find out what's new in version 5.0 please read the |
|---|
| 125 | <A HREF="#Version History">Version History</A>. |
|---|
| 126 | |
|---|
| 127 | <P> |
|---|
| 128 | <HR ALIGN=center WIDTH="100%" SIZE=2> |
|---|
| 129 | <CENTER><H2>Contents</H2></CENTER><P> |
|---|
| 130 | |
|---|
| 131 | <UL> |
|---|
| 132 | <LI><A HREF="#Legal Stuff">Legal Stuff</A></LI> |
|---|
| 133 | |
|---|
| 134 | <LI><A HREF="#Installation">Installation</A> |
|---|
| 135 | <UL> |
|---|
| 136 | <LI><A HREF="#Unix">UNIX</A></LI> |
|---|
| 137 | <LI><A HREF="#MacOS">MacOS</A></LI> |
|---|
| 138 | <LI><A HREF="#Win32">Windows 95/98/NT</A></LI> |
|---|
| 139 | <LI><A HREF="#VMS">VMS</A></LI> |
|---|
| 140 | <LI><A HREF="#MPI">Parallel TREE-PUZZLE</A></LI> |
|---|
| 141 | </UL> |
|---|
| 142 | </LI> |
|---|
| 143 | |
|---|
| 144 | <LI><A HREF="#Introduction">Introduction</A></LI> |
|---|
| 145 | <LI><A HREF="#Input/Output Conventions">Input/Output Conventions</A> |
|---|
| 146 | <UL> |
|---|
| 147 | <LI><A HREF="#Sequence Input">Sequence Input</A></LI> |
|---|
| 148 | <LI><A HREF="#General Output">General Output</A></LI> |
|---|
| 149 | <LI><A HREF="#Distance Output">Distance Output</A></LI> |
|---|
| 150 | <LI><A HREF="#Tree Output">Tree Output</A></LI> |
|---|
| 151 | <LI><A HREF="#Tree Input">Tree Input</A></LI> |
|---|
| 152 | <LI><A HREF="#Likelihood Mapping Output">Likelihood Mapping Output</A></LI> |
|---|
| 153 | </UL> |
|---|
| 154 | </LI> |
|---|
| 155 | |
|---|
| 156 | <LI><A HREF="#Quick Start">Quick Start</A></LI> |
|---|
| 157 | <LI><A HREF="#Models of Sequence Evolution">Models of Sequence Evolution</A> |
|---|
| 158 | <UL> |
|---|
| 159 | <LI><A HREF="#Models of Substitution">Models of Substitution</A></LI> |
|---|
| 160 | <LI><A HREF="#Models of Rate Heterogeneity">Models of Rate Heterogeneity</A></LI> |
|---|
| 161 | </UL> |
|---|
| 162 | </LI> |
|---|
| 163 | |
|---|
| 164 | <LI><A HREF="#Options Available">Available Options</A></LI> |
|---|
| 165 | <LI><A HREF="#Other Features">Other Features</A></LI> |
|---|
| 166 | <LI><A HREF="#Interpretation and Hints">Interpretation and Hints</A> |
|---|
| 167 | <UL> |
|---|
| 168 | <LI><A HREF="#Quartet Puzzling Support Values">Quartet Puzzling Support Values</A></LI> |
|---|
| 169 | <LI><A HREF="#Percentage of Unresolved Quartets">Percentage of Unresolved Quartets</A></LI> |
|---|
| 170 | <LI><A HREF="#Automatic Parameter Estimation">Automatic Parameter Estimation</A></LI> |
|---|
| 171 | <LI><A HREF="#Likelihood Mapping">Likelihood Mapping</A></LI> |
|---|
| 172 | <LI><A HREF="#Batch Mode">Batch Mode</A></LI> |
|---|
| 173 | </UL> |
|---|
| 174 | </LI> |
|---|
| 175 | <LI><A HREF="#Limits and Error Messages">Limits and Error Messages</A></LI> |
|---|
| 176 | <LI><A HREF="#Are Quartets Reliable">Are Quartets Reliable?</A></LI> |
|---|
| 177 | <LI><A HREF="#Other Programs">Other Programs</A></LI> |
|---|
| 178 | <LI><A HREF="#Acknowledgements">Acknowledgements</A></LI> |
|---|
| 179 | <LI><A HREF="#References">References</A></LI> |
|---|
| 180 | <LI><A HREF="#Known Bugs">Known Bugs</A></LI> |
|---|
| 181 | <LI><A HREF="#Version History">Version History</A></LI> |
|---|
| 182 | </UL> |
|---|
| 183 | |
|---|
| 184 | <HR> |
|---|
| 185 | <H2> |
|---|
| 186 | <A NAME="Legal Stuff"></A>Legal Stuff</H2> |
|---|
| 187 | TREE-PUZZLE 5.0 is (c) 1999-2000 Heiko A. Schmidt, Korbinian Strimmer, Martin Vingron, and Arndt von Haeseler.<BR> |
|---|
| 188 | Earlier PUZZLE versions were (c) 1995-1999 by Korbinian Strimmer and Arndt von Haeseler.<BR> |
|---|
| 189 | The software and its accompanying documentation are provided as |
|---|
| 190 | is, without guarantee of support or maintenance. The whole package is |
|---|
| 191 | licensed under the GNU General Public License (GPL), except for the parts indicated in |
|---|
| 192 | the sources where the copyright of the authors does not apply. Please see |
|---|
| 193 | <A |
|---|
| 194 | HREF="http://www.opensource.org/licenses/gpl-license.html">http://www.opensource.org/licenses/gpl-license.html</A> for details. |
|---|
| 195 | |
|---|
| 196 | <H2> |
|---|
| 197 | <A NAME="Installation"></A>Installation</H2> |
|---|
| 198 | The source code of the TREE-PUZZLE software is 100% identical across platforms. |
|---|
| 199 | However, installation procedures differ. |
|---|
| 200 | |
|---|
| 201 | <H3> |
|---|
| 202 | <A NAME="Unix"></A>UNIX</H3> |
|---|
| 203 | Get the file <B>tree-puzzle-5.0.tar</B>. If you received a compressed tar file |
|---|
| 204 | (<B>tree-puzzle-5.0.tar.Z</B> or <B>tree-puzzle-5.0.tar.gz</B>) you have to decompress |
|---|
| 205 | it first (using the "uncompress" or "gunzip" command). Then untar the file |
|---|
| 206 | with |
|---|
| 207 | <PRE> tar xvf tree-puzzle-5.0.tar</PRE> |
|---|
| 208 | The newly created directory "tree-puzzle-5.0" contains four subdirectories called |
|---|
| 209 | "doc", "data", "bin", and "src". The "doc" directory |
|---|
| 210 | contains this manual in HTML format. The "data" |
|---|
| 211 | directory contains example input files. The "src" directory contains the |
|---|
| 212 | ANSI C sources of TREE-PUZZLE. Switch to the top-level directory by typing |
|---|
| 213 | <PRE> cd tree-puzzle-5.0</PRE> |
|---|
| 214 | To compile we recommend the GNU gcc (or GNU egcs) compiler. If gcc is installed |
|---|
| 215 | just type |
|---|
| 216 | <PRE> sh ./configure</PRE> |
|---|
| 217 | <PRE> make</PRE> |
|---|
| 218 | <PRE> make install</PRE> |
|---|
| 219 | and the executable <TT>puzzle</TT> is compiled and put into the <TT>/usr/local/bin</TT> directory. |
|---|
| 220 | If you want <TT>puzzle</TT> installed into another directory, pass the |
|---|
| 221 | option <TT>--prefix=/name/of/the/wanted/directory</TT> on the |
|---|
| 222 | <TT>sh ./configure</TT> command line. |
|---|
| 223 | The parallel version should have been built and installed as well, if <TT>configure</TT> |
|---|
| 224 | found a known MPI compiler (cf. <A HREF="#MPI">Parallel TREE-PUZZLE</A> section). |
|---|
| 225 | |
|---|
| 226 | |
|---|
| 227 | Then type |
|---|
| 228 | <PRE> make clean</PRE> |
|---|
| 229 | and everything will be nicely cleaned up. |
|---|
| 230 | |
|---|
| 231 | If your compiler is not GNU gcc and is not found by <TT>configure</TT>, you will have to |
|---|
| 232 | set the <TT>CC</TT> variable (e.g. <TT>setenv CC cc</TT> under <TT>csh</TT> or |
|---|
| 233 | <TT>CC=cc; export CC</TT> under <TT>sh</TT>) before running <TT>sh ./configure</TT>. |
|---|
| 234 | If you still cannot compile properly then your compiler or its runtime library |
|---|
| 235 | is most probably not ANSI compliant (e.g., old SUN compilers). In most |
|---|
| 236 | cases, however, you will be able to compile after changing some parameters |
|---|
| 237 | in the "makefile". Ask your local Unix expert for help. |
|---|
| 238 | |
|---|
| 239 | <H3> |
|---|
| 240 | <A NAME="MacOS"></A>MacOS</H3> |
|---|
| 241 | Get the file <B>tree-puzzle-5.0.hqx</B>. After decoding this BinHex file (this |
|---|
| 242 | is done automatically on a properly installed system, otherwise use programs |
|---|
| 243 | like "StuffIt Expander" or ask your local Mac expert) you will find a folder |
|---|
| 244 | called "tree-puzzle-5.0" on your hard disk. This folder contains the four subfolders |
|---|
| 245 | "doc", "data", "bin", and "src". The "doc" folder contains |
|---|
| 246 | this manual in HTML format. The "data" folder contains |
|---|
| 247 | example input files. The "bin" folder contains a Macintosh PPC executable |
|---|
| 248 | with a default memory partition of 3000K. |
|---|
| 249 | There is no 68k executable. <u>If you get a memory allocation error while running |
|---|
| 250 | TREE-PUZZLE you have to increase TREE-PUZZLE's memory partition with the "Get Info" command |
|---|
| 251 | of the Macintosh Finder</u>. The "src" folder contains the ANSI C sources of TREE-PUZZLE. |
|---|
| 252 | |
|---|
| 253 | <P>The MacOS executables have been compiled for the PowerMac using Metrowerks CodeWarrior. |
|---|
| 254 | |
|---|
| 255 | <P>Note: It is probably a good idea to install PPC Linux (or MkLinux) on your Macintosh. |
|---|
| 256 | TREE-PUZZLE (as any other program) runs 20-50% faster under Linux compared to the |
|---|
| 257 | same program under MacOS (on the same machine!), and the Mac does not freeze |
|---|
| 258 | during execution because of Linux's multitasking capabilities (maybe this changes in MacOS X). |
|---|
| 259 | |
|---|
| 260 | |
|---|
| 261 | <H3> |
|---|
| 262 | <A NAME="Win32"></A>Windows 95/98/NT</H3> |
|---|
| 263 | |
|---|
| 264 | Get the file <B>tree-puzzle-5.0.zip</B>. After uncompressing (using, e.g., WinZip |
|---|
| 265 | or a similar tool) a directory "tree-puzzle-5.0" is created containing |
|---|
| 266 | four subdirectories called "doc", "data", "bin", and "src". The "doc" directory |
|---|
| 267 | contains this manual in HTML format. The "data" |
|---|
| 268 | directory contains example input files. The "src" directory contains the |
|---|
| 269 | ANSI C sources of TREE-PUZZLE. The "bin" directory contains the executable |
|---|
| 270 | <TT>puzzle.exe</TT>. To use TREE-PUZZLE the system path to the executable |
|---|
| 271 | needs to be set correctly. Ask your local Windows expert for help. |
|---|
| 272 | |
|---|
| 273 | <P>The executable has been compiled using |
|---|
| 274 | Microsoft Visual C++ and the "makefile.w32" (contained in "src"). |
|---|
| 275 | |
|---|
| 276 | <P>If you have a Linux partition on your PC we recommend |
|---|
| 277 | installing and using TREE-PUZZLE under Linux (see <A HREF="#Unix">Unix</A> section) because |
|---|
| 278 | TREE-PUZZLE runs significantly faster there than under Windows. |
|---|
| 279 | |
|---|
| 280 | <H3> |
|---|
| 281 | <A NAME="VMS"></A>VMS</H3> |
|---|
| 282 | |
|---|
| 283 | |
|---|
| 284 | <P>Get the Unix sources and install the package on your computer |
|---|
| 285 | (ask your local VMS expert for help). Go to the subdirectory |
|---|
| 286 | "src" and compile TREE-PUZZLE using the command file "makefile.com". |
|---|
| 287 | |
|---|
| 288 | <H3> |
|---|
| 289 | <A NAME="MPI"></A>Parallel TREE-PUZZLE</H3> |
|---|
| 290 | |
|---|
| 291 | |
|---|
| 292 | <P>To compile and run the parallelized TREE-PUZZLE you need an implementation |
|---|
| 293 | of the Message Passing Interface (MPI) library, a widely used |
|---|
| 294 | message passing library standard. Implementations of the MPI libraries |
|---|
| 295 | are available for almost all parallel platforms and computer systems, |
|---|
| 296 | and there are free implementations for most platforms as well. |
|---|
| 297 | |
|---|
| 298 | <P>To find an MPI implementation suitable for your platform visit |
|---|
| 299 | the following web sites: |
|---|
| 300 | <UL> |
|---|
| 301 | <LI><A HREF="http://www-unix.mcs.anl.gov/mpi/implementations.html">http://www-unix.mcs.anl.gov/mpi/implementations.html</A> |
|---|
| 302 | <LI><A HREF="http://WWW.ERC.MsState.Edu/labs/hpcl/projects/mpi/implementations.html">http://WWW.ERC.MsState.Edu/labs/hpcl/projects/mpi/implementations.html</A> |
|---|
| 303 | <LI><A HREF="http://www.mpi.nd.edu/MPI/">http://www.mpi.nd.edu/MPI/</A> |
|---|
| 304 | </UL> |
|---|
| 305 | |
|---|
| 306 | Although MPI is also available on Macintosh and Windows systems, |
|---|
| 307 | the developers never ran the parallel version on those |
|---|
| 308 | platforms. |
|---|
| 309 | |
|---|
| 310 | <P>To install the parallel version of TREE-PUZZLE, get the |
|---|
| 311 | Unix sources for TREE-PUZZLE and install the package on your computer |
|---|
| 312 | as described above. |
|---|
| 313 | The <TT>configure</TT> script should configure the Makefiles appropriately. |
|---|
| 314 | If there is no known MPI compiler found on the system the parallel |
|---|
| 315 | version is not configured. |
|---|
| 316 | (If problems occur ask your local system administrator for help.) |
|---|
| 317 | |
|---|
| 318 | <P>Then you should be able to compile the parallel version of TREE-PUZZLE |
|---|
| 319 | using the following commands: |
|---|
| 320 | <PRE> sh ./configure</PRE> |
|---|
| 321 | <PRE> make</PRE> |
|---|
| 322 | <PRE> make install</PRE> |
|---|
| 323 | and the executable <TT>ppuzzle</TT> is compiled and put into the <TT>/usr/local/bin</TT> directory. |
|---|
| 324 | If you want to have the executable installed into another directory please proceed as |
|---|
| 325 | described in the <A HREF="#Unix">Unix</A> section. |
|---|
| 326 | |
|---|
| 327 | If your MPI compiler is none of <TT>mpcc</TT> (IBM), <TT>hcc</TT> (LAM), |
|---|
| 328 | <TT>mpicc_lam</TT> (LAM under LINUX), <TT>mpicc_mpich</TT> (MPICH under LINUX), |
|---|
| 329 | or <TT>mpicc</TT> (LAM, MPICH, HP-UX, etc.), and is not found by <TT>configure</TT>, you will have to |
|---|
| 330 | set the <TT>MPICC</TT> variable (e.g. <TT>setenv MPICC /another/mpicc</TT> |
|---|
| 331 | under <TT>csh</TT> or <TT>MPICC=/another/mpicc; export MPICC</TT> under <TT>sh</TT>) |
|---|
| 332 | before running <TT>sh ./configure</TT>. |
|---|
| 333 | |
|---|
| 334 | The way you have to start <TT>ppuzzle</TT> depends on the MPI implementation |
|---|
| 335 | installed. So please refer to your MPI manual or ask your local MPI expert |
|---|
| 336 | for help. |
|---|
| 337 | |
|---|
| 338 | <P><B>Note:</B> |
|---|
| 339 | <BR>The parallelization of the tree reconstruction method follows a |
|---|
| 340 | master-worker-concept, i.e., a master process handles the scheduling of |
|---|
| 341 | the computation to the <em>n</em> worker processes, while the worker processes are |
|---|
| 342 | doing almost all the computation work of evaluating the quartets and |
|---|
| 343 | constructing the puzzling step trees. |
|---|
| 344 | |
|---|
| 345 | <BR>Since the master process does not require a lot of CPU time, |
|---|
| 346 | it can be scheduled sharing one processor with a worker process. |
|---|
| 347 | Thus, you can run <TT>ppuzzle</TT> with <em>n+1</em> processes on <em>n</em> processors. |
|---|
| 348 | |
|---|
| 349 | <BR>If you want to evaluate a usertree or perform likelihood |
|---|
| 350 | mapping analysis it is not recommended to do a parallel run, because all |
|---|
| 351 | the computation will be done by the master process. Hence a run of the |
|---|
| 352 | sequential version of TREE-PUZZLE is more appropriate for usertree or likelihood |
|---|
| 353 | mapping analysis. |
|---|
| 354 | |
|---|
| 355 | <H2> |
|---|
| 356 | <A NAME="Introduction"></A>Introduction</H2> |
|---|
| 357 | TREE-PUZZLE is an ANSI C application to reconstruct phylogenetic trees from |
|---|
| 358 | molecular sequence data by maximum likelihood. It implements a fast tree |
|---|
| 359 | search algorithm, quartet puzzling, that allows analysis of large data |
|---|
| 360 | sets and automatically assigns estimations of support to each internal |
|---|
| 361 | branch. Rate heterogeneity (invariable sites plus Gamma distributed rates) |
|---|
| 362 | is incorporated in all models of substitution available (nucleotides: SH, |
|---|
| 363 | TN, HKY, F84, and submodels; amino acids: Dayhoff, JTT, mtREV24, BLOSUM |
|---|
| 364 | 62, VT, and WAG; two-state data: F81). All parameters including rate heterogeneity can |
|---|
| 365 | be estimated from the data by maximum likelihood approaches. TREE-PUZZLE also |
|---|
| 366 | computes pairwise maximum likelihood distances as well as branch lengths |
|---|
| 367 | for user specified trees. In addition, TREE-PUZZLE offers a novel method, likelihood |
|---|
| 368 | mapping, to investigate the support of internal branches without computing |
|---|
| 369 | an overall tree. |
|---|
| 370 | <H2> |
|---|
| 371 | <A NAME="Input/Output Conventions"></A>Input/Output Conventions</H2> |
|---|
| 372 | |
|---|
| 373 | Some naming conventions have changed compared to |
|---|
| 374 | earlier (&lt; 5.0) PUZZLE releases. From version 5.0 onwards the |
|---|
| 375 | names of the sequence input file and the usertree file can be specified |
|---|
| 376 | at the command line (e.g. '<TT>puzzle infilename intreename</TT>', |
|---|
| 377 | where <TT>infilename</TT> is the name of the sequence file and <TT>intreename</TT> |
|---|
| 378 | is the name of the usertree file). |
|---|
| 379 | If only the input filename or no |
|---|
| 380 | filename is given at the command line the TREE-PUZZLE software searches |
|---|
| 381 | for input files named "<TT>infile</TT>" and/or "<TT>intree</TT>" respectively. |
|---|
| 382 | |
|---|
| 383 | <P>The naming conventions of the output files have changed as well. |
|---|
| 384 | As prefix of the output filenames the name of the sequence input file |
|---|
| 385 | (or the usertree file in the usertree analysis case) is used and an |
|---|
| 386 | extension added to denote the content of the file. If no input filename |
|---|
| 387 | is given at the command line the default filenames of the earlier |
|---|
| 388 | versions are used. |
|---|
| 389 | |
|---|
| 390 | The following extensions/default filenames are possible: |
|---|
| 391 | <DL><DT><DD> |
|---|
| 392 | <TABLE><TR><TD><B>Extension</B></TD><TD><B>default filename</B></TD><TD><B>file content</B></TD></TR> |
|---|
| 393 | <TR><TD><TT>.puzzle </TT></TD><TD><TT>outfile </TT></TD><TD>for the TREE-PUZZLE report</TD></TR> |
|---|
| 394 | <TR><TD><TT>.dist </TT></TD><TD><TT>outdist </TT></TD><TD>for the ML distances</TD></TR> |
|---|
| 395 | <TR><TD><TT>.tree </TT></TD><TD><TT>outtree </TT></TD><TD>for the final tree(s)</TD></TR> |
|---|
| 396 | <TR><TD><TT>.qlist </TT></TD><TD><TT>outqlist </TT></TD><TD>for the list of unresolved quartets</TD></TR> |
|---|
| 397 | <TR><TD><TT>.ptorder</TT></TD><TD><TT>outptorder </TT></TD><TD>for the list of unique puzzling step tree topologies</TD></TR> |
|---|
| 398 | <TR><TD><TT>.pstep </TT></TD><TD><TT>outpstep </TT></TD><TD>for the list of puzzling step tree topologies in chronological order</TD></TR> |
|---|
| 399 | <TR><TD><TT>.eps </TT></TD><TD><TT>outlm.eps </TT></TD><TD>for the EPS file generated in the likelihood mapping analysis</TD></TR> |
|---|
| 400 | </TABLE></DL> |
|---|
| 401 | |
|---|
| 402 | The file types are described in detail below. In the following |
|---|
| 403 | "INFILENAME" denotes the prefix, which is the sequence input filename |
|---|
| 404 | or the usertree filename respectively. |
|---|
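<P>The naming rule in the table above can be sketched in a few lines of Python. This is only an illustration of the convention as described here, not code taken from TREE-PUZZLE; the function name <TT>output_names</TT> and the dictionary keys are hypothetical.

```python
from typing import Dict, Optional

# (extension, default filename) pairs copied from the table above.
# The dictionary keys are illustrative labels, not TREE-PUZZLE terms.
OUTPUT_FILES = {
    "report": (".puzzle", "outfile"),
    "distances": (".dist", "outdist"),
    "trees": (".tree", "outtree"),
    "unresolved_quartets": (".qlist", "outqlist"),
    "unique_topologies": (".ptorder", "outptorder"),
    "puzzling_steps": (".pstep", "outpstep"),
    "likelihood_map": (".eps", "outlm.eps"),
}


def output_names(infilename: Optional[str] = None) -> Dict[str, str]:
    """Return the expected output filenames.

    With an input filename (sequence file, or usertree file in a
    usertree analysis) the extension is appended to that name.
    Without one, the default filenames of earlier versions are used.
    """
    names = {}
    for kind, (extension, default) in OUTPUT_FILES.items():
        names[kind] = infilename + extension if infilename else default
    return names


if __name__ == "__main__":
    print(output_names("globin.a")["report"])
    print(output_names()["report"])
```

Under these assumptions, a run started as <TT>puzzle globin.a</TT> would be expected to write its report to "globin.a.puzzle", while a run without a filename argument would write to "outfile".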
| 405 | |
|---|
| 406 | <H3> |
|---|
| 407 | <A NAME="Sequence Input"></A>Sequence Input</H3> |
|---|
| 408 | TREE-PUZZLE requests sequence input in PHYLIP INTERLEAVED format (sometimes |
|---|
| 409 | also called PHYLIP 3.4 format). Many sequence editors and alignment programs |
|---|
| 410 | (e.g., CLUSTAL W) output data in this format. The "data" directory |
|---|
| 411 | contains four example input files ("globin.a", "marswolf.n", "atp6.a", |
|---|
| 412 | "primates.b") that can be used as templates for own data files. |
|---|
| 413 | The default name of the sequence input file is "infile", if no |
|---|
| 414 | input filename is given at the command line. |
|---|
| 415 | If an "infile" or a file with the given name is not present TREE-PUZZLE |
|---|
| 416 | will request an alternative file name. Sequence names in the |
|---|
| 417 | input file are allowed to contain blanks but all blanks will internally |
|---|
| 418 | be converted to underscores "_". Sequences can be in upper or lower case; |
|---|
| 419 | any spaces or control characters are ignored. The dot "." is recognized |
|---|
| 420 | as a character that matches the first sequence at that position; it can be used in all sequences except the |
|---|
| 421 | first sequence. Valid symbols for nucleotides are A, C, G, T and |
|---|
| 422 | U, and for amino acids A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, |
|---|
| 423 | T, V, W, and Y. All other visible characters (including gaps, question |
|---|
| 424 | marks etc.) are treated as N (DNA/RNA) or X (amino acids). For two-state |
|---|
| 425 | data the symbols 0 and 1 are allowed. The first sequence in the data set is |
|---|
| 426 | considered the default outgroup. |
|---|
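<P>The character rules above can be illustrated with a small Python sketch. It is not TREE-PUZZLE's own reader: the function name <TT>normalize</TT> is hypothetical, converting to upper case is an assumption made for readability, and two-state data is left out because the text does not say how unknown two-state symbols are treated.

```python
from typing import Optional

# Valid symbols and the replacement for every other visible character,
# as listed in the paragraph above.
ALPHABETS = {
    "nucleotide": (set("ACGTU"), "N"),
    "amino acid": (set("ACDEFGHIKLMNPQRSTVWY"), "X"),
}


def normalize(raw: str, alphabet: str, first: Optional[str] = None) -> str:
    """Clean one aligned sequence according to the rules above.

    `first` is the already normalized first sequence; it is needed to
    expand dots. Pass None when normalizing the first sequence itself.
    """
    valid, unknown = ALPHABETS[alphabet]
    # Spaces and control characters are ignored.
    chars = [c for c in raw if c.isprintable() and not c.isspace()]
    result = []
    for position, char in enumerate(chars):
        if char == "." and first is not None:
            # A dot copies the character of the first sequence.
            result.append(first[position])
            continue
        char = char.upper()  # assumption: case is not significant
        result.append(char if char in valid else unknown)
    return "".join(result)


if __name__ == "__main__":
    first = normalize("ac gt-?", "nucleotide")
    print(first)
    print(normalize("..T.a.", "nucleotide", first))
```

The sketch assumes that all sequences have the same aligned length, so every dot has a counterpart in the first sequence.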
| 427 | <H3> |
|---|
| 428 | <A NAME="General Output"></A>General Output</H3> |
|---|
| 429 | All results are written to the TREE-PUZZLE report file (INFILENAME.puzzle or |
|---|
| 430 | outfile). If the option "List all unresolved quartets" is invoked a file |
|---|
| 431 | called "INFILENAME.qlist"/"outqlist" is created showing all these quartets. |
|---|
| 432 | If the option "List puzzling step trees" is set accordingly the files |
|---|
| 433 | "INFILENAME.pstep"/"outpstep" and/or "INFILENAME.ptorder"/"outptorder" are |
|---|
| 434 | generated. |
|---|
| 435 | |
|---|
| 436 | <P>The "INFILENAME.ptorder"/"outptorder" file contains the unique tree |
|---|
| 437 | topologies in PHYLIP format preceded by a PHYLIP-format comment (in square brackets). |
|---|
| 438 | A typical line in the ptorder file looks like this: |
|---|
| 439 | |
|---|
| 440 | <P><TT>[ 2. 60 6.00 2 5 1000 ](chicken,((cat,(horse,(mouse,rat))),(opossum,platypus)));</TT></P> |
|---|
| 441 | |
|---|
| 442 | The entries (separated by single blanks) in the square brackets mean the following: |
|---|
| 443 | <UL> |
|---|
| 444 | <LI><B>2.</B> - Topology occurs second-most often among all |
|---|
| 445 | intermediate tree topologies (= order number). |
|---|
| 446 | <LI><B>60</B> - Topology occurs 60 times. |
|---|
| 447 | <LI><B>6.00</B> - Topology occurs in 6.00 % of the intermediate trees. |
|---|
| 448 | <LI><B>2</B> - Unique topology ID (needed for the pstep file). |
|---|
| 449 | <LI><B>5</B> - Total number of unique topologies. |
|---|
| 450 | <LI><B>1000</B> - Total number of intermediate trees estimated during the analysis. |
|---|
| 451 | </UL> |
|---|
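<P>For post-processing, a line of the ptorder file can be split into its six comment fields and the tree string. The following Python sketch assumes the layout of the example line above; the names <TT>PtorderEntry</TT> and <TT>parse_ptorder_line</TT> are hypothetical and not part of TREE-PUZZLE.

```python
from dataclasses import dataclass


@dataclass
class PtorderEntry:
    rank: int               # order number, e.g. 2 for "2."
    count: int              # how often the topology occurred
    percent: float          # share of all intermediate trees
    topology_id: int        # ID used for lookup from the pstep file
    unique_topologies: int  # total number of unique topologies
    total_trees: int        # total number of intermediate trees
    tree: str               # topology in PHYLIP/Newick notation


def parse_ptorder_line(line: str) -> PtorderEntry:
    """Parse one line such as '[ 2. 60 6.00 2 5 1000 ](a,(b,c));'."""
    comment, tree = line.strip().split("]", 1)
    fields = comment.lstrip("[").split()
    if len(fields) != 6:
        raise ValueError("expected six entries in the comment: " + line)
    return PtorderEntry(
        rank=int(fields[0].rstrip(".")),
        count=int(fields[1]),
        percent=float(fields[2]),
        topology_id=int(fields[3]),
        unique_topologies=int(fields[4]),
        total_trees=int(fields[5]),
        tree=tree.strip(),
    )


if __name__ == "__main__":
    example = ("[ 2. 60 6.00 2 5 1000 ]"
               "(chicken,((cat,(horse,(mouse,rat))),(opossum,platypus)));")
    print(parse_ptorder_line(example))
```

In the example line the percentage is consistent with the counts: 60 of 1000 intermediate trees is 6.00 %.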
| 452 | |
|---|
| 453 | <P>The "INFILENAME.pstep"/"outpstep" file contains a log of the |
|---|
| 454 | puzzling steps performed and the tree topologies that occurred. |
|---|
| 455 | |
|---|
| 456 | A typical line in the pstep file contains the following entries |
|---|
| 457 | (separated by tabs): |
|---|
| 458 | |
|---|
| 459 | <P><TT>"6. 55 698 3 5 828"</TT></P> |
|---|
| 460 | |
|---|
| 461 | The entries in the rows mean the following: |
|---|
| 462 | <UL> |
|---|
| 463 | <LI><B>6.</B> - 6th block of intermediate trees performed. |
|---|
| 464 | <LI><B>55</B> - number of intermediate trees inferred in this block. |
|---|
| 465 | <LI><B>698</B> - occurrences of this topology so far. |
|---|
| 466 | <LI><B>3</B> - unique topology ID (for lookup in the ptorder file). |
|---|
| 467 | <LI><B>5</B> - number of unique topologies that have occurred so far. |
|---|
| 468 | <LI><B>828</B> - number of puzzling steps performed so far. |
|---|
| 469 | </UL> |
|---|
| 470 | In the case of a sequential run (<TT>puzzle</TT>) the entries of this |
|---|
| 471 | file are more resolved, because every block consists of one intermediate tree. |
|---|
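<P>A line of the pstep file can be read in much the same way. The Python sketch below assumes six whitespace-separated entries as in the example; whether the surrounding quotation marks appear in the file itself is not clear from this description, so the sketch simply strips them if present. The function name and the dictionary keys are hypothetical.

```python
from typing import Dict

# Illustrative labels for the six entries, in the order listed above.
PSTEP_KEYS = (
    "block",
    "trees_in_block",
    "topology_count_so_far",
    "topology_id",
    "unique_topologies_so_far",
    "puzzling_steps_so_far",
)


def parse_pstep_line(line: str) -> Dict[str, int]:
    """Parse one pstep line such as '6.\t55\t698\t3\t5\t828'."""
    fields = line.strip().strip('"').split()
    if len(fields) != len(PSTEP_KEYS):
        raise ValueError("expected six entries: " + line)
    # The block number carries a trailing dot ("6."), the rest are integers.
    values = [int(field.rstrip(".")) for field in fields]
    return dict(zip(PSTEP_KEYS, values))


if __name__ == "__main__":
    print(parse_pstep_line("6.\t55\t698\t3\t5\t828"))
```

The <TT>topology_id</TT> value can then be matched against the fourth comment entry of the ptorder file to recover the tree string for that topology.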
| 472 | |
|---|
| 473 | <H3> |
|---|
| 474 | <A NAME="Distance Output"></A>Distance Output</H3> |
|---|
| 475 | TREE-PUZZLE automatically computes pairwise maximum likelihood distances for |
|---|
| 476 | all the sequences in the data file. They are written in the TREE-PUZZLE report |
|---|
| 477 | file "INFILENAME.puzzle"/"outfile" and in the separate file |
|---|
| 478 | "INFILENAME.dist"/"outdist". The format of the distance file is PHYLIP compatible |
|---|
| 479 | (i.e. it can directly be used as input for PHYLIP distance-based programs |
|---|
| 480 | such as "neighbor"). |
|---|
| 481 | <H3> |
|---|
| 482 | <A NAME="Tree Output"></A>Tree Output</H3> |
|---|
| 483 | The quartet puzzling tree with its support values |
|---|
| 484 | and with maximum likelihood branch lengths is displayed as ASCII drawing |
|---|
| 485 | in the TREE-PUZZLE report in "INFILENAME.puzzle"/"outfile". The same tree |
|---|
| 486 | is written into the "INFILENAME.tree"/"outtree" file in CLUSTAL W format. |
|---|
| 487 | If clock-like maximum-likelihood branch lengths are computed |
|---|
| 488 | there will be both an unrooted and a rooted tree in the |
|---|
| 489 | "INFILENAME.puzzle"/"outfile". The tree convention follows the NEWICK format |
|---|
| 490 | (as implemented in PHYLIP or CLUSTAL W): the tree topology is described |
|---|
| 491 | by the usual round brackets |
|---|
| 492 | <TT>(a,b,(c,d));</TT> |
|---|
| 493 | where branch lengths are written after a colon, e.g. <TT>a:0.22,b:0.33</TT>. |
|---|
| 494 | Support values for each branch |
|---|
| 495 | are displayed as internal node labels, i.e., they follow directly after each |
|---|
| 496 | node before the branch length to each node. Here is an example: |
|---|
| 497 | |
|---|
| 498 | <P>(Gibbon:0.1393, ((Human:0.0414, Chimpanzee:0.0538)99:0.0175, Gorilla:0.0577)98:0.0531, |
|---|
| 499 | Orangutan:0.1003); |
|---|
| 500 | |
|---|
| 501 | <P>The likelihood value of each tree is added in square brackets before |
|---|
| 502 | the tree string (e.g. "[ lh=-1621.201605 ]"). Square brackets mark comments |
|---|
| 503 | in the Newick or PHYLIP tree format. In some cases the |
|---|
| 504 | comment has to be removed before using the trees with other programs. |
|---|
| 505 | |
|---|
| 506 | <P>With the programs |
|---|
| 507 | <a href="http://taxonomy.zoology.gla.ac.uk/rod/treeview.html">TreeView</a> and |
|---|
| 508 | <a href="ftp://rdp.life.uiuc.edu/pub/RDP/programs/TreeTool/">TreeTool</a> |
|---|
| 509 | it is possible to view a tree both |
|---|
| 510 | with its branch lengths and simultaneously with the support values for the internal |
|---|
| 511 | branches (here 98% and 99%). Note, the PHYLIP programs DRAWTREE and DRAWGRAM may |
|---|
| 512 | also be used with the CLUSTAL W treefile format. However, in the current version |
|---|
| 513 | (3.5) they ignore the internal labels and simply print the tree |
|---|
| 514 | topology along with branch lengths. |
|---|
| 515 | |
|---|
| 516 | <H3> |
|---|
| 517 | <A NAME="Tree Input"></A>Tree Input</H3> |
|---|
| 518 | TREE-PUZZLE optionally also reads input trees. Unless a file name is given at |
|---|
| 519 | the command line, the input tree is read from a file named "intree". |
|---|
| 520 | If you choose the input tree option and neither the given file nor |
|---|
| 521 | "intree" is present, you will be prompted for an alternative |
|---|
| 522 | name. The format of the input trees is identical to the trees in the |
|---|
| 523 | "INFILENAME.tree"/"outtree" file. |
|---|
| 524 | However, it is sufficient to provide the tree topology only; you |
|---|
| 525 | don't need to specify branch lengths (which are ignored anyway) or |
|---|
| 526 | internal labels (which are read, stored, and written back to the |
|---|
| 527 | "INFILENAME.tree"/"outtree" file). |
|---|
| 528 | The input trees need not be unrooted; they can also be rooted. It is |
|---|
| 529 | important that sequence names in the input tree file do not contain blanks |
|---|
| 530 | (use underscores!). The trees can be multifurcating. |
|---|
| 531 | The format of the tree input file is easy: just put the |
|---|
| 532 | trees into the file. TREE-PUZZLE counts the ';' at the end of each tree description |
|---|
| 533 | to determine how many input trees there are. Any header (e.g., with the |
|---|
| 534 | number of trees) is ignored (this is useful in conjunction with programs |
|---|
| 535 | like MOLPHY that need this header). If there is more than one tree TREE-PUZZLE |
|---|
| 536 | performs the Kishino-Hasegawa test. |
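<P>The way trees are delimited by ';' and headers are skipped can be illustrated with a short Python sketch. This is not TREE-PUZZLE's own parser; the function name is hypothetical, and the sketch assumes that no ';' occurs inside a comment and that sequence names contain no blanks.

```python
def split_input_trees(text):
    """Split the content of a tree file into individual tree strings."""
    trees = []
    for chunk in text.split(";"):
        start = chunk.find("(")
        if start == -1:
            continue  # nothing but header text or trailing whitespace
        # Drop any header before the first bracket and remove all whitespace.
        trees.append("".join(chunk[start:].split()) + ";")
    return trees

# Hypothetical file content with a MOLPHY-style header giving the number of trees.
example = """2
(a,b,(c,d));
((a,c),b,d);
"""

trees = split_input_trees(example)
print(len(trees))  # 2
print(trees)
```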
|---|
| 537 | <H3> |
|---|
| 538 | <A NAME="Likelihood Mapping Output"></A>Likelihood Mapping Output</H3> |
|---|
| 539 | TREE-PUZZLE also offers likelihood mapping analysis, a method to investigate |
|---|
| 540 | support for internal branches of a tree without computing an overall tree |
|---|
| 541 | and to graphically visualize |
|---|
| 542 | phylogenetic content of a sequence alignment. The results of likelihood |
|---|
| 543 | mapping are written in ASCII to the "INFILENAME.puzzle"/"outfile" as well |
|---|
| 544 | as to a file called "INFILENAME.eps" or "outlm.eps" respectively. |
|---|
| 545 | This file contains, in Encapsulated PostScript format (EPSF), |
|---|
| 546 | a picture of the triangle that forms the basis of the likelihood mapping |
|---|
| 547 | analysis. You may print it out on a PostScript-capable printer or view |
|---|
| 548 | it with a suitable program. The "INFILENAME.eps"/"outlm.eps" file can be |
|---|
| 549 | edited by hand (it is plain ASCII text!) or by drawing programs that |
|---|
| 550 | understand the PostScript language (e.g., Adobe Illustrator). |
|---|
| 551 | <H2> |
|---|
| 552 | <A NAME="Quick Start"></A>Quick Start</H2> |
|---|
| 553 | Prepare your sequence input file and, optionally, your tree input |
|---|
| 554 | file. Then start the TREE-PUZZLE program. TREE-PUZZLE will automatically |
|---|
| 555 | choose the nucleotide or the amino acid mode. If more than 85% of |
|---|
| 556 | the characters (not counting the - and ?) in the sequences are A, C, G, |
|---|
| 557 | T, U or N, it will be assumed that the sequences consist of nucleotides. |
|---|
| 558 | If your data set contains amino acids TREE-PUZZLE suggests whether you have |
|---|
| 559 | amino acids encoded on mtDNA or on nuclear DNA, and selects the appropriate |
|---|
| 560 | model of amino acid evolution. If your data set contains nucleotides the |
|---|
| 561 | default model of sequence evolution chosen is the HKY model. Parameters |
|---|
| 562 | need not be specified; they will be estimated by a maximum likelihood |
|---|
| 563 | procedure from the data. If TREE-PUZZLE detects a usertree file stated at the |
|---|
| 564 | command line or one called "intree" it automatically switches to the input |
|---|
| 565 | tree mode. |
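<P>The 85% rule for recognizing nucleotide data described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the program's actual code: the function name and the <TT>threshold</TT> parameter are hypothetical, and upper and lower case letters are treated alike.

```python
def looks_like_nucleotides(sequences, threshold=0.85):
    """Return True if more than `threshold` of the counted characters are A, C, G, T, U or N."""
    nucleotide_chars = set("ACGTUN")
    ignored = set("-?")  # gaps and unknown characters are not counted
    counted = 0
    hits = 0
    for seq in sequences:
        for ch in seq.upper():
            if ch in ignored or ch.isspace():
                continue
            counted += 1
            if ch in nucleotide_chars:
                hits += 1
    return counted > 0 and hits / counted > threshold

dna_like = looks_like_nucleotides(["ACGT-ACGT", "ACGTNACG?"])
protein_like = looks_like_nucleotides(["MKVLAAGIT"])
print(dna_like, protein_like)  # True False
```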
|---|
| 566 | |
|---|
| 567 | <P>Then, a menu (PHYLIP "look and feel") appears with default options set. |
|---|
| 568 | It is possible to change all available options. For example, if you want |
|---|
| 569 | to incorporate rate heterogeneity you have to select option "w" as rate |
|---|
| 570 | heterogeneity is switched off by default. Then type "y" at the input prompt |
|---|
| 571 | and start the analysis. You will see a number of status messages on the |
|---|
| 572 | screen during computation. When the analysis is finished all output files |
|---|
| 573 | (e.g., "outfile", "outtree", "outdist", "outqlist", "outlm.eps", "outpstep", |
|---|
| 574 | "outptlist" or "INFILENAME.puzzle", "INFILENAME.tree", "INFILENAME.dist", |
|---|
| 575 | "INFILENAME.qlist", "INFILENAME.eps", "INFILENAME.pstep", "INFILENAME.ptorder") |
|---|
| 576 | will be in the same directory as the input files. |
|---|
| 577 | |
|---|
| 578 | <P>To obtain a high quality picture of the output tree (including node labels) |
|---|
| 579 | you might want to use the TreeView program by Roderic Page. It is |
|---|
| 580 | available free of charge and runs on MacOS and MS-Windows. It can be retrieved |
|---|
| 581 | from <A HREF="http://taxonomy.zoology.gla.ac.uk/rod/treeview.html">http://taxonomy.zoology.gla.ac.uk/rod/treeview.html</A>. |
|---|
| 582 | TreeView understands the CLUSTAL W treefile conventions, reads multifurcating |
|---|
| 583 | trees and is able to simultaneously display branch lengths and support values |
|---|
| 584 | for each branch. Open the "INFILENAME.tree"/"outtree" file with TreeView, |
|---|
| 585 | choose "Phylogram" to draw branch lengths, and select "Show internal edge |
|---|
| 586 | labels". |
|---|
| 587 | |
|---|
| 588 | <P>On Unix you can use the TreeTool program to display and |
|---|
| 589 | manipulate TREE-PUZZLE trees (see <A HREF="ftp://rdp.life.uiuc.edu/pub/RDP/programs/TreeTool/">ftp://rdp.life.uiuc.edu/pub/RDP/programs/TreeTool</A> |
|---|
| 590 | for precompiled Sun executables. A version that runs on Linux has been prepared by |
|---|
| 591 | <A HREF="mailto:cato@biochem.kth.se">Anders Holmberg</A> from the Dept. of Biochemistry at |
|---|
| 592 | the Royal Institute of Technology, Stockholm). |
|---|
| 593 | |
|---|
| 594 | <H2> |
|---|
| 595 | <A NAME="Models of Sequence Evolution"></A>Models of Sequence Evolution</H2> |
|---|
| 596 | Here we give a brief overview of the models implemented in TREE-PUZZLE. Formulas |
|---|
| 597 | are written in TeX style. |
|---|
| 598 | <H3> |
|---|
| 599 | <A NAME="Models of Substitution"></A>Models of Substitution</H3> |
|---|
| 600 | The substitution process is modelled as a reversible, time-homogeneous, stationary |
|---|
| 601 | Markov process. If the corresponding stationary nucleotide (amino acid) |
|---|
| 602 | frequencies are denoted pi_i, the most general rate matrix for the transition |
|---|
| 603 | from nucleotide (amino acid) i to j can be written as |
|---|
| 604 | <PRE> |
|---|
| 605 | | Q_{ij} pi_j for i != j |
|---|
| 606 | R_{ij} = | |
|---|
| 607 | | - Sum_m Q_{im} pi_m for i == j |
|---|
| 608 | </PRE> |
|---|
| 609 | The matrix Q_{ij} is symmetric with Q_{ii} == 0 (diagonals are zero). For |
|---|
| 610 | nucleotides the most general model built into TREE-PUZZLE is the Tamura-Nei |
|---|
| 611 | model (TN, <A HREF="#tamura1993">Tamura and Nei</A>, 1993). |
|---|
| 612 | The matrix Q_{ij} for this model equals |
|---|
| 613 | <PRE> |
|---|
| 614 | | 4*t*gamma/(gamma+1) for i -> j pyrimidine transition |
|---|
| 615 | | |
|---|
| 616 | Q_{ij} = | 4*t/(gamma+1) for i -> j purine transition |
|---|
| 617 | | |
|---|
| 618 | | 1 for i -> j transversion |
|---|
| 619 | </PRE> |
|---|
| 620 | The parameter gamma is called the "Y/R transition parameter" whereas t |
|---|
| 621 | is the "Transition/transversion parameter". If gamma is equal to 1 we |
|---|
| 622 | get the HKY model (<A HREF="#hasegawa1985">Hasegawa et al.</A>, 1985). |
|---|
| 623 | Note, the ratio of the transition and transversion |
|---|
| 624 | rates (without frequencies) is kappa = 2*t. There is a subtle but important |
|---|
| 625 | difference between the <I>transition-transversion parameter</I>, the |
|---|
| 626 | <I>expected transition-transversion ratio</I>, and the <I>observed |
|---|
| 627 | transition-transversion ratio</I>. |
|---|
| 628 | The <I>transition-transversion parameter</I> simply is a parameter in the |
|---|
| 629 | rate matrix. The <I>expected transition-transversion ratio</I> is the ratio of |
|---|
| 630 | actually occurring transitions to actually occurring transversions taking |
|---|
| 631 | into account nucleotide frequencies in the alignment. Due to saturation |
|---|
| 632 | and multiple hits not all substitutions are observable. Thus, the <I>observed |
|---|
| 633 | transition-transversion ratio</I> counts observable transitions and transversions |
|---|
| 634 | only. If the base frequencies in the HKY model are homogeneous (pi_i = |
|---|
| 635 | 0.25) HKY further reduces to the Kimura model. In this case t is identical |
|---|
| 636 | to the expected transition/transversion ratio. If t is set to 0.5 the Jukes-Cantor |
|---|
| 637 | model is obtained. The F84 model (as implemented in the various PHYLIP |
|---|
| 638 | programs, <A HREF="#felsenstein1984">Felsenstein</A>, 1984) |
|---|
| 639 | is a special case of the Tamura-Nei model. |
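<P>To make the definitions of R_{ij} and Q_{ij} above concrete, the following Python sketch builds the Tamura-Nei rate matrix from the frequencies pi_i and the parameters t and gamma. It is an illustration only: the function name and the base order A, C, G, T are assumptions made here, and no normalization of the overall rate is applied.

```python
def tn_rate_matrix(pi, t, gamma=1.0):
    """Return R for the TN model; pi lists the frequencies of A, C, G, T in that order."""
    order = "ACGT"       # assumed base order for this sketch
    purines = {"A", "G"}
    n = len(order)
    q = [[0.0] * n for _ in range(n)]
    for i, a in enumerate(order):
        for j, b in enumerate(order):
            if i == j:
                continue  # Q_{ii} == 0
            a_purine, b_purine = a in purines, b in purines
            if a_purine and b_purine:
                q[i][j] = 4.0 * t / (gamma + 1.0)          # purine transition
            elif not a_purine and not b_purine:
                q[i][j] = 4.0 * t * gamma / (gamma + 1.0)  # pyrimidine transition
            else:
                q[i][j] = 1.0                              # transversion
    # R_{ij} = Q_{ij} pi_j off the diagonal, minus the row sum on the diagonal.
    r = [[q[i][j] * pi[j] for j in range(n)] for i in range(n)]
    for i in range(n):
        r[i][i] = -sum(q[i][m] * pi[m] for m in range(n))
    return r

# gamma = 1 gives HKY; equal frequencies and t = 0.5 give Jukes-Cantor.
jc = tn_rate_matrix([0.25, 0.25, 0.25, 0.25], t=0.5)
for row in jc:
    print(row)
```

<P>With gamma = 1 both transition entries equal 2*t, which matches the relation kappa = 2*t given above, and every row of R sums to zero.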
|---|
| 640 | |
|---|
| 641 | <P>For amino acids the matrix Q_{ij} is fixed and does not contain any free |
|---|
| 642 | parameters. Depending on the type of input data six different Q_{ij} matrices |
|---|
| 643 | are available in TREE-PUZZLE. |
|---|
| 644 | The Dayhoff (<A HREF="#dayhoff1978">Dayhoff et al.</A>, 1978) and |
|---|
| 645 | JTT (<A HREF="#jones1992">Jones et al.</A>, 1992) matrices are for use with |
|---|
| 646 | proteins encoded on nuclear DNA, the mtREV24 (<A HREF="#adachi1996">Adachi |
|---|
| 647 | and Hasegawa</A>, 1996) matrix is for use with proteins encoded on mtDNA, |
|---|
| 648 | and the BLOSUM 62 (<A HREF="#henikoff1992">Henikoff and Henikoff</A>, |
|---|
| 649 | 1992) and the WAG model (<A HREF="#whelan2000">Whelan and Goldman</A>) |
|---|
| 650 | are for more distantly related amino acid sequences. |
|---|
| 651 | The WAG matrix has been inferred from a database of 3905 globular protein |
|---|
| 652 | sequences, forming 182 distinct gene families spanning a broad range of |
|---|
| 653 | evolutionary distances (<A HREF="#whelan2000">Whelan and Goldman</A>). |
|---|
| 654 | |
|---|
| 655 | The VT model is based on a new estimator for amino acid replacement rates, |
|---|
| 656 | the resolvent method. The VT matrix has been computed from a large set |
|---|
| 657 | of alignments of varying degrees of divergence. Hence VT is for use with |
|---|
| 658 | proteins of distant relatedness as well (<A HREF="#mueller2000">Mueller and Vingron</A>, 2000). |
|---|
| 659 | |
|---|
| 660 | <P>For doublets (pairs of dependent nucleotides) the SH model |
|---|
| 661 | (<A HREF="#schoeniger1994">Schoeniger and von Haeseler</A>, 1994) is |
|---|
| 662 | implemented in TREE-PUZZLE. The corresponding matrix Q_{ij} reads |
|---|
| 663 | <PRE> |
|---|
| 664 | | 2*t for i -> j transition substitution |
|---|
| 665 | | |
|---|
| 666 | Q_{ij} = | 1 for i -> j transversion substitution |
|---|
| 667 | | |
|---|
| 668 | | 0 for i -> j two substitutions |
|---|
| 669 | </PRE> |
|---|
| 670 | The SH model is basically an F81 model |
|---|
| 671 | (<A HREF="#felsenstein1981">Felsenstein</A>, 1981) for single substitutions |
|---|
| 672 | in doublets. |
|---|
| 673 | <H3> |
|---|
| 674 | <A NAME="Models of Rate Heterogeneity"></A>Models of Rate Heterogeneity</H3> |
|---|
| 675 | Rate heterogeneity is taken into account by considering invariable sites |
|---|
| 676 | and by introducing Gamma-distributed rates for the variable sites. |
|---|
| 677 | |
|---|
| 678 | <P>For invariable sites the parameter theta ("Fraction of invariable sites") |
|---|
| 679 | determines the probability of a given site to be invariable. If a site |
|---|
| 680 | is invariable the probability for the constant site patterns is pi_i, the |
|---|
| 681 | frequency of each nucleotide (amino acid). |
|---|
| 682 | |
|---|
| 683 | <P>The rates r for variable sites are determined by a discrete Gamma |
|---|
| 684 | distribution that approximates the continuous Gamma distribution |
|---|
| 685 | <PRE> |
|---|
| 686 | g(r) = alpha^{alpha} r^{alpha-1} / ( e^{alpha r} Gamma(alpha) ) |
|---|
| 691 | </PRE> |
|---|
| 692 | where the parameter alpha ranges from alpha = infinity (no rate heterogeneity) |
|---|
| 693 | to alpha &lt; 1 (strong heterogeneity). The expected value of r under this |
|---|
| 694 | distribution is 1. |
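<P>The statement that the rates have mean 1 can be checked numerically. The Python sketch below evaluates the continuous density g(r) = alpha^{alpha} r^{alpha-1} e^{-alpha r} / Gamma(alpha) and integrates it with a simple midpoint rule. The function names, the value alpha = 2 and the integration range are choices made for this illustration; the discrete approximation used by TREE-PUZZLE is not reproduced here.

```python
import math

def gamma_rate_density(r, alpha):
    """Density g(r) of the mean-one Gamma distribution with shape alpha (r > 0)."""
    log_g = (alpha * math.log(alpha) + (alpha - 1.0) * math.log(r)
             - alpha * r - math.lgamma(alpha))
    return math.exp(log_g)

def midpoint_integral(func, upper, steps=40000):
    """Midpoint-rule approximation of the integral of func over (0, upper)."""
    h = upper / steps
    return h * sum(func((k + 0.5) * h) for k in range(steps))

alpha = 2.0  # hypothetical shape parameter chosen for illustration
total = midpoint_integral(lambda r: gamma_rate_density(r, alpha), 40.0)
mean = midpoint_integral(lambda r: r * gamma_rate_density(r, alpha), 40.0)
print(round(total, 4), round(mean, 4))  # both values should be close to 1
```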
|---|
| 695 | |
|---|
| 696 | <P>A mixed model of rate heterogeneity (Gamma plus invariable sites) |
|---|
| 697 | is also available. In this case the total rate heterogeneity rho |
|---|
| 698 | (as defined by <A HREF="#gu1995">Gu et al.</A>, 1995) is computed as |
|---|
| 699 | rho = (1 + theta*alpha)/(1 + alpha). |
|---|
| 700 | |
|---|
| 701 | <H2> |
|---|
| 702 | <A NAME="Options Available"></A>Available Options</H2> |
|---|
| 703 | All options can be selected and changed after TREE-PUZZLE has read the input |
|---|
| 704 | file. Depending on the input files options are preselected and displayed |
|---|
| 705 | in a menu ("PHYLIP look and feel"): |
|---|
| 706 | <PRE> |
|---|
| 707 | GENERAL OPTIONS |
|---|
| 708 | b Type of analysis? Tree reconstruction |
|---|
| 709 | k Tree search procedure? Quartet puzzling |
|---|
| 710 | v Approximate quartet likelihood? No |
|---|
| 711 | u List unresolved quartets? No |
|---|
| 712 | n Number of puzzling steps? 1000 |
|---|
| 713 | j List puzzling step trees? No |
|---|
| 714 | o Display as outgroup? Gibbon |
|---|
| 715 | z Compute clocklike branch lengths? No |
|---|
| 716 | e Parameter estimates? Approximate (faster) |
|---|
| 717 | x Parameter estimation uses? Neighbor-joining tree |
|---|
| 718 | SUBSTITUTION PROCESS |
|---|
| 719 | d Type of sequence input data? Nucleotides |
|---|
| 720 | m Model of substitution? HKY (Hasegawa et al. 1985) |
|---|
| 721 | t Transition/transversion parameter? Estimate from data set |
|---|
| 722 | f Nucleotide frequencies? Estimate from data set |
|---|
| 723 | RATE HETEROGENEITY |
|---|
| 724 | w Model of rate heterogeneity? Uniform rate |
|---|
| 725 | |
|---|
| 726 | Quit [q], confirm [y], or change [menu] settings: |
|---|
| 727 | </PRE> |
|---|
| 728 | By typing the letters shown in the menu you can either change settings |
|---|
| 729 | or enter new parameters. Some options (for example "m" and "w") can be |
|---|
| 730 | invoked several times to switch through a number of different settings. |
|---|
| 731 | The parameters of the models of sequence evolution can be estimated from |
|---|
| 732 | the data by a variety of procedures based on maximum likelihood. The analysis |
|---|
| 733 | is started by typing "y" at the input prompt. To quit the program |
|---|
| 734 | type "q". |
|---|
| 735 | |
|---|
| 736 | <P>The following table lists in alphabetical order all TREE-PUZZLE options. |
|---|
| 737 | Be aware, however, that not all of them are accessible at the same time: |
|---|
| 738 | <TABLE CELLPADDING=2 > |
|---|
| 739 | <TR VALIGN=TOP> |
|---|
| 740 | <TD> |
|---|
| 741 | <CENTER><B>Option</B></CENTER> |
|---|
| 742 | </TD> |
|---|
| 743 | <TD> |
|---|
| 744 | <CENTER><B>Description</B></CENTER> |
|---|
| 745 | </TD> |
|---|
| 746 | </TR> |
|---|
| 747 | |
|---|
| 748 | <TR VALIGN=TOP> |
|---|
| 749 | <TD> |
|---|
| 750 | <CENTER>a</CENTER> |
|---|
| 751 | </TD> |
|---|
| 752 | <TD>Gamma rate heterogeneity parameter alpha. This is the so-called shape |
|---|
| 753 | parameter of the Gamma distribution.</TD> |
|---|
| 754 | </TR> |
|---|
| 755 | |
|---|
| 756 | <TR VALIGN=TOP> |
|---|
| 757 | <TD> |
|---|
| 758 | <CENTER>b</CENTER> |
|---|
| 759 | </TD> |
|---|
| 760 | <TD>Type of analysis. Allows switching between tree reconstruction by maximum |
|---|
| 761 | likelihood and likelihood mapping.</TD> |
|---|
| 762 | </TR> |
|---|
| 763 | |
|---|
| 764 | <TR VALIGN=TOP> |
|---|
| 765 | <TD> |
|---|
| 766 | <CENTER>c</CENTER> |
|---|
| 767 | </TD> |
|---|
| 768 | <TD>Number of rate categories (4-16) for the discrete Gamma distribution |
|---|
| 769 | (rate heterogeneity).</TD> |
|---|
| 770 | </TR> |
|---|
| 771 | |
|---|
| 772 | <TR VALIGN=TOP> |
|---|
| 773 | <TD> |
|---|
| 774 | <CENTER>d</CENTER> |
|---|
| 775 | </TD> |
|---|
| 776 | <TD>Data type. Specifies whether nucleotide, amino acid sequences, or |
|---|
| 777 | two-state data serve as input. The default is automatically set by |
|---|
| 778 | inspection of the input data. |
|---|
| 779 | After TREE-PUZZLE has selected an appropriate data type (marked by 'Auto:') |
|---|
| 780 | the 'd'-option changes the type in the following order: |
|---|
| 781 | selected type -> Nucleotides -> Amino acids -> automatically selected type.</TD> |
|---|
| 782 | </TR> |
|---|
| 783 | |
|---|
| 784 | <TR VALIGN=TOP> |
|---|
| 785 | <TD> |
|---|
| 786 | <CENTER>e</CENTER> |
|---|
| 787 | </TD> |
|---|
| 788 | <TD>Approximation option. Determines whether an approximate or the exact |
|---|
| 789 | likelihood function is used to estimate parameters of the models of sequence |
|---|
| 790 | evolution. The approximate likelihood function is in most cases sufficient |
|---|
| 791 | and is faster.</TD> |
|---|
| 792 | </TR> |
|---|
| 793 | |
|---|
| 794 | <TR VALIGN=TOP> |
|---|
| 795 | <TD> |
|---|
| 796 | <CENTER>f</CENTER> |
|---|
| 797 | </TD> |
|---|
| 798 | <TD>Base frequencies. The maximum likelihood calculation needs the frequency |
|---|
| 799 | of each nucleotide (amino acid, doublet) as input. TREE-PUZZLE estimates these |
|---|
| 800 | values from the sequence input data. This option allows specification of |
|---|
| 801 | other values.</TD> |
|---|
| 802 | </TR> |
|---|
| 803 | |
|---|
| 804 | <TR VALIGN=TOP> |
|---|
| 805 | <TD> |
|---|
| 806 | <CENTER>g</CENTER> |
|---|
| 807 | </TD> |
|---|
| 808 | <TD>Group sequences in clusters. Allows you to define clusters of sequences |
|---|
| 809 | as needed for the likelihood mapping analysis. Only available when likelihood |
|---|
| 810 | mapping is selected ("b" option).</TD> |
|---|
| 811 | </TR> |
|---|
| 812 | |
|---|
| 813 | <TR VALIGN=TOP> |
|---|
| 814 | <TD> |
|---|
| 815 | <CENTER>h</CENTER> |
|---|
| 816 | </TD> |
|---|
| 817 | <TD>Codon positions or definition of doublets. For nucleotide data only. |
|---|
| 818 | If the TN or HKY model of substitution is used and the number of sites |
|---|
| 819 | in the alignment is a multiple of three the analysis can be restricted |
|---|
| 820 | to each of the three codon positions and to the 1st and 2nd positions. |
|---|
| 821 | If the SH model is used this option allows you to specify that the 1st and |
|---|
| 822 | 2nd codon positions in the alignment define a doublet.</TD> |
|---|
| 823 | </TR> |
|---|
| 824 | |
|---|
| 825 | <TR VALIGN=TOP> |
|---|
| 826 | <TD> |
|---|
| 827 | <CENTER>i</CENTER> |
|---|
| 828 | </TD> |
|---|
| 829 | <TD>Fraction of invariable sites. Probability of a site to be invariable. |
|---|
| 830 | This parameter can be estimated from the data by TREE-PUZZLE |
|---|
| 831 | (only if the approximation option for the likelihood function is |
|---|
| 832 | turned off).</TD> |
|---|
| 833 | </TR> |
|---|
| 834 | |
|---|
| 835 | <TR VALIGN=TOP> |
|---|
| 836 | <TD> |
|---|
| 837 | <CENTER>j</CENTER> |
|---|
| 838 | </TD> |
|---|
| 839 | <TD>List puzzling steps trees. Writes all intermediate trees (puzzling |
|---|
| 840 | step trees) used to compute the quartet puzzling tree into a file, either |
|---|
| 841 | as a list of topologies ordered by number of occurrences (*.ptorder), or |
|---|
| 842 | as a list of the topologies in chronological order of occurrence (*.pstep), or |
|---|
| 843 | both.</TD> |
|---|
| 844 | </TR> |
|---|
| 845 | |
|---|
| 846 | <TR VALIGN=TOP> |
|---|
| 847 | <TD> |
|---|
| 848 | <CENTER>k</CENTER> |
|---|
| 849 | </TD> |
|---|
| 850 | <TD>Tree search. Determines how the overall tree is obtained. The topology |
|---|
| 851 | is either computed with the quartet puzzling algorithm or is defined by |
|---|
| 852 | the user. Maximum likelihood branch lengths will be computed for this tree. |
|---|
| 853 | Alternatively, a maximum likelihood distance matrix only can also be computed |
|---|
| 854 | (no overall tree). </TD> |
|---|
| 855 | </TR> |
|---|
| 856 | |
|---|
| 857 | <TR VALIGN=TOP> |
|---|
| 858 | <TD> |
|---|
| 859 | <CENTER>l</CENTER> |
|---|
| 860 | </TD> |
|---|
| 861 | <TD>Location of root. Only for computation of clock-like maximum likelihood |
|---|
| 862 | branch lengths. Allows you to specify the branch where the root should be placed |
|---|
| 863 | in an unrooted tree topology. For example, in the tree (a,b,(c,d)) l = |
|---|
| 864 | 1 places the root at the branch leading to sequence a whereas l=5 places |
|---|
| 865 | the root at the internal branch.</TD> |
|---|
| 866 | </TR> |
|---|
| 867 | |
|---|
| 868 | <TR VALIGN=TOP> |
|---|
| 869 | <TD> |
|---|
| 870 | <CENTER>m</CENTER> |
|---|
| 871 | </TD> |
|---|
| 872 | <TD>Model of substitution. The following models are implemented for nucleotides: |
|---|
| 873 | the <A HREF="#tamura1993">Tamura-Nei</A> (TN) model, |
|---|
| 874 | the <A HREF="#hasegawa1985">Hasegawa et al.</A> (HKY) model, and |
|---|
| 875 | the <A HREF="#schoeniger1994">Schoeniger & von Haeseler</A> (SH) model. |
|---|
| 876 | The SH model describes the evolution of |
|---|
| 877 | pairs of dependent nucleotides (pairs are the first and the second nucleotide, |
|---|
| 878 | the third and the fourth nucleotide and so on). It allows for specification |
|---|
| 879 | of the transition-transversion ratio. The original model |
|---|
| 880 | (<A HREF="#schoeniger1994">Schoeniger & von Haeseler</A>, 1994) |
|---|
| 881 | is obtained by setting the transition-transversion parameter to 0.5. |
|---|
| 882 | The <A HREF="#jukes1969">Jukes-Cantor</A> (1969), |
|---|
| 883 | the <A HREF="#felsenstein1981">Felsenstein</A> (1981), and |
|---|
| 884 | the <A HREF="#kimura1980">Kimura</A> (1980) model are all special cases of |
|---|
| 885 | the HKY model. |
|---|
| 886 | <BR>For amino acid sequence data |
|---|
| 887 | the <A HREF="#dayhoff1978">Dayhoff et al.</A> (Dayhoff) model, |
|---|
| 888 | the <A HREF="#jones1992">Jones et al.</A> (JTT) model, |
|---|
| 889 | the <A HREF="#adachi1996">Adachi and Hasegawa</A> (mtREV24) model, |
|---|
| 890 | the <A HREF="#henikoff1992">Henikoff and Henikoff</A> (BLOSUM 62), |
|---|
| 891 | the <A HREF="#mueller2000">Mueller and Vingron</A> (VT), and |
|---|
| 892 | the <A HREF="#whelan2000">Whelan and Goldman</A> (WAG) substitution |
|---|
| 893 | model are implemented in TREE-PUZZLE. |
|---|
| 894 | The mtREV24 model describes the evolution of amino acids encoded on mtDNA, |
|---|
| 895 | and the BLOSUM 62 and VT models are for distantly related amino acid |
|---|
| 896 | sequences. |
|---|
| 897 | After TREE-PUZZLE has selected an appropriate amino acid substitution model |
|---|
| 898 | (marked by 'Auto:') the 'm'-option changes the model in the following order: |
|---|
| 899 | selected model -> Dayhoff -> JTT -> mtREV24 -> BLOSUM62 -> VT -> WAG -> |
|---|
| 900 | automatically selected model |
|---|
| 901 | <BR>For more information |
|---|
| 902 | please read the section in this manual about models of sequence evolution. |
|---|
| 903 | See also option "w" (model of rate heterogeneity).</TD> |
|---|
| 904 | </TR> |
|---|
| 905 | |
|---|
| 906 | <TR VALIGN=TOP> |
|---|
| 907 | <TD> |
|---|
| 908 | <CENTER>n</CENTER> |
|---|
| 909 | </TD> |
|---|
| 910 | <TD>If tree reconstruction is selected: number of puzzling steps. Parameter |
|---|
| 911 | of the quartet puzzling tree search. Generally, |
|---|
| 912 | the more sequences are used the more puzzling steps are advised. The default |
|---|
| 913 | value varies depending on the number of sequences (at least 1000).<br> |
|---|
| 914 | |
|---|
| 915 | If likelihood mapping is selected: number of quartets in a likelihood mapping analysis. Equal to the number |
|---|
| 916 | of dots in the likelihood mapping diagram. By default 10000 dots/quartets |
|---|
| 917 | are assumed. To use all possible quartets in clustered likelihood mapping |
|---|
| 918 | you have to specify a value of n=0. |
|---|
| 919 | </TD> |
|---|
| 920 | </TR> |
|---|
| 921 | |
|---|
| 922 | <TR VALIGN=TOP> |
|---|
| 923 | <TD> |
|---|
| 924 | <CENTER>o</CENTER> |
|---|
| 925 | </TD> |
|---|
| 926 | <TD>Outgroup. For display purposes of the unrooted quartet puzzling |
|---|
| 927 | tree only. The default outgroup is the first sequence of the data set.</TD> |
|---|
| 928 | </TR> |
|---|
| 929 | |
|---|
| 930 | <TR VALIGN=TOP> |
|---|
| 931 | <TD> |
|---|
| 932 | <CENTER>p</CENTER> |
|---|
| 933 | </TD> |
|---|
| 934 | <TD>Constrain the TN model to the F84 model. This option is only available |
|---|
| 935 | for the Tamura-Nei model. With this option the expected (!) transition-transversion |
|---|
| 936 | ratio for the F84 model has to be entered and TREE-PUZZLE computes the corresponding |
|---|
| 937 | parameters of the TN model (this depends on base frequencies of the data). |
|---|
| 938 | This allows comparison of the results of TREE-PUZZLE and the PHYLIP maximum likelihood |
|---|
| 939 | programs which use the F84 model. |
|---|
| 940 | </TD> |
|---|
| 941 | </TR> |
|---|
| 942 | |
|---|
| 943 | <TR VALIGN=TOP> |
|---|
| 944 | <TD> |
|---|
| 945 | <CENTER>q</CENTER> |
|---|
| 946 | </TD> |
|---|
| 947 | <TD>Quits analysis.</TD> |
|---|
| 948 | </TR> |
|---|
| 949 | |
|---|
| 950 | <TR VALIGN=TOP> |
|---|
| 951 | <TD> |
|---|
| 952 | <CENTER>r</CENTER> |
|---|
| 953 | </TD> |
|---|
| 954 | <TD>Y/R transition parameter. This option is only available for the TN |
|---|
| 955 | model. This parameter is the ratio of the rates for pyrimidine transitions |
|---|
| 956 | and purine transitions. You do not need to specify this parameter as TREE-PUZZLE |
|---|
| 957 | estimates it from the data. For precise definition please read the section |
|---|
| 958 | in this manual about models of sequence evolution.</TD> |
|---|
| 959 | </TR> |
|---|
| 960 | |
|---|
| 961 | <TR VALIGN=TOP> |
|---|
| 962 | <TD> |
|---|
| 963 | <CENTER>s</CENTER> |
|---|
| 964 | </TD> |
|---|
| 965 | <TD>Symmetrize doublet frequencies. This option is only available for the |
|---|
| 966 | SH model. With this option the doublet frequencies are symmetrized. For |
|---|
| 967 | example, the frequencies of "AT" and "TA" are then set to the average of both |
|---|
| 968 | frequencies.</TD> |
|---|
| 969 | </TR> |
|---|
| 970 | |
|---|
| 971 | <TR VALIGN=TOP> |
|---|
| 972 | <TD> |
|---|
| 973 | <CENTER>t</CENTER> |
|---|
| 974 | </TD> |
|---|
| 975 | <TD>Transition/transversion parameter. For nucleotide data only. You do not |
|---|
| 976 | need to specify this parameter as TREE-PUZZLE estimates it from the data. The |
|---|
| 977 | precise definition of this parameter is given in the section on models |
|---|
| 978 | of sequence evolution in this manual.</TD> |
|---|
| 979 | </TR> |
|---|
| 980 | |
|---|
| 981 | <TR VALIGN=TOP> |
|---|
| 982 | <TD> |
|---|
| 983 | <CENTER>u</CENTER> |
|---|
| 984 | </TD> |
|---|
| 985 | <TD>Show unresolved quartets. During the quartet puzzling tree search TREE-PUZZLE |
|---|
| 986 | counts the number of unresolved quartet trees. An unresolved quartet is |
|---|
| 987 | a quartet where the maximum likelihood values for each of the three possible |
|---|
| 988 | quartet topologies are so similar that it is not possible to prefer one |
|---|
| 989 | of them (<A HREF="#strimmer1997">Strimmer, Goldman, and von Haeseler</A>, 1997). |
|---|
| 990 | If this option is selected you will get a detailed list of all starlike |
|---|
| 991 | quartets. Note, for some data |
|---|
| 992 | sets there may be a lot of unresolved quartets. In this case a list of |
|---|
| 993 | all unresolved quartets is probably not very useful and also needs a lot |
|---|
| 994 | of disk space.</TD> |
|---|
| 995 | </TR> |
|---|
| 996 | |
|---|
| 997 | <TR VALIGN=TOP> |
|---|
| 998 | <TD> |
|---|
| 999 | <CENTER>v</CENTER> |
|---|
| 1000 | </TD> |
|---|
| 1001 | <TD>Approximate quartet likelihood. For the quartet puzzling tree search |
|---|
| 1002 | only. An exact maximum likelihood needs to be computed only for very small |
|---|
| 1003 | data sets. For larger data sets this option should always be turned |
|---|
| 1004 | on.</TD> |
|---|
| 1005 | </TR> |
|---|
| 1006 | |
|---|
| 1007 | <TR VALIGN=TOP> |
|---|
| 1008 | <TD> |
|---|
| 1009 | <CENTER>w</CENTER> |
|---|
| 1010 | </TD> |
|---|
| 1011 | <TD>Model of rate heterogeneity. TREE-PUZZLE provides several different models |
|---|
| 1012 | of rate heterogeneity: uniform rate over all sites (rate homogeneity), |
|---|
| 1013 | Gamma distributed rates, two rates (1 invariable + 1 variable), and a mixed |
|---|
| 1014 | model (1 invariable rate + Gamma distributed rates). All necessary parameters |
|---|
| 1015 | can be estimated by TREE-PUZZLE. Note that whenever invariable sites are taken |
|---|
| 1016 | into account the parameter estimation will invoke the "e" option to use |
|---|
| 1017 | an exact likelihood function. For more detailed information please read |
|---|
| 1018 | the section in this manual about models of sequence evolution. See also |
|---|
| 1019 | option "m" (model of substitution).</TD> |
|---|
| 1020 | </TR> |
|---|
| 1021 | |
|---|
| 1022 | <TR VALIGN=TOP> |
|---|
| 1023 | <TD> |
|---|
| 1024 | <CENTER>x</CENTER> |
|---|
| 1025 | </TD> |
|---|
| 1026 | <TD>Selects the methods used in the estimation of the model parameters. |
|---|
| 1027 | Neighbor-joining tree means that a NJ tree is used to estimate the parameters. |
|---|
| 1028 | Quartet sampling means that a number of random sets of four sequences are |
|---|
| 1029 | selected to estimate parameters.</TD> |
|---|
| 1030 | </TR> |
|---|
| 1031 | |
|---|
| 1032 | <TR VALIGN=TOP> |
|---|
| 1033 | <TD> |
|---|
| 1034 | <CENTER>y</CENTER> |
|---|
| 1035 | </TD> |
|---|
| 1036 | <TD>Starts analysis.</TD> |
|---|
| 1037 | </TR> |
|---|
| 1038 | |
|---|
| 1039 | <TR VALIGN=TOP> |
|---|
| 1040 | <TD> |
|---|
| 1041 | <CENTER>z</CENTER> |
|---|
| 1042 | </TD> |
|---|
| 1043 | <TD>Computation of clock-like maximum likelihood branch lengths. This option |
|---|
| 1044 | also invokes the likelihood ratio clock test.</TD> |
|---|
| 1045 | </TR> |
|---|
| 1046 | </TABLE> |
|---|
| 1047 | |
|---|
| 1048 | <H2> |
|---|
| 1049 | <A NAME="Other Features"></A>Other Features</H2> |
|---|
| 1050 | For nucleotide data TREE-PUZZLE computes the expected transition/transversion |
|---|
| 1051 | ratio and the expected pyrimidine transition/purine transition ratio |
|---|
| 1052 | corresponding to the selected model. Base frequencies play an important |
|---|
| 1053 | role in the calculation of both numbers. |
|---|
| 1054 | |
|---|
| 1055 | <P>TREE-PUZZLE also tests with a 5% level chi-square-test whether the base composition |
|---|
| 1056 | of each sequence is identical to the average base composition of the whole |
|---|
| 1057 | alignment. All sequences with deviating composition are listed in the TREE-PUZZLE |
|---|
| 1058 | report file. It is desired that no sequence (possibly except for the outgroup) |
|---|
| 1059 | has a deviating base composition. Otherwise a basic assumption implicit |
|---|
| 1060 | in the maximum likelihood calculation is violated. |
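<P>The composition check described above can be sketched as follows. This is
only an illustration in Python, not TREE-PUZZLE's own code: the function names
are hypothetical, and the exact test statistic used by the program is assumed
here to be the standard Pearson chi-square with 3 degrees of freedom (four
bases), compared against the 5% critical value of 7.815.

```python
# Illustrative sketch only; names and the exact statistic are assumptions.
from collections import Counter

# Chi-square critical value at the 5% level for 3 degrees of freedom.
CHI2_CRITICAL_5PCT_DF3 = 7.815


def base_counts(seq):
    """Count A, C, G, T in one sequence; gaps and ambiguity codes are ignored."""
    counts = Counter(seq.upper())
    return [counts[base] for base in "ACGT"]


def deviating_sequences(alignment):
    """Return the names of sequences whose composition deviates from the average."""
    per_seq = {name: base_counts(seq) for name, seq in alignment.items()}
    totals = [sum(c[i] for c in per_seq.values()) for i in range(4)]
    grand_total = sum(totals)
    avg_freq = [t / grand_total for t in totals]

    flagged = []
    for name, observed in per_seq.items():
        n = sum(observed)
        expected = [f * n for f in avg_freq]
        # Pearson chi-square: sum of (observed - expected)^2 / expected.
        chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
        if chi2 > CHI2_CRITICAL_5PCT_DF3:
            flagged.append(name)
    return flagged
```

A sequence flagged by such a check would correspond to one listed as deviating
in the report file.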
|---|
| 1061 | |
|---|
| 1062 | <P>A hidden feature of TREE-PUZZLE (since version 2.5) is the employment of |
|---|
| 1063 | a weighting scheme of quartets (<A HREF="#strimmer1997">Strimmer, Goldman, |
|---|
| 1064 | and von Haeseler</A>, 1997) in the quartet puzzling tree search. |
|---|
| 1065 | |
|---|
| 1066 | <P>TREE-PUZZLE also computes the average distance between all pairs of sequences |
|---|
| 1067 | (maximum likelihood distances). The average distances can be viewed as |
|---|
| 1068 | a rough measure for the overall sequence divergence. |
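<P>As a minimal sketch of this summary statistic (in Python, with a
hypothetical function name), the average is taken over all distinct pairs,
i.e. over the upper triangle of a symmetric distance matrix:

```python
# Illustrative sketch; the function name is an assumption.
def average_pairwise_distance(dist):
    """Mean of all pairwise distances (upper triangle of a symmetric matrix)."""
    n = len(dist)
    pairs = [dist[i][j] for i in range(n) for j in range(i + 1, n)]
    return sum(pairs) / len(pairs)
```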
|---|
| 1069 | |
|---|
| 1070 | <P>If more than one input tree is provided TREE-PUZZLE uses the |
|---|
| 1071 | <A HREF="#kishino1989">Kishino-Hasegawa</A> test (1989) to check which |
|---|
| 1072 | trees are significantly worse than the best tree. |
|---|
| 1073 | |
|---|
| 1074 | <P>If clock-like maximum-likelihood branch lengths are computed TREE-PUZZLE |
|---|
| 1075 | checks with the help of a likelihood-ratio test |
|---|
| 1076 | (<A HREF="#felsenstein1988">Felsenstein</A>, 1988) whether |
|---|
| 1077 | the data set is clock-like. |
|---|
| 1078 | |
|---|
| 1079 | <P>TREE-PUZZLE also detects sequences that occur more than once in the data |
|---|
| 1080 | and that therefore can be removed from the data set to speed up analysis. |
|---|
| 1081 | |
|---|
| 1082 | <P>If rate heterogeneity is taken into account in the analysis TREE-PUZZLE also |
|---|
| 1083 | computes the most probable assignment of rate categories to sequence positions, |
|---|
| 1084 | according to <A HREF="#felsenstein1996">Felsenstein and Churchill</A> (1996). |
|---|
| 1085 | |
|---|
| 1086 | <H2> |
|---|
| 1087 | <A NAME="Interpretation and Hints"></A>Interpretation and Hints</H2> |
|---|
| 1088 | |
|---|
| 1089 | <H3> |
|---|
| 1090 | <A NAME="Quartet Puzzling Support Values"></A>Quartet Puzzling Support |
|---|
| 1091 | Values</H3> |
|---|
| 1092 | The quartet puzzling (QP) tree search estimates support values for each |
|---|
| 1093 | internal branch. They can be interpreted in much the same way as |
|---|
| 1094 | bootstrap values (though they should not be confused with them). |
|---|
| 1095 | Branches showing a QP reliability from 90% to 100% can be considered |
|---|
| 1096 | very strongly supported. Branches with lower reliability (> 70%) can |
|---|
| 1097 | in principle also be trusted, but in this case it is advisable to |
|---|
| 1098 | check how well the respective internal branch does in comparison to other |
|---|
| 1099 | branches in the tree (i.e. check relative reliability). |
|---|
| 1100 | If you are interested in a branch with a low confidence it is also |
|---|
| 1101 | important to check the alternative groupings that are not included |
|---|
| 1102 | in the QP tree (they are listed in the TREE-PUZZLE report file in *.** format). |
|---|
| 1103 | There should be a substantial gap between the lowest reliability |
|---|
| 1104 | value of the QP tree and |
|---|
| 1105 | the most frequent grouping that is not included in the QP tree. |
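<P>The rules of thumb above can be written down as a small helper. This is a
hedged sketch in Python: the thresholds (90% and 70%) come from the text, while
the function names and the wording of the labels are assumptions made for
illustration.

```python
# Illustrative sketch; function names and label wording are assumptions.
def classify_support(value):
    """Map a QP support value (in percent) to the interpretation given in the text."""
    if value >= 90:
        return "very strongly supported"
    if value > 70:
        return "can in principle be trusted; compare with other branches"
    return "low; check alternative groupings in the report file"


def support_gap(tree_support, excluded_grouping_frequencies):
    """Gap between the weakest branch of the QP tree and the most frequent
    grouping that is not part of the QP tree (both in percent)."""
    return min(tree_support) - max(excluded_grouping_frequencies)
```

For example, a QP tree with branch support values 98, 76, and 64 and excluded
groupings occurring in 31% and 12% of the puzzling steps has a gap of 33
percentage points.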
|---|
| 1106 | <H3> |
|---|
| 1107 | <A NAME="Percentage of Unresolved Quartets"></A>Percentage of Unresolved |
|---|
| 1108 | Quartets</H3> |
|---|
| 1109 | TREE-PUZZLE computes the number and the percentage of completely unresolved |
|---|
| 1110 | maximum likelihood quartets. An unresolved quartet is a quartet where the |
|---|
| 1111 | maximum likelihood values for each of the three possible quartet topologies |
|---|
| 1112 | are so similar that it is not possible to prefer one of them |
|---|
| 1113 | (<A HREF="#strimmer1997">Strimmer, Goldman, and von Haeseler</A>, 1997). |
|---|
| 1114 | The percentage of the unresolved quartets |
|---|
| 1115 | among all possible quartets is an indicator of the suitability of the data |
|---|
| 1116 | for phylogenetic analysis. A high percentage usually results in a highly |
|---|
| 1117 | multifurcating quartet puzzling tree. If you only have a few unresolved |
|---|
| 1118 | quartets we recommend invoking option "u" to get a list of all these quartets. |
|---|
| 1119 | In a likelihood mapping analysis the percentage of completely unresolved |
|---|
| 1120 | quartets is shown in the central region of the triangle diagram. |
|---|
| 1121 | <H3> |
|---|
| 1122 | <A NAME="Automatic Parameter Estimation"></A>Automatic Parameter Estimation</H3> |
|---|
| 1123 | TREE-PUZZLE estimates both the parameters of the models of substitution (TN, |
|---|
| 1124 | HKY) and of the model of rate variation (Gamma distribution, fraction of |
|---|
| 1125 | invariable sites) without prior knowledge of an overall tree by a number |
|---|
| 1126 | of different strategies based on maximum likelihood. For all estimated |
|---|
| 1127 | parameters a corresponding standard error (S.E.) is computed. If you have |
|---|
| 1128 | good arguments to choose a different set of parameters than the values |
|---|
| 1129 | obtained by TREE-PUZZLE don't hesitate to use them. If sequences are extremely |
|---|
| 1130 | similar it is very hard for every algorithm to extract information about |
|---|
| 1131 | the model of substitution from the data set. Also, be careful if the |
|---|
| 1132 | estimated parameter values |
|---|
| 1133 | are very close to the internal upper and lower bounds: |
|---|
| 1134 | <TABLE CELLPADDING=2 > |
|---|
| 1135 | <TR VALIGN=TOP> |
|---|
| 1136 | <TD><B>Parameter (Symbol)</B> </TD> |
|---|
| 1137 | |
|---|
| 1138 | <TD><B>Minimal Value</B> </TD> |
|---|
| 1139 | |
|---|
| 1140 | <TD><B>Maximal Value</B> </TD> |
|---|
| 1141 | </TR> |
|---|
| 1142 | |
|---|
| 1143 | <TR VALIGN=TOP> |
|---|
| 1144 | <TD>Transition/transversion parameter (t) </TD> |
|---|
| 1145 | |
|---|
| 1146 | <TD>0.20 </TD> |
|---|
| 1147 | |
|---|
| 1148 | <TD>30.00 </TD> |
|---|
| 1149 | </TR> |
|---|
| 1150 | |
|---|
| 1151 | <TR VALIGN=TOP> |
|---|
| 1152 | <TD>Y/R transition parameter (gamma) </TD> |
|---|
| 1153 | |
|---|
| 1154 | <TD>0.10 </TD> |
|---|
| 1155 | |
|---|
| 1156 | <TD>6.00 </TD> |
|---|
| 1157 | </TR> |
|---|
| 1158 | |
|---|
| 1159 | <TR VALIGN=TOP> |
|---|
| 1160 | <TD>Fraction of invariable sites (theta) </TD> |
|---|
| 1161 | |
|---|
| 1162 | <TD>0.00 </TD> |
|---|
| 1163 | |
|---|
| 1164 | <TD>0.99 </TD> |
|---|
| 1165 | </TR> |
|---|
| 1166 | |
|---|
| 1167 | <TR VALIGN=TOP> |
|---|
| 1168 | <TD>Gamma rate heterogeneity parameter (alpha) </TD> |
|---|
| 1169 | |
|---|
| 1170 | <TD>0.01 </TD> |
|---|
| 1171 | |
|---|
| 1172 | <TD>99 </TD> |
|---|
| 1173 | </TR> |
|---|
| 1174 | </TABLE> |
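<P>A quick way to spot estimates that sit close to these bounds is sketched
below in Python. The bounds are copied from the table above; the dictionary
keys, the function name, and the tolerance of 1% of the allowed range are
assumptions chosen for illustration.

```python
# Illustrative sketch; keys, function name, and tolerance are assumptions.
BOUNDS = {
    "ts_tv_parameter": (0.20, 30.00),           # transition/transversion parameter
    "yr_transition_parameter": (0.10, 6.00),    # Y/R transition parameter
    "fraction_invariable_sites": (0.00, 0.99),  # fraction of invariable sites
    "gamma_alpha": (0.01, 99.0),                # Gamma rate heterogeneity parameter
}


def near_bound(name, estimate, rel_tol=0.01):
    """True if the estimate lies within rel_tol of the range from either bound."""
    lower, upper = BOUNDS[name]
    margin = rel_tol * (upper - lower)
    return margin >= estimate - lower or margin >= upper - estimate
```

An estimate of alpha = 98.9, for instance, would be reported as close to the
upper bound, whereas a transition/transversion parameter of 4.0 would not.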
|---|
| 1175 | |
|---|
| 1176 | <H3> |
|---|
| 1177 | <A NAME="Likelihood Mapping"></A>Likelihood Mapping</H3> |
|---|
| 1178 | Likelihood mapping (<A HREF="#strimmer1997">Strimmer and von Haeseler</A>, |
|---|
| 1179 | 1997) is a method to analyze the support for internal branches in a tree |
|---|
| 1180 | without having to compute an overall tree. |
|---|
| 1181 | Every internal branch in a completely resolved tree defines |
|---|
| 1182 | up to four clusters of sequences. Sometimes only the relationships of these |
|---|
| 1183 | groups are of interest and not details of the structure of the clusters |
|---|
| 1184 | themselves. Then a likelihood mapping analysis is sufficient. |
|---|
| 1185 | The corresponding likelihood mapping triangle diagrams (as contained in |
|---|
| 1186 | various output files generated by TREE-PUZZLE) will |
|---|
| 1187 | elucidate the possible relationships in detail. |
|---|
| 1188 | |
|---|
| 1189 | <H3><A NAME="Batch Mode"></A>Batch Mode</H3> |
|---|
| 1190 | Running TREE-PUZZLE from a Unix batch file is straightforward despite the lack |
|---|
| 1191 | of command switches. For example, to run TREE-PUZZLE with the transition/transversion |
|---|
| 1192 | parameter equal to 10 the following lines in a batch file are sufficient: |
|---|
| 1193 | <PRE> |
|---|
| 1194 | puzzle << ! |
|---|
| 1195 | t |
|---|
| 1196 | 10 |
|---|
| 1197 | y |
|---|
| 1198 | ! |
|---|
| 1199 | </PRE> |
|---|
| 1200 | All other parameters can also be accessed the same way. |
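<P>The same idea can be driven from a script. The following Python sketch
assumes, as in the batch file above, that the executable is called
<TT>puzzle</TT> and is on the search path; the function names are hypothetical.
It simply pipes the menu keystrokes to the program's standard input.

```python
# Illustrative sketch; function names are assumptions.
import subprocess


def menu_input(keystrokes):
    """Join menu keystrokes into the text the program reads from standard input."""
    return "".join(key + "\n" for key in keystrokes)


def run_puzzle(keystrokes, executable="puzzle"):
    """Start the program and feed it the keystrokes, one per line."""
    return subprocess.run(
        [executable],
        input=menu_input(keystrokes),
        text=True,
        check=True,
    )
```

Calling <TT>run_puzzle(["t", "10", "y"])</TT> would send the same three lines
as the batch file shown above.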
|---|
| 1201 | |
|---|
| 1202 | <H2> |
|---|
| 1203 | <A NAME="Limits and Error Messages"></A>Limits and Error Messages</H2> |
|---|
| 1204 | TREE-PUZZLE has a built-in limit to allow data sets only up to 257 sequences |
|---|
| 1205 | in order to avoid overflow of internal integer variables. At least 32767 |
|---|
| 1206 | sites should be possible depending on the compiler used. Computation time |
|---|
| 1207 | will be the largest constraint even if sufficient computer memory is available. |
|---|
| 1208 | If rate heterogeneity is taken into account every additional category slows |
|---|
| 1209 | down the overall computation by the amount of time needed for one complete |
|---|
| 1210 | run assuming rate homogeneity. |
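<P>To get a feeling for these limits, the following Python sketch computes the
number of distinct quartets for a given number of sequences (n choose 4) and
the rough runtime factor implied by the statement above about rate categories.
The function names are hypothetical, and the runtime factor is only the simple
proportionality described in the text, not a measured value.

```python
# Illustrative sketch; function names are assumptions.
from math import comb


def quartet_count(n_sequences):
    """Number of distinct sets of four sequences (n choose 4)."""
    return comb(n_sequences, 4)


def relative_runtime(n_rate_categories):
    """Runtime relative to one rate-homogeneous run, assuming each category
    costs about one complete homogeneous run."""
    return max(1, n_rate_categories)
```

For 50 sequences this gives 230300 quartets, and for the maximum of 257
sequences 177556160 quartets; an analysis with 8 rate categories would take
roughly 8 times as long as one assuming rate homogeneity.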
|---|
| 1211 | |
|---|
| 1212 | <P>If problems are encountered TREE-PUZZLE terminates program execution and |
|---|
| 1213 | returns a plain text error message. Depending on the severity errors can be |
|---|
| 1214 | classified into three groups: |
|---|
| 1215 | <TABLE CELLPADDING=2 > |
|---|
| 1216 | <TR VALIGN=TOP> |
|---|
| 1217 | <TD>"HALT " errors: </TD> |
|---|
| 1218 | |
|---|
| 1219 | <TD>Very severe. You should never ever see one of these messages. If so, |
|---|
| 1220 | please contact the developers! </TD> |
|---|
| 1221 | </TR> |
|---|
| 1222 | |
|---|
| 1223 | <TR VALIGN=TOP> |
|---|
| 1224 | <TD>"Unable to proceed" errors: </TD> |
|---|
| 1225 | |
|---|
| 1226 | <TD>Harmless but annoying. Mostly memory errors (not enough RAM) or problems |
|---|
| 1227 | with the format of the input files. </TD> |
|---|
| 1228 | </TR> |
|---|
| 1229 | |
|---|
| 1230 | <TR VALIGN=TOP> |
|---|
| 1231 | <TD>Other errors: </TD> |
|---|
| 1232 | |
|---|
| 1233 | <TD>Completely uncritical. Occur mostly when options of TREE-PUZZLE are being |
|---|
| 1234 | set. </TD> |
|---|
| 1235 | </TR> |
|---|
| 1236 | </TABLE> |
|---|
| 1237 | On a standard machine (1996 Unix workstation) with 32 to 64 MB RAM, TREE-PUZZLE |
|---|
| 1238 | can easily do maximum likelihood tree searches including estimation of |
|---|
| 1239 | support values for data sets with 50-100 sequences. As likelihood mapping |
|---|
| 1240 | is not memory consuming and computationally quite fast it can be applied |
|---|
| 1241 | to large data sets as well. |
|---|
| 1242 | <H2> |
|---|
| 1243 | <A NAME="Are Quartets Reliable"></A>Are Quartets Reliable?</H2> |
|---|
| 1244 | Quartets may be intrinsically one of the most difficult phylogenies to |
|---|
| 1245 | resolve accurately (cf. <A HREF="#hillis1996">Hillis</A>, 1996). |
|---|
| 1246 | It has been asked whether this is |
|---|
| 1247 | a problem for quartet puzzling because it works with quartets. |
|---|
| 1248 | |
|---|
| 1249 | <P>However, this is not true. According to Hillis' findings |
|---|
| 1250 | (<A HREF="#hillis1996">Hillis</A>, 1996), |
|---|
| 1251 | quartets can be hard, but extra information helps. That is, if all you |
|---|
| 1252 | have are data on species (A, B, C, D) then it might be relatively difficult |
|---|
| 1253 | to find the correct tree for them. But if you have additional data (species |
|---|
| 1254 | E, F, G, ...) and try to find a tree for all the species, then that part |
|---|
| 1255 | of the tree relating (A, B, C, D) will more likely be correct than if you |
|---|
| 1256 | had just the data for (A, B, C, D). In Hillis' big 'model' tree, there |
|---|
| 1257 | are many examples of subsets of 4 species which in themselves might be |
|---|
| 1258 | hard to resolve correctly, but which are correctly resolved thanks to the |
|---|
| 1259 | (...large amount of...) additional data. TREE-PUZZLE (quartet puzzling) also |
|---|
| 1260 | gains advantage from extra data in the same way. Its 'understanding' or |
|---|
| 1261 | resolution of the quartet (A, B, C, D) might be incorrect, but the information |
|---|
| 1262 | on the relationships of (A, B, C, D) implicit in its treatment of (A, B, |
|---|
| 1263 | C, E), (A, B, E, D), (A, E, C, D), (E, B, C, D), (A, B, C, F), (A, B, F, |
|---|
| 1264 | D), (A, F, C, D), (F, B, C, D), (A, B, C, G), etc. etc. should overcome |
|---|
| 1265 | this problem. |
|---|
| 1266 | |
|---|
| 1267 | <P>The facts about how well TREE-PUZZLE actually works have been investigated |
|---|
| 1268 | in the <A HREF="#strimmer1996">Strimmer and von Haeseler</A> (1996) and |
|---|
| 1269 | <A HREF="#strimmer1997">Strimmer, Goldman, and von Haeseler</A> (1997) papers. |
|---|
| 1270 | Their results cannot be altered by Hillis' findings. |
|---|
| 1271 | Considered as a heuristic search for maximum likelihood trees, quartet |
|---|
| 1272 | puzzling works very well. |
|---|
| 1273 | |
|---|
| 1274 | <P>(This section follows N. Goldman, personal communication). |
|---|
| 1275 | <H2> |
|---|
| 1276 | <A NAME="Other Programs"></A>Other Programs</H2> |
|---|
| 1277 | There are a number of other very useful and widespread programs to reconstruct |
|---|
| 1278 | phylogenetic relationships and to analyse molecular sequence data that |
|---|
| 1279 | are available free of charge. Here are the URLs of some web pages that |
|---|
| 1280 | provide links to most of them (including the PHYLIP package and |
|---|
| 1281 | the MOLPHY and PAML maximum likelihood programs): |
|---|
| 1282 | <DL> |
|---|
| 1283 | |
|---|
| 1284 | <DD> |
|---|
| 1285 | Joe Felsenstein's list of programs (well-organized and pretty exhaustive):<br> |
|---|
| 1286 | <A |
|---|
| 1287 | HREF="http://evolution.genetics.washington.edu/phylip/software.html">http://evolution.genetics.washington.edu/phylip/software.html</A></DD> |
|---|
| 1288 | |
|---|
| 1289 | |
|---|
| 1290 | <DD> |
|---|
| 1291 | "Tree of Life" software page:<br> |
|---|
| 1292 | <A HREF="http://phylogeny.arizona.edu/tree/programs/programs.html">http://phylogeny.arizona.edu/tree/programs/programs.html</A></DD> |
|---|
| 1293 | |
|---|
| 1294 | |
|---|
| 1295 | <DD> |
|---|
| 1296 | European Bioinformatics Institute:<br> |
|---|
| 1297 | <A HREF="http://www.ebi.ac.uk/biocat/biocat.html">http://www.ebi.ac.uk/biocat/biocat.html</A></DD> |
|---|
| 1298 | |
|---|
| 1299 | </DL> |
|---|
| 1300 | |
|---|
| 1301 | <H2> |
|---|
| 1302 | <A NAME="Acknowledgements"></A>Acknowledgements</H2> |
|---|
| 1303 | The maximum likelihood kernel of TREE-PUZZLE is an offspring of the program |
|---|
| 1304 | NucML/ProtML version 2.2 by Jun Adachi and Masami Hasegawa (<A HREF="ftp://sunmh.ism.ac.jp/pub/molphy">ftp://sunmh.ism.ac.jp/pub/molphy</A>). |
|---|
| 1305 | We thank them for generously allowing us to use the source code of their |
|---|
| 1306 | program. |
|---|
| 1307 | We would also like to thank |
|---|
| 1308 | the <A HREF="http://www.ebi.ac.uk">European Bioinformatics Institute (EBI)</A>, |
|---|
| 1309 | the <A HREF="http://www.pasteur.fr">Institut Pasteur</A>, |
|---|
| 1310 | and the <A HREF="http://www.indiana.edu">University of Indiana</A> |
|---|
| 1311 | (i.e. Don Gilbert) |
|---|
| 1312 | for kindly distributing the TREE-PUZZLE program. |
|---|
| 1313 | |
|---|
| 1314 | We thank Stephane Bortzmeyer for his help with debugging of |
|---|
| 1315 | <EM>floating point exception</EM> errors. |
|---|
| 1316 | |
|---|
| 1317 | We also thank Peter Foster for pointing out the inconsistency |
|---|
| 1318 | in the invariable site models with respect to other programs. |
|---|
| 1319 | |
|---|
| 1320 | Finally we thank the |
|---|
| 1321 | <A HREF="http://www.dfg.de">Deutsche Forschungsgemeinschaft</A> |
|---|
| 1322 | (VI 160/3-1 and Ha 1628/4-1) and the Max-Planck-Society |
|---|
| 1323 | for financial support. |
|---|
| 1324 | |
|---|
| 1325 | <H2><A NAME="References"></A>References</H2> |
|---|
| 1326 | |
|---|
| 1327 | <A NAME="adachi1996"></A> |
|---|
| 1328 | Adachi, J., and M. Hasegawa. 1996. MOLPHY: programs for molecular phylogenetics, |
|---|
| 1329 | version 2.3. Institute of Statistical Mathematics, Tokyo. |
|---|
| 1330 | |
|---|
| 1331 | <P><A NAME="adachi1996"></A> |
|---|
| 1332 | Adachi, J., and M. Hasegawa. 1996. Model of amino acid substitution |
|---|
| 1333 | in proteins encoded by mitochondrial DNA. <I>J. Mol. Evol.</I> <B>42</B>: |
|---|
| 1334 | 459-468. |
|---|
| 1335 | |
|---|
| 1336 | <P><A NAME="dayhoff1978"></A> |
|---|
| 1337 | Dayhoff, M. O., R. M. Schwartz, and B. C. Orcutt. 1978. A model of evolutionary |
|---|
| 1338 | change in proteins. In: Dayhoff, M. O. (ed.) Atlas of Protein Sequence |
|---|
| 1339 | Structure, Vol. 5, Suppl. 3. National Biomedical Research Foundation, Washington |
|---|
| 1340 | DC, pp. 345-352. |
|---|
| 1341 | |
|---|
| 1342 | <P><A NAME="felsenstein1981"></A> |
|---|
| 1343 | Felsenstein, J. 1981. Evolutionary trees from DNA sequences: A maximum |
|---|
| 1344 | likelihood approach. <I>J. Mol. Evol.</I> <B>17</B>: 368-376. |
|---|
| 1345 | |
|---|
| 1346 | <P><A NAME="felsenstein1984"></A> |
|---|
| 1347 | Felsenstein, J. 1984. Distance methods for inferring phylogenies: |
|---|
| 1348 | A Justification. <I>Evolution</I> <B>38</B>: 16-24. |
|---|
| 1349 | |
|---|
| 1350 | <P><A NAME="felsenstein1988"></A> |
|---|
| 1351 | Felsenstein, J. 1988. Phylogenies from molecular sequences: Inference |
|---|
| 1352 | and reliability. <I>Annu. Rev. Genet.</I> <B>22</B>: 521-565. |
|---|
| 1353 | |
|---|
| 1354 | <P><A NAME="felsenstein1993"></A> |
|---|
| 1355 | Felsenstein, J. 1993. PHYLIP (Phylogeny Inference Package) version 3.5c. |
|---|
| 1356 | Distributed by the author. Department of Genetics, University of Washington, |
|---|
| 1357 | Seattle. |
|---|
| 1358 | |
|---|
| 1359 | <P><A NAME="felsenstein1996"></A> |
|---|
| 1360 | Felsenstein, J., and G.A. Churchill. 1996. A hidden Markov model approach |
|---|
| 1361 | to variation among sites in rate of evolution. <I>Mol. Biol. Evol.</I> |
|---|
| 1362 | <B>13</B>: 93-104. |
|---|
| 1363 | |
|---|
| 1364 | <P><A NAME="gropp1998"></A> |
|---|
| 1365 | Gropp, W., S. Huss-Lederman, A. Lumsdaine, E. Lusk, B. Nitzberg, |
|---|
| 1366 | W. Saphir, and M. Snir. 1998. MPI - The Complete Reference: Volume 2, |
|---|
| 1367 | The MPI Extensions. 2nd Edition, The MIT Press, Cambridge, MA. |
|---|
| 1368 | |
|---|
| 1369 | <P><A NAME="gu1995"></A> |
|---|
| 1370 | Gu, X., Y.-X. Fu, and W.-H. Li. 1995. Maximum likelihood estimation |
|---|
| 1371 | of the heterogeneity of substitution rate among nucleotide sites. <I>Mol. |
|---|
| 1372 | Biol. Evol.</I> <B>12</B>: 546-557. |
|---|
| 1373 | |
|---|
| 1374 | <P><A NAME="hasegawa1985"></A |
|---|
| 1375 | >Hasegawa, M., H. Kishino, and K. Yano. 1985. Dating of the human-ape |
|---|
| 1376 | splitting by a molecular clock of mitochondrial DNA. <I>J. Mol. Evol.</I> |
|---|
| 1377 | <B>22</B>: 160-174. |
|---|
| 1378 | |
|---|
| 1379 | <P><A NAME="henikoff1992"></A> |
|---|
| 1380 | Henikoff, S., and J. G. Henikoff. 1992. Amino acid substitution matrices |
|---|
| 1381 | from protein blocks. <I>PNAS (USA)</I> <B>89</B>:10915-10919. |
|---|
| 1382 | |
|---|
| 1383 | <P><A NAME="hillis1996"></A> |
|---|
| 1384 | Hillis, D. M. 1996. Inferring complex phylogenies. <I>Nature</I> |
|---|
| 1385 | <B>383</B>:130-131. |
|---|
| 1386 | |
|---|
| 1387 | <P><A NAME="jukes1969"></A> |
|---|
| 1388 | Jukes, T. H., and C. R. Cantor. 1969. Evolution of protein molecules. |
|---|
| 1389 | In: Munro, H. N. (ed.) Mammalian Protein Metabolism, New York: Academic |
|---|
| 1390 | Press, pp. 21-132. |
|---|
| 1391 | |
|---|
| 1392 | <P><A NAME="jones1992"></A> |
|---|
| 1393 | Jones, D. T., W. R. Taylor, and J. M. Thornton. 1992. The rapid generation |
|---|
| 1394 | of mutation data matrices from protein sequences. <I>CABIOS</I> <B>8</B>: |
|---|
| 1395 | 275-282. |
|---|
| 1396 | |
|---|
| 1397 | <P><A NAME="kimura1980"></A> |
|---|
| 1398 | Kimura, M. 1980. A simple method for estimating evolutionary rates of |
|---|
| 1399 | base substitutions through comparative studies of nucleotide sequences. |
|---|
| 1400 | <I>J. Mol. Evol.</I> <B>16</B>: 111-120. |
|---|
| 1401 | |
|---|
| 1402 | <P><A NAME="kishino1989"></A> |
|---|
| 1403 | Kishino, H., and M. Hasegawa. 1989. Evaluation of the maximum likelihood |
|---|
| 1404 | estimate of the evolutionary tree topologies from DNA sequence data, and |
|---|
| 1405 | the branching order in Hominoidea. <I>J. Mol. Evol.</I> <B>29</B>: 170-179. |
|---|
| 1406 | |
|---|
| 1407 | <P><A NAME="mueller2000"></A> |
|---|
| 1408 | Mueller, T., and M. Vingron. 2000. Modeling Amino Acid Replacement. |
|---|
| 1409 | <I>J. Comp. Biol.</I>, to appear |
|---|
| 1410 | (<A HREF="http://www.dkfz-heidelberg.de/tbi/people/tmueller/paper/paper.ps">preprint of the article</A>) |
|---|
| 1411 | |
|---|
| 1412 | <P><A NAME="saitou1987"></A> |
|---|
| 1413 | Saitou, N., and M. Nei. 1987. The neighbor-joining method: a new method |
|---|
| 1414 | for reconstructing phylogenetic trees. <I>Mol. Biol. Evol.</I> <B>4</B>: |
|---|
| 1415 | 406-425. |
|---|
| 1416 | |
|---|
| 1417 | <P><A NAME="schoeniger1994"></A> |
|---|
| 1418 | Schoeniger, M., and A. von Haeseler. 1994. A stochastic model for |
|---|
| 1419 | the evolution of autocorrelated DNA sequences. <I>Mol. Phyl. Evol.</I> |
|---|
| 1420 | <B>3</B>: 240-247. |
|---|
| 1421 | |
|---|
| 1422 | <P><A NAME="snir1998"></A> |
|---|
| 1423 | Snir, M., S. Otto, S. Huss-Lederman, D. Walker, and J. Dongarra. |
|---|
| 1424 | 1998. MPI - The Complete Reference: Volume 1, The MPI Core. 2nd Edition, |
|---|
| 1425 | The MIT Press, Cambridge, MA. |
|---|
| 1426 | |
|---|
| 1427 | <P><A NAME="strimmer1996"></A> |
|---|
| 1428 | Strimmer, K., and A. von Haeseler. 1996. Quartet puzzling: a quartet |
|---|
| 1429 | maximum likelihood method for reconstructing tree topologies. <I>Mol. Biol. |
|---|
| 1430 | Evol.</I> <B>13</B>: 964-969. |
|---|
| 1431 | |
|---|
| 1432 | <P><A NAME="strimmer1997"></A> |
|---|
| 1433 | Strimmer, K., N. Goldman, and A. von Haeseler. 1997. Bayesian probabilities |
|---|
| 1434 | and quartet puzzling. <I>Mol. Biol. Evol.</I> <B>14</B>: 210-211. |
|---|
| 1435 | |
|---|
| 1436 | <P><A NAME="strimmer1997"></A> |
|---|
| 1437 | Strimmer, K., and A. von Haeseler. 1997. Likelihood-mapping: a simple |
|---|
| 1438 | method to visualize phylogenetic content of a sequence alignment. <I>PNAS |
|---|
| 1439 | (USA).</I> <B>94</B>:6815-6819. |
|---|
| 1440 | |
|---|
| 1441 | <P><A NAME="tamura1993"></A> |
|---|
| 1442 | Tamura, K., and M. Nei. 1993. Estimation of the number of nucleotide |
|---|
| 1443 | substitutions in the control region of mitochondrial DNA in humans and |
|---|
| 1444 | chimpanzees. <I>Mol. Biol. Evol.</I> <B>10</B>: 512-526. |
|---|
| 1445 | |
|---|
| 1446 | <P><A NAME="tamura1994"></A> |
|---|
| 1447 | Tamura, K. 1994. Model selection in the estimation of the number of |
|---|
| 1448 | nucleotide substitutions. <I>Mol. Biol. Evol.</I> <B>11</B>: 154-157. |
|---|
| 1449 | |
|---|
| 1450 | <P><A NAME="thompson1994"></A> |
|---|
| 1451 | Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: Improving |
|---|
| 1452 | the sensitivity of progressive multiple sequence alignment through sequence |
|---|
| 1453 | weighting, positions-specific gap penalties and weight matrix choice. <I>Nucl. |
|---|
| 1454 | Acids Res.</I> <B>22</B>: 4673-4680. |
|---|
| 1455 | |
|---|
| 1456 | <P><A NAME="whelan2000"></A> |
|---|
| 1457 | Whelan, S., and N. Goldman. 2000. A new empirical model of |
|---|
| 1458 | amino acid evolution. <I>Manuscript in prep.</I> |
|---|
| 1459 | |
|---|
| 1460 | <P><A NAME="yang1994"></A> |
|---|
| 1461 | Yang, Z. 1994. Maximum likelihood phylogenetic estimation from DNA sequences |
|---|
| 1462 | with variable rates over sites: approximate methods. <I>J. Mol. Evol.</I> |
|---|
| 1463 | <B>39</B>:306-314. |
|---|
| 1464 | |
|---|
| 1465 | |
|---|
| 1466 | <H2> |
|---|
| 1467 | <A NAME="Known Bugs"></A>Known Bugs</H2> |
|---|
| 1468 | |
|---|
| 1469 | On Alpha based computers sometimes <EM>floating point exception</EM> |
|---|
| 1470 | errors occur. Some of those result from a bug in the malloc routine |
|---|
| 1471 | in the system routines of the Compaq operating system. We recommend |
|---|
| 1472 | using the GNU cc compiler |
|---|
| 1473 | (<TT><A HREF="http://egcs.gnu.org">http://egcs.gnu.org</A></TT>), |
|---|
| 1474 | which does not use the system malloc routine. |
|---|
| 1475 | |
|---|
| 1476 | For other occurrences of the <EM>floating point exception</EM> |
|---|
| 1477 | we need datasets and information about the operating system |
|---|
| 1478 | to reproduce and debug those errors. |
|---|
| 1479 | |
|---|
| 1480 | <H2> |
|---|
| 1481 | <A NAME="Version History"></A>Version History</H2> |
|---|
| 1482 | The TREE-PUZZLE program was first distributed in 1995 under the name |
|---|
| 1483 | PUZZLE. Since then it has |
|---|
| 1484 | been continually improved. Here is a list of the most important changes. |
|---|
| 1485 | <TABLE CELLPADDING=2 > |
|---|
| 1486 | |
|---|
| 1487 | <TR VALIGN=TOP> |
|---|
| 1488 | <TD>5.0</TD> |
|---|
| 1489 | |
|---|
| 1490 | <TD>Puzzle tree reconstruction part parallelized using the MPI standard |
|---|
| 1491 | (Message Passing Interface). |
|---|
| 1492 | <BR>Possibility added to give input file and user tree file at the command line. |
|---|
| 1493 | Output files renamed to the form PREFIX.EXTENSION, where PREFIX is the |
|---|
| 1494 | input file name or, if used, the user tree file name. |
|---|
| 1495 | The EXTENSION could be one of the following: puzzle (PUZZLE report), |
|---|
| 1496 | tree (tree file), dist (ML distance file), eps (likelihood mapping output |
|---|
| 1497 | in eps format), qlist (bad quartets), qstep (puzzling step tree IDs as they |
|---|
| 1498 | occur in the analysis), or qtorder (sorted unique list of puzzling step trees). |
|---|
| 1499 | <BR>The likelihood value is added to the treefile as a leading comment |
|---|
| 1500 | ("[ lh=x.xxx ]") to the tree string. |
|---|
| 1501 | <BR>VT (variable time) matrix (<A HREF="#mueller2000">Mueller and |
|---|
| 1502 | Vingron</A>, 2000) and WAG matrix (<A HREF="#whelan2000">Whelan and |
|---|
| 1503 | Goldman</A>, 2000) |
|---|
| 1504 | added to the AA substitution models. |
|---|
| 1505 | <BR>The Data type and AA-model options in the menu now show the |
|---|
| 1506 | automatically set type/model first. These can now be changed using 'd' or |
|---|
| 1507 | 'm' key in an order independent from the type/model selected. This makes |
|---|
| 1508 | it possible to select a desired AA substitution model or data type by |
|---|
| 1509 | piping letters to the standard input without knowing PUZZLE's preselection. |
|---|
| 1510 | <BR>Parameters are written to file when estimated before evaluation of |
|---|
| 1511 | the quartets. |
|---|
| 1512 | <BR>The inconsistency with respect to other programs in handling |
|---|
| 1513 | invariable sites has been fixed. |
|---|
| 1514 | <BR>Some minor bug fixes (e.g. the clockbug and another in the optimization |
|---|
| 1515 | routine have been fixed). |
|---|
| 1516 | </TD> |
|---|
| 1517 | </TR> |
|---|
| 1518 | |
|---|
| 1519 | <TR VALIGN=TOP> |
|---|
| 1520 | <TD>4.0.2</TD> |
|---|
| 1521 | |
|---|
| 1522 | <TD>Update to provide precompiled Windows 95/98/NT executables. In addition: |
|---|
| 1523 | Internal rearrangement of rate matrices. |
|---|
| 1524 | Improved BLOSUM 62 matrix. Endless input loop for input |
|---|
| 1525 | files restricted to 10 trials. |
|---|
| 1526 | Source code clean up to remove compile time warnings. |
|---|
| 1527 | Explicit quit option in menu. Changes in NJ tree code. |
|---|
| 1528 | Updates of documentation (address changes, correction of errors). |
|---|
| 1529 | </TD> |
|---|
| 1530 | </TR> |
|---|
| 1531 | |
|---|
| 1532 | <TR VALIGN=TOP> |
|---|
| 1533 | <TD>4.0.1</TD> |
|---|
| 1534 | |
|---|
| 1535 | <TD>Maintenance release. Correction of mtREV matrix. Fix of the "intree bug". |
|---|
| 1536 | Removal of stringent runtime-compatibility check to allow out-of-the-box compile |
|---|
| 1537 | on alpha. More accurate gamma distribution allowing 16 instead of 8 categories |
|---|
| 1538 | and ensuring a better alpha > 1.0. Update of documentation (mainly address changes). |
|---|
| 1539 | More Unix-like file layout, and change of license to GPL. |
|---|
| 1540 | </TD> |
|---|
| 1541 | </TR> |
|---|
| 1542 | |
|---|
| 1543 | <TR VALIGN=TOP> |
|---|
| 1544 | <TD>4.0 </TD> |
|---|
| 1545 | |
|---|
| 1546 | <TD>Executables for Windows 95/NT and OS/2 instead of MS-DOS. Computation |
|---|
| 1547 | of clock-like branch lengths (also for amino acids and for non-binary trees). |
|---|
| 1548 | Automatic likelihood ratio clock test. Model for two-state sequences data |
|---|
| 1549 | (0,1) included. Display of most probable assignment of rates to sites. |
|---|
| 1550 | Identification of groups of identical sequences. Possibility to read multiple |
|---|
| 1551 | input trees. Kishino-Hasegawa test to check whether trees are significantly |
|---|
| 1552 | different. BLOSUM 62 model of amino acid substitution |
|---|
| 1553 | (<A HREF="#henikoff1992">Henikoff-Henikoff</A>, 1992). |
|---|
| 1554 | Use of parameter alpha instead of eta = 1/(1+alpha) (for rate heterogeneity). |
|---|
| 1555 | |
|---|
| 1556 | Improvements to user interface. SH model can be applied to 1st and 2nd |
|---|
| 1557 | codon positions. Automatic check for compatible compiler settings. Workaround |
|---|
| 1558 | for severe runtime problem when the gcc compiler was used.</TD> |
|---|
| 1559 | </TR> |
|---|
| 1560 | |
|---|
| 1561 | <TR VALIGN=TOP> |
|---|
| 1562 | <TD>3.1 </TD> |
|---|
| 1563 | |
|---|
| 1564 | <TD>Much improved user interface to rate heterogeneity (less confusing |
|---|
| 1565 | menu, rearranged outfile, additional out-of-range check). Possibility to |
|---|
| 1566 | read rooted input trees (automatic removal of basal bifurcation). Computation |
|---|
| 1567 | of average distance between all pairs of sequences. Fix of a bug that caused |
|---|
| 1568 | PUZZLE 3.0 to crash on some systems (DEC Alpha). Cosmetic changes in program |
|---|
| 1569 | and documentation. </TD> |
|---|
| 1570 | </TR> |
|---|
| 1571 | |
|---|
| 1572 | <TR VALIGN=TOP> |
|---|
| 1573 | <TD>3.0 </TD> |
|---|
| 1574 | |
|---|
| 1575 | <TD>Rate heterogeneity included in all models of substitution (Gamma distribution |
|---|
| 1576 | plus invariable sites). Likelihood mapping analysis with Postscript output |
|---|
| 1577 | added. Much more sophisticated maximum likelihood parameter estimation |
|---|
| 1578 | for all model parameters including those of rate heterogeneity. Codon positions |
|---|
| 1579 | selectable. Update to mtREV24. New icon. Less verbose runtime messages. |
|---|
| 1580 | HTML documentation. Better internal error classification. More information |
|---|
| 1581 | in outfile (number of constant positions etc.). </TD> |
|---|
| 1582 | </TR> |
|---|
| 1583 | |
|---|
| 1584 | <TR VALIGN=TOP> |
|---|
| 1585 | <TD>2.5.1 </TD> |
|---|
| 1586 | |
|---|
| 1587 | <TD>Fix of a bug (present only in version 2.5) related to computation of |
|---|
| 1588 | the variance of the maximum likelihood branch lengths that caused occasional |
|---|
| 1589 | crashes of PUZZLE on some systems when applied to data sets containing many |
|---|
| 1590 | very similar sequences. Drop of support for non-FPU Macintosh version. |
|---|
| 1591 | Corrections in manual. </TD> |
|---|
| 1592 | </TR> |
|---|
| 1593 | |
|---|
| 1594 | <TR VALIGN=TOP> |
|---|
| 1595 | <TD>2.5 </TD> |
|---|
| 1596 | |
|---|
| 1597 | <TD>Improved QP algorithm (<A HREF="#strimmer1997">Strimmer, Goldman, and |
|---|
| 1598 | von Haeseler</A>, 1997). Bug |
|---|
| 1599 | fixes in ML engine, computation of ML distances and ML branch lengths, |
|---|
| 1600 | optional input of a user tree, F84 model added, estimation of all TN model |
|---|
| 1601 | parameters and corresponding standard errors, CLUSTAL W treefile convention |
|---|
| 1602 | adopted so that branch lengths and QP support values can be shown simultaneously, |
|---|
| 1603 | display of unresolved quartets, update of mtREV matrix, source code more |
|---|
| 1604 | compatible with some almost-ANSI compilers, more safety checks in the code. </TD> |
|---|
| 1605 | </TR> |
|---|
| 1606 | |
|---|
| 1607 | <TR VALIGN=TOP> |
|---|
| 1608 | <TD>2.4 </TD> |
|---|
| 1609 | |
|---|
| 1610 | <TD>Automatic data type recognition, chi-square test on base composition, |
|---|
| 1611 | automatic selection of best amino acid model, estimation of transition/transversion |
|---|
| 1612 | parameter, ASCII plot of quartet puzzling tree into the outfile. </TD> |
|---|
| 1613 | </TR> |
|---|
| 1614 | |
|---|
| 1615 | <TR VALIGN=TOP> |
|---|
| 1616 | <TD>2.3 </TD> |
|---|
| 1617 | |
|---|
| 1618 | <TD>More models, many usability improvements, built-in consensus tree routines, |
|---|
| 1619 | more supported systems, bug fixes, no more dependence on input order. |
|---|
| 1620 | First EBI distributed version. </TD> |
|---|
| 1621 | </TR> |
|---|
| 1622 | |
|---|
| 1623 | <TR VALIGN=TOP> |
|---|
| 1624 | <TD>2.2 </TD> |
|---|
| 1625 | |
|---|
| 1626 | <TD>Optimized internal data structure requiring much less computer memory. |
|---|
| 1627 | Bug fixes. </TD> |
|---|
| 1628 | </TR> |
|---|
| 1629 | |
|---|
| 1630 | <TR VALIGN=TOP> |
|---|
| 1631 | <TD>2.1 </TD> |
|---|
| 1632 | |
|---|
| 1633 | <TD>Bug fixes concerning algorithm and transition/transversion parameter. </TD> |
|---|
| 1634 | </TR> |
|---|
| 1635 | |
|---|
| 1636 | <TR VALIGN=TOP> |
|---|
| 1637 | <TD>2.0 </TD> |
|---|
| 1638 | |
|---|
| 1639 | <TD>Complete revision merging the maximum likelihood and the quartet puzzling |
|---|
| 1640 | routines into one user-friendly program. First electronic distribution. </TD> |
|---|
| 1641 | </TR> |
|---|
| 1642 | |
|---|
| 1643 | <TR VALIGN=TOP> |
|---|
| 1644 | <TD>1.0 </TD> |
|---|
| 1645 | |
|---|
| 1646 | <TD>First public release, presented at the 1995 phylogenetic workshop (15-17 |
|---|
| 1647 | June 1995) at the University of Bielefeld, Germany. </TD> |
|---|
| 1648 | </TR> |
|---|
| 1649 | </TABLE> |
|---|
| 1650 | |
|---|
| 1651 | </BODY> |
|---|
| 1652 | </HTML> |
|---|
| 1653 | |
|---|