<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
<TITLE>main</TITLE>
<META NAME="description" CONTENT="main">
<META NAME="keywords" CONTENT="PHYLIP, main, documentation">
<META NAME="resource-type" CONTENT="document">
<META NAME="distribution" CONTENT="global">
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
</HEAD>
<BODY BGCOLOR="#ccffff">
<P>
<DIV ALIGN="CENTER">
<H1>PHYLIP</H1>
<H2>Phylogeny Inference Package</H2>
<P>
<IMG SRC="phylip.gif" ALT="PHYLIP Logo">
<P>
<H3>Version 3.6(alpha3)</H3>
<P>
<H3>July, 2002</H3>
<P>
<H2>by Joseph Felsenstein</H2>
<P>
<BR>
<TABLE>
<TR><TD>
<FONT SIZE="+2">
Department of Genome Sciences<BR>
University of Washington<BR>
Box 357730<BR>
Seattle, WA 98195-7730<BR>
USA
</FONT>
</TD></TR>
</TABLE>
<H2>E-mail address: <TT>joe@gs.washington.edu</TT></H2>
</DIV>
<P>
<DIV ALIGN="CENTER">
<A NAME="contents"><HR><P></A>
<H2>Contents of this document</H2></DIV>
<P>
<BR>
<A HREF="#contents">Contents of this document</A>
<BR>
<A HREF="#description">A Brief Description of the Programs</A>
<BR>
<A HREF="#copyright">Copyright Notice for PHYLIP</A>
<BR>
<A HREF="#documentation">The Documentation Files and How to Read Them</A>
<BR>
<A HREF="#programs">What The Programs Do</A>
<BR>
<A HREF="#running">Running the Programs</A>
<BR>
A word about input files
<BR>
Running the programs on a Windows machine
<BR>
Running the programs on a Macintosh
<BR>
Running the programs on a Unix system
<BR>
Running the programs in MSDOS
<BR>
Running the programs in background or under control of a command file
<BR>
<A HREF="#inputfiles">Preparing Input Files</A>
<BR>
Input and output files
<BR>
Data file format
<BR>
<A HREF="#menu">The Menu</A>
<BR>
<A HREF="#outputfile">The Output File</A>
<BR>
<A HREF="#treefile">The Tree File</A>
<BR>
<A HREF="#options">The Options and How To Invoke Them</A>
<BR>
Common options in the menu
<BR>
The <TT>U</TT> (User tree) option
<BR>
The <TT>G</TT> (Global) option
<BR>
The <TT>J</TT> (Jumble) option
<BR>
The <TT>O</TT> (Outgroup) option
<BR>
The <TT>T</TT> (Threshold) option
<BR>
The <TT>M</TT> (Multiple data sets) option
<BR>
The <TT>W</TT> (Weights) option
<BR>
The option to write out the trees into a tree file
<BR>
The (<TT>0</TT>) terminal type option
<BR>
<A HREF="#algorithm">The Algorithm for Constructing Trees</A>
<BR>
Local Rearrangements
<BR>
Global Rearrangements
<BR>
Multiple Jumbles
<BR>
Saving multiple tied trees
<BR>
Strategy for Finding the Best Tree
<BR>
<A HREF="#warning">A Warning on Interpreting Results</A>
<BR>
<A HREF="#speed">Relative Speed of Different Programs and Machines</A>
<BR>
Relative speed of the different programs
<BR>
Speed with different numbers of species
<BR>
Relative speed of different machines
<BR>
<A HREF="#comments">General Comments on Adapting the Package to Different Computer Systems</A>
<BR>
<A HREF="#compiling">Compiling the programs</A>
<BR>
Unix and Linux
<BR>
Macintosh PowerMacs
<BR>
Compiling with Metrowerks Codewarrior
<BR>
On Windows systems
<BR>
Compiling with Microsoft Visual C++
<BR>
Compiling with Borland C++
<BR>
Compiling with Metrowerks Codewarrior for Windows
<BR>
Compiling with Cygnus Gnu C++
<BR>
VMS VAX systems
<BR>
Parallel computers
<BR>
Other computer systems
<BR>
<A HREF="#FAQ">Frequently Asked Questions</A>
<BR>
How to make it do various things
<BR>
Background information needed:
<BR>
Questions about distribution and citation:
<BR>
Questions about documentation
<BR>
Additional Frequently Asked Questions, or: "Why didn't it occur to you to ...
<BR>
(Fortunately) obsolete questions
<BR>
<A HREF="#newfeatures">New Features in This Version</A>
<BR>
<A HREF="#future">Coming Attractions, Future Plans</A>
<BR>
<A HREF="#endorsements">Endorsements</A>
<BR>
From the pages of <I>Cladistics</I>
<BR>
... and in the pages of other journals:
<BR>
<A HREF="#references">References for the Documentation Files</A>
<BR>
<A HREF="#credits">Credits</A>
<BR>
<A HREF="#otherprograms">Other Phylogeny Programs Available Elsewhere</A>
<BR>
PAUP*
<BR>
MacClade
<BR>
MEGA
<BR>
MOLPHY
<BR>
PAML
<BR>
TREE-PUZZLE
<BR>
DAMBE
<BR>
Hennig86
<BR>
RnA
<BR>
NONA
<BR>
TNT
<BR>
<A HREF="#helpme">How You Can Help Me</A>
<BR>
<A HREF="#trouble">In Case of Trouble</A>
<P>
<A NAME="description"><HR><P></A>
<DIV ALIGN="CENTER">
<H2>A Brief Description of the Programs</H2></DIV>
<P>
<TT>PHYLIP</TT>, the Phylogeny Inference Package, is a package of programs for
inferring phylogenies (evolutionary trees). It has been distributed since
1980 and has over 10,000 registered users, making it the most widely
distributed package of phylogeny programs. It is available free from
its web site:
<P>
<DIV ALIGN="CENTER">
<FONT SIZE=+2><A HREF="http://evolution.gs.washington.edu/phylip.html">
<TT>http://evolution.gs.washington.edu/phylip.html</TT></A></FONT>

</DIV>
<P>
<TT>PHYLIP</TT> is available as source code in C, and also as executables for
some common computer systems. It can infer phylogenies by parsimony,
compatibility, distance matrix methods, and likelihood. It can also
compute consensus trees, compute distances between trees, draw trees,
resample data sets by bootstrapping or jackknifing, edit trees, and
compute distance matrices. It can handle data that are nucleotide
sequences, protein sequences, gene frequencies, restriction sites,
restriction fragments, distances, discrete characters, and continuous
characters.
<P>
<BR>
<A NAME="copyright"><HR><P></A>
<DIV ALIGN=CENTER>
<TABLE BORDER=4 WIDTH=80%><TR><TD ALIGN=LEFT>
<DIV ALIGN="CENTER">
<H2>Copyright Notice for PHYLIP</H2></DIV>
<P>
The following copyright notice is intended to cover all source code, all
documentation, and all executable programs of the PHYLIP package.
<P>
© Copyright 1980-2002. University of Washington and Joseph Felsenstein. All
rights reserved. Permission is granted to reproduce, perform, and modify
these programs and documentation files. Permission is granted to distribute
or provide access to these programs provided that this copyright notice is
not removed, the programs are not integrated with or called by any product
or service that generates revenue, and that your distribution of these
materials is free. Any modified versions of these materials that are
distributed or accessible shall indicate that they are based on these
programs. Institutions of higher education are granted permission to
distribute this material to their students and staff for a fee to recover
distribution costs. Permission requests for any other distribution of this
program should be directed to <TT>license@u.washington.edu</TT>.
<BR>
</TD></TR></TABLE></DIV>

<BR>
<A NAME="documentation"><HR><P></A>
<DIV ALIGN="CENTER">
<H2>The Documentation Files and How to Read Them</H2></DIV>
<P>
<TT>PHYLIP</TT> comes with an extensive set of documentation files. These
include the main documentation file (this one), which you should read
fairly completely. In addition there are files for groups of programs,
including ones for the <A HREF="sequence.html">molecular sequence</A>
programs, the <A HREF="distance.html">distance matrix</A>
programs, the
<A HREF="contchar.html">gene frequency and continuous characters</A>
programs, the <A HREF="discrete.html">discrete characters</A> programs,
and the <A HREF="draw.html">tree drawing</A> programs. Finally,
each program has its own documentation file. References for the
documentation files are all gathered together in this main documentation
file. A good strategy is to:
<OL>
<LI>Read this main documentation file.
<LI>Tentatively decide which programs are of interest to you.
<LI>Read the documentation files for the groups of programs that
contain those.
<LI>Read the documentation files for those individual programs.
</OL>
<P>
<A NAME="programs"><HR><P></A>
<DIV ALIGN="CENTER">
<H2>What The Programs Do</H2></DIV>
<P>
Here is a short description of each of the programs. For more detailed
discussion you should definitely read the documentation file for the
individual program and the documentation file for the group of programs
it is in. In this list the name of each program is a link which will
take you to the documentation file for that program. Note that there is no
program in the PHYLIP package called PHYLIP.
<DL>
<DT><STRONG><A HREF="protpars.html">PROTPARS</A></STRONG>
<DD>Estimates phylogenies from protein sequences (input using the
standard one-letter code for amino acids) using the parsimony method, in
a variant which counts only those nucleotide changes that change the amino
acid, on the assumption that silent changes are more easily accomplished.
<DT><STRONG><A HREF="dnapars.html">DNAPARS</A></STRONG>
<DD>Estimates phylogenies by the parsimony method using nucleic acid
sequences. Allows use of the full IUB ambiguity codes, and treats gaps as
a fifth nucleotide state. Can use 0/1 weights, reconstruct ancestral
nucleotide states, and infer branch lengths.
<DT><STRONG><A HREF="dnamove.html">DNAMOVE</A></STRONG>
<DD>Interactive construction of phylogenies from nucleic acid
sequences, with their evaluation by parsimony and compatibility and the
display of reconstructed ancestral bases. This can be used to find
parsimony or compatibility estimates by hand.
<DT><STRONG><A HREF="dnapenny.html">DNAPENNY</A></STRONG>
<DD>Finds all most parsimonious phylogenies for nucleic acid
sequences by branch-and-bound search. This may not be practical (depending
on the data) for more than 15 species or so.
<DT><STRONG><A HREF="dnacomp.html">DNACOMP</A></STRONG>
<DD>Estimates phylogenies from nucleic acid sequence data using
the compatibility criterion, which searches for the largest number of sites
which could have all states (nucleotides) uniquely evolved on the same
tree. Compatibility is particularly appropriate when sites vary greatly in
their rates of evolution, but we do not know in advance which are the less
reliable ones.
<DT><STRONG><A HREF="dnainvar.html">DNAINVAR</A></STRONG>
<DD>For nucleic acid sequence data on four species, computes
Lake's and Cavender's phylogenetic invariants, which test alternative tree
topologies. The program also tabulates the frequencies of occurrence of the
different nucleotide patterns. Lake's invariants are the method which he
calls "evolutionary parsimony".
<DT><STRONG><A HREF="dnaml.html">DNAML</A></STRONG>
<DD>Estimates phylogenies from nucleotide sequences by maximum
likelihood. The model employed allows for unequal expected frequencies of
the four nucleotides, for unequal rates of transitions and transversions,
and for different (prespecified) rates of change in different categories of
sites, with the program inferring which sites have which rates. It also
allows different rates of change at known sites.
<DT><STRONG><A HREF="dnamlk.html">DNAMLK</A></STRONG>
<DD>Same as DNAML but assumes a molecular clock. The use of the
two programs together permits a likelihood ratio test of the
molecular clock hypothesis to be made.
<DT><STRONG><A HREF="proml.html">PROML</A></STRONG>
<DD>Estimates phylogenies from protein amino acid sequences by maximum
likelihood. The PAM or JTT models can be employed. The program
can allow for different (prespecified) rates of change in different
categories of amino acid positions, with the program inferring which
positions have which rates. It also allows different rates of change
at known sites.
<DT><STRONG><A HREF="promlk.html">PROMLK</A></STRONG>
<DD>Same as PROML but assumes a molecular clock. The use of the
two programs together permits a likelihood ratio test of the
molecular clock hypothesis to be made.
<DT><STRONG><A HREF="dnadist.html">DNADIST</A></STRONG>
<DD>Computes four different distances between species from nucleic
acid sequences. The distances can then be used in the distance matrix
programs. The distances are the Jukes-Cantor formula, one based on Kimura's
2-parameter method, Jin and Nei's distance, which allows for rate variation
from site to site, and a maximum likelihood method using the model employed
in DNAML. The last of these methods of computing distances can be very slow.
<DT><STRONG><A HREF="protdist.html">PROTDIST</A></STRONG>
<DD>Computes a distance measure for protein sequences, using
maximum likelihood estimates based on the Dayhoff PAM matrix, Kimura's 1983
approximation to it, or a model based on the genetic code plus a
constraint on changing to a different category of amino acid. Rate
variation from site to site is also allowed. The
distances can be used in the distance matrix programs.
<DT><STRONG><A HREF="restdist.html">RESTDIST</A></STRONG>
<DD>Distances calculated from restriction sites data or
restriction fragments data. The restriction sites option is also the one
to use to make distances for RAPDs or AFLPs.
<DT><STRONG><A HREF="restml.html">RESTML</A></STRONG>
<DD>Estimation of phylogenies by maximum likelihood using
restriction sites data (not restriction fragments but presence/absence of
individual sites). It employs the Jukes-Cantor symmetrical model of
nucleotide change, which does not allow for differences of rate between
transitions and transversions. This program is <I>very</I> slow.
<DT><STRONG><A HREF="seqboot.html">SEQBOOT</A></STRONG>
<DD>Reads in a data set, and produces multiple data sets from
it by bootstrap resampling. Since most programs in the current version of
the package allow processing of multiple data sets, this can be used
together with the consensus tree program CONSENSE to do bootstrap (or
delete-half-jackknife) analyses with most of the methods in this package.
This program also allows the Archie/Faith technique of permutation of
species within characters. It can also rewrite a data set to convert
it between the PHYLIP Interleaved and Sequential forms, and into
a preliminary version of a new XML sequence alignment format
which is under development.
<DT><STRONG><A HREF="fitch.html">FITCH</A></STRONG>
<DD>Estimates phylogenies from distance matrix data under the
"additive tree model" according to which the distances are expected to
equal the sums of branch lengths between the species. Uses the
Fitch-Margoliash criterion and some related least squares criteria. Does
not assume an evolutionary clock. This program will be useful with
distances computed from molecular sequences, restriction sites or fragments
distances, with DNA hybridization measurements, and with genetic distances
computed from gene frequencies.
<DT><STRONG><A HREF="kitsch.html">KITSCH</A></STRONG>
<DD>Estimates phylogenies from distance matrix data under the
"ultrametric" model which is the same as the additive tree model except
that an evolutionary clock is assumed. The Fitch-Margoliash criterion and
other least squares criteria are used. This program will be useful with
distances computed from molecular sequences, restriction sites or
fragments distances, with distances from DNA hybridization measurements,
and with genetic distances computed from gene frequencies.
<DT><STRONG><A HREF="neighbor.html">NEIGHBOR</A></STRONG>
<DD>An implementation by Mary Kuhner and John Yamato of Saitou and
Nei's "Neighbor Joining Method," and of the UPGMA (Average Linkage
clustering) method. Neighbor Joining is a distance matrix method producing
an unrooted tree without the assumption of a clock. UPGMA does assume a
clock. The branch lengths are not optimized by the least squares criterion,
but the methods are very fast and thus can handle much larger data sets.
<DT><STRONG><A HREF="contml.html">CONTML</A></STRONG>
<DD>Estimates phylogenies from gene frequency data by maximum
likelihood under a model in which all divergence is due to genetic drift in
the absence of new mutations. Does not assume a molecular clock. An
alternative method of analyzing these data is to compute Nei's genetic
distance and use one of the distance matrix programs.
This program can also do maximum likelihood analysis of continuous
characters that evolve by a Brownian Motion model, but it assumes that
the characters evolve at equal rates and in an uncorrelated fashion, so
that it does not take into account the usual correlations of characters.
<DT><STRONG><A HREF="gendist.html">GENDIST</A></STRONG>
<DD>Computes one of three different genetic distance formulas
from gene frequency data. The formulas are Nei's genetic distance, the
Cavalli-Sforza chord measure, and the genetic distance of Reynolds et al.
The first is appropriate for data in which new mutations occur in an
infinite isoalleles neutral mutation model, the latter two for a model
without mutation and with pure genetic drift. The distances are written to
a file in a format appropriate for input to the distance matrix programs.
<DT><STRONG><A HREF="contrast.html">CONTRAST</A></STRONG>
<DD>Reads a tree from a tree file, and a data set with continuous
characters data, and produces the independent contrasts for those
characters, for use in any multivariate statistics package. Will also
produce covariances, regressions and correlations between characters for
those contrasts. Can also correct for within-species sampling variation
when individual phenotypes are available within a population.
<DT><STRONG><A HREF="pars.html">PARS</A></STRONG>
<DD>Multistate discrete-characters parsimony method. Up to 8 states
(as well as "<TT>?</TT>") are allowed. Cannot do Camin-Sokal or Dollo parsimony.
Can reconstruct ancestral states, use character weights, and infer branch
lengths.
<DT><STRONG><A HREF="mix.html">MIX</A></STRONG>
<DD>Estimates phylogenies by some parsimony methods for discrete
character data with two states (0 and 1). Allows use of the
Wagner parsimony method, the Camin-Sokal parsimony method, or arbitrary
mixtures of these. Also reconstructs ancestral states and allows weighting
of characters (does not infer branch lengths).
<DT><STRONG><A HREF="move.html">MOVE</A></STRONG>
<DD>Interactive construction of phylogenies from discrete character
data with two states (0 and 1). Evaluates parsimony and compatibility
criteria for those phylogenies and displays reconstructed states throughout
the tree. This can be used to find parsimony or compatibility estimates by
hand.
<DT><STRONG><A HREF="penny.html">PENNY</A></STRONG>
<DD>Finds all most parsimonious phylogenies for discrete-character
data with two states, for the Wagner, Camin-Sokal, and mixed parsimony
criteria using the branch-and-bound method of exact search. May be
impractical (depending on the data) for more than 10-11 species.
<DT><STRONG><A HREF="dollop.html">DOLLOP</A></STRONG>
<DD>Estimates phylogenies by the Dollo or polymorphism parsimony
criteria for discrete character data with two states (0 and 1). Also
reconstructs ancestral states and allows weighting of characters. Dollo
parsimony is particularly appropriate for restriction sites data; with
ancestor states specified as unknown it may be appropriate for restriction
fragments data.
<DT><STRONG><A HREF="dolmove.html">DOLMOVE</A></STRONG>
<DD>Interactive construction of phylogenies from discrete
character data with two states (0 and 1) using the Dollo or polymorphism
parsimony criteria. Evaluates parsimony and compatibility criteria for
those phylogenies and displays reconstructed states throughout the tree.
This can be used to find parsimony or compatibility estimates by hand.
<DT><STRONG><A HREF="dolpenny.html">DOLPENNY</A></STRONG>
<DD>Finds all most parsimonious phylogenies for
discrete-character data with two states, for the Dollo or polymorphism
parsimony criteria using the branch-and-bound method of exact search. May
be impractical (depending on the data) for more than 10-11 species.
<DT><STRONG><A HREF="clique.html">CLIQUE</A></STRONG>
<DD>Finds the largest clique of mutually compatible characters, and
the phylogeny which they recommend, for discrete character data with two
states. The largest clique (or all cliques within a given size range of
the largest one) is found by a very fast branch and bound search method.
The method does not allow for missing data. For such cases the <TT>T</TT>
(Threshold) option of PARS or MIX may be a useful alternative.
Compatibility methods are particularly useful when some characters are of
poor quality and the rest of good quality, but it is not known in
advance which ones are which.
<DT><STRONG><A HREF="factor.html">FACTOR</A></STRONG>
<DD>Takes discrete multistate data with character state trees and
produces the corresponding data set with two states (0 and 1). Written by
Christopher Meacham. This program was formerly used to accommodate
multistate characters in MIX, but this is less necessary now that PARS is
available.
<DT><STRONG><A HREF="drawgram.html">DRAWGRAM</A></STRONG>
<DD>Plots rooted phylogenies, cladograms, and phenograms in a
wide variety of user-controllable formats. The program is interactive and
allows previewing of the tree on PC or Macintosh graphics screens,
and Tektronix or Digital graphics terminals. Final output can be written
to a file formatted for one of the drawing programs, or produced on
a laser printer (such as Postscript or PCL-compatible printers),
on graphics screens or terminals, on pen plotters (Hewlett-Packard or
Houston Instruments), or on dot matrix printers capable of graphics
(Epson, Okidata, Imagewriter, or Toshiba).
<DT><STRONG><A HREF="drawtree.html">DRAWTREE</A></STRONG>
<DD>Similar to DRAWGRAM but plots unrooted phylogenies.
<DT><STRONG><A HREF="treedist.html">TREEDIST</A></STRONG>
<DD>Computes the Robinson-Foulds symmetric difference distance
between trees, which allows for differences in tree topology (but does not
use branch lengths).
<DT><STRONG><A HREF="consense.html">CONSENSE</A></STRONG>
<DD>Computes consensus trees by the majority-rule consensus tree
method, which also allows one to easily find the strict consensus tree.
It cannot compute the Adams consensus tree. Trees are input in a tree
file in standard nested-parenthesis notation, which is produced by many of
the tree estimation programs in the package. This program can be used as
the final step in doing bootstrap analyses for many of the methods in the
package.
<DT><STRONG><A HREF="retree.html">RETREE</A></STRONG>
<DD>Reads in a tree (with branch lengths if necessary) and allows
you to reroot the tree, to flip branches, to change species names and
branch lengths, and then write the result out. Can be used to convert
between rooted and unrooted trees, and to write the tree into a
preliminary version of a new XML tree file format which is under
development.
</DL>
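<P>
To make the parsimony criterion used by DNAPARS and PARS more concrete: for a
fixed tree, the score at one site is the smallest number of state changes
needed to explain the states seen at the tips, and the length of the tree is
the sum of these scores over sites. A minimal C sketch of Fitch's algorithm,
which computes this minimum in one post-order pass, follows. At each node it
intersects the children's state sets when they overlap, and otherwise takes
their union and counts one change. The array-based <TT>Node</TT> layout and
the name <TT>fitch_site</TT> are hypothetical choices for illustration. This
is not PHYLIP's own code, which also handles weights, gaps, and the search
over trees.

```c
/* Hypothetical array-based rooted binary tree used only for this sketch.
   Tips have left == right == -1 and a state set already filled in; the
   sets of internal nodes are computed. A state set is a bitmask over the
   nucleotides A=1, C=2, G=4, T=8, so an IUB ambiguity code such as
   R (A or G) is simply the set 1|4 = 5. */
typedef struct {
    int left, right;   /* child indices, or -1 for a tip */
    unsigned set;      /* state set at this node */
} Node;

/* Fitch's post-order pass for one site (hypothetical helper name):
   returns the minimum number of changes on the subtree rooted at node i
   and stores each internal node's preliminary state set in nodes[i].set. */
static int fitch_site(Node *nodes, int i)
{
    if (nodes[i].left < 0)               /* tip: nothing to count */
        return 0;

    int cost = fitch_site(nodes, nodes[i].left)
             + fitch_site(nodes, nodes[i].right);
    unsigned a = nodes[nodes[i].left].set;
    unsigned b = nodes[nodes[i].right].set;

    if (a & b) {
        nodes[i].set = a & b;            /* sets overlap: no extra change */
    } else {
        nodes[i].set = a | b;            /* disjoint: one change is needed */
        cost += 1;
    }
    return cost;
}
```

Because this count does not depend on where the root is placed, an unrooted
tree can be scored by rooting it on any branch first. Representing states as
bitmasks also means ambiguity codes need no special handling: a tip coded R
next to a tip coded A costs nothing, since the sets {A,G} and {A} overlap.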
<P>
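<P>
For bootstrap resampling, SEQBOOT draws sites (columns of the data matrix)
with replacement until the replicate has as many sites as the original. The
C sketch below stores each replicate as a count of how many times each
original site was drawn. This is equivalent to writing out the resampled
columns. The names <TT>xorshift32</TT> and <TT>bootstrap_counts</TT> are
hypothetical. The small xorshift generator is there only to make the example
reproducible and is not the random number generator PHYLIP uses.
Delete-half jackknifing and permutation of species within characters are
not shown.

```c
#include <stdint.h>

/* Marsaglia-style xorshift generator (hypothetical helper), used only to
   make the example reproducible. The state must start non-zero. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* One bootstrap replicate (hypothetical helper): draws nsites sites with
   replacement and stores in counts[i] how many times original site i was
   drawn. A site drawn k times contributes k copies of its column to the
   replicate, and a site never drawn is left out. The slight bias from
   taking the random value modulo nsites is ignored for simplicity. */
static void bootstrap_counts(int nsites, int *counts, uint32_t *seed)
{
    for (int i = 0; i < nsites; i++)
        counts[i] = 0;
    for (int i = 0; i < nsites; i++)
        counts[xorshift32(seed) % (uint32_t)nsites]++;
}
```

Each replicate is then analyzed by the tree-building program, and CONSENSE
summarizes how often each group appears among the resulting trees.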
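<P>
The symmetric difference distance computed by TREEDIST can be described in
terms of splits. Each internal branch of an unrooted tree divides the species
into two groups, and the distance is the number of splits present in one tree
but not in the other. The C sketch below assumes that the splits of each tree
have already been extracted and stored as bitmasks over at most 32 species.
The names <TT>canonical_split</TT>, <TT>has_split</TT>, and
<TT>robinson_foulds</TT> are hypothetical. Reading trees from a tree file and
extracting their splits are not shown.

```c
#include <stdint.h>

/* A split of up to 32 species is a bitmask: bit k set means species k is
   on one side. The same split can be written from either side, so this
   hypothetical helper returns the side that does NOT contain species 0. */
static uint32_t canonical_split(uint32_t s, int nspecies)
{
    uint32_t all = (nspecies >= 32) ? UINT32_MAX
                                    : (((uint32_t)1 << nspecies) - 1u);
    s &= all;
    return (s & 1u) ? (~s & all) : s;
}

/* Returns 1 if split s occurs among the n splits in list, 0 otherwise. */
static int has_split(const uint32_t *list, int n, uint32_t s)
{
    for (int i = 0; i < n; i++)
        if (list[i] == s)
            return 1;
    return 0;
}

/* Symmetric difference: splits in tree a but not in b, plus splits in b
   but not in a. Both lists must hold canonical splits of internal
   branches. Splits of terminal branches are shared by all trees on the
   same species, so they are omitted. */
static int robinson_foulds(const uint32_t *a, int na,
                           const uint32_t *b, int nb)
{
    int d = 0;
    for (int i = 0; i < na; i++)
        if (!has_split(b, nb, a[i]))
            d++;
    for (int i = 0; i < nb; i++)
        if (!has_split(a, na, b[i]))
            d++;
    return d;
}
```

For example, take five species A to E. The trees ((A,B),C,(D,E)) and
((A,C),B,(D,E)) share the split DE | ABC, but each has one other split that
the other tree lacks, so their distance is 2.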
| 523 | <A NAME="running"><HR><P></A> |
|---|
| 524 | <DIV ALIGN="CENTER"> |
|---|
| 525 | <H2>Running the Programs</H2></DIV> |
|---|
| 526 | <P> |
|---|
| 527 | This section assumes that you have obtained PHYLIP as compiled executables |
|---|
| 528 | (for Windows, Macintosh, or DOS), or have obtained the source code |
|---|
| 529 | and compiled it yourself (for Linux, Unix, or OpenVMS). For machines for |
|---|
| 530 | which compiled executables are available, there will usually be no need for |
|---|
| 531 | you to have a compiler or compile the programs yourself. This section |
|---|
| 532 | describes how to run the programs. Later in this document we will |
|---|
| 533 | discuss how to download and install PHYLIP (in case you are somehow |
|---|
| 534 | reading this without yet having done that). Normally you will only read |
|---|
| 535 | this document after downloading and installing PHYLIP. |
|---|
| 536 | <P> |
|---|
| 537 | <H3>A word about input files.</H3> |
|---|
| 538 | <P> |
|---|
| 539 | For all of these types of machines, it is |
|---|
| 540 | important to have the input files for the programs (typically data files) |
|---|
| 541 | prepared in advance. They can be prepared in any editor, but it is important |
|---|
| 542 | that they be saved in Text Only ("flat ASCII") format, not in the format that |
|---|
| 543 | word processors such as Microsoft Word want to write. It is up to you to read |
|---|
| 544 | the PHYLIP documentation files which describe the file formats that are |
|---|
| 545 | needed. There is a partial description in the next section of this document. |
|---|
| 546 | The input files can also be obtained by running a program that |
|---|
| 547 | produces output files in PHYLIP format (some PHYLIP programs do this, as do |
|---|
| 548 | programs by others, such as the sequence alignment program ClustalW and |
|---|
| 549 | the sequence format conversion program Readseq). There is <I>not</I> any |
|---|
| 550 | input file editor available in any program in PHYLIP (you should <I>not</I> |
|---|
| 551 | simply start running one of the programs and then expect to click a mouse |
|---|
| 552 | somewhere to start creating a data file). |
|---|
| 553 | <P> |
|---|
| 554 | When they start running, the programs look first for input files with |
|---|
| 555 | particular names (such as <TT>infile</TT>, <TT>treefile</TT>, <TT>intree</TT>, or <TT>fontfile</TT>). |
|---|
| 556 | Exactly which file names they look for varies a bit from program to program, |
|---|
| 557 | and you should read the documentation file for the particular program to |
|---|
| 558 | find out. If you have files with those names the programs will use them |
|---|
| 559 | and not ask you for the file name. If they do not find files of those |
|---|
| 560 | names, the programs will say that they cannot find a file of that name, and |
|---|
| 561 | ask you to type in the file name. |
|---|
| 562 | For example, if DnaML looks |
|---|
| 563 | for the file <TT>infile</TT> and does not find one of that name, |
|---|
| 564 | it prints the message: |
|---|
| 565 | <P> |
|---|
| 566 | <TABLE><TR><TD BGCOLOR=white> |
|---|
| 567 | <TT>dnaml: can't find input file "infile"<BR> |
|---|
| 568 | Please enter a new file name></TT> |
|---|
| 569 | </TD></TR></TABLE> |
|---|
| 570 | <P><I>This does not mean that an error |
|---|
| 571 | has occurred.</I> All you need to do is to type in the name of the file. |
|---|
| 572 | <P> |
|---|
| 573 | The program looks for the input files in the same directory that the |
|---|
| 574 | program is in (a directory is the same thing as a "folder"). In Windows, Linux, Unix, or MSDOS, if you are asked for the |
|---|
| 575 | file name you can type in the path to the file, as part of the name (thus, |
|---|
| 576 | if the file is in the directory above the current one, you can type in |
|---|
| 577 | a file name such as <TT>../myfile.dna</TT>). If you do not know what a |
|---|
| 578 | "directory" is, or what "above" means, then you are a member of the new |
|---|
| 579 | generation who just clicks the mouse and assumes that a list of file names |
|---|
| 580 | will magically appear. (Typically members of this generation have no idea |
|---|
| 581 | where the files are on their system, and accumulate enormous amounts of |
|---|
| 582 | unnecessary clutter in their file systems.) In this case you should ask |
|---|
| 583 | someone to explain directories to you. |
|---|
| 584 | <P> |
|---|
| 585 | <H3>Running the programs on a Windows machine.</H3> |
|---|
| 586 | <P> |
|---|
| 587 | Double-click on the icon for |
|---|
| 588 | the program. A window should open with a menu in it. Further dialog with the |
|---|
| 589 | program occurs |
|---|
| 590 | by typing on the keyboard in response to what you see in the window. The |
|---|
| 591 | programs can be interrupted either by typing Control-C (which means to |
|---|
| 592 | press down on the <TT>Ctrl</TT> key while typing the letter <TT>C</TT>), or by using |
|---|
| 593 | the mouse to open the <TT>File</TT> menu in the upper-left corner of the program's |
|---|
| 594 | window area and then selecting <TT>Quit</TT>. Other than this, most PHYLIP programs |
|---|
| 595 | make no use of the mouse. The tree-drawing programs Drawtree and Drawgram |
|---|
| 596 | do allow use of the mouse to select some options. |
|---|
| 597 | <P> |
|---|
| 598 | <H3>Running the programs on a Macintosh.</H3> |
|---|
| 599 | <P> |
|---|
| 600 | Double-click on the icon for |
|---|
| 601 | the program. A window should open. Further dialog with the program occurs |
|---|
| 602 | by typing on the keyboard in response to what you see in the window. The |
|---|
| 603 | programs can be interrupted by using |
|---|
| 604 | the mouse to open the <TT>File</TT> menu in the upper-left corner of the program's |
|---|
| 605 | window area and then selecting <TT>Quit</TT>. Alternatively, you can use the |
|---|
| 606 | Command-Q key combination. |
|---|
| 607 | <P> |
|---|
| 608 | When you use Quit, the program will ask you whether you want to save |
|---|
| 609 | a file whose name is the program name, often followed by <TT>.out</TT> (for |
|---|
| 610 | example, if you are using DNAML it will ask whether you want to save the file |
|---|
| 611 | <TT>Dnaml.out</TT>). This file is simply a record of everything that was |
|---|
| 612 | displayed in the program window, and you usually will not want to save it. |
|---|
| 613 | Pressing the <TT>Enter</TT> key or selecting the Do Not Save button with |
|---|
| 614 | the mouse will keep this from being saved. |
|---|
| 615 | <P> |
|---|
| 616 | If you encounter memory limitations on a Macintosh, first check whether |
|---|
| 617 | the real cause is a problem with the format of the input file, as it often |
|---|
| 618 | is. If it is not, you may be able to solve it by raising the limits of the |
|---|
| 619 | stack and heap sizes of the program. To do this, click on the program |
|---|
| 620 | and then select <TT>Get Info</TT> from the Finder <TT>File</TT> menu. |
|---|
| 621 | This will open a window which can be made to show the memory limits |
|---|
| 622 | of the program. These can be changed by selecting them and typing in |
|---|
| 623 | larger numbers. This may relieve nagging memory problems. If it does |
|---|
| 624 | not, consult your local documentation and suspect problems with your |
|---|
| 625 | input file format. |
|---|
| 626 | <P> |
|---|
| 627 | <H3>Running the programs on a Unix system.</H3> |
|---|
| 628 | <P> |
|---|
| 629 | Type the name of the program |
|---|
| 630 | in lower-case letters (such as <TT>dnaml</TT>). To interrupt the program while |
|---|
| 631 | it is running, type Control-C (which means to press down on the <TT>Ctrl</TT> key |
|---|
| 632 | while typing the letter <TT>C</TT>). |
|---|
| 633 | <P> |
|---|
| 634 | <H3>Running the programs in MSDOS.</H3> |
|---|
| 635 | <P> |
|---|
| 636 | Type the name of the program |
|---|
| 637 | in lower-case letters (such as <TT>dnaml</TT>). To interrupt the program while |
|---|
| 638 | it is running, type Control-C (which means to press down on the <TT>Ctrl</TT> key |
|---|
| 639 | while typing the letter <TT>C</TT>). |
|---|
| 640 | <P> |
|---|
| 641 | <H3>Running the programs in background or under control of a command file</H3> |
|---|
| 642 | <P> |
|---|
| 643 | In running the programs, you may sometimes want to put them in background |
|---|
| 644 | so you can proceed with other work. On systems with a windowing environment |
|---|
| 645 | they can be put in their own window, and commands like the Unix and Linux |
|---|
| 646 | <TT>nice</TT> command can be used to give |
|---|
| 647 | them lower priority so that they do not interfere with interactive |
|---|
| 648 | applications in other windows. This part of the discussion will |
|---|
| 649 | assume either a Windows system or a Unix or Linux system. I will |
|---|
| 650 | note when the commands work on one of these systems but not the other. |
|---|
| 651 | Running jobs in background on Macintosh systems is an arcane art into whose |
|---|
| 652 | mysteries I have not been initiated (or perhaps no one has been initiated). |
|---|
| 653 | <P> |
|---|
| 654 | If there is no windowing |
|---|
| 655 | environment, on a Unix or Linux system you will want to use an |
|---|
| 656 | ampersand (<TT>&</TT>) after the command (or after the name of a command file) to put the |
|---|
| 657 | job in the background. You will have to put all the responses to the |
|---|
| 658 | interactive menu of the program into a file and tell the background job |
|---|
| 659 | to take its input from that file. |
|---|
| 660 | On Windows systems there is no <TT>&</TT> or <TT>nice</TT> command |
|---|
| 661 | but input and output redirection and command files work fine, with the sole |
|---|
| 662 | difference that a file of commands must have a name ending in |
|---|
| 663 | <TT>.BAT</TT>, such as <TT>FOOFILE.BAT</TT>. |
|---|
| 664 | <P> |
|---|
| 665 | For example, suppose you want to run DNAPARS in background, taking its |
|---|
| 666 | input data from a file called <TT>sequences.dat</TT>, putting its interactive |
|---|
| 667 | output into a file called <TT>screenout</TT>, and using a file called <TT>input</TT> as |
|---|
| 668 | the place to store the interactive input. The file <TT>input</TT> need only |
|---|
| 669 | contain two lines: |
|---|
| 670 | <P> |
|---|
| 671 | <TABLE><TR><TD bgcolor=white> |
|---|
| 672 | <PRE> |
|---|
| 673 | sequences.dat |
|---|
| 674 | Y |
|---|
| 675 | </PRE> |
|---|
| 676 | </TD></TR></TABLE> |
|---|
| 677 | <P> |
|---|
| 678 | which is what you would have typed to run the program interactively, in |
|---|
| 679 | response to the program's request for an input file name if it did not |
|---|
| 680 | find a file named <TT>infile</TT>, and in response to the menu. |
|---|
| 681 | <P> |
|---|
| 682 | To run the program in background, in Unix or Linux you would simply give the command: |
|---|
| 683 | <P> |
|---|
| 684 | <TT>dnapars < input > screenout & |
|---|
| 685 | </TT> |
|---|
| 686 | <P> |
|---|
| 687 | This runs the program with input responses coming from <TT>input</TT> and |
|---|
| 688 | interactive output being put into file <TT>screenout</TT>. The usual output |
|---|
| 689 | file and tree file will also be created by this run. Keep that in mind: |
|---|
| 690 | if you run any other PHYLIP program from the same directory while |
|---|
| 691 | this one is running in background, one program may overwrite the output |
|---|
| 692 | file of the other! |
|---|
| 693 | <P> |
|---|
| 694 | If you wanted to give the program lower priority, so that it would |
|---|
| 695 | not interfere with other work, and you have Berkeley Unix type job control |
|---|
| 696 | facilities in your Unix or Linux (and you usually do), you can use the |
|---|
| 697 | <TT>nice</TT> command: |
|---|
| 698 | <P> |
|---|
| 699 | <TT>nice +10 dnapars < input > screenout & |
|---|
| 700 | </TT> |
|---|
| 701 | <P> |
|---|
| 702 | which lowers the priority of the run (this is C shell syntax; in <TT>sh</TT> or <TT>bash</TT> the equivalent is <TT>nice -n 10</TT>). To also time the run and put the |
|---|
| 703 | timing at the end of <TT>screenout</TT>, you can do this: |
|---|
| 704 | <P> |
|---|
| 705 | <TT>nice +10 ( time dnapars < input ) >& screenout & |
|---|
| 706 | </TT> |
|---|
| 707 | <P> |
|---|
| 708 | which I will not attempt to explain. |
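<P>
If you prefer to drive PHYLIP from a script rather than from the shell, the same idea (keyboard responses read from one file, screen output written to another, and the job left running in background at lower priority) can be sketched in Python. This is only an illustration under stated assumptions: Python and the function name <TT>run_in_background</TT> are not part of PHYLIP, the program is assumed to be on your search path, and the priority change uses <TT>os.nice</TT>, which is available only on Unix and Linux. It does not reproduce the <TT>time</TT> part of the command above.

```python
import os
import subprocess


# Hypothetical helper, not part of PHYLIP.
def run_in_background(program, responses_path, screen_path, niceness=0):
    """Start `program` with keyboard input read from `responses_path` and
    screen output written to `screen_path`; return without waiting.

    Roughly the shell command  program < responses_path > screen_path &
    with the priority lowered by `niceness` (Unix and Linux only).
    """

    def lower_priority():
        # Runs in the child process just before the program starts.
        if niceness > 0:
            os.nice(niceness)

    with open(responses_path, "r") as stdin, open(screen_path, "w") as stdout:
        proc = subprocess.Popen(
            [program],
            stdin=stdin,
            stdout=stdout,
            stderr=subprocess.STDOUT,
            preexec_fn=lower_priority if os.name == "posix" else None,
        )
    # Closing our copies of the two files is safe: the child has its own.
    return proc
```

<P>
For example, <TT>run_in_background("dnapars", "input", "screenout", niceness=10)</TT> starts the run and returns at once; calling <TT>wait()</TT> on the returned object waits for the run to finish. The caution about output files still applies: do not start another PHYLIP program in the same directory while this one is running.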
|---|
| 709 | <P> |
|---|
| 710 | On Unix or Linux systems |
|---|
| 711 | you may also want to explore putting the interactive output into the |
|---|
| 712 | null file <TT>/dev/null</TT> so as to not be bothered with it (but then you |
|---|
| 713 | cannot look at it to see why something went wrong). If you have problems |
|---|
| 714 | with creating output files that are too large, you may want to check |
|---|
| 715 | carefully which output options you can turn off in the programs you run. |
|---|
| 716 | <P> |
|---|
| 717 | If you are doing several runs in one job, as for example when you do a |
|---|
| 718 | bootstrap analysis using SEQBOOT, DNAPARS (say), and CONSENSE, you |
|---|
| 719 | can use an editor to create a "command file" with these commands: |
|---|
| 720 | <P> |
|---|
| 721 | <TABLE><TR><TD bgcolor=white> |
|---|
| 722 | <PRE> |
|---|
| 723 | seqboot < input1 > screenout |
|---|
| 724 | mv outfile infile |
|---|
| 725 | dnapars < input2 >> screenout |
|---|
| 726 | mv outtree intree |
|---|
| 727 | consense < input3 >> screenout |
|---|
| 728 | </PRE> |
|---|
| 729 | </TD></TR></TABLE> |
|---|
| 730 | <P> |
|---|
| 731 | This is the Unix or Linux version -- in the MSDOS version, the renaming |
|---|
| 732 | of files and the appending of output to the file <TT>screenout</TT> is |
|---|
| 733 | handled differently. |
|---|
| 734 | <P> |
|---|
| 735 | On Unix or Linux the command file might be named something like |
|---|
| 736 | <TT>foofile</TT>, and on Windows systems might be named <TT>foofile.bat</TT>. |
|---|
| 737 | <P> |
|---|
| 738 | On Unix or Linux the command file must be given |
|---|
| 739 | execute permission by using the command <TT>chmod +x foofile</TT> followed |
|---|
| 740 | by the command <TT>rehash</TT>. The job that <TT>foofile</TT> describes |
|---|
| 741 | can be run in background on Unix or Linux by giving the command |
|---|
| 742 | <P> |
|---|
| 743 | <TT>foofile &</TT> (or <TT>./foofile &</TT> if the current directory is not in your search path) |
|---|
| 744 | <P> |
|---|
| 745 | On Windows systems it can be run by |
|---|
| 746 | double-clicking on the icon of the command file. Its icon will have a little gear |
|---|
| 747 | symbol. |
|---|
| 748 | <P> |
|---|
| 749 | Note that you must also have the interactive input |
|---|
| 750 | commands for SEQBOOT (including the random number seed), DNAPARS, and |
|---|
| 751 | CONSENSE in the separate files <TT>input1</TT>, <TT>input2</TT>, and <TT>input3</TT>. |
|---|
| 752 | Also, when PHYLIP programs attempt to open a new output file (such as |
|---|
| 753 | <TT>outfile</TT>, <TT>outtree</TT>, or <TT>plotfile</TT>), if they see |
|---|
| 754 | a file of that name already in existence they will ask you if you want to |
|---|
| 755 | overwrite it, and offer alternatives including writing to another file, |
|---|
| 756 | appending information to that file, or quitting the program without writing to |
|---|
| 757 | the file. This means that in writing batch files it is important to know |
|---|
| 758 | whether there will be a prompt of this sort. You must know in advance |
|---|
| 759 | whether the file will exist. You may want to put in your batch file a |
|---|
| 760 | command that tests for the existence of a pre-existing output file and |
|---|
| 761 | if so, removes it. You might even want to put in a command that creates a |
|---|
| 762 | file of that name, so that you can be sure it is there! Either way, |
|---|
| 763 | you will then know whether to put into your file of keyboard responses the |
|---|
| 764 | proper response to the inquiry about overwriting that output file. |
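<P>
The command file above, together with the advice about removing pre-existing output files, can be sketched in Python as follows. This is an illustration only, not a script distributed with PHYLIP: the helper names are hypothetical, <TT>seqboot</TT>, <TT>dnapars</TT> and <TT>consense</TT> are assumed to be on your search path, and <TT>input1</TT>, <TT>input2</TT> and <TT>input3</TT> are assumed to contain responses written for the case where no output file already exists. Removing leftover files before each step means no overwrite prompt can appear; note that this discards the <TT>outfile</TT> written by DNAPARS before CONSENSE runs.

```python
import os
import subprocess

# Output files a PHYLIP program may ask about overwriting (assumed list).
STALE_OUTPUTS = ("outfile", "outtree")


# Hypothetical helper, not part of PHYLIP.
def remove_stale_outputs(workdir, names=STALE_OUTPUTS):
    """Delete leftover output files; return the names actually removed."""
    removed = []
    for name in names:
        path = os.path.join(workdir, name)
        if os.path.exists(path):
            os.remove(path)
            removed.append(name)
    return removed


# Hypothetical helper: like `program < responses > screenout` (or `>>`).
def run_step(program, responses, screenout, workdir, append=False):
    mode = "a" if append else "w"
    with open(os.path.join(workdir, responses)) as stdin, \
            open(os.path.join(workdir, screenout), mode) as stdout:
        subprocess.run([program], stdin=stdin, stdout=stdout,
                       cwd=workdir, check=True)


# Hypothetical driver mirroring the SEQBOOT / DNAPARS / CONSENSE command file.
def bootstrap_pipeline(workdir):
    def move(src, dst):  # same effect as `mv src dst`
        os.replace(os.path.join(workdir, src), os.path.join(workdir, dst))

    remove_stale_outputs(workdir)
    run_step("seqboot", "input1", "screenout", workdir)
    move("outfile", "infile")
    remove_stale_outputs(workdir)
    run_step("dnapars", "input2", "screenout", workdir, append=True)
    move("outtree", "intree")
    remove_stale_outputs(workdir)  # discards the DNAPARS outfile
    run_step("consense", "input3", "screenout", workdir, append=True)
```

<P>
Calling <TT>bootstrap_pipeline(".")</TT> would run the three steps in the current directory, stopping with an error if any of the programs fails.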
|---|
| 765 | <P> |
|---|
| 766 | <A NAME="inputfiles"><HR><P></A> |
|---|
| 767 | <DIV ALIGN="CENTER"> |
|---|
| 768 | <H2>Preparing Input Files</H2></DIV> |
|---|
| 769 | <P> |
|---|
| 770 | The input files for PHYLIP programs must be prepared separately - there is |
|---|
| 771 | no data editor within PHYLIP. You can use a word processor (or text |
|---|
| 772 | editor) to prepare them yourself, or you can use a program that produces |
|---|
| 773 | a PHYLIP-format output. Sequence alignment programs such as ClustalW |
|---|
| 774 | commonly have an option to produce PHYLIP files as output, and some |
|---|
| 775 | other phylogeny programs, such as MacClade and TreeView, are capable of |
|---|
| 776 | producing a PHYLIP-format file. |
|---|
| 777 | <P> |
|---|
| 778 | The format of the input files is discussed below, and you should also |
|---|
| 779 | read the other PHYLIP documentation relevant to the particular type of |
|---|
| 780 | data that you are using, and the particular programs you want to run, as |
|---|
| 781 | there will be more details there. |
|---|
| 782 | <P> |
|---|
| 783 | It is very important that the input files be in "Text Only" or "flat |
|---|
| 784 | ASCII" format. This means that they contain only printable ASCII/ISO |
|---|
| 785 | characters, and not any unprintable characters. Many word processors such |
|---|
| 786 | as Microsoft Word save their files in a format that contains unprintable |
|---|
| 787 | characters, unless you tell them not to. For Microsoft Word you can |
|---|
| 788 | select <TT>Save As</TT> from its <TT>File</TT> menu, and choose <TT>Text Only</TT> |
|---|
| 789 | as the file format. This can also be done in the WordPad utility in Windows. |
|---|
| 790 | Other word processors will have equivalent |
|---|
| 791 | options. Text editors such as the <TT>vi</TT> and <TT>emacs</TT> editors on |
|---|
| 792 | Unix and Linux, Windows Notepad, the <TT>SimpleText</TT> editor in MacOS, or the <TT>pico</TT> |
|---|
| 793 | editor that comes with the <TT>pine</TT> |
|---|
| 794 | mailer program, produce their files in Text Only format and should not |
|---|
| 795 | cause any trouble. |
|---|
| 796 | <P> |
|---|
| 797 | <H3>Input and output files</H3> |
|---|
| 798 | <P> |
|---|
| 799 | For most of the PHYLIP programs, information comes from a series of |
|---|
| 800 | input files, and ends up in a series of output files: |
|---|
| 801 | <P> |
|---|
| 802 | <DIV ALIGN="CENTER"> |
|---|
| 803 | <TABLE> |
|---|
| 804 | <TR><TD> |
|---|
| 805 | <PRE> |
|---|
| 806 |                   ------------------- |
|---|
| 807 |                   |                 | |
|---|
| 808 | infile ---------> |                 | |
|---|
| 809 |                   |                 | |
|---|
| 810 | intree ---------> |                 | -----------> outfile |
|---|
| 811 |                   |                 | |
|---|
| 812 | weights --------> |     program     | -----------> outtree |
|---|
| 813 |                   |                 | |
|---|
| 814 | categories -----> |                 | -----------> plotfile |
|---|
| 815 |                   |                 | |
|---|
| 816 | fontfile -------> |                 | |
|---|
| 817 |                   |                 | |
|---|
| 818 |                   ------------------- |
|---|
| 819 | </PRE> |
|---|
| 820 | </TD></TR> |
|---|
| 821 | </TABLE> |
|---|
| 822 | </DIV><P></P> |
|---|
| 823 | |
|---|
| 824 | <P> |
|---|
| 825 | The programs interact with the user by presenting a menu. Aside from the |
|---|
| 826 | user's choices from the menu, they read |
|---|
| 827 | all other input from files. These files have default names. The program |
|---|
| 828 | will try to find a file of that name - if it does not, it will ask the |
|---|
| 829 | user to supply the name of that file. |
|---|
| 830 | Input data such as DNA sequences |
|---|
| 831 | comes from a file whose default name is <TT>infile</TT>. If the user |
|---|
| 832 | supplies a tree, this is in a file whose default name is <TT>intree</TT>. |
|---|
| 833 | Values of weights for the characters are in <TT>weights</TT>, and the |
|---|
| 834 | tree-plotting programs need some digitized fonts which are supplied in |
|---|
| 835 | <TT>fontfile</TT> (all these are default names). |
|---|
| 848 | <P> |
|---|
| 849 | Two programs in the package work differently, according to an older ("Old |
|---|
| 850 | Style") system. These are <TT>CLIQUE</TT> and <TT>FACTOR</TT>. The information on ancestral |
|---|
| 851 | states is supplied in the data file whose |
|---|
| 852 | default name is <TT>infile</TT>, and for <TT>FACTOR</TT> the Factors |
|---|
| 853 | information is written into the output file rather than being put into a |
|---|
| 854 | separate file called <TT>factors</TT>. See the <A HREF="clique.html">documentation |
|---|
| 855 | page for <TT>CLIQUE</TT></A> |
|---|
| 856 | and the <A HREF="factor.html">documentation page for FACTOR</A> |
|---|
| 857 | for information on these differences. By the time of the final 3.6 |
|---|
| 858 | release we hope to have these last Old Style programs converted to the new |
|---|
| 859 | system. |
|---|
| 860 | <P> |
|---|
| 861 | <H3>Data file format</H3> |
|---|
| 862 | <P> |
|---|
| 863 | I have tried to adhere to a rather stereotyped input and output |
|---|
| 864 | format. For the parsimony, compatibility and maximum likelihood programs, |
|---|
| 865 | excluding the distance matrix methods, the simplest version of the input |
|---|
| 866 | data file looks something like this: |
|---|
| 867 | <P> |
|---|
| 868 | <TABLE><TR><TD BGCOLOR=white> |
|---|
| 869 | <PRE> |
|---|
| 870 | 6 13 |
|---|
| 871 | Archaeopt CGATGCTTAC CGC |
|---|
| 872 | HesperorniCGTTACTCGT TGT |
|---|
| 873 | BaluchitheTAATGTTAAT TGT |
|---|
| 874 | B. virginiTAATGTTCGT TGT |
|---|
| 875 | BrontosaurCAAAACCCAT CAT |
|---|
| 876 | B.subtilisGGCAGCCAAT CAC |
|---|
| 877 | </PRE> |
|---|
| 878 | </TD></TR></TABLE> |
|---|
| 879 | <P> |
|---|
| 880 | The first line of the input file contains the number of species and the |
|---|
| 881 | number of characters (in this case sites). These are in free format, separated |
|---|
| 882 | by blanks. The information for each species follows, starting with a |
|---|
| 883 | ten-character species name (which can include blanks and some punctuation |
|---|
| 884 | marks), and continuing with the characters for that species. The name should |
|---|
| 885 | be on the same line as the first character of the data for that species. |
|---|
| 886 | (I will use the term "species" for the tips of the trees, recognizing |
|---|
| 887 | that in some cases these will actually be populations or individual gene |
|---|
| 888 | sequences.) |
|---|
| 889 | <P> |
|---|
| 890 | The name should be ten characters in length, filled out to the full |
|---|
| 891 | ten characters by blanks if shorter. Any printable ASCII/ISO character is |
|---|
| 892 | allowed in the name, except for parentheses ("<TT>(</TT>" and "<TT>)</TT>"), square |
|---|
| 893 | brackets ("<TT>[</TT>" and "<TT>]</TT>"), colon ("<TT>:</TT>"), semicolon ("<TT>;</TT>") and comma ("<TT>,</TT>"). |
|---|
| 894 | If you forget to extend the names to ten characters in length by blanks, |
|---|
| 895 | the program will get out of synchronization with the contents of the data |
|---|
| 896 | file, and an error message will result. |
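<P>
A minimal Python sketch of this naming rule follows. The helper <TT>phylip_name</TT> is hypothetical (it is not part of PHYLIP): it rejects the forbidden punctuation and unprintable characters, pads a short name with blanks to ten characters, and, as an assumption of this sketch, simply cuts a longer name down to ten characters, the way the names in the example data file above have been shortened.

```python
# Characters that may not appear in a PHYLIP species name.
FORBIDDEN = set("()[]:;,")
NAME_WIDTH = 10


# Hypothetical helper, not part of PHYLIP.
def phylip_name(name):
    """Return `name` as exactly ten characters, padded with blanks.

    Longer names are cut to ten characters (an assumption of this sketch);
    forbidden punctuation or unprintable characters raise ValueError.
    """
    bad = FORBIDDEN.intersection(name)
    if bad:
        raise ValueError(f"illegal character(s) {''.join(sorted(bad))} in {name!r}")
    if not name.isprintable():
        raise ValueError(f"unprintable character in {name!r}")
    return name[:NAME_WIDTH].ljust(NAME_WIDTH)
```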
|---|
| 897 | <P> |
|---|
| 898 | In the |
|---|
| 899 | discrete-character programs, DNA sequence programs and protein sequence |
|---|
| 900 | programs the characters are each a |
|---|
| 901 | single letter or digit, sometimes separated by blanks. In |
|---|
| 902 | the continuous-characters programs they are real numbers with decimal points, |
|---|
| 903 | separated by blanks: |
|---|
| 904 | <P> |
|---|
| 905 | <TT>Latimeria 2.03 3.457 100.2 0.0 -3.7</TT> |
|---|
| 906 | <P> |
|---|
| 907 | The conventions about continuing the data beyond one line per species are |
|---|
| 908 | different between the molecular sequence programs and the others. The |
|---|
| 909 | molecular sequence programs can take the data in "aligned" or "interleaved" |
|---|
| 910 | format, in which we first have some lines giving the first part of each of the |
|---|
| 911 | sequences, then some |
|---|
| 912 | lines giving the next part of each, and so on. Thus the sequences might |
|---|
| 913 | look like this: |
|---|
| 914 | <P> |
|---|
| 915 | <TABLE><TR><TD BGCOLOR=white> |
|---|
| 916 | <PRE> |
|---|
| 917 | 6 39 |
|---|
| 918 | Archaeopt CGATGCTTAC CGCCGATGCT |
|---|
| 919 | HesperorniCGTTACTCGT TGTCGTTACT |
|---|
| 920 | BaluchitheTAATGTTAAT TGTTAATGTT |
|---|
| 921 | B. virginiTAATGTTCGT TGTTAATGTT |
|---|
| 922 | BrontosaurCAAAACCCAT CATCAAAACC |
|---|
| 923 | B.subtilisGGCAGCCAAT CACGGCAGCC |
|---|
| 924 | |
|---|
| 925 | TACCGCCGAT GCTTACCGC |
|---|
| 926 | CGTTGTCGTT ACTCGTTGT |
|---|
| 927 | AATTGTTAAT GTTAATTGT |
|---|
| 928 | CGTTGTTAAT GTTCGTTGT |
|---|
| 929 | CATCATCAAA ACCCATCAT |
|---|
| 930 | AATCACGGCA GCCAATCAC |
|---|
| 931 | </PRE> |
|---|
| 932 | </TD></TR></TABLE> |
|---|
| 933 | <P> |
|---|
| 934 | Note that in these sequences we have a blank every |
|---|
| 935 | ten sites to make them easier to read: any such blanks are allowed. The blank |
|---|
| 936 | line which separates the two groups of lines (the ones |
|---|
| 937 | containing sites 1-20 and ones containing sites 21-39) may or may not |
|---|
| 938 | be present, but if it is, it should be a line of zero length and not contain |
|---|
| 939 | any extra blank |
|---|
| 940 | characters (this is because of a limitation of the current versions |
|---|
| 941 | of the programs). It is important that the number of sites in each |
|---|
| 942 | group be the same for all species (i.e., it will not be possible to run |
|---|
| 943 | the programs successfully if the first species line contains 20 bases, but |
|---|
| 944 | the first line for the second species contains 21 bases). |
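<P>
To make the interleaved layout concrete, here is a small Python sketch that writes aligned sequences in this format. The function <TT>format_interleaved</TT> is hypothetical, not part of PHYLIP, and assumes all sequences already have the same length. It puts the ten-character names only on the first group of lines, inserts a blank every ten sites, gives every species the same number of sites in each group, and separates the groups with a line of zero length.

```python
# Hypothetical helper, not part of PHYLIP.
def format_interleaved(records, per_line=20, group=10):
    """Write (name, sequence) pairs as a PHYLIP interleaved data file."""
    nchar = len(records[0][1])
    if any(len(seq) != nchar for _, seq in records):
        raise ValueError("all sequences must have the same length")
    lines = [f"{len(records)} {nchar}"]
    for start in range(0, nchar, per_line):
        if start > 0:
            lines.append("")  # zero-length separator line, no stray blanks
        for name, seq in records:
            # Same slice for every species, so each group has equal widths.
            chunk = seq[start:start + per_line]
            groups = " ".join(chunk[i:i + group] for i in range(0, len(chunk), group))
            # Only the first group of lines carries the ten-character name.
            prefix = name[:10].ljust(10) if start == 0 else ""
            lines.append(prefix + groups)
    return "\n".join(lines) + "\n"
```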
|---|
| 945 | <P> |
|---|
| 946 | Alternatively, an option can be selected in the menu to take the data in |
|---|
| 947 | "sequential" format, with all of the data for the first species, |
|---|
| 948 | then all of the characters for the next species, and so on. This is also |
|---|
| 949 | the way that the discrete characters programs and the gene frequencies |
|---|
| 950 | and quantitative characters programs want to read the data. They do not |
|---|
| 951 | allow the interleaved format. |
|---|
| 952 | <P> |
|---|
| 953 | In the sequential format, the character data can run on to a new line at any |
|---|
| 954 | time (except in the middle of a species name or, in the continuous |
|---|
| 955 | character and distance matrix programs, in the |
|---|
| 956 | middle of a real number). Thus it is legal to have: |
|---|
| 957 | <P> |
|---|
| 958 | <TT>Archaeopt 001100 |
|---|
| 959 | <BR> |
|---|
| 960 | 1101 |
|---|
| 961 | <BR> |
|---|
| 962 | </TT> |
|---|
| 963 | <P> |
|---|
| 964 | or even: |
|---|
| 965 | <P> |
|---|
| 966 | <TT>Archaeopt |
|---|
| 967 | <BR> |
|---|
| 968 | 0011001101 |
|---|
| 969 | <BR> |
|---|
| 970 | </TT> |
|---|
| 971 | |
|---|
| 972 | <P> |
|---|
| 973 | though note that the <I>full</I> ten characters of the species name <I>must</I> |
|---|
| 974 | then be present: in the above case there must be a blank after the "t". In all |
|---|
| 975 | cases it is possible to put internal blanks between any of the character |
|---|
| 976 | values, so that |
|---|
| 977 | <P> |
|---|
| 978 | <TT>Archaeopt 0011001101 0111011100 |
|---|
| 979 | </TT> |
|---|
| 980 | <P> |
|---|
| 981 | is allowed. |
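<P>
The sequential-format rules can be illustrated with a minimal Python reader for single-character data such as 0/1 or nucleotide states. The function <TT>read_sequential</TT> is a hypothetical sketch, not PHYLIP's own reading code, and it assumes each species begins on a new line. It takes exactly ten characters as the name, then collects non-blank characters across line breaks until it has the number of characters given on the first line. A name cut short by a line break is reported as an error, which is why the blank after "Archaeopt" matters.

```python
# Hypothetical helper, not part of PHYLIP.
def read_sequential(text):
    """Read sequential data; return a list of (name, characters) pairs."""
    lines = text.splitlines()
    nspecies, nchar = (int(x) for x in lines[0].split()[:2])
    rest = "\n".join(lines[1:])
    pos = 0
    records = []
    for k in range(1, nspecies + 1):
        # Skip empty lines before the species name.
        while pos < len(rest) and rest[pos] == "\n":
            pos += 1
        name = rest[pos:pos + 10]
        if len(name) < 10 or "\n" in name:
            raise ValueError(f"species {k}: name is not a full ten characters")
        pos += 10
        chars = []
        while len(chars) < nchar:
            if pos >= len(rest):
                raise ValueError(f"species {k}: data ended after {len(chars)} characters")
            c = rest[pos]
            pos += 1
            if not c.isspace():  # blanks and line breaks are allowed anywhere
                chars.append(c)
        # Assumption: ignore the rest of the line; next species starts a new line.
        nl = rest.find("\n", pos)
        pos = len(rest) if nl == -1 else nl + 1
        records.append((name.rstrip(), "".join(chars)))
    return records
```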
|---|
| 982 | <P> |
|---|
| 983 | Note that you can convert molecular sequence data between the interleaved |
|---|
| 984 | and the sequential data formats by using the Rewrite option of the D |
|---|
| 985 | menu item in SEQBOOT. |
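<P>
SEQBOOT is the PHYLIP tool for this conversion. Purely to show what the conversion involves, here is a Python sketch (the function <TT>interleaved_to_sequential</TT> is hypothetical, not part of PHYLIP). It assumes that only the first group of lines carries the ten-character names, that every later group lists the species in the same order, and that blank separator lines may be ignored; it then writes one line per species in sequential format.

```python
# Hypothetical helper, not part of PHYLIP.
def interleaved_to_sequential(text):
    """Convert PHYLIP interleaved sequence data to sequential format."""
    lines = text.splitlines()
    nspecies, nchar = (int(x) for x in lines[0].split()[:2])
    body = [line for line in lines[1:] if line.strip()]  # drop blank separators
    names, seqs = [], []
    for i, line in enumerate(body):
        if i < nspecies:
            # First group: ten-character name, then sites (blanks removed).
            names.append(line[:10])
            seqs.append("".join(line[10:].split()))
        else:
            # Later groups: sites only, species in the same order.
            seqs[i % nspecies] += "".join(line.split())
    for name, seq in zip(names, seqs):
        if len(seq) != nchar:
            raise ValueError(f"{name.strip()}: {len(seq)} sites, expected {nchar}")
    out = [f"{nspecies} {nchar}"]
    for name, seq in zip(names, seqs):
        groups = " ".join(seq[i:i + 10] for i in range(0, len(seq), 10))
        out.append(name.ljust(10) + groups)
    return "\n".join(out) + "\n"
```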
|---|
| 986 | <P> |
|---|
| 987 | If you make an error in the format of the input file, the programs can |
|---|
| 988 | sometimes detect that |
|---|
| 989 | they have been fed an illegal character or illegal numerical value and issue |
|---|
| 990 | an error message such as <TT>BAD CHARACTER STATE:</TT>, often printing out the |
|---|
| 991 | bad value, and sometimes the number of the species and character in which it |
|---|
| 992 | occurred. The program will then stop shortly after. One of the things which |
|---|
| 993 | can lead to a bad value is the omission of something earlier in the file, or |
|---|
| 994 | the insertion of something superfluous, which causes the reading of the file to |
|---|
| 995 | get out of synchronization. The program then starts reading things it |
|---|
| 996 | didn't expect, and concludes that they are in error. So if you see this error |
|---|
| 997 | message, you may also want |
|---|
| 998 | to look for the earlier problem that may have led to the program becoming |
|---|
| 999 | confused about what it is reading. |
|---|
| 1000 | <P> |
|---|
| 1001 | Some options are described below, but you should also read the documentation |
|---|
| 1002 | for the groups of the programs and for the individual programs. |
|---|
| 1003 | <BR> |
|---|
| 1004 | <P> |
|---|
| 1005 | <A NAME="menu"><HR><P></A> |
|---|
| 1006 | <H3>The Menu</H3> |
|---|
| 1007 | <P> |
|---|
| 1008 | The menu is straightforward. It typically looks like this (this one is for |
|---|
| 1009 | DNAPARS): |
|---|
| 1010 | <P> |
|---|
| 1011 | <TABLE><TR><TD BGCOLOR=white> |
|---|
| 1012 | <PRE> |
|---|
| 1013 | DNA parsimony algorithm, version 3.6 |
|---|
| 1014 | |
|---|
| 1015 | Setting for this run: |
|---|
| 1016 | U Search for best tree? Yes |
|---|
| 1017 | S Search option? More thorough search |
|---|
| 1018 | V Number of trees to save? 100 |
|---|
| 1019 | J Randomize input order of sequences? No. Use input order |
|---|
| 1020 | O Outgroup root? No, use as outgroup species 1 |
|---|
| 1021 | T Use Threshold parsimony? No, use ordinary parsimony |
|---|
| 1022 | N Use Transversion parsimony? No, count all steps |
|---|
| 1023 | W Sites weighted? No |
|---|
| 1024 | M Analyze multiple data sets? No |
|---|
| 1025 | I Input sequences interleaved? Yes |
|---|
| 1026 | 0 Terminal type (IBM PC, ANSI, none)? (none) |
|---|
| 1027 | 1 Print out the data at start of run No |
|---|
| 1028 | 2 Print indications of progress of run Yes |
|---|
| 1029 | 3 Print out tree Yes |
|---|
| 1030 | 4 Print out steps in each site No |
|---|
| 1031 | 5 Print sequences at all nodes of tree No |
|---|
| 1032 | 6 Write out trees onto tree file? Yes |
|---|
| 1033 | |
|---|
| 1034 | Y to accept these or type the letter for one to change |
|---|
| 1035 | </PRE> |
|---|
| 1036 | </TD></TR></TABLE> |
|---|
| 1037 | <P> |
|---|
| 1038 | If you want to accept the default settings (they are shown in the above case) |
|---|
| 1039 | you can simply type <TT>Y</TT> and then press the <TT>Enter</TT> key. |
|---|
| 1040 | If you want to change any of the options, you should type the letter |
|---|
| 1041 | shown to the left of its entry in the menu. For example, to set a threshold |
|---|
| 1042 | type <TT>T</TT>. Lower-case letters will also work. For many of the options |
|---|
| 1043 | the program will ask for supplementary information, such as the value of |
|---|
| 1044 | the threshold. |
|---|
| 1045 | <P> |
|---|
| 1046 | Note the <TT>Terminal type</TT> entry, which you will find on all menus. It |
|---|
| 1047 | allows you to specify which type of terminal your screen is. The options |
|---|
| 1048 | are an IBM PC screen, an ANSI standard terminal, or <TT>none</TT>. |
|---|
| 1049 | Choosing zero (<TT>0</TT>) toggles |
|---|
| 1050 | among these three options in cyclical order, changing each time the <TT>0</TT> |
|---|
| 1051 | option is chosen. If one of them is right for your terminal the screen will be |
|---|
| 1052 | cleared before the menu is displayed. If none works, the <TT>none</TT> option |
|---|
| 1053 | should probably be chosen. The programs should start with a terminal option |
|---|
| 1054 | appropriate for your computer, but if they do not, you can change the |
|---|
| 1055 | terminal type manually. This is particularly important in program RETREE |
|---|
| 1056 | where a tree is displayed on the screen - if the terminal type is set to the |
|---|
| 1057 | wrong value, the tree can look very strange. |
|---|
| 1058 | <P> |
|---|
| 1059 | The other numbered options control which information the program will |
|---|
| 1060 | display on your screen or on the output files. The option to <TT>Print |
|---|
| 1061 | indications of progress of run</TT> will show information such as the names of |
|---|
| 1062 | the species as they are successively added to the tree, and the |
|---|
| 1063 | progress of rearrangements. You will usually want to see these as |
|---|
| 1064 | reassurance that the program is running and to help you estimate how long |
|---|
| 1065 | it will take. But if you are running the program "in background" as can be |
|---|
| 1066 | done on multitasking and multiuser systems, and do not have the |
|---|
| 1067 | program running in its own window, you may want to turn this option off so |
|---|
| 1068 | that it does not disturb your use of the computer while the program is |
|---|
| 1069 | running. |
|---|
| 1070 | <P> |
|---|
| 1071 | <A NAME="outputfile"><HR><P></A> |
|---|
| 1072 | <H2>The Output File</H2> |
|---|
| 1073 | <BR> |
|---|
| 1074 | <P> |
|---|
| 1075 | Most of the programs write their output to a file called (usually) <TT>outfile</TT>, and a representation of the trees they find to a file called |
|---|
| 1076 | <TT>outtree</TT>. |
|---|
| 1077 | <P> |
|---|
| 1078 | The exact contents of the output file vary from program to program and also |
|---|
| 1079 | depend on which menu options you have selected. For many programs, if you |
|---|
| 1080 | select all possible output information, the output will consist of |
|---|
| 1081 | (1) the name of the program and its |
|---|
| 1082 | version number, (2) some of the input information printed out, and (3) a series of |
|---|
| 1083 | phylogenies, some with associated information indicating how much change |
|---|
| 1084 | there was in each character or on each part of the tree. A typical rooted tree |
|---|
| 1085 | looks like this: |
|---|
| 1086 | <P> |
|---|
| 1087 | <TABLE><TR><TD BGCOLOR=white> |
|---|
| 1088 | <PRE> |
|---|
| 1089 | +-------------------Gibbon |
|---|
| 1090 | +----------------------------2 |
|---|
| 1091 | ! ! +------------------Orang |
|---|
| 1092 | ! +------4 |
|---|
| 1093 | ! ! +---------Gorilla |
|---|
| 1094 | +-----3 +--6 |
|---|
| 1095 | ! ! ! +---------Chimp |
|---|
| 1096 | ! ! +----5 |
|---|
| 1097 | --1 ! +-----Human |
|---|
| 1098 | ! ! |
|---|
| 1099 | ! +-----------------------------------------------Mouse |
|---|
| 1100 | ! |
|---|
| 1101 | +------------------------------------------------Bovine |
|---|
| 1102 | </PRE> |
|---|
| 1103 | </TD></TR></TABLE> |
|---|
| 1104 | <P> |
|---|
| 1105 | The interpretation of the tree is fairly straightforward: it "grows" |
|---|
| 1106 | from left to right. The numbers at the forks are arbitrary and are used (if |
|---|
| 1107 | present) merely to identify the forks. For many of the programs the tree |
|---|
| 1108 | produced is unrooted. Rooted and unrooted trees are printed in nearly the |
|---|
| 1109 | same form, but the unrooted ones are accompanied by the |
|---|
| 1110 | warning message: |
|---|
| 1111 | <P> |
|---|
| 1112 | <TT> remember: this is an unrooted tree! |
|---|
| 1113 | </TT> |
|---|
| 1114 | <P> |
|---|
| 1115 | to indicate that this is an unrooted tree and to warn against |
|---|
| 1116 | taking the position of its root too seriously. (Mathematicians still call |
|---|
| 1117 | an unrooted tree a tree, though some systematists unfortunately use the term |
|---|
| 1118 | "network" for an unrooted tree. This conflicts with standard mathematical |
|---|
| 1119 | usage, which reserves the name "network" for a completely different kind of |
|---|
| 1120 | graph.) The root of this tree could be anywhere, say on the line leading |
|---|
| 1121 | immediately to <TT>Mouse</TT>. As an exercise, |
|---|
| 1122 | see if you can tell whether the following tree is different |
|---|
| 1123 | from the one above: |
|---|
| 1124 | <P> |
|---|
| 1125 | <TABLE><TR><TD BGCOLOR=white> |
|---|
| 1126 | <PRE> |
|---|
| 1127 | +-----------------------------------------------Mouse |
|---|
| 1128 | ! |
|---|
| 1129 | +---------4 +------------------Orang |
|---|
| 1130 | ! ! +------3 |
|---|
| 1131 | ! ! ! ! +---------Chimp |
|---|
| 1132 | ---6 +----------------------------1 ! +----2 |
|---|
| 1133 | ! ! +--5 +-----Human |
|---|
| 1134 | ! ! ! |
|---|
| 1135 | ! ! +---------Gorilla |
|---|
| 1136 | ! ! |
|---|
| 1137 | ! +-------------------Gibbon |
|---|
| 1138 | ! |
|---|
| 1139 | +-------------------------------------------Bovine |
|---|
| 1140 | |
|---|
| 1141 | remember: this is an unrooted tree! |
|---|
| 1142 | </PRE> |
|---|
| 1143 | </TD></TR></TABLE> |
|---|
| 1144 | <P> |
|---|
| 1145 | (It is <I>not</I> different.) It is also <I>important</I> to realize that the |
|---|
| 1146 | lengths of the segments of the printed tree may not be significant: some |
|---|
| 1147 | may actually represent branches of zero length, in the sense that there is no |
|---|
| 1148 | evidence that |
|---|
| 1149 | those branches are nonzero in length. Some of the diagrams of trees attempt |
|---|
| 1150 | to print branches approximately proportional to estimated |
|---|
| 1151 | branch lengths, while in others the lengths are purely conventional and |
|---|
| 1152 | are presented just to make the topology visible. You will have to look closely |
|---|
| 1153 | at the documentation that accompanies each program to see what it presents |
|---|
| 1154 | and what is known about the lengths of the branches on the tree. The two trees |
|---|
| 1155 | above attempt to represent branch lengths approximately in the diagram. But |
|---|
| 1156 | even in such diagrams, some of the smaller branches are likely to be |
|---|
| 1157 | artificially lengthened to make the tree topology clearer. Here is what |
|---|
| 1158 | a tree from DNAPARS looks like when no attempt is made to make the |
|---|
| 1159 | lengths of branches in the diagram proportional to estimated branch |
|---|
| 1160 | lengths: |
|---|
| 1161 | <P> |
|---|
| 1162 | <TABLE><TR><TD BGCOLOR=white> |
|---|
| 1163 | <PRE> |
|---|
| 1164 | +--Human |
|---|
| 1165 | +--5 |
|---|
| 1166 | +--4 +--Chimp |
|---|
| 1167 | ! ! |
|---|
| 1168 | +--3 +-----Gorilla |
|---|
| 1169 | ! ! |
|---|
| 1170 | +--2 +--------Orang |
|---|
| 1171 | ! ! |
|---|
| 1172 | +--1 +-----------Gibbon |
|---|
| 1173 | ! ! |
|---|
| 1174 | --6 +--------------Mouse |
|---|
| 1175 | ! |
|---|
| 1176 | +-----------------Bovine |
|---|
| 1177 | |
|---|
| 1178 | remember: this is an unrooted tree! |
|---|
| 1179 | </PRE> |
|---|
| 1180 | </TD></TR></TABLE> |
|---|
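The exercise above can also be checked mechanically. A common approach is to compare the splits of the two trees. This Python sketch is illustrative and is not code from PHYLIP. Each branch divides the species into two sets, and two drawings show the same unrooted tree exactly when they produce the same set of splits, wherever the root is drawn. Holding trees as nested Python tuples of names is an assumption of the sketch.

```python
# Sketch only: decide whether two drawings show the same UNROOTED tree by
# comparing their nontrivial splits (bipartitions of the species).
# Trees are nested Python tuples of species names (an assumption of this sketch).

def leaves(tree):
    """Set of species names below a node."""
    if isinstance(tree, str):
        return {tree}
    names = set()
    for child in tree:
        names |= leaves(child)
    return names


def splits(tree):
    """Nontrivial splits, stored so that where the root is drawn does not matter."""
    everyone = frozenset(leaves(tree))
    ref = min(everyone)  # each split is stored as the side that lacks `ref`
    found = set()

    def walk(node):
        below = frozenset(leaves(node))
        side = everyone - below if ref in below else below
        if 1 < len(side) < len(everyone) - 1:  # skip splits that cut off one species
            found.add(side)
        if not isinstance(node, str):
            for child in node:
                walk(child)

    walk(tree)
    return found


def same_unrooted_tree(a, b):
    return leaves(a) == leaves(b) and splits(a) == splits(b)


# The two diagrams above, transcribed by hand (interior node numbers dropped).
first = ((("Gibbon", ("Orang", ("Gorilla", ("Chimp", "Human")))), "Mouse"), "Bovine")
second = (("Mouse", (("Orang", (("Chimp", "Human"), "Gorilla")), "Gibbon")), "Bovine")
print(same_unrooted_tree(first, second))  # True
```

An unrooted bifurcating tree of seven species has four such splits. Moving the root changes the drawing but not the splits, so the comparison ignores it.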
| 1181 | <P> |
|---|
| 1182 | When a tree has branch lengths, it will be accompanied by a table showing, |
|---|
| 1183 | for each branch, the numbers (or names) of the nodes at each end of the |
|---|
| 1184 | branch and the length of that branch. For an unrooted tree with the same topology as |
|---|
| 1185 | the first tree shown above (its interior nodes are numbered differently from those in the diagram), the table is: |
|---|
| 1186 | <P> |
|---|
| 1187 | <TABLE><TR><TD BGCOLOR=white> |
|---|
| 1188 | <PRE> |
|---|
| 1189 | Between And Length Approx. Confidence Limits |
|---|
| 1190 | ------- --- ------ ------- ---------- ------ |
|---|
| 1191 | |
|---|
| 1192 | 1 Bovine 0.90216 ( 0.50346, 1.30086) ** |
|---|
| 1193 | 1 Mouse 0.79240 ( 0.42191, 1.16297) ** |
|---|
| 1194 | 1 2 0.48553 ( 0.16602, 0.80496) ** |
|---|
| 1195 | 2 3 0.12113 ( zero, 0.24676) * |
|---|
| 1196 | 3 4 0.04895 ( zero, 0.12668) |
|---|
| 1197 | 4 5 0.07459 ( 0.00735, 0.14180) ** |
|---|
| 1198 | 5 Human 0.10563 ( 0.04234, 0.16889) ** |
|---|
| 1199 | 5 Chimp 0.17158 ( 0.09765, 0.24553) ** |
|---|
| 1200 | 4 Gorilla 0.15266 ( 0.07468, 0.23069) ** |
|---|
| 1201 | 3 Orang 0.30368 ( 0.18735, 0.41999) ** |
|---|
| 1202 | 2 Gibbon 0.33636 ( 0.19264, 0.48009) ** |
|---|
| 1203 | |
|---|
| 1204 | * = significantly positive, P < 0.05 |
|---|
| 1205 | ** = significantly positive, P < 0.01 |
|---|
| 1206 | </PRE> |
|---|
| 1207 | </TD></TR></TABLE> |
|---|
| 1208 | <P> |
|---|
| 1209 | Ignoring the asterisks and the approximate confidence limits, which are |
|---|
| 1210 | described in the documentation file for DNAML, we can see that the table |
|---|
| 1211 | gives a more precise idea of what the lengths of all the branches are. |
|---|
| 1212 | Similar tables exist in distance matrix and likelihood programs, as well |
|---|
| 1213 | as in the parsimony programs DNAPARS and PARS. |
|---|
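The following Python sketch shows one way to use such a table. It is not part of PHYLIP. It reads the rows of the table above into an edge list, adds up the branch lengths, and finds the length of the path between two species. It assumes the column layout shown above and ignores the confidence limits and asterisks.

```python
# Sketch only: read the "Between / And / Length" rows of a branch-length
# table into an edge list. The layout is assumed to match the example above.
from collections import defaultdict

TABLE = """\
  Between        And            Length      Approx. Confidence Limits
  -------        ---            ------      ------- ---------- ------
     1          Bovine            0.90216     (  0.50346,     1.30086) **
     1          Mouse             0.79240     (  0.42191,     1.16297) **
     1             2              0.48553     (  0.16602,     0.80496) **
     2             3              0.12113     (     zero,     0.24676) *
     3             4              0.04895     (     zero,     0.12668)
     4             5              0.07459     (  0.00735,     0.14180) **
     5          Human             0.10563     (  0.04234,     0.16889) **
     5          Chimp             0.17158     (  0.09765,     0.24553) **
     4          Gorilla           0.15266     (  0.07468,     0.23069) **
     3          Orang             0.30368     (  0.18735,     0.41999) **
     2          Gibbon            0.33636     (  0.19264,     0.48009) **

     *  = significantly positive, P < 0.05
     ** = significantly positive, P < 0.01
"""


def parse_branch_table(text):
    """Return (node, node, length) for every row whose third field is a number."""
    edges = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        try:
            length = float(fields[2])
        except ValueError:
            continue  # header, underline, or footnote line
        edges.append((fields[0], fields[1], length))
    return edges


def path_length(edges, start, goal):
    """Sum of branch lengths on the unique path between two nodes of a tree."""
    graph = defaultdict(list)
    for a, b, length in edges:
        graph[a].append((b, length))
        graph[b].append((a, length))
    stack = [(start, None, 0.0)]  # depth-first search; a tree has no cycles
    while stack:
        node, parent, dist = stack.pop()
        if node == goal:
            return dist
        for nxt, length in graph[node]:
            if nxt != parent:
                stack.append((nxt, node, dist + length))
    raise KeyError(goal)


edges = parse_branch_table(TABLE)
print(round(sum(length for _, _, length in edges), 5))  # 3.49467
print(round(path_length(edges, "Human", "Chimp"), 5))   # 0.27721
```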
| 1214 | <P> |
|---|
| 1215 | Some of the parsimony programs in the package can print out a table |
|---|
| 1216 | of the number of steps that different characters (or sites) require on |
|---|
| 1217 | the tree. This table may not be self-explanatory at first. A typical example looks like |
|---|
| 1218 | this: |
|---|
| 1219 | <P> |
|---|
| 1220 | <TABLE><TR><TD BGCOLOR=white> |
|---|
| 1221 | <PRE> |
|---|
| 1222 | steps in each site: |
|---|
| 1223 | 0 1 2 3 4 5 6 7 8 9 |
|---|
| 1224 | *----------------------------------------- |
|---|
| 1225 | 0! 2 2 2 2 1 1 2 2 1 |
|---|
| 1226 | 10! 1 2 3 1 1 1 1 1 1 2 |
|---|
| 1227 | 20! 1 2 2 1 2 2 1 1 1 2 |
|---|
| 1228 | 30! 1 2 1 1 1 2 1 3 1 1 |
|---|
| 1229 | 40! 1 |
|---|
| 1230 | </PRE> |
|---|
| 1231 | </TD></TR></TABLE> |
|---|
| 1232 | <P> |
|---|
| 1233 | The numbers across the top and down the side indicate which site |
|---|
| 1234 | is being referred to. Thus site 23 is column "3" of row "20" |
|---|
| 1235 | and has 1 step in this case. |
|---|
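To read the table, add the row label to the column label to get the site number. The Python sketch below is only an illustration. It turns the example table into a mapping from site number to steps. It assumes that sites are numbered from 1, so the row labelled 0 has no entry under column 0.

```python
STEPS = """\
 steps in each site:
         0   1   2   3   4   5   6   7   8   9
      *-----------------------------------------
     0!       2   2   2   2   1   1   2   2   1
    10!   1   2   3   1   1   1   1   1   1   2
    20!   1   2   2   1   2   2   1   1   1   2
    30!   1   2   1   1   1   2   1   3   1   1
    40!   1
"""


def parse_steps(text):
    """Map each site number to its number of steps (site = row label + column)."""
    steps = {}
    for line in text.splitlines():
        label, bang, values = line.partition("!")
        if not bang or not label.strip().isdigit():
            continue  # title, column header or rule line
        row = int(label)
        first_column = 1 if row == 0 else 0  # assumed: there is no site 0
        for column, value in enumerate(values.split(), start=first_column):
            steps[row + column] = int(value)
    return steps


steps = parse_steps(STEPS)
print(steps[23])            # 1, as read off the table above
print(sum(steps.values()))  # 59 steps in total over 40 sites
```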
| 1236 | <P> |
|---|
| 1237 | There are many other kinds of information that can appear in the |
|---|
| 1238 | output file. They vary from program to program, and we leave their |
|---|
| 1239 | description to the documentation files for the specific programs. |
|---|
| 1240 | <P> |
|---|
| 1241 | <A NAME="treefile"><HR><P></A> |
|---|
| 1242 | <H2>The Tree File</H2> |
|---|
| 1243 | <P> |
|---|
| 1244 | Most programs also write |
|---|
| 1245 | a representation of the tree into the tree file |
|---|
| 1246 | <TT>outtree</TT>. The tree is specified by nested pairs |
|---|
| 1247 | of parentheses, enclosing |
|---|
| 1248 | names and separated by commas. We will describe how this works |
|---|
| 1249 | below. If there are any blanks in the names, |
|---|
| 1250 | these must be replaced by the underscore character "<TT>_</TT>". Trailing blanks |
|---|
| 1251 | in the name may be omitted. The pattern of the parentheses indicates |
|---|
| 1252 | the pattern of the tree by having each pair of parentheses enclose all |
|---|
| 1253 | the members of a monophyletic group. The tree file could look like this: |
|---|
| 1254 | <P> |
|---|
| 1255 | <TT>((Mouse,Bovine),(Gibbon,(Orang,(Gorilla,(Chimp,Human))))); |
|---|
| 1256 | </TT> |
|---|
| 1257 | <P> |
|---|
| 1258 | In this tree the first fork separates the lineage leading to |
|---|
| 1259 | <TT>Mouse</TT> and <TT>Bovine</TT> from the lineage leading to the rest. Within the |
|---|
| 1260 | latter group there is a fork separating <TT>Gibbon</TT> from the rest, and so on. |
|---|
| 1261 | The entire tree is enclosed in an outermost pair of parentheses. The tree ends |
|---|
| 1262 | with a semicolon. In some programs, such as DNAML, FITCH, and CONTML, |
|---|
| 1263 | the tree will be unrooted. An unrooted tree should have a |
|---|
| 1264 | three-way split at its bottommost fork, |
|---|
| 1265 | with three groups separated by two commas: |
|---|
| 1266 | <P> |
|---|
| 1267 | <TT>(A,(B,(C,D)),(E,F)); |
|---|
| 1268 | </TT> |
|---|
| 1269 | <P> |
|---|
| 1270 | Here the three groups at the bottom node are <TT>A</TT>, <TT>(B,(C,D))</TT>, and |
|---|
| 1271 | <TT>(E,F)</TT>. The single three-way split corresponds to one of the interior |
|---|
| 1272 | nodes of the unrooted tree (it can be any interior node of the tree). The |
|---|
| 1273 | remaining forks are encountered as you move out from that first node. |
|---|
| 1274 | Some newer programs are able to tolerate these other forks being |
|---|
| 1275 | multifurcations (multi-way splits). |
|---|
| 1276 | You should check the documentation files |
|---|
| 1277 | for the particular programs you are using to see which of these forms |
|---|
| 1278 | they expect the user tree to be in. Note that many of the programs |
|---|
| 1279 | that actually estimate an unrooted tree (such as DNAPARS) produce trees in the |
|---|
| 1280 | treefile in rooted form! This is done for reasons of arbitrary internal bookkeeping. The placement of the root is arbitrary. We are working toward |
|---|
| 1281 | having all programs be able to read all trees, whether rooted or unrooted, |
|---|
| 1282 | multifurcating or bifurcating, and having them do the right thing with |
|---|
| 1283 | them. But this is a long-term goal and it is not yet achieved. |
|---|
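The Python sketch below is an illustration, not PHYLIP's own code. It writes a tree held as nested tuples in this parenthesis notation, replacing blanks in names by underscores. It also counts the groups joined at the bottommost fork of a tree string: two for a rooted bifurcating tree, and three for an unrooted tree written in the form above. It assumes that names contain no parentheses, commas, colons, or semicolons.

```python
def to_newick(tree):
    """Write a nested-tuple tree in parenthesis notation, ending with ';'."""
    def write(node):
        if isinstance(node, str):
            # Trailing blanks may be omitted; other blanks become underscores.
            return node.rstrip().replace(" ", "_")
        return "(" + ",".join(write(child) for child in node) + ")"
    return write(tree) + ";"


def bottom_fork_groups(newick):
    """Number of groups joined by the outermost pair of parentheses."""
    depth, groups = 0, 1
    for ch in newick:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 1:
            groups += 1
    return groups


rooted = to_newick((("Mouse", "Bovine"), ("Gibbon", ("Orang", ("Gorilla", ("Chimp", "Human"))))))
print(rooted)                                      # ((Mouse,Bovine),(Gibbon,(Orang,(Gorilla,(Chimp,Human)))));
print(bottom_fork_groups(rooted))                  # 2
print(bottom_fork_groups("(A,(B,(C,D)),(E,F));"))  # 3
```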
| 1284 | <P> |
|---|
| 1285 | For programs that infer branch lengths, these are given in the trees in the |
|---|
| 1286 | tree file as real numbers following a colon, and placed immediately |
|---|
| 1287 | after the group descended from that branch. Here is a typical tree |
|---|
| 1288 | with branch lengths: |
|---|
| 1289 | <P> |
|---|
| 1290 | <TT>((cat:47.14069,(weasel:18.87953,((dog:25.46154,(raccoon:19.19959,<BR> |
|---|
| 1291 | bear:6.80041):0.84600):3.87382,(sea_lion:11.99700,<BR> |
|---|
| 1292 | seal:12.00300):7.52973):2.09461):20.59201):25.0,monkey:75.85931); |
|---|
| 1293 | </TT> |
|---|
| 1294 | <P> |
|---|
| 1295 | Note that the tree may continue to a new line at any time except in the |
|---|
| 1296 | middle of a name or the middle of a branch length, although in trees |
|---|
| 1297 | written to the tree file this will only be done after a comma. |
|---|
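A small recursive-descent reader for trees in this notation might look like the Python sketch below. It handles branch lengths and trees that continue onto new lines. It is an illustration rather than the parser the programs use. It assumes names never contain blanks (they are written with underscores), so all whitespace can be discarded before parsing.

```python
def parse_newick(text):
    """Parse a tree string into nested dicts with 'name', 'length', 'children'."""
    s = "".join(text.split())  # line breaks may fall between any two tokens
    if not s.endswith(";"):
        raise ValueError("a tree must end with ';'")
    pos = 0

    def node():
        nonlocal pos
        result = {"name": "", "length": None, "children": []}
        if s[pos] == "(":
            pos += 1
            result["children"].append(node())
            while s[pos] == ",":
                pos += 1
                result["children"].append(node())
            if s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
        start = pos
        while s[pos] not in ",():;":
            pos += 1
        result["name"] = s[start:pos]
        if s[pos] == ":":  # length of the branch from which this group descends
            pos += 1
            start = pos
            while s[pos] not in ",();":
                pos += 1
            result["length"] = float(s[start:pos])
        return result

    tree = node()
    if pos != len(s) - 1:
        raise ValueError(f"unexpected text at position {pos}")
    return tree


def leaf_lengths(node):
    """Map each tip name to the length of the branch leading to it."""
    if not node["children"]:
        return {node["name"]: node["length"]}
    out = {}
    for child in node["children"]:
        out.update(leaf_lengths(child))
    return out


TREE = """((cat:47.14069,(weasel:18.87953,((dog:25.46154,(raccoon:19.19959,
bear:6.80041):0.84600):3.87382,(sea_lion:11.99700,
seal:12.00300):7.52973):2.09461):20.59201):25.0,monkey:75.85931);"""

lengths = leaf_lengths(parse_newick(TREE))
print(len(lengths))       # 8
print(lengths["monkey"])  # 75.85931
```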
| 1298 | <P> |
|---|
| 1299 | These representations of trees are a subset of the standard adopted |
|---|
| 1300 | on 24 June 1986 at the annual meetings of the Society for the Study of |
|---|
| 1301 | Evolution by an informal committee (its final session was held in Newick's |
|---|
| 1302 | lobster restaurant, hence its name, the Newick standard) |
|---|
| 1303 | consisting of Wayne Maddison (author of MacClade), David Swofford (PAUP), |
|---|
| 1304 | F. James Rohlf (NTSYS-PC), Chris Meacham (COMPROB and the original |
|---|
| 1305 | PHYLIP tree drawing programs), James Archie, |
|---|
| 1306 | William H.E. Day, and me. This standard is a generalization of |
|---|
| 1307 | PHYLIP's format, itself based on a well-known representation of trees in |
|---|
| 1308 | terms of parenthesis patterns which is due to the famous mathematician |
|---|
| 1309 | Arthur Cayley, and which has been around for over a century. The |
|---|
| 1310 | standard is now employed by most phylogeny computer programs but unfortunately |
|---|
| 1311 | has yet to be given a formal published description. Other |
|---|
| 1312 | descriptions by me and by Gary Olsen can be accessed using the Web at: |
|---|
| 1313 | <P> |
|---|
| 1314 | <DIV ALIGN="CENTER"> |
|---|
| 1315 | <FONT SIZE=+2><A HREF="http://evolution.gs.washington.edu/phylip/newicktree.html"> |
|---|
| 1316 | <TT>http://evolution.gs.washington.edu/phylip/newicktree.html</TT></A></FONT> |
|---|
| 1317 | </DIV> |
|---|
| 1318 | <P> |
|---|
| 1319 | <A NAME="options"><HR><P></A> |
|---|
| 1320 | <H2>The Options and How To Invoke Them</H2> |
|---|
| 1321 | <P> |
|---|
| 1322 | Most of the programs allow various options that alter the amount of |
|---|
| 1323 | information the program is provided or what is done with the |
|---|
| 1324 | information. Options are selected in the menu. |
|---|
| 1325 | <P> |
|---|
| 1326 | <H3>Common options in the menu</H3> |
|---|
| 1327 | <P> |
|---|
| 1328 | A number of the options from the menu, the <TT>U</TT> (User tree), <TT>G</TT> (Global), |
|---|
| 1329 | <TT>J</TT> (Jumble), <TT>O</TT> (Outgroup), <TT>W</TT> (Weights), |
|---|
| 1330 | <TT>T</TT> (Threshold), <TT>M</TT> (multiple data sets), and the tree output options, are used |
|---|
| 1331 | so widely that it is best to discuss them in this document. |
|---|
| 1332 | <P> |
|---|
| 1333 | <B>The <TT>U</TT> (User tree) option.</B> This option toggles between the default |
|---|
| 1334 | setting, which allows the program to search for the best tree, and the |
|---|
| 1335 | User tree setting, which reads a tree or trees ("user trees") from the input |
|---|
| 1336 | tree file and evaluates them. The input tree file's |
|---|
| 1337 | default name is <TT>intree</TT>. In a few cases the trees should |
|---|
| 1338 | be preceded by a line giving the number of trees: |
|---|
| 1339 | <P> |
|---|
| 1340 | <TABLE><TR><TD BGCOLOR=white> |
|---|
| 1341 | <PRE> |
|---|
| 1342 | 3 |
|---|
| 1343 | ((Alligator,Bear),((Cow,(Dog,Elephant)),Ferret)); |
|---|
| 1344 | ((Alligator,Bear),(((Cow,Dog),Elephant),Ferret)); |
|---|
| 1345 | ((Alligator,Bear),((Cow,Dog),(Elephant,Ferret))); |
|---|
| 1346 | </PRE> |
|---|
| 1347 | </TD></TR></TABLE> |
|---|
| 1348 | <P> |
|---|
| 1349 | while in most cases the initial line with the number of trees is not |
|---|
| 1350 | required. This is an inconsistency in the programs that we are intending |
|---|
| 1351 | to eliminate soon. Some programs require rooted trees, some unrooted |
|---|
| 1352 | trees, and some can handle multifurcating trees. You should read |
|---|
| 1353 | the documentation for the particular program to find out which it |
|---|
| 1354 | requires. Program RETREE can be used to convert trees among |
|---|
| 1355 | these forms (on saving a tree from RETREE, you are asked whether |
|---|
| 1356 | you want it to be rooted or unrooted). |
|---|
| 1357 | <P> |
|---|
| 1358 | In using the user tree option, check the pattern of parentheses |
|---|
| 1359 | carefully. The programs do not always detect |
|---|
| 1360 | whether the tree makes sense, and if it does not there will probably be |
|---|
| 1361 | a crash (hopefully, but not inevitably, with an error message indicating |
|---|
| 1362 | the nature of the problem). Trees written out by programs are |
|---|
| 1363 | typically in the proper form. |
|---|
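A malformed user tree may crash a program, so it can be worth checking the parentheses before a run. The Python sketch below is a hypothetical helper, not a PHYLIP tool. It reports unmatched parentheses, a missing final semicolon, and empty names or groups, which it treats as errors. Positions refer to the tree string with blanks removed.

```python
def check_parentheses(tree):
    """Return a list of problems found in a tree string (empty if none)."""
    s = "".join(tree.split())  # names use underscores, so blanks are not significant
    problems = []
    depth = 0
    for i, ch in enumerate(s):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                problems.append(f"unmatched ')' at position {i}")
                depth = 0
    if depth > 0:
        problems.append(f"{depth} unclosed '('")
    if not s.endswith(";"):
        problems.append("missing final ';'")
    if any(bad in s for bad in ("()", ",,", "(,", ",)")):
        problems.append("empty name or group")
    return problems


print(check_parentheses("((Alligator,Bear),((Cow,(Dog,Elephant)),Ferret));"))  # []
print(check_parentheses("((Alligator,Bear),(Cow,Dog);"))  # ["1 unclosed '('"]
```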
| 1364 | <P> |
|---|
| 1365 | Some of the programs require that the user trees be preceded by a line with the |
|---|
| 1366 | number of user trees. Some require that they <EM>not</EM> be preceded by |
|---|
| 1367 | this line, and many can tolerate either. I have tried to note for |
|---|
| 1368 | each of these programs which of these forms of the user tree file |
|---|
| 1369 | is appropriate. We hope to bring all programs to the same user tree file |
|---|
| 1370 | format as soon as possible. |
|---|
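The Python sketch below shows one way to cope with both forms of user tree file. It is a hypothetical helper, not part of the package. If the first line is a plain integer, it is taken as the count of trees and checked. The remaining text is split into trees at each semicolon. It assumes names contain no blanks, so line breaks inside a tree can simply be removed.

```python
TREES = """3
((Alligator,Bear),((Cow,(Dog,Elephant)),Ferret));
((Alligator,Bear),(((Cow,Dog),Elephant),Ferret));
((Alligator,Bear),((Cow,Dog),(Elephant,Ferret)));
"""


def read_user_trees(text):
    """Return the trees in a user-tree file, with or without a count line."""
    body = text.strip()
    first_line, _, rest = body.partition("\n")
    expected = None
    if first_line.strip().isdigit():
        expected = int(first_line)
        body = rest
    trees = []
    for chunk in body.split(";"):
        chunk = "".join(chunk.split())  # drop line breaks within a tree
        if chunk:
            trees.append(chunk + ";")
    if expected is not None and expected != len(trees):
        raise ValueError(f"count line says {expected} trees, found {len(trees)}")
    return trees


with_count = read_user_trees(TREES)
without_count = read_user_trees(TREES.split("\n", 1)[1])
print(len(with_count), with_count == without_count)  # 3 True
```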
| 1371 | <P> |
|---|
| 1372 | <B>The <TT>G</TT> (Global) option.</B> In the programs which construct trees (except for |
|---|
| 1373 | NEIGHBOR, the "...PENNY" programs and CLIQUE, and of course |
|---|
| 1374 | the "...MOVE" programs where you construct the trees yourself), |
|---|
| 1375 | after all species have been added to the tree, a rearrangement phase |
|---|
| 1376 | ensues. In most of these programs the rearrangements are automatically |
|---|
| 1377 | global, which in this case means that subtrees will be removed from the tree |
|---|
| 1378 | and put back on in all possible ways so as to have a better chance of |
|---|
| 1379 | finding a better tree. Since this can be time consuming (it roughly |
|---|
| 1380 | triples the time taken for a run) it is left as an option in some of the |
|---|
| 1381 | programs, specifically CONTML, FITCH, and DNAML. In these programs |
|---|
| 1382 | the G menu option toggles between the default of local rearrangement and |
|---|
| 1383 | global rearrangement. The rearrangements are explained more below. |
|---|
| 1384 | <P> |
|---|
| 1385 | <B>The <TT>J</TT> (Jumble) option.</B> In most of the tree construction programs |
|---|
| 1386 | (except for the "...PENNY" programs and CLIQUE), the exact |
|---|
| 1387 | details of the search of different trees depend on the order of input of |
|---|
| 1388 | species. In these programs the <TT>J</TT> option enables you to tell the program to use |
|---|
| 1389 | a random number |
|---|
| 1390 | generator to choose the input order of species. This option is toggled on |
|---|
| 1391 | and off by |
|---|
| 1392 | selecting option <TT>J</TT> in the menu. The program will then prompt you for |
|---|
| 1393 | a "seed" for the random number generator. The seed should be an integer |
|---|
| 1394 | between 1 and 32767, and should be of the form 4<I>n</I>+1, |
|---|
| 1395 | which means that it must give a remainder of 1 when divided by 4. This can be |
|---|
| 1396 | judged by looking at the last two digits of the number. Each different seed |
|---|
| 1397 | leads to a different sequence of addition of species. By simply changing the |
|---|
| 1398 | random number seed and re-running the programs, one can look for other and |
|---|
| 1399 | better trees. If the seed entered is not odd, the program will not proceed, |
|---|
| 1400 | but will prompt for another seed. |
|---|
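The seed rule above is easy to check. In this Python sketch the function names are illustrative. The second function shows why the last two digits are enough: 100 is a multiple of 4, so the remainder on division by 4 depends only on the last two digits.

```python
def is_valid_seed(seed):
    """True if 1 <= seed <= 32767 and seed has the form 4n+1."""
    return 1 <= seed <= 32767 and seed % 4 == 1


def last_two_digits_test(seed):
    # 100 = 4 * 25, so a number and its last two digits leave the same remainder mod 4.
    return (seed % 100) % 4 == 1


print(is_valid_seed(133))  # True  (133 = 4*33 + 1)
print(is_valid_seed(135))  # False (135 = 4*33 + 3)
```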
| 1401 | <P> |
|---|
| 1402 | The Jumble option also causes the program to ask you how many times you |
|---|
| 1403 | want to restart the process. If you answer 10, the program will |
|---|
| 1404 | try ten different orders of species in constructing the trees, and the |
|---|
| 1405 | results printed out will reflect this entire search process (that is, |
|---|
| 1406 | the best trees found among all 10 runs will be printed out, not the |
|---|
| 1407 | best trees from each individual run). |
|---|
| 1408 | <P> |
|---|
| 1409 | Some people have asked what good values of the random number seed are. |
|---|
| 1410 | The random number seed is used to start a process of choosing "random" |
|---|
| 1411 | (actually pseudorandom) numbers, which behave as if they were |
|---|
| 1412 | unpredictably randomly chosen between 0 and 2<SUP>32</SUP>-1 (which is |
|---|
| 1413 | 4,294,967,295). You could put in the number 133 and find that the |
|---|
| 1414 | next random number was 1,876,973,009. As they are effectively |
|---|
| 1415 | unpredictable, there is no such thing as a choice that is better than |
|---|
| 1416 | any other, provided that the numbers are of the form 4<I>n</I>+1. However, |
|---|
| 1417 | if you re-use a random number seed, the sequence of random numbers |
|---|
| 1418 | that result will be the same as before, resulting in exactly the same |
|---|
| 1419 | series of choices, which may not be what you want. |
|---|
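The Python sketch below illustrates the two points above. First, a given seed always reproduces the same sequence of species orders. Second, the best result is kept over all restarts rather than per restart. It uses Python's own random number generator, not the one in PHYLIP, so its orders will differ from the programs' orders. The `search` argument is a hypothetical stand-in for a tree search.

```python
import random


def jumbled_orders(species, seed, times):
    """Yield `times` shuffled addition orders from one seeded generator."""
    rng = random.Random(seed)  # Python's generator, NOT the one PHYLIP uses
    for _ in range(times):
        order = list(species)
        rng.shuffle(order)
        yield order


def best_over_all_runs(species, seed, times, search):
    """Keep the best result over all restarts, not the best of each restart.

    `search` is a hypothetical stand-in for a tree search: it takes an addition
    order and returns (score, tree), where a lower score is better.
    """
    best_score, best_trees = None, []
    for order in jumbled_orders(species, seed, times):
        score, tree = search(order)
        if best_score is None or score < best_score:
            best_score, best_trees = score, [tree]
        elif score == best_score and tree not in best_trees:
            best_trees.append(tree)
    return best_score, best_trees


species = ["Bovine", "Mouse", "Gibbon", "Orang", "Gorilla", "Chimp", "Human"]
first = list(jumbled_orders(species, 133, 10))
second = list(jumbled_orders(species, 133, 10))
print(first == second)  # True: re-using a seed repeats exactly the same orders
```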
| 1420 | <P> |
|---|
| 1421 | <B>The <TT>O</TT> (Outgroup) option.</B> This specifies which species is to be used |
|---|
| 1422 | to root the tree by having it become the outgroup. This option is |
|---|
| 1423 | toggled on and off by choosing <TT>O</TT> in the menu (the alphabetic |
|---|
| 1424 | character <TT>O</TT>, not the digit <TT>0</TT>). When it is on, the program will |
|---|
| 1425 | then prompt for the |
|---|
| 1426 | number of the outgroup (the species being taken in the numerical order that |
|---|
| 1427 | they occur in the input file). Responding by typing <TT>6</TT> and then an |
|---|
| 1428 | <TT>Enter</TT> character indicates that the sixth species in the data |
|---|
| 1429 | is the outgroup. Outgroup-rooting will not be attempted if the |
|---|
| 1430 | data have already established a root for the tree from some other |
|---|
| 1431 | consideration, and may not be if it is a user-defined tree, |
|---|
| 1432 | despite your invoking the option. Thus programs such as DOLLOP that |
|---|
| 1433 | produce only rooted trees do not allow the Outgroup option. It is also |
|---|
| 1434 | not available in KITSCH, DNAMLK, or CLIQUE. When it is used, the tree as |
|---|
| 1435 | printed out is still listed as being an |
|---|
| 1436 | unrooted tree, though the outgroup is connected to the bottommost node |
|---|
| 1437 | so that it is easy to visually convert the tree into rooted form. |
|---|
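The Python sketch below illustrates what outgroup rooting does to the drawing. It is not PHYLIP code. It redraws an unrooted tree so that the outgroup hangs from the bottommost fork. The edges come from the branch-length table shown earlier; the input order of species is hypothetical. As in the programs, the result still has a three-way split at the bottom, so it remains an unrooted tree.

```python
from collections import defaultdict


def reroot_on_outgroup(edges, species, outgroup_number):
    """Return a nested-tuple tree with the outgroup attached to the bottom fork.

    edges: (node, node) pairs of an unrooted tree.
    species: names in input-file order; outgroup_number is 1-based, as in the menu.
    """
    outgroup = species[outgroup_number - 1]
    graph = defaultdict(list)
    for a, b in edges:
        graph[a].append(b)
        graph[b].append(a)

    def subtree(node, parent):
        children = [n for n in graph[node] if n != parent]
        if not children:
            return node  # a tip: return its name
        return tuple(subtree(c, node) for c in children)

    (bottom,) = graph[outgroup]  # a tip has exactly one neighbour
    others = tuple(subtree(n, bottom) for n in graph[bottom] if n != outgroup)
    return (outgroup,) + others


edges = [("1", "Bovine"), ("1", "Mouse"), ("1", "2"), ("2", "3"), ("3", "4"),
         ("4", "5"), ("5", "Human"), ("5", "Chimp"), ("4", "Gorilla"),
         ("3", "Orang"), ("2", "Gibbon")]
species = ["Bovine", "Mouse", "Gibbon", "Orang", "Gorilla", "Chimp", "Human"]
print(reroot_on_outgroup(edges, species, 2))  # Mouse is species number 2 here
```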
| 1438 | <P> |
|---|
| 1439 | <B>The <TT>T</TT> (Threshold) option.</B> This sets a threshold for the |
|---|
| 1440 | parsimony programs such that if the |
|---|
| 1441 | number of steps counted in a character is higher than the threshold, it |
|---|
| 1442 | will be taken to be the threshold value rather than the actual number of |
|---|
| 1443 | steps. The default is a threshold so high that it will never be |
|---|
| 1444 | surpassed (in which case the steps will simply be counted). The <TT>T</TT> |
|---|
| 1445 | menu option toggles on and off asking the user to |
|---|
| 1446 | supply a threshold. The use of thresholds to obtain methods intermediate |
|---|
| 1447 | between parsimony and compatibility methods is described in my 1981b paper. |
|---|
| 1448 | When the T option is in force, the program |
|---|
| 1449 | will prompt for the numerical threshold value. This will be a positive |
|---|
| 1450 | real number greater than 1. In programs MIX, MOVE, PENNY, PROTPARS, |
|---|
| 1451 | DNAPARS, DNAMOVE, and DNAPENNY, do not use threshold values less |
|---|
| 1452 | than or equal to 1.0, as they have no meaning and lead to a tree which |
|---|
| 1453 | depends only on considerations such as the input order of species and not at |
|---|
| 1454 | all on the character state data! In programs DOLLOP, DOLMOVE, and DOLPENNY |
|---|
| 1455 | the threshold should never be 0.0 or less, for the same |
|---|
| 1456 | reason. The <TT>T</TT> option is an |
|---|
| 1457 | important and underutilized one: it is, for example, the only way in this |
|---|
| 1458 | package (except for program DNACOMP) to do a compatibility analysis when there |
|---|
| 1459 | are missing data. It is a method of de-weighting characters that evolve |
|---|
| 1460 | rapidly. I wish more people were aware of its properties. |
|---|
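The effect of a threshold on the count of steps fits in a few lines. In this Python sketch the step counts are made up. Each character contributes the smaller of its number of steps and the threshold, multiplied by its weight.

```python
def thresholded_length(steps, threshold, weights=None):
    """Total steps when each character counts min(steps, threshold) times its weight."""
    if weights is None:
        weights = [1] * len(steps)
    return sum(w * min(s, threshold) for s, w in zip(steps, weights))


# Made-up step counts for twelve characters on one tree.
steps = [2, 2, 2, 2, 1, 1, 2, 2, 1, 1, 2, 3]
print(thresholded_length(steps, float("inf")))  # 21 (no threshold: plain count)
print(thresholded_length(steps, 2))             # 20 (the 3-step character counts as 2)
print(thresholded_length(steps, 1.5))           # 16.0
```

The sketch also shows why a threshold of 1 or less is meaningless in programs such as MIX and DNAPARS. Every character that varies needs at least one step on any tree, so it contributes the same capped amount whatever the tree is.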
| 1461 | <P> |
|---|
| 1462 | <B>The <TT>M</TT> (Multiple data sets) option.</B> In menu programs there is an |
|---|
| 1463 | <TT>M</TT> menu |
|---|
| 1464 | option which allows one to toggle on the multiple data sets option. The |
|---|
| 1465 | program will ask you how many data sets it should expect. The data sets |
|---|
| 1466 | have the same format as the first data set. Here is a (very small) input file |
|---|
| 1467 | with two five-species data sets: |
|---|
| 1468 | <P> |
|---|
| 1469 | <TABLE><TR><TD bgcolor=white> |
|---|
| 1470 | <PRE> |
|---|
| 1471 | 5 6 |
|---|
| 1472 | Alpha CCACCA |
|---|
| 1473 | Beta CCAAAA |
|---|
| 1474 | Gamma CAACCA |
|---|
| 1475 | Delta AACAAC |
|---|
| 1476 | Epsilon AACCCA |
|---|
| 1477 | 5 6 |
|---|
| 1478 | Alpha CACACA |
|---|
| 1479 | Beta CCAACC |
|---|
| 1480 | Gamma CAACAC |
|---|
| 1481 | Delta GCCTGG |
|---|
| 1482 | Epsilon TGCAAT |
|---|
| 1483 | </PRE> |
|---|
| 1484 | </TD></TR></TABLE> |
|---|
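The Python sketch below, for illustration only, splits a file like the one above into its separate data sets. It uses the first line of each data set to know how many species lines follow. It assumes one line per species, with the name and data separated by blanks. It does not follow the programs' own rules for the name field or for interleaved data.

```python
DATA = """\
    5    6
Alpha     CCACCA
Beta      CCAAAA
Gamma     CAACCA
Delta     AACAAC
Epsilon   AACCCA
    5    6
Alpha     CACACA
Beta      CCAACC
Gamma     CAACAC
Delta     GCCTGG
Epsilon   TGCAAT
"""


def read_data_sets(text):
    """Return a list of data sets, each a list of (name, sequence) pairs."""
    lines = [line for line in text.splitlines() if line.strip()]
    data_sets, i = [], 0
    while i < len(lines):
        n_species, n_chars = map(int, lines[i].split())
        block = lines[i + 1 : i + 1 + n_species]
        if len(block) != n_species:
            raise ValueError(f"expected {n_species} species, found {len(block)}")
        rows = []
        for line in block:
            name, seq = line.split(None, 1)
            seq = "".join(seq.split())
            if len(seq) != n_chars:
                raise ValueError(f"{name}: expected {n_chars} characters, got {len(seq)}")
            rows.append((name, seq))
        data_sets.append(rows)
        i += 1 + n_species
    return data_sets


sets = read_data_sets(DATA)
print(len(sets))   # 2
print(sets[1][3])  # ('Delta', 'GCCTGG')
```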
| 1485 | <P> |
|---|
| 1486 | The main use of this option will be to allow all of the methods in these |
|---|
| 1487 | programs to be bootstrapped. Using the program SEQBOOT one can take any |
|---|
| 1488 | DNA, protein, restriction sites, gene frequency or binary character data set and |
|---|
| 1489 | make multiple data sets by bootstrapping. Trees can be produced for all of |
|---|
| 1490 | these using the <TT>M</TT> option. They will be written on the tree output file if |
|---|
| 1491 | that option is left in force. Then the program CONSENSE can be used with |
|---|
| 1492 | that tree file as its input file. The result is a majority rule consensus |
|---|
| 1493 | tree which can be used to make confidence intervals. The present version |
|---|
| 1494 | of the package allows, with the use of SEQBOOT and CONSENSE and the M option, |
|---|
| 1495 | bootstrapping of many of the methods in the package. |
|---|
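The idea behind the majority-rule consensus can be sketched briefly. Count how many of the bootstrap trees contain each group, and keep the groups present in more than half of them. This Python sketch is a simplified illustration, not CONSENSE itself. It treats the trees as rooted nested tuples, and the three "replicate" trees are made up.

```python
from collections import Counter


def groups(tree):
    """All groups (as frozensets of names) in a rooted nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(walk(child) for child in node))
        found.add(members)
        return members

    walk(tree)
    return found


def majority_rule_groups(trees):
    """Groups occurring in more than half of the trees, with their frequencies."""
    counts = Counter()
    for tree in trees:
        counts.update(groups(tree))
    return {g: n / len(trees) for g, n in counts.items() if n > len(trees) / 2}


# Three made-up "bootstrap" trees.
replicates = [
    ((("A", "B"), "C"), ("D", "E")),
    ((("A", "B"), "D"), ("C", "E")),
    ((("A", "C"), "B"), ("D", "E")),
]
for group, freq in sorted(majority_rule_groups(replicates).items(),
                          key=lambda item: sorted(item[0])):
    print(sorted(group), round(freq, 2))
```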
| 1496 | <P> |
|---|
| 1497 | Programs DNAML, DNAPARS and PARS can also take multiple weights |
|---|
| 1498 | instead of multiple data sets. They can then do bootstrapping by |
|---|
| 1499 | reading in one data set, together with a file of weights that show how |
|---|
| 1500 | the characters (or sites) are reweighted in each bootstrap sample. Thus a |
|---|
| 1501 | site that is omitted in a bootstrap sample has effectively been given |
|---|
| 1502 | weight 0, while a site that has been duplicated has effectively been |
|---|
| 1503 | given weight 2. SEQBOOT has a menu selection to produce the file of |
|---|
| 1504 | weights information automatically, instead of producing a file of |
|---|
| 1505 | multiple data sets. |
|---|
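The correspondence between resampling and weights can be seen in this Python sketch, which is an illustration and not SEQBOOT's code. Drawing sites with replacement and counting how often each was drawn gives a weight for every site. Repeating each site that many times recovers the resampled data set. The seed value is arbitrary.

```python
import random
from collections import Counter


def bootstrap_weights(n_sites, rng):
    """Resample n_sites sites with replacement; return how often each was drawn."""
    drawn = Counter(rng.randrange(n_sites) for _ in range(n_sites))
    return [drawn[i] for i in range(n_sites)]


def apply_weights(sequence, weights):
    """Repeat each site as many times as its weight (weight 0 drops it)."""
    return "".join(site * w for site, w in zip(sequence, weights))


rng = random.Random(4097)  # Python's generator, used only for illustration
weights = bootstrap_weights(12, rng)
print(weights)                          # twelve counts that add up to 12
print(apply_weights("ACG", [0, 2, 1]))  # CCG
```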
| 1506 | <P> |
|---|
| 1507 | <B>The <TT>W</TT> (Weights) option</B>. This signals the program that, in |
|---|
| 1508 | addition to the data set, you want to read in a series of weights that |
|---|
| 1509 | tell how many times each character is to be counted. If the weight |
|---|
| 1510 | for a character is zero (<TT>0</TT>) then that character is in effect to |
|---|
| 1511 | be omitted when the tree is evaluated. If it is one (<TT>1</TT>), the |
|---|
| 1512 | character is to be counted once. Some programs allow weights greater than |
|---|
| 1513 | 1 as well. These have the effect that the character is counted as |
|---|
| 1514 | if it were present that many times, so that a weight of 4 means that the |
|---|
| 1515 | character is counted 4 times. |
|---|
| 1516 | The values 0-9 give weights 0 through 9, and the |
|---|
| 1517 | values A-Z give weights 10 through 35. By use of the weights we can |
|---|
| 1518 | give overwhelming weight to some characters, and drop others from the |
|---|
| 1519 | analysis. In the molecular sequence programs only two values of the |
|---|
| 1520 | weights, 0 or 1, are allowed. |
|---|
| 1521 | <P> |
|---|
| 1522 | The weights are used to analyze subsets of the characters, and also can be |
|---|
| 1523 | used for resampling of the data as in bootstrap and jackknife resampling. |
|---|
| 1524 | For those programs that allow weights to be greater than 1, they can also |
|---|
| 1525 | be used to emphasize information from some characters more strongly than |
|---|
| 1526 | others. Of course, you must have some rationale for doing this. |
|---|
| 1527 | <P> |
|---|
| 1528 | The weights are provided as a sequence of digits. Thus they might be |
|---|
| 1529 | <P> |
|---|
| 1530 | <TT>10011111100010100011110001100</TT> |
|---|
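Reading a weights line means mapping each symbol to a number. The Python sketch below assumes only the scheme described above (0-9 for weights 0 through 9, A-Z for 10 through 35) and rejects any other character.

```python
import string

WEIGHT_SYMBOLS = string.digits + string.ascii_uppercase  # '0'-'9', then 'A'-'Z'


def decode_weights(text):
    """Turn a weights string into integers: 0-9 -> 0..9, A-Z -> 10..35."""
    symbols = "".join(text.split())
    # str.index raises ValueError for any character outside the scheme.
    return [WEIGHT_SYMBOLS.index(ch) for ch in symbols]


print(decode_weights("09AZ"))  # [0, 9, 10, 35]
weights = decode_weights("10011111100010100011110001100")
print(len(weights), sum(weights))  # 29 15
```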
| 1531 | <P> |
|---|
| 1532 | The weights are to be provided in an input file |
|---|
| 1533 | whose default name is <TT>weights</TT>. In programs such as SEQBOOT |
|---|
| 1534 | that can also output a file of weights, the input weights have a default |
|---|
| 1535 | file name of <TT>inweights</TT>, and the output weights file has a default |
|---|
| 1536 | name of <TT>outweights</TT>. |
|---|
| 1537 | <P> |
|---|
| 1538 | Weights can be used to analyze different subsets of characters (by weighting |
|---|
| 1539 | the rest as zero). Alternatively, in the discrete characters programs |
|---|
| 1540 | they can be used to force a certain |
|---|
| 1541 | group to appear on the phylogeny (in effect confining consideration to only |
|---|
| 1542 | phylogenies containing that group). This is done by adding an imaginary |
|---|
| 1543 | character that has <TT>1</TT>'s for the members of the group, and <TT>0</TT>'s |
|---|
| 1544 | for all the |
|---|
| 1545 | other species. That imaginary character is then given the highest weight |
|---|
| 1546 | possible: the result will be that any phylogeny that does not contain that |
|---|
| 1547 | group will be penalized by such a heavy amount that it will not (except in |
|---|
| 1548 | the most unusual circumstances) be considered. Of course, the new character |
|---|
| 1549 | brings extra steps to the tree, but the number of these can be calculated |
|---|
| 1550 | in advance and subtracted out of the total when reporting the results. This |
|---|
| 1551 | use of weights is an important one, and one sadly ignored |
|---|
| 1552 | by many users who could profit from it. In the case of molecular sequences |
|---|
| 1553 | we cannot use weights this way, so that to force a given group to appear we |
|---|
| 1554 | have to add a large extra segment of sites to the molecule, with (say) A's |
|---|
| 1555 | for that group and C's for every other species. |
|---|
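The following Python sketch shows the bookkeeping for forcing a group with 0/1 discrete-character data. The function names and the dict representation of the data are hypothetical. It assumes two-state, reversible (Wagner-type) parsimony. Under that model, the imaginary character needs exactly one step on any tree that contains the group, so its weight is the number of extra steps to subtract.

```python
def add_forcing_character(data, group, weights, heavy="Z"):
    """Append an imaginary character: 1 for members of `group`, 0 for the rest.

    data: hypothetical dict mapping species name -> string of 0/1 states.
    weights: one weight symbol per character; the new character gets `heavy`
    ("Z", i.e. 35, the largest value in the 0-9, A-Z scheme).
    """
    forced = {name: states + ("1" if name in group else "0")
              for name, states in data.items()}
    return forced, weights + heavy


def steps_without_forcing(total_steps, heavy="Z"):
    """Remove the imaginary character's contribution from a reported total.

    Assumes two-state reversible parsimony: on a tree containing the group the
    imaginary character changes exactly once, counted int(heavy, 36) times.
    """
    return total_steps - int(heavy, 36)  # base 36 maps 0-9, A-Z to 0..35


data = {"Alpha": "0110", "Beta": "0011", "Gamma": "1100", "Delta": "1001"}
forced, weights = add_forcing_character(data, {"Alpha", "Beta"}, "1111")
print(forced["Alpha"], forced["Gamma"], weights)  # 01101 11000 1111Z
print(steps_without_forcing(41))                   # 6
```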
| 1556 | <P> |
|---|
| 1557 | <B>The option to write out the trees into a tree file</B>. This specifies that you |
|---|
| 1558 | want the program to write |
|---|
| 1559 | out the tree not only on its usual output, but also onto a file in |
|---|
| 1560 | nested-parenthesis notation (as described above). This option is sufficiently |
|---|
| 1561 | useful that it is turned on by default in all programs that allow it. You |
|---|
| 1562 | can optionally turn it off if you wish, by typing the appropriate number |
|---|
| 1563 | from the menu (it varies from program to program). This option is useful for |
|---|
| 1564 | creating tree files that can be directly read into the programs, including |
|---|
| 1565 | the consensus tree and tree distance programs, and the tree plotting programs. |
|---|
| 1566 | <P> |
|---|
| 1567 | The output tree file has a default name of <TT>outtree</TT>. |
|---|
| 1568 | <P> |
|---|
| 1569 | <B>The (<TT>0</TT>) terminal type option.</B> (This is the digit <TT>0</TT>, not |
|---|
| 1570 | the alphabetic character <TT>O</TT>). The program will default to |
|---|
| 1571 | one particular assumption about your terminal (except in the case of |
|---|
| 1572 | Macintoshes, the default will be an ANSI compatible terminal). You can |
|---|
| 1573 | alternatively select it to be either an IBM PC, or nothing. |
|---|
| 1574 | This affects the ability of the programs to clear the screen when they |
|---|
| 1575 | display their menus, and the graphics characters used to display trees |
|---|
| 1576 | in the programs DNAMOVE, MOVE, DOLMOVE, and RETREE. If you are running an |
|---|
| 1577 | MSDOS system and have the ANSI.SYS driver installed in your CONFIG.SYS |
|---|
| 1578 | file, you may find that the screen clears correctly even with the default |
|---|
| 1579 | setting of ANSI. |
|---|
| 1580 | <P> |
|---|
| 1581 | <A NAME="algorithm"><HR><P></A> |
|---|
| 1582 | <DIV ALIGN="CENTER"> |
|---|
| 1583 | <H2>The Algorithm for Constructing Trees</H2></DIV> |
|---|
| 1584 | <P> |
|---|
| 1585 | All of the programs except FACTOR, DNADIST, GENDIST, DNAINVAR, SEQBOOT, |
|---|
| 1586 | CONTRAST, RETREE, and the plotting and |
|---|
| 1587 | consensus tree programs act to construct an estimate of a phylogeny. MOVE, |
|---|
| 1588 | DOLMOVE, and DNAMOVE let you construct it yourself by hand. All of |
|---|
| 1589 | the rest but NEIGHBOR, the "...PENNY" programs and CLIQUE make use of |
|---|
| 1590 | a common approach involving additions and rearrangements. They are |
|---|
| 1591 | trying to minimize or maximize some quantity over the space of all |
|---|
| 1592 | possible evolutionary trees. Each program contains a part that, given |
|---|
| 1593 | the topology of the tree, evaluates the quantity that is being minimized |
|---|
| 1594 | or maximized. The straightforward approach would be to evaluate all |
|---|
| 1595 | possible tree topologies one after another and pick the one which, |
|---|
| 1596 | according to the criterion being used, is best. This would not be |
|---|
| 1597 | possible for more than a small number of species, since the number of |
|---|
| 1598 | possible tree topologies is enormous. A review of the literature on the |
|---|
| 1599 | counting of evolutionary trees can be found in one of my papers |
|---|
| 1600 | (Felsenstein, 1978a). |
|---|
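The counts behind this claim are standard, although this section does not state them. With n labeled species there are (2n-3)!! rooted and (2n-5)!! unrooted bifurcating trees, where k!! = k(k-2)(k-4)...1. A short Python sketch evaluates them; the function names are invented for the example.

```python
def double_factorial(k):
    """k * (k-2) * (k-4) * ... down to 1 (returns 1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def rooted_trees(n):
    """Number of rooted bifurcating trees with n labeled tips: (2n-3)!!."""
    return double_factorial(2 * n - 3)


def unrooted_trees(n):
    """Number of unrooted bifurcating trees with n labeled tips: (2n-5)!!."""
    return double_factorial(2 * n - 5)


# For n = 10 this prints: 10 2027025 34459425
for n in (4, 10, 20):
    print(n, unrooted_trees(n), rooted_trees(n))
```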
| 1601 | <P> |
|---|
| 1602 | Since we cannot search all topologies, these programs are not |
|---|
| 1603 | guaranteed to always find the best tree, although they seem to do quite |
|---|
| 1604 | well in practice. The strategy they employ is as follows: the species |
|---|
| 1605 | are taken in the order in which they appear in the input file. The |
|---|
| 1606 | first two (in some programs the first three) are taken and a tree |
|---|
| 1607 | constructed containing only those. There is only one possible topology for |
|---|
| 1608 | this tree. Then the next species is taken, and we consider where it |
|---|
| 1609 | might be added to the tree. If the initial tree is (say) a rooted tree |
|---|
| 1610 | with two species and we want the resulting three-species tree to be a |
|---|
| 1611 | bifurcating tree, there are only three places where we could add the |
|---|
| 1612 | third species. Each of these is tried, and each time the resulting tree is |
|---|
| 1613 | evaluated according to the criterion. The best one is chosen to be the |
|---|
| 1614 | basis for further operations. Now we consider adding the fourth |
|---|
| 1615 | species, again at each of the five possible places that would result in |
|---|
| 1616 | a bifurcating tree. Again, the best of these is accepted. |
|---|
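The addition phase alone can be sketched in Python. This is a minimal sketch under stated assumptions, not PHYLIP's code:

- rooted trees are nested 2-tuples;
- the criterion is Fitch parsimony on one-character states, though any other criterion could be substituted;
- the local rearrangements described next are left out;
- all function names are hypothetical.

The insertions helper reproduces the counts in the text: three places for a two-species tree and five for a three-species tree.

```python
def fitch_site(tree, data, i):
    """Fitch state set and number of changes for site i on a rooted binary tree."""
    if isinstance(tree, str):                       # a tip
        return {data[tree][i]}, 0
    (left, lsteps), (right, rsteps) = (fitch_site(c, data, i) for c in tree)
    common = left & right
    if common:                                      # no change needed here
        return common, lsteps + rsteps
    return left | right, lsteps + rsteps + 1       # one change below this fork


def parsimony_length(tree, data):
    """Total number of changes over all sites (lower is better)."""
    nsites = len(next(iter(data.values())))
    return sum(fitch_site(tree, data, i)[1] for i in range(nsites))


def insertions(tree, species):
    """Yield each tree made by attaching `species` to one branch of `tree`,
    including the branch above the root: 2k - 1 places for a k-species tree."""
    yield (tree, species)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insertions(left, species):
            yield (new_left, right)
        for new_right in insertions(right, species):
            yield (left, new_right)


def stepwise_addition(order, data):
    """Add species in input order, keeping the best placement each time.
    min() returns the first of several tied trees, so ties keep the earlier one."""
    tree = (order[0], order[1])
    for species in order[2:]:
        tree = min(insertions(tree, species),
                   key=lambda t: parsimony_length(t, data))
    return tree, parsimony_length(tree, data)


data = {"A": "1100", "B": "1100", "C": "0011", "D": "0011"}
print(stepwise_addition(["A", "B", "C", "D"], data))  # (((('A', 'B'), 'C'), 'D'), 4)
```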
| 1617 | <P> |
|---|
| 1618 | <H3>Local Rearrangements</H3> |
|---|
| 1619 | <P> |
|---|
| 1620 | The process continues in this manner, with one important exception. After |
|---|
| 1621 | each species is added, and before the next |
|---|
| 1622 | is added, a number of rearrangements of the tree are tried, in an effort |
|---|
| 1623 | to improve it. The algorithms move through the tree, making all |
|---|
| 1624 | possible local rearrangements of the tree. A local rearrangement involves an |
|---|
| 1625 | internal segment of the tree in the following manner. Each internal |
|---|
| 1626 | segment of the tree is of this form (where T1, T2, and T3 are subtrees, |
|---|
| 1627 | that is, parts of the tree that can themselves contain further forks and tips): |
|---|
| 1628 | <P> |
|---|
| 1629 | <PRE> |
|---|
| 1630 |  T1       T2       T3 |
|---|
| 1631 |    \      /        / |
|---|
| 1632 |     \    /        / |
|---|
| 1633 |      \  /        / |
|---|
| 1634 |       \/        / |
|---|
| 1635 |        *       / |
|---|
| 1636 |         *     / |
|---|
| 1637 |          *   / |
|---|
| 1638 |           * / |
|---|
| 1639 |            * |
|---|
| 1640 |            ! |
|---|
| 1641 |            ! |
|---|
| 1642 | </PRE> |
|---|
| 1643 | <P> |
|---|
| 1644 | the segment we are discussing being indicated by the asterisks. A local |
|---|
| 1645 | rearrangement consists of switching the subtrees T1 and T3 or T2 and T3, |
|---|
| 1646 | so as to obtain one of the following: |
|---|
| 1647 | <P> |
|---|
| 1648 | <PRE> |
|---|
| 1649 |   T3       T2      T1           T1       T3      T2 |
|---|
| 1650 |    \       /       /             \       /       / |
|---|
| 1651 |     \     /       /               \     /       / |
|---|
| 1652 |      \   /       /                 \   /       / |
|---|
| 1653 |       \ /       /                   \ /       / |
|---|
| 1654 |        \       /                     \       / |
|---|
| 1655 |         \     /                       \     / |
|---|
| 1656 |          \   /                         \   / |
|---|
| 1657 |           \ /                           \ / |
|---|
| 1658 |            !                             ! |
|---|
| 1659 |            !                             ! |
|---|
| 1660 |            !                             ! |
|---|
| 1661 | </PRE> |
|---|
| 1662 | <P> |
|---|
| 1663 | Each time a local rearrangement is successful in finding a better tree, |
|---|
| 1664 | the new arrangement is accepted. The phase of local rearrangements does |
|---|
| 1665 | not end until the program can traverse the entire tree, attempting local |
|---|
| 1666 | rearrangements, without finding any that improve the tree. |
|---|
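Here is a minimal Python sketch of local rearrangements, assuming rooted trees stored as nested 2-tuples. The helpers are hypothetical, not PHYLIP's code.

- nni_at produces the two alternatives for each internal segment, following the two diagrams above.
- local_rearrangements keeps accepting improvements until a full pass finds none.
- score is any function for which lower is better.

```python
def nni_at(node):
    """Local rearrangements across each internal segment just below `node`.

    Viewing `node` as ((T1, T2), T3), the two alternatives switch T1 with T3
    or T2 with T3.  A node with two internal children has two such segments.
    """
    results = []
    for inner, t3 in ((node[0], node[1]), (node[1], node[0])):
        if isinstance(inner, tuple):
            t1, t2 = inner
            results.append(((t3, t2), t1))   # switch T1 and T3
            results.append(((t1, t3), t2))   # switch T2 and T3
    return results


def nni_neighbors(tree):
    """Yield every tree one local rearrangement away from `tree`."""
    if not isinstance(tree, tuple):
        return
    yield from nni_at(tree)
    left, right = tree
    for new_left in nni_neighbors(left):
        yield (new_left, right)
    for new_right in nni_neighbors(right):
        yield (left, new_right)


def local_rearrangements(tree, score):
    """Accept any rearrangement that lowers `score`; stop only after a full
    pass over the tree finds no improvement."""
    best = score(tree)
    improved = True
    while improved:
        improved = False
        for candidate in nni_neighbors(tree):
            value = score(candidate)
            if value < best:
                tree, best = candidate, value
                improved = True
                break          # rescan the improved tree from the top
    return tree, best


print(list(nni_neighbors((("A", "B"), "C"))))  # [(('C', 'B'), 'A'), (('A', 'C'), 'B')]
```

Because tuples are ordered, the same topology can appear with its children in a different order. A fuller implementation would normalize trees before comparing them. Restarting the scan after each accepted change is one reasonable choice; the text only requires that the phase end after a complete pass with no improvement.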
| 1667 | <P> |
|---|
| 1668 | This strategy of adding species and making local rearrangements will look |
|---|
| 1669 | at about (n-1)x(2n-3) different topologies, though if |
|---|
| 1670 | rearrangements are frequently successful the number may be larger. I |
|---|
| 1671 | have been describing the strategy when rooted trees are being |
|---|
| 1672 | considered. For unrooted trees there is a precisely similar strategy, |
|---|
| 1673 | though the first tree constructed may be a three-species tree and the |
|---|
| 1674 | rearrangements may not start until after the addition of the fifth |
|---|
| 1675 | species. |
|---|
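For a sense of scale, here is the arithmetic for n = 10 species. It sets the rough (n-1)(2n-3) estimate above against the standard count (2n-3)!! of rooted bifurcating trees; that count is background knowledge, not stated in this section. Successful rearrangements would raise the first figure somewhat.

```latex
\[
  (n-1)(2n-3)\big|_{n=10} = 9 \times 17 = 153,
  \qquad
  (2n-3)!!\big|_{n=10} = 17!! = 34\,459\,425,
  \qquad
  \frac{153}{34\,459\,425} = \frac{1}{225\,225}.
\]
```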
| 1676 | <P> |
|---|
| 1677 | Though we are not guaranteed to have found the best tree topology, |
|---|
| 1678 | we are guaranteed that no nearby topology (i.e., none accessible by a |
|---|
| 1679 | single local rearrangement) is better. In this sense we have reached a |
|---|
| 1680 | local optimum of our criterion. Note that the whole process is |
|---|
| 1681 | dependent on the order in which the species are present in the input |
|---|
| 1682 | file. We can try to find a different and better solution by reordering |
|---|
| 1683 | the species in the input file and running the program again (or, more |
|---|
| 1684 | easily, by using the <TT>J</TT> option). If none of |
|---|
| 1685 | these attempts finds a better solution, then we have some indication |
|---|
| 1686 | that we may have found the best topology, though we can never be certain |
|---|
| 1687 | of this. |
|---|
| 1688 | <P> |
|---|
| 1689 | Note also that a new topology is never accepted unless it is better |
|---|
| 1690 | than the previous one, so that the rearrangement process can never fall |
|---|
| 1691 | into an endless loop. This is also the way ties in our criterion are |
|---|
| 1692 | resolved, namely by sticking with the tree found first. However, the tree |
|---|
| 1693 | construction programs other than CLIQUE, CONTML, FITCH, |
|---|
| 1694 | and DNAML do keep a record of all trees found that are tied with the best one |
|---|
| 1695 | found. This gives you some immediate idea of which parts of the tree can be |
|---|
| 1696 | altered without affecting the quality of the result. |
|---|
| 1697 | <P> |
|---|
| 1698 | |
|---|
| 1699 | <H3>Global Rearrangements</H3> |
|---|
| 1700 | <P> |
|---|
| 1701 | A feature of most of the programs, such as PROTPARS, DNAPARS, |
|---|
| 1702 | DNACOMP, DNAML, DNAMLK, RESTML, KITSCH, FITCH, CONTML, MIX, and DOLLOP, |
|---|
| 1703 | is "global" optimization of the tree. In four of these (CONTML, |
|---|
| 1704 | FITCH, DNAML and DNAMLK) this is an option, <TT>G</TT>. In the others it |
|---|
| 1705 | automatically applies. When |
|---|
| 1706 | it is present there is an additional stage to the search for the best tree. |
|---|
| 1707 | Each possible subtree is removed from the tree and added back in |
|---|
| 1708 | all possible places. This process continues until all subtrees can be removed |
|---|
| 1709 | and added again without any improvement in the tree. The purpose of this |
|---|
| 1710 | extra rearrangement is to make it less likely that one or more species get |
|---|
| 1711 | "stuck" in a suboptimal region of the space of all possible trees. The use of |
|---|
| 1712 | global optimization roughly triples the run time, |
|---|
| 1713 | which is why I have left it as an option in some of the slower programs. |
|---|
| 1714 | <P> |
|---|
| 1715 | What PHYLIP calls "global" rearrangements are more properly called |
|---|
| 1716 | SPR (subtree pruning and regrafting) by Swofford et al. (1996), as distinct |
|---|
| 1717 | from the NNI (nearest neighbor interchange) rearrangements that PHYLIP |
|---|
| 1718 | also uses, and the TBR (tree bisection and reconnection) rearrangements |
|---|
| 1719 | that it does not use. |
|---|
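Subtree pruning and regrafting can be sketched in Python for rooted trees stored as nested 2-tuples with distinct species names. That representation is an assumption, and the helper names are hypothetical. canonical sorts children so that trees differing only in child order are counted once. Each candidate would then be scored and accepted only if it improves the tree, as in the local-rearrangement phase.

```python
def canonical(tree):
    """Sort children so that trees differing only in child order compare equal."""
    if isinstance(tree, str):
        return tree
    return tuple(sorted((canonical(c) for c in tree), key=repr))


def subtrees(tree):
    """Yield `tree` and every subtree below it."""
    yield tree
    if isinstance(tree, tuple):
        for child in tree:
            yield from subtrees(child)


def prune(tree, sub):
    """Remove `sub`; its parent fork disappears and the sibling takes its place."""
    if not isinstance(tree, tuple):
        return tree
    left, right = tree
    if left == sub:
        return right
    if right == sub:
        return left
    return (prune(left, sub), prune(right, sub))


def regraft(tree, sub):
    """Yield every tree made by attaching `sub` to one branch of `tree`."""
    yield (tree, sub)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in regraft(left, sub):
            yield (new_left, right)
        for new_right in regraft(right, sub):
            yield (left, new_right)


def spr_neighbors(tree):
    """Distinct trees reachable by removing one subtree and re-adding it elsewhere."""
    seen = {canonical(tree)}
    for sub in subtrees(tree):
        if sub == tree:                 # the whole tree cannot be removed
            continue
        rest = prune(tree, sub)
        for candidate in regraft(rest, sub):
            key = canonical(candidate)
            if key not in seen:
                seen.add(key)
                yield candidate


print(len(list(spr_neighbors((("A", "B"), "C")))))  # 2
```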
| 1720 | <P> |
|---|
| 1721 | The programs doing global optimization print out a dot "<TT>.</TT>" after each group is |
|---|
| 1722 | removed and re-added to the tree, to give the user some sign that the |
|---|
| 1723 | rearrangements are proceeding. A new line of dots is started whenever a new |
|---|
| 1724 | round of global rearrangements is started following an improvement in the |
|---|
| 1725 | tree. On the line before the dots, the program prints a bar of |
|---|
| 1726 | the form "!---------------!" to show how many dots |
|---|
| 1727 | to expect. The dots will |
|---|
| 1728 | not be printed out at a uniform rate, but the later dots, which represent |
|---|
| 1729 | removal of larger groups from the tree and consequently trying them in fewer |
|---|
| 1730 | places, will print out more quickly. With some compilers each row of dots may |
|---|
| 1731 | not be printed out until it is complete. |
|---|
| 1732 | <P> |
|---|
| 1733 | It should be noted that PENNY, DOLPENNY, DNAPENNY and CLIQUE use a more |
|---|
| 1734 | sophisticated strategy of "depth-first search" with a "branch and bound" |
|---|
| 1735 | search method that guarantees that all |
|---|
| 1736 | of the best trees will be found. In the case |
|---|
| 1737 | of PENNY, DOLPENNY and DNAPENNY there can be a considerable sacrifice of |
|---|
| 1738 | computer time if the number of species is greater than about ten: it is a |
|---|
| 1739 | matter for you to consider whether it is worth it for you to guarantee finding |
|---|
| 1740 | all the most parsimonious trees, and that depends on how much free computer |
|---|
| 1741 | time you have! CLIQUE finds all largest cliques, and does so without undue |
|---|
| 1742 | burning of computer time. Although all of these problems that have been |
|---|
| 1743 | investigated turn out to be |
|---|
| 1744 | "NP-hard" problems, which in effect do not have a rapid solution, |
|---|
| 1745 | the cases that cause this trouble for the largest-cliques algorithm in |
|---|
| 1746 | CLIQUE apparently are not biologically realistic and do not occur in actual |
|---|
| 1747 | data. |
|---|
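Here is a compact Python sketch of depth-first branch and bound for parsimony. It assumes rooted nested-tuple trees and Fitch counting, and the helpers are hypothetical; PENNY, DOLPENNY and DNAPENNY use their own, more refined bounds.

The bound relies on the fact that adding a species to a tree can never reduce its number of changes. Any partial tree already longer than the best complete tree can therefore be abandoned. Partial trees that merely tie the bound are still explored, which is what lets the method return all of the best trees.

```python
def fitch_site(tree, data, i):
    """Fitch state set and number of changes for site i on a rooted binary tree."""
    if isinstance(tree, str):
        return {data[tree][i]}, 0
    (left, lsteps), (right, rsteps) = (fitch_site(c, data, i) for c in tree)
    common = left & right
    if common:
        return common, lsteps + rsteps
    return left | right, lsteps + rsteps + 1


def parsimony_length(tree, data):
    nsites = len(next(iter(data.values())))
    return sum(fitch_site(tree, data, i)[1] for i in range(nsites))


def insertions(tree, species):
    """Every tree made by attaching `species` to one branch of `tree`."""
    yield (tree, species)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insertions(left, species):
            yield (new_left, right)
        for new_right in insertions(right, species):
            yield (left, new_right)


def branch_and_bound(order, data):
    """Return the smallest number of changes and every rooted tree achieving it."""
    best_length = float("inf")
    best_trees = []

    def extend(tree, k):
        nonlocal best_length, best_trees
        length = parsimony_length(tree, data)
        if length > best_length:      # bound: more species can only add changes
            return
        if k == len(order):           # a complete tree
            if length < best_length:
                best_length, best_trees = length, [tree]
            else:                     # tied with the best so far
                best_trees.append(tree)
            return
        for candidate in insertions(tree, order[k]):   # branch, depth first
            extend(candidate, k + 1)

    extend((order[0], order[1]), 2)
    return best_length, best_trees


data = {"A": "0", "B": "0", "C": "1", "D": "1"}
length, trees = branch_and_bound(["A", "B", "C", "D"], data)
print(length, len(trees))  # 1 5
```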
| 1748 | <P> |
|---|
| 1749 | |
|---|
| 1750 | <H3>Multiple Jumbles</H3> |
|---|
| 1751 | <P> |
|---|
| 1752 | As just mentioned, for most of these programs the search depends on the order |
|---|
| 1753 | in which the species are entered into the tree. Using the <TT>J</TT> (Jumble) |
|---|
| 1754 | option you can supply a random number seed which will allow the program to put |
|---|
| 1755 | the species in a random order. Jumbling can be |
|---|
| 1756 | done multiple times. For example, if you tell the program to do it |
|---|
| 1757 | 10 times, it will go through the tree-building process 10 times, each with a |
|---|
| 1758 | different random order of adding species. It will keep a record of the trees |
|---|
| 1759 | tied for best over the whole process. In other words, it does not just |
|---|
| 1760 | record the best trees from each of the 10 runs, but records the best ones |
|---|
| 1761 | overall. Of course this is slow, taking 10 times longer than a single run. |
|---|
| 1762 | But it does give us a much greater chance of finding all of the most |
|---|
| 1763 | parsimonious trees. In the terminology of Maddison (1991) it |
|---|
| 1764 | can find different "islands" of trees. The present algorithms do not |
|---|
| 1765 | guarantee finding all trees in a given "island" in a single run, so |
|---|
| 1766 | multiple runs also help explore those "islands" that are found. |
|---|
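The bookkeeping for multiple jumbles can be sketched in Python. Here search is a hypothetical stand-in for one complete tree-building run. It takes a species order and returns a score (lower is better) together with the trees tied at that score, which must be hashable. Trees are pooled across runs, so the result is the set of best trees overall rather than the best of each run.

```python
import random


def multiple_jumbles(species, search, times, seed):
    """Run `search` on `times` random input orders and pool the trees that are
    tied for best over all runs (lower scores are better)."""
    rng = random.Random(seed)          # reproducible, like a user-supplied seed
    best_score = None
    best_trees = set()
    for _ in range(times):
        order = list(species)
        rng.shuffle(order)             # a new random order of adding species
        score, trees = search(order)
        if best_score is None or score < best_score:
            best_score, best_trees = score, set(trees)   # a new overall best
        elif score == best_score:
            best_trees.update(trees)                     # another tied tree
    return best_score, best_trees
```

With times=10 this costs about as much as ten single runs, as the text says.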
| 1767 | <P> |
|---|
| 1768 | <H3>Saving multiple tied trees</H3> |
|---|
| 1769 | <P> |
|---|
| 1770 | For the parsimony and compatibility programs, one can have a perfect tie |
|---|
| 1771 | between two or more trees. In these programs these trees are all |
|---|
| 1772 | saved. For the newer parsimony programs such as DNAPARS and PARS, |
|---|
| 1773 | global rearrangement is carried out on all of these tied trees. This can |
|---|
| 1774 | be turned off in the menu. |
|---|
| 1775 | <P> |
|---|
| 1776 | For trees with criteria which are real numbers, such as the distance |
|---|
| 1777 | matrix programs FITCH and KITSCH, and the likelihood programs DNAML, |
|---|
| 1778 | DNAMLK, CONTML, and RESTML, it is difficult to get an exact tie between |
|---|
| 1779 | trees. Consequently these programs save only the single best tree |
|---|
| 1780 | (even though the others may be only a tiny bit worse). |
|---|
| 1781 | <P> |
|---|
| 1782 | <H3>Strategy for Finding the Best Tree</H3> |
|---|
| 1783 | <P> |
|---|
| 1784 | In practice, it is advisable to evaluate many |
|---|
| 1785 | different orderings of the input species. <I>Use the |
|---|
| 1786 | Jumble option and specify that it be done many times (as many as ten)</I>, |
|---|
| 1787 | so that different orderings |
|---|
| 1788 | of the input species are tried. |
|---|
| 1789 | <P> |
|---|
| 1790 | People who want a magic "black box" program whose results they do |
|---|
| 1791 | not have to question (or think about) often are upset that these |
|---|
| 1792 | programs give results that are dependent on the order in which the species |
|---|
| 1793 | are entered in the data. To me this property is an advantage, for it |
|---|
| 1794 | permits you to try different searches for better trees, simply by |
|---|
| 1795 | varying the input order of species. If you do not use the multiple Jumble |
|---|
| 1796 | option, but do multiple individual runs instead, you |
|---|
| 1797 | can easily decide which to pay most attention to - the one or ones that |
|---|
| 1798 | are best according to the criterion employed (for example, with parsimony, |
|---|
| 1799 | the one out of the runs that results in the tree with the fewest changes). |
|---|
| 1800 | <P> |
|---|
| 1801 | In practice, in a single run, it usually seems best to put species that are |
|---|
| 1802 | likely to be sources of confusion in the topology last, as by the time they are |
|---|
| 1803 | added the arrangement of the earlier species will have stabilized into a |
|---|
| 1804 | good configuration, and then the last few species will be fitted into |
|---|
| 1805 | that topology. There will be less chance this way of a poor initial |
|---|
| 1806 | topology that would affect all subsequent parts of the search. However, |
|---|
| 1807 | a variety of arrangements of the input order of species should be tried, |
|---|
| 1808 | as can be done if the <TT>J</TT> option is used, |
|---|
| 1809 | and no species should be kept in a fixed place in the order of input. |
|---|
| 1810 | Note that the results of the "...PENNY" programs and CLIQUE |
|---|
| 1811 | are not sensitive to the input order of species, and NEIGHBOR is only |
|---|
| 1812 | slightly sensitive to it, so that multiple Jumbling is not possible |
|---|
| 1813 | with those programs. Note also that with global search, which |
|---|
| 1814 | is standard in many programs and in others is an |
|---|
| 1815 | option, each group (including |
|---|
| 1816 | each individual species) will be removed and re-added in all possible |
|---|
| 1817 | positions, so that a species causing confusion will have more chance of moving |
|---|
| 1818 | to a new location than it would without global rearrangement. |
|---|
| 1819 | <P> |
|---|
| 1820 | <A NAME="warning"><HR><P></A> |
|---|
| 1821 | <DIV ALIGN="CENTER"> |
|---|
| 1822 | <H2>A Warning on Interpreting Results</H2></DIV> |
|---|
| 1823 | <P> |
|---|
| 1824 | Probably the most important thing to keep in mind while running any of the |
|---|
| 1825 | parsimony or compatibility programs is not |
|---|
| 1826 | to overinterpret the result. Many users treat the set of most parsimonious |
|---|
| 1827 | trees as if it were a confidence interval. If a group appears in all of the |
|---|
| 1828 | most parsimonious trees then they treat it as well established. Unfortunately |
|---|
| 1829 | <I>the confidence interval on phylogenies appears to be much |
|---|
| 1830 | larger than the set of all most parsimonious trees</I> (Felsenstein, 1985b). |
|---|
| 1831 | Likewise, variation of result among different methods will not be a good |
|---|
| 1832 | indicator of the size of the confidence interval. Consider a simple data set |
|---|
| 1833 | in which, out of 100 binary characters, 51 recommend the unrooted tree |
|---|
| 1834 | <TT>((A,B),(C,D))</TT> and 49 the tree <TT>((A,D),(B,C))</TT>. Many different |
|---|
| 1835 | methods will all give the same result on |
|---|
| 1836 | such a data set: they will estimate the tree as <TT>((A,B),(C,D))</TT>. |
|---|
| 1837 | Nevertheless it is |
|---|
| 1838 | clear that the 51:49 margin by which this tree is favored is not statistically |
|---|
| 1839 | significantly different from 50:50. So <I>consistency among different methods |
|---|
| 1840 | is a poor guide to statistical significance</I>. |
|---|
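One way to check the 51:49 example is an exact two-sided binomial test in Python. This assumes the 100 characters are independent and that, under the null hypothesis, each is equally likely to favor either tree. That is a simplification made only for this illustration, not a test PHYLIP performs.

```python
from math import comb


def two_sided_binomial_p(successes, n):
    """Exact two-sided binomial p-value under p = 1/2: the total probability of
    all outcomes no more likely than the observed count."""
    observed = comb(n, successes)
    as_extreme = sum(comb(n, k) for k in range(n + 1) if comb(n, k) <= observed)
    return as_extreme / 2 ** n


print(round(two_sided_binomial_p(51, 100), 3))  # 0.92
```

A p-value of about 0.92 means that a split this close to 50:50 is entirely ordinary under the null hypothesis.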
| 1841 | <P> |
|---|
| 1842 | <A NAME="speed"><HR><P></A> |
|---|
| 1843 | <DIV ALIGN="CENTER"> |
|---|
| 1844 | <H2>Relative Speed of Different<BR> |
|---|
| 1845 | Programs and Machines</H2></DIV> |
|---|
| 1846 | <P> |
|---|
| 1847 | <H3>Relative speed of the different programs</H3> |
|---|
| 1848 | <P> |
|---|
| 1849 | C compilers differ in efficiency of the code they generate, |
|---|
| 1850 | and some deal with some features of the language better than with |
|---|
| 1851 | others. Thus a program which is unusually fast on one computer may be |
|---|
| 1852 | unusually slow on another. Nevertheless, as a rough guide to relative |
|---|
| 1853 | execution speeds, I have tested the programs on three data sets, each of |
|---|
| 1854 | which has 10 species and 40 characters. The first is an imaginary one |
|---|
| 1855 | in which all characters are compatible ("The Willi Hennig Memorial |
|---|
| 1856 | Data Set" as J. S. Farris once called ones like it). The second is the binary |
|---|
| 1857 | recoded form of the fossil horses data set of Camin and Sokal (1965). |
|---|
| 1858 | The third data set is completely random: 10 species and 20 |
|---|
| 1859 | characters, each character state being <TT>0</TT> or |
|---|
| 1860 | <TT>1</TT> (or <TT>A</TT> or <TT>C</TT>) with probability 50%. The data sets thus range from a completely |
|---|
| 1861 | compatible one in which there is no homoplasy (parallelism or convergence), |
|---|
| 1862 | through the horses data set, which requires 29 steps where the possible |
|---|
| 1863 | minimum number would be 20, to the random data set, which requires 49 steps. |
|---|
| 1864 | We can thus see how this increasing messiness of the data affects running |
|---|
| 1865 | times. The three data sets have all had 20 sites of <TT>A</TT>'s added to the |
|---|
| 1866 | end of each sequence, so as to prevent likelihood or distance matrix programs |
|---|
| 1867 | from having infinite branch lengths (the test data sets used for timing |
|---|
| 1868 | previous versions of PHYLIP were the same except that they lacked these |
|---|
| 1869 | 20 extra sites). |
|---|
| 1870 | <P> |
|---|
| 1871 | Here are the nucleotide sequence versions of the three data sets: |
|---|
| 1872 | <P> |
|---|
| 1873 | <TABLE><TR><TD BGCOLOR=white> |
|---|
| 1874 | <PRE> |
|---|
| 1875 | 10 40 |
|---|
| 1876 | A CACACACAAAAAAAAAAACAAAAAAAAAAAAAAAAAAAAA |
|---|
| 1877 | B CACACAACAAAAAAAAAACAAAAAAAAAAAAAAAAAAAAA |
|---|
| 1878 | C CACAACAAAAAAAAAAAACAAAAAAAAAAAAAAAAAAAAA |
|---|
| 1879 | D CAACAAAACAAAAAAAAACAAAAAAAAAAAAAAAAAAAAA |
|---|
| 1880 | E CAACAAAAACAAAAAAAACAAAAAAAAAAAAAAAAAAAAA |
|---|
| 1881 | F ACAAAAAAAACACACAAAACAAAAAAAAAAAAAAAAAAAA |
|---|
| 1882 | G ACAAAAAAAACACAACAAACAAAAAAAAAAAAAAAAAAAA |
|---|
| 1883 | H ACAAAAAAAACAACAAAAACAAAAAAAAAAAAAAAAAAAA |
|---|
| 1884 | I ACAAAAAAAAACAAAACAACAAAAAAAAAAAAAAAAAAAA |
|---|
| 1885 | J ACAAAAAAAAACAAAAACACAAAAAAAAAAAAAAAAAAAA |
|---|
| 1886 | </PRE> |
|---|
| 1887 | </TD></TR></TABLE> |
|---|
| 1888 | <P> |
|---|
| 1889 | <TABLE><TR><TD BGCOLOR=white> |
|---|
| 1890 | <PRE> |
|---|
| 1891 | 10 40 |
|---|
| 1892 | MesohippusAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA |
|---|
| 1893 | HypohippusAAACCCCCCCAAAAAAAAACAAAAAAAAAAAAAAAAAAAA |
|---|
| 1894 | ArchaeohipCAAAAAAAAAAAAAAAACACAAAAAAAAAAAAAAAAAAAA |
|---|
| 1895 | ParahippusCAAACAACAACAAAAAAAACAAAAAAAAAAAAAAAAAAAA |
|---|
| 1896 | MerychippuCCAACCACCACCCCACACCCAAAAAAAAAAAAAAAAAAAA |
|---|
| 1897 | M. secunduCCAACCACCACCCACACCCCAAAAAAAAAAAAAAAAAAAA |
|---|
| 1898 | Nannipus CCAACCACAACCCCACACCCAAAAAAAAAAAAAAAAAAAA |
|---|
| 1899 | NeohippariCCAACCCCCCCCCCACACCCAAAAAAAAAAAAAAAAAAAA |
|---|
| 1900 | Calippus CCAACCACAACCCACACCCCAAAAAAAAAAAAAAAAAAAA |
|---|
| 1901 | PliohippusCCCACCCCCCCCCACACCCCAAAAAAAAAAAAAAAAAAAA |
|---|
| 1902 | </PRE> |
|---|
| 1903 | </TD></TR></TABLE> |
|---|
| 1904 | <P> |
|---|
| 1905 | <TABLE><TR><TD BGCOLOR=white> |
|---|
| 1906 | <PRE> |
|---|
| 1907 | 10 40 |
|---|
| 1908 | A CACACAACCAAACAAACCACAAAAAAAAAAAAAAAAAAAA |
|---|
| 1909 | B AAACCACACACACAAACCCAAAAAAAAAAAAAAAAAAAAA |
|---|
| 1910 | C ACAAAACCAAACCACCCACAAAAAAAAAAAAAAAAAAAAA |
|---|
| 1911 | D AAAAACACAACACACCAAACAAAAAAAAAAAAAAAAAAAA |
|---|
| 1912 | E AAACAACCACACACAACCAAAAAAAAAAAAAAAAAAAAAA |
|---|
| 1913 | F CCCAAACACCCCCAAAAAACAAAAAAAAAAAAAAAAAAAA |
|---|
| 1914 | G ACACCCCCACACCCACCAACAAAAAAAAAAAAAAAAAAAA |
|---|
| 1915 | H AAAACAACAACCACCCCACCAAAAAAAAAAAAAAAAAAAA |
|---|
| 1916 | I ACACAACAACACAAACAACCAAAAAAAAAAAAAAAAAAAA |
|---|
| 1917 | J CCAAAAACACCCAACCCAACAAAAAAAAAAAAAAAAAAAA |
|---|
| 1918 | </PRE> |
|---|
| 1919 | </TD></TR></TABLE> |
|---|
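The claim that the first data set has no homoplasy can be checked with the four-gamete test. Two two-state characters are compatible unless all four combinations of states occur among the species. For two-state characters, pairwise compatibility of every pair is enough for a tree with no parallelism or convergence.

The Python sketch below applies this test to the nucleotide version of the first data set, copied from the table above. The function names are hypothetical.

```python
from itertools import combinations


def compatible(char_a, char_b):
    """Four-gamete test: two two-state characters are compatible unless all
    four combinations of their states occur among the species."""
    return len(set(zip(char_a, char_b))) < 4


def all_pairwise_compatible(sequences):
    """True if every pair of sites (columns) passes the four-gamete test."""
    columns = list(zip(*sequences))
    return all(compatible(a, b) for a, b in combinations(columns, 2))


hennigian = [
    "CACACACAAAAAAAAAAACAAAAAAAAAAAAAAAAAAAAA",  # A
    "CACACAACAAAAAAAAAACAAAAAAAAAAAAAAAAAAAAA",  # B
    "CACAACAAAAAAAAAAAACAAAAAAAAAAAAAAAAAAAAA",  # C
    "CAACAAAACAAAAAAAAACAAAAAAAAAAAAAAAAAAAAA",  # D
    "CAACAAAAACAAAAAAAACAAAAAAAAAAAAAAAAAAAAA",  # E
    "ACAAAAAAAACACACAAAACAAAAAAAAAAAAAAAAAAAA",  # F
    "ACAAAAAAAACACAACAAACAAAAAAAAAAAAAAAAAAAA",  # G
    "ACAAAAAAAACAACAAAAACAAAAAAAAAAAAAAAAAAAA",  # H
    "ACAAAAAAAAACAAAACAACAAAAAAAAAAAAAAAAAAAA",  # I
    "ACAAAAAAAAACAAAAACACAAAAAAAAAAAAAAAAAAAA",  # J
]

print(all_pairwise_compatible(hennigian))  # True
```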
| 1920 | <P> |
|---|
| 1921 | Here are the timings, in seconds, of many of the version 3.6 programs on these three data |
|---|
| 1922 | sets, compiled with GNU C and run on a |
|---|
| 1923 | 266 MHz Pentium MMX computer under Linux. |
|---|
| 1924 | <P> |
|---|
| 1925 | <DIV ALIGN="CENTER"> |
|---|
| 1926 | <TABLE CELLPADDING=3 BORDER="1"> |
|---|
| 1927 | <TR><TD ALIGN="LEFT"> </TD> |
|---|
| 1928 | <TD ALIGN="RIGHT">Hennigian Data</TD> |
|---|
| 1929 | <TD ALIGN="RIGHT">Horses Data</TD> |
|---|
| 1930 | <TD ALIGN="RIGHT">Random Data</TD> |
|---|
| 1931 | </TR> |
|---|
| 1932 | <TR><TD ALIGN="LEFT">PROTPARS</TD> |
|---|
| 1933 | <TD ALIGN="RIGHT">0.133</TD> |
|---|
| 1934 | <TD ALIGN="RIGHT">0.167</TD> |
|---|
| 1935 | <TD ALIGN="RIGHT">0.308</TD> |
|---|
| 1936 | </TR> |
|---|
| 1937 | <TR><TD ALIGN="LEFT">DNAPARS</TD> |
|---|
| 1938 | <TD ALIGN="RIGHT">0.163</TD> |
|---|
| 1939 | <TD ALIGN="RIGHT">0.191</TD> |
|---|
| 1940 | <TD ALIGN="RIGHT">0.573</TD> |
|---|
| 1941 | </TR> |
|---|
| 1942 | <TR><TD ALIGN="LEFT">DNAPENNY</TD> |
|---|
| 1943 | <TD ALIGN="RIGHT">0.300</TD> |
|---|
| 1944 | <TD ALIGN="RIGHT">0.196</TD> |
|---|
| 1945 | <TD ALIGN="RIGHT">36.68</TD> |
|---|
| 1946 | </TR> |
|---|
| 1947 | <TR><TD ALIGN="LEFT">DNACOMP</TD> |
|---|
| 1948 | <TD ALIGN="RIGHT">0.081</TD> |
|---|
| 1949 | <TD ALIGN="RIGHT">0.073</TD> |
|---|
| 1950 | <TD ALIGN="RIGHT">0.127</TD> |
|---|
| 1951 | </TR> |
|---|
| 1952 | <TR><TD ALIGN="LEFT">DNAML</TD> |
|---|
| 1953 | <TD ALIGN="RIGHT">2.19</TD> |
|---|
| 1954 | <TD ALIGN="RIGHT">2.53</TD> |
|---|
| 1955 | <TD ALIGN="RIGHT">2.73</TD> |
|---|
| 1956 | </TR> |
|---|
| 1957 | <TR><TD ALIGN="LEFT">DNAMLK</TD> |
|---|
| 1958 | <TD ALIGN="RIGHT">5.40</TD> |
|---|
| 1959 | <TD ALIGN="RIGHT">6.13</TD> |
|---|
| 1960 | <TD ALIGN="RIGHT">7.21</TD> |
|---|
| 1961 | </TR> |
|---|
| 1962 | <TR><TD ALIGN="LEFT">PROML</TD> |
|---|
| 1963 | <TD ALIGN="RIGHT">44.79</TD> |
|---|
| 1964 | <TD ALIGN="RIGHT">90.46</TD> |
|---|
| 1965 | <TD ALIGN="RIGHT">68.49</TD> |
|---|
| 1966 | </TR> |
|---|
| 1967 | <TR><TD ALIGN="LEFT">PROMLK</TD> |
|---|
| 1968 | <TD ALIGN="RIGHT">171.01</TD> |
|---|
| 1969 | <TD ALIGN="RIGHT">183.61</TD> |
|---|
| 1970 | <TD ALIGN="RIGHT">239.34</TD> |
|---|
| 1971 | </TR> |
|---|
| 1977 | <TR><TD ALIGN="LEFT">DNAINVAR</TD> |
|---|
| 1978 | <TD ALIGN="RIGHT">0.002</TD> |
|---|
| 1979 | <TD ALIGN="RIGHT">0.002</TD> |
|---|
| 1980 | <TD ALIGN="RIGHT">0.002</TD> |
|---|
| 1981 | </TR> |
|---|
| 1982 | <TR><TD ALIGN="LEFT">DNADIST</TD> |
|---|
| 1983 | <TD ALIGN="RIGHT">0.029</TD> |
|---|
| 1984 | <TD ALIGN="RIGHT">0.024</TD> |
|---|
| 1985 | <TD ALIGN="RIGHT">0.033</TD> |
|---|
| 1986 | </TR> |
|---|
| 1987 | <TR><TD ALIGN="LEFT">PROTDIST</TD> |
|---|
| 1988 | <TD ALIGN="RIGHT">1.095</TD> |
|---|
| 1989 | <TD ALIGN="RIGHT">1.089</TD> |
|---|
| 1990 | <TD ALIGN="RIGHT">1.107</TD> |
|---|
| 1991 | </TR> |
|---|
| 1992 | <TR><TD ALIGN="LEFT">RESTML</TD> |
|---|
| 1993 | <TD ALIGN="RIGHT">3.55</TD> |
|---|
| 1994 | <TD ALIGN="RIGHT">3.18</TD> |
|---|
| 1995 | <TD ALIGN="RIGHT">5.15</TD> |
|---|
| 1996 | </TR> |
|---|
| 1997 | <TR><TD ALIGN="LEFT">RESTDIST</TD> |
|---|
| 1998 | <TD ALIGN="RIGHT">0.012</TD> |
|---|
| 1999 | <TD ALIGN="RIGHT">0.010</TD> |
|---|
| 2000 | <TD ALIGN="RIGHT">0.010</TD> |
|---|
| 2001 | </TR> |
|---|
| 2002 | <TR><TD ALIGN="LEFT">FITCH</TD> |
|---|
| 2003 | <TD ALIGN="RIGHT">0.20</TD> |
|---|
| 2004 | <TD ALIGN="RIGHT">0.31</TD> |
|---|
| 2005 | <TD ALIGN="RIGHT">0.24</TD> |
|---|
| 2006 | </TR> |
|---|
| 2007 | <TR><TD ALIGN="LEFT">KITSCH</TD> |
|---|
| 2008 | <TD ALIGN="RIGHT">0.055</TD> |
|---|
| 2009 | <TD ALIGN="RIGHT">0.061</TD> |
|---|
| 2010 | <TD ALIGN="RIGHT">0.058</TD> |
|---|
| 2011 | </TR> |
|---|
| 2012 | <TR><TD ALIGN="LEFT">NEIGHBOR</TD> |
|---|
| 2013 | <TD ALIGN="RIGHT">0.003</TD> |
|---|
| 2014 | <TD ALIGN="RIGHT">0.004</TD> |
|---|
| 2015 | <TD ALIGN="RIGHT">0.005</TD> |
|---|
| 2016 | </TR> |
|---|
| 2017 | <TR><TD ALIGN="LEFT">CONTML</TD> |
|---|
| 2018 | <TD ALIGN="RIGHT">0.380</TD> |
|---|
| 2019 | <TD ALIGN="RIGHT">0.368</TD> |
|---|
| 2020 | <TD ALIGN="RIGHT">0.396</TD> |
|---|
| 2021 | </TR> |
|---|
| 2022 | <TR><TD ALIGN="LEFT">GENDIST</TD> |
|---|
| 2023 | <TD ALIGN="RIGHT">0.008</TD> |
|---|
| 2024 | <TD ALIGN="RIGHT">0.009</TD> |
|---|
| 2025 | <TD ALIGN="RIGHT">0.008</TD> |
|---|
| 2026 | </TR> |
|---|
| 2027 | <TR><TD ALIGN="LEFT">PARS</TD> |
|---|
| 2028 | <TD ALIGN="RIGHT">0.201</TD> |
|---|
| 2029 | <TD ALIGN="RIGHT">0.263</TD> |
|---|
| 2030 | <TD ALIGN="RIGHT">0.729</TD> |
|---|
| 2031 | </TR> |
|---|
| 2032 | <TR><TD ALIGN="LEFT">MIX</TD> |
|---|
| 2033 | <TD ALIGN="RIGHT">0.064</TD> |
|---|
| 2034 | <TD ALIGN="RIGHT">0.078</TD> |
|---|
| 2035 | <TD ALIGN="RIGHT">0.123</TD> |
|---|
| 2036 | </TR> |
|---|
| 2037 | <TR><TD ALIGN="LEFT">PENNY</TD> |
|---|
| 2038 | <TD ALIGN="RIGHT">0.038</TD> |
|---|
| 2039 | <TD ALIGN="RIGHT">0.087</TD> |
|---|
| 2040 | <TD ALIGN="RIGHT">15.93</TD> |
|---|
| 2041 | </TR> |
|---|
| 2042 | <TR><TD ALIGN="LEFT">DOLLOP</TD> |
|---|
| 2043 | <TD ALIGN="RIGHT">0.134</TD> |
|---|
| 2044 | <TD ALIGN="RIGHT">0.141</TD> |
|---|
| 2045 | <TD ALIGN="RIGHT">0.233</TD> |
|---|
| 2046 | </TR> |
|---|
| 2047 | <TR><TD ALIGN="LEFT">DOLPENNY</TD> |
|---|
| 2048 | <TD ALIGN="RIGHT">0.051</TD> |
|---|
| 2049 | <TD ALIGN="RIGHT">0.241</TD> |
|---|
| 2050 | <TD ALIGN="RIGHT">101.29</TD> |
|---|
| 2051 | </TR> |
|---|
| 2052 | <TR><TD ALIGN="LEFT">CLIQUE</TD> |
|---|
| 2053 | <TD ALIGN="RIGHT">0.010</TD> |
|---|
| 2054 | <TD ALIGN="RIGHT">0.015</TD> |
|---|
| 2055 | <TD ALIGN="RIGHT">0.020</TD> |
|---|
| 2056 | </TR> |
|---|
| 2057 | </TABLE> |
|---|
| 2058 | </DIV> |
|---|
| 2059 | |
|---|
| 2060 | <P> |
|---|
| 2061 | <BR> |
|---|
| 2062 | |
|---|
| 2063 | <P> |
|---|
| 2064 | In all cases the programs were run under the default options without compiler |
|---|
| 2065 | switches, except as |
|---|
| 2066 | specified here. The |
|---|
| 2067 | data sets used for the discrete characters programs have <TT>0</TT>'s and <TT>1</TT>'s |
|---|
| 2068 | instead of <TT>A</TT>'s and <TT>C</TT>'s. For CONTML the <TT>A</TT>'s and <TT>C</TT>'s |
|---|
| 2069 | were made into <TT>0.0</TT>'s and <TT>1.0</TT>'s and considered as 40 2-allele loci. |
|---|
| 2070 | For the distance programs 10 x 10 distance matrices were |
|---|
| 2071 | computed from the three data sets. |
|---|
| 2072 | For the restriction sites programs <TT>A</TT> and <TT>C</TT> were changed into |
|---|
| 2073 | <TT>+</TT> and <TT>-</TT>. It does not |
|---|
| 2074 | make much sense to benchmark MOVE, DOLMOVE, or DNAMOVE, although when there |
|---|
| 2075 | are many characters and many species the response time after each |
|---|
| 2076 | alteration of the tree should be proportional to the product of the number of |
|---|
| 2077 | species and the number of characters. For DNAML and DNAMLK the frequencies |
|---|
| 2078 | of the four bases were |
|---|
| 2079 | set to be equal rather than determined empirically as is the default. For |
|---|
| 2080 | RESTML the number of enzymes was set to 1. |
|---|
| 2081 | <P> |
|---|
| 2082 | In most cases, the benchmark was made more accurate by analyzing 10 data |
|---|
| 2083 | sets using the <TT>M</TT> (Multiple data sets) option and dividing the resulting |
|---|
| 2084 | time by 10. Times were determined as user times using the Linux <TT>time</TT> |
|---|
| 2085 | command. Several patterns will be apparent from this. The algorithms (MIX, |
|---|
| 2086 | DOLLOP, CONTML, FITCH, KITSCH, PROTPARS, DNAPARS, DNACOMP, |
|---|
| 2087 | DNAML, DNAMLK, and RESTML) that use the above-described addition strategy have |
|---|
| 2088 | run times that do not depend strongly on the messiness of the data. The only |
|---|
| 2089 | exception to this is that if a data set such as the Random data requires |
|---|
| 2090 | extra rounds of global rearrangements it takes longer. The |
|---|
| 2091 | programs differ greatly in run time: the likelihood programs RESTML, DNAML and |
|---|
| 2092 | CONTML are quite a bit slower than the others. The protein sequence parsimony |
|---|
| 2093 | program, which has to do a considerable amount of bookkeeping to keep track of |
|---|
| 2094 | which amino acids can mutate to each other, is also relatively slow. |
|---|
| 2095 | <P> |
|---|
Another class of algorithms includes PENNY, DOLPENNY, DNAPENNY, and CLIQUE.
These are branch-and-bound methods: in principle their execution
times can rise exponentially with the number of species and/or
characters, and they can be much more sensitive to messy data. This is
apparent with PENNY, DOLPENNY, and DNAPENNY, which go from being reasonably
fast with clean data to very slow with messy data. DOLPENNY is particularly
slow on messy data, because it cannot make use of some of
the lower-bound calculations that are possible with DNAPENNY and PENNY. CLIQUE
is very fast on all data sets. Although in theory it should bog down if the
number of cliques in the data is very large, that does not happen with random
data, which in fact has few cliques, and small ones at that. Apparently the
"worst-case" data sets that cause exponential run time are much rarer for
CLIQUE than for the other branch-and-bound methods.
|---|
| 2110 | <P> |
|---|
| 2111 | NEIGHBOR is quite fast compared to FITCH and KITSCH, and should make it |
|---|
| 2112 | possible to run much larger cases, although the results are expected to be |
|---|
| 2113 | a bit rougher than with those programs. |
|---|
| 2114 | <BR> |
|---|
| 2115 | <P> |
|---|
| 2116 | <H3>Speed with different numbers of species</H3> |
|---|
| 2117 | <P> |
|---|
How will the run time depend on the number of species and the number
of characters? For the sequential-addition algorithms, the run time should
be proportional to somewhere between the square and the cube of the number of
species, and to the number of characters. Thus a case with 20 species and 50
characters, instead of 10 species and 20 characters, would take (in the cubic case)
2 x 2 x 2 x 2.5 = 20
times as long. This suggests that cases with more than 20 species will
be slow, and cases with more than 40 species <I>very</I> slow. This places a
premium on working on small subproblems rather than just dumping a whole
large data set into the programs.
|---|
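<P>
The arithmetic above can be written as a small power-law model. This sketch is illustrative only. The function name <TT>relative_run_time</TT> is hypothetical, and the model assumes that run time grows as the number of species raised to an exponent between 2 and 3, multiplied by the number of characters. That is the rule of thumb stated above for the sequential-addition programs, not a measured law.

```python
def relative_run_time(species0, chars0, species1, chars1, species_exponent=3):
    """Ratio of run times for a new problem size versus a reference size.

    Hypothetical model: time ~ species**species_exponent * characters,
    the rule of thumb for the sequential-addition programs.
    """
    species_factor = (species1 / species0) ** species_exponent
    char_factor = chars1 / chars0
    return species_factor * char_factor


# From 10 species and 20 characters to 20 species and 50 characters.
cubic = relative_run_time(10, 20, 20, 50, species_exponent=3)
quadratic = relative_run_time(10, 20, 20, 50, species_exponent=2)
print(f"cubic case: {cubic:.1f}x, quadratic case: {quadratic:.1f}x")
```

With an exponent of 2, the same change makes the run about 10 times as long. The two exponents therefore bracket the estimate.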
| 2129 | <P> |
|---|
An exception to these rules is some of the DNA programs that use an
aliasing device to save execution time. In these programs execution time
will not necessarily increase in proportion to the number of sites,
because sites that show the same pattern of nucleotides are detected
as identical and the calculations for them are done only once, so adding
sites whose pattern has already been seen costs little extra time. This is
particularly likely to happen with few species and many sites, or with data
sets that have small amounts of evolutionary divergence.
|---|
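<P>
The aliasing device can be illustrated by collapsing identical alignment columns into weighted site patterns. This is a minimal sketch of the idea, not PHYLIP's own code. The function <TT>compress_sites</TT> and the toy alignment are hypothetical. A parsimony or likelihood calculation would then be done once per pattern and multiplied by that pattern's weight.

```python
from collections import Counter


def compress_sites(alignment):
    """Collapse identical alignment columns into (pattern, weight) pairs.

    alignment: list of equal-length sequences, one per species.
    Patterns are returned in order of first appearance.
    """
    n_sites = len(alignment[0])
    # Each column reads down the species at one site.
    columns = ["".join(seq[i] for seq in alignment) for i in range(n_sites)]
    # Counter keeps insertion order, so patterns stay in first-seen order.
    return list(Counter(columns).items())


# Hypothetical toy alignment: 4 species, 8 sites, little divergence.
toy = [
    "AAGTCAAG",
    "AAGTCAAG",
    "AAGTTAAG",
    "AGGTCAGG",
]
patterns = compress_sites(toy)
print(f"{len(toy[0])} sites collapse to {len(patterns)} patterns")
for pattern, weight in patterns:
    print(pattern, weight)
```

Here the per-site work depends on the 5 distinct patterns rather than the 8 sites. This is why data sets with few species or low divergence benefit most.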
| 2138 | <P> |
|---|
For FITCH and KITSCH, the distance matrix is square, so when we double the
number of species we also double the number of "characters"; running times
will therefore go up as the fourth power of the number of species rather than
the third power. Thus a 20-species case with FITCH is expected to run sixteen
times more slowly than a 10-species case.
|---|
| 2145 | <P> |
|---|
For programs like PENNY and CLIQUE, the run times will rise faster
than the cube of the number of species (in fact, they can rise faster
than any power, since these algorithms are not guaranteed to finish in
polynomial time). In practice, PENNY will frequently bog down above 11
species, while CLIQUE easily deals with larger numbers.
|---|
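<P>
To see why a tree search can outpace any power of the number of species, it helps to count the trees. The number of distinct unrooted bifurcating trees for n labelled species is (2n - 5)!! = 3 x 5 x 7 x ... x (2n - 5). Branch-and-bound methods such as PENNY prune most of these trees. This count is therefore a worst-case ceiling, not the work PENNY actually does. The function name is illustrative.

```python
def unrooted_tree_count(n_species):
    """Number of unrooted bifurcating trees on n labelled species, (2n - 5)!!.

    Valid for n_species of 3 or more (3 species give exactly one tree).
    """
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5.
    for odd in range(3, 2 * n_species - 4, 2):
        count *= odd
    return count


for n in (5, 8, 10, 11, 15, 20):
    print(f"{n:2d} species: {unrooted_tree_count(n):,} unrooted trees")
```

With 11 species there are already 34,459,425 unrooted trees. Each added species multiplies the count by the next odd number, so the growth is faster than any fixed power.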
| 2151 | <P> |
|---|
For NEIGHBOR the run time should vary only as the square of the number of
species, so a case twice as large will take only four times as long. This
makes it an attractive alternative to FITCH and KITSCH for large data
sets.
|---|
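<P>
The scaling rules in the last few paragraphs can be compared side by side. This sketch assumes the power laws stated above, applied with a fixed number of characters for the sequential-addition programs:
<UL>
<LI>NEIGHBOR grows roughly as n<SUP>2</SUP>.</LI>
<LI>The sequential-addition programs grow up to n<SUP>3</SUP>.</LI>
<LI>FITCH and KITSCH grow as n<SUP>4</SUP>.</LI>
</UL>
It prints how much longer a run takes as the number of species grows from a 10-species baseline. The exponents are rules of thumb, and the labels and function name are illustrative.

```python
# Rule-of-thumb exponents on the number of species (assumed from the text above).
SCALING_EXPONENTS = {
    "NEIGHBOR": 2,
    "sequential addition (fixed characters)": 3,
    "FITCH / KITSCH": 4,
}


def slowdown(base_species, new_species, exponent):
    """Factor by which run time grows when species go from base to new."""
    return (new_species / base_species) ** exponent


for label, exponent in SCALING_EXPONENTS.items():
    factors = [slowdown(10, n, exponent) for n in (20, 40)]
    print(f"{label}: 20 species x{factors[0]:g}, 40 species x{factors[1]:g}")
```

Going from 10 to 40 species costs about 16 times as long under the quadratic rule. Under the fourth-power rule it costs about 256 times as long, which is the gap that makes NEIGHBOR attractive for large data sets.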
| 2156 | <P> |
|---|
<B>Note:</B> If you are unsure of how long a program will take, try it first on
a few species, then work your way up until you get a feel for the speed
and for what size problems you can afford to run.
|---|
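<P>
One way to follow this advice systematically is to time two small runs and extrapolate. This sketch assumes that run time follows a single power law in the number of species. It estimates the exponent from two timings and predicts the time for a larger run. The function names and the timings in the example are hypothetical. The prediction will be poor for branch-and-bound programs such as PENNY, whose run times do not follow a power law.

```python
import math


def fitted_exponent(n_small, t_small, n_large, t_large):
    """Exponent b in t = a * n**b fitted through two timed runs."""
    return math.log(t_large / t_small) / math.log(n_large / n_small)


def predicted_time(n_known, t_known, n_target, exponent):
    """Extrapolate the run time at n_target from one timed run."""
    return t_known * (n_target / n_known) ** exponent


# Hypothetical timings (seconds) from trial runs with 10 and 20 species.
b = fitted_exponent(10, 2.0, 20, 16.0)
estimate = predicted_time(20, 16.0, 40, b)
print(f"fitted exponent {b:.2f}; predicted time for 40 species {estimate:.0f} s")
```

Fitting the exponent from your own runs lets the measurements decide where between the square and the cube your program and data actually fall.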
| 2160 | <P> |
|---|
Execution time is not the most important criterion for a program,
particularly as computer time becomes much cheaper than your time or a
programmer's time. With workstations on which background jobs can be run
all night, execution speed is not overwhelmingly relevant. Some of us have been
conditioned by an earlier era of computing to consider execution speed
paramount. But ease of use, ease of adaptation to your computer system,
and ease of modification are much more important in practice, and in
these respects I think these programs are adequate. Only if you are
engaged in 1960s-style mainframe computing, or if you have very large
amounts of data, is minimization of execution time paramount.
|---|
| 2172 | <P> |
|---|
Nevertheless, it would have been nice to have made the programs
faster. The present speeds are a compromise between speed and
effectiveness: by making them slower and trying more rearrangements in the
trees, or by enumerating all possible trees, I could have made the programs
more likely to find the best tree. By trying fewer rearrangements I
could have sped them up, but at the cost of finding worse trees. I
could also have sped them up by writing critical sections in assembly
language, but this would have sacrificed ease of distribution to new
computer systems. Some options included in these programs also make it
harder to adopt the economies of bookkeeping that make other programs
faster. However, to some extent I have simply decided not to spend
time speeding up program bookkeeping when there were new likelihood and
statistical methods to be developed.
|---|
| 2187 | <BR> |
|---|
| 2188 | <P> |
|---|
| 2189 | <H3>Relative speed of different machines</H3> |
|---|
| 2190 | <P> |
|---|
It is interesting to compare different machines using DNAPARS as the
standard task. One can rate a machine on the DNAPARS benchmark by summing the
times for all three of the data sets. Here are relative total timings over
all three data sets (done with various versions of DNAPARS) for some machines,
taking a Pentium MMX 266 notebook computer running Linux with gcc as the
standard. Benchmarks from versions 3.4 and 3.5 of the program (the Pascal and
C versions, respectively) are also included; their totals are shown in
parentheses. They are compared only with runs of the same version, and are
scaled to the rest of the timings using the joint runs on the 386SX (versions
3.4 and 3.5) and the Pentium MMX 266 (version 3.5 and the current version).
This use of separate standards is necessary not because of the different
languages but because different versions of the package are being compared.
Thus the "Time" is the ratio of the Total to that for the Pentium, adjusted by
these scalings for machines run with versions 3.4 and 3.5. The Relative Speed
is the reciprocal of the Time.
|---|
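<P>
The normalization just described can be sketched as follows. The reference totals are read off the table below:
<UL>
<LI>0.927 for the current version and 0.054 for version 3.5, both on the Pentium MMX 266.</LI>
<LI>34 for version 3.4 and 12.01 for version 3.5, both from the joint 386SX runs.</LI>
</UL>
Using them this way is an assumption about how the table was built. Because of rounding, the sketch will not reproduce every entry exactly, and the function names are hypothetical.

```python
# Reference totals read from the DNAPARS table (an assumption about its construction).
PENTIUM_266_CURRENT = 0.927   # current version, Gnu C
PENTIUM_266_V35 = 0.054       # PHYLIP 3.5 on the same machine
SX386_V34 = 34.0              # PHYLIP 3.4 (Turbo Pascal 6.0) on the 386SX
SX386_V35 = 12.01             # PHYLIP 3.5 (Microsoft Quick C) on the 386SX


def benchmark_time(total, version):
    """Return the "Time" column: a run's total relative to the Pentium MMX 266."""
    if version == "3.4":
        # Convert a 3.4 total into 3.5 units via the joint 386SX runs.
        total = total * SX386_V35 / SX386_V34
        version = "3.5"
    if version == "3.5":
        return total / PENTIUM_266_V35
    return total / PENTIUM_266_CURRENT


def relative_speed(total, version):
    """The "Relative Speed" column is the reciprocal of the Time."""
    return 1.0 / benchmark_time(total, version)


for machine, total, version in [
    ("Pentium 120", 1.848, "current"),
    ("DEC 3000/400 AXP", 0.224, "3.5"),
    ("386SX (16 MHz)", 34.0, "3.4"),
]:
    t = benchmark_time(total, version)
    print(f"{machine}: Time {t:.3f}, Relative Speed {relative_speed(total, version):.4f}")
```

The 3.4 rows need two hops, from 3.4 to 3.5 on the 386SX and then from 3.5 to the Pentium. This is why they accumulate the most rounding error.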
| 2205 | <P> |
|---|
| 2206 | <DIV ALIGN="CENTER"> |
|---|
| 2207 | <TABLE CELLPADDING=3 BORDER="1"> |
|---|
| 2208 | <TR><TD ALIGN="LEFT"><B>Machine</B></TD> |
|---|
| 2209 | <TD ALIGN="LEFT"><B>Operating<BR>System</B></TD> |
|---|
| 2210 | <TD ALIGN="LEFT"><B>Compiler</B></TD> |
|---|
| 2211 | <TD ALIGN="LEFT"><B>Total</B></TD> |
|---|
| 2212 | <TD ALIGN="LEFT"><B>Time</B></TD> |
|---|
| 2213 | <TD ALIGN="LEFT"><B>Relative<BR>Speed</B></TD> |
|---|
| 2214 | </TR> |
|---|
| 2215 | <TR><TD ALIGN="LEFT">Toshiba T1100+</TD> |
|---|
| 2216 | <TD ALIGN="LEFT">MSDOS</TD> |
|---|
| 2217 | <TD ALIGN="LEFT">Turbo Pascal 3.01A</TD> |
|---|
| 2218 | <TD ALIGN="LEFT">(269)</TD> |
|---|
| 2219 | <TD ALIGN="LEFT">1758.2</TD> |
|---|
| 2220 | <TD ALIGN="LEFT">0.0005688</TD> |
|---|
| 2221 | </TR> |
|---|
| 2222 | <TR><TD ALIGN="LEFT">Apple Mac Plus</TD> |
|---|
| 2223 | <TD ALIGN="LEFT">MacOS</TD> |
|---|
| 2224 | <TD ALIGN="LEFT">Lightspeed Pascal 2</TD> |
|---|
| 2225 | <TD ALIGN="LEFT">(175.84)</TD> |
|---|
| 2226 | <TD ALIGN="LEFT">1149.3</TD> |
|---|
| 2227 | <TD ALIGN="LEFT">0.0008701</TD> |
|---|
| 2228 | </TR> |
|---|
| 2229 | <TR><TD ALIGN="LEFT">Toshiba T1100+</TD> |
|---|
| 2230 | <TD ALIGN="LEFT">MSDOS</TD> |
|---|
| 2231 | <TD ALIGN="LEFT">Turbo Pascal 5.0</TD> |
|---|
| 2232 | <TD ALIGN="LEFT">(162)</TD> |
|---|
| 2233 | <TD ALIGN="LEFT">1058.9</TD> |
|---|
| 2234 | <TD ALIGN="LEFT">0.0009443</TD> |
|---|
| 2235 | </TR> |
|---|
| 2236 | <TR><TD ALIGN="LEFT">Macintosh Classic</TD> |
|---|
| 2237 | <TD ALIGN="LEFT">MacOS</TD> |
|---|
| 2238 | <TD ALIGN="LEFT">Think Pascal 3</TD> |
|---|
| 2239 | <TD ALIGN="LEFT">(160)</TD> |
|---|
| 2240 | <TD ALIGN="LEFT">1045.8</TD> |
|---|
| 2241 | <TD ALIGN="LEFT">0.0009562</TD> |
|---|
| 2242 | </TR> |
|---|
| 2243 | <TR><TD ALIGN="LEFT">Macintosh Classic</TD> |
|---|
| 2244 | <TD ALIGN="LEFT">MacOS</TD> |
|---|
| 2245 | <TD ALIGN="LEFT">Think C</TD> |
|---|
| 2246 | <TD ALIGN="LEFT">(43.0)</TD> |
|---|
| 2247 | <TD ALIGN="LEFT">795.6</TD> |
|---|
| 2248 | <TD ALIGN="LEFT">0.0012569</TD> |
|---|
| 2249 | </TR> |
|---|
| 2250 | <TR><TD ALIGN="LEFT">IBM PS2/60</TD> |
|---|
| 2251 | <TD ALIGN="LEFT">MSDOS</TD> |
|---|
| 2252 | <TD ALIGN="LEFT">Turbo Pascal 5.0</TD> |
|---|
| 2253 | <TD ALIGN="LEFT">(58.76)</TD> |
|---|
| 2254 | <TD ALIGN="LEFT">384.00</TD> |
|---|
| 2255 | <TD ALIGN="LEFT">0.002604</TD> |
|---|
| 2256 | </TR> |
|---|
<TR><TD ALIGN="LEFT">80286 (12 MHz)</TD>
|---|
| 2258 | <TD ALIGN="LEFT">MSDOS</TD> |
|---|
| 2259 | <TD ALIGN="LEFT">Turbo Pascal 5.0</TD> |
|---|
| 2260 | <TD ALIGN="LEFT">(47.09)</TD> |
|---|
| 2261 | <TD ALIGN="LEFT">307.77</TD> |
|---|
| 2262 | <TD ALIGN="LEFT">0.003249</TD> |
|---|
| 2263 | </TR> |
|---|
| 2264 | <TR><TD ALIGN="LEFT">Apple Mac IIcx</TD> |
|---|
| 2265 | <TD ALIGN="LEFT">MacOS</TD> |
|---|
| 2266 | <TD ALIGN="LEFT">Think Pascal 3</TD> |
|---|
| 2267 | <TD ALIGN="LEFT">(42)</TD> |
|---|
| 2268 | <TD ALIGN="LEFT">274.44</TD> |
|---|
| 2269 | <TD ALIGN="LEFT">0.003644</TD> |
|---|
| 2270 | </TR> |
|---|
| 2271 | <TR><TD ALIGN="LEFT">Apple Mac SE/30</TD> |
|---|
| 2272 | <TD ALIGN="LEFT">MacOS</TD> |
|---|
| 2273 | <TD ALIGN="LEFT">Think Pascal 3</TD> |
|---|
| 2274 | <TD ALIGN="LEFT">(42)</TD> |
|---|
| 2275 | <TD ALIGN="LEFT">274.44</TD> |
|---|
| 2276 | <TD ALIGN="LEFT">0.003644</TD> |
|---|
| 2277 | </TR> |
|---|
| 2278 | <TR><TD ALIGN="LEFT">Apple Mac IIcx</TD> |
|---|
| 2279 | <TD ALIGN="LEFT">MacOS</TD> |
|---|
| 2280 | <TD ALIGN="LEFT">Lightspeed Pascal 2</TD> |
|---|
| 2281 | <TD ALIGN="LEFT">(39.84)</TD> |
|---|
| 2282 | <TD ALIGN="LEFT">260.44</TD> |
|---|
| 2283 | <TD ALIGN="LEFT">0.003840</TD> |
|---|
| 2284 | </TR> |
|---|
| 2285 | <TR><TD ALIGN="LEFT">Apple Mac IIcx</TD> |
|---|
| 2286 | <TD ALIGN="LEFT">MacOS</TD> |
|---|
| 2287 | <TD ALIGN="LEFT">Lightspeed Pascal 2#</TD> |
|---|
| 2288 | <TD ALIGN="LEFT">(39.69)</TD> |
|---|
| 2289 | <TD ALIGN="LEFT">259.33</TD> |
|---|
| 2290 | <TD ALIGN="LEFT">0.003856</TD> |
|---|
| 2291 | </TR> |
|---|
| 2292 | <TR><TD ALIGN="LEFT">Zenith Z386 (16MHz)</TD> |
|---|
| 2293 | <TD ALIGN="LEFT">MSDOS</TD> |
|---|
| 2294 | <TD ALIGN="LEFT">Turbo Pascal 5.0</TD> |
|---|
| 2295 | <TD ALIGN="LEFT">(38.27)</TD> |
|---|
| 2296 | <TD ALIGN="LEFT">256.67</TD> |
|---|
| 2297 | <TD ALIGN="LEFT">0.003896</TD> |
|---|
| 2298 | </TR> |
|---|
| 2299 | <TR><TD ALIGN="LEFT">Macintosh SE/30</TD> |
|---|
| 2300 | <TD ALIGN="LEFT">MacOS</TD> |
|---|
| 2301 | <TD ALIGN="LEFT">Think C</TD> |
|---|
| 2302 | <TD ALIGN="LEFT">(13.6)</TD> |
|---|
| 2303 | <TD ALIGN="LEFT">251.56</TD> |
|---|
| 2304 | <TD ALIGN="LEFT">0.003975</TD> |
|---|
| 2305 | </TR> |
|---|
| 2306 | <TR><TD ALIGN="LEFT">386SX (16 MHz)</TD> |
|---|
| 2307 | <TD ALIGN="LEFT">MSDOS</TD> |
|---|
| 2308 | <TD ALIGN="LEFT">Turbo Pascal 6.0</TD> |
|---|
| 2309 | <TD ALIGN="LEFT">(34)</TD> |
|---|
| 2310 | <TD ALIGN="LEFT">222.41</TD> |
|---|
| 2311 | <TD ALIGN="LEFT">0.004496</TD> |
|---|
| 2312 | </TR> |
|---|
| 2313 | <TR><TD ALIGN="LEFT">386SX (16 MHz)</TD> |
|---|
| 2314 | <TD ALIGN="LEFT">MSDOS</TD> |
|---|
| 2315 | <TD ALIGN="LEFT">Microsoft Quick C</TD> |
|---|
| 2316 | <TD ALIGN="LEFT">(12.01)</TD> |
|---|
| 2317 | <TD ALIGN="LEFT">222.41</TD> |
|---|
| 2318 | <TD ALIGN="LEFT">0.004496</TD> |
|---|
| 2319 | </TR> |
|---|
<TR><TD ALIGN="LEFT">Sequent S-81</TD>
|---|
| 2321 | <TD ALIGN="LEFT">DYNIX</TD> |
|---|
| 2322 | <TD ALIGN="LEFT">Silicon Valley Pascal</TD> |
|---|
| 2323 | <TD ALIGN="LEFT">(13.0)</TD> |
|---|
| 2324 | <TD ALIGN="LEFT">84.89</TD> |
|---|
| 2325 | <TD ALIGN="LEFT">0.011780</TD> |
|---|
| 2326 | </TR> |
|---|
| 2327 | <TR><TD ALIGN="LEFT">VAX 11/785</TD> |
|---|
| 2328 | <TD ALIGN="LEFT">Unix</TD> |
|---|
| 2329 | <TD ALIGN="LEFT">Berkeley Pascal</TD> |
|---|
| 2330 | <TD ALIGN="LEFT">(11.9)</TD> |
|---|
| 2331 | <TD ALIGN="LEFT">77.77</TD> |
|---|
| 2332 | <TD ALIGN="LEFT">0.012857</TD> |
|---|
| 2333 | </TR> |
|---|
| 2334 | <TR><TD ALIGN="LEFT">80486-33</TD> |
|---|
| 2335 | <TD ALIGN="LEFT">MSDOS</TD> |
|---|
| 2336 | <TD ALIGN="LEFT">Turbo Pascal 6.0</TD> |
|---|
| 2337 | <TD ALIGN="LEFT">(11.46)</TD> |
|---|
| 2338 | <TD ALIGN="LEFT">74.89</TD> |
|---|
| 2339 | <TD ALIGN="LEFT">0.013353</TD> |
|---|
| 2340 | </TR> |
|---|
| 2341 | <TR><TD ALIGN="LEFT">Sun 3/60</TD> |
|---|
| 2342 | <TD ALIGN="LEFT">SunOS</TD> |
|---|
| 2343 | <TD ALIGN="LEFT">Sun C</TD> |
|---|
| 2344 | <TD ALIGN="LEFT">(3.93)</TD> |
|---|
| 2345 | <TD ALIGN="LEFT">72.67</TD> |
|---|
| 2346 | <TD ALIGN="LEFT">0.013761</TD> |
|---|
| 2347 | </TR> |
|---|
| 2348 | <TR><TD ALIGN="LEFT">NeXT Cube (68030)</TD> |
|---|
| 2349 | <TD ALIGN="LEFT">Mach</TD> |
|---|
| 2350 | <TD ALIGN="LEFT">Gnu C</TD> |
|---|
| 2351 | <TD ALIGN="LEFT">(2.608)</TD> |
|---|
| 2352 | <TD ALIGN="LEFT">48.256</TD> |
|---|
| 2353 | <TD ALIGN="LEFT">0.02072</TD> |
|---|
| 2354 | </TR> |
|---|
| 2355 | <TR><TD ALIGN="LEFT">Sequent S-81</TD> |
|---|
| 2356 | <TD ALIGN="LEFT">DYNIX</TD> |
|---|
| 2357 | <TD ALIGN="LEFT">Sequent Symmetry C</TD> |
|---|
| 2358 | <TD ALIGN="LEFT">(2.604)</TD> |
|---|
| 2359 | <TD ALIGN="LEFT">48.182</TD> |
|---|
| 2360 | <TD ALIGN="LEFT">0.02075</TD> |
|---|
| 2361 | </TR> |
|---|
| 2362 | <TR><TD ALIGN="LEFT">VAXstation 3500</TD> |
|---|
| 2363 | <TD ALIGN="LEFT">Unix</TD> |
|---|
| 2364 | <TD ALIGN="LEFT">Berkeley Pascal</TD> |
|---|
| 2365 | <TD ALIGN="LEFT">(7.3)</TD> |
|---|
| 2366 | <TD ALIGN="LEFT">47.777</TD> |
|---|
| 2367 | <TD ALIGN="LEFT">0.02093</TD> |
|---|
| 2368 | </TR> |
|---|
| 2369 | <TR><TD ALIGN="LEFT">Sequent S-81</TD> |
|---|
| 2370 | <TD ALIGN="LEFT">DYNIX</TD> |
|---|
| 2371 | <TD ALIGN="LEFT">Berkeley Pascal</TD> |
|---|
| 2372 | <TD ALIGN="LEFT">(5.6)</TD> |
|---|
| 2373 | <TD ALIGN="LEFT">36.600</TD> |
|---|
| 2374 | <TD ALIGN="LEFT">0.02732</TD> |
|---|
| 2375 | </TR> |
|---|
| 2376 | <TR><TD ALIGN="LEFT">Unisys 7000/40</TD> |
|---|
| 2377 | <TD ALIGN="LEFT">Unix</TD> |
|---|
| 2378 | <TD ALIGN="LEFT">Berkeley Pascal</TD> |
|---|
| 2379 | <TD ALIGN="LEFT">(5.24)</TD> |
|---|
| 2380 | <TD ALIGN="LEFT">34.244</TD> |
|---|
| 2381 | <TD ALIGN="LEFT">0.02920</TD> |
|---|
| 2382 | </TR> |
|---|
| 2383 | <TR><TD ALIGN="LEFT">VAX 8600</TD> |
|---|
| 2384 | <TD ALIGN="LEFT">VMS</TD> |
|---|
| 2385 | <TD ALIGN="LEFT">DEC VAX Pascal</TD> |
|---|
| 2386 | <TD ALIGN="LEFT">(3.96)</TD> |
|---|
| 2387 | <TD ALIGN="LEFT">25.889</TD> |
|---|
| 2388 | <TD ALIGN="LEFT">0.03863</TD> |
|---|
| 2389 | </TR> |
|---|
| 2390 | <TR><TD ALIGN="LEFT">Sun SPARC IPX</TD> |
|---|
| 2391 | <TD ALIGN="LEFT">SunOS</TD> |
|---|
| 2392 | <TD ALIGN="LEFT">Gnu C version 2.1</TD> |
|---|
| 2393 | <TD ALIGN="LEFT">(1.28)</TD> |
|---|
| 2394 | <TD ALIGN="LEFT">23.689</TD> |
|---|
| 2395 | <TD ALIGN="LEFT">0.04221</TD> |
|---|
| 2396 | </TR> |
|---|
| 2397 | <TR><TD ALIGN="LEFT">VAX 6000-530</TD> |
|---|
| 2398 | <TD ALIGN="LEFT">VMS</TD> |
|---|
| 2399 | <TD ALIGN="LEFT">DEC C</TD> |
|---|
| 2400 | <TD ALIGN="LEFT">(0.858)</TD> |
|---|
| 2401 | <TD ALIGN="LEFT">15.867</TD> |
|---|
| 2402 | <TD ALIGN="LEFT">0.06303</TD> |
|---|
| 2403 | </TR> |
|---|
| 2404 | <TR><TD ALIGN="LEFT">VAXstation 4000</TD> |
|---|
| 2405 | <TD ALIGN="LEFT">VMS</TD> |
|---|
| 2406 | <TD ALIGN="LEFT">DEC C</TD> |
|---|
| 2407 | <TD ALIGN="LEFT">(0.809)</TD> |
|---|
| 2408 | <TD ALIGN="LEFT">14.978</TD> |
|---|
| 2409 | <TD ALIGN="LEFT">0.06677</TD> |
|---|
| 2410 | </TR> |
|---|
| 2411 | <TR><TD ALIGN="LEFT">IBM RS/6000 540</TD> |
|---|
| 2412 | <TD ALIGN="LEFT">AIX</TD> |
|---|
| 2413 | <TD ALIGN="LEFT">XLP Pascal</TD> |
|---|
| 2414 | <TD ALIGN="LEFT">(2.276)</TD> |
|---|
| 2415 | <TD ALIGN="LEFT">14.866</TD> |
|---|
| 2416 | <TD ALIGN="LEFT">0.06726</TD> |
|---|
| 2417 | </TR> |
|---|
<TR><TD ALIGN="LEFT">NeXTstation (68040/25)</TD>
|---|
| 2419 | <TD ALIGN="LEFT">Mach</TD> |
|---|
| 2420 | <TD ALIGN="LEFT">Gnu C</TD> |
|---|
| 2421 | <TD ALIGN="LEFT">(0.75)</TD> |
|---|
| 2422 | <TD ALIGN="LEFT">13.867</TD> |
|---|
| 2423 | <TD ALIGN="LEFT">0.07212</TD> |
|---|
| 2424 | </TR> |
|---|
| 2425 | <TR><TD ALIGN="LEFT">Sun SPARC IPX</TD> |
|---|
| 2426 | <TD ALIGN="LEFT">SunOS</TD> |
|---|
| 2427 | <TD ALIGN="LEFT">Sun C</TD> |
|---|
| 2428 | <TD ALIGN="LEFT">(0.68)</TD> |
|---|
| 2429 | <TD ALIGN="LEFT">12.580</TD> |
|---|
| 2430 | <TD ALIGN="LEFT">0.07951</TD> |
|---|
| 2431 | </TR> |
|---|
| 2432 | <TR><TD ALIGN="LEFT">486DX (33 MHz)</TD> |
|---|
| 2433 | <TD ALIGN="LEFT">Linux</TD> |
|---|
| 2434 | <TD ALIGN="LEFT">Gnu C #</TD> |
|---|
| 2435 | <TD ALIGN="LEFT">(0.63)</TD> |
|---|
| 2436 | <TD ALIGN="LEFT">11.666</TD> |
|---|
| 2437 | <TD ALIGN="LEFT">0.08571</TD> |
|---|
| 2438 | </TR> |
|---|
| 2439 | <TR><TD ALIGN="LEFT">Sun SPARCstation-1</TD> |
|---|
| 2440 | <TD ALIGN="LEFT">Unix</TD> |
|---|
| 2441 | <TD ALIGN="LEFT">Sun Pascal</TD> |
|---|
| 2442 | <TD ALIGN="LEFT">(1.7)</TD> |
|---|
| 2443 | <TD ALIGN="LEFT">11.111</TD> |
|---|
| 2444 | <TD ALIGN="LEFT">0.09000</TD> |
|---|
| 2445 | </TR> |
|---|
| 2446 | <TR><TD ALIGN="LEFT">DECstation 5000/200</TD> |
|---|
| 2447 | <TD ALIGN="LEFT">Unix</TD> |
|---|
| 2448 | <TD ALIGN="LEFT">DEC Ultrix C</TD> |
|---|
| 2449 | <TD ALIGN="LEFT">(0.45)</TD> |
|---|
| 2450 | <TD ALIGN="LEFT">8.333</TD> |
|---|
| 2451 | <TD ALIGN="LEFT">0.12000</TD> |
|---|
| 2452 | </TR> |
|---|
| 2453 | <TR><TD ALIGN="LEFT">Sun SPARC 1+</TD> |
|---|
| 2454 | <TD ALIGN="LEFT">SunOS</TD> |
|---|
| 2455 | <TD ALIGN="LEFT">Sun C</TD> |
|---|
| 2456 | <TD ALIGN="LEFT">(0.40)</TD> |
|---|
| 2457 | <TD ALIGN="LEFT">7.400</TD> |
|---|
| 2458 | <TD ALIGN="LEFT">0.13513</TD> |
|---|
| 2459 | </TR> |
|---|
| 2460 | <TR><TD ALIGN="LEFT">DECstation 3100</TD> |
|---|
| 2461 | <TD ALIGN="LEFT">Unix</TD> |
|---|
| 2462 | <TD ALIGN="LEFT">DEC Ultrix Pascal</TD> |
|---|
| 2463 | <TD ALIGN="LEFT">(0.77)</TD> |
|---|
| 2464 | <TD ALIGN="LEFT">5.022</TD> |
|---|
| 2465 | <TD ALIGN="LEFT">0.1991</TD> |
|---|
| 2466 | </TR> |
|---|
| 2467 | <TR><TD ALIGN="LEFT">IBM 3090-300E</TD> |
|---|
| 2468 | <TD ALIGN="LEFT">AIX</TD> |
|---|
| 2469 | <TD ALIGN="LEFT">Metaware High C</TD> |
|---|
| 2470 | <TD ALIGN="LEFT">(0.27)</TD> |
|---|
| 2471 | <TD ALIGN="LEFT">5.000</TD> |
|---|
| 2472 | <TD ALIGN="LEFT">0.2000</TD> |
|---|
| 2473 | </TR> |
|---|
| 2474 | <TR><TD ALIGN="LEFT">DECstation 5000/125</TD> |
|---|
| 2475 | <TD ALIGN="LEFT">Unix</TD> |
|---|
| 2476 | <TD ALIGN="LEFT">DEC Ultrix C</TD> |
|---|
| 2477 | <TD ALIGN="LEFT">(0.267)</TD> |
|---|
| 2478 | <TD ALIGN="LEFT">4.933</TD> |
|---|
| 2479 | <TD ALIGN="LEFT">0.2027</TD> |
|---|
| 2480 | </TR> |
|---|
| 2481 | <TR><TD ALIGN="LEFT">DECstation 5000/200</TD> |
|---|
| 2482 | <TD ALIGN="LEFT">Unix</TD> |
|---|
| 2483 | <TD ALIGN="LEFT">DEC Ultrix C</TD> |
|---|
| 2484 | <TD ALIGN="LEFT">(0.256)</TD> |
|---|
| 2485 | <TD ALIGN="LEFT">4.733</TD> |
|---|
| 2486 | <TD ALIGN="LEFT">0.2113</TD> |
|---|
| 2487 | </TR> |
|---|
| 2488 | <TR><TD ALIGN="LEFT">Sun SPARC 4/50</TD> |
|---|
| 2489 | <TD ALIGN="LEFT">SunOS</TD> |
|---|
| 2490 | <TD ALIGN="LEFT">Sun C</TD> |
|---|
| 2491 | <TD ALIGN="LEFT">(0.249)</TD> |
|---|
| 2492 | <TD ALIGN="LEFT">4.607</TD> |
|---|
| 2493 | <TD ALIGN="LEFT">0.2171</TD> |
|---|
| 2494 | </TR> |
|---|
| 2495 | <TR><TD ALIGN="LEFT">DEC 3000/400 AXP</TD> |
|---|
| 2496 | <TD ALIGN="LEFT">Unix</TD> |
|---|
| 2497 | <TD ALIGN="LEFT">DEC C</TD> |
|---|
| 2498 | <TD ALIGN="LEFT">(0.224)</TD> |
|---|
| 2499 | <TD ALIGN="LEFT">4.144</TD> |
|---|
| 2500 | <TD ALIGN="LEFT">0.2413</TD> |
|---|
| 2501 | </TR> |
|---|
| 2502 | <TR><TD ALIGN="LEFT">DECstation 5000/240</TD> |
|---|
| 2503 | <TD ALIGN="LEFT">Unix</TD> |
|---|
| 2504 | <TD ALIGN="LEFT">DEC Ultrix C</TD> |
|---|
| 2505 | <TD ALIGN="LEFT">(0.1889)</TD> |
|---|
| 2506 | <TD ALIGN="LEFT">3.496</TD> |
|---|
| 2507 | <TD ALIGN="LEFT">0.2861</TD> |
|---|
| 2508 | </TR> |
|---|
| 2509 | <TR><TD ALIGN="LEFT">SGI Iris R4000</TD> |
|---|
| 2510 | <TD ALIGN="LEFT">Unix</TD> |
|---|
| 2511 | <TD ALIGN="LEFT">SGI C</TD> |
|---|
| 2512 | <TD ALIGN="LEFT">(0.184)</TD> |
|---|
| 2513 | <TD ALIGN="LEFT">3.404</TD> |
|---|
| 2514 | <TD ALIGN="LEFT">0.2937</TD> |
|---|
| 2515 | </TR> |
|---|
| 2516 | <TR><TD ALIGN="LEFT">IBM 3090-300E</TD> |
|---|
| 2517 | <TD ALIGN="LEFT">VM</TD> |
|---|
| 2518 | <TD ALIGN="LEFT">Pascal VS</TD> |
|---|
| 2519 | <TD ALIGN="LEFT">(0.464)</TD> |
|---|
| 2520 | <TD ALIGN="LEFT">3.022</TD> |
|---|
| 2521 | <TD ALIGN="LEFT">0.3309</TD> |
|---|
| 2522 | </TR> |
|---|
| 2523 | <TR><TD ALIGN="LEFT">DECstation 5000/200</TD> |
|---|
| 2524 | <TD ALIGN="LEFT">Unix</TD> |
|---|
| 2525 | <TD ALIGN="LEFT">DEC Ultrix Pascal</TD> |
|---|
| 2526 | <TD ALIGN="LEFT">(0.39)</TD> |
|---|
| 2527 | <TD ALIGN="LEFT">2.533</TD> |
|---|
| 2528 | <TD ALIGN="LEFT">0.3947</TD> |
|---|
| 2529 | </TR> |
|---|
| 2530 | <TR><TD ALIGN="LEFT">Pentium 120</TD> |
|---|
| 2531 | <TD ALIGN="LEFT">Linux</TD> |
|---|
| 2532 | <TD ALIGN="LEFT">Gnu C</TD> |
|---|
| 2533 | <TD ALIGN="LEFT">1.848</TD> |
|---|
| 2534 | <TD ALIGN="LEFT">1.994</TD> |
|---|
| 2535 | <TD ALIGN="LEFT">0.5016</TD> |
|---|
| 2536 | </TR> |
|---|
| 2537 | <TR><TD ALIGN="LEFT">Pentium Pro 180</TD> |
|---|
| 2538 | <TD ALIGN="LEFT">Linux</TD> |
|---|
| 2539 | <TD ALIGN="LEFT">Gnu C</TD> |
|---|
| 2540 | <TD ALIGN="LEFT">1.009</TD> |
|---|
| 2541 | <TD ALIGN="LEFT">1.088</TD> |
|---|
<TD ALIGN="LEFT">0.9191</TD>
|---|
| 2543 | </TR> |
|---|
| 2544 | <TR><TD ALIGN="LEFT">Pentium 266 MMX</TD> |
|---|
| 2545 | <TD ALIGN="LEFT">Linux</TD> |
|---|
| 2546 | <TD ALIGN="LEFT">Gnu C (PHYLIP 3.5)</TD> |
|---|
| 2547 | <TD ALIGN="LEFT">(0.054)</TD> |
|---|
| 2548 | <TD ALIGN="LEFT">1.0</TD> |
|---|
| 2549 | <TD ALIGN="LEFT">1.0</TD> |
|---|
| 2550 | </TR> |
|---|
| 2551 | <TR><TD ALIGN="LEFT">Pentium 266 MMX</TD> |
|---|
| 2552 | <TD ALIGN="LEFT">Linux</TD> |
|---|
| 2553 | <TD ALIGN="LEFT">Gnu C</TD> |
|---|
| 2554 | <TD ALIGN="LEFT">0.927</TD> |
|---|
| 2555 | <TD ALIGN="LEFT">1.0</TD> |
|---|
| 2556 | <TD ALIGN="LEFT">1.0</TD> |
|---|
| 2557 | </TR> |
|---|
| 2558 | <TR><TD ALIGN="LEFT">Pentium 200</TD> |
|---|
| 2559 | <TD ALIGN="LEFT">Linux</TD> |
|---|
| 2560 | <TD ALIGN="LEFT">Gnu C</TD> |
|---|
| 2561 | <TD ALIGN="LEFT">0.853</TD> |
|---|
| 2562 | <TD ALIGN="LEFT">0.9202</TD> |
|---|
<TD ALIGN="LEFT">1.0867</TD>
|---|
| 2564 | </TR> |
|---|
| 2565 | <TR><TD ALIGN="LEFT">SGI PowerChallenge</TD> |
|---|
| 2566 | <TD ALIGN="LEFT">Irix</TD> |
|---|
| 2567 | <TD ALIGN="LEFT">Gnu C</TD> |
|---|
| 2568 | <TD ALIGN="LEFT">0.844</TD> |
|---|
| 2569 | <TD ALIGN="LEFT">0.9297</TD> |
|---|
| 2570 | <TD ALIGN="LEFT">1.0756</TD> |
|---|
| 2571 | </TR> |
|---|
| 2572 | <TR><TD ALIGN="LEFT">DEC Alpha 400 4/233</TD> |
|---|
| 2573 | <TD ALIGN="LEFT">DUNIX</TD> |
|---|
| 2574 | <TD ALIGN="LEFT">Digital C (cc -fast)</TD> |
|---|
| 2575 | <TD ALIGN="LEFT">0.730</TD> |
|---|
| 2576 | <TD ALIGN="LEFT">0.7875</TD> |
|---|
| 2577 | <TD ALIGN="LEFT">1.2699</TD> |
|---|
| 2578 | </TR> |
|---|
| 2579 | <TR><TD ALIGN="LEFT">Pentium II 500</TD> |
|---|
| 2580 | <TD ALIGN="LEFT">Linux</TD> |
|---|
| 2581 | <TD ALIGN="LEFT">Gnu C</TD> |
|---|
| 2582 | <TD ALIGN="LEFT">0.368</TD> |
|---|
| 2583 | <TD ALIGN="LEFT">0.4053</TD> |
|---|
| 2584 | <TD ALIGN="LEFT">2.467</TD> |
|---|
| 2585 | </TR> |
|---|
| 2586 | <TR><TD ALIGN="LEFT">Compaq/Digital Alpha 500au</TD> |
|---|
| 2587 | <TD ALIGN="LEFT">DUNIX</TD> |
|---|
| 2588 | <TD ALIGN="LEFT">Digital C (cc -fast)</TD> |
|---|
| 2589 | <TD ALIGN="LEFT">0.167</TD> |
|---|
| 2590 | <TD ALIGN="LEFT">0.1805</TD> |
|---|
| 2591 | <TD ALIGN="LEFT">5.541</TD> |
|---|
| 2592 | </TR> |
|---|
| 2593 | </TABLE> |
|---|
| 2594 | </DIV> |
|---|
| 2595 | <P> |
|---|
This benchmark reflects not only the integer performance of these machines
(as DNAPARS has few floating-point operations) but also the efficiency
of the compilers. Some of the machines (the DEC 3000/400 AXP
and the IBM RS/6000, in particular) are much faster than this benchmark
would indicate. The numerical benchmark below gives them a
fairer test. The advantage of the Compaq/Digital Alpha 500au is exaggerated
because, although its compiles are optimized for that processor, the Pentium
compiles are not similarly optimized.
|---|
| 2604 | <P> |
|---|
| 2605 | Note that parallel machines like the Sequent and the SGI PowerChallenge are not |
|---|
| 2606 | really as slow as indicated by the data here, as these runs did nothing to take |
|---|
| 2607 | advantage of their parallelism. |
|---|
| 2608 | <P> |
|---|
These benchmarks now span 13 years, and in the DNAPARS
benchmark they cover an 8000-fold range in speed!
The experience of our laboratory, which seems typical, is that
computer power grows by a factor of about 1.85 per year. This is
roughly consistent with these benchmarks.
|---|
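<P>
The statement that growth of about 1.85-fold per year is roughly consistent with these benchmarks can be checked with a little arithmetic. This sketch takes the 1.85 factor, the 13-year span, and the 8000-fold range from the text. The function names are illustrative.

```python
import math


def compound_growth(rate_per_year, years):
    """Total speed-up after compounding a fixed annual growth factor."""
    return rate_per_year ** years


def years_needed(fold_change, rate_per_year):
    """Years of compounding required to reach a given fold change."""
    return math.log(fold_change) / math.log(rate_per_year)


print(f"1.85x per year for 13 years: about {compound_growth(1.85, 13):,.0f}-fold")
print(f"years for an 8000-fold range at 1.85x per year: {years_needed(8000, 1.85):.1f}")
```

At 1.85-fold per year, 13 years gives roughly a 3000-fold increase. An 8000-fold range corresponds to about 14.6 years of such growth, so the two figures agree to within a couple of years.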
| 2614 | <P> |
|---|
For a picture of speeds for a more numerically intensive program,
here are benchmarks using DNAML, with the Pentium MMX 266
as the standard. The timings in parentheses were made with older versions:
those made with PHYLIP version 3.5 are compared to that version run on
the Pentium 266, and those made with the PHYLIP 3.4 Pascal version are adjusted
using the 386SX, on which both versions were run. Numbers are
total run times in seconds (total user time in the case of Unix) over all three
data sets.
|---|
| 2622 | <P> |
|---|
| 2623 | <DIV ALIGN="CENTER"> |
|---|
| 2624 | <TABLE CELLPADDING=3 BORDER="1"> |
|---|
| 2625 | <TR><TD ALIGN="LEFT"><B>Machine</B></TD> |
|---|
| 2626 | <TD ALIGN="LEFT"><B>Operating<BR>System</B></TD> |
|---|
| 2627 | <TD ALIGN="LEFT"><B>Compiler</B></TD> |
|---|
| 2628 | <TD ALIGN="RIGHT"><B>Seconds</B></TD> |
|---|
| 2629 | <TD ALIGN="LEFT"><B>Time</B></TD> |
|---|
| 2630 | <TD ALIGN="RIGHT"><B>Relative<BR>Speed</B></TD> |
|---|
| 2631 | </TR> |
|---|
<TR><TD ALIGN="LEFT">386SX (16 MHz)</TD>
|---|
| 2633 | <TD ALIGN="LEFT">PCDOS</TD> |
|---|
| 2634 | <TD ALIGN="LEFT">Turbo Pascal 6</TD> |
|---|
| 2635 | <TD ALIGN="RIGHT">(7826)</TD> |
|---|
| 2636 | <TD ALIGN="LEFT"> 181.18</TD> |
|---|
| 2637 | <TD ALIGN="RIGHT">0.005519</TD> |
|---|
| 2638 | </TR> |
|---|
<TR><TD ALIGN="LEFT">386SX (16 MHz)</TD>
|---|
| 2640 | <TD ALIGN="LEFT">PCDOS</TD> |
|---|
| 2641 | <TD ALIGN="LEFT">Quick C</TD> |
|---|
| 2642 | <TD ALIGN="RIGHT">(6549.79)</TD> |
|---|
| 2643 | <TD ALIGN="LEFT"> 181.18</TD> |
|---|
| 2644 | <TD ALIGN="RIGHT">0.005519</TD> |
|---|
| 2645 | </TR> |
|---|
| 2646 | <TR><TD ALIGN="LEFT">Compudyne 486DX/33</TD> |
|---|
| 2647 | <TD ALIGN="LEFT">Linux</TD> |
|---|
| 2648 | <TD ALIGN="LEFT">Gnu C</TD> |
|---|
| 2649 | <TD ALIGN="RIGHT">(1599.9)</TD> |
|---|
| 2650 | <TD ALIGN="LEFT"> 44.26</TD> |
|---|
| 2651 | <TD ALIGN="RIGHT">0.022595</TD> |
|---|
| 2652 | </TR> |
|---|
<TR><TD ALIGN="LEFT">Sun SPARCstation 1+</TD>
|---|
| 2654 | <TD ALIGN="LEFT">SunOS</TD> |
|---|
| 2655 | <TD ALIGN="LEFT">Sun C</TD> |
|---|
| 2656 | <TD ALIGN="RIGHT">(1402.8)</TD> |
|---|
| 2657 | <TD ALIGN="LEFT"> 38.805</TD> |
|---|
| 2658 | <TD ALIGN="RIGHT">0.025770</TD> |
|---|
| 2659 | </TR> |
|---|
| 2660 | <TR><TD ALIGN="LEFT">Everex STEP 386/20</TD> |
|---|
| 2661 | <TD ALIGN="LEFT">PCDOS</TD> |
|---|
| 2662 | <TD ALIGN="LEFT">Turbo Pascal 5.5</TD> |
|---|
| 2663 | <TD ALIGN="RIGHT">(1440.8)</TD> |
|---|
| 2664 | <TD ALIGN="LEFT"> 33.356</TD> |
|---|
| 2665 | <TD ALIGN="RIGHT"> 0.029980</TD> |
|---|
| 2666 | </TR> |
|---|
| 2667 | <TR><TD ALIGN="LEFT">486DX/33</TD> |
|---|
| 2668 | <TD ALIGN="LEFT">PCDOS</TD> |
|---|
| 2669 | <TD ALIGN="LEFT">Turbo C++</TD> |
|---|
| 2670 | <TD ALIGN="RIGHT">(1107.2)</TD> |
|---|
| 2671 | <TD ALIGN="LEFT"> 30.628</TD> |
|---|
| 2672 | <TD ALIGN="RIGHT">0.032650</TD> |
|---|
| 2673 | </TR> |
|---|
| 2674 | <TR><TD ALIGN="LEFT">Compudyne 486DX/33</TD> |
|---|
| 2675 | <TD ALIGN="LEFT">PCDOS</TD> |
|---|
| 2676 | <TD ALIGN="LEFT">Waterloo C/386</TD> |
|---|
| 2677 | <TD ALIGN="RIGHT">(1045.78)</TD> |
|---|
| 2678 | <TD ALIGN="LEFT"> 28.929</TD> |
|---|
| 2679 | <TD ALIGN="RIGHT">0.034567</TD> |
|---|
| 2680 | </TR> |
|---|
| 2681 | <TR><TD ALIGN="LEFT">Sun SPARCstation IPX</TD> |
|---|
| 2682 | <TD ALIGN="LEFT">SunOS</TD> |
|---|
| 2683 | <TD ALIGN="LEFT">Gnu C</TD> |
|---|
| 2684 | <TD ALIGN="RIGHT"> (960.2)</TD> |
|---|
| 2685 | <TD ALIGN="LEFT"> 26.562</TD> |
|---|
| 2686 | <TD ALIGN="RIGHT">0.037648</TD> |
|---|
| 2687 | </TR> |
|---|
<TR><TD ALIGN="LEFT">NeXTstation (68040/25)</TD>
|---|
| 2689 | <TD ALIGN="LEFT">Mach</TD> |
|---|
| 2690 | <TD ALIGN="LEFT">Gnu C</TD> |
|---|
| 2691 | <TD ALIGN="RIGHT"> (916.6)</TD> |
|---|
| 2692 | <TD ALIGN="LEFT"> 25.355</TD> |
|---|
| 2693 | <TD ALIGN="RIGHT">0.039439</TD> |
|---|
| 2694 | </TR> |
|---|
| 2695 | <TR><TD ALIGN="LEFT">486DX/33</TD> |
|---|
| 2696 | <TD ALIGN="LEFT">PCDOS</TD> |
|---|
| 2697 | <TD ALIGN="LEFT">Waterloo C/386</TD> |
|---|
| 2698 | <TD ALIGN="RIGHT"> (861.0)</TD> |
|---|
| 2699 | <TD ALIGN="LEFT"> 23.817</TD> |
|---|
| 2700 | <TD ALIGN="RIGHT">0.041986</TD> |
|---|
| 2701 | </TR> |
|---|
| 2702 | <TR><TD ALIGN="LEFT">Sun SPARCstation IPX</TD> |
|---|
| 2703 | <TD ALIGN="LEFT">SunOS</TD> |
|---|
| 2704 | <TD ALIGN="LEFT">Sun C</TD> |
|---|
| 2705 | <TD ALIGN="RIGHT"> (787.7)</TD> |
|---|
| 2706 | <TD ALIGN="LEFT"> 21.790</TD> |
|---|
| 2707 | <TD ALIGN="RIGHT">0.045893</TD> |
|---|
| 2708 | </TR> |
|---|
| 2709 | <TR><TD ALIGN="LEFT">486DX/33</TD> |
|---|
| 2710 | <TD ALIGN="LEFT">PCDOS</TD> |
|---|
| 2711 | <TD ALIGN="LEFT">Gnu C</TD> |
|---|
| 2712 | <TD ALIGN="RIGHT"> (650.9)</TD> |
|---|
| 2713 | <TD ALIGN="LEFT"> 18.006</TD> |
|---|
| 2714 | <TD ALIGN="RIGHT">0.05554</TD> |
|---|
| 2715 | </TR> |
|---|
| 2716 | <TR><TD ALIGN="LEFT">VAX 6000-530</TD> |
|---|
| 2717 | <TD ALIGN="LEFT">VMS</TD> |
|---|
| 2718 | <TD ALIGN="LEFT">DEC C</TD> |
|---|
| 2719 | <TD ALIGN="RIGHT"> (637.0)</TD> |
|---|
| 2720 | <TD ALIGN="LEFT"> 17.621</TD> |
|---|
| 2721 | <TD ALIGN="RIGHT">0.05675</TD> |
|---|
| 2722 | </TR> |
|---|
| 2723 | <TR><TD ALIGN="LEFT">DECstation 5000/200</TD> |
|---|
| 2724 | <TD ALIGN="LEFT">Unix</TD> |
|---|
| 2725 | <TD ALIGN="LEFT">DEC Ultrix RISC C</TD> |
|---|
| 2726 | <TD ALIGN="RIGHT"> (423.3)</TD> |
|---|
| 2727 | <TD ALIGN="LEFT"> 11.710</TD> |
|---|
| 2728 | <TD ALIGN="RIGHT">0.08540</TD> |
|---|
| 2729 | </TR> |
|---|
| 2730 | <TR><TD ALIGN="LEFT">IBM 3090-300E</TD> |
|---|
| 2731 | <TD ALIGN="LEFT">AIX</TD> |
|---|
| 2732 | <TD ALIGN="LEFT">Metaware High C</TD> |
|---|
| 2733 | <TD ALIGN="RIGHT"> (201.8)</TD> |
|---|
| 2734 | <TD ALIGN="LEFT"> 5.582</TD> |
|---|
| 2735 | <TD ALIGN="RIGHT">0.17914</TD> |
|---|
| 2736 | </TR> |
|---|
| 2737 | <TR><TD ALIGN="LEFT">Convex C240/1024</TD> |
|---|
| 2738 | <TD ALIGN="LEFT">Unix</TD> |
|---|
| 2739 | <TD ALIGN="LEFT">C</TD> |
|---|
| 2740 | <TD ALIGN="RIGHT"> (101.6)</TD> |
|---|
| 2741 | <TD ALIGN="LEFT"> 2.8105</TD> |
|---|
| 2742 | <TD ALIGN="RIGHT">0.35581</TD> |
|---|
| 2743 | </TR> |
|---|
| 2744 | <TR><TD ALIGN="LEFT">DEC 3000/400 AXP</TD> |
|---|
| 2745 | <TD ALIGN="LEFT">Unix</TD> |
|---|
| 2746 | <TD ALIGN="LEFT">DEC C</TD> |
|---|
| 2747 | <TD ALIGN="RIGHT"> (98.29)</TD> |
|---|
| 2748 | <TD ALIGN="LEFT"> 2.7189</TD> |
|---|
| 2749 | <TD ALIGN="RIGHT">0.36779</TD> |
|---|
| 2750 | </TR> |
|---|
| 2751 | <TR><TD ALIGN="LEFT">Pentium 120</TD> |
|---|
| 2752 | <TD ALIGN="LEFT">Linux</TD> |
|---|
| 2753 | <TD ALIGN="LEFT">Gnu C</TD> |
|---|
| 2754 | <TD ALIGN="RIGHT">25.26</TD> |
|---|
| 2755 | <TD ALIGN="LEFT">3.3906</TD> |
|---|
| 2756 | <TD ALIGN="RIGHT">0.29493</TD> |
|---|
| 2757 | </TR> |
|---|
| 2758 | <TR><TD ALIGN="LEFT">Pentium Pro 180</TD> |
|---|
| 2759 | <TD ALIGN="LEFT">Linux</TD> |
|---|
| 2760 | <TD ALIGN="LEFT">Gnu C</TD> |
|---|
| 2761 | <TD ALIGN="RIGHT">18.88</TD> |
|---|
| 2762 | <TD ALIGN="LEFT">2.5342</TD> |
|---|
| 2763 | <TD ALIGN="RIGHT">0.3946</TD> |
|---|
| 2764 | </TR> |
|---|
| 2765 | <TR><TD ALIGN="LEFT">Pentium 200</TD> |
|---|
| 2766 | <TD ALIGN="LEFT">Linux</TD> |
|---|
| 2767 | <TD ALIGN="LEFT">Gnu C</TD> |
|---|
| 2768 | <TD ALIGN="RIGHT">16.51</TD> |
|---|
| 2769 | <TD ALIGN="LEFT">2.2161</TD> |
|---|
| 2770 | <TD ALIGN="RIGHT">0.4512</TD> |
|---|
| 2771 | </TR> |
|---|
| 2772 | <TR><TD ALIGN="LEFT">SGI PowerChallenge</TD> |
|---|
| 2773 | <TD ALIGN="LEFT">IRIX</TD> |
|---|
| 2774 | <TD ALIGN="LEFT">Gnu C</TD> |
|---|
| 2775 | <TD ALIGN="RIGHT">12.446</TD> |
|---|
| 2776 | <TD ALIGN="LEFT">1.6706</TD> |
|---|
| 2777 | <TD ALIGN="RIGHT">0.5985</TD> |
|---|
| 2778 | </TR> |
|---|
| 2779 | <TR><TD ALIGN="LEFT">Pentium MMX 266</TD> |
|---|
| 2780 | <TD ALIGN="LEFT">Linux</TD> |
|---|
| 2781 | <TD ALIGN="LEFT">Gnu C (PHYLIP 3.5)</TD> |
|---|
| 2782 | <TD ALIGN="RIGHT">(36.15)</TD> |
|---|
| 2783 | <TD ALIGN="LEFT"> 1.0</TD> |
|---|
| 2784 | <TD ALIGN="RIGHT"> 1.0</TD> |
|---|
| 2785 | </TR> |
|---|
| 2786 | <TR><TD ALIGN="LEFT">DEC Alpha 400 4/233</TD> |
|---|
| 2787 | <TD ALIGN="LEFT">Linux</TD> |
|---|
| 2788 | <TD ALIGN="LEFT">Gnu C (cc -fast)</TD> |
|---|
| 2789 | <TD ALIGN="RIGHT">8.0418</TD> |
|---|
| 2790 | <TD ALIGN="LEFT">1.0792</TD> |
|---|
| 2791 | <TD ALIGN="RIGHT">0.9266</TD> |
|---|
| 2792 | </TR> |
|---|
| 2793 | <TR><TD ALIGN="LEFT">Pentium MMX 266</TD> |
|---|
| 2794 | <TD ALIGN="LEFT">Linux</TD> |
|---|
| 2795 | <TD ALIGN="LEFT">Gnu C</TD> |
|---|
| 2796 | <TD ALIGN="RIGHT">7.45</TD> |
|---|
| 2797 | <TD ALIGN="LEFT"> 1.0</TD> |
|---|
| 2798 | <TD ALIGN="RIGHT"> 1.0</TD> |
|---|
| 2799 | </TR> |
|---|
| 2800 | <TR><TD ALIGN="LEFT">Pentium II 500</TD> |
|---|
| 2801 | <TD ALIGN="LEFT">Linux</TD> |
|---|
| 2802 | <TD ALIGN="LEFT">Gnu C</TD> |
|---|
| 2803 | <TD ALIGN="RIGHT">6.02</TD> |
|---|
| 2804 | <TD ALIGN="LEFT"> 0.8081</TD> |
|---|
| 2805 | <TD ALIGN="RIGHT"> 1.2375</TD> |
|---|
| 2806 | </TR> |
|---|
| 2807 | <TR><TD ALIGN="LEFT">Compaq/Digital Alpha 500au</TD> |
|---|
| 2808 | <TD ALIGN="LEFT">Linux</TD> |
|---|
| 2809 | <TD ALIGN="LEFT">Gnu C (cc -fast)</TD> |
|---|
| 2810 | <TD ALIGN="RIGHT">0.9383</TD> |
|---|
| 2811 | <TD ALIGN="LEFT"> 0.1259</TD> |
|---|
| 2812 | <TD ALIGN="RIGHT">7.940</TD> |
|---|
| 2813 | </TR> |
|---|
| 2814 | </TABLE> |
|---|
| 2815 | </DIV> |
|---|
| 2816 | <P> |
|---|
| 2817 | As before, the parallel machines such as the Convex and the SGI PowerChallenge |
|---|
| 2818 | were only run using one processor, which does not take into account the |
|---|
| 2819 | gain that could be obtained by parallelizing the programs. The speed of the |
|---|
| 2820 | Compaq/Digital Alpha 500au is exaggerated because it was compiled in a way |
|---|
| 2821 | optimized for its processor, while the Pentium compiles were not. |
|---|
| 2822 | <P> |
|---|
| 2823 | You are invited to send me figures for your machine for |
|---|
| 2824 | inclusion in future tables. Use the data sets above and compute the total |
|---|
| 2825 | times for DNAPARS and for DNAML for the three data sets (setting the |
|---|
| 2826 | frequencies of the four bases to 0.25 each for the DNAML runs). Be sure to |
|---|
| 2827 | tell me the name and version of your compiler, and the version of PHYLIP you |
|---|
| 2828 | tested. |
|---|
| 2829 | If the times are too small to be measured accurately, obtain the times |
|---|
| 2830 | for ten data sets (the Multiple data sets option) and divide by 10. |
|---|
| 2831 | <P> |
|---|
| 2832 | <A NAME="comments"><HR><P></A> |
|---|
| 2833 | <DIV ALIGN="CENTER"> |
|---|
| 2834 | <H2>General Comments on Adapting<BR> |
|---|
| 2835 | the Package to Different Computer Systems</H2></DIV> |
|---|
| 2836 | <P> |
|---|
| 2837 | In the following sections you will find instructions on how to adapt the |
|---|
| 2838 | programs to different computers and compilers. The programs should compile |
|---|
| 2839 | without alteration on most versions of C. They use the <TT>malloc</TT> and |
|---|
| 2840 | <TT>calloc</TT> library functions to allocate memory, so the upper limits on how many |
|---|
| 2841 | species, sites, or characters they can handle are set by the system memory |
|---|
| 2842 | available to those memory-allocation functions. |
|---|
| 2843 | <P> |
|---|
| 2844 | In the document file for each program, I have supplied a small |
|---|
| 2845 | input example, and the output it produces, to help you check whether the |
|---|
| 2846 | programs are running properly. |
|---|
| 2847 | <P> |
|---|
| 2848 | <DIV ALIGN=CENTER> |
|---|
| 2849 | <A NAME="compiling"><HR><P></A> |
|---|
| 2850 | <H2>Compiling the programs</H2> |
|---|
| 2851 | </DIV> |
|---|
| 2852 | <P> |
|---|
| 2853 | If you have not been able to get executables for PHYLIP, you should be |
|---|
| 2854 | able to make your own. This is easy under Unix and Linux, but more |
|---|
| 2855 | difficult if you have a Macintosh or a Windows system. If you have the |
|---|
| 2856 | latter, we strongly recommend you download and use the PowerMac and |
|---|
| 2857 | Windows executables that we distribute. If you do that, you will not need |
|---|
| 2858 | to have any compiler or to do any compiling. I get a certain number of |
|---|
| 2859 | inquiries each year from confused users who are not sure what a compiler |
|---|
| 2860 | is but think they need one. After downloading the executables they |
|---|
| 2861 | contact me and complain that they did not find a compiler included in the |
|---|
| 2862 | package, and would I please e-mail them the compiler. What they really |
|---|
| 2863 | need to do is use the executables and forget about compiling them. |
|---|
| 2864 | <P> |
|---|
| 2865 | Some users may also need to compile the programs in order to modify them. |
|---|
| 2866 | The instructions below will help with this. |
|---|
| 2867 | <P> |
|---|
| 2868 | I will discuss how to compile PHYLIP using one of a number of widely-used |
|---|
| 2869 | compilers. After these I will comment on compiling PHYLIP on other, less |
|---|
| 2870 | widely-used systems. |
|---|
| 2871 | <P> |
|---|
| 2872 | <H3>Unix and Linux</H3> |
|---|
| 2873 | <P> |
|---|
| 2874 | In Unix and Linux (which is Unix in all important functional respects, if |
|---|
| 2875 | not in all |
|---|
| 2876 | legal respects) it is easy to compile PHYLIP yourself, which is why we have |
|---|
| 2877 | generally not bothered to distribute executables for Unix. Unix (and Linux) |
|---|
| 2878 | systems generally have a C compiler and have the <TT>make</TT> utility. We |
|---|
| 2879 | distribute with the PHYLIP source code a Unix-compatible <TT>Makefile</TT>. |
|---|
| 2880 | <P> |
|---|
| 2881 | After you have finished unpacking the Documentation and Source Code |
|---|
| 2882 | archive, you will find that you have created a directory <TT>phylip</TT> |
|---|
| 2883 | in which there are three |
|---|
| 2884 | subdirectories, called <TT>exe</TT>, <TT>src</TT>, and <TT>doc</TT>. |
|---|
| 2885 | There is also an HTML web page, <TT>phylip.html</TT>. The <TT>exe</TT> |
|---|
| 2886 | directory will be empty; |
|---|
| 2887 | <TT>src</TT> contains the source code files, including the |
|---|
| 2888 | <TT>Makefile</TT>; and <TT>doc</TT> contains the documentation files. |
|---|
| 2889 | <P> |
|---|
| 2890 | Enter the <TT>src</TT> directory. Before you compile, you will want to |
|---|
| 2891 | look at the <TT>Makefile</TT> and see whether you want to alter the compilation |
|---|
| 2892 | command. There are careful instructions in the Makefile telling you how to |
|---|
| 2893 | do this. To compile all the programs just type: |
|---|
| 2894 | <P> |
|---|
| 2895 | <TT>make install</TT> |
|---|
| 2896 | <P> |
|---|
| 2897 | You will then see the compiling commands as they happen, with |
|---|
| 2898 | occasional warning messages. If these are warnings, rather than errors, |
|---|
| 2899 | they are usually not serious. A typical warning looks like this: |
|---|
| 2900 | <P> |
|---|
| 2901 | <TT>dnaml.c:1204: warning: static declaration for re_move follows non-static</TT> |
|---|
| 2902 | <P> |
|---|
| 2903 | After a time the compiler will finish compiling. If you have done a |
|---|
| 2904 | <TT>make install</TT> the system will then move the executables into the |
|---|
| 2905 | <TT>exe</TT> subdirectory and also save space by erasing all the relocatable |
|---|
| 2906 | object files that were produced in the process. You should be left with |
|---|
| 2907 | usable executables in the <TT>exe</TT> directory, and the <TT>src</TT> |
|---|
| 2908 | directory should be as before. To run the executables, go into the |
|---|
| 2909 | <TT>exe</TT> directory and type the program name (say <TT>dnaml</TT>). |
|---|
| 2910 | The names of the |
|---|
| 2911 | executables will be the same as the names of the C programs, but without the |
|---|
| 2912 | <TT>.c</TT> suffix. Thus <TT>dnaml.c</TT> compiles to make an executable called <TT>dnaml</TT>. |
|---|
| 2913 | <P> |
|---|
| 2914 | A typical Unix or Linux installation would put the directory <TT>phylip</TT> |
|---|
| 2915 | in <TT>/usr/local</TT>. The name of the executables directory <TT>EXEDIR</TT> |
|---|
| 2916 | could be changed to be <TT>/usr/local/bin</TT>, so that the <TT>make install</TT> |
|---|
| 2917 | command puts the executables there. If the users have <TT>/usr/local/bin</TT> |
|---|
| 2918 | in their paths, the programs would be found when their names are typed. |
|---|
| 2919 | The font files <TT>font1</TT> through <TT>font6</TT> could also be |
|---|
| 2920 | placed there. A shell script containing the lines |
|---|
| 2921 | <P> |
|---|
| 2922 | <PRE> |
|---|
| 2923 | ln -s /usr/local/bin/font1 font1 |
|---|
| 2924 | ln -s /usr/local/bin/font2 font2 |
|---|
| 2925 | ln -s /usr/local/bin/font3 font3 |
|---|
| 2926 | ln -s /usr/local/bin/font4 font4 |
|---|
| 2927 | ln -s /usr/local/bin/font5 font5 |
|---|
| 2928 | ln -s /usr/local/bin/font6 font6 |
|---|
| 2929 | </PRE> |
|---|
| 2930 | <P> |
|---|
| 2931 | could be used to establish links in the user's working directory so that |
|---|
| 2932 | Drawtree and Drawgram would find these font files when users |
|---|
| 2933 | type a name such as <TT>font1</TT> when the program asks |
|---|
| 2934 | them for a font file name. The |
|---|
| 2935 | documentation web pages are in subdirectory <TT>doc</TT> of the |
|---|
| 2936 | main PHYLIP directory, except for one, <TT>phylip.html</TT> which is |
|---|
| 2937 | in the main PHYLIP directory. It has a table of all of the documentation |
|---|
| 2938 | pages, including this one. If users create a bookmark to that page |
|---|
| 2939 | it can be used to access all of the other documentation pages. |
|---|
| 2940 | <P> |
|---|
| 2941 | To compile just one program, such as DNAML, type: |
|---|
| 2942 | <P> |
|---|
| 2943 | <TT>make dnaml</TT> |
|---|
| 2944 | <P> |
|---|
| 2945 | After this compilation, <TT>dnaml</TT> will be in the <TT>src</TT> |
|---|
| 2946 | subdirectory. So will some relocatable object code files that |
|---|
| 2947 | were used to create the executable. These have names ending in |
|---|
| 2948 | <TT>.o</TT> - they can safely be deleted. |
|---|
| 2949 | <P> |
|---|
| 2950 | If you have problems with the compilation command, you can edit the |
|---|
| 2951 | <TT>Makefile</TT>. Its opening section explains carefully how you |
|---|
| 2952 | might want to do so. For example, you might want to change the C |
|---|
| 2953 | compiler name <TT>cc</TT> to the name of the Gnu C compiler, <TT>gcc</TT>. |
|---|
| 2954 | This can be done by removing the comment character <TT>#</TT> from the |
|---|
| 2955 | front of one line, and placing it at the front of a nearby line. |
|---|
| 2956 | How to do so should be clear from the material at the beginning of the |
|---|
| 2957 | <TT>Makefile</TT>. We have included sample lines for using the <TT>gcc</TT> |
|---|
| 2958 | compiler and for using the Cygwin Gnu C++ environment on Windows, as |
|---|
| 2959 | well as the default of <TT>cc</TT>. |
|---|
| 2960 | <P> |
|---|
| 2961 | Some older C compilers (notably the Berkeley C compiler which is |
|---|
| 2962 | included free with some Sun systems) do not adhere to the ANSI C |
|---|
| 2963 | standard (because they were written before it was set down). |
|---|
| 2964 | They have trouble with the function prototypes which are in |
|---|
| 2965 | our programs. We have wrapped the prototypes in an <TT>#ifndef</TT> preprocessor |
|---|
| 2966 | directive that removes them when you compile with the switch <TT>-DOLDC</TT>. |
|---|
| 2967 | Thus with these compilers you need only add <TT>-DOLDC</TT> to |
|---|
| 2968 | the C flags in the <TT>Makefile</TT>, and compilers such as Berkeley C |
|---|
| 2969 | will cause no trouble. |
|---|
| 2970 | <P> |
|---|
| 2971 | <H3>Macintosh PowerMacs</H3> |
|---|
| 2972 | <P> |
|---|
| 2973 | <B>Compiling with Metrowerks Codewarrior on Macintosh PowerMacs...</B> |
|---|
| 2974 | <P> |
|---|
| 2975 | We shall assume that you have a recent version of the Metrowerks |
|---|
| 2976 | Codewarrior C++ |
|---|
| 2977 | compiler. This description, and the project files that we provide, |
|---|
| 2978 | assume Codewarrior 5.3. We also assume some familiarity with |
|---|
| 2979 | the use of the Codewarrior compiler and its Integrated Development |
|---|
| 2980 | Environment (IDE). |
|---|
| 2981 | <P> |
|---|
| 2982 | Start with our <TT>src</TT> directory (folder) that contains the C source |
|---|
| 2983 | code files such as <TT>dnaml.c</TT> and also the Codewarrior resource |
|---|
| 2984 | files such as <TT>dnaml.rsrc</TT>, which are provided by us. |
|---|
| 2985 | <P> |
|---|
| 2986 | <B>Creating the project file.</B> We will use DnaML as our example. |
|---|
| 2987 | We have provided a full set of project files in the |
|---|
| 2988 | self-extracting Macintosh archive. |
|---|
| 2989 | <EM>If you have them then you do not need |
|---|
| 2990 | to do the items on the following list:</EM> |
|---|
| 2991 | <OL> |
|---|
| 2992 | <LI>Start up the Codewarrior IDE integrated development environment. |
|---|
| 2993 | <LI>Create a new project file by choosing <TT>New...</TT> on the <TT>File</TT> |
|---|
| 2994 | menu. |
|---|
| 2995 | <LI>Type in the project name <TT>dnaml.proj</TT> |
|---|
| 2996 | <LI>On the Project menu on the left side of the <TT>New</TT> window, double-click on <TT>MacOS C/C++ Stationery</TT> |
|---|
| 2997 | <LI>In the <TT>New project</TT> window that opens, click on the triangle |
|---|
| 2998 | to the left of <TT>Standard Console</TT>. |
|---|
| 2999 | <LI>Move the slider at the right of the window down until you reach |
|---|
| 3000 | <TT>SIOUX-WASTE</TT> |
|---|
| 3001 | <LI>Click on the triangle to the left of <TT>SIOUX-WASTE</TT>. This opens |
|---|
| 3002 | another list of choices below. |
|---|
| 3003 | <LI>Click on the menu item <TT>SIOUX-WASTE C PPC</TT>. Press the <TT>OK</TT> button. After a bit a window <TT>dnaml.proj</TT> will open. |
|---|
| 3004 | <LI>Click on the triangle to the left of the <TT>Sources</TT> menu item. A |
|---|
| 3005 | template item called <TT>HelloWorld.c</TT> will open. |
|---|
| 3006 | <LI>Select <TT>HelloWorld.c</TT>. |
|---|
| 3007 | <LI>Open the <TT>Edit</TT> menu at the top of the Mac screen and select |
|---|
| 3008 | <TT>Clear</TT>. A box will open asking if you want to remove <TT>HelloWorld.c</TT> from the project. |
|---|
| 3009 | <LI>Select <TT>OK</TT>. |
|---|
| 3010 | <LI>If the <TT>dnaml.c</TT> file came from the self-extracting Macintosh |
|---|
| 3011 | archive that we distribute, it should show a yellow-and-black-striped Metrowerks |
|---|
| 3012 | icon (if not, as when you get it from some other form of our distribution, |
|---|
| 3013 | you may have to pass it through a program like Microsoft Word, making |
|---|
| 3014 | sure to save it as a Text Only file, to get |
|---|
| 3015 | Metrowerks to be able to see it as a potential source code file). |
|---|
| 3016 | <LI>Drag the <TT>dnaml.c</TT> file onto the <TT>Sources</TT> item in your |
|---|
| 3017 | <TT>dnaml.proj</TT> window. |
|---|
| 3018 | <LI>Drop it onto Sources so that it appears under the <TT>Sources</TT> choice. |
|---|
| 3019 | This may take a few tries -- if it appears above <TT>Sources</TT> grab it |
|---|
| 3020 | and move it again. |
|---|
| 3021 | <LI>Now add the other files that must be compiled with <TT>dnaml.c</TT>. |
|---|
| 3022 | These can be identified by looking at our <TT>Makefile</TT> -- for DnaML |
|---|
| 3023 | they are <TT>seq.c</TT>, <TT>phylip.c</TT>, <TT>seq.h</TT>, and <TT>phylip.h</TT>. Each of them needs to be added to the project file in the same way that |
|---|
| 3024 | <TT>dnaml.c</TT> was. |
|---|
| 3025 | <LI>Drag <TT>dnaml.rsrc</TT> into <TT>Sources</TT> in the same way. It |
|---|
| 3026 | doesn't matter whether it appears before or after <TT>dnaml.c</TT>. |
|---|
| 3027 | <LI>Go to the <TT>Edit</TT> menu and select the <TT>PPC Std C SIOUX-WASTE Settings</TT> item. A window of that name will then open. |
|---|
| 3028 | <LI>Under the <TT>Target</TT> item you will see a <TT>PPC Target</TT> item. |
|---|
| 3029 | Select it. A <TT>PPC Target</TT> window will open to the right. |
|---|
| 3030 | <LI>Change the name in the <TT>File Name</TT> box to be <TT>PHYLIP</TT> |
|---|
| 3031 | <LI>Change the <TT>????</TT> in the <TT>Creator</TT> box to (say) <TT>PHYD</TT> |
|---|
| 3032 | <LI>Change the <TT>Preferred Heap Size</TT> to <TT>1024</TT>. |
|---|
| 3033 | <! need to add selections of PPC Processor here > |
|---|
| 3034 | <! ditto for Global Optimization > |
|---|
| 3035 | <LI>Under <TT>Language Settings</TT> in the left-hand menu of the window, |
|---|
| 3036 | select <TT>C/C++ Language</TT>. A window called <TT>C/C++ Language</TT> |
|---|
| 3037 | will open to the immediate right. |
|---|
| 3038 | <LI>Click on <TT>Require Function Prototypes</TT> to deselect that setting. |
|---|
| 3039 | <LI>Click on the <TT>Save</TT> button at the lower-right of the project |
|---|
| 3040 | settings window. |
|---|
| 3041 | <LI>Close the <TT>PPC Std C SIOUX-WASTE Settings</TT> window using the usual |
|---|
| 3042 | box in the upper-left corner. |
|---|
| 3043 | <LI>On your Desktop you should now find a folder <TT>PHYLIP</TT>. |
|---|
| 3044 | If it has a |
|---|
| 3045 | file called <TT>HelloWorld.c</TT> you may want to discard that file. |
|---|
| 3046 | <LI>In that <TT>PHYLIP</TT> folder you will find a file <TT>dnaml.proj</TT>. |
|---|
| 3047 | <LI>Double-click on that project file. If Metrowerks Codewarrior is not already open, |
|---|
| 3048 | it should open now. |
|---|
| 3049 | <LI>If a window called <TT>Project Messages</TT> opens and there is a |
|---|
| 3050 | complaint in it about access paths being wrong, you should fix these by |
|---|
| 3051 | selecting the <TT>Reset project entry paths</TT> item in the <TT>Project</TT> |
|---|
| 3052 | menu. |
|---|
| 3053 | <LI>Select the <TT>Make</TT> item in the <TT>Project</TT> menu. |
|---|
| 3055 | </OL> |
|---|
| 3056 | <B>Compiling a program once its project file is available.</B> |
|---|
| 3057 | If the project files are all available (as they should be), you do not need |
|---|
| 3058 | to do any of the above. Usually users will have no need to compile |
|---|
| 3059 | the programs, but occasionally they may want to change a setting or |
|---|
| 3060 | add a feature. In that case the Metrowerks Codewarrior compiler can be |
|---|
| 3061 | used. We have provided support for compiling the programs in its |
|---|
| 3062 | most recent version, version 5.3. The following discussion will |
|---|
| 3063 | assume that you have obtained and installed the compiler. |
|---|
| 3064 | <P> |
|---|
| 3065 | In the source code directory |
|---|
| 3066 | <TT>src</TT> you should find a subdirectory called <TT>mac</TT> which contains the |
|---|
| 3067 | Metrowerks Codewarrior compiler "project files" (with names ending in |
|---|
| 3068 | <TT>.proj</TT>) as well as the resource files (with names ending in <TT>.rsrc</TT>) |
|---|
| 3069 | for each program. You can go into this subdirectory, start the |
|---|
| 3070 | Metrowerks compiler, and open the appropriate project file. To |
|---|
| 3071 | compile the program, simply make sure that the project file is an |
|---|
| 3072 | active window, and type <TT>Command-M</TT> (which is to say, hold down |
|---|
| 3073 | the <TT>Command</TT> key while typing <TT>M</TT>). Alternatively, |
|---|
| 3074 | pull down the <TT>Project</TT> menu and select <TT>Make</TT>. The |
|---|
| 3075 | program should then compile, possibly with ignorable warning messages. |
|---|
| 3076 | <P> |
|---|
| 3077 | <H3>Windows systems</H3> |
|---|
| 3078 | <P> |
|---|
| 3079 | <B>Compiling with Microsoft Visual C++</B> |
|---|
| 3080 | <P> |
|---|
| 3081 | Microsoft Visual C++ is used to compile the executables we distribute |
|---|
| 3082 | for Windows. It can compile using a Makefile. We have supplied this |
|---|
| 3083 | in the source code distribution as <TT>Makefile.msvc</TT>. |
|---|
| 3084 | You will need to preserve the Unix Makefile by renaming it to, say, |
|---|
| 3085 | <TT>Makefile.unix</TT>, then make a copy of <TT>Makefile.msvc</TT> |
|---|
| 3086 | and call it <TT>Makefile</TT>. |
|---|
| 3087 | <P> |
|---|
| 3088 | <B>Setting the path.</B> |
|---|
| 3089 | Before using <TT>nmake</TT> you will need to have the paths |
|---|
| 3090 | set properly. For this, first use the Start menu to open a Command or |
|---|
| 3091 | DOS Prompt window. To set the path, type<BR> |
|---|
| 3092 | <PRE> |
|---|
| 3093 | set MSVC=Path |
|---|
| 3094 | </PRE> |
|---|
| 3095 | where <TT>Path</TT> is where Microsoft Visual Studio is installed |
|---|
| 3096 | (e.g. it might be in <TT>c:\Microsoft Visual Studio</TT>). |
|---|
| 3097 | However, the path you type should not have any spaces in it. |
|---|
| 3098 | This means that you may have to use the directory's |
|---|
| 3099 | DOS filename. In general, to get a DOS name you take the first six letters of |
|---|
| 3100 | the directory name and follow them by <TT>~1</TT>. For example, |
|---|
| 3101 | <TT>Microsoft Visual Studio</TT> will have the DOS name |
|---|
| 3102 | <TT>Micros~1</TT>, and <TT>Program Files</TT> will be <TT>Progra~1</TT>. |
|---|
| 3103 | Depending on what other |
|---|
| 3104 | files are in the directory, the DOS name may instead be the first six letters followed |
|---|
| 3105 | by <TT>~2</TT>, <TT>~3</TT>, <TT>~4</TT>, and so on (e.g. <TT>Micros~3</TT> or <TT>Progra~5</TT>). |
|---|
| 3106 | It may take some |
|---|
| 3107 | experimentation to figure it out. With older versions of Windows (before Windows 2000) |
|---|
| 3108 | it may be possible to just right-click on the directory icon and select |
|---|
| 3109 | Properties to get the DOS name. |
|---|
| 3110 | <P> |
|---|
| 3111 | Once you have set MSVC, type |
|---|
| 3112 | <PRE> |
|---|
| 3113 | PATH=%PATH%;%MSVC%\VC98\bin |
|---|
| 3114 | </PRE> |
|---|
| 3115 | Then the Makefile will need to be edited. The line |
|---|
| 3116 | <PRE> |
|---|
| 3117 | MSVCPATH=c:\Micros~1\VC98 |
|---|
| 3118 | </PRE> |
|---|
| 3119 | will need to be changed so that |
|---|
| 3120 | it points to wherever Microsoft Visual Studio is installed, followed by |
|---|
| 3121 | <TT>\VC98</TT>. |
|---|
| 3122 | <P> |
|---|
| 3123 | <B>Using the Makefile</B>. The Makefile is invoked using the |
|---|
| 3124 | <TT>nmake</TT> command. If you simply type <TT>nmake</TT> you |
|---|
| 3125 | will get a list of possible <TT>nmake</TT> commands. For example, |
|---|
| 3126 | to compile a single program such as <TT>Dnaml</TT> but not |
|---|
| 3127 | install it, type <TT>nmake dnaml</TT>. To compile and install all |
|---|
| 3128 | programs, type <TT>nmake install</TT>. We have supplied all the |
|---|
| 3129 | support files and icons needed for the compilations. They are |
|---|
| 3130 | in subdirectory <TT>msvc</TT> of the main source code |
|---|
| 3131 | directory. |
|---|
| 3132 | <P> |
|---|
| 3133 | <B>Compiling with Borland C++</B> |
|---|
| 3134 | <P> |
|---|
| 3135 | Borland C++ can be downloaded for free from Inprise (Borland) |
|---|
| 3136 | (see their site |
|---|
| 3137 | <A HREF="http://www.borland.com">http://www.borland.com</A>). |
|---|
| 3138 | It can compile using a Makefile. We have supplied this |
|---|
| 3139 | in the source code distribution as <TT>Makefile.bcc</TT>. |
|---|
| 3140 | You will need to preserve the Unix Makefile by renaming it to, say, |
|---|
| 3141 | <TT>Makefile.unix</TT>, then make a copy of <TT>Makefile.bcc</TT> |
|---|
| 3142 | and call it <TT>Makefile</TT>. The Makefile is invoked using the |
|---|
| 3143 | <TT>make</TT> command. If you simply type <TT>make</TT> you |
|---|
| 3144 | will get a list of possible <TT>make</TT> commands. For example, |
|---|
| 3145 | to compile a single program such as <TT>Dnaml</TT> but not |
|---|
| 3146 | install it, type <TT>make dnaml</TT>. To compile and install all |
|---|
| 3147 | programs type <TT>make install</TT>. We have supplied all the |
|---|
| 3148 | support files and icons needed for the compilations. They |
|---|
| 3149 | are in subdirectory <TT>bcc</TT> of the main source code |
|---|
| 3150 | directory. We have had to supply a complete |
|---|
| 3151 | second set of the resource files with names <TT>*.brc</TT> |
|---|
| 3152 | because Borland resource files have a minor incompatibility |
|---|
| 3153 | with Microsoft Visual C++ resource files. |
|---|
| 3154 | <P> |
|---|
| 3155 | If this does not work the <TT>PATH</TT> may need to be set manually. |
|---|
| 3156 | This can be done by opening a Command or DOS window using the Start |
|---|
| 3157 | menu. To set the path, type |
|---|
| 3158 | <PRE> |
|---|
| 3159 | set BORLAND=Path |
|---|
| 3160 | </PRE> |
|---|
| 3161 | where <TT>Path</TT> is where Borland is installed, such as |
|---|
| 3162 | <TT>C:\Progra~1\Borland</TT>. |
|---|
| 3163 | Then type |
|---|
| 3164 | <PRE> |
|---|
| 3165 | PATH=%PATH%;%BORLAND%\CBUILD~1\Bin |
|---|
| 3166 | </PRE> |
|---|
| 3167 | <P> |
|---|
| 3168 | <B>Compiling with Metrowerks Codewarrior for Windows</B> |
|---|
| 3169 | <P> |
|---|
| 3170 | As with Macintosh systems, Metrowerks Codewarrior requires |
|---|
| 3171 | you to have project files for each program you compile. |
|---|
| 3172 | For Metrowerks Codewarrior for Windows we are not providing the projects |
|---|
| 3173 | themselves, but rather |
|---|
| 3174 | projects that have been exported as XML files. To open one of these, you |
|---|
| 3175 | cannot simply choose |
|---|
| 3176 | <TT>File/Open</TT>; instead choose the menu option <TT>File/Import Project</TT>. |
|---|
| 3177 | Metrowerks will then ask you for the project name. |
|---|
| 3178 | Type in the name of the program (e.g. <TT>dnaml</TT>). Once this is done, Metrowerks will |
|---|
| 3179 | treat it as a regular project file. |
|---|
| 3180 | <P> |
|---|
| 3181 | We have supplied a complete set of these XML project files in the |
|---|
| 3182 | source code distribution. They are in subdirectory <TT>metro</TT> |
|---|
| 3183 | of the main source code directory. This is supplied with the |
|---|
| 3184 | source code distribution for Windows (it is not in the source |
|---|
| 3185 | code distributions for other platforms). |
|---|
| 3194 | <P> |
|---|
| 3195 | To compile the program |
|---|
| 3196 | pull down the <TT>Project</TT> menu and select <TT>Make</TT>. The |
|---|
| 3197 | program should then compile, possibly with ignorable warning messages. |
|---|
| 3198 | <P> |
|---|
| 3199 | For the moment we are not giving here the details of |
|---|
| 3200 | how to create these projects yourself -- you usually will not need |
|---|
| 3201 | to, as you have the project files we have supplied. |
|---|
| 3202 | <P> |
|---|
| 3203 | <B>Compiling with Cygnus Gnu C++</B> |
|---|
| 3204 | <P> |
|---|
| 3205 | Cygnus Solutions (now a part of Red Hat, Inc.) has adapted the Gnu C compiler |
|---|
| 3206 | to Windows systems and |
|---|
| 3207 | provided an environment, CygWin, which mimics Unix for compiling. |
|---|
| 3208 | This is available for purchase from them, and they also make it |
|---|
| 3209 | available to be downloaded for free. The download is large. To get it, go |
|---|
| 3210 | to <A HREF="http://sources.redhat.com/cygwin/download.html">their download site</A> at |
|---|
| 3211 | <CODE>http://sources.redhat.com/cygwin/download.html</CODE> and follow the |
|---|
| 3212 | instructions there. It is a bit |
|---|
| 3213 | difficult to figure out how to download it -- you need to download |
|---|
| 3214 | their <TT>setup.exe</TT> program and then it will download the rest |
|---|
| 3215 | when it is run. You will need a lot of disk space for it. |
|---|
| 3216 | <P> |
|---|
| 3217 | Once you have |
|---|
| 3218 | installed the free Cygnus environment and the associated Gnu C compiler |
|---|
| 3219 | on your Windows system, compiling PHYLIP is essentially identical to |
|---|
| 3220 | what one does for Unix or Linux. In PHYLIP's <TT>src</TT> directory, |
|---|
| 3221 | change the name of our Unix <TT>Makefile</TT> to something like |
|---|
| 3222 | <TT>Makefile.unx</TT> (so as to keep it around). There is a special |
|---|
| 3223 | Makefile for the Cygwin |
|---|
| 3224 | compiler called <TT>Makefile.cyg</TT>. Make a copy of it called |
|---|
| 3225 | <TT>Makefile</TT>. |
|---|
| 3226 | <P> |
|---|
| 3227 | This Makefile should contain a compiling command: |
|---|
| 3228 | <P> |
|---|
| 3229 | <TT>CC = gcc</TT> |
|---|
| 3230 | <P> |
|---|
| 3231 | Now enter the Cygwin environment, which you can do using the Windows |
|---|
| 3232 | <TT>Start</TT> menu and its <TT>Programs</TT> menu item. There should be |
|---|
| 3233 | a <TT>Cygnus</TT> menu choice within that submenu, which you can use to |
|---|
| 3234 | start the Cygnus environment. This puts you in an imitation of a Unix |
|---|
| 3235 | shell. |
|---|
| 3236 | <P> |
|---|
| 3237 | On entering the CygWin environment you will find yourself in one of the |
|---|
| 3238 | subdirectories of the CygWin directory. Change to the directory where the |
|---|
| 3239 | PHYLIP source code and Makefile have been put, for example by issuing the command |
|---|
| 3240 | <P> |
|---|
| 3241 | <TT>cd c:/phylip</TT><BR> |
|---|
| 3242 | <BR> |
|---|
| 3243 | You should then be able to compile PHYLIP |
|---|
| 3244 | by issuing the appropriate make command, such as <TT>make install</TT>. |
|---|
| 3245 | If you intend to modify one of our source code files such as <TT>dnaml.c</TT>, |
|---|
| 3246 | it would be wise to |
|---|
| 3247 | save the original version of it first as, say, <TT>dnaml.c0</TT>. |
|---|
| 3248 | To associate an icon with a program (say DnaML), you need an icon |
|---|
| 3249 | file (say <TT>dna.ico</TT>) which contains the icon in standard format. |
|---|
| 3250 | There should also be a file called <TT>dnaml.rc</TT> which contains the single |
|---|
| 3251 | line: |
|---|
| 3252 | <P> |
|---|
| 3253 | <TT>dnaml ICON "dna.ico"</TT> |
|---|
| 3254 | <P> |
|---|
| 3255 | We have provided a subdirectory <TT>icons</TT> in the <TT>src</TT> |
|---|
| 3256 | subdirectory, containing a full set of icons and a full set of resource |
|---|
| 3257 | files (<TT>*.rc</TT>). |
|---|
| 3258 | Our Cygwin Makefile uses them automatically. |
|---|
| 3259 | <P> |
|---|
| 3260 | <H3>VMS VAX systems</H3> |
|---|
| 3261 | <P> |
|---|
| 3262 | We have not tried to compile version 3.6 on an OpenVMS system, but the |
|---|
| 3263 | following instructions should work. |
|---|
| 3264 | On the OpenVMS operating system with DEC VAX VMS C the programs should compile |
|---|
| 3265 | without alteration. The commands for compiling a typical program |
|---|
| 3266 | (DNAPARS, which depends on the separately compiled files <TT>phylip.c</TT> |
|---|
| 3267 | and <TT>seq.c</TT>) are: |
|---|
| 3268 | <P> |
|---|
| 3269 | <TT>$ DEFINE LNK$LIBRARY SYS$LIBRARY:VAXCRTL |
|---|
| 3270 | <BR> |
|---|
| 3271 | $ CC DNAPARS.C |
|---|
| 3272 | <BR> |
|---|
| 3273 | $ CC PHYLIP.C |
|---|
| 3274 | <BR> |
|---|
| 3275 | $ CC SEQ.C |
|---|
| 3276 | <BR> |
|---|
| 3277 | $ LINK DNAPARS,PHYLIP,SEQ |
|---|
| 3278 | <BR> |
|---|
| 3279 | </TT> |
|---|
| 3280 | <P> |
|---|
| 3281 | Once you have used this <TT>$ DEFINE</TT> statement during a given interactive session, |
|---|
| 3282 | you need not repeat it, as the symbol <TT>LNK$LIBRARY</TT> is thereafter |
|---|
| 3283 | properly defined. The compilation process leaves object files such as <TT>DNAPARS.OBJ</TT> |
|---|
| 3284 | in your directory: these can |
|---|
| 3285 | be discarded after linking. The executable program is named <TT>DNAPARS.EXE</TT>. To run the program, |
|---|
| 3286 | use the command: |
|---|
| 3287 | <P> |
|---|
| 3288 | <TT>$ R DNAPARS</TT> |
|---|
| 3289 | <P> |
|---|
| 3290 | The programs default to the filenames <TT>INFILE.</TT>, <TT>OUTFILE.</TT>, and |
|---|
| 3291 | <TT>TREEFILE.</TT>. |
|---|
| 3292 | If the input file <TT>INFILE.</TT> does not exist, the program will prompt you to |
|---|
| 3293 | type in its name. Note that some commands on VMS, such as <TT>TYPE OUTFILE</TT>, |
|---|
| 3294 | will fail, because the name of the file they attempt to type out will be not |
|---|
| 3295 | <TT>OUTFILE.</TT> but <TT>OUTFILE.LIS</TT>. To type the right file you |
|---|
| 3296 | have to issue the command <TT>TYPE OUTFILE.</TT> instead. |
|---|
| 3297 | <P> |
|---|
| 3298 | When you are |
|---|
| 3299 | using the interactive previewing feature of DRAWGRAM (or DRAWTREE) on |
|---|
| 3300 | a Tektronix or DEC ReGIS compatible terminal, you should issue |
|---|
| 3301 | the following command before running the program: |
|---|
| 3302 | <P> |
|---|
| 3303 | <TT>$ SET TERM/NOWRAP/ESCAPE</TT> |
|---|
| 3304 | <P> |
|---|
| 3305 | so that you do not run into trouble from the VMS line length limit of |
|---|
| 3306 | 255 characters or the filtering of escape characters. |
|---|
| 3307 | <P> |
|---|
| 3308 | To know which files to compile together, look at the entries in the |
|---|
| 3309 | <TT>Makefile</TT>. |
|---|
| 3310 | <P> |
|---|
| 3311 | VMS systems are rapidly disappearing, so we will not devote much |
|---|
| 3312 | effort to getting PHYLIP working on them. |
|---|
| 3313 | <P> |
|---|
| 3314 | <H3>Parallel computers</H3> |
|---|
| 3315 | <P> |
|---|
| 3316 | As parallel computers become more common, the issue of how to compile |
|---|
| 3317 | PHYLIP for them has become more pressing. People have been compiling |
|---|
| 3318 | PHYLIP for vector machines and parallel machines for many years. We |
|---|
| 3319 | have not made a version for parallel machines because there is still |
|---|
| 3320 | no standard parallel programming environment on such machines (or rather, |
|---|
| 3321 | there are many standards, so that one cannot find one that makes |
|---|
| 3322 | a parallel execution version of PHYLIP practical). However, the |
|---|
| 3323 | MPI Message Passing Interface is spreading rapidly, and we will |
|---|
| 3324 | probably support it in future versions of PHYLIP. |
|---|
| 3325 | <P> |
|---|
| 3326 | Although the underlying algorithms of most programs, |
|---|
| 3327 | which treat sites independently, should be amenable to vector and |
|---|
| 3328 | parallel processors, |
|---|
| 3329 | there are details of the code which might best be changed. |
|---|
| 3330 | In some of the programs (<TT>Dnaml</TT>, <TT>Dnamlk</TT>, |
|---|
| 3331 | <TT>Proml</TT>, <TT>Promlk</TT>) I have put a special |
|---|
| 3332 | comment statement next to the loops in which |
|---|
| 3333 | the program spends most of its time, which are the places |
|---|
| 3334 | most likely to benefit from parallelization. This comment statement is:<BR> |
|---|
| 3335 | <PRE> |
|---|
| 3336 | /* parallelize here */ |
|---|
| 3337 | </PRE> |
|---|
| 3338 | In particular, |
|---|
| 3339 | within these innermost loops there are often scalar quantities |
|---|
| 3340 | that are used for temporary bookkeeping. These quantities, such as |
|---|
| 3341 | <TT>sum1, sum2, zz, z1, yy, y1, aa, bb, cc, sum,</TT> and <TT>denom</TT> in procedure makenewv |
|---|
| 3342 | of DNAML (and similar quantities in procedure nuview), are there to |
|---|
| 3343 | minimize the number of array references. For vectorizing and parallelizing |
|---|
| 3344 | compilers it will |
|---|
| 3345 | be better to replace them by arrays, so that the computations for different sites can proceed |
|---|
| 3346 | simultaneously. |
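<P>
The following C sketch illustrates the kind of rewrite meant. The functions <TT>total_scalar</TT> and <TT>total_array</TT> and their arithmetic are hypothetical stand-ins, not code from makenewv; only the names <TT>sum1</TT> and <TT>sum2</TT> are borrowed from it. In the second version each site has its own slot for its temporaries, so the first loop has no shared scalars and its iterations are independent.

```c
#include <stddef.h>

/* Scalar style: sum1 and sum2 are shared by all sites, so every
   iteration overwrites the temporaries of the previous one. */
double total_scalar(const double *x, const double *y, const double *w,
                    size_t nsites)
{
    double total = 0.0, sum1, sum2;
    for (size_t i = 0; i < nsites; i++) {
        sum1 = x[i] * y[i];
        sum2 = x[i] + y[i];
        total += w[i] * sum1 / sum2;
    }
    return total;
}

/* Array style: each site gets its own slot in sum1[] and sum2[], so the
   first loop has no shared temporaries and its iterations could run
   simultaneously; only the final sum over sites remains serial. */
double total_array(const double *x, const double *y, const double *w,
                   double *sum1, double *sum2, size_t nsites)
{
    for (size_t i = 0; i < nsites; i++) {   /* parallelize here */
        sum1[i] = x[i] * y[i];
        sum2[i] = x[i] + y[i];
    }
    double total = 0.0;
    for (size_t i = 0; i < nsites; i++)
        total += w[i] * sum1[i] / sum2[i];
    return total;
}
```

<P>
The price of the array version is extra memory (two scratch arrays as long as the number of sites) in exchange for loops a vectorizing compiler can handle.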
|---|
| 3347 | <P> |
|---|
| 3348 | If you succeed in making a parallel version of PHYLIP we would like to |
|---|
| 3349 | know how you did it. In particular, if you can prepare a web page which |
|---|
| 3350 | describes how to do it for your computer system, we would like to have it |
|---|
| 3351 | for inclusion in our PHYLIP web pages. Please e-mail it to me. We hope to |
|---|
| 3352 | have a set of pages that give detailed instructions on how to make a parallel |
|---|
| 3353 | version of PHYLIP on various kinds of machines. Alternatively, if you send |
|---|
| 3354 | us your modified version of the program, we may be able to |
|---|
| 3355 | change our source code so that other users can compile |
|---|
| 3356 | the program with those modifications. |
|---|
| 3357 | <P> |
|---|
| 3358 | <H3>Other computer systems</H3> |
|---|
| 3359 | <P> |
|---|
| 3360 | As you can see from the variety of different systems on which these |
|---|
| 3361 | programs have been successfully run, there are no serious |
|---|
| 3362 | incompatibility problems with most computer systems. PHYLIP in various |
|---|
| 3363 | past Pascal versions has also been compiled on 8080 and Z80 CP/M Systems, Apple |
|---|
| 3364 | II systems running UCSD Pascal, a variety of minicomputer systems such as |
|---|
| 3365 | DEC PDP-11's and HP 1000's, on 1970's era mainframes such as CDC |
|---|
| 3366 | Cyber systems, and so on. In a later era |
|---|
| 3367 | it was also compiled on IBM 370 mainframes, and of course on DOS and |
|---|
| 3368 | Windows systems and on Macintosh and PowerMacintosh systems. |
|---|
| 3369 | We have gradually |
|---|
| 3370 | accumulated experience on a wider variety of C compilers. If you succeed in |
|---|
| 3371 | compiling the C version of PHYLIP on a different machine or a different |
|---|
| 3372 | compiler, I would like to |
|---|
| 3373 | hear the details so that I can consider including the instructions in a future version |
|---|
| 3374 | of this manual. |
|---|
| 3375 | <P> |
|---|
| 3376 | <DIV ALIGN="CENTER"> |
|---|
| 3377 | <A NAME="FAQ"><HR><P></A> |
|---|
| 3378 | <H2>Frequently Asked Questions</H2></DIV> |
|---|
| 3379 | <P> |
|---|
| 3380 | This set of Frequently Asked Questions, and their answers, is from the |
|---|
| 3381 | PHYLIP web site. A more up-to-date version can be found there, at: |
|---|
| 3382 | <P> |
|---|
| 3383 | <DIV ALIGN="CENTER"> |
|---|
| 3384 | <A HREF="http://evolution.gs.washington.edu/phylip/faq.html"> |
|---|
| 3385 | <TT>http://evolution.gs.washington.edu/phylip/faq.html</TT></A></DIV> |
|---|
| 3386 | <P> |
|---|
| 3387 | <DL> |
|---|
| 3388 | <DT><STRONG>"It doesn't work! <I>It doesn't work!!</I> It says <TT>can't find infile.</TT>"</STRONG> |
|---|
| 3389 | <DD>Actually, it's working just fine. Many of the programs look for an input file called <TT>infile</TT>, |
|---|
| 3390 | and if one of that name is not present in the current directory, they then ask |
|---|
| 3391 | you to type in the name of the input file. That's all that it's doing. This |
|---|
| 3392 | is done so that |
|---|
| 3393 | you can get the program to read the file without you having to type in its |
|---|
| 3394 | name, by making a copy of your input file and calling it <TT>infile</TT>. |
|---|
| 3395 | If you don't do that, then the program issues this message. It looks |
|---|
| 3396 | alarming, but really all that it is trying to do is to get you to type in |
|---|
| 3397 | the name of the input file. Try giving it the name of the input file. |
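<P>
A minimal C sketch of this "default name first, then ask" behaviour. The function <TT>open_input</TT> and the wording of its messages are hypothetical, not PHYLIP's actual code; the answer is read from a <TT>FILE *</TT> rather than directly from the keyboard only so that the sketch is easy to test.

```c
#include <stdio.h>
#include <string.h>

/* Try default_name first; if it cannot be opened, keep asking for a
   file name (read from `answers`) until one opens. Returns NULL only
   if no more answers are available. */
FILE *open_input(const char *default_name, FILE *answers)
{
    char name[256];
    const char *tried = default_name;
    FILE *fp = fopen(tried, "r");

    while (fp == NULL) {
        /* Not a failure: just a request for the real file name. */
        printf("Can't find input file \"%s\"\n", tried);
        printf("Please enter a new file name> ");
        if (fgets(name, sizeof name, answers) == NULL)
            return NULL;                      /* no answer left: give up */
        name[strcspn(name, "\r\n")] = '\0';   /* drop the trailing newline */
        tried = name;
        fp = fopen(tried, "r");
    }
    return fp;
}
```

<P>
Calling it as <TT>open_input("infile", stdin)</TT> gives the interactive behaviour described above.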
|---|
| 3398 | <DT><STRONG>"The program reads my data file and then says it has |
|---|
| 3399 | a memory allocation error!"</STRONG> |
|---|
| 3400 | <DD>This is what tends to happen if there is a problem with the format of the data |
|---|
| 3401 | file, so that the programs get confused and think they need to set aside memory |
|---|
| 3402 | for 1,000,000 species or so. The result is a "memory allocation error". Check the data file format against the documentation: |
|---|
| 3403 | make sure that the data files have <I>not</I> been saved in the format of |
|---|
| 3404 | your word processor (such as Microsoft Word) but in a "flat ASCII" or "text only" |
|---|
| 3405 | mode. Note that adding memory to your computer is <I>not</I> the |
|---|
| 3406 | way to solve this problem -- you probably have plenty of memory |
|---|
| 3407 | to run the program once the data file is in the correct format. |
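<P>
The first line of a PHYLIP data file gives the number of species and the number of characters, and a wrongly saved file produces nonsense there. A minimal C sketch of checking those numbers before allocating anything; the function <TT>read_header</TT> and the limit <TT>MAX_SPECIES</TT> are hypothetical illustrations, not the programs' own checks.

```c
#include <stdio.h>

/* Hypothetical upper limit used only for this illustration. */
#define MAX_SPECIES 10000L

/* Read the "species characters" counts from the first line of a data
   file. Returns 0 if they look sensible and -1 otherwise, so that a
   badly formatted file is reported instead of causing a huge allocation. */
int read_header(const char *line, long *species, long *chars)
{
    if (sscanf(line, "%ld %ld", species, chars) != 2)
        return -1;      /* not two numbers, e.g. a word-processor file */
    if (*species < 1 || *chars < 1 || *species > MAX_SPECIES)
        return -1;      /* counts that cannot be right */
    return 0;
}
```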
|---|
| 3408 | <DT><STRONG>"On our Macintosh, larger data files fail to run."</STRONG> |
|---|
| 3409 | <DD>We have set the memory allowances on the Macintosh executables |
|---|
| 3410 | to be generous, but not too big. You therefore may need to |
|---|
| 3411 | increase them. Use the <TT>Get Info</TT> item on the Finder <TT>File</TT> menu while the program is selected. |
|---|
| 3412 | <DT><STRONG>"I opened the program but I don't see where to create |
|---|
| 3413 | a data file!"</STRONG> |
|---|
| 3414 | <DD>The programs (there is more than one) use data |
|---|
| 3415 | files that have been created outside the programs. They do not have any |
|---|
| 3416 | data editor within them. You can create a data file by using an editor, |
|---|
| 3417 | such as Microsoft Word, EMACS, vi, SimpleText, Notepad, etc. But be sure |
|---|
| 3418 | <I>not</I> to save the file in Microsoft Word's own format. It should be saved in |
|---|
| 3419 | Text Only format. You can use the documentation files, including the examples |
|---|
| 3420 | at the end of those files, to figure out the format of the input file. |
|---|
| 3421 | Documentation files such as <TT>main.html</TT>, <TT>sequence.html</TT>, |
|---|
| 3422 | <TT>distance.html</TT> and many others should be consulted. Many users |
|---|
| 3423 | create their data files by having their alignment program (such as |
|---|
| 3424 | ClustalW) output its alignments in PHYLIP format. Many alignment programs |
|---|
| 3425 | have options to do that. |
|---|
| 3427 | <DT><STRONG>"I ran PHYLIP, and all it did was say it was extracting a bunch of files!"</STRONG> |
|---|
| 3428 | <DD> |
|---|
| 3429 | There is no executable program |
|---|
| 3430 | named <TT>PHYLIP</TT> in the PHYLIP package! But in some cases |
|---|
| 3431 | (especially the Windows distribution) there is a file called |
|---|
| 3432 | <TT>phylip.exe</TT>. |
|---|
| 3433 | That file is an archive of documentation and source code. Once you have |
|---|
| 3434 | run it and extracted the files in it, so that they are in the directory, |
|---|
| 3435 | running it again will just do the extraction again, which is unnecessary. |
|---|
| 3436 | Similarly for the archive files for the Windows executables, which |
|---|
| 3437 | have names like <TT>phylipwx.exe</TT> and <TT>phylipwy.exe</TT>. |
|---|
| 3438 | They are run only once to extract their contents. |
|---|
| 3439 | <DT><STRONG>"One program makes an output file and then the next program crashes while reading it!"</STRONG> |
|---|
| 3440 | <DD>Did you rename the file? If a program makes a file called <TT>outfile</TT>, and then the |
|---|
| 3441 | next program is told to use <TT>outfile</TT> as its input file, terrible things will |
|---|
| 3442 | happen. The second program first opens <TT>outfile</TT> as an output file, thus |
|---|
| 3443 | erasing it. When it then tries to read from this empty <TT>outfile</TT> |
|---|
| 3444 | a psychological |
|---|
| 3445 | crisis ensues. The solution is simply to rename <TT>outfile</TT> before trying to |
|---|
| 3446 | use it as an input file. |
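<P>
The C sketch below shows the underlying reason: in standard C, opening an existing file with mode <TT>"w"</TT> truncates it to zero length, so anything that afterwards reads the same file finds it empty. The function name is hypothetical; it simply imitates a program that opens its output file before reading its input.

```c
#include <stdio.h>

/* Open `path` for writing (as the next program does with outfile),
   then try to read it as input. Returns the number of bytes left to
   read, or -1 if the file could not be opened. */
long bytes_left_after_reopen(const char *path)
{
    FILE *out = fopen(path, "w");   /* mode "w" truncates to length 0 */
    if (out == NULL)
        return -1;
    fclose(out);

    FILE *in = fopen(path, "r");    /* now try to use it as input */
    if (in == NULL)
        return -1;
    long n = 0;
    while (fgetc(in) != EOF)
        n++;
    fclose(in);
    return n;                       /* always 0: the old contents are gone */
}
```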
|---|
| 3447 | <DT><STRONG>"I make a file called infile and then the program can't find it!"</STRONG> |
|---|
| 3448 | <DD>Let me guess. You are using Windows, right? You made your file in Word or |
|---|
| 3449 | in Notepad or WordPad, right? If you made a file in one of these editors, and |
|---|
| 3450 | saved it, not in Word format, but in Text Only format, then you were doing the |
|---|
| 3451 | right thing. But when you told the operating system to save the file as |
|---|
| 3452 | <TT>infile</TT>, it actually didn't. It saved it as |
|---|
| 3453 | <TT>infile.txt</TT>. Then just to make |
|---|
| 3454 | life harder for you, the operating system is set up by default to not show |
|---|
| 3455 | that three-letter extension to the file name. Next to its icon it will show |
|---|
| 3456 | the name <TT>infile</TT>. So you think, quite reasonably, that |
|---|
| 3457 | there is a file called <TT>infile</TT>. But there isn't a file of that |
|---|
| 3458 | name, so the program, quite reasonably, can't find a file called |
|---|
| 3459 | <TT>infile</TT>. If you want to check what the actual file name is, use |
|---|
| 3460 | the <TT>Properties</TT> |
|---|
| 3461 | menu item of the <TT>File</TT> item on your folder (in Windows versions, anyway). |
|---|
| 3462 | You should be able to get the program to work by telling it that the file name |
|---|
| 3463 | is <TT>INFILE.TXT</TT>. |
|---|
| 3464 | <DT><STRONG>"Consense gives weird branch lengths! How do I |
|---|
| 3465 | get more reasonable ones?"</STRONG> |
|---|
| 3466 | <DD>Consense gives branch lengths which are simply the numbers of replicates |
|---|
| 3467 | that support the branch. This is not a good reflection of how long those |
|---|
| 3468 | branches are estimated to be. The best way to put better branch lengths on a |
|---|
| 3469 | consensus tree is to use it as a User Tree in a program that will estimate |
|---|
| 3470 | branch lengths for it. You may need to convert it to being an unrooted tree, |
|---|
| 3471 | using Retree, first. If the original program you were using was a parsimony |
|---|
| 3472 | program, which does not estimate branch lengths, you may instead have to make |
|---|
| 3473 | some distances between your species (using, for example, DnaDist), and use |
|---|
| 3474 | Fitch to put branch lengths on the user tree. Here is the sequence of |
|---|
| 3475 | steps you should go through: |
|---|
| 3476 | <OL> |
|---|
| 3477 | <LI>Take the tree and use Retree to make sure it is Unrooted (just |
|---|
| 3478 | read it into Retree and then save it, specifying Unrooted) |
|---|
| 3479 | <LI>Use the unrooted tree as a User Tree (option <TT>U</TT>) in one of |
|---|
| 3480 | our programs (such as Fitch or DnaML). If you use Fitch, you also |
|---|
| 3481 | need to use one of the distance programs such as DnaDist to |
|---|
| 3482 | compute a set of distances to serve as its input. |
|---|
| 3483 | <LI>Specify that the branch lengths |
|---|
| 3484 | of the tree are not to be used but should be re-estimated. This |
|---|
| 3485 | is actually the default. |
|---|
| 3486 | </OL> |
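<P>
To see what Consense's numbers are, here is a C sketch of the counting involved: each replicate tree is reduced to the groups of species it contains, and a group's reported value is the number of replicates that contain it. Storing groups as bit masks in a 32-bit <TT>unsigned</TT>, and the names <TT>Replicate</TT> and <TT>support</TT>, are assumptions of this sketch, not how Consense stores its trees.

```c
#include <stddef.h>

/* One replicate tree, summarised as the groups of species it contains.
   Bit i of a group is set when species i belongs to it (so, with a
   32-bit unsigned, at most 32 species). For unrooted trees each group
   must first be put in one canonical form, for example the side of
   the split that does not contain species 0. */
typedef struct {
    const unsigned *groups;
    size_t ngroups;
} Replicate;

/* Number of replicates that contain `group`: the value Consense writes
   where a branch length would normally go. */
int support(unsigned group, const Replicate *reps, size_t nreps)
{
    int count = 0;
    for (size_t r = 0; r < nreps; r++) {
        for (size_t g = 0; g < reps[r].ngroups; g++) {
            if (reps[r].groups[g] == group) {
                count++;
                break;          /* count each replicate only once */
            }
        }
    }
    return count;
}
```

<P>
A count like this says how often a group appeared, not how much change occurred along the branch, which is why the branch lengths must be re-estimated as described in the steps above.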
|---|
| 3487 | <DT><STRONG>"DrawTree (or DrawGram) doesn't work: it can't find the font file!"</STRONG> |
|---|
| 3488 | <DD>Six font files, called <TT>font1</TT> through <TT>font6</TT>, are |
|---|
| 3489 | distributed with the executables |
|---|
| 3490 | (and with the source code too). The program looks for a copy of one of them |
|---|
| 3491 | called <TT>fontfile</TT>. If you haven't made such a copy called |
|---|
| 3492 | <TT>fontfile</TT> it then asks |
|---|
| 3493 | you for the name of the font file. If they are in the current directory, just |
|---|
| 3494 | type one of <TT>font1</TT> through <TT>font6</TT>. The reason for |
|---|
| 3495 | having the program look for <TT>fontfile</TT> |
|---|
| 3496 | is so that you can copy your favorite font file, call the copy |
|---|
| 3497 | <TT>fontfile</TT>, |
|---|
| 3498 | and then it will be found automatically without you having to type the name of |
|---|
| 3499 | the font file each time. |
|---|
| 3500 | <DT><STRONG>"Can DrawGram draw a scale beside the tree? Print the branch lengths as numbers?"</STRONG> |
|---|
| 3501 | <DD>It can't do either of these. Doing so would make the program more complex, and |
|---|
| 3502 | it is not obvious how to fit the branch length numbers into a tree that has |
|---|
| 3503 | many very short internal branches. If you want these scales or numbers, |
|---|
| 3504 | choose an output plot file format (such as Postscript, PICT or PCX) that can be read by |
|---|
| 3505 | a drawing program such as Adobe Illustrator, Freehand, Canvas, CorelDraw, |
|---|
| 3506 | or MacDraw. |
|---|
| 3507 | Then you can add the scales and branch length numbers yourself by hand. Note |
|---|
| 3508 | the menu option in DrawTree and DrawGram that specifies the tree size to be |
|---|
| 3509 | a given number of centimeters per unit branch length. |
|---|
| 3510 | <DT><STRONG>"How can I get DrawGram or DrawTree to print the bootstrap values |
|---|
| 3511 | next to the branches?"</STRONG> |
|---|
| 3512 | <DD>When you do bootstrapping and use Consense, it prints the bootstrap |
|---|
| 3513 | values in its output file (both in a table of sets, and on the diagram |
|---|
| 3514 | of the tree which it makes). These are also in the output tree file of |
|---|
| 3515 | Consense. There they are in place of branch lengths. So to get them to |
|---|
| 3516 | be on the output of DrawGram or DrawTree, you must write the tree in the |
|---|
| 3517 | format of a drawing program and use it to put the values in by hand, as |
|---|
| 3518 | mentioned in the answer to the previous question. |
|---|
| 3519 | <DT><STRONG>"I have an HP Laserjet and can't get DrawGram to print on it"</STRONG> |
|---|
| 3520 | <DD>DRAWGRAM and DRAWTREE produce a plot file (called <TT>plotfile</TT>): they |
|---|
| 3521 | do not send it to the printer. It is up to you to get the plot file to |
|---|
| 3522 | the printer. If you are running Windows or DOS this can probably be done |
|---|
| 3523 | with the MSDOS command <TT>COPY/B PLOTFILE PRN:</TT>, unless your printer |
|---|
| 3524 | is a networked printer. The <TT>/B</TT> |
|---|
| 3525 | is important. If it is omitted the copy command will strip off the |
|---|
| 3526 | highest bit of each byte, which can cause the printing to fail or produce |
|---|
| 3527 | garbage. |
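<P>
A small C illustration of the damage described above: clearing the top bit leaves ordinary text characters (codes below 128) unchanged, but alters every byte of 128 or more, which a binary plot file is likely to contain. The function is a hypothetical demonstration of the effect, not a model of how <TT>COPY</TT> works internally.

```c
#include <stddef.h>

/* Clear the highest bit of every byte in buf, the corruption that
   copying a binary plot file without /B can cause. */
void strip_high_bit(unsigned char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] &= 0x7F;    /* keep only the low seven bits */
}
```

<P>
For example, the bytes <TT>0x1B 0x41 0xC8 0xFF</TT> become <TT>0x1B 0x41 0x48 0x7F</TT>: the escape character and the letter <TT>A</TT> survive, while the last two bytes are changed.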
|---|
| 3528 | <DT><STRONG>"DNAML won't read the treefile that is produced by DNAPARS!"</STRONG> |
|---|
| 3529 | <DD>That's because the DnaPars tree file is a rooted tree, and DnaML wants an |
|---|
| 3530 | unrooted tree. Try using Retree to change the file to be an unrooted tree |
|---|
| 3531 | file.</DD> |
|---|
| 3532 | <DT><STRONG>"In bootstrapping, SEQBOOT makes too large a file"</STRONG> |
|---|
| 3533 | <DD>If there are 1000 bootstrap replicates, it will make a file |
|---|
| 3534 | 1000 times as long as your original data set. But for many methods |
|---|
| 3535 | there is another way that uses much less file space. You can use |
|---|
| 3536 | SEQBOOT to make a file of multiple sets of weights, and use those |
|---|
| 3537 | together with the original data set to do bootstrapping. |
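<P>
The idea behind the weights approach: a bootstrap replicate draws as many sites as the data have, with replacement, and a site's weight records how many times it was drawn. For methods that accept integer weights, the original data plus one weight vector per replicate therefore describe the same resampled data set. The C sketch below uses the C library <TT>rand()</TT> purely for illustration; the function name is hypothetical, and SEQBOOT's own random number generator and weights file format are not reproduced.

```c
#include <stdlib.h>
#include <string.h>

/* Fill weight[0..nsites-1] with one bootstrap replicate: nsites draws
   with replacement, weight[i] = number of times site i was drawn.
   Sites never drawn get weight 0. The small modulo bias of
   rand() % nsites is ignored in this sketch. */
void bootstrap_weights(int *weight, int nsites)
{
    memset(weight, 0, (size_t)nsites * sizeof *weight);
    for (int k = 0; k < nsites; k++)
        weight[rand() % nsites]++;
}
```

<P>
Whatever the draws, the weights of one replicate always add up to the number of sites.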
|---|
| 3538 | <DT><STRONG>"In bootstrapping, the output file gets too big."</STRONG> |
|---|
| 3539 | <DD> When running a program such as NEIGHBOR or DNAPARS with multiple data |
|---|
| 3540 | sets (or multiple weights) for purposes of bootstrapping, |
|---|
| 3541 | the output file is usually not needed, as it |
|---|
| 3542 | is the output tree file that is used next. You can use the menu |
|---|
| 3543 | of the program to turn off the writing of trees into the |
|---|
| 3544 | output file. The trees will still be written into the tree file. |
|---|
| 3545 | <DT><STRONG>"Why doesn't NEIGHBOR read my DNA sequences correctly?"</STRONG> |
|---|
| 3546 | <DD>Because it wants |
|---|
| 3547 | to have as input a distance matrix, not sequences. You have to use DNADIST to |
|---|
| 3548 | make the distance matrix first. |
|---|
| 3549 | <P> |
|---|
| 3550 | <H3>How to make it do various things</H3> |
|---|
| 3551 | <P> |
|---|
| 3552 | <DT><STRONG>"How do I bootstrap?"</STRONG> |
|---|
| 3553 | <DD>The general method of bootstrapping |
|---|
| 3554 | involves running SEQBOOT to make multiple bootstrapped data sets out of your |
|---|
| 3555 | one data set, then running one of the tree-making programs with the Multiple |
|---|
| 3556 | data sets option to analyze them all, then running CONSENSE to make a majority |
|---|
| 3557 | rule consensus tree from the resulting tree file. Read the documentation of |
|---|
| 3558 | SEQBOOT to get further information. In earlier versions, only parsimony methods could be |
|---|
| 3559 | bootstrapped. With this new system almost any of the tree-making methods in |
|---|
| 3560 | the package can be bootstrapped. It is somewhat more tedious but you will find |
|---|
| 3561 | it much more rewarding. |
|---|
| 3562 | <DT><STRONG>"How do I specify a multi-species outgroup |
|---|
| 3563 | with your parsimony programs?"</STRONG> |
|---|
| 3564 | <DD>It's not a feature but is not too hard to do in many of the programs. In |
|---|
| 3565 | parsimony programs like MIX, for which the W (Weights) and A (Ancestral states) |
|---|
| 3566 | options are available, and weights can be larger than 1, all you need to do is: |
|---|
| 3567 | <DL COMPACT> |
|---|
| 3568 | <DT><STRONG>(a)</STRONG> |
|---|
| 3569 | <DD>In MIX, make up an extra character with states 0 for all the outgroups |
|---|
| 3570 | and 1 for all the ingroups. If using DNAPARS the ingroup can have (say) |
|---|
| 3571 | <TT>G</TT> and the outgroup <TT>A</TT>. |
|---|
| 3572 | <DT><STRONG>(b)</STRONG> |
|---|
| 3573 | <DD>Assign this character an enormous weight (such as <TT>Z</TT> for 35) using the W |
|---|
| 3574 | option, all other characters getting weight 1, or whatever weight they had |
|---|
| 3575 | before. |
|---|
| 3576 | <DT><STRONG>(c)</STRONG> |
|---|
| 3577 | <DD>If it is available, use the A (Ancestral states) option to designate that |
|---|
| 3578 | for that new character the state found in the outgroup is the ancestral |
|---|
| 3579 | state. |
|---|
| 3580 | <DT><STRONG>(d)</STRONG> |
|---|
| 3581 | <DD>In MIX do not use the O (Outgroup) option. |
|---|
| 3582 | <DT><STRONG>(e)</STRONG> |
|---|
| 3583 | <DD>After the tree is found, the designated ingroup should have been held |
|---|
| 3584 | together by the fake character. The tree will be rooted somewhere in the |
|---|
| 3585 | outgroup (the program may or may not have a preference for one place in |
|---|
| 3586 | the outgroup over another). Make sure that you subtract from the total |
|---|
| 3587 | number of steps on the tree all steps in the new character. |
|---|
| 3588 | </DL> |
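<P>
In step (b) each character's weight is written as a single symbol: digits stand for weights 0 to 9 and letters for larger weights, which is why <TT>Z</TT> means 35. A small C sketch of that decoding, assuming upper-case letters and an ASCII character set (where A to Z are contiguous); the function name is hypothetical.

```c
/* Decode one weight symbol: '0'-'9' give 0-9 and 'A'-'Z' give 10-35,
   so 'Z' is the largest weight, 35. Returns -1 for any other symbol.
   Assumes ASCII, where the letters A to Z have consecutive codes. */
int weight_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'A' && c <= 'Z')
        return c - 'A' + 10;
    return -1;
}
```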
|---|
| 3589 | <P> |
|---|
| 3590 | In programs like DNAPARS, you cannot use this method, as weights of sites |
|---|
| 3591 | cannot be greater than 1. But you can do an analogous trick, by adding a |
|---|
| 3592 | largish number of extra sites to the data, with one nucleotide state ("G") |
|---|
| 3593 | for the ingroup and another ("A") for the outgroup. You will then have to |
|---|
| 3594 | use RETREE to manually reroot the tree in the desired place. |
|---|
| 3595 | <DT><STRONG>"How do I force certain groups to remain monophyletic in your |
|---|
| 3596 | parsimony programs?"</STRONG> |
|---|
| 3597 | <DD>By the same method as in the previous question, using multiple fake characters, any number of |
|---|
| 3598 | groups of species can be forced to be monophyletic. In MOVE, DOLMOVE, and |
|---|
| 3599 | DNAMOVE you can specify whatever outgroups you want without going to this |
|---|
| 3600 | trouble. |
|---|
| 3601 | <DT><STRONG>"How can I reroot one of the trees written out by PHYLIP?"</STRONG> |
|---|
| 3602 | <DD>Use the program |
|---|
| 3603 | RETREE. But keep in mind whether the tree inferred by the original program was |
|---|
| 3604 | already rooted, or whether you are free to reroot it. |
|---|
| 3605 | <DT><STRONG>"What do I do about deletions and insertions in my sequences?"</STRONG> |
|---|
| 3606 | <DD>The |
|---|
| 3607 | molecular sequence programs will accept sequences that have gaps (the "<TT>-</TT>" |
|---|
| 3608 | character). They do various things with them, mostly not optimal. DNAPARS |
|---|
| 3609 | counts "gap" as if it were a fifth nucleotide state (in addition to A, C, G, |
|---|
| 3610 | and T). Each site counts one change when a gap arises or disappears. The |
|---|
| 3611 | disadvantage of this treatment is that a long gap will be overweighted, with |
|---|
| 3612 | one event per gapped site. So a gap of 10 nucleotides will count as being as |
|---|
| 3613 | much evidence as 10 single site nucleotide substitutions. If there are not |
|---|
| 3614 | overlapping gaps, one way to correct this is to recode the first site in the |
|---|
| 3615 | gap as "<TT>-</TT>" but make all the others be "<TT>?</TT>" so the gap only counts as one event. |
|---|
| 3616 | Other programs such as DNAML and DNADIST count gaps as equivalent to unknown |
|---|
| 3617 | nucleotides (or unknown amino acids) on the grounds that we don't know what |
|---|
| 3618 | would be there if something were there. This completely leaves out the |
|---|
| 3619 | information from the presence or absence of the gap itself, but does not bias |
|---|
| 3620 | the gapped sequence to be close to or far from other gapped or ungapped |
|---|
| 3621 | sequences. |
|---|
| 3622 | So it is not necessary to remove gapped regions from your |
|---|
| 3623 | sequences, unless the presence of gaps indicates that the region is |
|---|
| 3624 | badly aligned. |
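<P>
The recoding suggested above for DNAPARS (first site of each gap kept as <TT>-</TT>, the rest of the gap turned into <TT>?</TT>) can be sketched in C as follows. The function name is hypothetical, and, as noted above, the recoding only makes sense when gaps in different sequences do not partly overlap.

```c
#include <stddef.h>

/* Recode each run of gaps in a NUL-terminated sequence so that only its
   first site stays '-' and the remaining sites become '?'; a long gap
   then counts as a single event. Modifies the sequence in place. */
void recode_gaps(char *seq)
{
    int in_gap = 0;                    /* inside a run of '-' sites? */
    for (size_t i = 0; seq[i] != '\0'; i++) {
        if (seq[i] == '-') {
            if (in_gap)
                seq[i] = '?';          /* later gap sites: unknown */
            in_gap = 1;                /* the first gap site stays '-' */
        } else {
            in_gap = 0;
        }
    }
}
```

<P>
For example, <TT>AC----GT-A</TT> becomes <TT>AC-???GT-A</TT>.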
|---|
| 3625 | <DT><STRONG>"How can I produce distances for my data set which |
|---|
| 3626 | has 0's and 1's?"</STRONG> |
|---|
| 3627 | <DD>You can't do it in a simple and general |
|---|
| 3628 | way, for a straightforward reason. Distance methods must correct the |
|---|
| 3629 | distances for superimposed changes. Unless we know specifically how to |
|---|
| 3630 | do this for your particular characters, we cannot accomplish the |
|---|
| 3631 | correction. There are many formulas we could use, but we can't choose |
|---|
| 3632 | among them without much more information. There are issues of superimposed |
|---|
| 3633 | changes, as well as heterogeneity of rates of change in different |
|---|
| 3634 | characters. Thus we have not provided a distance program for 0/1 data. |
|---|
| 3635 | It is up to you to figure out what is an appropriate stochastic model |
|---|
| 3636 | for your data and to find the right distance formulas. |
|---|
| 3637 | <DT><STRONG>"I have RFLP fragment data: which programs should I |
|---|
| 3638 | use?"</STRONG> |
|---|
| 3639 | <DD>This is a more difficult question than you may imagine. |
|---|
| 3640 | Here is a quick tour of the issues: |
|---|
| 3641 | <UL><LI>You can code fragments as 0 and 1 and use a parsimony program. It is |
|---|
| 3642 | not obvious in advance whether 0 or 1 is ancestral, though change in one |
|---|
| 3643 | direction is probably more frequent than change in the other for each |
|---|
| 3644 | fragment. One can use either Wagner parsimony (programs <TT>MIX</TT>, |
|---|
| 3645 | <TT>PENNY</TT> or <TT>MOVE</TT>) or use Dollo parsimony |
|---|
| 3646 | (<TT>DOLLOP, DOLPENNY</TT> or <TT>DOLMOVE</TT>) |
|---|
| 3647 | with the ancestral states all set as unknown ("<TT>?</TT>"). |
|---|
| 3648 | <LI>You can use a distance matrix method using the RFLP distance of Nei and |
|---|
| 3649 | Li (1979). Their restriction fragment distance is available in our |
|---|
| 3650 | program RestDist. |
|---|
| 3651 | <LI>You should be very hesitant to bootstrap RFLP's. The individual |
|---|
| 3652 | fragments do not evolve independently: a single nucleotide substitution |
|---|
| 3653 | can eliminate one fragment and create two (or vice versa). |
|---|
| 3654 | </UL> |
|---|
| 3655 | For restriction <I>sites</I> (rather than fragments) life is a bit |
|---|
| 3656 | easier: they evolve nearly independently so bootstrapping is possible |
|---|
| 3657 | and <TT>RESTML</TT> can be used. Also directionality of change |
|---|
| 3658 | is less ambiguous when parsimony is used. |
|---|
| 3659 | <DT><STRONG>"Why don't your parsimony programs print out branch lengths?"</STRONG> |
|---|
| 3660 | <DD>Well, DNAPARS and PARS can. The others have not yet been upgraded to the |
|---|
| 3661 | same level. The longer answer is that it is because |
|---|
| 3662 | there are problems defining the branch lengths. If you look closely at the |
|---|
| 3663 | reconstructions of the states of the hypothetical ancestral nodes for almost |
|---|
| 3664 | any data set and almost any parsimony method you will find some ambiguous |
|---|
| 3665 | states on those nodes. There is then usually an ambiguity as to which branch |
|---|
| 3666 | the change is actually on. Other parsimony programs resolve this in one or |
|---|
| 3667 | another arbitrary fashion, sometimes with the user specifying how (for example, |
|---|
| 3668 | methods that push the changes up the tree as far as possible or down it as far |
|---|
| 3669 | as possible). Our older programs leave it to the user to do this. In |
|---|
| 3670 | DNAPARS and PARS we use an algorithm discovered by Hochbaum and Pathria (1997) |
|---|
| 3671 | (and independently by Wayne Maddison) to compute branch lengths that average |
|---|
| 3672 | over all possible placements of the changes. But these branch lengths, as |
|---|
| 3673 | nice as they are, do not correct for multiple superimposed changes. Few |
|---|
| 3674 | programs available from others currently correct the branch lengths for |
|---|
| 3675 | multiple changes of state that may have overlain each other. One possible way |
|---|
| 3676 | to get branch lengths with nucleotide sequence data is to take the tree |
|---|
| 3677 | topology that you got, use RETREE to convert it to be unrooted, prepare a |
|---|
| 3678 | distance matrix from your data using DNADIST, and then use FITCH with that tree |
|---|
| 3679 | as User Tree and see what branch lengths it estimates. |
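To see why a change cannot always be pinned to one branch, here is a minimal sketch in C of the textbook Fitch downpass for a single nucleotide site on one fixed four-tip tree. It is not the DNAPARS code and it does not do the Hochbaum and Pathria averaging described above; the bitmask encoding of state sets and the names <TT>fitch_combine</TT>, <TT>fitch_four_tips</TT> and <TT>state_count</TT> are assumptions made only for illustration.

```c
#include <assert.h>

/* Hypothetical encoding: a set of nucleotide states is a 4-bit mask. */
enum { NUC_A = 1, NUC_C = 2, NUC_G = 4, NUC_T = 8 };

/* One Fitch step: the state set of a node from the sets of its two children.
   If the children share a state, keep the intersection; otherwise take the
   union and count one change somewhere on the branches below this node. */
static unsigned fitch_combine(unsigned left, unsigned right, int *changes)
{
    assert(left != 0 && right != 0);   /* empty state sets are invalid */
    unsigned common = left & right;
    if (common != 0)
        return common;
    ++*changes;
    return left | right;
}

/* Fitch downpass for one site on the fixed tree ((t0,t1),(t2,t3)).
   Returns the root's state set; *changes receives the parsimony score. */
static unsigned fitch_four_tips(const unsigned tips[4], int *changes)
{
    *changes = 0;
    unsigned left = fitch_combine(tips[0], tips[1], changes);
    unsigned right = fitch_combine(tips[2], tips[3], changes);
    return fitch_combine(left, right, changes);
}

/* Number of states in a set; more than one means the ancestral state is
   ambiguous, and so is the branch on which a change is placed. */
static int state_count(unsigned set)
{
    int n = 0;
    for (; set != 0; set >>= 1)
        n += (int)(set & 1u);
    return n;
}
```

For tips A, C, A, C the score is 2 and the root set is {A, C}. If both interior nodes are given state A, the two changes fall on the branches leading to the second and fourth tips; if both are given C, they fall on the branches to the first and third tips. Both reconstructions are equally parsimonious, which is exactly the ambiguity that makes a single "branch length" hard to define.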
|---|
| 3680 | <DT><STRONG>"Why can't your programs handle unordered multistate characters?"</STRONG> |
|---|
| 3681 | <DD>In this 3.6 release there is a program PARS which does parsimony for |
|---|
| 3682 | unordered multistate characters with up to 8 states, plus <TT>?</TT>. The |
|---|
| 3683 | other discrete characters parsimony programs can only handle two states, |
|---|
| 3684 | <TT>0</TT> and <TT>1</TT>. |
|---|
| 3685 | This is mostly because I have not yet had time to modify them to do so; the |
|---|
| 3686 | modifications would have to be extensive. Ultimately I hope to get these done. |
|---|
| 3687 | If you have four or fewer states and need a feature that is not in PARS, |
|---|
| 3688 | you could recode your states to look like nucleotides |
|---|
| 3689 | and use the parsimony programs in the molecular sequence section of PHYLIP, or |
|---|
| 3690 | you could use one of the excellent parsimony programs produced by others. |
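A minimal sketch of such a recoding in C, assuming the states are written as the digits 0 to 3 and that <TT>?</TT> marks an unknown state. Coding 0 as A, 1 as C, 2 as G and 3 as T is an arbitrary choice, and <TT>recode_state</TT> and <TT>recode_row</TT> are hypothetical names; check <TT>sequence.html</TT> for the symbols the sequence programs actually accept.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical recoding table: '0'..'3' become 'A','C','G','T' and the
   unknown state '?' is passed through. Returns '\0' for anything else. */
static char recode_state(char state)
{
    static const char nucleotides[] = "ACGT";
    if (state >= '0' && state <= '3')
        return nucleotides[state - '0'];
    if (state == '?')
        return '?';
    return '\0';
}

/* Recode one species' row of states in place. The whole row is checked
   first, so it is left untouched (and 0 is returned) if any state cannot
   be recoded, for example a fifth state '4'. Returns 1 on success. */
static int recode_row(char *row)
{
    assert(row != NULL);
    size_t len = strlen(row);
    for (size_t i = 0; i < len; ++i)
        if (recode_state(row[i]) == '\0')
            return 0;
    for (size_t i = 0; i < len; ++i)
        row[i] = recode_state(row[i]);
    return 1;
}
```

Because DNA parsimony counts a change between any two bases as one step, the recoded characters behave as unordered four-state characters. The transversion parsimony option of DNAPARS would not be appropriate for such data, since it ignores changes between some pairs of states.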
|---|
| 3691 | <P> |
|---|
| 3692 | <H3>Background information needed:</H3> |
|---|
| 3693 | <P> |
|---|
| 3694 | <DT><STRONG>"What file format do I use for the sequences?"<BR> |
|---|
| 3695 | "How do I use the programs? I can't find any documentation!"</STRONG> |
|---|
| 3696 | <DD>These are discussed in the documentation files. Do you have them? If you |
|---|
| 3697 | have a copy of this page you probably do. They are |
|---|
| 3698 | in a separate archive from the executables (they are in the Documentation and |
|---|
| 3699 | Sources archives, which you should definitely fetch). Input file formats |
|---|
| 3700 | are discussed in <TT>main.html</TT>, in <TT>sequence.html</TT>, <TT>distance.html</TT>, |
|---|
| 3701 | <TT>contchar.html</TT>, <TT>discrete.html</TT>, and the documentation files for the |
|---|
| 3702 | individual programs. |
|---|
| 3703 | <DT><STRONG>"Where can I find out how to infer |
|---|
| 3704 | phylogenies?"</STRONG> |
|---|
| 3705 | <DD>There are few books yet. For molecular data you could use one of these: |
|---|
| 3706 | <UL> |
|---|
| 3707 | <LI> Graur, D. and W.-H. Li. 2000. <EM>Fundamentals of Molecular |
|---|
| 3708 | Evolution.</EM> Sinauer Associates, Sunderland, Massachusetts. (or the earlier edition |
|---|
| 3709 | by Li and Graur). |
|---|
| 3710 | <LI> Page, R. D. M. and E. C. Holmes. 1998. <EM>Molecular Evolution: |
|---|
| 3711 | A Phylogenetic Approach.</EM> Blackwell, Oxford. |
|---|
| 3712 | <LI> Nei, M. and S. Kumar. 2000. <EM>Molecular Evolution and |
|---|
| 3713 | Phylogenetics.</EM> Oxford University Press, Oxford. |
|---|
| 3714 | <LI> Li, W.-H. 1999. <EM>Molecular Evolution.</EM> Sinauer Associates, |
|---|
| 3715 | Sunderland, Massachusetts. |
|---|
| 3716 | </UL> |
|---|
| 3717 | In addition, one of these three review articles may help: |
|---|
| 3718 | <UL><LI>Swofford, D. L., G. J. Olsen, P. J. Waddell, and D. M. Hillis. 1996. |
|---|
| 3719 | Phylogenetic inference. pp. 407-514 in <I>Molecular Systematics</I>, 2nd ed., |
|---|
| 3720 | ed. D. M. Hillis, C. Moritz, and B. K. Mable. Sinauer Associates, Sunderland, |
|---|
| 3721 | Massachusetts. |
|---|
| 3722 | <LI>Felsenstein, J. 1988. Phylogenies from molecular sequences: inference and |
|---|
| 3723 | reliability. <I>Annual Review of Genetics</I> <B>22:</B> 521-565. |
|---|
| 3724 | <LI>Felsenstein, J. 1988. Phylogenies and quantitative |
|---|
| 3725 | characters. <I>Annual Review of Ecology and Systematics</I> <B>19:</B> 445-471. |
|---|
| 3726 | </UL> |
|---|
| 3727 | My own book on phylogenies is due to be published in late 2002. It |
|---|
| 3728 | will be called "Inferring Phylogenies". For information on whether it has |
|---|
| 3729 | been published you should check the |
|---|
| 3730 | <A HREF="http://www.sinauer.com">Sinauer Associates web site</A>. |
|---|
| 3731 | <P> |
|---|
| 3732 | <H3>Questions about distribution and citation:</H3> |
|---|
| 3733 | <P> |
|---|
| 3734 | <DT><STRONG>"If I copied PHYLIP from a friend without you knowing, should I try |
|---|
| 3735 | to keep you from finding out?"</STRONG> |
|---|
| 3736 | <DD>No. It is to your advantage and mine for you to |
|---|
| 3737 | let me know. If you did not get PHYLIP "officially" from me or from someone |
|---|
| 3738 | authorized by me, but copied a friend's version, you are not in my database of |
|---|
| 3739 | users. You may also have an old version which has since been |
|---|
| 3740 | substantially improved. I don't mind you "bootlegging" |
|---|
| 3741 | PHYLIP (it's free anyway), but |
|---|
| 3742 | you should realize that you may have copied an outdated version. If you are reading this |
|---|
| 3743 | Web page, |
|---|
| 3744 | you can get the latest version just as quickly over the Internet. |
|---|
| 3745 | It will help both of us if you get |
|---|
| 3746 | onto my mailing list. If you are on it, then I will give your name to other |
|---|
| 3747 | nearby users when they ask for the names of nearby users, and they are urged to contact you and |
|---|
| 3748 | update your copy. (I benefit by getting a better feel for how many |
|---|
| 3749 | distributions there have been, and having a better mailing list to use to give |
|---|
| 3750 | other users local people to contact). Use the registration form which |
|---|
| 3751 | can be accessed through our web site's registration page. |
|---|
| 3752 | <DT><STRONG>"How do I make a citation to the PHYLIP package in the paper I am |
|---|
| 3753 | writing?"</STRONG> |
|---|
| 3754 | <DD>One way is like this: |
|---|
| 3755 | <P> |
|---|
| 3756 | Felsenstein, J. 2002. PHYLIP (Phylogeny Inference Package) version 3.6a3. |
|---|
| 3757 | <I>Distributed by the author. Department of Genome Sciences, University of |
|---|
| 3758 | Washington, Seattle.</I> |
|---|
| 3759 | <P> |
|---|
| 3760 | or if the editor for whom you are writing insists that the citation must be to |
|---|
| 3761 | a printed publication, you could cite a notice for version 3.2 published in |
|---|
| 3762 | Cladistics: |
|---|
| 3763 | <P> |
|---|
| 3764 | Felsenstein, J. 1989. PHYLIP - Phylogeny Inference Package (Version 3.2). |
|---|
| 3765 | <I>Cladistics</I> <B>5:</B> 164-166. |
|---|
| 3766 | <BR> |
|---|
| 3767 | <P> |
|---|
| 3768 | For a while a printed version of the PHYLIP documentation was available and one |
|---|
| 3769 | could cite that. This is no longer true. Other than that, this is difficult, |
|---|
| 3770 | because I have never written a paper announcing PHYLIP! My 1985b paper in |
|---|
| 3771 | Evolution on the bootstrap method contains a |
|---|
| 3772 | one-paragraph Appendix describing the availability of this package, and that |
|---|
| 3773 | can also be cited as a reference for the package, although the package has been |
|---|
| 3774 | distributed since 1980 and the bootstrap paper dates from 1985. A paper on PHYLIP |
|---|
| 3775 | is needed mostly to give people something to cite, as word-of-mouth, references |
|---|
| 3776 | in other people's papers, and electronic newsgroup postings have spread the |
|---|
| 3777 | word about PHYLIP's existence quite effectively. |
|---|
| 3778 | <DT><STRONG>"Can I make copies of PHYLIP available to the students in |
|---|
| 3779 | my class?"</STRONG> |
|---|
| 3780 | <DD>Generally, yes. Read the Copyright notice near the front of |
|---|
| 3781 | this main documentation page. If you charge money for PHYLIP, |
|---|
| 3782 | or use it in a service for which you charge money, you will need |
|---|
| 3783 | to negotiate a royalty. But you can make it freely available |
|---|
| 3784 | and you do not need to get any special permission from us to do so. |
|---|
| 3785 | <DT><STRONG>"How many copies of PHYLIP have been distributed?"</STRONG> |
|---|
| 3786 | <DD>On |
|---|
| 3787 | 27 September, 1996 we reached 5,000 registered installations worldwide. |
|---|
| 3788 | (By now we are well over 15,000 but have lost count for |
|---|
| 3789 | the moment). Of course there are |
|---|
| 3790 | many more people who have got copies from friends. PHYLIP is the most widely |
|---|
| 3791 | distributed phylogeny package. (This situation may reverse itself rapidly |
|---|
| 3792 | once PAUP* is fully released. During the years it was in full distribution, |
|---|
| 3793 | PAUP was ahead in phylogenies published, and the availability of distance and |
|---|
| 3794 | likelihood methods in PAUP* is making it very popular.) |
|---|
| 3795 | In recent years magnetic tape distribution and e-mail distribution of |
|---|
| 3796 | PHYLIP have disappeared, |
|---|
| 3797 | and there has been a big decrease in diskette distributions (down to only |
|---|
| 3798 | one or two per year). But all this has |
|---|
| 3799 | been more than offset by, first, an explosion of distributions by anonymous ftp |
|---|
| 3800 | over Internet, and then a bigger explosion of World Wide Web distributions and |
|---|
| 3801 | registrations (about 6 registrations per day at the moment). |
|---|
| 3802 | <P> |
|---|
| 3803 | <H3>Questions about documentation</H3> |
|---|
| 3804 | <P> |
|---|
| 3805 | <DT><STRONG>"Where can I get a printed version of the PHYLIP documents?"</STRONG> |
|---|
| 3806 | <DD>For the |
|---|
| 3807 | moment, you can only get a printed version by printing it yourself. For |
|---|
| 3808 | versions 3.1 to 3.3 a printed version was sold by Christopher Meacham and Tom |
|---|
| 3809 | Duncan, then at the University Herbarium of the University of California at |
|---|
| 3810 | Berkeley. But they have had to discontinue this as it was too much work. You |
|---|
| 3811 | should be able to print out the documentation files on almost any printer and |
|---|
| 3812 | make yourself a printed version of whichever of them you need. |
|---|
| 3813 | <DT><STRONG>"Why have I been dropped from your newsletter mailing list?"</STRONG> |
|---|
| 3814 | <DD>You haven't. |
|---|
| 3815 | The newsletter was dropped. It simply was too hard to mail it out to such a |
|---|
| 3816 | large mailing list. The last issue of the newsletter was Number 9 in May, |
|---|
| 3817 | 1987. The Listserver News Bulletins that we tried for a while have also been dropped |
|---|
| 3818 | as too hard to keep up to date. I am hoping that our World Wide Web site will take their place. |
|---|
| 3819 | </DL> |
|---|
| 3820 | <P> |
|---|
| 3821 | <DIV ALIGN="CENTER"> |
|---|
| 3822 | <H3>Additional Frequently Asked Questions, or: |
|---|
| 3823 | "Why didn't it occur to you to ...</H3></DIV> |
|---|
| 3824 | <DL> |
|---|
| 3825 | <DT><STRONG>... allow the options to be set on the command line?"</STRONG> |
|---|
| 3826 | <DD>We could in Unix and Linux, or somewhat differently in Windows. But |
|---|
| 3827 | there are so many options that this would be difficult, especially |
|---|
| 3828 | when the options require additional information to be supplied such as |
|---|
| 3829 | rates of evolution for many categories of sites. You may be asking this |
|---|
| 3830 | question because you want to automate the operation of PHYLIP programs |
|---|
| 3831 | using batch files (command files) to run in the background. If that is the |
|---|
| 3832 | issue, see the section of this main documentation page on |
|---|
| 3833 | "Running the programs in background or under control of a command file". |
|---|
| 3834 | It explains how to set the options using input redirection and a file |
|---|
| 3835 | that has the menu responses as keystrokes. |
|---|
| 3836 | <DT><STRONG>... write these programs in Pascal?"</STRONG> |
|---|
| 3837 | <DD>These programs started out |
|---|
| 3838 | in Pascal in 1980. In 1993 we released both Pascal and C versions. The |
|---|
| 3839 | present version (3.6) and |
|---|
| 3840 | future versions will be C-only. I make fewer mistakes in Pascal and do |
|---|
| 3841 | like the language better than C, but C has overtaken Pascal and Pascal |
|---|
| 3842 | compilers are starting to be hard to find on some machines. Also, C is |
|---|
| 3843 | somewhat better standardized, which greatly reduces the number of modifications a user |
|---|
| 3844 | has to make to adapt the programs to their system. |
|---|
| 3845 | <DT><STRONG>... write these programs in Java?"</STRONG> |
|---|
| 3846 | <DD>Well, we might. It is not completely clear which of two contenders, |
|---|
| 3847 | C++ and Java, will become more widespread, and which one will gradually |
|---|
| 3848 | fade away. Whichever one is more successful is the one we will probably want to use |
|---|
| 3849 | for future versions of PHYLIP. As the C compilers that are used to |
|---|
| 3850 | compile PHYLIP are usually also able to compile C++, we will be moving in |
|---|
| 3851 | that direction, but with constant worrying about whether to convert PHYLIP |
|---|
| 3852 | to Java instead.</DD> |
|---|
| 3853 | <DT><STRONG>... forget about all those inferior systems and just develop PHYLIP for Unix?"</STRONG> |
|---|
| 3854 | <DD>This is self-answering, since the same people first said I should |
|---|
| 3855 | just develop it for Apple II's, then for CP/M Z-80's, then for IBM PCDOS, |
|---|
| 3856 | then for Macintoshes or for Sun |
|---|
| 3857 | workstations, and then for Windows. If I had listened to them and done any one of these, I would |
|---|
| 3858 | have had a very hard time adapting the package to any of the other ones once |
|---|
| 3859 | these folks changed their minds (and most of them did)! |
|---|
| 3860 | <DT><STRONG>... write these programs in PROLOG |
|---|
| 3861 | (or Ada, or Modula-2, or SIMULA, or BCPL, or PL/I, or APL, or LISP)?"</STRONG> |
|---|
| 3862 | <DD>These are all languages I have considered. All |
|---|
| 3863 | have advantages, but they are not nearly as widespread as C and C++. |
|---|
| 3864 | <DT><STRONG>... include in the package a program to do the Distance Wagner method (or |
|---|
| 3865 | successive approximations character weighting, |
|---|
| 3866 | or transformation series analysis)?"</STRONG> |
|---|
| 3867 | <DD>In most cases where I have not |
|---|
| 3868 | included other methods, it is because I decided that they had no substantial |
|---|
| 3869 | advantages over methods that were included (such as the programs FITCH, |
|---|
| 3870 | KITSCH, NEIGHBOR, the <TT>T</TT> option of MIX and DOLLOP, and the "<TT>?</TT>" ancestral |
|---|
| 3871 | states option of the discrete characters parsimony programs). |
|---|
| 3872 | <DT><STRONG>... include in the package ordination methods and more |
|---|
| 3873 | clustering algorithms?"</STRONG> |
|---|
| 3874 | <DD>Because this is <I>not</I> a clustering package, it's a |
|---|
| 3875 | package for phylogeny estimation. Those are different tasks with different |
|---|
| 3876 | objectives and mostly different methods. Mary Kuhner and Jon Yamato have, |
|---|
| 3877 | however, |
|---|
| 3878 | included in NEIGHBOR an option for UPGMA clustering, which will be very |
|---|
| 3879 | similar to KITSCH in results. |
|---|
| 3880 | <DT><STRONG>... include in the package a program to do nucleotide sequence |
|---|
| 3881 | alignment?"</STRONG> |
|---|
| 3882 | <DD>Well, yes, I should |
|---|
| 3883 | have, and this is scheduled to be in future releases. But multiple sequence |
|---|
| 3884 | alignment programs, in the era after Sankoff, Morel, and Cedergren's 1973 |
|---|
| 3885 | classic paper, need to use substantial computer horsepower to estimate the |
|---|
| 3886 | alignment and the tree together (but see Karl Nicholas's program |
|---|
| 3887 | <TT>GeneDoc</TT> or Ward Wheeler and David Gladstein's <TT>MALIGN</TT>, as |
|---|
| 3888 | well as more approximate methods of tree-based alignment used in |
|---|
| 3889 | <TT>ClustalW</TT> or <TT>TreeAlign</TT>). |
|---|
| 3890 | </DL> |
|---|
| 3891 | <P> |
|---|
| 3892 | <DIV ALIGN="CENTER"> |
|---|
| 3893 | <H3>(Fortunately) obsolete questions</H3></DIV> |
|---|
| 3894 | <P> |
|---|
| 3895 | (The following four questions, once |
|---|
| 3896 | common, have finally disappeared, I am pleased to report). |
|---|
| 3897 | <H4>"Why didn't it occur to you to ...</H4> |
|---|
| 3898 | <DL> |
|---|
| 3899 | <DT><STRONG>... let me log in to your computer in Seattle |
|---|
| 3900 | and copy the files out over a phone line?"</STRONG> |
|---|
| 3901 | <DD>No thanks. It would cost you for a lot of |
|---|
| 3902 | long-distance telephone time, plus a half hour of my time and yours in which |
|---|
| 3903 | I would have to explain to you how to log in and do the copying. |
|---|
| 3904 | <DT><STRONG>... send me a listing of your program?"</STRONG> |
|---|
| 3905 | <DD>Damn it, it's not "a program", |
|---|
| 3906 | it's 35 programs, in a great many files. What were you |
|---|
| 3907 | thinking of doing, having 1800-line programs typed in by slaves at your |
|---|
| 3908 | end? If you were going to go to all that trouble why not try network |
|---|
| 3909 | transfer? If you have the files, you can print out all the |
|---|
| 3910 | listings you want to and add them to the huge stack of printed output in |
|---|
| 3911 | the corner of your office. |
|---|
| 3912 | <DT><STRONG>... write a magnetic tape in our computer center's favorite format |
|---|
| 3913 | (inverted Lithuanian EBCDIC at 998 bpi)?"</STRONG> |
|---|
| 3914 | <DD>Because the ANSI standard |
|---|
| 3915 | format is the most widely used one, and even though your computer center |
|---|
| 3916 | may pretend it can't read a tape written this way, if you sniff around |
|---|
| 3917 | you will find a utility to read it. It's just a <I>lot</I> easier for me to |
|---|
| 3918 | let you do that work. If I tried to put the tape into your format, I |
|---|
| 3919 | would probably get it wrong anyway. |
|---|
| 3920 | <DT><STRONG>... give us a version of these in FORTRAN?"</STRONG> |
|---|
| 3921 | <DD>Because the |
|---|
| 3922 | programs are <I>far</I> easier to write and debug in C or Pascal, and cannot |
|---|
| 3923 | easily be |
|---|
| 3924 | rewritten into FORTRAN (they make extensive use of recursive calls and |
|---|
| 3925 | of records and pointers). In any case, C is widely available. If you don't |
|---|
| 3926 | have a C compiler or don't know |
|---|
| 3927 | how to use it, you are going to have to learn a language like C or |
|---|
| 3928 | Pascal sooner or later, and the sooner the better. |
|---|
| 3929 | </DL> |
|---|
| 3930 | <P> |
|---|
| 3931 | <A NAME="newfeatures"><HR><P></A> |
|---|
| 3932 | <DIV ALIGN="CENTER"> |
|---|
| 3933 | <H2>New Features in This Version</H2></DIV> |
|---|
| 3934 | <P> |
|---|
| 3935 | Version 3.6 has many new features: |
|---|
| 3936 | <UL><LI>Faster (well, less slow) likelihood programs. |
|---|
| 3937 | <LI>The DNA and protein likelihood and distance programs allow |
|---|
| 3938 | for rate variation between sites using a gamma distribution of |
|---|
| 3939 | rates among sites, or using a gamma distribution plus a given |
|---|
| 3940 | fraction of sites which are assumed invariant. |
|---|
| 3941 | <LI>A new multistate discrete characters parsimony program, PARS, that |
|---|
| 3942 | handles unordered multistate characters. |
|---|
| 3943 | <LI>The DNAPARS and PARS parsimony programs can infer multifurcating |
|---|
| 3944 | trees, which considerably reduces the number of tied trees they find. |
|---|
| 3945 | <LI>A new protein sequence likelihood program, <TT>PROML</TT>, |
|---|
| 3946 | and also a version, <TT>PROMLK</TT>, which assumes a molecular clock. |
|---|
| 3947 | <LI>A new restriction sites and restriction fragments distance program, |
|---|
| 3948 | <TT>RESTDIST</TT>, that can also be used to compute distances for RAPD and |
|---|
| 3949 | AFLP data. It also allows for gamma-distributed rate variation among |
|---|
| 3950 | DNA sites. |
|---|
| 3951 | <LI>In the DNA likelihood programs, you can now specify different |
|---|
| 3952 | categories of rates of change (such as rates for first, second, and |
|---|
| 3953 | third positions of a coding sequence) and assign them to specific sites. |
|---|
| 3954 | This is in addition to the ability of the programs to use the Hidden Markov |
|---|
| 3955 | Model mechanism to allow rates of change to vary across sites in a way that |
|---|
| 3956 | does not ask you to assign which rate goes with which site. |
|---|
| 3957 | <LI>The input files for many of the programs are now |
|---|
| 3958 | simpler, in that they do not contain options information such as specification |
|---|
| 3959 | of weights and categories. That information is now provided in separate |
|---|
| 3960 | files with default names such as <TT>weights</TT> and <TT>categories</TT>. |
|---|
| 3961 | <LI>The DNA likelihood programs can now evaluate multifurcating |
|---|
| 3962 | user trees (option <TT>U</TT>). |
|---|
| 3963 | <LI>All programs that read in user-defined trees now do so from a separate |
|---|
| 3964 | file, whose default name is <TT>intree</TT>, rather than requiring them to |
|---|
| 3965 | be in the input file as before. |
|---|
| 3966 | <LI>The DNA likelihood programs can infer the sequence at ancestral |
|---|
| 3967 | nodes in the interior of the tree. |
|---|
| 3968 | <LI>DNAPARS can now do transversion parsimony. |
|---|
| 3969 | <LI>The bootstrapping program SEQBOOT now can, instead of producing a |
|---|
| 3970 | large file containing multiple data sets, be asked |
|---|
| 3971 | to produce a weights file with multiple sets of weights. Many |
|---|
| 3972 | programs in this release can analyze those multiple weights together with |
|---|
| 3973 | the original data set, which saves disk space. |
|---|
| 3974 | <LI>The bootstrapping program SEQBOOT can pass weights and categories |
|---|
| 3975 | information through to a multiple weights file or a multiple categories |
|---|
| 3976 | file. |
|---|
| 3977 | <LI>SEQBOOT can also convert sequence files from Interleaved to |
|---|
| 3978 | Sequential form, or back. |
|---|
| 3979 | <LI>SEQBOOT can also write a sequence data file into a preliminary version of |
|---|
| 3980 | a new XML format which is being defined for sequence alignments, |
|---|
| 3981 | for use by programs that need XML input |
|---|
| 3982 | (none of the current PHYLIP programs yet need this format, but it |
|---|
| 3983 | will be useful in the future). |
|---|
| 3984 | <LI>RETREE can now write trees out in a preliminary version of a new XML tree |
|---|
| 3985 | file format which is in the process of being defined. |
|---|
| 3986 | <LI>The Kishino-Hasegawa-Templeton (KHT) test which compares user-defined |
|---|
| 3987 | trees (option U) is now joined by the Shimodaira-Hasegawa (SH) test |
|---|
| 3988 | (Shimodaira and Hasegawa, 1999), which corrects for making multiple comparisons |
|---|
| 3989 | among the trees. This avoids a statistical problem that arises when many user trees are compared at once. |
|---|
| 3990 | <LI>CONTRAST can now carry out an analysis that takes into account |
|---|
| 3991 | within-species variation, according to a model similar (but not |
|---|
| 3992 | identical) to that introduced by Michael Lynch (1990). |
|---|
| 3993 | <LI>A new program, TREEDIST, computes the Robinson-Foulds symmetric |
|---|
| 3994 | difference distance among trees. This measures the number of branches in |
|---|
| 3995 | the trees that are present in one tree but not in the other. |
|---|
| 3996 | <LI>FITCH and KITSCH now have an option to make trees by the |
|---|
| 3997 | minimum evolution distance matrix method. |
|---|
| 3998 | <LI>The protein parsimony program PROTPARS now allows you to choose among |
|---|
| 3999 | a number of different genetic codes such as mitochondrial codes. |
|---|
| 4000 | <LI>The consensus tree program CONSENSE |
|---|
| 4001 | can compute the M<SUB>l</SUB> family of consensus tree methods, which |
|---|
| 4002 | generalize the Majority Rule consensus tree method. It can |
|---|
| 4003 | also compute our extended Majority Rule consensus (which is |
|---|
| 4004 | Majority Rule with some additional groups added to resolve the |
|---|
| 4005 | tree more completely), and it can also compute the original |
|---|
| 4006 | Majority Rule consensus tree method which does not add these |
|---|
| 4007 | extra groups. It can also |
|---|
| 4008 | compute the Strict consensus. |
|---|
| 4009 | <LI>The tree-drawing programs DRAWGRAM and DRAWTREE have a number of new |
|---|
| 4010 | options of kinds of file they can produce, including Windows Bitmap files, |
|---|
| 4011 | files for the Idraw and FIG X windows drawing programs, the POV ray-tracer, |
|---|
| 4012 | and even VRML (Virtual Reality Modeling Language) files that will enable you |
|---|
| 4013 | to wander around the tree using a VRML plugin for your browser, such as |
|---|
| 4014 | Cosmo Player. |
|---|
| 4015 | <LI>DRAWTREE now uses my new Equal Daylight Algorithm to draw unrooted |
|---|
| 4016 | trees. This gives a much better-looking tree. Of course, competing programs |
|---|
| 4017 | such as TREEVIEW and PAUP draw trees that look just as good - because they |
|---|
| 4018 | too have started to use my method (with my encouragement). DRAWTREE also |
|---|
| 4019 | can use another algorithm, the n-body method. |
|---|
| 4020 | <LI>The tree-drawing programs can now produce trees across multiple |
|---|
| 4021 | pages, which is handy for looking at trees with very large numbers |
|---|
| 4022 | of tips, and for producing giant diagrams by pasting together |
|---|
| 4023 | multiple sheets of paper. |
|---|
| 4024 | </UL> |
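The Robinson-Foulds symmetric difference computed by TREEDIST can be illustrated with a short C sketch that works on splits (bipartitions of the tips) stored as bitmasks. It follows the definition given in the list above, not TREEDIST's own code. It assumes the splits of each tree have already been extracted (no Newick parsing is done), that there are fewer tips than bits in an <TT>unsigned long</TT>, and that no split is listed twice for one tree; the type and function names are hypothetical. Terminal branches are left out because every tree on the same set of tips has them.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* A split of tips 0..ntips-1: bit i set means tip i is on one side.
   A split and its complement describe the same branch. */
typedef unsigned long split_t;

/* Canonical form: complement the mask if needed so tip 0 is never in it. */
static split_t canonical_split(split_t split, int ntips)
{
    assert(ntips > 0 && ntips < (int)(sizeof(split_t) * CHAR_BIT));
    split_t all = (1UL << ntips) - 1UL;
    split &= all;
    return (split & 1UL) ? (~split & all) : split;
}

/* Does the list of splits contain the (already canonical) target? */
static int has_split(const split_t *splits, size_t n, split_t target, int ntips)
{
    for (size_t i = 0; i < n; ++i)
        if (canonical_split(splits[i], ntips) == target)
            return 1;
    return 0;
}

/* Symmetric difference: splits of tree a missing from tree b plus splits
   of tree b missing from tree a. Each tree is given as the list of splits
   of its interior branches. */
static int symmetric_difference(const split_t *a, size_t na,
                                const split_t *b, size_t nb, int ntips)
{
    int distance = 0;
    for (size_t i = 0; i < na; ++i)
        if (!has_split(b, nb, canonical_split(a[i], ntips), ntips))
            ++distance;
    for (size_t i = 0; i < nb; ++i)
        if (!has_split(a, na, canonical_split(b[i], ntips), ntips))
            ++distance;
    return distance;
}
```

With five tips A to E as bits 0 to 4, the trees ((A,B),C,(D,E)) and ((A,C),B,(D,E)) share the branch separating DE from ABC but differ in AB versus AC, so their distance is 2. The canonical form is what lets the second tree list that shared branch as ABC rather than DE and still have it recognized.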
|---|
| 4025 | <P> |
|---|
| 4026 | There are many more, lesser features added as well. |
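The SEQBOOT weights option mentioned above works because a bootstrap replicate only changes how many times each site is counted. Below is a minimal C sketch of one replicate expressed as site weights. It uses the C library's <TT>rand()</TT> rather than SEQBOOT's own random number generator, does not write PHYLIP's weights file format, and <TT>bootstrap_weights</TT> is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>

/* One nonparametric bootstrap replicate expressed as site weights:
   draw nsites sites uniformly with replacement and count how often each
   site was drawn. The same seed always gives the same weights. */
static void bootstrap_weights(int *weights, int nsites, unsigned int seed)
{
    assert(weights != NULL && nsites > 0);
    srand(seed);
    for (int i = 0; i < nsites; ++i)
        weights[i] = 0;
    for (int draw = 0; draw < nsites; ++draw) {
        /* rand() % nsites has a slight modulo bias; acceptable for a sketch. */
        int site = rand() % nsites;
        ++weights[site];
    }
}
```

A site drawn twice gets weight 2 and a site never drawn gets weight 0, and the weights always sum to the number of sites. An analysis program can therefore score the original data set with each set of weights in turn instead of reading a separate resampled data set.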
|---|
| 4027 | <P> |
|---|
| 4028 | <A NAME="future"><HR><P></A> |
|---|
| 4029 | <DIV ALIGN="CENTER"> |
|---|
| 4030 | <H2>Coming Attractions, Future Plans</H2></DIV> |
|---|
| 4031 | <P> |
|---|
| 4032 | There are some obvious deficiencies in this version. Some of these |
|---|
| 4033 | holes will be filled in the next few releases (leading to version |
|---|
| 4034 | 4.0). They include: |
|---|
| 4035 | <OL> |
|---|
| 4036 | <LI>A program to align molecular sequences on a predefined User Tree may |
|---|
| 4037 | ultimately be included. This will allow alignment and phylogeny |
|---|
| 4038 | reconstruction to proceed iteratively by successive runs of two programs, one |
|---|
| 4039 | aligning on a tree and the other finding a better tree based on that alignment. |
|---|
| 4040 | In the shorter run a simple two-sequence alignment program may be included. |
|---|
| 4041 | <LI>An interactive "likelihood explorer" for DNA sequences will be written. |
|---|
| 4042 | This will allow, either with or without the assumption of a molecular |
|---|
| 4043 | clock, trees to be varied interactively so that the user can get a much |
|---|
| 4044 | better feel for the shape of the likelihood surface. It will be possible |
|---|
| 4045 | to plot the likelihood against the length of any branch. |
|---|
| 4046 | <LI>If possible we will find some way of correcting for purine/pyrimidine |
|---|
| 4047 | richness variations among species, within the framework of the maximum |
|---|
| 4048 | likelihood programs. That the maximum likelihood programs do not allow |
|---|
| 4049 | for base composition variation is their major limitation at the moment. |
|---|
| 4050 | <LI>The Hidden Markov Model (regional rates) option of DNAML and DNAMLK will |
|---|
| 4051 | be generalized to allow |
|---|
| 4052 | for rates at sites to gradually change as one moves along the tree, |
|---|
| 4053 | in an attempt to implement Fitch and Markowitz's (1970) notion of "covarions". |
|---|
| 4054 | <LI>Obviously we need to start thinking about a more visual mouse/windows |
|---|
| 4055 | interface, but only if that can be used on X windows, Macintoshes, and |
|---|
| 4056 | Windows. |
|---|
| 4057 | <LI>Program PENNY and its relatives will be improved so as to run faster |
|---|
| 4058 | and find all most parsimonious trees more quickly. |
|---|
| 4059 | <LI>A more sophisticated compatibility program should be included, if I can |
|---|
| 4060 | find one. |
|---|
| 4061 | <LI>An "evolutionary clock" version of CONTML will be done, and the same |
|---|
| 4062 | may also be done for RESTML. |
|---|
| 4063 | <LI>We are gradually generalizing the tree structures in the programs to |
|---|
| 4064 | infer multifurcating trees as well as bifurcating ones. |
|---|
| 4065 | We should be able to have any program read any tree and know what to do |
|---|
| 4066 | with it, without the user having to fret about whether an unrooted tree was |
|---|
| 4067 | fed to a program that needs a rooted tree. |
|---|
| 4068 | <LI>We are economizing on the size of the source code, and enforcing some |
|---|
| 4069 | standardization of it, by putting frequently used routines in separate |
|---|
| 4070 | files which can be linked into various programs. This will enforce |
|---|
| 4071 | a rather complete standardization of our code. |
|---|
| 4072 | <LI>We will move our code to an object-oriented |
|---|
| 4073 | language, most likely C++. One could describe the language that version |
|---|
| 4074 | 3.4 was written in as "Pascal", version 3.5 as "Pascal written in C", |
|---|
| 4075 | version 3.6 as "C written in C", and maybe version 4.0 as "C++ written |
|---|
| 4076 | in C" and then 4.1 as "C++ written in C++". At least that scenario |
|---|
| 4077 | is one possibility. |
|---|
| 4078 | </OL> |
|---|
| 4079 | <P> |
|---|
| 4080 | Much of the future development of the package will be in the DNA and protein |
|---|
| 4081 | likelihood programs and the distance matrix programs. This is for two |
|---|
| 4082 | reasons. First, I am more interested in those problems. Second, collection of |
|---|
| 4083 | molecular data is increasing rapidly, and for those data these programs |
|---|
| 4084 | hold the most promise |
|---|
| 4085 | for future development. |
|---|
| 4086 | <P> |
|---|
| 4087 | <A NAME="endorsements"><HR><P></A> |
|---|
| 4088 | <DIV ALIGN="CENTER"> |
|---|
| 4089 | <H2>Endorsements</H2></DIV> |
|---|
| 4090 | <P> |
|---|
| 4091 | Here are some comments people have made in print about PHYLIP. Explanatory |
|---|
| 4092 | material in square brackets is my own. They fall naturally into two groups: |
|---|
| 4093 | <P> |
|---|
| 4094 | <H3>From the pages of <I>Cladistics</I>:</H3> |
|---|
| 4095 | <P> |
|---|
| 4096 | <BLOCKQUOTE> |
|---|
| 4097 | "Under no circumstances can we recommend PHYLIP/WAG [their name for the |
|---|
| 4098 | Wagner parsimony option of MIX]." |
|---|
| 4099 | <DIV ALIGN="RIGHT"> |
|---|
| 4100 | Luckow, M. and R. A. Pimentel (1985) |
|---|
| 4101 | </DIV> |
|---|
| 4102 | </BLOCKQUOTE> |
|---|
| 4103 | <P> |
|---|
| 4104 | <BLOCKQUOTE> |
|---|
| 4105 | "PHYLIP has not proven very effective in implementing parsimony (Luckow and |
|---|
| 4106 | Pimentel, 1985)." |
|---|
| 4107 | <DIV ALIGN="RIGHT"> |
|---|
| 4108 | J. Carpenter (1987a) |
|---|
| 4109 | </DIV> |
|---|
| 4110 | </BLOCKQUOTE> |
|---|
| 4111 | <P> |
|---|
| 4112 | <BLOCKQUOTE> |
|---|
| 4113 | "... PHYLIP. This is the computer program where every newsletter concerning |
|---|
| 4114 | it is mostly bug-catching, some of which have been put there by previous |
|---|
| 4115 | corrections. As Platnick (1987) documents, through dint of much labor useful |
|---|
| 4116 | results may be attained with this program, but I would suggest an |
|---|
| 4117 | easier way: FORMAT b:" |
|---|
| 4118 | <DIV ALIGN="RIGHT"> |
|---|
| 4119 | J. Carpenter (1987b) |
|---|
| 4120 | </DIV> |
|---|
| 4121 | </BLOCKQUOTE> |
|---|
| 4122 | <P> |
|---|
| 4123 | <BLOCKQUOTE> |
|---|
| 4124 | "PHYLIP is bug-infested and both less effective and orders of |
|---|
| 4125 | magnitude slower than other programs ...." |
|---|
| 4126 | <DIV ALIGN="RIGHT"> |
|---|
| 4127 | "T. N. Nayenizgani" [J. S. Farris] (1990) |
|---|
| 4128 | </DIV> |
|---|
| 4129 | </BLOCKQUOTE> |
|---|
| 4130 | <P> |
|---|
| 4131 | <BLOCKQUOTE> |
|---|
| 4132 | "Hennig86 [by J. S. Farris] provides such substantial improvements over |
|---|
| 4133 | previously available programs (for both mainframes and microcomputers) that |
|---|
| 4134 | it should now become the tool of choice for practising systematists." |
|---|
| 4135 | <DIV ALIGN="RIGHT"> |
|---|
| 4136 | N. Platnick (1989) |
|---|
| 4137 | </DIV> |
|---|
| 4138 | </BLOCKQUOTE> |
|---|
| 4139 | <P> |
|---|
| 4140 | <H3>... and in the pages of other journals:</H3> |
|---|
| 4141 | <P> |
|---|
| 4142 | <BLOCKQUOTE> |
|---|
| 4143 | "The availability, within PHYLIP of distance, compatibility, maximum likelihood, |
|---|
| 4144 | and generalized 'invariants' algorithms (Cavender and Felsenstein, 1987) sets |
|---|
| 4145 | it apart from other packages .... One of the strengths of PHYLIP is its |
|---|
| 4146 | documentation ...." |
|---|
| 4147 | <DIV ALIGN="RIGHT"> |
|---|
| 4148 | Michael J. Sanderson (1990) |
|---|
| 4149 | </DIV> |
|---|
| 4150 | <EM>(Sanderson also criticizes PHYLIP for the slowness and inflexibility of its |
|---|
| 4151 | parsimony algorithms, and compliments other packages on their strengths).</EM> |
|---|
| 4152 | </BLOCKQUOTE> |
|---|
| 4153 | <P> |
|---|
| 4154 | <BLOCKQUOTE> |
|---|
| 4155 | "This package of programs has gradually become a basic necessity to anyone |
|---|
| 4156 | working seriously on various aspects of phylogenetic inference .... The package |
|---|
| 4157 | includes more programs than any other known phylogeny package. But it is not |
|---|
| 4158 | just a collection of cladistic and related programs. The package has great |
|---|
| 4159 | value added to the whole, and for this it is unique and of extreme |
|---|
| 4160 | importance .... its various strengths are in the great array of methods |
|---|
| 4161 | provided ...." |
|---|
| 4162 | <DIV ALIGN="RIGHT"> |
|---|
| 4163 | Bernard R. Baum (1989) |
|---|
| 4164 | </DIV> |
|---|
| 4165 | </BLOCKQUOTE> |
|---|
| 4166 | <P> |
|---|
| 4167 | (Note also W. Fink's critical remarks (1986) on version 2.8 of PHYLIP.) |
|---|
| 4168 | <P> |
|---|
| 4169 | <A NAME="references"><HR><P></A> |
|---|
| 4170 | <DIV ALIGN="CENTER"> |
|---|
| 4171 | <H2>References for the Documentation Files</H2></DIV> |
|---|
| 4172 | <P> |
|---|
| 4173 | In the documentation files that follow I frequently refer to papers |
|---|
| 4174 | in the literature. To keep them in one place, the references are all given |
|---|
| 4175 | in this section. The chapter by David Swofford, |
|---|
| 4176 | Gary Olsen, Peter Waddell, and David Hillis |
|---|
| 4177 | (1996) is also an excellent review of the issues in phylogeny |
|---|
| 4178 | reconstruction. |
|---|
| 4179 | For papers beyond these, see my |
|---|
| 4180 | <I>Quarterly Review of Biology</I> review of 1982 and my <I>Annual Review of Genetics</I> |
|---|
| 4181 | review of 1988, which list many more references. |
|---|
| 4182 | <P> |
|---|
| 4183 | Adams, E. N. 1972. Consensus techniques and the comparison of |
|---|
| 4184 | taxonomic trees. <I>Systematic Zoology</I> <B>21:</B> 390-397. |
|---|
| 4185 | <P> |
|---|
| 4186 | Adams, E. N. 1986. N-trees as nestings: complexity, similarity, and |
|---|
| 4187 | consensus. <I>Journal of Classification</I> <B>3:</B> 299-317. |
|---|
| 4188 | <P> |
|---|
| 4189 | Archie, J. W. 1989. A randomization test for phylogenetic information in |
|---|
| 4190 | systematic data. <I>Systematic Zoology</I> <B>38:</B> 219-252. |
|---|
| 4191 | <P> |
|---|
| 4192 | Barry, D., and J. A. Hartigan. 1987. Statistical analysis of hominoid |
|---|
| 4193 | molecular evolution. <I>Statistical Science</I> <B>2:</B> 191-210. |
|---|
| 4194 | <P> |
|---|
| 4195 | Baum, B. R. 1989. PHYLIP: Phylogeny Inference Package. Version 3.2. (Software |
|---|
| 4196 | review). <I>Quarterly Review of Biology</I> <B>64:</B> 539-541. |
|---|
| 4197 | <P> |
|---|
| 4198 | Bron, C., and J. Kerbosch. 1973. Algorithm 457: Finding all cliques |
|---|
| 4199 | of an undirected graph. <I>Communications of the Association for Computing Machinery</I> <B>16:</B> 575-577. |
|---|
| 4200 | <P> |
|---|
| 4201 | Camin, J. H., and R. R. Sokal. 1965. A method for deducing branching |
|---|
| 4202 | sequences in phylogeny. <I>Evolution</I> <B>19:</B> 311-326. |
|---|
| 4203 | <P> |
|---|
| 4204 | Carpenter, J. 1987a. A report on the Society for the Study of Evolution |
|---|
| 4205 | workshop "Computer Programs for Inferring Phylogenies". <I>Cladistics</I> <B>3:</B> |
|---|
| 4206 | 363-375. |
|---|
| 4207 | <P> |
|---|
| 4208 | Carpenter, J. 1987b. Cladistics of cladists. <I>Cladistics</I> <B>3:</B> 363-375. |
|---|
| 4209 | <P> |
|---|
| 4210 | Cavalli-Sforza, L. L., and A. W. F. Edwards. 1967. Phylogenetic |
|---|
| 4211 | analysis: models and estimation procedures. <I>Evolution</I> <B>21:</B> 550-570 |
|---|
| 4212 | (also <I>American Journal of Human Genetics</I> <B>19:</B> 233-257). |
|---|
| 4213 | <P> |
|---|
| 4214 | Cavender, J. A. and J. Felsenstein. 1987. Invariants of phylogenies in a |
|---|
| 4215 | simple case with discrete states. <I>Journal of Classification</I> <B>4:</B> 57-71. |
|---|
| 4216 | <P> |
|---|
| 4217 | Churchill, G. A. 1989. Stochastic models for heterogeneous DNA sequences. |
|---|
| 4218 | <I>Bulletin of Mathematical Biology</I> <B>51:</B> 79-94. |
|---|
| 4219 | <P> |
|---|
| 4220 | Conn, E. E. and P. K. Stumpf. 1963. <I>Outlines of Biochemistry.</I> John Wiley |
|---|
| 4221 | and Sons, New York. |
|---|
| 4222 | <P> |
|---|
| 4223 | Day, W. H. E. 1983. Computationally difficult parsimony problems in |
|---|
| 4224 | phylogenetic systematics. <I>Journal of Theoretical Biology</I> <B>103:</B> |
|---|
| 4225 | 429-438. |
|---|
| 4226 | <P> |
|---|
| 4227 | Dayhoff, M. O. and R. V. Eck. 1968. <I>Atlas of Protein Sequence |
|---|
| 4228 | and Structure 1967-1968.</I> National Biomedical Research Foundation, |
|---|
| 4229 | Silver Spring, Maryland. |
|---|
| 4230 | <P> |
|---|
| 4231 | Dayhoff, M. O., R. M. Schwartz, and B. C. Orcutt. 1979. A model of |
|---|
| 4232 | evolutionary change in proteins. pp. 345-352 in <I>Atlas of |
|---|
| 4233 | Protein Sequence and Structure, volume 5, supplement 3, 1978,</I> ed. |
|---|
| 4234 | M. O. Dayhoff. National Biomedical Research Foundation, Silver Spring, Maryland. |
|---|
| 4236 | <P> |
|---|
| 4237 | Dayhoff, M. O. 1979. <I>Atlas of Protein Sequence and Structure, Volume 5, |
|---|
| 4238 | Supplement 3, 1978.</I> National Biomedical Research Foundation, Washington, D.C. |
|---|
| 4239 | <P> |
|---|
| 4240 | DeBry, R. W. and N. A. Slade. 1985. Cladistic analysis of restriction |
|---|
| 4241 | endonuclease cleavage maps within a maximum-likelihood framework. |
|---|
| 4242 | <I>Systematic Zoology</I> <B>34:</B> 21-34. |
|---|
| 4243 | <P> |
|---|
| 4244 | Dempster, A. P., N. M. Laird, and D. B. Rubin. 1977. Maximum |
|---|
| 4245 | likelihood from incomplete data via the EM algorithm. <I>Journal of the Royal Statistical Society B</I> <B>39:</B> 1-38. |
|---|
| 4246 | <P> |
|---|
| 4247 | Eck, R. V., and M. O. Dayhoff. 1966. <I>Atlas of Protein Sequence and |
|---|
| 4248 | Structure 1966.</I> National Biomedical Research Foundation, Silver |
|---|
| 4249 | Spring, Maryland. |
|---|
| 4250 | <P> |
|---|
| 4251 | Edwards, A. W. F., and L. L. Cavalli-Sforza. 1964. Reconstruction of |
|---|
| 4252 | evolutionary trees. pp. 67-76 in <I>Phenetic and Phylogenetic |
|---|
| 4253 | Classification,</I> ed. V. H. Heywood and J. McNeill. Systematics |
|---|
| 4254 | Association Volume No. 6. Systematics Association, London. |
|---|
| 4255 | <P> |
|---|
| 4256 | Estabrook, G. F., C. S. Johnson, Jr., and F. R. McMorris. 1976a. A |
|---|
| 4257 | mathematical foundation for the analysis of character |
|---|
| 4258 | compatibility. <I>Mathematical Biosciences</I> <B>23:</B> 181-187. |
|---|
| 4259 | <P> |
|---|
| 4260 | Estabrook, G. F., C. S. Johnson, Jr., and F. R. McMorris. 1976b. An |
|---|
| 4261 | algebraic analysis of cladistic characters. <I>Discrete Mathematics</I> <B>16:</B> 141-147. |
|---|
| 4262 | <P> |
|---|
| 4263 | Estabrook, G. F., F. R. McMorris, and C. A. Meacham. 1985. Comparison of |
|---|
| 4264 | undirected phylogenetic trees based on subtrees of four evolutionary units. |
|---|
| 4265 | <I>Systematic Zoology</I> <B>34:</B> 193-200. |
|---|
| 4266 | <P> |
|---|
| 4267 | Faith, D. P. 1990. Chance marsupial relationships. <I>Nature</I> <B>345:</B> 393-394. |
|---|
| 4268 | <P> |
|---|
| 4269 | Faith, D. P. and P. S. Cranston. 1991. Could a cladogram this short have |
|---|
| 4270 | arisen by chance alone?: On permutation tests for cladistic |
|---|
| 4271 | structure. <I>Cladistics</I> <B>7:</B> 1-28. |
|---|
| 4272 | <P> |
|---|
| 4273 | Farris, J. S. 1977. Phylogenetic analysis under Dollo's Law. <I>Systematic Zoology</I> <B>26:</B> 77-88. |
|---|
| 4274 | <P> |
|---|
| 4275 | Farris, J. S. 1978a. Inferring phylogenetic trees from chromosome |
|---|
| 4276 | inversion data. <I>Systematic Zoology</I> <B>27:</B> 275-284. |
|---|
| 4277 | <P> |
|---|
| 4278 | Farris, J. S. 1981. Distance data in phylogenetic analysis. pp. 3-23 |
|---|
| 4279 | in <I>Advances in Cladistics: Proceedings of the first meeting of the |
|---|
| 4280 | Willi Hennig Society,</I> ed. V. A. Funk and D. R. Brooks. New York |
|---|
| 4281 | Botanical Garden, Bronx, New York. |
|---|
| 4282 | <P> |
|---|
| 4283 | Farris, J. S. 1983. The logical basis of phylogenetic analysis. pp. 1-47 |
|---|
| 4284 | in <I>Advances in Cladistics, Volume 2, Proceedings of the Second Meeting of |
|---|
| 4285 | the Willi Hennig Society,</I> ed. Norman I. Platnick and V. A. Funk. Columbia |
|---|
| 4286 | University Press, New York. |
|---|
| 4287 | <P> |
|---|
| 4288 | Farris, J. S. 1985. Distance data revisited. <I>Cladistics</I> <B>1:</B> 67-85. |
|---|
| 4289 | <P> |
|---|
| 4290 | Farris, J. S. 1986. Distances and statistics. <I>Cladistics</I> <B>2:</B> 144-157. |
|---|
| 4291 | <P> |
|---|
| 4292 | Farris, J. S. ["T. N. Nayenizgani"]. 1990. The Systematics Association |
|---|
| 4293 | enters its golden years (review of <I>Prospects in Systematics</I>, ed. D. |
|---|
| 4294 | Hawksworth). <I>Cladistics</I> <B>6:</B> 307-314. |
|---|
| 4295 | <P> |
|---|
| 4296 | Felsenstein, J. 1973a. Maximum likelihood and minimum-steps methods |
|---|
| 4297 | for estimating evolutionary trees from data on discrete characters. |
|---|
| 4298 | <I>Systematic Zoology</I> <B>22:</B> 240-249. |
|---|
| 4299 | <P> |
|---|
| 4300 | Felsenstein, J. 1973b. Maximum-likelihood estimation of evolutionary |
|---|
| 4301 | trees from continuous characters. <I>American Journal of Human Genetics</I> <B>25:</B> |
|---|
| 4302 | 471-492. |
|---|
| 4303 | <P> |
|---|
| 4304 | Felsenstein, J. 1978a. The number of evolutionary trees. <I>Systematic Zoology</I> <B>27:</B> 27-33. |
|---|
| 4305 | <P> |
|---|
| 4306 | Felsenstein, J. 1978b. Cases in which parsimony and compatibility |
|---|
| 4307 | methods will be positively misleading. <I>Systematic Zoology</I> <B>27:</B> |
|---|
| 4308 | 401-410. |
|---|
| 4309 | <P> |
|---|
| 4310 | Felsenstein, J. 1979. Alternative methods of phylogenetic inference |
|---|
| 4311 | and their interrelationship. <I>Systematic Zoology</I> <B>28:</B> 49-62. |
|---|
| 4312 | <P> |
|---|
| 4313 | Felsenstein, J. 1981a. Evolutionary trees from DNA sequences: a |
|---|
| 4314 | maximum likelihood approach. <I>Journal of Molecular Evolution</I> <B>17:</B> 368-376. |
|---|
| 4315 | <P> |
|---|
| 4316 | Felsenstein, J. 1981b. A likelihood approach to character weighting |
|---|
| 4317 | and what it tells us about parsimony and compatibility. <I>Biological Journal of the Linnean Society</I> <B>16:</B> 183-196. |
|---|
| 4318 | <P> |
|---|
| 4319 | Felsenstein, J. 1981c. Evolutionary trees from gene frequencies and |
|---|
| 4320 | quantitative characters: finding maximum likelihood estimates. |
|---|
| 4321 | <I>Evolution</I> <B>35:</B> 1229-1242. |
|---|
| 4322 | <P> |
|---|
| 4323 | Felsenstein, J. 1982. Numerical methods for inferring evolutionary |
|---|
| 4324 | trees. <I>Quarterly Review of Biology</I> <B>57:</B> 379-404. |
|---|
| 4325 | <P> |
|---|
| 4326 | Felsenstein, J. 1983b. Parsimony in systematics: biological and |
|---|
| 4327 | statistical issues. <I>Annual Review of Ecology and Systematics</I> <B>14:</B> 313-333. |
|---|
| 4328 | <P> |
|---|
| 4329 | Felsenstein, J. 1984a. Distance methods for inferring phylogenies: a |
|---|
| 4330 | justification. <I>Evolution</I> <B>38:</B> 16-24. |
|---|
| 4331 | <P> |
|---|
| 4332 | Felsenstein, J. 1984b. The statistical approach to inferring |
|---|
| 4333 | evolutionary trees and what it tells us about parsimony and |
|---|
| 4334 | compatibility. pp. 169-191 in: <I>Cladistics: Perspectives in the |
|---|
| 4335 | Reconstruction of Evolutionary History,</I> edited by T. Duncan and T. F. |
|---|
| 4336 | Stuessy. Columbia University Press, New York. |
|---|
| 4337 | <P> |
|---|
| 4338 | Felsenstein, J. 1985a. Confidence limits on phylogenies with a molecular |
|---|
| 4339 | clock. <I>Systematic Zoology</I> <B>34:</B> 152-161. |
|---|
| 4340 | <P> |
|---|
| 4341 | Felsenstein, J. 1985b. Confidence limits on phylogenies: an approach |
|---|
| 4342 | using the bootstrap. <I>Evolution</I> <B>39:</B> 783-791. |
|---|
| 4343 | <P> |
|---|
| 4344 | Felsenstein, J. 1985c. Phylogenies from gene frequencies: a statistical |
|---|
| 4345 | problem. <I>Systematic Zoology</I> <B>34:</B> 300-311. |
|---|
| 4346 | <P> |
|---|
| 4347 | Felsenstein, J. 1985d. Phylogenies and the comparative method. <I>American Naturalist</I> <B>125:</B> 1-15. |
|---|
| 4348 | <P> |
|---|
| 4349 | Felsenstein, J. 1986. Distance methods: a reply to Farris. <I>Cladistics</I> <B>2:</B> |
|---|
| 4350 | 130-144. |
|---|
| 4351 | <P> |
|---|
| 4352 | Felsenstein, J. and E. Sober. 1986. Parsimony and likelihood: an |
|---|
| 4353 | exchange. <I>Systematic Zoology</I> <B>35:</B> 617-626. |
|---|
| 4354 | <P> |
|---|
| 4355 | Felsenstein, J. 1988a. Phylogenies and quantitative characters. <I>Annual Review of Ecology and Systematics</I> <B>19:</B> 445-471. |
|---|
| 4356 | <P> |
|---|
| 4357 | Felsenstein, J. 1988b. Phylogenies from molecular sequences: inference and |
|---|
| 4358 | reliability. <I>Annual Review of Genetics</I> <B>22:</B> 521-565. |
|---|
| 4359 | <P> |
|---|
| 4360 | Felsenstein, J. 1992. Phylogenies from restriction sites, a |
|---|
| 4361 | maximum likelihood approach. <I>Evolution</I> <B>46:</B> 159-173. |
|---|
| 4362 | <P> |
|---|
| 4363 | Felsenstein, J. and G. A. Churchill. 1996. |
|---|
| 4364 | A hidden Markov model approach to variation among sites in rate of evolution. |
|---|
| 4365 | <I>Molecular Biology and Evolution</I> <B>13:</B> 93-104. |
|---|
| 4366 | <P> |
|---|
| 4367 | Fink, W. L. 1986. Microcomputers and phylogenetic analysis. <I>Science</I> <B>234:</B> 1135-1139. |
|---|
| 4368 | <P> |
|---|
| 4369 | Fitch, W. M., and E. Markowitz. 1970. An improved method for determining |
|---|
| 4370 | codon variability in a gene and its application to the rate of fixation of |
|---|
| 4371 | mutations in evolution. <I>Biochemical Genetics</I> <B>4:</B> 579-593. |
|---|
| 4372 | <P> |
|---|
| 4373 | Fitch, W. M., and E. Margoliash. 1967. Construction of phylogenetic |
|---|
| 4374 | trees. <I>Science</I> <B>155:</B> 279-284. |
|---|
| 4375 | <P> |
|---|
| 4376 | Fitch, W. M. 1971. Toward defining the course of evolution: minimum |
|---|
| 4377 | change for a specified tree topology. <I>Systematic Zoology</I> <B>20:</B> 406-416. |
|---|
| 4378 | <P> |
|---|
| 4379 | Fitch, W. M. 1975. Toward finding the tree of maximum parsimony. pp. 189-230 |
|---|
| 4380 | in <I>Proceedings of the Eighth International Conference on Numerical Taxonomy,</I> |
|---|
| 4381 | ed. G. F. Estabrook. W. H. Freeman, San Francisco. |
|---|
| 4386 | <P> |
|---|
| 4387 | George, D. G., L. T. Hunt, and W. C. Barker. 1988. Current methods in |
|---|
| 4388 | sequence comparison and analysis. pp. 127-149 in <I>Macromolecular Sequencing |
|---|
| 4389 | and Synthesis,</I> ed. D. H. Schlesinger. Alan R. Liss, New York. |
|---|
| 4390 | <P> |
|---|
| 4391 | Gomberg, D. 1966. "Bayesian" post-diction in an evolution process. |
|---|
| 4392 | Unpublished manuscript: University of Pavia, Italy. |
|---|
| 4393 | <P> |
|---|
| 4394 | Graham, R. L., and L. R. Foulds. 1982. Unlikelihood that minimal |
|---|
| 4395 | phylogenies for a realistic biological study can be constructed in |
|---|
| 4396 | reasonable computational time. <I>Mathematical Biosciences</I> <B>60:</B> 133-142. |
|---|
| 4397 | <P> |
|---|
| 4398 | Hasegawa, M. and T. Yano. 1984a. Maximum likelihood method of phylogenetic |
|---|
| 4399 | inference from DNA sequence data. <I>Bulletin of the Biometric Society of Japan</I> No. 5: 1-7. |
|---|
| 4400 | <P> |
|---|
| 4401 | Hasegawa, M. and T. Yano. 1984b. Phylogeny and classification of |
|---|
| 4402 | Hominoidea as inferred from DNA sequence data. <I>Proceedings of the Japan Academy</I> <B>60 B:</B> 389-392. |
|---|
| 4403 | <P> |
|---|
| 4404 | Hasegawa, M., Y. Iida, T. Yano, F. Takaiwa, and M. Iwabuchi. 1985a. |
|---|
| 4405 | Phylogenetic relationships among eukaryotic kingdoms as inferred from |
|---|
| 4406 | ribosomal RNA sequences. <I>Journal of Molecular Evolution</I> <B>22:</B> 32-38. |
|---|
| 4407 | <P> |
|---|
| 4408 | Hasegawa, M., H. Kishino, and T. Yano. 1985b. Dating of the human-ape |
|---|
| 4409 | splitting by a molecular clock of mitochondrial DNA. <I>Journal of Molecular |
|---|
| 4410 | Evolution</I> <B>22:</B> 160-174. |
|---|
| 4411 | <P> |
|---|
| 4412 | Hendy, M. D., and D. Penny. 1982. Branch and bound algorithms to |
|---|
| 4413 | determine minimal evolutionary trees. <I>Mathematical Biosciences</I> <B>59:</B> 277-290. |
|---|
| 4414 | <P> |
|---|
| 4415 | Higgins, D. G. and P. M. Sharp. 1989. Fast and sensitive |
|---|
| 4416 | multiple sequence alignments on a microcomputer. <I>Computer Applications in the Biological Sciences (CABIOS)</I> <B>5:</B> 151-153. |
|---|
| 4417 | <P> |
|---|
| 4418 | Hochbaum, D. S. and A. Pathria. 1997. Path costs in evolutionary |
|---|
| 4419 | tree reconstruction. <I>Journal of Computational Biology</I> <B>4:</B> 163-175. |
|---|
| 4420 | <P> |
|---|
| 4421 | Holmquist, R., M. M. Miyamoto, and M. Goodman. 1988. Higher-primate |
|---|
| 4422 | phylogeny - why can't we decide? <I>Molecular Biology and Evolution</I> <B>5:</B> 201-216. |
|---|
| 4423 | <P> |
|---|
| 4424 | Inger, R. F. 1967. The development of a phylogeny of frogs. |
|---|
| 4425 | <I>Evolution</I> <B>21:</B> 369-384. |
|---|
| 4426 | <P> |
|---|
| 4427 | Jin, L. and M. Nei. 1990. Limitations of the evolutionary parsimony method |
|---|
| 4428 | of phylogenetic analysis. <I>Molecular Biology and Evolution</I> <B>7:</B> 82-102. |
|---|
| 4429 | <P> |
|---|
| 4430 | Jones, D. T., W. R. Taylor and J. M. Thornton. 1992. The rapid generation of |
|---|
| 4431 | mutation data matrices from protein sequences. <I>Computer Applications |
|---|
| 4432 | in the Biosciences (CABIOS)</I> <B>8:</B> 275-282. |
|---|
| 4433 | <P> |
|---|
| 4434 | Jukes, T. H. and C. R. Cantor. 1969. Evolution of protein molecules. pp. |
|---|
| 4435 | 21-132 in <I>Mammalian Protein Metabolism,</I> ed. H. N. Munro. Academic Press, New |
|---|
| 4436 | York. |
|---|
| 4437 | <P> |
|---|
| 4438 | Kidd, K. K. and L. A. Sgaramella-Zonta. 1971. Phylogenetic analysis: concepts |
|---|
| 4439 | and methods. <I>American Journal of Human Genetics</I> <B>23:</B> 235-252. |
|---|
| 4440 | <P> |
|---|
| 4441 | Kim, J. and M. A. Burgman. 1988. Accuracy of phylogenetic-estimation |
|---|
| 4442 | methods using simulated allele-frequency data. <I>Evolution</I> <B>42:</B> 596-602. |
|---|
| 4443 | <P> |
|---|
| 4444 | Kimura, M. 1980. A simple model for estimating evolutionary rates of base |
|---|
| 4445 | substitutions through comparative studies of nucleotide sequences. <I>Journal of Molecular Evolution</I> <B>16:</B> 111-120. |
|---|
| 4446 | <P> |
|---|
| 4447 | Kimura, M. 1983. <I>The Neutral Theory of Molecular Evolution.</I> Cambridge |
|---|
| 4448 | University Press, Cambridge. |
|---|
| 4449 | <P> |
|---|
| 4450 | Kingman, J. F. C. 1982a. The coalescent. <I>Stochastic Processes and Their Applications</I> <B>13:</B> 235-248. |
|---|
| 4451 | <P> |
|---|
| 4452 | Kingman, J. F. C. 1982b. On the genealogy of large populations. <I>Journal of Applied Probability</I> <B>19A:</B> 27-43. |
|---|
| 4453 | <P> |
|---|
| 4454 | Kishino, H. and M. Hasegawa. 1989. Evaluation of the maximum likelihood |
|---|
| 4455 | estimate of the evolutionary tree topologies from DNA sequence data, and the |
|---|
| 4456 | branching order in Hominoidea. <I>Journal of Molecular Evolution</I> <B>29:</B> 170-179. |
|---|
| 4457 | <P> |
|---|
| 4458 | Kluge, A. G., and J. S. Farris. 1969. Quantitative phyletics and the |
|---|
| 4459 | evolution of anurans. <I>Systematic Zoology</I> <B>18:</B> 1-32. |
|---|
| 4460 | <P> |
|---|
| 4461 | Kuhner, M. K. and J. Felsenstein. 1994. A simulation comparison of |
|---|
| 4462 | phylogeny algorithms under equal and unequal evolutionary rates. |
|---|
| 4463 | <I>Molecular Biology and Evolution</I> <B>11:</B> 459-468 (Erratum <B>12:</B> 525 1995). |
|---|
| 4464 | <P> |
|---|
| 4465 | Künsch, H. R. 1989. The jackknife and the bootstrap for general stationary |
|---|
| 4466 | observations. <I>Annals of Statistics</I> <B>17:</B> 1217-1241. |
|---|
| 4467 | <P> |
|---|
| 4468 | Lake, J. A. 1987. A rate-independent technique for analysis of nucleic acid |
|---|
| 4469 | sequences: evolutionary parsimony. <I>Molecular Biology and Evolution</I> <B>4:</B> 167-191. |
|---|
| 4470 | <P> |
|---|
| 4471 | Lake, J. A. 1994. Reconstructing evolutionary trees from DNA and protein |
|---|
| 4472 | sequences: paralinear distances. |
|---|
| 4473 | <I>Proceedings of the National Academy of Sciences, USA</I> <B>91:</B> 1455-1459. |
|---|
| 4474 | <P> |
|---|
| 4475 | Le Quesne, W. J. 1969. A method of selection of characters in |
|---|
| 4476 | numerical taxonomy. <I>Systematic Zoology</I> <B>18:</B> 201-205. |
|---|
| 4477 | <P> |
|---|
| 4478 | Le Quesne, W. J. 1974. The uniquely evolved character concept and its |
|---|
| 4479 | cladistic application. <I>Systematic Zoology</I> <B>23:</B> 513-517. |
|---|
| 4480 | <P> |
|---|
| 4481 | Lewis, H. R., and C. H. Papadimitriou. 1978. The efficiency of |
|---|
| 4482 | algorithms. <I>Scientific American</I> <B>238:</B> 96-109 (January issue). |
|---|
| 4483 | <P> |
|---|
| 4484 | Lockhart, P. J., M. A. Steel, M. D. Hendy, and D. Penny. 1994. |
|---|
| 4485 | Recovering evolutionary trees under a more realistic model of sequence |
|---|
| 4486 | evolution. <I>Molecular Biology and Evolution</I> <B>11:</B> 605-612. |
|---|
| 4487 | <P> |
|---|
| 4488 | López-Martínez, N., M. A. Álvarez-Sierra, |
|---|
| 4489 | and E. García Moreno. 1986. Paleontología y |
|---|
| 4490 | Bioestratigrafía |
|---|
| 4491 | (Micromamíferos) del Mioceno medio-superior del Sector Central de |
|---|
| 4492 | la Cuenca del Duero. <I>Stvdia Geologica Salmanticensia</I> |
|---|
| 4493 | <B>22:</B> 146-191. |
|---|
| 4494 | <P> |
|---|
| 4495 | Luckow, M. and R. A. Pimentel. 1985. An empirical comparison of |
|---|
| 4496 | numerical Wagner computer programs. <I>Cladistics</I> <B>1:</B> 47-66. |
|---|
| 4497 | <P> |
|---|
| 4498 | Lynch, M. 1990. Methods for the analysis of comparative data in evolutionary |
|---|
| 4499 | biology. <I>Evolution</I> <B>45:</B> 1065-1080. |
|---|
| 4500 | <P> |
|---|
| 4501 | Maddison, D. R. 1991. The discovery and importance of multiple islands of |
|---|
| 4502 | most-parsimonious trees. <I>Systematic Zoology</I> <B>40:</B> 315-328. |
|---|
| 4503 | <P> |
|---|
| 4504 | Margush, T. and F. R. McMorris. 1981. Consensus n-trees. <I>Bulletin of Mathematical Biology</I> <B>43:</B> 239-244. |
|---|
| 4505 | <P> |
|---|
| 4506 | Nelson, G. 1979. Cladistic analysis and synthesis: principles and definitions, |
|---|
| 4507 | with a historical note on Adanson's <I>Familles des Plantes</I> |
|---|
| 4508 | (1763-1764). <I>Systematic Zoology</I> <B>28:</B> 1-21. |
|---|
| 4509 | <P> |
|---|
| 4510 | Nei, M. 1972. Genetic distance between populations. <I>American Naturalist</I> <B>106:</B> 283-292. |
|---|
| 4511 | <P> |
|---|
| 4512 | Nei, M. and W.-H. Li. 1979. Mathematical model for studying genetic variation |
|---|
| 4513 | in terms of restriction endonucleases. <I>Proceedings of the National Academy of Sciences, USA</I> <B>76:</B> 5269-5273. |
|---|
| 4514 | <P> |
|---|
| 4515 | Page, R. D. M. 1989. Comments on component-compatibility in historical |
|---|
| 4516 | biogeography. <I>Cladistics</I> <B>5:</B> 167-182. |
|---|
| 4517 | <P> |
|---|
| 4518 | Penny, D. and M. D. Hendy. 1985. Testing methods of evolutionary tree |
|---|
| 4519 | construction. <I>Cladistics</I> <B>1:</B> 266-278. |
|---|
| 4520 | <P> |
|---|
| 4521 | Platnick, N. 1987. An empirical comparison of microcomputer parsimony |
|---|
| 4522 | programs. <I>Cladistics</I> <B>3:</B> 121-144. |
|---|
| 4523 | <P> |
|---|
| 4524 | Platnick, N. 1989. An empirical comparison of microcomputer parsimony |
|---|
| 4525 | programs. II. <I>Cladistics</I> <B>5:</B> 145-161. |
|---|
| 4526 | <P> |
|---|
| 4527 | Reynolds, J. B., B. S. Weir, and C. C. Cockerham. 1983. Estimation of the |
|---|
| 4528 | coancestry coefficient: basis for a short-term genetic |
|---|
| 4529 | distance. <I>Genetics</I> <B>105:</B> 767-779. |
|---|
| 4530 | <P> |
|---|
| 4531 | Robinson, D. F. and L. R. Foulds. 1981. Comparison of phylogenetic trees. |
|---|
| 4532 | <I>Mathematical Biosciences</I> <B>53:</B> 131-147. |
|---|
| 4533 | <P> |
|---|
| 4534 | Rohlf, F. J. and M. C. Wooten. 1988. Evaluation of the restricted maximum |
|---|
| 4535 | likelihood method for estimating phylogenetic trees using simulated |
|---|
| 4536 | allele-frequency data. <I>Evolution</I> <B>42:</B> 581-595. |
|---|
| 4537 | <P> |
|---|
| 4538 | Rzhetsky, A., and M. Nei. 1992. Statistical properties of the ordinary |
|---|
| 4539 | least-squares, generalized least-squares, and minimum-evolution methods |
|---|
| 4540 | of phylogenetic inference. <I>Journal of Molecular Evolution</I> <B>35:</B> |
|---|
| 4541 | 367-375. |
|---|
| 4542 | <P> |
|---|
| 4543 | Saitou, N., and M. Nei. 1987. The neighbor-joining method: a new method for |
|---|
| 4544 | reconstructing phylogenetic trees. <I>Molecular Biology and Evolution</I> <B>4:</B> 406-425. |
|---|
| 4545 | <P> |
|---|
| 4546 | Sanderson, M. J. 1990. Flexible phylogeny reconstruction: a review of |
|---|
| 4547 | phylogenetic inference packages using parsimony. <I>Systematic Zoology</I> <B>39:</B> 414-420. |
|---|
| 4548 | <P> |
|---|
| 4549 | Sankoff, D. D., C. Morel, and R. J. Cedergren. 1973. Evolution of 5S RNA and |
|---|
| 4550 | the nonrandomness of base replacement. <I>Nature New Biology</I> <B>245:</B> 232-234. |
|---|
| 4551 | <P> |
|---|
| 4552 | Shimodaira, H. and M. Hasegawa. 1999. Multiple comparisons of log-likelihoods |
|---|
| 4553 | with applications to phylogenetic inference. <I>Molecular Biology and |
|---|
| 4554 | Evolution</I> <B>16:</B> 1114-1116. |
|---|
| 4558 | <P> |
|---|
| 4559 | Smouse, P. E. and W.-H. Li. 1987. Likelihood analysis of mitochondrial |
|---|
| 4560 | restriction-cleavage patterns for the human-chimpanzee-gorilla trichotomy. |
|---|
| 4561 | <I>Evolution</I> <B>41:</B> 1162-1176. |
|---|
| 4562 | <P> |
|---|
| 4563 | Sober, E. 1983a. Parsimony in systematics: philosophical issues. <I>Annual Review of Ecology and Systematics</I> <B>14:</B> 335-357. |
|---|
| 4564 | <P> |
|---|
| 4565 | Sober, E. 1983b. A likelihood justification of parsimony. <I>Cladistics</I> <B>1:</B> 209-233. |
|---|
| 4566 | <P> |
|---|
| 4567 | Sober, E. 1988. <I>Reconstructing the Past: Parsimony, Evolution, |
|---|
| 4568 | and Inference.</I> MIT Press, Cambridge, Massachusetts. |
|---|
| 4569 | <P> |
|---|
| 4570 | Sokal, R. R., and P. H. A. Sneath. 1963. <I>Principles of Numerical |
|---|
| 4571 | Taxonomy.</I> W. H. Freeman, San Francisco. |
|---|
| 4572 | <P> |
|---|
| 4573 | Steel, M. A. 1994. Recovering a tree from the leaf colourations |
|---|
| 4574 | it generates under a Markov model. <I>Applied Mathematics Letters</I> |
|---|
| 4575 | <B>7:</B> 19-23. |
|---|
| 4576 | <P> |
|---|
| 4577 | Studier, J. A. and K. J. Keppler. 1988. A note on the neighbor-joining |
|---|
| 4578 | algorithm of Saitou and Nei. <I>Molecular Biology and Evolution</I> <B>5:</B> 729-731. |
|---|
| 4579 | <P> |
|---|
| 4580 | Swofford, D. L. and G. J. Olsen. 1990. Phylogeny reconstruction. Chapter |
|---|
| 4581 | 11, pages 411-501 in <I>Molecular Systematics,</I> ed. D. M. Hillis and C. Moritz. |
|---|
| 4582 | Sinauer Associates, Sunderland, Massachusetts. |
|---|
| 4583 | <P> |
|---|
| 4584 | Swofford, D. L., G. J. Olsen, P. J. Waddell, and D. M. Hillis. 1996. |
|---|
| 4585 | Phylogenetic inference. pp. 407-514 in <I>Molecular Systematics</I>, 2nd ed., |
|---|
| 4586 | ed. D. M. Hillis, C. Moritz, and B. K. Mable. Sinauer Associates, Sunderland, |
|---|
| 4587 | Massachusetts. |
|---|
| 4588 | <P> |
|---|
| 4589 | Templeton, A. R. 1983. Phylogenetic inference from restriction endonuclease |
|---|
| 4590 | cleavage site maps with particular reference to the evolution of humans and the |
|---|
| 4591 | apes. <I>Evolution</I> <B>37:</B> 221-244. |
|---|
| 4592 | <P> |
|---|
| 4593 | Thompson, E. A. 1975. <I>Human Evolutionary Trees.</I> Cambridge University |
|---|
| 4594 | Press, Cambridge. |
|---|
| 4595 | <P> |
|---|
| 4596 | Wu, C. F. J. 1986. Jackknife, bootstrap and other resampling methods in |
|---|
| 4597 | regression analysis. <I>Annals of Statistics</I> <B>14:</B> 1261-1295. |
|---|
| 4598 | <P> |
|---|
| 4599 | Yang, Z. 1993. Maximum-likelihood estimation of phylogeny from DNA sequences |
|---|
| 4600 | when substitution rates differ over sites. <I>Molecular Biology and |
|---|
| 4601 | Evolution</I> <B>10:</B> 1396-1401. |
|---|
| 4602 | <P> |
|---|
| 4603 | Yang, Z. 1994. Maximum likelihood phylogenetic estimation from DNA sequences |
|---|
| 4604 | with variable rates over sites: approximate methods. <I>Journal of Molecular |
|---|
| 4605 | Evolution</I> <B>39:</B> 306-314. |
|---|
| 4606 | <P> |
|---|
| 4607 | Yang, Z. 1995. A space-time process model for the evolution of DNA sequences. |
|---|
| 4608 | <I>Genetics</I> <B>139:</B> 993-1005. |
|---|
| 4609 | <P> |
|---|
<DIV ALIGN="CENTER">
<H2>Credits</H2></DIV>
<P>
Over the years, various granting agencies have supported the PHYLIP
project (at first without knowing it). They are:
<P>
<TABLE CELLPADDING=3 BORDER="1">
<TR><TH ALIGN="LEFT">Years</TH>
<TH ALIGN="LEFT">Agency</TH>
<TH ALIGN="LEFT">Grant or Contract Number</TH>
</TR>
<TR><TD ALIGN="LEFT">1999-2002</TD>
<TD ALIGN="LEFT">National Science Foundation</TD>
<TD ALIGN="LEFT">BIR-9527687</TD>
</TR>
<TR><TD ALIGN="LEFT">1999-2002</TD>
<TD ALIGN="LEFT">NIH NIGMS</TD>
<TD ALIGN="LEFT">R01 GM51929-04</TD>
</TR>
<TR><TD ALIGN="LEFT">1999-2001</TD>
<TD ALIGN="LEFT">NIH NIMH</TD>
<TD ALIGN="LEFT">R01 HG01989-01</TD>
</TR>
<TR><TD ALIGN="LEFT">1995-1999</TD>
<TD ALIGN="LEFT">NIH NIGMS</TD>
<TD ALIGN="LEFT">R01 GM51929-01</TD>
</TR>
<TR><TD ALIGN="LEFT">1992-1995</TD>
<TD ALIGN="LEFT">National Science Foundation</TD>
<TD ALIGN="LEFT">DEB-9207558</TD>
</TR>
<TR><TD ALIGN="LEFT">1992-1994</TD>
<TD ALIGN="LEFT">NIH NIGMS Shannon Award</TD>
<TD ALIGN="LEFT">2 R55 GM41716-04</TD>
</TR>
<TR><TD ALIGN="LEFT">1989-1992</TD>
<TD ALIGN="LEFT">NIH NIGMS</TD>
<TD ALIGN="LEFT">1 R01-GM41716-01</TD>
</TR>
<TR><TD ALIGN="LEFT">1990-1992</TD>
<TD ALIGN="LEFT">National Science Foundation</TD>
<TD ALIGN="LEFT">BSR-8918333</TD>
</TR>
<TR><TD ALIGN="LEFT">1987-1990</TD>
<TD ALIGN="LEFT">National Science Foundation</TD>
<TD ALIGN="LEFT">BSR-8614807</TD>
</TR>
<TR><TD ALIGN="LEFT">1979-1987</TD>
<TD ALIGN="LEFT">U.S. Department of Energy</TD>
<TD ALIGN="LEFT">DE-AM06-76RLO2225 TA DE-AT06-76EV71005</TD>
</TR>
</TABLE>
<P>
I am particularly grateful to program administrators William Moore,
Irene Eckstrand, Peter Arzberger, and Conrad Istock, who have
gone beyond the call of duty to make sure that PHYLIP continued.
<P>
Booby prizes for funding are awarded to:
<UL><LI>The people at the U.S. Department of Energy who, in 1987, decided they
were "not interested in phylogenies".
<LI>The members of the Systematics Panel of NSF who twice (in 1989 and 1992)
positively recommended that my applications <I>not</I> be funded. I am very
grateful to program director William Moore for courageously overruling
their decision the first time. The 1992 NSF Systematics Panel can claim
no credit for PHYLIP whatsoever.
<LI>The members of the 1992 Genetics Study Section of NIH who rated my
proposal in the 53rd percentile (I don't know if that's 53rd from
the top or the bottom, but does it matter?), thus denying it funding. I am,
however, grateful to the NIGMS administrators, especially Irene Eckstrand,
who supported giving me a "Shannon award" that partially funded my work for
a time in spite of this rating.
</UL>
<P>
The original Camin-Sokal parsimony program and the polymorphism parsimony
program were written by me in 1977 and 1978. They were Pascal versions of
earlier FORTRAN programs I wrote in 1966 and 1967 using the same algorithm to
infer phylogenies under the Camin-Sokal and polymorphism parsimony
criteria. Harvey Motulsky worked for me as a programmer in 1971 and wrote
FORTRAN programs to carry out the Camin-Sokal, Dollo, and polymorphism
methods (he is known these days as the author of the scientific
graphing package GraphPad). But most of the early work on PHYLIP other than
my own was by Jerry Shurman and Mark Moehring. Jerry Shurman worked for me in
the summers of 1979 and 1980, and Mark Moehring worked for me in the summers
of 1980 and 1981. Both wrote original versions of many of the other programs,
based on the original versions of my Camin-Sokal parsimony program and POLYM.
These formed the basis of Version 1 of the Package, first distributed in
October, 1980.
<P>
Version 2, released in the spring of 1982, involved a fairly complete rewrite
by me of many of those programs. For version 3.3, Hisashi Horino reworked
parts of the programs CLIQUE and CONSENSE to make their output more
comprehensible, and added some code to the tree-drawing programs DRAWGRAM and
DRAWTREE, including some of their driver code.
<P>
My more recent part-time programmers Akiko Fuseki, Sean Lamont,
Andrew Keeffe, Daniel Yek, Dan Fineman, Patrick Colacurcio,
Mike Palczewski, and Doug Buxton gave
me substantial help with the current release, and their excellent work is
greatly appreciated. Akiko in particular did much of the hard work of adding
new features and changing old ones in the 3.4 and 3.5 releases,
centralized many of the C routines in support files, and is responsible for
the new versions of DNAPARS and PARS. Andrew prepared the Macintosh version,
wrote RETREE, added the ray-tracing and PICT code to the DRAW programs, and
has since done much other work. Sean was central to the conversion to C, and
tested it extensively. My postdoctoral fellow Mary Kuhner and her associate
Jon Yamato created NEIGHBOR, the neighbor-joining and UPGMA program, for the
current release, and I am grateful to them as well (Naruya Saitou and Li Jin
kindly encouraged us to use some of the code from their own implementation of
this method).
<P>
I am very grateful to over 200 users for algorithmic suggestions, complaints
about features (or lack of features), and information about the behavior of
their operating systems and compilers. A list of some of their names can be
found on the credits page of the PHYLIP web site.
<P>
Others have made a major contribution to this package by writing programs or
parts of programs. Chris Meacham contributed the
important program FACTOR, long demanded by users, and the even more
important ones PLOTREE and PLOTGRAM. Important parts of the code in
DRAWGRAM and DRAWTREE were taken over from those two programs.
Kent Fiala wrote the function "reroot" to do outgroup-rooting, which was an
essential part of many programs in earlier versions. Someone at the Western
Australia Institute of Technology suggested the name PHYLIP (by writing it on
the label on the outside of a magnetic tape), but they all seem to deny having
done so (and I've lost the relevant letter).
<P>
The distribution of the package also owes much to Buz Wilson and Willem Ellis,
who put a lot of effort into the early distributions of the PCDOS and
Macintosh versions respectively. For three versions, Christopher Meacham and
Tom Duncan distributed a printed version of these documentation files (they
are no longer able to do so), and I am very grateful to them for those
efforts. William H.E. Day and F. James Rohlf were very helpful in setting up
the listserver news bulletin service that, for a time, succeeded the PHYLIP
newsletter.
<P>
I also wish to thank the people who have made computer resources available to
me, mostly by lending me the use of microcomputers. These include Jeremy
Field, Clem Furlong, Rick Garber, Dan Jacobson, Rochelle Kochin, Monty Slatkin,
Jim Archie, Jim Thomas, and George Gilchrist.
<P>
I should also note the computers used to develop this package. These include
a CDC 6400, two DECSystem 1090s, my trusty old SOL-20, my old Osborne-1, a
VAX 11/780, a VAX 8600, a MicroVAX I, a DECstation 3100, my old Toshiba 1100+,
my DECstation 5000/200, a DECstation 5000/125, a Compudyne 486DX/33, a
Trinity Genesis 386SX, a Zenith Z386, a Mac Classic, a DEC Alphastation 400
4/233, a Pentium 120, a Pentium 200, a PowerMac 6100, and a Macintosh G3.
(One reason we have been successful in achieving compatibility between
different computer systems is that I have had to run the programs myself
under so many different operating systems and compilers.)
<P>
<A NAME="otherprograms"><HR><P></A>
<DIV ALIGN="CENTER">
<H2>Other Phylogeny Programs Available Elsewhere</H2></DIV>
<P>
A comprehensive list of phylogeny programs is maintained at the PHYLIP
web site on the Phylogeny Programs pages:
<P>
<DIV ALIGN="CENTER">
<FONT SIZE=+2><A HREF="http://evolution.gs.washington.edu/phylip/software.html">
<TT>http://evolution.gs.washington.edu/phylip/software.html</TT></A></FONT></DIV>
<P>
Here we simply mention some of the major general-purpose programs. For
many more, and much more about them, see those web pages.
<P>
<B>PAUP*</B> A comprehensive program with parsimony, likelihood, and
distance matrix methods. It competes with PHYLIP to be responsible for
the most trees published. It was written by David Swofford and is distributed
by Sinauer Associates of Sunderland, Massachusetts.
It is described on web pages for
<A HREF="http://www.sinauer.com/detail.php?id=8060">the Macintosh version</A>,
<A HREF="http://www.sinauer.com/detail.php?id=8079">the Windows version</A>,
and
<A HREF="http://www.sinauer.com/detail.php?id=8044">the Unix/OpenVMS version</A>.
Current prices are $100 for the Macintosh version, $85 for the
Windows version, and $150 for Unix versions for many kinds of workstations.
<P>
<B>MacClade</B> An interactive Macintosh and PowerMac program to
rearrange trees and watch the changes in the fit of the trees to
data as judged by parsimony. MacClade has a great many features, including
a spreadsheet data editor and many different descriptive statistics
for different kinds of data. It is particularly designed to export data to
and import data from PAUP*.
MacClade is available for $100 from Sinauer Associates, of Sunderland,
Massachusetts. It is described in a web page at
<A HREF="http://www.sinauer.com/detail.php?id=4707">
<TT>http://www.sinauer.com/detail.php?id=4707</TT></A>.
MacClade is also described on its <A HREF="http://phylogeny.arizona.edu/macclade/macclade.html">
web page</A>, at <CODE>http://phylogeny.arizona.edu/macclade/macclade.html</CODE>.
<P>
<B>MEGA</B> A Windows and DOS program by Sudhir Kumar of Arizona State University
(written together with Koichiro Tamura and Masatoshi Nei while he was a
student in Nei's lab at Pennsylvania State University). It can carry out
parsimony and distance matrix methods for DNA sequence data. Version 2.1 for
Windows can be downloaded from <A HREF="http://www.megasoftware.net">
the MEGA web site</A>
at <TT>http://www.megasoftware.net</TT>.
<P>
<B>PAML</B> Ziheng Yang of the Department of Genetics and Biometry at
University College, London has written this package of programs to
carry out likelihood analysis of DNA and protein sequence data. PAML is
particularly strong in the options for coping with variability of rates
of evolution from site to site, though it is less able than some other
packages to search effectively for the best tree. It is available as
C source code and as PowerMac and Windows executables from its web site at
<A HREF="http://abacus.gene.ucl.ac.uk/software/paml.html">
<TT>http://abacus.gene.ucl.ac.uk/software/paml.html</TT></A>.
<P>
<B>TREE-PUZZLE</B> This package by Korbinian Strimmer and Arndt von Haeseler
was begun when they were at the Universität München in Germany.
TREE-PUZZLE can carry out likelihood methods for DNA and protein data,
searching by the strategy of "quartet puzzling", which they invented: it
superimposes trees estimated from many quartets of species. It can also
compute distances. TREE-PUZZLE is available for Unix, Macintoshes,
or Windows from their web site at
<A HREF="http://www.tree-puzzle.de/"><TT>http://www.tree-puzzle.de/</TT></A>.
<P>
<B>DAMBE</B> A package written by Xuhua Xia, then of the
Department of Ecology and Biodiversity of the University of Hong Kong.
Its initials stand for Data Analysis in Molecular Biology and Evolution.
DAMBE is a general-purpose package for DNA and protein sequence phylogenies.
It can read and convert a number of file formats, has many features for
descriptive statistics, can compute a number of commonly used
distance matrix measures, and can infer phylogenies by parsimony, distance,
or likelihood methods, including bootstrapping and jackknifing. A number of
kinds of statistical tests of trees are available, and it
can also display phylogenies. DAMBE also includes a copy of ClustalW.
It is distributed as Windows 95 executables, available from its
web site at <A HREF="http://web.hku.hk/~xxia/software/software.htm">
<CODE>http://web.hku.hk/~xxia/software/software.htm</CODE></A>.
Xia has now moved to the Department of Biology of the University of Ottawa,
Canada, and I suspect the DAMBE web site will soon follow him there.
<P>
<B>MOLPHY</B> A package of programs for carrying out likelihood analysis
of DNA and protein data, written by Jun Adachi and Masami Hasegawa of the
Institute of Statistical Mathematics in Tokyo, Japan. The source code
is available from them at
<A HREF="http://www.ism.ac.jp/software/ismlib/softother.e.html">
the MOLPHY web site</A> at
<CODE>http://www.ism.ac.jp/software/ismlib/softother.e.html</CODE>, and
Windows executables are available from Russell Malmberg's web site at
<A HREF="http://dogwood.botany.uga.edu/malmberg/software.html">
<TT>http://dogwood.botany.uga.edu/malmberg/software.html</TT></A>.
<P>
<B>Hennig86</B> A fast parsimony program for discrete character data by
J. S. Farris of the Naturhistoriska Riksmuseet in Stockholm, Sweden (it can
handle DNA if its states are recoded as digits). It is reputed to be faster
than PAUP*. The program is distributed as an executable and costs $50, plus $5
mailing costs ($10 outside of the U.S.). The user's name should be stated,
as copies are personalized as a copy-protection measure. It is
distributed by Arnold Kluge, Amphibians and Reptiles, Museum of Zoology,
University of Michigan, Ann Arbor, Michigan 48109-1079, U.S.A.
(<TT>akluge@umich.edu</TT>) and by Diana Lipscomb at George Washington
University (<TT>BIODL@gwuvm.gwu.edu</TT>).
<P>
<B>RnA</B> J. S. Farris's very fast program, which uses parsimony
to carry out jackknife resampling of DNA sequence data. This would be
nearly equivalent in properties to bootstrapping if the jackknifing were
sampling random halves of the data, but Farris prefers to have each
jackknife sample delete a fraction 1/<I>e</I> of the data, which will give
most groups too much support (he would disagree with this statement).
RnA is available from Arnold Kluge, Amphibians and Reptiles, Museum of Zoology,
University of Michigan, Ann Arbor, Michigan 48109-1079, U.S.A.
(<TT>akluge@umich.edu</TT>) and Diana Lipscomb at George Washington
University (<TT>BIODL@gwuvm.gwu.edu</TT>), who may be contacted for details.
The cost is about $30 US.
<P>
<B>NONA</B> Pablo Goloboff, of the Instituto Miguel Lillo in
Tucuman, Argentina, has written these very fast parsimony programs, capable
of some relevant forms of weighted parsimony, which can handle either
DNA sequence data or discrete characters. NONA is available as shareware
from <A HREF="http://www.cladistics.com/aboutNona.htm">
<TT>http://www.cladistics.com/aboutNona.htm</TT></A>.
There is a 30-day free trial, after which
NONA must be purchased by sending a check for $40.00 either directly to
the author or to: James M. Carpenter, Attn: NONA,
Division of Invertebrate Zoology, American Museum of Natural History,
Central Park West at 79th Street, New York, NY 10024.
<P>
<B>TNT</B> This program, by Pablo Goloboff, J. S. Farris, and Kevin Nixon,
searches large data sets for most parsimonious trees.
The authors are respectively at the Instituto Miguel Lillo in Tucuman,
Argentina, the Naturhistoriska Riksmuseet in Stockholm, Sweden, and the
Hortorium, Cornell University, Ithaca, New York.
TNT is described as faster than other methods, though not faster than NONA
for small to medium data sets. Its distribution status is somewhat uncertain.
The site <A HREF="http://www.cladistics.com/aboutTNT.html">
<TT>http://www.cladistics.com/aboutTNT.html</TT></A>
describes it as unavailable,
while the web site <A HREF="http://www.cladistics.com/webtnt.html">
<TT>http://www.cladistics.com/webtnt.html</TT></A> makes a beta version
available for download. The downloaded program is free but needs a password
to function, which the user should obtain from Pablo Goloboff (see the latter
web page for details).
<P>
These are only a few of the more than 194 different phylogeny packages that
are now available (as of January, 2001; the number keeps increasing). The
others are described (and web links and ftp addresses provided) at my
Phylogeny Programs web pages at the address given above.
<P>
<A NAME="helpme"><HR><P></A>
<DIV ALIGN="CENTER">
<H2>How You Can Help Me</H2></DIV>
<P>
Simply let me know of any problems you have had adapting the
programs to your computer. I can often make "transparent" changes that, by
making the code avoid the wilder, woolier, and less standard parts of
C, not only help others who have your machine but even improve the
chance of the programs functioning on new machines. I would like fairly
detailed information on what gave trouble, on what operating system,
machine, and (if relevant) compiler, and what had to be done to make the
programs work. I am sometimes able to do some over-the-telephone
trouble-shooting, particularly if I don't have to pay for the call, but
electronic mail is the best way for me to be asked about problems, as you
can include your input and output files so I can see what is going on
(please do <EM>not</EM> send them as attachments; include them in the body of
the message instead). I'd really like these programs to be able to run with
only routine changes on <I>absolutely everything</I>, down to and possibly
including the Amana Touchmatic Radarange Microwave Oven, which was an Intel
8080 system (in fact, early versions of this package did run successfully on
Intel 8080 systems running the CP/M operating system). A PalmPilot version
is contemplated too.
<P>
I would also like to know the timings of programs from the package, when
run on the three test input files provided above, for various computer and
compiler combinations, so that I can provide this information in the
section on speeds of this document.
<P>
For the phylogeny plotting programs DRAWGRAM and DRAWTREE,
I am particularly interested in knowing what has to be done
to adapt them for other graphic file formats.
<P>
You can also be helpful to PHYLIP users in your part of the world by
helping them get the latest version of PHYLIP from our web site
and by helping them with any
problems they may have in getting PHYLIP working on their data.
<P>
Your help is appreciated. I am always happy to hear suggestions
for features and programs that ought to be incorporated in the package,
but please do not be upset if I turn out to have already considered the
particular possibility you suggest and decided against it.
<P>
<A NAME="trouble"><HR><P></A>
<DIV ALIGN="CENTER">
<H2>In Case of Trouble</H2></DIV>
<P>
<I>Read The (documentation) Files Meticulously</I> ("RTFM"). If that doesn't solve the
problem, please check the Frequently Asked Questions web page at the
PHYLIP web site:
<P>
<FONT SIZE=+2>
<TT><A HREF="http://evolution.gs.washington.edu/phylip/faq.html">
http://evolution.gs.washington.edu/phylip/faq.html</A></TT></FONT>
<P>
and the PHYLIP Bugs web page at that site:
<P>
<FONT SIZE=+2>
<TT><A HREF="http://evolution.gs.washington.edu/phylip/bugs.html">
http://evolution.gs.washington.edu/phylip/bugs.html</A></TT></FONT>
<P>
If none of these answers your question, get in touch with me. My electronic
mail address is given below. If you do ask about a problem, please specify the
program name, the version of the package, and your computer's operating
system, and send me your data file so I can test the problem. Do <I>not</I>
send your data file as an e-mail attachment; put it in the body of a
message instead. I read e-mail on a Unix system, which makes
it impossible to read some formats of attachments without
running around to other machines and moving the files there. This
is one of my least favorite activities, so please do not use attachments.
It will also help if you have the relevant output and documentation files
at hand so that you can refer to them in any correspondence. I can also be
reached by telephone at my office:
+1-(206)-543-0150, or at home: +1-(206)-526-9057 (how's <I>that</I> for user
support!). If I cannot be reached at either place, a message can be left at
the office of the Department of Genome Sciences, +1-(206)-221-7377, but I
strongly prefer not to call you back, as in any phone consultation the least
you can do is pay the phone bill. Better yet, use electronic mail.
<P>
Particularly if you are in a part of the world distant from me, you may also
want to try to get in touch with other users of PHYLIP nearby. I can also,
if requested, provide a list of nearby users.
<P>
<DIV ALIGN="RIGHT">
<TABLE><TR><TD ALIGN=LEFT>
Joe Felsenstein<BR>
Department of Genome Sciences<BR>
University of Washington<BR>
Box 357730<BR>
Seattle, Washington 98195-7730, U.S.A.
</TD></TR></TABLE>
</DIV>
<P>
Electronic mail address: <TT>joe@gs.washington.edu</TT>
<BR><HR>
</BODY>
</HTML>