1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> |
---|
2 | <HTML> |
---|
3 | <HEAD> |
---|
4 | <TITLE>main</TITLE> |
---|
5 | <META NAME="description" CONTENT="main"> |
---|
6 | <META NAME="keywords" CONTENT="PHYLIP, main, documentation"> |
---|
7 | <META NAME="resource-type" CONTENT="document"> |
---|
8 | <META NAME="distribution" CONTENT="global"> |
---|
9 | <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1"> |
---|
10 | </HEAD> |
---|
11 | <BODY BGCOLOR="#ccffff"> |
---|
12 | <P> |
---|
13 | <DIV ALIGN="CENTER"> |
---|
14 | <H1>PHYLIP</H1> |
---|
15 | <H2>Phylogeny Inference Package</H2> |
---|
16 | <P> |
---|
17 | <IMG SRC="phylip.gif" ALT="PHYLIP Logo"> |
---|
18 | <P> |
---|
19 | <H3>Version 3.6(alpha3)</H3> |
---|
20 | <P> |
---|
21 | <H3>July, 2002</H3> |
---|
22 | <P> |
---|
23 | <H2>by Joseph Felsenstein</H2> |
---|
24 | <P> |
---|
25 | <BR> |
---|
26 | <TABLE> |
---|
27 | <TR><TD> |
---|
28 | <FONT SIZE="+2"> |
---|
29 | Department of Genome Sciences<BR> |
---|
30 | University of Washington<BR> |
---|
31 | Box 357730<BR> |
---|
32 | Seattle, WA 98195-7730<BR> |
---|
33 | USA |
---|
34 | </FONT> |
---|
35 | </TD></TR> |
---|
36 | </TABLE> |
---|
37 | <H2>E-mail address: <TT>joe@gs.washington.edu</TT></H2> |
---|
38 | </DIV> |
---|
39 | <P> |
---|
40 | <DIV ALIGN="CENTER"> |
---|
41 | <A NAME="contents"><HR><P></A> |
---|
42 | <H2>Contents of this document</H2></DIV> |
---|
43 | <P> |
---|
44 | <BR> |
---|
45 | <A HREF="#contents">Contents of this document</A> |
---|
46 | <BR> |
---|
47 | <A HREF="#description">A Brief Description of the Programs</A> |
---|
48 | <BR> |
---|
49 | <A HREF="#copyright">Copyright Notice for PHYLIP</A> |
---|
50 | <BR> |
---|
51 | <A HREF="#documentation">The Documentation Files and How to Read Them</A> |
---|
52 | <BR> |
---|
53 | <A HREF="#programs">What The Programs Do</A> |
---|
54 | <BR> |
---|
55 | <A HREF="#running">Running the Programs</A> |
---|
56 | <BR> |
---|
57 | A word about input files |
---|
58 | <BR> |
---|
59 | Running the programs on a Windows machine |
---|
60 | <BR> |
---|
61 | Running the programs on a Macintosh |
---|
62 | <BR> |
---|
63 | Running the programs on a Unix system |
---|
64 | <BR> |
---|
65 | Running the programs in MSDOS |
---|
66 | <BR> |
---|
67 | Running the programs in background or under control of a command file |
---|
68 | <BR> |
---|
69 | <A HREF="#inputfiles">Preparing Input Files</A> |
---|
70 | <BR> |
---|
71 | Input and output files |
---|
72 | <BR> |
---|
73 | Data file format |
---|
74 | <BR> |
---|
75 | <A HREF="#menu">The Menu</A> |
---|
76 | <BR> |
---|
77 | <A HREF="#outputfile">The Output File</A> |
---|
78 | <BR> |
---|
79 | <A HREF="#treefile">The Tree File</A> |
---|
80 | <BR> |
---|
81 | <A HREF="#options">The Options and How To Invoke Them</A> |
---|
82 | <BR> |
---|
83 | Common options in the menu |
---|
84 | <BR> |
---|
85 | The <TT>U</TT> (User tree) option |
---|
86 | <BR> |
---|
87 | The <TT>G</TT> (Global) option |
---|
88 | <BR> |
---|
89 | The <TT>J</TT> (Jumble) option |
---|
90 | <BR> |
---|
91 | The <TT>O</TT> (Outgroup) option |
---|
92 | <BR> |
---|
93 | The <TT>T</TT> (Threshold) option |
---|
94 | <BR> |
---|
95 | The <TT>M</TT> (Multiple data sets) option |
---|
96 | <BR> |
---|
97 | The <TT>W</TT> (Weights) option |
---|
98 | <BR> |
---|
99 | The option to write out the trees into a tree file |
---|
100 | <BR> |
---|
101 | The (<TT>0</TT>) terminal type option |
---|
102 | <BR> |
---|
103 | <A HREF="#algorithm">The Algorithm for Constructing Trees</A> |
---|
104 | <BR> |
---|
105 | Local Rearrangements |
---|
106 | <BR> |
---|
107 | Global Rearrangements |
---|
108 | <BR> |
---|
109 | Multiple Jumbles |
---|
110 | <BR> |
---|
111 | Saving multiple tied trees |
---|
112 | <BR> |
---|
113 | Strategy for Finding the Best Tree |
---|
114 | <BR> |
---|
115 | <A HREF="#warning">A Warning on Interpreting Results</A> |
---|
116 | <BR> |
---|
117 | <A HREF="#speed">Relative Speed of Different Programs and Machines</A> |
---|
118 | <BR> |
---|
119 | Relative speed of the different programs |
---|
120 | <BR> |
---|
121 | Speed with different numbers of species |
---|
122 | <BR> |
---|
123 | Relative speed of different machines |
---|
124 | <BR> |
---|
125 | <A HREF="#comments">General Comments on Adapting the Package to Different Computer Systems</A> |
---|
126 | <BR> |
---|
127 | <A HREF="#compiling">Compiling the programs</A> |
---|
128 | <BR> |
---|
129 | Unix and Linux |
---|
130 | <BR> |
---|
131 | Macintosh PowerMacs |
---|
132 | <BR> |
---|
133 | Compiling with Metrowerks Codewarrior |
---|
134 | <BR> |
---|
135 | On Windows systems |
---|
136 | <BR> |
---|
137 | Compiling with Microsoft Visual C++ |
---|
138 | <BR> |
---|
139 | Compiling with Borland C++ |
---|
140 | <BR> |
---|
141 | Compiling with Metrowerks Codewarrior for Windows |
---|
142 | <BR> |
---|
143 | Compiling with Cygnus Gnu C++ |
---|
144 | <BR> |
---|
145 | VMS VAX systems |
---|
146 | <BR> |
---|
147 | Parallel computers |
---|
148 | <BR> |
---|
149 | Other computer systems |
---|
150 | <BR> |
---|
151 | <A HREF="#FAQ">Frequently Asked Questions</A> |
---|
152 | <BR> |
---|
153 | How to make it do various things |
---|
154 | <BR> |
---|
155 | Background information needed: |
---|
156 | <BR> |
---|
157 | Questions about distribution and citation: |
---|
158 | <BR> |
---|
159 | Questions about documentation |
---|
160 | <BR> |
---|
161 | Additional Frequently Asked Questions, or: "Why didn't it occur to you to ... |
---|
162 | <BR> |
---|
163 | (Fortunately) obsolete questions |
---|
164 | <BR> |
---|
165 | <A HREF="#newfeatures">New Features in This Version</A> |
---|
166 | <BR> |
---|
167 | <A HREF="#future">Coming Attractions, Future Plans</A> |
---|
168 | <BR> |
---|
169 | <A HREF="#endorsements">Endorsements</A> |
---|
170 | <BR> |
---|
171 | From the pages of <I>Cladistics</I> |
---|
172 | <BR> |
---|
173 | ... and in the pages of other journals: |
---|
174 | <BR> |
---|
175 | <A HREF="#references">References for the Documentation Files</A> |
---|
176 | <BR> |
---|
177 | <A HREF="#credits">Credits</A> |
---|
178 | <BR> |
---|
179 | <A HREF="#otherprograms">Other Phylogeny Programs Available Elsewhere</A> |
---|
180 | <BR> |
---|
181 | PAUP* |
---|
182 | <BR> |
---|
183 | MacClade |
---|
184 | <BR> |
---|
185 | MEGA |
---|
186 | <BR> |
---|
187 | MOLPHY |
---|
188 | <BR> |
---|
189 | PAML |
---|
190 | <BR> |
---|
191 | TREE-PUZZLE |
---|
192 | <BR> |
---|
193 | DAMBE |
---|
194 | <BR> |
---|
195 | Hennig86 |
---|
196 | <BR> |
---|
197 | RnA |
---|
198 | <BR> |
---|
199 | NONA |
---|
200 | <BR> |
---|
201 | TNT |
---|
202 | <BR> |
---|
203 | <A HREF="#helpme">How You Can Help Me</A> |
---|
204 | <BR> |
---|
205 | <A HREF="#trouble">In Case of Trouble</A> |
---|
206 | <P> |
---|
207 | <A NAME="description"><HR><P></A> |
---|
208 | <DIV ALIGN="CENTER"> |
---|
209 | <H2>A Brief Description of the Programs</H2></DIV> |
---|
210 | <P> |
---|
211 | <TT>PHYLIP</TT>, the Phylogeny Inference Package, is a package of programs for |
---|
212 | inferring phylogenies (evolutionary trees). It has been distributed since |
---|
213 | 1980, and has over 10,000 registered users, making it the most widely |
---|
214 | distributed package of phylogeny programs. It is available free, from |
---|
215 | its web site: |
---|
216 | <P> |
---|
217 | <DIV ALIGN="CENTER"> |
---|
218 | <FONT SIZE=+2><A HREF="http://evolution.gs.washington.edu/phylip.html"> |
---|
219 | <TT>http://evolution.gs.washington.edu/phylip.html</TT></A></FONT> |
---|
220 | |
---|
221 | </DIV> |
---|
222 | <P> |
---|
223 | <TT>PHYLIP</TT> is available as source code in C, and also as executables for |
---|
224 | some common computer systems. It can infer phylogenies by parsimony, |
---|
225 | compatibility, distance matrix methods, and likelihood. It can also |
---|
226 | compute consensus trees, compute distances between trees, draw trees, |
---|
227 | resample data sets by bootstrapping or jackknifing, edit trees, and |
---|
228 | compute distance matrices. It can handle data that are nucleotide |
---|
229 | sequences, protein sequences, gene frequencies, restriction sites, |
---|
230 | restriction fragments, distances, discrete characters, and continuous |
---|
231 | characters. |
---|
232 | <P> |
---|
233 | <BR> |
---|
234 | <A NAME="copyright"><HR><P></A> |
---|
235 | <DIV ALIGN=CENTER> |
---|
236 | <TABLE BORDER=4 WIDTH=80%><TR><TD ALIGN=LEFT> |
---|
237 | <DIV ALIGN="CENTER"> |
---|
238 | <H2>Copyright Notice for PHYLIP</H2></DIV> |
---|
239 | <P> |
---|
240 | The following copyright notice is intended to cover all source code, all |
---|
241 | documentation, and all executable programs of the PHYLIP package. |
---|
242 | <P> |
---|
243 | © Copyright 1980-2002. University of Washington and Joseph Felsenstein. All |
---|
244 | rights reserved. Permission is granted to reproduce, perform, and modify |
---|
245 | these programs and documentation files. Permission is granted to distribute |
---|
246 | or provide access to these |
---|
247 | programs provided that this copyright notice is not removed, the programs are |
---|
248 | not integrated with or called by any product or service that generates |
---|
249 | revenue, and that your distribution of these materials program are free. |
---|
250 | Any modified |
---|
251 | versions of these materials that are distributed or accessible shall indicate |
---|
252 | that they are based on these program. Institutions of higher education are |
---|
253 | granted permission to distribute this material to their students and staff |
---|
254 | for a fee to recover distribution costs. Permission requests for any other |
---|
255 | distribution of this program should be directed to <TT>license@u.washington.edu</TT>. |
---|
256 | <BR> |
---|
257 | </TD></TR></TABLE></DIV> |
---|
258 | |
---|
259 | <BR> |
---|
260 | <A NAME="documentation"><HR><P></A> |
---|
261 | <DIV ALIGN="CENTER"> |
---|
262 | <H2>The Documentation Files and How to Read Them</H2></DIV> |
---|
263 | <P> |
---|
264 | <TT>PHYLIP</TT> comes with an extensive set of documentation files. These |
---|
265 | include the main documentation file (this one), which you should read |
---|
266 | fairly completely. In addition there are files for groups of programs, |
---|
267 | including ones for the <A HREF="sequence.html">molecular sequence</A> |
---|
268 | programs, the <A HREF="distance.html">distance matrix</A> |
---|
269 | programs, the |
---|
270 | <A HREF="contchar.html">gene frequency and continuous characters</A> |
---|
271 | programs, the <A HREF="discrete.html">discrete characters</A> programs, |
---|
272 | and the <A HREF="draw.html">tree drawing</A> programs. Finally, |
---|
273 | each program has its own documentation file. References for the |
---|
274 | documentation files are all gathered together in this main documentation |
---|
275 | file. A good strategy is to: |
---|
276 | <OL> |
---|
277 | <LI>Read this main documentation file. |
---|
278 | <LI>Tentatively decide which programs are of interest to you. |
---|
279 | <LI>Read the documentation files for the groups of programs that |
---|
280 | contain those. |
---|
281 | <LI>Read the documentation files for those individual programs. |
---|
282 | </OL> |
---|
283 | <P> |
---|
284 | <A NAME="programs"><HR><P></A> |
---|
285 | <DIV ALIGN="CENTER"> |
---|
286 | <H2>What The Programs Do</H2></DIV> |
---|
287 | <P> |
---|
288 | Here is a short description of each of the programs. For more detailed |
---|
289 | discussion you should definitely read the documentation file for the |
---|
290 | individual program and the documentation file for the group of programs |
---|
291 | it is in. In this list the name of each program is a link which will |
---|
292 | take you to the documentation file for that program. Note that there is no |
---|
293 | program in the PHYLIP package called PHYLIP. |
---|
294 | <DL> |
---|
295 | <DT><STRONG><A HREF="protpars.html">PROTPARS</A></STRONG> |
---|
296 | <DD>Estimates phylogenies from protein sequences (input using the |
---|
297 | standard one-letter code for amino acids) using the parsimony method, in |
---|
298 | a variant which counts only those nucleotide changes that change the amino |
---|
299 | acid, on the assumption that silent changes are more easily accomplished. |
---|
300 | <DT><STRONG><A HREF="dnapars.html">DNAPARS</A></STRONG> |
---|
301 | <DD>Estimates phylogenies by the parsimony method using nucleic acid |
---|
302 | sequences. Allows use of the full IUB ambiguity codes, and estimates |
---|
303 | ancestral nucleotide states. Gaps are treated as a fifth nucleotide state. |
---|
304 | Can use 0/1 weights, reconstruct ancestral states, and infer branch |
---|
305 | lengths. |
---|
306 | <DT><STRONG><A HREF="dnamove.html">DNAMOVE</A></STRONG> |
---|
307 | <DD>Interactive construction of phylogenies from nucleic acid |
---|
308 | sequences, with their evaluation by parsimony and compatibility and the |
---|
309 | display of reconstructed ancestral bases. This can be used to find |
---|
310 | parsimony or compatibility estimates by hand. |
---|
311 | <DT><STRONG><A HREF="dnapenny.html">DNAPENNY</A></STRONG> |
---|
312 | <DD>Finds all most parsimonious phylogenies for nucleic acid |
---|
313 | sequences by branch-and-bound search. This may not be practical (depending |
---|
314 | on the data) for more than 15 species or so. |
---|
315 | <DT><STRONG><A HREF="dnacomp.html">DNACOMP</A></STRONG> |
---|
316 | <DD>Estimates phylogenies from nucleic acid sequence data using |
---|
317 | the compatibility criterion, which searches for the largest number of sites |
---|
318 | which could have all states (nucleotides) uniquely evolved on the same |
---|
319 | tree. Compatibility is particularly appropriate when sites vary greatly in |
---|
320 | their rates of evolution, but we do not know in advance which are the less |
---|
321 | reliable ones. |
---|
322 | <DT><STRONG><A HREF="dnainvar.html">DNAINVAR</A></STRONG> |
---|
323 | <DD>For nucleic acid sequence data on four species, computes |
---|
324 | Lake's and Cavender's phylogenetic invariants, which test alternative tree |
---|
325 | topologies. The program also tabulates the frequencies of occurrence of the |
---|
326 | different nucleotide patterns. Lake's invariants are the method which he |
---|
327 | calls "evolutionary parsimony". |
---|
328 | <DT><STRONG><A HREF="dnaml.html">DNAML</A></STRONG> |
---|
329 | <DD>Estimates phylogenies from nucleotide sequences by maximum |
---|
330 | likelihood. The model employed allows for unequal expected frequencies of |
---|
331 | the four nucleotides, for unequal rates of transitions and transversions, |
---|
332 | and for different (prespecified) rates of change in different categories of |
---|
333 | sites, with the program inferring which sites have which rates. It also |
---|
334 | allows different rates of change at known sites. |
---|
335 | <DT><STRONG><A HREF="dnamlk.html">DNAMLK</A></STRONG> |
---|
336 | <DD>Same as DNAML but assumes a molecular clock. The use of the |
---|
337 | two programs together permits a likelihood ratio test of the |
---|
338 | molecular clock hypothesis to be made. |
---|
339 | <DT><STRONG><A HREF="proml.html">PROML</A></STRONG> |
---|
340 | <DD>Estimates phylogenies from protein amino acid sequences by maximum |
---|
341 | likelihood. The PAM or JTTF models can be employed. The program |
---|
342 | can allow for different (prespecified) rates of change in different |
---|
343 | categories of amino acid positions, with the program inferring which |
---|
344 | positions have which rates. It also allows different rates of change |
---|
345 | at known sites. |
---|
346 | <DT><STRONG><A HREF="promlk.html">PROMLK</A></STRONG> |
---|
347 | <DD>Same as PROML but assumes a molecular clock. The use of the |
---|
348 | two programs together permits a likelihood ratio test of the |
---|
349 | molecular clock hypothesis to be made. |
---|
350 | <DT><STRONG><A HREF="dnadist.html">DNADIST</A></STRONG> |
---|
351 | <DD>Computes four different distances between species from nucleic |
---|
352 | acid sequences. The distances can then be used in the distance matrix |
---|
353 | programs. The distances are the Jukes-Cantor formula, one based on Kimura's |
---|
354 | 2-parameter method, Jin and Nei's distance which allows for rate variation |
---|
355 | from site to site, and a maximum likelihood method using the model employed |
---|
356 | in DNAML. The last of these methods of computing distances can be very slow. |
---|
357 | <DT><STRONG><A HREF="protdist.html">PROTDIST</A></STRONG> |
---|
358 | <DD>Computes a distance measure for protein sequences, using |
---|
359 | maximum likelihood estimates based on the Dayhoff PAM matrix, Kimura's 1983 |
---|
360 | approximation to it, or a model based on the genetic code plus a |
---|
361 | constraint on changing to a different category of amino acid. Rate |
---|
362 | variation from site to site is also allowed. The |
---|
363 | distances can be used in the distance matrix programs. |
---|
364 | <DT><STRONG><A HREF="restdist.html">RESTDIST</A></STRONG> |
---|
365 | <DD>Distances calculated from restriction sites data or |
---|
366 | restriction fragments data. The restriction sites option is the one to |
---|
367 | use to also make distances for RAPDs or AFLPs. |
---|
368 | <DT><STRONG><A HREF="restml.html">RESTML</A></STRONG> |
---|
369 | <DD>Estimation of phylogenies by maximum likelihood using |
---|
370 | restriction sites data (not restriction fragments but presence/absence of |
---|
371 | individual sites). It employs the Jukes-Cantor symmetrical model of |
---|
372 | nucleotide change, which does not allow for differences of rate between |
---|
373 | transitions and transversions. This program is <I>very</I> slow. |
---|
374 | <DT><STRONG><A HREF="seqboot.html">SEQBOOT</A></STRONG> |
---|
375 | <DD>Reads in a data set, and produces multiple data sets from |
---|
376 | it by bootstrap resampling. Since most programs in the current version of |
---|
377 | the package allow processing of multiple data sets, this can be used |
---|
378 | together with the consensus tree program CONSENSE to do bootstrap (or |
---|
379 | delete-half-jackknife) analyses with most of the methods in this package. |
---|
380 | This program also allows the Archie/Faith technique of permutation of |
---|
381 | species within characters. It can also rewrite a data set to convert |
---|
382 | it between the PHYLIP Interleaved and Sequential forms, and into |
---|
383 | a preliminary version of a new XML sequence alignment format |
---|
384 | which is under development. |
---|
385 | <DT><STRONG><A HREF="fitch.html">FITCH</A></STRONG> |
---|
386 | <DD>Estimates phylogenies from distance matrix data under the |
---|
387 | "additive tree model" according to which the distances are expected to |
---|
388 | equal the sums of branch lengths between the species. Uses the |
---|
389 | Fitch-Margoliash criterion and some related least squares criteria. Does |
---|
390 | not assume an evolutionary clock. This program will be useful with |
---|
391 | distances computed from molecular sequences, restriction sites or fragments |
---|
392 | distances, with DNA hybridization measurements, and with genetic distances |
---|
393 | computed from gene frequencies. |
---|
394 | <DT><STRONG><A HREF="kitsch.html">KITSCH</A></STRONG> |
---|
395 | <DD>Estimates phylogenies from distance matrix data under the |
---|
396 | "ultrametric" model which is the same as the additive tree model except |
---|
397 | that an evolutionary clock is assumed. The Fitch-Margoliash criterion and |
---|
398 | other least squares criteria can be used. This program will be useful with |
---|
399 | distances computed from molecular sequences, restriction sites or |
---|
400 | fragments distances, with distances from DNA hybridization measurements, |
---|
401 | and with genetic distances computed from gene frequencies. |
---|
402 | <DT><STRONG><A HREF="neighbor.html">NEIGHBOR</A></STRONG> |
---|
403 | <DD>An implementation by Mary Kuhner and John Yamato of Saitou and |
---|
404 | Nei's "Neighbor Joining Method," and of the UPGMA (Average Linkage |
---|
405 | clustering) method. Neighbor Joining is a distance matrix method producing |
---|
406 | an unrooted tree without the assumption of a clock. UPGMA does assume a |
---|
407 | clock. The branch lengths are not optimized by the least squares criterion |
---|
408 | but the methods are very fast and thus can handle much larger data sets. |
---|
409 | <DT><STRONG><A HREF="contml.html">CONTML</A></STRONG> |
---|
410 | <DD>Estimates phylogenies from gene frequency data by maximum |
---|
411 | likelihood under a model in which all divergence is due to genetic drift in |
---|
412 | the absence of new mutations. Does not assume a molecular clock. An |
---|
413 | alternative method of analyzing this data is to compute Nei's genetic |
---|
414 | distance and use one of the distance matrix programs. |
---|
415 | This program can also do maximum likelihood analysis of continuous |
---|
416 | characters that evolve by a Brownian Motion model, but it assumes that |
---|
417 | the characters evolve at equal rates and in an uncorrelated fashion, so |
---|
418 | that it does not take into account the usual correlations of characters. |
---|
419 | <DT><STRONG><A HREF="gendist.html">GENDIST</A></STRONG> |
---|
420 | <DD>Computes one of three different genetic distance formulas |
---|
421 | from gene frequency data. The formulas are Nei's genetic distance, the |
---|
422 | Cavalli-Sforza chord measure, and the genetic distance of Reynolds et al. |
---|
423 | The first is appropriate for data in which new mutations occur in an |
---|
424 | infinite isoalleles neutral mutation model, the latter two for a model |
---|
425 | without mutation and with pure genetic drift. The distances are written to |
---|
426 | a file in a format appropriate for input to the distance matrix programs. |
---|
427 | <DT><STRONG><A HREF="contrast.html">CONTRAST</A></STRONG> |
---|
428 | <DD>Reads a tree from a tree file, and a data set with continuous |
---|
429 | characters data, and produces the independent contrasts for those |
---|
430 | characters, for use in any multivariate statistics package. Will also |
---|
431 | produce covariances, regressions and correlations between characters for |
---|
432 | those contrasts. Can also correct for within-species sampling variation |
---|
433 | when individual phenotypes are available within a population. |
---|
434 | <DT><STRONG><A HREF="pars.html">PARS</A></STRONG> |
---|
435 | <DD>Multistate discrete-characters parsimony method. Up to 8 states |
---|
436 | (as well as "<TT>?</TT>") are allowed. Cannot do Camin-Sokal or Dollo Parsimony. |
---|
437 | Can reconstruct ancestral states, use character weights, and infer branch |
---|
438 | lengths. |
---|
439 | <DT><STRONG><A HREF="mix.html">MIX</A></STRONG> |
---|
440 | <DD>Estimates phylogenies by some parsimony methods for discrete |
---|
441 | character data with two states (0 and 1). Allows use of the |
---|
442 | Wagner parsimony method, the Camin-Sokal parsimony method, or arbitrary |
---|
443 | mixtures of these. Also reconstructs ancestral states and allows weighting |
---|
444 | of characters (does not infer branch lengths). |
---|
445 | <DT><STRONG><A HREF="move.html">MOVE</A></STRONG> |
---|
446 | <DD>Interactive construction of phylogenies from discrete character |
---|
447 | data with two states (0 and 1). Evaluates parsimony and compatibility |
---|
448 | criteria for those phylogenies and displays reconstructed states throughout |
---|
449 | the tree. This can be used to find parsimony or compatibility estimates by |
---|
450 | hand. |
---|
451 | <DT><STRONG><A HREF="penny.html">PENNY</A></STRONG> |
---|
452 | <DD>Finds all most parsimonious phylogenies for discrete-character |
---|
453 | data with two states, for the Wagner, Camin-Sokal, and mixed parsimony |
---|
454 | criteria using the branch-and-bound method of exact search. May be |
---|
455 | impractical (depending on the data) for more than 10-11 species. |
---|
456 | <DT><STRONG><A HREF="dollop.html">DOLLOP</A></STRONG> |
---|
457 | <DD>Estimates phylogenies by the Dollo or polymorphism parsimony |
---|
458 | criteria for discrete character data with two states (0 and 1). Also |
---|
459 | reconstructs ancestral states and allows weighting of characters. Dollo |
---|
460 | parsimony is particularly appropriate for restriction sites data; with |
---|
461 | ancestor states specified as unknown it may be appropriate for restriction |
---|
462 | fragments data. |
---|
463 | <DT><STRONG><A HREF="dolmove.html">DOLMOVE</A></STRONG> |
---|
464 | <DD>Interactive construction of phylogenies from discrete |
---|
465 | character data with two states (0 and 1) using the Dollo or polymorphism |
---|
466 | parsimony criteria. Evaluates parsimony and compatibility criteria for |
---|
467 | those phylogenies and displays reconstructed states throughout the tree. |
---|
468 | This can be used to find parsimony or compatibility estimates by hand. |
---|
469 | <DT><STRONG><A HREF="dolpenny.html">DOLPENNY</A></STRONG> |
---|
470 | <DD>Finds all most parsimonious phylogenies for |
---|
471 | discrete-character data with two states, for the Dollo or polymorphism |
---|
472 | parsimony criteria using the branch-and-bound method of exact search. May |
---|
473 | be impractical (depending on the data) for more than 10-11 species. |
---|
474 | <DT><STRONG><A HREF="clique.html">CLIQUE</A></STRONG> |
---|
475 | <DD>Finds the largest clique of mutually compatible characters, and |
---|
476 | the phylogeny which they recommend, for discrete character data with two |
---|
477 | states. The largest clique (or all cliques within a given size range of |
---|
478 | the largest one) are found by a very fast branch and bound search method. |
---|
479 | The method does not allow for missing data. For such cases the <TT>T</TT> |
---|
480 | (Threshold) option of PARS or MIX may be a useful alternative. |
---|
481 | Compatibility methods are particularly useful when some characters are of |
---|
482 | poor quality and the rest of good quality, but when it is not known in |
---|
483 | advance which ones are which. |
---|
484 | <DT><STRONG><A HREF="factor.html">FACTOR</A></STRONG> |
---|
485 | <DD>Takes discrete multistate data with character state trees and |
---|
486 | produces the corresponding data set with two states (0 and 1). Written by |
---|
487 | Christopher Meacham. This program was formerly used to accommodate |
---|
488 | multistate characters in MIX, but this is less necessary now that PARS is |
---|
489 | available. |
---|
490 | <DT><STRONG><A HREF="drawgram.html">DRAWGRAM</A></STRONG> |
---|
491 | <DD>Plots rooted phylogenies, cladograms, and phenograms in a |
---|
492 | wide variety of user-controllable formats. The program is interactive and |
---|
493 | allows previewing of the tree on PC or Macintosh graphics screens, |
---|
494 | and Tektronix or Digital graphics terminals. Final output can be |
---|
495 | to a file formatted for one of the drawing programs, on |
---|
496 | a laser printer (such as Postscript or PCL-compatible printers), |
---|
497 | on graphics screens or terminals, on pen plotters (Hewlett-Packard or |
---|
498 | Houston Instruments) or on dot matrix printers capable of graphics |
---|
499 | (Epson, Okidata, Imagewriter, or Toshiba). |
---|
500 | <DT><STRONG><A HREF="drawtree.html">DRAWTREE</A></STRONG> |
---|
501 | <DD>Similar to DRAWGRAM but plots unrooted phylogenies. |
---|
502 | <DT><STRONG><A HREF="treedist.html">TREEDIST</A></STRONG> |
---|
503 | <DD>Computes the Robinson-Foulds symmetric difference distance |
---|
504 | between trees, which allows for differences in tree topology (but does not |
---|
505 | use branch lengths). |
---|
506 | <DT><STRONG><A HREF="consense.html">CONSENSE</A></STRONG> |
---|
507 | <DD>Computes consensus trees by the majority-rule consensus tree |
---|
508 | method, which also allows one to easily find the strict consensus tree. |
---|
509 | Is not able to compute the Adams consensus tree. Trees are input in a tree |
---|
510 | file in standard nested-parenthesis notation, which is produced by many of |
---|
511 | the tree estimation programs in the package. This program can be used as |
---|
512 | the final step in doing bootstrap analyses for many of the methods in the |
---|
513 | package. |
---|
514 | <DT><STRONG><A HREF="retree.html">RETREE</A></STRONG> |
---|
515 | <DD>Reads in a tree (with branch lengths if necessary) and allows |
---|
516 | you to reroot the tree, to flip branches, to change species names and |
---|
517 | branch lengths, and then write the result out. Can be used to convert |
---|
518 | between rooted and unrooted trees, and to write the tree into a |
---|
519 | preliminary version of a new XML tree file format which is under |
---|
520 | development. |
---|
521 | </DL> |
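<P>
The Jukes-Cantor formula mentioned under DNADIST above can be illustrated with a short sketch. It is written in Python rather than PHYLIP's C and is not DNADIST's actual code. The function name is hypothetical, and skipping any site that is not a plain A, C, G or T in both sequences is a simplifying assumption (DNADIST itself has its own treatment of ambiguity codes and gaps).

```python
import math


def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor distance between two aligned nucleotide sequences.

    Hypothetical helper for illustration only. Sites where either
    sequence has something other than A, C, G or T are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    # p is the observed fraction of sites that differ.
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 0.75:
        # The correction is undefined once sequences look saturated.
        return math.inf
    # d = -(3/4) ln(1 - (4/3) p)
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

The correction makes the distance grow faster than the raw fraction of differing sites, since some sites will have changed more than once.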
---|
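<P>
The bootstrap resampling described under SEQBOOT above amounts to drawing sites (columns of the alignment) at random with replacement until the new data set has as many sites as the original. The following Python sketch shows only that idea. The function name and the dictionary representation of the alignment are assumptions made for illustration, and it does not read or write PHYLIP data files.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Return one bootstrap replicate of an alignment.

    `alignment` maps species names to equal-length sequences
    (a hypothetical in-memory form, not the PHYLIP file format).
    `rng` is a random.Random instance, so results are reproducible.
    """
    n_sites = len(next(iter(alignment.values())))
    # Sample site indices with replacement, one per original site.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    # Every species gets the same sampled columns, so sites stay intact.
    return {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }
```

Calling this repeatedly with the same <TT>rng</TT> would give the multiple data sets that are then analyzed one by one and summarized with a consensus tree.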
522 | <P> |
---|
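<P>
The symmetric difference distance mentioned under TREEDIST above counts the partitions of the species that appear in one tree but not in the other. Here is a minimal Python sketch of that count, assuming trees are given as nested tuples of species names rather than as tree files. The function names are hypothetical and this is not TREEDIST's own code.

```python
def _collect_clades(tree, clades):
    """Return the set of leaves below `tree`, recording each internal clade."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(_collect_clades(child, clades) for child in tree))
    clades.add(leaves)
    return leaves


def bipartitions(tree):
    """Nontrivial splits of the species set induced by internal branches."""
    clades = set()
    all_leaves = _collect_clades(tree, clades)
    splits = set()
    for clade in clades:
        other = all_leaves - clade
        # Splits that cut off fewer than two species are in every tree.
        if len(clade) > 1 and len(other) > 1:
            splits.add(frozenset([clade, other]))
    return splits


def symmetric_difference(tree_a, tree_b):
    """Number of splits found in exactly one of the two trees."""
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))
```

For example, the trees <TT>(("A","B"),("C","D"))</TT> and <TT>(("A","C"),("B","D"))</TT> each have one nontrivial split and share none, so the distance is 2. Branch lengths play no part in the calculation.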
523 | <A NAME="running"><HR><P></A> |
---|
524 | <DIV ALIGN="CENTER"> |
---|
525 | <H2>Running the Programs</H2></DIV> |
---|
526 | <P> |
---|
527 | This section assumes that you have obtained PHYLIP as compiled executables |
---|
528 | (for Windows, Macintosh, or DOS), or have obtained the source code |
---|
529 | and compiled it yourself (for Linux, Unix, or OpenVMS). For machines for |
---|
530 | which compiled executables are available, there will usually be no need for |
---|
531 | you to have a compiler or compile the programs yourself. This section |
---|
532 | describes how to run the programs. Later in this document we will |
---|
533 | discuss how to download and install PHYLIP (in case you are somehow |
---|
534 | reading this without yet having done that). Normally you will only read |
---|
535 | this document after downloading and installing PHYLIP. |
---|
536 | <P> |
---|
537 | <H3>A word about input files.</H3> |
---|
538 | <P> |
---|
539 | For all of these types of machines, it is |
---|
540 | important to have the input files for the programs (typically data files) |
---|
541 | prepared in advance. They can be prepared in any editor, but it is important |
---|
542 | that they be saved in Text Only ("flat ASCII") format, not in the format that |
---|
543 | word processors such as Microsoft Word want to write. It is up to you to read |
---|
544 | the PHYLIP documentation files which describe the file formats that are |
---|
545 | needed. There is a partial description in the next section of this document. |
---|
546 | The input files can also be obtained by running a program that |
---|
547 | produces output files in PHYLIP format (some of the PHYLIP programs do, as do |
---|
548 | programs by others, including sequence alignment programs such as ClustalW and |
---|
549 | sequence format conversion programs such as Readseq). There is <I>not</I> any |
---|
550 | input file editor available in any program in PHYLIP (you should <I>not</I> |
---|
551 | simply start running one of the programs and then expect to click a mouse |
---|
552 | somewhere to start creating a data file). |
---|
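<P>
If you are unsure whether an editor really saved a file as Text Only, a quick check is possible. The Python sketch below is not part of PHYLIP and the function name is hypothetical. It simply reports whether every byte in the file is a printable 7-bit ASCII character, a tab, or an end-of-line character. A word-processor document will normally fail this test.

```python
def is_flat_ascii(path):
    """Return True if the file contains only plain 7-bit ASCII text."""
    with open(path, "rb") as handle:
        data = handle.read()
    allowed_controls = {9, 10, 13}  # tab, line feed, carriage return
    return all(b in allowed_controls or 32 <= b < 127 for b in data)
```

Passing this check does not mean the file is in the right format for a particular program, only that it is plain text.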
553 | <P> |
---|
554 | When they start running, the programs look first for input files with |
---|
555 | particular names (such as <TT>infile</TT>, <TT>treefile</TT>, <TT>intree</TT>, or <TT>fontfile</TT>). |
---|
556 | Exactly which file names they look for varies a bit from program to program, |
---|
557 | and you should read the documentation file for the particular program to |
---|
558 | find out. If you have files with those names the programs will use them |
---|
559 | and not ask you for the file name. If they do not find files of those |
---|
560 | names, the programs will say that they cannot find a file of that name, and |
---|
561 | ask you to type in the file name. |
---|
562 | For example, if DNAML looks |
---|
563 | for the file <TT>infile</TT> and does not find one of that name, |
---|
564 | it prints the message: |
---|
565 | <P> |
---|
566 | <TABLE><TR><TD BGCOLOR=white> |
---|
567 | <TT>dnaml: can't find input file "infile"<BR> |
---|
568 | Please enter a new file name></TT> |
---|
569 | </TD></TR></TABLE> |
---|
570 | <P><I>This does not mean that an error |
---|
571 | has occurred.</I> All you need to do is to type in the name of the file. |
---|
572 | <P> |
---|
573 | The program looks for the input files in the same directory that the |
---|
574 | program is in (a directory is the same thing as a "folder"). In Windows, Linux, Unix, or MSDOS, if you are asked for the |
---|
575 | file name you can type in the path to the file, as part of the name (thus, |
---|
576 | if the file is in the directory above the current one, you can type in |
---|
577 | a file name such as <TT>../myfile.dna</TT>). If you do not know what a |
---|
578 | "directory" is, or what "above" means, then you are a member of the new |
---|
579 | generation who just clicks the mouse and assumes that a list of file names |
---|
580 | will magically appear. (Typically members of this generation have no idea |
---|
581 | where the files are on their system, and accumulate enormous amounts of |
---|
582 | unnecessary clutter in their file systems.) In this case you should ask |
---|
583 | someone to explain directories to you. |
---|
584 | <P> |
---|
585 | <H3>Running the programs on a Windows machine.</H3> |
---|
586 | <P> |
---|
587 | Double-click on the icon for |
---|
588 | the program. A window should open with a menu in it. Further dialog with the |
---|
589 | program occurs |
---|
590 | by typing on the keyboard in response to what you see in the window. The |
---|
591 | programs can be interrupted either by typing Control-C (which means to |
---|
592 | press down on the <TT>Ctrl</TT> key while typing the letter <TT>C</TT>), or by using |
---|
593 | the mouse to open the <TT>File</TT> menu in the upper-left corner of the program's |
---|
594 | window area and then select <TT>Quit</TT>. Other than this, most PHYLIP programs |
---|
595 | make no use of the mouse. The tree-drawing programs Drawtree and Drawgram |
---|
596 | do allow use of the mouse to select some options. |
---|
597 | <P> |
---|
598 | <H3>Running the programs on a Macintosh.</H3> |
---|
599 | <P> |
---|
600 | Double-click on the icon for |
---|
601 | the program. A window should open. Further dialog with the program occurs |
---|
602 | by typing on the keyboard in response to what you see in the window. The |
---|
603 | programs can be interrupted by using |
---|
604 | the mouse to open the <TT>File</TT> menu in the upper-left corner of the program's |
---|
605 | window area and then select <TT>Quit</TT>. Alternatively, you can use the |
---|
606 | Command-Q key combination. |
---|
607 | <P> |
---|
608 | When you use Quit, the program will ask you whether you want to save |
---|
609 | a file whose name is the program name (often followed by <TT>.out</TT>). For |
---|
610 | example, if you are using DNAML it will ask you if you want to save the file |
---|
611 | <TT>Dnaml.out</TT>. This file is simply a record of everything that was |
---|
612 | displayed in the program window, and you usually will not want to save it. |
---|
613 | Pressing the <TT>Enter</TT> key or selecting the Do Not Save button with |
---|
614 | the mouse will keep this from being saved. |
---|
615 | <P> |
---|
616 | If you encounter memory limitations on a Macintosh, and determine that |
---|
617 | this is not due to a problem with the format of the input file, as it |
---|
618 | often will be, you may be able to solve it by raising the limits of the |
---|
619 | stack and heap sizes of the program. To do this click on the program |
---|
620 | and then select <TT>Get Info</TT> from the Finder <TT>File</TT> menu. |
---|
621 | This will open a window which can be made to show the memory limits |
---|
622 | of the program. These can be changed by selecting them and typing in |
---|
623 | larger numbers. This may relieve nagging memory problems. If it does |
---|
624 | not, consult your local documentation and suspect problems with your |
---|
625 | input file format. |
---|
626 | <P> |
---|
627 | <H3>Running the programs on a Unix system.</H3> |
---|
628 | <P> |
---|
629 | Type the name of the program |
---|
630 | in lower-case letters (such as <TT>dnaml</TT>). To interrupt the program while |
---|
631 | it is running, type Control-C (which means to press down on the <TT>Ctrl</TT> key |
---|
632 | while typing the letter <TT>C</TT>). |
---|
633 | <P> |
---|
634 | <H3>Running the programs in MSDOS.</H3> |
---|
635 | <P> |
---|
636 | Type the name of the program |
---|
637 | in lower-case letters (such as <TT>dnaml</TT>). To interrupt the program while |
---|
638 | it is running, type Control-C (which means to press down on the <TT>Ctrl</TT> key |
---|
639 | while typing the letter <TT>C</TT>). |
---|
640 | <P> |
---|
641 | <H3>Running the programs in background or under control of a command file</H3> |
---|
642 | <P> |
---|
643 | In running the programs, you may sometimes want to put them in background |
---|
644 | so you can proceed with other work. On systems with a windowing environment |
---|
645 | they can be put in their own window, and commands like the Unix and Linux |
---|
646 | <TT>nice</TT> command used to make |
---|
647 | them have lower priority so that they do not interfere with interactive |
---|
648 | applications in other windows. This part of the discussion will |
---|
649 | assume either a Windows system or a Unix or Linux system. I will |
---|
650 | note when the commands work on one of these systems but not the other. |
---|
651 | Running jobs in background on Macintosh systems is an arcane art into whose |
---|
652 | mysteries I have not been initiated (or perhaps no one has been initiated). |
---|
653 | <P> |
---|
654 | If there is no windowing |
---|
655 | environment, on a Unix or Linux system you will want to use an |
---|
656 | ampersand (<TT>&</TT>) after the command file name when invoking it to put the |
---|
657 | job in the background. You will have to put all the responses to the |
---|
658 | interactive menu of the program into a file and tell the background job |
---|
659 | to take its input from that file. |
---|
660 | On Windows systems there is no <TT>&</TT> or <TT>nice</TT> command |
---|
661 | but input and output redirection and command files work fine, with the sole |
---|
662 | difference that a file of commands must have a name ending in |
---|
663 | <TT>.BAT</TT>, such as <TT>FOOFILE.BAT</TT>. |
---|
664 | <P> |
---|
665 | For example: suppose you want to run DNAPARS in the background, taking its |
---|
666 | input data from a file called <TT>sequences.dat</TT>, putting its interactive |
---|
667 | output into a file called <TT>screenout</TT>, and using a file called <TT>input</TT> as |
---|
668 | the place to store the interactive input. The file <TT>input</TT> need only |
---|
669 | contain two lines: |
---|
670 | <P> |
---|
671 | <TABLE><TR><TD bgcolor=white> |
---|
672 | <PRE> |
---|
673 | sequences.dat |
---|
674 | Y |
---|
675 | </PRE> |
---|
676 | </TD></TR></TABLE> |
---|
677 | <P> |
---|
678 | which is what you would have typed to run the program interactively, in |
---|
679 | response to the program's request for an input file name if it did not |
---|
680 | find a file named <TT>infile</TT>, and in response to the menu. |
---|
681 | <P> |
---|
682 | To run the program in background, in Unix or Linux you would simply give the command: |
---|
683 | <P> |
---|
684 | <TT>dnapars < input > screenout & |
---|
685 | </TT> |
---|
686 | <P> |
---|
687 | This runs the program with input responses coming from <TT>input</TT> and |
---|
688 | interactive output being put into the file <TT>screenout</TT>. The usual output |
---|
689 | file and tree file will also be created by this run (keep that in mind, |
---|
690 | because if you run any other PHYLIP program from the same directory while |
---|
691 | this one is running in background you may overwrite the output file from |
---|
692 | one program with that from the other!). |
---|
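<P>
The same redirection can be sketched in Python, which is not part of PHYLIP.
The function names <TT>write_responses</TT> and <TT>run_with_responses</TT> are
hypothetical helpers invented for this illustration. The sketch waits for the
program to finish rather than putting it in background, and it assumes the
program can be found on the command search path.

```python
import shutil
import subprocess
from pathlib import Path


def write_responses(path, lines):
    """Write one keyboard response per line, exactly as they would be typed."""
    Path(path).write_text("".join(line + "\n" for line in lines), encoding="ascii")


def run_with_responses(program, response_file, screen_file):
    """Run `program` with its keyboard input taken from response_file and its
    interactive output (stdout and stderr) written to screen_file.

    Returns the exit status, or None if the program is not on the search path.
    """
    exe = shutil.which(program)
    if exe is None:
        return None
    with open(response_file, "rb") as stdin, open(screen_file, "wb") as out:
        result = subprocess.run([exe], stdin=stdin, stdout=out,
                                stderr=subprocess.STDOUT)
    return result.returncode
```

With these helpers, the example above would amount to
<TT>write_responses("input", ["sequences.dat", "Y"])</TT> followed by
<TT>run_with_responses("dnapars", "input", "screenout")</TT>.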
693 | <P> |
---|
694 | If you wanted to give the program lower priority, so that it would |
---|
695 | not interfere with other work, and you have Berkeley Unix type job control |
---|
696 | facilities in your Unix or Linux (and you usually do), you can use the |
---|
697 | <TT>nice</TT> command: |
---|
698 | <P> |
---|
699 | <TT>nice +10 dnapars < input > screenout & |
---|
700 | </TT> |
---|
701 | <P> |
---|
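702 | which lowers the priority of the run (this is the C shell form; in Bourne-type shells such as <TT>bash</TT> the equivalent is <TT>nice -n 10</TT>). To also time the run and put the |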
---|
703 | timing at the end of <TT>screenout</TT>, you can do this: |
---|
704 | <P> |
---|
705 | <TT>nice +10 ( time dnapars < input ) >& screenout & |
---|
706 | </TT> |
---|
707 | <P> |
---|
708 | which I will not attempt to explain. |
---|
709 | <P> |
---|
710 | On Unix or Linux systems |
---|
711 | you may also want to explore putting the interactive output into the |
---|
712 | null file <TT>/dev/null</TT> so as to not be bothered with it (but then you |
---|
713 | cannot look at it to see why something went wrong). If you have problems |
---|
714 | with creating output files that are too large, you may want to |
---|
715 | explore carefully the turning off of options in the programs you run. |
---|
716 | <P> |
---|
717 | If you are doing several runs in one, as for example when you do a |
---|
718 | bootstrap analysis using SEQBOOT, DNAPARS (say), and CONSENSE, you |
---|
719 | can use an editor to create a "command file" with these commands: |
---|
720 | <P> |
---|
721 | <TABLE><TR><TD bgcolor=white> |
---|
722 | <PRE> |
---|
723 | seqboot < input1 > screenout |
---|
724 | mv outfile infile |
---|
725 | dnapars < input2 >> screenout |
---|
726 | mv outtree intree |
---|
727 | consense < input3 >> screenout |
---|
728 | </PRE> |
---|
729 | </TD></TR></TABLE> |
---|
730 | <P> |
---|
731 | This is the Unix or Linux version -- in the MSDOS version, the renaming |
---|
732 | of files and the appending of output to the file <TT>screenout</TT> is |
---|
733 | handled differently. |
---|
734 | <P> |
---|
735 | On Unix or Linux the command file might be named something like |
---|
736 | <TT>foofile</TT>, and on Windows systems might be named <TT>foofile.bat</TT>. |
---|
737 | <P> |
---|
738 | On Unix or Linux the command file must be given |
---|
739 | execute permission by using the command <TT>chmod +x foofile</TT> followed |
---|
740 | by the command <TT>rehash</TT>. The job that <TT>foofile</TT> describes |
---|
741 | can be run in background on Unix or Linux by giving the command |
---|
742 | <P> |
---|
743 | <TT>foofile &</TT> |
---|
744 | <P> |
---|
745 | On Windows systems it can be run by |
---|
746 | clicking on the icon of the command file. Its icon will have a little gear |
---|
747 | symbol. |
---|
748 | <P> |
---|
749 | Note that you must also have the interactive input |
---|
750 | commands for SEQBOOT (including the random number seed), DNAPARS, and |
---|
751 | CONSENSE in the separate files <TT>input1</TT>, <TT>input2</TT>, and <TT>input3</TT>. |
---|
752 | Note that when PHYLIP programs attempt to open a new output file (such as |
---|
753 | <TT>outfile</TT>, <TT>outtree</TT>, or <TT>plotfile</TT>), if they see |
---|
754 | a file of that name already in existence they will ask you if you want to |
---|
755 | overwrite it, and offer alternatives including writing to another file, |
---|
756 | appending information to that file, or quitting the program without writing to |
---|
757 | the file. This means that in writing batch files it is important to know |
---|
758 | whether there will be a prompt of this sort. You must know in advance |
---|
759 | whether the file will exist. You may want to put in your batch file a |
---|
760 | command that tests for the existence of a pre-existing output file and |
---|
761 | if so, removes it. You might even want to put in a command that creates a |
---|
762 | file of that name, so that you can be sure it is there! Either way, |
---|
763 | you will then know whether to put into your file of keyboard responses the |
---|
764 | proper response to the inquiry about overwriting that output file. |
---|
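<P>
One way to make sure that no overwrite prompt appears is to remove old output
files before the run starts. The following is a minimal sketch in Python, not
anything supplied with PHYLIP. The helper name <TT>clear_stale_outputs</TT> is
hypothetical, and the list of names is just the three default output names
mentioned above.

```python
from pathlib import Path

# Default output names mentioned in the text; adjust for the program being run.
DEFAULT_OUTPUTS = ("outfile", "outtree", "plotfile")


def clear_stale_outputs(directory=".", names=DEFAULT_OUTPUTS):
    """Delete pre-existing output files so that no overwrite prompt appears.

    Returns the names of the files that were actually removed.
    """
    removed = []
    for name in names:
        path = Path(directory) / name
        if path.is_file():
            path.unlink()
            removed.append(name)
    return removed
```

After this has run, the file of keyboard responses does not need to contain
an answer to the overwrite question for those files.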
765 | <P> |
---|
766 | <A NAME="inputfiles"><HR><P></A> |
---|
767 | <DIV ALIGN="CENTER"> |
---|
768 | <H2>Preparing Input Files</H2></DIV> |
---|
769 | <P> |
---|
770 | The input files for PHYLIP programs must be prepared separately - there is |
---|
771 | no data editor within PHYLIP. You can use a word processor (or text |
---|
772 | editor) to prepare them yourself, or you can use a program that produces |
---|
773 | a PHYLIP-format output. Sequence alignment programs such as ClustalW |
---|
774 | commonly have an option to produce PHYLIP files as output, and some |
---|
775 | other phylogeny programs, such as MacClade and TreeView, are capable of |
---|
776 | producing a PHYLIP-format file. |
---|
777 | <P> |
---|
778 | The format of the input files is discussed below, and you should also |
---|
779 | read the other PHYLIP documentation relevant to the particular type of |
---|
780 | data that you are using, and the particular programs you want to run, as |
---|
781 | there will be more details there. |
---|
782 | <P> |
---|
783 | It is very important that the input files be in "Text Only" or "flat |
---|
784 | ASCII" format. This means that they contain only printable ASCII/ISO |
---|
785 | characters, and not any unprintable characters. Many word processors such |
---|
786 | as Microsoft Word save their files in a format that contains unprintable |
---|
787 | characters, unless you tell them not to. For Microsoft Word you can |
---|
788 | select <TT>Save As</TT> from its <TT>File</TT> menu, and choose <TT>Text Only</TT> |
---|
789 | as the file format. This can also be done in the WordPad utility in Windows. |
---|
790 | Other word processors will have equivalent |
---|
791 | options. Text editors such as the <TT>vi</TT> and <TT>emacs</TT> editors on |
---|
792 | Unix and Linux, Windows Notepad, the <TT>SimpleText</TT> editor in MacOS, or the <TT>pico</TT> |
---|
793 | editor that comes with the <TT>pine</TT> |
---|
794 | mailer program, produce their files in Text Only format and should not |
---|
795 | cause any trouble. |
---|
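<P>
If you are unsure whether a file really is in Text Only format, a short check
like the following Python sketch can locate offending bytes. The function name
is hypothetical, and the sketch takes the strict view that only printable
ASCII, tabs, and line endings are acceptable, which is an assumption narrower
than "ASCII/ISO".

```python
def find_non_text_bytes(data):
    """Return (line, column, byte_value) for every byte that is not printable
    ASCII, a tab, or a line ending. Line and column numbers are 1-based."""
    problems = []
    line, col = 1, 0
    for byte in data:
        col += 1
        if byte == 0x0A:  # newline: move on to the next line
            line, col = line + 1, 0
        elif byte in (0x09, 0x0D) or 0x20 <= byte <= 0x7E:
            continue  # tab, carriage return, or printable ASCII
        else:
            problems.append((line, col, byte))
    return problems
```

Reading the file in binary mode, as in
<TT>find_non_text_bytes(open("infile", "rb").read())</TT>, gives an empty list
for a clean file.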
796 | <P> |
---|
797 | <H3>Input and output files</H3> |
---|
798 | <P> |
---|
799 | For most of the PHYLIP programs, information comes from a series of |
---|
800 | input files, and ends up in a series of output files: |
---|
801 | <P> |
---|
802 | <DIV ALIGN="CENTER"> |
---|
803 | <TABLE> |
---|
804 | <TR><TD> |
---|
805 | <PRE> |
---|
806 |                    ------------------- |
---|
807 |                   |                   | |
---|
808 | infile ---------> |                   | |
---|
809 |                   |                   | |
---|
810 | intree ---------> |                   | -----------> outfile |
---|
811 |                   |                   | |
---|
812 | weights --------> |      program      | -----------> outtree |
---|
813 |                   |                   | |
---|
814 | categories -----> |                   | -----------> plotfile |
---|
815 |                   |                   | |
---|
816 | fontfile -------> |                   | |
---|
817 |                   |                   | |
---|
818 |                    ------------------- |
---|
819 | </PRE> |
---|
820 | </TD></TR> |
---|
821 | </TABLE> |
---|
822 | </DIV><P></P> |
---|
823 | |
---|
824 | <P> |
---|
825 | The programs interact with the user by presenting a menu. Aside from the |
---|
826 | user's choices from the menu, they read |
---|
827 | all other input from files. These files have default names. The program |
---|
828 | will try to find a file of that name - if it does not, it will ask the |
---|
829 | user to supply the name of that file. |
---|
830 | Input data such as DNA sequences |
---|
831 | comes from a file whose default name is <TT>infile</TT>. If the user |
---|
832 | supplies a tree, this is in a file whose default name is <TT>intree</TT>. |
---|
833 | Values of weights for the characters are in <TT>weights</TT>, and the |
---|
834 | tree plotting programs need some digitized fonts which are supplied in |
---|
835 | <TT>fontfile</TT> (all these are default names). |
---|
836 | <P> |
---|
837 | For example, if DnaML looks |
---|
838 | for the file <TT>infile</TT> and does not find one of that name, |
---|
839 | it prints the message: |
---|
840 | <P> |
---|
841 | <TABLE><TR><TD BGCOLOR=white> |
---|
842 | <TT>dnaml: can't find input file "infile"<BR> |
---|
843 | Please enter a new file name></TT> |
---|
844 | </TD></TR></TABLE> |
---|
845 | <P> |
---|
846 | This simply means that it wants you to type in the name of the |
---|
847 | input file. |
---|
848 | <P> |
---|
849 | Two programs in the package work differently according to an older ("Old |
---|
850 | Style") system. These are <TT>CLIQUE</TT> and <TT>FACTOR</TT>. The information on ancestral |
---|
851 | states is supplied in the data file whose |
---|
852 | default name is <TT>infile</TT>, and for <TT>FACTOR</TT> the Factors |
---|
853 | information is written into the output file rather than being put into a |
---|
854 | separate file called <TT>factors</TT>. See the <A HREF="clique.html">documentation |
---|
855 | page for <TT>CLIQUE</TT></A> |
---|
856 | and the <A HREF="factor.html">documentation page for FACTOR</A> |
---|
857 | for information on these differences. By the time of the final 3.6 |
---|
858 | release we hope to have these last Old Style programs converted to the new |
---|
859 | system. |
---|
860 | <P> |
---|
861 | <H3>Data file format</H3> |
---|
862 | <P> |
---|
863 | I have tried to adhere to a rather stereotyped input and output |
---|
864 | format. For the parsimony, compatibility and maximum likelihood programs, |
---|
865 | excluding the distance matrix methods, the simplest version of the input |
---|
866 | data file looks something like this: |
---|
867 | <P> |
---|
868 | <TABLE><TR><TD BGCOLOR=white> |
---|
869 | <PRE> |
---|
870 | 6 13 |
---|
871 | Archaeopt CGATGCTTAC CGC |
---|
872 | HesperorniCGTTACTCGT TGT |
---|
873 | BaluchitheTAATGTTAAT TGT |
---|
874 | B. virginiTAATGTTCGT TGT |
---|
875 | BrontosaurCAAAACCCAT CAT |
---|
876 | B.subtilisGGCAGCCAAT CAC |
---|
877 | </PRE> |
---|
878 | </TD></TR></TABLE> |
---|
879 | <P> |
---|
880 | The first line of the input file contains the number of species and the |
---|
881 | number of characters (in this case sites). These are in free format, separated |
---|
882 | by blanks. The information for each species follows, starting with a |
---|
883 | ten-character species name (which can include blanks and some punctuation |
---|
884 | marks), and continuing with the characters for that species. The name should |
---|
885 | be on the same line as the first character of the data for that species. |
---|
886 | (I will use the term "species" for the tips of the trees, recognizing |
---|
887 | that in some cases these will actually be populations or individual gene |
---|
888 | sequences). |
---|
889 | <P> |
---|
890 | The name should be ten characters in length, filled out to the full |
---|
891 | ten characters by blanks if shorter. Any printable ASCII/ISO character is |
---|
892 | allowed in the name, except for parentheses ("<TT>(</TT>" and "<TT>)</TT>"), square |
---|
893 | brackets ("<TT>[</TT>" and "<TT>]</TT>"), colon ("<TT>:</TT>"), semicolon ("<TT>;</TT>") and comma ("<TT>,</TT>"). |
---|
894 | If you forget to extend the names to ten characters in length by blanks, |
---|
895 | the program will get out of synchronization with the contents of the data |
---|
896 | file, and an error message will result. |
---|
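<P>
The padding rule is easy to get wrong by hand, so it may help to generate the
file. The following Python sketch is an illustration only; the function names
are hypothetical, and the field width used for the two numbers on the first
line is an arbitrary choice, since that line is in free format.

```python
FORBIDDEN = set("()[]:;,")
NAME_WIDTH = 10


def format_name(name):
    """Pad a species name with blanks to exactly ten characters."""
    if len(name) > NAME_WIDTH:
        raise ValueError(f"name longer than {NAME_WIDTH} characters: {name!r}")
    bad = FORBIDDEN.intersection(name)
    if bad:
        raise ValueError(f"name {name!r} contains forbidden characters: {sorted(bad)}")
    return name.ljust(NAME_WIDTH)


def format_data_file(records):
    """Build the simplest form of the data file from (name, characters) pairs."""
    lengths = {len(chars) for _, chars in records}
    if len(lengths) != 1:
        raise ValueError("all species must have the same number of characters")
    lines = [f"{len(records):5d}{lengths.pop():5d}"]
    lines += [format_name(name) + chars for name, chars in records]
    return "\n".join(lines) + "\n"
```

For instance, <TT>format_name("Human")</TT> returns <TT>Human</TT> followed by
five blanks.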
897 | <P> |
---|
898 | In the |
---|
899 | discrete-character programs, DNA sequence programs and protein sequence |
---|
900 | programs the characters are each a |
---|
901 | single letter or digit, sometimes separated by blanks. In |
---|
902 | the continuous-characters programs they are real numbers with decimal points, |
---|
903 | separated by blanks: |
---|
904 | <P> |
---|
905 | <TT>Latimeria 2.03 3.457 100.2 0.0 -3.7</TT> |
---|
906 | <P> |
---|
907 | The conventions about continuing the data beyond one line per species are |
---|
908 | different between the molecular sequence programs and the others. The |
---|
909 | molecular sequence programs can take the data in "aligned" or "interleaved" |
---|
910 | format, in which we first have some lines giving the first part of each of the |
---|
911 | sequences, then some |
---|
912 | lines giving the next part of each, and so on. Thus the sequences might |
---|
913 | look like this: |
---|
914 | <P> |
---|
915 | <TABLE><TR><TD BGCOLOR=white> |
---|
916 | <PRE> |
---|
917 | 6 39 |
---|
918 | Archaeopt CGATGCTTAC CGCCGATGCT |
---|
919 | HesperorniCGTTACTCGT TGTCGTTACT |
---|
920 | BaluchitheTAATGTTAAT TGTTAATGTT |
---|
921 | B. virginiTAATGTTCGT TGTTAATGTT |
---|
922 | BrontosaurCAAAACCCAT CATCAAAACC |
---|
923 | B.subtilisGGCAGCCAAT CACGGCAGCC |
---|
924 | |
---|
925 | TACCGCCGAT GCTTACCGC |
---|
926 | CGTTGTCGTT ACTCGTTGT |
---|
927 | AATTGTTAAT GTTAATTGT |
---|
928 | CGTTGTTAAT GTTCGTTGT |
---|
929 | CATCATCAAA ACCCATCAT |
---|
930 | AATCACGGCA GCCAATCAC |
---|
931 | </PRE> |
---|
932 | </TD></TR></TABLE> |
---|
933 | <P> |
---|
934 | Note that in these sequences we have a blank every |
---|
935 | ten sites to make them easier to read: any such blanks are allowed. The blank |
---|
936 | line which separates the two groups of lines (the ones |
---|
937 | containing sites 1-20 and ones containing sites 21-39) may or may not |
---|
938 | be present, but if it is, it should be a line of zero length and not contain |
---|
939 | any extra blank |
---|
940 | characters (this is because of a limitation of the current versions |
---|
941 | of the programs). It is important that the number of sites in each |
---|
942 | group be the same for all species (i.e., it will not be possible to run |
---|
943 | the programs successfully if the first species line contains 20 bases, but |
---|
944 | the first line for the second species contains 21 bases). |
---|
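<P>
To make the layout concrete, here is a rough Python sketch of how interleaved
data could be read back into whole sequences. It is not the code the PHYLIP
programs use, and the name <TT>read_interleaved</TT> is hypothetical. Unlike
the programs, it tolerates separator lines that contain blanks.

```python
def read_interleaved(text, name_width=10):
    """Parse interleaved sequence data into a list of (name, sequence) pairs."""
    lines = text.splitlines()
    n_species, n_sites = (int(tok) for tok in lines[0].split()[:2])
    # Blank separator lines are optional, so drop them before grouping.
    body = [line for line in lines[1:] if line.strip()]
    if len(body) % n_species != 0:
        raise ValueError("number of data lines is not a multiple of the species count")
    names = [line[:name_width] for line in body[:n_species]]
    seqs = [""] * n_species
    for start in range(0, len(body), n_species):
        group = body[start:start + n_species]
        # Only the first group of lines carries the ten-character names.
        chunks = [(line[name_width:] if start == 0 else line).replace(" ", "")
                  for line in group]
        if len({len(chunk) for chunk in chunks}) != 1:
            raise ValueError("species have different numbers of sites in one group")
        for i, chunk in enumerate(chunks):
            seqs[i] += chunk
    for name, seq in zip(names, seqs):
        if len(seq) != n_sites:
            raise ValueError(f"{name!r} has {len(seq)} sites, expected {n_sites}")
    return list(zip(names, seqs))
```

The check inside the loop corresponds to the requirement that every species
contribute the same number of sites to each group of lines.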
945 | <P> |
---|
946 | Alternatively, an option can be selected in the menu to take the data in |
---|
947 | "sequential" format, with all of the data for the first species, |
---|
948 | then all of the characters for the next species, and so on. This is also |
---|
949 | the way that the discrete characters programs and the gene frequencies |
---|
950 | and quantitative characters programs want to read the data. They do not |
---|
951 | allow the interleaved format. |
---|
952 | <P> |
---|
953 | In the sequential format, the character data can run on to a new line at any |
---|
954 | time (except in the middle of a species name or, in the case of the continuous |
---|
955 | character and distance matrix programs, in |
---|
956 | the middle of a real number). Thus it is legal to have: |
---|
957 | <P> |
---|
958 | <TT>Archaeopt 001100 |
---|
959 | <BR> |
---|
960 | 1101 |
---|
961 | <BR> |
---|
962 | </TT> |
---|
963 | <P> |
---|
964 | or even: |
---|
965 | <P> |
---|
966 | <TT>Archaeopt |
---|
967 | <BR> |
---|
968 | 0011001101 |
---|
969 | <BR> |
---|
970 | </TT> |
---|
971 | |
---|
972 | <P> |
---|
973 | though note that the <I>full</I> ten characters of the species name <I>must</I> |
---|
974 | then be present: in the above case there must be a blank after the "t". In all |
---|
975 | cases it is possible to put internal blanks between any of the character |
---|
976 | values, so that |
---|
977 | <P> |
---|
978 | <TT>Archaeopt 0011001101 0111011100 |
---|
979 | </TT> |
---|
980 | <P> |
---|
981 | is allowed. |
---|
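<P>
These rules for the sequential format can be sketched as a small Python
reader. It is only an illustration under stated assumptions: the name
<TT>read_sequential</TT> is hypothetical, the sketch handles single-character
states only (not real numbers), and it reports a short name line as an error
instead of getting out of synchronization.

```python
def read_sequential(text, name_width=10):
    """Parse sequential single-character data into (name, characters) pairs."""
    lines = text.splitlines()
    n_species, n_chars = (int(tok) for tok in lines[0].split()[:2])
    records = []
    pos = 1
    for _ in range(n_species):
        # Skip empty lines before a species entry.
        while pos < len(lines) and not lines[pos].strip():
            pos += 1
        if pos >= len(lines):
            raise ValueError("file ended before all species were read")
        if len(lines[pos]) < name_width:
            raise ValueError(f"name not padded to {name_width} characters: {lines[pos]!r}")
        # The name always occupies the first ten columns of the entry.
        name = lines[pos][:name_width]
        chars = lines[pos][name_width:].replace(" ", "")
        pos += 1
        # Keep reading continuation lines until this species is complete.
        while len(chars) < n_chars:
            if pos >= len(lines):
                raise ValueError(f"ran out of data while reading {name!r}")
            chars += lines[pos].replace(" ", "")
            pos += 1
        if len(chars) != n_chars:
            raise ValueError(f"{name!r} has {len(chars)} characters, expected {n_chars}")
        records.append((name, chars))
    return records
```

Internal blanks are simply discarded, which is why they may be placed between
any of the character values.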
982 | <P> |
---|
983 | Note that you can convert molecular sequence data between the interleaved |
---|
984 | and the sequential data formats by using the Rewrite option of the D |
---|
985 | menu item in SEQBOOT. |
---|
986 | <P> |
---|
987 | If you make an error in the format of the input file, the programs can |
---|
988 | sometimes detect that |
---|
989 | they have been fed an illegal character or illegal numerical value and issue |
---|
990 | an error message such as <TT>BAD CHARACTER STATE:</TT>, often printing out the |
---|
991 | bad value, and sometimes the number of the species and character in which it |
---|
992 | occurred. The program will then stop shortly after. One of the things which |
---|
993 | can lead to a bad value is the omission of something earlier in the file, or |
---|
994 | the insertion of something superfluous, which causes the reading of the file to |
---|
995 | get out of synchronization. The program then starts reading things it |
---|
996 | didn't expect, and concludes that they are in error. So if you see this error |
---|
997 | message, you may also want |
---|
998 | to look for the earlier problem that may have led to the program becoming |
---|
999 | confused about what it is reading. |
---|
1000 | <P> |
---|
1001 | Some options are described below, but you should also read the documentation |
---|
1002 | for the groups of the programs and for the individual programs. |
---|
1003 | <BR> |
---|
1004 | <P> |
---|
1005 | <A NAME="menu"><HR><P></A> |
---|
1006 | <H3>The Menu</H3> |
---|
1007 | <P> |
---|
1008 | The menu is straightforward. It typically looks like this (this one is for |
---|
1009 | DNAPARS): |
---|
1010 | <P> |
---|
1011 | <TABLE><TR><TD BGCOLOR=white> |
---|
1012 | <PRE> |
---|
1013 | DNA parsimony algorithm, version 3.6 |
---|
1014 | |
---|
1015 | Setting for this run: |
---|
1016 | U Search for best tree? Yes |
---|
1017 | S Search option? More thorough search |
---|
1018 | V Number of trees to save? 100 |
---|
1019 | J Randomize input order of sequences? No. Use input order |
---|
1020 | O Outgroup root? No, use as outgroup species 1 |
---|
1021 | T Use Threshold parsimony? No, use ordinary parsimony |
---|
1022 | N Use Transversion parsimony? No, count all steps |
---|
1023 | W Sites weighted? No |
---|
1024 | M Analyze multiple data sets? No |
---|
1025 | I Input sequences interleaved? Yes |
---|
1026 | 0 Terminal type (IBM PC, ANSI, none)? (none) |
---|
1027 | 1 Print out the data at start of run No |
---|
1028 | 2 Print indications of progress of run Yes |
---|
1029 | 3 Print out tree Yes |
---|
1030 | 4 Print out steps in each site No |
---|
1031 | 5 Print sequences at all nodes of tree No |
---|
1032 | 6 Write out trees onto tree file? Yes |
---|
1033 | |
---|
1034 | Y to accept these or type the letter for one to change |
---|
1035 | </PRE> |
---|
1036 | </TD></TR></TABLE> |
---|
1037 | <P> |
---|
1038 | If you want to accept the default settings (they are shown in the above case) |
---|
1039 | you can simply type <TT>Y</TT> followed by pressing on the <TT>Enter</TT> key. |
---|
1040 | If you want to change any of the options, you should type the letter |
---|
1041 | shown to the left of its entry in the menu. For example, to set a threshold |
---|
1042 | type <TT>T</TT>. Lower-case letters will also work. For many of the options |
---|
1043 | the program will ask for supplementary information, such as the value of |
---|
1044 | the threshold. |
---|
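<P>
When the programs are run from a file of keyboard responses, the menu choices
become lines in that file. The following Python sketch shows one way such a
file might be assembled. The helper name <TT>menu_responses</TT> is
hypothetical, and the threshold value used in the note after the code is an
arbitrary example. Exactly which supplementary questions a program asks must
be checked by running it interactively first.

```python
def menu_responses(changes, data_file=None):
    """Build the text of a response file.

    changes   -- list of (menu_letter, [supplementary answers]) pairs
    data_file -- file name to give if the program will ask for its input file
    """
    lines = []
    if data_file is not None:
        lines.append(data_file)
    for letter, extra in changes:
        lines.append(letter)   # select the menu item
        lines.extend(extra)    # answers to any follow-up questions
    lines.append("Y")          # accept the settings and start the run
    return "\n".join(lines) + "\n"
```

For example, <TT>menu_responses([("T", ["10"])], data_file="sequences.dat")</TT>
produces the four lines <TT>sequences.dat</TT>, <TT>T</TT>, <TT>10</TT> and
<TT>Y</TT>.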
1045 | <P> |
---|
1046 | Note the <TT>Terminal type</TT> entry, which you will find on all menus. It |
---|
1047 | allows you to specify which type of terminal your screen is. The options |
---|
1048 | are an IBM PC screen, an ANSI standard terminal, or <TT>none</TT>. |
---|
1049 | Choosing zero (<TT>0</TT>) toggles |
---|
1050 | among these three options in cyclical order, changing each time the <TT>0</TT> |
---|
1051 | option is chosen. If one of them is right for your terminal the screen will be |
---|
1052 | cleared before the menu is displayed. If none works, the <TT>none</TT> option |
---|
1053 | should probably be chosen. The programs should start with a terminal option |
---|
1054 | appropriate for your computer, but if they do not, you can change the |
---|
1055 | terminal type manually. This is particularly important in program RETREE |
---|
1056 | where a tree is displayed on the screen - if the terminal type is set to the |
---|
1057 | wrong value, the tree can look very strange. |
---|
1058 | <P> |
---|
1059 | The other numbered options control which information the program will |
---|
1060 | display on your screen or on the output files. The option to <TT>Print |
---|
1061 | indications of progress of run</TT> will show information such as the names of |
---|
1062 | the species as they are successively added to the tree, and the |
---|
1063 | progress of rearrangements. You will usually want to see these as |
---|
1064 | reassurance that the program is running and to help you estimate how long |
---|
1065 | it will take. But if you are running the program "in background" as can be |
---|
1066 | done on multitasking and multiuser systems, and do not have the |
---|
1067 | program running in its own window, you may want to turn this option off so |
---|
1068 | that it does not disturb your use of the computer while the program is |
---|
1069 | running. |
---|
1070 | <P> |
---|
1071 | <A NAME="outputfile"><HR><P></A> |
---|
1072 | <H2>The Output File</H2> |
---|
1073 | <BR> |
---|
1074 | <P> |
---|
1075 | Most of the programs write their output onto a file called (usually) <TT>outfile</TT>, and a representation of the trees found onto a file called |
---|
1076 | <TT>outtree</TT>. |
---|
1077 | <P> |
---|
1078 | The exact contents of the output file vary from program to program and also |
---|
1079 | depend on which menu options you have selected. For many programs, if you |
---|
1080 | select all possible output information, the output will consist of |
---|
1081 | (1) the name of the program and its |
---|
1082 | version number, (2) some of the input information printed out, and (3) a series of |
---|
1083 | phylogenies, some with associated information indicating how much change |
---|
1084 | there was in each character or on each part of the tree. A typical rooted tree |
---|
1085 | looks like this: |
---|
1086 | <P> |
---|
1087 | <TABLE><TR><TD BGCOLOR=white> |
---|
1088 | <PRE> |
---|
1089 | +-------------------Gibbon |
---|
1090 | +----------------------------2 |
---|
1091 | ! ! +------------------Orang |
---|
1092 | ! +------4 |
---|
1093 | ! ! +---------Gorilla |
---|
1094 | +-----3 +--6 |
---|
1095 | ! ! ! +---------Chimp |
---|
1096 | ! ! +----5 |
---|
1097 | --1 ! +-----Human |
---|
1098 | ! ! |
---|
1099 | ! +-----------------------------------------------Mouse |
---|
1100 | ! |
---|
1101 | +------------------------------------------------Bovine |
---|
1102 | </PRE> |
---|
1103 | </TD></TR></TABLE> |
---|
1104 | <P> |
---|
1105 | The interpretation of the tree is fairly straightforward: it "grows" |
---|
1106 | from left to right. The numbers at the forks are arbitrary and are used (if |
---|
1107 | present) merely to identify the forks. For many of the programs the tree |
---|
1108 | produced is unrooted. Rooted and unrooted trees are printed in nearly the |
---|
1109 | same form, but the unrooted ones are accompanied by the |
---|
1110 | warning message: |
---|
1111 | <P> |
---|
1112 | <TT> remember: this is an unrooted tree! |
---|
1113 | </TT> |
---|
1114 | <P> |
---|
1115 | to indicate that this is an unrooted tree and to warn against |
---|
1116 | taking the position of its root too seriously. (Mathematicians still call |
---|
1117 | an unrooted tree a tree, though some systematists unfortunately use the term |
---|
1118 | "network" for an unrooted tree. This conflicts with standard mathematical |
---|
1119 | usage, which reserves the name "network" for a completely different kind of |
---|
1120 | graph). The root of this tree could be anywhere, say on the line leading |
---|
1121 | immediately to <TT>Mouse</TT>. As an exercise, |
---|
1122 | see if you can tell whether the following tree is or is not a different |
---|
1123 | one from the above: |
---|
1124 | <P> |
---|
1125 | <TABLE><TR><TD BGCOLOR=white> |
---|
1126 | <PRE> |
---|
1127 | +-----------------------------------------------Mouse |
---|
1128 | ! |
---|
1129 | +---------4 +------------------Orang |
---|
1130 | ! ! +------3 |
---|
1131 | ! ! ! ! +---------Chimp |
---|
1132 | ---6 +----------------------------1 ! +----2 |
---|
1133 | ! ! +--5 +-----Human |
---|
1134 | ! ! ! |
---|
1135 | ! ! +---------Gorilla |
---|
1136 | ! ! |
---|
1137 | ! +-------------------Gibbon |
---|
1138 | ! |
---|
1139 | +-------------------------------------------Bovine |
---|
1140 | |
---|
1141 | remember: this is an unrooted tree! |
---|
1142 | </PRE> |
---|
1143 | </TD></TR></TABLE> |
---|
1144 | <P> |
---|
1145 | (it is <I>not</I> different). It is <I>important</I> also to realize that the |
---|
1146 | lengths of the segments of the printed tree may not be significant: some |
---|
1147 | may actually represent branches of zero length, in the sense that there is no |
---|
1148 | evidence that |
---|
1149 | those branches are nonzero in length. Some of the diagrams of trees attempt |
---|
1150 | to print branches approximately proportional to estimated |
---|
1151 | branch lengths, while in others the lengths are purely conventional and |
---|
1152 | are presented just to make the topology visible. You will have to look closely |
---|
1153 | at the documentation that accompanies each program to see what it presents |
---|
1154 | and what is known about the lengths of the branches on the tree. The above |
---|
1155 | tree attempts to represent branch lengths approximately in the diagram. But |
---|
1156 | even in those cases, some of the smaller branches are likely to be |
---|
1157 | artificially lengthened to make the tree topology clearer. Here is what |
---|
1158 | a tree from DNAPARS looks like, when no attempt is made to make the |
---|
1159 | lengths of branches in the diagram proportional to estimated branch |
---|
1160 | lengths: |
---|
1161 | <P> |
---|
1162 | <TABLE><TR><TD BGCOLOR=white> |
---|
1163 | <PRE> |
---|
1164 | +--Human |
---|
1165 | +--5 |
---|
1166 | +--4 +--Chimp |
---|
1167 | ! ! |
---|
1168 | +--3 +-----Gorilla |
---|
1169 | ! ! |
---|
1170 | +--2 +--------Orang |
---|
1171 | ! ! |
---|
1172 | +--1 +-----------Gibbon |
---|
1173 | ! ! |
---|
1174 | --6 +--------------Mouse |
---|
1175 | ! |
---|
1176 | +-----------------Bovine |
---|
1177 | |
---|
1178 | remember: this is an unrooted tree! |
---|
1179 | </PRE> |
---|
1180 | </TD></TR></TABLE> |
---|
1181 | <P> |
---|
1182 | When a tree has branch lengths, it will be accompanied by a table showing |
---|
1183 | for each branch the numbers (or names) of the nodes at each end of the |
---|
1184 | branch, and the length of that branch. For the first tree shown above, |
---|
1185 | the corresponding table is: |
---|
1186 | <P> |
---|
1187 | <TABLE><TR><TD BGCOLOR=white> |
---|
1188 | <PRE> |
---|
1189 | Between And Length Approx. Confidence Limits |
---|
1190 | ------- --- ------ ------- ---------- ------ |
---|
1191 | |
---|
1192 | 1 Bovine 0.90216 ( 0.50346, 1.30086) ** |
---|
1193 | 1 Mouse 0.79240 ( 0.42191, 1.16297) ** |
---|
1194 | 1 2 0.48553 ( 0.16602, 0.80496) ** |
---|
1195 | 2 3 0.12113 ( zero, 0.24676) * |
---|
1196 | 3 4 0.04895 ( zero, 0.12668) |
---|
1197 | 4 5 0.07459 ( 0.00735, 0.14180) ** |
---|
1198 | 5 Human 0.10563 ( 0.04234, 0.16889) ** |
---|
1199 | 5 Chimp 0.17158 ( 0.09765, 0.24553) ** |
---|
1200 | 4 Gorilla 0.15266 ( 0.07468, 0.23069) ** |
---|
1201 | 3 Orang 0.30368 ( 0.18735, 0.41999) ** |
---|
1202 | 2 Gibbon 0.33636 ( 0.19264, 0.48009) ** |
---|
1203 | |
---|
1204 | * = significantly positive, P < 0.05 |
---|
1205 | ** = significantly positive, P < 0.01 |
---|
1206 | </PRE> |
---|
1207 | </TD></TR></TABLE> |
---|
1208 | <P> |
---|
1209 | Ignoring the asterisks and the approximate confidence limits, which will be |
---|
1210 | described in the documentation file for DNAML, we can see that the table |
---|
1211 | gives a more precise idea of what the lengths of all the branches are. |
---|
1212 | Similar tables exist in distance matrix and likelihood programs, as well |
---|
1213 | as in the parsimony programs DNAPARS and PARS. |
---|
1214 | <P> |
---|
1215 | Some of the parsimony programs in the package can print out a table |
---|
1216 | of the number of steps that different characters (or sites) require on |
---|
1217 | the tree. This table may not be obvious at first. A typical example looks like |
---|
1218 | this: |
---|
1219 | <P> |
---|
1220 | <TABLE><TR><TD BGCOLOR=white> |
---|
1221 | <PRE> |
---|
1222 | steps in each site: |
---|
1223 |         0   1   2   3   4   5   6   7   8   9 |
---|
1224 |     *----------------------------------------- |
---|
1225 |    0!       2   2   2   2   1   1   2   2   1 |
---|
1226 |   10!   1   2   3   1   1   1   1   1   1   2 |
---|
1227 |   20!   1   2   2   1   2   2   1   1   1   2 |
---|
1228 |   30!   1   2   1   1   1   2   1   3   1   1 |
---|
1229 |   40!   1 |
---|
1230 | </PRE> |
---|
1231 | </TD></TR></TABLE> |
---|
1232 | <P> |
---|
1233 | The numbers across the top and down the side indicate which site |
---|
1234 | is being referred to. Thus site 23 is column "3" of row "20" |
---|
1235 | and has 1 step in this case. |
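<P>
The lookup rule can be written out as a short Python sketch. The table values are copied from the example above; the name <TT>steps_at_site</TT> and the dictionary layout are illustrative only and are not part of PHYLIP. The sketch assumes, as the nine entries in row "0" suggest, that sites are numbered from 1, so row "0" has no entry in column "0".

```python
# Rows of the "steps in each site" table, keyed by the row label.
# Row 0 has no entry in column 0 (assumption: sites are numbered from 1).
STEPS_TABLE = {
    0:  [None, 2, 2, 2, 2, 1, 1, 2, 2, 1],
    10: [1, 2, 3, 1, 1, 1, 1, 1, 1, 2],
    20: [1, 2, 2, 1, 2, 2, 1, 1, 1, 2],
    30: [1, 2, 1, 1, 1, 2, 1, 3, 1, 1],
    40: [1],
}


def steps_at_site(table, site):
    """Return the number of steps for a site: the row is the tens, the column the units."""
    tens, units = divmod(site, 10)
    return table[tens * 10][units]


if __name__ == "__main__":
    print(steps_at_site(STEPS_TABLE, 23))
```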
---|
1236 | <P> |
---|
1237 | There are many other kinds of information that can appear in the |
---|
1238 | output file. They vary from program to program, and we leave their |
---|
1239 | description to the documentation files for the specific programs. |
---|
1240 | <P> |
---|
1241 | <A NAME="treefile"><HR><P></A> |
---|
1242 | <H2>The Tree File</H2> |
---|
1243 | <P> |
---|
1244 | In output from most programs, |
---|
1245 | a representation of the tree is also written into the tree file |
---|
1246 | <TT>outtree</TT>. The tree is specified by nested pairs |
---|
1247 | of parentheses, enclosing |
---|
1248 | names and separated by commas. We will describe how this works |
---|
1249 | below. If there are any blanks in the names, |
---|
1250 | these must be replaced by the underscore character "<TT>_</TT>". Trailing blanks |
---|
1251 | in the name may be omitted. The pattern of the parentheses indicates |
---|
1252 | the pattern of the tree by having each pair of parentheses enclose all |
---|
1253 | the members of a monophyletic group. The tree file could look like this: |
---|
1254 | <P> |
---|
1255 | <TT>((Mouse,Bovine),(Gibbon,(Orang,(Gorilla,(Chimp,Human))))); |
---|
1256 | </TT> |
---|
1257 | <P> |
---|
1258 | In this tree the first fork separates the lineage leading to |
---|
1259 | <TT>Mouse</TT> and <TT>Bovine</TT> from the lineage leading to the rest. Within the |
---|
1260 | latter group there is a fork separating <TT>Gibbon</TT> from the rest, and so on. |
---|
1261 | The entire tree is enclosed in an outermost pair of parentheses. The tree ends |
---|
1262 | with a semicolon. In some programs such as DNAML, FITCH, and CONTML, |
---|
1263 | the tree will be unrooted. In an unrooted tree the |
---|
1264 | bottommost fork should be a |
---|
1265 | three-way split, with three groups separated by two commas: |
---|
1266 | <P> |
---|
1267 | <TT>(A,(B,(C,D)),(E,F)); |
---|
1268 | </TT> |
---|
1269 | <P> |
---|
1270 | Here the three groups at the bottom node are <TT>A</TT>, <TT>(B,C,D)</TT>, and |
---|
1271 | <TT>(E,F)</TT>. The single three-way split corresponds to one of the interior |
---|
1272 | nodes of the unrooted tree (it can be any interior node of the tree). The |
---|
1273 | remaining forks are encountered as you move out from that first node. |
---|
1274 | Some of the newer programs are able to tolerate these other forks being |
---|
1275 | multifurcations (multi-way splits). |
---|
1276 | You should check the documentation files |
---|
1277 | for the particular programs you are using to see in which of these forms |
---|
1278 | you can expect the user tree to be. Note that many of the programs |
---|
1279 | that actually estimate an unrooted tree (such as DNAPARS) produce trees in the |
---|
1280 | treefile in rooted form! This is done for reasons of arbitrary internal bookkeeping. The placement of the root is arbitrary. We are working toward |
---|
1281 | having all programs be able to read all trees, whether rooted or unrooted, |
---|
1282 | multifurcating or bifurcating, and having them do the right thing with |
---|
1283 | them. But this is a long-term goal and it is not yet achieved. |
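<P>
One way to tell which form a tree file entry is in is to count the groups at the bottommost fork, that is, the commas that sit directly inside the outermost pair of parentheses. The Python sketch below does this; the function name <TT>count_bottom_groups</TT> is hypothetical, and the sketch assumes a well-formed tree without quoted names or comments.

```python
def count_bottom_groups(newick):
    """Count the groups separated by commas at the outermost parenthesis level."""
    depth = 0
    groups = 1
    for ch in newick:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 1:
            # A comma directly inside the outermost parentheses
            # separates two groups at the bottommost fork.
            groups += 1
    return groups


if __name__ == "__main__":
    rooted = "((Mouse,Bovine),(Gibbon,(Orang,(Gorilla,(Chimp,Human)))));"
    unrooted = "(A,(B,(C,D)),(E,F));"
    print(count_bottom_groups(rooted), count_bottom_groups(unrooted))
```

A result of 2 corresponds to a bifurcating (rooted-form) bottom fork, and 3 to the three-way split used for unrooted trees.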
---|
1284 | <P> |
---|
1285 | For programs that infer branch lengths, these are given in the trees in the |
---|
1286 | tree file as real numbers following a colon, and placed immediately |
---|
1287 | after the group descended from that branch. Here is a typical tree |
---|
1288 | with branch lengths: |
---|
1289 | <P> |
---|
1290 | <TT>((cat:47.14069,(weasel:18.87953,((dog:25.46154,(raccoon:19.19959,<BR> |
---|
1291 | bear:6.80041):0.84600):3.87382,(sea_lion:11.99700,<BR> |
---|
1292 | seal:12.00300):7.52973):2.09461):20.59201):25.0,monkey:75.85931); |
---|
1293 | </TT> |
---|
1294 | <P> |
---|
1295 | Note that the tree may continue to a new line at any time except in the |
---|
1296 | middle of a name or the middle of a branch length, although in trees |
---|
1297 | written to the tree file this will only be done after a comma. |
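<P>
To make the notation concrete, here is a minimal Python sketch of a reader for the subset described above: nested parentheses, names with underscores standing for blanks, optional branch lengths after a colon, and a closing semicolon. It is not the parser used inside PHYLIP. The names <TT>parse_newick</TT> and <TT>leaf_names</TT> are hypothetical, and quoted names and comments are assumed to be absent.

```python
def parse_newick(text):
    """Parse a tree into nested (label, length, children) tuples."""
    # Line breaks and blanks between tokens carry no meaning, so drop them.
    s = "".join(text.split())
    pos = 0

    def peek():
        return s[pos] if pos < len(s) else ""

    def token():
        # Read a name or a number: everything up to the next delimiter.
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in "(),:;":
            pos += 1
        return s[start:pos]

    def node():
        nonlocal pos
        children = []
        if peek() == "(":
            pos += 1                      # consume "("
            children.append(node())
            while peek() == ",":
                pos += 1                  # consume ","
                children.append(node())
            if peek() != ")":
                raise ValueError("unbalanced parentheses")
            pos += 1                      # consume ")"
        label = token().replace("_", " ")  # underscores stand for blanks
        length = None
        if peek() == ":":
            pos += 1
            length = float(token())
        return (label, length, children)

    tree = node()
    if peek() != ";":
        raise ValueError("tree must end with a semicolon")
    return tree


def leaf_names(tree):
    """List the names at the tips, left to right."""
    label, _, children = tree
    if not children:
        return [label]
    return [name for child in children for name in leaf_names(child)]


EXAMPLE = """((cat:47.14069,(weasel:18.87953,((dog:25.46154,(raccoon:19.19959,
bear:6.80041):0.84600):3.87382,(sea_lion:11.99700,
seal:12.00300):7.52973):2.09461):20.59201):25.0,monkey:75.85931);"""

if __name__ == "__main__":
    print(leaf_names(parse_newick(EXAMPLE)))
```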
---|
1298 | <P> |
---|
1299 | These representations of trees are a subset of the standard adopted |
---|
1300 | on 24 June 1986 at the annual meetings of the Society for the Study of |
---|
1301 | Evolution by an informal committee (its final session in Newick's |
---|
1302 | lobster restaurant - hence its name, the Newick standard) |
---|
1303 | consisting of Wayne Maddison (author of MacClade), David Swofford (PAUP), |
---|
1304 | F. James Rohlf (NTSYS-PC), Chris Meacham (COMPROB and the original |
---|
1305 | PHYLIP tree drawing programs), James Archie, |
---|
1306 | William H.E. Day, and me. This standard is a generalization of |
---|
1307 | PHYLIP's format, itself based on a well-known representation of trees in |
---|
1308 | terms of parenthesis patterns which is due to the famous mathematician |
---|
1309 | Arthur Cayley, and which has been around for over a century. The |
---|
1310 | standard is now employed by most phylogeny computer programs but unfortunately |
---|
1311 | has yet to be described in a formal published description. Other |
---|
1312 | descriptions by me and by Gary Olsen can be accessed using the Web at: |
---|
1313 | <P> |
---|
1314 | <DIV ALIGN="CENTER"> |
---|
1315 | <FONT SIZE=+2><A HREF="http://evolution.gs.washington.edu/phylip/newicktree.html"> |
---|
1316 | <TT>http://evolution.gs.washington.edu/phylip/newicktree.html</TT></A></FONT> |
---|
1317 | </DIV> |
---|
1318 | <P> |
---|
1319 | <A NAME="options"><HR><P></A> |
---|
1320 | <H2>The Options and How To Invoke Them</H2> |
---|
1321 | <P> |
---|
1322 | Most of the programs allow various options that alter the amount of |
---|
1323 | information the program is provided or what is done with the |
---|
1324 | information. Options are selected in the menu. |
---|
1325 | <P> |
---|
1326 | <H3>Common options in the menu</H3> |
---|
1327 | <P> |
---|
1328 | A number of the options from the menu, the <TT>U</TT> (User tree), <TT>G</TT> (Global), |
---|
1329 | <TT>J</TT> (Jumble), <TT>O</TT> (Outgroup), <TT>W</TT> (Weights), |
---|
1330 | <TT>T</TT> (Threshold), <TT>M</TT> (multiple data sets), and the tree output options, are used |
---|
1331 | so widely that it is best to discuss them in this document. |
---|
1332 | <P> |
---|
1333 | <B>The <TT>U</TT> (User tree) option.</B> This option toggles between the default |
---|
1334 | setting, which allows the program to search for the best tree, and the |
---|
1335 | User tree setting, which reads a tree or trees ("user trees") from the input |
---|
1336 | tree file and evaluates them. The input tree file's |
---|
1337 | default name is <TT>intree</TT>. In a few cases the trees should |
---|
1338 | be preceded by a line giving the number of trees: |
---|
1339 | <P> |
---|
1340 | <TABLE><TR><TD BGCOLOR=white> |
---|
1341 | <PRE> |
---|
1342 | 3 |
---|
1343 | ((Alligator,Bear),((Cow,(Dog,Elephant)),Ferret)); |
---|
1344 | ((Alligator,Bear),(((Cow,Dog),Elephant),Ferret)); |
---|
1345 | ((Alligator,Bear),((Cow,Dog),(Elephant,Ferret))); |
---|
1346 | </PRE> |
---|
1347 | </TD></TR></TABLE> |
---|
1348 | <P> |
---|
1349 | while in most cases the initial line with the number of trees is not |
---|
1350 | required. This is an inconsistency in the programs that we are intending |
---|
1351 | to eliminate soon. Some programs require rooted trees, some unrooted |
---|
1352 | trees, and some can handle multifurcating trees. You should read |
---|
1353 | the documentation for the particular program to find out which it |
---|
1354 | requires. Program RETREE can be used to convert trees among |
---|
1355 | these forms (on saving a tree from RETREE, you are asked whether |
---|
1356 | you want it to be rooted or unrooted). |
---|
1357 | <P> |
---|
1358 | In using the user tree option, check the pattern of parentheses |
---|
1359 | carefully. The programs do not always detect |
---|
1360 | whether the tree makes sense, and if it does not there will probably be |
---|
1361 | a crash (hopefully, but not inevitably, with an error message indicating |
---|
1362 | the nature of the problem). Trees written out by programs are |
---|
1363 | typically in the proper form. |
---|
1364 | <P> |
---|
1365 | Some of the programs require that the user trees be preceded by a line with the |
---|
1366 | number of user trees. Some require that they <EM>not</EM> be preceded by |
---|
1367 | this line, and many can tolerate either. I have tried to note for |
---|
1368 | each of these programs which of these forms of the user tree file |
---|
1369 | is appropriate. We hope to bring all programs to the same user tree file |
---|
1370 | format as soon as possible. |
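<P>
If you prepare user tree files with your own scripts, it can help to handle both forms. The Python sketch below reads a user tree file whether or not it starts with a line giving the number of trees. The function name <TT>read_user_trees</TT> is hypothetical, and the sketch only illustrates the two file layouts; it says nothing about which layout a given PHYLIP program expects.

```python
def read_user_trees(text):
    """Return the trees in a user tree file, with or without a leading count line."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    declared = None
    if lines and lines[0].isdigit():
        # The optional first line gives the number of trees that follow.
        declared = int(lines[0])
        lines = lines[1:]
    # Trees end with a semicolon and may continue across lines.
    trees = [piece + ";" for piece in "".join(lines).split(";") if piece]
    if declared is not None and declared != len(trees):
        raise ValueError("the number of trees does not match the first line")
    return trees


WITH_COUNT = """3
((Alligator,Bear),((Cow,(Dog,Elephant)),Ferret));
((Alligator,Bear),(((Cow,Dog),Elephant),Ferret));
((Alligator,Bear),((Cow,Dog),(Elephant,Ferret)));
"""

if __name__ == "__main__":
    for tree in read_user_trees(WITH_COUNT):
        print(tree)
```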
---|
1371 | <P> |
---|
1372 | <B>The <TT>G</TT> (Global) option.</B> In the programs which construct trees (except for |
---|
1373 | NEIGHBOR, the "...PENNY" programs and CLIQUE, and of course |
---|
1374 | the "...MOVE" programs where you construct the trees yourself), |
---|
1375 | after all species have been added to the tree a rearrangements phase |
---|
1376 | ensues. In most of these programs the rearrangements are automatically |
---|
1377 | global, which in this case means that subtrees will be removed from the tree |
---|
1378 | and put back on in all possible ways so as to have a better chance of |
---|
1379 | finding a better tree. Since this can be time consuming (it roughly |
---|
1380 | triples the time taken for a run) it is left as an option in some of the |
---|
1381 | programs, specifically CONTML, FITCH, and DNAML. In these programs |
---|
1382 | the G menu option toggles between the default of local rearrangement and |
---|
1383 | global rearrangement. The rearrangements are explained more below. |
---|
1384 | <P> |
---|
1385 | <B>The <TT>J</TT> (Jumble) option.</B> In most of the tree construction programs |
---|
1386 | (except for the "...PENNY" programs and CLIQUE), the exact |
---|
1387 | details of the search of different trees depend on the order of input of |
---|
1388 | species. In these programs the <TT>J</TT> option enables you to tell the program to use |
---|
1389 | a random number |
---|
1390 | generator to choose the input order of species. This option is toggled on |
---|
1391 | and off by |
---|
1392 | selecting option <TT>J</TT> in the menu. The program will then prompt you for |
---|
1393 | a "seed" for the random number generator. The seed should be an integer |
---|
1394 | between 1 and 32767, and should be of the form 4n+1, |
---|
1395 | which means that it must give a remainder of 1 when divided by 4. This can be |
---|
1396 | judged by looking at the last two digits of the number. Each different seed |
---|
1397 | leads to a different sequence of addition of species. By simply changing the |
---|
1398 | random number seed and re-running the programs one can look for other, and |
---|
1399 | better trees. If the seed entered is not odd, the program will not proceed, |
---|
1400 | but will prompt for another seed. |
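<P>
The rule for seeds is easy to check before starting a run. This Python sketch tests the two conditions stated above (range 1 to 32767, remainder 1 on division by 4) and also shows why the last two digits are enough to judge the remainder. The function names are hypothetical and the check is not taken from the PHYLIP source.

```python
def is_recommended_seed(seed):
    """True if the seed is between 1 and 32767 and of the form 4n+1."""
    return 1 <= seed <= 32767 and seed % 4 == 1


def remainder_from_last_two_digits(seed):
    """100 is divisible by 4, so only the last two digits affect the remainder."""
    return int(str(seed)[-2:]) % 4


if __name__ == "__main__":
    for seed in (133, 135, 4321):
        print(seed, is_recommended_seed(seed), remainder_from_last_two_digits(seed))
```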
---|
1401 | <P> |
---|
1402 | The Jumble option also causes the program to ask you how many times you |
---|
1403 | want to restart the process. If you answer 10, the program will |
---|
1404 | try ten different orders of species in constructing the trees, and the |
---|
1405 | results printed out will reflect this entire search process (that is, |
---|
1406 | the best trees found among all 10 runs will be printed out, not the |
---|
1407 | best trees from each individual run). |
---|
1408 | <P> |
---|
1409 | Some people have asked what are good values of the random number seed. |
---|
1410 | The random number seed is used to start a process of choosing "random" |
---|
1411 | (actually pseudorandom) numbers, which behave as if they were |
---|
1412 | unpredictably randomly chosen between 0 and 2<SUP>32</SUP>-1 (which is |
---|
1413 | 4,294,967,295). You could put in the number 133 and find that the |
---|
1414 | next random number was 1,876,973,009. As they are effectively |
---|
1415 | unpredictable, there is no such thing as a choice that is better than |
---|
1416 | any other, provided that the numbers are of the form 4<I>n</I>+1. However |
---|
1417 | if you re-use a random number seed, the sequence of random numbers |
---|
1418 | that result will be the same as before, resulting in exactly the same |
---|
1419 | series of choices, which may not be what you want. |
---|
1420 | <P> |
---|
1421 | <B>The <TT>O</TT> (Outgroup) option.</B> This specifies which species is to be used |
---|
1422 | to root the tree by having it become the outgroup. This option is |
---|
1423 | toggled on and off by choosing <TT>O</TT> in the menu (the alphabetic |
---|
1424 | character <TT>O</TT>, not the digit <TT>0</TT>). When it is on, the program will |
---|
1425 | then prompt for the |
---|
1426 | number of the outgroup (the species being taken in the numerical order that |
---|
1427 | they occur in the input file). Responding by typing <TT>6</TT> and then an |
---|
1428 | <TT>Enter</TT> character indicates that the sixth species in the data |
---|
1429 | is the outgroup. Outgroup-rooting will not be attempted if the |
---|
1430 | data have already established a root for the tree from some other |
---|
1431 | consideration, and may not be if it is a user-defined tree, |
---|
1432 | despite your invoking the option. Thus programs such as DOLLOP that |
---|
1433 | produce only rooted trees do not allow the Outgroup option. It is also |
---|
1434 | not available in KITSCH, DNAMLK, or CLIQUE. When it is used, the tree as |
---|
1435 | printed out is still listed as being an |
---|
1436 | unrooted tree, though the outgroup is connected to the bottommost node |
---|
1437 | so that it is easy to visually convert the tree into rooted form. |
---|
1438 | <P> |
---|
1439 | <B>The <TT>T</TT> (Threshold) option.</B> This sets a threshold for the |
---|
1440 | parsimony programs such that if the |
---|
1441 | number of steps counted in a character is higher than the threshold, it |
---|
1442 | will be taken to be the threshold value rather than the actual number of |
---|
1443 | steps. The default is a threshold so high that it will never be |
---|
1444 | surpassed (in which case the steps will simply be counted). The <TT>T</TT> |
---|
1445 | menu option toggles on and off asking the user to |
---|
1446 | supply a threshold. The use of thresholds to obtain methods intermediate |
---|
1447 | between parsimony and compatibility methods is described in my 1981b paper. |
---|
1448 | When the T option is in force, the program |
---|
1449 | will prompt for the numerical threshold value. This will be a positive |
---|
1450 | real number greater than 1. In programs MIX, MOVE, PENNY, PROTPARS, |
---|
1451 | DNAPARS, DNAMOVE, and DNAPENNY, do not use threshold values less |
---|
1452 | than or equal to 1.0, as they have no meaning and lead to a tree which |
---|
1453 | depends only on considerations such as the input order of species and not at |
---|
1454 | all on the character state data! In programs DOLLOP, DOLMOVE, and DOLPENNY |
---|
1455 | the threshold should never be 0.0 or less, for the same |
---|
1456 | reason. The <TT>T</TT> option is an |
---|
1457 | important and underutilized one: it is, for example, the only way in this |
---|
1458 | package (except for program DNACOMP) to do a compatibility analysis when there |
---|
1459 | are missing data. It is a method of de-weighting characters that evolve |
---|
1460 | rapidly. I wish more people were aware of its properties. |
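<P>
The effect of the threshold on the quantity being minimized can be illustrated with a few lines of Python. This is a sketch of the rule as stated above, with each character contributing the smaller of its step count and the threshold. The function name and the step counts are made up for the example.

```python
def thresholded_steps(steps_per_character, threshold=float("inf")):
    """Sum the steps over characters, counting no character above the threshold."""
    return sum(min(steps, threshold) for steps in steps_per_character)


if __name__ == "__main__":
    steps = [1, 2, 3, 5]                       # hypothetical step counts
    print(thresholded_steps(steps))            # default: plain step count
    print(thresholded_steps(steps, 2.5))       # rapidly evolving characters are capped
```

With the default (infinite) threshold the result is the ordinary parsimony count; with a lower threshold, the characters that require many steps contribute no more than the threshold, which is how they are de-weighted.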
---|
1461 | <P> |
---|
1462 | <B>The <TT>M</TT> (Multiple data sets) option.</B> In menu programs there is an |
---|
1463 | <TT>M</TT> menu |
---|
1464 | option which allows one to toggle on the multiple data sets option. The |
---|
1465 | program will ask you how many data sets it should expect. The data sets |
---|
1466 | have the same format as the first data set. Here is a (very small) input file |
---|
1467 | with two five-species data sets: |
---|
1468 | <P> |
---|
1469 | <TABLE><TR><TD bgcolor=white> |
---|
1470 | <PRE> |
---|
1471 | 5 6 |
---|
1472 | Alpha CCACCA |
---|
1473 | Beta CCAAAA |
---|
1474 | Gamma CAACCA |
---|
1475 | Delta AACAAC |
---|
1476 | Epsilon AACCCA |
---|
1477 | 5 6 |
---|
1478 | Alpha CACACA |
---|
1479 | Beta CCAACC |
---|
1480 | Gamma CAACAC |
---|
1481 | Delta GCCTGG |
---|
1482 | Epsilon TGCAAT |
---|
1483 | </PRE> |
---|
1484 | </TD></TR></TABLE> |
---|
1485 | <P> |
---|
1486 | The main use of this option will be to allow all of the methods in these |
---|
1487 | programs to be bootstrapped. Using the program SEQBOOT one can take any |
---|
1488 | DNA, protein, restriction sites, gene frequency or binary character data set and |
---|
1489 | make multiple data sets by bootstrapping. Trees can be produced for all of |
---|
1490 | these using the <TT>M</TT> option. They will be written on the tree output file if |
---|
1491 | that option is left in force. Then the program CONSENSE can be used with |
---|
1492 | that tree file as its input file. The result is a majority rule consensus |
---|
1493 | tree which can be used to make confidence intervals. The present version |
---|
1494 | of the package allows, with the use of SEQBOOT and CONSENSE and the M option, |
---|
1495 | bootstrapping of many of the methods in the package. |
---|
1496 | <P> |
---|
1497 | Programs DNAML, DNAPARS and PARS can also take multiple weights |
---|
1498 | instead of multiple data sets. They can then do bootstrapping by |
---|
1499 | reading in one data set, together with a file of weights that show how |
---|
1500 | the characters (or sites) are reweighted in each bootstrap sample. Thus a |
---|
1501 | site that is omitted in a bootstrap sample has effectively been given |
---|
1502 | weight 0, while a site that has been duplicated has effectively been |
---|
1503 | given weight 2. SEQBOOT has a menu selection to produce the file of |
---|
1504 | weights information automatically, instead of producing a file of |
---|
1505 | multiple data sets. |
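<P>
The correspondence between a bootstrap sample and a set of weights can be sketched in Python as follows. Sites are drawn with replacement, and the weight of each site is the number of times it was drawn. This only illustrates the idea; it is not SEQBOOT's code, and the function name <TT>bootstrap_weights</TT> is hypothetical.

```python
import random
from collections import Counter


def bootstrap_weights(n_sites, rng):
    """Draw n_sites sites with replacement and return how often each was drawn."""
    draws = Counter(rng.randrange(n_sites) for _ in range(n_sites))
    # A site never drawn gets weight 0; a site drawn twice gets weight 2.
    return [draws[site] for site in range(n_sites)]


if __name__ == "__main__":
    rng = random.Random(133)   # fixed seed so the run can be repeated
    print(bootstrap_weights(10, rng))
```

The weights always sum to the number of sites, since each draw adds one to exactly one site.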
---|
1506 | <P> |
---|
1507 | <B>The <TT>W</TT> (Weights) option</B>. This signals the program that, in |
---|
1508 | addition to the data set, you want to read in a series of weights that |
---|
1509 | tell how many times each character is to be counted. If the weight |
---|
1510 | for a character is zero (<TT>0</TT>) then that character is in effect to |
---|
1511 | be omitted when the tree is evaluated. If it is (<TT>1</TT>) the |
---|
1512 | character is to be counted once. Some programs allow weights greater than |
---|
1513 | 1 as well. These have the effect that the character is counted as |
---|
1514 | if it were present that many times, so that a weight of 4 means that the |
---|
1515 | character is counted 4 times. |
---|
1516 | The values 0-9 give weights 0 through 9, and the |
---|
1517 | values A-Z give weights 10 through 35. By use of the weights we can |
---|
1518 | give overwhelming weight to some characters, and drop others from the |
---|
1519 | analysis. In the molecular sequence programs only two values of the |
---|
1520 | weights, 0 or 1, are allowed. |
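<P>
The coding of weights as single characters can be expressed as a short Python sketch. The function name <TT>decode_weights</TT> is hypothetical; the mapping (0-9 for weights 0 through 9, A-Z for 10 through 35) is the one described above.

```python
def decode_weights(chars):
    """Translate a string of weight characters into integer weights."""
    weights = []
    for ch in chars:
        if "0" <= ch <= "9":
            weights.append(ord(ch) - ord("0"))        # 0-9 give 0 through 9
        elif "A" <= ch <= "Z":
            weights.append(ord(ch) - ord("A") + 10)   # A-Z give 10 through 35
        else:
            raise ValueError("not a weight character: %r" % ch)
    return weights


if __name__ == "__main__":
    print(decode_weights("09AZ"))
```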
---|
1521 | <P> |
---|
1522 | The weights are used to analyze subsets of the characters, and also can be |
---|
1523 | used for resampling of the data as in bootstrap and jackknife resampling. |
---|
1524 | For those programs that allow weights to be greater than 1, they can also |
---|
1525 | be used to emphasize information from some characters more strongly than |
---|
1526 | others. Of course, you must have some rationale for doing this. |
---|
1527 | <P> |
---|
1528 | The weights are provided as a sequence of digits. Thus they might be |
---|
1529 | <P> |
---|
1530 | <TT>10011111100010100011110001100</TT> |
---|
1531 | <P> |
---|
1532 | The weights are to be provided in an input file |
---|
1533 | whose default name is <TT>weights</TT>. In programs such as SEQBOOT |
---|
1534 | that can also output a file of weights, the input weights have a default |
---|
1535 | file name of <TT>inweights</TT>, and the output file name has a default |
---|
1536 | file name of <TT>outweights</TT>. |
---|
1537 | <P> |
---|
1538 | Weights can be used to analyze different subsets of characters (by weighting |
---|
1539 | the rest as zero). Alternatively, in the discrete characters programs |
---|
1540 | they can be used to force a certain |
---|
1541 | group to appear on the phylogeny (in effect confining consideration to only |
---|
1542 | phylogenies containing that group). This is done by adding an imaginary |
---|
1543 | character that has <TT>1</TT>'s for the members of the group, and <TT>0</TT>'s |
---|
1544 | for all the |
---|
1545 | other species. That imaginary character is then given the highest weight |
---|
1546 | possible: the result will be that any phylogeny that does not contain that |
---|
1547 | group will be penalized by such a heavy amount that it will not (except in |
---|
1548 | the most unusual circumstances) be considered. Of course, the new character |
---|
1549 | brings extra steps to the tree, but the number of these can be calculated |
---|
1550 | in advance and subtracted out of the total when reporting the results. This |
---|
1551 | use of weights is an important one, and one sadly ignored |
---|
1552 | by many users who could profit from it. In the case of molecular sequences |
---|
1553 | we cannot use weights this way, so that to force a given group to appear we |
---|
1554 | have to add a large extra segment of sites to the molecule, with (say) A's |
---|
1555 | for that group and C's for every other species. |
---|
1556 | <P> |
---|
1557 | <B>The option to write out the trees into a tree file</B>. This specifies that you |
---|
1558 | want the program to write |
---|
1559 | out the tree not only on its usual output, but also onto a file in |
---|
1560 | nested-parenthesis notation (as described above). This option is sufficiently |
---|
1561 | useful that it is turned on by default in all programs that allow it. You |
---|
1562 | can optionally turn it off if you wish, by typing the appropriate number |
---|
1563 | from the menu (it varies from program to program). This option is useful for |
---|
1564 | creating tree files that can be directly read into the programs, including |
---|
1565 | the consensus tree and tree distance programs, and the tree plotting programs. |
---|
1566 | <P> |
---|
1567 | The output tree file has a default name of <TT>outtree</TT>. |
---|
1568 | <P> |
---|
1569 | <B>The (<TT>0</TT>) terminal type option</B>. (This is the digit <TT>0</TT>, not |
---|
1570 | the alphabetic character <TT>O</TT>). The program will default to |
---|
1571 | one particular assumption about your terminal (except in the case of |
---|
1572 | Macintoshes, the default will be an ANSI compatible terminal). You can |
---|
1573 | alternatively select it to be either an IBM PC, or nothing. |
---|
1574 | This affects the ability of the programs to clear the screen when they |
---|
1575 | display their menus, and the graphics characters used to display trees |
---|
1576 | in the programs DNAMOVE, MOVE, DOLMOVE, and RETREE. If you are running an |
---|
1577 | MSDOS system and have the ANSI.SYS driver installed in your CONFIG.SYS |
---|
1578 | file, you may find that the screen clears correctly even with the default |
---|
1579 | setting of ANSI. |
---|
1580 | <P> |
---|
1581 | <A NAME="algorithm"><HR><P></A> |
---|
1582 | <DIV ALIGN="CENTER"> |
---|
1583 | <H2>The Algorithm for Constructing Trees</H2></DIV> |
---|
1584 | <P> |
---|
1585 | All of the programs except FACTOR, DNADIST, GENDIST, DNAINVAR, SEQBOOT, |
---|
1586 | CONTRAST, RETREE, and the plotting and |
---|
1587 | consensus tree programs act to construct an estimate of a phylogeny. MOVE, |
---|
1588 | DOLMOVE, and DNAMOVE let you construct it yourself by hand. All of |
---|
1589 | the rest but NEIGHBOR, the "...PENNY" programs and CLIQUE make use of |
---|
1590 | a common approach involving additions and rearrangements. They are |
---|
1591 | trying to minimize or maximize some quantity over the space of all |
---|
1592 | possible evolutionary trees. Each program contains a part that, given |
---|
1593 | the topology of the tree, evaluates the quantity that is being minimized |
---|
1594 | or maximized. The straightforward approach would be to evaluate all |
---|
1595 | possible tree topologies one after another and pick the one which, |
---|
1596 | according to the criterion being used, is best. This would not be |
---|
1597 | possible for more than a small number of species, since the number of |
---|
1598 | possible tree topologies is enormous. A review of the literature on the |
---|
1599 | counting of evolutionary trees will be found in one of my papers |
---|
1600 | (Felsenstein, 1978a). |
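<P>
To give a feeling for how quickly the number of topologies grows, the following Python sketch counts rooted bifurcating trees with labelled tips. It uses the standard result that adding the k-th species to a rooted bifurcating tree can be done in 2k-3 ways, so the count is the product 3 x 5 x ... x (2n-3). The function name is hypothetical, and the formula is quoted here as a well-known result rather than taken from this document.

```python
def rooted_bifurcating_topologies(n_species):
    """Number of rooted bifurcating trees with n labelled tips."""
    count = 1
    for k in range(3, n_species + 1):
        count *= 2 * k - 3   # places where the k-th species can be attached
    return count


if __name__ == "__main__":
    for n in (3, 4, 5, 10, 20):
        print(n, rooted_bifurcating_topologies(n))
```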
---|
1601 | <P> |
---|
1602 | Since we cannot search all topologies, these programs are not |
---|
1603 | guaranteed to always find the best tree, although they seem to do quite |
---|
1604 | well in practice. The strategy they employ is as follows: the species |
---|
1605 | are taken in the order in which they appear in the input file. The |
---|
1606 | first two (in some programs the first three) are taken and a tree |
---|
1607 | constructed containing only those. There is only one possible topology for |
---|
1608 | this tree. Then the next species is taken, and we consider where it |
---|
1609 | might be added to the tree. If the initial tree is (say) a rooted tree |
---|
1610 | with two species and we want the resulting three-species tree to be a |
---|
1611 | bifurcating tree, there are only three places where we could add the |
---|
1612 | third species. Each of these is tried, and each time the resulting tree is |
---|
1613 | evaluated according to the criterion. The best one is chosen to be the |
---|
1614 | basis for further operations. Now we consider adding the fourth |
---|
1615 | species, again at each of the five possible places that would result in |
---|
1616 | a bifurcating tree. Again, the best of these is accepted. |
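<P>
The addition step can be sketched in Python for rooted bifurcating trees written as nested pairs. The function <TT>insertions</TT> lists every place a new species can be attached, and <TT>add_sequentially</TT> keeps the best one each time. Both names are hypothetical, <TT>score</TT> stands for whatever criterion the program minimizes, and the rearrangement phase is left out of this sketch.

```python
def insertions(tree, new_species):
    """Yield every tree obtained by attaching new_species to one branch of tree."""
    # Attach on the branch leading to this subtree (at the top, the root branch).
    yield (tree, new_species)
    if isinstance(tree, tuple):
        left, right = tree
        for grown in insertions(left, new_species):
            yield (grown, right)
        for grown in insertions(right, new_species):
            yield (left, grown)


def add_sequentially(species, score):
    """Add species in input order, keeping the best-scoring placement each time."""
    tree = (species[0], species[1])   # only one topology for two species
    for name in species[2:]:
        tree = min(insertions(tree, name), key=score)
    return tree


if __name__ == "__main__":
    print(len(list(insertions(("A", "B"), "C"))))          # places for the third species
    print(len(list(insertions((("A", "B"), "C"), "D"))))   # places for the fourth species
```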
---|
1617 | <P> |
---|
1618 | <H3>Local Rearrangements</H3> |
---|
1619 | <P> |
---|
1620 | The process continues in this manner, with one important exception. After |
---|
1621 | each species is added, and before the next |
---|
1622 | is added, a number of rearrangements of the tree are tried, in an effort |
---|
1623 | to improve it. The algorithms move through the tree, making all |
---|
1624 | possible local rearrangements of the tree. A local rearrangement involves an |
---|
1625 | internal segment of the tree in the following manner. Each internal |
---|
1626 | segment of the tree is of this form (where T1, T2, and T3 are subtrees |
---|
1627 | - parts of the tree that can contain further forks and tips): |
---|
1628 | <P> |
---|
1629 | <PRE> |
---|
1630 | T1 T2 T3 |
---|
1631 | \ / / |
---|
1632 | \ / / |
---|
1633 | \ / / |
---|
1634 | \/ / |
---|
1635 | * / |
---|
1636 | * / |
---|
1637 | * / |
---|
1638 | * / |
---|
1639 | * |
---|
1640 | ! |
---|
1641 | ! |
---|
1642 | </PRE> |
---|
1643 | <P> |
---|
1644 | the segment we are discussing being indicated by the asterisks. A local |
---|
1645 | rearrangement consists of switching the subtrees T1 and T3 or T2 and T3, |
---|
1646 | so as to obtain one of the following: |
---|
1647 | <P> |
---|
1648 | <PRE> |
---|
1649 | T3       T2       T1          T1       T3       T2 |
---|
1650 |  \       /       /             \       /       / |
---|
1651 |   \     /       /               \     /       / |
---|
1652 |    \   /       /                 \   /       / |
---|
1653 |     \ /       /                   \ /       / |
---|
1654 |      \       /                     \       / |
---|
1655 |       \     /                       \     / |
---|
1656 |        \   /                         \   / |
---|
1657 |         \ /                           \ / |
---|
1658 |          !                             ! |
---|
1659 |          !                             ! |
---|
1660 |          !                             ! |
---|
1661 | </PRE> |
---|
1662 | <P> |
---|
1663 | Each time a local rearrangement is successful in finding a better tree, |
---|
1664 | the new arrangement is accepted. The phase of local rearrangements does |
---|
1665 | not end until the program can traverse the entire tree, attempting local |
---|
1666 | rearrangements, without finding any that improve the tree. |
---|
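<P>
The rearrangement around a single internal segment can be sketched as follows. This is a minimal sketch, not the PHYLIP implementation: it treats only one segment written as <TT>((T1, T2), T3)</TT> rather than traversing a whole tree, and the function names and the toy <TT>score</TT> function (lower is better) are hypothetical.

```python
def local_rearrangements(segment):
    """Return the two neighbours of the segment ((T1, T2), T3)."""
    (t1, t2), t3 = segment
    # Switch T1 with T3, or T2 with T3.
    return [((t3, t2), t1), ((t1, t3), t2)]


def improve(segment, score):
    """Accept a rearrangement only when it is strictly better."""
    best = segment
    improved = True
    while improved:
        improved = False
        for candidate in local_rearrangements(best):
            if score(candidate) < score(best):
                best, improved = candidate, True
                break
    return best


# Toy criterion (an assumption): grouping T1 with T3 is best.
def toy_score(segment):
    return 0 if set(segment[0]) == {"T1", "T3"} else 1


result = improve((("T1", "T2"), "T3"), toy_score)
print(result)  # prints: (('T1', 'T3'), 'T2')
```

Because a candidate must be strictly better to be accepted, the loop is guaranteed to stop.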
1667 | <P> |
---|
1668 | This strategy of adding species and making local rearrangements will look |
---|
1669 | at about (n-1)x(2n-3) different topologies, though if |
---|
1670 | rearrangements are frequently successful the number may be larger. I |
---|
1671 | have been describing the strategy when rooted trees are being |
---|
1672 | considered. For unrooted trees there is a precisely similar strategy, |
---|
1673 | though the first tree constructed may be a three-species tree and the |
---|
1674 | rearrangements may not start until after the addition of the fifth |
---|
1675 | species. |
---|
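<P>
To give a feeling for how small a fraction of all trees this strategy looks at, the following sketch compares the figure (n-1)x(2n-3) with the total number of rooted bifurcating trees on n labelled species. That total, 1 x 3 x 5 x ... x (2n-3), is a standard result that is not derived in this section, and the function names are chosen only for this example.

```python
def topologies_examined(n):
    """Approximate number of topologies examined, from the text."""
    return (n - 1) * (2 * n - 3)


def rooted_bifurcating_trees(n):
    """Number of rooted bifurcating trees: 1 x 3 x 5 x ... x (2n-3)."""
    count = 1
    for k in range(3, 2 * n - 2, 2):  # 3, 5, ..., 2n-3
        count *= k
    return count


print(topologies_examined(10))       # prints: 153
print(rooted_bifurcating_trees(10))  # prints: 34459425
```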
1676 | <P> |
---|
1677 | Though we are not guaranteed to have found the best tree topology, |
---|
1678 | we are guaranteed that no nearby topology (i.e., none accessible by a |
---|
1679 | single local rearrangement) is better. In this sense we have reached a |
---|
1680 | local optimum of our criterion. Note that the whole process is |
---|
1681 | dependent on the order in which the species are present in the input |
---|
1682 | file. We can try to find a different and better solution by reordering |
---|
1683 | the species in the input file and running the program again (or, more |
---|
1684 | easily, by using the <TT>J</TT> option). If none of |
---|
1685 | these attempts finds a better solution, then we have some indication |
---|
1686 | that we may have found the best topology, though we can never be certain |
---|
1687 | of this. |
---|
1688 | <P> |
---|
1689 | Note also that a new topology is never accepted unless it is better |
---|
1690 | than the previous one, so that the rearrangement process can never fall |
---|
1691 | into an endless loop. This is also the way ties in our criterion are |
---|
1692 | resolved, namely by sticking with the tree found first. However, the tree |
---|
1693 | construction programs other than CLIQUE, CONTML, FITCH, |
---|
1694 | and DNAML do keep a record of all trees found that are tied with the best one |
---|
1695 | found. This gives you some immediate idea of which parts of the tree can be |
---|
1696 | altered without affecting the quality of the result. |
---|
1697 | <P> |
---|
1698 | |
---|
1699 | <H3>Global Rearrangements</H3> |
---|
1700 | <P> |
---|
1701 | A feature of most of the programs, such as PROTPARS, DNAPARS, |
---|
1702 | DNACOMP, DNAML, DNAMLK, RESTML, KITSCH, FITCH, CONTML, MIX, and DOLLOP, |
---|
1703 | is "global" optimization of the tree. In four of these (CONTML, |
---|
1704 | FITCH, DNAML and DNAMLK) this is an option, <TT>G</TT>. In the others it |
---|
1705 | automatically applies. When |
---|
1706 | it is present there is an additional stage to the search for the best tree. |
---|
1707 | Each possible subtree is removed from the tree and added back in |
---|
1708 | all possible places. This process continues until all subtrees can be removed |
---|
1709 | and added again without any improvement in the tree. The purpose of this |
---|
1710 | extra rearrangement is to make it less likely that one or more species get |
---|
1711 | "stuck" in a suboptimal region of the space of all possible trees. The use of |
---|
1712 | global optimization results in approximately a tripling (3 x) of the run-time, |
---|
1713 | which is why I have left it as an option in some of the slower programs. |
---|
1714 | <P> |
---|
1715 | What PHYLIP calls "global" rearrangements are more properly called |
---|
1716 | SPR (subtree pruning and regrafting) by Swofford et al. (1996) as distinct |
---|
1717 | from the NNI (nearest neighbor interchange) rearrangements that PHYLIP |
---|
1718 | also uses, and the TBR (tree bisection and reconnection) rearrangements |
---|
1719 | that it does not use. |
---|
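<P>
A global (SPR) rearrangement step can be sketched as follows. This is only an illustration on rooted trees written as nested pairs, not the code used in the programs, and all of the function names are assumptions. Each subtree is cut off in turn and regrafted onto every branch of what remains, including its original place.

```python
def canon(tree):
    """Order the children of every fork so equal topologies compare equal."""
    if isinstance(tree, tuple):
        return tuple(sorted((canon(t) for t in tree), key=repr))
    return tree


def prunings(tree):
    """Yield (subtree, rest) for every subtree that can be cut off."""
    if not isinstance(tree, tuple):
        return
    left, right = tree
    yield left, right
    yield right, left
    for sub, rest in prunings(left):
        yield sub, (rest, right)
    for sub, rest in prunings(right):
        yield sub, (left, rest)


def regraftings(tree, sub):
    """Yield every tree made by attaching `sub` to a branch of `tree`."""
    yield (tree, sub)
    if isinstance(tree, tuple):
        left, right = tree
        for t in regraftings(left, sub):
            yield (t, right)
        for t in regraftings(right, sub):
            yield (left, t)


def global_rearrangements(tree):
    """All distinct topologies reachable by one prune-and-regraft move."""
    return {canon(new)
            for sub, rest in prunings(tree)
            for new in regraftings(rest, sub)}


start = (("A", "B"), "C")
neighbours = global_rearrangements(start)
print(len(neighbours))  # prints: 3
```

With three species every rooted topology is reachable in one move, so the set has three members, one of which is the starting tree itself.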
1720 | <P> |
---|
1721 | The programs doing global optimization print out a dot "<TT>.</TT>" after each group is |
---|
1722 | removed and re-added to the tree, to give the user some sign that the |
---|
1723 | rearrangements are proceeding. A new line of dots is started whenever a new |
---|
1724 | round of global rearrangements is started following an improvement in the |
---|
1725 | tree. On the line before the dots are printed there is printed a bar of |
---|
1726 | the form "!---------------!" to show how many dots |
---|
1727 | to expect. The dots will |
---|
1728 | not be printed out at a uniform rate, but the later dots, which represent |
---|
1729 | removal of larger groups from the tree and trying them consequently in fewer |
---|
1730 | places, will print out more quickly. With some compilers each row of dots may |
---|
1731 | not be printed out until it is complete. |
---|
1732 | <P> |
---|
1733 | It should be noted that PENNY, DOLPENNY, DNAPENNY and CLIQUE use a more |
---|
1734 | sophisticated strategy of "depth-first search" with a "branch and bound" |
---|
1735 | search method that guarantees that all |
---|
1736 | of the best trees will be found. In the case |
---|
1737 | of PENNY, DOLPENNY and DNAPENNY there can be a considerable sacrifice of |
---|
1738 | computer time if the number of species is greater than about ten: it is a |
---|
1739 | matter for you to consider whether it is worth it for you to guarantee finding |
---|
1740 | all the most parsimonious trees, and that depends on how much free computer |
---|
1741 | time you have! CLIQUE finds all largest cliques, and does so without undue |
---|
1742 | burning of computer time. Although all of these problems that have been |
---|
1743 | investigated fall into the |
---|
1744 | category of "NP-hard" problems that in effect do not have a rapid solution, |
---|
1745 | the cases that cause this trouble for the largest-cliques algorithm in |
---|
1746 | CLIQUE apparently are not biologically realistic and do not occur in actual |
---|
1747 | data. |
---|
1748 | <P> |
---|
1749 | |
---|
1750 | <H3>Multiple Jumbles</H3> |
---|
1751 | <P> |
---|
1752 | As just mentioned, for most of these programs the search depends on the order |
---|
1753 | in which the species are entered into the tree. Using the <TT>J</TT> (Jumble) |
---|
1754 | option you can supply a random number seed which will allow the program to put |
---|
1755 | the species in a random order. Jumbling can be |
---|
1756 | done multiple times. For example, if you tell the program to do it |
---|
1757 | 10 times, it will go through the tree-building process 10 times, each with a |
---|
1758 | different random order of adding species. It will keep a record of the trees |
---|
1759 | tied for best over the whole process. In other words, it does not just |
---|
1760 | record the best trees from each of the 10 runs, but records the best ones |
---|
1761 | overall. Of course this is slow, taking 10 times longer than a single run. |
---|
1762 | But it does give us a much greater chance of finding all of the most |
---|
1763 | parsimonious trees. In the terminology of Maddison (1991) it |
---|
1764 | can find different "islands" of trees. The present algorithms do not |
---|
1765 | guarantee that we will find all trees in a given "island" from a single run, so |
---|
1766 | multiple runs also help explore those "islands" that are found. |
---|
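<P>
The bookkeeping described above, keeping the trees tied for best over all of the jumbled runs rather than the best of each run, can be sketched like this. The callbacks <TT>build_tree</TT> and <TT>score</TT> are hypothetical stand-ins for a program's search and criterion (lower is better here), and the toy versions at the end exist only so that the sketch runs.

```python
import random


def jumble(species, build_tree, score, times, seed):
    """Run the search `times` times with random addition orders."""
    rng = random.Random(seed)
    best_score, best_trees = None, []
    for _ in range(times):
        order = list(species)
        rng.shuffle(order)
        tree = build_tree(order)
        value = score(tree)
        if best_score is None or value < best_score:
            # Strictly better: discard everything found so far.
            best_score, best_trees = value, [tree]
        elif value == best_score and tree not in best_trees:
            # Tied with the best overall: keep it as well.
            best_trees.append(tree)
    return best_score, best_trees


# Toy stand-ins (assumptions): the "tree" is just the addition order,
# and its "score" is the position at which species A was added.
def toy_score(tree):
    return tree.index("A")


best, tied = jumble(["A", "B", "C", "D"], tuple, toy_score, times=10, seed=4)
```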
1767 | <P> |
---|
1768 | <H3>Saving multiple tied trees</H3> |
---|
1769 | <P> |
---|
1770 | For the parsimony and compatibility programs, one can have a perfect tie |
---|
1771 | between two or more trees. In these programs these trees are all |
---|
1772 | saved. For the newer parsimony programs such as DNAPARS and PARS, |
---|
1773 | global rearrangement is carried out on all of these tied trees. This can |
---|
1774 | be turned off in the menu. |
---|
1775 | <P> |
---|
1776 | For trees with criteria which are real numbers, such as the distance |
---|
1777 | matrix programs FITCH and KITSCH, and the likelihood programs DNAML, |
---|
1778 | DNAMLK, CONTML, and RESTML, it is difficult to get an exact tie between |
---|
1779 | trees. Consequently these programs save only the single best tree |
---|
1780 | (even though the others may be only a tiny bit worse). |
---|
1781 | <P> |
---|
1782 | <H3>Strategy for Finding the Best Tree</H3> |
---|
1783 | <P> |
---|
1784 | In practice, <I>it is advisable to use the Jumble option and specify that |
---|
1785 | it be done many times (as many as ten)</I>, so that many different orderings |
---|
1786 | of the input species are evaluated. |
---|
1789 | <P> |
---|
1790 | People who want a magic "black box" program whose results they do |
---|
1791 | not have to question (or think about) often are upset that these |
---|
1792 | programs give results that are dependent on the order in which the species |
---|
1793 | are entered in the data. To me this property is an advantage, for it |
---|
1794 | permits you to try different searches for better trees, simply by |
---|
1795 | varying the input order of species. If you do not use the multiple Jumble |
---|
1796 | option, but do multiple individual runs instead, you |
---|
1797 | can easily decide which to pay most attention to - the one or ones that |
---|
1798 | are best according to the criterion employed (for example, with parsimony, |
---|
1799 | the one out of the runs that results in the tree with the fewest changes). |
---|
1800 | <P> |
---|
1801 | In practice, in a single run, it usually seems best to put species that are |
---|
1802 | likely to be sources of confusion in the topology last, as by the time they are |
---|
1803 | added the arrangement of the earlier species will have stabilized into a |
---|
1804 | good configuration, and then the last few species will be fitted into |
---|
1805 | that topology. There will be less chance this way of a poor initial |
---|
1806 | topology that would affect all subsequent parts of the search. However, |
---|
1807 | a variety of arrangements of the input order of species should be tried, |
---|
1808 | as can be done if the <TT>J</TT> option is used, |
---|
1809 | and no species should be kept in a fixed place in the order of input. |
---|
1810 | Note that the results of the "...PENNY" programs and CLIQUE |
---|
1811 | are not sensitive to the input order of species, and NEIGHBOR is only |
---|
1812 | slightly sensitive to it, so that multiple Jumbling is not possible |
---|
1813 | with those programs. Note also that with global search, which |
---|
1814 | is standard in many programs and in others is an |
---|
1815 | option, each group (including |
---|
1816 | each individual species) will be removed and re-added in all possible |
---|
1817 | positions, so that a species causing confusion will have more chance of moving |
---|
1818 | to a new location than it would without global rearrangement. |
---|
1819 | <P> |
---|
1820 | <A NAME="warning"><HR><P></A> |
---|
1821 | <DIV ALIGN="CENTER"> |
---|
1822 | <H2>A Warning on Interpreting Results</H2></DIV> |
---|
1823 | <P> |
---|
1824 | Probably the most important thing to keep in mind while running any of the |
---|
1825 | parsimony or compatibility programs is not |
---|
1826 | to overinterpret the result. Many users treat the set of most parsimonious |
---|
1827 | trees as if it were a confidence interval. If a group appears in all of the |
---|
1828 | most parsimonious trees then they treat it as well established. Unfortunately |
---|
1829 | <I>the confidence interval on phylogenies appears to be much |
---|
1830 | larger than the set of all most parsimonious trees</I> (Felsenstein, 1985b). |
---|
1831 | Likewise, variation of result among different methods will not be a good |
---|
1832 | indicator of the size of the confidence interval. Consider a simple data set |
---|
1833 | in which, out of 100 binary characters, 51 recommend the unrooted tree |
---|
1834 | <TT>((A,B),(C,D))</TT> and 49 the tree <TT>((A,D),(B,C))</TT>. Many different |
---|
1835 | methods will all give the same result on |
---|
1836 | such a data set: they will estimate the tree as <TT>((A,B),(C,D))</TT>. |
---|
1837 | Nevertheless it is |
---|
1838 | clear that the 51:49 margin by which this tree is favored is not statistically |
---|
1839 | significantly different from 50:50. So <I>consistency among different methods |
---|
1840 | is a poor guide to statistical significance</I>. |
---|
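<P>
The 51:49 example can be checked with an exact two-sided binomial test against a 50:50 split. This sketch assumes the 100 characters are independent, which is a simplification made for illustration, and the function name is hypothetical.

```python
from math import comb


def two_sided_p(successes, n):
    """Exact two-sided binomial test against a 50:50 split."""
    deviation = abs(successes - n / 2)
    extreme = sum(comb(n, k) for k in range(n + 1)
                  if abs(k - n / 2) >= deviation)
    return extreme / 2 ** n


p = two_sided_p(51, 100)
print(round(p, 2))  # prints: 0.92
```

A split at least as uneven as 51:49 is thus expected about 92% of the time even when the two trees are equally supported.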
1841 | <P> |
---|
1842 | <A NAME="speed"><HR><P></A> |
---|
1843 | <DIV ALIGN="CENTER"> |
---|
1844 | <H2>Relative Speed of Different<BR> |
---|
1845 | Programs and Machines</H2></DIV> |
---|
1846 | <P> |
---|
1847 | <H3>Relative speed of the different programs</H3> |
---|
1848 | <P> |
---|
1849 | C compilers differ in efficiency of the code they generate, |
---|
1850 | and some deal with some features of the language better than with |
---|
1851 | others. Thus a program which is unusually fast on one computer may be |
---|
1852 | unusually slow on another. Nevertheless, as a rough guide to relative |
---|
1853 | execution speeds, I have tested the programs on three data sets, each of |
---|
1854 | which has 10 species and 40 characters. The first is an imaginary one |
---|
1855 | in which all characters are compatible ("The Willi Hennig Memorial |
---|
1856 | Data Set" as J. S. Farris once called ones like it). The second is the binary |
---|
1857 | recoded form of the fossil horses data set of Camin and Sokal (1965). |
---|
1858 | The third data set has data that is completely random: 10 species and 20 |
---|
1859 | characters that have a 50% chance that each character state is <TT>0</TT> or |
---|
1860 | <TT>1</TT> (or <TT>A</TT> or <TT>C</TT>). The data sets thus range from a completely |
---|
1861 | compatible one in which there is no homoplasy (parallelism or convergence), |
---|
1862 | through the horses data set, which requires 29 steps where the possible |
---|
1863 | minimum number would be 20, to the random data set, which requires 49 steps. |
---|
1864 | We can thus see how this increasing messiness of the data affects running |
---|
1865 | times. The three data sets have all had 20 sites of <TT>A</TT>'s added to the |
---|
1866 | end of each sequence, so as to prevent likelihood or distance matrix programs |
---|
1867 | from having infinite branch lengths (the test data sets used for timing |
---|
1868 | previous versions of PHYLIP were the same except that they lacked these |
---|
1869 | 20 extra sites). |
---|
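<P>
A data set of the third kind (20 random two-state sites followed by 20 constant <TT>A</TT> sites) could be generated along the following lines. This is a sketch under stated assumptions: the function name, the seed, and the random number generator are not those used to make the data set shown below, so it will not reproduce that data set exactly. Species names are padded to ten characters, as the sequence programs expect.

```python
import random


def random_data_set(n_species=10, n_random=20, n_padding=20, seed=1):
    """Return a random nucleotide data set in PHYLIP sequential format."""
    rng = random.Random(seed)
    lines = [f"{n_species:5d}{n_random + n_padding:5d}"]
    for i in range(n_species):
        name = chr(ord("A") + i)  # species are named A, B, C, ...
        sites = "".join(rng.choice("AC") for _ in range(n_random))
        # Name padded to 10 characters, then random sites, then padding.
        lines.append(name.ljust(10) + sites + "A" * n_padding)
    return "\n".join(lines)


print(random_data_set())
```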
1870 | <P> |
---|
1871 | Here are the nucleotide sequence versions of the three data sets: |
---|
1872 | <P> |
---|
1873 | <TABLE><TR><TD BGCOLOR=white> |
---|
1874 | <PRE> |
---|
1875 | 10 40 |
---|
1876 | A         CACACACAAAAAAAAAAACAAAAAAAAAAAAAAAAAAAAA |
---|
1877 | B         CACACAACAAAAAAAAAACAAAAAAAAAAAAAAAAAAAAA |
---|
1878 | C         CACAACAAAAAAAAAAAACAAAAAAAAAAAAAAAAAAAAA |
---|
1879 | D         CAACAAAACAAAAAAAAACAAAAAAAAAAAAAAAAAAAAA |
---|
1880 | E         CAACAAAAACAAAAAAAACAAAAAAAAAAAAAAAAAAAAA |
---|
1881 | F         ACAAAAAAAACACACAAAACAAAAAAAAAAAAAAAAAAAA |
---|
1882 | G         ACAAAAAAAACACAACAAACAAAAAAAAAAAAAAAAAAAA |
---|
1883 | H         ACAAAAAAAACAACAAAAACAAAAAAAAAAAAAAAAAAAA |
---|
1884 | I         ACAAAAAAAAACAAAACAACAAAAAAAAAAAAAAAAAAAA |
---|
1885 | J         ACAAAAAAAAACAAAAACACAAAAAAAAAAAAAAAAAAAA |
---|
1886 | </PRE> |
---|
1887 | </TD></TR></TABLE> |
---|
1888 | <P> |
---|
1889 | <TABLE><TR><TD BGCOLOR=white> |
---|
1890 | <PRE> |
---|
1891 | 10 40 |
---|
1892 | MesohippusAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA |
---|
1893 | HypohippusAAACCCCCCCAAAAAAAAACAAAAAAAAAAAAAAAAAAAA |
---|
1894 | ArchaeohipCAAAAAAAAAAAAAAAACACAAAAAAAAAAAAAAAAAAAA |
---|
1895 | ParahippusCAAACAACAACAAAAAAAACAAAAAAAAAAAAAAAAAAAA |
---|
1896 | MerychippuCCAACCACCACCCCACACCCAAAAAAAAAAAAAAAAAAAA |
---|
1897 | M. secunduCCAACCACCACCCACACCCCAAAAAAAAAAAAAAAAAAAA |
---|
1898 | Nannipus  CCAACCACAACCCCACACCCAAAAAAAAAAAAAAAAAAAA |
---|
1899 | NeohippariCCAACCCCCCCCCCACACCCAAAAAAAAAAAAAAAAAAAA |
---|
1900 | Calippus  CCAACCACAACCCACACCCCAAAAAAAAAAAAAAAAAAAA |
---|
1901 | PliohippusCCCACCCCCCCCCACACCCCAAAAAAAAAAAAAAAAAAAA |
---|
1902 | </PRE> |
---|
1903 | </TD></TR></TABLE> |
---|
1904 | <P> |
---|
1905 | <TABLE><TR><TD BGCOLOR=white> |
---|
1906 | <PRE> |
---|
1907 | 10 40 |
---|
1908 | A         CACACAACCAAACAAACCACAAAAAAAAAAAAAAAAAAAA |
---|
1909 | B         AAACCACACACACAAACCCAAAAAAAAAAAAAAAAAAAAA |
---|
1910 | C         ACAAAACCAAACCACCCACAAAAAAAAAAAAAAAAAAAAA |
---|
1911 | D         AAAAACACAACACACCAAACAAAAAAAAAAAAAAAAAAAA |
---|
1912 | E         AAACAACCACACACAACCAAAAAAAAAAAAAAAAAAAAAA |
---|
1913 | F         CCCAAACACCCCCAAAAAACAAAAAAAAAAAAAAAAAAAA |
---|
1914 | G         ACACCCCCACACCCACCAACAAAAAAAAAAAAAAAAAAAA |
---|
1915 | H         AAAACAACAACCACCCCACCAAAAAAAAAAAAAAAAAAAA |
---|
1916 | I         ACACAACAACACAAACAACCAAAAAAAAAAAAAAAAAAAA |
---|
1917 | J         CCAAAAACACCCAACCCAACAAAAAAAAAAAAAAAAAAAA |
---|
1918 | </PRE> |
---|
1919 | </TD></TR></TABLE> |
---|
1920 | <P> |
---|
1921 | Here are the timings of many of the version 3.6 programs on these three data |
---|
1922 | sets as run after being compiled by Gnu C and run on a |
---|
1923 | 266 MHz Pentium MMX computer under Linux. |
---|
1924 | <P> |
---|
1925 | <DIV ALIGN="CENTER"> |
---|
1926 | <TABLE CELLPADDING=3 BORDER="1"> |
---|
1927 | <TR><TD ALIGN="LEFT"> </TD> |
---|
1928 | <TD ALIGN="RIGHT">Hennigian Data</TD> |
---|
1929 | <TD ALIGN="RIGHT">Horses Data</TD> |
---|
1930 | <TD ALIGN="RIGHT">Random Data</TD> |
---|
1931 | </TR> |
---|
1932 | <TR><TD ALIGN="LEFT">PROTPARS</TD> |
---|
1933 | <TD ALIGN="RIGHT">0.133</TD> |
---|
1934 | <TD ALIGN="RIGHT">0.167</TD> |
---|
1935 | <TD ALIGN="RIGHT">0.308</TD> |
---|
1936 | </TR> |
---|
1937 | <TR><TD ALIGN="LEFT">DNAPARS</TD> |
---|
1938 | <TD ALIGN="RIGHT">0.163</TD> |
---|
1939 | <TD ALIGN="RIGHT">0.191</TD> |
---|
1940 | <TD ALIGN="RIGHT">0.573</TD> |
---|
1941 | </TR> |
---|
1942 | <TR><TD ALIGN="LEFT">DNAPENNY</TD> |
---|
1943 | <TD ALIGN="RIGHT">0.300</TD> |
---|
1944 | <TD ALIGN="RIGHT">0.196</TD> |
---|
1945 | <TD ALIGN="RIGHT">36.68</TD> |
---|
1946 | </TR> |
---|
1947 | <TR><TD ALIGN="LEFT">DNACOMP</TD> |
---|
1948 | <TD ALIGN="RIGHT">0.081</TD> |
---|
1949 | <TD ALIGN="RIGHT">0.073</TD> |
---|
1950 | <TD ALIGN="RIGHT">0.127</TD> |
---|
1951 | </TR> |
---|
1952 | <TR><TD ALIGN="LEFT">DNAML</TD> |
---|
1953 | <TD ALIGN="RIGHT">2.19</TD> |
---|
1954 | <TD ALIGN="RIGHT">2.53</TD> |
---|
1955 | <TD ALIGN="RIGHT">2.73</TD> |
---|
1956 | </TR> |
---|
1957 | <TR><TD ALIGN="LEFT">DNAMLK</TD> |
---|
1958 | <TD ALIGN="RIGHT">5.40</TD> |
---|
1959 | <TD ALIGN="RIGHT">6.13</TD> |
---|
1960 | <TD ALIGN="RIGHT">7.21</TD> |
---|
1961 | </TR> |
---|
1962 | <TR><TD ALIGN="LEFT">PROML</TD> |
---|
1963 | <TD ALIGN="RIGHT">44.79</TD> |
---|
1964 | <TD ALIGN="RIGHT">90.46</TD> |
---|
1965 | <TD ALIGN="RIGHT">68.49</TD> |
---|
1966 | </TR> |
---|
1967 | <TR><TD ALIGN="LEFT">PROMLK</TD> |
---|
1968 | <TD ALIGN="RIGHT">171.01</TD> |
---|
1969 | <TD ALIGN="RIGHT">183.61</TD> |
---|
1970 | <TD ALIGN="RIGHT">239.34</TD> |
---|
1971 | </TR> |
---|
1977 | <TR><TD ALIGN="LEFT">DNAINVAR</TD> |
---|
1978 | <TD ALIGN="RIGHT">0.002</TD> |
---|
1979 | <TD ALIGN="RIGHT">0.002</TD> |
---|
1980 | <TD ALIGN="RIGHT">0.002</TD> |
---|
1981 | </TR> |
---|
1982 | <TR><TD ALIGN="LEFT">DNADIST</TD> |
---|
1983 | <TD ALIGN="RIGHT">0.029</TD> |
---|
1984 | <TD ALIGN="RIGHT">0.024</TD> |
---|
1985 | <TD ALIGN="RIGHT">0.033</TD> |
---|
1986 | </TR> |
---|
1987 | <TR><TD ALIGN="LEFT">PROTDIST</TD> |
---|
1988 | <TD ALIGN="RIGHT">1.095</TD> |
---|
1989 | <TD ALIGN="RIGHT">1.089</TD> |
---|
1990 | <TD ALIGN="RIGHT">1.107</TD> |
---|
1991 | </TR> |
---|
1992 | <TR><TD ALIGN="LEFT">RESTML</TD> |
---|
1993 | <TD ALIGN="RIGHT">3.55</TD> |
---|
1994 | <TD ALIGN="RIGHT">3.18</TD> |
---|
1995 | <TD ALIGN="RIGHT">5.15</TD> |
---|
1996 | </TR> |
---|
1997 | <TR><TD ALIGN="LEFT">RESTDIST</TD> |
---|
1998 | <TD ALIGN="RIGHT">0.012</TD> |
---|
1999 | <TD ALIGN="RIGHT">0.010</TD> |
---|
2000 | <TD ALIGN="RIGHT">0.010</TD> |
---|
2001 | </TR> |
---|
2002 | <TR><TD ALIGN="LEFT">FITCH</TD> |
---|
2003 | <TD ALIGN="RIGHT">0.20</TD> |
---|
2004 | <TD ALIGN="RIGHT">0.31</TD> |
---|
2005 | <TD ALIGN="RIGHT">0.24</TD> |
---|
2006 | </TR> |
---|
2007 | <TR><TD ALIGN="LEFT">KITSCH</TD> |
---|
2008 | <TD ALIGN="RIGHT">0.055</TD> |
---|
2009 | <TD ALIGN="RIGHT">0.061</TD> |
---|
2010 | <TD ALIGN="RIGHT">0.058</TD> |
---|
2011 | </TR> |
---|
2012 | <TR><TD ALIGN="LEFT">NEIGHBOR</TD> |
---|
2013 | <TD ALIGN="RIGHT">0.003</TD> |
---|
2014 | <TD ALIGN="RIGHT">0.004</TD> |
---|
2015 | <TD ALIGN="RIGHT">0.005</TD> |
---|
2016 | </TR> |
---|
2017 | <TR><TD ALIGN="LEFT">CONTML</TD> |
---|
2018 | <TD ALIGN="RIGHT">0.380</TD> |
---|
2019 | <TD ALIGN="RIGHT">0.368</TD> |
---|
2020 | <TD ALIGN="RIGHT">0.396</TD> |
---|
2021 | </TR> |
---|
2022 | <TR><TD ALIGN="LEFT">GENDIST</TD> |
---|
2023 | <TD ALIGN="RIGHT">0.008</TD> |
---|
2024 | <TD ALIGN="RIGHT">0.009</TD> |
---|
2025 | <TD ALIGN="RIGHT">0.008</TD> |
---|
2026 | </TR> |
---|
2027 | <TR><TD ALIGN="LEFT">PARS</TD> |
---|
2028 | <TD ALIGN="RIGHT">0.201</TD> |
---|
2029 | <TD ALIGN="RIGHT">0.263</TD> |
---|
2030 | <TD ALIGN="RIGHT">0.729</TD> |
---|
2031 | </TR> |
---|
2032 | <TR><TD ALIGN="LEFT">MIX</TD> |
---|
2033 | <TD ALIGN="RIGHT">0.064</TD> |
---|
2034 | <TD ALIGN="RIGHT">0.078</TD> |
---|
2035 | <TD ALIGN="RIGHT">0.123</TD> |
---|
2036 | </TR> |
---|
2037 | <TR><TD ALIGN="LEFT">PENNY</TD> |
---|
2038 | <TD ALIGN="RIGHT">0.038</TD> |
---|
2039 | <TD ALIGN="RIGHT">0.087</TD> |
---|
2040 | <TD ALIGN="RIGHT">15.93</TD> |
---|
2041 | </TR> |
---|
2042 | <TR><TD ALIGN="LEFT">DOLLOP</TD> |
---|
2043 | <TD ALIGN="RIGHT">0.134</TD> |
---|
2044 | <TD ALIGN="RIGHT">0.141</TD> |
---|
2045 | <TD ALIGN="RIGHT">0.233</TD> |
---|
2046 | </TR> |
---|
2047 | <TR><TD ALIGN="LEFT">DOLPENNY</TD> |
---|
2048 | <TD ALIGN="RIGHT">0.051</TD> |
---|
2049 | <TD ALIGN="RIGHT">0.241</TD> |
---|
2050 | <TD ALIGN="RIGHT">101.29</TD> |
---|
2051 | </TR> |
---|
2052 | <TR><TD ALIGN="LEFT">CLIQUE</TD> |
---|
2053 | <TD ALIGN="RIGHT">0.010</TD> |
---|
2054 | <TD ALIGN="RIGHT">0.015</TD> |
---|
2055 | <TD ALIGN="RIGHT">0.020</TD> |
---|
2056 | </TR> |
---|
2057 | </TABLE> |
---|
2058 | </DIV> |
---|
2059 | |
---|
2060 | <P> |
---|
2061 | <BR> |
---|
2062 | |
---|
2063 | <P> |
---|
2064 | In all cases the programs were run under the default options without compiler |
---|
2065 | switches, except as |
---|
2066 | specified here. The |
---|
2067 | data sets used for the discrete characters programs have <TT>0</TT>'s and <TT>1</TT>'s |
---|
2068 | instead of <TT>A</TT>'s and <TT>C</TT>'s. For CONTML the <TT>A</TT>'s and <TT>C</TT>'s |
---|
2069 | were made into <TT>0.0</TT>'s and <TT>1.0</TT>'s and considered as 40 2-allele loci. |
---|
2070 | For the distance programs 10 x 10 distance matrices were |
---|
2071 | computed from the three data sets. |
---|
2072 | For the restriction sites programs <TT>A</TT> and <TT>C</TT> were changed into |
---|
2073 | <TT>+</TT> and <TT>-</TT>. It does not |
---|
2074 | make much sense to benchmark MOVE, DOLMOVE, or DNAMOVE, although when there |
---|
2075 | are many characters and many species the response time after each |
---|
2076 | alteration of the tree should be proportional to the product of the number of |
---|
2077 | species and the number of characters. For DNAML and DNAMLK the frequencies |
---|
2078 | of the four bases were |
---|
2079 | set to be equal rather than determined empirically as is the default. For |
---|
2080 | RESTML the number of enzymes was set to 1. |
---|
2081 | <P> |
---|
2082 | In most cases, the benchmark was made more accurate by analyzing 10 data |
---|
2083 | sets using the <TT>M</TT> (Multiple data sets) option and dividing the resulting |
---|
2084 | time by 10. Times were determined as user times using the Linux <TT>time</TT> |
---|
2085 | command. Several patterns will be apparent from this. The algorithms (MIX, |
---|
2086 | DOLLOP, CONTML, FITCH, KITSCH, PROTPARS, DNAPARS, DNACOMP, |
---|
2087 | DNAML, DNAMLK, and RESTML) that use the above-described addition strategy have |
---|
2088 | run times that do not depend strongly on the messiness of the data. The only |
---|
2089 | exception to this is that if a data set such as the Random data requires |
---|
2090 | extra rounds of global rearrangements it takes longer. The |
---|
2091 | programs differ greatly in run time: the likelihood programs RESTML, DNAML and |
---|
2092 | CONTML are quite a bit slower than the others. The protein sequence parsimony |
---|
2093 | program, which has to do a considerable amount of bookkeeping to keep track of |
---|
2094 | which amino acids can mutate to each other, is also relatively slow. |
---|
2095 | <P> |
---|
2096 | Another class of algorithms includes PENNY, DOLPENNY, DNAPENNY and CLIQUE. |
---|
2097 | These are branch-and-bound methods: in principle they should have execution |
---|
2098 | times that rise exponentially with the number of species and/or |
---|
2099 | characters, and they might be much more sensitive to messy data. This is |
---|
2100 | apparent with PENNY, DOLPENNY, and DNAPENNY, which go from being reasonably |
---|
2101 | fast with clean data to very slow with messy data. DOLPENNY is particularly |
---|
2102 | slow on messy data - this is because this algorithm cannot make use of some of |
---|
2103 | the lower-bound calculations that are possible with DNAPENNY and PENNY. CLIQUE |
---|
2104 | is very fast on all |
---|
2105 | data sets. Although in theory it should bog down if the number of cliques in |
---|
2106 | the data is very large, that does not happen with random data, which in |
---|
2107 | fact has few cliques and those small ones. Apparently the "worst-case" |
---|
2108 | data sets that cause exponential run time are much rarer for CLIQUE than for |
---|
2109 | the other branch-and-bound methods. |
---|
2110 | <P> |
---|
2111 | NEIGHBOR is quite fast compared to FITCH and KITSCH, and should make it |
---|
2112 | possible to run much larger cases, although the results are expected to be |
---|
2113 | a bit rougher than with those programs. |
---|
2114 | <BR> |
---|
2115 | <P> |
---|
2116 | <H3>Speed with different numbers of species</H3> |
---|
2117 | <P> |
---|
2118 | How will the speed depend on the number of species and the number |
---|
2119 | of characters? For the sequential-addition algorithms, the speed should |
---|
2120 | be proportional to somewhere between the cube of the number of species and |
---|
2121 | the square of the number of species, and to the number |
---|
2122 | of characters. Thus a case that has, instead of 10 species and 20 |
---|
2123 | characters, 20 species and 50 characters would take (in the cubic case) |
---|
2124 | 2 x 2 x 2 x 2.5 = 20 |
---|
2125 | times as long. This implies that cases with more than 20 species will |
---|
2126 | be slow, and cases with more than 40 species <I>very</I> slow. This places a |
---|
2127 | premium on working on small subproblems rather than just dumping a whole |
---|
2128 | large data set into the programs. |
---|
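<P>
The arithmetic in this paragraph can be written as a small helper. The function name and its parameters are chosen only for this illustration. It covers both ends of the range mentioned above, the cubic and the quadratic dependence on the number of species.

```python
def relative_run_time(species_ratio, characters_ratio, species_power):
    """Factor by which run time grows when a problem is scaled up."""
    return species_ratio ** species_power * characters_ratio


# From 10 species and 20 characters to 20 species and 50 characters.
cubic = relative_run_time(20 / 10, 50 / 20, 3)
quadratic = relative_run_time(20 / 10, 50 / 20, 2)
print(cubic, quadratic)  # prints: 20.0 10.0
```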
2129 | <P> |
---|
2130 | An exception to these rules will be some of the DNA programs that use an |
---|
2131 | aliasing device to save execution time. In these programs execution time |
---|
2132 | will not necessarily increase in proportion to the number of sites, |
---|
2133 | as sites that show the same pattern of nucleotides will be detected |
---|
2134 | as identical and the calculations for them will be done only once, so that |
---|
2135 | repeated patterns add little execution time. This is particularly |
---|
2136 | likely to happen with few species and many sites, or with data sets that have |
---|
2137 | small amounts of evolutionary divergence. |
---|
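<P>
The idea behind aliasing can be sketched as counting the distinct site patterns (columns) in an alignment, so that each pattern needs to be evaluated only once and its result weighted by the number of sites that show it. This is a minimal illustration with a made-up three-species alignment, not the data structure used in the programs.

```python
from collections import Counter


def site_patterns(sequences):
    """Count how often each alignment column (site pattern) occurs."""
    return Counter(zip(*sequences))


# A made-up alignment of three species and four sites.
patterns = site_patterns(["AACA", "AACA", "AAAC"])
print(len(patterns), "patterns for 4 sites")  # prints: 3 patterns for 4 sites
```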
2138 | <P> |
---|
2139 | For programs FITCH and KITSCH, the distance matrix is square, so |
---|
2140 | that when we double the number of species we also double the number of |
---|
2141 | "characters", so that running times will go up as the fourth power of |
---|
2142 | the number of species rather than the third power. Thus a 20-species |
---|
2143 | case with FITCH is expected to run sixteen times more slowly than a 10-species |
---|
2144 | case. |
---|
2145 | <P> |
---|
2146 | For programs like PENNY and CLIQUE the run times will rise faster |
---|
2147 | than the cube of the number of species (in fact, they can rise faster |
---|
2148 | than any power since these algorithms are not guaranteed to work in |
---|
2149 | polynomial time). In practice, PENNY will frequently bog down above 11 |
---|
2150 | species, while CLIQUE easily deals with larger numbers. |
---|
2151 | <P> |
---|
2152 | For NEIGHBOR the speed should vary only as the square of the number of |
---|
2153 | species, so a case twice as large will take only four times as long. This |
---|
2154 | will make it an attractive alternative to FITCH and KITSCH for large data |
---|
2155 | sets. |
---|
2156 | <P> |
---|
2157 | <B>Note:</B> If you are unsure of how long a program will take, try it first on |
---|
2158 | a few species, then work your way up until you get a feel for the speed |
---|
2159 | and for what size problems you can afford to run. |
---|
2160 | <P> |
---|
2161 | Execution time is not the most important criterion for a program, |
---|
2162 | particularly as computer time gets much cheaper than your time or a |
---|
2163 | programmer's time. With workstations on which background jobs can be run |
---|
2164 | all night, execution speed is not overwhelmingly relevant. Some of us have been |
---|
2165 | conditioned by an earlier era of computing to consider execution speed |
---|
2166 | paramount. But ease of use, ease of adaptation to your computer system, |
---|
2167 | and ease of modification are much more important in practice, and in |
---|
2168 | these respects I think these programs are adequate. Only if you are |
---|
2169 | engaged in 1960's style mainframe computing, or if you have very large |
---|
2170 | amounts of data is minimization of execution |
---|
2171 | time paramount. |
---|
2172 | <P> |
---|
2173 | Nevertheless it would have been nice to have made the programs |
---|
2174 | faster. The present speeds are a compromise between speed and |
---|
2175 | effectiveness: by making them slower and trying more rearrangements in the |
---|
2176 | trees, or by enumerating all possible trees, I could have made the programs |
---|
2177 | more likely to find the best tree. By trying fewer rearrangements I |
---|
2178 | could have speeded them up, but at the cost of finding worse trees. I |
---|
2179 | could also have speeded them up by writing critical sections in assembly |
---|
2180 | language, but this would have sacrificed ease of distribution to new |
---|
2181 | computer systems. There are also some options included in these programs that |
---|
2182 | make it |
---|
2183 | harder to adopt some of the economies of bookkeeping that make other programs |
---|
2184 | faster. However, to some extent I have simply made the decision not to spend |
---|
2185 | time trying to speed up program bookkeeping when there were new likelihood and |
---|
2186 | statistical methods to be developed. |
---|
2187 | <BR> |
---|
2188 | <P> |
---|
2189 | <H3>Relative speed of different machines</H3> |
---|
2190 | <P> |
---|
2191 | It is interesting to compare different machines using DNAPARS as the |
---|
2192 | standard task. One can rate a machine on the DNAPARS benchmark by summing the |
---|
2193 | times for all three of the data sets. Here are relative total timings over |
---|
2194 | all three data sets (done with various versions of DNAPARS) for some machines, |
---|
2195 | taking a Pentium MMX 266 notebook computer running Linux with gcc as the |
---|
2196 | standard. Benchmarks from versions 3.4 and 3.5 of the program are |
---|
2197 | included (respectively the Pascal and C versions whose timings are in |
---|
2198 | parentheses). They are compared only with each other and are scaled to the |
---|
2199 | rest of the timings using the joint runs on the 386SX and the Pentium MMX 266. |
---|
2200 | This use of separate standards is necessary not |
---|
2201 | because of different languages but because different versions of the package |
---|
2202 | are being compared. Thus, the "Time" is the ratio of the Total to that for |
---|
2203 | the Pentium, adjusted by the scalings of machines using 3.4 and 3.5 when |
---|
2204 | appropriate. The Relative Speed is the reciprocal of the Time. |
---|
2205 | <P> |
---|
2206 | <DIV ALIGN="CENTER"> |
---|
2207 | <TABLE CELLPADDING=3 BORDER="1"> |
---|
2208 | <TR><TD ALIGN="LEFT"><B>Machine</B></TD> |
---|
2209 | <TD ALIGN="LEFT"><B>Operating<BR>System</B></TD> |
---|
2210 | <TD ALIGN="LEFT"><B>Compiler</B></TD> |
---|
2211 | <TD ALIGN="LEFT"><B>Total</B></TD> |
---|
2212 | <TD ALIGN="LEFT"><B>Time</B></TD> |
---|
2213 | <TD ALIGN="LEFT"><B>Relative<BR>Speed</B></TD> |
---|
2214 | </TR> |
---|
2215 | <TR><TD ALIGN="LEFT">Toshiba T1100+</TD> |
---|
2216 | <TD ALIGN="LEFT">MSDOS</TD> |
---|
2217 | <TD ALIGN="LEFT">Turbo Pascal 3.01A</TD> |
---|
2218 | <TD ALIGN="LEFT">(269)</TD> |
---|
2219 | <TD ALIGN="LEFT">1758.2</TD> |
---|
2220 | <TD ALIGN="LEFT">0.0005688</TD> |
---|
2221 | </TR> |
---|
2222 | <TR><TD ALIGN="LEFT">Apple Mac Plus</TD> |
---|
2223 | <TD ALIGN="LEFT">MacOS</TD> |
---|
2224 | <TD ALIGN="LEFT">Lightspeed Pascal 2</TD> |
---|
2225 | <TD ALIGN="LEFT">(175.84)</TD> |
---|
2226 | <TD ALIGN="LEFT">1149.3</TD> |
---|
2227 | <TD ALIGN="LEFT">0.0008701</TD> |
---|
2228 | </TR> |
---|
2229 | <TR><TD ALIGN="LEFT">Toshiba T1100+</TD> |
---|
2230 | <TD ALIGN="LEFT">MSDOS</TD> |
---|
2231 | <TD ALIGN="LEFT">Turbo Pascal 5.0</TD> |
---|
2232 | <TD ALIGN="LEFT">(162)</TD> |
---|
2233 | <TD ALIGN="LEFT">1058.9</TD> |
---|
2234 | <TD ALIGN="LEFT">0.0009443</TD> |
---|
2235 | </TR> |
---|
2236 | <TR><TD ALIGN="LEFT">Macintosh Classic</TD> |
---|
2237 | <TD ALIGN="LEFT">MacOS</TD> |
---|
2238 | <TD ALIGN="LEFT">Think Pascal 3</TD> |
---|
2239 | <TD ALIGN="LEFT">(160)</TD> |
---|
2240 | <TD ALIGN="LEFT">1045.8</TD> |
---|
2241 | <TD ALIGN="LEFT">0.0009562</TD> |
---|
2242 | </TR> |
---|
2243 | <TR><TD ALIGN="LEFT">Macintosh Classic</TD> |
---|
2244 | <TD ALIGN="LEFT">MacOS</TD> |
---|
2245 | <TD ALIGN="LEFT">Think C</TD> |
---|
2246 | <TD ALIGN="LEFT">(43.0)</TD> |
---|
2247 | <TD ALIGN="LEFT">795.6</TD> |
---|
2248 | <TD ALIGN="LEFT">0.0012569</TD> |
---|
2249 | </TR> |
---|
2250 | <TR><TD ALIGN="LEFT">IBM PS2/60</TD> |
---|
2251 | <TD ALIGN="LEFT">MSDOS</TD> |
---|
2252 | <TD ALIGN="LEFT">Turbo Pascal 5.0</TD> |
---|
2253 | <TD ALIGN="LEFT">(58.76)</TD> |
---|
2254 | <TD ALIGN="LEFT">384.00</TD> |
---|
2255 | <TD ALIGN="LEFT">0.002604</TD> |
---|
2256 | </TR> |
---|
2257 | <TR><TD ALIGN="LEFT">80286 (12 MHz)</TD> |
---|
2258 | <TD ALIGN="LEFT">MSDOS</TD> |
---|
2259 | <TD ALIGN="LEFT">Turbo Pascal 5.0</TD> |
---|
2260 | <TD ALIGN="LEFT">(47.09)</TD> |
---|
2261 | <TD ALIGN="LEFT">307.77</TD> |
---|
2262 | <TD ALIGN="LEFT">0.003249</TD> |
---|
2263 | </TR> |
---|
2264 | <TR><TD ALIGN="LEFT">Apple Mac IIcx</TD> |
---|
2265 | <TD ALIGN="LEFT">MacOS</TD> |
---|
2266 | <TD ALIGN="LEFT">Think Pascal 3</TD> |
---|
2267 | <TD ALIGN="LEFT">(42)</TD> |
---|
2268 | <TD ALIGN="LEFT">274.44</TD> |
---|
2269 | <TD ALIGN="LEFT">0.003644</TD> |
---|
2270 | </TR> |
---|
2271 | <TR><TD ALIGN="LEFT">Apple Mac SE/30</TD> |
---|
2272 | <TD ALIGN="LEFT">MacOS</TD> |
---|
2273 | <TD ALIGN="LEFT">Think Pascal 3</TD> |
---|
2274 | <TD ALIGN="LEFT">(42)</TD> |
---|
2275 | <TD ALIGN="LEFT">274.44</TD> |
---|
2276 | <TD ALIGN="LEFT">0.003644</TD> |
---|
2277 | </TR> |
---|
2278 | <TR><TD ALIGN="LEFT">Apple Mac IIcx</TD> |
---|
2279 | <TD ALIGN="LEFT">MacOS</TD> |
---|
2280 | <TD ALIGN="LEFT">Lightspeed Pascal 2</TD> |
---|
2281 | <TD ALIGN="LEFT">(39.84)</TD> |
---|
2282 | <TD ALIGN="LEFT">260.44</TD> |
---|
2283 | <TD ALIGN="LEFT">0.003840</TD> |
---|
2284 | </TR> |
---|
2285 | <TR><TD ALIGN="LEFT">Apple Mac IIcx</TD> |
---|
2286 | <TD ALIGN="LEFT">MacOS</TD> |
---|
2287 | <TD ALIGN="LEFT">Lightspeed Pascal 2#</TD> |
---|
2288 | <TD ALIGN="LEFT">(39.69)</TD> |
---|
2289 | <TD ALIGN="LEFT">259.33</TD> |
---|
2290 | <TD ALIGN="LEFT">0.003856</TD> |
---|
2291 | </TR> |
---|
2292 | <TR><TD ALIGN="LEFT">Zenith Z386 (16MHz)</TD> |
---|
2293 | <TD ALIGN="LEFT">MSDOS</TD> |
---|
2294 | <TD ALIGN="LEFT">Turbo Pascal 5.0</TD> |
---|
2295 | <TD ALIGN="LEFT">(38.27)</TD> |
---|
2296 | <TD ALIGN="LEFT">256.67</TD> |
---|
2297 | <TD ALIGN="LEFT">0.003896</TD> |
---|
2298 | </TR> |
---|
2299 | <TR><TD ALIGN="LEFT">Macintosh SE/30</TD> |
---|
2300 | <TD ALIGN="LEFT">MacOS</TD> |
---|
2301 | <TD ALIGN="LEFT">Think C</TD> |
---|
2302 | <TD ALIGN="LEFT">(13.6)</TD> |
---|
2303 | <TD ALIGN="LEFT">251.56</TD> |
---|
2304 | <TD ALIGN="LEFT">0.003975</TD> |
---|
2305 | </TR> |
---|
2306 | <TR><TD ALIGN="LEFT">386SX (16 MHz)</TD> |
---|
2307 | <TD ALIGN="LEFT">MSDOS</TD> |
---|
2308 | <TD ALIGN="LEFT">Turbo Pascal 6.0</TD> |
---|
2309 | <TD ALIGN="LEFT">(34)</TD> |
---|
2310 | <TD ALIGN="LEFT">222.41</TD> |
---|
2311 | <TD ALIGN="LEFT">0.004496</TD> |
---|
2312 | </TR> |
---|
2313 | <TR><TD ALIGN="LEFT">386SX (16 MHz)</TD> |
---|
2314 | <TD ALIGN="LEFT">MSDOS</TD> |
---|
2315 | <TD ALIGN="LEFT">Microsoft Quick C</TD> |
---|
2316 | <TD ALIGN="LEFT">(12.01)</TD> |
---|
2317 | <TD ALIGN="LEFT">222.41</TD> |
---|
2318 | <TD ALIGN="LEFT">0.004496</TD> |
---|
2319 | </TR> |
---|
2320 | <TR><TD ALIGN="LEFT">Sequent S-81</TD> |
---|
2321 | <TD ALIGN="LEFT">DYNIX</TD> |
---|
2322 | <TD ALIGN="LEFT">Silicon Valley Pascal</TD> |
---|
2323 | <TD ALIGN="LEFT">(13.0)</TD> |
---|
2324 | <TD ALIGN="LEFT">84.89</TD> |
---|
2325 | <TD ALIGN="LEFT">0.011780</TD> |
---|
2326 | </TR> |
---|
2327 | <TR><TD ALIGN="LEFT">VAX 11/785</TD> |
---|
2328 | <TD ALIGN="LEFT">Unix</TD> |
---|
2329 | <TD ALIGN="LEFT">Berkeley Pascal</TD> |
---|
2330 | <TD ALIGN="LEFT">(11.9)</TD> |
---|
2331 | <TD ALIGN="LEFT">77.77</TD> |
---|
2332 | <TD ALIGN="LEFT">0.012857</TD> |
---|
2333 | </TR> |
---|
2334 | <TR><TD ALIGN="LEFT">80486-33</TD> |
---|
2335 | <TD ALIGN="LEFT">MSDOS</TD> |
---|
2336 | <TD ALIGN="LEFT">Turbo Pascal 6.0</TD> |
---|
2337 | <TD ALIGN="LEFT">(11.46)</TD> |
---|
2338 | <TD ALIGN="LEFT">74.89</TD> |
---|
2339 | <TD ALIGN="LEFT">0.013353</TD> |
---|
2340 | </TR> |
---|
2341 | <TR><TD ALIGN="LEFT">Sun 3/60</TD> |
---|
2342 | <TD ALIGN="LEFT">SunOS</TD> |
---|
2343 | <TD ALIGN="LEFT">Sun C</TD> |
---|
2344 | <TD ALIGN="LEFT">(3.93)</TD> |
---|
2345 | <TD ALIGN="LEFT">72.67</TD> |
---|
2346 | <TD ALIGN="LEFT">0.013761</TD> |
---|
2347 | </TR> |
---|
2348 | <TR><TD ALIGN="LEFT">NeXT Cube (68030)</TD> |
---|
2349 | <TD ALIGN="LEFT">Mach</TD> |
---|
2350 | <TD ALIGN="LEFT">Gnu C</TD> |
---|
2351 | <TD ALIGN="LEFT">(2.608)</TD> |
---|
2352 | <TD ALIGN="LEFT">48.256</TD> |
---|
2353 | <TD ALIGN="LEFT">0.02072</TD> |
---|
2354 | </TR> |
---|
2355 | <TR><TD ALIGN="LEFT">Sequent S-81</TD> |
---|
2356 | <TD ALIGN="LEFT">DYNIX</TD> |
---|
2357 | <TD ALIGN="LEFT">Sequent Symmetry C</TD> |
---|
2358 | <TD ALIGN="LEFT">(2.604)</TD> |
---|
2359 | <TD ALIGN="LEFT">48.182</TD> |
---|
2360 | <TD ALIGN="LEFT">0.02075</TD> |
---|
2361 | </TR> |
---|
2362 | <TR><TD ALIGN="LEFT">VAXstation 3500</TD> |
---|
2363 | <TD ALIGN="LEFT">Unix</TD> |
---|
2364 | <TD ALIGN="LEFT">Berkeley Pascal</TD> |
---|
2365 | <TD ALIGN="LEFT">(7.3)</TD> |
---|
2366 | <TD ALIGN="LEFT">47.777</TD> |
---|
2367 | <TD ALIGN="LEFT">0.02093</TD> |
---|
2368 | </TR> |
---|
2369 | <TR><TD ALIGN="LEFT">Sequent S-81</TD> |
---|
2370 | <TD ALIGN="LEFT">DYNIX</TD> |
---|
2371 | <TD ALIGN="LEFT">Berkeley Pascal</TD> |
---|
2372 | <TD ALIGN="LEFT">(5.6)</TD> |
---|
2373 | <TD ALIGN="LEFT">36.600</TD> |
---|
2374 | <TD ALIGN="LEFT">0.02732</TD> |
---|
2375 | </TR> |
---|
2376 | <TR><TD ALIGN="LEFT">Unisys 7000/40</TD> |
---|
2377 | <TD ALIGN="LEFT">Unix</TD> |
---|
2378 | <TD ALIGN="LEFT">Berkeley Pascal</TD> |
---|
2379 | <TD ALIGN="LEFT">(5.24)</TD> |
---|
2380 | <TD ALIGN="LEFT">34.244</TD> |
---|
2381 | <TD ALIGN="LEFT">0.02920</TD> |
---|
2382 | </TR> |
---|
2383 | <TR><TD ALIGN="LEFT">VAX 8600</TD> |
---|
2384 | <TD ALIGN="LEFT">VMS</TD> |
---|
2385 | <TD ALIGN="LEFT">DEC VAX Pascal</TD> |
---|
2386 | <TD ALIGN="LEFT">(3.96)</TD> |
---|
2387 | <TD ALIGN="LEFT">25.889</TD> |
---|
2388 | <TD ALIGN="LEFT">0.03863</TD> |
---|
2389 | </TR> |
---|
2390 | <TR><TD ALIGN="LEFT">Sun SPARC IPX</TD> |
---|
2391 | <TD ALIGN="LEFT">SunOS</TD> |
---|
2392 | <TD ALIGN="LEFT">Gnu C version 2.1</TD> |
---|
2393 | <TD ALIGN="LEFT">(1.28)</TD> |
---|
2394 | <TD ALIGN="LEFT">23.689</TD> |
---|
2395 | <TD ALIGN="LEFT">0.04221</TD> |
---|
2396 | </TR> |
---|
2397 | <TR><TD ALIGN="LEFT">VAX 6000-530</TD> |
---|
2398 | <TD ALIGN="LEFT">VMS</TD> |
---|
2399 | <TD ALIGN="LEFT">DEC C</TD> |
---|
2400 | <TD ALIGN="LEFT">(0.858)</TD> |
---|
2401 | <TD ALIGN="LEFT">15.867</TD> |
---|
2402 | <TD ALIGN="LEFT">0.06303</TD> |
---|
2403 | </TR> |
---|
2404 | <TR><TD ALIGN="LEFT">VAXstation 4000</TD> |
---|
2405 | <TD ALIGN="LEFT">VMS</TD> |
---|
2406 | <TD ALIGN="LEFT">DEC C</TD> |
---|
2407 | <TD ALIGN="LEFT">(0.809)</TD> |
---|
2408 | <TD ALIGN="LEFT">14.978</TD> |
---|
2409 | <TD ALIGN="LEFT">0.06677</TD> |
---|
2410 | </TR> |
---|
2411 | <TR><TD ALIGN="LEFT">IBM RS/6000 540</TD> |
---|
2412 | <TD ALIGN="LEFT">AIX</TD> |
---|
2413 | <TD ALIGN="LEFT">XLP Pascal</TD> |
---|
2414 | <TD ALIGN="LEFT">(2.276)</TD> |
---|
2415 | <TD ALIGN="LEFT">14.866</TD> |
---|
2416 | <TD ALIGN="LEFT">0.06726</TD> |
---|
2417 | </TR> |
---|
2418 | <TR><TD ALIGN="LEFT">NeXTstation(040/25)</TD> |
---|
2419 | <TD ALIGN="LEFT">Mach</TD> |
---|
2420 | <TD ALIGN="LEFT">Gnu C</TD> |
---|
2421 | <TD ALIGN="LEFT">(0.75)</TD> |
---|
2422 | <TD ALIGN="LEFT">13.867</TD> |
---|
2423 | <TD ALIGN="LEFT">0.07212</TD> |
---|
2424 | </TR> |
---|
2425 | <TR><TD ALIGN="LEFT">Sun SPARC IPX</TD> |
---|
2426 | <TD ALIGN="LEFT">SunOS</TD> |
---|
2427 | <TD ALIGN="LEFT">Sun C</TD> |
---|
2428 | <TD ALIGN="LEFT">(0.68)</TD> |
---|
2429 | <TD ALIGN="LEFT">12.580</TD> |
---|
2430 | <TD ALIGN="LEFT">0.07951</TD> |
---|
2431 | </TR> |
---|
2432 | <TR><TD ALIGN="LEFT">486DX (33 MHz)</TD> |
---|
2433 | <TD ALIGN="LEFT">Linux</TD> |
---|
2434 | <TD ALIGN="LEFT">Gnu C #</TD> |
---|
2435 | <TD ALIGN="LEFT">(0.63)</TD> |
---|
2436 | <TD ALIGN="LEFT">11.666</TD> |
---|
2437 | <TD ALIGN="LEFT">0.08571</TD> |
---|
2438 | </TR> |
---|
2439 | <TR><TD ALIGN="LEFT">Sun SPARCstation-1</TD> |
---|
2440 | <TD ALIGN="LEFT">Unix</TD> |
---|
2441 | <TD ALIGN="LEFT">Sun Pascal</TD> |
---|
2442 | <TD ALIGN="LEFT">(1.7)</TD> |
---|
2443 | <TD ALIGN="LEFT">11.111</TD> |
---|
2444 | <TD ALIGN="LEFT">0.09000</TD> |
---|
2445 | </TR> |
---|
2446 | <TR><TD ALIGN="LEFT">DECstation 5000/200</TD> |
---|
2447 | <TD ALIGN="LEFT">Unix</TD> |
---|
2448 | <TD ALIGN="LEFT">DEC Ultrix C</TD> |
---|
2449 | <TD ALIGN="LEFT">(0.45)</TD> |
---|
2450 | <TD ALIGN="LEFT">8.333</TD> |
---|
2451 | <TD ALIGN="LEFT">0.12000</TD> |
---|
2452 | </TR> |
---|
2453 | <TR><TD ALIGN="LEFT">Sun SPARC 1+</TD> |
---|
2454 | <TD ALIGN="LEFT">SunOS</TD> |
---|
2455 | <TD ALIGN="LEFT">Sun C</TD> |
---|
2456 | <TD ALIGN="LEFT">(0.40)</TD> |
---|
2457 | <TD ALIGN="LEFT">7.400</TD> |
---|
2458 | <TD ALIGN="LEFT">0.13513</TD> |
---|
2459 | </TR> |
---|
2460 | <TR><TD ALIGN="LEFT">DECstation 3100</TD> |
---|
2461 | <TD ALIGN="LEFT">Unix</TD> |
---|
2462 | <TD ALIGN="LEFT">DEC Ultrix Pascal</TD> |
---|
2463 | <TD ALIGN="LEFT">(0.77)</TD> |
---|
2464 | <TD ALIGN="LEFT">5.022</TD> |
---|
2465 | <TD ALIGN="LEFT">0.1991</TD> |
---|
2466 | </TR> |
---|
2467 | <TR><TD ALIGN="LEFT">IBM 3090-300E</TD> |
---|
2468 | <TD ALIGN="LEFT">AIX</TD> |
---|
2469 | <TD ALIGN="LEFT">Metaware High C</TD> |
---|
2470 | <TD ALIGN="LEFT">(0.27)</TD> |
---|
2471 | <TD ALIGN="LEFT">5.000</TD> |
---|
2472 | <TD ALIGN="LEFT">0.2000</TD> |
---|
2473 | </TR> |
---|
2474 | <TR><TD ALIGN="LEFT">DECstation 5000/125</TD> |
---|
2475 | <TD ALIGN="LEFT">Unix</TD> |
---|
2476 | <TD ALIGN="LEFT">DEC Ultrix C</TD> |
---|
2477 | <TD ALIGN="LEFT">(0.267)</TD> |
---|
2478 | <TD ALIGN="LEFT">4.933</TD> |
---|
2479 | <TD ALIGN="LEFT">0.2027</TD> |
---|
2480 | </TR> |
---|
2481 | <TR><TD ALIGN="LEFT">DECstation 5000/200</TD> |
---|
2482 | <TD ALIGN="LEFT">Unix</TD> |
---|
2483 | <TD ALIGN="LEFT">DEC Ultrix C</TD> |
---|
2484 | <TD ALIGN="LEFT">(0.256)</TD> |
---|
2485 | <TD ALIGN="LEFT">4.733</TD> |
---|
2486 | <TD ALIGN="LEFT">0.2113</TD> |
---|
2487 | </TR> |
---|
2488 | <TR><TD ALIGN="LEFT">Sun SPARC 4/50</TD> |
---|
2489 | <TD ALIGN="LEFT">SunOS</TD> |
---|
2490 | <TD ALIGN="LEFT">Sun C</TD> |
---|
2491 | <TD ALIGN="LEFT">(0.249)</TD> |
---|
2492 | <TD ALIGN="LEFT">4.607</TD> |
---|
2493 | <TD ALIGN="LEFT">0.2171</TD> |
---|
2494 | </TR> |
---|
2495 | <TR><TD ALIGN="LEFT">DEC 3000/400 AXP</TD> |
---|
2496 | <TD ALIGN="LEFT">Unix</TD> |
---|
2497 | <TD ALIGN="LEFT">DEC C</TD> |
---|
2498 | <TD ALIGN="LEFT">(0.224)</TD> |
---|
2499 | <TD ALIGN="LEFT">4.144</TD> |
---|
2500 | <TD ALIGN="LEFT">0.2413</TD> |
---|
2501 | </TR> |
---|
2502 | <TR><TD ALIGN="LEFT">DECstation 5000/240</TD> |
---|
2503 | <TD ALIGN="LEFT">Unix</TD> |
---|
2504 | <TD ALIGN="LEFT">DEC Ultrix C</TD> |
---|
2505 | <TD ALIGN="LEFT">(0.1889)</TD> |
---|
2506 | <TD ALIGN="LEFT">3.496</TD> |
---|
2507 | <TD ALIGN="LEFT">0.2861</TD> |
---|
2508 | </TR> |
---|
2509 | <TR><TD ALIGN="LEFT">SGI Iris R4000</TD> |
---|
2510 | <TD ALIGN="LEFT">Unix</TD> |
---|
2511 | <TD ALIGN="LEFT">SGI C</TD> |
---|
2512 | <TD ALIGN="LEFT">(0.184)</TD> |
---|
2513 | <TD ALIGN="LEFT">3.404</TD> |
---|
2514 | <TD ALIGN="LEFT">0.2937</TD> |
---|
2515 | </TR> |
---|
2516 | <TR><TD ALIGN="LEFT">IBM 3090-300E</TD> |
---|
2517 | <TD ALIGN="LEFT">VM</TD> |
---|
2518 | <TD ALIGN="LEFT">Pascal VS</TD> |
---|
2519 | <TD ALIGN="LEFT">(0.464)</TD> |
---|
2520 | <TD ALIGN="LEFT">3.022</TD> |
---|
2521 | <TD ALIGN="LEFT">0.3309</TD> |
---|
2522 | </TR> |
---|
2523 | <TR><TD ALIGN="LEFT">DECstation 5000/200</TD> |
---|
2524 | <TD ALIGN="LEFT">Unix</TD> |
---|
2525 | <TD ALIGN="LEFT">DEC Ultrix Pascal</TD> |
---|
2526 | <TD ALIGN="LEFT">(0.39)</TD> |
---|
2527 | <TD ALIGN="LEFT">2.533</TD> |
---|
2528 | <TD ALIGN="LEFT">0.3947</TD> |
---|
2529 | </TR> |
---|
2530 | <TR><TD ALIGN="LEFT">Pentium 120</TD> |
---|
2531 | <TD ALIGN="LEFT">Linux</TD> |
---|
2532 | <TD ALIGN="LEFT">Gnu C</TD> |
---|
2533 | <TD ALIGN="LEFT">1.848</TD> |
---|
2534 | <TD ALIGN="LEFT">1.994</TD> |
---|
2535 | <TD ALIGN="LEFT">0.5016</TD> |
---|
2536 | </TR> |
---|
2537 | <TR><TD ALIGN="LEFT">Pentium Pro 180</TD> |
---|
2538 | <TD ALIGN="LEFT">Linux</TD> |
---|
2539 | <TD ALIGN="LEFT">Gnu C</TD> |
---|
2540 | <TD ALIGN="LEFT">1.009</TD> |
---|
2541 | <TD ALIGN="LEFT">1.088</TD> |
---|
2542 | <TD ALIGN="LEFT">0.9353</TD> |
---|
2543 | </TR> |
---|
2544 | <TR><TD ALIGN="LEFT">Pentium 266 MMX</TD> |
---|
2545 | <TD ALIGN="LEFT">Linux</TD> |
---|
2546 | <TD ALIGN="LEFT">Gnu C (PHYLIP 3.5)</TD> |
---|
2547 | <TD ALIGN="LEFT">(0.054)</TD> |
---|
2548 | <TD ALIGN="LEFT">1.0</TD> |
---|
2549 | <TD ALIGN="LEFT">1.0</TD> |
---|
2550 | </TR> |
---|
2551 | <TR><TD ALIGN="LEFT">Pentium 266 MMX</TD> |
---|
2552 | <TD ALIGN="LEFT">Linux</TD> |
---|
2553 | <TD ALIGN="LEFT">Gnu C</TD> |
---|
2554 | <TD ALIGN="LEFT">0.927</TD> |
---|
2555 | <TD ALIGN="LEFT">1.0</TD> |
---|
2556 | <TD ALIGN="LEFT">1.0</TD> |
---|
2557 | </TR> |
---|
2558 | <TR><TD ALIGN="LEFT">Pentium 200</TD> |
---|
2559 | <TD ALIGN="LEFT">Linux</TD> |
---|
2560 | <TD ALIGN="LEFT">Gnu C</TD> |
---|
2561 | <TD ALIGN="LEFT">0.853</TD> |
---|
2562 | <TD ALIGN="LEFT">0.9202</TD> |
---|
2563 | <TD ALIGN="LEFT">1.2647</TD> |
---|
2564 | </TR> |
---|
2565 | <TR><TD ALIGN="LEFT">SGI PowerChallenge</TD> |
---|
2566 | <TD ALIGN="LEFT">Irix</TD> |
---|
2567 | <TD ALIGN="LEFT">Gnu C</TD> |
---|
2568 | <TD ALIGN="LEFT">0.844</TD> |
---|
2569 | <TD ALIGN="LEFT">0.9297</TD> |
---|
2570 | <TD ALIGN="LEFT">1.0756</TD> |
---|
2571 | </TR> |
---|
2572 | <TR><TD ALIGN="LEFT">DEC Alpha 400 4/233</TD> |
---|
2573 | <TD ALIGN="LEFT">DUNIX</TD> |
---|
2574 | <TD ALIGN="LEFT">Digital C (cc -fast)</TD> |
---|
2575 | <TD ALIGN="LEFT">0.730</TD> |
---|
2576 | <TD ALIGN="LEFT">0.7875</TD> |
---|
2577 | <TD ALIGN="LEFT">1.2699</TD> |
---|
2578 | </TR> |
---|
2579 | <TR><TD ALIGN="LEFT">Pentium II 500</TD> |
---|
2580 | <TD ALIGN="LEFT">Linux</TD> |
---|
2581 | <TD ALIGN="LEFT">Gnu C</TD> |
---|
2582 | <TD ALIGN="LEFT">0.368</TD> |
---|
2583 | <TD ALIGN="LEFT">0.4053</TD> |
---|
2584 | <TD ALIGN="LEFT">2.467</TD> |
---|
2585 | </TR> |
---|
2586 | <TR><TD ALIGN="LEFT">Compaq/Digital Alpha 500au</TD> |
---|
2587 | <TD ALIGN="LEFT">DUNIX</TD> |
---|
2588 | <TD ALIGN="LEFT">Digital C (cc -fast)</TD> |
---|
2589 | <TD ALIGN="LEFT">0.167</TD> |
---|
2590 | <TD ALIGN="LEFT">0.1805</TD> |
---|
2591 | <TD ALIGN="LEFT">5.541</TD> |
---|
2592 | </TR> |
---|
2593 | </TABLE> |
---|
2594 | </DIV> |
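<P>
The Time and Relative Speed columns of the table above can be recomputed from
the Total column. The following is only a minimal sketch in Python, not part of
PHYLIP: the function names are hypothetical, the constants are copied from the
table, and the chaining of version 3.4 timings through the 386SX is my reading
of the scaling described before the table. Because the tabulated totals are
rounded, the results agree with the table only approximately.

```python
# Hypothetical helper names; the constants are copied from the DNAPARS table.
PENTIUM_TOTAL = 0.927       # Pentium 266 MMX, current version
PENTIUM_TOTAL_V35 = 0.054   # Pentium 266 MMX, PHYLIP 3.5 (parenthesized)
SX386_TOTAL_V35 = 12.01     # 386SX with Quick C, PHYLIP 3.5
SX386_TOTAL_V34 = 34.0      # 386SX with Turbo Pascal 6.0, PHYLIP 3.4


def time_ratio(total, version="current"):
    """Return Time: the total relative to the Pentium MMX 266 standard."""
    if version == "current":
        return total / PENTIUM_TOTAL
    if version == "3.5":
        return total / PENTIUM_TOTAL_V35
    if version == "3.4":
        # Assumed scaling: chain through the 386SX, which was timed
        # under both version 3.4 and version 3.5.
        sx386_time = SX386_TOTAL_V35 / PENTIUM_TOTAL_V35
        return total * sx386_time / SX386_TOTAL_V34
    raise ValueError("unknown version: " + version)


def relative_speed(time):
    """Return Relative Speed, the reciprocal of Time."""
    return 1.0 / time


if __name__ == "__main__":
    rows = [
        ("Pentium 120", 1.848, "current"),
        ("386SX, Quick C", 12.01, "3.5"),
        ("VAX 8600", 3.96, "3.4"),
    ]
    for name, total, version in rows:
        t = time_ratio(total, version)
        print(f"{name}: Time {t:.3f}, Relative Speed {relative_speed(t):.5f}")
```

The same two functions can be applied to a timing from your own machine, as
long as it was obtained with the same version of the package as the standard
you divide by.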
---|
2595 | <P> |
---|
2596 | This benchmark reflects not only the integer performance of these machines |
---|
2597 | (as DNAPARS has few floating-point operations) but also the efficiency |
---|
2598 | of the compilers. Some of the machines (the DEC 3000/400 AXP |
---|
2599 | and the IBM RS/6000, in particular) are much faster than this benchmark |
---|
2600 | would indicate. The numerical programs benchmark below gives them a |
---|
2601 | fairer test. The Compaq/Digital Alpha 500au speeds are exaggerated because |
---|
2602 | the programs were compiled with optimization for that processor, while the Pentium |
---|
2603 | compiles were not similarly optimized. |
---|
2604 | <P> |
---|
2605 | Note that parallel machines like the Sequent and the SGI PowerChallenge are not |
---|
2606 | really as slow as indicated by the data here, as these runs did nothing to take |
---|
2607 | advantage of their parallelism. |
---|
2608 | <P> |
---|
2609 | These benchmarks have now extended over 13 years, and in the DNAPARS |
---|
2610 | benchmark they extend over a range of 8000-fold in speed! |
---|
2611 | The experience of our laboratory, which seems typical, is that |
---|
2612 | computer power grows by a factor of about 1.85 per year. This is |
---|
2613 | roughly consistent with these benchmarks. |
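<P>
The comparison between a yearly growth factor and the overall range of speeds
is simple compound-growth arithmetic. As a rough check, here is a small Python
sketch (the function names are hypothetical) that computes the total growth
implied by a factor of 1.85 per year over 13 years, and the yearly factor that
an 8000-fold range over 13 years would imply.

```python
def growth_over(annual_factor, years):
    """Total growth after compounding annual_factor for the given years."""
    return annual_factor ** years


def implied_annual_factor(total_factor, years):
    """Yearly factor that compounds to total_factor over the given years."""
    return total_factor ** (1.0 / years)


if __name__ == "__main__":
    print("1.85 per year for 13 years:", round(growth_over(1.85, 13)))
    print("8000-fold over 13 years, per year:",
          round(implied_annual_factor(8000, 13), 2))
```

A factor of 1.85 per year compounds to roughly 3000-fold over 13 years, while
an 8000-fold range corresponds to a factor close to 2 per year, so the two
figures agree only roughly.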
---|
2614 | <P> |
---|
2615 | For a picture of speeds for a more numerically intensive program, |
---|
2616 | here are benchmarks using DNAML, with the Pentium MMX 266 |
---|
2617 | as the standard. Some of the timings, the ones in parentheses, were made |
---|
2618 | using PHYLIP version 3.5, and those are compared to that version run on |
---|
2619 | the Pentium 266. Runs using the PHYLIP 3.4 Pascal version are adjusted |
---|
2620 | using the 386SX timings where both were run. Numbers are |
---|
2621 | total run times (total user time in the case of Unix) over all three data sets. |
---|
2622 | <P> |
---|
2623 | <DIV ALIGN="CENTER"> |
---|
2624 | <TABLE CELLPADDING=3 BORDER="1"> |
---|
2625 | <TR><TD ALIGN="LEFT"><B>Machine</B></TD> |
---|
2626 | <TD ALIGN="LEFT"><B>Operating<BR>System</B></TD> |
---|
2627 | <TD ALIGN="LEFT"><B>Compiler</B></TD> |
---|
2628 | <TD ALIGN="RIGHT"><B>Seconds</B></TD> |
---|
2629 | <TD ALIGN="LEFT"><B>Time</B></TD> |
---|
2630 | <TD ALIGN="RIGHT"><B>Relative<BR>Speed</B></TD> |
---|
2631 | </TR> |
---|
2632 | <TR><TD ALIGN="LEFT">386SX 16 MHz</TD> |
---|
2633 | <TD ALIGN="LEFT">PCDOS</TD> |
---|
2634 | <TD ALIGN="LEFT">Turbo Pascal 6</TD> |
---|
2635 | <TD ALIGN="RIGHT">(7826)</TD> |
---|
2636 | <TD ALIGN="LEFT"> 181.18</TD> |
---|
2637 | <TD ALIGN="RIGHT">0.005519</TD> |
---|
2638 | </TR> |
---|
2639 | <TR><TD ALIGN="LEFT">386SX 16 MHz</TD> |
---|
2640 | <TD ALIGN="LEFT">PCDOS</TD> |
---|
2641 | <TD ALIGN="LEFT">Quick C</TD> |
---|
2642 | <TD ALIGN="RIGHT">(6549.79)</TD> |
---|
2643 | <TD ALIGN="LEFT"> 181.18</TD> |
---|
2644 | <TD ALIGN="RIGHT">0.005519</TD> |
---|
2645 | </TR> |
---|
2646 | <TR><TD ALIGN="LEFT">Compudyne 486DX/33</TD> |
---|
2647 | <TD ALIGN="LEFT">Linux</TD> |
---|
2648 | <TD ALIGN="LEFT">Gnu C</TD> |
---|
2649 | <TD ALIGN="RIGHT">(1599.9)</TD> |
---|
2650 | <TD ALIGN="LEFT"> 44.26</TD> |
---|
2651 | <TD ALIGN="RIGHT">0.022595</TD> |
---|
2652 | </TR> |
---|
2653 | <TR><TD ALIGN="LEFT">SUN Sparcstation 1+</TD> |
---|
2654 | <TD ALIGN="LEFT">SunOS</TD> |
---|
2655 | <TD ALIGN="LEFT">Sun C</TD> |
---|
2656 | <TD ALIGN="RIGHT">(1402.8)</TD> |
---|
2657 | <TD ALIGN="LEFT"> 38.805</TD> |
---|
2658 | <TD ALIGN="RIGHT">0.025770</TD> |
---|
2659 | </TR> |
---|
2660 | <TR><TD ALIGN="LEFT">Everex STEP 386/20</TD> |
---|
2661 | <TD ALIGN="LEFT">PCDOS</TD> |
---|
2662 | <TD ALIGN="LEFT">Turbo Pascal 5.5</TD> |
---|
2663 | <TD ALIGN="RIGHT">(1440.8)</TD> |
---|
2664 | <TD ALIGN="LEFT"> 33.356</TD> |
---|
2665 | <TD ALIGN="RIGHT"> 0.029980</TD> |
---|
2666 | </TR> |
---|
2667 | <TR><TD ALIGN="LEFT">486DX/33</TD> |
---|
2668 | <TD ALIGN="LEFT">PCDOS</TD> |
---|
2669 | <TD ALIGN="LEFT">Turbo C++</TD> |
---|
2670 | <TD ALIGN="RIGHT">(1107.2)</TD> |
---|
2671 | <TD ALIGN="LEFT"> 30.628</TD> |
---|
2672 | <TD ALIGN="RIGHT">0.032650</TD> |
---|
2673 | </TR> |
---|
2674 | <TR><TD ALIGN="LEFT">Compudyne 486DX/33</TD> |
---|
2675 | <TD ALIGN="LEFT">PCDOS</TD> |
---|
2676 | <TD ALIGN="LEFT">Waterloo C/386</TD> |
---|
2677 | <TD ALIGN="RIGHT">(1045.78)</TD> |
---|
2678 | <TD ALIGN="LEFT"> 28.929</TD> |
---|
2679 | <TD ALIGN="RIGHT">0.034567</TD> |
---|
2680 | </TR> |
---|
2681 | <TR><TD ALIGN="LEFT">Sun SPARCstation IPX</TD> |
---|
2682 | <TD ALIGN="LEFT">SunOS</TD> |
---|
2683 | <TD ALIGN="LEFT">Gnu C</TD> |
---|
2684 | <TD ALIGN="RIGHT"> (960.2)</TD> |
---|
2685 | <TD ALIGN="LEFT"> 26.562</TD> |
---|
2686 | <TD ALIGN="RIGHT">0.037648</TD> |
---|
2687 | </TR> |
---|
2688 | <TR><TD ALIGN="LEFT">NeXTstation(68040/25)</TD> |
---|
2689 | <TD ALIGN="LEFT">Mach</TD> |
---|
2690 | <TD ALIGN="LEFT">Gnu C</TD> |
---|
2691 | <TD ALIGN="RIGHT"> (916.6)</TD> |
---|
2692 | <TD ALIGN="LEFT"> 25.355</TD> |
---|
2693 | <TD ALIGN="RIGHT">0.039439</TD> |
---|
2694 | </TR> |
---|
2695 | <TR><TD ALIGN="LEFT">486DX/33</TD> |
---|
2696 | <TD ALIGN="LEFT">PCDOS</TD> |
---|
2697 | <TD ALIGN="LEFT">Waterloo C/386</TD> |
---|
2698 | <TD ALIGN="RIGHT"> (861.0)</TD> |
---|
2699 | <TD ALIGN="LEFT"> 23.817</TD> |
---|
2700 | <TD ALIGN="RIGHT">0.041986</TD> |
---|
2701 | </TR> |
---|
2702 | <TR><TD ALIGN="LEFT">Sun SPARCstation IPX</TD> |
---|
2703 | <TD ALIGN="LEFT">SunOS</TD> |
---|
2704 | <TD ALIGN="LEFT">Sun C</TD> |
---|
2705 | <TD ALIGN="RIGHT"> (787.7)</TD> |
---|
2706 | <TD ALIGN="LEFT"> 21.790</TD> |
---|
2707 | <TD ALIGN="RIGHT">0.045893</TD> |
---|
2708 | </TR> |
---|
2709 | <TR><TD ALIGN="LEFT">486DX/33</TD> |
---|
2710 | <TD ALIGN="LEFT">PCDOS</TD> |
---|
2711 | <TD ALIGN="LEFT">Gnu C</TD> |
---|
2712 | <TD ALIGN="RIGHT"> (650.9)</TD> |
---|
2713 | <TD ALIGN="LEFT"> 18.006</TD> |
---|
2714 | <TD ALIGN="RIGHT">0.05554</TD> |
---|
2715 | </TR> |
---|
2716 | <TR><TD ALIGN="LEFT">VAX 6000-530</TD> |
---|
2717 | <TD ALIGN="LEFT">VMS</TD> |
---|
2718 | <TD ALIGN="LEFT">DEC C</TD> |
---|
2719 | <TD ALIGN="RIGHT"> (637.0)</TD> |
---|
2720 | <TD ALIGN="LEFT"> 17.621</TD> |
---|
2721 | <TD ALIGN="RIGHT">0.05675</TD> |
---|
2722 | </TR> |
---|
2723 | <TR><TD ALIGN="LEFT">DECstation 5000/200</TD> |
---|
2724 | <TD ALIGN="LEFT">Unix</TD> |
---|
2725 | <TD ALIGN="LEFT">DEC Ultrix RISC C</TD> |
---|
2726 | <TD ALIGN="RIGHT"> (423.3)</TD> |
---|
2727 | <TD ALIGN="LEFT"> 11.710</TD> |
---|
2728 | <TD ALIGN="RIGHT">0.08540</TD> |
---|
2729 | </TR> |
---|
2730 | <TR><TD ALIGN="LEFT">IBM 3090-300E</TD> |
---|
2731 | <TD ALIGN="LEFT">AIX</TD> |
---|
2732 | <TD ALIGN="LEFT">Metaware High C</TD> |
---|
2733 | <TD ALIGN="RIGHT"> (201.8)</TD> |
---|
2734 | <TD ALIGN="LEFT"> 5.582</TD> |
---|
2735 | <TD ALIGN="RIGHT">0.17914</TD> |
---|
2736 | </TR> |
---|
2737 | <TR><TD ALIGN="LEFT">Convex C240/1024</TD> |
---|
2738 | <TD ALIGN="LEFT">Unix</TD> |
---|
2739 | <TD ALIGN="LEFT">C</TD> |
---|
2740 | <TD ALIGN="RIGHT"> (101.6)</TD> |
---|
2741 | <TD ALIGN="LEFT"> 2.8105</TD> |
---|
2742 | <TD ALIGN="RIGHT">0.35581</TD> |
---|
2743 | </TR> |
---|
2744 | <TR><TD ALIGN="LEFT">DEC 3000/400 AXP</TD> |
---|
2745 | <TD ALIGN="LEFT">Unix</TD> |
---|
2746 | <TD ALIGN="LEFT">DEC C</TD> |
---|
2747 | <TD ALIGN="RIGHT"> (98.29)</TD> |
---|
2748 | <TD ALIGN="LEFT"> 2.7189</TD> |
---|
2749 | <TD ALIGN="RIGHT">0.36779</TD> |
---|
2750 | </TR> |
---|
2751 | <TR><TD ALIGN="LEFT">Pentium 120</TD> |
---|
2752 | <TD ALIGN="LEFT">Linux</TD> |
---|
2753 | <TD ALIGN="LEFT">Gnu C</TD> |
---|
2754 | <TD ALIGN="RIGHT">25.26</TD> |
---|
2755 | <TD ALIGN="LEFT">3.3906</TD> |
---|
2756 | <TD ALIGN="RIGHT">0.29493</TD> |
---|
2757 | </TR> |
---|
2758 | <TR><TD ALIGN="LEFT">Pentium Pro 180</TD> |
---|
2759 | <TD ALIGN="LEFT">Linux</TD> |
---|
2760 | <TD ALIGN="LEFT">Gnu C</TD> |
---|
2761 | <TD ALIGN="RIGHT">18.88</TD> |
---|
2762 | <TD ALIGN="LEFT">2.5342</TD> |
---|
2763 | <TD ALIGN="RIGHT">0.3946</TD> |
---|
2764 | </TR> |
---|
2765 | <TR><TD ALIGN="LEFT">Pentium 200</TD> |
---|
2766 | <TD ALIGN="LEFT">Linux</TD> |
---|
2767 | <TD ALIGN="LEFT">Gnu C</TD> |
---|
2768 | <TD ALIGN="RIGHT">16.51</TD> |
---|
2769 | <TD ALIGN="LEFT">2.2161</TD> |
---|
2770 | <TD ALIGN="RIGHT">0.4512</TD> |
---|
2771 | </TR> |
---|
2772 | <TR><TD ALIGN="LEFT">SGI PowerChallenge</TD> |
---|
2773 | <TD ALIGN="LEFT">IRIX</TD> |
---|
2774 | <TD ALIGN="LEFT">Gnu C</TD> |
---|
2775 | <TD ALIGN="RIGHT">12.446</TD> |
---|
2776 | <TD ALIGN="LEFT">1.6706</TD> |
---|
2777 | <TD ALIGN="RIGHT">0.5985</TD> |
---|
2778 | </TR> |
---|
2779 | <TR><TD ALIGN="LEFT">Pentium MMX 266</TD> |
---|
2780 | <TD ALIGN="LEFT">Linux</TD> |
---|
2781 | <TD ALIGN="LEFT">Gnu C (PHYLIP 3.5)</TD> |
---|
2782 | <TD ALIGN="RIGHT">(36.15)</TD> |
---|
2783 | <TD ALIGN="LEFT"> 1.0</TD> |
---|
2784 | <TD ALIGN="RIGHT"> 1.0</TD> |
---|
2785 | </TR> |
---|
2786 | <TR><TD ALIGN="LEFT">DEC Alpha 400 4/233</TD> |
---|
2787 | <TD ALIGN="LEFT">Linux</TD> |
---|
2788 | <TD ALIGN="LEFT">Gnu C (cc -fast)</TD> |
---|
2789 | <TD ALIGN="RIGHT">8.0418</TD> |
---|
2790 | <TD ALIGN="LEFT">1.0792</TD> |
---|
2791 | <TD ALIGN="RIGHT">0.9266</TD> |
---|
2792 | </TR> |
---|
2793 | <TR><TD ALIGN="LEFT">Pentium MMX 266</TD> |
---|
2794 | <TD ALIGN="LEFT">Linux</TD> |
---|
2795 | <TD ALIGN="LEFT">Gnu C</TD> |
---|
2796 | <TD ALIGN="RIGHT">7.45</TD> |
---|
2797 | <TD ALIGN="LEFT"> 1.0</TD> |
---|
2798 | <TD ALIGN="RIGHT"> 1.0</TD> |
---|
2799 | </TR> |
---|
2800 | <TR><TD ALIGN="LEFT">Pentium II 500</TD> |
---|
2801 | <TD ALIGN="LEFT">Linux</TD> |
---|
2802 | <TD ALIGN="LEFT">Gnu C</TD> |
---|
2803 | <TD ALIGN="RIGHT">6.02</TD> |
---|
2804 | <TD ALIGN="LEFT"> 0.8081</TD> |
---|
2805 | <TD ALIGN="RIGHT"> 1.2375</TD> |
---|
2806 | </TR> |
---|
2807 | <TR><TD ALIGN="LEFT">Compaq/Digital Alpha 500au</TD> |
---|
2808 | <TD ALIGN="LEFT">Linux</TD> |
---|
2809 | <TD ALIGN="LEFT">Gnu C (cc -fast)</TD> |
---|
2810 | <TD ALIGN="RIGHT">0.9383</TD> |
---|
2811 | <TD ALIGN="LEFT"> 0.1259</TD> |
---|
2812 | <TD ALIGN="RIGHT">7.940</TD> |
---|
2813 | </TR> |
---|
2814 | </TABLE> |
---|
2815 | </DIV> |
---|
2816 | <P> |
---|
2817 | As before, the parallel machines such as the Convex and the SGI PowerChallenge |
---|
2818 | were only run using one processor, which does not take into account the |
---|
2819 | gain that could be obtained by parallelizing the programs. The speed of the |
---|
2820 | Compaq/Digital Alpha 500au is exaggerated because it was compiled in a way |
---|
2821 | optimized for its processor, while the Pentium compiles were not. |
---|
2822 | <P> |
---|
2823 | You are invited to send me figures for your machine for |
---|
2824 | inclusion in future tables. Use the data sets above and compute the total |
---|
2825 | times for DNAPARS and for DNAML for the three data sets (setting the |
---|
2826 | frequencies of the four bases to 0.25 each for the DNAML runs). Be sure to |
---|
2827 | tell me the name and version of your compiler, and the version of PHYLIP you |
---|
2828 | tested. |
---|
2829 | If the times are too small to be measured accurately, obtain the times |
---|
2830 | for ten data sets (the Multiple data sets option) and divide by 10. |
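<P>
One way to collect such a timing on a Unix system is sketched below in Python.
This is only an illustration under stated assumptions, not a tool supplied with
PHYLIP: the function names are hypothetical, and the command line and the menu
responses you feed to the program on its standard input are for you to supply.
It measures the user CPU time consumed by the child process and divides it by
the number of data sets.

```python
import os
import subprocess
import sys


def child_user_seconds(argv, stdin_text=""):
    """Run argv to completion and return the user CPU seconds it used.

    Relies on os.times(), whose children_user field is only meaningful
    on Unix-like systems.
    """
    before = os.times().children_user
    subprocess.run(argv, input=stdin_text, text=True,
                   stdout=subprocess.DEVNULL, check=True)
    after = os.times().children_user
    return after - before


def seconds_per_data_set(total_seconds, n_data_sets=10):
    """Average time per data set for a run over several data sets."""
    return total_seconds / n_data_sets


if __name__ == "__main__":
    # Placeholder command: replace it with the program you want to time
    # and pass its menu responses as stdin_text.
    total = child_user_seconds([sys.executable, "-c", "pass"])
    print("user seconds per data set:", seconds_per_data_set(total, 10))
```

Dividing a run over ten data sets by 10 in this way averages out the coarse
resolution of the system clock when a single run is too short to time well.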
---|
2831 | <P> |
---|
2832 | <A NAME="comments"><HR><P></A> |
---|
2833 | <DIV ALIGN="CENTER"> |
---|
2834 | <H2>General Comments on Adapting<BR> |
---|
2835 | the Package to Different Computer Systems</H2></DIV> |
---|
2836 | <P> |
---|
2837 | In the sections following you will find instructions on how to adapt the |
---|
2838 | programs to different computers and compilers. The programs should compile |
---|
2839 | without alteration under most C compilers. They use the "malloc" |
---|
2840 | or "calloc" library functions to allocate memory, so that the upper limits on how many |
---|
2841 | species or how many sites or characters they can handle are set by the system memory |
---|
2842 | available to those memory-allocation functions. |
---|
2843 | <P> |
---|
2844 | In the document file for each program, I have supplied a small |
---|
2845 | input example, and the output it produces, to help you check whether the |
---|
2846 | programs are running properly. |
---|
2847 | <P> |
---|
2848 | <DIV ALIGN=CENTER> |
---|
2849 | <A NAME="compiling"><HR><P></A> |
---|
2850 | <H2>Compiling the programs</H2> |
---|
2851 | </DIV> |
---|
2852 | <P> |
---|
2853 | If you have not been able to get executables for PHYLIP, you should be |
---|
2854 | able to make your own. This is easy under Unix and Linux, but more |
---|
2855 | difficult if you have a Macintosh or a Windows system. If you have the |
---|
2856 | latter, we strongly recommend you download and use the PowerMac and |
---|
2857 | Windows executables that we distribute. If you do that, you will not need |
---|
2858 | to have any compiler or to do any compiling. I get a certain number of |
---|
2859 | inquiries each year from confused users who are not sure what a compiler |
---|
2860 | is but think they need one. After downloading the executables they |
---|
2861 | contact me and complain that they did not find a compiler included in the |
---|
2862 | package, and would I please e-mail them the compiler. What they really |
---|
2863 | need to do is use the executables and forget about compiling them. |
---|
2864 | <P> |
---|
2865 | Some users may also need to compile the programs in order to modify them. |
---|
2866 | The instructions below will help with this. |
---|
2867 | <P> |
---|
2868 | I will discuss how to compile PHYLIP using one of a number of widely-used |
---|
2869 | compilers. After these I will comment on compiling PHYLIP on other, less |
---|
2870 | widely-used systems. |
---|
2871 | <P> |
---|
2872 | <H3>Unix and Linux</H3> |
---|
2873 | <P> |
---|
2874 | In Unix and Linux (which is Unix in all important functional respects, if |
---|
2875 | not in all |
---|
2876 | legal respects) it is easy to compile PHYLIP yourself, which is why we have |
---|
2877 | generally not bothered to distribute executables for Unix. Unix (and Linux) |
---|
2878 | systems generally have a C compiler and have the <TT>make</TT> utility. We |
---|
2879 | distribute with the PHYLIP source code a Unix-compatible <TT>Makefile</TT>. |
---|
2880 | <P> |
---|
2881 | After you have finished unpacking the Documentation and Source Code |
---|
2882 | archive, you will find that you have created a directory <TT>phylip</TT> |
---|
2883 | in which there are three |
---|
2884 | subdirectories, called <TT>exe</TT>, <TT>src</TT>, and <TT>doc</TT>. |
---|
2885 | There is also an HTML web page, <TT>phylip.html</TT>. The <TT>exe</TT> |
---|
2886 | directory |
---|
2887 | will be empty, <TT>src</TT> contains the source code files, including the |
---|
2888 | <TT>Makefile</TT>. Directory <TT>doc</TT> contains the documentation files. |
---|
2889 | <P> |
---|
2890 | Enter the <TT>src</TT> directory. Before you compile, you will want to |
---|
2891 | look at the makefile and see whether you want to alter the compilation |
---|
2892 | command. There are careful instructions in the Makefile telling you how to |
---|
2893 | do this. To compile all the programs just type: |
---|
2894 | <P> |
---|
2895 | <TT>make install</TT> |
---|
2896 | <P> |
---|
2897 | You will then see the compiling commands as they happen, with |
---|
2898 | occasional warning messages. If these are warnings, rather than errors, |
---|
2899 | they are not too serious. A typical warning would be like this: |
---|
2900 | <P> |
---|
2901 | <TT>dnaml.c:1204: warning: static declaration for re_move follows non-static</TT> |
---|
2902 | <P> |
---|
2903 | After a time the compiler will finish compiling. If you have done a |
---|
2904 | <TT>make install</TT> the system will then move the executables into the |
---|
2905 | <TT>exe</TT> subdirectory and also save space by erasing all the relocatable |
---|
2906 | object files that were produced in the process. You should be left with |
---|
2907 | useable executables in the <TT>exe</TT> directory, and the <TT>src</TT> |
---|
2908 | directory should be as before. To run the executables, go into the |
---|
2909 | <TT>exe</TT> directory and type the program name (say <TT>dnaml</TT>). |
---|
2910 | The names of the |
---|
2911 | executables will be the same as the names of the C programs, but without the |
---|
2912 | <TT>.c</TT> suffix. Thus <TT>dnaml.c</TT> compiles to make an executable called <TT>dnaml</TT>. |
---|
2913 | <P> |
---|
2914 | A typical Unix or Linux installation would put the directory <TT>phylip</TT> |
---|
2915 | in <TT>/usr/local</TT>. The name of the executables directory <TT>EXEDIR</TT> |
---|
2916 | could be changed to be <TT>/usr/local/bin</TT>, so that the <TT>make install</TT> |
---|
2917 | command puts the executables there. If the users have <TT>/usr/local/bin</TT> |
---|
2918 | in their paths, the programs would be found when their names are typed. |
---|
2919 | The font files <TT>font1</TT> through <TT>font6</TT> could also be |
---|
2920 | placed there. A batch script containing the lines |
---|
2921 | <P> |
---|
2922 | <PRE> |
---|
2923 | ln -s /usr/local/bin/font1 font1 |
---|
2924 | ln -s /usr/local/bin/font2 font2 |
---|
2925 | ln -s /usr/local/bin/font3 font3 |
---|
2926 | ln -s /usr/local/bin/font4 font4 |
---|
2927 | ln -s /usr/local/bin/font5 font5 |
---|
2928 | ln -s /usr/local/bin/font6 font6 |
---|
2929 | </PRE> |
---|
2930 | <P> |
---|
2931 | could be used to establish links in the user's working directory so that |
---|
2932 | Drawtree and Drawgram would find these font files when users |
---|
2933 | type a name such as <TT>font1</TT> when the program asks |
---|
2934 | them for a font file name. The |
---|
2935 | documentation web pages are in subdirectory <TT>doc</TT> of the |
---|
2936 | main PHYLIP directory, except for one, <TT>phylip.html</TT> which is |
---|
2937 | in the main PHYLIP directory. It has a table of all of the documentation |
---|
2938 | pages, including this one. If users create a bookmark to that page |
---|
2939 | it can be used to access all of the other documentation pages. |
---|
2940 | <P> |
---|
2941 | To compile just one program, such as DNAML, type: |
---|
2942 | <P> |
---|
2943 | <TT>make dnaml</TT> |
---|
2944 | <P> |
---|
2945 | After this compilation, <TT>dnaml</TT> will be in the <TT>src</TT> |
---|
2946 | subdirectory. So will some relocatable object code files that |
---|
2947 | were used to create the executable. These have names ending in |
---|
2948 | <TT>.o</TT> - they can safely be deleted. |
---|
2949 | <P> |
---|
2950 | If you have problems with the compilation command, you can edit the |
---|
2951 | <TT>Makefile</TT>. It has careful explanations at its front of how you |
---|
2952 | might want to do so. For example, you might want to change the C |
---|
2953 | compiler name <TT>cc</TT> to the name of the Gnu C compiler, <TT>gcc</TT>. |
---|
2954 | This can be done by removing the comment character <TT>#</TT> from the |
---|
2955 | front of one line, and placing it at the front of a nearby line. |
---|
2956 | How to do so should be clear from the material at the beginning of the |
---|
2957 | <TT>Makefile</TT>. We have included sample lines for using the <TT>gcc</TT> |
---|
2958 | compiler and for using the Cygwin Gnu C++ environment on Windows, as |
---|
2959 | well as the default of <TT>cc</TT>. |
---|
2960 | <P> |
---|
2961 | Some older C compilers (notably the Berkeley C compiler which is |
---|
2962 | included free with some Sun systems) do not adhere to the ANSI C |
---|
2963 | standard (because they were written before it was set down). |
---|
2964 | They have trouble with the function prototypes which are in |
---|
2965 | our programs. We have included an <TT>#ifndef</TT> preprocessor |
---|
2966 | command to eliminate the problem, if you use the switch <TT>-DOLDC</TT> |
---|
2967 | when compiling. Thus with these compilers you need only use this in |
---|
2968 | your C flags (in the Makefile) and compilers such as Berkeley C |
---|
2969 | will cause no trouble. |
---|
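<P>
As a rough illustration of how such a guard can work, here is a minimal sketch. The function name <TT>branch_sum</TT> and its body are hypothetical and are not taken from the PHYLIP source; the actual declarations in our programs may be arranged differently. Only the use of <TT>#ifndef</TT> with the symbol <TT>OLDC</TT> comes from the description above.

```c
#ifndef OLDC
/* ANSI C compilers: declare the function with a full prototype. */
double branch_sum(double left, double right);
#else
/* Pre-ANSI compilers (compiled with -DOLDC): no parameter list. */
double branch_sum();
#endif

#ifndef OLDC
double branch_sum(double left, double right)
#else
double branch_sum(left, right)
double left, right;
#endif
{
  /* hypothetical body: add two branch lengths */
  return left + right;
}
```

Compiled in the usual way, the prototype form is used; compiled with <TT>-DOLDC</TT> in the C flags, the old-style declaration and definition are used instead. In the default (ANSI) build a call such as <TT>branch_sum(1.5, 2.0)</TT> returns 3.5.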
2970 | <P> |
---|
2971 | <H3>Macintosh PowerMacs</H3> |
---|
2972 | <P> |
---|
2973 | <B>Compiling with Metrowerks Codewarrior on Macintosh PowerMacs...</B> |
---|
2974 | <P> |
---|
2975 | We shall assume that you have a recent version of the Metrowerks |
---|
2976 | Codewarrior C++ |
---|
2977 | compiler. This description, and the project files that we provide, |
---|
2978 | assume Codewarrior 5.3. We also assume some familiarity with |
---|
2979 | the use of the Codewarrior compiler and its Integrated Development |
---|
2980 | Environment (IDE). |
---|
2981 | <P> |
---|
2982 | Start with our <TT>src</TT> directory (folder) that contains the C source |
---|
2983 | code files such as <TT>dnaml.c</TT> and also the Codewarrior resource |
---|
2984 | files such as <TT>dnaml.rsrc</TT>, which are provided by us. |
---|
2985 | <P> |
---|
2986 | <B>Creating the project file.</B> We will use DnaML as our example. |
---|
2987 | We have provided a full set of project files in the |
---|
2988 | self-extracting Macintosh archive. |
---|
2989 | <EM>If you have them then you do not need |
---|
2990 | to do the items on the following list:</EM> |
---|
2991 | <OL> |
---|
2992 | <LI>Start up the Codewarrior IDE. |
---|
2993 | <LI>Create a new project file by choosing <TT>New...</TT> on the <TT>File</TT> |
---|
2994 | menu. |
---|
2995 | <LI>Type in the project name <TT>dnaml.proj</TT> |
---|
2996 | <LI>On the Project menu on the left side of the <TT>New</TT> window, double-click on <TT>MacOS C/C++ Stationery</TT> |
---|
2997 | <LI>In the <TT>New project</TT> window that opens, click on the triangle |
---|
2998 | to the left of <TT>Standard Console</TT>. |
---|
2999 | <LI>Move the slider at the right of the window down until you reach |
---|
3000 | <TT>SIOUX-WASTE</TT> |
---|
3001 | <LI>Click on the triangle to the left of <TT>SIOUX-WASTE</TT>. This opens |
---|
3002 | another list of choices below. |
---|
3003 | <LI>Click on the menu item <TT>SIOUX-WASTE C PPC</TT>. Press the <TT>OK</TT> button. After a bit a window <TT>dnaml.proj</TT> will open. |
---|
3004 | <LI>Click on the triangle to the left of the <TT>Sources</TT> menu item. A |
---|
3005 | template item called <TT>HelloWorld.c</TT> will open. |
---|
3006 | <LI>Select <TT>HelloWorld.c</TT>. |
---|
3007 | <LI>Open the <TT>Edit</TT> menu at the top of the Mac screen and select |
---|
3008 | <TT>Clear</TT>. A box will open asking if you want to remove <TT>HelloWorld.c</TT> from the project. |
---|
3009 | <LI>Select <TT>OK</TT>. |
---|
3010 | <LI>If the <TT>dnaml.c</TT> file came from the self-extracting Macintosh |
---|
3011 | archive that we distribute, it should show a yellow-and-black-striped Metrowerks |
---|
3012 | icon (if not, as when you get it from some other form of our distribution, |
---|
3013 | you may have to pass it through a program like Microsoft Word, making |
---|
3014 | sure to save it as a Text Only file, to get |
---|
3015 | Metrowerks to be able to see it as a potential source code file). |
---|
3016 | <LI>Drag the <TT>dnaml.c</TT> file onto the <TT>Sources</TT> item in your |
---|
3017 | <TT>dnaml.proj</TT> window. |
---|
3018 | <LI>Drop it onto Sources so that it appears under the <TT>Sources</TT> choice. |
---|
3019 | This may take a few tries -- if it appears above <TT>Sources</TT> grab it |
---|
3020 | and move it again. |
---|
3021 | <LI>Now add the other files that must be compiled with <TT>dnaml.c</TT>. |
---|
3022 | These can be identified by looking at our <TT>Makefile</TT> -- for DnaML |
---|
3023 | they are <TT>seq.c</TT>, <TT>phylip.c</TT>, <TT>seq.h</TT>, and <TT>phylip.h</TT>. Each of them needs to be added to the project file in the same way that |
---|
3024 | <TT>dnaml.c</TT> was. |
---|
3025 | <LI>Drag <TT>dnaml.rsrc</TT> into <TT>Sources</TT> in the same way. It |
---|
3026 | doesn't matter whether it appears before or after <TT>dnaml.c</TT>. |
---|
3027 | <LI>Go to the <TT>Edit</TT> menu and select the <TT>PPC Std C SIOUX-WASTE Settings</TT> item. A window of that name will then open. |
---|
3028 | <LI>Under the <TT>Target</TT> item you will see a <TT>PPC Target</TT> item. |
---|
3029 | Select it. A <TT>PPC Target</TT> window will open to the right. |
---|
3030 | <LI>Change the name in the <TT>File Name</TT> box to be <TT>PHYLIP</TT> |
---|
3031 | <LI>Change the <TT>????</TT> in the <TT>Creator</TT> box to (say) <TT>PHYD</TT> |
---|
3032 | <LI>Change the <TT>Preferred Heap Size</TT> to <TT>1024</TT>. |
---|
3033 | <! need to add selections of PPC Processor here > |
---|
3034 | <! ditto for Global Optimization > |
---|
3035 | <LI>Under <TT>Language Settings</TT> in the left-hand menu of the window, |
---|
3036 | select <TT>C/C++ Language</TT>. A window called <TT>C/C++ Language</TT> |
---|
3037 | will open to the immediate right. |
---|
3038 | <LI>Click on <TT>Require Function Prototypes</TT> to deselect that setting. |
---|
3039 | <LI>Click on the <TT>Save</TT> button at the lower-right of the project |
---|
3040 | settings window. |
---|
3041 | <LI>Close the <TT>PPC Std C SIOUX-WASTE Settings</TT> window using the usual |
---|
3042 | box in the upper-left corner. |
---|
3043 | <LI>On your Desktop you should now find a folder <TT>PHYLIP</TT>. |
---|
3044 | If it has a |
---|
3045 | file called <TT>HelloWorld.c</TT> you may want to discard that file. |
---|
3046 | <LI>In that <TT>PHYLIP</TT> folder you will find a file <TT>dnaml.proj</TT>. |
---|
3047 | <LI>Double-click on that project file. If Metrowerks is not already open, |
---|
3048 | it should open now. |
---|
3049 | <LI>If a window called <TT>Project Messages</TT> opens and there is a |
---|
3050 | complaint in it about access paths being wrong, you should fix these by |
---|
3051 | selecting the <TT>Reset project entry paths</TT> item in the <TT>Project</TT> |
---|
3052 | menu. |
---|
3053 | <LI>Select the <TT>Make</TT> item in the <TT>Project</TT> menu. |
---|
3055 | </OL> |
---|
3056 | <B>Compiling a program once its resource file is available.</B> |
---|
3057 | If the resource files are all available (as they should be), you did not need |
---|
3058 | to do any of the above. Usually users will have no need to compile |
---|
3059 | the programs, but occasionally they may want to change a setting or |
---|
3060 | add a feature. In that case the Metrowerks Codewarrior compiler can be |
---|
3061 | used. We have provided support for compiling the programs in its |
---|
3062 | most recent version, version 5.3. The following discussion will |
---|
3063 | assume that you have obtained and installed the compiler. |
---|
3064 | <P> |
---|
3065 | You should find in the source code directory |
---|
3066 | <TT>src</TT> a subdirectory called <TT>mac</TT> which contains the |
---|
3067 | Metrowerks Codewarrior compiler "project files" (with names ending in |
---|
3068 | <TT>.proj</TT>), as well as the resource files (which end in <TT>.rsrc</TT>) |
---|
3069 | for each program. You can get into this subdirectory, activate the |
---|
3070 | Metrowerks compiler, and open the appropriate project file. To |
---|
3071 | compile the program, simply make sure that the project file is an |
---|
3072 | active window, and type <TT>Command-M</TT> (which is to say, hold down |
---|
3073 | the <TT>Command</TT> key while typing <TT>M</TT>). Alternatively, |
---|
3074 | pull down the <TT>Project</TT> menu and select <TT>Make</TT>. The |
---|
3075 | program should then compile, possibly with ignorable warning messages. |
---|
3076 | <P> |
---|
3077 | <H3>Windows systems</H3> |
---|
3078 | <P> |
---|
3079 | <B>Compiling with Microsoft Visual C++</B> |
---|
3080 | <P> |
---|
3081 | Microsoft Visual C++ is used to compile the executables we distribute |
---|
3082 | for Windows. It can compile using a Makefile. We have supplied this |
---|
3083 | in the source code distribution as <TT>Makefile.msvc</TT>. |
---|
3084 | You will need to preserve the Unix Makefile by renaming it to, say, |
---|
3085 | <TT>Makefile.unix</TT>, then make a copy of <TT>Makefile.msvc</TT> |
---|
3086 | and call it <TT>Makefile</TT>. |
---|
3087 | <P> |
---|
3088 | <B>Setting the path.</B> |
---|
3089 | Before using <TT>nmake</TT> you will need to have the paths |
---|
3090 | set properly. For this, use the Start menu to open Command or |
---|
3091 | a DOS Prompt first. To set the path, type<BR> |
---|
3092 | <PRE> |
---|
3093 | set MSVC=Path |
---|
3094 | </PRE> |
---|
3095 | where Path is where Microsoft Visual Studio is installed |
---|
3096 | (e.g. it might be in <TT>c:\Microsoft Visual Studio</TT>). |
---|
3097 | However the path you type should not have any spaces in it. |
---|
3098 | This means that you may have to use the directory's |
---|
3099 | DOS filename. In general to get a DOS name you take the first six letters of |
---|
3100 | the directory name and follow them by <TT>~1</TT>. For example, |
---|
3101 | <TT>Microsoft Visual Studio</TT> will have a DOS name |
---|
3102 | <TT>Micros~1</TT>, and <TT>Program Files</TT> will be <TT>Progra~1</TT>. |
---|
3103 | Depending on what other |
---|
3104 | files are in the directory, the DOS name may be the first six letters followed |
---|
3105 | by <TT>~2</TT>, <TT>~3</TT>, <TT>~4</TT>, etc. (e.g. <TT>Micros~3</TT> or <TT>Progra~5</TT>). |
---|
3106 | It may take some |
---|
3107 | experimentation to figure it out. With older versions of Windows (pre-Windows 2000) |
---|
3108 | it may be possible to just right click on the directory icon and select |
---|
3109 | Properties to get the DOS name. |
---|
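<P>
The rule of thumb above can be written out as a small C sketch, which may help when guessing candidates to try. The function name <TT>dos_name_guess</TT> is hypothetical, and this is only the simple rule described here (first six non-space characters, then <TT>~</TT> and a number); Windows itself applies further rules when it generates these names, so the result is a guess to test, not a guaranteed answer.

```c
#include <stdio.h>
#include <stddef.h>

/* Build a candidate DOS name: the first six non-space characters of
   the long directory name, followed by "~" and a sequence number
   (1, 2, 3, ...). This follows only the rule of thumb in the text. */
void dos_name_guess(const char *longname, int seq, char *out, size_t outsize)
{
  char stem[7];
  size_t n = 0;
  const char *p;

  for (p = longname; *p != '\0' && n < 6; p++) {
    if (*p != ' ')              /* spaces are not allowed in the path */
      stem[n++] = *p;
  }
  stem[n] = '\0';
  snprintf(out, outsize, "%s~%d", stem, seq);
}
```

For example, calling it with <TT>"Microsoft Visual Studio"</TT> and sequence number 1 fills the buffer with <TT>Micros~1</TT>. If that name does not work, try the same call with 2, 3, and so on. On Windows versions whose <TT>dir</TT> command supports the <TT>/x</TT> switch, <TT>dir /x</TT> lists the actual short names and avoids the guessing.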
3110 | <P> |
---|
3111 | Once you have set MSVC, type |
---|
3112 | <PRE> |
---|
3113 | PATH=%PATH%;%MSVC%\VC98\bin |
---|
3114 | </PRE> |
---|
3115 | Then the Makefile will need to be edited. The line |
---|
3116 | <PRE> |
---|
3117 | MSVCPATH=c:\Micros~1\VC98 |
---|
3118 | </PRE> |
---|
3119 | will need to be changed so that |
---|
3120 | it points to wherever Microsoft Visual Studio is installed, followed by |
---|
3121 | <TT>\VC98</TT>. |
---|
3122 | <P> |
---|
3123 | <B>Using the Makefile</B>. The Makefile is invoked using the |
---|
3124 | <TT>nmake</TT> command. If you simply type <TT>nmake</TT> you |
---|
3125 | will get a list of possible <TT>make</TT> commands. For example, |
---|
3126 | to compile a single program such as <TT>Dnaml</TT> but not |
---|
3127 | install it, type <TT>nmake dnaml</TT>. To compile and install all |
---|
3128 | programs type <TT>nmake install</TT>. We have supplied all the |
---|
3129 | support files and icons needed for the compilations. They are |
---|
3130 | in subdirectory <TT>msvc</TT> of the main source code |
---|
3131 | directory. |
---|
3132 | <P> |
---|
3133 | <B>Compiling with Borland C++</B> |
---|
3134 | <P> |
---|
3135 | Borland C++ can be downloaded for free from Inprise (Borland) |
---|
3136 | (see their site |
---|
3137 | <A HREF="http://www.borland.com">http://www.borland.com</A>). |
---|
3138 | It can compile using a Makefile. We have supplied this |
---|
3139 | in the source code distribution as <TT>Makefile.bcc</TT>. |
---|
3140 | You will need to preserve the Unix Makefile by renaming it to, say, |
---|
3141 | <TT>Makefile.unix</TT>, then make a copy of <TT>Makefile.bcc</TT> |
---|
3142 | and call it <TT>Makefile</TT>. The Makefile is invoked using the |
---|
3143 | <TT>make</TT> command. If you simply type <TT>make</TT> you |
---|
3144 | will get a list of possible <TT>make</TT> commands. For example, |
---|
3145 | to compile a single program such as <TT>Dnaml</TT> but not |
---|
3146 | install it, type <TT>make dnaml</TT>. To compile and install all |
---|
3147 | programs type <TT>make install</TT>. We have supplied all the |
---|
3148 | support files and icons needed for the compilations. They |
---|
3149 | are in subdirectory <TT>bcc</TT> of the main source code |
---|
3150 | directory. We have had to supply a complete |
---|
3151 | second set of the resource files with names <TT>*.brc</TT> |
---|
3152 | because Borland resource files have a minor incompatibility |
---|
3153 | with Microsoft Visual C++ resource files. |
---|
3154 | <P> |
---|
3155 | If this does not work the <TT>PATH</TT> may need to be set manually. |
---|
3156 | This can be done by opening a Command or DOS window using the Start |
---|
3157 | menu. To set the path, type |
---|
3158 | <PRE> |
---|
3159 | set BORLAND=Path |
---|
3160 | </PRE> |
---|
3161 | where <TT>Path</TT> is where Borland is installed, such as |
---|
3162 | <TT>C:\Progra~1\Borland</TT>. |
---|
3163 | Then type |
---|
3164 | <PRE> |
---|
3165 | PATH=%PATH%;%BORLAND%\CBUILD~1\Bin |
---|
3166 | </PRE> |
---|
3167 | <P> |
---|
3168 | <B>Compiling with Metrowerks Codewarrior for Windows</B> |
---|
3169 | <P> |
---|
3170 | As with Macintosh systems, Metrowerks Codewarrior requires |
---|
3171 | you to have project files for each program you compile. |
---|
3172 | For Metrowerks Codewarrior for Windows we are not providing the projects |
---|
3173 | themselves, but we are providing |
---|
3174 | projects which have been exported as XML files. To open one of these you |
---|
3175 | cannot just use File/Open; |
---|
3176 | instead, choose the menu option File/Import Project. |
---|
3177 | Metrowerks will then ask you for the project name. |
---|
3178 | Type in the name of the program (e.g. dnaml). Once this is done Metrowerks will |
---|
3179 | act like this is a regular project file. |
---|
3180 | <P> |
---|
3181 | We have supplied a complete set of these XML project files in the |
---|
3182 | source code distribution. They are in subdirectory <TT>metro</TT> |
---|
3183 | of the main source code directory. This is supplied with the |
---|
3184 | source code distribution for Windows (it is not in the source |
---|
3185 | code distributions for other platforms). |
---|
3194 | <P> |
---|
3195 | To compile the program |
---|
3196 | pull down the <TT>Project</TT> menu and select <TT>Make</TT>. The |
---|
3197 | program should then compile, possibly with ignorable warning messages. |
---|
3198 | <P> |
---|
3199 | For the moment we are not giving here the details of |
---|
3200 | how to create these projects yourself -- you usually will not need |
---|
3201 | to, as you have the project files we have supplied. |
---|
3202 | <P> |
---|
3203 | <B>Compiling with Cygnus Gnu C++</B> |
---|
3204 | <P> |
---|
3205 | Cygnus Solutions (now a part of Red Hat, Inc.) has adapted the Gnu C compiler |
---|
3206 | to Windows systems and |
---|
3207 | provided an environment, CygWin, which mimics Unix for compiling. |
---|
3208 | This is available for purchase from them, and they also make it |
---|
3209 | available to be downloaded for free. The download is large. To get it, go |
---|
3210 | to <A HREF="http://sources.redhat.com/cygwin/download.html">their download site</A> at |
---|
3211 | <CODE>http://sources.redhat.com/cygwin/download.html</CODE> and follow the |
---|
3212 | instructions there. It is a bit |
---|
3213 | difficult to figure out how to download it -- you need to download |
---|
3214 | their <TT>setup.exe</TT> program and then it will download the rest |
---|
3215 | when it is run. You will need a lot of disk space for it. |
---|
3216 | <P> |
---|
3217 | Once you have |
---|
3218 | installed the free Cygnus environment and the associated Gnu C compiler |
---|
3219 | on your Windows system, compiling PHYLIP is essentially identical to |
---|
3220 | what one does for Unix or Linux. In PHYLIP's <TT>src</TT> directory, |
---|
3221 | change the name of our Unix <TT>Makefile</TT> to something like |
---|
3222 | <TT>Makefile.unx</TT> (so as to keep it around). There is a special |
---|
3223 | Makefile for the Cygwin |
---|
3224 | compiler called <TT>Makefile.cyg</TT>. Make a copy of it called |
---|
3225 | <TT>Makefile</TT>. |
---|
3226 | <P> |
---|
3227 | This Makefile should contain a compiling command: |
---|
3228 | <P> |
---|
3229 | <TT>CC = gcc</TT> |
---|
3230 | <P> |
---|
3231 | Now enter the Cygwin environment (which you can do using the Windows |
---|
3232 | <TT>Start</TT> menu and its <TT>Programs</TT> menu item). There should be |
---|
3233 | a <TT>Cygnus</TT> menu choice within that submenu, which you can use to |
---|
3234 | start the Cygnus environment. This puts you in an imitation of a Unix |
---|
3235 | shell. |
---|
3236 | <P> |
---|
3237 | On entering the CygWin environment you will find yourself in one of the |
---|
3238 | subdirectories of the CygWin directory. Change to the directory where the |
---|
3239 | PHYLIP programs have been put, for example by issuing the command |
---|
3240 | <P> |
---|
3241 | <TT>cd c:/phylip</TT><BR> |
---|
3242 | <BR> |
---|
3243 | You should then be able to compile PHYLIP |
---|
3244 | by issuing the appropriate make command, such as <TT>make install</TT>. |
---|
3245 | If you have modified one of our source code files such as <TT>dnaml.c</TT>, |
---|
3246 | it would be wise to |
---|
3247 | have saved the original version of it first as, say, <TT>dnaml.c0</TT>. |
---|
3248 | To associate an icon with a program (say DnaML), you need an icon |
---|
3249 | file (say <TT>dna.ico</TT>) which contains the icon in standard format. |
---|
3250 | There should also be a file called <TT>dnaml.rc</TT> which contains the single |
---|
3251 | line: |
---|
3252 | <P> |
---|
3253 | <TT>dnaml ICON "dna.ico"</TT> |
---|
3254 | <P> |
---|
3255 | We have provided a subdirectory <TT>icons</TT> in the <TT>src</TT> |
---|
3256 | subdirectory, containing a full set of icons and a full set of resource |
---|
3257 | files (<TT>*.rc</TT>). |
---|
3258 | Our Cygwin Makefile will automatically invoke them. |
---|
3259 | <P> |
---|
3260 | <H3>VMS VAX systems</H3> |
---|
3261 | <P> |
---|
3262 | We have not tried to compile version 3.6 on an OpenVMS system but the |
---|
3263 | following instructions should work. |
---|
3264 | On the OpenVMS operating system with DEC VAX VMS C the programs will compile |
---|
3265 | without alteration. The commands for compiling a typical program |
---|
3266 | (DNAPARS, which depends on the separately compiled files <TT>phylip.c</TT> |
---|
3267 | and <TT>seq.c</TT>) are: |
---|
3268 | <P> |
---|
3269 | <TT>$ DEFINE LNK$LIBRARY SYS$LIBRARY:VAXCRTL |
---|
3270 | <BR> |
---|
3271 | $ CC DNAPARS.C |
---|
3272 | <BR> |
---|
3273 | $ CC PHYLIP.C |
---|
3274 | <BR> |
---|
3275 | $ CC SEQ.C |
---|
3276 | <BR> |
---|
3277 | $ LINK DNAPARS,PHYLIP,SEQ |
---|
3278 | <BR> |
---|
3279 | </TT> |
---|
3280 | <P> |
---|
3281 | Once you use this <TT>$ DEFINE</TT> statement during a given interactive session, |
---|
3282 | you need not repeat it again as the symbol <TT>LNK$LIBRARY</TT> is thereafter |
---|
3283 | properly defined. The compilation process leaves a file <TT>DNAPARS.OBJ</TT> |
---|
3284 | in your directory: this can |
---|
3285 | be discarded. The executable program is named <TT>DNAPARS.EXE</TT>. To run the program |
---|
3286 | one then uses the command: |
---|
3287 | <P> |
---|
3288 | <TT>$ R DNAPARS</TT> |
---|
3289 | <P> |
---|
3290 | The compiler defaults to the filenames <TT>INFILE.</TT>, <TT>OUTFILE.</TT>, and |
---|
3291 | <TT>TREEFILE.</TT>. |
---|
3292 | If the input file <TT>INFILE.</TT> does not exist the program will prompt you to |
---|
3293 | type in its name. Note that some commands on VMS such as <TT>TYPE OUTFILE</TT> |
---|
3294 | will fail because the name of the file that it will attempt to type out will be not |
---|
3295 | <TT>OUTFILE.</TT> but <TT>OUTFILE.LIS</TT>. To get it to type the right file you |
---|
3296 | would have to instead issue the command <TT>TYPE OUTFILE.</TT>. |
---|
3297 | <P> |
---|
3298 | When you are |
---|
3299 | using the interactive previewing feature of DRAWGRAM (or DRAWTREE) on |
---|
3300 | a Tektronix or DEC ReGIS compatible terminal, you will want before |
---|
3301 | running the program to have issued the command: |
---|
3302 | <P> |
---|
3303 | <TT>$ SET TERM/NOWRAP/ESCAPE</TT> |
---|
3304 | <P> |
---|
3305 | so that you do not run into trouble from the VMS line length limit of |
---|
3306 | 255 characters or the filtering of escape characters. |
---|
3307 | <P> |
---|
3308 | To know which files to compile together, look at the entries in the |
---|
3309 | <TT>Makefile</TT>. |
---|
3310 | <P> |
---|
3311 | VMS systems are rapidly disappearing, so we will not devote much |
---|
3312 | effort to get PHYLIP working on them. |
---|
3313 | <P> |
---|
3314 | <H3>Parallel computers</H3> |
---|
3315 | <P> |
---|
3316 | As parallel computers become more common, the issue of how to compile |
---|
3317 | PHYLIP for them has become more pressing. People have been compiling |
---|
3318 | PHYLIP for vector machines and parallel machines for many years. We |
---|
3319 | have not made a version for parallel machines because there is still |
---|
3320 | no standard parallel programming environment on such machines (or rather, |
---|
3321 | there are many standards, so that one cannot find one that makes |
---|
3322 | a parallel execution version of PHYLIP practical). However the |
---|
3323 | MPI Message Passing Interface is spreading rapidly, and we will |
---|
3324 | probably support it in future versions of PHYLIP. |
---|
3325 | <P> |
---|
3326 | Although the underlying algorithms of most programs, |
---|
3327 | which treat sites independently, should be amenable to vector and |
---|
3328 | parallel processors, |
---|
3329 | there are details of the code which might best be changed. |
---|
3330 | In certain of the programs (<TT>Dnaml</TT>, <TT>Dnamlk</TT>, |
---|
3331 | <TT>Proml</TT>, <TT>Promlk</TT>) I have put a special |
---|
3332 | comment statement next to the loops in the program where |
---|
3333 | the program will spend most of its time, and which are the places |
---|
3334 | most likely to benefit from parallelization. This comment statement is:<BR> |
---|
3335 | <PRE> |
---|
3336 | /* parallelize here */ |
---|
3337 | </PRE> |
---|
3338 | In particular |
---|
3339 | within these innermost loops of the programs there are often scalar quantities |
---|
3340 | that are used for temporary bookkeeping. These quantities, such as |
---|
3341 | <TT>sum1, sum2, zz, z1, yy, y1, aa, bb, cc, sum,</TT> and <TT>denom</TT> in procedure makenewv |
---|
3342 | of DNAML (and similar quantities in procedure nuview) are there to |
---|
3343 | minimize the number of array references. For vectorizing and parallelizing |
---|
3344 | compilers it will |
---|
3345 | be better to replace them by arrays so that processing can occur |
---|
3346 | simultaneously. |
---|
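<P>
To make the idea concrete, here is a minimal sketch of the change. It is not the code of makenewv or nuview: the function names <TT>site_sum_scalar</TT> and <TT>site_sum_array</TT>, their arguments, and the arithmetic are all hypothetical, and only the temporary names <TT>zz</TT> and <TT>yy</TT> echo the quantities mentioned above. The first function uses scalar temporaries; the second gives each site its own array element, so that the per-site work no longer shares any variable between iterations.

```c
#include <stddef.h>

/* Scalar temporaries: zz and yy are overwritten on every pass through
   the loop, in the bookkeeping style described in the text. */
double site_sum_scalar(const double *x, const double *w, size_t nsites,
                       double z)
{
  double zz, yy, sum = 0.0;
  size_t i;

  for (i = 0; i < nsites; i++) {
    zz = z * x[i];
    yy = 1.0 - zz;
    sum += w[i] * (zz + yy * x[i]);
  }
  return sum;
}

/* Array temporaries: every site has its own zz[i] and yy[i], so the
   first loop has no dependence between iterations. The caller supplies
   zz and yy, each with room for nsites values. */
double site_sum_array(const double *x, const double *w, size_t nsites,
                      double z, double *zz, double *yy)
{
  double sum = 0.0;
  size_t i;

  for (i = 0; i < nsites; i++) {      /* independent per-site work */
    zz[i] = z * x[i];
    yy[i] = 1.0 - zz[i];
  }
  for (i = 0; i < nsites; i++)        /* separate summation step */
    sum += w[i] * (zz[i] + yy[i] * x[i]);
  return sum;
}
```

Both functions compute the same quantity. Whether the array form actually runs faster depends on the compiler and the machine, so it is worth timing both before committing to the change.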
3347 | <P> |
---|
3348 | If you succeed in making a parallel version of PHYLIP we would like to |
---|
3349 | know how you did it. In particular, if you can prepare a web page which |
---|
3350 | describes how to do it for your computer system, we would like to have it |
---|
3351 | for inclusion in our PHYLIP web pages. Please e-mail it to me. We hope to |
---|
3352 | have a set of pages that give detailed instructions on how to make parallel |
---|
3353 | versions of PHYLIP on various kinds of machines. Alternatively, if we |
---|
3354 | are given your modified version of the program we may be able to |
---|
3355 | figure out how to make modifications to our source code to allow |
---|
3356 | users to compile the program in a way which makes those modifications. |
---|
3357 | <P> |
---|
3358 | <H3>Other computer systems</H3> |
---|
3359 | <P> |
---|
3360 | As you can see from the variety of different systems on which these |
---|
3361 | programs have been successfully run, there are no serious |
---|
3362 | incompatibility problems with most computer systems. PHYLIP in various |
---|
3363 | past Pascal versions has also been compiled on 8080 and Z80 CP/M Systems, Apple |
---|
3364 | II systems running UCSD Pascal, a variety of minicomputer systems such as |
---|
3365 | DEC PDP-11's and HP 1000's, on 1970's era mainframes such as CDC |
---|
3366 | Cyber systems, and so on. In a later era |
---|
3367 | it was also compiled on IBM 370 mainframes, and of course on DOS and |
---|
3368 | Windows systems and on Macintosh and PowerMacintosh systems. |
---|
3369 | We have gradually |
---|
3370 | accumulated experience on a wider variety of C compilers. If you succeed in |
---|
3371 | compiling the C version of PHYLIP on a different machine or a different |
---|
3372 | compiler, I would like to |
---|
3373 | hear the details so that I can consider including the instructions in a future version |
---|
3374 | of this manual. |
---|
3375 | <P> |
---|
3376 | <DIV ALIGN="CENTER"> |
---|
3377 | <A NAME="FAQ"><HR><P></A> |
---|
3378 | <H2>Frequently Asked Questions</H2></DIV> |
---|
3379 | <P> |
---|
3380 | This set of Frequently Asked Questions, and their answers, is from the |
---|
3381 | PHYLIP web site. A more up-to-date version can be found there, at: |
---|
3382 | <P> |
---|
3383 | <DIV ALIGN="CENTER"> |
---|
3384 | <A HREF="http://evolution.gs.washington.edu/phylip/faq.html"> |
---|
3385 | <TT>http://evolution.gs.washington.edu/phylip/faq.html</TT></A></DIV> |
---|
3386 | <P> |
---|
3387 | <DL> |
---|
3388 | <DT><STRONG>"It doesn't work! <I>It doesn't work!!</I> It says <TT>can't find infile.</TT>"</STRONG> |
---|
3389 | <DD>Actually, it's working just fine. Many of the programs look for an input file called <TT>infile</TT>, |
---|
3390 | and if one of that name is not present in the current directory, they then ask |
---|
3391 | you to type in the name of the input file. That's all that it's doing. This |
---|
3392 | is done so that |
---|
3393 | you can get the program to read the file without you having to type in its |
---|
3394 | name, by making a copy of your input file and calling it <TT>infile</TT>. |
---|
3395 | If you don't do that, then the program issues this message. It looks |
---|
3396 | alarming, but really all that it is trying to do is to get you to type in |
---|
3397 | the name of the input file. Try giving it the name of the input file. |
---|
3398 | <DT><STRONG>"The program reads my data file and then says it has |
---|
3399 | a memory allocation error!"</STRONG> |
---|
3400 | <DD>This is what tends to happen if there is a problem with the format of the data |
---|
3401 | file, so that the programs get confused and think they need to set aside memory |
---|
3402 | for 1,000,000 species or so. The result is a "memory allocation error". Check the data file format against the documentation: |
---|
3403 | make sure that the data files have <I>not</I> been saved in the format of |
---|
3404 | your word processor (such as Microsoft Word) but in a "flat ASCII" or "text only" |
---|
3405 | mode. Note that adding memory to your computer is <I>not</I> the |
---|
3406 | way to solve this problem -- you probably have plenty of memory |
---|
3407 | to run the program once the data file is in the correct format. |
---|
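<P>
As a rough illustration of the kind of check described above, here is a small Python sketch that looks at the start of a data file. It is not part of PHYLIP. The function name <TT>check_phylip_header</TT> is made up for this example, and it assumes a sequence or discrete-characters file whose first line starts with two integers (number of species, then number of characters). A distance matrix file, which has only the number of species there, would need a different test. The limit of 10000 species is an arbitrary assumption.

```python
def check_phylip_header(raw: bytes) -> list:
    """Return a list of problems found at the start of a data file (empty if none)."""
    problems = []
    # Word-processor formats contain bytes outside the plain-text range.
    if any(b > 126 or (b < 32 and b not in (9, 10, 13)) for b in raw):
        problems.append("file contains non-text bytes; save it as Text Only")
    first_line = raw.split(b"\n", 1)[0].decode("ascii", errors="replace")
    fields = first_line.split()
    # Assumed layout: number of species, then number of characters (sites).
    if len(fields) < 2 or not all(f.isdigit() for f in fields[:2]):
        problems.append("first line should start with two integers")
    elif int(fields[0]) > 10000:  # arbitrary threshold chosen for this sketch
        problems.append("implausible number of species: " + fields[0])
    return problems
```

One might call it as <TT>check_phylip_header(open("infile", "rb").read())</TT> and look at the returned list before running a program on the file.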
3408 | <DT><STRONG>"On our Macintosh, larger data files fail to run."</STRONG> |
---|
3409 | <DD>We have set the memory allowances on the Macintosh executables |
---|
3410 | to be generous, but not too big. You therefore may need to |
---|
3411 | increase them. Use the <TT>Get Info</TT> item on the Finder <TT>File</TT> menu while the program is selected. |
---|
3412 | <DT><STRONG>"I opened the program but I don't see where to create |
---|
3413 | a data file!"</STRONG> |
---|
3414 | <DD>The programs (there is more than one) use data |
---|
3415 | files that have been created outside of the program. They do not have any |
---|
3416 | data editor within them. You can create a data file by using an editor, |
---|
3417 | such as Microsoft Word, EMACS, vi, SimpleText, Notepad, etc. But be sure |
---|
3418 | <I>not</I> to save the file in Microsoft Word's own format. It should be saved in |
---|
3419 | Text Only format. You can use the documentation files, including the examples |
---|
3420 | at the end of those files, to figure out the format of the input file. |
---|
3421 | Documentation files such as <TT>main.html</TT>, <TT>sequence.html</TT>, |
---|
3422 | <TT>distance.html</TT> and many others should be consulted. Many users |
---|
3423 | create their data files by having their alignment program (such as |
---|
3424 | ClustalW) output its alignments in PHYLIP format. Many alignment programs |
---|
3425 | have options to do that. |
---|
3427 | <DT><STRONG>"I ran PHYLIP, and all it did was say it was extracting a bunch of files!"</STRONG> |
---|
3428 | <DD> |
---|
3429 | There is no executable program |
---|
3430 | named <TT>PHYLIP</TT> in the PHYLIP package! But in some cases |
---|
3431 | (especially the Windows distribution) there is a file called |
---|
3432 | <TT>phylip.exe</TT>. |
---|
3433 | That file is an archive of documentation and source code. Once you have |
---|
3434 | run it and extracted the files in it, so that they are in the directory, |
---|
3435 | running it again will just do the extraction again, which is unnecessary. |
---|
3436 | Similarly for the archive files for the Windows executables, which |
---|
3437 | have names like <TT>phylipwx.exe</TT> and <TT>phylipwy.exe</TT>. |
---|
3438 | They are run only once to extract their contents. |
---|
3439 | <DT><STRONG>"One program makes an output file and then the next program crashes while reading it!"</STRONG> |
---|
3440 | <DD>Did you rename the file? If a program makes a file called <TT>outfile</TT>, and then the |
---|
3441 | next program is told to use <TT>outfile</TT> as its input file, terrible things will |
---|
3442 | happen. The second program first opens <TT>outfile</TT> as an output file, thus |
---|
3443 | erasing it. When it then tries to read from this empty <TT>outfile</TT> |
---|
3444 | a psychological |
---|
3445 | crisis ensues. The solution is simply to rename <TT>outfile</TT> before trying to |
---|
3446 | use it as an input file. |
---|
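<P>
The renaming step can of course be done by hand. For those who script their runs, here is a minimal Python sketch of it. The function name <TT>promote_output</TT> is invented for this example, and it assumes that the previous program wrote a file called <TT>outfile</TT> in the given directory. Note that <TT>os.replace</TT> silently overwrites an existing file of the new name.

```python
import os

def promote_output(directory, new_name="infile"):
    """Rename outfile to new_name so the next program will not erase it."""
    if new_name == "outfile":
        raise ValueError("the new name must differ from outfile")
    src = os.path.join(directory, "outfile")
    dst = os.path.join(directory, new_name)
    os.replace(src, dst)  # overwrites dst if it already exists
    return dst
```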
3447 | <DT><STRONG>"I make a file called infile and then the program can't find it!"</STRONG> |
---|
3448 | <DD>Let me guess. You are using Windows, right? You made your file in Word or |
---|
3449 | in Notepad or WordPad, right? If you made a file in one of these editors, and |
---|
3450 | saved it, not in Word format, but in Text Only format, then you were doing the |
---|
3451 | right thing. But when you told the operating system to save the file as |
---|
3452 | <TT>infile</TT>, it actually didn't. It saved it as |
---|
3453 | <TT>infile.txt</TT>. Then just to make |
---|
3454 | life harder for you, the operating system is set up by default to not show |
---|
3455 | that three-letter extension to the file name. Next to its icon it will show |
---|
3456 | the name <TT>infile</TT>. So you think, quite reasonably, that |
---|
3457 | there is a file called <TT>infile</TT>. But there isn't a file of that |
---|
3458 | name, so the program, quite reasonably, can't find a file called |
---|
3459 | <TT>infile</TT>. If you want to check what the actual file name is, use |
---|
3460 | the <TT>Properties</TT> |
---|
3461 | item of the <TT>File</TT> menu of the folder window (in Windows versions, anyway). |
---|
3462 | You should be able to get the program to work by telling it that the file name |
---|
3463 | is <TT>INFILE.TXT</TT>. |
---|
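<P>
Another way to see the real file names is to list them from a script, since a script is not affected by the setting that hides extensions. The following Python sketch is only an illustration. The function name <TT>infile_candidates</TT> is made up, and it simply reports names that would be displayed as <TT>infile</TT> once their extension is hidden.

```python
import os

def infile_candidates(names, wanted="infile"):
    """Return the names that would display as `wanted` when extensions are hidden."""
    matches = []
    for name in names:
        stem, ext = os.path.splitext(name)
        # Only names with a real extension can masquerade as `wanted`.
        if stem.lower() == wanted.lower() and ext:
            matches.append(name)
    return matches
```

Calling <TT>infile_candidates(os.listdir("."))</TT> in the folder that holds the data would list files such as <TT>infile.txt</TT>, whose full name can then be typed when the program asks for it.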
3464 | <DT><STRONG>"Consense gives weird branch lengths! How do I |
---|
3465 | get more reasonable ones?"</STRONG> |
---|
3466 | <DD>Consense gives branch lengths which are simply the numbers of replicates |
---|
3467 | that support the branch. This is not a good reflection of how long those |
---|
3468 | branches are estimated to be. The best way to put better branch lengths on a |
---|
3469 | consensus tree is to use it as a User Tree in a program that will estimate |
---|
3470 | branch lengths for it. You may need to convert it to being an unrooted tree, |
---|
3471 | using Retree, first. If the original program you were using was a parsimony |
---|
3472 | program, which does not estimate branch lengths, you may instead have to make |
---|
3473 | some distances between your species (using, for example, DnaDist), and use |
---|
3474 | Fitch to put branch lengths on the user tree. Here is the sequence of |
---|
3475 | steps you should go through: |
---|
3476 | <OL> |
---|
3477 | <LI>Take the tree and use Retree to make sure it is Unrooted (just |
---|
3478 | read it into Retree and then save it, specifying Unrooted) |
---|
3479 | <LI>Use the unrooted tree as a User Tree (option <TT>U</TT>) in one of |
---|
3480 | our programs (such as Fitch or DnaML). If you use Fitch, you also |
---|
3481 | need to use one of the distance programs such as DnaDist to |
---|
3482 | compute a set of distances to serve as its input. |
---|
3483 | <LI>Specify that the branch lengths |
---|
3484 | of the tree are not to be used but should be re-estimated. This |
---|
3485 | is actually the default. |
---|
3486 | </OL> |
---|
3487 | <DT><STRONG>"DrawTree (or DrawGram) doesn't work: it can't find the font file!"</STRONG> |
---|
3488 | <DD>Six font files, called <TT>font1</TT> through <TT>font6</TT>, are |
---|
3489 | distributed with the executables |
---|
3490 | (and with the source code too). The program looks for a copy of one of them |
---|
3491 | called <TT>fontfile</TT>. If you haven't made such a copy called |
---|
3492 | <TT>fontfile</TT> it then asks |
---|
3493 | you for the name of the font file. If they are in the current directory, just |
---|
3494 | type one of <TT>font1</TT> through <TT>font6</TT>. The reason for |
---|
3495 | having the program look for <TT>fontfile</TT> |
---|
3496 | is so that you can copy your favorite font file, call the copy |
---|
3497 | <TT>fontfile</TT>, |
---|
3498 | and then it will be found automatically without you having to type the name of |
---|
3499 | the font file each time. |
---|
3500 | <DT><STRONG>"Can DrawGram draw a scale beside the tree? Print the branch lengths as numbers?"</STRONG> |
---|
3501 | <DD>It can't do either of these. Doing so would make the program more complex, and |
---|
3502 | it is not obvious how to fit the branch length numbers into a tree that has |
---|
3503 | many very short internal branches. If you want these scales or numbers, |
---|
3504 | choose an output plot file format (such as PostScript, PICT or PCX) that can be read by |
---|
3505 | a drawing program such as Adobe Illustrator, Freehand, Canvas, CorelDraw, |
---|
3506 | or MacDraw. |
---|
3507 | Then you can add the scales and branch length numbers yourself by hand. Note |
---|
3508 | the menu option in DrawTree and DrawGram that specifies the tree size to be |
---|
3509 | a given number of centimeters per unit branch length. |
---|
3510 | <DT><STRONG>"How can I get DrawGram or DrawTree to print the bootstrap values |
---|
3511 | next to the branches?"</STRONG> |
---|
3512 | <DD>When you do bootstrapping and use Consense, it prints the bootstrap |
---|
3513 | values in its output file (both in a table of sets, and on the diagram |
---|
3514 | of the tree which it makes). These are also in the output tree file of |
---|
3515 | Consense. There they are in place of branch lengths. So to get them to |
---|
3516 | be on the output of DrawGram or DrawTree, you must write the tree in the |
---|
3517 | format of a drawing program and use it to put the values in by hand, as |
---|
3518 | mentioned in the answer to the previous question. |
---|
3519 | <DT><STRONG>"I have an HP Laserjet and can't get DrawGram to print on it"</STRONG> |
---|
3520 | <DD>DRAWGRAM and DRAWTREE produce a plot file (called <TT>plotfile</TT>): they |
---|
3521 | do not send it to the printer. It is up to you to get the plot file to |
---|
3522 | the printer. If you are running Windows or DOS this can probably be done |
---|
3523 | with the MSDOS command <TT>COPY/B PLOTFILE PRN:</TT>, unless your printer |
---|
3524 | is a networked printer. The <TT>/B</TT> |
---|
3525 | is important. If it is omitted the copy command will strip off the |
---|
3526 | highest bit of each byte, which can cause the printing to fail or produce |
---|
3527 | garbage. |
---|
3528 | <DT><STRONG>"DNAML won't read the treefile that is produced by DNAPARS!"</STRONG> |
---|
3529 | <DD>That's because the DnaPars tree file is a rooted tree, and DnaML wants an |
---|
3530 | unrooted tree. Try using Retree to change the file to be an unrooted tree |
---|
3531 | file.</DD> |
---|
3532 | <DT><STRONG>"In bootstrapping, SEQBOOT makes too large a file"</STRONG> |
---|
3533 | <DD>If there are 1000 bootstrap replicates, it will make a file |
---|
3534 | 1000 times as long as your original data set. But for many methods |
---|
3535 | there is another way that uses much less file space. You can use |
---|
3536 | SEQBOOT to make a file of multiple sets of weights, and use those |
---|
3537 | together with the original data set to do bootstrapping. |
---|
3538 | <DT><STRONG>"In bootstrapping, the output file gets too big."</STRONG> |
---|
3539 | <DD> When running a program such as NEIGHBOR or DNAPARS with multiple data |
---|
3540 | sets (or multiple weights) for purposes of bootstrapping, |
---|
3541 | the output file is usually not needed, as it |
---|
3542 | is the output tree file that is used next. You can use the menu |
---|
3543 | of the program to turn off the writing of trees into the |
---|
3544 | output file. The trees will still be written into the tree file. |
---|
3545 | <DT><STRONG>"Why doesn't NEIGHBOR read my DNA sequences correctly?"</STRONG> |
---|
3546 | <DD>Because it wants |
---|
3547 | to have as input a distance matrix, not sequences. You have to use DNADIST to |
---|
3548 | make the distance matrix first. |
---|
3549 | <P> |
---|
3550 | <H3>How to make it do various things</H3> |
---|
3551 | <P> |
---|
3552 | <DT><STRONG>"How do I bootstrap?"</STRONG> |
---|
3553 | <DD>The general method of bootstrapping |
---|
3554 | involves running SEQBOOT to make multiple bootstrapped data sets out of your |
---|
3555 | one data set, then running one of the tree-making programs with the Multiple |
---|
3556 | data sets option to analyze them all, then running CONSENSE to make a majority |
---|
3557 | rule consensus tree from the resulting tree file. Read the documentation of |
---|
3558 | SEQBOOT to get further information. Before, only parsimony methods could be |
---|
3559 | bootstrapped. With this new system almost any of the tree-making methods in |
---|
3560 | the package can be bootstrapped. It is somewhat more tedious but you will find |
---|
3561 | it much more rewarding. |
---|
3562 | <DT><STRONG>"How do I specify a multi-species outgroup |
---|
3563 | with your parsimony programs?"</STRONG> |
---|
3564 | <DD>It's not a feature but is not too hard to do in many of the programs. In |
---|
3565 | parsimony programs like MIX, for which the W (Weights) and A (Ancestral states) |
---|
3566 | options are available, and weights can be larger than 1, all you need to do is: |
---|
3567 | <DL COMPACT> |
---|
3568 | <DT><STRONG>(a)</STRONG> |
---|
3569 | <DD>In MIX, make up an extra character with states 0 for all the outgroups |
---|
3570 | and 1 for all the ingroups. If using DNAPARS the ingroup can have (say) |
---|
3571 | <TT>G</TT> and the outgroup <TT>A</TT>. |
---|
3572 | <DT><STRONG>(b)</STRONG> |
---|
3573 | <DD>Assign this character an enormous weight (such as <TT>Z</TT> for 35) using the W |
---|
3574 | option, all other characters getting weight 1, or whatever weight they had |
---|
3575 | before. |
---|
3576 | <DT><STRONG>(c)</STRONG> |
---|
3577 | <DD>If it is available, use the A (Ancestral states) option to designate that |
---|
3578 | for that new character the state found in the outgroup is the ancestral |
---|
3579 | state. |
---|
3580 | <DT><STRONG>(d)</STRONG> |
---|
3581 | <DD>In MIX do not use the O (Outgroup) option. |
---|
3582 | <DT><STRONG>(e)</STRONG> |
---|
3583 | <DD>After the tree is found, the designated ingroup should have been held |
---|
3584 | together by the fake character. The tree will be rooted somewhere in the |
---|
3585 | outgroup (the program may or may not have a preference for one place in |
---|
3586 | the outgroup over another). Make sure that you subtract from the total |
---|
3587 | number of steps on the tree all steps in the new character. |
---|
3588 | </DL> |
---|
3589 | <P> |
---|
3590 | In programs like DNAPARS, you cannot use this method as weights of sites |
---|
3591 | cannot be greater than 1. But you can do an analogous trick, by adding a |
---|
3592 | largish number of extra sites to the data, with one nucleotide state ("A") |
---|
3593 | for the ingroup and another ("G") for the outgroup. You will then have to |
---|
3594 | use RETREE to manually reroot the tree in the desired place. |
---|
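<P>
Steps (a) and (b) above for a 0/1 data set can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name <TT>add_outgroup_character</TT> is invented, the data are held as a dictionary from species name to a string of states, and every existing character is assumed to have weight 1. How the resulting weights string is supplied to the program is not shown here.

```python
def add_outgroup_character(rows, outgroup, weight="Z"):
    """Append a fake 0/1 character and build the matching weights string.

    rows maps species name -> string of 0/1 states; outgroup is a set of names.
    """
    lengths = {len(states) for states in rows.values()}
    if len(lengths) != 1:
        raise ValueError("all species must have the same number of characters")
    n_chars = lengths.pop()
    # State 0 marks the outgroup species, state 1 the ingroup species.
    new_rows = {
        name: states + ("0" if name in outgroup else "1")
        for name, states in rows.items()
    }
    # Existing characters keep weight 1; the fake one gets the large weight.
    weights = "1" * n_chars + weight
    return new_rows, weights
```

After the run, the steps contributed by the last character still have to be subtracted from the total, as noted in step (e).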
3595 | <DT><STRONG>"How do I force certain groups to remain monophyletic in your |
---|
3596 | parsimony programs?"</STRONG> |
---|
3597 | <DD>By the same method as in the previous question, using multiple fake characters, any number of |
---|
3598 | groups of species can be forced to be monophyletic. In MOVE, DOLMOVE, and |
---|
3599 | DNAMOVE you can specify whatever outgroups you want without going to this |
---|
3600 | trouble. |
---|
3601 | <DT><STRONG>"How can I reroot one of the trees written out by PHYLIP?"</STRONG> |
---|
3602 | <DD>Use the program |
---|
3603 | RETREE. But keep in mind whether the tree inferred by the original program was |
---|
3604 | already rooted, or whether you are free to reroot it. |
---|
3605 | <DT><STRONG>"What do I do about deletions and insertions in my sequences?"</STRONG> |
---|
3606 | <DD>The |
---|
3607 | molecular sequence programs will accept sequences that have gaps (the "<TT>-</TT>" |
---|
3608 | character). They do various things with them, mostly not optimal. DNAPARS |
---|
3609 | counts "gap" as if it were a fifth nucleotide state (in addition to A, C, G, |
---|
3610 | and T). Each site counts one change when a gap arises or disappears. The |
---|
3611 | disadvantage of this treatment is that a long gap will be overweighted, with |
---|
3612 | one event per gapped site. So a gap of 10 nucleotides will count as being as |
---|
3613 | much evidence as 10 single site nucleotide substitutions. If there are not |
---|
3614 | overlapping gaps, one way to correct this is to recode the first site in the |
---|
3615 | gap as "<TT>-</TT>" but make all the others be "<TT>?</TT>" so the gap only counts as one event. |
---|
3616 | Other programs such as DNAML and DNADIST count gaps as equivalent to unknown |
---|
3617 | nucleotides (or unknown amino acids) on the grounds that we don't know what |
---|
3618 | would be there if something were there. This completely leaves out the |
---|
3619 | information from the presence or absence of the gap itself, but does not bias |
---|
3620 | the gapped sequence to be close to or far from other gapped or ungapped |
---|
3621 | sequences. |
---|
3622 | So it is not necessary to remove gapped regions from your |
---|
3623 | sequences, unless the presence of gaps indicates that the region is |
---|
3624 | badly aligned. |
---|
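<P>
The recoding suggested above for DNAPARS (first site of a gap kept as "<TT>-</TT>", the remaining sites changed to "<TT>?</TT>") can be sketched in a few lines of Python. The function name <TT>recode_gaps</TT> is made up for this example. It treats each sequence on its own, so it is appropriate only when the gaps in different sequences do not overlap, as cautioned above.

```python
def recode_gaps(sequence):
    """Keep the first '-' of each run of gaps and turn the rest into '?'."""
    out = []
    in_gap = False
    for ch in sequence:
        if ch == "-":
            # Only the first site of a run stays a gap character.
            out.append("?" if in_gap else "-")
            in_gap = True
        else:
            out.append(ch)
            in_gap = False
    return "".join(out)
```

For example, <TT>recode_gaps("AC----GT-A")</TT> gives <TT>AC-???GT-A</TT>, so the four-site gap counts as a single event.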
3625 | <DT><STRONG>"How can I produce distances for my data set which |
---|
3626 | has 0's and 1's?"</STRONG> |
---|
3627 | <DD>You can't do it in a simple and general |
---|
3628 | way, for a straightforward reason. Distance methods must correct the |
---|
3629 | distances for superimposed changes. Unless we know specifically how to |
---|
3630 | do this for your particular characters, we cannot accomplish the |
---|
3631 | correction. There are many formulas we could use, but we can't choose |
---|
3632 | among them without much more information. There are issues of superimposed |
---|
3633 | changes, as well as heterogeneity of rates of change in different |
---|
3634 | characters. Thus we have not provided a distance program for 0/1 data. |
---|
3635 | It is up to you to figure out what is an appropriate stochastic model |
---|
3636 | for your data and to find the right distance formulas. |
---|
3637 | <DT><STRONG>"I have RFLP fragment data: which programs should I |
---|
3638 | use?"</STRONG> |
---|
3639 | <DD>This is a more difficult question than you may imagine. |
---|
3640 | Here is a quick tour of the issues: |
---|
3641 | <UL><LI>You can code fragments as 0 and 1 and use a parsimony program. It is |
---|
3642 | not obvious in advance whether 0 or 1 is ancestral, though it is likely that |
---|
3643 | change in one direction is more likely than change in the other for each |
---|
3644 | fragment. One can use either Wagner parsimony (programs <TT>MIX</TT>, |
---|
3645 | <TT>PENNY</TT> or <TT>MOVE</TT>) or use Dollo parsimony |
---|
3646 | (<TT>DOLLOP, DOLPENNY</TT> or <TT>DOLMOVE</TT>) |
---|
3647 | with the ancestral states all set as unknown ("<TT>?</TT>"). |
---|
3648 | <LI>You can use a distance matrix method using the RFLP distance of Nei and |
---|
3649 | Li (1979). Their restriction fragment distance is available in our |
---|
3650 | program RestDist. |
---|
3651 | <LI>You should be very hesitant to bootstrap RFLP's. The individual |
---|
3652 | fragments do not evolve independently: a single nucleotide substitution |
---|
3653 | can eliminate one fragment and create two (or vice versa). |
---|
3654 | </UL> |
---|
3655 | For restriction <I>sites</I> (rather than fragments) life is a bit |
---|
3656 | easier: they evolve nearly independently so bootstrapping is possible |
---|
3657 | and <TT>RESTML</TT> can be used. Also directionality of change |
---|
3658 | is less ambiguous when parsimony is used. |
---|
3659 | <DT><STRONG>"Why don't your parsimony programs print out branch lengths?"</STRONG> |
---|
3660 | <DD>Well, DNAPARS and PARS can. The others have not yet been upgraded to the |
---|
3661 | same level. The longer answer is that it is because |
---|
3662 | there are problems defining the branch lengths. If you look closely at the |
---|
3663 | reconstructions of the states of the hypothetical ancestral nodes for almost |
---|
3664 | any data set and almost any parsimony method you will find some ambiguous |
---|
3665 | states on those nodes. There is then usually an ambiguity as to which branch |
---|
3666 | the change is actually on. Other parsimony programs resolve this in one or |
---|
3667 | another arbitrary fashion, sometimes with the user specifying how (for example, |
---|
3668 | methods that push the changes up the tree as far as possible or down it as far |
---|
3669 | as possible). Our older programs leave it to the user to do this. In |
---|
3670 | DNAPARS and PARS we use an algorithm discovered by Hochbaum and Pathria (1997) |
---|
3671 | (and independently by Wayne Maddison) to compute branch lengths that average |
---|
3672 | over all possible placements of the changes. But these branch lengths, as |
---|
3673 | nice as they are, do not correct for multiple superimposed changes. Few |
---|
3674 | programs available from others currently correct the branch lengths for |
---|
3675 | multiple changes of state that may have overlain each other. One possible way |
---|
3676 | to get branch lengths with nucleotide sequence data is to take the tree |
---|
3677 | topology that you got, use RETREE to convert it to be unrooted, prepare a |
---|
3678 | distance matrix from your data using DNADIST, and then use FITCH with that tree |
---|
3679 | as User Tree and see what branch lengths it estimates. |
---|
3680 | <DT><STRONG>"Why can't your programs handle unordered multistate characters?"</STRONG> |
---|
3681 | <DD>In this 3.6 release there is a program PARS which does parsimony for |
---|
3682 | unordered multistate characters with up to 8 states, plus <TT>?</TT>. The |
---|
3683 | other discrete characters parsimony programs can only handle two states, |
---|
3684 | <TT>0</TT> and <TT>1</TT>. |
---|
3685 | This is mostly because I have not yet had time to modify them to do so - the |
---|
3686 | modifications would have to be extensive. Ultimately I hope to get these done. |
---|
3687 | If you have four or fewer states and need a feature that is not in PARS, |
---|
3688 | you could recode your states to look like nucleotides |
---|
3689 | and use the parsimony programs in the molecular sequence section of PHYLIP, or |
---|
3690 | you could use one of the excellent parsimony programs produced by others. |
---|
3691 | <P> |
---|
3692 | <H3>Background information needed:</H3> |
---|
3693 | <P> |
---|
3694 | <DT><STRONG>"What file format do I use for the sequences?"<BR> |
---|
3695 | "How do I use the programs? I can't find any documentation!"</STRONG> |
---|
3696 | <DD>These are discussed in the documentation files. Do you have them? If you |
---|
3697 | have a copy of this page you probably do. They are |
---|
3698 | in a separate archive from the executables (they are in the Documentation and |
---|
3699 | Sources archives, which you should definitely fetch). Input file formats |
---|
3700 | are discussed in <TT>main.html</TT>, in <TT>sequence.html</TT>, <TT>distance.html</TT>, |
---|
3701 | <TT>contchar.html</TT>, <TT>discrete.html</TT>, and the documentation files for the |
---|
3702 | individual programs. |
---|
3703 | <DT><STRONG>"Where can I find out how to infer |
---|
3704 | phylogenies?"</STRONG> |
---|
3705 | <DD>There are few books yet. For molecular data you could use one of these: |
---|
3706 | <UL> |
---|
3707 | <LI> Graur, D. and W.-H. Li. 2000. <EM>Fundamentals of Molecular |
---|
3708 | Evolution.</EM> Sinauer Associates, Sunderland, Massachusetts. (or the earlier edition |
---|
3709 | by Li and Graur). |
---|
3710 | <LI> Page, R. D. M. and E. C. Holmes. 1998. <EM>Molecular Evolution: |
---|
3711 | A Phylogenetic Approach.</EM> Blackwell, Oxford. |
---|
3712 | <LI> Nei, M. and S. Kumar. 2000. <EM>Molecular Evolution and |
---|
3713 | Phylogenetics.</EM> Oxford University Press, Oxford. |
---|
3714 | <LI> Li, W.-H. 1999. <EM>Molecular Evolution.</EM> Sinauer Associates, |
---|
3715 | Sunderland, Massachusetts. |
---|
3716 | </UL> |
---|
3717 | In addition, one of these three review articles may help: |
---|
3718 | <UL><LI>Swofford, D. L., G. J. Olsen, P. J. Waddell, and D. M. Hillis. 1996. |
---|
3719 | Phylogenetic inference. pp. 407-514 in <I>Molecular Systematics</I>, 2nd ed., |
---|
3720 | ed. D. M. Hillis, C. Moritz, and B. K. Mable. Sinauer Associates, Sunderland, |
---|
3721 | Massachusetts. |
---|
3722 | <LI>Felsenstein, J. 1988. Phylogenies from molecular sequences: inference and |
---|
3723 | reliability. <I>Annual Review of Genetics</I> <B>22:</B> 521-565. |
---|
3724 | <LI>Felsenstein, J. 1988. Phylogenies and quantitative |
---|
3725 | characters. <I>Annual Review of Ecology and Systematics</I> <B>19:</B> 445-471. |
---|
3726 | </UL> |
---|
3727 | My own book on phylogenies is due to be published in late 2002. It |
---|
3728 | will be called "Inferring Phylogenies". For information on whether it has |
---|
3729 | been published you should check the |
---|
3730 | <A HREF="http://www.sinauer.com">Sinauer Associates web site</A>. |
---|
3731 | <P> |
---|
3732 | <H3>Questions about distribution and citation:</H3> |
---|
3733 | <P> |
---|
3734 | <DT><STRONG>"If I copied PHYLIP from a friend without you knowing, should I try |
---|
3735 | to keep you from finding out?"</STRONG> |
---|
3736 | <DD>No. It is to your advantage and mine for you to |
---|
3737 | let me know. If you did not get PHYLIP "officially" from me or from someone |
---|
3738 | authorized by me, but copied a friend's version, you are not in my database of |
---|
3739 | users. You may also have an old version which has since been |
---|
3740 | substantially improved. I don't mind you "bootlegging" |
---|
3741 | PHYLIP (it's free anyway), but |
---|
3742 | you should realize that you may have copied an outdated version. If you are reading this |
---|
3743 | Web page, |
---|
3744 | you can get the latest version just as quickly over the Internet. |
---|
3745 | It will help both of us if you get |
---|
3746 | onto my mailing list. If you are on it, then I will give your name to other |
---|
3747 | users near you when they ask for local contacts, and they are urged to contact you and |
---|
3748 | update your copy. (I benefit by getting a better feel for how many |
---|
3749 | distributions there have been, and having a better mailing list to use to give |
---|
3750 | other users local people to contact). Use the registration form which |
---|
3751 | can be accessed through our web site's registration page. |
---|
3752 | <DT><STRONG>"How do I make a citation to the PHYLIP package in the paper I am |
---|
3753 | writing?"</STRONG> |
---|
3754 | <DD>One way is like this: |
---|
3755 | <P> |
---|
3756 | Felsenstein, J. 2002. PHYLIP (Phylogeny Inference Package) version 3.6a3. |
---|
3757 | <I>Distributed by the author. Department of Genome Sciences, University of |
---|
3758 | Washington, Seattle.</I> |
---|
3759 | <P> |
---|
3760 | or if the editor for whom you are writing insists that the citation must be to |
---|
3761 | a printed publication, you could cite a notice for version 3.2 published in |
---|
3762 | Cladistics: |
---|
3763 | <P> |
---|
3764 | Felsenstein, J. 1989. PHYLIP - Phylogeny Inference Package (Version 3.2). |
---|
3765 | <I>Cladistics</I> <B>5:</B> 164-166. |
---|
3766 | <BR> |
---|
3767 | <P> |
---|
3768 | For a while a printed version of the PHYLIP documentation was available and one |
---|
3769 | could cite that. This is no longer true. Other than that, this is difficult, |
---|
3770 | because I have never written a paper announcing PHYLIP! My 1985b paper in |
---|
3771 | Evolution on the bootstrap method contains a |
---|
3772 | one-paragraph Appendix describing the availability of this package, and that |
---|
3773 | can also be cited as a reference for the package, although it was |
---|
3774 | distributed since 1980 while the bootstrap paper is 1985. A paper on PHYLIP |
---|
3775 | is needed mostly to give people something to cite, as word-of-mouth, references |
---|
3776 | in other people's papers, and electronic newsgroup postings have spread the |
---|
3777 | word about PHYLIP's existence quite effectively. |
---|
3778 | <DT><STRONG>"Can I make copies of PHYLIP available to the students in |
---|
3779 | my class?"</STRONG> |
---|
3780 | <DD>Generally, yes. Read the Copyright notice near the front of |
---|
3781 | this main documentation page. If you charge money for PHYLIP, |
---|
3782 | or use it in a service for which you charge money, you will need |
---|
3783 | to negotiate a royalty. But you can make it freely available |
---|
3784 | and you do not need to get any special permission from us to do so. |
---|
3785 | <DT><STRONG>"How many copies of PHYLIP have been distributed?"</STRONG> |
---|
3786 | <DD>On |
---|
3787 | 27 September, 1996 we reached 5,000 registered installations worldwide. |
---|
3788 | (By now we are well over 15,000 but have lost count for |
---|
3789 | the moment). Of course there are |
---|
3790 | many more people who have got copies from friends. PHYLIP is the most widely |
---|
3791 | distributed phylogeny package. (This situation may reverse itself rapidly |
---|
3792 | once PAUP* is fully released. During the years it was in full distribution, |
---|
3793 | PAUP was ahead in phylogenies published, and the availability of distance and |
---|
3794 | likelihood methods in PAUP* is making it very popular.) |
---|
3795 | In recent years magnetic tape distribution and e-mail distribution of |
---|
3796 | PHYLIP have disappeared, |
---|
3797 | and there has been a big decrease of diskette distributions (down to only |
---|
3798 | one or two per year). But all this has |
---|
3799 | been more than offset by, first, an explosion of distributions by anonymous ftp |
---|
3800 | over the Internet, and then a bigger explosion of World Wide Web distributions and |
---|
3801 | registrations (about 6 registrations per day at the moment). |
---|
3802 | <P> |
---|
3803 | <H3>Questions about documentation</H3> |
---|
3804 | <P> |
---|
3805 | <DT><STRONG>"Where can I get a printed version of the PHYLIP documents?"</STRONG> |
---|
3806 | <DD>For the |
---|
3807 | moment, you can only get a printed version by printing it yourself. For |
---|
3808 | versions 3.1 to 3.3 a printed version was sold by Christopher Meacham and Tom |
---|
3809 | Duncan, then at the University Herbarium of the University of California at |
---|
3810 | Berkeley. But they have had to discontinue this as it was too much work. You |
---|
3811 | should be able to print out the documentation files on almost any printer and |
---|
3812 | make yourself a printed version of whichever of them you need. |
---|
3813 | <DT><STRONG>"Why have I been dropped from your newsletter mailing list?"</STRONG> |
---|
3814 | <DD>You haven't. |
---|
3815 | The newsletter was dropped. It simply was too hard to mail it out to such a |
---|
3816 | large mailing list. The last issue of the newsletter was Number 9 in May, |
---|
3817 | 1987. The Listserver News Bulletins that we tried for a while have also been dropped |
---|
3818 | as too hard to keep up to date. I am hoping that our World Wide Web site will take their place. |
---|
3819 | </DL> |
---|
3820 | <P> |
---|
3821 | <DIV ALIGN="CENTER"> |
---|
3822 | <H3>Additional Frequently Asked Questions, or:<BR> |
---|
3823 | "Why didn't it occur to you to ...</H3></DIV> |
---|
3824 | <DL> |
---|
3825 | <DT><STRONG>... allow the options to be set on the command line?"</STRONG> |
---|
3826 | <DD>We could in Unix and Linux, or somewhat differently in Windows. But |
---|
3827 | there are so many options that this would be difficult, especially |
---|
3828 | when the options require additional information to be supplied such as |
---|
3829 | rates of evolution for many categories of sites. You may be asking this |
---|
3830 | question because you want to automate the operation of PHYLIP programs |
---|
3831 | using batch files (command files) to run in background. If that is the |
---|
3832 | issue, see the section of this main documentation page on |
---|
3833 | "Running the programs in background or under control of a command file". |
---|
3834 | It explains how to set the options using input redirection and a file |
---|
3835 | that has the menu responses as keystrokes. |
---|
3836 | <DT><STRONG>... write these programs in Pascal?"</STRONG> |
---|
3837 | <DD>These programs started out |
---|
3838 | in Pascal in 1980. In 1993 we released both Pascal and C versions. The |
---|
3839 | present version (3.6) and |
---|
3840 | future versions will be C-only. I make fewer mistakes in Pascal and do |
---|
3841 | like the language better than C, but C has overtaken Pascal and Pascal |
---|
3842 | compilers are starting to be hard to find on some machines. Also C is a |
---|
3843 | bit better standardized, which greatly reduces the number of modifications a user |
---|
3844 | has to make to adapt the programs to their system. |
---|
3845 | <DT><STRONG>... write these programs in Java?"</STRONG> |
---|
3846 | <DD>Well, we might. It is not completely clear which of two contenders, |
---|
3847 | C++ and Java, will become more widespread, and which one will gradually |
---|
3848 | fade away. Whichever one is more successful, we will probably want to use it |
---|
3849 | for future versions of PHYLIP. As the C compilers that are used to |
---|
3850 | compile PHYLIP are usually also able to compile C++, we will be moving in |
---|
3851 | that direction, but with constant worrying about whether to convert PHYLIP |
---|
3852 | to Java instead.</DD> |
---|
3853 | <DT><STRONG>... forget about all those inferior systems and just develop PHYLIP for Unix?"</STRONG> |
---|
3854 | <DD>This is self-answering, since the same people first said I should |
---|
3855 | just develop it for Apple II's, then for CP/M Z-80's, then for IBM PCDOS, |
---|
3856 | then for Macintoshes or for Sun |
---|
3857 | workstations, and then for Windows. If I had listened to them and done any one of these, I would |
---|
3858 | have had a very hard time adapting the package to any of the other ones once |
---|
3859 | these folks changed their minds (and most of them did)! |
---|
3860 | <DT><STRONG>... write these programs in PROLOG |
---|
3861 | (or Ada, or Modula-2, or SIMULA, or BCPL, or PL/I, or APL, or LISP)?"</STRONG> |
---|
3862 | <DD>These are all languages I have considered. All |
---|
3863 | have advantages, but they are not really widespread (as are C and C++). |
---|
3864 | <DT><STRONG>... include in the package a program to do the Distance Wagner method, (or |
---|
3865 | successive approximations character weighting, |
---|
3866 | or transformation series analysis)?"</STRONG> |
---|
3867 | <DD>In most cases where I have not |
---|
3868 | included other methods, it is because I decided that they had no substantial |
---|
3869 | advantages over methods that were included (such as the programs FITCH, |
---|
3870 | KITSCH, NEIGHBOR, the <TT>T</TT> option of MIX and DOLLOP, and the "<TT>?</TT>" ancestral |
---|
3871 | states option of the discrete characters parsimony programs). |
---|
3872 | <DT><STRONG>... include in the package ordination methods and more |
---|
3873 | clustering algorithms?"</STRONG> |
---|
3874 | <DD>Because this is <I>not</I> a clustering package, it's a |
---|
3875 | package for phylogeny estimation. Those are different tasks with different |
---|
3876 | objectives and mostly different methods. Mary Kuhner and Jon Yamato have, |
---|
3877 | however, |
---|
3878 | included in NEIGHBOR an option for UPGMA clustering, which will be very |
---|
3879 | similar to KITSCH in results. |
---|
3880 | <DT><STRONG>... include in the package a program to do nucleotide sequence |
---|
3881 | alignment?"</STRONG> |
---|
3882 | <DD>Well, yes, I should |
---|
3883 | have, and this is scheduled to be in future releases. But multiple sequence |
---|
3884 | alignment programs, in the era after Sankoff, Morel, and Cedergren's 1973 |
---|
3885 | classic paper, need to use substantial computer horsepower to estimate the |
---|
3886 | alignment and the tree together (but see Karl Nicholas's program |
---|
3887 | <TT>GeneDoc</TT> or Ward Wheeler and David Gladstein's <TT>MALIGN</TT>, as |
---|
3888 | well as more approximate methods of tree-based alignment used in |
---|
3889 | <TT>ClustalW</TT> or <TT>TreeAlign</TT>). |
---|
3890 | </DL> |
---|
3891 | <P> |
---|
3892 | <DIV ALIGN="CENTER"> |
---|
3893 | <H3>(Fortunately) obsolete questions</H3></DIV> |
---|
3894 | <P> |
---|
3895 | (The following four questions, once |
---|
3896 | common, have finally disappeared, I am pleased to report). |
---|
3897 | <H4>"Why didn't it occur to you to ...</H4> |
---|
3898 | <DL> |
---|
3899 | <DT><STRONG>... let me log in to your computer in Seattle |
---|
3900 | and copy the files out over a phone line?"</STRONG> |
---|
3901 | <DD>No thanks. It would cost you for a lot of |
---|
3902 | long-distance telephone time, plus a half hour of my time and yours in which |
---|
3903 | I had to explain to you how to log in and do the copying. |
---|
3904 | <DT><STRONG>... send me a listing of your program?"</STRONG> |
---|
3905 | <DD>Damn it, it's not "a program", |
---|
3906 | it's 35 programs, in a great many files. What were you |
---|
3907 | thinking of doing, having 1800-line programs typed in by slaves at your |
---|
3908 | end? If you were going to go to all that trouble why not try network |
---|
3909 | transfer? Once you have the files you can print out all the |
---|
3910 | listings you want to and add them to the huge stack of printed output in |
---|
3911 | the corner of your office. |
---|
3912 | <DT><STRONG>... write a magnetic tape in our computer center's favorite format |
---|
3913 | (inverted Lithuanian EBCDIC at 998 bpi)?"</STRONG> |
---|
3914 | <DD>Because the ANSI standard |
---|
3915 | format is the most widely used one, and even though your computer center |
---|
3916 | may pretend it can't read a tape written this way, if you sniff around |
---|
3917 | you will find a utility to read it. It's just a <I>lot</I> easier for me to |
---|
3918 | let you do that work. If I tried to put the tape into your format, I |
---|
3919 | would probably get it wrong anyway. |
---|
3920 | <DT><STRONG>... give us a version of these in FORTRAN?"</STRONG> |
---|
3921 | <DD>Because the |
---|
3922 | programs are <I>far</I> easier to write and debug in C or Pascal, and cannot |
---|
3923 | easily be |
---|
3924 | rewritten into FORTRAN (they make extensive use of recursive calls and |
---|
3925 | of records and pointers). In any case, C is widely available. If you don't |
---|
3926 | have a C compiler or don't know |
---|
3927 | how to use it, you are going to have to learn a language like C or |
---|
3928 | Pascal sooner or later, and the sooner the better. |
---|
3929 | </DL> |
---|
3930 | <P> |
---|
3931 | <A NAME="newfeatures"><HR><P></A> |
---|
3932 | <DIV ALIGN="CENTER"> |
---|
3933 | <H2>New Features in This Version</H2></DIV> |
---|
3934 | <P> |
---|
3935 | Version 3.6 has many new features: |
---|
3936 | <UL><LI>Faster (well, less slow) likelihood programs. |
---|
3937 | <LI>The DNA and protein likelihood and distance programs allow |
---|
3938 | for rate variation between sites using a gamma distribution of |
---|
3939 | rates among sites, or using a gamma distribution plus a given |
---|
3940 | fraction of sites which are assumed invariant. |
---|
3941 | <LI>A new multistate discrete characters parsimony program, PARS, that |
---|
3942 | handles unordered multistate characters. |
---|
3943 | <LI>The DNAPARS and PARS parsimony programs can infer multifurcating |
---|
3944 | trees, which sensibly reduces the number of tied trees they find. |
---|
3945 | <LI>A new protein sequence likelihood program, <TT>PROML</TT>, |
---|
3946 | and also a version, <TT>PROMLK</TT> which assumes a molecular clock. |
---|
3947 | <LI>A new restriction sites and restriction fragments distance program, |
---|
3948 | <TT>RESTDIST</TT>, that can also be used to compute distances for RAPD and |
---|
3949 | AFLP data. It also allows for gamma-distributed rate variation among |
---|
3950 | DNA sites. |
---|
3951 | <LI>In the DNA likelihood programs, you can now specify different |
---|
3952 | categories of rates of change (such as rates for first, second, and |
---|
3953 | third positions of a coding sequence) and assign them to specific sites. |
---|
3954 | This is in addition to the ability of the program to use the Hidden Markov |
---|
3955 | Model mechanism to allow rates of change to vary across sites in a way that |
---|
3956 | does not ask you to assign which rate goes with which site. |
---|
3957 | <LI>The input files for many of the programs are now |
---|
3958 | simpler, in that they do not contain options information such as specification |
---|
3959 | of weights and categories. That information is now provided in separate |
---|
3960 | files with default names such as <TT>weights</TT> and <TT>categories</TT>. |
---|
3961 | <LI>The DNA likelihood programs can now evaluate multifurcating |
---|
3962 | user trees (option <TT>U</TT>). |
---|
3963 | <LI>All programs that read in user-defined trees now do so from a separate |
---|
3964 | file, whose default name is <TT>intree</TT>, rather than requiring them to |
---|
3965 | be in the input file as before. |
---|
3966 | <LI>The DNA likelihood programs can infer the sequence at ancestral |
---|
3967 | nodes in the interior of the tree. |
---|
3968 | <LI>DNAPARS can now do transversion parsimony. |
---|
3969 | <LI>The bootstrapping program SEQBOOT now can, instead of producing a |
---|
3970 | large file containing multiple data sets, be asked instead |
---|
3971 | to produce a weights file with multiple sets of weights. Many |
---|
3972 | programs in this release can analyze those multiple weights together with |
---|
3973 | the original data set, which saves disk space. |
---|
3974 | <LI>The bootstrapping program SEQBOOT can pass weights and categories |
---|
3975 | information through to a multiple weights file or a multiple categories |
---|
3976 | file. |
---|
3977 | <LI>SEQBOOT can also convert sequence files from Interleaved to |
---|
3978 | Sequential form, or back. |
---|
3979 | <LI>SEQBOOT can also write a sequence data file into a preliminary version of |
---|
3980 | a new XML format which is being defined for sequence alignments, |
---|
3981 | for use by programs that need XML input |
---|
3982 | (none of the current PHYLIP programs yet need this format, but it |
---|
3983 | will be useful in the future). |
---|
3984 | <LI>RETREE can now write a tree out into a preliminary version of a new XML tree |
---|
3985 | file format which is in the process of being defined. |
---|
3986 | <LI>The Kishino-Hasegawa-Templeton (KHT) test which compares user-defined |
---|
3987 | trees (option U) is now joined by the Shimodaira-Hasegawa (SH) test |
---|
3988 | (Shimodaira and Hasegawa, 1999) which corrects for comparisons among |
---|
3989 | multiple tests. This avoids a statistical problem with multiple user trees. |
---|
3990 | <LI>CONTRAST can now carry out an analysis that takes into account |
---|
3991 | within-species variation, according to a model similar (but not |
---|
3992 | identical) to that introduced by Michael Lynch (1990). |
---|
3993 | <LI>A new program, TREEDIST, computes the Robinson-Foulds symmetric |
---|
3994 | difference distance among trees. This measures the number of branches in |
---|
3995 | the trees that are present in one but not the other. |
---|
3996 | <LI>FITCH and KITSCH now have an option to make trees by the |
---|
3997 | minimum evolution distance matrix method. |
---|
3998 | <LI>The protein parsimony program PROTPARS now allows you to choose among |
---|
3999 | a number of different genetic codes such as mitochondrial codes. |
---|
4000 | <LI>The consensus tree program CONSENSE |
---|
4001 | can compute the M<SUB>l</SUB> family of consensus tree methods, which |
---|
4002 | generalize the Majority Rule consensus tree method. It can |
---|
4003 | also compute our extended Majority Rule consensus (which is |
---|
4004 | Majority Rule with some additional groups added to resolve the |
---|
4005 | tree more completely), and it can also compute the original |
---|
4006 | Majority Rule consensus tree method which does not add these |
---|
4007 | extra groups. It can also |
---|
4008 | compute the Strict consensus. |
---|
4009 | <LI>The tree-drawing programs DRAWGRAM and DRAWTREE have a number of new |
---|
4010 | options for the kinds of file they can produce, including Windows Bitmap files, |
---|
4011 | files for the Idraw and FIG X windows drawing programs, the POV ray-tracer, |
---|
4012 | and even VRML Virtual Reality Markup Language files that will enable you |
---|
4013 | to wander around the tree using a VRML plugin for your browser, such as |
---|
4014 | Cosmo Player. |
---|
4015 | <LI>DRAWTREE now uses my new Equal Daylight Algorithm to draw unrooted |
---|
4016 | trees. This gives a much better-looking tree. Of course, competing programs |
---|
4017 | such as TREEVIEW and PAUP draw trees that look just as good - because they |
---|
4018 | too have started to use my method (with my encouragement). DRAWTREE also |
---|
4019 | can use another algorithm, the n-body method. |
---|
4020 | <LI>The tree-drawing programs can now produce trees across multiple |
---|
4021 | pages, which is handy for looking at trees with very large numbers |
---|
4022 | of tips, and for producing giant diagrams by pasting together |
---|
4023 | multiple sheets of paper. |
---|
4024 | </UL> |
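<P>
The idea behind the multiple-weights output of SEQBOOT mentioned above can be
illustrated with a short sketch. This is not PHYLIP code and does not
reproduce the layout of the weights file that SEQBOOT writes. It only
shows, under the assumption of an ordinary bootstrap that samples sites with
replacement, why one weight per site is enough to describe a bootstrap
replicate. The function name <TT>bootstrap_weights</TT> and its parameters are
hypothetical.

```python
import random
from collections import Counter


def bootstrap_weights(n_sites, n_replicates, seed=None):
    """Return one list of integer site weights per bootstrap replicate.

    Each replicate draws n_sites sites with replacement; the weight of a
    site is the number of times it was drawn (0 if it was never drawn).
    Illustrative only: names and return format are assumptions, not PHYLIP's.
    """
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_replicates):
        # Count how often each site index is sampled in this replicate.
        draws = Counter(rng.randrange(n_sites) for _ in range(n_sites))
        replicates.append([draws[i] for i in range(n_sites)])
    return replicates


if __name__ == "__main__":
    for weights in bootstrap_weights(n_sites=10, n_replicates=3, seed=1):
        print(weights)
```

Every replicate's weights sum to the number of sites, so a program can
analyze the original data set once per weight vector instead of reading a
full resampled copy of the data each time.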
---|
4025 | <P> |
---|
4026 | There are many more, lesser features added as well. |
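<P>
As an illustration of the symmetric difference distance mentioned in the list
above, here is a minimal sketch. It is not the TREEDIST source code. It
assumes that trees are written as nested Python tuples of tip names, that both
trees have the same set of tips, and that trees are compared as unrooted
trees. The function names <TT>splits</TT> and <TT>symmetric_difference</TT> are
hypothetical.

```python
def splits(tree):
    """Return the set of nontrivial bipartitions (interior branches) of a tree.

    The tree is a nested tuple whose leaves are strings. Each interior
    branch divides the tips into two groups; we store the group that does
    not contain a fixed reference tip, so that the same branch is always
    represented the same way regardless of where the tree is rooted.
    """
    def tips(node):
        if isinstance(node, str):
            return frozenset([node])
        return frozenset().union(*(tips(child) for child in node))

    everything = tips(tree)
    reference = min(everything)
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        other = everything - below
        # Skip branches leading to a single tip and the root itself.
        if len(below) > 1 and len(other) > 1:
            found.add(other if reference in below else below)
        return below

    walk(tree)
    return found


def symmetric_difference(tree1, tree2):
    """Number of interior branches present in one tree but not the other."""
    return len(splits(tree1) ^ splits(tree2))


tree_a = (("A", "B"), "C", ("D", "E"))
tree_b = (("A", "C"), "B", ("D", "E"))
print(symmetric_difference(tree_a, tree_a))  # 0
print(symmetric_difference(tree_a, tree_b))  # 2
```

The two example trees share the branch separating D and E from the rest, and
each has one branch the other lacks, which gives a distance of 2.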
---|
4027 | <P> |
---|
4028 | <A NAME="future"><HR><P></A> |
---|
4029 | <DIV ALIGN="CENTER"> |
---|
4030 | <H2>Coming Attractions, Future Plans</H2></DIV> |
---|
4031 | <P> |
---|
4032 | There are some obvious deficiencies in this version. Some of these |
---|
4033 | holes will be filled in the next few releases (leading to version |
---|
4034 | 4.0). They include: |
---|
4035 | <OL> |
---|
4036 | <LI>A program to align molecular sequences on a predefined User Tree may |
---|
4037 | ultimately be included. This will allow alignment and phylogeny |
---|
4038 | reconstruction to proceed iteratively by successive runs of two programs, one |
---|
4039 | aligning on a tree and the other finding a better tree based on that alignment. |
---|
4040 | In the shorter run a simple two-sequence alignment program may be included. |
---|
4041 | <LI>An interactive "likelihood explorer" for DNA sequences will be written. |
---|
4042 | This will allow, either with or without the assumption of a molecular |
---|
4043 | clock, trees to be varied interactively so that the user can get a much |
---|
4044 | better feel for the shape of the likelihood surface. Likelihood will be |
---|
4045 | able to be plotted against branch lengths for any branch. |
---|
4046 | <LI>If possible we will find some way of correcting for purine/pyrimidine |
---|
4047 | richness variations among species, within the framework of the maximum |
---|
4048 | likelihood programs. That the maximum likelihood programs do not allow |
---|
4049 | for base composition variation is their major limitation at the moment. |
---|
4050 | <LI>The Hidden Markov Model (regional rates) option of DNAML and DNAMLK will |
---|
4051 | be generalized to allow |
---|
4052 | for rates at sites to gradually change as one moves along the tree, |
---|
4053 | in an attempt to implement Fitch and Markowitz's (1970) notion of "covarions". |
---|
4054 | <LI>Obviously we need to start thinking about a more visual mouse/windows |
---|
4055 | interface, but only if that can be used on X windows, Macintoshes, and |
---|
4056 | Windows. |
---|
4057 | <LI>Program PENNY and its relatives will be improved so as to run faster |
---|
4058 | and find all most parsimonious trees more quickly. |
---|
4059 | <LI>A more sophisticated compatibility program should be included, if I can |
---|
4060 | find one. |
---|
4061 | <LI>An "evolutionary clock" version of CONTML will be done, and the same |
---|
4062 | may also be done for RESTML. |
---|
4063 | <LI>We are gradually generalizing the tree structures in the programs to |
---|
4064 | infer multifurcating trees as well as bifurcating ones. |
---|
4065 | We should be able to have any program read any tree and know what to do |
---|
4066 | with it, without the user having to fret about whether an unrooted tree was |
---|
4067 | fed to a program that needs a rooted tree. |
---|
4068 | <LI>We are economizing on the size of the source code, and enforcing some |
---|
4069 | standardization of it, by putting frequently used routines in separate |
---|
4070 | files which can be linked into various programs. This will enforce |
---|
4071 | a rather complete standardization of our code. |
---|
4072 | <LI>We will move our code to an object-oriented |
---|
4073 | language, most likely C++. One could describe the language that version |
---|
4074 | 3.4 was written in as "Pascal", version 3.5 as "Pascal written in C", |
---|
4075 | version 3.6 as "C written in C", and maybe version 4.0 as "C++ written |
---|
4076 | in C" and then 4.1 as "C++ written in C++". At least that scenario |
---|
4077 | is one possibility. |
---|
4078 | </OL> |
---|
4079 | <P> |
---|
4080 | Much of the future development of the package will be in the DNA and protein |
---|
4081 | likelihood programs and the distance matrix programs. This is for several |
---|
4082 | reasons. First, I am more interested in those problems. Second, collection of |
---|
4083 | molecular data is increasing rapidly, and those programs have the most promise |
---|
4084 | for future development |
---|
4085 | for those data. |
---|
4086 | <P> |
---|
4087 | <A NAME="endorsements"><HR><P></A> |
---|
4088 | <DIV ALIGN="CENTER"> |
---|
4089 | <H2>Endorsements</H2></DIV> |
---|
4090 | <P> |
---|
4091 | Here are some comments people have made in print about PHYLIP. Explanatory |
---|
4092 | material in square brackets is my own. They fall naturally into two groups: |
---|
4093 | <P> |
---|
4094 | <H3>From the pages of <I>Cladistics</I>:</H3> |
---|
4095 | <P> |
---|
4096 | <BLOCKQUOTE> |
---|
4097 | "Under no circumstances can we recommend PHYLIP/WAG [their name for the |
---|
4098 | Wagner parsimony option of MIX]." |
---|
4099 | <DIV ALIGN="RIGHT"> |
---|
4100 | Luckow, M. and R. A. Pimentel (1985) |
---|
4101 | </DIV> |
---|
4102 | </BLOCKQUOTE> |
---|
4103 | <P> |
---|
4104 | <BLOCKQUOTE> |
---|
4105 | "PHYLIP has not proven very effective in implementing parsimony (Luckow and |
---|
4106 | Pimentel, 1985)." |
---|
4107 | <DIV ALIGN="RIGHT"> |
---|
4108 | J. Carpenter (1987a) |
---|
4109 | </DIV> |
---|
4110 | </BLOCKQUOTE> |
---|
4111 | <P> |
---|
4112 | <BLOCKQUOTE> |
---|
4113 | "... PHYLIP. This is the computer program where every newsletter concerning |
---|
4114 | it is mostly bug-catching, some of which have been put there by previous |
---|
4115 | corrections. As Platnick (1987) documents, through dint of much labor useful |
---|
4116 | results may be attained with this program, but I would suggest an |
---|
4117 | easier way: FORMAT b:" |
---|
4118 | <DIV ALIGN="RIGHT"> |
---|
4119 | J. Carpenter (1987b) |
---|
4120 | </DIV> |
---|
4121 | </BLOCKQUOTE> |
---|
4122 | <P> |
---|
4123 | <BLOCKQUOTE> |
---|
4124 | "PHYLIP is bug-infested and both less effective and orders of |
---|
4125 | magnitude slower than other programs ...." |
---|
4126 | <DIV ALIGN="RIGHT"> |
---|
4127 | "T. N. Nayenizgani" [J. S. Farris] (1990) |
---|
4128 | </DIV> |
---|
4129 | </BLOCKQUOTE> |
---|
4130 | <P> |
---|
4131 | <BLOCKQUOTE> |
---|
4132 | "Hennig86 [by J. S. Farris] provides such substantial improvements over |
---|
4133 | previously available programs (for both mainframes and microcomputers) that |
---|
4134 | it should now become the tool of choice for practising systematists." |
---|
4135 | <DIV ALIGN="RIGHT"> |
---|
4136 | N. Platnick (1989) |
---|
4137 | </DIV> |
---|
4138 | </BLOCKQUOTE> |
---|
4139 | <P> |
---|
4140 | <H3>... and in the pages of other journals:</H3> |
---|
4141 | <P> |
---|
4142 | <BLOCKQUOTE> |
---|
4143 | "The availability, within PHYLIP of distance, compatibility, maximum likelihood, |
---|
4144 | and generalized `invariants' algorithms (Cavender and Felsenstein, 1987) sets |
---|
4145 | it apart from other packages .... One of the strengths of PHYLIP is its |
---|
4146 | documentation ...." |
---|
4147 | <DIV ALIGN="RIGHT"> |
---|
4148 | Michael J. Sanderson (1990) |
---|
4149 | </DIV> |
---|
4150 | <EM>(Sanderson also criticizes PHYLIP for slowness and inflexibility of its |
---|
4151 | parsimony algorithms, and compliments other packages on their strengths).</EM> |
---|
4152 | </BLOCKQUOTE> |
---|
4153 | <P> |
---|
4154 | <BLOCKQUOTE> |
---|
4155 | "This package of programs has gradually become a basic necessity to anyone |
---|
4156 | working seriously on various aspects of phylogenetic inference .... The package |
---|
4157 | includes more programs than any other known phylogeny package. But it is not |
---|
4158 | just a collection of cladistic and related programs. The package has great |
---|
4159 | value added to the whole, and for this it is unique and of extreme |
---|
4160 | importance .... its various strengths are in the great array of methods |
---|
4161 | provided ...." |
---|
4162 | <DIV ALIGN="RIGHT"> |
---|
4163 | Bernard R. Baum (1989) |
---|
4164 | </DIV> |
---|
4165 | </BLOCKQUOTE> |
---|
4166 | <P> |
---|
4167 | (note also W. Fink's critical remarks (1986) on version 2.8 of PHYLIP). |
---|
4168 | <P> |
---|
4169 | <A NAME="references"><HR><P></A> |
---|
4170 | <DIV ALIGN="CENTER"> |
---|
4171 | <H2>References for the Documentation Files</H2></DIV> |
---|
4172 | <P> |
---|
4173 | In the documentation files that follow I frequently refer to papers |
---|
4174 | in the literature. In order to centralize the references they are given |
---|
4175 | in this section. The chapter by David Swofford, |
---|
4176 | Gary Olsen, Peter Waddell, and David Hillis |
---|
4177 | (1996) is also an excellent review of the issues in phylogeny |
---|
4178 | reconstruction. |
---|
4179 | If you want to find further papers beyond these, my |
---|
4180 | Quarterly Review of Biology review of 1982 and my Annual Review of Genetics |
---|
4181 | review of 1988 list many further references. |
---|
4182 | <P> |
---|
4183 | Adams, E. N. 1972. Consensus techniques and the comparison of |
---|
4184 | taxonomic trees. <I>Systematic Zoology</I> <B>21:</B> 390-397. |
---|
4185 | <P> |
---|
4186 | Adams, E. N. 1986. N-trees as nestings: complexity, similarity, and |
---|
4187 | consensus. <I>Journal of Classification</I> <B>3:</B> 299-317. |
---|
4188 | <P> |
---|
4189 | Archie, J. W. 1989. A randomization test for phylogenetic information in |
---|
4190 | systematic data. <I>Systematic Zoology</I> <B>38:</B> 219-252. |
---|
4191 | <P> |
---|
4192 | Barry, D., and J. A. Hartigan. 1987. Statistical analysis of hominoid |
---|
4193 | molecular evolution. <I>Statistical Science</I> <B>2:</B> 191-210. |
---|
4194 | <P> |
---|
4195 | Baum, B. R. 1989. PHYLIP: Phylogeny Inference Package. Version 3.2. (Software |
---|
4196 | review). <I>Quarterly Review of Biology</I> <B>64:</B> 539-541. |
---|
4197 | <P> |
---|
4198 | Bron, C., and J. Kerbosch. 1973. Algorithm 457: Finding all cliques |
---|
4199 | of an undirected graph. <I>Communications of the Association for Computing Machinery</I> <B>16:</B> 575-577. |
---|
4200 | <P> |
---|
4201 | Camin, J. H., and R. R. Sokal. 1965. A method for deducing branching |
---|
4202 | sequences in phylogeny. <I>Evolution</I> <B>19:</B> 311-326. |
---|
4203 | <P> |
---|
4204 | Carpenter, J. 1987a. A report on the Society for the Study of Evolution |
---|
4205 | workshop "Computer Programs for Inferring Phylogenies". <I>Cladistics</I> <B>3:</B> |
---|
4206 | 363-375. |
---|
4207 | <P> |
---|
4208 | Carpenter, J. 1987b. Cladistics of cladists. <I>Cladistics</I> <B>3:</B> 363-375. |
---|
4209 | <P> |
---|
4210 | Cavalli-Sforza, L. L., and A. W. F. Edwards. 1967. Phylogenetic |
---|
4211 | analysis: models and estimation procedures. <I>Evolution</I> <B>21:</B> 550-570 |
---|
4212 | (also <I>American Journal of Human Genetics</I> <B>19:</B> 233-257). |
---|
4213 | <P> |
---|
4214 | Cavender, J. A. and J. Felsenstein. 1987. Invariants of phylogenies in a |
---|
4215 | simple case with discrete states. <I>Journal of Classification</I> <B>4:</B> 57-71. |
---|
4216 | <P> |
---|
4217 | Churchill, G.A. 1989. Stochastic models for heterogeneous DNA sequences. |
---|
4218 | <I>Bulletin of Mathematical Biology</I> <B>51:</B> 79-94. |
---|
4219 | <P> |
---|
4220 | Conn, E. E. and P. K. Stumpf. 1963. <I>Outlines of Biochemistry.</I> John Wiley |
---|
4221 | and Sons, New York. |
---|
4222 | <P> |
---|
4223 | Day, W. H. E. 1983. Computationally difficult parsimony problems in |
---|
4224 | phylogenetic systematics. <I>Journal of Theoretical Biology</I> <B>103:</B> |
---|
4225 | 429-438. |
---|
4226 | <P> |
---|
4227 | Dayhoff, M. O. and R. V. Eck. 1968. <I>Atlas of Protein Sequence |
---|
4228 | and Structure 1967-1968.</I> National Biomedical Research Foundation, |
---|
4229 | Silver Spring, Maryland. |
---|
4230 | <P> |
---|
4231 | Dayhoff, M. O., R. M. Schwartz, and B. C. Orcutt. 1979. A model of |
---|
4232 | evolutionary change in proteins. pp. 345-352 in <I>Atlas of |
---|
4233 | Protein Sequence and Structure, volume 5, supplement 3, 1978,</I> ed. |
---|
4234 | M. O. Dayhoff. National Biomedical Research Foundation, Silver Spring, Maryland. |
---|
4236 | <P> |
---|
4237 | Dayhoff, M. O. 1979. <I>Atlas of Protein Sequence and Structure, Volume 5, |
---|
4238 | Supplement 3, 1978.</I> National Biomedical Research Foundation, Washington, D.C. |
---|
4239 | <P> |
---|
4240 | DeBry, R. W. and N. A. Slade. 1985. Cladistic analysis of restriction |
---|
4241 | endonuclease cleavage maps within a maximum-likelihood framework. |
---|
4242 | <I>Systematic Zoology</I> <B>34:</B> 21-34. |
---|
4243 | <P> |
---|
4244 | Dempster, A. P., N. M. Laird, and D. B. Rubin. 1977. Maximum |
---|
4245 | likelihood from incomplete data via the EM algorithm. <I>Journal of the Royal Statistical Society B</I> <B>39:</B> 1-38. |
---|
4246 | <P> |
---|
4247 | Eck, R. V., and M. O. Dayhoff. 1966. <I>Atlas of Protein Sequence and |
---|
4248 | Structure 1966.</I> National Biomedical Research Foundation, Silver |
---|
4249 | Spring, Maryland. |
---|
4250 | <P> |
---|
4251 | Edwards, A. W. F., and L. L. Cavalli-Sforza. 1964. Reconstruction of |
---|
4252 | evolutionary trees. pp. 67-76 in <I>Phenetic and Phylogenetic |
---|
4253 | Classification,</I> ed. V. H. Heywood and J. McNeill. Systematics |
---|
4254 | Association Volume No. 6. Systematics Association, London. |
---|
4255 | <P> |
---|
4256 | Estabrook, G. F., C. S. Johnson, Jr., and F. R. McMorris. 1976a. A |
---|
4257 | mathematical foundation for the analysis of character |
---|
4258 | compatibility. <I>Mathematical Biosciences</I> <B>23:</B> 181-187. |
---|
4259 | <P> |
---|
4260 | Estabrook, G. F., C. S. Johnson, Jr., and F. R. McMorris. 1976b. An |
---|
4261 | algebraic analysis of cladistic characters. <I>Discrete Mathematics</I> <B>16:</B> 141-147. |
---|
4262 | <P> |
---|
4263 | Estabrook, G. F., F. R. McMorris, and C. A. Meacham. 1985. Comparison of |
---|
4264 | undirected phylogenetic trees based on subtrees of four evolutionary units. |
---|
4265 | <I>Systematic Zoology</I> <B>34:</B> 193-200. |
---|
4266 | <P> |
---|
4267 | Faith, D. P. 1990. Chance marsupial relationships. <I>Nature</I> <B>345:</B> 393-394. |
---|
4268 | <P> |
---|
4269 | Faith, D. P. and P. S. Cranston. 1991. Could a cladogram this short have |
---|
4270 | arisen by chance alone?: On permutation tests for cladistic |
---|
4271 | structure. <I>Cladistics</I> <B>7:</B> 1-28. |
---|
4272 | <P> |
---|
4273 | Farris, J. S. 1977. Phylogenetic analysis under Dollo's Law. <I>Systematic Zoology</I> <B>26:</B> 77-88. |
---|
4274 | <P> |
---|
4275 | Farris, J. S. 1978a. Inferring phylogenetic trees from chromosome |
---|
4276 | inversion data. <I>Systematic Zoology</I> <B>27:</B> 275-284. |
---|
4277 | <P> |
---|
4278 | Farris, J. S. 1981. Distance data in phylogenetic analysis. pp. 3-23 |
---|
4279 | in <I>Advances in Cladistics: Proceedings of the first meeting of the |
---|
4280 | Willi Hennig Society,</I> ed. V. A. Funk and D. R. Brooks. New York |
---|
4281 | Botanical Garden, Bronx, New York. |
---|
4282 | <P> |
---|
4283 | Farris, J. S. 1983. The logical basis of phylogenetic analysis. pp. 1-47 |
---|
4284 | in <I>Advances in Cladistics, Volume 2, Proceedings of the Second Meeting of |
---|
4285 | the Willi Hennig Society.</I> ed. Norman I. Platnick and V. A. Funk. Columbia |
---|
4286 | University Press, New York. |
---|
4287 | <P> |
---|
4288 | Farris, J. S. 1985. Distance data revisited. <I>Cladistics</I> <B>1:</B> 67-85. |
---|
4289 | <P> |
---|
4290 | Farris, J. S. 1986. Distances and statistics. <I>Cladistics</I> <B>2:</B> 144-157. |
---|
4291 | <P> |
---|
4292 | Farris, J. S. ["T. N. Nayenizgani"]. 1990. The systematics association |
---|
4293 | enters its golden years (review of <I>Prospects in Systematics</I>, ed. D. |
---|
4294 | Hawksworth). <I>Cladistics</I> <B>6:</B> 307-314. |
---|
4295 | <P> |
---|
4296 | Felsenstein, J. 1973a. Maximum likelihood and minimum-steps methods |
---|
4297 | for estimating evolutionary trees from data on discrete characters. |
---|
4298 | <I>Systematic Zoology</I> <B>22:</B> 240-249. |
---|
4299 | <P> |
---|
4300 | Felsenstein, J. 1973b. Maximum-likelihood estimation of evolutionary |
---|
4301 | trees from continuous characters. <I>American Journal of Human Genetics</I> <B>25:</B> |
---|
4302 | 471-492. |
---|
4303 | <P> |
---|
4304 | Felsenstein, J. 1978a. The number of evolutionary trees. <I>Systematic Zoology</I> <B>27:</B> 27-33. |
---|
4305 | <P> |
---|
4306 | Felsenstein, J. 1978b. Cases in which parsimony and compatibility |
---|
4307 | methods will be positively misleading. <I>Systematic Zoology</I> <B>27:</B> |
---|
4308 | 401-410. |
---|
4309 | <P> |
---|
4310 | Felsenstein, J. 1979. Alternative methods of phylogenetic inference |
---|
4311 | and their interrelationship. <I>Systematic Zoology</I> <B>28:</B> 49-62. |
---|
4312 | <P> |
---|
4313 | Felsenstein, J. 1981a. Evolutionary trees from DNA sequences: a |
---|
4314 | maximum likelihood approach. <I>Journal of Molecular Evolution</I> <B>17:</B> 368-376. |
---|
4315 | <P> |
---|
4316 | Felsenstein, J. 1981b. A likelihood approach to character weighting |
---|
4317 | and what it tells us about parsimony and compatibility. <I>Biological Journal of the Linnean Society</I> <B>16:</B> 183-196. |
---|
4318 | <P> |
---|
4319 | Felsenstein, J. 1981c. Evolutionary trees from gene frequencies and |
---|
4320 | quantitative characters: finding maximum likelihood estimates. |
---|
4321 | <I>Evolution</I> <B>35:</B> 1229-1242. |
---|
4322 | <P> |
---|
4323 | Felsenstein, J. 1982. Numerical methods for inferring evolutionary |
---|
4324 | trees. <I>Quarterly Review of Biology</I> <B>57:</B> 379-404. |
---|
4325 | <P> |
---|
4326 | Felsenstein, J. 1983b. Parsimony in systematics: biological and |
---|
4327 | statistical issues. <I>Annual Review of Ecology and Systematics</I> <B>14:</B> 313-333. |
---|
4328 | <P> |
---|
4329 | Felsenstein, J. 1984a. Distance methods for inferring phylogenies: a |
---|
4330 | justification. <I>Evolution</I> <B>38:</B> 16-24. |
---|
4331 | <P> |
---|
4332 | Felsenstein, J. 1984b. The statistical approach to inferring |
---|
4333 | evolutionary trees and what it tells us about parsimony and |
---|
4334 | compatibility. pp. 169-191 in: <I>Cladistics: Perspectives in the |
---|
4335 | Reconstruction of Evolutionary History,</I> edited by T. Duncan and T. F. |
---|
4336 | Stuessy. Columbia University Press, New York. |
---|
4337 | <P> |
---|
4338 | Felsenstein, J. 1985a. Confidence limits on phylogenies with a molecular |
---|
4339 | clock. <I>Systematic Zoology</I> <B>34:</B> 152-161. |
---|
4340 | <P> |
---|
4341 | Felsenstein, J. 1985b. Confidence limits on phylogenies: an approach |
---|
4342 | using the bootstrap. <I>Evolution</I> <B>39:</B> 783-791. |
---|
4343 | <P> |
---|
4344 | Felsenstein, J. 1985c. Phylogenies from gene frequencies: a statistical |
---|
4345 | problem. <I>Systematic Zoology</I> <B>34:</B> 300-311. |
---|
4346 | <P> |
---|
4347 | Felsenstein, J. 1985d. Phylogenies and the comparative method. <I>American Naturalist</I> <B>125:</B> 1-12. |
---|
4348 | <P> |
---|
4349 | Felsenstein, J. 1986. Distance methods: a reply to Farris. <I>Cladistics</I> <B>2:</B> |
---|
4350 | 130-144. |
---|
4351 | <P> |
---|
4352 | Felsenstein, J. and E. Sober. 1986. Parsimony and likelihood: an |
---|
4353 | exchange. <I>Systematic Zoology</I> <B>35:</B> 617-626. |
---|
4354 | <P> |
---|
4355 | Felsenstein, J. 1988a. Phylogenies and quantitative characters. <I>Annual Review of Ecology and Systematics</I> <B>19:</B> 445-471. |
---|
4356 | <P> |
---|
4357 | Felsenstein, J. 1988b. Phylogenies from molecular sequences: inference and |
---|
4358 | reliability. <I>Annual Review of Genetics</I> <B>22:</B> 521-565. |
---|
4359 | <P> |
---|
4360 | Felsenstein, J. 1992. Phylogenies from restriction sites, a |
---|
4361 | maximum likelihood approach. <I>Evolution</I> <B>46:</B> 159-173. |
---|
4362 | <P> |
---|
4363 | Felsenstein, J. and G. A. Churchill. 1996. |
---|
4364 | A hidden Markov model approach to variation among sites in rate of evolution. |
---|
4365 | <I>Molecular Biology and Evolution</I> <B>13:</B> 93-104. |
---|
4366 | <P> |
---|
4367 | Fink, W. L. 1986. Microcomputers and phylogenetic analysis. <I>Science</I> <B>234:</B> 1135-1139. |
---|
4368 | <P> |
---|
4369 | Fitch, W. M., and E. Markowitz. 1970. An improved method for determining |
---|
4370 | codon variability in a gene and its application to the rate of fixation of |
---|
4371 | mutations in evolution. <I>Biochemical Genetics</I> <B>4:</B> 579-593. |
---|
4372 | <P> |
---|
4373 | Fitch, W. M., and E. Margoliash. 1967. Construction of phylogenetic |
---|
4374 | trees. <I>Science</I> <B>155:</B> 279-284. |
---|
4375 | <P> |
---|
4376 | Fitch, W. M. 1971. Toward defining the course of evolution: minimum |
---|
4377 | change for a specified tree topology. <I>Systematic Zoology</I> <B>20:</B> 406-416. |
---|
4378 | <P> |
---|
4379 | Fitch, W. M. 1975. Toward finding the tree of maximum parsimony. pp. 189-230 |
---|
4380 | in Proceedings of the Eighth International Conference on Numerical Taxonomy, |
---|
4381 | ed. G. F. Estabrook. W. H. Freeman, San Francisco. |
---|
4382 | <P> |
---|
4387 | George, D. G., L. T. Hunt, and W. C. Barker. 1988. Current methods in |
---|
4388 | sequence comparison and analysis. pp. 127-149 in Macromolecular Sequencing |
---|
4389 | and Synthesis, ed. D. H. Schlesinger. Alan R. Liss, New York. |
---|
4390 | <P> |
---|
4391 | Gomberg, D. 1966. "Bayesian" post-diction in an evolution process. |
---|
4392 | Unpublished manuscript, University of Pavia, Italy. |
---|
4393 | <P> |
---|
4394 | Graham, R. L., and L. R. Foulds. 1982. Unlikelihood that minimal |
---|
4395 | phylogenies for a realistic biological study can be constructed in |
---|
4396 | reasonable computational time. <I>Mathematical Biosciences</I> <B>60:</B> 133-142. |
---|
4397 | <P> |
---|
4398 | Hasegawa, M. and T. Yano. 1984a. Maximum likelihood method of phylogenetic |
---|
4399 | inference from DNA sequence data. <I>Bulletin of the Biometric Society of Japan</I> No. 5: 1-7. |
---|
4400 | <P> |
---|
4401 | Hasegawa, M. and T. Yano. 1984b. Phylogeny and classification of |
---|
4402 | Hominoidea as inferred from DNA sequence data. <I>Proceedings of the Japan Academy</I> <B>60 B:</B> 389-392. |
---|
4403 | <P> |
---|
4404 | Hasegawa, M., Y. Iida, T. Yano, F. Takaiwa, and M. Iwabuchi. 1985a. |
---|
4405 | Phylogenetic relationships among eukaryotic kingdoms as inferred from |
---|
4406 | ribosomal RNA sequences. <I>Journal of Molecular Evolution</I> <B>22:</B> 32-38. |
---|
4407 | <P> |
---|
4408 | Hasegawa, M., H. Kishino, and T. Yano. 1985b. Dating of the human-ape |
---|
4409 | splitting by a molecular clock of mitochondrial DNA. <I>Journal of Molecular |
---|
4410 | Evolution</I> <B>22:</B> 160-174. |
---|
4411 | <P> |
---|
4412 | Hendy, M. D., and D. Penny. 1982. Branch and bound algorithms to |
---|
4413 | determine minimal evolutionary trees. <I>Mathematical Biosciences</I> <B>59:</B> 277-290. |
---|
4414 | <P> |
---|
4415 | Higgins, D. G. and P. M. Sharp. 1989. Fast and sensitive |
---|
4416 | multiple sequence alignments on a microcomputer. <I>Computer Applications in the Biosciences (CABIOS)</I> <B>5:</B> 151-153. |
---|
4417 | <P> |
---|
4418 | Hochbaum, D. S. and A. Pathria. 1997. Path costs in evolutionary |
---|
4419 | tree reconstruction. <I>Journal of Computational Biology</I> <B>4:</B> 163-175. |
---|
4420 | <P> |
---|
4421 | Holmquist, R., M. M. Miyamoto, and M. Goodman. 1988. Higher-primate |
---|
4422 | phylogeny - why can't we decide? <I>Molecular Biology and Evolution</I> <B>5:</B> 201-216. |
---|
4423 | <P> |
---|
4424 | Inger, R. F. 1967. The development of a phylogeny of frogs. |
---|
4425 | <I>Evolution</I> <B>21:</B> 369-384. |
---|
4426 | <P> |
---|
4427 | Jin, L. and M. Nei. 1990. Limitations of the evolutionary parsimony method |
---|
4428 | of phylogenetic analysis. <I>Molecular Biology and Evolution</I> <B>7:</B> 82-102. |
---|
4429 | <P> |
---|
4430 | Jones, D. T., W. R. Taylor and J. M. Thornton. 1992. The rapid generation of |
---|
4431 | mutation data matrices from protein sequences. <I>Computer Applications |
---|
4432 | in the Biosciences (CABIOS)</I> <B>8:</B> 275-282. |
---|
4433 | <P> |
---|
4434 | Jukes, T. H. and C. R. Cantor. 1969. Evolution of protein molecules. pp. |
---|
4435 | 21-132 in Mammalian Protein Metabolism, ed. H. N. Munro. Academic Press, New |
---|
4436 | York. |
---|
4437 | <P> |
---|
4438 | Kidd, K. K. and L. A. Sgaramella-Zonta. 1971. Phylogenetic analysis: concepts |
---|
4439 | and methods. <I>American Journal of Human Genetics</I> <B>23:</B> 235-252. |
---|
4440 | <P> |
---|
4441 | Kim, J. and M. A. Burgman. 1988. Accuracy of phylogenetic-estimation |
---|
4442 | methods using simulated allele-frequency data. <I>Evolution</I> <B>42:</B> 596-602. |
---|
4443 | <P> |
---|
4444 | Kimura, M. 1980. A simple model for estimating evolutionary rates of base |
---|
4445 | substitutions through comparative studies of nucleotide sequences. <I>Journal of Molecular Evolution</I> <B>16:</B> 111-120. |
---|
4446 | <P> |
---|
4447 | Kimura, M. 1983. The Neutral Theory of Molecular Evolution. Cambridge |
---|
4448 | University Press, Cambridge. |
---|
4449 | <P> |
---|
4450 | Kingman, J. F. C. 1982a. The coalescent. <I>Stochastic Processes and Their Applications</I> <B>13:</B> 235-248. |
---|
4451 | <P> |
---|
4452 | Kingman, J. F. C. 1982b. On the genealogy of large populations. <I>Journal of Applied Probability</I> <B>19A:</B> 27-43. |
---|
4453 | <P> |
---|
4454 | Kishino, H. and M. Hasegawa. 1989. Evaluation of the maximum likelihood |
---|
4455 | estimate of the evolutionary tree topologies from DNA sequence data, and the |
---|
4456 | branching order in Hominoidea. <I>Journal of Molecular Evolution</I> <B>29:</B> 170-179. |
---|
4457 | <P> |
---|
4458 | Kluge, A. G., and J. S. Farris. 1969. Quantitative phyletics and the |
---|
4459 | evolution of anurans. <I>Systematic Zoology</I> <B>18:</B> 1-32. |
---|
4460 | <P> |
---|
4461 | Kuhner, M. K. and J. Felsenstein. 1994. A simulation comparison of |
---|
4462 | phylogeny algorithms under equal and unequal evolutionary rates. |
---|
4463 | <I>Molecular Biology and Evolution</I> <B>11:</B> 459-468 (Erratum <B>12:</B> 525 1995). |
---|
4464 | <P> |
---|
4465 | Künsch, H. R. 1989. The jackknife and the bootstrap for general stationary |
---|
4466 | observations. <I>Annals of Statistics</I> <B>17:</B> 1217-1241. |
---|
4467 | <P> |
---|
4468 | Lake, J. A. 1987. A rate-independent technique for analysis of nucleic acid |
---|
4469 | sequences: evolutionary parsimony. <I>Molecular Biology and Evolution</I> <B>4:</B> 167-191. |
---|
4470 | <P> |
---|
4471 | Lake, J. A. 1994. Reconstructing evolutionary trees from DNA and protein |
---|
4472 | sequences: paralinear distances. |
---|
4473 | <I>Proceedings of the National Academy of Sciences, USA</I> <B>91:</B> 1455-1459. |
---|
4474 | <P> |
---|
4475 | Le Quesne, W. J. 1969. A method of selection of characters in |
---|
4476 | numerical taxonomy. <I>Systematic Zoology</I> <B>18:</B> 201-205. |
---|
4477 | <P> |
---|
4478 | Le Quesne, W. J. 1974. The uniquely evolved character concept and its |
---|
4479 | cladistic application. <I>Systematic Zoology</I> <B>23:</B> 513-517. |
---|
4480 | <P> |
---|
4481 | Lewis, H. R., and C. H. Papadimitriou. 1978. The efficiency of |
---|
4482 | algorithms. <I>Scientific American</I> <B>238:</B> 96-109 (January issue). |
---|
4483 | <P> |
---|
4484 | Lockhart, P. J., M. A. Steel, M. D. Hendy, and D. Penny. 1994. |
---|
4485 | Recovering evolutionary trees under a more realistic model of sequence |
---|
4486 | evolution. <I>Molecular Biology and Evolution</I> <B>11:</B> 605-612. |
---|
4487 | <P> |
---|
4488 | López-Martínez, N.; Álvarez-Sierra, |
---|
4489 | M. A. & García Moreno, E. 1986. Paleontología y |
---|
4490 | Bioestratigrafía |
---|
4491 | (Micromamíferos) del Mioceno medio-superior del Sector Central de |
---|
4492 | la Cuenca del Duero. <I>Stvdia Geologica Salmanticensia</I> |
---|
4493 | <B>22:</B> 146-191. |
---|
4494 | <P> |
---|
4495 | Luckow, M. and D. Pimentel. 1985. An empirical comparison of |
---|
4496 | numerical Wagner computer programs. <I>Cladistics</I> <B>1:</B> 47-66. |
---|
4497 | <P> |
---|
4498 | Lynch, M. 1991. Methods for the analysis of comparative data in evolutionary |
---|
4499 | biology. <I>Evolution</I> <B>45:</B> 1065-1080. |
---|
4500 | <P> |
---|
4501 | Maddison, D. R. 1991. The discovery and importance of multiple islands of |
---|
4502 | most-parsimonious trees. <I>Systematic Zoology</I> <B>40:</B> 315-328. |
---|
4503 | <P> |
---|
4504 | Margush, T. and F. R. McMorris. 1981. Consensus n-trees. <I>Bulletin of Mathematical Biology</I> <B>43:</B> 239-244. |
---|
4505 | <P> |
---|
4506 | Nelson, G. 1979. Cladistic analysis and synthesis: principles and definitions, |
---|
4507 | with a historical note on Adanson's <I>Familles des Plantes</I> |
---|
4508 | (1763-1764). <I>Systematic Zoology</I> <B>28:</B> 1-21. |
---|
4509 | <P> |
---|
4510 | Nei, M. 1972. Genetic distance between populations. <I>American Naturalist</I> <B>106:</B> 283-292. |
---|
4511 | <P> |
---|
4512 | Nei, M. and W.-H. Li. 1979. Mathematical model for studying genetic variation |
---|
4513 | in terms of restriction endonucleases. <I>Proceedings of the National Academy of Sciences, USA</I> <B>76:</B> 5269-5273. |
---|
4514 | <P> |
---|
4515 | Page, R. D. M. 1989. Comments on component-compatibility in historical |
---|
4516 | biogeography. <I>Cladistics</I> <B>5:</B> 167-182. |
---|
4517 | <P> |
---|
4518 | Penny, D. and M. D. Hendy. 1985. Testing methods of evolutionary tree |
---|
4519 | construction. <I>Cladistics</I> <B>1:</B> 266-278. |
---|
4520 | <P> |
---|
4521 | Platnick, N. 1987. An empirical comparison of microcomputer parsimony |
---|
4522 | programs. <I>Cladistics</I> <B>3:</B> 121-144. |
---|
4523 | <P> |
---|
4524 | Platnick, N. 1989. An empirical comparison of microcomputer parsimony |
---|
4525 | programs. II. <I>Cladistics</I> <B>5:</B> 145-161. |
---|
4526 | <P> |
---|
4527 | Reynolds, J. B., B. S. Weir, and C. C. Cockerham. 1983. Estimation of the |
---|
4528 | coancestry coefficient: basis for a short-term genetic |
---|
4529 | distance. <I>Genetics</I> <B>105:</B> 767-779. |
---|
4530 | <P> |
---|
4531 | Robinson, D. F. and L. R. Foulds. 1981. Comparison of phylogenetic trees. |
---|
4532 | <I>Mathematical Biosciences</I> <B>53:</B> 131-147. |
---|
4533 | <P> |
---|
4534 | Rohlf, F. J. and M. C. Wooten. 1988. Evaluation of the restricted maximum |
---|
4535 | likelihood method for estimating phylogenetic trees using simulated allele- |
---|
4536 | frequency data. <I>Evolution</I> <B>42:</B> 581-595. |
---|
4537 | <P> |
---|
4538 | Rzhetsky, A., and M. Nei. 1992. Statistical properties of the ordinary |
---|
4539 | least-squares, generalized least-squares, and minimum-evolution methods |
---|
4540 | of phylogenetic inference. <I>Journal of Molecular Evolution</I> <B>35:</B> |
---|
4541 | 367-375. |
---|
4542 | <P> |
---|
4543 | Saitou, N. and M. Nei. 1987. The neighbor-joining method: a new method for |
---|
4544 | reconstructing phylogenetic trees. <I>Molecular Biology and Evolution</I> <B>4:</B> 406-425. |
---|
4545 | <P> |
---|
4546 | Sanderson, M. J. 1990. Flexible phylogeny reconstruction: a review of |
---|
4547 | phylogenetic inference packages using parsimony. <I>Systematic Zoology</I> <B>39:</B> 414-420. |
---|
4548 | <P> |
---|
4549 | Sankoff, D. D., C. Morel, and R. J. Cedergren. 1973. Evolution of 5S RNA and |
---|
4550 | the nonrandomness of base replacement. <I>Nature New Biology</I> <B>245:</B> 232-234. |
---|
4551 | <P> |
---|
4552 | Shimodaira, H. and M. Hasegawa. 1999. Multiple comparisons of log-likelihoods |
---|
4553 | with applications to phylogenetic inference. <EM>Molecular Biology and |
---|
4554 | Evolution</EM> <B>16:</B> 1114-1116. |
---|
4555 | <P> |
---|
4556 | Sokal, R. R. and P. H. A. Sneath. 1963. <I>Principles of Numerical Taxonomy.</I> |
---|
4557 | W. H. Freeman, San Francisco. |
---|
4558 | <P> |
---|
4559 | Smouse, P. E. and W.-H. Li. 1987. Likelihood analysis of mitochondrial |
---|
4560 | restriction-cleavage patterns for the human-chimpanzee-gorilla trichotomy. |
---|
4561 | <I>Evolution</I> <B>41:</B> 1162-1176. |
---|
4562 | <P> |
---|
4563 | Sober, E. 1983a. Parsimony in systematics: philosophical issues. <I>Annual Review of Ecology and Systematics</I> <B>14:</B> 335-357. |
---|
4564 | <P> |
---|
4565 | Sober, E. 1983b. A likelihood justification of parsimony. <I>Cladistics</I> <B>1:</B> 209-233. |
---|
4566 | <P> |
---|
4567 | Sober, E. 1988. <I>Reconstructing the Past: Parsimony, Evolution, |
---|
4568 | and Inference.</I> MIT Press, Cambridge, Massachusetts. |
---|
4569 | <P> |
---|
4573 | Steel, M. A. 1994. Recovering a tree from the Markov leaf colourations |
---|
4574 | it generates under a Markov model. <I>Applied Mathematics Letters</I> |
---|
4575 | <B>7:</B> 19-23. |
---|
4576 | <P> |
---|
4577 | Studier, J. A. and K. J. Keppler. 1988. A note on the neighbor-joining |
---|
4578 | algorithm of Saitou and Nei. <I>Molecular Biology and Evolution</I> <B>5:</B> 729-731. |
---|
4579 | <P> |
---|
4580 | Swofford, D. L. and G. J. Olsen. 1990. Phylogeny reconstruction. Chapter |
---|
4581 | 11, pages 411-501 in <I>Molecular Systematics,</I> ed. D. M. Hillis and C. Moritz. |
---|
4582 | Sinauer Associates, Sunderland, Massachusetts. |
---|
4583 | <P> |
---|
4584 | Swofford, D. L., G. J. Olsen, P. J. Waddell, and D. M. Hillis. 1996. |
---|
4585 | Phylogenetic inference. pp. 407-514 in <I>Molecular Systematics</I>, 2nd ed., |
---|
4586 | ed. D. M. Hillis, C. Moritz, and B. K. Mable. Sinauer Associates, Sunderland, |
---|
4587 | Massachusetts. |
---|
4588 | <P> |
---|
4589 | Templeton, A. R. 1983. Phylogenetic inference from restriction endonuclease |
---|
4590 | cleavage site maps with particular reference to the evolution of humans and the |
---|
4591 | apes. <I>Evolution</I> <B>37:</B> 221-244. |
---|
4592 | <P> |
---|
4593 | Thompson, E. A. 1975. <I>Human Evolutionary Trees.</I> Cambridge University |
---|
4594 | Press, Cambridge. |
---|
4595 | <P> |
---|
4596 | Wu, C. F. J. 1986. Jackknife, bootstrap and other resampling plans in |
---|
4597 | regression analysis. <I>Annals of Statistics</I> <B>14:</B> 1261-1295. |
---|
4598 | <P> |
---|
4599 | Yang, Z. 1993. Maximum-likelihood estimation of phylogeny from DNA sequences |
---|
4600 | when substitution rates differ over sites. <I>Molecular Biology and |
---|
4601 | Evolution</I> <B>10:</B> 1396-1401. |
---|
4602 | <P> |
---|
4603 | Yang, Z. 1994. Maximum likelihood phylogenetic estimation from DNA sequences |
---|
4604 | with variable rates over sites: approximate methods. <I>Journal of Molecular |
---|
4605 | Evolution</I> <B>39:</B> 306-314. |
---|
4606 | <P> |
---|
4607 | Yang, Z. 1995. A space-time process model for the evolution of DNA sequences. |
---|
4608 | <I>Genetics</I> <B>139:</B> 993-1005. |
---|
4609 | <P> |
---|
4610 | <DIV ALIGN="CENTER"> |
---|
4611 | <H2>Credits</H2></DIV> |
---|
4612 | <P> |
---|
4613 | Over the years various granting agencies have contributed to the |
---|
4614 | support of the PHYLIP project (at first without knowing it). They are: |
---|
4615 | <P> |
---|
4616 | <TABLE CELLPADDING=3 BORDER="1"> |
---|
4617 | <TR><TD ALIGN="LEFT">Years</TD> |
---|
4618 | <TD ALIGN="LEFT">Agency</TD> |
---|
4619 | <TD ALIGN="LEFT">Grant or Contract Number</TD> |
---|
4620 | </TR> |
---|
4621 | <TR><TD ALIGN="LEFT">1999-2002</TD> |
---|
4622 | <TD ALIGN="LEFT">NSF</TD> |
---|
4623 | <TD ALIGN="LEFT">BIR-9527687</TD> |
---|
4624 | </TR> |
---|
4625 | <TR><TD ALIGN="LEFT">1999-2002</TD> |
---|
4626 | <TD ALIGN="LEFT">NIH NIGMS</TD> |
---|
4627 | <TD ALIGN="LEFT">R01 GM51929-04</TD> |
---|
4628 | </TR> |
---|
4629 | <TR><TD ALIGN="LEFT">1999-2001</TD> |
---|
4630 | <TD ALIGN="LEFT">NIH NIMH</TD> |
---|
4631 | <TD ALIGN="LEFT">R01 HG01989-01</TD> |
---|
4632 | </TR> |
---|
4633 | <TR><TD ALIGN="LEFT">1995-1999</TD> |
---|
4634 | <TD ALIGN="LEFT">NIH NIGMS</TD> |
---|
4635 | <TD ALIGN="LEFT">R01 GM51929-01</TD> |
---|
4636 | </TR> |
---|
4637 | <TR><TD ALIGN="LEFT">1992-1995 </TD> |
---|
4638 | <TD ALIGN="LEFT">National Science Foundation</TD> |
---|
4639 | <TD ALIGN="LEFT">DEB-9207558</TD> |
---|
4640 | </TR> |
---|
4641 | <TR><TD ALIGN="LEFT">1992-1994</TD> |
---|
4642 | <TD ALIGN="LEFT">NIH NIGMS Shannon Award</TD> |
---|
4643 | <TD ALIGN="LEFT">2 R55 GM41716-04</TD> |
---|
4644 | </TR> |
---|
4645 | <TR><TD ALIGN="LEFT"> |
---|
4646 | 1989-1992</TD> |
---|
4647 | <TD ALIGN="LEFT">NIH NIGMS</TD> |
---|
4648 | <TD ALIGN="LEFT">1 R01-GM41716-01</TD> |
---|
4649 | </TR> |
---|
4650 | <TR><TD ALIGN="LEFT"> |
---|
4651 | 1990-1992</TD> |
---|
4652 | <TD ALIGN="LEFT">National Science Foundation</TD> |
---|
4653 | <TD ALIGN="LEFT">BSR-8918333</TD> |
---|
4654 | </TR> |
---|
4655 | <TR><TD ALIGN="LEFT"> |
---|
4656 | 1987-1990</TD> |
---|
4657 | <TD ALIGN="LEFT">National Science Foundation</TD> |
---|
4658 | <TD ALIGN="LEFT">BSR-8614807</TD> |
---|
4659 | </TR> |
---|
4660 | <TR><TD ALIGN="LEFT">1979-1987</TD> |
---|
4661 | <TD ALIGN="LEFT">U.S. Department of Energy</TD> |
---|
4662 | <TD ALIGN="LEFT">DE-AM06-76RLO2225 TA DE-AT06-76EV71005</TD> |
---|
4663 | </TR> |
---|
4664 | </TABLE> |
---|
4665 | <P> |
---|
4666 | I am particularly grateful to program administrators William Moore, |
---|
4667 | Irene Eckstrand, Peter Arzberger, and Conrad Istock, who have |
---|
4668 | gone beyond the call of duty to make sure that PHYLIP continued. |
---|
4669 | <P> |
---|
4670 | Booby prizes for funding are awarded to: |
---|
4671 | <UL><LI>The people at the U.S. Department of Energy who, in 1987, decided they |
---|
4672 | were "not interested in phylogenies", |
---|
4673 | <LI>The members of the Systematics Panel of NSF who twice (in 1989 and 1992) |
---|
4674 | positively recommended that my applications <I>not</I> be funded. I am very |
---|
4675 | grateful to program director William Moore for courageously overruling |
---|
4676 | their decision the first time. The 1992 NSF Systematics Panel could claim |
---|
4677 | no credit for PHYLIP whatsoever. |
---|
4678 | <LI>The members of the 1992 Genetics Study Section of NIH who rated my |
---|
4679 | proposal in the 53rd percentile (I don't know if that's 53rd from |
---|
4680 | the top or the bottom, but does it matter?), thus denying it funding. I am, |
---|
4681 | however, grateful to the NIGMS administrators, especially Irene Eckstrand, |
---|
4682 | who supported giving me |
---|
4683 | a "Shannon award" partially funding my work for a period in spite of this |
---|
4684 | rating. |
---|
4685 | </UL> |
---|
4686 | <P> |
---|
4687 | The original Camin-Sokal parsimony program and the polymorphism parsimony |
---|
4688 | program were written by me in 1977 and 1978. They were Pascal versions of |
---|
4689 | earlier FORTRAN programs I wrote in 1966 and 1967 using the same algorithm to |
---|
4690 | infer phylogenies under the Camin-Sokal and polymorphism parsimony |
---|
4691 | criteria. Harvey Motulsky worked for me as a programmer in 1971 and wrote |
---|
4692 | FORTRAN programs to carry out the Camin-Sokal, Dollo, and polymorphism |
---|
4693 | methods (he is known these days as the author of the scientific |
---|
4694 | graphing package GraphPad). But most of the early work on PHYLIP other than my own was by Jerry |
---|
4695 | Shurman and Mark Moehring. Jerry Shurman worked for me in the summers of |
---|
4696 | 1979 and 1980, and Mark Moehring worked for me in the summers of 1980 and |
---|
4697 | 1981. Both wrote original versions of many of the other programs, based on |
---|
4698 | the original versions of my Camin-Sokal parsimony program and POLYM. These |
---|
4699 | formed the basis of Version 1 of the Package, first distributed in October, |
---|
4700 | 1980. |
---|
4701 | <P> |
---|
4702 | Version 2, released in the spring of 1982, involved a fairly complete rewrite |
---|
4703 | by me of many of those programs. For version 3.3, Hisashi Horino |
---|
4704 | reworked some parts of the programs CLIQUE and CONSENSE |
---|
4705 | to make their output more comprehensible, and added some code to the |
---|
4706 | tree-drawing programs DRAWGRAM and DRAWTREE as well. He also worked on |
---|
4707 | some of the DRAWTREE and DRAWGRAM driver code. |
---|
4708 | <P> |
---|
4709 | My more recent part-time programmers Akiko Fuseki, Sean Lamont, |
---|
4710 | Andrew Keeffe, Daniel Yek, Dan Fineman, Patrick Colacurcio, |
---|
4711 | Mike Palczewski, and Doug Buxton gave |
---|
4712 | me substantial help with the current release, and their excellent work is |
---|
4713 | greatly appreciated. Akiko in particular did much of the hard work of adding |
---|
4714 | new features and changing old ones in the 3.4 and 3.5 releases, |
---|
4715 | centralized many of the C routines in support files, and is responsible for the |
---|
4716 | new versions of DNAPARS and PARS. Andrew |
---|
4717 | prepared the Macintosh version, wrote RETREE, added the ray-tracing |
---|
4718 | and PICT code to the DRAW programs and has since done much other work. Sean |
---|
4719 | was central to the conversion to |
---|
4720 | C, and tested it extensively. My postdoctoral fellow |
---|
4721 | Mary Kuhner and her associate Jon Yamato created NEIGHBOR, the |
---|
4722 | neighbor-joining and UPGMA program, for the current release, for which I am |
---|
4723 | also grateful (Naruya Saitou and Li Jin kindly encouraged us to use some of the |
---|
4724 | code from their own implementation of this method). |
---|
4725 | <P> |
---|
4726 | I am very grateful to over 200 |
---|
4727 | users for algorithmic suggestions, complaints about features (or lack of |
---|
4728 | features), and information about the behavior of their operating systems |
---|
4729 | and compilers. A list of some of their names can be found on the credits page |
---|
4730 | of the PHYLIP web site. |
---|
4731 | <P> |
---|
4732 | A major contribution to this package has been made by others |
---|
4733 | writing programs or parts of programs. Chris Meacham contributed the |
---|
4734 | important program FACTOR, long demanded by users, and the even more |
---|
4735 | important ones PLOTREE and PLOTGRAM. Important parts of the code in |
---|
4736 | DRAWGRAM and DRAWTREE were taken over from those two programs. |
---|
4737 | Kent Fiala wrote |
---|
4738 | function "reroot" to do outgroup-rooting, which was an essential part of many |
---|
4739 | programs in earlier versions. Someone at the Western Australia Institute of |
---|
4740 | Technology suggested the name PHYLIP (by writing it on the label on the |
---|
4741 | outside of a magnetic tape), but they all seem to deny having done |
---|
4742 | so (and I've lost the relevant letter). |
---|
4743 | <P> |
---|
4744 | The distribution of the package also owes much to Buz Wilson and Willem Ellis, |
---|
4745 | who put a lot of effort into the early distributions of the PCDOS and |
---|
4746 | Macintosh versions respectively. For three versions, Christopher Meacham and Tom Duncan |
---|
4747 | distributed a printed version of these documentation files (they are no |
---|
4748 | longer able to do so), and I am |
---|
4749 | very grateful to them for those efforts. William H.E. Day and F. James Rohlf |
---|
4750 | have been very helpful in setting up the listserver news bulletin service which |
---|
4751 | succeeded the PHYLIP newsletter for a time. |
---|
4752 | <P> |
---|
4753 | I also wish to thank the people who have made computer resources available to |
---|
4754 | me, mostly through the loan of microcomputers. These include Jeremy |
---|
4755 | Field, Clem Furlong, Rick Garber, Dan Jacobson, Rochelle Kochin, Monty Slatkin, |
---|
4756 | Jim Archie, Jim Thomas, and George Gilchrist. |
---|
4757 | <P> |
---|
4758 | I should also note the computers used to develop this package. |
---|
4759 | These include a CDC 6400, two DECSystem 1090s, my trusty old SOL-20, my |
---|
4760 | old Osborne-1, a VAX 11/780, a VAX 8600, a MicroVAX I, a DECstation |
---|
4761 | 3100, my old Toshiba 1100+, my |
---|
4762 | DECstation 5000/200, a DECstation 5000/125, a Compudyne 486DX/33, a |
---|
4763 | Trinity Genesis 386SX, a Zenith Z386, a Mac Classic, a DEC Alphastation 400 |
---|
4764 | 4/233, a Pentium 120, a Pentium 200, a PowerMac 6100, and a Macintosh G3. |
---|
4765 | (One of the reasons |
---|
4766 | we have been successful in achieving compatibility between different computer |
---|
4767 | systems is that I have had to run them myself under so many different operating |
---|
4768 | systems and compilers). |
---|
4769 | <P> |
---|
4770 | <A NAME="otherprograms"><HR><P></A> |
---|
4771 | <DIV ALIGN="CENTER"> |
---|
4772 | <H2>Other Phylogeny Programs Available Elsewhere</H2></DIV> |
---|
4773 | <P> |
---|
4774 | A comprehensive list of phylogeny programs is maintained at the PHYLIP |
---|
4775 | web site on the Phylogeny Programs pages: |
---|
4776 | <P> |
---|
4777 | <DIV ALIGN="CENTER"> |
---|
4778 | <FONT SIZE=+2><A HREF="http://evolution.gs.washington.edu/phylip/software.html"> |
---|
4779 | <TT>http://evolution.gs.washington.edu/phylip/software.html</TT></A></FONT></DIV> |
---|
4780 | <P> |
---|
4781 | Here we will simply mention some of the major general-purpose programs. For |
---|
4782 | many more and much more, see those web pages. |
---|
4783 | <P> |
---|
4784 | <B>PAUP*</B> A comprehensive program with parsimony, likelihood, and |
---|
4785 | distance matrix methods. It competes with PHYLIP to be responsible for |
---|
4786 | the most trees published. Written by David Swofford and distributed by |
---|
4787 | Sinauer Associates of Sunderland, Massachusetts. |
---|
4788 | It is described in web pages for |
---|
4789 | <A HREF="http://www.sinauer.com/detail.php?id=8060">the Macintosh version,</A> |
---|
4790 | <A HREF="http://www.sinauer.com/detail.php?id=8079">the Windows version,</A> |
---|
4791 | and |
---|
4792 | <A HREF="http://www.sinauer.com/detail.php?id=8044">the Unix/OpenVMS version.</A> |
---|
4793 | Current prices are $100 for the Macintosh version, $85 for the |
---|
4794 | Windows version, and $150 for Unix versions for many kinds of workstations. |
---|
4795 | <P> |
---|
4796 | <B>MacClade</B> An interactive Macintosh and PowerMac program to |
---|
4797 | rearrange trees and watch the changes in the fit of the trees to |
---|
4798 | data as judged by parsimony. MacClade has a great many features including |
---|
4799 | a spreadsheet data editor and many different descriptive statistics |
---|
4800 | for different kinds of data. It is particularly designed to export and |
---|
4801 | import data to and from PAUP*. |
---|
4802 | MacClade is available for $100 from Sinauer Associates, of Sunderland, |
---|
4803 | Massachusetts. It is described in a web page at |
---|
4804 | <A HREF="http://www.sinauer.com/detail.php?id=4707"> |
---|
4805 | <TT>http://www.sinauer.com/detail.php?id=4707</TT></A>. |
---|
4806 | MacClade is also described on its <A HREF="http://phylogeny.arizona.edu/macclade/macclade.html"> |
---|
4807 | Web page</A>, at <CODE>http://phylogeny.arizona.edu/macclade/macclade.html</CODE |
---|
4808 | >. |
---|
4809 | <P> |
---|
4810 | <B>MEGA</B> A Windows and DOS program by Sudhir Kumar of Arizona State University |
---|
4811 | (written together with Koichiro Tamura and Masatoshi Nei while he was a |
---|
4812 | student in Nei's lab at Pennsylvania |
---|
4813 | State University). It can carry out parsimony and distance matrix methods |
---|
4814 | for DNA sequence data. Version 2.1 for Windows |
---|
4815 | can be downloaded from <A HREF="http://www.megasoftware.net"> |
---|
4816 | the MEGA web site</A> |
---|
4817 | at <TT>http://www.megasoftware.net</TT>. |
---|
4818 | <P> |
---|
4819 | <B>PAML</B> Ziheng Yang of the Department of Genetics and Biometry at |
---|
4820 | University College, London has written this package of programs to |
---|
4821 | carry out likelihood analysis of DNA and protein sequence data. PAML is |
---|
4822 | particularly strong in the options for coping with variability of rates |
---|
4823 | of evolution from site to site, though it is less able than some other |
---|
4824 | packages to search effectively for the best tree. It is available as |
---|
4825 | C source code and as PowerMac and Windows executables from its web site at |
---|
4826 | <A HREF="http://abacus.gene.ucl.ac.uk/software/paml.html"> |
---|
4827 | <TT>http://abacus.gene.ucl.ac.uk/software/paml.html</TT></A>. |
---|
4828 | <P> |
---|
4829 | <B>TREE-PUZZLE</B> This package by Korbinian Strimmer and Arndt von Haeseler |
---|
4830 | was begun when they were at the Universität München in Germany. |
---|
4831 | TREE-PUZZLE can carry out likelihood |
---|
4832 | methods for DNA and protein data, searching by the strategy of |
---|
4833 | "quartet puzzling" which they invented. It can also compute distances. |
---|
4834 | It superimposes trees estimated |
---|
4835 | from many quartets of species. TREE-PUZZLE is available for Unix, Macintoshes, |
---|
4836 | or Windows from their web site at |
---|
4837 | <A HREF="http://www.tree-puzzle.de/"><TT>http://www.tree-puzzle.de/</TT></A>. |
---|
4838 | <P> |
---|
4839 | <B>DAMBE</B> A package written by Xuhua Xia, then of the |
---|
4840 | Department of |
---|
4841 | Ecology and Biodiversity of the University of Hong Kong. |
---|
4842 | Its initials stand for Data Analysis in Molecular Biology and Evolution. |
---|
4843 | DAMBE is a general-purpose package for DNA and protein sequence phylogenies. |
---|
4844 | It can read and |
---|
4845 | convert a number of file formats, has many features for |
---|
4846 | descriptive statistics, can compute a number of commonly-used |
---|
4847 | distance matrix measures, and can infer phylogenies by parsimony, distance, |
---|
4848 | or likelihood methods, including bootstrapping and jackknifing. A number |
---|
4849 | of kinds of statistical tests of trees are available, and it |
---|
4850 | can also display phylogenies. DAMBE includes a copy of ClustalW as well. |
---|
4851 | It consists of Windows95 executables and is available from its |
---|
4852 | web site at <A HREF="http://web.hku.hk/~xxia/software/software.htm"> |
---|
4853 | <CODE>http://web.hku.hk/~xxia/software/software.htm</CODE></A>. |
---|
4854 | Xia has now moved to the Department of Biology of the University of Ottawa, |
---|
4855 | Canada, and I suspect the DAMBE web site will soon follow him there. |
---|
4856 | <P> |
---|
4857 | <B>MOLPHY</B> A package of programs for carrying out likelihood analysis |
---|
4858 | of DNA and protein data, written by Jun Adachi and Masami Hasegawa of the |
---|
4859 | Institute of Statistical Mathematics in Tokyo, Japan. The source code |
---|
4860 | is available from them at |
---|
4861 | <A HREF="http://www.ism.ac.jp/software/ismlib/softother.e.html"> |
---|
4862 | the MOLPHY web site</A> at |
---|
4863 | <CODE>http://www.ism.ac.jp/software/ismlib/softother.e.html</CODE>, and |
---|
4864 | Windows executables are available from Russell Malmberg's web site at |
---|
4865 | <A HREF="http://dogwood.botany.uga.edu/malmberg/software.html"> |
---|
4866 | <TT>http://dogwood.botany.uga.edu/malmberg/software.html</TT></A>. |
---|
4867 | <P> |
---|
4868 | <B>Hennig86</B> A fast parsimony program by J. S. Farris of the |
---|
4869 | Naturhistoriska Riksmuseet in Stockholm, Sweden for discrete character |
---|
4870 | data (it can handle DNA if its states are recoded to be digits). |
---|
4871 | Reputed to be faster than PAUP*. |
---|
4872 | The program is distributed as an executable and costs $50, plus $5 |
---|
4873 | mailing costs ($10 outside of the U.S.). The user's name should be stated, |
---|
4874 | as copies are personalized as a copy-protection measure. It is |
---|
4875 | distributed by Arnold Kluge, Amphibians and Reptiles, Museum of Zoology, |
---|
4876 | University of |
---|
4877 | Michigan, Ann Arbor, Michigan 48109-1079, U.S.A. (<TT>akluge@umich.edu</TT>) and |
---|
4878 | by Diana Lipscomb at George Washington University (<TT>BIODL@gwuvm.gwu.edu</TT>). |
---|
4879 | <P> |
---|
4880 | <B>RnA</B> J. S. Farris's very fast program which uses parsimony |
---|
4881 | to carry out jackknife resampling of DNA sequence data. This would be |
---|
4882 | nearly equivalent in properties to bootstrapping if the jackknifing were |
---|
4883 | sampling random halves of the data, but Farris prefers to have each |
---|
4884 | jackknife sample delete a fraction 1/<I>e</I> of the data, which will give |
---|
4885 | most groups too much support (he would disagree with this statement). |
---|
4886 | RnA is available from Arnold Kluge, Amphibians and Reptiles, Museum of Zoology, |
---|
4887 | University of |
---|
4888 | Michigan, Ann Arbor, Michigan 48109-1079, U.S.A. (<TT>akluge@umich.edu</TT>) |
---|
4889 | and Diana Lipscomb |
---|
4890 | at George Washington University (<TT>BIODL@gwuvm.gwu.edu</TT>) who may be |
---|
4891 | contacted for details. The cost is about $30 US. |
---|
4892 | <P> |
---|
4893 | <B>NONA</B> Pablo Goloboff, of the Instituto Miguel Lillo in |
---|
4894 | Tucuman, Argentina has written these very fast parsimony programs, capable |
---|
4895 | of some relevant forms of weighted parsimony, which can handle either |
---|
4896 | DNA sequence data or discrete characters. It is available as shareware |
---|
4897 | from <A HREF="http://www.cladistics.com/aboutNona.htm"> |
---|
4898 | <TT>http://www.cladistics.com/aboutNona.htm</TT></A>. |
---|
4899 | There is a 30 day free trial, after which |
---|
4900 | NONA must be purchased separately by sending a check for $40.00 |
---|
4901 | either directly to the author, or to: James M. Carpenter, Attn: NONA, |
---|
4902 | Division of Invertebrate Zoology, American Museum of Natural History, |
---|
4903 | Central Park West at 79th Street, New York, NY 10024. |
---|
4904 | <P> |
---|
4905 | <B>TNT</B> This program, by Pablo Goloboff, J. S. Farris, and Kevin Nixon, |
---|
4906 | is for searching large data sets for most parsimonious trees. |
---|
4907 | The authors are respectively at the Instituto Miguel Lillo in Tucuman, |
---|
4908 | Argentina, the Naturhistoriska Riksmuseet in Stockholm, Sweden, and the |
---|
4909 | Hortorium, Cornell University, Ithaca, New York. |
---|
4910 | TNT is described |
---|
4911 | as faster than other methods, though not faster than NONA for small to |
---|
4912 | medium data sets. Its distribution status is somewhat uncertain. The site |
---|
4913 | <A HREF="http://www.cladistics.com/aboutTNT.html"> |
---|
4914 | <TT>http://www.cladistics.com/aboutTNT.html</TT></A> |
---|
4915 | describes it as unavailable, |
---|
4916 | while the web site <A HREF="http://www.cladistics.com/webtnt.html"> |
---|
4917 | <TT>http://www.cladistics.com/webtnt.html</TT></A> makes a beta version |
---|
4918 | available for download. The program downloaded is free but needs a password to |
---|
4919 | function, which the user should obtain from Pablo Goloboff (see the latter |
---|
4920 | web page for details). |
---|
4921 | <P> |
---|
4922 | These are only a few of the more than 194 different phylogeny packages that |
---|
4923 | are now available (as of January, 2001 - the number keeps increasing). The |
---|
4924 | others are described (and web links and ftp addresses provided) at my |
---|
4925 | Phylogeny Programs web pages at the address given above. |
---|
4926 | <P> |
---|
4927 | <A NAME="helpme"><HR><P></A> |
---|
4928 | <DIV ALIGN="CENTER"> |
---|
4929 | <H2>How You Can Help Me</H2></DIV> |
---|
4930 | <P> |
---|
4931 | Simply let me know of any problems you have had adapting the |
---|
4932 | programs to your computer. I can often make "transparent" changes that, by |
---|
4933 | making the code avoid the wilder, woolier, and less standard parts of |
---|
4934 | C, not only help others who have your machine but even improve the |
---|
4935 | chance of the programs functioning on new machines. I would like fairly |
---|
4936 | detailed information on what gave trouble, on what operating system, |
---|
4937 | machine, and (if relevant) compiler, and what had to be done to make the |
---|
4938 | programs work. I am sometimes able to do some over-the-telephone |
---|
4939 | trouble-shooting, particularly |
---|
4940 | if I don't have to pay for the call, but electronic mail is the best |
---|
4941 | way for me to be asked about problems, as you can include your |
---|
4942 | input and output files so I can see what is going on (please do <EM>not</EM> |
---|
4943 | send them as Attachments, but as part of the body of a message). I'd really |
---|
4944 | like these programs to be |
---|
4945 | able to run with only routine changes on <I>absolutely everything</I>, down to |
---|
4946 | and possibly including the Amana Touchmatic Radarange Microwave Oven |
---|
4947 | which was an Intel 8080 system (in fact, early versions of this package did |
---|
4948 | run successfully on Intel 8080 systems running the CP/M operating system). |
---|
4949 | A PalmPilot version is contemplated too. |
---|
4950 | <P> |
---|
4951 | I would also like to know timings of programs from the package, when |
---|
4952 | run on the three test input files provided above, for various computer and |
---|
4953 | compiler combinations, so that I can provide this information in the |
---|
4954 | section on speeds of this document. |
---|
4955 | <P> |
---|
4956 | For the phylogeny plotting programs DRAWGRAM and DRAWTREE, |
---|
4957 | I am particularly interested in knowing what has to be done |
---|
4958 | to adapt them for other graphic file formats. |
---|
4959 | <P> |
---|
4960 | You can also be helpful to PHYLIP users in your part of the world by |
---|
4961 | helping them get the latest version of PHYLIP from our web site |
---|
4962 | and by helping them with any |
---|
4963 | problems they may have in getting PHYLIP working on their data. |
---|
4964 | <P> |
---|
4965 | Your help is appreciated. I am always happy to hear suggestions |
---|
4966 | for features and programs that ought to be incorporated in the package, |
---|
4967 | but please do not be upset if I turn out to have already considered the |
---|
4968 | particular possibility you suggest and decided against it. |
---|
4969 | <P> |
---|
4970 | <A NAME="trouble"><HR><P></A> |
---|
4971 | <DIV ALIGN="CENTER"> |
---|
4972 | <H2>In Case of Trouble</H2></DIV> |
---|
4973 | <P> |
---|
4974 | <I>Read The (documentation) Files Meticulously</I> ("RTFM"). If that doesn't solve the |
---|
4975 | problem, please check the Frequently Asked Questions web page at the |
---|
4976 | PHYLIP web site: |
---|
4977 | <P> |
---|
4978 | <FONT SIZE=+2> |
---|
4979 | <TT><A HREF="http://evolution.gs.washington.edu/phylip/faq.html"> |
---|
4980 | http://evolution.gs.washington.edu/phylip/faq.html</A></TT></FONT> |
---|
4981 | <P> |
---|
4982 | and the PHYLIP Bugs web page at that site: |
---|
4983 | <P> |
---|
4984 | <FONT SIZE=+2> |
---|
4985 | <TT><A HREF="http://evolution.gs.washington.edu/phylip/bugs.html"> |
---|
4986 | http://evolution.gs.washington.edu/phylip/bugs.html</A></TT></FONT> |
---|
4987 | <P> |
---|
4988 | If none of these answers your question, get in touch with me. My electronic mail address |
---|
4989 | is given below. If you do ask about a problem, please specify the program |
---|
4990 | name, version of the package, computer operating system, and |
---|
4991 | send me your data file so I can test the problem. Do <I>not</I> |
---|
4992 | send your data file as an e-mail Attachment but instead |
---|
4993 | as the body of a message. I read the e-mail on a Unix system, which makes |
---|
4994 | it impossible to read some formats of attachments without |
---|
4995 | running around to other machines and moving the files there. This |
---|
4996 | is one of my least favorite activities, so please do not use attachments. |
---|
4997 | Also it will help if you |
---|
4998 | have the relevant output and documentation files so that you |
---|
4999 | can refer to them in any correspondence. I can also be reached by telephone |
---|
5000 | by calling me in my office: |
---|
5001 | +1-(206)-543-0150, or at home: +1-(206)-526-9057 (how's <I>that</I> for user |
---|
5002 | support!). If I cannot be reached at either place, a message can be left at |
---|
5003 | the office of |
---|
5004 | the Department of Genome Sciences, (206)-221-7377, but I strongly prefer that I not |
---|
5005 | call you, as in any phone consultation the least you can do is pay the phone |
---|
5006 | bill. Better yet, use electronic mail. |
---|
5007 | <P> |
---|
5008 | Particularly if you are in a part of the world distant from me, you may also |
---|
5009 | want to try to get in touch with other users of PHYLIP nearby. I can also, |
---|
5010 | if requested, provide a list of nearby users. |
---|
5011 | <P> |
---|
5012 | <DIV ALIGN="RIGHT"> |
---|
5013 | <TABLE><TR><TD ALIGN=LEFT> |
---|
5014 | Joe Felsenstein<BR> |
---|
5015 | Department of Genome Sciences<BR> |
---|
5016 | University of Washington<BR> |
---|
5017 | Box 357730<BR> |
---|
5018 | Seattle, Washington 98195-7730, U.S.A. |
---|
5019 | </TD></TR></TABLE> |
---|
5020 | </DIV> |
---|
5021 | <P> |
---|
5022 | Electronic mail address: <TT>joe@gs.washington.edu</TT> |
---|
5023 | <BR><HR> |
---|
5024 | </BODY> |
---|
5025 | </HTML> |
---|