| 1 | fastDNAml 1.2 |
|---|
| 2 | |
|---|
| 3 | |
|---|
| 4 | Gary J. Olsen, Department of Microbiology |
|---|
| 5 | University of Illinois, Urbana, IL |
|---|
| 6 | gary@phylo.life.uiuc.edu |
|---|
| 7 | |
|---|
| 8 | Ross Overbeek, Mathematics and Computer Science |
|---|
| 9 | Argonne National Laboratory, Argonne, IL |
|---|
| 10 | overbeek@mcs.anl.gov |
|---|
| 11 | |
|---|
| 12 | |
|---|
| 13 | |
|---|
| 14 | Citing fastDNAml |
|---|
| 15 | |
|---|
| 16 | If you publish work using fastDNAml, please cite the following publications: |
|---|
| 17 | |
|---|
| 18 | Olsen, G. J., Matsuda, H., Hagstrom, R., and Overbeek, R. 1994. fastDNAml: |
|---|
| 19 | A tool for construction of phylogenetic trees of DNA sequences using maximum |
|---|
| 20 | likelihood. Comput. Appl. Biosci. 10: 41-48. |
|---|
| 21 | |
|---|
| 22 | Felsenstein, J. 1981. Evolutionary trees from DNA sequences: |
|---|
| 23 | A maximum likelihood approach. J. Mol. Evol. 17: 368-376. |
|---|
| 24 | |
|---|
| 25 | |
|---|
| 26 | |
|---|
| 27 | What is fastDNAml |
|---|
| 28 | |
|---|
| 29 | fastDNAml is a program derived from Joseph Felsenstein's version 3.3 DNAML |
|---|
| 30 | (part of his PHYLIP package). Users should consult the documentation for |
|---|
| 31 | DNAML before using this program. |
|---|
| 32 | |
|---|
| 33 | fastDNAml is an attempt to solve the same problem as DNAML, but to do so |
|---|
| 34 | faster and using less memory, so that larger trees and/or more bootstrap |
|---|
| 35 | replicates become tractable. Much of fastDNAml is merely a recoding of the |
|---|
| 36 | PHYLIP 3.3 DNAML program from PASCAL to C. |
|---|
| 37 | |
|---|
| 38 | DNAML includes the following notice: |
|---|
| 39 | |
|---|
| 40 | version 3.3. (c) Copyright 1986, 1990 by the University of Washington and |
|---|
| 41 | Joseph Felsenstein. Written by Joseph Felsenstein. Permission is granted to |
|---|
| 42 | copy and use this program provided no fee is charged for it and provided that |
|---|
| 43 | this copyright notice is not removed. |
|---|
| 44 | |
|---|
| 45 | |
|---|
| 46 | |
|---|
| 47 | Why is fastDNAml faster? |
|---|
| 48 | |
|---|
| 49 | Some recomputation of values has been eliminated (Joe Felsenstein has done |
|---|
| 50 | much of this in version 3.4 DNAML). |
|---|
| 51 | |
|---|
| 52 | The optimization of branch lengths has been accelerated by changing from an EM |
|---|
| 53 | method to Newton's method (Joe Felsenstein has done much of this in version 3.4 |
|---|
| 54 | DNAML). |
|---|
| 55 | |
|---|
| 56 | The strategy for simultaneously optimizing all of the branches on the tree has |
|---|
| 57 | been modified to spend less time getting an individual branch right before |
|---|
| 58 | improving the other branches. |
|---|
| 59 | |
|---|
| 60 | |
|---|
| 61 | |
|---|
| 62 | Other new features in fastDNAml |
|---|
| 63 | |
|---|
| 64 | fastDNAml includes a checkpoint feature to regularly save its progress toward |
|---|
| 65 | finding a large tree. If the program is interrupted, a minor change to the |
|---|
| 66 | input file and adding the R (restart) option permits the work to be resumed |
|---|
| 67 | from the last checkpoint. |
|---|
| 68 | |
|---|
| 69 | The new R (restart) option can also be used for more rapid addition of new |
|---|
| 70 | sequences to a previously computed tree (when new sequences are added to the |
|---|
| 71 | alignment, it is best if the relative alignment of the previous sequences is |
|---|
| 72 | not altered). |
|---|
| 73 | |
|---|
| 74 | The G (global) option has been generalized to permit crossing any number of |
|---|
| 75 | branches during tree rearrangements. In addition, it is possible to modify |
|---|
| 76 | the extent of rearrangement explored during the sequential addition phase of |
|---|
| 77 | tree building. |
|---|
| 78 | |
|---|
| 79 | The G U (global and user tree) option combination instructs the program to |
|---|
| 80 | find the best of the user trees, and then look for rearrangements that are |
|---|
| 81 | better still. |
|---|
| 82 | |
|---|
| 83 | The number of available rate categories has been raised from 9 to 35. |
|---|
| 84 | |
|---|
| 85 | The weighting mask accepts values from 0 through 35. |
|---|
| 86 | |
|---|
| 87 | The new B (bootstrap) option causes generation of a bootstrap sample, drawn |
|---|
| 88 | from the input data. |
|---|
| 89 | |
|---|
| 90 | The program includes "P4" code for distributing the problem over multiple |
|---|
| 91 | processors (either within one machine, or across multiple machines). |
|---|
| 92 | |
|---|
| 93 | |
|---|
| 94 | |
|---|
| 95 | Do DNAML and fastDNAml give the same answer? |
|---|
| 96 | |
|---|
| 97 | Generally yes, though there are some reservations: |
|---|
| 98 | |
|---|
| 99 | One or the other might find a better tree due to minor changes in the ways |
|---|
| 100 | trees are searched. When sequence addition is replicated with different |
|---|
| 101 | values of the jumble random number seed, they have about the same probability |
|---|
| 102 | of finding the best tree, but any given seed might give different trees. |
|---|
| 103 | |
|---|
| 104 | The likelihoods and branch lengths sometimes differ very slightly due to |
|---|
| 105 | different criteria for stopping the optimization process. |
|---|
| 106 | |
|---|
| 107 | Little has been done to check the confidence limits on branch lengths. There |
|---|
| 108 | seem to be some instances in which they disagree, and we think that fastDNAml |
|---|
| 109 | is correct. However, do not take the "significantly greater than zero" too |
|---|
| 110 | seriously. |
|---|
| 111 | |
|---|
| 112 | If you are concerned, you can supply a tree inferred by fastDNAml as a user |
|---|
| 113 | tree to DNAML and let it (1) reoptimize branch lengths, (2) tell you |
|---|
| 114 | the confidence limits and (3) tell you the tree likelihood. |
|---|
| 115 | |
|---|
| 116 | |
|---|
| 117 | |
|---|
| 118 | Changes and new features in version 1.2 |
|---|
| 119 | |
|---|
| 120 | The program can now calculate the likelihood of extremely large user trees. |
|---|
| 121 | The largest tree we have tested had 3200 taxa. Generally, you will run out |
|---|
| 122 | of computer memory before you exceed an intrinsic limitation. (With this, |
|---|
| 123 | it is possible to compare trees found by whatever your favorite methods are |
|---|
| 124 | under the likelihood criterion.) |
|---|
| 125 | |
|---|
| 126 | The computation has been changed to permit ease of implementing new models |
|---|
| 127 | of evolution and analysis of amino acid sequences (though these have not yet |
|---|
| 128 | been done). This has slowed down the program 5-10%. |
|---|
| 129 | |
|---|
| 130 | |
|---|
| 131 | |
|---|
| 132 | Changes and new features in version 1.1 |
|---|
| 133 | |
|---|
| 134 | The quickadd option is now the default. This has the ugly effect of reversing |
|---|
| 135 | the meaning of putting a Q on the option line. (Sorry about this and the |
|---|
| 136 | next note, but in the long run it is the better behavior.) |
|---|
| 137 | |
|---|
| 138 | Use of empirical base frequencies is now the default. This reverses the |
|---|
| 139 | meaning of the F option, making the default behavior more like that of PHYLIP. |
|---|
| 140 | |
|---|
| 141 | The tree output file is now generated by default and should be more compatible |
|---|
| 142 | with the files written and read by the PHYLIP programs. In particular, the |
|---|
| 143 | comments with information about the tree, its likelihood, etc. are removed, and |
|---|
| 144 | there are no quotation marks around names unless there are unusual characters |
|---|
| 145 | within the name. (There are two things to be very careful about in names: |
|---|
| 146 | there is no completely consistent way to handle both blanks and underscores in |
|---|
| 147 | names without quotation marks, and when a name is spaced in from the margin in |
|---|
| 148 | the input file, there are leading blank spaces in the name, which can be very |
|---|
| 149 | hard to make compatible with some programs.) |
|---|
| 150 | |
|---|
| 151 | The program maintains a list of the several best trees, not just the (single) best. In |
|---|
| 152 | particular, when evaluating user-supplied trees, the program tries to save |
|---|
| 153 | information about all of the trees and provides a Hasegawa and Kishino type |
|---|
| 154 | test of whether each tree is significantly worse than the best tree. Note, the current version |
|---|
| 155 | of the program prints the report in the order of tree likelihood, NOT in the |
|---|
| 156 | order the trees are supplied to the program. The best way (at present) to |
|---|
| 157 | figure out which tree is which is to look at the likelihoods. This is the |
|---|
| 158 | same test used in PHYLIP, but I had removed access in version 1.0 of fastDNAml |
|---|
| 159 | due to differences in how the programs handle multiple trees. The difference |
|---|
| 160 | is that fastDNAml can maintain nearly optimal trees all the time, so you can |
|---|
| 161 | get a list of the N best trees found by using the new K option (below). |
|---|
| 162 | |
|---|
| 163 | The program should accept rooted trees (strictly bifurcating), as well as |
|---|
| 164 | unrooted trees (with a trifurcation at the deepest level). This is not fully |
|---|
| 165 | tested, but it seems to work. |
|---|
| 166 | |
|---|
| 167 | |
|---|
| 168 | |
|---|
| 169 | Features in the works |
|---|
| 170 | |
|---|
| 171 | Test subtree exchanges (as well as moving a single subtree) in the search for |
|---|
| 172 | better trees. |
|---|
| 173 | |
|---|
| 174 | Allowing the program to optimize any user-defined subset of branches when user |
|---|
| 175 | lengths are supplied. |
|---|
| 176 | |
|---|
| 177 | |
|---|
| 178 | |
|---|
| 179 | Input and Options |
|---|
| 180 | |
|---|
| 181 | |
|---|
| 182 | Basics |
|---|
| 183 | |
|---|
| 184 | The input to fastDNAml is similar to that used by DNAML (and the other PHYLIP |
|---|
| 185 | programs). The user should consult the PHYLIP documentation for a basic |
|---|
| 186 | description of the format. |
|---|
| 187 | |
|---|
| 188 | This version of fastDNAml expects to get its input from stdin (standard input) |
|---|
| 189 | and writes its output to stdout (standard output). (There are compile time |
|---|
| 190 | options to modify this, for those who care to get into such things.) |
|---|
| 191 | |
|---|
| 192 | On a UNIX or DOS system, it is a simple matter to redirect input from a file |
|---|
| 193 | and output to a file: |
|---|
| 194 | |
|---|
| 195 | fastDNAml < infile > outfile |
|---|
| 196 | |
|---|
| 197 | On a VMS system it is only slightly more difficult. Immediately before |
|---|
| 198 | running the program, one includes two commands that define the input and |
|---|
| 199 | output files: |
|---|
| 200 | |
|---|
| 201 | $ Define/User Sys$Input infile |
|---|
| 202 | $ Define/User Sys$Output outfile |
|---|
| 203 | $ Run fastDNAml |
|---|
| 204 | |
|---|
| 205 | The default input data format is Interleaved (see I option). To help get data |
|---|
| 206 | from a GenBank or similar format, the interleaved option can be switched off with the I option. Numbers in the sequence data (i.e., sequence position |
|---|
| 207 | numbers) will be ignored, so they need not be stripped out. |
|---|
| 208 | |
|---|
| 209 | (Note that the program also writes a file called checkpoint.PID. See the R |
|---|
| 210 | option below for more description.) |
|---|
| 211 | |
|---|
| 212 | |
|---|
| 213 | 1 -- Print Data |
|---|
| 214 | |
|---|
| 215 | By default, fastDNAml does not echo the sequence data to the output file. |
|---|
| 216 | Option 1 reverses this. |
|---|
| 217 | |
|---|
| 218 | |
|---|
| 219 | 3 -- Do Not Print Tree |
|---|
| 220 | |
|---|
| 221 | By default, fastDNAml prints the final tree to the output file. Option 3 |
|---|
| 222 | reverses this. |
|---|
| 223 | |
|---|
| 224 | |
|---|
| 225 | 4 -- Do Not Write Tree to File (***** Changed in version 1.1 *****) |
|---|
| 226 | |
|---|
| 227 | By default, fastDNAml versions 1.1 and 1.2 write a machine readable (Newick |
|---|
| 228 | format) copy of the final tree to an output file. Option 4 reverses this. |
|---|
| 229 | The tree output file will be called treefile.PID (where PID is the process ID |
|---|
| 230 | under which fastDNAml is running). Look at the Y option below for more |
|---|
| 231 | information on alternative tree formats. |
|---|
| 232 | |
|---|
| 233 | |
|---|
| 234 | B -- Bootstrap |
|---|
| 235 | |
|---|
| 236 | Generates a bootstrap sample of the input data. Requires auxiliary data line |
|---|
| 237 | of the form: |
|---|
| 238 | |
|---|
| 239 | B random_number_seed |
|---|
| 240 | |
|---|
| 241 | Example: |
|---|
| 242 | |
|---|
| 243 | 5 114 B |
|---|
| 244 | B 137 |
|---|
| 245 | Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG |
|---|
| 246 | ... |
|---|
| 247 | |
|---|
| 248 | If the W option is used, only positions that have nonzero weights are used in |
|---|
| 249 | computing the bootstrap sample. Warning: For a given random number seed, the |
|---|
| 250 | sample will always be the same. |
|---|
| 251 | |
|---|
| 252 | PHYLIP DNAML does not include a bootstrap option. (Use the SEQBOOT program.) |
|---|
| 253 | |
|---|
| 254 | |
|---|
| 255 | C -- Categories |
|---|
| 256 | |
|---|
| 257 | Requires auxiliary data of the form: |
|---|
| 258 | |
|---|
| 259 | C number_of_categories list_of_category_rates |
|---|
| 260 | |
|---|
| 261 | The maximum number of categories is 35. This line is followed by a list of |
|---|
| 262 | the category assigned to each site: |
|---|
| 263 | |
|---|
| 264 | Categories list_of_categories [per site, one or more lines] |
|---|
| 265 | |
|---|
| 266 | Category "numbers" are ordered: 1, 2, 3, ..., 9, A, B, ..., Y, Z. Category |
|---|
| 267 | zero (undefined rate) is permitted at sites with a zero in a user-supplied |
|---|
| 268 | weighting mask. |
|---|
| 269 | |
|---|
| 270 | Example: |
|---|
| 271 | |
|---|
| 272 | 5 114 C |
|---|
| 273 | C 12 0.0625 0.125 0.25 0.5 1 2 4 8 16 32 64 128 |
|---|
| 274 | Categories 5111136343678975AAA8949995566778888889AAAAAA9239898629AAAAA9 |
|---|
| 275 | 633792246624457364222574877188898132984963499AA9899975 |
|---|
| 276 | Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG |
|---|
| 277 | ... |
|---|
| 278 | |
|---|
| 279 | PHYLIP DNAML is limited to categories 1 through 9. Also, in PHYLIP version |
|---|
| 280 | 3.3, the categories data came after all the other auxiliary data, but before |
|---|
| 281 | the user-supplied base frequencies and sequence data. If you make the C line |
|---|
| 282 | your last auxiliary data line, the programs will behave the same. |
|---|
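As an illustration of the category symbols described above, the following Python sketch (a helper for preparing input files, not part of fastDNAml) converts category numbers to their one-character symbols and lays them out as Categories lines. The function names, the 60-sites-per-line wrap, and the 11-character label field are assumptions modeled on the example above.

```python
import string

# Index in this string == category number: 0, 1-9, then A-Z for 10-35.
SYMBOLS = "0123456789" + string.ascii_uppercase


def category_symbol(number):
    """Return the one-character symbol for a category number 0..35."""
    if not 0 <= number <= 35:
        raise ValueError("category must be in 0..35")
    return SYMBOLS[number]


def categories_lines(categories, width=60):
    """Format per-site category numbers as 'Categories' auxiliary lines.

    Assumption: an 11-character label field, `width` sites per line,
    and blank-padded continuation lines.
    """
    symbols = "".join(category_symbol(c) for c in categories)
    lines = []
    for start in range(0, len(symbols), width):
        label = "Categories " if start == 0 else " " * 11
        lines.append(label + symbols[start:start + width])
    return lines
```

Category 0 is accepted here because it is permitted at sites that carry a zero weight; the sketch does not check that such a weighting mask is actually present.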
| 283 | |
|---|
| 284 | |
|---|
| 285 | F -- Empirical Frequencies (***** Changed in version 1.1 *****) |
|---|
| 286 | |
|---|
| 287 | By default (starting with version 1.1), the program uses base frequencies |
|---|
| 288 | derived from the sequence data (called empirical base frequencies). Therefore |
|---|
| 289 | the input file should normally NOT include a base frequencies line preceding |
|---|
| 290 | the data. If you want to include your own base frequency data, it is now |
|---|
| 291 | necessary to use the F option, and add a line to the input file that supplies |
|---|
| 292 | the frequency data: |
|---|
| 293 | |
|---|
| 294 | With the F option, the program uses user-supplied base frequencies rather than |
|---|
| 295 | frequencies derived from the sequence data. In the simplest format, the base |
|---|
| 296 | frequencies line IMMEDIATELY precedes the sequence data: |
|---|
| 297 | |
|---|
| 298 | 5 114 F |
|---|
| 299 | 0.25 0.30 0.20 0.25 |
|---|
| 300 | Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG |
|---|
| 301 | ... |
|---|
| 302 | |
|---|
| 303 | There is an alternative format: the frequencies can be anywhere in the list of |
|---|
| 304 | auxiliary data lines if they are preceded by an F in the first column: |
|---|
| 305 | |
|---|
| 306 | 5 114 F C W |
|---|
| 307 | F 0.25 0.30 0.20 0.25 |
|---|
| 308 | C ... |
|---|
| 309 | ... |
|---|
| 310 | W ... |
|---|
| 311 | Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG |
|---|
| 312 | ... |
|---|
| 313 | |
|---|
| 314 | |
|---|
| 315 | G -- Global |
|---|
| 316 | |
|---|
| 317 | If the global option is specified, there may also be an [optional] auxiliary |
|---|
| 318 | data line of form: |
|---|
| 319 | |
|---|
| 320 | G N1 |
|---|
| 321 | |
|---|
| 322 | or |
|---|
| 323 | |
|---|
| 324 | G N1 N2 |
|---|
| 325 | |
|---|
| 326 | N1 is the number of branches to cross in rearrangements of the completed tree. |
|---|
| 327 | The value of N2 is the number of branches to cross in testing rearrangements |
|---|
| 328 | during the sequential addition phase of tree inference. |
|---|
| 329 | |
|---|
| 330 | N1 = 1: local rearrangement (default without G option) |
|---|
| 331 | |
|---|
| 332 | 1 < N1 < numsp-3: regional rearrangements (crossing N1 branches) |
|---|
| 333 | |
|---|
| 334 | N1>= numsp-3: global rearrangements (default with G option) |
|---|
| 335 | |
|---|
| 336 | |
|---|
| 337 | |
|---|
| 338 | N2 <= N1; the default N2 is 1 (local rearrangements). |
|---|
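The ranges above can be restated as a small Python sketch. This is only a reading of the rules as written here, not code from fastDNAml; the function name is hypothetical. Two points are assumptions: a value of 0 is taken to mean "no rearrangements" (as in the "G 1 0" example), and a value of 1 is reported as local even for very small trees where it also satisfies the global condition.

```python
def rearrangement_scope(n, numsp):
    """Classify a branch-crossing count n for a tree of numsp taxa."""
    if n < 0:
        raise ValueError("number of branches to cross cannot be negative")
    if n == 0:
        return "none"      # assumption: 0 disables rearrangements
    if n == 1:
        return "local"
    if n < numsp - 3:
        return "regional"
    return "global"        # n >= numsp - 3
```

For a 50-taxon data set, crossing 4 branches is regional, while crossing 47 or more is global.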
| 339 | |
|---|
| 340 | The G option can also be used to force branch swapping on user trees, that is, |
|---|
| 341 | a combination of G and U options. |
|---|
| 342 | |
|---|
| 343 | If the auxiliary line is supplied, it cannot be the last line of auxiliary |
|---|
| 344 | data. (It may be necessary to add the T option with an auxiliary data line of |
|---|
| 345 | |
|---|
| 346 | T 2.0 |
|---|
| 347 | |
|---|
| 348 | if no other auxiliary data are used.) |
|---|
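The rule that the optional G line must not be the last auxiliary line can be handled mechanically when input files are generated by a script. The Python sketch below is one way to do it; the function name and argument layout are hypothetical. It assumes every auxiliary line starts with its option letter in the first column, and it falls back to "T 2.0" (the default ratio) when no other auxiliary data are given.

```python
def global_option_header(numsp, numsites, n1, n2=None, other_aux=()):
    """Return the first line plus auxiliary lines for a run with 'G N1 [N2]'.

    other_aux: auxiliary lines for other options, each starting with its
    option letter. If empty, 'T 2.0' is added so that the G line is not
    the last auxiliary line.
    """
    other_aux = list(other_aux)
    g_line = "G %d" % n1 if n2 is None else "G %d %d" % (n1, n2)
    aux = [g_line] + (other_aux if other_aux else ["T 2.0"])

    # Collect option letters for the first line, without repeats
    # (e.g. the C option contributes both a 'C' and a 'Categories' line).
    letters = []
    for line in aux:
        if line[0] not in letters:
            letters.append(line[0])

    first = "%d %d %s" % (numsp, numsites, " ".join(letters))
    return [first] + aux
```

Calling global_option_header(5, 114, 4) yields the lines "5 114 G T", "G 4" and "T 2.0"; the sequence data would follow.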
| 349 | |
|---|
| 350 | Examples: |
|---|
| 351 | |
|---|
| 352 | Do local rearrangements after each addition, and global after last addition: |
|---|
| 353 | |
|---|
| 354 | 5 114 G |
|---|
| 355 | Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG |
|---|
| 356 | ... |
|---|
| 357 | |
|---|
| 358 | Do local rearrangements after each addition, and regional (crossing 4 |
|---|
| 359 | branches) after last addition: |
|---|
| 360 | |
|---|
| 361 | 5 114 G T |
|---|
| 362 | G 4 |
|---|
| 363 | T 2.0 |
|---|
| 364 | Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG |
|---|
| 365 | ... |
|---|
| 366 | |
|---|
| 367 | Do no rearrangements after each addition, and local after last addition: |
|---|
| 368 | |
|---|
| 369 | 5 114 G T |
|---|
| 370 | G 1 0 |
|---|
| 371 | T 2.0 |
|---|
| 372 | Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG |
|---|
| 373 | ... |
|---|
| 374 | |
|---|
| 375 | PHYLIP DNAML does not support the auxiliary data line or branch swapping on a |
|---|
| 376 | user tree. |
|---|
| 377 | |
|---|
| 378 | |
|---|
| 379 | I -- Not Interleaved |
|---|
| 380 | |
|---|
| 381 | By default, fastDNAml 1.2 expects data lines for the various sequences in an |
|---|
| 382 | interleaved format (as did PHYLIP 3.3 DNAML). The I option reverses the |
|---|
| 383 | expected format (to non-interleaved data, in which all the data lines for one |
|---|
| 384 | sequence appear before the next sequence begins). This is particularly useful for |
|---|
| 385 | editing a GenBank or equivalent format into a valid input file (note that |
|---|
| 386 | numbers within the sequence data are ignored, so it is not necessary to remove |
|---|
| 387 | them). |
|---|
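To see what ignoring numbers means for GenBank-style sequence lines, here is a minimal Python sketch that removes position numbers and blanks from pasted sequence text. It is not fastDNAml's reader, only an approximation for checking a file by hand before a run, and the function name is hypothetical.

```python
def clean_sequence_text(text):
    """Drop digits and whitespace from sequence text, keeping base symbols."""
    return "".join(ch for ch in text if not (ch.isdigit() or ch.isspace()))
```

Comparing the length of the cleaned text with the number of sites on the first line of the input file is a quick way to catch a truncated or duplicated line.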
| 388 | |
|---|
| 389 | If all the data for each sequence are on one line, then the interleaved and |
|---|
| 390 | non-interleaved formats are degenerate. (This is the way David Swofford's |
|---|
| 391 | PAUP program writes PHYLIP format output files.) The drawback is that many |
|---|
| 392 | programs do not handle long lines of text. This includes the vi and EDT text |
|---|
| 393 | editors, many electronic mail programs, and some versions of FTP for VAX/VMS |
|---|
| 394 | systems. |
|---|
| 395 | |
|---|
| 396 | PHYLIP 3.3 DNAML expects interleaved data, and does not include an I option to |
|---|
| 397 | alter this. PHYLIP 3.4 DNAML accepts an I option, but the default format is |
|---|
| 398 | reversed. |
|---|
| 399 | |
|---|
| 400 | |
|---|
| 401 | J -- Jumble |
|---|
| 402 | |
|---|
| 403 | Randomize the sequence addition order. Requires an auxiliary input line of |
|---|
| 404 | the form: |
|---|
| 405 | |
|---|
| 406 | J random_number_seed |
|---|
| 407 | |
|---|
| 408 | Example: |
|---|
| 409 | |
|---|
| 410 | 5 114 J |
|---|
| 411 | J 137 |
|---|
| 412 | Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG |
|---|
| 413 | ... |
|---|
| 414 | |
|---|
| 415 | Note that fastDNAml explores a very small number of alternative tree |
|---|
| 416 | topologies relative to a typical parsimony program. There is a very real |
|---|
| 417 | chance that the search procedure will not find the tree topology with the |
|---|
| 418 | highest likelihood. Altering the order of taxon addition and comparing the |
|---|
| 419 | trees found is a fairly efficient method for testing convergence. Typically, |
|---|
| 420 | it would be nice to find the same best tree at least twice (if not three |
|---|
| 421 | times), as opposed to simply performing some fixed number of jumbles and |
|---|
| 422 | hoping that at least one of them will be the optimum. |
|---|
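The stopping rule suggested above (keep jumbling until the best tree has been found at least twice) can be sketched in Python as follows. Everything here is an assumption for illustration: the function name, and in particular the topology key, which stands for any canonical description of a tree's branching order that the caller derives from the output. fastDNAml does not produce such a key itself.

```python
def best_tree_replicated(results, min_hits=2):
    """Report whether the best topology so far was found at least min_hits times.

    results: iterable of (log_likelihood, topology_key) pairs, one per
    jumbled run. topology_key is a caller-supplied canonical string.
    """
    results = list(results)
    if not results:
        return False
    # The topology of the run with the highest log likelihood.
    _, best_key = max(results, key=lambda r: r[0])
    hits = sum(1 for _, key in results if key == best_key)
    return hits >= min_hits
```

Matching is done on topology rather than on likelihood because likelihoods for the same tree can differ slightly between runs.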
| 423 | |
|---|
| 424 | |
|---|
| 425 | K -- Keep multiple best trees (***** New in version 1.1 *****) |
|---|
| 426 | |
|---|
| 427 | The program can keep a list of the best trees that it has found. When the |
|---|
| 428 | program is done, it prints a list of these, from best to worst, and prints |
|---|
| 429 | a Hasegawa and Kishino type test as to which trees are significantly worse |
|---|
| 430 | than the best tree found. When evaluating user-supplied trees, the program |
|---|
| 431 | automatically keeps all trees. In other situations, the program keeps only |
|---|
| 432 | the best tree that it has found. The K option, and associated auxiliary data |
|---|
| 433 | line, can be used to define an alternative number: |
|---|
| 434 | |
|---|
| 435 | Example, to keep the 15 best trees found: |
|---|
| 436 | |
|---|
| 437 | 5 114 K |
|---|
| 438 | K 15 |
|---|
| 439 | Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG |
|---|
| 440 | ... |
|---|
| 441 | |
|---|
| 442 | Example, to keep only the one best tree of possibly numerous user-supplied |
|---|
| 443 | trees: |
|---|
| 444 | |
|---|
| 445 | 5 114 K U |
|---|
| 446 | K 1 |
|---|
| 447 | Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG |
|---|
| 448 | ... |
|---|
| 449 | |
|---|
| 450 | |
|---|
| 451 | |
|---|
| 452 | L -- User Lengths |
|---|
| 453 | |
|---|
| 454 | Causes user trees to be read with branch lengths (and it is an error to omit |
|---|
| 455 | any of them). Without the L option, branch lengths in user trees are not |
|---|
| 456 | required, and are ignored if present. |
|---|
| 457 | |
|---|
| 458 | Example: |
|---|
| 459 | |
|---|
| 460 | 5 114 U L |
|---|
| 461 | Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG |
|---|
| 462 | ... |
|---|
| 463 | |
|---|
| 464 | (The U is for user tree and the L for user lengths) |
|---|
| 465 | |
|---|
| 466 | |
|---|
| 467 | O -- Outgroup |
|---|
| 468 | |
|---|
| 469 | Use the specified sequence number for the outgroup. Requires an auxiliary |
|---|
| 470 | data line of the form: |
|---|
| 471 | |
|---|
| 472 | O outgroup_number |
|---|
| 473 | |
|---|
| 474 | Example: |
|---|
| 475 | |
|---|
| 476 | 5 114 O |
|---|
| 477 | O 5 |
|---|
| 478 | Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG |
|---|
| 479 | ... |
|---|
| 480 | |
|---|
| 481 | This option only affects the way the tree is drawn (and written to the |
|---|
| 482 | treefile). |
|---|
| 483 | |
|---|
| 484 | |
|---|
| 485 | |
|---|
| 486 | Q -- Quickadd (***** Changed in version 1.1 *****) |
|---|
| 487 | |
|---|
| 488 | The quickadd feature greatly decreases the time in initially placing a new |
|---|
| 489 | sequence in the growing tree (but does not change the time required to |
|---|
| 490 | subsequently test rearrangements). The overall time savings seems to be about |
|---|
| 491 | 30%, based on a number of test cases. Its downside, if any, is unknown. This |
|---|
| 492 | is now (starting in version 1.1) the default program behavior. |
|---|
| 493 | |
|---|
| 494 | If the analysis is run with a global option of "G 0 0", so that no |
|---|
| 495 | rearrangements are permitted, the tree is built very approximately, but very |
|---|
| 496 | quickly. This may be of greatest interest if the question is, "Where does |
|---|
| 497 | this one new sequence fit into this known tree?" The known tree is provided |
|---|
| 498 | with the restart option (below). |
|---|
| 499 | |
|---|
| 500 | PHYLIP DNAML does not include anything comparable to the quickadd feature. |
|---|
| 501 | |
|---|
| 502 | The quickadd feature can be turned OFF by adding a Q to the first line of the |
|---|
| 503 | input file. |
|---|
| 504 | |
|---|
| 505 | |
|---|
| 506 | |
|---|
| 507 | R -- Restart |
|---|
| 508 | |
|---|
| 509 | The R option causes the program to read a user-supplied tree with fewer than |
|---|
| 510 | the full number of taxa as the starting point for sequential addition of the |
|---|
| 511 | remaining taxa. Thus, the sequence data must be followed by a valid (Newick |
|---|
| 512 | format) tree. (The phylip_tree/2, prolog fact format, is now also supported.) |
|---|
| 513 | |
|---|
| 514 | The restart option can also be used to increase the range of the search for |
|---|
| 515 | alternative (better) trees. For example, you can take a tree produced with |
|---|
| 516 | only "local" tree rearrangements, and increase the rearrangements to |
|---|
| 517 | "regional" or "global" by combining the appropriate global option with the |
|---|
| 518 | restart option. If the starting tree was written by fastDNAml, then the |
|---|
| 519 | extent of rearrangements is saved with the tree, and will be used as the |
|---|
| 520 | starting point for the additional search. If the tree was already globally |
|---|
| 521 | optimized, then no additional searching will be performed. |
|---|
| 522 | |
|---|
| 523 | To support the R option, after each taxon is added to the growing tree, and |
|---|
| 524 | after each round of rearrangements, the program appends a checkpoint tree to a |
|---|
| 525 | file called checkpoint.PID, where PID is the process number of the running |
|---|
| 526 | fastDNAml program. The last line of this file needs to be appended to the |
|---|
| 527 | input file when the R option is used. (This should not be confused with the U |
|---|
| 528 | (user tree) option, which expects a number followed by that number of trees. |
|---|
| 529 | No additional taxa are added to user trees.) |
|---|
| 530 | |
|---|
| 531 | The UNIX utility tail can be used to extract the last tree from the checkpoint |
|---|
| 532 | file, and the utility cat can be used to append it to the input. For example, |
|---|
| 533 | the following script can be used to add a starting tree and the R option to a |
|---|
| 534 | data file, and restart fastDNAml: |
|---|
| 535 | |
|---|
| 536 | #! /bin/sh |
|---|
| 537 | if test $# -ne 1 |
|---|
| 538 | then echo "Usage: restart checkpoint_file" >&2 |
|---|
| 539 | exit 1 |
|---|
| 540 | fi |
|---|
| 541 | read first_line # first line of data file |
|---|
| 542 | echo "$first_line R" # add restart option |
|---|
| 543 | cat - # rest of data file |
|---|
| 544 | tail -n 1 "$1" # append last tree in checkpoint file |
|---|
| 545 | |
|---|
| 546 | If this shell script is in the file called restart, then one might use the |
|---|
| 547 | command: |
|---|
| 548 | |
|---|
| 549 | restart checkpoint.21312 < infile | fastDNAml > new_outfile |
|---|
| 550 | ^script ^checkpoint tree ^data ^dnaml program ^output_file |
|---|
| 551 | |
|---|
| 552 | If this is too opaque, don't worry about it, or talk with your local UNIX |
|---|
| 553 | wizard. In the meantime, this and other useful shell scripts are provided |
|---|
| 554 | with the program. |
|---|
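On systems without a UNIX shell, the same transformation can be done in a few lines of Python. This is a sketch of the steps described above, not one of the scripts distributed with the program; the function name is hypothetical, and it assumes each checkpoint tree occupies a single line, so that the last non-blank line of the checkpoint file is the most recent tree.

```python
def make_restart_input(data_text, checkpoint_text):
    """Return restart input: data with ' R' on line 1 plus the last checkpoint tree."""
    data_lines = data_text.splitlines()
    trees = [line for line in checkpoint_text.splitlines() if line.strip()]
    if not data_lines:
        raise ValueError("data file is empty")
    if not trees:
        raise ValueError("checkpoint file contains no tree")
    # Add the restart option to the first (options) line.
    data_lines[0] = data_lines[0].rstrip() + " R"
    # Append the most recent checkpoint tree after the sequence data.
    return "\n".join(data_lines + [trees[-1]]) + "\n"
```

The returned text can be written to a new input file and supplied to fastDNAml on standard input in the usual way.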
| 555 | |
|---|
| 556 | PHYLIP DNAML does not write checkpoint trees and does not have a restart |
|---|
| 557 | option. |
|---|
| 558 | |
|---|
| 559 | |
|---|
| 560 | |
|---|
| 561 | T -- Transition/transversion ratio |
|---|
| 562 | |
|---|
| 563 | Use a user-specified ratio of transition to transversion type substitutions. |
|---|
| 564 | Without the T option, a value of 2.0 is used. Requires an auxiliary data line |
|---|
| 565 | of the form: |
|---|
| 566 | |
|---|
| 567 | T ratio |
|---|
| 568 | |
|---|
| 569 | Example: |
|---|
| 570 | |
|---|
| 571 | 5 114 T |
|---|
| 572 | T 1.0 |
|---|
| 573 | Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG |
|---|
| 574 | ... |
|---|
| 575 | |
|---|
| 576 | (Note that a T option with a value of 2.0 does nothing, but it can provide |
|---|
| 577 | a last auxiliary data line following optional auxiliary data. See the |
|---|
| 578 | examples for G and Y.) |
|---|
| 579 | |
|---|
| 580 | |
|---|
| 581 | |
|---|
| 582 | U -- User Tree(s) |
|---|
| 583 | |
|---|
| 584 | Read an input line with the number of user-specified trees, followed by the |
|---|
| 585 | specified number of trees. These data immediately follow the sequence data. |
|---|
| 586 | |
|---|
| 587 | The trees must be in Newick format, and terminated with a semicolon. (The |
|---|
| 588 | program also accepts a pseudo_newick format, which is a valid prolog fact.) |
|---|
| 589 | |
|---|
| 590 | The tree reader in this program is more powerful than that in PHYLIP 3.3. In |
|---|
| 591 | particular, material enclosed in square brackets, [ like this ], is ignored as |
|---|
| 592 | comments; taxa names can be wrapped in single quotation marks to support the |
|---|
| 593 | inclusion of characters that would otherwise end the name (i.e., '(', ')', |
|---|
| 594 | ':', ';', '[', ']', ',' and ' '); names of internal nodes are properly |
|---|
| 595 | ignored; and exponential notation (such as 1.0E-6) for branch lengths is |
|---|
| 596 | supported. |
|---|
| 597 | |
|---|
| 598 | |
|---|
| 599 | |
|---|
| 600 | W -- Weights |
|---|
| 601 | |
|---|
| 602 | Read user-specified column weighting information. This option requires |
|---|
| 603 | auxiliary data of the form: |
|---|
| 604 | |
|---|
| 605 | Weights list_of_weight_values [per site, one or more lines] |
|---|
| 606 | |
|---|
| 607 | Example: |
|---|
| 608 | |
|---|
| 609 | 5 114 W |
|---|
| 610 | Weights 111111111111001100000100011111100000000000000110000110000000 |
|---|
| 611 | 111101111111111111111111011100000111001011100000000011 |
|---|
| 612 | Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG |
|---|
| 613 | ... |
|---|
| 614 | |
|---|
| 615 | It is necessary that the weight values not start before the 11th character in |
|---|
| 616 | the line, or some of them will be lost. Weights from 0 to 35 are indicated by |
|---|
| 617 | the series: 0, 1, 2, 3, ..., 9, A, B, ..., Y, Z. |
|---|
| 618 | |
|---|
| 619 | PHYLIP DNAML does not support user weights with values other than 1 or 0. |
|---|
| 620 | This limit has been removed in fastDNAml to permit the use of user weights |
|---|
| 621 | as a mechanism for representing a bootstrap sample (that is, only the |
|---|
| 622 | auxiliary data lines change, not the body of the data file). |
|---|
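As a sketch of how user weights can represent a bootstrap sample, the Python code below draws sites with replacement and writes the per-site counts as a Weights block. It is an illustration only: the function names are hypothetical, the random number generator is Python's rather than the one used by the B option (so a given seed will not reproduce a B sample), and the 10-character label field and 60-sites-per-line wrap are assumptions chosen so that weights start at the 11th character.

```python
import random
from collections import Counter

# Index in this string == weight: 0-9, then A-Z for 10-35.
SYMBOLS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def bootstrap_weights(numsites, seed):
    """Sample numsites sites with replacement; return the count for each site."""
    if numsites <= 0:
        raise ValueError("numsites must be positive")
    rng = random.Random(seed)
    counts = Counter(rng.randrange(numsites) for _ in range(numsites))
    weights = [counts[i] for i in range(numsites)]
    if max(weights) > 35:
        raise ValueError("a site was drawn more than 35 times")
    return weights


def weights_lines(weights, width=60):
    """Format weights as 'Weights' auxiliary lines, data starting at column 11."""
    symbols = "".join(SYMBOLS[w] for w in weights)
    lines = []
    for start in range(0, len(symbols), width):
        label = "Weights   " if start == 0 else " " * 10
        lines.append(label + symbols[start:start + width])
    return lines
```

Because only the Weights lines change between replicates, the sequence data in the body of the input file can be reused unchanged for every bootstrap sample.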
| 623 | |
|---|
| 624 | |
|---|
| 625 | |
|---|
| 626 | Y -- Write Tree (***** Changed in version 1.1 *****) |
|---|
| 627 | |
|---|
| 628 | fastDNAml writes the final tree to an output file called treefile.PID. By |
|---|
| 629 | default the tree is in PHYLIP format. The Y option allows turning this off, |
|---|
| 630 | or changing the format of the tree. |
|---|
| 631 | |
|---|
| 632 | The Y option by itself toggles the saving of the tree, on or off. If there |
|---|
| 633 | is also an auxiliary input line of the form: |
|---|
| 634 | |
|---|
| 635 | Y number |
|---|
| 636 | |
|---|
| 637 | where number can be 1, 2, or 3, the number selects one of three tree output |
|---|
| 638 | formats: |
|---|
| 639 | |
|---|
| 640 | 1 Newick |
|---|
| 641 | 2 Prolog |
|---|
| 642 | 3 PHYLIP (default) |
|---|
| 643 | |
|---|
| 644 | Newick is the tree standard used by PAUP, MacClade, and several other |
|---|
| 645 | programs. The tree includes a comment about the analysis that the tree is |
|---|
| 646 | based upon. fastDNAml uses this comment when it reads a tree. In addition, |
|---|
| 647 | the names of the taxa are enclosed in quotation marks. Both of these |
|---|
| 648 | features of the file make it incompatible with the PHYLIP package. |
|---|
| 649 | |
|---|
| 650 | PHYLIP is the subset of the Newick tree standard used by programs in the |
|---|
| 651 | PHYLIP package. There are no comments and no quotation marks around names. |
|---|
| 652 | (If a name includes unusual characters, such as a comma, fastDNAml will put |
|---|
| 653 | it in quotation marks, making it a valid tree, but it cannot be read by the |
|---|
| 654 | PHYLIP programs.) |
|---|
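
The difference between the two formats can be sketched in Python as a filter
that drops the bracketed comment and removes quotation marks from names that
do not need them. This is an illustration only, not code from fastDNAml. It
assumes that comments are not nested, that a doubled quotation mark inside a
quoted name stands for one literal quotation mark, and that the set of
"unusual" characters is the one listed in SPECIAL below. The function name and
the sample tree are hypothetical.

```python
# Characters assumed to force a name to stay quoted (an assumption of this sketch).
SPECIAL = set("()[]':;, ")


def newick_to_phylip(tree):
    """Strip [comments] and unneeded quotation marks from a Newick tree string."""
    out = []
    i = 0
    n = len(tree)
    while i < n:
        ch = tree[i]
        if ch == "[":
            # Skip everything through the closing bracket of the comment.
            i = tree.index("]", i) + 1
        elif ch == "'":
            # Read a quoted label; '' inside the label is an escaped quote.
            i += 1
            label = []
            while i < n:
                if tree[i] == "'":
                    if i + 1 < n and tree[i + 1] == "'":
                        label.append("'")
                        i += 2
                        continue
                    i += 1
                    break
                label.append(tree[i])
                i += 1
            name = "".join(label)
            if any(c in SPECIAL for c in name):
                # Keep the quotes; the tree stays valid, but PHYLIP cannot read it.
                out.append("'" + name.replace("'", "''") + "'")
            else:
                out.append(name)
        else:
            out.append(ch)
            i += 1
    return "".join(out).strip()


if __name__ == "__main__":
    sample = "[hypothetical comment]('Sequence1':0.1,'Sequence2':0.2,'Sequence3':1.0E-6):0.0;"
    print(newick_to_phylip(sample))
```

A name containing a comma keeps its quotation marks in the output, which
matches the behavior described above: the tree remains valid but is no longer
readable by the PHYLIP programs.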
| 655 | |
|---|
| 656 | The Prolog format is very similar to the Newick format, but it is a valid prolog |
|---|
| 657 | fact that permits direct loading into some sequence analysis tools that we |
|---|
| 658 | use. The structure of the term is: |
|---|
| 659 | |
|---|
| 660 | pseudo_newick([Comment], (Subtree1, Subtree2, Subtree3): Length). |
|---|
| 661 | |
|---|
| 662 | where each subtree is either |
|---|
| 663 | |
|---|
| 664 | (Subtree1,Subtree2): Length |
|---|
| 665 | |
|---|
| 666 | or |
|---|
| 667 | |
|---|
| 668 | Label: Length |
|---|
| 669 | |
|---|
| 670 | The comment is a valid prolog term when && is defined as a unary operator. |
|---|
| 671 | Label is a prolog atom (it is a valid Newick label, with single quotation |
|---|
| 672 | marks). Length is a number. |
|---|
| 673 | |
|---|
| 674 | Because the Y auxiliary input line is optional, it cannot be the last auxiliary |
|---|
| 675 | data line. |
|---|
| 676 | |
|---|
| 677 | Examples. To turn off the saving of the tree, |
|---|
| 678 | |
|---|
| 679 | 5 114 Y |
|---|
| 680 | Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG |
|---|
| 681 | ... |
|---|
| 682 | |
|---|
| 683 | or, to change the output to the full Newick format, |
|---|
| 684 | |
|---|
| 685 | 5 114 Y T |
|---|
| 686 | Y 1 |
|---|
| 687 | T 2.0 |
|---|
| 688 | Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG |
|---|
| 689 | ... |
|---|
| 690 | |
|---|
| 691 | PHYLIP DNAML does not append the PID (process ID) to the tree file name and |
|---|
| 692 | does not support the full Newick standard or the prolog format output. |
|---|
| 693 | |
|---|
| 694 | ============================================================================= |
|---|
| 695 | |
|---|
| 696 | Acknowledgements: |
|---|
| 697 | |
|---|
| 698 | The origin and development of fastDNAml as a program to extend the use of |
|---|
| 699 | maximum likelihood phylogenetic inference to larger sets of DNA sequences |
|---|
| 700 | was encouraged by Carl Woese. Through the development and evolution of the |
|---|
| 701 | program, Joseph Felsenstein has been extremely helpful and encouraging. |
|---|
| 702 | |
|---|
| 703 | Numerous users have made suggestions and/or reported program bugs: |
|---|
| 704 | |
|---|
| 705 | Gary Nunn |
|---|
| 706 | Tom Schmidt |
|---|
| 707 | Ross Overbeek |
|---|
| 708 | Hideo Matsuda |
|---|
| 709 | Mitchell Sogin |
|---|
| 710 | Brenden Rielly |
|---|
| 711 | |
|---|
| 712 | ============================================================================= |
|---|
| 713 | |
|---|
| 714 | Examples: |
|---|
| 715 | |
|---|
| 716 | Data file with empirical frequencies (generic analysis) (notice that blank |
|---|
| 717 | lines are permitted in the data): |
|---|
| 718 | |
|---|
| 719 | 5 114 |
|---|
| 720 | Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG |
|---|
| 721 | Sequence2 ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG |
|---|
| 722 | Sequence3 ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG |
|---|
| 723 | Sequence4 ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG |
|---|
| 724 | Sequence5 ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG |
|---|
| 725 | |
|---|
| 726 | AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG |
|---|
| 727 | AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG |
|---|
| 728 | AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG |
|---|
| 729 | ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG |
|---|
| 730 | ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG |
|---|
| 731 | |
|---|
| 732 | |
|---|
| 733 | Data file with empirical frequencies and a random addition order: |
|---|
| 734 | |
|---|
| 735 | 5 114 J |
|---|
| 736 | J 137 |
|---|
| 737 | Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG |
|---|
| 738 | Sequence2 ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG |
|---|
| 739 | Sequence3 ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG |
|---|
| 740 | Sequence4 ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG |
|---|
| 741 | Sequence5 ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG |
|---|
| 742 | |
|---|
| 743 | AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG |
|---|
| 744 | AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG |
|---|
| 745 | AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG |
|---|
| 746 | ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG |
|---|
| 747 | ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG |
|---|
| 748 | |
|---|
| 749 | |
|---|
| 750 | Data file with empirical frequencies and a bootstrap resampling: |
|---|
| 751 | |
|---|
| 752 | 5 114 B |
|---|
| 753 | B 137 |
|---|
| 754 | Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG |
|---|
| 755 | Sequence2 ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG |
|---|
| 756 | Sequence3 ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG |
|---|
| 757 | Sequence4 ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG |
|---|
| 758 | Sequence5 ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG |
|---|
| 759 | |
|---|
| 760 | AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG |
|---|
| 761 | AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG |
|---|
| 762 | AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG |
|---|
| 763 | ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG |
|---|
| 764 | ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG |
|---|
| 765 | |
|---|
| 766 | |
|---|
| 767 | Data with weighting mask and rate categories: |
|---|
| 768 | |
|---|
| 769 | 5 114 W C |
|---|
| 770 | Weights   111111111111001100000100011111100000000000000110000110000000 |
|---|
| 771 |           111101111111111111111111011100000111001011100000000011 |
|---|
| 772 | C 10 0.0625 0.125 0.25 0.5 1 2 4 8 16 32 |
|---|
| 773 | Categories 5111136343678975AAA8949995566778888889AAAAAA9239898629AAAAA9 |
|---|
| 774 | 633792246624457364222574877188898132984963499AA9899975 |
|---|
| 775 | Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG |
|---|
| 776 | Sequence2 ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG |
|---|
| 777 | Sequence3 ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG |
|---|
| 778 | Sequence4 ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG |
|---|
| 779 | Sequence5 ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG |
|---|
| 780 | |
|---|
| 781 | AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG |
|---|
| 782 | AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG |
|---|
| 783 | AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG |
|---|
| 784 | ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG |
|---|
| 785 | ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG |
|---|
| 786 | |
|---|
| 787 | |
|---|
| 788 | Data with three user-specified tree branching orders: |
|---|
| 789 | |
|---|
| 790 | 5 114 U |
|---|
| 791 | Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG |
|---|
| 792 | Sequence2 ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG |
|---|
| 793 | Sequence3 ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG |
|---|
| 794 | Sequence4 ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG |
|---|
| 795 | Sequence5 ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG |
|---|
| 796 | |
|---|
| 797 | AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG |
|---|
| 798 | AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG |
|---|
| 799 | AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG |
|---|
| 800 | ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG |
|---|
| 801 | ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG |
|---|
| 802 | 3 |
|---|
| 803 | (Sequence1,(Sequence2,Sequence3),(Sequence4,Sequence5)); |
|---|
| 804 | (Sequence2,(Sequence1,Sequence3),(Sequence4,Sequence5)); |
|---|
| 805 | (Sequence3,(Sequence1,Sequence2),(Sequence4,Sequence5)); |
|---|
| 806 | |
|---|
| 807 | |
|---|
| 808 | Data with transition/transversion ratio and base frequencies to |
|---|
| 809 | simulate Jukes & Cantor model: |
|---|
| 810 | |
|---|
| 811 | 5 114 T F |
|---|
| 812 | T 0.501 |
|---|
| 813 | F 0.25 0.25 0.25 0.25 |
|---|
| 814 | Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG |
|---|
| 815 | Sequence2 ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG |
|---|
| 816 | Sequence3 ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG |
|---|
| 817 | Sequence4 ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG |
|---|
| 818 | Sequence5 ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG |
|---|
| 819 | |
|---|
| 820 | AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG |
|---|
| 821 | AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG |
|---|
| 822 | AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG |
|---|
| 823 | ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG |
|---|
| 824 | ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG |
|---|
| 825 | |
|---|
| 826 | |
|---|
| 827 | Non-interleaved data: |
|---|
| 828 | |
|---|
| 829 | 5 114 I |
|---|
| 830 | Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG |
|---|
| 831 | AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG |
|---|
| 832 | Sequence2 ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG |
|---|
| 833 | AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG |
|---|
| 834 | Sequence3 ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG |
|---|
| 835 | AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG |
|---|
| 836 | Sequence4 ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG |
|---|
| 837 | ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG |
|---|
| 838 | Sequence5 ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG |
|---|
| 839 | ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG |
|---|
| 840 | |
|---|
| 841 | |
|---|
| 842 | Non-interleaved data by editing a GenBank format (make sure that the names are |
|---|
| 843 | padded to at least ten characters with blanks): |
|---|
| 844 | |
|---|
| 845 | 5 114 I |
|---|
| 846 | Sequence1 |
|---|
| 847 | 1 ACACGGTGTC GTATCATGCT GCAGGATGCT AGACTGCGTC ANATGTTCGT ACTAACTGTG |
|---|
| 848 | 61 AGCTCGATGA TCGGTGACGT AGACTCAGGG GCCATGCCGC GAGTTTGCGA TGCG |
|---|
| 849 | Sequence2 |
|---|
| 850 | 1 ACGCGGTGTC GTGTCATGCT ACATTATGCT AGACTGCGTC GGATGCTCGT ATTGACTGCG |
|---|
| 851 | 61 AGCACGGTGA TCAATGACGT AGNCTCAGGR TCCACGCCGT GACTTTGTGA TNCG |
|---|
| 852 | Sequence3 |
|---|
| 853 | 1 ACGCGGTGCC GTGTNATGCT GCATTATGCT CGACTGCGRC GGATGCTAGT ATTGACTGCG |
|---|
| 854 | 61 AGCACGATGA CCGATGACGT AGACTGAGGG TCCGTGCCGC GACTTTGTGA TGCG |
|---|
| 855 | Sequence4 |
|---|
| 856 | 1 ACGCGCTGCC GTGTCATCCT ACACGATGCY AGACAGCGTC AGCTGCTAGT ACTGGCTGAG |
|---|
| 857 | 61 ACCTCGGTGA TTGATGACGT AGACTGCGGG TCCATGCCGC GATTTTGCGR TGCG |
|---|
| 858 | Sequence5 |
|---|
| 859 | 1 ACGCGCTGTC GTGTCATACT GCAGGATGCT AGACTGCGTC AGCTGCTAGT ACTGGCTGAG |
|---|
| 860 | 61 ACCTCGATGC TCGATGACGT AGACTGCGGG TCCATGCCGT GATTTTGCGA TGCG |
|---|
| 861 | |
|---|
| 862 | |
|---|
| 863 | Data analysis restarted from a four-taxon tree (which happens to be wrong, |
|---|
| 864 | but it will be corrected by local rearrangements after the tree is read): |
|---|
| 865 | |
|---|
| 866 | 5 114 R |
|---|
| 867 | Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG |
|---|
| 868 | Sequence2 ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG |
|---|
| 869 | Sequence3 ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG |
|---|
| 870 | Sequence4 ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG |
|---|
| 871 | Sequence5 ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG |
|---|
| 872 | |
|---|
| 873 | AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG |
|---|
| 874 | AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG |
|---|
| 875 | AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG |
|---|
| 876 | ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG |
|---|
| 877 | ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG |
|---|
| 878 | (Sequence4:0.1,Sequence2:0.1,(Sequence1:0.1,Sequence5:0.1):0.1):0.0; |
|---|
| 879 | |
|---|
| 880 | |
|---|
| 881 | Data analysis restarted from a four-taxon tree (which is wrong, and which |
|---|
| 882 | will not be corrected after the tree is read due to the suppression of all |
|---|
| 883 | rearrangements by the global 0 0 option): |
|---|
| 884 | |
|---|
| 885 | 5 114 R G T |
|---|
| 886 | G 0 0 |
|---|
| 887 | T 2.0 |
|---|
| 888 | Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG |
|---|
| 889 | Sequence2 ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG |
|---|
| 890 | Sequence3 ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG |
|---|
| 891 | Sequence4 ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG |
|---|
| 892 | Sequence5 ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG |
|---|
| 893 | |
|---|
| 894 | AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG |
|---|
| 895 | AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG |
|---|
| 896 | AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG |
|---|
| 897 | ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG |
|---|
| 898 | ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG |
|---|
| 899 | (Sequence4:0.1,Sequence2:0.1,(Sequence1:0.1,Sequence5:0.1):0.1):0.0; |
|---|