| 1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> |
|---|
| 2 | <HTML> |
|---|
| 3 | <HEAD> |
|---|
| 4 | <TITLE>main</TITLE> |
|---|
| 5 | <META NAME="description" CONTENT="dnapars"> |
|---|
| 6 | <META NAME="keywords" CONTENT="dnapars"> |
|---|
| 7 | <META NAME="resource-type" CONTENT="document"> |
|---|
| 8 | <META NAME="distribution" CONTENT="global"> |
|---|
| 9 | <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1"> |
|---|
| 10 | </HEAD> |
|---|
| 11 | <BODY BGCOLOR="#ccffff"> |
|---|
| 12 | <P> |
|---|
| 13 | <DIV ALIGN=RIGHT> |
|---|
| 14 | version 3.6 |
|---|
| 15 | </DIV> |
|---|
| 16 | <P> |
|---|
| 17 | <DIV ALIGN=CENTER> |
|---|
| 18 | <H1>DNAPARS -- DNA Parsimony Program</H1> |
|---|
| 19 | </DIV> |
|---|
| 20 | <P> |
|---|
| 21 | © Copyright 1986-2002 by The University of |
|---|
| 22 | Washington. Written by Joseph Felsenstein. Permission is granted to copy |
|---|
| 23 | this document provided that no fee is charged for it and that this copyright |
|---|
| 24 | notice is not removed. |
|---|
| 25 | <P> |
|---|
| 26 | This program carries out unrooted parsimony (analogous to Wagner |
|---|
| 27 | trees) (Eck and Dayhoff, 1966; Kluge and Farris, 1969) on DNA |
|---|
| 28 | sequences. The method of Fitch (1971) is used to count the number of |
|---|
| 29 | changes of base needed on a given tree. |
|---|
| 30 | The assumptions of this method are analogous to those of MIX: |
|---|
| 31 | <OL> |
|---|
| 32 | <LI>Each site evolves independently. |
|---|
| 33 | <LI>Different lineages evolve independently. |
|---|
| 34 | <LI>The probability of a base substitution at a given site is |
|---|
| 35 | small over the lengths of time involved in |
|---|
| 36 | a branch of the phylogeny. |
|---|
| 37 | <LI>The expected amounts of change in different branches of the phylogeny |
|---|
| 38 | do not vary by so much that two changes in a high-rate branch |
|---|
| 39 | are more probable than one change in a low-rate branch. |
|---|
| 40 | <LI>The expected amounts of change do not vary enough among sites that two |
|---|
| 41 | changes in one site are more probable than one change in another. |
|---|
| 42 | </OL> |
|---|
| 43 | <P> |
|---|
| 44 | That these are the assumptions of parsimony methods has been documented |
|---|
| 45 | in a series of papers of mine: (1973a, 1978b, 1979, 1981b, 1983b, 1988b). For |
|---|
| 46 | an opposing view arguing that the parsimony methods make no substantive |
|---|
| 47 | assumptions such as these, see the papers by Farris (1983) and Sober (1983a, |
|---|
| 48 | 1983b, 1988), but also read the exchange between Felsenstein and Sober (1986). |
|---|
| 49 | <P> |
|---|
| 50 | Change from an occupied site to a deletion is counted as one |
|---|
| 51 | change. Reversion from a deletion to an occupied site is allowed and is also |
|---|
| 52 | counted as one change. Note that this in effect assumes that a deletion |
|---|
| 53 | N bases long is N separate events. |
|---|
| 54 | <P> |
|---|
| 55 | Dnapars can handle both bifurcating and multifurcating trees. In doing its |
|---|
| 56 | search for most parsimonious trees, it adds species not only by creating new |
|---|
| 57 | forks in the middle of existing branches, but it also tries putting them at |
|---|
| 58 | the end of new branches which are added to existing forks. Thus it searches |
|---|
| 59 | among both bifurcating and multifurcating trees. If a branch in a tree |
|---|
| 60 | does not have any characters which might change in that branch in the most |
|---|
| 61 | parsimonious tree, it does not save that tree. Thus in any tree that |
|---|
| 62 | results, a branch exists only if some character has a most parsimonious |
|---|
| 63 | reconstruction that would involve change in that branch. |
|---|
| 64 | <P> |
|---|
| 65 | It also saves a number of trees tied for best (you can alter the number |
|---|
| 66 | it saves using the V option in the menu). When rearranging trees, it |
|---|
| 67 | tries rearrangements of all of the saved trees. This makes the algorithm |
|---|
| 68 | slower than earlier versions of Dnapars. |
|---|
| 69 | <P> |
|---|
| 70 | The input data is standard. The first line of the input file contains the |
|---|
| 71 | number of species and the number of sites. |
|---|
| 72 | <P> |
|---|
| 73 | Next come the species data. Each |
|---|
| 74 | sequence starts on a new line, has a ten-character species name |
|---|
| 75 | that must be blank-filled to be of that length, followed immediately |
|---|
| 76 | by the species data in the one-letter code. The sequences must either |
|---|
| 77 | be in the "interleaved" or "sequential" formats |
|---|
| 78 | described in the Molecular Sequence Programs document. The I option |
|---|
| 79 | selects between them. The sequences can have internal |
|---|
| 80 | blanks in the sequence but there must be no extra blanks at the end of the |
|---|
| 81 | terminated line. Note that a blank is not a valid symbol for a deletion. |
|---|
| 82 | <P> |
|---|
| 83 | The options are selected using an interactive menu. The menu looks like this: |
|---|
| 84 | <P> |
|---|
| 85 | <TABLE><TR><TD BGCOLOR=white> |
|---|
| 86 | <PRE> |
|---|
| 87 | |
|---|
| 88 | DNA parsimony algorithm, version 3.6a3 |
|---|
| 89 | |
|---|
| 90 | Setting for this run: |
|---|
| 91 | U Search for best tree? Yes |
|---|
| 92 | S Search option? More thorough search |
|---|
| 93 | V Number of trees to save? 100 |
|---|
| 94 | J Randomize input order of sequences? No. Use input order |
|---|
| 95 | O Outgroup root? No, use as outgroup species 1 |
|---|
| 96 | T Use Threshold parsimony? No, use ordinary parsimony |
|---|
| 97 | N Use Transversion parsimony? No, count all steps |
|---|
| 98 | W Sites weighted? No |
|---|
| 99 | M Analyze multiple data sets? No |
|---|
| 100 | I Input sequences interleaved? Yes |
|---|
| 101 | 0 Terminal type (IBM PC, ANSI, none)? (none) |
|---|
| 102 | 1 Print out the data at start of run No |
|---|
| 103 | 2 Print indications of progress of run Yes |
|---|
| 104 | 3 Print out tree Yes |
|---|
| 105 | 4 Print out steps in each site No |
|---|
| 106 | 5 Print sequences at all nodes of tree No |
|---|
| 107 | 6 Write out trees onto tree file? Yes |
|---|
| 108 | |
|---|
| 109 | Y to accept these or type the letter for one to change |
|---|
| 110 | </PRE> |
|---|
| 111 | </TD></TR></TABLE> |
|---|
| 112 | <P> |
|---|
| 113 | The user either types "Y" (followed, of course, by a carriage-return) |
|---|
| 114 | if the settings shown are to be accepted, or the letter or digit corresponding |
|---|
| 115 | to an option that is to be changed. |
|---|
| 116 | <P> |
|---|
| 117 | The N option allows you to choose transversion parsimony, which counts only |
|---|
| 118 | transversions (changes between one of the purines A or G and one of the |
|---|
| 119 | pyrimidines C or T). This setting is turned off by default. |
|---|
| 120 | <P> |
|---|
| 121 | The Weights (W) option |
|---|
| 122 | takes the weights from a file whose default name is "weights". The weights |
|---|
| 123 | follow the format described in the main documentation file, with integer |
|---|
| 124 | weights from 0 to 35 allowed by using the characters 0, 1, 2, ..., 9 and |
|---|
| 125 | A, B, ... Z. |
|---|
| 126 | <P> |
|---|
| 127 | The User tree (option U) is read from a file whose default name is <TT>intree</TT>. |
|---|
| 128 | The trees can be multifurcating. They must be preceded in the file by a |
|---|
| 129 | line giving the number of trees in the file. |
|---|
| 130 | <P> |
|---|
| 131 | The options J, O, T, M, and 0 are the usual ones. They are described in the |
|---|
| 132 | main documentation file of this package. Option I is the same as in |
|---|
| 133 | other molecular sequence programs and is described in the documentation file |
|---|
| 134 | for the sequence programs. |
|---|
| 135 | <P> |
|---|
| 136 | The M (multiple data sets option) will ask you whether you want to |
|---|
| 137 | use multiple sets of weights (from the weights file) or multiple data sets. |
|---|
| 138 | The ability to use a single data set with multiple weights means that |
|---|
| 139 | much less disk space will be used for this input data. The bootstrapping |
|---|
| 140 | and jackknifing tool Seqboot has the ability to create a weights file with |
|---|
| 141 | multiple weights. |
|---|
| 142 | <P> |
|---|
| 143 | The O (outgroup) option will have no effect if the U (user-defined tree) |
|---|
| 144 | option is in effect. |
|---|
| 145 | The T (threshold) option allows a continuum of methods |
|---|
| 146 | between parsimony and compatibility. Thresholds less than or equal to 1.0 do |
|---|
| 147 | not have any meaning and should |
|---|
| 148 | not be used: they will result in a tree dependent only on the input |
|---|
| 149 | order of species and not at all on the data! |
|---|
| 150 | <P> |
|---|
| 151 | Output is standard: if option 1 is toggled on, the data is printed out, |
|---|
| 152 | with the convention that "." means "the same as in the first species". |
|---|
| 153 | Then comes a list of equally parsimonious trees. |
|---|
| 154 | Each tree has branch lengths. These are computed using an algorithm |
|---|
| 155 | published by Hochbaum and Pathria (1997) which I first heard of from |
|---|
| 156 | Wayne Maddison who invented it independently of them. This algorithm |
|---|
| 157 | averages the number of reconstructed changes of state over all sites a |
|---|
| 158 | over all possible most parsimonious placements of the changes of state |
|---|
| 159 | among branches. Note that it does not correct in any way for multiple |
|---|
| 160 | changes that overlay each other. |
|---|
| 161 | <P> |
|---|
| 162 | If option 2 is |
|---|
| 163 | toggled on a table of the |
|---|
| 164 | number of changes of state required in each character is also |
|---|
| 165 | printed. If option 5 is toggled |
|---|
| 166 | on, a table is printed |
|---|
| 167 | out after each tree, showing for each branch whether there are known to be |
|---|
| 168 | changes in the branch, and what the states are inferred to have been at the |
|---|
| 169 | top end of the branch. This is a reconstruction of the ancestral sequences |
|---|
| 170 | in the tree. If you choose option 5, a menu item D appears which gives you |
|---|
| 171 | the opportunity to turn off dot-differencing so that complete ancestral |
|---|
| 172 | sequences are shown. If the inferred state is a "?" or one of the IUB |
|---|
| 173 | ambiguity symbols, there will be multiple |
|---|
| 174 | equally-parsimonious assignments of states; the user must work these out for |
|---|
| 175 | themselves by hand. A "?" in the reconstructed states means that in |
|---|
| 176 | addition to one or more bases, a deletion may or may not be present. If |
|---|
| 177 | option 6 is left in its default state the trees |
|---|
| 178 | found will be written to a tree file, so that they are available to be used |
|---|
| 179 | in other programs. |
|---|
| 180 | <P> |
|---|
| 181 | If the U (User Tree) option is used and more than one tree is supplied, the |
|---|
| 182 | program also performs a statistical test of each of these trees against the |
|---|
| 183 | best tree. This test, which is a version of the test proposed by |
|---|
| 184 | Alan Templeton (1983) and evaluated in a test case by me (1985a). It is |
|---|
| 185 | closely parallel to a test using log likelihood differences |
|---|
| 186 | due to Kishino and Hasegawa (1989), and |
|---|
| 187 | uses the mean and variance of |
|---|
| 188 | step differences between trees, taken across sites. If the mean |
|---|
| 189 | is more than 1.96 standard deviations different then the trees are declared |
|---|
| 190 | significantly different. The program |
|---|
| 191 | prints out a table of the steps for each tree, the differences of |
|---|
| 192 | each from the best one, the variance of that quantity as determined by |
|---|
| 193 | the step differences at individual sites, and a conclusion as to |
|---|
| 194 | whether that tree is or is not significantly worse than the best one. |
|---|
| 195 | If the U (User Tree) option is used and more than one tree is supplied, |
|---|
| 196 | and the program is not told to assume autocorrelation between the |
|---|
| 197 | rates at different sites, the |
|---|
| 198 | program also performs a statistical test of each of these trees against the |
|---|
| 199 | one with highest likelihood. If there are two user trees, this |
|---|
| 200 | is a version of the test proposed by |
|---|
| 201 | Alan Templeton (1983) and evaluated in a test case by me (1985a). It is |
|---|
| 202 | closely parallel to a test using log likelihood differences |
|---|
| 203 | due to Kishino and Hasegawa (1989) |
|---|
| 204 | It uses the mean and variance of the differences in the number |
|---|
| 205 | of steps between trees, taken across sites. If the two |
|---|
| 206 | trees' means are more than 1.96 standard deviations different, |
|---|
| 207 | then the trees are |
|---|
| 208 | declared significantly different. |
|---|
| 209 | <P> |
|---|
| 210 | If there are more than two trees, the test done is an extension of |
|---|
| 211 | the KHT test, due to Shimodaira and Hasegawa (1999). They pointed out |
|---|
| 212 | that a correction for the number of trees was necessary, and they |
|---|
| 213 | introduced a resampling method to make this correction. In the version |
|---|
| 214 | used here the variances and covariances of the sums of steps across |
|---|
| 215 | sites are computed for all pairs of trees. To test whether the |
|---|
| 216 | difference between each tree and the best one is larger than could have |
|---|
| 217 | been expected if they all had the same expected number of steps, |
|---|
| 218 | numbers of steps for all trees are sampled with these covariances and equal |
|---|
| 219 | means (Shimodaira and Hasegawa's "least favorable hypothesis"), |
|---|
| 220 | and a P value is computed from the fraction of times the difference between |
|---|
| 221 | the tree's value and the lowest number of steps exceeds that actually |
|---|
| 222 | observed. Note that this sampling needs random numbers, and so the |
|---|
| 223 | program will prompt the user for a random number seed if one has not |
|---|
| 224 | already been supplied. With the two-tree KHT test no random numbers |
|---|
| 225 | are used. |
|---|
| 226 | <P> |
|---|
| 227 | In either the KHT or the SH test the program |
|---|
| 228 | prints out a table of the number of steps for each tree, the differences of |
|---|
| 229 | each from the lowest one, the variance of that quantity as determined by |
|---|
| 230 | the differences of the numbers of steps at individual sites, |
|---|
| 231 | and a conclusion as to |
|---|
| 232 | whether that tree is or is not significantly worse than the best one. |
|---|
| 233 | <P> |
|---|
| 234 | Option 6 in the menu controls whether the tree estimated by the program |
|---|
| 235 | is written onto a tree file. The default name of this output tree file |
|---|
| 236 | is "outtree". If the U option is in effect, all the user-defined |
|---|
| 237 | trees are written to the output tree file. |
|---|
| 238 | <P> |
|---|
| 239 | The program is a straightforward relative of MIX |
|---|
| 240 | and runs reasonably quickly, especially with many sites and few species. |
|---|
| 241 | <P> |
|---|
| 242 | <HR> |
|---|
| 243 | <P> |
|---|
| 244 | <H3>TEST DATA SET</H3> |
|---|
| 245 | <P> |
|---|
| 246 | <TABLE><TR><TD BGCOLOR=white> |
|---|
| 247 | <PRE> |
|---|
| 248 | 5 13 |
|---|
| 249 | Alpha AACGUGGCCAAAU |
|---|
| 250 | Beta AAGGUCGCCAAAC |
|---|
| 251 | Gamma CAUUUCGUCACAA |
|---|
| 252 | Delta GGUAUUUCGGCCU |
|---|
| 253 | Epsilon GGGAUCUCGGCCC |
|---|
| 254 | </PRE> |
|---|
| 255 | </TD></TR></TABLE> |
|---|
| 256 | <P> |
|---|
| 257 | <HR> |
|---|
| 258 | <P> |
|---|
| 259 | <H3>CONTENTS OF OUTPUT FILE (if all numerical options are on)</H3> |
|---|
| 260 | <P> |
|---|
| 261 | <TABLE><TR><TD BGCOLOR=white> |
|---|
| 262 | <PRE> |
|---|
| 263 | |
|---|
| 264 | DNA parsimony algorithm, version 3.6a3 |
|---|
| 265 | |
|---|
| 266 | 5 species, 13 sites |
|---|
| 267 | |
|---|
| 268 | |
|---|
| 269 | Name Sequences |
|---|
| 270 | ---- --------- |
|---|
| 271 | |
|---|
| 272 | Alpha AACGUGGCCA AAU |
|---|
| 273 | Beta ..G..C.... ..C |
|---|
| 274 | Gamma C.UU.C.U.. C.A |
|---|
| 275 | Delta GGUA.UU.GG CC. |
|---|
| 276 | Epsilon GGGA.CU.GG CCC |
|---|
| 277 | |
|---|
| 278 | |
|---|
| 279 | |
|---|
| 280 | One most parsimonious tree found: |
|---|
| 281 | |
|---|
| 282 | |
|---|
| 283 | +-----Epsilon |
|---|
| 284 | +----------------------------3 |
|---|
| 285 | +------------2 +-------Delta |
|---|
| 286 | | | |
|---|
| 287 | | +----------------Gamma |
|---|
| 288 | | |
|---|
| 289 | 1----Beta |
|---|
| 290 | | |
|---|
| 291 | +---------Alpha |
|---|
| 292 | |
|---|
| 293 | |
|---|
| 294 | requires a total of 19.000 |
|---|
| 295 | |
|---|
| 296 | between and length |
|---|
| 297 | ------- --- ------ |
|---|
| 298 | 1 2 0.217949 |
|---|
| 299 | 2 3 0.487179 |
|---|
| 300 | 3 Epsilon 0.096154 |
|---|
| 301 | 3 Delta 0.134615 |
|---|
| 302 | 2 Gamma 0.275641 |
|---|
| 303 | 1 Beta 0.076923 |
|---|
| 304 | 1 Alpha 0.173077 |
|---|
| 305 | |
|---|
| 306 | steps in each site: |
|---|
| 307 | 0 1 2 3 4 5 6 7 8 9 |
|---|
| 308 | *----------------------------------------- |
|---|
| 309 | 0| 2 1 3 2 0 2 1 1 1 |
|---|
| 310 | 10| 1 1 1 3 |
|---|
| 311 | |
|---|
| 312 | From To Any Steps? State at upper node |
|---|
| 313 | ( . means same as in the node below it on tree) |
|---|
| 314 | |
|---|
| 315 | 1 AABGTCGCCA AAY |
|---|
| 316 | 1 2 yes V.KD...... C.. |
|---|
| 317 | 2 3 yes GG.A..T.GG .C. |
|---|
| 318 | 3 Epsilon maybe ..G....... ..C |
|---|
| 319 | 3 Delta yes ..T..T.... ..T |
|---|
| 320 | 2 Gamma yes C.TT...T.. ..A |
|---|
| 321 | 1 Beta maybe ..G....... ..C |
|---|
| 322 | 1 Alpha yes ..C..G.... ..T |
|---|
| 323 | |
|---|
| 324 | |
|---|
| 325 | </PRE> |
|---|
| 326 | <P> |
|---|
| 327 | </TD></TR></TABLE> |
|---|
| 328 | </BODY> |
|---|
| 329 | </HTML> |
|---|
| 330 | |
|---|