| 1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> |
|---|
| 2 | <HTML> |
|---|
| 3 | <HEAD> |
|---|
| 4 | <TITLE>restml</TITLE> |
|---|
| 5 | <META NAME="description" CONTENT="restml"> |
|---|
| 6 | <META NAME="keywords" CONTENT="restml"> |
|---|
| 7 | <META NAME="resource-type" CONTENT="document"> |
|---|
| 8 | <META NAME="distribution" CONTENT="global"> |
|---|
| 9 | <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1"> |
|---|
| 10 | </HEAD> |
|---|
| 11 | <BODY BGCOLOR="#ccffff"> |
|---|
| 12 | <DIV ALIGN=RIGHT> |
|---|
| 13 | version 3.6 |
|---|
| 14 | </DIV> |
|---|
| 15 | <P> |
|---|
| 16 | <DIV ALIGN=CENTER> |
|---|
| 17 | <H1>RESTML -- Restriction sites Maximum Likelihood program</H1> |
|---|
| 18 | </DIV> |
|---|
| 19 | <P> |
|---|
| 20 | © Copyright 1986-2000 by The University of |
|---|
| 21 | Washington. Written by Joseph Felsenstein. Permission is granted to copy |
|---|
| 22 | this document provided that no fee is charged for it and that this copyright |
|---|
| 23 | notice is not removed. |
|---|
| 24 | <P> |
|---|
| 25 | This program implements a maximum likelihood method for restriction sites |
|---|
| 26 | data (not restriction fragment data). This program is one of the slowest |
|---|
| 27 | programs in this package, and can be very tedious to run. It is |
|---|
| 28 | possible to have the program search for the maximum likelihood tree. |
|---|
| 29 | It will be more practical for some users (those that do not have |
|---|
| 30 | fast machines) to use the U (User Tree) |
|---|
| 31 | option, which takes less run time, optimizing branch lengths and computing |
|---|
| 32 | likelihoods for particular tree topologies suggested by the user. The |
|---|
| 33 | model used here is essentially identical to that used by Smouse and Li (1987) |
|---|
| 34 | who give explicit expressions for computing the likelihood for three-species |
|---|
| 35 | trees. It does not place prior probabilities on trees as they do. The present |
|---|
| 36 | program extends their approach to multiple species by |
|---|
| 37 | a technique which, while it does not give explicit expressions for likelihoods, |
|---|
| 38 | does enable their computation and the iterative improvement of branch |
|---|
| 39 | lengths. It also allows for multiple restriction enzymes. The algorithm |
|---|
| 40 | has been described in a paper (Felsenstein, 1992). Another relevant |
|---|
| 41 | paper is that of DeBry and Slade (1985). |
|---|
| 42 | <P> |
|---|
| 43 | The assumptions of the present model are: |
|---|
| 44 | <P> |
|---|
| 45 | <OL> |
|---|
| 46 | <LI>Each restriction site evolves independently. |
|---|
| 47 | <LI>Different lineages evolve independently. |
|---|
| 48 | <LI>Each site undergoes substitution at an expected rate which we specify. |
|---|
| 49 | <LI>Substitutions consist of replacement of a nucleotide by one of the |
|---|
| 50 | other three nucleotides, chosen at random. |
|---|
| 51 | </OL> |
|---|
| 52 | <P> |
|---|
| 53 | Note that if the existing base is, say, an A, the chance of it being |
|---|
| 54 | replaced by a G is 1/3, and so is the chance that it is replaced by |
|---|
| 55 | a T. This means that there can be no difference in the (expected) |
|---|
| 56 | rate of transitions and transversions. Users who are upset at this |
|---|
| 57 | might ponder the fact that a version allowing different rates of |
|---|
| 58 | transitions and transversions would run an estimated 16 times |
|---|
| 59 | slower. If it also allowed for unequal frequencies of the four bases, |
|---|
| 60 | it would run about 300,000 times slower! For the moment, until a better |
|---|
| 61 | method is available, I guess I'll stick with this one! |
|---|
| 62 | <P> |
|---|
| 63 | <H2>INPUT FORMAT AND OPTIONS</H2> |
|---|
| 64 | <P> |
|---|
| 65 | Subject to these assumptions, the program is an approximately |
|---|
| 66 | correct maximum likelihood method. The |
|---|
| 67 | input is fairly standard, with one addition. As usual the first line of the |
|---|
| 68 | file gives the number of species and the number of sites, but there is also |
|---|
| 69 | a third number, which is the number of different restriction enzymes that were |
|---|
| 70 | used to detect the restriction sites. Thus a data set with 10 species and |
|---|
| 71 | 35 different sites, representing digestion with 4 different enzymes, would |
|---|
| 72 | have the first line of the data file look like this: |
|---|
| 73 | <P> |
|---|
| 74 | <PRE> |
|---|
| 75 | 10 35 4 |
|---|
| 76 | </PRE> |
|---|
| 77 | <P> |
|---|
| 78 | The first line of the data file will also contain a letter W following |
|---|
| 79 | these numbers (and separated from them by a space) if the Weights option |
|---|
| 80 | is being used. As with all programs using the weights option, a line or lines |
|---|
| 81 | must then follow, before the data, with the weights for each site. |
|---|
| 82 | <P> |
|---|
| 83 | The site data are in standard form. Each species starts with a species name |
|---|
| 84 | whose maximum length is given by the constant "nmlngth" |
|---|
| 85 | (whose value in the |
|---|
| 86 | program as distributed is 10 characters). The name should, as usual, be padded |
|---|
| 87 | out to that length with blanks if necessary. The sites data then follows, one |
|---|
| 88 | character per site (any blanks will |
|---|
| 89 | be skipped and ignored). Like the DNA and protein sequence data, the |
|---|
| 90 | restriction sites data may be either in the "interleaved" form or the |
|---|
| 91 | "sequential" form. Note that if you are analyzing restriction sites |
|---|
| 92 | data with the programs DOLLOP or MIX or other discrete character |
|---|
| 93 | programs, at the moment those programs do not use the "aligned" or |
|---|
| 94 | "interleaved" data format. Therefore you may want to avoid that format |
|---|
| 95 | when you have restriction sites data that you will want to feed into |
|---|
| 96 | those programs. |
|---|
| 97 | <P> |
|---|
| 98 | The presence of a site is indicated by a "+" and the absence by a "-". I have |
|---|
| 99 | also allowed the use of "1" and "0" as synonyms for "+" and "-", for |
|---|
| 100 | compatibility with MIX and DOLLOP which do not allow "+" and "-". If the |
|---|
| 101 | presence of |
|---|
| 102 | the site is unknown (for example, if the DNA containing it has been deleted so |
|---|
| 103 | that one |
|---|
| 104 | does not know whether it would have contained the site) then the state "?" can |
|---|
| 105 | be used to indicate that the state of this site is unknown. |
|---|
| 106 | <P> |
|---|
| 107 | User-defined trees may follow the |
|---|
| 108 | data in the usual way. The trees must be unrooted, which means that at their |
|---|
| 109 | base they must have a trifurcation. |
|---|
| 110 | <P> |
|---|
| 111 | The options are selected by a menu, which looks like this: |
|---|
| 112 | <P> |
|---|
| 113 | <TABLE><TR><TD BGCOLOR=white> |
|---|
| 114 | <PRE> |
|---|
| 115 | |
|---|
| 116 | Restriction site Maximum Likelihood method, version 3.6 |
|---|
| 117 | |
|---|
| 118 | Settings for this run: |
|---|
| 119 | U Search for best tree? Yes |
|---|
| 120 | A Are all sites detected? No |
|---|
| 121 | S Speedier but rougher analysis? Yes |
|---|
| 122 | G Global rearrangements? No |
|---|
| 123 | J Randomize input order of sequences? No. Use input order |
|---|
| 124 | L Site length? 6 |
|---|
| 125 | O Outgroup root? No, use as outgroup species 1 |
|---|
| 126 | M Analyze multiple data sets? No |
|---|
| 127 | I Input sequences interleaved? Yes |
|---|
| 128 | 0 Terminal type (IBM PC, ANSI, none)? (none) |
|---|
| 129 | 1 Print out the data at start of run No |
|---|
| 130 | 2 Print indications of progress of run Yes |
|---|
| 131 | 3 Print out tree Yes |
|---|
| 132 | 4 Write out trees onto tree file? Yes |
|---|
| 133 | |
|---|
| 134 | Y to accept these or type the letter for one to change |
|---|
| 135 | </PRE> |
|---|
| 136 | </TD></TR></TABLE> |
|---|
| 137 | <P> |
|---|
| 138 | The U, J, O, M, and 0 options are the usual ones, described in the main |
|---|
| 139 | documentation file. The user trees for option U are read from a file whose |
|---|
| 140 | default name is <TT>intree</TT>. The I option selects between Interleaved and |
|---|
| 141 | Sequential input data formats, and is described in the documentation file for |
|---|
| 142 | the molecular sequences programs. |
|---|
| 143 | <P> |
|---|
| 144 | The G |
|---|
| 145 | (global search) option causes, after the last species is added to the tree, |
|---|
| 146 | each possible group to be removed and re-added. This improves the result, |
|---|
| 147 | since the position of every species is reconsidered. It approximately triples |
|---|
| 148 | the run-time of the program. |
|---|
| 149 | <P> |
|---|
| 150 | The two options specific to this program are the A, and L options. The L |
|---|
| 151 | (Length) option allows the user to specify the length in bases of the |
|---|
| 152 | restriction sites. Allowed values are 1 to 8 (the |
|---|
| 153 | constant "maxcutter" in file <TT>phylip.h</TT> |
|---|
| 154 | controls the maximum allowed value). At the |
|---|
| 155 | moment the program assumes that all sites have the same length (for example, |
|---|
| 156 | that all enzymes are 6-base-cutters). The default value for this parameter is |
|---|
| 157 | 6, which will be used if the L option is not invoked. A desirable future |
|---|
| 158 | development for the package would be allowing the L parameter to be different |
|---|
| 159 | for every site. It would also be desirable to allow for ambiguities in the |
|---|
| 160 | recognition site, since some enzymes recognize 2 or 4 sequences. Both of these |
|---|
| 161 | would require fairly complicated programming or else slower execution times. |
|---|
| 162 | <P> |
|---|
| 163 | The A (All) option specifies that all sites are detected, even those for which |
|---|
| 164 | all of the species have the recognition sequence absent (character state |
|---|
| 165 | "-"). The default condition is that it is assumed that such sites will not |
|---|
| 166 | occur in the data. The likelihood computed when the A option is not used is |
|---|
| 167 | the probability of the pattern of sites given that tree and conditional on the |
|---|
| 168 | pattern not being all absences. This will be realistic for most data, except |
|---|
| 169 | for cases in which the data are extracted from sites data for a larger number |
|---|
| 170 | of species, in which case some of the site positions could have all absences in |
|---|
| 171 | the subset of species. In such cases an effective way of analyzing the data |
|---|
| 172 | would be to omit those sites and not use the A option, as such positions, even |
|---|
| 173 | if not absolutely excluded, are nevertheless less likely than random to have |
|---|
| 174 | been incorporated in the data set. |
|---|
| 175 | <P> |
|---|
| 176 | The W (Weights) option, which is invoked in the input file rather than in |
|---|
| 177 | the menu, allows the user to select a subset of sites to |
|---|
| 178 | be analyzed. It is invoked in the usual way, except that only weights |
|---|
| 179 | 0 and 1 are allowed. If the W option is not used, all sites will be |
|---|
| 180 | analyzed. If the Weights option is used, there must be a W in the first |
|---|
| 181 | line of the input file. |
|---|
| 182 | <P> |
|---|
| 183 | <H2>OUTPUT FORMAT</H2> |
|---|
| 184 | <P> |
|---|
| 185 | The output starts by giving the number of species, and the number of sites. If |
|---|
| 186 | the default condition is used instead of the A option the program states that |
|---|
| 187 | it is assuming that sites absent in all species have been omitted. The value |
|---|
| 188 | of the site length (6 bases, for example) is also given. |
|---|
| 189 | <P> |
|---|
| 190 | If option 2 (print out data) has been selected, |
|---|
| 191 | there then follow the restriction site sequences, printed in |
|---|
| 192 | groups of ten sites. The trees found are printed as an unrooted |
|---|
| 193 | tree topology (possibly rooted by outgroup if so requested). The |
|---|
| 194 | internal nodes are numbered arbitrarily for the sake of |
|---|
| 195 | identification. The number of trees evaluated so far and the log |
|---|
| 196 | likelihood of the tree are also given. |
|---|
| 197 | <P> |
|---|
| 198 | A table is printed |
|---|
| 199 | showing the length of each tree segment, as well as (very) rough confidence |
|---|
| 200 | limits on the length. As with DNAML, if a confidence limit is |
|---|
| 201 | negative, this indicates that rearrangement of the tree in that region |
|---|
| 202 | is not excluded, while if both limits are positive, rearrangement is |
|---|
| 203 | still not necessarily excluded because the variance calculation on which |
|---|
| 204 | the confidence limits are based results in an underestimate, which makes |
|---|
| 205 | the confidence limits too narrow. |
|---|
| 206 | <P> |
|---|
| 207 | In addition to the confidence limits, |
|---|
| 208 | the program performs a crude Likelihood Ratio Test (LRT) for each |
|---|
| 209 | branch of the tree. The program computes the ratio of likelihoods with and |
|---|
| 210 | without this branch length forced to zero length. This done by comparing the |
|---|
| 211 | likelihoods changing only that branch length. A truly correct LRT would |
|---|
| 212 | force that branch length to zero and also allow the other branch lengths to |
|---|
| 213 | adjust to that. The result would be a likelihood ratio closer to 1. Therefore |
|---|
| 214 | the present LRT will err on the side of being too significant. |
|---|
| 215 | <P> |
|---|
| 216 | One should also |
|---|
| 217 | realize that if you are looking not at a previously-chosen branch but at all |
|---|
| 218 | branches, that you are seeing the results of multiple tests. With 20 tests, |
|---|
| 219 | one is expected to reach significance at the P = .05 level purely by |
|---|
| 220 | chance. You should therefore use a much more conservative significance level, |
|---|
| 221 | such as .05 divided by the number of tests. The significance of these tests |
|---|
| 222 | is shown by printing asterisks next to |
|---|
| 223 | the confidence interval on each branch length. It is important to keep |
|---|
| 224 | in mind that both the confidence limits and the tests |
|---|
| 225 | are very rough and approximate, and probably indicate more significance than |
|---|
| 226 | they should. Nevertheless, maximum likelihood is one of the few methods that |
|---|
| 227 | can give you any indication of its own error; most other methods simply fail to |
|---|
| 228 | warn the user that there is any error! (In fact, whole philosophical schools |
|---|
| 229 | of taxonomists exist whose main point seems to be that there isn't any |
|---|
| 230 | error, that the "most parsimonious" tree is the best tree by definition and |
|---|
| 231 | that's that). |
|---|
| 232 | <P> |
|---|
| 233 | The log likelihood printed out with the final tree can be used to perform |
|---|
| 234 | various likelihood ratio tests. Remember that testing one tree topology |
|---|
| 235 | against another is not a simple matter, because two different tree topologies |
|---|
| 236 | are not hypotheses that are |
|---|
| 237 | nested one within the other. If the trees differ by only one branch |
|---|
| 238 | swap, it seems to be conservative to test the difference between their |
|---|
| 239 | likelihoods with one degree of freedom, but other than that little is known and |
|---|
| 240 | more work on this is needed. |
|---|
| 241 | <P> |
|---|
| 242 | If the U (User Tree) option is used and more than one tree is supplied, |
|---|
| 243 | and the program is not told to assume autocorrelation between the |
|---|
| 244 | rates at different sites, the |
|---|
| 245 | program also performs a statistical test of each of these trees against the |
|---|
| 246 | one with highest likelihood. If there are two user trees, the test |
|---|
| 247 | done is one which is due to Kishino and Hasegawa (1989), a version |
|---|
| 248 | of a test originally introduced by Templeton (1983). In this |
|---|
| 249 | implementation it uses the mean and variance of |
|---|
| 250 | log-likelihood differences between trees, taken across sites. If the two |
|---|
| 251 | trees' means are more than 1.96 standard deviations different |
|---|
| 252 | then the trees are |
|---|
| 253 | declared significantly different. This use of the empirical variance of |
|---|
| 254 | log-likelihood differences is more robust and nonparametric than the |
|---|
| 255 | classical likelihood ratio test, and may to some extent compensate for the |
|---|
| 256 | any lack of realism in the model underlying this program. |
|---|
| 257 | <P> |
|---|
| 258 | If there are more than two trees, the test done is an extension of |
|---|
| 259 | the KHT test, due to Shimodaira and Hasegawa (1999). They pointed out |
|---|
| 260 | that a correction for the number of trees was necessary, and they |
|---|
| 261 | introduced a resampling method to make this correction. In the version |
|---|
| 262 | used here the variances and covariances of the sum of log likelihoods across |
|---|
| 263 | sites are computed for all pairs of trees. To test whether the |
|---|
| 264 | difference between each tree and the best one is larger than could have |
|---|
| 265 | been expected if they all had the same expected log-likelihood, |
|---|
| 266 | log-likelihoods for all trees are sampled with these covariances and equal |
|---|
| 267 | means (Shimodaira and Hasegawa's "least favorable hypothesis"), |
|---|
| 268 | and a P value is computed from the fraction of times the difference between |
|---|
| 269 | the tree's value and the highest log-likelihood exceeds that actually |
|---|
| 270 | observed. Note that this sampling needs random numbers, and so the |
|---|
| 271 | program will prompt the user for a random number seed if one has not |
|---|
| 272 | already been supplied. With the two-tree KHT test no random numbers |
|---|
| 273 | are used. |
|---|
| 274 | <P> |
|---|
| 275 | In either the KHT or the SH test the program |
|---|
| 276 | prints out a table of the log-likelihoods of each tree, the differences of |
|---|
| 277 | each from the highest one, the variance of that quantity as determined by |
|---|
| 278 | the log-likelihood differences at individual sites, and a conclusion as to |
|---|
| 279 | whether that tree is or is not significantly worse than the best one. |
|---|
| 280 | <P> |
|---|
| 281 | The branch lengths printed out are scaled in terms of expected numbers of |
|---|
| 282 | base substitutions, not counting replacements of a base by itself. Of course, |
|---|
| 283 | when a branch is twice as long this does not mean that there will be twice as |
|---|
| 284 | much net change expected along it, since some of the changes occur in the same |
|---|
| 285 | site and overlie or even reverse each other. Confidence limits on the branch |
|---|
| 286 | lengths are also given. Of course a negative value of the branch length is |
|---|
| 287 | meaningless, and a confidence limit overlapping zero simply means that the |
|---|
| 288 | branch length is not necessarily significantly different from zero. Because of |
|---|
| 289 | limitations of the numerical algorithm, branch length estimates of zero will |
|---|
| 290 | often print out as small numbers such as 0.00001. If you see a branch length |
|---|
| 291 | that small, it is really estimated to be of zero length. |
|---|
| 292 | <P> |
|---|
| 293 | Another possible source of confusion is the existence of negative values for |
|---|
| 294 | the log likelihood. This is not really a problem; the log likelihood is not a |
|---|
| 295 | probability but the logarithm of a probability, and since probabilities never |
|---|
| 296 | exceed 1.0 this logarithm will typically be negative. The log likelihood is |
|---|
| 297 | maximized by being made more positive: -30.23 is worse than -29.14. The log |
|---|
| 298 | likelihood will not always be negative since a combinatorial constant has been |
|---|
| 299 | left out of the expression for the likelihood. This does not affect the tree |
|---|
| 300 | found or the likelihood ratios (or log likelihood differences) between trees. |
|---|
| 301 | <P> |
|---|
| 302 | <H2>THE ALGORITHM</H2> |
|---|
| 303 | <P> |
|---|
| 304 | The program uses a Newton-Raphson algorithm to update one branch length at a |
|---|
| 305 | time. This is faster than the EM algorithm which was described in my |
|---|
| 306 | paper on restriction sites maximum likelihood (Felsenstein, 1992). The |
|---|
| 307 | likelihood that is being |
|---|
| 308 | maximized is the same one used by Smouse and Li (1987) extended for multiple |
|---|
| 309 | species. |
|---|
| 310 | moving down on the likelihood surface. You may have to "tune" the value of |
|---|
| 311 | extrapol to suit your data. |
|---|
| 312 | <P> |
|---|
| 313 | <H2>PROGRAM CONSTANTS</H2> |
|---|
| 314 | <P> |
|---|
| 315 | The constants include "maxcutter" (set in <TT>phylip.h</TT>), |
|---|
| 316 | the maximum length of an enzyme |
|---|
| 317 | recognition site. The memory used by the program will be approximately |
|---|
| 318 | proportional to this value, which is 8 in the distribution copy. |
|---|
| 319 | The program also uses constants |
|---|
| 320 | "iterations" and "smoothings", and decreasing "epsilon". Reducing |
|---|
| 321 | "iterations" and "smoothings" or increasing "epsilon" |
|---|
| 322 | will result in faster execution but a worse result. These values will |
|---|
| 323 | not usually have to be changed. |
|---|
| 324 | <P> |
|---|
| 325 | The program spends most of its time doing real arithmetic. The algorithm, with |
|---|
| 326 | separate and independent computations |
|---|
| 327 | occurring at each site, lends itself readily to parallel processing. |
|---|
| 328 | <P> |
|---|
| 329 | A feature of the algorithm is that it saves time by recognizing sites at which |
|---|
| 330 | the pattern of presence/absence is the same, and does that computation only |
|---|
| 331 | once. Thus if we have only four species but a large number of sites, there are |
|---|
| 332 | only about (ignoring ambiguous bases) 16 different patterns of presence/absence |
|---|
| 333 | (2 x 2 x 2 x 2) that can occur. The program automatically counts |
|---|
| 334 | occurrences of each and does the computation for each pattern only once, so |
|---|
| 335 | that it only needs to do as much computation as would be needed with at most |
|---|
| 336 | 16 sites, even though the number of sites is actually much larger. Thus |
|---|
| 337 | the program will run very effectively with few species and many sites. |
|---|
| 338 | <P> |
|---|
| 339 | <H2>PAST AND FUTURE OF THE PROGRAM</H2> |
|---|
| 340 | <P> |
|---|
| 341 | This program was developed by modifying DNAML version 3.1 and also adding |
|---|
| 342 | some of the modifications that were added to DNAML version 3.2, with which |
|---|
| 343 | it shares many of its data structures and much of its strategy. Version |
|---|
| 344 | 3.6 changed from EM iterations of branch lengths, which involved arbitrary |
|---|
| 345 | extrapolation factors, to the Newton-Raphson algorithm, which improved the |
|---|
| 346 | speed of the program (though only from "very slow" to "slow"). |
|---|
| 347 | <P> |
|---|
| 348 | There are a number of obvious directions in which the program needs to be |
|---|
| 349 | modified in the future. Extension to allow for different rates of transition |
|---|
| 350 | and transversion is straightforward, but would slow down the program |
|---|
| 351 | considerably, as I have mentioned above. I have not included in the program |
|---|
| 352 | any provision for saving and printing out |
|---|
| 353 | multiple trees tied for highest likelihood, in part because an exact tie is |
|---|
| 354 | unlikely. |
|---|
| 355 | <P> |
|---|
| 356 | <HR> |
|---|
| 357 | <P> |
|---|
| 358 | <H3>TEST DATA SET</H3> |
|---|
| 359 | <P> |
|---|
| 360 | <TABLE><TR><TD BGCOLOR=white> |
|---|
| 361 | <PRE> |
|---|
| 362 | 5 13 2 |
|---|
| 363 | Alpha ++-+-++--+++- |
|---|
| 364 | Beta ++++--+--+++- |
|---|
| 365 | Gamma -+--+-++-+-++ |
|---|
| 366 | Delta ++-+----++--- |
|---|
| 367 | Epsilon ++++----++--- |
|---|
| 368 | </PRE> |
|---|
| 369 | </TD></TR></TABLE> |
|---|
| 370 | <P> |
|---|
| 371 | <HR> |
|---|
| 372 | <P> |
|---|
| 373 | <H3>CONTENTS OF OUTPUT FILE (if all numerical options are on)</H3> |
|---|
| 374 | <P> |
|---|
| 375 | <TABLE><TR><TD BGCOLOR=white> |
|---|
| 376 | <PRE> |
|---|
| 377 | |
|---|
| 378 | Restriction site Maximum Likelihood method, version 3.6 |
|---|
| 379 | |
|---|
| 380 | 5 Species, 13 Sites, 2 Enzymes |
|---|
| 381 | |
|---|
| 382 | Recognition sequences all 6 bases long |
|---|
| 383 | |
|---|
| 384 | Sites absent from all species are assumed to have been omitted |
|---|
| 385 | |
|---|
| 386 | |
|---|
| 387 | Name Sites |
|---|
| 388 | ---- ----- |
|---|
| 389 | |
|---|
| 390 | Alpha ++-+-++--+ ++- |
|---|
| 391 | Beta ++++--+--+ ++- |
|---|
| 392 | Gamma -+--+-++-+ -++ |
|---|
| 393 | Delta ++-+----++ --- |
|---|
| 394 | Epsilon ++++----++ --- |
|---|
| 395 | |
|---|
| 396 | |
|---|
| 397 | |
|---|
| 398 | |
|---|
| 399 | |
|---|
| 400 | +----Gamma |
|---|
| 401 | | |
|---|
| 402 | | +Epsilon |
|---|
| 403 | | +--3 |
|---|
| 404 | 1--2 +Delta |
|---|
| 405 | | | |
|---|
| 406 | | +Beta |
|---|
| 407 | | |
|---|
| 408 | +Alpha |
|---|
| 409 | |
|---|
| 410 | |
|---|
| 411 | remember: this is an unrooted tree! |
|---|
| 412 | |
|---|
| 413 | Ln Likelihood = -40.34358 |
|---|
| 414 | |
|---|
| 415 | |
|---|
| 416 | Between And Length Approx. Confidence Limits |
|---|
| 417 | ------- --- ------ ------- ---------- ------ |
|---|
| 418 | 1 Gamma 0.10813 ( 0.01154, 0.21901) ** |
|---|
| 419 | 1 2 0.01156 ( zero, 0.04578) |
|---|
| 420 | 2 3 0.05885 ( zero, 0.12697) ** |
|---|
| 421 | 3 Epsilon 0.00100 ( zero, 0.00617) |
|---|
| 422 | 3 Delta 0.01460 ( zero, 0.05036) |
|---|
| 423 | 2 Beta 0.00100 ( zero, infinity) |
|---|
| 424 | 1 Alpha 0.01310 ( zero, 0.04806) |
|---|
| 425 | |
|---|
| 426 | * = significantly positive, P < 0.05 |
|---|
| 427 | ** = significantly positive, P < 0.01 |
|---|
| 428 | |
|---|
| 429 | |
|---|
| 430 | </PRE> |
|---|
| 431 | </TD></TR></TABLE> |
|---|
| 432 | </BODY> |
|---|
| 433 | </HTML> |
|---|