source: trunk/GDE/PHYLIP/doc/promlk.html

Last change on this file was 2176, checked in by westram, 21 years ago

* empty log message *

  • Property svn:eol-style set to native
  • Property svn:keywords set to Author Date Id Revision
File size: 29.8 KB
Line 
1<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
2<HTML>
3<HEAD>
4<TITLE>dnamlk</TITLE>
5<META NAME="description" CONTENT="dnamlk">
6<META NAME="keywords" CONTENT="dnamlk">
7<META NAME="resource-type" CONTENT="document">
8<META NAME="distribution" CONTENT="global">
9<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
10</HEAD>
11<BODY BGCOLOR="#ccffff">
12<DIV ALIGN=RIGHT>
13version 3.6
14</DIV>
15<P>
16<DIV ALIGN=CENTER>
17<H1>ProMLK -- Protein maximum likelihood program<BR>with molecular clock</H1>
18</DIV>
19<P>
20&#169; Copyright 2000-2002 by the University of
21Washington.  Written by Joseph Felsenstein.  Permission is granted to copy
22this document provided that no fee is charged for it and that this copyright
23notice is not removed.
24<P>
25This program implements the maximum likelihood method for protein amino
26acid sequences under the constraint that the trees estimated must be
27consistent with a molecular clock.  The molecular clock is the
28assumption that the tips of the tree are all equidistant, in branch
29length, from its root.  This program is indirectly related to PROML.
30It uses the Dayhoff probability model of change between amino acids.
31Its algorithmic details are not yet published, but many of them are
32similar to DNAMLK.
33<P>
34The assumptions of the model are:
35<OL>
36<LI>Each position in the sequence evolves independently.
37<LI>Different lineages evolve independently.
38<LI>Each position undergoes substitution at an expected rate which is
39chosen from a series of rates (each with a probability of occurrence)
40which we specify.
41<LI>All relevant positions are included in the sequence, not just those that
42have changed or those that are "phylogenetically informative".
43<LI>The probabilities of change between amino acids are given by the
44model of Dayhoff (Dayhoff and Eck, 1968; Dayhoff et. al., 1979).
45</OL>
46<P>
47Note the assumption that we are looking at all positions, including those
48that have not changed at all.  It is important not to restrict attention
49to some positions based on whether or not they have changed; doing that
50would bias branch lengths by making them too long, and that in turn
51would cause the method to misinterpret the meaning of those positions that
52had changed.
53<P>
54This program uses a Hidden Markov Model (HMM)
55method of inferring different rates of evolution at different amino acid
56positions.  This
57was described in a paper by me and Gary Churchill (1996).  It allows us to
58specify to the program that there will be
59a number of different possible evolutionary rates, what the prior
60probabilities of occurrence of each is, and what the average length of a
61patch of positions all having the same rate.  The rates can also be chosen
62by the program to approximate a Gamma distribution of rates, or a
63Gamma distribution plus a class of invariant positions.  The program computes the
64likelihood by summing it over all possible assignments of rates to positions,
65weighting each by its prior probability of occurrence.
66<P>
67For example, if we have used the C and A options (described below) to specify
68that there are three possible rates of evolution,  1.0, 2.4, and 0.0,
69that the prior probabilities of a position having these rates are 0.4, 0.3, and
700.3, and that the average patch length (number of consecutive positions
71with the same rate) is 2.0, the program will sum the likelihood over
72all possibilities, but giving less weight to those that (say) assign all
73positions to rate 2.4, or that fail to have consecutive positions that have the
74same rate.
75<P>
76The Hidden Markov Model framework for rate variation among positions
77was independently developed by Yang (1993, 1994, 1995).  We have
78implemented a general scheme for a Hidden Markov Model of
79rates; we allow the rates and their prior probabilities to be specified
80arbitrarily  by the user, or by a discrete approximation to a Gamma
81distribution of rates (Yang, 1995), or by a mixture of a Gamma
82distribution and a class of invariant positions.
83<P>
84This feature effectively removes the artificial assumption that all positions
85have the same rate, and also means that we need not know in advance the
86identities of the positions that have a particular rate of evolution.
87<P>
88Another layer of rate variation also is available.  The user can assign
89categories of rates to each positions (for example, we might want
90amino acid positions in the active site of a protein to change more slowly
91than other positions.  This is done with the categories input file and the
92C option.  We then specify (using the menu) the relative rates of evolution of
93amino acid positions
94in the different categories.  For example, we might specify that positions
95in  the active site
96evolve at relative rates of 0.2 compared to 1.0 at other positions.  If we
97are assuming that a particular position maintains a cysteine bridge to another,
98we may want to put it in a category of positions (including perhaps the
99initial position of the protein sequence which maintains methionine) which
100changes at a rate of 0.0.
101<P>
102If both user-assigned rate categories and Hidden Markov Model rates
103are allowed, the program assumes that the
104actual rate at a position is the product of the user-assigned category rate
105and the Hidden Markov Model regional rate.  (This may not always make
106perfect biological sense: it would be more natural to assume some upper
107bound to the rate, as we have discussed in the Felsenstein and Churchill
108paper).  Nevertheless you may want to use both types of rate variation.
109<P>
110<H2>INPUT FORMAT AND OPTIONS</H2>
111<P>
112Subject to these assumptions, the program is a
113correct maximum likelihood method.  The
114input is fairly standard, with one addition.  As usual the first line of the
115file gives the number of species and the number of amino acid positions.
116<P>
117Next come the species data.  Each
118sequence starts on a new line, has a ten-character species name
119that must be blank-filled to be of that length, followed immediately
120by the species data in the one-letter amino acid code.  The sequences must
121either be in the "interleaved" or "sequential" formats
122described in the Molecular Sequence Programs document.  The I option
123selects between them.  The sequences can have internal
124blanks in the sequence but there must be no extra blanks at the end of the
125terminated line.  Note that a blank is not a valid symbol for a deletion.
126<P>
127The options are selected using an interactive menu.  The menu looks like this:
128<P>
129<TABLE><TR><TD BGCOLOR=white>
130<PRE>
131
132Amino acid sequence
133   Maximum Likelihood method with molecular clock, version 3.6a3
134
135Settings for this run:
136  U                 Search for best tree?  Yes
137  P   JTT or PAM amino acid change model?  Jones-Taylor-Thornton model
138  C   One category of substitution rates?  Yes
139  R           Rate variation among sites?  constant rate of change
140  G                Global rearrangements?  No
141  W                       Sites weighted?  No
142  J   Randomize input order of sequences?  No. Use input order
143  M           Analyze multiple data sets?  No
144  I          Input sequences interleaved?  Yes
145  0   Terminal type (IBM PC, ANSI, none)?  (none)
146  1    Print out the data at start of run  No
147  2  Print indications of progress of run  Yes
148  3                        Print out tree  Yes
149  4       Write out trees onto tree file?  Yes
150  5   Reconstruct hypothetical sequences?  No
151
152Are these settings correct? (type Y or the letter for one to change)
153
154</PRE>
155</TD></TR></TABLE>
156<P>
157The user either types "Y" (followed, of course, by a carriage-return)
158if the settings shown are to be accepted, or the letter or digit corresponding
159to an option that is to be changed.
160<P>
161The options U, W, J, O, M, and 0 are the usual ones.  They are described in the
162main documentation file of this package.  Option I is the same as in
163other molecular sequence programs and is described in the documentation file
164for the sequence programs.
165<P>
166The R (Hidden Markov Model rates) option allows the user to
167approximate a Gamma distribution of rates among positions, or a
168Gamma distribution plus a class of invariant positions, or to specify how
169many categories of
170substitution rates there will be in a Hidden Markov Model of rate
171variation, and what are the rates and probabilities
172for each.   By repeatedly selecting the R option one toggles among
173no rate variation, the Gamma, Gamma+I, and general HMM possibilities.
174<P>
175If you choose Gamma or Gamma+I the program will ask how many rate
176categories you want.  If you have chosen Gamma+I, keep in mind that
177one rate category will be set aside for the invariant class and only
178the remaining ones used to approximate the Gamma distribution.
179For the approximation we do not use the quantile method of Yang (1995)
180but instead use a quadrature method using generalized Laguerre
181polynomials.  This should give a good approximation to the Gamma
182distribution with as few as 5 or 6 categories.
183<P>
184In the Gamma and Gamma+I cases, the user will be
185asked to supply the coefficient of variation of the rate of substitution
186among positions.  This is different from the parameters used by Nei and Jin
187(1990) but
188related to them:  their parameter <EM>a</EM> is also known as "alpha",
189the shape parameter of the Gamma distribution.  It is
190related to the coefficient of variation by
191<P>
192&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;CV = 1 / a<SUP>1/2</SUP>
193<P>
194or
195<P>
196&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;a = 1 / (CV)<SUP>2</SUP>
197<P>
198(their parameter <EM>b</EM> is absorbed here by the requirement that time is scaled so
199that the mean rate of evolution is 1 per unit time, which means that <EM>a = b</EM>).
200As we consider cases in which the rates are less variable we should set <EM>a</EM>
201larger and larger, as <EM>CV</EM> gets smaller and smaller.
202<P>
203If the user instead chooses the general Hidden Markov Model option,
204they are first asked how many HMM rate categories there
205will be (for the moment there is an upper limit of 9,
206which should not be restrictive).  Then
207the program asks for the rates for each category.  These rates are
208only meaningful relative to each other, so that rates 1.0, 2.0, and 2.4
209have the exact same effect as rates 2.0, 4.0, and 4.8.  Note that an
210HMM rate category
211can have rate of change 0, so that this allows us to take into account that
212there may be a category of amino acid positions that are invariant.  Note that
213the run time
214of the program will be proportional to the number of HMM rate categories:
215twice as
216many categories means twice as long a run.  Finally the program will ask for
217the probabilities of a random amino acid position falling into each of these
218regional rate categories.  These probabilities must be nonnegative and sum to
2191.  Default
220for the program is one category, with rate 1.0 and probability 1.0 (actually
221the rate does not matter in that case).
222<P>
223If more than one HMM rate category is specified, then another
224option, A, becomes
225visible in the menu.  This allows us to specify that we want to assume that
226positions that have the same HMM rate category are expected to be clustered
227so that there is autocorrelation of rates.  The
228program asks for the value of the average patch length.  This is an expected
229length of patches that have the same rate.  If it is 1, the rates of
230successive positions will be independent.  If it is, say, 10.25, then the
231chance of change to a new rate will be 1/10.25 after every position.  However
232the "new rate" is randomly drawn from the mix of rates, and hence could
233even be the same.  So the actual observed length of patches with the same
234rate will be a bit larger than 10.25.  Note below that if you choose
235multiple patches, there will be an estimate in the output file as to
236which combination of rate categories contributed most to the likelihood.
237<P>
238Note that the autocorrelation scheme we use is somewhat different
239from Yang's (1995) autocorrelated Gamma distribution.  I am unsure
240whether this difference is of any importance -- our scheme is chosen
241for the ease with which it can be implemented.
242<P>
243The C option allows user-defined rate categories.  The user is prompted
244for the number of user-defined rates, and for the rates themselves,
245which cannot be negative but can be zero.  These numbers, which must be
246nonnegative (some could be 0),
247are defined relative to each other, so that if rates for three categories
248are set to 1 : 3 : 2.5 this would have the same meaning as setting them
249to 2 : 6 : 5.
250The assignment of rates to amino acid positions
251is then made by reading a file whose default name is "categories".
252It should contain a string of digits 1 through 9.  A new line or a blank
253can occur after any character in this string.  Thus the categories file
254might look like this:
255<P>
256<PRE>
257122231111122411155
2581155333333444
259</PRE>
260<P>
261With the current options R, A, and C the program has a good
262ability to infer different rates at different positions and estimate
263phylogenies under a more realistic model.  Note that Likelihood Ratio
264Tests can be used to test whether one combination of rates is
265significantly better than another, provided one rate scheme represents
266a restriction of another with fewer parameters.  The number of parameters
267needed for rate variation is the number of regional rate categories, plus
268the number of user-defined rate categories less 2, plus one if the
269regional rate categories have a nonzero autocorrelation.
270<P>
271The G (global search) option causes, after the last species is added to
272the tree, each possible group to be removed and re-added.  This improves the
273result, since the position of every species is reconsidered.  It
274approximately triples the run-time of the program.
275<P>
276The User tree (option U) is read from a file whose default name is
277<TT>intree</TT>.  The trees can be multifurcating.  This allows us to test the
278hypothesis that a given branch has zero length.
279<P>
280If the U (user tree) option is chosen another option appears in
281the menu, the L option.  If it is selected,
282it signals the program that it
283should take any branch lengths that are in the user tree and
284simply evaluate the likelihood of that tree, without further altering
285those branch lengths.   In the case of a clock, if some branches have lengths
286and others do not, the program does not estimate the lengths of those that
287do not have lengths given in the user tree.  If any of the branches
288do not have lengths, the program re-estimates the lengths of all of them.
289This is done because estimating some and not others is hard in the
290case of a clock.
291<P>
292The W (Weights) option is invoked in the usual way, with only weights 0
293and 1 allowed.  It selects a set of positions to be analyzed, ignoring the
294others.  The positions selected are those with weight 1.  If the W option is
295not invoked, all positions are analyzed.
296The Weights (W) option
297takes the weights from a file whose default name is "weights".  The weights
298follow the format described in the main documentation file.
299<P>
300The M (multiple data sets) option will ask you whether you want to
301use multiple sets of weights (from the weights file) or multiple data sets
302from the input file.
303The ability to use a single data set with multiple weights means that
304much less disk space will be used for this input data.  The bootstrapping
305and jackknifing tool Seqboot has the ability to create a weights file with
306multiple weights.  Note also that when we use multiple weights for
307bootstrapping we can also then maintain different rate categories for
308different positions in a meaningful way.  You should not use the multiple
309data sets option without using multiple weights, you should not at the
310same time use the user-defined rate categories option (option C).
311<P>
312The algorithm used for searching among trees is faster than it was in
313version 3.5, thanks to using a technique invented by David Swofford
314and J. S. Rogers.  This involves not iterating most branch lengths on most
315trees when searching among tree topologies,  This is of necessity a
316"quick-and-dirty" search but it saves much time.
317<P>
318<H2>OUTPUT FORMAT</H2>
319<P>
320The output starts by giving the number of species, the number of amino
321acid positions.
322<P>
323If the R (HMM rates) option is used a table of the relative rates of
324expected substitution at each category of positions is printed, as well
325as the probabilities of each of those rates.
326<P>
327There then follow the data sequences, if the user has selected the menu
328option to print them out, with the base sequences printed in
329groups of ten amino acids.  The
330trees found are printed as a rooted
331tree topology.  The
332internal nodes are numbered arbitrarily for the sake of
333identification.  The number of trees evaluated so far and the log
334likelihood of the tree are also given.  The branch lengths in the diagram are
335roughly proportional to the estimated branch lengths, except that very short
336branches are printed out at least three characters in length so that the
337connections can be seen.
338<P>
339A table is printed
340showing the length of each tree segment, and the time (in units of
341expected amino acid substitutions per position) of each fork in the tree,
342measured from the root of the tree.  I have not attempted in include
343code for approximate confidence limits on branch points, as I have done
344for branch lengths in PROML, both because of the extreme crudeness of
345that test, and because the variation of times for different forks would be
346highly correlated.
347<P>
348The log likelihood printed out with the final tree can be used to perform
349various likelihood ratio tests.  One can, for example, compare runs with
350different values of the relative rate of change in the active site and in
351the rest of the protein to determine
352which value is the maximum likelihood estimate, and what is the allowable range
353of values (using a likelihood ratio test, which you will find described in
354mathematical statistics books).  One could also estimate the base frequencies
355in the same way.  Both of these, particularly the latter, require multiple runs
356of the program to evaluate different possible values, and this might get
357expensive.
358<P>
359This program makes possible a (reasonably) legitimate
360statistical test of the molecular clock.  To do such a test, run PROML
361and PROMLK on the same data.  If the trees obtained are of the same
362topology (when considered as unrooted), it is legitimate to compare
363their likelihoods by the likelihood ratio test.  In PROML the likelihood
364has been computed by estimating 2n-3 branch lengths, if their are n tips
365on the tree.  In PROMLK it has been computed by estimating n-1 branching
366times (in effect, n-1 branch lengths).  The difference in the number of
367parameters is (2n-3)-(n-1) =  n-2.  To perform the test take the
368difference in log likelihoods between the two runs (PROML should be the
369higher of the two, barring numerical iteration difficulties) and double
370it.  Look this up on a chi-square distribution with n-2 degrees of
371freedom.  If the result is significant, the log likelihood has been
372significantly increased by allowing all 2n-3 branch lengths to be
373estimated instead of just n-1, and molecular clock may be rejected.
374<P>
375If the U (User Tree) option is used and more than one tree is supplied,
376and the program is not told to assume autocorrelation between the
377rates at different amino acid positions, the
378program also performs a statistical test of each of these trees against the
379one with highest likelihood.   If there are two user trees, the test
380done is one which is due to Kishino and Hasegawa (1989), a version
381of a test originally introduced by Templeton (1983).  In this
382implementation it uses the mean and variance of
383log-likelihood differences between trees, taken across amino acid
384positions.  If the two
385trees' means are more than 1.96 standard deviations different
386then the trees are
387declared significantly different.  This use of the empirical variance of
388log-likelihood differences is more robust and nonparametric than the
389classical likelihood ratio test, and may to some extent compensate for the
390any lack of realism in the model underlying this program.
391<P>
392If there are more than two trees, the test done is an extension of
393the KHT test, due to Shimodaira and Hasegawa (1999).  They pointed out
394that a correction for the number of trees was necessary, and they
395introduced a resampling method to make this correction.  In the version
396used here the variances and covariances of the sum of log likelihoods across
397amino acid positions are computed for all pairs of trees.  To test whether the
398difference between each tree and the best one is larger than could have
399been expected if they all had the same expected log-likelihood,
400log-likelihoods for all trees are sampled with these covariances and equal
401means (Shimodaira and Hasegawa's "least favorable hypothesis"),
402and a P value is computed from the fraction of times the difference between
403the tree's value and the highest log-likelihood exceeds that actually
404observed.  Note that this sampling needs random numbers, and so the
405program will prompt the user for a random number seed if one has not
406already been supplied.  With the two-tree KHT test no random numbers
407are used.
408<P>
409In either the KHT or the SH test the program
410prints out a table of the log-likelihoods of each tree, the differences of
411each from the highest one, the variance of that quantity as determined by
412the log-likelihood differences at individual sites, and a conclusion as to
413whether that tree is or is not significantly worse than the best one.  However
414the test is not available if we assume that there
415is autocorrelation of rates at neighboring positions (option A) and is not
416done in those cases.
417<P>
418The branch lengths printed out are scaled in terms of expected numbers of
419amino acid substitutions, scaled so that the average rate of
420change, averaged over all the positions analyzed, is set to 1.0.
421if there are multiple categories of positions.  This means that whether or not
422there are multiple categories of positions, the expected fraction of change
423for very small branches is equal to the branch length.  Of course,
424when a branch is twice as
425long this does not mean that there will be twice as much net change expected
426along it, since some of the changes occur in the same position and overlie or
427even reverse each
428other.  The branch length estimates here are in terms of the expected
429underlying numbers of changes.  That means that a branch of length 0.26
430is 26 times as long as one which would show a 1% difference between
431the amino acid sequences at the beginning and end of the branch.  But we
432would not expect the sequences at the beginning and end of the branch to be
43326% different, as there would be some overlaying of changes.
434<P>
435Because of limitations of the numerical
436algorithm, branch length estimates of zero will often print out as small
437numbers such as 0.00001.  If you see a branch length that small, it is really
438estimated to be of zero length.
439<P>
440Another possible source of confusion is the existence of negative values for
441the log likelihood.  This is not really a problem; the log likelihood is not a
442probability but the logarithm of a probability.  When it is
443negative it simply means that the corresponding probability is less
444than one (since we are seeing its logarithm).  The log likelihood is
445maximized by being made more positive: -30.23 is worse than -29.14.
446<P>
447At the end of the output, if the R option is in effect with multiple
448HMM rates, the program will print a list of what amino acid position
449categories contributed the most to the final likelihood.  This combination of
450HMM rate categories need not have contributed a majority of the likelihood,
451just a plurality.  Still, it will be helpful as a view of where the
452program infers that the higher and lower rates are.  Note that the
453use in this calculations of the prior probabilities of different rates,
454and the average patch length, gives this inference a "smoothed"
455appearance: some other combination of rates might make a greater
456contribution to the likelihood, but be discounted because it conflicts
457with this prior information.  See the example output below to see
458what this printout of rate categories looks like.
459A second list will also be printed out, showing for each position which
460rate accounted for the highest fraction of the likelihood.  If the fraction
461of the likelihood accounted for is less than 95%, a dot is printed instead.
462<P>
463Option 3 in the menu controls whether the tree is printed out into
464the output file.  This is on by default, and usually you will want to
465leave it this way.  However for runs with multiple data sets such as
466bootstrapping runs, you will primarily be interested in the trees
467which are written onto the output tree file, rather than the trees
468printed on the output file.  To keep the output file from becoming too
469large, it may be wisest to use option 3 to prevent trees being
470printed onto the output file.
471<P>
472Option 4 in the menu controls whether the tree estimated by the program
473is written onto a tree file.  The default name of this output tree file
474is "outtree".  If the U option is in effect, all the user-defined
475trees are written to the output tree file.
476<P>
477Option 5 in the menu controls whether ancestral states are estimated
478at each node in the tree.  If it is in effect, a table of ancestral
479sequences is printed out (including the sequences in the tip species which
480are the input sequences).
481The symbol printed out is for the amino acid which accounts for the
482largest fraction of the likelihood at that position.
483In that table, if a position has an amino acid which
484accounts for more than 95% of the likelihood, its symbol printed in capital
485letters (W rather than w).  One limitation of the current
486version of the program is that when there are multiple HMM rates
487(option R) the reconstructed amino acids are based on only the single
488assignment of rates to positions which accounts for the largest amount of the
489likelihood.  Thus the assessment of 95% of the likelihood, in tabulating
490the ancestral states, refers to 95% of the likelihood that is accounted
491for by that particular combination of rates.
492<P>
493<H2>PROGRAM CONSTANTS</H2>
494<P>
495The constants defined at the beginning of the program include "maxtrees",
496the maximum number of user trees that can be processed.  It is small (100)
497at present to save some further memory but the cost of increasing it
498is not very great.  Other constants
499include "maxcategories", the maximum number of position
500categories, "namelength", the length of species names in
501characters, and three others, "smoothings", "iterations", and "epsilon", that
502help "tune" the algorithm and define the compromise between execution speed and
503the quality of the branch lengths found by iteratively maximizing the
504likelihood.  Reducing iterations and smoothings, and increasing epsilon, will
505result in faster execution but a worse result.  These values
506will not usually have to be changed.
507<P>
508The program spends most of its time doing real arithmetic.
509The algorithm, with separate and independent computations
510occurring for each pattern, lends itself readily to parallel processing.
511<P>
512<H2>PAST AND FUTURE OF THE PROGRAM</H2>
513<P>
514This program was developed in version 3.6 by Lucas Mix by combining code
515from DNAMLK and from PROML.
516<P>
517<HR>
518<P>
519<H3>TEST DATA SET</H3>
520<P>
521<TABLE><TR><TD BGCOLOR=white>
522<PRE>
523   5   13
524Alpha     AACGTGGCCAAAT
525Beta      AAGGTCGCCAAAC
526Gamma     CATTTCGTCACAA
527Delta     GGTATTTCGGCCT
528Epsilon   GGGATCTCGGCCC
529</PRE>
530</TD></TR></TABLE>
531<P>
532<HR>
533<H3>CONTENTS OF OUTPUT FILE (with all numerical options on)</H3>
534<P>
535(It was run with HMM rates having gamma-distributed rates
536approximated by 5 rate categories,
537with coefficient of variation of rates 1.0, and with patch length
538parameter = 1.5.  Two user-defined rate categories were used, one for
539the first 6 positions, the other for the last 7, with rates 1.0 : 2.0.
540Weights were used, with sites 1 and 13 given weight 0, and all others
541weight 1.)
542<P>
543<TABLE><TR><TD BGCOLOR=white>
544<PRE>
545
546Amino acid sequence
547   Maximum Likelihood method with molecular clock, version 3.6a3
548
549 5 species,  13  sites
550
551    Site categories are:
552
553             1111112222 222
554
555
556    Sites are weighted as follows:
557
558             0111111111 111
559
560Jones-Taylor-Thornton model of amino acid change
561
562
563Name            Sequences
564----            ---------
565
566Alpha        AACGTGGCCA AAT
567Beta         ..G..C.... ..C
568Gamma        C.TT.C.T.. C.A
569Delta        GGTA.TT.GG CC.
570Epsilon      GGGA.CT.GG CCC
571
572
573
574Discrete approximation to gamma distributed rates
575 Coefficient of variation of rates = 1.000000  (alpha = 1.000000)
576
577State in HMM    Rate of change    Probability
578
579        1           0.264            0.522
580        2           1.413            0.399
581        3           3.596            0.076
582        4           7.086            0.0036
583        5          12.641            0.000023
584
585Expected length of a patch of sites having the same rate =    1.500
586
587
588Site category   Rate of change
589
590        1           1.000
591        2           2.000
592
593
594
595
596
597
598                                               +-----------------Epsilon   
599     +-----------------------------------------4 
600  +--3                                         +-----------------Delta     
601  !  ! 
602--2  +-----------------------------------------------------------Gamma     
603  ! 
604  !                                +--------------------------Beta     
605  +--------------------------------1 
606                                   +--------------------------Alpha     
607
608
609Ln Likelihood =  -138.46858
610
611 Ancestor      Node      Node Height     Length
612 --------      ----      ---- ------     ------
613 root            2     
614   2             3          0.00010      0.00010
615   3             4          6.92817      6.92807
616   4          Epsilon       9.99990      3.07173
617   4          Delta         9.99990      3.07173
618   3          Gamma         9.99990      9.99980
619   2             1          5.47444      5.47444
620   1          Beta          9.99990      4.52546
621   1          Alpha         9.99990      4.52546
622
623Combination of categories that contributes the most to the likelihood:
624
625             3333333333 333
626
627Most probable category at each site if > 0.95 probability ("." otherwise)
628
629             .......... ...
630
631
632
633Probable sequences at interior nodes:
634
635  node       Reconstructed sequence (caps if > 0.95)
636
637    2        .AeDesDDdd eSe
638    3        .AeDesDDDd eSe
639    4        .GEDssDEDD ESs
640 Epsilon     GGGATCTCGG CCC
641 Delta       GGTATTTCGG CCT
642 Gamma       CATTTCGTCA CAA
643    1        .AeDEdDDds sSE
644 Beta        AAGGTCGCCA AAC
645 Alpha       AACGTGGCCA AAT
646
647</PRE>
648</TD></TR></TABLE>
649</BODY>
650</HTML>
Note: See TracBrowser for help on using the repository browser.