Context Navigation

promlk.html

Visit:

Last change on this file was 2176, checked in by westram, 22 years ago
* empty log message *
Property svn:eol-style set to `native` Property svn:keywords set to `Author Date Id Revision`
File size: 29.8 KB

Line
1	<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
2	<HTML>
3	<HEAD>
4	<TITLE>dnamlk</TITLE>
5	<META NAME="description" CONTENT="dnamlk">
6	<META NAME="keywords" CONTENT="dnamlk">
7	<META NAME="resource-type" CONTENT="document">
8	<META NAME="distribution" CONTENT="global">
9	<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
10	</HEAD>
11	<BODY BGCOLOR="#ccffff">
12	<DIV ALIGN=RIGHT>
13	version 3.6
14	</DIV>
15	<P>
16	<DIV ALIGN=CENTER>
17	<H1>ProMLK -- Protein maximum likelihood program<BR>with molecular clock</H1>
18	</DIV>
19	<P>
20	© Copyright 2000-2002 by the University of
21	Washington. Written by Joseph Felsenstein. Permission is granted to copy
22	this document provided that no fee is charged for it and that this copyright
23	notice is not removed.
24	<P>
25	This program implements the maximum likelihood method for protein amino
26	acid sequences under the constraint that the trees estimated must be
27	consistent with a molecular clock. The molecular clock is the
28	assumption that the tips of the tree are all equidistant, in branch
29	length, from its root. This program is indirectly related to PROML.
30	It uses the Dayhoff probability model of change between amino acids.
31	Its algorithmic details are not yet published, but many of them are
32	similar to DNAMLK.
33	<P>
34	The assumptions of the model are:
35	<OL>
36	<LI>Each position in the sequence evolves independently.
37	<LI>Different lineages evolve independently.
38	<LI>Each position undergoes substitution at an expected rate which is
39	chosen from a series of rates (each with a probability of occurrence)
40	which we specify.
41	<LI>All relevant positions are included in the sequence, not just those that
42	have changed or those that are "phylogenetically informative".
43	<LI>The probabilities of change between amino acids are given by the
44	model of Dayhoff (Dayhoff and Eck, 1968; Dayhoff et. al., 1979).
45	</OL>
46	<P>
47	Note the assumption that we are looking at all positions, including those
48	that have not changed at all. It is important not to restrict attention
49	to some positions based on whether or not they have changed; doing that
50	would bias branch lengths by making them too long, and that in turn
51	would cause the method to misinterpret the meaning of those positions that
52	had changed.
53	<P>
54	This program uses a Hidden Markov Model (HMM)
55	method of inferring different rates of evolution at different amino acid
56	positions. This
57	was described in a paper by me and Gary Churchill (1996). It allows us to
58	specify to the program that there will be
59	a number of different possible evolutionary rates, what the prior
60	probabilities of occurrence of each is, and what the average length of a
61	patch of positions all having the same rate. The rates can also be chosen
62	by the program to approximate a Gamma distribution of rates, or a
63	Gamma distribution plus a class of invariant positions. The program computes the
64	likelihood by summing it over all possible assignments of rates to positions,
65	weighting each by its prior probability of occurrence.
66	<P>
67	For example, if we have used the C and A options (described below) to specify
68	that there are three possible rates of evolution, 1.0, 2.4, and 0.0,
69	that the prior probabilities of a position having these rates are 0.4, 0.3, and
70	0.3, and that the average patch length (number of consecutive positions
71	with the same rate) is 2.0, the program will sum the likelihood over
72	all possibilities, but giving less weight to those that (say) assign all
73	positions to rate 2.4, or that fail to have consecutive positions that have the
74	same rate.
75	<P>
76	The Hidden Markov Model framework for rate variation among positions
77	was independently developed by Yang (1993, 1994, 1995). We have
78	implemented a general scheme for a Hidden Markov Model of
79	rates; we allow the rates and their prior probabilities to be specified
80	arbitrarily by the user, or by a discrete approximation to a Gamma
81	distribution of rates (Yang, 1995), or by a mixture of a Gamma
82	distribution and a class of invariant positions.
83	<P>
84	This feature effectively removes the artificial assumption that all positions
85	have the same rate, and also means that we need not know in advance the
86	identities of the positions that have a particular rate of evolution.
87	<P>
88	Another layer of rate variation also is available. The user can assign
89	categories of rates to each positions (for example, we might want
90	amino acid positions in the active site of a protein to change more slowly
91	than other positions. This is done with the categories input file and the
92	C option. We then specify (using the menu) the relative rates of evolution of
93	amino acid positions
94	in the different categories. For example, we might specify that positions
95	in the active site
96	evolve at relative rates of 0.2 compared to 1.0 at other positions. If we
97	are assuming that a particular position maintains a cysteine bridge to another,
98	we may want to put it in a category of positions (including perhaps the
99	initial position of the protein sequence which maintains methionine) which
100	changes at a rate of 0.0.
101	<P>
102	If both user-assigned rate categories and Hidden Markov Model rates
103	are allowed, the program assumes that the
104	actual rate at a position is the product of the user-assigned category rate
105	and the Hidden Markov Model regional rate. (This may not always make
106	perfect biological sense: it would be more natural to assume some upper
107	bound to the rate, as we have discussed in the Felsenstein and Churchill
108	paper). Nevertheless you may want to use both types of rate variation.
109	<P>
110	<H2>INPUT FORMAT AND OPTIONS</H2>
111	<P>
112	Subject to these assumptions, the program is a
113	correct maximum likelihood method. The
114	input is fairly standard, with one addition. As usual the first line of the
115	file gives the number of species and the number of amino acid positions.
116	<P>
117	Next come the species data. Each
118	sequence starts on a new line, has a ten-character species name
119	that must be blank-filled to be of that length, followed immediately
120	by the species data in the one-letter amino acid code. The sequences must
121	either be in the "interleaved" or "sequential" formats
122	described in the Molecular Sequence Programs document. The I option
123	selects between them. The sequences can have internal
124	blanks in the sequence but there must be no extra blanks at the end of the
125	terminated line. Note that a blank is not a valid symbol for a deletion.
126	<P>
127	The options are selected using an interactive menu. The menu looks like this:
128	<P>
129	<TABLE><TR><TD BGCOLOR=white>
130	<PRE>
131
132	Amino acid sequence
133	Maximum Likelihood method with molecular clock, version 3.6a3
134
135	Settings for this run:
136	U Search for best tree? Yes
137	P JTT or PAM amino acid change model? Jones-Taylor-Thornton model
138	C One category of substitution rates? Yes
139	R Rate variation among sites? constant rate of change
140	G Global rearrangements? No
141	W Sites weighted? No
142	J Randomize input order of sequences? No. Use input order
143	M Analyze multiple data sets? No
144	I Input sequences interleaved? Yes
145	0 Terminal type (IBM PC, ANSI, none)? (none)
146	1 Print out the data at start of run No
147	2 Print indications of progress of run Yes
148	3 Print out tree Yes
149	4 Write out trees onto tree file? Yes
150	5 Reconstruct hypothetical sequences? No
151
152	Are these settings correct? (type Y or the letter for one to change)
153
154	</PRE>
155	</TD></TR></TABLE>
156	<P>
157	The user either types "Y" (followed, of course, by a carriage-return)
158	if the settings shown are to be accepted, or the letter or digit corresponding
159	to an option that is to be changed.
160	<P>
161	The options U, W, J, O, M, and 0 are the usual ones. They are described in the
162	main documentation file of this package. Option I is the same as in
163	other molecular sequence programs and is described in the documentation file
164	for the sequence programs.
165	<P>
166	The R (Hidden Markov Model rates) option allows the user to
167	approximate a Gamma distribution of rates among positions, or a
168	Gamma distribution plus a class of invariant positions, or to specify how
169	many categories of
170	substitution rates there will be in a Hidden Markov Model of rate
171	variation, and what are the rates and probabilities
172	for each. By repeatedly selecting the R option one toggles among
173	no rate variation, the Gamma, Gamma+I, and general HMM possibilities.
174	<P>
175	If you choose Gamma or Gamma+I the program will ask how many rate
176	categories you want. If you have chosen Gamma+I, keep in mind that
177	one rate category will be set aside for the invariant class and only
178	the remaining ones used to approximate the Gamma distribution.
179	For the approximation we do not use the quantile method of Yang (1995)
180	but instead use a quadrature method using generalized Laguerre
181	polynomials. This should give a good approximation to the Gamma
182	distribution with as few as 5 or 6 categories.
183	<P>
184	In the Gamma and Gamma+I cases, the user will be
185	asked to supply the coefficient of variation of the rate of substitution
186	among positions. This is different from the parameters used by Nei and Jin
187	(1990) but
188	related to them: their parameter <EM>a</EM> is also known as "alpha",
189	the shape parameter of the Gamma distribution. It is
190	related to the coefficient of variation by
191	<P>
192	CV = 1 / a<SUP>1/2</SUP>
193	<P>
194	or
195	<P>
196	a = 1 / (CV)<SUP>2</SUP>
197	<P>
198	(their parameter <EM>b</EM> is absorbed here by the requirement that time is scaled so
199	that the mean rate of evolution is 1 per unit time, which means that <EM>a = b</EM>).
200	As we consider cases in which the rates are less variable we should set <EM>a</EM>
201	larger and larger, as <EM>CV</EM> gets smaller and smaller.
202	<P>
203	If the user instead chooses the general Hidden Markov Model option,
204	they are first asked how many HMM rate categories there
205	will be (for the moment there is an upper limit of 9,
206	which should not be restrictive). Then
207	the program asks for the rates for each category. These rates are
208	only meaningful relative to each other, so that rates 1.0, 2.0, and 2.4
209	have the exact same effect as rates 2.0, 4.0, and 4.8. Note that an
210	HMM rate category
211	can have rate of change 0, so that this allows us to take into account that
212	there may be a category of amino acid positions that are invariant. Note that
213	the run time
214	of the program will be proportional to the number of HMM rate categories:
215	twice as
216	many categories means twice as long a run. Finally the program will ask for
217	the probabilities of a random amino acid position falling into each of these
218	regional rate categories. These probabilities must be nonnegative and sum to
219	1. Default
220	for the program is one category, with rate 1.0 and probability 1.0 (actually
221	the rate does not matter in that case).
222	<P>
223	If more than one HMM rate category is specified, then another
224	option, A, becomes
225	visible in the menu. This allows us to specify that we want to assume that
226	positions that have the same HMM rate category are expected to be clustered
227	so that there is autocorrelation of rates. The
228	program asks for the value of the average patch length. This is an expected
229	length of patches that have the same rate. If it is 1, the rates of
230	successive positions will be independent. If it is, say, 10.25, then the
231	chance of change to a new rate will be 1/10.25 after every position. However
232	the "new rate" is randomly drawn from the mix of rates, and hence could
233	even be the same. So the actual observed length of patches with the same
234	rate will be a bit larger than 10.25. Note below that if you choose
235	multiple patches, there will be an estimate in the output file as to
236	which combination of rate categories contributed most to the likelihood.
237	<P>
238	Note that the autocorrelation scheme we use is somewhat different
239	from Yang's (1995) autocorrelated Gamma distribution. I am unsure
240	whether this difference is of any importance -- our scheme is chosen
241	for the ease with which it can be implemented.
242	<P>
243	The C option allows user-defined rate categories. The user is prompted
244	for the number of user-defined rates, and for the rates themselves,
245	which cannot be negative but can be zero. These numbers, which must be
246	nonnegative (some could be 0),
247	are defined relative to each other, so that if rates for three categories
248	are set to 1 : 3 : 2.5 this would have the same meaning as setting them
249	to 2 : 6 : 5.
250	The assignment of rates to amino acid positions
251	is then made by reading a file whose default name is "categories".
252	It should contain a string of digits 1 through 9. A new line or a blank
253	can occur after any character in this string. Thus the categories file
254	might look like this:
255	<P>
256	<PRE>
257	122231111122411155
258	1155333333444
259	</PRE>
260	<P>
261	With the current options R, A, and C the program has a good
262	ability to infer different rates at different positions and estimate
263	phylogenies under a more realistic model. Note that Likelihood Ratio
264	Tests can be used to test whether one combination of rates is
265	significantly better than another, provided one rate scheme represents
266	a restriction of another with fewer parameters. The number of parameters
267	needed for rate variation is the number of regional rate categories, plus
268	the number of user-defined rate categories less 2, plus one if the
269	regional rate categories have a nonzero autocorrelation.
270	<P>
271	The G (global search) option causes, after the last species is added to
272	the tree, each possible group to be removed and re-added. This improves the
273	result, since the position of every species is reconsidered. It
274	approximately triples the run-time of the program.
275	<P>
276	The User tree (option U) is read from a file whose default name is
277	<TT>intree</TT>. The trees can be multifurcating. This allows us to test the
278	hypothesis that a given branch has zero length.
279	<P>
280	If the U (user tree) option is chosen another option appears in
281	the menu, the L option. If it is selected,
282	it signals the program that it
283	should take any branch lengths that are in the user tree and
284	simply evaluate the likelihood of that tree, without further altering
285	those branch lengths. In the case of a clock, if some branches have lengths
286	and others do not, the program does not estimate the lengths of those that
287	do not have lengths given in the user tree. If any of the branches
288	do not have lengths, the program re-estimates the lengths of all of them.
289	This is done because estimating some and not others is hard in the
290	case of a clock.
291	<P>
292	The W (Weights) option is invoked in the usual way, with only weights 0
293	and 1 allowed. It selects a set of positions to be analyzed, ignoring the
294	others. The positions selected are those with weight 1. If the W option is
295	not invoked, all positions are analyzed.
296	The Weights (W) option
297	takes the weights from a file whose default name is "weights". The weights
298	follow the format described in the main documentation file.
299	<P>
300	The M (multiple data sets) option will ask you whether you want to
301	use multiple sets of weights (from the weights file) or multiple data sets
302	from the input file.
303	The ability to use a single data set with multiple weights means that
304	much less disk space will be used for this input data. The bootstrapping
305	and jackknifing tool Seqboot has the ability to create a weights file with
306	multiple weights. Note also that when we use multiple weights for
307	bootstrapping we can also then maintain different rate categories for
308	different positions in a meaningful way. You should not use the multiple
309	data sets option without using multiple weights, you should not at the
310	same time use the user-defined rate categories option (option C).
311	<P>
312	The algorithm used for searching among trees is faster than it was in
313	version 3.5, thanks to using a technique invented by David Swofford
314	and J. S. Rogers. This involves not iterating most branch lengths on most
315	trees when searching among tree topologies, This is of necessity a
316	"quick-and-dirty" search but it saves much time.
317	<P>
318	<H2>OUTPUT FORMAT</H2>
319	<P>
320	The output starts by giving the number of species, the number of amino
321	acid positions.
322	<P>
323	If the R (HMM rates) option is used a table of the relative rates of
324	expected substitution at each category of positions is printed, as well
325	as the probabilities of each of those rates.
326	<P>
327	There then follow the data sequences, if the user has selected the menu
328	option to print them out, with the base sequences printed in
329	groups of ten amino acids. The
330	trees found are printed as a rooted
331	tree topology. The
332	internal nodes are numbered arbitrarily for the sake of
333	identification. The number of trees evaluated so far and the log
334	likelihood of the tree are also given. The branch lengths in the diagram are
335	roughly proportional to the estimated branch lengths, except that very short
336	branches are printed out at least three characters in length so that the
337	connections can be seen.
338	<P>
339	A table is printed
340	showing the length of each tree segment, and the time (in units of
341	expected amino acid substitutions per position) of each fork in the tree,
342	measured from the root of the tree. I have not attempted in include
343	code for approximate confidence limits on branch points, as I have done
344	for branch lengths in PROML, both because of the extreme crudeness of
345	that test, and because the variation of times for different forks would be
346	highly correlated.
347	<P>
348	The log likelihood printed out with the final tree can be used to perform
349	various likelihood ratio tests. One can, for example, compare runs with
350	different values of the relative rate of change in the active site and in
351	the rest of the protein to determine
352	which value is the maximum likelihood estimate, and what is the allowable range
353	of values (using a likelihood ratio test, which you will find described in
354	mathematical statistics books). One could also estimate the base frequencies
355	in the same way. Both of these, particularly the latter, require multiple runs
356	of the program to evaluate different possible values, and this might get
357	expensive.
358	<P>
359	This program makes possible a (reasonably) legitimate
360	statistical test of the molecular clock. To do such a test, run PROML
361	and PROMLK on the same data. If the trees obtained are of the same
362	topology (when considered as unrooted), it is legitimate to compare
363	their likelihoods by the likelihood ratio test. In PROML the likelihood
364	has been computed by estimating 2n-3 branch lengths, if their are n tips
365	on the tree. In PROMLK it has been computed by estimating n-1 branching
366	times (in effect, n-1 branch lengths). The difference in the number of
367	parameters is (2n-3)-(n-1) = n-2. To perform the test take the
368	difference in log likelihoods between the two runs (PROML should be the
369	higher of the two, barring numerical iteration difficulties) and double
370	it. Look this up on a chi-square distribution with n-2 degrees of
371	freedom. If the result is significant, the log likelihood has been
372	significantly increased by allowing all 2n-3 branch lengths to be
373	estimated instead of just n-1, and molecular clock may be rejected.
374	<P>
375	If the U (User Tree) option is used and more than one tree is supplied,
376	and the program is not told to assume autocorrelation between the
377	rates at different amino acid positions, the
378	program also performs a statistical test of each of these trees against the
379	one with highest likelihood. If there are two user trees, the test
380	done is one which is due to Kishino and Hasegawa (1989), a version
381	of a test originally introduced by Templeton (1983). In this
382	implementation it uses the mean and variance of
383	log-likelihood differences between trees, taken across amino acid
384	positions. If the two
385	trees' means are more than 1.96 standard deviations different
386	then the trees are
387	declared significantly different. This use of the empirical variance of
388	log-likelihood differences is more robust and nonparametric than the
389	classical likelihood ratio test, and may to some extent compensate for the
390	any lack of realism in the model underlying this program.
391	<P>
392	If there are more than two trees, the test done is an extension of
393	the KHT test, due to Shimodaira and Hasegawa (1999). They pointed out
394	that a correction for the number of trees was necessary, and they
395	introduced a resampling method to make this correction. In the version
396	used here the variances and covariances of the sum of log likelihoods across
397	amino acid positions are computed for all pairs of trees. To test whether the
398	difference between each tree and the best one is larger than could have
399	been expected if they all had the same expected log-likelihood,
400	log-likelihoods for all trees are sampled with these covariances and equal
401	means (Shimodaira and Hasegawa's "least favorable hypothesis"),
402	and a P value is computed from the fraction of times the difference between
403	the tree's value and the highest log-likelihood exceeds that actually
404	observed. Note that this sampling needs random numbers, and so the
405	program will prompt the user for a random number seed if one has not
406	already been supplied. With the two-tree KHT test no random numbers
407	are used.
408	<P>
409	In either the KHT or the SH test the program
410	prints out a table of the log-likelihoods of each tree, the differences of
411	each from the highest one, the variance of that quantity as determined by
412	the log-likelihood differences at individual sites, and a conclusion as to
413	whether that tree is or is not significantly worse than the best one. However
414	the test is not available if we assume that there
415	is autocorrelation of rates at neighboring positions (option A) and is not
416	done in those cases.
417	<P>
418	The branch lengths printed out are scaled in terms of expected numbers of
419	amino acid substitutions, scaled so that the average rate of
420	change, averaged over all the positions analyzed, is set to 1.0.
421	if there are multiple categories of positions. This means that whether or not
422	there are multiple categories of positions, the expected fraction of change
423	for very small branches is equal to the branch length. Of course,
424	when a branch is twice as
425	long this does not mean that there will be twice as much net change expected
426	along it, since some of the changes occur in the same position and overlie or
427	even reverse each
428	other. The branch length estimates here are in terms of the expected
429	underlying numbers of changes. That means that a branch of length 0.26
430	is 26 times as long as one which would show a 1% difference between
431	the amino acid sequences at the beginning and end of the branch. But we
432	would not expect the sequences at the beginning and end of the branch to be
433	26% different, as there would be some overlaying of changes.
434	<P>
435	Because of limitations of the numerical
436	algorithm, branch length estimates of zero will often print out as small
437	numbers such as 0.00001. If you see a branch length that small, it is really
438	estimated to be of zero length.
439	<P>
440	Another possible source of confusion is the existence of negative values for
441	the log likelihood. This is not really a problem; the log likelihood is not a
442	probability but the logarithm of a probability. When it is
443	negative it simply means that the corresponding probability is less
444	than one (since we are seeing its logarithm). The log likelihood is
445	maximized by being made more positive: -30.23 is worse than -29.14.
446	<P>
447	At the end of the output, if the R option is in effect with multiple
448	HMM rates, the program will print a list of what amino acid position
449	categories contributed the most to the final likelihood. This combination of
450	HMM rate categories need not have contributed a majority of the likelihood,
451	just a plurality. Still, it will be helpful as a view of where the
452	program infers that the higher and lower rates are. Note that the
453	use in this calculations of the prior probabilities of different rates,
454	and the average patch length, gives this inference a "smoothed"
455	appearance: some other combination of rates might make a greater
456	contribution to the likelihood, but be discounted because it conflicts
457	with this prior information. See the example output below to see
458	what this printout of rate categories looks like.
459	A second list will also be printed out, showing for each position which
460	rate accounted for the highest fraction of the likelihood. If the fraction
461	of the likelihood accounted for is less than 95%, a dot is printed instead.
462	<P>
463	Option 3 in the menu controls whether the tree is printed out into
464	the output file. This is on by default, and usually you will want to
465	leave it this way. However for runs with multiple data sets such as
466	bootstrapping runs, you will primarily be interested in the trees
467	which are written onto the output tree file, rather than the trees
468	printed on the output file. To keep the output file from becoming too
469	large, it may be wisest to use option 3 to prevent trees being
470	printed onto the output file.
471	<P>
472	Option 4 in the menu controls whether the tree estimated by the program
473	is written onto a tree file. The default name of this output tree file
474	is "outtree". If the U option is in effect, all the user-defined
475	trees are written to the output tree file.
476	<P>
477	Option 5 in the menu controls whether ancestral states are estimated
478	at each node in the tree. If it is in effect, a table of ancestral
479	sequences is printed out (including the sequences in the tip species which
480	are the input sequences).
481	The symbol printed out is for the amino acid which accounts for the
482	largest fraction of the likelihood at that position.
483	In that table, if a position has an amino acid which
484	accounts for more than 95% of the likelihood, its symbol printed in capital
485	letters (W rather than w). One limitation of the current
486	version of the program is that when there are multiple HMM rates
487	(option R) the reconstructed amino acids are based on only the single
488	assignment of rates to positions which accounts for the largest amount of the
489	likelihood. Thus the assessment of 95% of the likelihood, in tabulating
490	the ancestral states, refers to 95% of the likelihood that is accounted
491	for by that particular combination of rates.
492	<P>
493	<H2>PROGRAM CONSTANTS</H2>
494	<P>
495	The constants defined at the beginning of the program include "maxtrees",
496	the maximum number of user trees that can be processed. It is small (100)
497	at present to save some further memory but the cost of increasing it
498	is not very great. Other constants
499	include "maxcategories", the maximum number of position
500	categories, "namelength", the length of species names in
501	characters, and three others, "smoothings", "iterations", and "epsilon", that
502	help "tune" the algorithm and define the compromise between execution speed and
503	the quality of the branch lengths found by iteratively maximizing the
504	likelihood. Reducing iterations and smoothings, and increasing epsilon, will
505	result in faster execution but a worse result. These values
506	will not usually have to be changed.
507	<P>
508	The program spends most of its time doing real arithmetic.
509	The algorithm, with separate and independent computations
510	occurring for each pattern, lends itself readily to parallel processing.
511	<P>
512	<H2>PAST AND FUTURE OF THE PROGRAM</H2>
513	<P>
514	This program was developed in version 3.6 by Lucas Mix by combining code
515	from DNAMLK and from PROML.
516	<P>
517	<HR>
518	<P>
519	<H3>TEST DATA SET</H3>
520	<P>
521	<TABLE><TR><TD BGCOLOR=white>
522	<PRE>
523	5 13
524	Alpha AACGTGGCCAAAT
525	Beta AAGGTCGCCAAAC
526	Gamma CATTTCGTCACAA
527	Delta GGTATTTCGGCCT
528	Epsilon GGGATCTCGGCCC
529	</PRE>
530	</TD></TR></TABLE>
531	<P>
532	<HR>
533	<H3>CONTENTS OF OUTPUT FILE (with all numerical options on)</H3>
534	<P>
535	(It was run with HMM rates having gamma-distributed rates
536	approximated by 5 rate categories,
537	with coefficient of variation of rates 1.0, and with patch length
538	parameter = 1.5. Two user-defined rate categories were used, one for
539	the first 6 positions, the other for the last 7, with rates 1.0 : 2.0.
540	Weights were used, with sites 1 and 13 given weight 0, and all others
541	weight 1.)
542	<P>
543	<TABLE><TR><TD BGCOLOR=white>
544	<PRE>
545
546	Amino acid sequence
547	Maximum Likelihood method with molecular clock, version 3.6a3
548
549	5 species, 13 sites
550
551	Site categories are:
552
553	1111112222 222
554
555
556	Sites are weighted as follows:
557
558	0111111111 111
559
560	Jones-Taylor-Thornton model of amino acid change
561
562
563	Name Sequences
564	---- ---------
565
566	Alpha AACGTGGCCA AAT
567	Beta ..G..C.... ..C
568	Gamma C.TT.C.T.. C.A
569	Delta GGTA.TT.GG CC.
570	Epsilon GGGA.CT.GG CCC
571
572
573
574	Discrete approximation to gamma distributed rates
575	Coefficient of variation of rates = 1.000000 (alpha = 1.000000)
576
577	State in HMM Rate of change Probability
578
579	1 0.264 0.522
580	2 1.413 0.399
581	3 3.596 0.076
582	4 7.086 0.0036
583	5 12.641 0.000023
584
585	Expected length of a patch of sites having the same rate = 1.500
586
587
588	Site category Rate of change
589
590	1 1.000
591	2 2.000
592
593
594
595
596
597
598	+-----------------Epsilon
599	+-----------------------------------------4
600	+--3 +-----------------Delta
601	! !
602	--2 +-----------------------------------------------------------Gamma
603	!
604	! +--------------------------Beta
605	+--------------------------------1
606	+--------------------------Alpha
607
608
609	Ln Likelihood = -138.46858
610
611	Ancestor Node Node Height Length
612	-------- ---- ---- ------ ------
613	root 2
614	2 3 0.00010 0.00010
615	3 4 6.92817 6.92807
616	4 Epsilon 9.99990 3.07173
617	4 Delta 9.99990 3.07173
618	3 Gamma 9.99990 9.99980
619	2 1 5.47444 5.47444
620	1 Beta 9.99990 4.52546
621	1 Alpha 9.99990 4.52546
622
623	Combination of categories that contributes the most to the likelihood:
624
625	3333333333 333
626
627	Most probable category at each site if > 0.95 probability ("." otherwise)
628
629	.......... ...
630
631
632
633	Probable sequences at interior nodes:
634
635	node Reconstructed sequence (caps if > 0.95)
636
637	2 .AeDesDDdd eSe
638	3 .AeDesDDDd eSe
639	4 .GEDssDEDD ESs
640	Epsilon GGGATCTCGG CCC
641	Delta GGTATTTCGG CCT
642	Gamma CATTTCGTCA CAA
643	1 .AeDEdDDds sSE
644	Beta AAGGTCGCCA AAC
645	Alpha AACGTGGCCA AAT
646
647	</PRE>
648	</TD></TR></TABLE>
649	</BODY>
650	</HTML>

Note: See TracBrowser for help on using the repository browser.

Context Navigation

source: branches/profile/GDE/PHYLIP/doc/promlk.html

Download in other formats: