Last change on this file was 2176, checked in by westram, 22 years ago
* empty log message *
Property svn:eol-style set to `native` Property svn:keywords set to `Author Date Id Revision`
File size: 10.8 KB

Line
1	<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
2	<HTML>
3	<HEAD>
4	<TITLE>contrast</TITLE>
5	<META NAME="description" CONTENT="contrast">
6	<META NAME="keywords" CONTENT="contrast">
7	<META NAME="resource-type" CONTENT="document">
8	<META NAME="distribution" CONTENT="global">
9	<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
10	</HEAD>
11	<BODY BGCOLOR="#ccffff">
12	<DIV ALIGN=RIGHT>
13	version 3.6
14	</DIV>
15	<P>
16	<DIV ALIGN=CENTER>
17	<H1>CONTRAST -- Computes contrasts for comparative method</H1>
18	</DIV>
19	<P>
20	<PRE>
21	</PRE>
22	<P>
23	© Copyright 1991-2002 by the University of
24	Washington. Written by Joseph Felsenstein. Permission is granted to copy
25	this document provided that no fee is charged for it and that this copyright
26	notice is not removed.
27	<P>
28	This program implements the contrasts calculation described in my 1985
29	paper on the comparative method (Felsenstein, 1985d). It reads in a
30	data set of the standard quantitative characters sort, and also a
31	tree from the treefile. It then forms the contrasts between species
32	that, according to that tree, are statistically independent. This is
33	done for each character. The contrasts are all standardized by
34	branch lengths (actually, square roots of branch lengths).
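<P>
As an illustration of the calculation just described, here is a minimal Python sketch of the contrasts algorithm of the 1985 paper, applied to the test tree and test data given later in this document. It is not the program's own code and need not reproduce its output digit for digit. The nested-tuple tree encoding and the function names are assumptions made for this sketch only.

```python
import math

# Species values copied from the test set input in this document.
DATA = {
    "Homo":   [4.09434, 4.74493],
    "Pongo":  [3.61092, 3.33220],
    "Macaca": [2.37024, 3.36730],
    "Ateles": [2.02815, 2.89037],
    "Galago": [-1.46968, 2.30259],
}

# The test tree, hand-encoded (an assumption of this sketch) as
# (left_subtree, left_branch_length, right_subtree, right_branch_length).
TREE = (((("Homo", 0.21, "Pongo", 0.21), 0.28, "Macaca", 0.49),
         0.13, "Ateles", 0.62), 0.38, "Galago", 1.00)


def prune(node, data, contrasts):
    """Return (values, extra_length) for a node, collecting contrasts."""
    if isinstance(node, str):          # a tip: observed values, no extra length
        return data[node], 0.0
    left, len_left, right, len_right = node
    x_left, extra_left = prune(left, data, contrasts)
    x_right, extra_right = prune(right, data, contrasts)
    v1 = len_left + extra_left         # effective branch lengths
    v2 = len_right + extra_right
    scale = math.sqrt(v1 + v2)         # standardize by square root of length
    contrasts.append([(a - b) / scale for a, b in zip(x_left, x_right)])
    # Weighted average of the two descendants, weights 1/v1 and 1/v2.
    values = [(a / v1 + b / v2) / (1.0 / v1 + 1.0 / v2)
              for a, b in zip(x_left, x_right)]
    # The node value is itself uncertain, which lengthens the branch below it.
    return values, v1 * v2 / (v1 + v2)


def independent_contrasts(tree, data):
    """One standardized contrast per fork, one column per character."""
    contrasts = []
    prune(tree, data, contrasts)
    return contrasts


for row in independent_contrasts(TREE, DATA):
    print("  ".join("%8.5f" % value for value in row))
```

Each fork of the tree contributes one row, so five species give four contrasts.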
35	<P>
36	The method is explained in the 1985 paper. It assumes
37	a Brownian motion model. This model was introduced by Edwards and
38	Cavalli-Sforza (1964; Cavalli-Sforza and Edwards, 1967)
39	as an approximation to the evolution of gene frequencies. I have
40	discussed (Felsenstein, 1973b, 1981c, 1985d, 1988b) the difficulties
41	inherent in using it as a model for the evolution of quantitative
42	characters. Chief among these is that the characters do not necessarily evolve
43	independently or at equal rates. This program allows one to evaluate this,
44	if there is independent information on the phylogeny. You can
45	compute the variance of the contrasts for each character, as a measure of
46	the variance accumulating per unit branch length. You can also test
47	covariances of characters.
48	<P>
49	The input file is as described in the continuous characters
50	documentation file above, for the case of continuous quantitative
51	characters (not gene frequencies). Options are selected using a menu:
52	<P>
53	<TABLE><TR><TD BGCOLOR=white>
54	<PRE>
55
56	Continuous character comparative analysis, version 3.6a3
57
58	Settings for this run:
59	W within-population variation in data? No, species values are means
60	R Print out correlations and regressions? Yes
61	A LRT test of no phylogenetic component? Yes, with and without VarA
62	C Print out contrasts? No
63	M Analyze multiple trees? No
64	0 Terminal type (IBM PC, ANSI, none)? (none)
65	1 Print out the data at start of run No
66	2 Print indications of progress of run Yes
67
68	Y to accept these or type the letter for one to change
69
70	</PRE>
71	</TD></TR></TABLE>
72	<P>
73	Option W makes the program expect not means of the phenotypes in each
74	species, but phenotypes of individual specimens. The details of
75	the input file format in that case are given below. In that case the
76	program estimates the covariances of the phenotypic change, as well as
77	covariances of within-species phenotypic variation. The model used is
78	similar to (but not identical to) that of Lynch (1990). The
79	algorithms used differ from the ones he
80	gives in that paper. They will be described in a forthcoming paper by
81	me. In the case that has within-species samples, contrasts are used by
82	the program, but it does not make sense to write them out to an
83	output file for direct analysis. They are of two kinds, contrasts
84	within species and contrasts between species. The former are
85	affected only by the within-species phenotypic covariation, but the
86	latter are affected by both within- and between-species covariation.
87	CONTRAST infers these two kinds of covariances and writes the
88	estimates out.
89	<P>
90	M is similar to the usual multiple data sets input option, but is used here
91	to allow multiple trees to be read from the treefile, not multiple
92	data sets to be read from the input file. In this way you can
93	use bootstrapping on the data that estimated these trees, get
94	multiple bootstrap estimates of the tree, and then use the M
95	option to make multiple analyses of the contrasts and the
96	covariances, correlations, and regressions. In this way (Felsenstein,
97	1988b) you can assess the effect of the inaccuracy of the trees on
98	your estimates of these statistics.
99	<P>
100	R allows you to turn off or on the printing out of the statistics.
101	If it is off only the contrasts will be printed out (unless option
102	1 is selected). With only the contrasts printed out, they are in
103	a simple array that is in a form that many statistics packages should
104	be able to read. The contrasts are rows, and each row has one contrast
105	for each character. Any multivariate statistics package should be able
106	to analyze these (but keep in mind that the contrasts have, by virtue
107	of the way they are generated, expectation zero, so all regressions
108	must pass through the origin). If the W option has been set to
109	analyze within-species as well as between-species variation, the R
110	option does not appear in the menu as the regression and correlation
111	statistics should always be computed in that case.
112	<P>
113	As usual, the tree file has the default name <TT>intree</TT>. It
114	should contain the desired tree or trees. These can be
115	either in bifurcating form, or may have the bottommost fork be a
116	trifurcation (it should not matter which of these ways you present the tree).
117	The tree must, of course, have branch lengths.
118	<P>
119	If you have a molecular data set (for example) and also, on the same
120	species, quantitative measurements, here is how you can allow for the
121	uncertainty of your estimate of the tree. Use SEQBOOT to generate multiple
122	data sets from your molecular data. Then, whichever method you use to
123	analyze it (the relevant ones are those that produce estimates of the
124	branch lengths: DNAML, DNAMLK, FITCH, KITSCH, and NEIGHBOR -- the latter
125	three require you to use DNADIST to turn the bootstrap data sets into
126	multiple distance matrices), you should use the Multiple Data Sets
127	option of that program. This will result in a tree file with many
128	trees on it. Then use this tree file with the input file containing
129	your continuous quantitative characters, choosing the Multiple Trees
130	(M) option. You will get one set of contrasts and statistics for each
131	tree in the tree file. At the moment there is no overall summary:
132	you will have to tabulate these by hand. A similar process can be
133	followed if you have restriction sites data (using RESTML) or
134	gene frequencies data.
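<P>
Since there is no overall summary, the tabulation has to be done outside the program. The following Python sketch shows one way to do it, assuming you have already copied one statistic (say, a correlation) from each tree's output into a list. The function name, the percentile rule and the numbers are hypothetical.

```python
import math


def summarize(values, level=0.95):
    """Return (mean, lower, upper) for one statistic across bootstrap trees.

    lower and upper are order statistics bracketing the central `level`
    fraction of the values, a simple percentile interval.
    """
    ordered = sorted(values)
    n = len(ordered)
    lower_index = int(math.floor((1.0 - level) / 2.0 * (n - 1)))
    upper_index = int(math.ceil((1.0 + level) / 2.0 * (n - 1)))
    mean = sum(ordered) / n
    return mean, ordered[lower_index], ordered[upper_index]


# Hypothetical correlations, one per tree in the tree file.
correlations = [0.61, 0.72, 0.66, 0.58, 0.70, 0.69, 0.64, 0.75, 0.63, 0.68]
mean, low, high = summarize(correlations)
print("mean %.3f, interval %.2f to %.2f" % (mean, low, high))
```

With only a handful of trees the interval is just the range of the values; it becomes informative with some hundreds of bootstrap replicates.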
135	<P>
136	The statistics that are printed out include the covariances between
137	all pairs of characters, the regressions of each character on each
138	other (column j is regressed on row i), and the correlations between
139	all pairs of characters. In assessing degrees of freedom it is
140	important to realize that each contrast was taken to have
141	expectation zero, which is known because each contrast could as
142	easily have been computed xi-xj instead of xj-xi. Thus there is no
143	loss of a degree of freedom for estimation of a mean. The degrees
144	of freedom is thus the same as the number of contrasts, namely one
145	less than the number of species (tips). If you feed these contrasts
146	into a multivariate statistics program make sure that it knows that
147	each variable has expectation exactly zero.
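<P>
The point about degrees of freedom can be made concrete. The Python sketch below computes covariances from an array of contrasts without subtracting a mean and divides by the number of contrasts, as the paragraph above describes. The function name and the small array of contrasts are hypothetical.

```python
def covariance_through_origin(contrasts):
    """Covariances of contrasts whose expectation is known to be zero.

    No mean is subtracted, so no degree of freedom is lost: the divisor
    is the number of contrasts, not that number minus one.
    """
    n = len(contrasts)
    p = len(contrasts[0])
    return [[sum(row[i] * row[j] for row in contrasts) / n
             for j in range(p)]
            for i in range(p)]


# Hypothetical contrasts: two rows (contrasts), two columns (characters).
example = [[1.0, 2.0],
           [3.0, -4.0]]
print(covariance_through_origin(example))
```

A statistics package that centers each variable and divides by one less than the number of rows will give different numbers from these.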
148	<P>
149	<DIV ALIGN=CENTER>
150	<H2>Within-species variation</H2>
151	</DIV>
152	With the W option selected, CONTRAST analyzes data sets with variation within
153	species, using a model like that proposed by Michael Lynch (1990).
154	If you select the W option for within-species variation, the data
155	set should have this structure (on the left are the data, on the right
156	my comments):
157	<P>
158	<TABLE><TR><TD bgcolor=white>
159	<PRE>
160	10 5
161	Alpha 2
162	2.01 5.3 1.5 -3.41 0.3
163	1.98 4.3 2.1 -2.98 0.45
164	Gammarus 3
165	6.57 3.1 2.0 -1.89 0.6
166	7.62 3.4 1.9 -2.01 0.7
167	6.02 3.0 1.9 -2.03 0.6
168	...
169	</PRE>
170	</TD>
171	<TD>
172	<PRE>
173	number of species, number of characters
174	name of 1st species, # of individuals
175	data for individual #1
176	data for individual #2
177	name of 2nd species, # of individuals
178	data for individual #1
179	data for individual #2
180	data for individual #3
181	(and so on)
182	</PRE>
183	</TD></TR></TABLE>
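<P>
To make the layout above concrete, here is a small Python sketch that reads data in this structure into a dictionary. It assumes that all items are separated by white space and that species names contain no blanks; the program itself may be stricter about how names are laid out. The function name is hypothetical, and the header of the sample is changed to 2 species because only two are written out above.

```python
def parse_within_species(text):
    """Return (number of characters, {species name: rows of individuals})."""
    tokens = text.split()
    n_species, n_chars = int(tokens[0]), int(tokens[1])
    pos = 2
    samples = {}
    for _ in range(n_species):
        name = tokens[pos]                  # species name
        n_individuals = int(tokens[pos + 1])
        pos += 2
        rows = []
        for _ in range(n_individuals):      # one row of values per individual
            rows.append([float(t) for t in tokens[pos:pos + n_chars]])
            pos += n_chars
        samples[name] = rows
    return n_chars, samples


SAMPLE = """2 5
Alpha 2
2.01 5.3 1.5 -3.41 0.3
1.98 4.3 2.1 -2.98 0.45
Gammarus 3
6.57 3.1 2.0 -1.89 0.6
7.62 3.4 1.9 -2.01 0.7
6.02 3.0 1.9 -2.03 0.6
"""

n_chars, samples = parse_within_species(SAMPLE)
for name, rows in samples.items():
    print(name, len(rows), "individuals")
```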
184	<P>
185	The covariances, correlations, and regressions for the "additive"
186	(between-species evolutionary variation) and "environmental" (within-species
187	phenotypic variation) are
188	printed out (the maximum likelihood estimates of each).
189	The program also estimates the within-species phenotypic variation in the
190	case where the between-species evolutionary covariances are forced to be
191	zero. The log-likelihoods of these two cases are compared and a
192	likelihood ratio test (LRT) is carried out. The program prints the result
193	of this test as a chi-square variate, and gives the number of degrees of
194	freedom of the LRT. You have to look up the chi-square variate in a
195	table of the chi-square distribution.
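<P>
If no table is at hand, the tail probability can be computed directly. The Python sketch below does so with a series for the incomplete gamma function, using only the standard library. The function name and the two log-likelihoods are hypothetical; the statistic is the usual twice the difference of log-likelihoods, and the number of degrees of freedom should be taken from the program's output.

```python
import math


def chi2_sf(x, df):
    """Upper tail probability of a chi-square variate x with df degrees of freedom."""
    if x <= 0.0:
        return 1.0
    a = df / 2.0
    t = x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, t):
    # P = t^a e^(-t) / Gamma(a + 1) * sum over n of t^n / ((a+1)...(a+n)).
    term = 1.0
    total = 1.0
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= t / (a + n)
        total += term
    lower = math.exp(a * math.log(t) - t - math.lgamma(a + 1.0)) * total
    return 1.0 - lower


# Hypothetical log-likelihoods with and without between-species covariances.
lnl_with = -12.40
lnl_without = -16.90
variate = 2.0 * (lnl_with - lnl_without)   # likelihood ratio test statistic
df = 3                                     # assumed here; use the value printed
print("chi-square %.2f on %d df, P = %.4f" % (variate, df, chi2_sf(variate, df)))
```

The series converges for any positive value, though slowly for very large ones; a statistics library routine would be the better choice for routine use.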
196	<P>
197	The log-likelihoods compared are those of the data under the models with
198	and without between-species covariances. For the moment the program cannot
199	handle the case where within-species variation is to be taken into account
200	but where only species means are available. (It can handle cases where some
201	species have only one member in their sample.)
202	<P>
203	We hope to fix this soon. We are also on our way to
204	incorporating full-sib, half-sib, or clonal groups within species, so as
205	to do one analysis for within-species genetic and between-species
206	phylogenetic variation.
207	<P>
208	The data set used as an example below is the example from a
209	paper by Michael Lynch (1990), his characters having been log-transformed.
210	In the case where there is only one specimen per species, Lynch's model
211	is identical to our model of within-species variation (for
212	multiple individuals per species it is not a subcase of his model).
213	<P>
214	<HR>
215	<P>
216	<H3>TEST SET INPUT</H3>
217	<P>
218	<TABLE><TR><TD BGCOLOR=white>
219	<PRE>
220	5 2
221	Homo 4.09434 4.74493
222	Pongo 3.61092 3.33220
223	Macaca 2.37024 3.36730
224	Ateles 2.02815 2.89037
225	Galago -1.46968 2.30259
226	</PRE>
227	<P>
228	</TD></TR></TABLE>
229	<HR>
230	<P>
231	<H3>TEST SET INPUT TREEFILE</H3>
232	<P>
233	<TABLE><TR><TD BGCOLOR=white>
234	<PRE>
235	((((Homo:0.21,Pongo:0.21):0.28,Macaca:0.49):0.13,Ateles:0.62):0.38,Galago:1.00);
236	</PRE>
237	</TD></TR></TABLE>
238	<P>
239	<HR>
240	<P>
241	<H3>TEST SET OUTPUT (with all numerical options on)</H3>
242	<P>
243	<TABLE><TR><TD BGCOLOR=white>
244	<PRE>
245
246	Continuous character contrasts analysis, version 3.6a3
247
248	5 Populations, 2 Characters
249
250	Name Phenotypes
251	---- ----------
252
253	Homo 4.09434 4.74493
254	Pongo 3.61092 3.33220
255	Macaca 2.37024 3.36730
256	Ateles 2.02815 2.89037
257	Galago -1.46968 2.30259
258
259
260	Covariance matrix
261	---------- ------
262
263	4.1991 1.3844
264	1.3844 0.7125
265
266	Regressions (columns on rows)
267	----------- -------- -- -----
268
269	1.0000 0.3297
270	1.9430 1.0000
271
272	Correlations
273	------------
274
275	1.0000 0.8004
276	0.8004 1.0000
277
278	</PRE>
279	</TD></TR></TABLE>
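<P>
The regression and correlation tables in the test output above follow arithmetically from the covariance matrix printed there. The Python sketch below shows the relationship; the function names are hypothetical, and the covariance values are copied from that output.

```python
import math


def regressions(cov):
    """Entry (i, j) is the slope of character j regressed on character i."""
    p = len(cov)
    return [[cov[i][j] / cov[i][i] for j in range(p)] for i in range(p)]


def correlations(cov):
    """Entry (i, j) is the correlation between characters i and j."""
    p = len(cov)
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(p)]
            for i in range(p)]


# Covariance matrix as printed in the test set output.
COV = [[4.1991, 1.3844],
       [1.3844, 0.7125]]

for row in regressions(COV):
    print("  ".join("%.4f" % value for value in row))
for row in correlations(COV):
    print("  ".join("%.4f" % value for value in row))
```

Each regression slope is a covariance divided by the variance of the row character, which is why the regression table is not symmetric while the correlation table is.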
280	</BODY>
281	</HTML>
