<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
<TITLE>contchar</TITLE>
<META NAME="description" CONTENT="contchar">
<META NAME="keywords" CONTENT="contchar">
<META NAME="resource-type" CONTENT="document">
<META NAME="distribution" CONTENT="global">
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
</HEAD>
<BODY BGCOLOR="#ccffff">
<DIV ALIGN=RIGHT>
version 3.6
</DIV>
<P>
<DIV ALIGN=CENTER>
<H1>CONTRAST -- Computes contrasts for comparative method</H1>
</DIV>
<P>
<PRE>
</PRE>
<P>
© Copyright 1991-2002 by the University of
Washington. Written by Joseph Felsenstein. Permission is granted to copy
this document provided that no fee is charged for it and that this copyright
notice is not removed.
<P>
This program implements the contrasts calculation described in my 1985
paper on the comparative method (Felsenstein, 1985d). It reads in a
data set of standard continuous quantitative characters, and also a
tree from the treefile. It then forms, for each character, the contrasts
between species that, according to that tree, are statistically
independent of one another. The contrasts are all standardized by
branch lengths (actually, by the square roots of branch lengths), so that
under the model they all have the same expected variance.
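<P>
The pruning calculation can be sketched as follows. This is a minimal Python illustration written for this page, not the code of CONTRAST itself. It assumes a strictly bifurcating tree encoded as nested tuples (a tip is <TT>(name, branch_length)</TT>, an interior node is <TT>(left, right, branch_length)</TT>); this encoding and the function name <TT>pic</TT> are hypothetical. At each fork the two values are differenced and divided by the square root of the sum of their (possibly lengthened) branch lengths. The fork then receives the inverse-variance weighted average of the two values, and the branch below it is lengthened by v<SUB>1</SUB>v<SUB>2</SUB>/(v<SUB>1</SUB>+v<SUB>2</SUB>) to allow for the uncertainty in that average.

```python
import math


def pic(node, values):
    """Return (value at node, adjusted branch length, list of standardized contrasts).

    Hypothetical encoding: a tip is (name, length); an interior node is
    (left, right, length). The root's own length is never used in a contrast.
    """
    if len(node) == 2:  # tip: look up its phenotype
        name, length = node
        return values[name], length, []
    left, right, length = node
    x1, v1, c1 = pic(left, values)
    x2, v2, c2 = pic(right, values)
    contrast = (x1 - x2) / math.sqrt(v1 + v2)   # standardized contrast
    x = (x1 * v2 + x2 * v1) / (v1 + v2)         # inverse-variance weighted average
    extra = v1 * v2 / (v1 + v2)                 # extra length for the estimated value
    return x, length + extra, c1 + c2 + [contrast]


# Test-set tree and character 1 from the example input on this page.
hp = (("Homo", 0.21), ("Pongo", 0.21), 0.28)
hpm = (hp, ("Macaca", 0.49), 0.13)
hpma = (hpm, ("Ateles", 0.62), 0.38)
tree = (hpma, ("Galago", 1.00), 0.0)
char1 = {"Homo": 4.09434, "Pongo": 3.61092, "Macaca": 2.37024,
         "Ateles": 2.02815, "Galago": -1.46968}

_, _, contrasts = pic(tree, char1)
print(len(contrasts))  # 4: one fewer than the number of tips
print([round(c, 4) for c in contrasts])
```

Running the function once per character gives one column of contrasts per character. The sketch does not handle a trifurcation at the bottommost fork.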
<P>
The method is explained in the 1985 paper. It assumes
a Brownian motion model. This model was introduced by Edwards and
Cavalli-Sforza (1964; Cavalli-Sforza and Edwards, 1967)
as an approximation to the evolution of gene frequencies. I have
discussed (Felsenstein, 1973b, 1981c, 1985d, 1988b) the difficulties
inherent in using it as a model for the evolution of quantitative
characters. Chief among these is that the characters do not necessarily evolve
independently or at equal rates. This program allows you to evaluate this,
provided that there is independent information on the phylogeny. You can
compute the variance of the contrasts for each character, as a measure of
the variance accumulating per unit branch length. You can also test
the covariances of characters.
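<P>
Why the variance of the contrasts measures the variance accumulating per unit branch length can be checked with a small simulation. The Python sketch below is illustrative only and assumes the simplest possible case: two sister species whose branches have lengths v<SUB>1</SUB> and v<SUB>2</SUB>, evolving by Brownian motion with rate sigma<SUP>2</SUP> per unit branch length (the function name and parameter values are hypothetical). Their difference then has variance sigma<SUP>2</SUP>(v<SUB>1</SUB>+v<SUB>2</SUB>), so dividing it by the square root of v<SUB>1</SUB>+v<SUB>2</SUB> gives a contrast with variance sigma<SUP>2</SUP>, and the mean of the squared contrasts estimates sigma<SUP>2</SUP>. No mean is subtracted, because each contrast has expectation zero.

```python
import math
import random


def mean_squared_contrast(rate, v1, v2, n_pairs, seed=0):
    """Simulate n_pairs independent sister-species pairs under Brownian motion
    and return the average squared standardized contrast."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        ancestor = rng.gauss(0.0, 1.0)  # ancestral value; it cancels in the contrast
        x1 = ancestor + rng.gauss(0.0, math.sqrt(rate * v1))  # change along branch 1
        x2 = ancestor + rng.gauss(0.0, math.sqrt(rate * v2))  # change along branch 2
        contrast = (x1 - x2) / math.sqrt(v1 + v2)             # standardized contrast
        total += contrast * contrast
    return total / n_pairs  # divide by n: the contrasts have known mean zero


estimate = mean_squared_contrast(rate=1.5, v1=0.2, v2=0.7, n_pairs=50000)
print(round(estimate, 2))  # close to the true rate of 1.5
```

Changing <TT>v1</TT> and <TT>v2</TT> leaves the estimate near the same rate, which is the point of standardizing by branch length.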
<P>
The input file is as described in the continuous characters
documentation file above, for the case of continuous quantitative
characters (not gene frequencies). Options are selected using a menu:
<P>
<TABLE><TR><TD BGCOLOR=white>
<PRE>

Continuous character comparative analysis, version 3.6a3

Settings for this run:
  W     within-population variation in data?  No, species values are means
  R  Print out correlations and regressions?  Yes
  A   LRT test of no phylogenetic component?  Yes, with and without VarA
  C                     Print out contrasts?  No
  M                  Analyze multiple trees?  No
  0      Terminal type (IBM PC, ANSI, none)?  (none)
  1       Print out the data at start of run  No
  2     Print indications of progress of run  Yes

  Y to accept these or type the letter for one to change

</PRE>
</TD></TR></TABLE>
<P>
Option W makes the program expect not means of the phenotypes in each
species, but phenotypes of individual specimens. The details of
the input file format in that case are given below. The
program then estimates the covariances of the phenotypic change, as well as
the covariances of within-species phenotypic variation. The model used is
similar to (but not identical to) that of Lynch (1990). The
algorithms used differ from the ones he
gives in that paper. They will be described in a forthcoming paper by
me. When there are within-species samples, the program uses contrasts
internally, but it does not make sense to write them out to an
output file for direct analysis. They are of two kinds: contrasts
within species and contrasts between species. The former are
affected only by the within-species phenotypic covariation, but the
latter are affected by both within- and between-species covariation.
CONTRAST infers these two kinds of covariances and writes the
estimates out.
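<P>
To make the idea of contrasts within a species concrete, here is a Python sketch of one standard choice, Helmert contrasts. It is an illustration under assumptions, not a description of CONTRAST's internals: the function name is hypothetical, and which orthogonal contrasts the program actually uses is not stated on this page. For a species with n sampled individuals it returns n-1 contrasts, the k-th comparing the mean of the first k individuals with individual k+1. They are orthonormal and orthogonal to the species mean, so adding a constant to every individual leaves them unchanged, and their squares sum to the sum of squared deviations from the sample mean.

```python
import math


def helmert_contrasts(values):
    """Return len(values) - 1 orthonormal contrasts among one species' individuals."""
    contrasts = []
    running_sum = 0.0
    for k in range(1, len(values)):
        running_sum += values[k - 1]
        mean_first_k = running_sum / k
        # Scaling by sqrt(k / (k + 1)) gives each contrast's coefficients unit length.
        contrasts.append((mean_first_k - values[k]) * math.sqrt(k / (k + 1)))
    return contrasts


# First character of the three Gammarus individuals in the input format shown below.
gammarus = [6.57, 7.62, 6.02]
c = helmert_contrasts(gammarus)
mean = sum(gammarus) / len(gammarus)
print(len(c))  # 2 contrasts for 3 individuals
print(sum(x * x for x in c), sum((v - mean) ** 2 for v in gammarus))  # equal up to rounding
```

A species with only one individual yields no within-species contrasts at all.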
<P>
Option M is similar to the usual Multiple Data Sets option, but here it
allows multiple trees to be read from the treefile, rather than multiple
data sets from the input file. For example, you can
use bootstrapping on the data that estimated these trees, get
multiple bootstrap estimates of the tree, and then use the M
option to make multiple analyses of the contrasts and the
covariances, correlations, and regressions. In this way (Felsenstein,
1988b) you can assess the effect of the inaccuracy of the trees on
your estimates of these statistics.
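<P>
A treefile for the M option simply holds several trees one after another, each ending in a semicolon. If you want to check how many trees a bootstrap treefile contains before running the program, a minimal Python sketch like the one below will split it. It assumes plain Newick text with unquoted labels and no bracketed comments; the function name is hypothetical, the second tree in the example is made up, and this is not how CONTRAST itself reads the file.

```python
def split_trees(text):
    """Split the text of a multi-tree Newick file into one string per tree."""
    trees = []
    for chunk in text.split(";"):
        chunk = "".join(chunk.split())  # drop line breaks and blanks inside a tree
        if chunk:
            trees.append(chunk + ";")
    return trees


# The test-set tree followed by a made-up second tree spread over two lines.
treefile = """((((Homo:0.21,Pongo:0.21):0.28,Macaca:0.49):0.13,Ateles:0.62):0.38,Galago:1.00);
((((Homo:0.20,Pongo:0.22):0.27,Macaca:0.50):0.12,
Ateles:0.63):0.39,Galago:0.98);
"""
trees = split_trees(treefile)
print(len(trees))  # 2
```

With a real file, the text would come from something like <TT>open("intree").read()</TT>.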
<P>
Option R turns the printing of the statistics on or off.
If it is off, only the contrasts will be printed out (unless option
1 is selected). With only the contrasts printed out, they appear in
a simple array that many statistics packages should
be able to read. The contrasts are rows, and each row has one contrast
for each character. Any multivariate statistics package should be able
to analyze these, but keep in mind that the contrasts have, by virtue
of the way they are generated, expectation zero, so all regressions
must pass through the origin. If the W option has been set to
analyze within-species as well as between-species variation, the R
option does not appear in the menu, as the regression and correlation
statistics should always be computed in that case.
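<P>
The difference between a regression through the origin and an ordinary regression with an intercept can matter for contrasts. The Python sketch below reads a contrasts array of the kind described above (whitespace-separated numbers, one row per contrast and one column per character) and computes both slopes of character 2 on character 1. The function names and the numbers in the array are hypothetical, made up for illustration. The through-origin slope is the sum of cross-products divided by the sum of squares, with no means subtracted.

```python
def read_contrasts(text):
    """Parse a whitespace-separated array: rows are contrasts, columns are characters."""
    return [[float(v) for v in line.split()] for line in text.splitlines() if line.strip()]


def slope_through_origin(xs, ys):
    # Appropriate for contrasts: both variables have expectation exactly zero.
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def slope_with_intercept(xs, ys):
    # Ordinary least squares; subtracting sample means is NOT appropriate for contrasts.
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


rows = read_contrasts("""
1.0  3.0
2.0  5.0
3.0  7.0
""")
xs = [r[0] for r in rows]
ys = [r[1] for r in rows]
print(slope_through_origin(xs, ys))  # 34/14, about 2.4286
print(slope_with_intercept(xs, ys))  # 2.0
```

Most statistics packages have a "no intercept" option for regression that gives the first of these two slopes.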
<P>
As usual, the tree file has the default name <TT>intree</TT>. It
should contain the desired tree or trees. Each tree can be
either fully bifurcating or have a trifurcation at its bottommost fork
(it should not matter which of these ways you present the tree).
The tree must, of course, have branch lengths.
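<P>
Since every branch must carry a length, it can be worth checking a treefile before running the program. The following Python sketch is a minimal recursive-descent reader for a single Newick tree. It is hypothetical (not part of PHYLIP), assumes unquoted tip names, no interior-node labels and no comments, and reports an error if any branch other than the root lacks a length or if the bottommost fork has other than two or three branches.

```python
def parse_newick(text):
    """Parse one Newick tree into nested dicts with 'name', 'length', 'children'."""
    s = "".join(text.split())  # ignore blanks and line breaks
    pos = 0

    def node():
        nonlocal pos
        children, name = [], ""
        if s[pos] == "(":
            pos += 1
            children.append(node())
            while s[pos] == ",":
                pos += 1
                children.append(node())
            if s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
        else:
            start = pos
            while s[pos] not in ":,);":
                pos += 1
            name = s[start:pos]
        length = None
        if s[pos] == ":":
            pos += 1
            start = pos
            while s[pos] not in ",);":
                pos += 1
            length = float(s[start:pos])
        return {"name": name, "length": length, "children": children}

    root = node()
    if s[pos] != ";":
        raise ValueError("tree must end with ';'")
    if len(root["children"]) not in (2, 3):
        raise ValueError("bottommost fork must be a bifurcation or trifurcation")
    check_lengths(root, is_root=True)
    return root


def check_lengths(n, is_root=False):
    """Raise ValueError if any non-root branch has no length."""
    if not is_root and n["length"] is None:
        raise ValueError(f"branch to {n['name'] or 'an interior node'} has no length")
    for child in n["children"]:
        check_lengths(child)


def tips(n):
    return [n["name"]] if not n["children"] else [t for c in n["children"] for t in tips(c)]


tree = parse_newick(
    "((((Homo:0.21,Pongo:0.21):0.28,Macaca:0.49):0.13,Ateles:0.62):0.38,Galago:1.00);")
print(tips(tree))  # ['Homo', 'Pongo', 'Macaca', 'Ateles', 'Galago']
```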
<P>
If you have a molecular data set (for example) and also, on the same
species, quantitative measurements, here is how you can allow for the
uncertainty of your estimate of the tree. Use SEQBOOT to generate multiple
data sets from your molecular data. Then analyze them with a method that
produces estimates of the branch lengths (DNAML, DNAMLK, FITCH, KITSCH, or
NEIGHBOR; the last three require you first to use DNADIST to turn the
bootstrap data sets into multiple distance matrices), using the Multiple
Data Sets option of that program. This will result in a tree file with many
trees in it. Then use this tree file with the input file containing
your continuous quantitative characters, choosing the Multiple Trees
(M) option. You will get one set of contrasts and statistics for each
tree in the tree file. At the moment there is no overall summary:
you will have to tabulate these by hand. A similar process can be
followed if you have restriction sites data (using RESTML) or
gene frequency data.
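<P>
Until there is an overall summary, the tabulation can be done with a few lines of code once the values have been copied out of the output file. The Python sketch below summarizes one statistic (say, the correlation between two characters) over the trees. The function name is hypothetical, and the numbers in the example are made up for illustration, not results from any real analysis. The spread of the values across bootstrap trees reflects the uncertainty contributed by the estimate of the tree.

```python
import statistics


def summarize(values):
    """Summarize one statistic computed once per bootstrap tree."""
    ordered = sorted(values)
    return {
        "trees": len(ordered),
        "mean": statistics.mean(ordered),
        "sd": statistics.stdev(ordered) if len(ordered) > 1 else 0.0,
        "min": ordered[0],
        "max": ordered[-1],
    }


# Hypothetical correlations, one per tree in a bootstrap treefile.
correlations = [0.78, 0.81, 0.74, 0.80, 0.83]
for key, value in summarize(correlations).items():
    print(f"{key:>5}: {value:.4f}" if isinstance(value, float) else f"{key:>5}: {value}")
```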
<P>
The statistics that are printed out include the covariances between
all pairs of characters, the regression of each character on each
other character (column j is regressed on row i), and the correlations
between all pairs of characters. In assessing degrees of freedom it is
important to realize that each contrast was taken to have
expectation zero, which is known because each contrast could as
easily have been computed as x<SUB>i</SUB>-x<SUB>j</SUB> instead of
x<SUB>j</SUB>-x<SUB>i</SUB>. Thus there is no
loss of a degree of freedom for estimation of a mean. The number of degrees
of freedom is thus the same as the number of contrasts, namely one
less than the number of species (tips). If you feed these contrasts
into a multivariate statistics program, make sure that it knows that
each variable has expectation exactly zero.
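<P>
The three kinds of statistics are simple functions of the contrasts. The Python sketch below shows one way to compute them; it is an illustration, not the program's code, and the function names are hypothetical. Following the degrees-of-freedom argument above, the covariances are sums of cross-products of contrasts divided by the number of contrasts, with no mean subtracted (this divisor is an assumption of the sketch). Entry [i][j] of the regression matrix is the slope of character j on character i, cov(i,j)/var(i), and the correlation is cov(i,j) divided by the square root of var(i)var(j).

```python
import math


def covariances(rows):
    """rows[k][i] is contrast k for character i; contrasts have known mean zero."""
    n, p = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / n for j in range(p)] for i in range(p)]


def regressions(cov):
    """Entry [i][j]: regression of character j (column) on character i (row)."""
    p = len(cov)
    return [[cov[i][j] / cov[i][i] for j in range(p)] for i in range(p)]


def correlations(cov):
    p = len(cov)
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(p)]
            for i in range(p)]


# Covariance matrix as printed in the test set output at the end of this page.
cov = [[4.1991, 1.3844],
       [1.3844, 0.7125]]
for row in regressions(cov):
    print("  ".join(f"{v:9.4f}" for v in row))
for row in correlations(cov):
    print("  ".join(f"{v:9.4f}" for v in row))
```

Applied to the printed covariance matrix, these functions give the regression and correlation matrices shown in the test set output (0.3297, 1.9430 and 0.8004 after rounding).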
<P>
<DIV ALIGN=CENTER>
<H2>Within-species variation</H2>
</DIV>
With the W option selected, CONTRAST analyzes data sets with variation within
species, using a model like that proposed by Michael Lynch (1990).
The data set should then have this structure (on the left are the data,
on the right my comments):
<P>
<TABLE><TR><TD bgcolor=white>
<PRE>
10 5
Alpha     2
2.01 5.3 1.5 -3.41 0.3
1.98 4.3 2.1 -2.98 0.45
Gammarus  3
6.57 3.1 2.0 -1.89 0.6
7.62 3.4 1.9 -2.01 0.7
6.02 3.0 1.9 -2.03 0.6
...
</PRE>
</TD>
<TD>
<PRE>
number of species, number of characters
name of 1st species, # of individuals
data for individual #1
data for individual #2
name of 2nd species, # of individuals
data for individual #1
data for individual #2
data for individual #3
(and so on)
</PRE>
</TD></TR></TABLE>
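<P>
A data file in this layout can be read as in the Python sketch below. It is a hypothetical helper, not PHYLIP code, and it assumes, as in other PHYLIP input files, that each species name occupies the first 10 columns of its line with the number of individuals after it, and that each individual's values fit on one line. The example is an abridged version of the file above, with the first line changed to say that only two species follow.

```python
def read_within_species(text, name_width=10):
    """Return {species name: list of per-individual rows of character values}."""
    lines = [line for line in text.splitlines() if line.strip()]
    n_species, n_chars = (int(t) for t in lines[0].split())
    data, i = {}, 1
    for _ in range(n_species):
        header = lines[i]
        i += 1
        name = header[:name_width].strip()          # assumed fixed-width name field
        n_individuals = int(header[name_width:].split()[0])
        rows = []
        for _ in range(n_individuals):
            values = [float(t) for t in lines[i].split()]
            i += 1
            if len(values) != n_chars:
                raise ValueError(f"{name}: expected {n_chars} values, got {len(values)}")
            rows.append(values)
        data[name] = rows
    return data


example = """2 5
Alpha     2
2.01 5.3 1.5 -3.41 0.3
1.98 4.3 2.1 -2.98 0.45
Gammarus  3
6.57 3.1 2.0 -1.89 0.6
7.62 3.4 1.9 -2.01 0.7
6.02 3.0 1.9 -2.03 0.6
"""
data = read_within_species(example)
print({name: len(rows) for name, rows in data.items()})  # {'Alpha': 2, 'Gammarus': 3}
```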
<P>
The covariances, correlations, and regressions for the "additive"
(between-species evolutionary variation) and "environmental" (within-species
phenotypic variation) components are
printed out (the maximum likelihood estimates of each).
The program also estimates the within-species phenotypic variation in the
case where the between-species evolutionary covariances are forced to be
zero. The log-likelihoods of these two cases are compared and a
likelihood ratio test (LRT) is carried out. The program prints the result
of this test as a chi-square variate, and gives the number of degrees of
freedom of the LRT. You have to look up the chi-square variate in a
table of the chi-square distribution.
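<P>
Instead of using a printed table, the tail probability of the chi-square variate can also be computed directly. The following Python sketch (hypothetical helper names; not part of PHYLIP) forms the usual likelihood ratio statistic, twice the difference of the two log-likelihoods, and evaluates the upper tail of the chi-square distribution from the series for the regularized incomplete gamma function, so it needs only the standard library. Use the number of degrees of freedom that the program reports; the log-likelihoods in the example are invented.

```python
import math


def chi2_upper_tail(stat, df):
    """P(X >= stat) for a chi-square variate X with df degrees of freedom."""
    if stat <= 0:
        return 1.0
    a, x = df / 2.0, stat / 2.0
    # Series for the lower regularized incomplete gamma function P(a, x).
    term = 1.0 / a
    total = term
    n = 0
    while term > 1e-16 * total and n < 100000:
        n += 1
        term *= x / (a + n)
        total += term
    lower = total * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def likelihood_ratio_test(loglik_full, loglik_null, df):
    """Return (chi-square statistic, upper-tail probability)."""
    stat = 2.0 * (loglik_full - loglik_null)
    return stat, chi2_upper_tail(stat, df)


stat, p = likelihood_ratio_test(-10.0, -13.0, 3)  # invented log-likelihoods
print(stat, round(p, 4))
```

Because <TT>1.0 - lower</TT> loses relative precision for very small tail probabilities, this sketch is adequate for judging significance at the usual levels but not for reporting tiny p-values exactly.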
<P>
The log-likelihood of the data under the models with and without
between-species ...
For the moment the program cannot handle the case where
within-species variation is to be taken into account but where only species
means are available. (It can handle cases where some species have only one
member in their sample.)
<P>
We hope to fix this soon. We are also working toward
incorporating full-sib, half-sib, or clonal groups within species, so as
to do a single analysis of within-species genetic and between-species
phylogenetic variation.
<P>
The data set used as an example below is the example from a
paper by Michael Lynch (1990), with his characters log-transformed.
In the case where there is only one specimen per species, Lynch's model
is identical to our model of within-species variation (for
multiple individuals per species it is not a subcase of his model).
<P>
<HR>
<P>
<H3>TEST SET INPUT</H3>
<P>
<TABLE><TR><TD BGCOLOR=white>
<PRE>
    5    2
Homo         4.09434   4.74493
Pongo        3.61092   3.33220
Macaca       2.37024   3.36730
Ateles       2.02815   2.89037
Galago      -1.46968   2.30259
</PRE>
<P>
</TD></TR></TABLE>
<HR>
<P>
<H3>TEST SET INPUT TREEFILE</H3>
<P>
<TABLE><TR><TD BGCOLOR=white>
<PRE>
((((Homo:0.21,Pongo:0.21):0.28,Macaca:0.49):0.13,Ateles:0.62):0.38,Galago:1.00);
</PRE>
</TD></TR></TABLE>
<P>
<HR>
<P>
<H3>TEST SET OUTPUT (with all numerical options on)</H3>
<P>
<TABLE><TR><TD BGCOLOR=white>
<PRE>

Continuous character contrasts analysis, version 3.6a3

5 Populations, 2 Characters

Name          Phenotypes
----          ----------

Homo         4.09434   4.74493
Pongo        3.61092   3.33220
Macaca       2.37024   3.36730
Ateles       2.02815   2.89037
Galago      -1.46968   2.30259


Covariance matrix
---------- ------

    4.1991    1.3844
    1.3844    0.7125

Regressions (columns on rows)
----------- -------- -- -----

    1.0000    0.3297
    1.9430    1.0000

Correlations
------------

    1.0000    0.8004
    0.8004    1.0000

</PRE>
</TD></TR></TABLE>
</BODY>
</HTML>