| 1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> |
|---|
| 2 | <HTML> |
|---|
| 3 | <HEAD> |
|---|
| 4 | <TITLE>contchar</TITLE> |
|---|
| 5 | <META NAME="description" CONTENT="contchar"> |
|---|
| 6 | <META NAME="keywords" CONTENT="contchar"> |
|---|
| 7 | <META NAME="resource-type" CONTENT="document"> |
|---|
| 8 | <META NAME="distribution" CONTENT="global"> |
|---|
| 9 | <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1"> |
|---|
| 10 | </HEAD> |
|---|
| 11 | <BODY BGCOLOR="#ccffff"> |
|---|
| 12 | <DIV ALIGN=RIGHT> |
|---|
| 13 | version 3.6 |
|---|
| 14 | </DIV> |
|---|
| 15 | <P> |
|---|
| 16 | <DIV ALIGN=CENTER> |
|---|
| 17 | <H1>CONTRAST -- Computes contrasts for comparative method</H1> |
|---|
| 18 | </DIV> |
|---|
| 19 | <P> |
|---|
| 20 | <PRE> |
|---|
| 21 | </PRE> |
|---|
| 22 | <P> |
|---|
| 23 | © Copyright 1991-2002 by the University of |
|---|
| 24 | Washington. Written by Joseph Felsenstein. Permission is granted to copy |
|---|
| 25 | this document provided that no fee is charged for it and that this copyright |
|---|
| 26 | notice is not removed. |
|---|
| 27 | <P> |
|---|
| 28 | This program implements the contrasts calculation described in my 1985 |
|---|
| 29 | paper on the comparative method (Felsenstein, 1985d). It reads in a |
|---|
| 30 | data set of the standard quantitative characters sort, and also a |
|---|
| 31 | tree from the treefile. It then forms the contrasts between species |
|---|
| 32 | that, according to that tree, are statistically independent. This is |
|---|
| 33 | done for each character. The contrasts are all standardized by |
|---|
| 34 | branch lengths (actually, square roots of branch lengths). |
|---|
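The contrast calculation described above can be illustrated with a short Python sketch. It is not CONTRAST's own code: it follows the usual description of the 1985 method (each contrast is divided by the square root of the sum of the two branch lengths, each interior node gets a weighted average of its descendants, and the branch below it is lengthened), and its numbers are not guaranteed to reproduce the program's printed output. The nested-tuple tree layout and the function name are assumptions made for this example; the tree and the first character are taken from the test set in this document.

```python
import math

def standardized_contrasts(tree, values):
    """Return one standardized contrast per interior node of a bifurcating tree.

    tree   : a tip name (str), or a pair ((child, length), (child, length))
    values : dict mapping tip name -> character value
    """
    contrasts = []

    def visit(node):
        # Returns (value at node, extra length to add to the branch below it).
        if isinstance(node, str):
            return values[node], 0.0
        (left, v_left), (right, v_right) = node
        x_left, extra_left = visit(left)
        x_right, extra_right = visit(right)
        v_left += extra_left
        v_right += extra_right
        # Contrast standardized by the square root of the summed branch lengths.
        contrasts.append((x_left - x_right) / math.sqrt(v_left + v_right))
        # Weighted average of the two descendants (weights 1/branch length).
        x_node = (x_left / v_left + x_right / v_right) / (1 / v_left + 1 / v_right)
        return x_node, v_left * v_right / (v_left + v_right)

    visit(tree)
    return contrasts

# Tree and first character from the test set in this document.
hp = (("Homo", 0.21), ("Pongo", 0.21))
hpm = ((hp, 0.28), ("Macaca", 0.49))
hpma = ((hpm, 0.13), ("Ateles", 0.62))
tree = ((hpma, 0.38), ("Galago", 1.00))
char1 = {"Homo": 4.09434, "Pongo": 3.61092, "Macaca": 2.37024,
         "Ateles": 2.02815, "Galago": -1.46968}

c = standardized_contrasts(tree, char1)
print(len(c), [round(x, 4) for x in c])
```

A tree with five tips yields four contrasts, one per interior node, which is the "one less than the number of species" count used for the degrees of freedom.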
| 35 | <P> |
|---|
| 36 | The method is explained in the 1985 paper. It assumes |
|---|
| 37 | a Brownian motion model. This model was introduced by Edwards and |
|---|
| 38 | Cavalli-Sforza (1964; Cavalli-Sforza and Edwards, 1967) |
|---|
| 39 | as an approximation to the evolution of gene frequencies. I have |
|---|
| 40 | discussed (Felsenstein, 1973b, 1981c, 1985d, 1988b) the difficulties |
|---|
| 41 | inherent in using it as a model for the evolution of quantitative |
|---|
| 42 | characters. Chief among these is that the characters do not necessarily evolve |
|---|
| 43 | independently or at equal rates. This program allows one to evaluate this, |
|---|
| 44 | if there is independent information on the phylogeny. You can |
|---|
| 45 | compute the variance of the contrasts for each character, as a measure of |
|---|
| 46 | the variance accumulating per unit branch length. You can also test |
|---|
| 47 | covariances of characters. |
|---|
| 48 | <P> |
|---|
| 49 | The input file is as described in the continuous characters |
|---|
| 50 | documentation file above, for the case of continuous quantitative |
|---|
| 51 | characters (not gene frequencies). Options are selected using a menu: |
|---|
| 52 | <P> |
|---|
| 53 | <TABLE><TR><TD BGCOLOR=white> |
|---|
| 54 | <PRE> |
|---|
| 55 | |
|---|
| 56 | Continuous character comparative analysis, version 3.6a3 |
|---|
| 57 | |
|---|
| 58 | Settings for this run: |
|---|
| 59 | W within-population variation in data? No, species values are means |
|---|
| 60 | R Print out correlations and regressions? Yes |
|---|
| 61 | A LRT test of no phylogenetic component? Yes, with and without VarA |
|---|
| 62 | C Print out contrasts? No |
|---|
| 63 | M Analyze multiple trees? No |
|---|
| 64 | 0 Terminal type (IBM PC, ANSI, none)? (none) |
|---|
| 65 | 1 Print out the data at start of run No |
|---|
| 66 | 2 Print indications of progress of run Yes |
|---|
| 67 | |
|---|
| 68 | Y to accept these or type the letter for one to change |
|---|
| 69 | |
|---|
| 70 | </PRE> |
|---|
| 71 | </TD></TR></TABLE> |
|---|
| 72 | <P> |
|---|
| 73 | Option W makes the program expect not means of the phenotypes in each |
|---|
| 74 | species, but phenotypes of individual specimens. The details of |
|---|
| 75 | the input file format in that case are given below. In that case the |
|---|
| 76 | program estimates the covariances of the phenotypic change, as well as |
|---|
| 77 | covariances of within-species phenotypic variation. The model used is |
|---|
| 78 | similar to (but not identical to) that of Lynch (1990). The |
|---|
| 79 | algorithms used differ from the ones he |
|---|
| 80 | gives in that paper. They will be described in a forthcoming paper by |
|---|
| 81 | me. When the data include within-species samples, contrasts are still used by |
|---|
| 82 | the program, but it does not make sense to write them out to an |
|---|
| 83 | output file for direct analysis. They are of two kinds, contrasts |
|---|
| 84 | within species and contrasts between species. The former are |
|---|
| 85 | affected only by the within-species phenotypic covariation, but the |
|---|
| 86 | latter are affected by both within- and between-species covariation. |
|---|
| 87 | CONTRAST infers these two kinds of covariances and writes the |
|---|
| 88 | estimates out. |
|---|
| 89 | <P> |
|---|
| 90 | M is similar to the usual multiple data sets input option, but is used here |
|---|
| 91 | to allow multiple trees to be read from the treefile, not multiple |
|---|
| 92 | data sets to be read from the input file. In this way you can |
|---|
| 93 | use bootstrapping on the data that estimated these trees, get |
|---|
| 94 | multiple bootstrap estimates of the tree, and then use the M |
|---|
| 95 | option to make multiple analyses of the contrasts and the |
|---|
| 96 | covariances, correlations, and regressions. In this way (Felsenstein, |
|---|
| 97 | 1988b) you can assess the effect of the inaccuracy of the trees on |
|---|
| 98 | your estimates of these statistics. |
|---|
| 99 | <P> |
|---|
| 100 | R allows you to turn off or on the printing out of the statistics. |
|---|
| 101 | If it is off only the contrasts will be printed out (unless option |
|---|
| 102 | 1 is selected). With only the contrasts printed out, they are in |
|---|
| 103 | a simple array that is in a form that many statistics packages should |
|---|
| 104 | be able to read. The contrasts are rows, and each row has one contrast |
|---|
| 105 | for each character. Any multivariate statistics package should be able |
|---|
| 106 | to analyze these (but keep in mind that the contrasts have, by virtue |
|---|
| 107 | of the way they are generated, expectation zero, so all regressions |
|---|
| 108 | must pass through the origin). If the W option has been set to |
|---|
| 109 | analyze within-species as well as between-species variation, the R |
|---|
| 110 | option does not appear in the menu as the regression and correlation |
|---|
| 111 | statistics should always be computed in that case. |
|---|
| 112 | <P> |
|---|
| 113 | As usual, the tree file has the default name <TT>intree</TT>. It |
|---|
| 114 | should contain the desired tree or trees. These can be |
|---|
| 115 | either in bifurcating form, or may have the bottommost fork be a |
|---|
| 116 | trifurcation (it should not matter which of these ways you present the tree). |
|---|
| 117 | The tree must, of course, have branch lengths. |
|---|
| 118 | <P> |
|---|
| 119 | If you have a molecular data set (for example) and also, on the same |
|---|
| 120 | species, quantitative measurements, here is how you can allow for the |
|---|
| 121 | uncertainty of your estimate of the tree. Use SEQBOOT to generate multiple |
|---|
| 122 | data sets from your molecular data. Then, whichever method you use to |
|---|
| 123 | analyze it (the relevant ones are those that produce estimates of the |
|---|
| 124 | branch lengths: DNAML, DNAMLK, FITCH, KITSCH, and NEIGHBOR -- the latter |
|---|
| 125 | three require you to use DNADIST to turn the bootstrap data sets into |
|---|
| 126 | multiple distance matrices), you should use the Multiple Data Sets |
|---|
| 127 | option of that program. This will result in a tree file with many |
|---|
| 128 | trees on it. Then use this tree file with the input file containing |
|---|
| 129 | your continuous quantitative characters, choosing the Multiple Trees |
|---|
| 130 | (M) option. You will get one set of contrasts and statistics for each |
|---|
| 131 | tree in the tree file. At the moment there is no overall summary: |
|---|
| 132 | you will have to tabulate these by hand. A similar process can be |
|---|
| 133 | followed if you have restriction sites data (using RESTML) or |
|---|
| 134 | gene frequencies data. |
|---|
| 135 | <P> |
|---|
| 136 | The statistics that are printed out include the covariances between |
|---|
| 137 | all pairs of characters, the regressions of each character on each |
|---|
| 138 | other (column j is regressed on row i), and the correlations between |
|---|
| 139 | all pairs of characters. In assessing degrees of freedom it is |
|---|
| 140 | important to realize that each contrast was taken to have |
|---|
| 141 | expectation zero, which is known because each contrast could as |
|---|
| 142 | easily have been computed xi-xj instead of xj-xi. Thus there is no |
|---|
| 143 | loss of a degree of freedom for estimation of a mean. The degrees |
|---|
| 144 | of freedom is thus the same as the number of contrasts, namely one |
|---|
| 145 | less than the number of species (tips). If you feed these contrasts |
|---|
| 146 | into a multivariate statistics program make sure that it knows that |
|---|
| 147 | each variable has expectation exactly zero. |
|---|
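As a hedged illustration of these statistics, the Python sketch below computes covariances, regressions and correlations from an array of contrasts without subtracting a mean, since each contrast has expectation zero. The divisor is assumed to be the number of contrasts, in line with the degrees-of-freedom argument above; the function name and the two rows of contrast values are made up for the example. The regression entry in row i, column j is cov[i][j] / cov[i][i], which is consistent with the sample output in this document (1.3844 / 4.1991 = 0.3297).

```python
import math

def contrast_statistics(rows):
    """rows: list of contrasts, each a list with one value per character."""
    n = len(rows)      # degrees of freedom = number of contrasts (assumed divisor)
    p = len(rows[0])   # number of characters
    # Sums of products about zero, not about the sample mean.
    cov = [[sum(r[i] * r[j] for r in rows) / n for j in range(p)]
           for i in range(p)]
    # Regression through the origin: column j regressed on row i.
    reg = [[cov[i][j] / cov[i][i] for j in range(p)] for i in range(p)]
    corr = [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(p)]
            for i in range(p)]
    return cov, reg, corr

rows = [[1.0, 2.0], [-1.0, 0.0]]   # hypothetical contrasts, two characters
cov, reg, corr = contrast_statistics(rows)
print(cov, reg, corr)
```

A general-purpose statistics package will usually subtract the sample mean and divide by n - 1 by default, so those defaults need to be overridden when analyzing contrasts.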
| 148 | <P> |
|---|
| 149 | <DIV ALIGN=CENTER> |
|---|
| 150 | <H2>Within-species variation</H2> |
|---|
| 151 | </DIV> |
|---|
| 152 | With the W option selected, CONTRAST analyzes data sets with variation within |
|---|
| 153 | species, using a model like that proposed by Michael Lynch (1990). |
|---|
| 154 | If you select the W option for within-species variation, the data |
|---|
| 155 | set should have this structure (on the left are the data, on the right |
|---|
| 156 | my comments): |
|---|
| 157 | <P> |
|---|
| 158 | <TABLE><TR><TD bgcolor=white> |
|---|
| 159 | <PRE> |
|---|
| 160 | 10 5 |
|---|
| 161 | Alpha 2 |
|---|
| 162 | 2.01 5.3 1.5 -3.41 0.3 |
|---|
| 163 | 1.98 4.3 2.1 -2.98 0.45 |
|---|
| 164 | Gammarus 3 |
|---|
| 165 | 6.57 3.1 2.0 -1.89 0.6 |
|---|
| 166 | 7.62 3.4 1.9 -2.01 0.7 |
|---|
| 167 | 6.02 3.0 1.9 -2.03 0.6 |
|---|
| 168 | ... |
|---|
| 169 | </PRE> |
|---|
| 170 | </TD> |
|---|
| 171 | <TD> |
|---|
| 172 | <PRE> |
|---|
| 173 | number of species, number of characters |
|---|
| 174 | name of 1st species, # of individuals |
|---|
| 175 | data for individual #1 |
|---|
| 176 | data for individual #2 |
|---|
| 177 | name of 2nd species, # of individuals |
|---|
| 178 | data for individual #1 |
|---|
| 179 | data for individual #2 |
|---|
| 180 | data for individual #3 |
|---|
| 181 | (and so on) |
|---|
| 182 | </PRE> |
|---|
| 183 | </TD></TR></TABLE> |
|---|
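To make the layout concrete, here is a minimal Python sketch that reads a file of this shape. It is only an illustration, not the program's own input routine: it assumes whitespace-separated fields and species names without embedded spaces, and the function name is hypothetical. The sample text reuses the two species shown above, with the header changed to 2 species so that the fragment is complete.

```python
def parse_within_species(text):
    """Parse the W-option layout into {species name: list of individuals}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n_species, n_chars = (int(tok) for tok in lines[0].split())
    data, pos = {}, 1
    for _ in range(n_species):
        # Species line: name, then number of individuals (assumes no spaces in name).
        name, count = lines[pos].split()
        pos += 1
        rows = []
        for _ in range(int(count)):
            row = [float(tok) for tok in lines[pos].split()]
            if len(row) != n_chars:
                raise ValueError(f"{name}: expected {n_chars} values, got {len(row)}")
            rows.append(row)
            pos += 1
        data[name] = rows
    return data

sample = """\
2 5
Alpha 2
2.01 5.3 1.5 -3.41 0.3
1.98 4.3 2.1 -2.98 0.45
Gammarus 3
6.57 3.1 2.0 -1.89 0.6
7.62 3.4 1.9 -2.01 0.7
6.02 3.0 1.9 -2.03 0.6
"""
parsed = parse_within_species(sample)
print({name: len(rows) for name, rows in parsed.items()})
```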
| 184 | <P> |
|---|
| 185 | The covariances, correlations, and regressions for the "additive" |
|---|
| 186 | (between-species evolutionary variation) and "environmental" (within-species |
|---|
| 187 | phenotypic variation) are |
|---|
| 188 | printed out (the maximum likelihood estimates of each). |
|---|
| 189 | The program also estimates the within-species phenotypic variation in the |
|---|
| 190 | case where the between-species evolutionary covariances are forced to be |
|---|
| 191 | zero. The log-likelihoods of these two cases are compared and a |
|---|
| 192 | likelihood ratio test (LRT) is carried out. The program prints the result |
|---|
| 193 | of this test as a chi-square variate, and gives the number of degrees of |
|---|
| 194 | freedom of the LRT. You have to look up the chi-square variate in a |
|---|
| 195 | table of the chi-square distribution. |
|---|
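If no table is at hand, the tail probability can be computed directly. The Python sketch below is an illustration only and is not part of CONTRAST: it evaluates the upper tail of the chi-square distribution from the standard series for the regularized lower incomplete gamma function, using just the standard library. The function name is an assumption of this example.

```python
import math

def chi_square_tail(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variate with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, h = df / 2.0, x / 2.0
    # Series: P(a, h) = h^a e^(-h) / Gamma(a + 1) * sum_k h^k / ((a+1)...(a+k))
    term = 1.0
    total = 1.0
    k = 1
    while term > 1e-15 * total:
        term *= h / (a + k)
        total += term
        k += 1
    lower = math.exp(a * math.log(h) - h - math.lgamma(a + 1.0)) * total
    return max(0.0, 1.0 - lower)

# Familiar 5% critical values as a sanity check.
print(chi_square_tail(3.841, 1), chi_square_tail(5.991, 2))
```

For very large values of the statistic the subtraction from 1 loses precision, so this sketch is suited to ordinary significance levels rather than extreme tails.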
| 196 | <P> |
|---|
| 197 | The log-likelihood of the data under the models with and without |
|---|
| 198 | between-species For the moment the program cannot handle the case where |
|---|
| 199 | within-species variation is to be taken into account but where only species |
|---|
| 200 | means are available. (It can handle cases where some species have only one |
|---|
| 201 | member in their sample). |
|---|
| 202 | <P> |
|---|
| 203 | We hope to fix this soon. We are also on our way to |
|---|
| 204 | incorporating full-sib, half-sib, or clonal groups within species, so as |
|---|
| 205 | to do one analysis for within-species genetic and between-species |
|---|
| 206 | phylogenetic variation. |
|---|
| 207 | <P> |
|---|
| 208 | The data set used as an example below is the example from a |
|---|
| 209 | paper by Michael Lynch (1990), his characters having been log-transformed. |
|---|
| 210 | In the case where there is only one specimen per species, Lynch's model |
|---|
| 211 | is identical to our model of within-species variation (for |
|---|
| 212 | multiple individuals per species it is not a subcase of his model). |
|---|
| 213 | <P> |
|---|
| 214 | <HR> |
|---|
| 215 | <P> |
|---|
| 216 | <H3>TEST SET INPUT</H3> |
|---|
| 217 | <P> |
|---|
| 218 | <TABLE><TR><TD BGCOLOR=white> |
|---|
| 219 | <PRE> |
|---|
| 220 | 5 2 |
|---|
| 221 | Homo 4.09434 4.74493 |
|---|
| 222 | Pongo 3.61092 3.33220 |
|---|
| 223 | Macaca 2.37024 3.36730 |
|---|
| 224 | Ateles 2.02815 2.89037 |
|---|
| 225 | Galago -1.46968 2.30259 |
|---|
| 226 | </PRE> |
|---|
| 227 | <P> |
|---|
| 228 | </TD></TR></TABLE> |
|---|
| 229 | <HR> |
|---|
| 230 | <P> |
|---|
| 231 | <H3>TEST SET INPUT TREEFILE</H3> |
|---|
| 232 | <P> |
|---|
| 233 | <TABLE><TR><TD BGCOLOR=white> |
|---|
| 234 | <PRE> |
|---|
| 235 | ((((Homo:0.21,Pongo:0.21):0.28,Macaca:0.49):0.13,Ateles:0.62):0.38,Galago:1.00); |
|---|
| 236 | </PRE> |
|---|
| 237 | </TD></TR></TABLE> |
|---|
| 238 | <P> |
|---|
| 239 | <HR> |
|---|
| 240 | <P> |
|---|
| 241 | <H3>TEST SET OUTPUT (with all numerical options on)</H3> |
|---|
| 242 | <P> |
|---|
| 243 | <TABLE><TR><TD BGCOLOR=white> |
|---|
| 244 | <PRE> |
|---|
| 245 | |
|---|
| 246 | Continuous character contrasts analysis, version 3.6a3 |
|---|
| 247 | |
|---|
| 248 | 5 Populations, 2 Characters |
|---|
| 249 | |
|---|
| 250 | Name Phenotypes |
|---|
| 251 | ---- ---------- |
|---|
| 252 | |
|---|
| 253 | Homo 4.09434 4.74493 |
|---|
| 254 | Pongo 3.61092 3.33220 |
|---|
| 255 | Macaca 2.37024 3.36730 |
|---|
| 256 | Ateles 2.02815 2.89037 |
|---|
| 257 | Galago -1.46968 2.30259 |
|---|
| 258 | |
|---|
| 259 | |
|---|
| 260 | Covariance matrix |
|---|
| 261 | ---------- ------ |
|---|
| 262 | |
|---|
| 263 | 4.1991 1.3844 |
|---|
| 264 | 1.3844 0.7125 |
|---|
| 265 | |
|---|
| 266 | Regressions (columns on rows) |
|---|
| 267 | ----------- -------- -- ----- |
|---|
| 268 | |
|---|
| 269 | 1.0000 0.3297 |
|---|
| 270 | 1.9430 1.0000 |
|---|
| 271 | |
|---|
| 272 | Correlations |
|---|
| 273 | ------------ |
|---|
| 274 | |
|---|
| 275 | 1.0000 0.8004 |
|---|
| 276 | 0.8004 1.0000 |
|---|
| 277 | |
|---|
| 278 | </PRE> |
|---|
| 279 | </TD></TR></TABLE> |
|---|
| 280 | </BODY> |
|---|
| 281 | </HTML> |
|---|