| 1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> |
|---|
| 2 | <HTML> |
|---|
| 3 | <HEAD> |
|---|
| 4 | <TITLE>contchar</TITLE> |
|---|
| 5 | <META NAME="description" CONTENT="contchar"> |
|---|
| 6 | <META NAME="keywords" CONTENT="contchar"> |
|---|
| 7 | <META NAME="resource-type" CONTENT="document"> |
|---|
| 8 | <META NAME="distribution" CONTENT="global"> |
|---|
| 9 | <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1"> |
|---|
| 10 | </HEAD> |
|---|
| 11 | <BODY BGCOLOR="#ccffff"> |
|---|
| 12 | <DIV ALIGN=RIGHT> |
|---|
| 13 | version 3.6 |
|---|
| 14 | </DIV> |
|---|
| 15 | <P> |
|---|
| 16 | <DIV ALIGN=CENTER> |
|---|
| 17 | <H1>Gene Frequencies and Continuous Character Data Programs</H1> |
|---|
| 18 | </DIV> |
|---|
| 19 | <P> |
|---|
| 20 | © Copyright 1986-2000 by the University of |
|---|
| 21 | Washington. Written by Joseph Felsenstein. Permission is granted to copy |
|---|
| 22 | this document provided that no fee is charged for it and that this copyright |
|---|
| 23 | notice is not removed. |
|---|
| 24 | <P> |
|---|
| 25 | The programs in this group |
|---|
| 26 | use gene frequencies and quantitative character values. One (CONTML) |
|---|
| 27 | constructs maximum likelihood estimates of the phylogeny, another |
|---|
| 28 | (GENDIST) computes genetic distances for use in the distance matrix |
|---|
| 29 | programs, and the third (CONTRAST) examines correlation of traits as |
|---|
| 30 | they evolve along a given phylogeny. |
|---|
| 31 | <P> |
|---|
| 32 | When gene frequency data are used in CONTML or GENDIST, the |
|---|
| 33 | following assumptions are made: |
|---|
| 34 | <P> |
|---|
| 35 | <OL> |
|---|
| 36 | <LI>Different lineages evolve independently. |
|---|
| 37 | <LI>After two lineages split, their characters change |
|---|
| 38 | independently. |
|---|
| 39 | <LI>Each gene frequency changes by genetic drift, with or without mutation |
|---|
| 40 | (this varies from method to method). |
|---|
| 41 | <LI>Different loci or characters drift independently. |
|---|
| 42 | </OL> |
|---|
| 43 | <P> |
|---|
| 44 | How these assumptions affect the methods is discussed in my papers on |
|---|
| 45 | inference of phylogenies from gene frequency and continuous character |
|---|
| 46 | data (Felsenstein, 1973b, 1981c, 1985c). |
|---|
| 47 | <P> |
|---|
| 48 | The input formats are fairly similar to those of the discrete-character |
|---|
| 49 | programs, but with one difference. When CONTML is used in the gene-frequency |
|---|
| 50 | mode (its usual, default mode), or when GENDIST is used, |
|---|
| 51 | the first line contains the number of species (or |
|---|
| 52 | populations) and the number of loci and the options information. |
|---|
| 53 | There then follows a line which |
|---|
| 54 | gives the numbers of alleles at each locus, in order. This must be |
|---|
| 55 | the full number of alleles, not the number of alleles which will be input: |
|---|
| 56 | i.e., for a two-allele locus the number should be 2, not 1. There |
|---|
| 57 | then follow the species (population) data, each species beginning |
|---|
| 58 | on a new line. The first 10 characters are taken as the name, and |
|---|
| 59 | thereafter the values of the individual characters are read free-format, |
|---|
| 60 | preceded and separated by blanks. They can go to a new line if desired, |
|---|
| 61 | though of course not in the middle of a number. Missing data are not |
|---|
| 62 | allowed, which is an important limitation. In the default configuration, for |
|---|
| 63 | each locus, the numbers should be |
|---|
| 64 | the frequencies of all but one allele. The menu option A (All) signals that |
|---|
| 65 | the frequencies of all alleles are provided in the input data -- the |
|---|
| 66 | program will then automatically ignore the last of them. So without the |
|---|
| 67 | A option, for a |
|---|
| 68 | three-allele locus there should be two numbers, the frequencies of |
|---|
| 69 | two of the alleles (and of course it must always be the same |
|---|
| 70 | two!). Here is a typical data set without the A option: |
|---|
| 71 | <P> |
|---|
| 72 | <TABLE><TR><TD BGCOLOR=white> |
|---|
| 73 | <PRE> |
|---|
| 74 | 5 3 |
|---|
| 75 | 2 3 2 |
|---|
| 76 | Alpha      0.90 0.80 0.10 0.56 |
|---|
| 77 | Beta       0.72 0.54 0.30 0.20 |
|---|
| 78 | Gamma      0.38 0.10 0.05 0.98 |
|---|
| 79 | Delta      0.42 0.40 0.43 0.97 |
|---|
| 80 | Epsilon    0.10 0.30 0.70 0.62 |
|---|
| 81 | </PRE> |
|---|
| 82 | </TD></TR></TABLE> |
|---|
| 83 | <P> |
|---|
| 84 | whereas here is what it would have to look like if the A option were |
|---|
| 85 | invoked: |
|---|
| 86 | <P> |
|---|
| 87 | <TABLE><TR><TD BGCOLOR=white> |
|---|
| 88 | <PRE> |
|---|
| 89 | 5 3 |
|---|
| 90 | 2 3 2 |
|---|
| 91 | Alpha      0.90 0.10 0.80 0.10 0.10 0.56 0.44 |
|---|
| 92 | Beta       0.72 0.28 0.54 0.30 0.16 0.20 0.80 |
|---|
| 93 | Gamma      0.38 0.62 0.10 0.05 0.85 0.98 0.02 |
|---|
| 94 | Delta      0.42 0.58 0.40 0.43 0.17 0.97 0.03 |
|---|
| 95 | Epsilon    0.10 0.90 0.30 0.70 0.00 0.62 0.38 |
|---|
| 96 | </PRE> |
|---|
| 97 | </TD></TR></TABLE> |
|---|
| 98 | <P> |
|---|
| 99 | The first line has the number of species (or populations) and the number |
|---|
| 100 | of loci. The second line has the number of alleles for each of the 3 loci. |
|---|
| 101 | The species lines have names (filled out to 10 characters with blanks) |
|---|
| 102 | followed by the gene frequencies of the 2 alleles for the first locus, the |
|---|
| 103 | 3 alleles for the second locus, and the 2 alleles for the third locus. |
|---|
| 104 | You can start a new line after any of these allele frequencies, and |
|---|
| 105 | continue to give the frequencies on that line (without repeating the |
|---|
| 106 | species name). |
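<P>
The layout described above can be sketched as a small reader. This is only an illustration in Python, not the code the programs themselves use: the function name <TT>parse_gene_freqs</TT> and its <TT>all_alleles</TT> flag (standing in for menu option A) are hypothetical, and any options information after the two counts on the first line is simply ignored.

```python
def parse_gene_freqs(text, all_alleles=False):
    """Read a gene-frequency data set laid out as described in the text.

    Hypothetical helper for illustration; not part of the package.
    Returns (alleles_per_locus, {species_name: [freqs for each locus]}).
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]

    # First line: number of species (populations) and number of loci.
    # Anything after the first two numbers is ignored in this sketch.
    n_species, n_loci = (int(tok) for tok in lines[0].split()[:2])

    # Second line: the full number of alleles at each locus, in order.
    alleles = [int(tok) for tok in lines[1].split()]
    if len(alleles) != n_loci:
        raise ValueError("allele-count line does not match number of loci")

    # Without option A, one allele per locus is omitted from the data.
    per_locus = [a if all_alleles else a - 1 for a in alleles]
    needed = sum(per_locus)

    data = {}
    pos = 2
    for _ in range(n_species):
        # The first 10 characters are the name; the rest is free-format.
        name = lines[pos][:10].strip()
        values = [float(tok) for tok in lines[pos][10:].split()]
        pos += 1
        # Frequencies may continue on following lines, without the name.
        while len(values) < needed:
            values.extend(float(tok) for tok in lines[pos].split())
            pos += 1
        if len(values) != needed:
            raise ValueError("wrong number of frequencies for " + name)

        # Cut the flat list of numbers into one list per locus.
        freqs, start = [], 0
        for count in per_locus:
            freqs.append(values[start:start + count])
            start += count
        data[name] = freqs
    return alleles, data
```

Called on the first data set above, this would give two values for the three-allele locus and one value for each two-allele locus; called with <TT>all_alleles=True</TT> on the second data set, it would give 2, 3 and 2 values respectively.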
|---|
| 107 | <P> |
|---|
| 108 | If all alleles of a locus are given, their frequencies must add up |
|---|
| 109 | to 1. Roundoff of the frequencies may cause the program to conclude that |
|---|
| 110 | the numbers do not sum to 1, and stop with an error message. |
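<P>
One way to avoid this when preparing a data file is to round the frequencies in integer units and push the leftover onto the largest one. The sketch below is an assumption about how one might prepare input, not something the programs do; the names <TT>round_to_sum_one</TT> and <TT>format_freqs</TT> are hypothetical, and it assumes the unrounded frequencies of a locus are non-negative and already sum to about 1.

```python
def round_to_sum_one(freqs, decimals=2):
    """Round the allele frequencies of one locus so that the rounded
    values sum exactly to 1 at the given number of decimal places.

    Hypothetical helper: works in integer units of 10**-decimals and
    adds the rounding residual to the largest frequency.
    """
    scale = 10 ** decimals
    scaled = [round(f * scale) for f in freqs]

    # Residual left over after rounding each frequency separately.
    residual = scale - sum(scaled)

    # Put the residual on the largest frequency, where it matters least.
    largest = max(range(len(scaled)), key=scaled.__getitem__)
    scaled[largest] += residual
    return [s / scale for s in scaled]


def format_freqs(freqs, decimals=2):
    """Format one locus as blank-separated numbers with digits on both
    sides of the decimal point."""
    rounded = round_to_sum_one(freqs, decimals)
    return " ".join(f"{f:.{decimals}f}" for f in rounded)
```

For example, three equally frequent alleles rounded separately to two places would each be written as 0.33 and sum to 0.99; with the adjustment one of them is written as 0.34 instead.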
|---|
| 111 | <P> |
|---|
| 112 | While many compilers may be more tolerant, it is probably wise to |
|---|
| 113 | make sure that each number, including the first, is preceded by a blank, |
|---|
| 114 | and that there are digits both preceding and following any decimal |
|---|
| 115 | points. |
|---|
| 116 | <P> |
|---|
| 117 | CONTML and CONTRAST also treat quantitative characters (the |
|---|
| 118 | continuous-characters mode in CONTML, which is option C). It is assumed |
|---|
| 119 | that each character is evolving according to a Brownian motion model, at the |
|---|
| 120 | same rate, and independently. In |
|---|
| 121 | reality it is almost always impossible to guarantee this. The issue is |
|---|
| 122 | discussed at length |
|---|
| 123 | in my review article in Annual Review of Ecology and Systematics (Felsenstein, |
|---|
| 124 | 1988a), where I point out the difficulty of transforming the characters so |
|---|
| 125 | that they are not only genetically independent but have independent selection |
|---|
| 126 | acting on them. If you are going to use CONTML to model evolution of |
|---|
| 127 | continuous characters, then you should at least make some attempt to remove |
|---|
| 128 | genetic correlations between the characters (usually all one can do is remove |
|---|
| 129 | phenotypic correlations by transforming the characters so that there is no |
|---|
| 130 | within-population covariance and so that the within-population |
|---|
| 131 | variances of the characters are equal -- this is equivalent to using |
|---|
| 132 | Canonical Variates). However, this will only guarantee that one has |
|---|
| 133 | removed phenotypic covariances between characters. Genetic covariances |
|---|
| 134 | could only be removed by knowing the coheritabilities of the characters, |
|---|
| 135 | which would require genetic experiments, and selective covariances |
|---|
| 136 | (covariances due to covariation of selection pressures) would require |
|---|
| 137 | knowledge of the sources and extent of selection pressure in all variables. |
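<P>
The phenotypic transformation mentioned above (no within-population covariance, equal within-population variances) can be sketched as follows. This is a minimal illustration in plain Python under stated assumptions, not a procedure taken from the programs: the function names are hypothetical, the input is assumed to be individual measurements grouped by population, and the pooled within-population covariance matrix is assumed to be positive definite. As the text notes, it removes only phenotypic covariances, not genetic or selective ones.

```python
import math


def pooled_within_cov(pops):
    """Pooled within-population covariance matrix.

    pops: list of populations; each population is a list of individuals;
    each individual is a list of character values.
    """
    p = len(pops[0][0])
    W = [[0.0] * p for _ in range(p)]
    dof = 0
    for pop in pops:
        n = len(pop)
        mean = [sum(ind[j] for ind in pop) / n for j in range(p)]
        for ind in pop:
            d = [ind[j] - mean[j] for j in range(p)]
            for a in range(p):
                for b in range(p):
                    W[a][b] += d[a] * d[b]
        dof += n - 1
    return [[w / dof for w in row] for row in W]


def cholesky(W):
    """Lower-triangular L with W = L L^T (W must be positive definite)."""
    p = len(W)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = W[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def whiten(vec, L):
    """Solve L z = vec by forward substitution. If vec has covariance
    W = L L^T, then z has the identity matrix as its covariance."""
    z = []
    for i in range(len(vec)):
        z.append((vec[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z
```

One would compute <TT>L = cholesky(pooled_within_cov(pops))</TT> once and then apply <TT>whiten</TT> to each population's mean vector to obtain the values to place in the data file.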
|---|
| 138 | <P> |
|---|
| 139 | CONTRAST is a program designed to infer, for a given phylogeny that is |
|---|
| 140 | provided to the program, the covariation between characters in a data |
|---|
| 141 | set. Thus we have a program in this set that allows us to take information |
|---|
| 142 | about the covariation and rates of evolution of characters and make an |
|---|
| 143 | estimate of the phylogeny (CONTML), and a program that takes an estimate of the |
|---|
| 144 | phylogeny and infers the variances and covariances of the character |
|---|
| 145 | changes. But we have no program that infers both the phylogenies and |
|---|
| 146 | the character covariation from the same data set. |
|---|
| 147 | <P> |
|---|
| 148 | In the quantitative characters mode, a typical small data set would be: |
|---|
| 149 | <P> |
|---|
| 150 | <TABLE><TR><TD BGCOLOR=white> |
|---|
| 151 | <PRE> |
|---|
| 152 | 5 6 |
|---|
| 153 | Alpha      0.345 0.467 1.213 2.2 -1.2 1.0 |
|---|
| 154 | Beta       0.457 0.444 1.1 1.987 -0.2 2.678 |
|---|
| 155 | Gamma      0.6 0.12 0.97 2.3 -0.11 1.54 |
|---|
| 156 | Delta      0.68 0.203 0.888 2.0 1.67 |
|---|
| 157 | Epsilon    0.297 0.22 0.90 1.9 1.74 |
|---|
| 158 | </PRE> |
|---|
| 159 | </TD></TR></TABLE> |
|---|
| 160 | <P> |
|---|
| 161 | Note that in the quantitative characters mode there is no line giving the numbers |
|---|
| 162 | of alleles at each locus. In this mode no square-root |
|---|
| 163 | transformation of the coordinates is done: each is assumed to give |
|---|
| 164 | directly the position on the Brownian motion scale. |
|---|
| 165 | <P> |
|---|
| 166 | For further discussion of options and modifiable constants in CONTML, |
|---|
| 167 | GENDIST, and CONTRAST see the documentation files for those programs. |
|---|
| 168 | </BODY> |
|---|
| 169 | </HTML> |
|---|