<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
<TITLE>contchar</TITLE>
<META NAME="description" CONTENT="contchar">
<META NAME="keywords" CONTENT="contchar">
<META NAME="resource-type" CONTENT="document">
<META NAME="distribution" CONTENT="global">
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
</HEAD>
<BODY BGCOLOR="#ccffff">
<DIV ALIGN=RIGHT>
version 3.6
</DIV>
<P>
<DIV ALIGN=CENTER>
<H1>Gene Frequencies and Continuous Character Data Programs</H1>
</DIV>
<P>
© Copyright 1986-2000 by the University of
Washington. Written by Joseph Felsenstein. Permission is granted to copy
this document provided that no fee is charged for it and that this copyright
notice is not removed.
<P>
The programs in this group
use gene frequencies and quantitative character values. One (CONTML)
constructs maximum likelihood estimates of the phylogeny, another
(GENDIST) computes genetic distances for use in the distance matrix
programs, and the third (CONTRAST) examines correlation of traits as
they evolve along a given phylogeny.
<P>
Using gene frequency data in CONTML or GENDIST
involves the following assumptions:
<P>
<OL>
<LI>Different lineages evolve independently.
<LI>After two lineages split, their characters change
independently.
<LI>Each gene frequency changes by genetic drift, with or without mutation
(this varies from method to method).
<LI>Different loci or characters drift independently.
</OL>
<P>
How these assumptions affect the methods is discussed in my papers on
inference of phylogenies from gene frequency and continuous character
data (Felsenstein, 1973b, 1981c, 1985c).
<P>
The input formats are fairly similar to those of the discrete-character
programs, but with one difference. When CONTML is used in the gene-frequency
mode (its usual, default mode), or when GENDIST is used,
the first line contains the number of species (or
populations), the number of loci, and the options information.
There then follows a line which
gives the number of alleles at each locus, in order. This must be
the full number of alleles, not the number of alleles which will be input:
i.e., for a two-allele locus the number should be 2, not 1. There
then follow the species (population) data, each species beginning
on a new line. The first 10 characters are taken as the name, and
thereafter the values of the individual characters are read free-format,
preceded and separated by blanks. They can go to a new line if desired,
though of course not in the middle of a number. Missing data are not
allowed, which is an important limitation.
<P>
In the default configuration, for each locus, the numbers should be
the frequencies of all but one allele. The menu option A (All) signals that
the frequencies of all alleles are provided in the input data -- the
program will then automatically ignore the last of them. So without the
A option, for a three-allele locus there should be two numbers, the
frequencies of two of the alleles (and of course it must always be the same
two!). Here is a typical data set without the A option:
<P>
<TABLE><TR><TD BGCOLOR=white>
<PRE>
    5    3
2 3 2
Alpha      0.90 0.80 0.10 0.56
Beta       0.72 0.54 0.30 0.20
Gamma      0.38 0.10 0.05 0.98
Delta      0.42 0.40 0.43 0.97
Epsilon    0.10 0.30 0.70 0.62
</PRE>
</TD></TR></TABLE>
<P>
whereas here is what it would have to look like if the A option were
invoked:
<P>
<TABLE><TR><TD BGCOLOR=white>
<PRE>
    5    3
2 3 2
Alpha      0.90 0.10 0.80 0.10 0.10 0.56 0.44
Beta       0.72 0.28 0.54 0.30 0.16 0.20 0.80
Gamma      0.38 0.62 0.10 0.05 0.85 0.98 0.02
Delta      0.42 0.58 0.40 0.43 0.17 0.97 0.03
Epsilon    0.10 0.90 0.30 0.70 0.00 0.62 0.38
</PRE>
</TD></TR></TABLE>
<P>
In both data sets the first line has the number of species (or populations)
and the number of loci, and the second line has the number of alleles for
each of the 3 loci. In the second data set (the one for the A option)
the species lines have names (filled out to 10 characters with blanks)
followed by the gene frequencies of the 2 alleles for the first locus, the
3 alleles for the second locus, and the 2 alleles for the third locus.
You can start a new line after any of these allele frequencies, and
continue to give the frequencies on that line (without repeating the
species name).
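<P>
The layout described above can be read with a short script. The following is
a minimal Python sketch, not code taken from CONTML or GENDIST: the function
name <TT>parse_gene_frequencies</TT> and its arguments are hypothetical, and
it assumes that the allele counts fit on one line and that anything after the
first two numbers on the first line is options information that can be skipped.

```python
def parse_gene_frequencies(text, all_alleles=False, name_width=10):
    """Parse gene-frequency input laid out as described in the text.

    all_alleles=True mimics the A menu option: every allele frequency is
    present in the input and the last one at each locus is dropped.
    Returns {species name: [frequencies kept for locus 1, locus 2, ...]}.
    """
    lines = text.splitlines()
    n_species, n_loci = (int(tok) for tok in lines[0].split()[:2])
    alleles = [int(tok) for tok in lines[1].split()]
    if len(alleles) != n_loci:
        raise ValueError("expected %d allele counts" % n_loci)
    # Number of values present in the input for each locus.
    per_locus = [a if all_alleles else a - 1 for a in alleles]
    needed = sum(per_locus)

    data = {}
    pos = 2
    for _ in range(n_species):
        # The first 10 characters are the name; the rest is free-format.
        name = lines[pos][:name_width].strip()
        values = [float(tok) for tok in lines[pos][name_width:].split()]
        pos += 1
        # Frequencies may continue on later lines, without the name.
        while len(values) < needed:
            values.extend(float(tok) for tok in lines[pos].split())
            pos += 1
        if len(values) != needed:
            raise ValueError("wrong number of frequencies for " + name)
        loci, start = [], 0
        for count in per_locus:
            freqs = values[start:start + count]
            start += count
            loci.append(freqs[:-1] if all_alleles else freqs)
        data[name] = loci
    return data


sample = (
    "    5    3\n"
    "2 3 2\n"
    "Alpha      0.90 0.10 0.80 0.10 0.10 0.56 0.44\n"
    "Beta       0.72 0.28 0.54 0.30 0.16 0.20 0.80\n"
    "Gamma      0.38 0.62 0.10 0.05 0.85 0.98 0.02\n"
    "Delta      0.42 0.58 0.40 0.43 0.17 0.97 0.03\n"
    "Epsilon    0.10 0.90 0.30 0.70 0.00 0.62 0.38\n"
)
parsed = parse_gene_frequencies(sample, all_alleles=True)
print(parsed["Alpha"])  # [[0.9], [0.8, 0.1], [0.56]]
```

Read this way, the A-option data set gives the same values as the data set
without the A option, since the last allele at each locus is discarded.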
<P>
If all alleles of a locus are given, their frequencies must add up
to 1. Roundoff of the frequencies may cause the program to conclude that
the numbers do not sum to 1, and to stop with an error message.
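<P>
Roundoff of this kind can be caught before the program is run. Below is a
minimal Python sketch, assuming the frequencies for one species are held as
one list per locus. The function names and the tolerance of 0.0001 are
illustrative assumptions, not the test that the programs themselves apply.

```python
def check_locus_sums(loci, tolerance=1e-4):
    """Return the indices of loci whose frequencies do not sum to 1.

    The tolerance is an assumed value chosen for illustration only.
    """
    return [i for i, locus in enumerate(loci)
            if abs(sum(locus) - 1.0) > tolerance]


def renormalize(loci):
    """Rescale each locus so that its frequencies sum to 1."""
    return [[f / sum(locus) for f in locus] for locus in loci]


# Three equally frequent alleles rounded to two decimals sum to 0.99.
species = [[0.90, 0.10], [0.33, 0.33, 0.33]]
print(check_locus_sums(species))  # [1]
print(check_locus_sums(renormalize(species)))  # []
```

Rescaling alters the data slightly, so where possible it is better to correct
the original figures than to rescale them.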
<P>
While many compilers may be more tolerant, it is probably wise to
make sure that each number, including the first, is preceded by a blank,
and that there are digits both preceding and following any decimal
points (for example 0.5 rather than .5, and 1.0 rather than 1.).
<P>
CONTML and CONTRAST also treat quantitative characters (the
continuous-characters mode in CONTML, which is option C). It is assumed
that each character is evolving according to a Brownian motion model, at the
same rate, and independently of the others. In
reality it is almost always impossible to guarantee this. The issue is
discussed at length
in my review article in Annual Review of Ecology and Systematics (Felsenstein,
1988a), where I point out the difficulty of transforming the characters so
that they are not only genetically independent but also have independent
selection acting on them. If you are going to use CONTML to model evolution of
continuous characters, then you should at least make some attempt to remove
genetic correlations between the characters. Usually all one can do is remove
phenotypic correlations by transforming the characters so that there is no
within-population covariance and so that the within-population
variances of the characters are equal -- this is equivalent to using
Canonical Variates. However, this will only guarantee that one has
removed phenotypic covariances between characters. Genetic covariances
could only be removed by knowing the coheritabilities of the characters,
which would require genetic experiments, and removing selective covariances
(covariances due to covariation of selection pressures) would require
knowledge of the sources and extent of selection pressure in all variables.
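<P>
One way to carry out such a transformation is sketched below in Python. It is
an illustration under stated assumptions, not part of the package: it assumes
that measurements on individuals within each population are available, pools
the within-population covariance matrix, and rescales the characters with a
Cholesky factor so that the pooled within-population covariance becomes the
identity matrix. The function names are hypothetical, and other
transformations with the same property (such as canonical variates proper)
would differ from this one only by a rotation.

```python
import math


def pooled_within_covariance(groups):
    """groups: list of populations, each a list of individuals,
    each individual a list of p character values."""
    p = len(groups[0][0])
    cov = [[0.0] * p for _ in range(p)]
    dof = 0
    for individuals in groups:
        n = len(individuals)
        means = [sum(ind[j] for ind in individuals) / n for j in range(p)]
        for ind in individuals:
            dev = [ind[j] - means[j] for j in range(p)]
            for a in range(p):
                for b in range(p):
                    cov[a][b] += dev[a] * dev[b]
        dof += n - 1
    return [[c / dof for c in row] for row in cov]


def cholesky(matrix):
    """Lower-triangular L with L L^T = matrix (must be positive definite)."""
    p = len(matrix)
    low = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = matrix[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def whiten(vector, low):
    """Solve L y = x by forward substitution and return y."""
    out = []
    for i, x in enumerate(vector):
        out.append((x - sum(low[i][k] * out[k] for k in range(i))) / low[i][i])
    return out


# Made-up measurements: two populations, three individuals, two characters.
groups = [
    [[1.0, 2.0], [2.0, 2.5], [3.0, 4.5]],
    [[5.0, 1.0], [6.0, 2.5], [7.5, 3.0]],
]
low = cholesky(pooled_within_covariance(groups))
transformed = [[whiten(ind, low) for ind in pop] for pop in groups]
```

The population means, transformed with the same <TT>whiten</TT> call, would
then serve as the character values in the input file. As noted above, this
removes only phenotypic covariances.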
<P>
CONTRAST is a program designed to infer, for a given phylogeny that is
provided to the program, the covariation between characters in a data
set. Thus we have a program in this set that allows us to take information
about the covariation and rates of evolution of characters and make an
estimate of the phylogeny (CONTML), and a program that takes an estimate of the
phylogeny and infers the variances and covariances of the character
changes (CONTRAST). But we have no program that infers both the phylogenies and
the character covariation from the same data set.
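<P>
To give a feel for how character changes can be extracted from a known
phylogeny, here is a Python sketch of the standard independent-contrasts
calculation for a single character under Brownian motion. It is an
illustration only and is not taken from CONTRAST; the tree representation
(a tip is a number, a fork is a tuple of left subtree, left branch length,
right subtree, right branch length) and the function name are assumptions
made for this example.

```python
import math


def independent_contrasts(tree):
    """Return (value at this node, extra branch length, contrasts).

    A tip is a plain number. A fork is a tuple
    (left, left_branch_length, right, right_branch_length).
    """
    if not isinstance(tree, tuple):
        return float(tree), 0.0, []
    left, v_left, right, v_right = tree
    x1, extra1, c1 = independent_contrasts(left)
    x2, extra2, c2 = independent_contrasts(right)
    # Branches above estimated nodes are lengthened to reflect the
    # uncertainty of the value estimated there.
    v1, v2 = v_left + extra1, v_right + extra2
    # Difference between the two descendants, scaled to unit variance.
    contrast = (x1 - x2) / math.sqrt(v1 + v2)
    # Weighted average of the descendants, weights inverse to branch length.
    value = (x1 / v1 + x2 / v2) / (1.0 / v1 + 1.0 / v2)
    extra = v1 * v2 / (v1 + v2)
    return value, extra, c1 + c2 + [contrast]


# Three tips with made-up values 1.0, 3.0 and 6.0.
tree = ((1.0, 1.0, 3.0, 1.0), 1.0, 6.0, 1.5)
root_value, _, contrasts = independent_contrasts(tree)
```

With several characters, the same contrasts computed for each character can
be used to estimate the variances and covariances of their changes.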
<P>
In the quantitative characters mode, a typical small data set would be:
<P>
<TABLE><TR><TD BGCOLOR=white>
<PRE>
    5    6
Alpha      0.345 0.467 1.213 2.2 -1.2 1.0
Beta       0.457 0.444 1.1 1.987 -0.2 2.678
Gamma      0.6 0.12 0.97 2.3 -0.11 1.54
Delta      0.68 0.203 0.888 2.0 1.67
Epsilon    0.297 0.22 0.90 1.9 1.74
</PRE>
</TD></TR></TABLE>
<P>
Note that in the quantitative characters mode there is no line giving the
numbers of alleles at each locus. In this mode no square-root
transformation of the coordinates is done: each is assumed to give
directly the position on the Brownian motion scale.
<P>
For further discussion of options and modifiable constants in CONTML,
GENDIST, and CONTRAST see the documentation files for those programs.
</BODY>
</HTML>