<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
<TITLE>contchar</TITLE>
<META NAME="description" CONTENT="contchar">
<META NAME="keywords" CONTENT="contchar">
<META NAME="resource-type" CONTENT="document">
<META NAME="distribution" CONTENT="global">
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
</HEAD>
<BODY BGCOLOR="#ccffff">
<DIV ALIGN=RIGHT>
version 3.6
</DIV>
<P>
<DIV ALIGN=CENTER>
<H1>Gene Frequencies and Continuous Character Data Programs</H1>
</DIV>
<P>
&#169; Copyright 1986-2000 by the University of
Washington.  Written by Joseph Felsenstein.  Permission is granted to copy
this document provided that no fee is charged for it and that this copyright
notice is not removed.
<P>
The programs in this group
use gene frequencies and quantitative character values.  One (CONTML)
constructs maximum likelihood estimates of the phylogeny, another
(GENDIST) computes genetic distances for use in the distance matrix
programs, and the third (CONTRAST) examines correlation of traits as
they evolve along a given phylogeny.
<P>
When gene frequency data are used in CONTML or GENDIST, the analysis
makes the following assumptions:
<P>
<OL>
<LI>Different lineages evolve independently.
<LI>After two lineages split, their characters change
independently.
<LI>Each gene frequency changes by genetic drift, with or without mutation
(this varies from method to method).
<LI>Different loci or characters drift independently.
</OL>
<P>
How these assumptions affect the methods is discussed in my papers on
inference of phylogenies from gene frequency and continuous character
data (Felsenstein, 1973b, 1981c, 1985c).
<P>
The input formats are fairly similar to those of the discrete-character
programs, but with one difference.  When CONTML is used in the gene-frequency
mode (its usual, default mode), or when GENDIST is used,
the first line contains the number of species (or
populations), the number of loci, and the options information.
There then follows a line which
gives the numbers of alleles at each locus, in order.  This must be
the full number of alleles, not the number of alleles which will be input:
i.e., for a two-allele locus the number should be 2, not 1.  There
then follow the species (population) data, each species beginning
on a new line.  The first 10 characters are taken as the name, and
thereafter the values of the individual characters are read free-format,
preceded and separated by blanks.  They can go to a new line if desired,
though of course not in the middle of a number.  Missing data are not
allowed, which is an important limitation.  In the default configuration, for
each locus, the numbers should be
the frequencies of all but one allele.  The menu option A (All) signals that
the frequencies of all alleles are provided in the input data; the
program will then automatically ignore the last of them.  So without the
A option, for a
three-allele locus there should be two numbers, the frequencies of
two of the alleles (and of course it must always be the same
two!).  Here is a typical data set without the A option:
<P>
<TABLE><TR><TD BGCOLOR=white>
<PRE>
     5    3
2 3 2
Alpha      0.90 0.80 0.10 0.56
Beta       0.72 0.54 0.30 0.20
Gamma      0.38 0.10 0.05  0.98
Delta      0.42 0.40 0.43 0.97
Epsilon    0.10 0.30 0.70 0.62
</PRE>
</TD></TR></TABLE>
<P>
whereas here is what it would have to look like if the A option were
invoked:
<P>
<TABLE><TR><TD BGCOLOR=white>
<PRE>
     5    3
2 3 2
Alpha      0.90 0.10 0.80 0.10 0.10 0.56 0.44
Beta       0.72 0.28 0.54 0.30 0.16 0.20 0.80
Gamma      0.38 0.62 0.10 0.05 0.85  0.98 0.02
Delta      0.42 0.58 0.40 0.43 0.17 0.97 0.03
Epsilon    0.10 0.90 0.30 0.70 0.00 0.62 0.38
</PRE>
</TD></TR></TABLE>
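<P>
The relation between the two layouts can be sketched in a few lines of
Python.  This is only an illustration and is not code from PHYLIP: the
function name drop_last_alleles is hypothetical, and the sketch assumes
the values of one species have already been read into a list.  It omits the
last allele frequency of each locus, which turns a row of the second data
set into the corresponding row of the first.

```python
def drop_last_alleles(values, allele_counts):
    """Convert one species' A-option row to the default layout.

    Hypothetical helper: omits the last allele frequency of every locus.
    """
    if len(values) != sum(allele_counts):
        raise ValueError("row length does not match the allele counts")
    out, start = [], 0
    for k in allele_counts:
        # keep the first k - 1 frequencies of this locus
        out.extend(values[start:start + k - 1])
        start += k
    return out


# Alpha's row from the A-option data set, with allele counts 2 3 2
alpha_all = [0.90, 0.10, 0.80, 0.10, 0.10, 0.56, 0.44]
print(drop_last_alleles(alpha_all, [2, 3, 2]))   # [0.9, 0.8, 0.1, 0.56]
```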
<P>
The first line has the number of species (or populations) and the number
of loci.  The second line has the number of alleles for each of the 3 loci.
The species lines have names (filled out to 10 characters with blanks)
followed by the gene frequencies of the 2 alleles for the first locus, the
3 alleles for the second locus, and the 2 alleles for the third locus.
You can start a new line after any of these allele frequencies, and
continue to give the frequencies on that line (without repeating the
species name).
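<P>
As a rough illustration of the layout just described, here is a minimal
reader written in Python.  It is a sketch under stated assumptions, not the
input routine of CONTML or GENDIST: the name parse_gene_freqs and its
all_alleles argument are hypothetical, the allele counts are assumed to fit
on one line, and any options information on the first line is ignored.

```python
def parse_gene_freqs(text, all_alleles=False):
    """Read the gene-frequency layout (illustrative sketch only).

    Returns (allele_counts, data); data maps each species name to a list
    holding one list of frequencies per locus.
    """
    lines = text.splitlines()
    header = lines[0].split()
    n_species, n_loci = int(header[0]), int(header[1])
    allele_counts = [int(tok) for tok in lines[1].split()]
    if len(allele_counts) != n_loci:
        raise ValueError("expected one allele count per locus")
    # Without the A option each locus supplies one value fewer.
    per_locus = [k if all_alleles else k - 1 for k in allele_counts]
    needed = sum(per_locus)

    data = {}
    rows = iter(lines[2:])
    for _ in range(n_species):
        line = next(rows)
        name = line[:10].strip()       # first 10 characters are the name
        values = line[10:].split()     # the rest is read free-format
        while needed > len(values):    # values may continue on later lines
            values.extend(next(rows).split())
        if len(values) != needed:
            raise ValueError("wrong number of values for " + name)
        numbers = [float(v) for v in values]
        loci, start = [], 0
        for width in per_locus:
            loci.append(numbers[start:start + width])
            start += width
        data[name] = loci
    return allele_counts, data


sample = """     5    3
2 3 2
Alpha      0.90 0.80 0.10 0.56
Beta       0.72 0.54 0.30 0.20
Gamma      0.38 0.10 0.05  0.98
Delta      0.42 0.40 0.43 0.97
Epsilon    0.10 0.30 0.70 0.62
"""
counts, freqs = parse_gene_freqs(sample)
print(freqs["Alpha"])   # [[0.9], [0.8, 0.1], [0.56]]
```
<P>
Passing all_alleles=True to this hypothetical reader makes it expect the
full set of frequencies for each locus, as in the second data set.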
<P>
If all alleles of a locus are given, it is important to have them add up
to 1.  Roundoff of the frequencies may cause the program to conclude that
the numbers do not sum to 1, and stop with an error message.
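<P>
One way to catch this before running the programs is to check the sums
yourself.  The Python sketch below is an illustration only: the function
name check_allele_sums is hypothetical, and the tolerance of 0.0001 is an
arbitrary assumption, not the tolerance the programs actually apply.

```python
def check_allele_sums(loci, tol=1e-4):
    """Return (locus number, total) for every locus whose frequencies
    differ from 1 by more than tol.  The value of tol is an assumption."""
    bad = []
    for number, freqs in enumerate(loci, start=1):
        total = sum(freqs)
        if abs(total - 1.0) > tol:
            bad.append((number, total))
    return bad


# Alpha's row from the A-option data set, split by locus (2, 3 and 2 alleles)
alpha = [[0.90, 0.10], [0.80, 0.10, 0.10], [0.56, 0.44]]
print(check_allele_sums(alpha))                  # []

# Three equal frequencies rounded to two places no longer add up to 1
print(check_allele_sums([[0.33, 0.33, 0.33]]))   # reports locus 1
```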
<P>
While many compilers may be more tolerant, it is probably wise to
make sure that each number, including the first, is preceded by a blank,
and that there are digits both preceding and following any decimal
point.
<P>
CONTML and CONTRAST also treat quantitative characters (the
continuous-characters mode in CONTML, which is option C).  It is assumed
that each character is evolving according to a Brownian motion model, at the
same rate, and independently.  In
reality it is almost always impossible to guarantee this.  The issue is
discussed at length
in my review article in Annual Review of Ecology and Systematics (Felsenstein,
1988a), where I point out the difficulty of transforming the characters so
that they are not only genetically independent but have independent selection
acting on them.  If you are going to use CONTML to model evolution of
continuous characters, then you should at least make some attempt to remove
genetic correlations between the characters.  Usually all one can do is remove
phenotypic correlations by transforming the characters so that there is no
within-population covariance and so that the within-population
variances of the characters are equal; this is equivalent to using
Canonical Variates.  However, this will only guarantee that one has
removed phenotypic covariances between characters.  Genetic covariances
could only be removed by knowing the coheritabilities of the characters,
which would require genetic experiments, and removing selective covariances
(covariances due to covariation of selection pressures) would require
knowledge of the sources and extent of selection pressure in all variables.
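<P>
The phenotypic transformation mentioned above can be sketched in Python.
This is a hedged illustration, not part of PHYLIP: all function names and
the measurements are hypothetical, and it assumes you have measurements on
individuals within each population.  It uses a Cholesky factor of the pooled
within-population covariance matrix, which makes that matrix the identity
after transformation; canonical variates achieve the same property and
differ from this result only by an orthogonal rotation or reflection of the
axes.

```python
import math


def pooled_within_cov(samples):
    """Pooled within-population covariance matrix.

    samples maps a population name to a list of individuals, each a list
    of character values.
    """
    p = len(next(iter(samples.values()))[0])
    w = [[0.0] * p for _ in range(p)]
    dof = 0
    for rows in samples.values():
        n = len(rows)
        mean = [sum(r[j] for r in rows) / n for j in range(p)]
        for r in rows:
            d = [r[j] - mean[j] for j in range(p)]
            for a in range(p):
                for b in range(p):
                    w[a][b] += d[a] * d[b]
        dof += n - 1
    return [[v / dof for v in row] for row in w]


def cholesky(m):
    """Lower-triangular factor low with low * low^T == m (m positive definite)."""
    p = len(m)
    low = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def whiten(vec, low):
    """Solve low * y = vec by forward substitution and return y."""
    y = []
    for i in range(len(vec)):
        s = sum(low[i][k] * y[k] for k in range(i))
        y.append((vec[i] - s) / low[i][i])
    return y


# Hypothetical measurements: two populations, two characters
samples = {
    "PopA": [[1.0, 2.0], [2.0, 3.5], [3.0, 4.0], [2.5, 2.5]],
    "PopB": [[5.0, 1.0], [6.0, 2.5], [7.5, 3.0], [6.5, 1.5]],
}
low = cholesky(pooled_within_cov(samples))
white = {k: [whiten(r, low) for r in rows] for k, rows in samples.items()}
print(pooled_within_cov(white))   # close to the 2 x 2 identity matrix
```
<P>
Under these assumptions, the population means transformed with whiten are
the values one would supply as continuous characters.  As the text notes,
this removes only phenotypic covariances.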
<P>
CONTRAST is a program designed to infer, for a given phylogeny that is
provided to the program, the covariation between characters in a data
set.  Thus we have a program in this set that allows us to take information
about the covariation and rates of evolution of characters and make an
estimate of the phylogeny (CONTML), and a program that takes an estimate of the
phylogeny and infers the variances and covariances of the character
changes (CONTRAST).  But we have no program that infers both the phylogenies and
the character covariation from the same data set.
<P>
In the quantitative characters mode, a typical small data set would be:
<P>
<TABLE><TR><TD BGCOLOR=white>
<PRE>
     5   6
Alpha      0.345 0.467 1.213  2.2  -1.2 1.0
Beta       0.457 0.444 1.1    1.987 -0.2 2.678
Gamma      0.6 0.12 0.97 2.3  -0.11 1.54
Delta      0.68  0.203 0.888 2.0  1.67
Epsilon    0.297  0.22 0.90 1.9 1.74
</PRE>
</TD></TR></TABLE>
<P>
Note that in the quantitative characters mode, there is no line giving the
numbers of alleles at each locus.  In this mode no square-root
transformation of the coordinates is done: each is assumed to give
directly the position on the Brownian motion scale.
<P>
For further discussion of options and modifiable constants in CONTML,
GENDIST, and CONTRAST see the documentation files for those programs.
</BODY>
</HTML>