source: tags/arb-6.0/GDE/PHYLIP/doc/treedist.html

Last change on this file was 2176, checked in by westram, 21 years ago

* empty log message *

  • Property svn:eol-style set to native
  • Property svn:keywords set to Author Date Id Revision
File size: 11.8 KB
Line 
1<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
2<HTML>
3<HEAD>
4<TITLE>treedist</TITLE>
5<META NAME="description" CONTENT="treedist">
6<META NAME="keywords" CONTENT="treedist">
7<META NAME="resource-type" CONTENT="document">
8<META NAME="distribution" CONTENT="global">
9<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
10</HEAD>
11<BODY BGCOLOR="#ccffff">
12<DIV ALIGN=RIGHT>
13version 3.6
14</DIV>
15<P>
16<DIV ALIGN=CENTER>
17<H1>TREEDIST -- distances between trees</H1>
18</DIV>
19<P>
20&#169; Copyright 2002 by The University of
21Washington.  Written by Joseph Felsenstein.  Permission is granted to copy
22this document provided that no fee is charged for it and that this copyright
23notice is not removed.
24<P>
25This program computes distances between trees.  The distance that is
26computed is the Symmetric Distance of Robinson and Foulds (1981).  This
27does not use branch length information, only the tree topologies.  It
28must also be borne in mind that the distance does not have any immediate
29statistical interpretation -- we cannot say whether a larger distance is
30significantly larger than a smaller one.
31<P>
32The Symmetric Distance is computed by considering each of the branches of
33the two trees.  Each branch divides the set of species into two groups --
34the ones connected to one end of the branch and the ones connected to the
35other.  This makes a partition of the full set of species.  (in Newick notation)
36<PRE>
37  ((A,C),(D,(B,E)))
38</PRE>
39has two internal branches.  One induces the partition {A,&nbsp;C&nbsp;&nbsp;|&nbsp;&nbsp;B,&nbsp;D,&nbsp;E}
40and the other induces the partition {A,&nbsp;C,&nbsp;D&nbsp;&nbsp;|&nbsp;&nbsp;B,&nbsp;E}.  A different tree
41with the same set of species,
42<PRE>
43  (((A,D),C),(B,E)))
44</PRE>
45has internal branches that correspond to the two partitions  {A,&nbsp;C,&nbsp;D&nbsp;&nbsp;|&nbsp;&nbsp;B,&nbsp;E}
46and {A,&nbsp;D&nbsp;&nbsp;|&nbsp;&nbsp;B,&nbsp;C,&nbsp;E}.  Note that the other branches, all of which are
47external branches, induce partitions that separate one species from all the
48others.  Thus there are 5 partitions like this: {C&nbsp;&nbsp;|&nbsp;&nbsp;A,&nbsp;B,&nbsp;D,&nbsp;E} on each
49of these trees.  These are always present on all trees, provided that each
50tree has each species at the end of its own branch.
51<P>
52The Symmetric Distance is simply a count of how many partitions there are,
53among the two trees, that are on one tree and not on the other.  In the
54example above there are two partitions, {A,&nbsp;C&nbsp;&nbsp;|&nbsp;&nbsp;B,&nbsp;D,&nbsp;E} and {A,&nbsp;D&nbsp;&nbsp;|&nbsp;&nbsp;B,&nbsp;C,&nbsp;E},
55each of which is present on only one of the two trees.  The Symmetric
56Distance between the two trees is therefore 2.   When the two trees are
57fully resolved bifurcating trees, their symmetric distance must be an even
58number; it can range from 0 to twice the number of internal branches, which
59for <I>n</I> species is 4n-6.
60<P>
61We have assumed that nothing is lost if the trees are treated as unrooted trees.
62It is easy to define a counterpart to the Symmetric Distance for rooted trees.
63each branch then defines a set of species, namely the clade defined by that
64branch.  Thus if the first of the two trees above were considered as a rooted
65tree it would define the three clades {A,&nbsp;C}, {B,&nbsp;D,&nbsp;E}, and {B,&nbsp;E}.  The
66symmetric distance between two rooted trees is simply the count of the number
67of clades that are defined by one but not by the other.  For the second tree
68the clades would be {A,&nbsp;D}, {B,&nbsp;C,&nbsp;E}, and {B,&nbsp;E}.  The Symmetric Distance
69between thee two rooted trees would then be 4.
70<P>
71Although the examples we have discussed have involved fully
72bifurcating trees, the input trees can have multifurcations.
73This can lead to distances that are odd numbers.
74<P>
75<H2>INPUT AND OPTIONS</H2>
76<P>
77The program reads one or two input tree files.  If there is one input tree
78file, its default name is <TT>intree</TT>.  If there are two their default
79names are <TT>intree</TT> and <TT>intree2</TT>.  The tree files may either
80have the number of trees on their first line, or not.  If the number of
81trees is given, it is actually ignored and all trees in the tree file
82are considered, even if there are more trees than indicated by the number.
83(This is a bug and it will be fixed in the future).
84<P>
85The options are selected from a menu, which looks like this:
86<P>
87<TABLE><TR><TD BGCOLOR=white>
88<PRE>
89
90Tree distance program, version 3.6a3
91
92Settings for this run:
93 O                         Outgroup root:  No, use as outgroup species  1
94 R         Trees to be treated as Rooted:  No
95 T    Terminal type (IBM PC, ANSI, none):  (none)
96 1  Print indications of progress of run:  Yes
97 2                 Tree distance submenu:  Distance between adjacent pairs
98
99Are these settings correct? (type Y or the letter for one to change)
100
101</PRE>
102</TD></TR></TABLE>
103<P>
104The O option allows you to root the trees using an outgroup.  It is specified
105by giving its number, where the species are numbered in the order they
106appear in the first tree.  Outgroup-rooting all the trees does not
107affect the unrooted Symmetric Distance, and if it is done and trees are
108treated as rooted, the distances turn out to be the same as the unrooted
109ones.  Thus it is unlikely that you will find this option of interest.
110<P>
111The R option controls whether the Summetric Distance that is computed is
112to treat the trees as unrooted or rooted.  Unrooted is the default.
113<P>
114The terminal type (0) and progress (1) options do not need description here.
115<P>
116Option 2 controls how many tree files are read in, which trees are to
117be compared, and how the output is to be presented.  It causes
118another menu to appear:
119<P>
120<TABLE><TR><TD BGCOLOR=white>
121<PRE>
122Tree Pairing Submenu:
123 A     Distances between adjacent pairs in tree file.
124 P     Distances between all possible pairs in tree file.
125 C     Distances between corresponding pairs in one tree file and another.
126 L     Distances between all pairs in one tree file and another.
127</PRE>
128</TD></TR></TABLE>
129<P>
130Option A computes the distances between successive pairs of trees in the
131tree input file -- between trees 1 and 2, trees 3 and 4, trees
1325 and 6, and so on.  If there are an odd number of trees in the input tree
133file the last tree will be ignored and a warning message printed to
134remind the user that nothing was done with it.
135<P>
136Option P computes distances between all pairs of trees in the input tree
137file.  Thus with 10 trees 10 x 10 = 100 distances will be computed,
138including distances between each tree and itself.
139<P>
140Option C takes input from two tree files and cmputes distances between
141corresponding members of the two tree files.  Thus distances will be
142computed between tree 1 of the first tree file and tree 1 of the second one,
143between tree 2 of the first file and tree 2 of the second one, and so on.
144If the number of trees in the two files differs, the extra trees in the
145file that has more of them are ignored and a warning is printed out.
146<P>
147Option L computes distances between all pairs of trees, where one tree is
148taken from one tree file and the other from the other tree file.  Thus if
149the first tree file has 7 trees and the second has 5 trees, 7 x 5 = 35
150different distances will be computed. <B> Note -- this option seems not
151to work at the moment.  We hope to fix this soon.</B>
152<P>
153If option 2 is not selected, the program defaults to looking at one tree
154file and computing distances of adjacent pairs (so that option A is
155the default).
156<P>
157<H2>OUTPUT</H2>
158<P>
159The results of the analysis are written onto an output file whose
160default file name is <TT>outfile</TT>.
161<P>
162If any of the four types of analysis are selected, the program asks the
163user how they want the results presented.  Here is that menu for options
164P or L:
165<P>
166<TABLE><TR><TD BGCOLOR=white>
167<PRE>
168
169Distances output options:
170 F     Full matrix.
171 V     One pair per line, verbose.
172 S     One pair per line, sparse.
173
174 Choose one: (F,V,S)
175</PRE>
176</TD></TR></TABLE>
177<P>
178The Full matrix (choice F) is a table showing all distances.  It is
179written onto the output file.  The table is presented as groups of
18010 columns.  Here is the Full matrix for the 12 trees in the input
181tree file which is given as an example at the end of this page.
182<P>
183<TABLE><TR><TD BGCOLOR=white>
184<PRE>
185
186Tree distance program, version 3.6
187
188Symmetric differences between all pairs of trees in tree file:
189
190          1     2     3     4     5     6     7     8     9    10   
191      \------------------------------------------------------------
192    1 |   0     4     2    10    10    10    10    10    10    10 
193    2 |   4     0     2    10     8    10     8    10     8    10 
194    3 |   2     2     0    10    10    10    10    10    10    10 
195    4 |  10    10    10     0     2     2     4     2     4     0 
196    5 |  10     8    10     2     0     4     2     4     2     2 
197    6 |  10    10    10     2     4     0     2     2     4     2 
198    7 |  10     8    10     4     2     2     0     4     2     4 
199    8 |  10    10    10     2     4     2     4     0     2     2 
200    9 |  10     8    10     4     2     4     2     2     0     4 
201   10 |  10    10    10     0     2     2     4     2     4     0 
202   11 |   2     2     0    10    10    10    10    10    10    10 
203   12 |  10    10    10     2     4     2     4     0     2     2 
204
205
206         11    12   
207      \------------
208    1 |   2    10 
209    2 |   2    10 
210    3 |   0    10 
211    4 |  10     2 
212    5 |  10     4 
213    6 |  10     2 
214    7 |  10     4 
215    8 |  10     0 
216    9 |  10     2 
217   10 |  10     2 
218   11 |   0    10 
219   12 |  10     0 
220
221
222</PRE>
223</TD></TR></TABLE>
224<P>
225The Full matrix is only available for analyses P and L (not for A or C).
226<P>
227Option V (Verbose) writes one distance per line.  The Verbose
228output is the default.  Here it is for the example data set given below:
229<P>
230<TABLE><TR><TD BGCOLOR=white>
231<PRE>
232
233Tree distance program, version 3.6a3
234
235Symmetric differences between adjacent pairs of trees:
236
237Trees 1 and 2:    4
238Trees 3 and 4:    10
239Trees 5 and 6:    4
240Trees 7 and 8:    4
241Trees 9 and 10:    4
242Trees 11 and 12:    10
243
244</PRE>
245</TD></TR></TABLE>
246<P>
247Option S (Sparse or terse) is similar except that all that is
248given on each line are the numbers of the two trees and the distance,
249separated by blanks.  This may be a convenient format if you want to
250write a program to read these numbers in, and you want to spare yourself
251the effort of having the program wade through the words on each line
252in the Verbose output.
253The first four lines of the Sparse output are titles that your program would
254want to skip past.  Here is the Sparse output for the example trees.
255<P>
256<TABLE><TR><TD BGCOLOR=white>
257<PRE>
258
259Tree distance program, version 3.6
260
261Symmetric differences between adjacent pairs of trees:
262
2631 2 4
2643 4 10
2655 6 4
2667 8 4
2679 10 4
26811 12 10
269</PRE>
270</TD></TR></TABLE>
271<P>
272<H2>CREDITS AND FUTURE</H2>
273<P>
274TREEDIST was written by Dan Fineman.  In the future we hope to expand it
275to consider a distance based on branch lengths as well as tree topologies.
276The Branch Score distance defined by Kuhner and Felsenstein (1994) is
277the one we have in mind (the Branch Score defined by them is actually
278the square of the distance).  We also hope to compute a distance based on
279quartets shared and not shared by trees (implicit in the work of Estabrook, McMorris, and
280Meacham, 1985).
281<P>
282<HR>
283<P>
284<H3>TEST DATA SET</H3>
285<P>
286<TABLE><TR><TD BGCOLOR=white>
287<PRE>
288(A,(B,(H,(D,(J,(((G,E),(F,I)),C))))));
289(A,(B,(D,((J,H),(((G,E),(F,I)),C)))));
290(A,(B,(D,(H,(J,(((G,E),(F,I)),C))))));
291(A,(B,(E,(G,((F,I),((J,(H,D)),C))))));
292(A,(B,(E,(G,((F,I),(((J,H),D),C))))));
293(A,(B,(E,((F,I),(G,((J,(H,D)),C))))));
294(A,(B,(E,((F,I),(G,(((J,H),D),C))))));
295(A,(B,(E,((G,(F,I)),((J,(H,D)),C)))));
296(A,(B,(E,((G,(F,I)),(((J,H),D),C)))));
297(A,(B,(E,(G,((F,I),((J,(H,D)),C))))));
298(A,(B,(D,(H,(J,(((G,E),(F,I)),C))))));
299(A,(B,(E,((G,(F,I)),((J,(H,D)),C)))));
300</PRE>
301</TD></TR></TABLE>
302<P>
303The output from default settings for this test set is given above (it is the
304Verbose output example).
305</BODY>
306</HTML>
Note: See TracBrowser for help on using the repository browser.