1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> |
---|
2 | <HTML> |
---|
3 | <HEAD> |
---|
4 | <TITLE>clique</TITLE> |
---|
5 | <META NAME="description" CONTENT="clique"> |
---|
6 | <META NAME="keywords" CONTENT="clique"> |
---|
7 | <META NAME="resource-type" CONTENT="document"> |
---|
8 | <META NAME="distribution" CONTENT="global"> |
---|
9 | <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1"> |
---|
10 | </HEAD> |
---|
11 | <BODY BGCOLOR="#ccffff"> |
---|
12 | <DIV ALIGN=RIGHT> |
---|
13 | version 3.6 |
---|
14 | </DIV> |
---|
15 | <P> |
---|
16 | <DIV ALIGN=CENTER> |
---|
17 | <H1>CLIQUE -- Compatibility Program</H1> |
---|
18 | </DIV> |
---|
19 | <P> |
---|
20 | © Copyright 1986-2002 by the University of |
---|
21 | Washington. Written by Joseph Felsenstein. Permission is granted to copy |
---|
22 | this document provided that no fee is charged for it and that this copyright |
---|
23 | notice is not removed. |
---|
24 | <P> |
---|
25 | <TABLE><TR><TD BGCOLOR=white> |
---|
26 | <EM><B>Note:</B> Clique is an Old Style program. This means that it takes |
---|
27 | some of its options information, notably the |
---|
28 | Weights, Ancestral states and Factors |
---|
29 | options from the input file rather than from separate files of their own |
---|
30 | as the New Style programs in this version of PHYLIP do. |
---|
31 | </EM> |
---|
32 | </TD></TR></TABLE> |
---|
33 | <P> |
---|
34 | This program uses the compatibility method for unrooted two-state |
---|
35 | characters to obtain the largest cliques of characters and the trees |
---|
36 | which they suggest. This approach originated in the work of Le Quesne |
---|
37 | (1969), though the algorithms were not precisely specified until the |
---|
38 | later work of Estabrook, Johnson, and McMorris (1976a, 1976b). These |
---|
39 | authors proved the theorem that a group of two-state characters which |
---|
40 | were pairwise compatible would be jointly compatible. This program uses |
---|
41 | an algorithm inspired by the Kent Fiala - George Estabrook program |
---|
42 | CLINCH, though closer in detail to the algorithm of Bron and Kerbosch |
---|
43 | (1973). I am indebted to Kent Fiala for pointing out that paper to me, |
---|
44 | and to David Penny for decribing to me his branch-and-bound approach to |
---|
45 | finding largest cliques, from which I have also borrowed. I am |
---|
46 | particularly grateful to Kent Fiala for catching a bug in versions 2.0 |
---|
47 | and 2.1 which resulted in those versions failing to find all of the |
---|
48 | cliques which they should. The program computes a compatibility matrix |
---|
49 | for the characters, then uses a recursive procedure to examine all |
---|
50 | possible cliques of characters. |
---|
51 | <P> |
---|
52 | After one pass through all possible cliques, the program knows the |
---|
53 | size of the largest clique, and during a second pass it prints out the |
---|
54 | cliques of the right size. It also, along with each clique, prints out a |
---|
55 | the tree suggested by that clique. |
---|
56 | <P> |
---|
57 | <H2>INPUT, OUTPUT, AND OPTIONS</H2> |
---|
58 | <P> |
---|
59 | Input to the algorithm is |
---|
60 | standard, but the "?", "P", and "B" states are not allowed. This is a |
---|
61 | serious limitation of this program. If you want to find large cliques in |
---|
62 | data that has "?" states, I recommend that you use MIX instead with the T |
---|
63 | (Threshold) option and the value of the threshold set to 2.0. The theory |
---|
64 | underlying this is given in my paper on character weighting (Felsenstein, |
---|
65 | 1981b). |
---|
66 | <P> |
---|
67 | The options are chosen from a menu, which looks like this: |
---|
68 | <P> |
---|
69 | <TABLE><TR><TD BGCOLOR=white> |
---|
70 | <PRE> |
---|
71 | |
---|
72 | Largest clique program, version 3.6a3 |
---|
73 | |
---|
74 | Settings for this run: |
---|
75 | A Use ancestral states in input file? No |
---|
76 | C Specify minimum clique size? No |
---|
77 | O Outgroup root? No, use as outgroup species 1 |
---|
78 | M Analyze multiple data sets? No |
---|
79 | 0 Terminal type (IBM PC, ANSI, none)? none |
---|
80 | 1 Print out the data at start of run No |
---|
81 | 2 Print indications of progress of run Yes |
---|
82 | 3 Print out compatibility matrix No |
---|
83 | 4 Print out tree Yes |
---|
84 | 5 Write out trees onto tree file? Yes |
---|
85 | |
---|
86 | Y to accept these or type the letter for one to change |
---|
87 | |
---|
88 | </PRE> |
---|
89 | </TD></TR></TABLE> |
---|
90 | <P> |
---|
91 | The A (Ancestors), O (Outgroup) and M (Multiple Data Sets) |
---|
92 | options are the usual ones, described in the main |
---|
93 | documentation file. However, Clique being an Old Style program, |
---|
94 | the options information for the Ancestors, Factors, and Weights |
---|
95 | options must be specified, not in separate files, but in the input |
---|
96 | data file. This is done by putting the letters A, F, or W on |
---|
97 | the first line of the input file, separated by blanks from the |
---|
98 | number of characters. The options information then follows, with the |
---|
99 | first line of each option's information starting with its letter, |
---|
100 | followed by at least 9 spaces or other characters to fill out to |
---|
101 | the length of a species name before the options information occurs: |
---|
102 | You can continue to a new line within the options information at |
---|
103 | any time. Here is a simple example: |
---|
104 | <P> |
---|
105 | <TABLE><TR><TD BGCOLOR=white> |
---|
106 | <PRE> |
---|
107 | 5 6 FAW |
---|
108 | WEIGHTS 111101 |
---|
109 | ANCESTORS 001111 |
---|
110 | FACTORS 112234 |
---|
111 | Alpha 110110 |
---|
112 | Beta 110000 |
---|
113 | Gamma 100110 |
---|
114 | Delta 001001 |
---|
115 | Epsilon 001110 |
---|
116 | </PRE> |
---|
117 | </TD></TR></TABLE> |
---|
118 | <P> |
---|
119 | If you use option A (Ancestors) you should also choose it in the menu. |
---|
120 | The compatibility matrix calculation in effect |
---|
121 | assumes if the Ancestors option is invoked that there is in the data |
---|
122 | another species that has all the ancestral states. This changes the |
---|
123 | compatibility patterns in the proper way. The Ancestors option also |
---|
124 | requires information on the ancestral states of each character to be in |
---|
125 | the input file. |
---|
126 | <P> |
---|
127 | The Outgroup option will take effect only if the tree is not rooted by the |
---|
128 | Ancestral States option. |
---|
129 | <P> |
---|
130 | The C (Clique Size) option indicates that |
---|
131 | you wish to specify a minimum clique size and print out all cliques (and their |
---|
132 | associated trees) greater than or equal to than that size. The program |
---|
133 | prompts you for the minimum clique size. |
---|
134 | <P> |
---|
135 | Note that this allows you to list all cliques (each with its tree) by |
---|
136 | simply setting the minimum clique size to 1. If you do one run and find |
---|
137 | that the largest clique has 23 characters, you can do another run with |
---|
138 | the minimum clique size set at 18, thus listing all cliques within 5 |
---|
139 | characters of the largest one. |
---|
140 | <P> |
---|
141 | Output involves a compatibility matrix (using the symbols "." and "1") |
---|
142 | and the cliques and trees. |
---|
143 | <P> |
---|
144 | If you have used the F option there will be two lists of characters |
---|
145 | for each clique, one the original multistate characters and the other the |
---|
146 | binary characters. It is the latter that are shown on the tree. When the |
---|
147 | F option is not used the output and the cliques reflect only the binary |
---|
148 | characters. |
---|
149 | <P> |
---|
150 | The trees produced have it |
---|
151 | indicated on each branch the points at which derived character states |
---|
152 | arise in the characters that define the clique. There is a legend |
---|
153 | above the tree showing which binary character is involved. Of course |
---|
154 | if the tree is unrooted you can read the changes as going in either |
---|
155 | direction. |
---|
156 | <P> |
---|
157 | The program runs very quickly but if the |
---|
158 | maximum number of characters is large it will need a good deal of |
---|
159 | storage, since the compatibility matrix requires ActualChars x ActualChars |
---|
160 | boolean variables, where ActualChars is the number of characters (in the |
---|
161 | case of the factors option the total number of true multistate characters. |
---|
162 | <P> |
---|
163 | <H2>ASSUMPTIONS</H2> |
---|
164 | <P> |
---|
165 | Basically the following assumptions are made: |
---|
166 | <OL> |
---|
167 | <LI>Each character evolves independently. |
---|
168 | <LI>Different lineages evolve independently. |
---|
169 | <LI>The ancestral state is not known. |
---|
170 | <LI>Each character has a small chance of being one which evolves |
---|
171 | so rapidly, or is so thoroughly misinterpreted, that it |
---|
172 | provides no information on the tree. |
---|
173 | <LI>The probability of a single change in a character (other than |
---|
174 | in the high rate characters) is low but not as low as the |
---|
175 | probability of being one of these "bad" characters. |
---|
176 | <LI>The probability of two changes in a low-rate character is much |
---|
177 | less than the probability that it is a high-rate character. |
---|
178 | <LI>The true tree has segments which are not so unequal in length |
---|
179 | that two changes in a long are as easy to envisage as one |
---|
180 | change in a short segment. |
---|
181 | </OL> |
---|
182 | <P> |
---|
183 | The assumptions of compatibility methods have been treated in |
---|
184 | several of my papers (1978b, 1979, 1981b, 1988b), especially the 1981 |
---|
185 | paper. For an opposing view arguing that the parsimony methods |
---|
186 | make no substantive |
---|
187 | assumptions such as these, see the papers by Farris (1983) and Sober (1983a, |
---|
188 | 1983b), but also read the exchange between Felsenstein and Sober (1986). |
---|
189 | <P> |
---|
190 | A constant available for alteration at the beginning of the |
---|
191 | program is the form width, "FormWide", |
---|
192 | which you may want to change to make it as large as possible |
---|
193 | consistent with the page width available on your output device, |
---|
194 | so as to avoid the output of cliques and of trees getting wrapped |
---|
195 | around unnecessarily. |
---|
196 | <P> |
---|
197 | <HR> |
---|
198 | <P> |
---|
199 | <H3>TEST DATA SET</H3> |
---|
200 | <P> |
---|
201 | <TABLE><TR><TD BGCOLOR=white> |
---|
202 | <PRE> |
---|
203 | 5 6 |
---|
204 | Alpha 110110 |
---|
205 | Beta 110000 |
---|
206 | Gamma 100110 |
---|
207 | Delta 001001 |
---|
208 | Epsilon 001110 |
---|
209 | </PRE> |
---|
210 | </TD></TR></TABLE> |
---|
211 | <P> |
---|
212 | <HR> |
---|
213 | <P> |
---|
214 | <H3>TEST SET OUTPUT (with all numerical options on)</H3> |
---|
215 | <P> |
---|
216 | <TABLE><TR><TD BGCOLOR=white> |
---|
217 | <PRE> |
---|
218 | |
---|
219 | Largest clique program, version 3.6a3 |
---|
220 | |
---|
221 | |
---|
222 | Character Compatibility Matrix (1 if compatible) |
---|
223 | --------- ------------- ------ -- -- ----------- |
---|
224 | |
---|
225 | 111..1 |
---|
226 | 111..1 |
---|
227 | 111..1 |
---|
228 | ...111 |
---|
229 | ...111 |
---|
230 | 111111 |
---|
231 | |
---|
232 | |
---|
233 | Largest Cliques |
---|
234 | ------- ------- |
---|
235 | |
---|
236 | |
---|
237 | Characters: ( 1 2 3 6) |
---|
238 | |
---|
239 | |
---|
240 | Tree and characters: |
---|
241 | |
---|
242 | 2 1 3 6 |
---|
243 | 0 0 1 1 |
---|
244 | |
---|
245 | +1-Delta |
---|
246 | +0--1-+ |
---|
247 | +--0-+ +--Epsilon |
---|
248 | ! ! |
---|
249 | ! +--------Gamma |
---|
250 | ! |
---|
251 | +-------------Alpha |
---|
252 | ! |
---|
253 | +-------------Beta |
---|
254 | |
---|
255 | remember: this is an unrooted tree! |
---|
256 | |
---|
257 | |
---|
258 | </PRE> |
---|
259 | </TD></TR></TABLE> |
---|
260 | </BODY> |
---|
261 | </HTML> |
---|