| 1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> |
|---|
| 2 | <HTML> |
|---|
| 3 | <HEAD> |
|---|
| 4 | <TITLE>clique</TITLE> |
|---|
| 5 | <META NAME="description" CONTENT="clique"> |
|---|
| 6 | <META NAME="keywords" CONTENT="clique"> |
|---|
| 7 | <META NAME="resource-type" CONTENT="document"> |
|---|
| 8 | <META NAME="distribution" CONTENT="global"> |
|---|
| 9 | <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1"> |
|---|
| 10 | </HEAD> |
|---|
| 11 | <BODY BGCOLOR="#ccffff"> |
|---|
| 12 | <DIV ALIGN=RIGHT> |
|---|
| 13 | version 3.6 |
|---|
| 14 | </DIV> |
|---|
| 15 | <P> |
|---|
| 16 | <DIV ALIGN=CENTER> |
|---|
| 17 | <H1>CLIQUE -- Compatibility Program</H1> |
|---|
| 18 | </DIV> |
|---|
| 19 | <P> |
|---|
| 20 | © Copyright 1986-2002 by the University of |
|---|
| 21 | Washington. Written by Joseph Felsenstein. Permission is granted to copy |
|---|
| 22 | this document provided that no fee is charged for it and that this copyright |
|---|
| 23 | notice is not removed. |
|---|
| 24 | <P> |
|---|
| 25 | <TABLE><TR><TD BGCOLOR=white> |
|---|
| 26 | <EM><B>Note:</B> Clique is an Old Style program. This means that it takes |
|---|
| 27 | some of its options information, notably the |
|---|
| 28 | Weights, Ancestral states and Factors |
|---|
| 29 | options from the input file rather than from separate files of their own |
|---|
| 30 | as the New Style programs in this version of PHYLIP do. |
|---|
| 31 | </EM> |
|---|
| 32 | </TD></TR></TABLE> |
|---|
| 33 | <P> |
|---|
| 34 | This program uses the compatibility method for unrooted two-state |
|---|
| 35 | characters to obtain the largest cliques of characters and the trees |
|---|
| 36 | which they suggest. This approach originated in the work of Le Quesne |
|---|
| 37 | (1969), though the algorithms were not precisely specified until the |
|---|
| 38 | later work of Estabrook, Johnson, and McMorris (1976a, 1976b). These |
|---|
| 39 | authors proved the theorem that a group of two-state characters which |
|---|
| 40 | were pairwise compatible would be jointly compatible. This program uses |
|---|
| 41 | an algorithm inspired by the Kent Fiala - George Estabrook program |
|---|
| 42 | CLINCH, though closer in detail to the algorithm of Bron and Kerbosch |
|---|
| 43 | (1973). I am indebted to Kent Fiala for pointing out that paper to me, |
|---|
| 44 | and to David Penny for decribing to me his branch-and-bound approach to |
|---|
| 45 | finding largest cliques, from which I have also borrowed. I am |
|---|
| 46 | particularly grateful to Kent Fiala for catching a bug in versions 2.0 |
|---|
| 47 | and 2.1 which resulted in those versions failing to find all of the |
|---|
| 48 | cliques which they should. The program computes a compatibility matrix |
|---|
| 49 | for the characters, then uses a recursive procedure to examine all |
|---|
| 50 | possible cliques of characters. |
|---|
| 51 | <P> |
|---|
| 52 | After one pass through all possible cliques, the program knows the |
|---|
| 53 | size of the largest clique, and during a second pass it prints out the |
|---|
| 54 | cliques of the right size. It also, along with each clique, prints out a |
|---|
| 55 | the tree suggested by that clique. |
|---|
| 56 | <P> |
|---|
| 57 | <H2>INPUT, OUTPUT, AND OPTIONS</H2> |
|---|
| 58 | <P> |
|---|
| 59 | Input to the algorithm is |
|---|
| 60 | standard, but the "?", "P", and "B" states are not allowed. This is a |
|---|
| 61 | serious limitation of this program. If you want to find large cliques in |
|---|
| 62 | data that has "?" states, I recommend that you use MIX instead with the T |
|---|
| 63 | (Threshold) option and the value of the threshold set to 2.0. The theory |
|---|
| 64 | underlying this is given in my paper on character weighting (Felsenstein, |
|---|
| 65 | 1981b). |
|---|
| 66 | <P> |
|---|
| 67 | The options are chosen from a menu, which looks like this: |
|---|
| 68 | <P> |
|---|
| 69 | <TABLE><TR><TD BGCOLOR=white> |
|---|
| 70 | <PRE> |
|---|
| 71 | |
|---|
| 72 | Largest clique program, version 3.6a3 |
|---|
| 73 | |
|---|
| 74 | Settings for this run: |
|---|
| 75 | A Use ancestral states in input file? No |
|---|
| 76 | C Specify minimum clique size? No |
|---|
| 77 | O Outgroup root? No, use as outgroup species 1 |
|---|
| 78 | M Analyze multiple data sets? No |
|---|
| 79 | 0 Terminal type (IBM PC, ANSI, none)? none |
|---|
| 80 | 1 Print out the data at start of run No |
|---|
| 81 | 2 Print indications of progress of run Yes |
|---|
| 82 | 3 Print out compatibility matrix No |
|---|
| 83 | 4 Print out tree Yes |
|---|
| 84 | 5 Write out trees onto tree file? Yes |
|---|
| 85 | |
|---|
| 86 | Y to accept these or type the letter for one to change |
|---|
| 87 | |
|---|
| 88 | </PRE> |
|---|
| 89 | </TD></TR></TABLE> |
|---|
| 90 | <P> |
|---|
| 91 | The A (Ancestors), O (Outgroup) and M (Multiple Data Sets) |
|---|
| 92 | options are the usual ones, described in the main |
|---|
| 93 | documentation file. However, Clique being an Old Style program, |
|---|
| 94 | the options information for the Ancestors, Factors, and Weights |
|---|
| 95 | options must be specified, not in separate files, but in the input |
|---|
| 96 | data file. This is done by putting the letters A, F, or W on |
|---|
| 97 | the first line of the input file, separated by blanks from the |
|---|
| 98 | number of characters. The options information then follows, with the |
|---|
| 99 | first line of each option's information starting with its letter, |
|---|
| 100 | followed by at least 9 spaces or other characters to fill out to |
|---|
| 101 | the length of a species name before the options information occurs: |
|---|
| 102 | You can continue to a new line within the options information at |
|---|
| 103 | any time. Here is a simple example: |
|---|
| 104 | <P> |
|---|
| 105 | <TABLE><TR><TD BGCOLOR=white> |
|---|
| 106 | <PRE> |
|---|
| 107 | 5 6 FAW |
|---|
| 108 | WEIGHTS 111101 |
|---|
| 109 | ANCESTORS 001111 |
|---|
| 110 | FACTORS 112234 |
|---|
| 111 | Alpha 110110 |
|---|
| 112 | Beta 110000 |
|---|
| 113 | Gamma 100110 |
|---|
| 114 | Delta 001001 |
|---|
| 115 | Epsilon 001110 |
|---|
| 116 | </PRE> |
|---|
| 117 | </TD></TR></TABLE> |
|---|
| 118 | <P> |
|---|
| 119 | If you use option A (Ancestors) you should also choose it in the menu. |
|---|
| 120 | The compatibility matrix calculation in effect |
|---|
| 121 | assumes if the Ancestors option is invoked that there is in the data |
|---|
| 122 | another species that has all the ancestral states. This changes the |
|---|
| 123 | compatibility patterns in the proper way. The Ancestors option also |
|---|
| 124 | requires information on the ancestral states of each character to be in |
|---|
| 125 | the input file. |
|---|
| 126 | <P> |
|---|
| 127 | The Outgroup option will take effect only if the tree is not rooted by the |
|---|
| 128 | Ancestral States option. |
|---|
| 129 | <P> |
|---|
| 130 | The C (Clique Size) option indicates that |
|---|
| 131 | you wish to specify a minimum clique size and print out all cliques (and their |
|---|
| 132 | associated trees) greater than or equal to than that size. The program |
|---|
| 133 | prompts you for the minimum clique size. |
|---|
| 134 | <P> |
|---|
| 135 | Note that this allows you to list all cliques (each with its tree) by |
|---|
| 136 | simply setting the minimum clique size to 1. If you do one run and find |
|---|
| 137 | that the largest clique has 23 characters, you can do another run with |
|---|
| 138 | the minimum clique size set at 18, thus listing all cliques within 5 |
|---|
| 139 | characters of the largest one. |
|---|
| 140 | <P> |
|---|
| 141 | Output involves a compatibility matrix (using the symbols "." and "1") |
|---|
| 142 | and the cliques and trees. |
|---|
| 143 | <P> |
|---|
| 144 | If you have used the F option there will be two lists of characters |
|---|
| 145 | for each clique, one the original multistate characters and the other the |
|---|
| 146 | binary characters. It is the latter that are shown on the tree. When the |
|---|
| 147 | F option is not used the output and the cliques reflect only the binary |
|---|
| 148 | characters. |
|---|
| 149 | <P> |
|---|
| 150 | The trees produced have it |
|---|
| 151 | indicated on each branch the points at which derived character states |
|---|
| 152 | arise in the characters that define the clique. There is a legend |
|---|
| 153 | above the tree showing which binary character is involved. Of course |
|---|
| 154 | if the tree is unrooted you can read the changes as going in either |
|---|
| 155 | direction. |
|---|
| 156 | <P> |
|---|
| 157 | The program runs very quickly but if the |
|---|
| 158 | maximum number of characters is large it will need a good deal of |
|---|
| 159 | storage, since the compatibility matrix requires ActualChars x ActualChars |
|---|
| 160 | boolean variables, where ActualChars is the number of characters (in the |
|---|
| 161 | case of the factors option the total number of true multistate characters. |
|---|
| 162 | <P> |
|---|
| 163 | <H2>ASSUMPTIONS</H2> |
|---|
| 164 | <P> |
|---|
| 165 | Basically the following assumptions are made: |
|---|
| 166 | <OL> |
|---|
| 167 | <LI>Each character evolves independently. |
|---|
| 168 | <LI>Different lineages evolve independently. |
|---|
| 169 | <LI>The ancestral state is not known. |
|---|
| 170 | <LI>Each character has a small chance of being one which evolves |
|---|
| 171 | so rapidly, or is so thoroughly misinterpreted, that it |
|---|
| 172 | provides no information on the tree. |
|---|
| 173 | <LI>The probability of a single change in a character (other than |
|---|
| 174 | in the high rate characters) is low but not as low as the |
|---|
| 175 | probability of being one of these "bad" characters. |
|---|
| 176 | <LI>The probability of two changes in a low-rate character is much |
|---|
| 177 | less than the probability that it is a high-rate character. |
|---|
| 178 | <LI>The true tree has segments which are not so unequal in length |
|---|
| 179 | that two changes in a long are as easy to envisage as one |
|---|
| 180 | change in a short segment. |
|---|
| 181 | </OL> |
|---|
| 182 | <P> |
|---|
| 183 | The assumptions of compatibility methods have been treated in |
|---|
| 184 | several of my papers (1978b, 1979, 1981b, 1988b), especially the 1981 |
|---|
| 185 | paper. For an opposing view arguing that the parsimony methods |
|---|
| 186 | make no substantive |
|---|
| 187 | assumptions such as these, see the papers by Farris (1983) and Sober (1983a, |
|---|
| 188 | 1983b), but also read the exchange between Felsenstein and Sober (1986). |
|---|
| 189 | <P> |
|---|
| 190 | A constant available for alteration at the beginning of the |
|---|
| 191 | program is the form width, "FormWide", |
|---|
| 192 | which you may want to change to make it as large as possible |
|---|
| 193 | consistent with the page width available on your output device, |
|---|
| 194 | so as to avoid the output of cliques and of trees getting wrapped |
|---|
| 195 | around unnecessarily. |
|---|
| 196 | <P> |
|---|
| 197 | <HR> |
|---|
| 198 | <P> |
|---|
| 199 | <H3>TEST DATA SET</H3> |
|---|
| 200 | <P> |
|---|
| 201 | <TABLE><TR><TD BGCOLOR=white> |
|---|
| 202 | <PRE> |
|---|
| 203 | 5 6 |
|---|
| 204 | Alpha 110110 |
|---|
| 205 | Beta 110000 |
|---|
| 206 | Gamma 100110 |
|---|
| 207 | Delta 001001 |
|---|
| 208 | Epsilon 001110 |
|---|
| 209 | </PRE> |
|---|
| 210 | </TD></TR></TABLE> |
|---|
| 211 | <P> |
|---|
| 212 | <HR> |
|---|
| 213 | <P> |
|---|
| 214 | <H3>TEST SET OUTPUT (with all numerical options on)</H3> |
|---|
| 215 | <P> |
|---|
| 216 | <TABLE><TR><TD BGCOLOR=white> |
|---|
| 217 | <PRE> |
|---|
| 218 | |
|---|
| 219 | Largest clique program, version 3.6a3 |
|---|
| 220 | |
|---|
| 221 | |
|---|
| 222 | Character Compatibility Matrix (1 if compatible) |
|---|
| 223 | --------- ------------- ------ -- -- ----------- |
|---|
| 224 | |
|---|
| 225 | 111..1 |
|---|
| 226 | 111..1 |
|---|
| 227 | 111..1 |
|---|
| 228 | ...111 |
|---|
| 229 | ...111 |
|---|
| 230 | 111111 |
|---|
| 231 | |
|---|
| 232 | |
|---|
| 233 | Largest Cliques |
|---|
| 234 | ------- ------- |
|---|
| 235 | |
|---|
| 236 | |
|---|
| 237 | Characters: ( 1 2 3 6) |
|---|
| 238 | |
|---|
| 239 | |
|---|
| 240 | Tree and characters: |
|---|
| 241 | |
|---|
| 242 | 2 1 3 6 |
|---|
| 243 | 0 0 1 1 |
|---|
| 244 | |
|---|
| 245 | +1-Delta |
|---|
| 246 | +0--1-+ |
|---|
| 247 | +--0-+ +--Epsilon |
|---|
| 248 | ! ! |
|---|
| 249 | ! +--------Gamma |
|---|
| 250 | ! |
|---|
| 251 | +-------------Alpha |
|---|
| 252 | ! |
|---|
| 253 | +-------------Beta |
|---|
| 254 | |
|---|
| 255 | remember: this is an unrooted tree! |
|---|
| 256 | |
|---|
| 257 | |
|---|
| 258 | </PRE> |
|---|
| 259 | </TD></TR></TABLE> |
|---|
| 260 | </BODY> |
|---|
| 261 | </HTML> |
|---|