1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> |
---|
2 | <HTML> |
---|
3 | <HEAD> |
---|
4 | <TITLE>factor</TITLE> |
---|
5 | <META NAME="description" CONTENT="factor"> |
---|
6 | <META NAME="keywords" CONTENT="factor"> |
---|
7 | <META NAME="resource-type" CONTENT="document"> |
---|
8 | <META NAME="distribution" CONTENT="global"> |
---|
9 | <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1"> |
---|
10 | </HEAD> |
---|
11 | <BODY BGCOLOR="#ccffff"> |
---|
12 | <DIV ALIGN=RIGHT> |
---|
13 | version 3.6 |
---|
14 | </DIV> |
---|
15 | <P> |
---|
16 | <DIV ALIGN=CENTER> |
---|
17 | <H1>FACTOR - Program to factor multistate characters.</H1> |
---|
18 | </DIV> |
---|
19 | <P> |
---|
20 | © Copyright 1986-2002 by The University of Washington. Written by |
---|
21 | Christopher Meacham and Joseph Felsenstein. Permission is granted |
---|
22 | to copy this document provided that no fee is charged for it and that this |
---|
23 | copyright notice is not removed. |
---|
24 | <P> |
---|
25 | <TABLE><TR><TD BGCOLOR=white> |
---|
26 | <EM><B>Note:</B> Factor is an Old Style program. |
---|
27 | This means that it takes some of its options information, notably the |
---|
28 | Ancestral states and Factors |
---|
29 | options from the input file rather than from separate files of their own |
---|
30 | as the New Style programs in this version of PHYLIP do. |
---|
31 | </EM> |
---|
32 | </TD></TR></TABLE> |
---|
33 | <P> |
---|
34 | |
---|
35 | <P> |
---|
36 | Programmed by C. Meacham, Botany, Univ. of Georgia, Athens, Georgia |
---|
37 | <BR> |
---|
38 | (current address: University of California, Berkeley, California 94720) |
---|
39 | <BR> |
---|
40 | additional code and documentation by Joe Felsenstein |
---|
41 | <P> |
---|
42 | This program factors a data set that contains multistate |
---|
43 | characters, creating a data set consisting entirely of binary (0,1) |
---|
44 | characters that, in turn, can be used as input to any of the other |
---|
45 | discrete character programs in this package, except for PARS. |
---|
46 | Besides this primary |
---|
47 | function, FACTOR also provides an easy way of deleting characters from a |
---|
48 | data set. The input format for FACTOR is very similar to the input |
---|
49 | format for the other discrete character programs except for the |
---|
50 | addition of character-state tree descriptions. |
---|
51 | <P> |
---|
52 | Note that this program has no way of converting an unordered multistate |
---|
53 | character into binary characters. This is a weakness of the Old Style |
---|
54 | discrete characters programs in this package. |
---|
55 | Fortunately, PARS has joined the package, and it enables unordered |
---|
56 | multistate characters, in which any state can change to any other in |
---|
57 | one step, to be analyzed with parsimony. |
---|
58 | <P> |
---|
59 | FACTOR is really for a different case, that in which there are |
---|
60 | multiple states related on a "character state tree", which specifies |
---|
61 | for each state which other states it can change to. That graph of |
---|
62 | states is assumed to be a tree, with no loops in it. |
---|
63 | <P> |
---|
64 | The first line of the input file should contain the number of |
---|
65 | species and the number of multistate characters. This |
---|
66 | first line is followed by the lines describing the character-state |
---|
67 | trees, one description per line. The species information constitutes |
---|
68 | the last part of the file. Any number of lines may be used for a single |
---|
69 | species. |
---|
70 | <P> |
---|
71 | <H2>FIRST LINE</H2> |
---|
72 | <P> |
---|
73 | The first line is free format with the number of species first, |
---|
74 | separated by at least one blank (space) from the number of multistate |
---|
75 | characters, which in turn is separated by at least one blank from the |
---|
76 | options, if present. |
---|
77 | <P> |
---|
78 | <H2>OPTIONS</H2> |
---|
79 | <P> |
---|
80 | The options are selected from a menu that looks like this: |
---|
81 | <P> |
---|
82 | <TABLE><TR><TD BGCOLOR=white> |
---|
83 | <PRE> |
---|
84 | |
---|
85 | Factor -- multistate to binary recoding program, version 3.6a3 |
---|
86 | |
---|
87 | Settings for this run: |
---|
88 | A put ancestral states in output file? No |
---|
89 | F put factors information in output file? No |
---|
90 | 0 Terminal type (IBM PC, ANSI, none)? (none) |
---|
91 | 1 Print indications of progress of run Yes |
---|
92 | |
---|
93 | Are these settings correct? (type Y or the letter for one to change) |
---|
94 | |
---|
95 | </PRE> |
---|
96 | </TD></TR></TABLE> |
---|
97 | <P> |
---|
98 | The options particular to this program are: |
---|
99 | <P> |
---|
100 | <DL COMPACT> |
---|
101 | <DT>A</DT> <DD>Choosing the A (Ancestors) option toggles on and off the setting |
---|
102 | that causes a line to be written in the output that |
---|
103 | describes the states of the ancestor as indicated by the |
---|
104 | character-state tree descriptions (see below). If the ancestral |
---|
105 | state is not specified by a particular character-state tree, |
---|
106 | a "?" signifying an unknown character state will be written. |
---|
107 | The multistate characters are factored in such a way that the |
---|
108 | ancestral state in the factored data set will always be "0". |
---|
109 | The ancestor line does not get counted as a species.</DD> |
---|
110 | <P> |
---|
111 | <DT>F</DT> <DD>Choosing the F (Factors) option toggles on and off |
---|
112 | a setting that will cause a "FACTORS" line to |
---|
113 | be written in the output. |
---|
114 | This line will indicate to other programs which factors came |
---|
115 | from the same multistate character. Of the programs currently in |
---|
116 | the package only SEQBOOT, MOVE, and DOLMOVE use this information.</DD> |
---|
117 | </DL> |
---|
118 | <P> |
---|
119 | <H2>CHARACTER-STATE TREE DESCRIPTIONS</H2> |
---|
120 | <P> |
---|
121 | The character-state trees are described in free format. The |
---|
122 | character number of the multistate character is given first followed |
---|
123 | by the description of the tree itself. Each description must be |
---|
124 | completed on a single line. Each character that is to be factored must |
---|
125 | have a description, and the characters must be described in the order |
---|
126 | that they occur in the input, that is, in numerical order. |
---|
127 | <P> |
---|
128 | The tree is described by listing the pairs of character states that |
---|
129 | are adjacent to each other in the character-state tree. The two |
---|
130 | character states in each adjacent pair are separated by a colon (":"). |
---|
131 | If character fifteen has this character state tree for possible states |
---|
132 | "A", "B", "C", and "D": |
---|
133 | <P> |
---|
134 | <PRE> |
---|
135 | A ---- B ---- C |
---|
136 | | |
---|
137 | | |
---|
138 | | |
---|
139 | D |
---|
140 | </PRE> |
---|
141 | <P> |
---|
142 | then the character-state tree description would be |
---|
143 | <P> |
---|
144 | <PRE> |
---|
145 | 15 A:B B:C D:B |
---|
146 | </PRE> |
---|
147 | <P> |
---|
148 | Note that either symbol may appear first. The ancestral state is |
---|
149 | identified, if desired, by putting it "adjacent" to a period. If we |
---|
150 | wanted to root character fifteen at state C: |
---|
151 | <P> |
---|
152 | <PRE> |
---|
153 | A <--- B <--- C |
---|
154 | | |
---|
155 | | |
---|
156 | V |
---|
157 | D |
---|
158 | </PRE> |
---|
159 | <P> |
---|
160 | we could write |
---|
161 | <P> |
---|
162 | <PRE> |
---|
163 | 15 B:D A:B C:B .:C |
---|
164 | </PRE> |
---|
165 | <P> |
---|
166 | Both the order in which the pairs are listed and the order of the |
---|
167 | symbols in each pair are arbitrary. However, each pair may only appear |
---|
168 | once in the list. Any symbols may be used for a character state in the |
---|
169 | input except the character that signals the connection between two states (in |
---|
170 | the distribution copy this is set to ":"), ".", and, of course, a |
---|
171 | blank. Blanks are ignored |
---|
172 | completely in the tree description so that even B:DA:BC:B.:C or |
---|
173 | B : DA : BC : B. : C would be equivalent to the above example. |
---|
174 | However, at least one blank must separate the character number from the |
---|
175 | tree description. |
---|
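<P>
The rules above are mechanical enough to sketch in code. The Python function below is an illustration only and is not taken from FACTOR itself; the name parse_tree_description and the shape of its return value are assumptions made for this example. It relies on the fact that, once blanks are dropped, every pair is exactly three symbols long.

```python
def parse_tree_description(line, factchar=":"):
    """Parse one character-state tree line such as '15 B:D A:B C:B .:C'.

    Returns (character_number, edges, root). root is None when no
    ancestral state was marked with a period. (Hypothetical helper.)
    """
    # At least one blank separates the character number from the tree.
    number, _, rest = line.strip().partition(" ")
    # Blanks inside the description itself are ignored entirely.
    symbols = rest.replace(" ", "")
    edges, root = [], None
    # With blanks removed, each pair occupies three symbols: X, factchar, Y.
    for i in range(0, len(symbols), 3):
        pair = symbols[i:i + 3]
        if len(pair) != 3 or pair[1] != factchar:
            raise ValueError(f"malformed pair {pair!r}")
        a, b = pair[0], pair[2]
        if a == b:
            raise ValueError(f"state paired with itself: {pair!r}")
        if "." in (a, b):
            # A period marks the state next to it as ancestral.
            if root is not None:
                raise ValueError("more than one root specified")
            root = b if a == "." else a
        else:
            edges.append((a, b))
    return int(number), edges, root
```

For the rooted example above, parse_tree_description("15 B:D A:B C:B .:C") yields character number 15, the three pairs in the order listed, and "C" as the root. A line holding only a character number gives an empty list of pairs.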
176 | <P> |
---|
177 | <H2>DELETING CHARACTERS FROM A DATA SET</H2> |
---|
178 | <P> |
---|
179 | If no description line appears in the input for a particular |
---|
180 | character, then that character will be omitted from the output. If the |
---|
181 | character number is given on the line, but no character-state tree is |
---|
182 | provided, then the symbol for the character in the input will be copied |
---|
183 | directly to the output without change. This is useful for characters |
---|
184 | that are already coded "0" and "1". Characters can be deleted from a |
---|
185 | data set simply by listing only those that are to appear in the output. |
---|
186 | <P> |
---|
187 | <H2>TERMINATING THE LIST OF TREE DESCRIPTIONS</H2> |
---|
188 | <P> |
---|
189 | The last character-state tree description should be followed by a |
---|
190 | line containing the number "999". This terminates processing of the |
---|
191 | trees and indicates the beginning of the species information. |
---|
192 | <P> |
---|
193 | <H2>SPECIES INFORMATION</H2> |
---|
194 | <P> |
---|
195 | The format for the species information is essentially identical to |
---|
196 | that of the other discrete character programs. The first ten character positions |
---|
197 | are allotted to the species name (this value may be changed by altering |
---|
198 | the value of the constant nmlngth at the beginning of the program). The |
---|
199 | character states follow and may be continued to as many lines as |
---|
200 | desired. There is currently no method for indicating polymorphisms. Blanks |
---|
201 | may be placed between characters or omitted. |
---|
202 | <P> |
---|
203 | There is a method for indicating uncertainty about states. There is |
---|
204 | one character value that stands for "unknown". If this appears in |
---|
205 | the input data then "?" is written out in all the corresponding |
---|
206 | positions in the output file. The character value that designates |
---|
207 | "unknown" is given in the constant unkchar at the beginning of the |
---|
208 | program, and can be changed by changing that constant. It is set to |
---|
209 | "?" in the distribution copy. |
---|
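<P>
As a rough illustration of the layout just described, the Python sketch below collects species records from the lines that follow the "999" terminator. It is not FACTOR's own code: the function name read_species and its arguments are assumptions, and it discards every blank it meets, which is more forgiving than the program itself.

```python
def read_species(lines, nchars, nmlngth=10):
    """Collect (name, states) records from the species part of the input.

    lines  -- the lines after the "999" terminator
    nchars -- number of multistate characters from the first line
    (Hypothetical helper; nmlngth mirrors the constant of that name.)
    """
    records = []
    name, states = None, ""
    for line in lines:
        if name is None:
            if not line.strip():
                continue  # skip blank lines between species
            # The first nmlngth columns hold the species name.
            name, line = line[:nmlngth].strip(), line[nmlngth:]
        # Blanks between character states are optional, so drop them.
        states += "".join(line.split())
        if len(states) > nchars:
            raise ValueError(f"too many states for species {name!r}")
        if len(states) == nchars:
            # This species is complete; the next line starts a new one.
            records.append((name, states))
            name, states = None, ""
    if name is not None:
        raise ValueError(f"too few states for species {name!r}")
    return records
```

A species may run over several lines: the function keeps reading until it has nchars symbols, then treats the next non-blank line as the start of a new species.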
210 | <P> |
---|
211 | <H2>OUTPUT</H2> |
---|
212 | <P> |
---|
213 | The first line of output will contain the number of species and |
---|
214 | the number of binary characters in the factored data set followed by |
---|
215 | the letter "A" if the A option was specified in the input. If option |
---|
216 | F was specified, the next line will begin "FACTORS". If option A was |
---|
217 | specified, the line describing the ancestor will follow next. Finally, |
---|
218 | the factored characters will be written for each species in the format |
---|
219 | required for input by the other discrete programs in the package. The |
---|
220 | maximum length of the output lines is 80 characters, but this maximum |
---|
221 | length can be changed prior to compilation. |
---|
222 | <P> |
---|
223 | In fact, the format of the output file for the A and F options is not |
---|
224 | correct for the current release of PHYLIP. We need to change their |
---|
225 | output to write a factors file and an ancestors file instead of |
---|
226 | putting the Factors and Ancestors information into the data file. |
---|
227 | <P> |
---|
228 | <H2>ERRORS</H2> |
---|
229 | <P> |
---|
230 | The output should be checked for error messages. Errors will occur |
---|
231 | in the character-state tree descriptions if the format is incorrect |
---|
232 | (colons in the wrong place, etc.), if more than one root is specified, |
---|
233 | if the tree contains loops (and hence is not a tree), and if the tree is |
---|
234 | not connected, e.g. |
---|
235 | <P> |
---|
236 | <PRE> |
---|
237 | A:B B:C D:E |
---|
238 | </PRE> |
---|
239 | <P> |
---|
240 | describes |
---|
241 | <P> |
---|
242 | <PRE> |
---|
243 | A ---- B ---- C D ---- E |
---|
244 | </PRE> |
---|
245 | <P> |
---|
246 | This "tree" is in two unconnected pieces. An error will also occur if a symbol |
---|
247 | appears in the data set that is not in the tree description for that |
---|
248 | character. Blanks at the end of lines when the species information |
---|
249 | is continued to a new line will cause this kind of error. |
---|
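<P>
The two structural errors (loops and unconnected pieces) can be detected before running the program. The Python sketch below uses a union-find structure for this; it is an assumption-laden illustration rather than FACTOR's own checking code, and the name check_tree is hypothetical.

```python
def check_tree(edges):
    """Return a list of problems found in a character-state tree.

    edges is a list of (state, state) pairs, e.g. [("A", "B"), ("B", "C")].
    An empty list means the pairs form a single connected tree.
    """
    parent = {}

    def find(s):
        # Follow parent links to the representative of s's component.
        parent.setdefault(s, s)
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    problems = []
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:
            # a and b were already joined, so this pair closes a loop.
            problems.append(f"loop closed by {a}:{b}")
        else:
            parent[ra] = rb
    pieces = {find(s) for s in list(parent)}
    if len(pieces) > 1:
        problems.append(f"tree is in {len(pieces)} unconnected pieces")
    return problems
```

For the disconnected example above, check_tree([("A", "B"), ("B", "C"), ("D", "E")]) reports two unconnected pieces. A pair listed twice is reported as a loop, which also covers the rule that each pair may appear only once.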
250 | <P> |
---|
251 | <H2>CONSTANTS AVAILABLE TO BE CHANGED</H2> |
---|
252 | <P> |
---|
253 | At the beginning of the program a number of constants |
---|
254 | are available to be changed to accommodate larger data sets. These are |
---|
255 | "maxstates", "maxoutput", "sizearray", "factchar" and "unkchar". The |
---|
256 | constant "maxstates" |
---|
257 | gives the maximum number of states per character (set at 20 in the |
---|
258 | distribution copy). The constant "maxoutput" |
---|
259 | gives the maximum width of a line in the output file (80 in the |
---|
260 | distribution copy). The constant "sizearray" |
---|
261 | must be at least as large as the sum of squares |
---|
262 | of the numbers of states in the characters. It is initially |
---|
263 | set to 2000, so that although 20 states are allowed (at the initial |
---|
264 | setting of maxstates) per character, there cannot be 20 states in all |
---|
265 | of 100 characters. |
---|
266 | <P> |
---|
267 | Particularly important constants are "factchar" and "unkchar", |
---|
268 | whose values are not numbers |
---|
269 | but single characters. Initially set to the colon ":", |
---|
270 | "factchar" is the character that will be used to separate states in the input of character |
---|
271 | state trees. It can be changed by changing this |
---|
272 | constant. (We could have used a hyphen ("-") but didn't because that would make the |
---|
273 | minus-sign ("-") unavailable as a character state in +/- characters). |
---|
274 | The constant "unkchar" |
---|
275 | is the character value in the input data that |
---|
276 | indicates that the state is unknown. It is set to "?" in the |
---|
277 | distribution copy. If your computer is one that lacks the colon ":" in its |
---|
278 | character set or uses a nonstandard character code such as EBCDIC, you |
---|
279 | will want to change the constant "factchar". |
---|
280 | <P> |
---|
281 | <H2>INPUT AND OUTPUT FILES</H2> |
---|
282 | <P> |
---|
283 | The input file for the program has the default file name "infile" |
---|
284 | and the output file, the one that has the binary character state data, |
---|
285 | has the name "outfile". |
---|
286 | <P> |
---|
287 | <TABLE> |
---|
288 | <TR> |
---|
289 | <TD>----SAMPLE INPUT-----</TD> <TD> -----Comments (not part of input file) -----</TD> |
---|
290 | </TR> |
---|
291 | <TR> |
---|
292 | <TD BGCOLOR=white> |
---|
293 | <PRE> |
---|
294 | 4 6 A |
---|
295 | 1 A:B B:C |
---|
296 | 2 A:B B:. |
---|
297 | 4 |
---|
298 | 5 0:1 1:2 .:0 |
---|
299 | 6 .:# #:$ #:% |
---|
300 | 999 |
---|
301 | Alpha CAW00# |
---|
302 | Beta BBX01% |
---|
303 | Gamma ABY12# |
---|
304 | Epsilon CAZ01$ |
---|
305 | </PRE> |
---|
306 | </TD> |
---|
307 | <TD> |
---|
308 | <PRE> |
---|
309 | |
---|
310 | 4 species; 6 characters; A option on |
---|
311 | A ---- B ---- C |
---|
312 | B ---> A |
---|
313 | Character 3 deleted; 4 unchanged |
---|
314 | 0 ---> 1 ---> 2 |
---|
315 | % <--- # ---> $ |
---|
316 | Signals end of trees |
---|
317 | Species information begins |
---|
318 | |
---|
319 | |
---|
320 | |
---|
321 | </PRE> |
---|
322 | </TD> |
---|
323 | </TR> |
---|
324 | <TR> |
---|
325 | <TD> ---SAMPLE OUTPUT-----</TD> <TD> -----Comments (not part of output file) -----</TD> |
---|
326 | </TR> |
---|
327 | <TR> |
---|
328 | <TD BGCOLOR=white> |
---|
329 | <PRE> |
---|
330 | 5 8 A |
---|
331 | ANCESTOR ??0?0000 |
---|
332 | Alpha 11100000 |
---|
333 | Beta 10001001 |
---|
334 | Gamma 00011100 |
---|
335 | Epsilon 11101010 |
---|
336 | </PRE> |
---|
337 | </TD> |
---|
338 | <TD> |
---|
339 | <PRE> |
---|
340 | 5 species (incl. anc.); 8 factors |
---|
341 | Chars. 1 and 2 come from old number 1 |
---|
342 | Char. 3 comes from old number 2 |
---|
343 | Char. 4 is old number 4 |
---|
344 | Chars. 5 and 6 come from old number 5 |
---|
345 | Chars. 7 and 8 come from old number 6 |
---|
346 | </PRE> |
---|
347 | </TD> |
---|
348 | </TR> |
---|
349 | </TABLE> |
---|
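<P>
The sample output is consistent with additive binary coding, in which each pair in a character-state tree becomes one binary factor and a state scores "1" for every pair on the path from the root to it. The Python sketch below is a reconstruction under that assumption, not FACTOR's source code. Two further assumptions are made to match the sample: factors are ordered as the pairs are listed, and a tree with no marked root is rooted at the first state mentioned. The names factor_codes and factor_species are hypothetical, and unknown ("?") states are not handled.

```python
from collections import deque

def factor_codes(edges, root=None):
    """Map each state to a string of binary factors, one factor per pair."""
    if root is None:
        root = edges[0][0]  # assumption: root at the first state listed
    neighbours = {}
    for index, (a, b) in enumerate(edges):
        neighbours.setdefault(a, []).append((b, index))
        neighbours.setdefault(b, []).append((a, index))
    codes = {root: [0] * len(edges)}  # the root is coded as all zeros
    queue = deque([root])
    while queue:
        state = queue.popleft()
        for other, index in neighbours.get(state, []):
            if other not in codes:
                code = list(codes[state])
                code[index] = 1  # switch on the factor for this pair
                codes[other] = code
                queue.append(other)
    return {s: "".join(map(str, c)) for s, c in codes.items()}

def factor_species(states, trees):
    """Recode one species; characters missing from trees are deleted."""
    out = ""
    for number in sorted(trees):
        symbol = states[number - 1]
        tree = trees[number]
        # A character listed without a tree is copied unchanged.
        out += symbol if tree is None else factor_codes(*tree)[symbol]
    return out

# The trees of the sample input, keyed by character number.
sample_trees = {
    1: ([("A", "B"), ("B", "C")], None),
    2: ([("A", "B")], "B"),
    4: None,
    5: ([("0", "1"), ("1", "2")], "0"),
    6: ([("#", "$"), ("#", "%")], "#"),
}
print(factor_species("CAW00#", sample_trees))  # Alpha -> 11100000
print(factor_species("BBX01%", sample_trees))  # Beta  -> 10001001
```

Under these assumptions the sketch reproduces all four species rows of the sample output. Whether FACTOR orders factors or roots unrooted trees this way in general is not stated in this document.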
350 | </BODY> |
---|
351 | </HTML> |
---|