Last change on this file was 2176, checked in by westram, 20 years ago
* empty log message *
Property svn:eol-style set to `native` Property svn:keywords set to `Author Date Id Revision`
File size: 12.6 KB

1	<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
2	<HTML>
3	<HEAD>
4	<TITLE>factor</TITLE>
5	<META NAME="description" CONTENT="factor">
6	<META NAME="keywords" CONTENT="factor">
7	<META NAME="resource-type" CONTENT="document">
8	<META NAME="distribution" CONTENT="global">
9	<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
10	</HEAD>
11	<BODY BGCOLOR="#ccffff">
12	<DIV ALIGN=RIGHT>
13	version 3.6
14	</DIV>
15	<P>
16	<DIV ALIGN=CENTER>
17	<H1>FACTOR - Program to factor multistate characters.</H1>
18	</DIV>
19	<P>
20	© Copyright 1986-2002 by The University of Washington. Written by
21	Christopher Meacham and Joseph Felsenstein. Permission is granted
22	to copy this document provided that no fee is charged for it and that this
23	copyright notice is not removed.
24	<P>
25	<TABLE><TR><TD BGCOLOR=white>
26	<EM><B>Note:</B> Factor is an Old Style program.
27	This means that it takes some of its options information, notably the
28	Ancestral states and Factors
29	options from the input file rather than from separate files of their own
30	as the New Style programs in this version of PHYLIP do.
31	</EM>
32	</TD></TR></TABLE>
35	<P>
36	Programmed by C. Meacham, Botany, Univ. of Georgia, Athens, Georgia<BR>
38	(current address: University of California, Berkeley, California 94720)<BR>
40	additional code and documentation by Joe Felsenstein
41	<P>
42	This program factors a data set that contains multistate
43	characters, creating a data set consisting entirely of binary (0,1)
44	characters that, in turn, can be used as input to any of the other
45	discrete character programs in this package, except for PARS.
46	Besides this primary
47	function, FACTOR also provides an easy way of deleting characters from a
48	data set. The input format for FACTOR is very similar to the input
49	format for the other discrete character programs except for the
50	addition of character-state tree descriptions.
51	<P>
52	Note that this program has no way of converting an unordered multistate
53	character into binary characters. This is a weakness of the Old Style
54	discrete characters programs in this package.
55	Fortunately, PARS has joined the package, and it enables unordered
56	multistate characters, in which any state can change to any other in
57	one step, to be analyzed with parsimony.
58	<P>
59	FACTOR is really for a different case, that in which there are
60	multiple states related on a "character state tree", which specifies
61	for each state which other states it can change to. That graph of
62	states is assumed to be a tree, with no loops in it.
63	<P>
64	The first line of the input file should contain the number of
65	species and the number of multistate characters. This
66	first line is followed by the lines describing the character-state
67	trees, one description per line. The species information constitutes
68	the last part of the file. Any number of lines may be used for a single
69	species.
70	<P>
71	<H2>FIRST LINE</H2>
72	<P>
73	The first line is free format with the number of species first,
74	separated by at least one blank (space) from the number of multistate
75	characters, which in turn is separated by at least one blank from the
76	options, if present.
77	<P>
78	<H2>OPTIONS</H2>
79	<P>
80	The options are selected from a menu that looks like this:
81	<P>
82	<TABLE><TR><TD BGCOLOR=white>
83	<PRE>
84
85	Factor -- multistate to binary recoding program, version 3.6a3
86
87	Settings for this run:
88	A put ancestral states in output file? No
89	F put factors information in output file? No
90	0 Terminal type (IBM PC, ANSI, none)? (none)
91	1 Print indications of progress of run Yes
92
93	Are these settings correct? (type Y or the letter for one to change)
94
95	</PRE>
96	</TD></TR></TABLE>
97	<P>
98	The options particular to this program are:
99	<P>
100	<DL COMPACT>
101	<DT>A</DT> <DD>Choosing the A (Ancestors) option toggles on and off the setting
102	that causes a line to be written in the output that
103	describes the states of the ancestor as indicated by the
104	character-state tree descriptions (see below). If the ancestral
105	state is not specified by a particular character-state tree,
106	a "?" signifying an unknown character state will be written.
107	The multistate characters are factored in such a way that the
108	ancestral state in the factored data set will always be "0".
109	The ancestor line does not get counted as a species.</DD>
110	<P>
111	<DT>F</DT> <DD>Choosing the F (Factors) option toggles on and off
112	a setting that will cause a "FACTORS" line to
113	be written in the output.
114	This line will indicate to other programs which factors came
115	from the same multistate character. Of the programs currently in
116	the package only SEQBOOT, MOVE, and DOLMOVE use this information.</DD>
117	</DL>
118	<P>
119	<H2>CHARACTER-STATE TREE DESCRIPTIONS</H2>
120	<P>
121	The character-state trees are described in free format. The
122	character number of the multistate character is given first followed
123	by the description of the tree itself. Each description must be
124	completed on a single line. Each character that is to be factored must
125	have a description, and the characters must be described in the order
126	that they occur in the input, that is, in numerical order.
127	<P>
128	The tree is described by listing the pairs of character states that
129	are adjacent to each other in the character-state tree. The two
130	character states in each adjacent pair are separated by a colon (":").
131	If character fifteen has this character state tree for possible states
132	"A", "B", "C", and "D":
133	<P>
134	<PRE>
135	A ---- B ---- C
136	       |
137	       |
138	       |
139	       D
140	</PRE>
141	<P>
142	then the character-state tree description would be
143	<P>
144	<PRE>
145	15 A:B B:C D:B
146	</PRE>
147	<P>
148	Note that either symbol may appear first. The ancestral state is
149	identified, if desired, by putting it "adjacent" to a period. If we
150	wanted to root character fifteen at state C:
151	<P>
152	<PRE>
153	A <--- B <--- C
154	       |
155	       |
156	       V
157	       D
158	</PRE>
159	<P>
160	we could write
161	<P>
162	<PRE>
163	15 B:D A:B C:B .:C
164	</PRE>
165	<P>
166	Both the order in which the pairs are listed and the order of the
167	symbols in each pair are arbitrary. However, each pair may only appear
168	once in the list. Any symbols may be used for a character state in the
169	input except the character that signals the connection between two states (in
170	the distribution copy this is set to ":"), ".", and, of course, a
171	blank. Blanks are ignored
172	completely in the tree description so that even B:DA:BC:B.:C or
173	B : DA : BC : B. : C would be equivalent to the above example.
174	However, at least one blank must separate the character number from the
175	tree description.
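<P>
The rules above (number first, blanks ignored inside the description, pairs joined by the connecting character, a period marking the root) can be sketched in a few lines of Python. This is only an illustration of the input format, not the code FACTOR itself uses; the function name <code>parse_description</code> and its error messages are made up for this example, and the parameter <code>factchar</code> simply mirrors the constant of the same name.

```python
def parse_description(line, factchar=":"):
    """Return (character number, list of adjacent state pairs, root or None)."""
    parts = line.split(None, 1)           # character number, then the rest
    # Blanks inside the tree description are ignored completely.
    body = "".join(parts[1].split()) if len(parts) > 1 else ""
    pairs, root = [], None
    for i in range(0, len(body), 3):      # a pair is: state, factchar, state
        chunk = body[i:i + 3]
        if len(chunk) != 3 or chunk[1] != factchar:
            raise ValueError("malformed pair %r" % chunk)
        a, b = chunk[0], chunk[2]
        if a == "." and b == ".":
            raise ValueError("a period must be paired with a state")
        if "." in (a, b):
            # The state "adjacent" to a period is the ancestral state.
            if root is not None:
                raise ValueError("more than one root specified")
            root = b if a == "." else a
        else:
            pairs.append((a, b))
    return int(parts[0]), pairs, root


print(parse_description("15 B:D A:B C:B .:C"))
print(parse_description("15 B : DA : BC : B. : C"))
```

Both calls describe the same rooted tree, so they return the same result. A line holding only a character number yields an empty pair list, which corresponds to a character that is copied without change.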
176	<P>
177	<H2>DELETING CHARACTERS FROM A DATA SET</H2>
178	<P>
179	If no description line appears in the input for a particular
180	character, then that character will be omitted from the output. If the
181	character number is given on the line, but no character-state tree is
182	provided, then the symbol for the character in the input will be copied
183	directly to the output without change. This is useful for characters
184	that are already coded "0" and "1". Characters can be deleted from a
185	data set simply by listing only those that are to appear in the output.
186	<P>
187	<H2>TERMINATING THE LIST OF TREE DESCRIPTIONS</H2>
188	<P>
189	The last character-state tree description should be followed by a
190	line containing the number "999". This terminates processing of the
191	trees and indicates the beginning of the species information.
192	<P>
193	<H2>SPECIES INFORMATION</H2>
194	<P>
195	The format for the species information is basically identical to that of
196	the other discrete character programs. The first ten character positions
197	are allotted to the species name (this value may be changed by altering
198	the value of the constant nmlngth at the beginning of the program). The
199	character states follow and may be continued to as many lines as
200	desired. There is currently no method for indicating polymorphisms. Blanks
201	may be placed between characters or omitted.
202	<P>
203	There is a method for indicating uncertainty about states. There is
204	one character value that stands for "unknown". If this appears in
205	the input data then "?" is written out in all the corresponding
206	positions in the output file. The character value that designates
207	"unknown" is given in the constant unkchar at the beginning of the
208	program, and can be changed by changing that constant. It is set to
209	"?" in the distribution copy.
210	<P>
211	<H2>OUTPUT</H2>
212	<P>
213	The first line of output will contain the number of species and
214	the number of binary characters in the factored data set followed by
215	the letter "A" if the A option was specified in the input. If option
216	F was specified, the next line will begin "FACTORS". If option A was
217	specified, the line describing the ancestor will follow next. Finally,
218	the factored characters will be written for each species in the format
219	required for input by the other discrete programs in the package. The
220	maximum length of the output lines is 80 characters, but this maximum
221	length can be changed prior to compilation.
222	<P>
223	In fact, the format of the output file for the A and F options is not
224	correct for the current release of PHYLIP. We need to change their
225	output to write a factors file and an ancestors file instead of
226	putting the Factors and Ancestors information into the data file.
227	<P>
228	<H2>ERRORS</H2>
229	<P>
230	The output should be checked for error messages. Errors will occur
231	in the character-state tree descriptions if the format is incorrect
232	(colons in the wrong place, etc.), if more than one root is specified,
233	if the tree contains loops (and hence is not a tree), and if the tree is
234	not connected, e.g.
235	<P>
236	<PRE>
237	A:B B:C D:E
238	</PRE>
239	<P>
240	describes
241	<P>
242	<PRE>
243	A ---- B ---- C     D ---- E
244	</PRE>
245	<P>
246	This "tree" is in two unconnected pieces. An error will also occur if a symbol
247	appears in the data set that is not in the tree description for that
248	character. Blanks at the end of lines when the species information
249	is continued to a new line will cause this kind of error.
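<P>
The two structural errors named above, a tree with loops and a tree in unconnected pieces, can be detected with a short graph check. The sketch below is an assumption about how such a check might look, not FACTOR's own routine; the function name <code>check_tree</code> and the returned strings are invented for illustration and are not FACTOR's actual error messages.

```python
def check_tree(pairs):
    """Classify a list of adjacent state pairs as 'ok', 'not connected' or 'contains a loop'."""
    states = sorted({s for pair in pairs for s in pair})
    if not states:
        return "ok"
    neighbours = {s: set() for s in states}
    for a, b in pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)
    # Walk outward from an arbitrary state and record everything reachable.
    seen, stack = set(), [states[0]]
    while stack:
        s = stack.pop()
        if s not in seen:
            seen.add(s)
            stack.extend(neighbours[s] - seen)
    if len(seen) < len(states):
        return "not connected"
    # A connected graph is a tree exactly when it has one pair fewer than states.
    if len(pairs) != len(states) - 1:
        return "contains a loop"
    return "ok"


print(check_tree([("A", "B"), ("B", "C"), ("D", "E")]))   # the example above
print(check_tree([("A", "B"), ("B", "C"), ("C", "A")]))
```

Because a repeated pair also pushes the pair count above the number of states minus one, this check also rejects a description in which the same pair is listed twice.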
250	<P>
251	<H2>CONSTANTS AVAILABLE TO BE CHANGED</H2>
252	<P>
253	At the beginning of the program a number of constants
254	are available to be changed to accommodate larger data sets. These are
255	"maxstates", "maxoutput", "sizearray", "factchar" and "unkchar". The
256	constant "maxstates"
257	gives the maximum number of states per character (set at 20 in the
258	distribution copy). The constant "maxoutput"
259	gives the maximum width of a line in the output file (80 in the
260	distribution copy). The constant "sizearray"
261	must be at least as large as the sum of squares
262	of the numbers of states in the characters. It is initially
263	set to 2000, so that although 20 states are allowed (at the initial
264	setting of maxstates) per character, there cannot be 20 states in all
265	of 100 characters.
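<P>
The arithmetic behind that remark can be checked directly. The sketch below reads the constraint as "the sum of the squared state counts has to fit within sizearray", which is what the 20-states-in-100-characters example implies; the helper name <code>required_sizearray</code> is hypothetical and does not appear in the program.

```python
def required_sizearray(states_per_character):
    """Sum of squares of the number of states in each character."""
    return sum(n * n for n in states_per_character)


SIZEARRAY = 2000   # value in the distribution copy, per the text above

# 100 characters with 20 states each need far more than the default.
print(required_sizearray([20] * 100), required_sizearray([20] * 100) <= SIZEARRAY)
# Five such characters exactly use up the default setting.
print(required_sizearray([20] * 5), required_sizearray([20] * 5) <= SIZEARRAY)
```

Under this reading, a data set whose total exceeds the current setting would need "sizearray" raised before compilation.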
266	<P>
267	Particularly important constants are "factchar" and "unkchar",
268	which are not numerical
269	values but characters. Initially set to the colon ":",
270	"factchar" is the character that will be used to separate states in the input of character
271	state trees. It can be changed by changing this
272	constant. (We could have used a hyphen ("-") but didn't because that would make the
273	minus-sign ("-") unavailable as a character state in +/- characters).
274	The constant "unkchar"
275	is the character value in the input data that
276	indicates that the state is unknown. It is set to "?" in the
277	distribution copy. If your computer is one that lacks the colon ":" in its
278	character set or uses a nonstandard character code such as EBCDIC, you
279	will want to change the constant "factchar".
280	<P>
281	<H2>INPUT AND OUTPUT FILES</H2>
282	<P>
283	The input file for the program has the default file name "infile"
284	and the output file, the one that has the binary character state data,
285	has the name "outfile".
286	<P>
287	<TABLE>
288	<TR>
289	<TD>----SAMPLE INPUT-----</TD> <TD> -----Comments (not part of input file) -----</TD>
290	</TR>
291	<TR>
292	<TD BGCOLOR=white>
293	<PRE>
294	4 6 A
295	1 A:B B:C
296	2 A:B B:.
297	4
298	5 0:1 1:2 .:0
299	6 .:# #:$ #:%
300	999
301	Alpha     CAW00#
302	Beta      BBX01%
303	Gamma     ABY12#
304	Epsilon   CAZ01$
305	</PRE>
306	</TD>
307	<TD>
308	<PRE>
309
310	4 species; 6 characters; A option on
311	A ---- B ---- C
312	B ---> A
313	Character 3 deleted; 4 unchanged
314	0 ---> 1 ---> 2
315	% <--- # ---> $
316	Signals end of trees
317	Species information begins
318
319
320
321	</PRE>
322	</TD>
323	</TR>
324	<TR>
325	<TD> ---SAMPLE OUTPUT-----</TD> <TD> -----Comments (not part of output file) -----</TD>
326	</TR>
327	<TR>
328	<TD BGCOLOR=white>
329	<PRE>
330	5 8 A
331	ANCESTOR  ??0?0000
332	Alpha     11100000
333	Beta      10001001
334	Gamma     00011100
335	Epsilon   11101010
336	</PRE>
337	</TD>
338	<TD>
339	<PRE>
340	5 species (incl. anc.); 8 factors
341	Chars. 1 and 2 come from old number 1
342	Char. 3 comes from old number 2
343	Char. 4 is old number 4
344	Chars. 5 and 6 come from old number 5
345	Chars. 7 and 8 come from old number 6
346	</PRE>
347	</TD>
348	</TR>
349	</TABLE>
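<P>
The sample above can be reproduced with a small Python sketch of the recoding. It rests on assumptions inferred from the sample rather than from the program source: each listed pair of states becomes one binary factor, in the order the pairs are listed; a state's code has a "1" for every pair on the path from the root to that state; and an unrooted tree is coded from the first state mentioned. All function names are invented for this illustration, each species is assumed to fit on one line, and names are assumed to be padded to ten positions.

```python
SAMPLE = "\n".join([
    "4 6 A",
    "1 A:B B:C",
    "2 A:B B:.",
    "4",
    "5 0:1 1:2 .:0",
    "6 .:# #:$ #:%",
    "999",
    "Alpha     CAW00#",
    "Beta      BBX01%",
    "Gamma     ABY12#",
    "Epsilon   CAZ01$",
])


def parse_description(line, factchar=":"):
    """Return (character number, state pairs, root state or None)."""
    parts = line.split(None, 1)
    body = "".join(parts[1].split()) if len(parts) > 1 else ""
    pairs, root = [], None
    for i in range(0, len(body), 3):      # each pair is state, factchar, state
        a, sep, b = body[i:i + 3]
        if sep != factchar:
            raise ValueError("malformed pair in " + repr(line))
        if a == ".":
            root = b
        elif b == ".":
            root = a
        else:
            pairs.append((a, b))
    return int(parts[0]), pairs, root


def codes_for(pairs, root):
    """Map each state to a string of 0/1 digits, one digit per pair."""
    start = root if root is not None else pairs[0][0]   # assumption if unrooted
    codes = {start: [0] * len(pairs)}
    frontier = [start]
    while frontier:
        here = frontier.pop()
        for k, (a, b) in enumerate(pairs):
            for near, far in ((a, b), (b, a)):
                if near == here and far not in codes:
                    codes[far] = codes[here][:]   # inherit the path so far
                    codes[far][k] = 1             # and cross pair number k
                    frontier.append(far)
    return {s: "".join(map(str, c)) for s, c in codes.items()}


def factor(text, nmlngth=10, unkchar="?"):
    """Return {name: binary string}, including an ANCESTOR entry."""
    lines = text.splitlines()
    nspecies = int(lines[0].split()[0])
    end = [l.strip() for l in lines].index("999")
    plan, ancestor = [], ""
    for line in lines[1:end]:
        number, pairs, root = parse_description(line)
        codes = codes_for(pairs, root) if pairs else None   # None: copy as is
        width = len(pairs) if pairs else 1
        ancestor += ("0" if codes and root is not None else "?") * width
        plan.append((number - 1, codes, width))
    result = {"ANCESTOR": ancestor}
    for line in lines[end + 1:end + 1 + nspecies]:
        states = "".join(line[nmlngth:].split())
        row = ""
        for column, codes, width in plan:
            symbol = states[column]
            if symbol == unkchar:
                row += "?" * width
            elif codes is None:
                row += symbol
            else:
                row += codes[symbol]      # KeyError: symbol not in the tree
        result[line[:nmlngth].strip()] = row
    return result


for name, bits in factor(SAMPLE).items():
    print(name.ljust(10) + bits)
```

Character 3 has no description line, so it never enters the plan and is dropped, while character 4 is copied through unchanged. The sketch does not write the header line, a FACTORS line, or wrap long output lines.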
350	</BODY>
351	</HTML>

source: tags/arb-6.0/GDE/PHYLIP/doc/factor.html
