
DNAml_rates_1_0

Gary J. Olsen

August 14, 1992


The DNAml_rates program takes a set of sequences and a corresponding
phylogenetic tree and produces a maximum likelihood estimate of the
rate of nucleotide substitution at each sequence position.

Input is read from standard input. The format is very much like that
of the fastDNAml program. The first line of the input file gives the
number of sequences and the number of bases per sequence. Also on
this line are the requested program option letters. Any auxiliary
data required by the options follow on subsequent lines. Either the
user must specify the empirical base frequencies (F) option, or
immediately preceding the data matrix there must be a line of data
with the frequencies of A, C, G and T. Next, the program expects a
data matrix. The first 10 characters of the first line of data for a
given sequence are interpreted as the name (blanks are counted).
Elsewhere in the data matrix, blanks and numbers are ignored. The
default data matrix format is interleaved. If all the data for a
sequence are on one input line, then interleaved and noninterleaved
are equivalent. Following the data matrix there must be a line with
the number of user-specified trees for which rates are to be estimated
(as with the U option in fastDNAml). The rest of the input file is
one or more user-specified trees with branch lengths (as with the U
and L options in fastDNAml).

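The input layout described above can be sketched as a small Python
helper. This is only an illustration, not part of the DNAml_rates
distribution: the function name format_input and its parameters are
hypothetical, and the spacing on the first line, the numeric format of
the frequency line, and the Newick-style tree strings are assumptions.
Each sequence is written on a single line, the case in which
interleaved and noninterleaved input are equivalent.

```python
def format_input(sequences, trees, options="", aux_lines=(), base_freqs=None):
    """Assemble input text in the layout described above (hypothetical helper).

    sequences  : dict mapping name -> sequence string (all the same length)
    trees      : list of tree strings with branch lengths (assumed Newick-style)
    options    : option letters for the first line, e.g. "F"
    aux_lines  : auxiliary data lines required by the chosen options
    base_freqs : (A, C, G, T) frequencies, or None when the F option is used
    """
    lengths = {len(seq) for seq in sequences.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    # Either the F option or an explicit frequency line, never both or neither.
    if ("F" in options) == (base_freqs is not None):
        raise ValueError("give exactly one of: F option, base_freqs")

    n_sites = lengths.pop()
    # First line: number of sequences, bases per sequence, option letters.
    lines = [f"{len(sequences)} {n_sites} {options}".rstrip()]
    lines.extend(aux_lines)
    if base_freqs is not None:
        # Frequency line immediately precedes the data matrix (format assumed).
        lines.append(" ".join(f"{f:.4f}" for f in base_freqs))
    for name, seq in sequences.items():
        # The first 10 characters are the name; blanks are counted.
        lines.append(f"{name[:10]:<10}{seq}")
    # Number of user-specified trees, then the trees themselves.
    lines.append(str(len(trees)))
    lines.extend(trees)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = format_input(
        {"Seq_A": "ACGTACGT", "Seq_B": "ACGTACGA",
         "Seq_C": "ACGAACGT", "Seq_D": "ACTTACGT"},
        ["((Seq_A:0.1,Seq_B:0.1):0.05,Seq_C:0.1,Seq_D:0.1);"],
        options="F",
    )
    print(example, end="")
```

The resulting text would be piped to the program on standard input.
Padding the name to exactly 10 characters matters because the program
counts blanks as part of the name field.
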

The program writes to standard output. The output lists the estimated
rate of change at every site in the sequence, or "Undefined" if there
are not sufficient unambiguous data at the site.

If the C option is specified, the program also categorizes the rates
into the requested number of categories. The current categorization
algorithm is rather crude, but is probably adequate if the number of
categories is large enough. A weighting mask is also created in which
sites with Undefined rates are assigned a weight of zero.

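The weighting mask and the idea of binning rates into categories can be
illustrated as follows. The binning rule here (equal-width intervals on
a logarithmic scale) is an assumption chosen for the sketch and is not
the algorithm DNAml_rates uses, which this document does not describe.
The function name weights_and_categories is hypothetical, Undefined
rates are represented by None, and all defined rates are assumed to be
positive.

```python
import math


def weights_and_categories(rates, n_categories):
    """Return (weights, categories) for a list of per-site rates.

    rates        : list of positive floats, or None for an Undefined rate
    n_categories : requested number of categories (>= 1)

    Sites with an Undefined rate get weight 0 and category None.
    The log-scale binning is illustrative only, not the program's method.
    """
    if n_categories < 1:
        raise ValueError("n_categories must be at least 1")

    # Weighting mask: zero for Undefined sites, one otherwise.
    weights = [0 if r is None else 1 for r in rates]

    logs = [math.log(r) for r in rates if r is not None]
    if not logs:
        return weights, [None] * len(rates)

    lo, hi = min(logs), max(logs)
    # Guard against a zero-width range when all defined rates are equal.
    width = (hi - lo) / n_categories or 1.0

    categories = []
    for r in rates:
        if r is None:
            categories.append(None)
        else:
            k = int((math.log(r) - lo) / width)
            # Clamp the top edge into the last bin; categories start at 1.
            categories.append(min(k, n_categories - 1) + 1)
    return weights, categories
```

With more categories each bin spans a narrower range of rates, which is
consistent with the remark above that a crude scheme is probably
adequate when the number of categories is large enough.
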

If the Y option is specified, the program writes the weights and
categories data to a file in a format appropriate for use by
fastDNAml.


Options summary:

1 - print data. Toggles print data option (default = noprint).

C - write categories. Requires auxiliary line with a C and the desired
    number of categories.

F - empirical base frequencies. Calculates base frequencies from data
    matrix, rather than expecting a base frequency input line.

I - interleave. Toggles the data interleave option (default = interleave).

L - userlengths. This is implicit in the program, so the option is ignored.

M - minimum informative sequences. Requires an auxiliary data line with an
    M and the minimum number of sequences in which a sequence position
    (alignment column) must have unambiguous information in order for the
    rate at the site to be defined (default = 4).

T - transition/transversion ratio. Requires auxiliary line with a T and
    the ratio of observed transitions to transversions (default = 2.0).

U - user trees. This is implicit in the program, so the option is ignored.

W - user weights. Requires weights auxiliary data.

Y - categories file. Writes the weights and categories to a file.

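The F and M options both depend on which characters in an alignment
column count as unambiguous. The sketch below shows one plausible
reading, assuming that only A, C, G, T and U are unambiguous and that
everything else (ambiguity codes, gaps) is skipped. The program's own
treatment of ambiguity codes may differ, and the function names are
hypothetical.

```python
UNAMBIGUOUS = set("ACGTU")  # assumption: only these count as unambiguous


def empirical_base_frequencies(sequences):
    """Estimate A, C, G, T frequencies by counting unambiguous bases (cf. F)."""
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    for seq in sequences:
        for ch in seq.upper():
            if ch == "U":  # treat RNA U as T
                ch = "T"
            if ch in counts:
                counts[ch] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no unambiguous bases found")
    return {base: n / total for base, n in counts.items()}


def sites_with_defined_rate(sequences, min_informative=4):
    """For each alignment column, report whether at least min_informative
    sequences carry an unambiguous base (cf. M, default 4)."""
    n_sites = len(sequences[0])
    if any(len(seq) != n_sites for seq in sequences):
        raise ValueError("all sequences must have the same length")
    defined = []
    for i in range(n_sites):
        informative = sum(1 for seq in sequences
                          if seq[i].upper() in UNAMBIGUOUS)
        defined.append(informative >= min_informative)
    return defined
```

Columns that fail the threshold correspond to the sites reported as
"Undefined" in the output.
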

The option scripts usertree, weights, n_categories and categories_file are
useful for adding the appropriate options to the input data matrix.

The option script weights_categories is useful for adding the resulting
outfile to a fastDNAml input file.
