DNAml_rates_1_0

Gary J. Olsen

August 14, 1992


The DNAml_rates program takes a set of sequences and a corresponding
phylogenetic tree and produces a maximum likelihood estimate of the
rate of nucleotide substitution at each sequence position.

Input is read from standard input.  The format is very much like that
of the fastDNAml program.  The first line of the input file gives the
number of sequences and the number of bases per sequence.  Also on
this line are the requested program option letters.  Any auxiliary
data required by the options follow on subsequent lines.  Either the
user must specify the empirical base frequencies (F) option, or
immediately preceding the data matrix there must be a line of data
with the frequencies of A, C, G and T.  Next, the program expects a
data matrix.  The first 10 characters of the first line of data for a
given sequence are interpreted as the name (blanks are counted).
Elsewhere in the data matrix, blanks and numbers are ignored.  The
default data matrix format is interleaved.  If all the data for a
sequence are on one input line, then interleaved and noninterleaved
are equivalent.  Following the data matrix there must be a line with
the number of user-specified trees for which rates are to be estimated
(as with the U option in fastDNAml).  The rest of the input file is
one or more user-specified trees with branch lengths (as with the U
and L options in fastDNAml).

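The input layout described above can be assembled with a short script.
The following Python sketch is illustrative only and is not part of the
DNAml_rates distribution.  The function name format_rates_input, the
single-space separation of fields on the first line, the four-decimal
frequency format, and the use of Newick strings for the trees are all
assumptions.  Each sequence is written on one line, so the interleaved
and noninterleaved formats coincide.

```python
def format_rates_input(seqs, trees, options="", aux_lines=(), base_freqs=None):
    """Assemble DNAml_rates-style input text (hypothetical helper).

    seqs:       list of (name, sequence) pairs, all sequences the same length
    trees:      list of tree strings with branch lengths (Newick assumed)
    options:    option letters for the first line, e.g. "F" or "FC"
    aux_lines:  auxiliary lines required by the options, e.g. ["C 10"]
    base_freqs: (A, C, G, T) frequencies; needed unless "F" is in options
    """
    n_sites = len(seqs[0][1])
    if any(len(seq) != n_sites for _, seq in seqs):
        raise ValueError("all sequences must have the same length")
    use_empirical = "F" in options.upper()
    if not use_empirical and base_freqs is None:
        raise ValueError("give base_freqs or request the F option")

    # First line: number of sequences, bases per sequence, option letters.
    lines = [f"{len(seqs)} {n_sites} {options}".rstrip()]
    # Auxiliary data for the options follow on subsequent lines.
    lines.extend(aux_lines)
    # Without F, a frequency line immediately precedes the data matrix.
    if not use_empirical:
        lines.append(" ".join(f"{f:.4f}" for f in base_freqs))
    # Data matrix: the name occupies exactly 10 characters, blanks included.
    for name, seq in seqs:
        lines.append(f"{name[:10]:<10}{seq}")
    # Number of user-specified trees, then the trees themselves.
    lines.append(str(len(trees)))
    lines.extend(trees)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = format_rates_input(
        [("seqA", "ACGTACGT"), ("seqB", "ACGTACGA"),
         ("seqC", "ACGAACGA"), ("seqD", "ACCAACGA")],
        ["((seqA:0.1,seqB:0.1):0.05,seqC:0.1,seqD:0.1);"],
        options="F",
    )
    print(example, end="")
```

The resulting text would be supplied to the program on standard input.
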
The program writes to standard output.  The output lists the estimated
rate of change at every site in the sequence, or "Undefined" if there
are not sufficient unambiguous data at the site.

If the C option is specified, the program also categorizes the rates
into the requested number of categories.  The current categorization
algorithm is rather crude, but is probably adequate if the number of
categories is large enough.  A weighting mask is also created in which
sites with Undefined rates are assigned a weight of zero.

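The relationship between rates, categories and the weighting mask can
be illustrated with a small Python sketch.  This is not the program's
categorization algorithm, which is not described here.  It is a
stand-in that ranks the defined rates and splits them into bins of
roughly equal size.  The function name weights_and_categories, the use
of None for an Undefined rate, and the use of category 0 for Undefined
sites are assumptions made for the illustration.

```python
def weights_and_categories(rates, n_categories):
    """Return (weights, categories) for a list of per-site rates.

    rates: per-site rate estimates, with None where the rate is Undefined.
    Categories run from 1 (slowest) to n_categories (fastest); sites with
    an Undefined rate get category 0 (an assumption of this sketch).
    """
    # Weighting mask: Undefined sites get weight zero, all others weight one.
    weights = [0 if r is None else 1 for r in rates]

    # Stand-in categorization: rank the defined sites by rate and cut the
    # ranking into n_categories bins of roughly equal size.  Tied rates may
    # land in different bins, which is one reason this is only a sketch.
    order = sorted(
        (i for i, r in enumerate(rates) if r is not None),
        key=lambda i: rates[i],
    )
    categories = [0] * len(rates)
    for rank, site in enumerate(order):
        categories[site] = 1 + rank * n_categories // len(order)
    return weights, categories


if __name__ == "__main__":
    w, c = weights_and_categories([0.5, None, 2.0, 1.0, 0.1], 2)
    print(w)
    print(c)
```
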
If the Y option is specified, the program writes the weights and
categories data to a file in a format appropriate for use by
fastDNAml.


Options summary:

- 1: print data.  Toggles print data option (default = noprint).

- C: write categories.  Requires auxiliary line with a C and the desired
  number of categories.

- F: empirical base frequencies.  Calculates base frequencies from data matrix,
  rather than expecting a base frequency input line.

- I: interleave.  Toggles the data interleave option (default = interleave).

- L: userlengths.  This is implicit in the program, so the option is ignored.

- M: minimum informative sequences.  Requires an auxiliary data line with an
  M and the minimum number of sequences in which a sequence position
  (alignment column) must have unambiguous information in order for the rate
  at the site to be defined (default = 4).

- T: transition/transversion ratio.  Requires auxiliary line with a T and
  the ratio of observed transitions to transversions (default = 2.0).

- U: user trees.  This is implicit in the program, so the option is ignored.

- W: user weights.  Requires weights auxiliary data.

- Y: categories file.  Writes the weights and categories to a file.


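The rule behind the M option can be stated as a short check on one
alignment column.  The Python sketch below is an illustration, not the
program's code.  The name rate_is_defined is hypothetical, and treating
exactly A, C, G, T and U (in either case) as unambiguous is an
assumption, since the set of characters the program accepts as
unambiguous is not listed here.

```python
# Characters counted as unambiguous information (an assumption of this sketch).
UNAMBIGUOUS = frozenset("ACGTU")


def rate_is_defined(column, min_informative=4):
    """Return True if an alignment column has enough unambiguous bases.

    column:          one character per sequence at a given site
    min_informative: the M option value (the documented default is 4)
    """
    informative = sum(1 for base in column.upper() if base in UNAMBIGUOUS)
    return informative >= min_informative


if __name__ == "__main__":
    print(rate_is_defined("ACGT"))      # four unambiguous bases
    print(rate_is_defined("AC-N"))      # only two unambiguous bases
    print(rate_is_defined("ACNT", 3))   # three suffice when M is 3
```
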
The option scripts usertree, weights, n_categories and categories_file are
useful for adding the appropriate options to the input data matrix.

The option script weights_categories is useful for adding the resulting
outfile to a fastDNAml input file.
