| 1 |  | 
|---|
| 2 | DNAml_rates_1_0 | 
|---|
| 3 |  | 
|---|
| 4 | Gary J. Olsen | 
|---|
| 5 |  | 
|---|
| 6 | August 14, 1992 | 
|---|
| 7 |  | 
|---|
| 8 |  | 
|---|
| 9 | The DNAml_rates program takes a set of sequences and a corresponding | 
|---|
| 10 | phylogenetic tree and produces a maximum likelihood estimate of the | 
|---|
| 11 | rate of nucleotide substitution at each sequence position. | 
|---|
| 12 |  | 
|---|
| 13 | Input is read from standard input.  The format is very much like that | 
|---|
| 14 | of the fastDNAml program.  The first line of the input file gives the | 
|---|
| 15 | number of sequences and the number of bases per sequence.  Also on | 
|---|
| 16 | this line are the requested program option letters.  Any auxiliary | 
|---|
| 17 | data required by the options follow on subsequent lines.  Either the | 
|---|
| 18 | user must specify the empirical base frequencies (F) option, or | 
|---|
| 19 | immediately preceding the data matrix there must be a line of data | 
|---|
| 20 | with the frequencies of A, C, G and T.  Next, the program expects a | 
|---|
| 21 | data matrix.  The first 10 characters of the first line of data for a | 
|---|
| 22 | given sequence are interpreted as the name (blanks are counted). | 
|---|
| 23 | Elsewhere in the data matrix, blanks and numbers are ignored.  The | 
|---|
| 24 | default data matrix format is interleaved.  If all the data for a | 
|---|
| 25 | sequence are on one input line, then interleaved and noninterleaved | 
|---|
| 26 | are equivalent.  Following the data matrix there must be a line with | 
|---|
| 27 | the number of user-specified trees for which rates are to be estimated | 
|---|
| 28 | (as with the U option in fastDNAml).  The rest of the input file is | 
|---|
| 29 | one or more user-specified trees with branch lengths (as with the U | 
|---|
| 30 | and L options in fastDNAml). | 
|---|
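The input layout described above can be sketched as a small Python helper. This is only an illustration, not part of the DNAml_rates distribution: the function name build_input, its parameters, the exact spacing of the first line and of the "M 4" auxiliary line, and the Newick-style tree string are all assumptions made for the example.

```python
def build_input(seqs, trees, options="F", aux_lines=(), base_freqs=None):
    """Assemble input text in the order the documentation describes.

    seqs       -- dict mapping sequence name to aligned sequence (hypothetical API)
    trees      -- list of tree strings with branch lengths
    options    -- option letters placed on the first line
    aux_lines  -- auxiliary data lines required by the chosen options
    base_freqs -- frequencies of A, C, G, T; needed only without the F option
    """
    if not seqs:
        raise ValueError("need at least one sequence")
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    n_sites = lengths.pop()
    if "F" not in options and base_freqs is None:
        raise ValueError("give base_freqs or request the F option")

    # Line 1: number of sequences, bases per sequence, option letters.
    lines = [f"{len(seqs)} {n_sites} {options}".rstrip()]
    # Auxiliary data for the options follows on subsequent lines.
    lines.extend(aux_lines)
    # Without F, a base frequency line immediately precedes the data matrix.
    if "F" not in options:
        lines.append(" ".join(f"{f:.4f}" for f in base_freqs))
    # Data matrix: the first 10 characters are the name (blanks counted).
    for name, seq in seqs.items():
        lines.append(f"{name[:10]:<10}{seq}")
    # Number of user trees, then the trees themselves.
    lines.append(str(len(trees)))
    lines.extend(trees)
    return "\n".join(lines) + "\n"


example = build_input(
    {"seq_A": "ACGTACGT", "seq_B": "ACGTACGA",
     "seq_C": "ACGAACGT", "seq_D": "ACTTACGT"},
    ["((seq_A:0.1,seq_B:0.1):0.05,seq_C:0.1,seq_D:0.1);"],
    options="F M",
    aux_lines=["M 4"],
)
print(example, end="")
```

Because every sequence here fits on one line, the interleaved and noninterleaved readings of the matrix coincide, so the I option is not needed in this sketch.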
| 31 |  | 
|---|
| 32 | The program writes to standard output.  The output lists the estimated | 
|---|
| 33 | rate of change at every site in the sequence, or "Undefined" if there | 
|---|
| 34 | are not sufficient unambiguous data at the site. | 
|---|
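As a rough illustration of when a site would be reported as "Undefined", the sketch below counts, for each alignment column, how many sequences carry an unambiguous base and compares that count with a threshold (the M option in the options summary, default 4). Treating only A, C, G, T and U as unambiguous is an assumption of this example, and the function names are hypothetical. The program's own handling of ambiguity codes may differ.

```python
# Assumption: only these characters count as unambiguous information.
UNAMBIGUOUS = frozenset("ACGTU")


def informative_counts(alignment):
    """Return, per column, the number of sequences with an unambiguous base."""
    n_sites = len(alignment[0])
    if any(len(seq) != n_sites for seq in alignment):
        raise ValueError("all sequences must have the same length")
    return [
        sum(seq[i].upper() in UNAMBIGUOUS for seq in alignment)
        for i in range(n_sites)
    ]


def rate_is_defined(alignment, min_informative=4):
    """True where a column has at least min_informative unambiguous bases."""
    return [c >= min_informative for c in informative_counts(alignment)]


alignment = ["ACGT-", "ACGN-", "AC-TA", "ACGTA", "A-GTA"]
print(informative_counts(alignment))  # [5, 4, 4, 4, 3]
print(rate_is_defined(alignment))     # [True, True, True, True, False]
```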
| 35 |  | 
|---|
| 36 | If the C option is specified, the program also categorizes the rates | 
|---|
| 37 | into the requested number of categories.  The current categorization | 
|---|
| 38 | algorithm is rather crude, but is probably adequate if the number of | 
|---|
| 39 | categories is large enough.  A weighting mask is also created in which | 
|---|
| 40 | sites with Undefined rates are assigned a weight of zero. | 
|---|
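The weighting mask and a possible categorization can be sketched as follows. The text does not describe the program's actual categorization algorithm, so the equal-width binning on log(rate) used here is purely a stand-in chosen for illustration. Representing an Undefined rate as None and giving defined sites a weight of 1 are likewise assumptions.

```python
import math


def weight_mask(rates):
    """Weight 0 for sites whose rate is Undefined (None), 1 otherwise."""
    return [0 if r is None else 1 for r in rates]


def categorize(rates, n_categories):
    """Assign categories 1..n_categories by equal-width bins on log(rate).

    Illustrative stand-in only; Undefined sites (None) get no category.
    """
    if n_categories < 1:
        raise ValueError("need at least one category")
    defined = [r for r in rates if r is not None]
    if any(r <= 0 for r in defined):
        raise ValueError("rates must be positive to take logarithms")
    if not defined:
        return [None] * len(rates)
    logs = [math.log(r) for r in defined]
    lo, hi = min(logs), max(logs)
    width = (hi - lo) / n_categories
    categories = []
    for r in rates:
        if r is None:
            categories.append(None)
        elif width == 0:
            categories.append(1)  # all defined rates are identical
        else:
            idx = int((math.log(r) - lo) / width)
            # Clamp so the largest rate falls in the last category.
            categories.append(min(idx, n_categories - 1) + 1)
    return categories


rates = [0.1, 0.5, None, 5.0, 10.0]
print(weight_mask(rates))    # [1, 1, 0, 1, 1]
print(categorize(rates, 2))  # [1, 1, None, 2, 2]
```

With few categories, sites whose rates differ widely end up sharing a bin, which is consistent with the remark that a larger number of categories makes a crude scheme more acceptable.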
| 41 |  | 
|---|
| 42 | If the Y option is specified, the program writes the weights and | 
|---|
| 43 | categories data to a file in a format appropriate for use by | 
|---|
| 44 | fastDNAml. | 
|---|
| 45 |  | 
|---|
| 46 |  | 
|---|
| 47 | Options summary: | 
|---|
| 48 |  | 
|---|
| 49 | 1 - print data.  Toggles print data option (default = noprint). | 
|---|
| 50 |  | 
|---|
| 51 | C - write categories.  Requires auxiliary line with a C and the desired | 
|---|
| 52 | number of categories. | 
|---|
| 53 |  | 
|---|
| 54 | F - empirical base frequencies.  Calculates base frequencies from data matrix, | 
|---|
| 55 | rather than expecting a base frequency input line. | 
|---|
| 56 |  | 
|---|
| 57 | I - interleave.  Toggles the data interleave option (default = interleave). | 
|---|
| 58 |  | 
|---|
| 59 | L - userlengths.  This is implicit in the program, so the option is ignored. | 
|---|
| 60 |  | 
|---|
| 61 | M - minimum informative sequences.  Requires an auxiliary data line with an | 
|---|
| 62 | M and the minimum number of sequences in which a sequence position | 
|---|
| 63 | (alignment column) must have unambiguous information in order for the rate | 
|---|
| 64 | at the site to be defined (default = 4). | 
|---|
| 65 |  | 
|---|
| 66 | T - transition/transversion ratio.  Requires auxiliary line with a T and | 
|---|
| 67 | the ratio of observed transitions to transversions (default = 2.0). | 
|---|
| 68 |  | 
|---|
| 69 | U - user trees.  This is implicit in the program, so the option is ignored. | 
|---|
| 70 |  | 
|---|
| 71 | W - user weights.  Requires weights auxiliary data. | 
|---|
| 72 |  | 
|---|
| 73 | Y - categories file.  Writes the weights and categories to a file. | 
|---|
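To make the F option concrete, the sketch below computes base frequencies directly from a data matrix by counting unambiguous bases. It is a minimal illustration under stated assumptions, not the program's own routine: ambiguity codes and gaps are simply skipped, U is counted as T, and the function name is hypothetical.

```python
from collections import Counter


def empirical_base_frequencies(sequences):
    """Return frequencies of A, C, G, T among the unambiguous bases."""
    counts = Counter()
    for seq in sequences:
        # Assumption: U is treated as T; everything outside ACGT is ignored.
        for base in seq.upper().replace("U", "T"):
            if base in "ACGT":
                counts[base] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no unambiguous bases in the data matrix")
    return {base: counts[base] / total for base in "ACGT"}


freqs = empirical_base_frequencies(["ACGT", "AAGN", "ACG-"])
print(freqs)
```

Without the F option, the same four values would instead be supplied by hand on the line immediately preceding the data matrix.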
| 74 |  | 
|---|
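For the T option, a value for the ratio of observed transitions to transversions has to come from somewhere. One crude way to get a starting figure, sketched here as an assumption rather than as the method the program or its author uses, is to classify the differences between every pair of aligned sequences. Pairwise counting ignores the tree and multiple substitutions at a site, so the result is only a rough guide. The function name is hypothetical.

```python
from itertools import combinations

PURINES = frozenset("AG")  # A <-> G and C <-> T changes are transitions


def count_ts_tv(sequences):
    """Count transitions and transversions over all pairs of sequences."""
    ts = tv = 0
    for a, b in combinations(sequences, 2):
        for x, y in zip(a.upper(), b.upper()):
            # Skip identical positions and anything that is not A, C, G or T.
            if x == y or x not in "ACGT" or y not in "ACGT":
                continue
            if (x in PURINES) == (y in PURINES):
                ts += 1  # purine <-> purine or pyrimidine <-> pyrimidine
            else:
                tv += 1  # purine <-> pyrimidine
    return ts, tv


ts, tv = count_ts_tv(["ACGT", "GCGA", "ATGT"])
print(ts, tv)  # 4 2
if tv:
    print(f"T {ts / tv:.1f}")  # auxiliary-line spacing is an assumption
```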
| 75 |  | 
|---|
| 76 | The option scripts usertree, weights, n_categories and categories_file are | 
|---|
| 77 | useful for adding the appropriate options to the input data matrix. | 
|---|
| 78 |  | 
|---|
| 79 | The option script weights_categories is useful for adding the resulting | 
|---|
| 80 | outfile to a fastDNAml input file. | 
|---|
| 81 |  | 
|---|