| 1 | # main topics: |
|---|
| 2 | UP arb.hlp |
|---|
| 3 | UP glossary.hlp |
|---|
| 4 | |
|---|
| 5 | # sub topics: |
|---|
| 6 | #SUB subtopic.hlp |
|---|
| 7 | |
|---|
| 8 | # format described in ../help.readme |
|---|
| 9 | |
|---|
| 10 | |
|---|
| 11 | TITLE Maximum base frequency |
|---|
| 12 | |
|---|
| 13 | Calculate the Percentage of the Most Frequent Base |
|---|
| 14 | |
|---|
| 15 | OCCURRENCE ARB_NT/SAI/create SAI/Max Frequency |
|---|
| 16 | |
|---|
| 17 | DESCRIPTION Finds the most frequent base (or gap) in each column for all marked |
|---|
| 18 | species. The number of sequences carrying this base is then |
|---|
| 19 | divided by: |
|---|
| 20 | |
|---|
| 21 | * the number of all marked sequences (if not ignoring gaps) |
|---|
| 22 | * the number of bases in this column (if ignoring gaps) |
|---|
| 23 | |
|---|
| 24 | The resulting percentage is reduced to a single digit: its tens digit |
|---|
| 25 | (the second-to-last digit) is taken: |
|---|
| 26 | |
|---|
| 27 | 0% - 19% -> '1' (does not occur for nucleotides) |
|---|
| 28 | 20% - 29% -> '2' |
|---|
| 29 | 30% - 39% -> '3' |
|---|
| 30 | ... |
|---|
| 31 | 90% - 99% -> '9' |
|---|
| 32 | 100% -> '0' |
|---|
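As a minimal sketch of the mapping above, the following Python function turns a whole-number percentage into the digit stored in the SAI. The name `conservation_digit` is hypothetical and not part of ARB. Mapping everything below 20% to '1' simply follows the first row of the table.

```python
def conservation_digit(percent: int) -> str:
    """Map a whole-number max-frequency percentage (0..100) to its SAI digit."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be within 0..100")
    if percent == 100:
        return "0"  # 100% is written as '0', see the last row of the table
    # Tens digit of the percentage; values below 10% are raised to '1',
    # as the first table row (0% - 19% -> '1') states.
    return str(max(percent // 10, 1))
```

For instance, `conservation_digit(64)` yields `'6'` and `conservation_digit(100)` yields `'0'`.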
| 33 | |
|---|
| 34 | |
|---|
| 35 | NOTE The result can be used as a conservation profile and filter. |
|---|
| 36 | Rule of thumb: |
|---|
| 37 | the higher the number, the more conserved the position (but mind the '0' which means 100%!). |
|---|
| 38 | |
|---|
| 39 | Internally the SAI consists of two lines: the main line called 'data' and a second line called 'dat2'. |
|---|
| 40 | |
|---|
| 41 | The first is used when you use the SAI as a conservation profile or filter and |
|---|
| 42 | contains the SECOND LAST digit of the calculated frequencies. |
|---|
| 43 | |
|---|
| 44 | The second contains the LAST digit of the calculated frequencies. It is not used and only |
|---|
| 45 | shows up when you load the SAI into ARB_EDIT4, which displays both lines. |
|---|
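The relation between the two lines can be illustrated with a short Python sketch. It assumes one whole-number percentage per column. The function name `sai_lines` is hypothetical, and how ARB fills 'dat2' for values below 10% is not stated in this help text, so the sketch just takes the last digit there.

```python
def sai_lines(percentages: list[int]) -> tuple[str, str]:
    """Return the ('data', 'dat2') strings for a list of column percentages."""
    data, dat2 = [], []
    for p in percentages:
        if not 0 <= p <= 100:
            raise ValueError("percentages must be within 0..100")
        tens = (p // 10) % 10   # second-to-last digit; 100 gives 0
        if p < 10:
            tens = 1            # follows the table row 0% - 19% -> '1'
        data.append(str(tens))
        dat2.append(str(p % 10))  # last digit of the percentage
    return "".join(data), "".join(dat2)
```

With the columns 64%, 44%, 100% and 95% this returns `("6409", "4405")`: read column by column, the two lines spell out 64, 44, 00 and 95.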
| 46 | |
|---|
| 47 | EXAMPLES Say one column contains 7 A's, 4 G's and 5 gaps. |
|---|
| 48 | |
|---|
| 49 | * ignoring gaps results in 7/11, i.e. about 64%, which is converted to '6'. |
|---|
| 50 | * otherwise we get 7/16, i.e. about 44%, which is indicated by a '4' in the target sequence. |
|---|
| 51 | |
|---|
| 52 | SECTION Gaps |
|---|
| 53 | |
|---|
| 54 | If gaps are ignored, '-' are treated like '.': both are removed and the frequency is calculated on non-gaps only. |
|---|
| 55 | |
|---|
| 56 | If gaps are NOT ignored, '-' are treated like non-gaps, i.e. a column containing only '-' will be assigned a |
|---|
| 57 | max. frequency of 100%. '.' are treated as gaps. |
|---|
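The two gap modes can be sketched in Python as follows. This is an illustration under stated assumptions, not ARB's implementation: the function name `max_frequency` is hypothetical, '.' is assumed to be dropped from both numerator and denominator in either mode, and a column left empty after removal is assumed to yield 0.0.

```python
from collections import Counter

def max_frequency(column: str, ignore_gaps: bool) -> float:
    """Fraction (0.0..1.0) of the most frequent symbol in one alignment column."""
    removed = {".", "-"} if ignore_gaps else {"."}  # '.' is always treated as a gap
    symbols = [s for s in column if s not in removed]
    if not symbols:
        return 0.0  # assumption: nothing left to count
    most_common_count = Counter(symbols).most_common(1)[0][1]
    return most_common_count / len(symbols)
```

For the column from the EXAMPLES section (7 A's, 4 G's and 5 '-'), `max_frequency(col, True)` gives 7/11 and `max_frequency(col, False)` gives 7/16. A column consisting only of '-' gives 1.0 when gaps are not ignored.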
| 58 | |
|---|
| 59 | SECTION Ambiguities |
|---|
| 60 | |
|---|
| 61 | Ambiguities are counted proportionally, i.e. |
|---|
| 62 | |
|---|
| 63 | * a 'N' counts as 1/4 'A', 1/4 'C', 1/4 'G' and 1/4 'T' |
|---|
| 64 | * a 'D' counts as 1/3 'A', 1/3 'G' and 1/3 'T' |
|---|
| 65 | * a 'Y' counts as 1/2 'C' and 1/2 'T' |
|---|
| 66 | |
|---|
| 67 | Example: |
|---|
| 68 | |
|---|
| 69 | A column containing 9 'C' and one 'Y' results in a max. frequency of 95% (=9.5 'C'). |
|---|
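Proportional counting of ambiguities can be sketched in Python like this. The function name `weighted_max_frequency` is hypothetical. The table lists the standard IUPAC nucleotide codes, of which this help text only mentions 'N', 'D' and 'Y'. Gap handling is left out here, and symbols not in the table are assumed to count as themselves.

```python
from collections import defaultdict

# Standard IUPAC nucleotide ambiguity codes and the bases they stand for.
IUPAC = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def weighted_max_frequency(column: str) -> float:
    """Fraction of the most frequent base, splitting ambiguity codes evenly."""
    counts = defaultdict(float)
    for symbol in column.upper():
        bases = IUPAC.get(symbol, symbol)  # unambiguous symbols count as themselves
        for base in bases:
            counts[base] += 1.0 / len(bases)
    total = sum(counts.values())
    return max(counts.values()) / total if total else 0.0
```

For the example column, `weighted_max_frequency("C" * 9 + "Y")` counts 9.5 'C' and 0.5 'T' out of 10 sequences, which corresponds to the 95% stated above.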
| 70 | |
|---|
| 71 | BUGS No bugs known yet |
|---|