1 | #Please insert up references in the next lines (line starts with keyword UP) |
---|
2 | UP arb.hlp |
---|
3 | UP glossary.hlp |
---|
4 | |
---|
5 | #Please insert subtopic references (line starts with keyword SUB) |
---|
6 | #SUB subtopic.hlp |
---|
7 | |
---|
8 | # Hypertext links in helptext can be added like this: LINK{ref.hlp|http://add|bla@domain} |
---|
9 | |
---|
10 | #************* Title of helpfile !! and start of real helpfile ******** |
---|
11 | TITLE Calculate the Percentage of the Most Frequent Base |
---|
12 | |
---|
13 | OCCURRENCE ARB_NT/SAI/create SAI/Max Frequency |
---|
14 | |
---|
15 | DESCRIPTION Finds the most frequent base (or gap) in each column for all marked |
---|
16 | species. Then the number of all sequences with this base is |
---|
17 | divided by: |
---|
18 | |
---|
19 | * the number of all marked sequences (if not ignoring gaps) |
---|
20 | * the number of bases in this column (if ignoring gaps) |
---|
21 | |
---|
22 | The second-last (tens) digit of the resulting percentage is then |
---|
23 | written to the SAI: |
---|
24 | |
---|
25 | 0% - 19% -> '1' (does not occur for nucleotides) |
---|
26 | 20% - 29% -> '2' |
---|
27 | 30% - 39% -> '3' |
---|
28 | ... |
---|
29 | 90% - 99% -> '9' |
---|
30 | 100% -> '0' |
---|
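The mapping above can be sketched in Python. This is only an illustration, not ARB's actual code: the function name percent_to_digits is hypothetical, and it assumes the percentage has already been reduced to an integer between 0 and 100. It returns both the second-last digit (the 'data' line) and the last digit (the 'dat2' line).

```python
def percent_to_digits(percent):
    """Return (data_digit, dat2_digit) for an integer percentage.

    Hypothetical helper, following the table in this helpfile:
    values below 20% map to '1' and 100% maps to '0'.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    if percent == 100:
        # 100% is written as '0' (the second-last digit of "100").
        return "0", "0"
    tens, ones = divmod(percent, 10)
    # The table lists 0% - 19% as '1', so the tens digit is clamped to 1.
    return str(max(tens, 1)), str(ones)


if __name__ == "__main__":
    for value in (44, 64, 95, 100):
        print(value, percent_to_digits(value))
```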
31 | |
---|
32 | |
---|
33 | NOTE The result can be used as a conservation profile and filter. |
---|
34 | Rule of thumb: |
---|
35 | the higher the number, the more conserved the position (but mind the '0' which means 100%!). |
---|
36 | |
---|
37 | Internally the SAI consists of two lines: the main line called 'data' and a second line called 'dat2'. |
---|
38 | |
---|
39 | The first is used when you use the SAI as conservation profile or filter and |
---|
40 | contains the SECOND LAST digit of the calculated frequencies. |
---|
41 | |
---|
42 | The second contains the LAST digit of the calculated frequencies. It is not used and only |
---|
43 | shows up when you load the SAI into ARB_EDIT4, which displays both lines. |
---|
44 | |
---|
45 | EXAMPLES Say one column contains 7 A's, 4 G's and 5 gaps. |
---|
46 | |
---|
47 | * ignoring gaps will result in 7/11 == 64% which is converted to '6'. |
---|
48 | * otherwise we get 7/16 == 44% which will be indicated by a '4' in the target sequence. |
---|
49 | |
---|
50 | SECTION Gaps |
---|
51 | |
---|
52 | If gaps are ignored, '-' are treated like '.': both get removed and the frequency is calculated on non-gaps only. |
---|
53 | |
---|
54 | If gaps are NOT ignored, '-' are treated like non-gaps, i.e. a column containing only '-' will be assigned a |
---|
55 | max. frequency of 100%. '.' are treated as gaps. |
---|
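The two gap modes can be illustrated with a small Python sketch. It is not taken from ARB: the function name max_frequency_percent and the representation of a column as a plain string are assumptions made for this example, and ambiguity codes are not handled here.

```python
from collections import Counter


def max_frequency_percent(column, ignore_gaps):
    """Return the frequency (in percent) of the most frequent symbol.

    Hypothetical helper. '.' is always dropped; '-' is dropped only
    when gaps are ignored, otherwise it counts like a base.
    """
    dropped = {".", "-"} if ignore_gaps else {"."}
    counts = Counter(ch for ch in column if ch not in dropped)
    total = sum(counts.values())
    if total == 0:
        return None  # nothing left to count in this column
    return 100 * max(counts.values()) / total


if __name__ == "__main__":
    column = "A" * 7 + "G" * 4 + "-" * 5
    print(max_frequency_percent(column, ignore_gaps=True))   # 7 of 11
    print(max_frequency_percent(column, ignore_gaps=False))  # 7 of 16
```

With the column from the EXAMPLES section (7 A's, 4 G's and 5 gaps), the first call divides by 11 and the second by 16. A column consisting only of '-' yields 100 when gaps are not ignored.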
56 | |
---|
57 | SECTION Ambiguities |
---|
58 | |
---|
59 | Ambiguities are counted proportionally, i.e. |
---|
60 | |
---|
61 | * an 'N' counts as 1/4 'A', 1/4 'C', 1/4 'G' and 1/4 'T' |
---|
62 | * a 'D' counts as 1/3 'A', 1/3 'G' and 1/3 'T' |
---|
63 | * a 'Y' counts as 1/2 'C' and 1/2 'T' |
---|
64 | |
---|
65 | Example: |
---|
66 | |
---|
67 | A column containing 9 'C' and one 'Y' results in a max. frequency of 95% (=9.5 'C'). |
---|
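Proportional counting can be sketched as follows. This is a minimal illustration and not ARB's implementation: the names IUPAC and proportional_counts are hypothetical, the table uses the standard IUPAC nucleotide codes with 'T', and the column is assumed to contain no gaps.

```python
from fractions import Fraction

# Standard IUPAC nucleotide ambiguity codes (DNA alphabet).
IUPAC = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def proportional_counts(column):
    """Count bases, splitting each ambiguity code evenly over its bases."""
    counts = {}
    for ch in column.upper():
        bases = IUPAC.get(ch, ch)  # plain bases stand for themselves
        share = Fraction(1, len(bases))
        for base in bases:
            counts[base] = counts.get(base, 0) + share
    return counts


if __name__ == "__main__":
    counts = proportional_counts("C" * 9 + "Y")
    total = sum(counts.values())
    print(100 * max(counts.values()) / total)  # 9.5 'C' out of 10
```

Fractions are used instead of floats so that shares such as 1/3 add up exactly.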
68 | |
---|
69 | BUGS No bugs known yet |
---|