1 | #Please insert up references in the next lines (line starts with keyword UP) |
---|
2 | UP arb.hlp |
---|
3 | UP glossary.hlp |
---|
4 | |
---|
5 | #Please insert subtopic references (line starts with keyword SUB) |
---|
6 | #SUB subtopic.hlp |
---|
7 | |
---|
8 | # Hypertext links in helptext can be added like this: LINK{ref.hlp|http://add|bla@domain} |
---|
9 | |
---|
10 | #************* Title of helpfile !! and start of real helpfile ******** |
---|
11 | TITLE Chimera check |
---|
12 | |
---|
13 | OCCURRENCE ARB_NT/Sequence/Chimera check |
---|
14 | |
---|
15 | DESCRIPTION Takes sequences, a tree and a column statistic as input, |
---|
16 | and generates a short sequence quality output string, which |
---|
17 | will be stored into the database under a user defined key. |
---|
18 | |
---|
19 | First the sequences are split into different slices: |
---|
20 | |
---|
21 | - 2 pieces (front and back half) |
---|
22 | - 5 equally sized pieces |
---|
23 | - user defined pieces |
---|
24 | |
---|
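The slicing step above can be illustrated with a minimal Python sketch. The function name `slice_bounds` and the example length of 1500 positions are hypothetical. This help text does not say how leftover positions are distributed when the length is not evenly divisible, so the integer rounding used here is an assumption, not ARB's actual behaviour.

```python
def slice_bounds(length, pieces):
    """Return (start, end) ranges that split `length` positions into
    `pieces` nearly equal, contiguous slices (end is exclusive)."""
    if pieces < 1:
        raise ValueError("pieces must be at least 1")
    # Integer arithmetic keeps the slices contiguous and non-overlapping.
    cuts = [length * i // pieces for i in range(pieces + 1)]
    return list(zip(cuts[:-1], cuts[1:]))


halves = slice_bounds(1500, 2)   # front and back half
fifths = slice_bounds(1500, 5)   # 5 equally sized pieces
```

User defined pieces would correspond to calling the same helper with a different `pieces` value, again under the assumption that they are equally sized.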
25 | The program sums up the weighted mutations for each sequence slice |
---|
26 | using a maximum likelihood technique. |
---|
27 | |
---|
28 | For each slice a Student's t-test (see LINK{http://en.wikipedia.org/wiki/T-test}) is |
---|
29 | performed and its result is written into the XXX portions of the entries mentioned below. |
---|
30 | The t-test tests whether the likelihood of a specific sequence slice (of one species) follows |
---|
31 | a t-distribution of the likelihoods for that sequence slice in all examined species. |
---|
32 | |
---|
33 | Each X encodes the result of the t-test (the "t-value") as follows: |
---|
34 | * if the t-test succeeds, the value of X is '1' to '8' (where '5' is shown as '-'). |
---|
35 | * if the t-test fails, '0' or '9' is written to the X's |
---|
36 | * if there is not enough data to perform the t-test, '.' is written to the X's |
---|
37 | |
---|
38 | Rule of thumb: Values near '0' or '9' indicate regions with an abnormal number of |
---|
39 | weighted mutations; values near '5' ('-') indicate regions with a normal (i.e. expected) number. |
---|
40 | |
---|
41 | The sequence quality string written into a user-definable species |
---|
42 | field has the following format: |
---|
43 | |
---|
44 | MED SUM aXX bXXXXX cXXXXX...XXXX |
---|
45 | |
---|
46 | where: |
---|
47 | * MED is the median of all t-values (0.0 = normal; <5.0 = succeeds t-test (mean); >5.0 = abnormal) |
---|
48 | * SUM is the sum of all t-values |
---|
49 | * aXX shows the quality for 2 pieces |
---|
50 | * bXXXXX shows the quality for 5 pieces |
---|
51 | * cXXXXX...XXXX shows the quality for user defined slices |
---|
52 | |
---|
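A report in the format described above could be taken apart outside ARB roughly as follows. This is only a sketch: the function names `parse_quality` and `suspicious_slices` are hypothetical, the example string is made up, and it assumes that the five parts are separated by whitespace and that MED and SUM are written as plain numbers.

```python
def parse_quality(report):
    """Split a report of the form 'MED SUM aXX bXXXXX cXXX...' into parts."""
    fields = report.split()
    if len(fields) != 5:
        raise ValueError("expected 5 whitespace-separated fields")
    med, total, halves, fifths, user = fields
    for prefix, field in (("a", halves), ("b", fifths), ("c", user)):
        if not field.startswith(prefix):
            raise ValueError(f"field {field!r} should start with {prefix!r}")
    return {
        "median": float(med),
        "sum": float(total),
        "halves": halves[1:],   # quality for 2 pieces
        "fifths": fifths[1:],   # quality for 5 pieces
        "user": user[1:],       # quality for user defined slices
    }


def suspicious_slices(codes):
    """Return indices of slices marked '0' or '9' (t-test failed)."""
    return [i for i, ch in enumerate(codes) if ch in "09"]


# Made-up example value, not taken from a real database.
example = parse_quality("0.4 12 a-- b--9-- c---0---.--")
```

Calling `suspicious_slices(example["user"])` then lists the user defined slices whose t-test failed, which are the ones worth inspecting first.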
53 | Optionally a 'quality' entry may be written to the alignment, so that |
---|
54 | it can be displayed in EDIT4 below the sequence. That quality entry is simply |
---|
55 | a "blown up" version of the "cXXXXX...XXXX" part of the sequence quality |
---|
56 | field. |
---|
57 | |
---|
58 | If 'Keep existing reports' is unchecked, the selected 'species field' and |
---|
59 | the quality entry in the alignment are removed from ALL species when running |
---|
60 | a (new) chimera check. |
---|
61 | |
---|
62 | By clicking the 'Remove them now' button they will be removed without running |
---|
63 | a new check. Make sure the selected 'species field' is correct! |
---|
64 | |
---|
65 | NOTES Only sequences which are in the tree are used. |
---|
66 | |
---|
67 | Slices in high-variance regions pass the t-test more easily. |
---|
68 | |
---|
69 | Slices from sequences with higher overall variance pass the t-test more easily. |
---|
70 | |
---|
71 | WARNINGS Needs a very large amount of memory! |
---|
72 | |
---|
73 | BUGS None |
---|