#Please insert up references in the next lines (line starts with keyword UP)
UP arb.hlp
UP glossary.hlp

#Please insert subtopic references (line starts with keyword SUB)
#SUB subtopic.hlp

# Hypertext links in helptext can be added like this: LINK{ref.hlp|http://add|bla@domain}

#************* Title of helpfile !! and start of real helpfile ********
TITLE Chimera check

OCCURRENCE ARB_NT/Sequence/Chimera check

DESCRIPTION Takes sequences, a tree and a column statistic as input,
and generates a short sequence quality output string, which
is stored in the database under a user-defined key.

First, each sequence is split into slices in three different ways:

- 2 pieces (front and back half)
- 5 equally sized pieces
- user-defined pieces
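A minimal Python sketch of the three slicings follows. It assumes that slices are half-open column ranges over the alignment and that leftover columns are spread over the first pieces. ARB's exact boundary handling is not described in this help text, and `equal_slices` and `user_slices` are hypothetical helpers.

```python
from __future__ import annotations


def equal_slices(length: int, pieces: int) -> list[tuple[int, int]]:
    """Split columns 0..length-1 into `pieces` half-open ranges of nearly equal size.

    Assumption: the remainder is spread over the first pieces, so sizes differ by at most 1.
    """
    base, extra = divmod(length, pieces)
    slices = []
    start = 0
    for i in range(pieces):
        size = base + (1 if i < extra else 0)
        slices.append((start, start + size))
        start += size
    return slices


def user_slices(length: int, breakpoints: list[int]) -> list[tuple[int, int]]:
    """Split columns 0..length-1 at user-defined cut positions (hypothetical input format)."""
    cuts = [0] + sorted(b for b in breakpoints if 0 < b < length) + [length]
    return list(zip(cuts[:-1], cuts[1:]))


if __name__ == "__main__":
    alignment_length = 23
    print(equal_slices(alignment_length, 2))  # front and back half
    print(equal_slices(alignment_length, 5))  # 5 equally sized pieces
    print(user_slices(alignment_length, [10, 4, 17]))  # user-defined pieces
```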

The program sums up the weighted mutations for each sequence slice
using a maximum likelihood technique.
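The per-slice summation can be sketched as follows. The sketch assumes that the maximum likelihood step has already produced one weighted mutation score per alignment column for a sequence. This help text does not describe the ML model itself, and `column_scores` and `slice_sums` are hypothetical names.

```python
from __future__ import annotations


def slice_sums(column_scores: list[float], slices: list[tuple[int, int]]) -> list[float]:
    """Sum per-column weighted mutation scores over each half-open slice [start, end)."""
    return [sum(column_scores[start:end]) for start, end in slices]


if __name__ == "__main__":
    # Hypothetical per-column scores for one sequence, split into front and back half.
    scores = [0.0, 0.5, 0.25, 1.0, 0.0, 0.75]
    print(slice_sums(scores, [(0, 3), (3, 6)]))
```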

For each slice, a Student's t-test (see LINK{http://en.wikipedia.org/wiki/T-test}) is
performed, and its result is written into the X positions of the entries described below.
The t-test checks whether the likelihood of a specific sequence slice (of one species) fits
the t-distribution of the likelihoods for that slice across all examined species.
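One way to express this test, as a sketch: treat the slice value of the species under test as a single new observation. Then compare it with the values of the same slice in all examined species using t = (x - mean) / (s * sqrt(1 + 1/n)). This prediction-style statistic is an assumption, because the help text does not give ARB's exact formula. `slice_t_value` is a hypothetical helper.

```python
from __future__ import annotations

import math
import statistics


def slice_t_value(value: float, all_values: list[float]) -> float | None:
    """t statistic of one species' slice value against the same slice in all species.

    Assumption: t = (x - mean) / (s * sqrt(1 + 1/n)), i.e. x is tested as a new
    observation from the population the sample came from. Returns None when there
    are too few values (or no spread) to perform the test.
    """
    n = len(all_values)
    if n < 2:
        return None
    mean = statistics.mean(all_values)
    s = statistics.stdev(all_values)  # sample standard deviation (n - 1 denominator)
    if s == 0:
        return None
    return (value - mean) / (s * math.sqrt(1 + 1 / n))


if __name__ == "__main__":
    likelihoods = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical values of one slice in all species
    print(slice_t_value(5.0, likelihoods))
```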

Each X encodes the result of the t-test (the "t-value") as follows:
* if the t-test succeeds, X is a value from '1' to '8' (where '5' is shown as '-')
* if the t-test fails, X is '0' or '9'
* if there is not enough data to perform the t-test, X is '.'
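The encoding of one slice result into a single character can be sketched as below. The bin width (`step`) and the rounding rule are hypothetical. The help text only fixes three points: passing results map to '1'..'8' with '5' shown as '-', failing results map to '0' or '9', and missing data maps to '.'. `t_value_char` is a hypothetical helper.

```python
from __future__ import annotations


def t_value_char(t: float | None, step: float = 1.0) -> str:
    """Encode a slice's t-value as one quality character.

    Hypothetical mapping: shift t around '5' in bins of width `step` (Python's
    round() rounds halves to even). Values that land outside '1'..'8' count as a
    failed t-test ('0' or '9').
    """
    if t is None:
        return "."  # not enough data to perform the t-test
    digit = 5 + round(t / step)
    if digit < 1:
        return "0"  # failed: t far below zero
    if digit > 8:
        return "9"  # failed: t far above zero
    return "-" if digit == 5 else str(digit)


if __name__ == "__main__":
    # Hypothetical t-values for the 5-piece slicing of one sequence.
    print("b" + "".join(t_value_char(t) for t in [0.1, 1.2, None, -2.0, 6.5]))
```

The '-' for '5' keeps normal regions visually quiet, so the abnormal '0' and '9' positions stand out in the string.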

Rule of thumb: values near '0' or '9' indicate regions with an abnormal number of
weighted mutations; values near '5' ('-') indicate regions with a normal (i.e. expected)
number of weighted mutations.

The sequence quality string written into a user-definable species
field has the following format:

MED SUM aXX bXXXXX cXXXXX...XXXX

where:
* MED is the median of all t-values (0.0 = normal; <5.0 = succeeds t-test (mean); >5.0 = abnormal)
* SUM is the sum of all t-values
* aXX shows the quality for the 2 pieces
* bXXXXX shows the quality for the 5 pieces
* cXXXXX...XXXX shows the quality for the user-defined slices
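Assembling the string can be sketched like this. The help text does not state which numeric t-values enter MED and SUM (for example, absolute values), so the sketch uses whatever numbers it is given. It also assumes one decimal place for MED and SUM and a MED of 0.0 when nothing could be tested. `quality_string` is a hypothetical helper.

```python
from __future__ import annotations

import statistics


def quality_string(t_values: list[float], a: str, b: str, c: str) -> str:
    """Build 'MED SUM aXX bXXXXX cXXXXX...XXXX' from numeric t-values and encoded groups.

    Assumptions: one decimal place for MED and SUM; MED is 0.0 for an empty list;
    `a`, `b`, `c` are the already encoded characters for the 2-piece, 5-piece and
    user-defined slicings.
    """
    med = statistics.median(t_values) if t_values else 0.0
    total = sum(t_values)
    return f"{med:.1f} {total:.1f} a{a} b{b} c{c}"


if __name__ == "__main__":
    print(quality_string([1.0, 2.0, 4.0], "--", "-6-4-", "--7."))
```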

Optionally, a 'quality' entry can be written to the alignment, so that
it can be displayed in EDIT4 below the sequence. This quality entry is simply
a "blown up" version of the "cXXXXX...XXXX" part of the sequence quality
field.
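A sketch of the "blown up" quality entry follows. It assumes that each character of the cXXXXX...XXXX part is repeated across all alignment columns of its user-defined slice, and that slices are half-open column ranges. `blow_up` is a hypothetical helper, and filling uncovered columns with '.' is also an assumption.

```python
from __future__ import annotations


def blow_up(c_part: str, slices: list[tuple[int, int]], length: int) -> str:
    """Repeat each slice's quality character over that slice's alignment columns.

    Columns not covered by any slice are filled with '.' (an assumption).
    """
    if len(c_part) != len(slices):
        raise ValueError("need exactly one quality character per slice")
    columns = ["."] * length
    for ch, (start, end) in zip(c_part, slices):
        for pos in range(start, min(end, length)):
            columns[pos] = ch
    return "".join(columns)


if __name__ == "__main__":
    print(blow_up("-7", [(0, 3), (3, 5)], 5))
```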

If 'Keep existing reports' is unchecked, the selected 'species field' and
the quality entry in the alignment are removed from ALL species when running
a (new) chimera check.

Clicking the 'Remove them now' button removes them without running
a new check. Make sure the selected 'species field' is correct!

NOTES Only sequences which are in the tree are used.

Slices in high-variance regions pass the t-test more easily.

Slices from sequences with higher overall variance pass the t-test more easily.
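The first note follows from the t statistic dividing by the spread of the values. The same deviation from the mean gives a smaller |t| when the slice varies more across species. The small illustration below assumes the statistic t = (x - mean) / (s * sqrt(1 + 1/n)). ARB's exact formula is not given here, and `t_stat` and the sample values are hypothetical.

```python
from __future__ import annotations

import math
import statistics


def t_stat(x: float, sample: list[float]) -> float:
    """Assumed statistic: t = (x - mean) / (s * sqrt(1 + 1/n))."""
    n = len(sample)
    return (x - statistics.mean(sample)) / (statistics.stdev(sample) * math.sqrt(1 + 1 / n))


# Same deviation of +2.0 from the mean of 10.0, but a different spread across species.
low_variance_region = [9.0, 9.5, 10.0, 10.5, 11.0]
high_variance_region = [6.0, 8.0, 10.0, 12.0, 14.0]

if __name__ == "__main__":
    print(abs(t_stat(12.0, low_variance_region)))   # larger |t|: slice looks abnormal
    print(abs(t_stat(12.0, high_variance_region)))  # smaller |t|: slice passes more easily
```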

WARNINGS Needs a very large amount of memory!

BUGS None