# main topics:
UP arb.hlp
UP glossary.hlp

# sub topics:
#SUB subtopic.hlp

# format described in ../help.readme


TITLE           Calculate sequence quality

OCCURRENCE      ARB_NT/Sequence/Calculate sequence quality

DESCRIPTION     'Calculate sequence quality' tries to measure the quality of sequences and
                the quality of their alignment.

                HANDLING:

                    Fill in the values you think are appropriate.
                    The default values are the values that worked best in the first test runs.
                    Many criteria are evaluated (see 'THE WEIGHTS' below for details).

                    A final "quality-value" (percentage) is calculated for each sequence,
                    and all sequences below the given threshold can optionally be marked.

                HOW IT WORKS:

                    The "weights" section has quite a few options to fill in.

                    These are some of the criteria used to evaluate the quality of the sequences.
                    Each value represents the share of its criterion in the final evaluation formula.
                    All values are percentages, so together they should sum to 100.
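As a rough illustration of how weights that sum to 100 combine into one percentage, here is a minimal Python sketch. It is not ARB's implementation: it assumes every criterion has already been converted into a score from 0 to 100 (higher meaning better), and the criterion keys, the example weights and the functions `evaluate` and `mark_below_threshold` are all hypothetical.

```python
# Illustrative weights only (NOT ARB's defaults); the criterion keys are hypothetical.
WEIGHTS = {
    "base_analysis": 5,
    "deviation": 15,
    "no_helices": 15,
    "consensus": 50,
    "iupac": 5,
    "gc_proportion": 10,
}


def evaluate(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-criterion scores (assumed 0..100, higher is better) into one percentage."""
    total_weight = sum(weights.values())  # the help text says this should be 100
    return sum(weights[name] * scores[name] for name in weights) / total_weight


def mark_below_threshold(evaluations: dict[str, float], threshold: float) -> list[str]:
    """Return the sorted names of all sequences whose evaluation is below the threshold."""
    return sorted(name for name, value in evaluations.items() if value < threshold)


scores = {"base_analysis": 100, "deviation": 80, "no_helices": 90,
          "consensus": 70, "iupac": 100, "gc_proportion": 60}
print(evaluate(scores, WEIGHTS))  # 76.5
print(mark_below_threshold({"seq1": 76.5, "seq2": 40.0, "seq3": 55.0}, 50))  # ['seq2']
```

Dividing by the total weight keeps the result a percentage even if the weights do not add up to exactly 100; with weights summing to 100 it is a plain weighted average.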

                    The final evaluation value is stored in the field 'quality/ali_XXX/evaluation'.

                THE WEIGHTS:

                    Base analysis:

                        This is the number of bases stored in the sequence. Gap characters
                        ("-" and ".") are not counted.
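A minimal sketch of the base count described above, assuming every character other than "-" and "." counts as a base (including IUPAC ambiguity codes). The name `count_bases` is hypothetical, and how ARB turns this count into a score is not shown.

```python
GAP_CHARS = frozenset("-.")


def count_bases(sequence: str) -> int:
    """Count every character that is not a gap ('-' or '.')."""
    return sum(1 for ch in sequence if ch not in GAP_CHARS)


print(count_bases("AC-GU..A"))  # 5
```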

                    Deviation:

                        This is the deviation of a sequence's number of bases from the average
                        number of bases in its group.
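The following sketch shows one way to measure this deviation. It assumes the deviation is the absolute difference between a sequence's base count and the arithmetic mean over its group; ARB may instead use a relative or otherwise scaled measure. The function names and the example group are hypothetical.

```python
from statistics import mean


def count_bases(sequence: str) -> int:
    # Gap characters '-' and '.' are not counted as bases.
    return sum(1 for ch in sequence if ch not in "-.")


def base_count_deviations(group: dict[str, str]) -> dict[str, float]:
    """Absolute difference between each sequence's base count and the group average."""
    counts = {name: count_bases(seq) for name, seq in group.items()}
    average = mean(counts.values())
    return {name: abs(n - average) for name, n in counts.items()}


group = {"seqA": "ACGU-ACGU", "seqB": "ACGUAACGU", "seqC": "AC---ACGU"}
print(base_count_deviations(group))
```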

                    No Helices:

                        This is the number of positions in a sequence where a helix is expected
                        but the paired bases form no bond (i.e. the pair is one of AA AC CC CT CU TT UU).

                        The numbers of weak and strong base pairings are also calculated and stored in
                        the quality database fields 'number_of_weak_helix' and 'number_of_strong_helix',
                        but they are NOT used for the final evaluation value.

                        Unlike in LINK{helixsym.hlp}, it is not possible to define which base pairings
                        count as "none", "weak" or "strong".
                        The sequence quality tool always uses the default values.
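To make the "no bond" count concrete, here is a hedged sketch. How ARB obtains the helix pairing is not described here, so the input is a hypothetical list of paired alignment positions. It further assumes that pairs are order-independent (CA counts like AC) and that positions where either partner is a gap are skipped. Only the "none" list from the text is used; the weak/strong classification is not listed above and is therefore not modeled.

```python
# "No bond" pairs as listed in the help text; treated as order-independent (assumption).
NO_BOND_PAIRS = {frozenset(p) for p in ("AA", "AC", "CC", "CT", "CU", "TT", "UU")}


def count_no_bond_positions(sequence: str, helix_pairs: list[tuple[int, int]]) -> int:
    """Count helix pairs (hypothetical 0-based index pairs) whose bases form no bond."""
    count = 0
    for i, j in helix_pairs:
        a, b = sequence[i].upper(), sequence[j].upper()
        if a in "-." or b in "-.":
            continue  # assumption: pairs involving a gap are ignored
        if frozenset((a, b)) in NO_BOND_PAIRS:
            count += 1
    return count


pairs = [(0, 9), (1, 8), (2, 7), (3, 6), (4, 5)]
print(count_no_bond_positions("ACGAUUCCGU", pairs))  # 2 (A-C and U-U)
```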

                    Consensus:

                        For each named group found in the tree (selected below),
                        a consensus sequence is calculated.

                        Every species' sequence is compared against the consensus sequences
                        of all groups of which the species is a member.

                        That comparison uses conformity with and deviation from the consensus sequence.
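The sketch below illustrates the idea of comparing a sequence against a group consensus. It is not ARB's consensus algorithm: it swaps in a simple majority-rule consensus (most frequent character per column, "." treated like "-", ties resolved by first occurrence). It defines conformity and deviation as the fractions of compared positions that match or differ, skipping gaps in either sequence. Only a single level is shown; the multilevel consensus of nested groups is not modeled, and all names are hypothetical.

```python
from collections import Counter

GAPS = "-."


def consensus(sequences: list[str]) -> str:
    """Majority-rule consensus of equally long aligned sequences."""
    columns = zip(*(seq.upper().replace(".", "-") for seq in sequences))
    return "".join(Counter(col).most_common(1)[0][0] for col in columns)


def conformity_and_deviation(sequence: str, cons: str) -> tuple[float, float]:
    """Fractions of compared (non-gap) positions that match / differ from the consensus."""
    compared = [(s.upper(), c) for s, c in zip(sequence, cons)
                if s not in GAPS and c not in GAPS]
    if not compared:
        return 0.0, 0.0
    matches = sum(1 for s, c in compared if s == c)
    return matches / len(compared), (len(compared) - matches) / len(compared)


group = ["ACGU-A", "ACGUUA", "AUGU-A"]
cons = consensus(group)
print(cons)                                      # ACGU-A
print(conformity_and_deviation("AUGU-A", cons))  # (0.8, 0.2)
```

A species belonging to several groups would be compared in the same way against each group's consensus.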

# A consensus is computed from the sequences in one group and then from subgroups to groups.
# So "multilevel" consensus sequences are generated.
# The value consists of two analyses: every sequence is tested against every level of the consensus.
# Conformity with and deviation from the consensus are measured.

                    IUPAC:

                        This is the number of IUPAC ambiguity codes stored in a sequence.
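A minimal sketch of the IUPAC count, assuming the standard nucleotide ambiguity codes R, Y, S, W, K, M, B, D, H, V and N (case-insensitive) are what is counted; whether ARB includes N, or how it scores the count, is an assumption. The function name is hypothetical.

```python
# Standard IUPAC nucleotide ambiguity codes (N included here by assumption).
IUPAC_AMBIGUITY = frozenset("RYSWKMBDHVN")


def count_iupac_codes(sequence: str) -> int:
    """Count ambiguity codes in a sequence, ignoring case."""
    return sum(1 for ch in sequence.upper() if ch in IUPAC_AMBIGUITY)


print(count_iupac_codes("ACGRNU-YA"))  # 3
```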

                    GC proportion:

                        This is the deviation of a sequence's GC proportion from that of its group.
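One possible reading of this criterion, sketched in Python: the GC proportion is taken over all non-gap characters, the group value is the mean of the per-sequence proportions, and the deviation is the absolute difference. All three choices, and the function names, are assumptions for illustration rather than ARB's documented behavior.

```python
def gc_proportion(sequence: str) -> float:
    """Share of G and C among all non-gap characters (0.0 for an all-gap sequence)."""
    bases = [ch for ch in sequence.upper() if ch not in "-."]
    if not bases:
        return 0.0
    return sum(1 for ch in bases if ch in "GC") / len(bases)


def gc_deviations(group: dict[str, str]) -> dict[str, float]:
    """Absolute difference between each sequence's GC proportion and the group mean."""
    proportions = {name: gc_proportion(seq) for name, seq in group.items()}
    average = sum(proportions.values()) / len(proportions)
    return {name: abs(p - average) for name, p in proportions.items()}


print(gc_deviations({"a": "GGCC", "b": "AAUU", "c": "GC-AU"}))  # {'a': 0.5, 'b': 0.5, 'c': 0.0}
```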

NOTES           Generally speaking, the consensus is the most powerful tool to evaluate the quality,
                so keep its percentage high unless you know what you're doing or you want to
                evaluate with just one or two values.

                Be aware that the computation is very complex and can easily take hours to finish.
                So if you don't see the status bar moving in the first ten minutes, it just means
                that you are analyzing a huge database.

EXAMPLES        None

WARNINGS        Sequence quality may no longer be calculated for protein data.
                This had been allowed in the past, but only by mistake.
                The resulting quality values were almost meaningless and partly wrong.

BUGS            No bugs known