1 | #Please insert up references in the next lines (line starts with keyword UP) |
---|
2 | UP arb.hlp |
---|
3 | UP glossary.hlp |
---|
4 | |
---|
5 | #Please insert subtopic references (line starts with keyword SUB) |
---|
6 | #SUB subtopic.hlp |
---|
7 | |
---|
8 | # Hypertext links in helptext can be added like this: LINK{ref.hlp|http://add|bla@domain} |
---|
9 | |
---|
10 | #************* Title of helpfile !! and start of real helpfile ******** |
---|
11 | TITLE Calculate sequence quality |
---|
12 | |
---|
13 | OCCURRENCE ARB_NT/Sequence/Calculate sequence quality |
---|
14 | |
---|
15 | DESCRIPTION 'Calculate sequence quality' tries to measure the quality of sequences and |
---|
16 | the quality of their alignment. |
---|
17 | |
---|
18 | HANDLING: |
---|
19 | |
---|
20 | Fill in the values you think are appropriate. |
---|
21 | The default values are the values that worked best in the first test runs. |
---|
22 | Many criteria are evaluated (see 'THE WEIGHTS' below for details). |
---|
23 | |
---|
24 | A final "quality-value" (percentage) for each sequence is calculated |
---|
25 | and all sequences below the given threshold may get marked. |
---|
26 | |
---|
27 | HOW IT WORKS: |
---|
28 | |
---|
29 | In the section "weights" you have quite a few options to fill in. |
---|
30 | |
---|
31 | These are some of the criteria used to evaluate the quality of the sequences. |
---|
32 | Each value is the share of its criterion in the final evaluation formula. |
---|
33 | All values are percentages, so together they should sum to 100. |
---|
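To illustrate how percentage weights could combine into one value, here is a minimal Python sketch. It is not the formula ARB uses, which this help text does not state. It assumes each criterion has already been reduced to a score between 0.0 and 1.0. The names `final_quality`, `below_threshold` and the dictionary keys are hypothetical.

```python
# Sketch only: the exact formula used by ARB is not given in this help text.
# Assumption: every criterion has already been reduced to a score in 0.0..1.0.

def final_quality(scores, weights):
    """Combine per-criterion scores into one percentage (0..100)."""
    if sum(weights.values()) != 100:
        raise ValueError("weights are percentages and should sum to 100")
    # Each weight is the share (in percent) of its criterion in the result.
    return sum(weights[name] * scores[name] for name in weights)


def below_threshold(quality, threshold):
    """True if a sequence with this quality falls below the threshold."""
    return quality < threshold
```

With weights of 20 and 80, a sequence scoring 1.0 and 0.5 on the two criteria would end up at 60 percent and would fall below a threshold of 75.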
34 | |
---|
35 | The final evaluation value is stored in the field 'quality/ali_XXX/evaluation'. |
---|
36 | |
---|
37 | THE WEIGHTS: |
---|
38 | |
---|
39 | Base analysis: |
---|
40 | |
---|
41 | This is the number of bases that are stored in the sequence. "-" and "." are |
---|
42 | not counted. |
---|
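The counting rule above can be sketched in a few lines of Python. The function name `count_bases` is hypothetical and the sketch treats every character other than "-" and "." as a base, which is an assumption.

```python
def count_bases(sequence):
    """Count the characters of an aligned sequence, skipping '-' and '.'."""
    return sum(1 for ch in sequence if ch not in "-.")
```

For example, `count_bases("AC-G..U")` counts only A, C, G and U.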
43 | |
---|
44 | Deviation: |
---|
45 | |
---|
46 | This is the deviation of a sequence's number of bases from the average number |
---|
47 | of bases in its group. |
---|
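One possible reading of this criterion, sketched in Python: the help text does not say whether the deviation is signed, absolute or relative, so the absolute difference used here is an assumption. Both function names are hypothetical.

```python
def count_bases(sequence):
    """Count the characters of an aligned sequence, skipping '-' and '.'."""
    return sum(1 for ch in sequence if ch not in "-.")


def base_count_deviation(sequence, group_sequences):
    """Absolute difference between a sequence's base count and the group mean.

    Assumption: 'deviation' means the absolute difference; ARB may use
    a different measure.
    """
    counts = [count_bases(s) for s in group_sequences]
    average = sum(counts) / len(counts)
    return abs(count_bases(sequence) - average)
```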
48 | |
---|
49 | No Helices: |
---|
50 | |
---|
51 | This is the number of positions in a sequence where a helix structure is expected, |
---|
52 | but base pairings form no bond (i.e. are one of AA AC CC CT CU TT UU). |
---|
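The non-bonding pairs listed above can be counted as in the following Python sketch. It assumes the expected helix positions are available as a list of index pairs (`helix_pairs`, a hypothetical input) and that the order of the two bases in a pair does not matter. Pairs involving gaps or other symbols are simply not counted here, which is also an assumption.

```python
# Pairs that form no bond, taken from the list in this help text.
NON_BONDING = {"AA", "AC", "CC", "CT", "CU", "TT", "UU"}


def count_non_bonding(sequence, helix_pairs):
    """Count helix positions whose two bases form no bond.

    helix_pairs is a hypothetical list of (i, j) index pairs marking
    positions that are expected to pair in the helix structure.
    """
    count = 0
    for i, j in helix_pairs:
        # Sort the two bases so that e.g. "CA" and "AC" are treated alike.
        pair = "".join(sorted((sequence[i].upper(), sequence[j].upper())))
        if pair in NON_BONDING:
            count += 1
    return count
```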
53 | |
---|
54 | The numbers of weak and strong base pairings are also calculated and stored in |
---|
55 | quality database fields ('number_of_weak_helix' and 'number_of_strong_helix'), |
---|
56 | but are NOT used for the final evaluation value. |
---|
57 | |
---|
58 | It is not possible to define which base pairings count as "none", "weak" or "strong", |
---|
59 | as is possible in LINK{helixsym.hlp}. |
---|
60 | The sequence quality tool always uses the default values. |
---|
61 | |
---|
62 | Consensus: |
---|
63 | |
---|
64 | For each named group found in the tree (selected below) |
---|
65 | a consensus sequence is calculated. |
---|
66 | |
---|
67 | Every species' sequence is compared against the consensus sequences |
---|
68 | of all groups of which the species is a member. |
---|
69 | |
---|
70 | The comparison measures both conformity with and deviation from the consensus sequence. |
---|
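As a rough illustration of comparing a sequence against a group consensus, the Python sketch below builds a simple majority consensus per alignment column and reports the fraction of matching positions. How ARB actually computes the consensus and the conformity and deviation values is not described here, so the majority rule and both function names are assumptions.

```python
from collections import Counter


def majority_consensus(sequences):
    """Most frequent character per column of equally long aligned sequences."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*sequences)
    )


def conformity(sequence, consensus):
    """Fraction of positions at which the sequence equals the consensus."""
    if not consensus:
        return 0.0
    matches = sum(1 for a, b in zip(sequence, consensus) if a == b)
    return matches / len(consensus)
```

A species that belongs to several nested groups would be compared against the consensus of each of them in turn.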
71 | |
---|
72 | # A consensus is computed from sequences in one group and then from subgroups to groups. |
---|
73 | # So "multilevel" consensi are generated. |
---|
74 | # The value consists of two analysis: Every sequence is tested against every level of the consensus. |
---|
75 | # Conformity and deviation from the consensus are measured. |
---|
76 | |
---|
77 | IUPAC: |
---|
78 | |
---|
79 | This is the number of IUPAC ambiguity codes stored in a sequence. |
---|
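Counting ambiguity codes can be sketched as follows in Python. The set below holds the standard IUPAC nucleotide ambiguity codes. Whether ARB counts exactly this set is an assumption, and the function name is hypothetical.

```python
# Standard IUPAC nucleotide ambiguity codes (everything except A, C, G, T, U).
IUPAC_AMBIGUITY_CODES = set("RYSWKMBDHVN")


def count_ambiguity_codes(sequence):
    """Count the IUPAC ambiguity codes in a sequence, ignoring case."""
    return sum(1 for ch in sequence.upper() if ch in IUPAC_AMBIGUITY_CODES)
```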
80 | |
---|
81 | GC proportion: |
---|
82 | |
---|
83 | This is the deviation of a sequence's GC proportion from that of its group. |
---|
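A minimal Python sketch of this criterion, under stated assumptions: the GC proportion is taken over all non-gap characters, ambiguity codes are not counted as G or C, the group value is the mean over its members, and the deviation is the absolute difference. ARB may differ on each of these points. Both function names are hypothetical.

```python
def gc_proportion(sequence):
    """Fraction of G and C among all characters other than '-' and '.'."""
    bases = [ch for ch in sequence.upper() if ch not in "-."]
    if not bases:
        return 0.0
    return sum(1 for ch in bases if ch in "GC") / len(bases)


def gc_deviation(sequence, group_sequences):
    """Absolute difference between a sequence's GC proportion and the group mean."""
    proportions = [gc_proportion(s) for s in group_sequences]
    group_mean = sum(proportions) / len(proportions)
    return abs(gc_proportion(sequence) - group_mean)
```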
84 | |
---|
85 | NOTES Generally speaking, the consensus is the most powerful criterion for evaluating quality. So keep its |
---|
86 | percentage high unless you know what you're doing or you want to evaluate with just one or |
---|
87 | two values. |
---|
88 | |
---|
89 | Be aware that the computation is very complex and can easily take hours to finish. |
---|
90 | So if you don't see the status bar moving in the first ten minutes, it just means |
---|
91 | that you are analyzing a huge database. |
---|
92 | |
---|
93 | EXAMPLES None |
---|
94 | |
---|
95 | WARNINGS Sequence quality may no longer be calculated for protein data. |
---|
96 | This was allowed in the past, but only by mistake. |
---|
97 | The resulting quality values were almost meaningless and partly wrong. |
---|
98 | |
---|
99 | BUGS No bugs known |
---|