seq_quality.hlp

Last change on this file was 19532, checked in by westram, 4 months ago
reintegrates 'help' into 'trunk' tweak arb documentation: automatically link ticket references to arb tracker (only affects html version). found URLs. page titles warn about long titles. introduce `SUBTITLE`s (automatically triggered by multi-line titles in source files). increase allowed length (limited by subwindow width). cleanup header sections in all helpfiles. fix and/or update several help files. document syntax of help sources. build issues: when xml validation fails, next build no longer uses invalid xml ⇒ keeps failing. remove output files on error (including files below `ARBHOME/lib`). pipe output through logs to ensure proper wrapping in `Entering/Leaving` lines. moves `Tree admin` + `NDS` menu entries to top of menu adds: log:branches/help@18783:19531
Property svn:eol-style set to `native` Property svn:keywords set to `Author Date Id Revision`
File size: 3.9 KB

1	# main topics:
2	UP arb.hlp
3	UP glossary.hlp
4
5	# sub topics:
6	#SUB subtopic.hlp
7
8	# format described in ../help.readme
9
10
11	TITLE Calculate sequence quality
12
13	OCCURRENCE ARB_NT/Sequence/Calculate sequence quality
14
15	DESCRIPTION 'Calculate sequence quality' tries to measure the quality of sequences and
16	the quality of their alignment.
17
18	HANDLING:
19
20	Fill in the values you think are appropriate.
21	The default values are the values that worked best in the first test runs.
22	Many criteria are evaluated (see 'THE WEIGHTS' below for details).
23
24	A final "quality-value" (percentage) for each sequence is calculated
25	and all sequences below the given threshold may get marked.
26
27	HOW IT WORKS:
28
29	In the section "weights" you have quite a few options to fill in.
30
31	These are some of the criteria used to evaluate the quality of the sequences.
32	Each value is the share of its criterion in the final evaluation formula.
33	All values are percentages, so together they should sum to 100.
34
35	The final evaluation value is stored in the field 'quality/ali_XXX/evaluation'.
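
The weighting described above can be illustrated with a minimal Python sketch. It assumes the final value is a plain weighted sum of per-criterion scores in the range 0..1; the formula ARB actually uses is not given in this help text and may differ. The criterion names and example weights are hypothetical.

```python
def evaluate(scores, weights):
    """Combine per-criterion scores (each 0..1) into a percentage.

    Assumption: a plain weighted sum. ARB's real formula may differ.
    """
    total = sum(weights.values())
    if total != 100:
        raise ValueError(f"weights sum to {total}, expected 100")
    return sum(weights[name] * scores[name] for name in weights)


# Hypothetical weights (percentages) and a sequence that is perfect everywhere.
weights = {"bases": 5, "deviation": 15, "no_helices": 15,
           "consensus": 50, "iupac": 5, "gc": 10}
scores = {name: 1.0 for name in weights}
result = evaluate(scores, weights)
```

Rejecting weights that do not sum to 100 keeps the result interpretable as a percentage.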
36
37	THE WEIGHTS:
38
39	Base analysis:
40
41	This is the number of bases that are stored in the sequence. "-" and "." are
42	not counted.
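
As a small Python sketch of this count (the function name is hypothetical, and every character other than "-" and "." is treated as a base, which is an assumption):

```python
def count_bases(aligned_seq):
    """Count all characters of an aligned sequence except '-' and '.'."""
    return sum(1 for ch in aligned_seq if ch not in "-.")


example = count_bases("AC-G..U")
```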
43
44	Deviation:
45
46	This is the deviation of a sequence's number of bases from the average number
47	of bases in its group.
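
One way to picture this criterion is the following Python sketch. It assumes the deviation is taken relative to the group mean; whether ARB uses a relative or an absolute deviation is not stated here. The function name is hypothetical.

```python
def base_count_deviation(seq_count, group_counts):
    """Relative deviation of one base count from the group's mean base count."""
    mean = sum(group_counts) / len(group_counts)
    if mean == 0:
        return 0.0
    return abs(seq_count - mean) / mean


example = base_count_deviation(90, [90, 100, 110])
```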
48
49	No Helices:
50
51	This is the number of positions in a sequence where a helix structure is expected,
52	but base pairings form no bond (i.e. are one of AA AC CC CT CU TT UU).
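
A minimal Python sketch of such a count is shown below. The list of non-bonding pairs is the one given above; the representation of the helix as a list of paired alignment positions and the function name are assumptions made for illustration.

```python
# Pairs that form no bond, as listed in the help text (order-independent).
NO_BOND = {"AA", "AC", "CC", "CT", "CU", "TT", "UU"}


def count_no_helix(seq, helix_pairs):
    """Count helix position pairs whose two bases form no bond.

    helix_pairs: iterable of (i, j) index pairs into seq (hypothetical input).
    """
    count = 0
    for i, j in helix_pairs:
        # Sort the two bases so that e.g. 'CA' and 'AC' are treated alike.
        pair = "".join(sorted((seq[i].upper(), seq[j].upper())))
        if pair in NO_BOND:
            count += 1
    return count


paired = count_no_helix("ACGU", [(0, 3), (1, 2)])
unpaired = count_no_helix("AACC", [(0, 3), (1, 2)])
```

In this sketch, pairs involving a gap are simply not counted.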
53
54	The numbers of weak and strong base pairings are also calculated and stored in
55	quality database fields ('number_of_weak_helix' and 'number_of_strong_helix'),
56	but are NOT used for the final evaluation value.
57
58	It is not possible to define which base pairings count as "none", "weak" or "strong",
59	as is possible in LINK{helixsym.hlp}.
60	The sequence quality tool always uses the default values.
61
62	Consensus:
63
64	For each named group found in the tree (selected below)
65	a consensus sequence is calculated.
66
67	Every species' sequence is compared against the consensus sequences
68	of all groups of which the species is a member.
69
70	That comparison uses conformity with and deviation from the consensus sequence.
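
The idea can be sketched in Python as follows. This is not ARB's algorithm: it assumes a simple per-column majority consensus and measures conformity as the fraction of matching columns. Both function names are hypothetical.

```python
from collections import Counter


def consensus(seqs):
    """Most common character per alignment column (ties: first one seen)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def conformity(seq, cons):
    """Fraction of columns in which seq agrees with the consensus."""
    return sum(a == b for a, b in zip(seq, cons)) / len(cons)


group = ["ACGU", "ACGA", "ACCU"]
cons = consensus(group)
score = conformity("ACCU", cons)
```

A species that belongs to several nested groups would be scored against each group's consensus in turn.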
71
72	# A consensus is computed from sequences in one group and then from subgroups to groups.
73	# So "multilevel" consensi are generated.
74	# The value consists of two analysis: Every sequence is tested against every level of the consensus.
75	# Conformity and deviation from the consensus are measured.
76
77	IUPAC:
78
79	This is the number of IUPAC-ambiguity-codes stored in a sequence.
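
A short Python sketch of this count follows. The set below holds the standard IUPAC nucleotide ambiguity codes; whether ARB counts exactly this set (for example, whether N is included) is an assumption, and the function name is hypothetical.

```python
# Standard IUPAC nucleotide ambiguity codes (assumed to be the set counted).
IUPAC_AMBIGUITY = set("RYSWKMBDHVN")


def count_iupac(seq):
    """Count IUPAC ambiguity codes in a sequence, ignoring case."""
    return sum(1 for ch in seq.upper() if ch in IUPAC_AMBIGUITY)


example = count_iupac("ACGRYN-.u")
```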
80
81	GC proportion:
82
83	This is the deviation of a sequence's GC proportion from that of its group.
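
As an illustration, a Python sketch under two assumptions: gaps ("-" and ".") are excluded from the proportion, and the group value is the mean of its members' proportions. Neither detail is confirmed by this help text, and the function names are hypothetical.

```python
def gc_proportion(seq):
    """Share of G and C among all non-gap characters of a sequence."""
    bases = [ch for ch in seq.upper() if ch not in "-."]
    if not bases:
        return 0.0
    return sum(ch in "GC" for ch in bases) / len(bases)


def gc_deviation(seq, group):
    """Absolute difference between a sequence's GC share and the group mean."""
    group_mean = sum(gc_proportion(s) for s in group) / len(group)
    return abs(gc_proportion(seq) - group_mean)


share = gc_proportion("GC-AU.")
deviation = gc_deviation("GGCC", ["GGCC", "AAUU"])
```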
84
85	NOTES Generally speaking, the consensus is the most powerful criterion for evaluating quality. So keep its
86	percentage high unless you know what you're doing or you want to evaluate with just one or
87	two values.
88
89	Be aware that the computation is very complex and can easily take hours to finish.
90	If the status bar does not move during the first ten minutes, this just means
91	that you are analyzing a huge database.
92
93	EXAMPLES None
94
95	WARNINGS Sequence quality may no longer be calculated for protein data.
96	This had been allowed in the past, but only by mistake.
97	The resulting quality values were almost meaningless and partly wrong.
98
99	BUGS No bugs known
