awt_csp.hlp

Last change on this file was 19532, checked in by westram, 4 months ago
reintegrates 'help' into 'trunk' tweak arb documentation: automatically link ticket references to arb tracker (only affects html version). found URLs. page titles warn about long titles. introduce `SUBTITLE`s (automatically triggered by multi-line titles in source files). increase allowed length (limited by subwindow width). cleanup header sections in all helpfiles. fix and/or update several help files. document syntax of help sources. build issues: when xml validation fails, next build no longer uses invalid xml ⇒ keeps failing. remove output files on error (including files below `ARBHOME/lib`). pipe output through logs to ensure proper wrapping in `Entering/Leaving` lines. moves `Tree admin` + `NDS` menu entries to top of menu adds: log:branches/help@18783:19531
Property svn:eol-style set to `native` Property svn:keywords set to `Author Date Id Revision`
File size: 2.3 KB

# main topics:
UP arb.hlp
UP glossary.hlp

# sub topics:
SUB pos_var_pars.hlp

# format described in ../help.readme


TITLE Estimate Parameters from Column Statistics

OCCURRENCE ARB_DIST

DESCRIPTION In a typical RNA, base frequencies are not equally
distributed. Especially among the archaea we find
extremely G+C rich sequences.
This has led to a number of new rate corrections, algorithms
and programs which:

- calculate the average G+C content of all/two sequences
- correct the distance.

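As an illustration of the first step, here is a minimal Python sketch of a G+C content calculation. The function names are hypothetical and not part of ARB, and ignoring gaps and ambiguity codes is an assumption. The distance correction itself is not shown.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases A, C, G, T/U."""
    seq = seq.upper()
    gc = sum(seq.count(b) for b in "GC")
    total = gc + sum(seq.count(b) for b in "ATU")
    # Gaps and ambiguity codes are not counted (assumption of this sketch).
    return gc / total if total else 0.0


def mean_gc_content(seqs):
    """Average G+C content over several sequences (e.g. a pair)."""
    values = [gc_content(s) for s in seqs]
    return sum(values) / len(values)


print(gc_content("GGCCAU-."))              # 4 of the 6 bases are G or C
print(mean_gc_content(["GGCC", "AAUU"]))   # one pure G+C, one pure A+U
```
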
But further research showed us that the G+C frequencies are
not equally distributed within a sequence. Helical parts
in particular have a significantly higher G+C content than
non-helical parts.
One straightforward algorithm would calculate each frequency
independently for each column.
For small datasets, however, the resulting frequencies would
look like random data, as too few samples are analyzed.

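The weakness of the per-column approach can be seen in a small Python sketch. The function name and the toy alignment are made up for illustration and are not taken from ARB.

```python
def column_gc(alignment):
    """G+C fraction for every column; None where a column holds no bases."""
    result = []
    for column in zip(*alignment):            # walk over alignment columns
        bases = [b for b in column if b in "ACGU"]
        gc = sum(b in "GC" for b in bases)
        result.append(gc / len(bases) if bases else None)
    return result


# Three aligned sequences: each column is estimated from at most 3 samples.
alignment = ["GCAU",
             "GCA-",
             "GAAU"]
print(column_gc(alignment))
```

With only three sequences a column can take no more than a handful of distinct values (0, 1/3, 1/2, 2/3, 1), so a single bad or misaligned sequence changes the estimate drastically.
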
In ARB we implemented a combination of the two approaches.
Let's say we want to estimate a parameter 'P' with
a maximum variance 'maxvar'; for this we need a minimum
number of samples 'minsap'.

- All sequence positions are clustered according to

    - helical/non-helical region
    - variability

  The size of each cluster is chosen with respect
  to the variability of the sequences, so that it
  contains a minimum number of independent events.

- The final parameter estimate for a column is a
  weighted sum of the estimate for the
  cluster and the estimate for the single position.

You can give your preferred method a higher weight by
adjusting the smoothing parameter:

Less smoothing -> independent parameter estimates

Much smoothing -> clustered parameter estimates

To get a good tree we recommend trying all selections.

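The weighted sum and the effect of the smoothing parameter can be sketched in Python as follows. This is only an illustration under stated assumptions: the smoothing setting is mapped to a weight between 0 and 1, and the cluster estimate is a plain mean of its columns. Neither detail is documented here, and all names are hypothetical.

```python
def blend(single, cluster, smoothing):
    """Weighted sum of the per-column and the per-cluster estimate.

    smoothing = 0.0 -> independent (per-column) estimate only
    smoothing = 1.0 -> clustered estimate only
    """
    if not 0.0 <= smoothing <= 1.0:
        raise ValueError("smoothing must lie in [0, 1]")
    return (1.0 - smoothing) * single + smoothing * cluster


def smoothed_estimates(column_values, cluster_of, smoothing):
    """column_values: per-column estimates; cluster_of: cluster id per column."""
    members = {}
    for value, cid in zip(column_values, cluster_of):
        members.setdefault(cid, []).append(value)
    # Assumption: the cluster estimate is the mean over its columns.
    cluster_mean = {cid: sum(v) / len(v) for cid, v in members.items()}
    return [blend(v, cluster_mean[cid], smoothing)
            for v, cid in zip(column_values, cluster_of)]


values = [1.0, 0.5, 0.0, 0.2]
clusters = ["helix", "helix", "loop", "loop"]
for s in (0.0, 0.5, 1.0):
    print(s, smoothed_estimates(values, clusters, s))
```

At a weight of 0 every column keeps its own estimate, at 1 all columns of a cluster share one value, and values in between pull noisy columns towards their cluster.
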
NOTES To get parameters from a column statistic you first have
to create one.
Do this with <ARB_NT/SAI/Positional Variability (Parsimony M.)>

WARNINGS Problems may occur when

1. 'independent parameter estimates' is selected and
2. your dataset is quite small (<100 sequences) and
3. one sequence is bad or badly aligned

or

1. much smoothing of parameters is selected and
2. you are analyzing ribosomal RNA and
3. 'Use Helix Information' is turned off


BUGS No bugs known