source: trunk/HELP_SOURCE/source/awt_csp.hlp

Last change on this file was 18769, checked in by westram, 3 years ago
  • move all helpfiles to new source location
#Please insert up references in the next lines (line starts with keyword UP)
UP      arb.hlp
UP      glossary.hlp

#Please insert subtopic references  (line starts with keyword SUB)
SUB     pos_var_pars.hlp

# Hypertext links in helptext can be added like this: LINK{ref.hlp|http://add|bla@domain}

#************* Title of helpfile !! and start of real helpfile strunk********
TITLE           Estimate Parameters from Column Statistics

OCCURRENCE      ARB_DIST

DESCRIPTION     In a typical RNA, the base frequencies are not equal.
                Among the archaea in particular we find extremely
                G+C-rich sequences.
                This led to several new rate corrections, algorithms
                and programs which:

                        - calculate the average G+C content of all sequences
                          or of a pair of sequences
                        - correct the distance accordingly.

                However, further research showed that the G+C frequencies
                are not evenly distributed within a sequence either.
                Helical parts in particular have a significantly higher
                G+C content than non-helical parts.
                One straightforward algorithm would calculate each frequency
                independently for each column.
                For small datasets in particular, the resulting frequencies
                would look like random data, because too few examples are
                analyzed.

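                The per-column approach can be sketched as follows. This is
                only an illustration, not ARB's implementation: the function
                name column_gc_fraction and the toy alignment are assumptions
                made for this example. With only three sequences, each
                column's value rests on at most three observations, which is
                why such estimates are unreliable for small datasets.

```python
from collections import Counter


def column_gc_fraction(alignment):
    """Return the G+C fraction of every alignment column.

    `alignment` is a list of equally long, aligned sequences.
    Gaps and other non-base characters are ignored. A column
    without any bases yields None. (Hypothetical helper, not ARB code.)
    """
    n_cols = len(alignment[0])
    fractions = []
    for col in range(n_cols):
        counts = Counter(seq[col].upper() for seq in alignment)
        bases = sum(counts[b] for b in "ACGUT")  # count real bases only
        gc = counts["G"] + counts["C"]
        fractions.append(gc / bases if bases else None)
    return fractions


# Toy alignment (made up): three sequences, four columns.
toy_alignment = ["GCAU", "GCA-", "GUAC"]
print(column_gc_fraction(toy_alignment))
```
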
                In ARB we implemented a combination of the two approaches.
                        Let's say we want to estimate a parameter 'P' with
                        a maximum variance 'maxvar', so we need a minimum
                        number of samples 'minsap'.

                        - All sequence positions are clustered according to

                                - helical/non-helical region
                                - variability

                          The size of the cluster is chosen with respect
                          to the variability of the sequences to get a
                          minimum number of independent events.

                        - The final parameter estimate for a column is a
                          weighted sum of the estimate for the
                          cluster and the estimate for the single position.

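                The two steps above can be sketched as follows. This is a
                minimal illustration under stated assumptions, not ARB's
                implementation: the function names, the use of a plain
                mean as the cluster estimate, and a single weight
                'smoothing' between 0 and 1 are all assumptions, and the
                rule that sizes clusters from 'maxvar' and 'minsap' is not
                reproduced here.

```python
from collections import defaultdict


def cluster_means(position_estimates, is_helical, variability_bin):
    """Average the per-position estimates within each cluster.

    A cluster is identified by (helical flag, variability bin).
    (Assumed clustering key; ARB's actual rule may differ.)
    """
    groups = defaultdict(list)
    for est, helix, vbin in zip(position_estimates, is_helical, variability_bin):
        groups[(helix, vbin)].append(est)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


def smoothed_estimates(position_estimates, is_helical, variability_bin, smoothing):
    """Blend each position estimate with its cluster estimate.

    smoothing = 0.0 keeps the independent per-position estimates,
    smoothing = 1.0 uses the cluster estimates only.
    """
    if not 0.0 <= smoothing <= 1.0:
        raise ValueError("smoothing must be between 0 and 1")
    means = cluster_means(position_estimates, is_helical, variability_bin)
    return [
        (1.0 - smoothing) * est + smoothing * means[(helix, vbin)]
        for est, helix, vbin in zip(position_estimates, is_helical, variability_bin)
    ]


# Made-up example: four positions, two helical and two non-helical,
# all in the same variability bin.
estimates = [0.8, 0.6, 0.2, 0.4]
helical = [True, True, False, False]
bins = [0, 0, 0, 0]
print(smoothed_estimates(estimates, helical, bins, smoothing=0.5))
```

                In this sketch, a larger 'smoothing' value pulls every
                position towards the mean of its cluster, so noisy
                single-position values have less influence.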
                You can give your preferred method a higher weight by
                adjusting the smoothing parameter:

                        Less smoothing -> independent parameter estimates

                        Much smoothing -> clustered parameter estimates

                To get a good tree we recommend trying all selections.

NOTES           To get parameters from a column statistic you first have
                to create one.
                Do this with <ARB_NT/SAI/Positional Variability (Parsimony M.)>

WARNINGS        Problems may occur when

                         1. 'independent parameter estimates' is selected and
                         2. your dataset is quite small (<100 sequences) and
                         3. one sequence is bad or badly aligned

                or

                         1. 'Much smoothing' of parameters is selected and
                         2. you are analyzing ribosomal RNA and
                         3. 'Use Helix Information' is turned off


BUGS            No bugs known