source: trunk/HELP_SOURCE/source/awt_csp.hlp

Last change on this file was 19708, checked in by westram, 3 months ago
#       main topics:
UP      arb.hlp
UP      glossary.hlp

#       sub topics:
SUB     pos_var_pars.hlp

# format described in ../help.readme


TITLE           Estimate Parameters from Column Statistics

OCCURRENCE      ARB_DIST

DESCRIPTION     In a typical RNA, base frequencies are not equally
                distributed. Especially among the archaea we find
                extremely G+C rich sequences.
                This has led to a number of new rate corrections, algorithms
                and programs which

                        - calculate the average G+C content of all sequences
                          or of a pair of sequences and
                        - correct the distance accordingly.

                Further research showed, however, that G+C frequencies are
                not equally distributed within a sequence either. Helical
                parts in particular have a significantly higher G+C content
                than non-helical parts.
                A straightforward algorithm would calculate each frequency
                independently for each column.
                For small datasets, however, the resulting frequencies would
                look like random data, because too few samples are analyzed
                per column.
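
                The per-column approach can be illustrated with a minimal
                Python sketch. It is not ARB code: the function name
                'column_gc_frequencies' and the toy alignment are
                assumptions made for this example only.

```python
def column_gc_frequencies(alignment):
    """Return the G+C fraction of every alignment column.

    'alignment' is a list of equally long sequence strings.
    Gaps and other non-base characters are ignored; a column
    without any bases yields None.
    """
    freqs = []
    for column in zip(*alignment):
        bases = [b.upper() for b in column if b.upper() in "ACGTU"]
        if not bases:
            freqs.append(None)
            continue
        gc = sum(1 for b in bases if b in "GC")
        freqs.append(gc / len(bases))
    return freqs


# Toy alignment (hypothetical data): with only three sequences,
# each column estimate can only be 0, 1/3, 2/3 or 1.
toy_alignment = ["GCAU-", "GCGU-", "ACGA-"]
print(column_gc_frequencies(toy_alignment))
```

                With so few sequences every column estimate is restricted
                to a handful of coarse values, which is why independent
                per-column estimates become unreliable for small datasets.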

                ARB implements a combination of the two approaches.
                        Let's say we want to estimate a parameter 'P' with
                        a maximum variance 'maxvar'; for that we need a
                        minimum number of samples 'minsap'.

                        - All sequence positions are clustered according to

                                - helical/non-helical region
                                - variability

                          The size of each cluster is chosen with respect
                          to the variability of the sequences, so that each
                          cluster provides the required minimum of
                          independent events.

                        - The final parameter estimate for a column is a
                          weighted sum of the estimate for the
                          cluster and the estimate for the single position.
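
                The clustering step described above could look roughly
                like the following Python sketch. It only shows the idea
                of pooling counts over similar columns. The function name
                'cluster_estimates', the integer variability classes and
                the example counts are assumptions, not ARB internals.

```python
from collections import defaultdict


def cluster_estimates(gc_counts, base_counts, is_helix, variability_class):
    """Return one pooled G+C estimate per column.

    Columns sharing the same (helix flag, variability class) form a
    cluster; their counts are summed before the frequency is computed.
    How ARB really forms its clusters is not specified here, so this
    grouping key is an assumption.
    """
    pooled = defaultdict(lambda: [0, 0])  # key -> [G+C count, base count]
    keys = list(zip(is_helix, variability_class))
    for key, gc, n in zip(keys, gc_counts, base_counts):
        pooled[key][0] += gc
        pooled[key][1] += n
    return [
        pooled[key][0] / pooled[key][1] if pooled[key][1] else None
        for key in keys
    ]


# Hypothetical counts for four columns: two helical columns with low
# variability and two non-helical columns with higher variability.
estimates = cluster_estimates(
    gc_counts=[3, 1, 2, 0],
    base_counts=[4, 4, 4, 4],
    is_helix=[True, True, False, False],
    variability_class=[0, 0, 1, 1],
)
print(estimates)  # [0.5, 0.5, 0.25, 0.25]
```

                Each pooled estimate rests on eight observations instead
                of four, at the price of treating all columns of a
                cluster alike.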

                You can give your favorite method a higher weight by
                controlling the smoothing parameter:

                        Less smoothing -> independent parameter estimates

                        Much smoothing -> clustered parameter estimates
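
                One simple way to picture the effect of the smoothing
                parameter is a linear blend of both estimates. The sketch
                below assumes a weight between 0 and 1. The function
                'blend' and this mapping of the smoothing setting to a
                weight are illustrative assumptions, not the formula ARB
                uses.

```python
def blend(column_estimate, cluster_estimate, smoothing):
    """Weighted sum of a per-column and a per-cluster estimate.

    smoothing = 0.0 returns the independent column estimate,
    smoothing = 1.0 returns the clustered estimate.
    The 0..1 range is an assumption made for this sketch.
    """
    if not 0.0 <= smoothing <= 1.0:
        raise ValueError("smoothing must be between 0 and 1")
    return (1.0 - smoothing) * column_estimate + smoothing * cluster_estimate


# Hypothetical estimates for one column and its cluster.
for smoothing in (0.0, 0.5, 1.0):
    print(smoothing, blend(0.9, 0.5, smoothing))
```

                Intermediate settings pull a noisy column estimate
                towards the more stable cluster value without discarding
                it completely.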

                To get a good tree we recommend trying all selections.

NOTES           To get parameters from a column statistic you first have
                to create one.
                Do this with <ARB/SAI/Create SAI using../Positional variability.. (parsimony method)>

WARNINGS        Problems may occur when

                         1. 'independent parameter estimates' is selected and
                         2. your dataset is quite small (<100 sequences) and
                         3. one sequence is bad or badly aligned

                or

                         1. Much smoothing of parameters is selected and
                         2. you are analyzing ribosomal RNA and
                         3. 'Use Helix Information' is turned off


BUGS            No bugs known