source: trunk/HELP_SOURCE/source/awt_csp.hlp

Last change on this file was 19532, checked in by westram, 5 weeks ago
#       main topics:
UP      arb.hlp
UP      glossary.hlp

#       sub topics:
SUB     pos_var_pars.hlp

# format described in ../help.readme


TITLE           Estimate Parameters from Column Statistics

OCCURRENCE      ARB_DIST

DESCRIPTION     In a typical RNA, base frequencies are not equally
                distributed. Among the archaea in particular we find
                extremely G+C rich sequences.
                This has led to several new rate corrections, algorithms
                and programs which:

                        - calculate the average G+C content of all sequences
                          or of two sequences
                        - correct the distance.

                Further research showed that the G+C frequencies are
                not equally distributed within a sequence either. Helical
                parts in particular have a significantly higher G+C content
                than non-helical parts.
                A straightforward algorithm would calculate each frequency
                independently for each column.
                For small datasets, however, the resulting frequencies would
                look like random data, as too few examples are analyzed.
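
                The per-column approach described above can be sketched
                as follows. This is only an illustration in Python, not
                ARB code: the function name 'column_gc_content' and the
                treatment of gaps (ignored) are assumptions.

```python
def column_gc_content(alignment):
    """Return the G+C fraction of every alignment column.

    Hypothetical helper, not part of ARB. Gaps and other non-base
    characters are ignored; a column without any base yields None.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    result = []
    for col in range(length):
        # Collect the bases found in this column, skipping gaps.
        bases = [seq[col].upper() for seq in alignment
                 if seq[col].upper() in "ACGTU"]
        if not bases:
            result.append(None)
        else:
            gc = sum(1 for base in bases if base in "GC")
            result.append(gc / len(bases))
    return result


# Three short aligned sequences: with so few rows, each column
# frequency can only take a handful of values.
example = ["GCAU-", "GGAUC", "ACAUG"]
print(column_gc_content(example))
```

                With only three sequences every column estimate is a
                multiple of 1/3 or 1/2, which illustrates why estimates
                from small datasets look like random data.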

                In ARB we implemented a combination of the two approaches.
                        Let's say we want to estimate a parameter 'P' with
                        a maximum variance 'maxvar', so we need a minimum
                        number of samples 'minsap'.

                        - All sequence positions are clustered according to

                                - helical/non-helical region
                                - variability

                          The size of the cluster is chosen with respect
                          to the variability of the sequences to get a
                          minimum number of independent events.

                        - The final parameter estimate for a column is a
                          weighted sum of the estimate for the
                          cluster and the estimate for the single position.
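
                The weighted sum from the last step can be sketched as
                follows. This is an illustration in Python, not ARB code:
                the function name 'smoothed_estimate', the range 0..1 for
                the smoothing value and the linear weighting are all
                assumptions, as the weights ARB actually uses are not
                stated here.

```python
def smoothed_estimate(position_estimate, cluster_estimate, smoothing):
    """Blend a per-position estimate with its cluster estimate.

    Hypothetical helper, not part of ARB. 'smoothing' is assumed to lie
    in [0, 1]: 0 keeps the independent per-position estimate, 1 uses the
    cluster estimate only.
    """
    if not 0.0 <= smoothing <= 1.0:
        raise ValueError("smoothing must be between 0 and 1")
    return smoothing * cluster_estimate + (1.0 - smoothing) * position_estimate


# A column whose own G+C estimate is 0.8, in a cluster averaging 0.6.
for smoothing in (0.0, 0.5, 1.0):
    print(smoothing, smoothed_estimate(0.8, 0.6, smoothing))
```

                A linear blend is the simplest choice that moves
                continuously between the two extremes; other weightings
                would fit the description equally well.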

                You can give your favorite method a higher weight by
                controlling the smoothing parameter:

                        Less smoothing -> independent parameter estimates

                        Much smoothing -> clustered parameter estimates

                To get a good tree we recommend that you try all selections.

NOTES           To get parameters from a column statistic you first have
                to create one.
                Do this with <ARB_NT/SAI/Positional Variability (Parsimony M.)>

WARNINGS        Problems may occur when

                         1. 'independent parameter estimates' is selected and
                         2. your dataset is quite small (<100 sequences) and
                         3. one sequence is bad or badly aligned

                or

                         1. 'Much smoothing' of parameters is selected and
                         2. you are analyzing ribosomal RNA and
                         3. 'Use Helix Information' is turned off


BUGS            No bugs known