source: tags/ms_r16q4/HELP_SOURCE/oldhelp/di_clusters.hlp

Last change on this file was 12877, checked in by westram, 10 years ago
#Please insert up references in the next lines (line starts with keyword UP)
UP      arb.hlp
UP      glossary.hlp

#Please insert subtopic references  (line starts with keyword SUB)
#SUB    subtopic.hlp

# Hypertext links in helptext can be added like this: LINK{ref.hlp|http://add|bla@domain}

#************* Title of helpfile !! and start of real helpfile ********
TITLE           Cluster detection

OCCURRENCE      ARB_DIST

DESCRIPTION     Cluster detection searches the tree selected in the ARB_DIST main
                window for subtrees that form homologous groups of sequences.

                The main prerequisite for cluster detection to work well is a good tree,
                preferably a tree optimized with ARB_PARSIMONY (since cluster detection uses
                the same distance function as ARB_PARSIMONY does).

                You may control the distance calculation by selecting a filter and/or weights
                in the ARB_DIST main window.

                The following parameters define which subtrees will be reported as clusters:
                - Max. distance inside each cluster (no two sequences in a cluster have a
                  bigger distance than specified). Specify the distance as a percentage of
                  mutations: 100 means every base differs, 0 means no base differs.
                - Min. cluster size (clusters below that size are ignored)
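
                The two parameters above can be illustrated with a minimal Python sketch.
                It is not ARB's implementation: the function names are hypothetical, and
                the distance is a plain percentage of mismatching positions between
                aligned sequences, whereas ARB uses the ARB_PARSIMONY distance function
                together with the selected filter and weights.

```python
from itertools import combinations


def percent_distance(a: str, b: str) -> float:
    """Percentage of differing positions between two aligned sequences.

    Simplified stand-in for illustration, NOT the ARB distance function.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        return 0.0
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return 100.0 * mismatches / len(a)


def is_cluster(seqs, max_distance, min_size):
    """True if the group is large enough and no pair exceeds max_distance."""
    if len(seqs) < min_size:
        return False
    return all(percent_distance(a, b) <= max_distance
               for a, b in combinations(seqs, 2))


group = ["ACGTACGTAC", "ACGTACGTAA", "ACGTACGTAC"]
print(is_cluster(group, max_distance=10.0, min_size=3))  # True
print(is_cluster(group, max_distance=5.0, min_size=3))   # False (one pair differs by 10%)
print(is_cluster(group, max_distance=10.0, min_size=4))  # False (only 3 members)
```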

                Press 'Detect clusters' to start the cluster detection.

                The clusters matching the given parameters will be displayed
                in the list below.
                Each line contains the following information:

                     - number of species in cluster
                     - mean distance [min. - max. distance]
                     - minimal number of bases used for distance calculation (weighted)
                     - a generated cluster description

                Each cluster contains one so-called 'representative'.
                The representative is the species in the cluster with the lowest
                mean distance to all other cluster members.
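
                The definition of the representative can be sketched as follows.
                The function name, the species names and the distance matrix are
                made up for illustration; how ARB breaks ties is not documented here,
                so this sketch simply keeps the first member with the lowest mean.

```python
def representative(names, dist):
    """Return the member with the lowest mean distance to all other members.

    dist is a symmetric matrix (list of lists); dist[i][j] is the distance
    between names[i] and names[j].
    """
    n = len(names)
    if n < 2:
        raise ValueError("a cluster needs at least two members")

    def mean_to_others(i):
        return sum(dist[i][j] for j in range(n) if j != i) / (n - 1)

    # min() keeps the first index on ties (an assumption of this sketch)
    best = min(range(n), key=mean_to_others)
    return names[best]


names = ["SpecA", "SpecB", "SpecC"]
dist = [
    [0.0, 2.0, 4.0],
    [2.0, 0.0, 1.0],
    [4.0, 1.0, 0.0],
]
# mean distances: SpecA 3.0, SpecB 1.5, SpecC 2.5
print(representative(names, dist))  # SpecB
```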

SECTION Working with found clusters

                Marking

                        You can mark the members of the currently selected cluster by clicking
                        the 'Mark' button. Below that button you may select whether to mark
                         - all species in the cluster,
                         - all species in the cluster except the representative or
                         - only the representative.

                        The second mode is useful when you plan to remove all but the
                        representative from the tree.

                        You may also mark ALL clusters by clicking the 'Mark all' button.
                        This is handy to expand all clusters in the tree or to load all clusters into the
                        sequence editor.
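
                        The three marking modes amount to a simple selection over the
                        cluster members. The sketch below only illustrates that logic;
                        the function and the mode names are hypothetical and do not
                        correspond to ARB identifiers.

```python
def species_to_mark(members, representative, mode):
    """Select which cluster members get marked.

    mode is one of 'all', 'all_but_representative', 'representative_only'
    (hypothetical names chosen for this sketch).
    """
    if representative not in members:
        raise ValueError("representative must be a cluster member")
    if mode == "all":
        return list(members)
    if mode == "all_but_representative":
        return [m for m in members if m != representative]
    if mode == "representative_only":
        return [representative]
    raise ValueError(f"unknown mode: {mode}")


members = ["SpecA", "SpecB", "SpecC"]
print(species_to_mark(members, "SpecB", "all_but_representative"))  # ['SpecA', 'SpecC']
```

                        Marking everything except the representative and then removing the
                        marked species leaves exactly one species per cluster in the tree.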

                Auto mark

                        If you enable the 'Auto mark' toggle, ARB will automatically mark the cluster
                        as soon as you select it in the list.

                Selecting representative

                        If this option is checked, the representative species of the selected cluster
                        will become the 'selected species'.

                Storing intermediate results

                        You may store the displayed clusters by pressing either
                        - 'Store selected' or
                        - 'Store all'

                        The number of currently stored clusters will be displayed on
                        the restore button. By pressing that button, you can restore
                        these clusters.

                        Press 'Swap stored' to exchange stored clusters with displayed
                        clusters.

                        Storing results is useful for comparing the results of two cluster
                        detections with different parameters.

                Delete results

                        You can delete results using 'Delete selected' or 'Clear list'.

                Cluster groups

                        Create groups for found clusters

NOTES           The performance of the cluster detection is very sensitive to the parameters:

                - In short: big cluster size + small max. distance => faster calculation
                - A cluster size of 2 forces all sequences to be loaded. This consumes time and memory and
                  may render the calculation impossible.
                - In contrast, a minimum cluster size of 10 only loads about 20% of the sequence
                  data (in the best case); a size of 20 will only load about 10% of the data.

                - The bigger the maximum allowed distance is, the more clusters will be found,
                  hence the more has to be calculated.
                - So if you have no idea what distance to use, start with a low
                  distance (e.g. 0.01) and, if you don't find any clusters, increase
                  the distance stepwise.

EXAMPLES        One use case is to reduce a given tree by removing clones or very closely
                related species and only keeping one of them as representative of
                the OTU thus formed.

                Steps:

                        - search clusters
                        - examine found clusters and delete from the list those whose species you'd like to keep
                        - uncheck 'Mark representative' and click 'Mark all'
                        - in ARB_NTREE call 'Tree/Remove species from tree/Remove marked'

                Another use case is to create groups.

                        If you choose higher values for the maximum distance allowed
                        in found clusters and for the minimum cluster size, the found
                        clusters might be good candidates to create groups.

WARNINGS        Be careful when the minimum distance reported for a cluster is zero.
                This may have two reasons:

                     - two sequences are identical (in the filtered region)
                     - one sequence is empty (in the filtered region)

                In the second case, the results are meaningless and the empty sequence
                will be used as representative (which makes no sense).

                As a second indicator, the min. number of base positions used for distance
                calculation is listed for each cluster. When this gets low or zero, the results
                become more and more random.
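
                The two indicators above can be combined into a simple sanity check on
                a reported cluster. This is a sketch only: the function name and the
                threshold of 50 bases are arbitrary assumptions for illustration, not
                values taken from ARB.

```python
def cluster_warnings(min_distance, min_bases, low_base_threshold=50):
    """Return a list of warning strings for one reported cluster.

    min_distance       : minimum distance reported for the cluster
    min_bases          : minimal number of bases used for distance calculation
    low_base_threshold : illustrative cutoff (assumption, not an ARB value)
    """
    warnings = []
    if min_distance == 0:
        warnings.append("min. distance is zero: identical or empty sequence?")
    if min_bases == 0:
        warnings.append("no bases used: result is essentially random")
    elif min_bases < low_base_threshold:
        warnings.append("few bases used: result may be unreliable")
    return warnings


print(cluster_warnings(0.0, 0))     # two warnings: likely an empty sequence
print(cluster_warnings(1.2, 1200))  # []
```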

BUGS            No bugs known