#Please insert up references in the next lines (line starts with keyword UP)
UP arb.hlp
UP glossary.hlp

#Please insert subtopic references (line starts with keyword SUB)
#SUB subtopic.hlp

# Hypertext links in helptext can be added like this: LINK{ref.hlp|http://add|bla@domain}

#************* Title of helpfile !! and start of real helpfile ********
TITLE Cluster detection

OCCURRENCE ARB_DIST

DESCRIPTION Cluster detection searches for subtrees of the tree selected
in the ARB_DIST main window that form homologous groups of sequences.

The main prerequisite for cluster detection to work well is a good tree,
preferably a tree optimized with ARB_PARSIMONY (since cluster detection uses
the same distance function as ARB_PARSIMONY does).

You may control the distance calculation by selecting a filter and/or weights
in the ARB_DIST main window.

The following parameters define which subtrees will be reported as clusters:
- Max. distance inside each cluster (no two sequences in a cluster have
a greater distance than specified). Specify the distance as a percentage of
mutations: 100 means every base differs, 0 means no base differs.
- Min cluster size (clusters below that size are ignored).
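The following minimal Python sketch makes the two parameters concrete. It assumes a simplified distance: the percentage of differing bases among the positions where neither sequence has a gap, with each position counted by its weight. A column removed by the filter is modeled as weight 0. The names distance_percent and is_cluster and the gap characters are hypothetical. ARB's real distance function is the one shared with ARB_PARSIMONY and may differ.

```python
from itertools import combinations

GAP_CHARS = set("-.")  # assumption: '-' and '.' are treated as gaps

def distance_percent(a, b, weights=None):
    """Return (distance in %, weighted number of compared positions).

    Hypothetical simplified distance: gap positions are skipped, differing
    bases count as mutations, and weight 0 acts like a filtered-out column.
    """
    if weights is None:
        weights = [1.0] * len(a)
    used = 0.0
    mutations = 0.0
    for x, y, w in zip(a, b, weights):
        if x in GAP_CHARS or y in GAP_CHARS:
            continue
        used += w
        if x.upper() != y.upper():
            mutations += w
    if used == 0:
        return 0.0, 0.0  # nothing to compare
    return 100.0 * mutations / used, used

def is_cluster(seqs, max_dist, min_size):
    """True if the group is big enough and no pair exceeds max_dist (%)."""
    if len(seqs) < min_size:
        return False
    return all(distance_percent(a, b)[0] <= max_dist
               for a, b in combinations(seqs, 2))
```

In this sketch, a sequence with no comparable positions has distance 0 to every other sequence. This is the situation described under WARNINGS.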

Press 'Detect clusters' to start the cluster detection.
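Conceptually, the detection walks the tree and tests subtrees against both parameters. The Python sketch below illustrates this under several assumptions:
- the tree is a nested tuple of species names;
- the distance is a plain percentage of differing positions in equal-length sequences;
- only the largest qualifying subtree on each path is reported.

The function names are hypothetical, and ARB's actual traversal and reporting rules may differ. The early return for small subtrees hints at why a larger minimum cluster size can reduce the amount of work.

```python
from itertools import combinations

def p_distance(a, b):
    # hypothetical: % of differing positions; assumes equal, non-zero length
    return 100.0 * sum(x != y for x, y in zip(a, b)) / len(a)

def leaves(node):
    # a node is either a species name (str) or a tuple of child nodes
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]

def detect_clusters(node, seqs, max_dist, min_size):
    """Return clusters (lists of species names) found below node."""
    members = leaves(node)
    if len(members) < min_size:
        return []  # every subtree below this node is even smaller
    if all(p_distance(seqs[a], seqs[b]) <= max_dist
           for a, b in combinations(members, 2)):
        return [members]  # largest qualifying subtree on this path
    # not a cluster as a whole: look at the child subtrees
    return [cluster for child in node
            for cluster in detect_clusters(child, seqs, max_dist, min_size)]
```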

The clusters matching the given parameters will be displayed in the list below.
Each line contains the following information:

- number of species in the cluster
- mean distance [min. - max. distance]
- minimum number of base positions used for distance calculation (weighted)
- a generated cluster description

Each cluster contains one so-called 'representative'.
The representative is the species in the cluster with the smallest
mean distance to all other cluster members.
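A small Python sketch can illustrate the distance columns and the choice of representative. It makes two assumptions: 'mean distance' is the average over all member pairs, and the distance is a plain percentage of differing positions. The function names and the input format are also hypothetical. The representative is chosen as described above: the member with the smallest mean distance to all other members.

```python
from itertools import combinations

def hamming_percent(a, b):
    # hypothetical: % of differing positions; assumes equal, non-zero length
    return 100.0 * sum(x != y for x, y in zip(a, b)) / len(a)

def cluster_summary(members, dist=hamming_percent):
    """members: dict name -> sequence with at least two entries.

    Returns (mean, min, max, representative).
    """
    names = list(members)
    d = {}
    for p, q in combinations(names, 2):
        d[p, q] = d[q, p] = dist(members[p], members[q])
    pair_values = [d[p, q] for p, q in combinations(names, 2)]
    mean = sum(pair_values) / len(pair_values)

    def mean_to_others(name):
        return sum(d[name, m] for m in names if m != name) / (len(names) - 1)

    representative = min(names, key=mean_to_others)
    return mean, min(pair_values), max(pair_values), representative
```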

SECTION Working with found clusters

Marking

You can mark the members of the currently selected cluster by clicking
the 'Mark' button. Below that button you may select whether to mark
- all species in the cluster,
- all species in the cluster except the representative, or
- only the representative.

The second mode is useful when you plan to remove all but the
representative from the tree.

You may also mark ALL clusters by clicking the 'Mark all' button.
This is handy for expanding all clusters in the tree or for loading all
clusters into the sequence editor.
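The three marking modes and 'Mark all' reduce to simple set operations. The Python sketch below models them with hypothetical names (MarkMode, species_to_mark, mark_all). It is not ARB code. It only illustrates which species end up marked.

```python
from enum import Enum

class MarkMode(Enum):  # hypothetical names for the three marking options
    ALL = "all species in the cluster"
    ALL_BUT_REPRESENTATIVE = "all species except the representative"
    REPRESENTATIVE_ONLY = "only the representative"

def species_to_mark(members, representative, mode):
    """Return the set of species marked for one cluster."""
    if mode is MarkMode.ALL:
        return set(members)
    if mode is MarkMode.ALL_BUT_REPRESENTATIVE:
        return set(members) - {representative}
    return {representative}

def mark_all(clusters, mode):
    """'Mark all': union of the marked species over all listed clusters.

    clusters: list of (members, representative) pairs (hypothetical format)
    """
    marked = set()
    for members, representative in clusters:
        marked |= species_to_mark(members, representative, mode)
    return marked
```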

Auto mark

If you enable the 'Auto mark' toggle, ARB will automatically mark the cluster
as soon as you select it in the list.

Selecting representative

If this option is checked, the representative species of the selected cluster
will become the LINK{selected.hlp}.

Storing intermediate results

You may store the displayed clusters by pressing either
- 'Store selected' or
- 'Store all'.

The number of currently stored clusters is displayed on
the restore button. Pressing that button restores
these clusters.

Press 'Swap stored' to exchange the stored clusters with the displayed
clusters.

Storing results is useful for comparing the results of two cluster
detections run with different parameters.

Delete results

You can delete results using 'Delete selected' or 'Clear list'.

Cluster groups

Create groups for the found clusters.

NOTES The performance of the cluster detection is very sensitive to the parameters:

- In short: big minimum cluster size + small max. distance => faster calculation.
- A minimum cluster size of 2 forces all sequences to be loaded. This consumes
time and memory and may make the calculation impossible.
- By contrast, a minimum cluster size of 10 loads only about 20% of the sequence
data (in the best case), and a size of 20 loads only about 10% of the data.

- The bigger the maximum allowed distance, the more clusters will be found,
and hence the more has to be calculated.
- So if you have no idea what distance to use, start with a low
distance (e.g. 0.01) and, if you don't find any clusters, increase
the distance stepwise.
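The suggested strategy (start low, then increase stepwise) can be written as a small loop. In this Python sketch, detect is a hypothetical callable that stands in for pressing 'Detect clusters' with a given maximum distance. Doubling the distance at each step is an arbitrary choice; a linear step would work as well.

```python
def find_max_distance(detect, start=0.01, factor=2.0, limit=100.0):
    """Raise the max. distance stepwise until detect() reports clusters.

    detect(max_dist) -> list of clusters (hypothetical stand-in for ARB).
    Returns (max_dist, clusters), or (None, []) if nothing was found.
    """
    if factor <= 1.0:
        raise ValueError("factor must be > 1")
    max_dist = start
    while max_dist <= limit:
        clusters = detect(max_dist)
        if clusters:
            return max_dist, clusters
        max_dist *= factor  # next, larger step
    return None, []
```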

EXAMPLES One use case is to reduce a given tree by removing clones or very closely
related species and keeping only one of them as the representative of
the OTU formed this way.

Steps:

- search for clusters
- examine the found clusters and delete from the list those whose members you'd like to keep
- uncheck 'Mark representative' and click 'Mark all'
- in ARB_NTREE call 'Tree/Remove species from tree/Remove marked'
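The Python sketch below shows the net effect of these steps on the tree. It assumes that clusters are given as (members, representative) pairs and that the clusters you deleted from the list are identified by their index. These names and formats are hypothetical. Every cluster still in the list contributes all of its members except the representative to the set of removed species.

```python
def reduce_tree_species(all_species, clusters, deleted_from_list=()):
    """Return the species left in the tree after the steps above.

    clusters: list of (members, representative) pairs (hypothetical format)
    deleted_from_list: indices of clusters removed from the list in the
    second step (their members stay in the tree)
    """
    deleted = set(deleted_from_list)
    to_remove = set()
    for index, (members, representative) in enumerate(clusters):
        if index in deleted:
            continue  # no longer listed, so 'Mark all' does not mark it
        to_remove |= set(members) - {representative}
    return [s for s in all_species if s not in to_remove]
```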

Another use case is to create groups.

If you choose higher values for the maximum distance allowed
in found clusters and for the minimum cluster size, the found
clusters might be good candidates for creating groups.

WARNINGS Be careful when the minimum distance reported for a cluster is zero.
This may have two causes:

- two sequences are identical (in the filtered region)
- one sequence is empty (in the filtered region)

In the second case, the results are meaningless and the empty sequence
will be used as the representative (which makes no sense).
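A gap-skipping distance shows why an empty sequence ends up as the representative. Such a sequence has no comparable positions, so it gets distance 0 to every other member and its mean distance is minimal. The Python sketch below demonstrates this with made-up sequences. The distance function is a simplified assumption, not ARB's.

```python
GAPS = set("-.")  # assumption: '-' and '.' are gap characters

def gap_skipping_distance(a, b):
    """% of differing bases over positions where neither sequence has a gap."""
    used = diffs = 0
    for x, y in zip(a, b):
        if x in GAPS or y in GAPS:
            continue
        used += 1
        diffs += x != y
    return 100.0 * diffs / used if used else 0.0  # no positions -> 0

members = {"s1": "ACGTAC", "s2": "ACGTTT", "s3": "ACCTAC", "empty": "------"}

def mean_to_others(name):
    others = [m for m in members if m != name]
    return sum(gap_skipping_distance(members[name], members[m])
               for m in others) / len(others)

representative = min(members, key=mean_to_others)
print(representative)  # -> empty
```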

As a second indicator, the minimum number of base positions used for distance
calculation is listed for each cluster. When this number gets low or reaches zero,
the results become more and more random.
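Both indicators can be checked mechanically. The following Python sketch flags a cluster in two situations:
- its minimum distance is zero (it tells the two causes apart by looking for sequences without bases in the filtered region);
- its minimum number of used base positions falls below a threshold.

The threshold of 50 positions, the function names and the mask-based filter are hypothetical choices for illustration.

```python
GAPS = set("-.")  # assumption: '-' and '.' are gap characters

def bases_in_filter(seq, mask=None):
    """Count non-gap positions kept by the filter (mask[i] False = filtered out)."""
    if mask is None:
        mask = [True] * len(seq)
    return sum(1 for c, keep in zip(seq, mask) if keep and c not in GAPS)

def cluster_warnings(members, min_dist, min_bases, min_bases_threshold=50, mask=None):
    """members: dict name -> sequence; returns a list of warning messages."""
    messages = []
    if min_dist == 0:
        empty = sorted(n for n, s in members.items() if bases_in_filter(s, mask) == 0)
        if empty:
            messages.append("empty in filtered region: " + ", ".join(empty))
        else:
            messages.append("identical sequences in filtered region")
    if min_bases < min_bases_threshold:
        messages.append(f"only {min_bases} base positions used")
    return messages
```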

BUGS No bugs known