di_clusters.hlp

Last change on this file was 12877, checked in by westram, 10 years ago
reintegrates 'cluster' into 'trunk' implements #259 adds: log:branches/cluster@12869:12876
File size: 6.5 KB

#Please insert up references in the next lines (line starts with keyword UP)
UP arb.hlp
UP glossary.hlp

#Please insert subtopic references (line starts with keyword SUB)
#SUB subtopic.hlp

# Hypertext links in helptext can be added like this: LINK{ref.hlp|http://add|bla@domain}

#*********** Title of helpfile !! and start of real helpfile ******
TITLE Cluster detection

OCCURRENCE ARB_DIST

DESCRIPTION Cluster detection searches the tree selected in the ARB_DIST
main window for subtrees that form homologous groups of sequences.

The main prerequisite for cluster detection to work well is a good tree,
preferably one optimized with ARB_PARSIMONY (since cluster detection uses
the same distance function as ARB_PARSIMONY).

You can control the distance calculation by selecting a filter and/or weights
in the ARB_DIST main window.

The following parameters define which subtrees will be reported as clusters:
- Max. distance inside each cluster (no two sequences in a cluster may be
  further apart than the specified distance). Specify the distance as a
  percentage of mutations: 100 means every base differs, 0 means no base differs.
- Min. cluster size (clusters below that size are ignored)
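
As an illustration only, the two parameters can be read as a simple test on pairwise distances. The following Python sketch is not ARB code: the function `is_cluster`, the `dist` mapping and the species names are hypothetical, and the distances are made-up percentages.

```python
from itertools import combinations


def is_cluster(members, dist, max_dist, min_size):
    """Return True if `members` satisfies both cluster parameters.

    `dist` maps frozenset({a, b}) to a distance in percent (0..100).
    """
    # Clusters below the minimum size are ignored.
    if len(members) < min_size:
        return False
    # No two sequences may be further apart than max_dist.
    return all(dist[frozenset(pair)] <= max_dist
               for pair in combinations(members, 2))


# Hypothetical pairwise distances in percent.
dist = {
    frozenset({"A", "B"}): 1.0,
    frozenset({"A", "C"}): 2.5,
    frozenset({"B", "C"}): 2.0,
    frozenset({"A", "D"}): 9.0,
    frozenset({"B", "D"}): 8.5,
    frozenset({"C", "D"}): 8.0,
}

print(is_cluster(["A", "B", "C"], dist, max_dist=3.0, min_size=3))       # True
print(is_cluster(["A", "B", "C", "D"], dist, max_dist=3.0, min_size=3))  # False
print(is_cluster(["A", "B"], dist, max_dist=3.0, min_size=3))            # False
```

Adding species D breaks the cluster because a single pair above the maximum distance is enough to reject the whole group.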

Press 'Detect clusters' to start the cluster detection.

The clusters matching the given parameters will be displayed in the list below.
Each line contains the following information:

- number of species in the cluster
- mean distance [min. distance - max. distance]
- minimum number of bases used for distance calculation (weighted)
- a generated cluster description
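
The distance figures in such a line can be pictured with a small Python sketch. It assumes that the mean is taken over all pairs of cluster members, which this help text does not state. The function `summarize`, the species names and the distances are hypothetical, and the printed layout is not ARB's actual list format.

```python
from itertools import combinations
from statistics import mean


def summarize(members, dist):
    """Return (size, mean, min, max) over all pairwise distances."""
    values = [dist[frozenset(pair)] for pair in combinations(members, 2)]
    return len(members), mean(values), min(values), max(values)


# Hypothetical pairwise distances in percent.
dist = {
    frozenset({"A", "B"}): 1.0,
    frozenset({"A", "C"}): 2.5,
    frozenset({"B", "C"}): 2.0,
}

size, avg, lo, hi = summarize(["A", "B", "C"], dist)
print(f"{size} species, mean {avg:.2f} [{lo:.2f} - {hi:.2f}]")
# prints: 3 species, mean 1.83 [1.00 - 2.50]
```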

Each cluster contains one so-called 'representative'.
The representative is the species in the cluster with the lowest
mean distance to all other cluster members.
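
The selection rule can be sketched in a few lines of Python. This is only an illustration of "lowest mean distance to all other members", not ARB's implementation: the function `representative`, the species names and the distances are hypothetical.

```python
def representative(members, dist):
    """Return the member with the lowest mean distance to all others.

    `dist` maps frozenset({a, b}) to a distance; needs at least 2 members.
    """
    def mean_to_others(species):
        others = [m for m in members if m != species]
        return sum(dist[frozenset({species, o})] for o in others) / len(others)

    return min(members, key=mean_to_others)


# Hypothetical pairwise distances in percent.
dist = {
    frozenset({"A", "B"}): 1.0,
    frozenset({"A", "C"}): 2.5,
    frozenset({"B", "C"}): 2.0,
}

# Mean distances to the others: A = 1.75, B = 1.5, C = 2.25
print(representative(["A", "B", "C"], dist))  # B
```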

SECTION Working with found clusters

Marking

You can mark the members of the currently selected cluster by clicking
the 'Mark' button. Below that button you can select whether to mark
- all species in the cluster,
- all species in the cluster except the representative or
- only the representative.

The second mode is useful when you plan to remove all but the
representative from the tree.

You can also mark ALL clusters by clicking the 'Mark all' button.
This is handy for expanding all clusters in the tree or for loading all
clusters into the sequence editor.

Auto mark

If you enable the 'Auto mark' toggle, ARB will automatically mark the cluster
as soon as you select it in the list.

Selecting representative

If this option is checked, the representative species of the selected cluster
will become the 'selected species'.

Storing intermediate results

You can store the displayed clusters by pressing either
- 'Store selected' or
- 'Store all'

The number of currently stored clusters is displayed on
the restore button. Pressing that button restores
these clusters.

Press 'Swap stored' to exchange the stored clusters with the displayed
clusters.

Storing results is useful for comparing the results of two cluster detections
run with different parameters.

Delete results

You can delete results using 'Delete selected' or 'Clear list'.

Cluster groups

Create groups for found clusters

NOTES The performance of the cluster detection is very sensitive to the parameters:

- In short: large min. cluster size + small max. distance => faster calculation
- A min. cluster size of 2 forces all sequences to be loaded. This consumes time and memory and
  may render the calculation impossible.
- In contrast, a min. cluster size of 10 loads only about 20% of the sequence
  data (in the best case), and a size of 20 loads only about 10% of the data.

- The bigger the maximum allowed distance, the more clusters will be found,
  and hence the more has to be calculated.
- If you have no idea what distance to use, start with a low
  distance (e.g. 0.01) and, if you don't find any clusters, increase
  the distance stepwise.

EXAMPLES One use case is to reduce a given tree by removing clones or very
closely related species, keeping only one of them as the representative of
the OTU thus formed.

Steps:

- search for clusters
- examine the found clusters and delete from the list those you'd like to keep unchanged
- uncheck 'Mark representative' and click 'Mark all'
- in ARB_NTREE call 'Tree/Remove species from tree/Remove marked'

Another use case is to create groups.

If you choose higher values for the maximum distance allowed
in found clusters and for the minimum cluster size, the found
clusters might be good candidates for groups.

WARNINGS Be careful when the minimum distance reported for a cluster is zero.
This can have two causes:

- two sequences are identical (in the filtered region)
- one sequence is empty (in the filtered region)

In the second case, the results are meaningless and the empty sequence
will be used as the representative (which makes no sense).

As a second indicator, the min. number of base positions used for distance
calculation is listed for each cluster. When this number gets low or reaches zero, the results
become more and more random.
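
The following Python sketch shows one way an empty sequence can end up with a distance of zero. It is a simplified stand-in, not ARB's distance function: it ignores filters and weights, and it assumes that only positions where both sequences carry a base are compared and that '-' and '.' denote gaps. The function `compare` and the sequences are hypothetical.

```python
GAPS = "-."  # assumed gap characters


def compare(seq_a, seq_b):
    """Return (percent_distance, positions_used) for two aligned sequences."""
    used = 0
    diff = 0
    for a, b in zip(seq_a, seq_b):
        if a in GAPS or b in GAPS:
            continue  # no base in at least one sequence: position is skipped
        used += 1
        if a != b:
            diff += 1
    # With no usable positions there is nothing to count, so the result is 0.
    percent = 100.0 * diff / used if used else 0.0
    return percent, used


print(compare("ACGTACGT", "ACGTACGT"))  # (0.0, 8)  identical sequences
print(compare("ACGTACGT", "--------"))  # (0.0, 0)  empty sequence
print(compare("ACGTACGT", "ACGAACGT"))  # (12.5, 8) one base differs
```

Both of the first two calls report a distance of zero. Only the number of positions used tells the identical pair apart from the empty sequence, which is why that number is worth checking.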

BUGS No bugs known
