README

Last change on this file was 1754, checked in by westram, 23 years ago
updated to version 1.83
Property svn:eol-style set to `native` Property svn:keywords set to `Author Date Id Revision`
File size: 11.7 KB

******************************************************************************

CLUSTAL W Multiple Sequence Alignment Program
(version 1.83, Feb 2003)

******************************************************************************


Please send bug reports, comments etc. to one of:-
        gibson@embl-heidelberg.de
        thompson@igbmc.u-strasbg.fr
        d.higgins@ucc.ie


******************************************************************************

POLICY ON COMMERCIAL DISTRIBUTION OF CLUSTAL W

Clustal W is freely available to the user community. However, Clustal W is
increasingly being distributed as part of commercial sequence analysis
packages. To help us safeguard future maintenance and development, commercial
distributors of Clustal W must take out a NON-EXCLUSIVE LICENCE. Anyone
wishing to commercially distribute version 1.81 of Clustal W should contact the
authors unless they have previously taken out a licence.

******************************************************************************

Clustal W is written in ANSI-C and can be run on any machine with an ANSI-C
compiler. Executables are provided for several major platforms.

Changes since CLUSTAL X Version 1.82
------------------------------------

1. The FASTA format has been added to the list of alignment output options.

2. It is now possible to save the residue ranges (appended after the sequence
names) when saving a specified range of the alignment.

3. The efficiency of the neighbour-joining algorithm has been improved. This
work was done by Tadashi Koike at the Center for Information Biology and DNA Data
Bank of Japan and FUJITSU Limited.

Some example speedups are given below (timings on a SPARC64 CPU):

    No. of sequences    original NJ    new NJ
          200           0' 12"         0.1"
          500           9' 19"         1.4"
         1000           XXXX           0' 31"

Changes since version 1.8
-------------------------

1. ClustalW now returns error codes for some common errors when exiting. This
may be useful for people who run clustalw automatically from within a script.
Error codes are:
        1       bad command line option
        2       cannot open sequence file
        3       wrong format in sequence file
        4       sequence file contains only 1 sequence (for multiple alignments)

2. Alignments can now be saved in Nexus format, for compatibility with PAUP,
MacClade etc. For a description of the Nexus format, see:
        Maddison, D. R., D. L. Swofford and W. P. Maddison. 1997.
        NEXUS: an extensible file format for systematic information.
        Systematic Biology 46:590-621.

3. Phylogenetic trees can also be saved in Nexus format.

4. A ClustalW icon has been designed for MAC and PC systems.


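For scripted runs, the error codes listed in item 1 above can be mapped back to
their meanings. The following Python sketch is only an illustration and is not
part of the Clustal W distribution: the names CLUSTALW_ERRORS,
describe_exit_code and run_clustalw are hypothetical, the executable is assumed
to be installed as 'clustalw' on the PATH, and an exit status of 0 is assumed
to mean success, which this README does not state.

```python
import subprocess

# Error codes as listed in item 1 above.
CLUSTALW_ERRORS = {
    1: "bad command line option",
    2: "cannot open sequence file",
    3: "wrong format in sequence file",
    4: "sequence file contains only 1 sequence (for multiple alignments)",
}


def describe_exit_code(code):
    """Return a readable description of a clustalw exit status."""
    if code == 0:
        # Assumption: 0 means success; the README only lists error codes.
        return "success (assumed)"
    return CLUSTALW_ERRORS.get(code, "unrecognised exit code %d" % code)


def run_clustalw(args):
    """Run clustalw with the given arguments; return (code, description).

    Assumes an executable named 'clustalw' can be found on the PATH.
    """
    result = subprocess.run(["clustalw"] + list(args))
    return result.returncode, describe_exit_code(result.returncode)
```

For example, run_clustalw(["input.aln", "-gapopen=8.0"]) would run the command
line used as an example under 'Changes since version 1.6' and return the exit
status together with its description.
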
Changes since version 1.74
--------------------------

1. Some work has been done to automatically select the optimal parameters
depending on the set of sequences to be aligned. The Gonnet series of residue
comparison matrices is now used by default. The Blosum series remains as an
option. The default gap extension penalty for proteins has been changed to 0.2
(was 0.05). The 'delay divergent sequences' option has been changed to 30%
residue identity (was 40%).

2. The default parameters used when the 'Negative matrix' option is selected
have been optimised. This option may help when the sequences to be aligned are
not superposable over their whole lengths (e.g. in the presence of N/C terminal
extensions).

3. A bug in the calculation of phylogenetic trees for 2 sequences has been
fixed.

4. A command line option has been added to turn off the sequence weighting
calculation.

5. The phylogenetic tree calculation now ignores any ambiguity codes in the
sequences.

6. A bug in the memory access during the calculation of profiles has been
fixed. (Thanks to Haruna Cofer at SGI).

7. A bug has been fixed in the 'transition weight' option for nucleic acid
sequences. (Thanks to Chanan Rubin at Compugen).

8. An option has been added to read in a series of comparison matrices from a
file. This option is only applicable for protein sequences. For details of the
file format, see the on-line documentation.

9. The MSF output file format has been changed. The sequence weights
calculated by Clustal W are now included in the header.

10. Two bugs in the FAST/APPROXIMATE pairwise alignments have been fixed. One
involved the alignment of new sequences to an existing profile using the fast
pairwise alignment option; the second was caused by changing the default
options for the fast pairwise alignments.

11. A bug in the alignment of a small number of sequences has been fixed.
Previously a Guide Tree was not calculated for fewer than 4 sequences.


Changes since version 1.6
-------------------------

1. The static arrays used by clustalw for storing the alignment data have been
replaced by dynamically allocated memory. There is now no limit on the number
or length of sequences which can be input.

2. The alignment of DNA sequences now offers a new hard-coded matrix, as well
as the identity matrix used previously. The new matrix is the default scoring
matrix used by the BESTFIT program of the GCG package for the comparison of
nucleic acid sequences. X's and N's are treated as matches to any IUB ambiguity
symbol. All matches score 1.9; all mismatches for IUB symbols score 0.0.

3. The transition weight option for aligning nucleotide sequences has been
changed from an on/off toggle to a weight between 0 and 1. A weight of zero
means that the transitions are scored as mismatches; a weight of 1 gives
transitions the full match score. For distantly related DNA sequences, the
weight should be near to zero; for closely related sequences it can be useful
to assign a higher score.

4. The RSF sequence alignment file format used by GCG Version 9 can now be
read.

5. The clustal sequence alignment file format has been changed to allow
sequence names longer than 10 characters. The maximum length allowed is set in
clustalw.h by the statement:
        #define MAXNAMES 10

For the FASTA format, the name is taken as the first string after the '>'
character, stopping at the first white space. (Previously, the first 10
characters were taken, replacing blanks by underscores).

6. The bootstrap values written in the PHYLIP tree file format can be assigned
either to branches or nodes. The default is to write the values on the nodes,
as this can be read by several commonly-used tree display programs. Note,
however, that this can lead to confusion if the tree is rooted, and the
bootstraps may be better attached to the internal branches. Software developers
should ensure they can read the branch label format.

7. The sequence weighting used during sequence to profile alignments has been
changed. The tree weight is now multiplied by the percent identity of the
new sequence compared with the most closely related sequence in the profile.

8. The sequence weighting used during profile to profile alignments has been
changed. A guide tree is now built for each profile separately and the
sequence weights calculated from the two trees. The weights for each
sequence are then multiplied by the percent identity of the sequence compared
with the most closely related sequence in the opposite profile.

9. The adjustment of the Gap Opening and Gap Extension Penalties for sequences
of unequal length has been improved.

10. The default order of the sequences in the output alignment file has been
changed. Previously the default was to output the sequences in the same order
as the input file. Now the default is to use the order in which the sequences
were aligned (from the guide tree/dendrogram), thus automatically grouping
closely related sequences.

11. The option to 'Reset Gaps between alignments' has been switched off by
default.

12. The conservation line output in the clustal format alignment file has been
changed. Three characters are now used:
'*' indicates positions which have a single, fully conserved residue
':' indicates that one of the following 'strong' groups is fully conserved:-
        STA
        NEQK
        NHQK
        NDEQ
        QHRK
        MILV
        MILF
        HY
        FYW

'.' indicates that one of the following 'weaker' groups is fully conserved:-
        CSA
        ATV
        SAG
        STNK
        STPA
        SGND
        SNDEQK
        NDEQHK
        NEQHRK
        FVLIM
        HFY

These are all the positively scoring groups that occur in the Gonnet Pam250
matrix. The strong groups are those with a score > 0.5 and the weak groups
those with a score <= 0.5.

13. A bug in the modification of the Myers and Miller alignment algorithm
for residue-specific gap penalties has been fixed. This occasionally caused
new gaps to be opened a few residues away from the optimal position.

14. The GCG/MSF input format no longer needs the word PILEUP on the first
line. Several versions can now be recognised:-
        1. The word PILEUP as the first word in the file
        2. The word !!AA_MULTIPLE_ALIGNMENT or !!NA_MULTIPLE_ALIGNMENT
           as the first word in the file
        3. The characters MSF on the first line of the file, and the
           characters .. at the end of that line.

15. The standard command line separator for UNIX systems has been changed from
'/' to '-', i.e. to give options on the command line, you now type

        clustalw input.aln -gapopen=8.0

instead of

        clustalw input.aln /gapopen=8.0

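The input rules in items 5 and 14 above can be sketched in a few lines. This is
a hedged Python illustration, not the parser used by Clustal W: the function
names fasta_name and looks_like_msf are hypothetical, and the sketch assumes
that the 'first word in the file' sits on the first line passed in.

```python
def fasta_name(header_line):
    """Name from a FASTA header: text after '>' up to the first white space."""
    if not header_line.startswith(">"):
        raise ValueError("not a FASTA header line")
    fields = header_line[1:].split()
    return fields[0] if fields else ""


# Marker words accepted as the first word in the file (item 14, rules 1 and 2).
MSF_FIRST_WORDS = ("PILEUP", "!!AA_MULTIPLE_ALIGNMENT", "!!NA_MULTIPLE_ALIGNMENT")


def looks_like_msf(first_line):
    """Apply the three recognition rules of item 14 to a file's first line."""
    words = first_line.split()
    if not words:
        return False
    if words[0] in MSF_FIRST_WORDS:
        return True
    # Rule 3: 'MSF' somewhere on the first line and '..' at the end of it.
    return "MSF" in first_line and first_line.rstrip().endswith("..")
```

For example, fasta_name(">seq1 human protein") returns "seq1", since the name
stops at the first white space.
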
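The DNA scoring described in items 2 and 3 above can be illustrated as follows.
This Python sketch is an illustration under stated assumptions rather than the
Clustal W implementation: the name dna_pair_score is hypothetical, only the
bases A, C, G and T plus N and X are handled, and weights strictly between 0
and 1 are assumed to scale the transition score linearly between the mismatch
and match scores, which this README does not specify.

```python
MATCH_SCORE = 1.9      # item 2: all matches score 1.9
MISMATCH_SCORE = 0.0   # item 2: all mismatches score 0.0

# Transitions are purine<->purine (A/G) or pyrimidine<->pyrimidine (C/T) changes.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def dna_pair_score(a, b, transition_weight):
    """Score one pair of bases under the rules sketched above."""
    if not 0.0 <= transition_weight <= 1.0:
        raise ValueError("transition weight must lie between 0 and 1")
    a, b = a.upper(), b.upper()
    # X and N are treated as matches to any symbol (item 2).
    if a == b or a in ("N", "X") or b in ("N", "X"):
        return MATCH_SCORE
    if frozenset((a, b)) in TRANSITIONS:
        # Assumption: linear scaling between the two documented end points
        # (weight 0 = mismatch score, weight 1 = full match score).
        return MISMATCH_SCORE + transition_weight * (MATCH_SCORE - MISMATCH_SCORE)
    return MISMATCH_SCORE
```

With a weight of 0 an A/G pair scores as a mismatch, and with a weight of 1 it
receives the full match score, matching the two end points given in item 3.
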
ATTENTION SOFTWARE DEVELOPERS!!
-------------------------------

The CLUSTAL sequence alignment output format was modified from version 1.7:

1. Names longer than 10 chars are now allowed. (The maximum is specified in
clustalw.h by '#define MAXNAMES'.)

2. The consensus line now consists of three characters: '*', ':' and '.'. (Only
the '*' and '.' were previously used.)

3. An option (not the default) has been added, allowing the user to print out
sequence numbers at the end of each line of the alignment output.

4. Both RNA bases (U) and base ambiguities are now supported in nucleic acid
sequences. In the past, all characters (upper or lower case) other than
a,c,g,t or u were converted to N. Now the following characters are recognised
and retained in the alignment output: ABCDGHKMNRSTUVWXY (upper or lower case).

5. A blank line inadvertently added in the version 1.6 header has been taken
out again.

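Developers who parse or generate the consensus line (item 2 above) may find a
sketch useful. The Python code below derives the symbol for each column from
the strong and weak groups listed under 'Changes since version 1.6', item 12.
It is a minimal illustration, not the Clustal W code: the function names are
hypothetical, and a column containing a gap character '-' is assumed to
receive a blank.

```python
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF",
                 "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK",
               "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def conservation_symbol(column):
    """Return '*', ':', '.' or ' ' for one alignment column (a string)."""
    residues = set(column.upper())
    if not residues or "-" in residues:
        # Assumption: empty columns and columns with gaps get no symbol.
        return " "
    if len(residues) == 1:
        return "*"  # a single, fully conserved residue
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"  # all residues fall inside one 'strong' group
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."  # all residues fall inside one 'weaker' group
    return " "


def conservation_line(sequences):
    """Build the conservation line for equal-length aligned sequences."""
    return "".join(conservation_symbol("".join(col)) for col in zip(*sequences))
```

For example, conservation_line(["MKTA", "MRSA", "MKTG"]) returns "*::.": the
first column is fully conserved, K/R fall in QHRK, S/T fall in STA, and A/G
fall only in the weaker group SAG.
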
CLUSTAL REFERENCES
------------------

Details of algorithms, implementation and useful tips on usage of Clustal
programs can be found in the following publications:

Jeanmougin,F., Thompson,J.D., Gouy,M., Higgins,D.G. and Gibson,T.J. (1998)
Multiple sequence alignment with Clustal X. Trends Biochem Sci, 23, 403-5.

Thompson,J.D., Gibson,T.J., Plewniak,F., Jeanmougin,F. and Higgins,D.G. (1997)
The ClustalX windows interface: flexible strategies for multiple sequence
alignment aided by quality analysis tools. Nucleic Acids Research, 25:4876-4882.

Higgins,D.G., Thompson,J.D. and Gibson,T.J. (1996) Using CLUSTAL for
multiple sequence alignments. Methods Enzymol., 266, 383-402.

Thompson,J.D., Higgins,D.G. and Gibson,T.J. (1994) CLUSTAL W: improving the
sensitivity of progressive multiple sequence alignment through sequence
weighting, position-specific gap penalties and weight matrix choice. Nucleic
Acids Research, 22:4673-4680.

Higgins,D.G., Bleasby,A.J. and Fuchs,R. (1992) CLUSTAL V: improved software for
multiple sequence alignment. CABIOS 8,189-191.

Higgins,D.G. and Sharp,P.M. (1989) Fast and sensitive multiple sequence
alignments on a microcomputer. CABIOS 5,151-153.

Higgins,D.G. and Sharp,P.M. (1988) CLUSTAL: a package for performing multiple
sequence alignment on a microcomputer. Gene 73,237-244.


source: trunk/GDE/CLUSTALW/README
