source: branches/profile/GDE/CLUSTALW/clustalw_help

Last change on this file was 10842, checked in by westram, 10 years ago
This is the on-line help file for CLUSTAL W (version 1.83).
3
4It should be named or defined as: clustalw_help
5except with MSDOS in which case it should be named CLUSTALW.HLP
6
7For full details of usage and algorithms, please read the CLUSTALW.DOC file.
8
9
10Toby  Gibson                         EMBL, Heidelberg, Germany.
11Des   Higgins                        UCC, Cork, Ireland.
12Julie Thompson                       IGBMC, Strasbourg, France.
13
14
15
16>>NEW <<
17
18  Fasta output
19  ===========
20
  Write/Read sequences with a range specified. The command line syntax
   for range specification is flexible. You can use any one of the following
   forms.

       -range=n:m
       -range=n-m
       -range="n m"

   where n is the starting position and m is the length of the sequence range.
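
   The three forms above can be read by one small routine. The following is
   only a sketch in Python, not the parser used by CLUSTAL W; the function
   name parse_range is hypothetical, and the reading of the second number as
   a length follows the description above.

```python
import re

def parse_range(value):
    """Parse 'n:m', 'n-m' or 'n m' into a (start, length) pair of integers."""
    # One colon, hyphen or whitespace character separates the two numbers;
    # extra spaces around them are tolerated.
    match = re.fullmatch(r"\s*(\d+)\s*[:\-\s]\s*(\d+)\s*", value)
    if match is None:
        raise ValueError(f"unrecognised range: {value!r}")
    return int(match.group(1)), int(match.group(2))
```

   All three spellings of the same range give the same pair, so the rest of
   a program does not need to know which form was typed.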
30
31  Range and range numbers.
32  =======================
33
  Include range numbers in the output.
35
36       -seqno_range=on/off
37
  The sequence range will be appended to the names of the sequences.
39
40
41  PIM: Percentage Identity Matrix
42  ===============================
43
44
45
46>>HELP 1 <<             General help for CLUSTAL W (1.81)
47
48Clustal W is a general purpose multiple alignment program for DNA or proteins.
49
50SEQUENCE INPUT:  all sequences must be in 1 file, one after another. 
517 formats are automatically recognised: NBRF-PIR, EMBL-SWISSPROT,
52Pearson (Fasta), Clustal (*.aln), GCG-MSF (Pileup), GCG9-RSF and GDE flat file.
53All non-alphabetic characters (spaces, digits, punctuation marks) are ignored
54except "-" which is used to indicate a GAP ("." in MSF-RSF). 
55
56To do a MULTIPLE ALIGNMENT on a set of sequences, use item 1 from this menu to
57INPUT them; go to menu item 2 to do the multiple alignment.
58
59PROFILE ALIGNMENTS (menu item 3) are used to align 2 alignments.  Use this to
60add a new sequence to an old alignment, or to use secondary structure to guide
61the alignment process.  GAPS in the old alignments are indicated using the "-"
62character.   PROFILES can be input in ANY of the allowed formats; just
63use "-" (or "." for MSF-RSF) for each gap position.
64
65PHYLOGENETIC TREES (menu item 4) can be calculated from old alignments (read in
66with "-" characters to indicate gaps) OR after a multiple alignment while the
67alignment is still in memory.
68
69
70The program tries to automatically recognise the different file formats used
71and to guess whether the sequences are amino acid or nucleotide.  This is not
72always foolproof.
73
74FASTA and NBRF-PIR formats are recognised by having a ">" as the first
75character in the file. 
76
77EMBL-Swiss Prot formats are recognised by the letters
78ID at the start of the file (the token for the entry name field). 
79
80CLUSTAL format is recognised by the word CLUSTAL at the beginning of the file.
81
82GCG-MSF format is recognised by one of the following:
83       - the word PileUp at the start of the file.
84       - the word !!AA_MULTIPLE_ALIGNMENT or !!NA_MULTIPLE_ALIGNMENT
85         at the start of the file.
       - the word MSF on the first line of the file, and the characters ..
         at the end of this line.
88
89GCG-RSF format is recognised by the word !!RICH_SEQUENCE at the beginning of
90the file.
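
The recognition rules above can be sketched as a chain of tests on the first
line of the file. This is an illustration in Python, not the code used by
CLUSTAL W; the function name guess_format, the order of the tests and the
skipping of leading blank lines are assumptions. GDE flat files are not
covered because no rule for them is given above.

```python
def guess_format(text):
    """Guess the input format from the first non-blank line of a file."""
    first = next((ln for ln in text.splitlines() if ln.strip()), "")
    start = first.lstrip()
    if start.startswith(">"):
        # FASTA and NBRF-PIR share the same first character.
        return "FASTA or NBRF-PIR"
    if start.startswith("ID"):
        return "EMBL-SWISSPROT"
    if start.startswith("CLUSTAL"):
        return "CLUSTAL"
    if start.startswith("!!RICH_SEQUENCE"):
        return "GCG-RSF"
    if start.startswith(("PileUp", "!!AA_MULTIPLE_ALIGNMENT",
                         "!!NA_MULTIPLE_ALIGNMENT")):
        return "GCG-MSF"
    if "MSF" in first and first.rstrip().endswith(".."):
        # MSF header line: the word MSF plus a trailing ".."
        return "GCG-MSF"
    return "unknown"
```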
91
92
93If 85% or more of the characters in the sequence are from A,C,G,T,U or N, the
94sequence will be assumed to be nucleotide.  This works in 97.3% of cases
95but watch out!
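
The 85% rule can be written in a few lines. The sketch below is in Python and
is only an illustration; the function name looks_like_nucleotide is
hypothetical, and non-alphabetic characters are skipped as described under
SEQUENCE INPUT.

```python
def looks_like_nucleotide(sequence, threshold=0.85):
    """Return True if at least `threshold` of the letters are A, C, G, T, U or N."""
    residues = [c.upper() for c in sequence if c.isalpha()]
    if not residues:
        return False  # nothing to judge; treating this as "not nucleotide" is an assumption
    hits = sum(1 for c in residues if c in "ACGTUN")
    return hits / len(residues) >= threshold
```

A short protein that happens to be rich in A, C, G and T would be misclassified
by this test, which is why the sequence type can also be set explicitly.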
96
97>>HELP 2 <<      Help for multiple alignments
98
99If you have already loaded sequences, use menu item 1 to do the complete
100multiple alignment.  You will be prompted for 2 output files: 1 for the
101alignment itself; another to store a dendrogram that describes the similarity
102of the sequences to each other.
103
104Multiple alignments are carried out in 3 stages (automatically done from menu
105item 1 ...Do complete multiple alignments now):
106
1071) all sequences are compared to each other (pairwise alignments);
108
1092) a dendrogram (like a phylogenetic tree) is constructed, describing the
110approximate groupings of the sequences by similarity (stored in a file).
111
1123) the final multiple alignment is carried out, using the dendrogram as a guide.
113
114
115PAIRWISE ALIGNMENT parameters control the speed-sensitivity of the initial
116alignments.
117
118MULTIPLE ALIGNMENT parameters control the gaps in the final multiple alignments.
119
120
121RESET GAPS (menu item 7) will remove any new gaps introduced into the sequences
122during multiple alignment if you wish to change the parameters and try again.
123This only takes effect just before you do a second multiple alignment.  You
124can make phylogenetic trees after alignment whether or not this is ON.
125If you turn this OFF, the new gaps are kept even if you do a second multiple
126alignment. This allows you to iterate the alignment gradually.  Sometimes, the
127alignment is improved by a second or third pass.
128
129SCREEN DISPLAY (menu item 8) can be used to send the output alignments to the
130screen as well as to the output file.
131
132You can skip the first stages (pairwise alignments; dendrogram) by using an
133old dendrogram file (menu item 3); or you can just produce the dendrogram
134with no final multiple alignment (menu item 2).
135
136
OUTPUT FORMAT: Menu item 9 (format options) allows you to choose from 7
different alignment formats (CLUSTAL, GCG, NBRF-PIR, PHYLIP, GDE, NEXUS, and FASTA).
139
140
141>>HELP 3 <<      Help for pairwise alignment parameters
142A distance is calculated between every pair of sequences and these are used to
143construct the dendrogram which guides the final multiple alignment. The scores
144are calculated from separate pairwise alignments. These can be calculated using
1452 methods: dynamic programming (slow but accurate) or by the method of Wilbur
146and Lipman (extremely fast but approximate).
147
148You can choose between the 2 alignment methods using menu option 8.  The
149slow-accurate method is fine for short sequences but will be VERY SLOW for
150many (e.g. >100) long (e.g. >1000 residue) sequences.   
151
152SLOW-ACCURATE alignment parameters:
        These parameters do not have any effect on the speed of the alignments.
154They are used to give initial alignments which are then rescored to give percent
155identity scores.  These % scores are the ones which are displayed on the
156screen.  The scores are converted to distances for the trees.
157
1581) Gap Open Penalty:      the penalty for opening a gap in the alignment.
1592) Gap extension penalty: the penalty for extending a gap by 1 residue.
1603) Protein weight matrix: the scoring table which describes the similarity
161                          of each amino acid to each other.
1624) DNA weight matrix:     the scores assigned to matches and mismatches
163                          (including IUB ambiguity codes).
164
165
166FAST-APPROXIMATE alignment parameters:
167
168These similarity scores are calculated from fast, approximate, global alignments,
169which are controlled by 4 parameters.   2 techniques are used to make
170these alignments very fast: 1) only exactly matching fragments (k-tuples) are
171considered; 2) only the 'best' diagonals (the ones with most k-tuple matches)
172are used.
173
174K-TUPLE SIZE:  This is the size of exactly matching fragment that is used.
175INCREASE for speed (max= 2 for proteins; 4 for DNA), DECREASE for sensitivity.
176For longer sequences (e.g. >1000 residues) you may need to increase the default.
177
178GAP PENALTY:   This is a penalty for each gap in the fast alignments.  It has
little effect on the speed or sensitivity except for extreme values.
180
181TOP DIAGONALS: The number of k-tuple matches on each diagonal (in an imaginary
182dot-matrix plot) is calculated.  Only the best ones (with most matches) are
183used in the alignment.  This parameter specifies how many.  Decrease for speed;
184increase for sensitivity.
185
186WINDOW SIZE:  This is the number of diagonals around each of the 'best'
187diagonals that will be used.  Decrease for speed; increase for sensitivity.
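
The roles of K-TUPLE SIZE and TOP DIAGONALS can be illustrated with a small
Python sketch. It only counts exact k-tuple matches per diagonal and keeps the
best ones; it is not the Wilbur and Lipman implementation used by CLUSTAL W,
it leaves out the window and gap penalty steps, and the function name
top_diagonals is hypothetical.

```python
from collections import Counter

def top_diagonals(seq_a, seq_b, ktuple=2, top_diags=3):
    """Return the `top_diags` diagonals with the most exact k-tuple matches.

    A diagonal is identified by i - j, where i and j are the start positions
    of a matching k-tuple in seq_a and seq_b (as in a dot-matrix plot).
    """
    # Index every k-tuple of seq_b by its start positions.
    positions = {}
    for j in range(len(seq_b) - ktuple + 1):
        positions.setdefault(seq_b[j:j + ktuple], []).append(j)

    # Count matches on each diagonal.
    counts = Counter()
    for i in range(len(seq_a) - ktuple + 1):
        for j in positions.get(seq_a[i:i + ktuple], []):
            counts[i - j] += 1

    return counts.most_common(top_diags)
```

A larger ktuple gives fewer, more specific matches (faster, less sensitive);
a larger top_diags keeps more candidate diagonals (slower, more sensitive).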
188
189
190>>HELP 4 <<      Help for multiple alignment parameters
191
192These parameters control the final multiple alignment. This is the core of the
193program and the details are complicated. To fully understand the use of the
194parameters and the scoring system, you will have to refer to the documentation.
195
196Each step in the final multiple alignment consists of aligning two alignments
197or sequences.  This is done progressively, following the branching order in
198the GUIDE TREE.  The basic parameters to control this are two gap penalties and
199the scores for various identical-non-identical residues. 
200
2011) and ..
202
2032) The GAP PENALTIES are set by menu items 1 and 2. These control the
204cost of opening up every new gap and the cost of every item in a gap.
205Increasing the gap opening penalty will make gaps less frequent. Increasing
206the gap extension penalty will make gaps shorter. Terminal gaps are not
207penalised.
208
2093) The DELAY DIVERGENT SEQUENCES switch delays the alignment of the most
210distantly related sequences until after the most closely related sequences have
211been aligned.   The setting shows the percent identity level required to delay
212the addition of a sequence; sequences that are less identical than this level
213to any other sequences will be aligned later.
214
215
216
2174) The TRANSITION WEIGHT gives transitions (A <--> G or C <--> T
218i.e. purine-purine or pyrimidine-pyrimidine substitutions) a weight between 0
219and 1; a weight of zero means that the transitions are scored as mismatches,
220while a weight of 1 gives the transitions the match score. For distantly related
221DNA sequences, the weight should be near to zero; for closely related sequences
222it can be useful to assign a higher score.
223
224
2255) PROTEIN WEIGHT MATRIX leads to a new menu where you are offered a choice of
226weight matrices. The default for proteins in version 1.8 is the PAM series
227derived by Gonnet and colleagues. Note, a series is used! The actual matrix
228that is used depends on how similar the sequences to be aligned at this
229alignment step are. Different matrices work differently at each evolutionary
230distance.
231
2326) DNA WEIGHT MATRIX leads to a new menu where a single matrix (not a series)
233can be selected. The default is the matrix used by BESTFIT for comparison of
234nucleic acid sequences.
235
236Further help is offered in the weight matrix menu.
237
238
2397)  In the weight matrices, you can use negative as well as positive values if
240you wish, although the matrix will be automatically adjusted to all positive
241scores, unless the NEGATIVE MATRIX option is selected.
242
2438) PROTEIN GAP PARAMETERS displays a menu allowing you to set some Gap Penalty
244options which are only used in protein alignments.
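
As an illustration of the TRANSITION WEIGHT (item 4 above), the Python sketch
below scores a pair of nucleotides. The two end points (weight 0 gives the
mismatch score, weight 1 gives the match score) come from the description
above; the linear interpolation in between, the match and mismatch values and
the function names are assumptions made for the example.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def is_transition(x, y):
    """True for A<->G or C<->T substitutions."""
    pair = {x.upper(), y.upper()}
    return len(pair) == 2 and (pair <= PURINES or pair <= PYRIMIDINES)

def pair_score(x, y, match=1.0, mismatch=0.0, trans_weight=0.5):
    """Score two nucleotides, giving transitions a share of the match score."""
    if x.upper() == y.upper():
        return match
    if is_transition(x, y):
        # Assumed: interpolate linearly between mismatch and match.
        return mismatch + trans_weight * (match - mismatch)
    return mismatch
```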
245
246 
247>>HELP A <<           Help for protein gap parameters.
2481) RESIDUE SPECIFIC PENALTIES are amino acid specific gap penalties that reduce
249or increase the gap opening penalties at each position in the alignment or
250sequence.  See the documentation for details.  As an example, positions that
251are rich in glycine are more likely to have an adjacent gap than positions that
252are rich in valine.
253
2542) [and ..]
255
2563) HYDROPHILIC GAP PENALTIES are used to increase the chances of a gap within
257a run (5 or more residues) of hydrophilic amino acids; these are likely to
258be loop or random coil regions where gaps are more common.  The residues that
259are "considered" to be hydrophilic are set by menu item 3.
260
2614) GAP SEPARATION DISTANCE tries to decrease the chances of gaps being too
262close to each other. Gaps that are less than this distance apart are penalised
263more than other gaps. This does not prevent close gaps; it makes them less
264frequent, promoting a block-like appearance of the alignment.
265
2665) END GAP SEPARATION treats end gaps just like internal gaps for the purposes
267of avoiding gaps that are too close (set by GAP SEPARATION DISTANCE above).
268If you turn this off, end gaps will be ignored for this purpose.  This is
269useful when you wish to align fragments where the end gaps are not biologically
270meaningful.
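
The runs of hydrophilic residues mentioned under HYDROPHILIC GAP PENALTIES
(item 3 above) can be located as in the following Python sketch. The residue
list GPSNDQEKR is assumed here for the example (the real list is whatever is
set by menu item 3 or -HGAPRESIDUES), and the function name hydrophilic_runs
is hypothetical. The sketch only finds the runs; how the penalty is lowered
inside them is described in the documentation.

```python
import re

def hydrophilic_runs(sequence, residues="GPSNDQEKR", min_run=5):
    """Return (start, end) pairs, end exclusive, of runs of hydrophilic residues."""
    pattern = re.compile("[" + re.escape(residues) + "]{" + str(min_run) + ",}")
    return [(m.start(), m.end()) for m in pattern.finditer(sequence.upper())]
```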
271>>HELP 5 <<      Help for output format options.
272
273Six output formats are offered. You can choose any (or all 6 if you wish). 
274
275CLUSTAL format output is a self explanatory alignment format.  It shows the
276sequences aligned in blocks.  It can be read in again at a later date to
277(for example) calculate a phylogenetic tree or add a new sequence with a
278profile alignment.
279
280GCG output can be used by any of the GCG programs that can work on multiple
281alignments (e.g. PRETTY, PROFILEMAKE, PLOTALIGN).  It is the same as the GCG
282.msf format files (multiple sequence file); new in version 7 of GCG.
283
284PHYLIP format output can be used for input to the PHYLIP package of Joe
285Felsenstein.  This is an extremely widely used package for doing every
imaginable form of phylogenetic analysis (MUCH more than the modest
287introduction offered by this program).
288
289NBRF-PIR:  this is the same as the standard PIR format with ONE ADDITION.  Gap
290characters "-" are used to indicate the positions of gaps in the multiple
291alignment.  These files can be re-used as input in any part of clustal that
292allows sequences (or alignments or profiles) to be read in. 
293
294GDE:  this is the flat file format used by the GDE package of Steven Smith.
295
296NEXUS: the format used by several phylogeny programs, including PAUP and
297MacClade.
298
299GDE OUTPUT CASE: sequences in GDE format may be written in either upper or
300lower case.
301
302CLUSTALW SEQUENCE NUMBERS: residue numbers may be added to the end of the
303alignment lines in clustalw format.
304
305OUTPUT ORDER is used to control the order of the sequences in the output
306alignments.  By default, the order corresponds to the order in which the
307sequences were aligned (from the guide tree-dendrogram), thus automatically
308grouping closely related sequences. This switch can be used to set the order
309to the same as the input file.
310
311PARAMETER OUTPUT: This option allows you to save all your parameter settings
312in a parameter file. This file can be used subsequently to rerun Clustal W
313using the same parameters.
314
315>>HELP 6 <<      Help for profile and structure alignments
316   
317By PROFILE ALIGNMENT, we mean alignment using existing alignments. Profile
318alignments allow you to store alignments of your favourite sequences and add
319new sequences to them in small bunches at a time. A profile is simply an
320alignment of one or more sequences (e.g. an alignment output file from CLUSTAL
321W). Each input can be a single sequence. One or both sets of input sequences
322may include secondary structure assignments or gap penalty masks to guide the
323alignment.
324
325The profiles can be in any of the allowed input formats with "-" characters
326used to specify gaps (except for MSF-RSF where "." is used).
327
328You have to specify the 2 profiles by choosing menu items 1 and 2 and giving
3292 file names.  Then Menu item 3 will align the 2 profiles to each other.
330Secondary structure masks in either profile can be used to guide the alignment.
331
332Menu item 4 will take the sequences in the second profile and align them to
333the first profile, 1 at a time.  This is useful to add some new sequences to
334an existing alignment, or to align a set of sequences to a known structure. 
335In this case, the second profile would not be pre-aligned.
336
337
338The alignment parameters can be set using menu items 5, 6 and 7. These are
339EXACTLY the same parameters as used by the general, automatic multiple
340alignment procedure. The general multiple alignment procedure is simply a
341series of profile alignments. Carrying out a series of profile alignments on
342larger and larger groups of sequences, allows you to manually build up a
343complete alignment, if necessary editing intermediate alignments.
344
345SECONDARY STRUCTURE OPTIONS. Menu Option 0 allows you to set 2D structure
346parameters. If a solved structure is available, it can be used to guide the
347alignment by raising gap penalties within secondary structure elements, so
348that gaps will preferentially be inserted into unstructured surface loops.
349Alternatively, a user-specified gap penalty mask can be supplied directly.
350
351A gap penalty mask is a series of numbers between 1 and 9, one per position in
352the alignment. Each number specifies how much the gap opening penalty is to be
353raised at that position (raised by multiplying the basic gap opening penalty
354by the number) i.e. a mask figure of 1 at a position means no change
355in gap opening penalty; a figure of 4 means that the gap opening penalty is
356four times greater at that position, making gaps 4 times harder to open.
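
The effect of a mask on the gap opening penalty can be shown in a few lines of
Python. This is a sketch of the multiplication described above, not code from
CLUSTAL W; the function name masked_gap_open and the penalty value in the
usage note are hypothetical.

```python
def masked_gap_open(mask, gap_open):
    """Return the gap opening penalty at each position after applying a mask."""
    penalties = []
    for figure in mask:
        if figure not in "123456789":
            raise ValueError(f"mask figures must be 1-9, got {figure!r}")
        # A figure of 1 leaves the penalty unchanged; 4 makes it four times larger.
        penalties.append(gap_open * int(figure))
    return penalties
```

With a basic penalty of 10.0, the mask 1124 gives penalties of 10.0, 10.0,
20.0 and 40.0 at the four positions.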
357
358The format for gap penalty masks and secondary structure masks is explained
359in the help under option 0 (secondary structure options).
360>>HELP B <<      Help for secondary structure - gap penalty masks
361
362The use of secondary structure-based penalties has been shown to improve the
363accuracy of multiple alignment. Therefore CLUSTAL W now allows gap penalty
364masks to be supplied with the input sequences. The masks work by raising gap
365penalties in specified regions (typically secondary structure elements) so that
366gaps are preferentially opened in the less well conserved regions (typically
367surface loops).
368
369Options 1 and 2 control whether the input secondary structure information or
370gap penalty masks will be used.
371
372Option 3 controls whether the secondary structure and gap penalty masks should
373be included in the output alignment.
374
375Options 4 and 5 provide the value for raising the gap penalty at core Alpha
376Helical (A) and Beta Strand (B) residues. In CLUSTAL format, capital residues
377denote the A and B core structure notation. The basic gap penalties are
378multiplied by the amount specified.
379
380Option 6 provides the value for the gap penalty in Loops. By default this
381penalty is not raised. In CLUSTAL format, loops are specified by "." in the
382secondary structure notation.
383
384Option 7 provides the value for setting the gap penalty at the ends of
385secondary structures. Ends of secondary structures are observed to grow
386and-or shrink in related structures. Therefore by default these are given
387intermediate values, lower than the core penalties. All secondary structure
388read in as lower case in CLUSTAL format gets the reduced terminal penalty.
389
390Options 8 and 9 specify the range of structure termini for the intermediate
391penalties. In the alignment output, these are indicated as lower case.
392For Alpha Helices, by default, the range spans the end helical turn. For
393Beta Strands, the default range spans the end residue and the adjacent loop
394residue, since sequence conservation often extends beyond the actual H-bonded
395Beta Strand.
396
397CLUSTAL W can read the masks from SWISS-PROT, CLUSTAL or GDE format input
398files. For many 3-D protein structures, secondary structure information is
399recorded in the feature tables of SWISS-PROT database entries. You should
400always check that the assignments are correct - some are quite inaccurate.
401CLUSTAL W looks for SWISS-PROT HELIX and STRAND assignments e.g.
402
403FT   HELIX       100    115
404FT   STRAND      118    119
405
406The structure and penalty masks can also be read from CLUSTAL alignment format
407as comment lines beginning "!SS_" or "!GM_" e.g.
408
409!SS_HBA_HUMA    ..aaaAAAAAAAAAAaaa.aaaAAAAAAAAAAaaaaaaAaaa.........aaaAAAAAA
410!GM_HBA_HUMA    112224444444444222122244444444442222224222111111111222444444
411HBA_HUMA        VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGK
412
413Note that the mask itself is a set of numbers between 1 and 9 each of which is
414assigned to the residue(s) in the same column below.
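
The comment lines shown above can be picked out of a CLUSTAL format file with
a short routine. The Python sketch below is an illustration only; the function
name read_masks is hypothetical, and joining the pieces of a mask across
successive alignment blocks is an assumption about interleaved files.

```python
def read_masks(lines):
    """Collect !SS_ (structure) and !GM_ (gap penalty) masks by sequence name."""
    structure, penalty = {}, {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        label, data = fields[0], fields[1]
        if label.startswith("!SS_"):
            name = label[4:]
            structure[name] = structure.get(name, "") + data
        elif label.startswith("!GM_"):
            name = label[4:]
            penalty[name] = penalty.get(name, "") + data
    return structure, penalty
```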
415
416In GDE flat file format, the masks are specified as text and the names must
417begin with "SS_ or "GM_.
418
419Either a structure or penalty mask or both may be used. If both are included in
420an alignment, the user will be asked which is to be used.
421
422>>HELP C <<      Help for secondary structure - gap penalty mask output options
423   
424   The options in this menu let you choose whether or not to include the masks
425in the CLUSTAL W output alignments. Showing both is useful for understanding
426how the masks work. The secondary structure information is itself very useful
427in judging the alignment quality and in seeing how residue conservation
428patterns vary with secondary structure.
429
430
431>>HELP 7 <<      Help for phylogenetic trees
432
1) Before calculating a tree, you must have an ALIGNMENT in memory. This can be
input in any format, or it can be the alignment that is still in memory from a
full multiple alignment that you have just carried out.
436
437
438*************** Remember YOU MUST ALIGN THE SEQUENCES FIRST!!!! ***************
439
440
441The method used is the NJ (Neighbour Joining) method of Saitou and Nei. First
you calculate distances (percent divergence) between all pairs of sequences from
443a multiple alignment; second you apply the NJ method to the distance matrix.
444
4452) EXCLUDE POSITIONS WITH GAPS? With this option, any alignment positions where
446ANY of the sequences have a gap will be ignored. This means that 'like' will be
447compared to 'like' in all distances, which is highly desirable. It also
448automatically throws away the most ambiguous parts of the alignment, which are
449concentrated around gaps (usually). The disadvantage is that you may throw away
450much of the data if there are many gaps (which is why it is difficult for us to
451make it the default). 
452
453
454
4553) CORRECT FOR MULTIPLE SUBSTITUTIONS? For small divergence (say <10%) this
456option makes no difference. For greater divergence, it corrects for the fact
457that observed distances underestimate actual evolutionary distances. This is
458because, as sequences diverge, more than one substitution will happen at many
459sites. However, you only see one difference when you look at the present day
460sequences. Therefore, this option has the effect of stretching branch lengths
461in trees (especially long branches). The corrections used here (for DNA or
462proteins) are both due to Motoo Kimura. See the documentation for details. 
463
464Where possible, this option should be used. However, for VERY divergent
465sequences, the distances cannot be reliably corrected. You will be warned if
466this happens. Even if none of the distances in a data set exceed the reliable
467threshold, if you bootstrap the data, some of the bootstrap distances may
468randomly exceed the safe limit. 
469
4704) To calculate a tree, use option 4 (DRAW TREE NOW). This gives an UNROOTED
471tree and all branch lengths. The root of the tree can only be inferred by
472using an outgroup (a sequence that you are certain branches at the outside
473of the tree .... certain on biological grounds) OR if you assume a degree
474of constancy in the 'molecular clock', you can place the root in the 'middle'
475of the tree (roughly equidistant from all tips).
476
4775) TOGGLE PHYLIP BOOTSTRAP POSITIONS
478By default, the bootstrap values are correctly placed on the tree branches of
479the phylip format output tree. The toggle allows them to be placed on the
480nodes, which is incorrect, but some display packages (e.g. TreeTool, TreeView
481and Phylowin) only support node labelling but not branch labelling. Care
482should be taken to note which branches and labels go together.
483
4846) OUTPUT FORMATS: four different formats are allowed. None of these displays
485the tree visually. Useful display programs accepting PHYLIP format include
486NJplot (from Manolo Gouy and supplied with Clustal W), TreeView (Mac-PC), and
487PHYLIP itself - OR get the PHYLIP package and use the tree drawing facilities
488there. (Get the PHYLIP package anyway if you are interested in trees). The
489NEXUS format can be read into PAUP or MacClade.
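
The distance steps described in items 2 and 3 above can be sketched in Python
as follows. This is an illustration, not the code used by CLUSTAL W. The
function names are hypothetical. The correction shown is Kimura's empirical
formula for proteins, K = -ln(1 - D - D*D/5), where D is the observed fraction
of differing positions; the DNA correction is not shown, and the way very
large distances are handled is left to the documentation.

```python
import math

def gap_free_columns(alignment):
    """Indices of columns in which no sequence has a gap (EXCLUDE POSITIONS WITH GAPS)."""
    length = len(alignment[0])
    return [i for i in range(length) if all(seq[i] != "-" for seq in alignment)]

def divergence(a, b, columns):
    """Fraction of the given columns at which sequences a and b differ."""
    used = [i for i in columns if a[i] != "-" and b[i] != "-"]
    if not used:
        return 0.0
    differences = sum(1 for i in used if a[i] != b[i])
    return differences / len(used)

def kimura_protein(d):
    """Correct an observed protein distance d for multiple substitutions."""
    inner = 1.0 - d - 0.2 * d * d
    if inner <= 0.0:
        raise ValueError("distance too large to be corrected reliably")
    return -math.log(inner)
```

For small distances the corrected value is almost the same as the observed
one; for larger distances it is noticeably bigger, which is the stretching of
long branches mentioned above.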
490
491>>HELP 8 <<      Help for choosing a weight matrix
492
493For protein alignments, you use a weight matrix to determine the similarity of
494non-identical amino acids.  For example, Tyr aligned with Phe is usually judged
495to be 'better' than Tyr aligned with Pro.
496
497There are three 'in-built' series of weight matrices offered. Each consists of
498several matrices which work differently at different evolutionary distances. To
499see the exact details, read the documentation. Crudely, we store several
500matrices in memory, spanning the full range of amino acid distance (from almost
501identical sequences to highly divergent ones). For very similar sequences, it
502is best to use a strict weight matrix which only gives a high score to
503identities and the most favoured conservative substitutions. For more divergent
504sequences, it is appropriate to use "softer" matrices which give a high score
505to many other frequent substitutions.
506
5071) BLOSUM (Henikoff). These matrices appear to be the best available for
carrying out database similarity (homology) searches. The matrices used are:
509Blosum 80, 62, 45 and 30. (BLOSUM was the default in earlier Clustal W
510versions)
511
5122) PAM (Dayhoff). These have been extremely widely used since the late '70s.
513We use the PAM 20, 60, 120 and 350 matrices.
514
5153) GONNET. These matrices were derived using almost the same procedure as the
516Dayhoff one (above) but are much more up to date and are based on a far larger
517data set. They appear to be more sensitive than the Dayhoff series. We use the
518GONNET 80, 120, 160, 250 and 350 matrices. This series is the default for
519Clustal W version 1.8.
520
521We also supply an identity matrix which gives a score of 1.0 to two identical
522amino acids and a score of zero otherwise. This matrix is not very useful.
523Alternatively, you can read in your own (just one matrix, not a series).
524
525A new matrix can be read from a file on disk, if the filename consists only
526of lower case characters. The values in the new weight matrix must be integers
527and the scores should be similarities. You can use negative as well as positive
528values if you wish, although the matrix will be automatically adjusted to all
529positive scores.
530
531
532
533For DNA, a single matrix (not a series) is used. Two hard-coded matrices are
534available:
535
536
5371) IUB. This is the default scoring matrix used by BESTFIT for the comparison
538of nucleic acid sequences. X's and N's are treated as matches to any IUB
539ambiguity symbol. All matches score 1.9; all mismatches for IUB symbols score 0.
540 
541 
5422) CLUSTALW(1.6). The previous system used by Clustal W, in which matches score
5431.0 and mismatches score 0. All matches for IUB symbols also score 0.
544
545INPUT FORMAT  The format used for a new matrix is the same as the BLAST program.
546Any lines beginning with a # character are assumed to be comments. The first
547non-comment line should contain a list of amino acids in any order, using the
5481 letter code, followed by a * character. This should be followed by a square
549matrix of integer scores, with one row and one column for each amino acid. The
550last row and column of the matrix (corresponding to the * character) contain
551the minimum score over the whole matrix.
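
A file in this format can be read with a routine like the Python sketch below.
It is an illustration, not the reader used by CLUSTAL W; the function name
read_matrix is hypothetical. Files written for BLAST usually begin each row
with its own residue letter, so the sketch accepts rows with or without that
leading label, which is an assumption about the files you will meet.

```python
def read_matrix(lines):
    """Read a BLAST-style weight matrix into a {(row, column): score} dictionary."""
    rows = [ln.split() for ln in lines if ln.strip() and not ln.startswith("#")]
    if not rows:
        raise ValueError("no matrix data found")
    symbols = rows[0]          # residue letters, ending with "*"
    body = rows[1:]
    if len(body) != len(symbols):
        raise ValueError("matrix is not square")
    scores = {}
    for row_symbol, row in zip(symbols, body):
        if len(row) == len(symbols) + 1:
            row = row[1:]      # drop the leading residue label
        if len(row) != len(symbols):
            raise ValueError(f"wrong number of scores in row {row_symbol!r}")
        for col_symbol, value in zip(symbols, row):
            scores[(row_symbol, col_symbol)] = int(value)
    return scores
```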
552
553>>HELP 9 <<      Help for command line parameters
554                DATA (sequences)
555
556-INFILE=file.ext                             :input sequences.
557-PROFILE1=file.ext  and  -PROFILE2=file.ext  :profiles (old alignment).
558
559
560                VERBS (do things)
561
562-OPTIONS            :list the command line parameters
563-HELP  or -CHECK    :outline the command line params.
564-ALIGN              :do full multiple alignment
565-TREE               :calculate NJ tree.
566-BOOTSTRAP(=n)      :bootstrap a NJ tree (n= number of bootstraps; def. = 1000).
567-CONVERT            :output the input sequences in a different file format.
568
569
570                PARAMETERS (set things)
571
572***General settings:****
573-INTERACTIVE :read command line, then enter normal interactive menus
574-QUICKTREE   :use FAST algorithm for the alignment guide tree
575-TYPE=       :PROTEIN or DNA sequences
576-NEGATIVE    :protein alignment with negative values in matrix
577-OUTFILE=    :sequence alignment file name
578-OUTPUT=     :GCG, GDE, PHYLIP, PIR or NEXUS
579-OUTORDER=   :INPUT or ALIGNED
580-CASE        :LOWER or UPPER (for GDE output only)
581-SEQNOS=     :OFF or ON (for Clustal output only)
582-SEQNO_RANGE=:OFF or ON (NEW: for all output formats)
583-RANGE=m,n   :sequence range to write starting m to m+n.
584
585***Fast Pairwise Alignments:***
586-KTUPLE=n    :word size
587-TOPDIAGS=n  :number of best diags.
588-WINDOW=n    :window around best diags.
589-PAIRGAP=n   :gap penalty
590-SCORE       :PERCENT or ABSOLUTE
591
592
593***Slow Pairwise Alignments:***
594-PWMATRIX=    :Protein weight matrix=BLOSUM, PAM, GONNET, ID or filename
595-PWDNAMATRIX= :DNA weight matrix=IUB, CLUSTALW or filename
596-PWGAPOPEN=f  :gap opening penalty       
-PWGAPEXT=f   :gap extension penalty
598
599
600***Multiple Alignments:***
601-NEWTREE=      :file for new guide tree
602-USETREE=      :file for old guide tree
603-MATRIX=       :Protein weight matrix=BLOSUM, PAM, GONNET, ID or filename
604-DNAMATRIX=    :DNA weight matrix=IUB, CLUSTALW or filename
605-GAPOPEN=f     :gap opening penalty       
606-GAPEXT=f      :gap extension penalty
607-ENDGAPS       :no end gap separation pen.
608-GAPDIST=n     :gap separation pen. range
609-NOPGAP        :residue-specific gaps off 
610-NOHGAP        :hydrophilic gaps off
611-HGAPRESIDUES= :list hydrophilic res.   
612-MAXDIV=n      :% ident. for delay
613-TYPE=         :PROTEIN or DNA
614-TRANSWEIGHT=f :transitions weighting
615
616
617***Profile Alignments:***
618-PROFILE      :Merge two alignments by profile alignment
619-NEWTREE1=    :file for new guide tree for profile1
620-NEWTREE2=    :file for new guide tree for profile2
621-USETREE1=    :file for old guide tree for profile1
622-USETREE2=    :file for old guide tree for profile2
623
624
625***Sequence to Profile Alignments:***
626-SEQUENCES   :Sequentially add profile2 sequences to profile1 alignment
627-NEWTREE=    :file for new guide tree
628-USETREE=    :file for old guide tree
629
630
631***Structure Alignments:***
632-NOSECSTR1     :do not use secondary structure-gap penalty mask for profile 1
633-NOSECSTR2     :do not use secondary structure-gap penalty mask for profile 2
634-SECSTROUT=STRUCTURE or MASK or BOTH or NONE   :output in alignment file
635-HELIXGAP=n    :gap penalty for helix core residues
636-STRANDGAP=n   :gap penalty for strand core residues
637-LOOPGAP=n     :gap penalty for loop regions
638-TERMINALGAP=n :gap penalty for structure termini
639-HELIXENDIN=n  :number of residues inside helix to be treated as terminal
640-HELIXENDOUT=n :number of residues outside helix to be treated as terminal
641-STRANDENDIN=n :number of residues inside strand to be treated as terminal
642-STRANDENDOUT=n:number of residues outside strand to be treated as terminal
643
644
645***Trees:***
646-OUTPUTTREE=nj OR phylip OR dist OR nexus
647-SEED=n        :seed number for bootstraps.
648-KIMURA        :use Kimura's correction.   
649-TOSSGAPS      :ignore positions with gaps.
650-BOOTLABELS=node OR branch :position of bootstrap values in tree display
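
When CLUSTAL W is driven from a script, the parameters above are simply joined
into one command line. The Python sketch below builds such a command as a list
of arguments. The program name clustalw, the file name and the parameter
values are assumptions made for the example; the function name
clustalw_command is hypothetical.

```python
def clustalw_command(infile, program="clustalw", **options):
    """Build an argument list from -INFILE plus keyword options.

    A value of True gives a bare switch (e.g. -ALIGN); any other value
    gives -NAME=value (e.g. -OUTPUT=PHYLIP).
    """
    args = [program, f"-INFILE={infile}"]
    for name, value in options.items():
        flag = "-" + name.upper()
        if value is True:
            args.append(flag)
        else:
            args.append(f"{flag}={value}")
    return args

# Hypothetical run: full alignment, PHYLIP output, sequences kept in input order.
cmd = clustalw_command("seqs.fasta", align=True, output="PHYLIP",
                       outorder="INPUT")
```

The resulting list can be handed to subprocess.run(cmd) if the program is
installed under that name.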
651
652>>HELP 0 <<           Help for tree output format options
653
654Four output formats are offered:
655        1) Clustal,
656        2) Phylip,
657        3) Just the distances
658        4) Nexus
659
660None of these formats displays the results graphically. Many packages can
display trees in the PHYLIP format (2 below). It can also be imported into
662the PHYLIP programs RETREE, DRAWTREE and DRAWGRAM for graphical display.
663NEXUS format trees can be read by PAUP and MacClade.
664
6651) Clustal format output.
666
667        This format is verbose and lists all of the distances between the sequences and
668        the number of alignment positions used for each. The tree is described at the
669        end of the file. It lists the sequences that are joined at each alignment step
        and the branch lengths. After two sequences are joined, they are referred to
        later as a NODE. The number of a NODE is the number of the lowest sequence in that
672        NODE.   
673
6742) Phylip format output.
675
676        This format is the New Hampshire format, used by many phylogenetic analysis
677        packages. It consists of a series of nested parentheses, describing the
678        branching order, with the sequence names and branch lengths. It can be used by
679        the RETREE, DRAWGRAM and DRAWTREE programs of the PHYLIP package to see the
680        trees graphically. This is the same format used during multiple alignment for
681        the guide trees.
682       
683        Use this format with NJplot (Manolo Gouy), supplied with Clustal W. Some other
684        packages that can read and display New Hampshire format are TreeView (Mac/PC),
685        TreeTool (UNIX), and Phylowin.
686
6873) The distances only.
688
689        This format just outputs a matrix of all the pairwise distances in a format
690        that can be used by the Phylip package. It used to be useful when one could not
691        produce distances from protein sequences in the Phylip package but is now
692        redundant (Protdist of Phylip 3.5 now does this).
693
6944) NEXUS FORMAT TREE.
695
696        This format is used by several popular phylogeny programs,
697        including PAUP and MacClade. The format is described fully in:
698        Maddison, D. R., D. L. Swofford and W. P. Maddison.  1997.
699        NEXUS: an extensible file format for systematic information.
700        Systematic Biology 46:590-621.
701
7025) TOGGLE PHYLIP BOOTSTRAP POSITIONS
703
        By default, the bootstrap values are placed on the tree branches of the phylip
        format output tree. This is the correct placement, as the bootstrap values are
        associated with the tree branches and not the nodes. However, some display
        packages (e.g. TreeTool, TreeView and Phylowin) only support node labelling.
        The toggle allows the bootstrap values to be placed on the nodes so that these
        packages can read and display them.
710