source: branches/alilink/READSEQ/Readseq.help

Last change on this file was 10842, checked in by westram, 11 years ago
 * ReadSeq.Help -- 30 Dec 92
 *
 * Reads and writes nucleic/protein sequences in various
 * formats. Data files may have multiple sequences.
 *
 * Copyright 1990 by d.g.gilbert
 * biology dept., indiana university, bloomington, in 47405
 * e-mail: gilbertd@bio.indiana.edu
 *
 * This program may be freely copied and used by anyone.
 * Developers are encouraged to incorporate parts in their
 * programs, rather than devise their own private sequence
 * format.
 *
 * This should compile and run with any ANSI C compiler.
 * Please advise me of any bugs, additions or corrections.

Readseq is particularly useful as it automatically detects many
sequence formats, and interconverts among them.

Formats which readseq currently understands:

  * IG/Stanford, used by Intelligenetics and others
  * GenBank/GB, genbank flatfile format
  * NBRF format
  * EMBL, EMBL flatfile format
  * GCG, single sequence format of GCG software
  * DNAStrider, for common Mac program
  * Fitch format, limited use
  * Pearson/Fasta, a common format used by Fasta programs and others
  * Zuker format, limited use. Input only.
  * Olsen, format printed by Olsen VMS sequence editor. Input only.
  * Phylip3.2, sequential format for Phylip programs
  * Phylip, interleaved format for Phylip programs (v3.3, v3.4)
  * Plain/Raw, sequence data only (no name, document, numbering)
  + MSF multi sequence format used by GCG software
  + PAUP's multiple sequence (NEXUS) format
  + PIR/CODATA format used by PIR
  + ASN.1 format used by NCBI
  + Pretty print with various options for nice looking output. Output only.

See the included "Formats" file for detail on file formats.


Example usage:
  readseq
      -- for interactive use

  readseq my.1st.seq  my.2nd.seq  -all  -format=genbank  -output=my.gb
      -- convert all of two input files to one genbank format output file

  readseq my.seq -all -form=pretty -nameleft=3 -numleft -numright -numtop -match
      -- output to standard output a file in a pretty format

  readseq my.seq -item=9,8,3,2 -degap -CASE -rev -f=msf -out=my.rev
      -- select 4 items from input, degap, reverse, and uppercase them

  cat *.seq | readseq -pipe -all -format=asn > bunch-of.asn
      -- pipe a bunch of data thru readseq, converting all to asn
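
To make the -degap, -rev and -CASE example above concrete, here is a
rough Python sketch of what those three transformations do to a single
sequence string.  The function names are made up for illustration, the
order of operations is an assumption, and IUPAC ambiguity codes are
left untouched; readseq's own C routines may well differ.

```python
# Hypothetical helpers for illustration only; not readseq's actual code.
# Only plain A/C/G/T/U bases are complemented; other symbols pass through.
_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def degap(seq, gap="-"):
    """Remove every occurrence of the gap symbol (like -degap[=-])."""
    return seq.replace(gap, "")


def reverse_complement(seq):
    """Complement each base, then reverse the string (like -reverse)."""
    return seq.translate(_COMPLEMENT)[::-1]


def transform(seq):
    """Degap, reverse-complement, then uppercase (like -degap -rev -CASE)."""
    return reverse_complement(degap(seq)).upper()


print(transform("ac-gt--tga"))  # prints TCAACGT
```

The gap symbol is a parameter because the usage text allows another
symbol to be given with -degap=.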


The brief usage of readseq is as follows. The "[]" denote
optional parts of the syntax:

readseq -help
readSeq (27Dec92), multi-format molbio sequence reader.
usage: readseq [-options] in.seq > out.seq
 options
    -a[ll]         select All sequences
    -c[aselower]   change to lower case
    -C[ASEUPPER]   change to UPPER CASE
    -degap[=-]     remove gap symbols
    -i[tem=2,3,4]  select Item number(s) from several
    -l[ist]        List sequences only
    -o[utput=]out.seq  redirect Output
    -p[ipe]        Pipe (command line, <stdin, >stdout)
    -r[everse]     change to Reverse-complement
    -v[erbose]     Verbose progress
    -f[ormat=]#    Format number for output,  or
    -f[ormat=]Name Format name for output:
         1. IG/Stanford           10. Olsen (in-only)
         2. GenBank/GB            11. Phylip3.2
         3. NBRF                  12. Phylip
         4. EMBL                  13. Plain/Raw
         5. GCG                   14. PIR/CODATA
         6. DNAStrider            15. MSF
         7. Fitch                 16. ASN.1
         8. Pearson/Fasta         17. PAUP
         9. Zuker                 18. Pretty (out-only)

   Pretty format options:
    -wid[th]=#            sequence line width
    -tab=#                left indent
    -col[space]=#         column space within sequence line on output
    -gap[count]           count gap chars in sequence numbers
    -nameleft, -nameright[=#]   name on left/right side [=max width]
    -nametop              name at top/bottom
    -numleft, -numright   seq index on left/right side
    -numtop, -numbot      index on top/bottom
    -match[=.]            use match base for 2..n species
    -inter[line=#]        blank line(s) between sequence blocks
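
As an illustration of the -match pretty option listed above, the
following Python sketch replaces each base in sequences 2..n that is
identical to the base at the same position in the first sequence with
the match character.  The function name apply_match is hypothetical,
and the case-sensitive comparison and the treatment of gaps are
assumptions; readseq itself may handle these differently.

```python
def apply_match(seqs, match_char="."):
    """Mask positions in sequences 2..n that equal the first sequence.

    Assumption for this sketch: comparison is exact (case-sensitive) and
    gap symbols are compared like any other character.
    """
    if not seqs:
        return []
    ref = seqs[0]
    out = [ref]  # the first sequence is always shown in full
    for seq in seqs[1:]:
        masked = "".join(
            match_char if i < len(ref) and c == ref[i] else c
            for i, c in enumerate(seq)
        )
        out.append(masked)
    return out


for row in apply_match(["ACGTAC", "ACGAAC", "TCGTAC"]):
    print(row)
# prints:
# ACGTAC
# ...A..
# T.....
```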


Notes:

In use, readseq will respond to command line arguments, or to
interactive use.  Command line arguments cannot be combined
but must each follow a switch character (-).  In this release,
the command line options are now words, with an equals (=)
to separate parameter(s) from the command.  Unlike the usual
Unix convention, you cannot put a space between a command and
its parameter (this is to preserve compatibility with VMS).
The command line syntax of the earlier versions is still
supported.
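
The word=value rule can be sketched in a few lines of Python.  This is
only an illustration of the syntax described above, not readseq's
parser: the names parse_option and parse_args are invented, option
abbreviations are not expanded, and the older single-letter syntax
(such as -f2) is not handled.

```python
def parse_option(arg):
    """Split one argument such as '-format=genbank' into (name, value).

    Returns (None, arg) when the argument does not start with the switch
    character, i.e. when it is taken to be an input file name.
    """
    if not arg.startswith("-"):
        return None, arg
    name, sep, value = arg[1:].partition("=")
    return name, (value if sep else None)


def parse_args(argv):
    """Collect options into a dict and everything else into a file list."""
    options, files = {}, []
    for arg in argv:
        name, value = parse_option(arg)
        if name is None:
            files.append(value)
        else:
            options[name] = value
    return options, files


print(parse_args(["my.seq", "-all", "-format=genbank", "-output=my.gb"]))
```

With this scheme, writing "-format genbank" (with a space) leaves the
format option without a value and treats "genbank" as another input
file, which is why the space is not allowed.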

See the file Formats for details of the sequence formats which
are supported by readseq.  The auto-detection feature of
readseq which distinguishes these formats looks for some of the
unique keywords and symbols that are found in each format. It
is not infallible at this, though it attempts to exclude unknown
formats.  In general, if you feed to readseq a sequence file that
you know is one of these common formats, you are okay.  If you feed
it data that might be oddball formats, or non-sequence data,
you might well get garbage results.  Also, different developers
are always thinking up minor twists on these common formats
(like PAUP requiring a blank line between blocks of Phylip format,
or IG adding form feeds between sequences), which may cause hassles.
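
Keyword-based detection of this kind can be sketched as follows.  This
is a much simplified illustration, not readseq's actual heuristics: it
looks only at the first non-blank line, it knows only a handful of
well-known markers (LOCUS for GenBank, "ID   " for EMBL, ">" for
Pearson/Fasta, ">XX;" for NBRF, ";" for IG/Stanford, #NEXUS for PAUP),
and the function name guess_format is invented.

```python
def guess_format(text):
    """Guess a sequence format from the first non-blank line, or None."""
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip leading blank lines
        if stripped.upper().startswith("#NEXUS"):
            return "PAUP"
        if line.startswith("LOCUS"):
            return "GenBank/GB"
        if line.startswith("ID   "):
            return "EMBL"
        if line.startswith(">"):
            # NBRF headers look like ">P1;name"; a plain ">" is Pearson/Fasta.
            if len(line) > 3 and line[3] == ";":
                return "NBRF"
            return "Pearson/Fasta"
        if line.startswith(";"):
            return "IG/Stanford"
        return None  # first non-blank line carries no known marker
    return None  # empty input


print(guess_format(">seq1 test sequence\nACGT\n"))  # prints Pearson/Fasta
```

Even this small sketch shows how a guess can go wrong: a Fasta name
such as ">ab;x" would be taken for NBRF.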

In general, output supports only minimal subsets of each format
needed for sequence data exchanges.  Features, descriptions
and other format-unique information are discarded.

The pretty format requires additional options to generate a
nice output.  Try the various pretty options to see what you like.
Pretty format is OUTPUT only; readseq cannot read a Pretty format
file.

Readseq is NOT optimized for LARGE files.  It generally makes several
reads thru each input file (one per sequence output at present; a future
version may optimize this).  It should handle input and output files
and sequences of any size, but will slow down quite a bit for very large
(multi-megabyte) files. It is NOT recommended for converting
databanks or large subsets thereof.  It is primarily directed at the
small files that researchers use to maintain their personal data, which
they frequently need to interconvert for the various analysis programs
which so frequently require a special format.
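
The cost of one read per output sequence can be seen in a small Python
sketch.  It is not how readseq is written; it assumes a simple
Fasta-like input and an invented helper read_item that rescans the
stream from the start for every requested item, counting the lines it
had to read.

```python
import io


def read_item(handle, item):
    """Rescan a Fasta-like stream from the top for the item-th record.

    Returns (name, sequence, lines_read); item numbers start at 1.
    """
    handle.seek(0)
    index, name, parts, lines_read = 0, None, [], 0
    for line in handle:
        lines_read += 1
        line = line.rstrip("\n")
        if line.startswith(">"):
            if index == item:
                break  # reached the header of the next record
            index += 1
            name = line[1:]
            parts = []
        elif index == item:
            parts.append(line)
    if index < item:
        raise IndexError("no such item: %d" % item)
    return name, "".join(parts), lines_read


data = io.StringIO(">a\nAC\nGT\n>b\nTT\n>c\nGG\n")
total = sum(read_item(data, i)[2] for i in (1, 2, 3))
print(total)  # prints 17, although the input has only 7 lines
```

The number of lines read grows roughly with the square of the number
of sequences, which is tolerable for small personal files but not for
databanks.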

Users of Olsen multi sequence editor (VMS):  The Olsen format
here is produced with the print command:
  print/out=some.file
Use Genbank output from readseq to produce a format that this
editor can read, and use the command
  load/genbank some.file
Dan Davison has a VMS program that will convert to/from the
Olsen native binary data format.  E-mail davison@uh.edu

Warning: Phylip format input is now supported (30Dec92); however, the
auto-detection of Phylip format is very probabilistic and messy,
especially distinguishing sequential from interleaved versions. It
is not recommended that one use readseq to convert files from Phylip
format to others unless essential.
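
A small Python sketch shows why the two Phylip layouts are hard to
tell apart.  The writers below are simplified illustrations with
invented names, assuming non-empty sequences of equal length and
names padded to 10 characters; they are not readseq's output
routines.

```python
def phylip_sequential(records, width=60):
    """Write each sequence in full, one after another."""
    lines = [" %d %d" % (len(records), len(records[0][1]))]
    for name, seq in records:
        chunks = [seq[i:i + width] for i in range(0, len(seq), width)]
        lines.append(name[:10].ljust(10) + chunks[0])
        lines.extend(chunks[1:])
    return "\n".join(lines) + "\n"


def phylip_interleaved(records, width=60):
    """Write the sequences block by block; names appear in block 1 only."""
    nchar = len(records[0][1])
    lines = [" %d %d" % (len(records), nchar)]
    for start in range(0, nchar, width):
        if start:
            lines.append("")  # blank line between blocks
        for name, seq in records:
            prefix = name[:10].ljust(10) if start == 0 else ""
            lines.append(prefix + seq[start:start + width])
    return "\n".join(lines) + "\n"


recs = [("one", "ACGTACGT"), ("two", "ACGTTTTT")]
print(phylip_sequential(recs, width=4))
print(phylip_interleaved(recs, width=4))
```

When every sequence fits on one line the two writers produce identical
files, and in longer files the header and the first name line are the
same in both layouts, so a reader has to guess from what follows.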


This program is available thru Internet gopher, as

  gopher ftp.bio.indiana.edu
  browse into the IUBio-Software+Data/molbio/readseq/ folder
  select the readseq.shar document

Or thru anonymous FTP in this manner:
  my_computer> ftp  ftp.bio.indiana.edu  (or IP address 129.79.224.25)
    username:  anonymous
    password:  my_username@my_computer
  ftp> cd molbio/readseq
  ftp> get readseq.shar
  ftp> bye

readseq.shar is a Unix shell archive of the readseq files.
This file can be edited by any text editor to reconstitute the
original files, for those who do not have a Unix system or an
Unshar program.  Read the top of this .shar file for further
instructions.

There are also pre-compiled executables for the following computers:
Silicon Graphics Iris, Sparc (Sun Sparcstation & clones), VMS-Vax,
Macintosh. Use binary ftp to transfer these, except Macintosh.  The
Mac version is just the command-line program in a window, not very
handy.

C source files:
  readseq.c ureadseq.c ureadasn.c ureadseq.h

Document files:
  Readme (this doc)
  Formats (description of sequence file formats)
  add.gdemenu (GDE program users can add this to the .GDEmenu file)
  Stdfiles -- test sequence files
  Makefile -- Unix make file
  Make.com -- VMS make file
  *.std    -- files for testing validity of readseq


Recent changes (see also readseq.c for all history of changes):

4 May 92

+ added 32 bit CRC checksum as alternative to GCG 6.5bit checksum

Aug 92

= fixed Olsen format input to handle files w/ more sequences,
  not to mess up when more than one seq has same identifier,
  and to convert number masks to symbols.
= IG format fix to understand ^L

30 Dec 92

* revised command-line & interactive interface.  Suggested form is now

    readseq infile -format=genbank -output=outfile -item=1,3,4 ...

  but remains compatible with prior commandlines:

    readseq infile -f2 -ooutfile -i3 ...

+ added GCG MSF multi sequence file format
+ added PIR/CODATA format
+ added NCBI ASN.1 sequence file format
+ added Pretty, multi sequence pretty output (only)
+ added PAUP multi seq format
+ added degap option
+ added Gary Williams (GWW, G.Williams@CRC.AC.UK) reverse-complement option.
+ added support for reading Phylip formats (interleave & sequential)
* string fixes, dropped need for compiler flags NOSTR, FIXTOUPPER, NEEDSTRCASECMP
* changed 32bit checksum to default, -DSMALLCHECKSUM for GCG version