Readseq.help

Last change on this file was 10842, checked in by westram, 12 years ago
reintegrates 'help' into 'trunk': adds: log:branches/help@10647:10841 log:branches/helptest@10704:10720
Property svn:eol-style set to `native` Property svn:keywords set to `Author Date Id Revision`
File size: 9.2 KB

| * ReadSeq.Help -- 30 Dec 92
| *
| * Reads and writes nucleic/protein sequences in various
| * formats. Data files may have multiple sequences.
| *
| * Copyright 1990 by d.g.gilbert
| * biology dept., indiana university, bloomington, in 47405
| * e-mail: gilbertd@bio.indiana.edu
| *
| * This program may be freely copied and used by anyone.
| * Developers are encouraged to incorporate parts in their
| * programs, rather than devise their own private sequence
| * format.
| *
| * This should compile and run with any ANSI C compiler.
| * Please advise me of any bugs, additions or corrections.

Readseq is particularly useful as it automatically detects many
sequence formats, and interconverts among them.

Formats which readseq currently understands:

* IG/Stanford, used by Intelligenetics and others
* GenBank/GB, genbank flatfile format
* NBRF format
* EMBL, EMBL flatfile format
* GCG, single sequence format of GCG software
* DNAStrider, for common Mac program
* Fitch format, limited use
* Pearson/Fasta, a common format used by Fasta programs and others
* Zuker format, limited use. Input only.
* Olsen, format printed by Olsen VMS sequence editor. Input only.
* Phylip3.2, sequential format for Phylip programs
* Phylip, interleaved format for Phylip programs (v3.3, v3.4)
* Plain/Raw, sequence data only (no name, document, numbering)
+ MSF multi sequence format used by GCG software
+ PAUP's multiple sequence (NEXUS) format
+ PIR/CODATA format used by PIR
+ ASN.1 format used by NCBI
+ Pretty print with various options for nice looking output. Output only.

See the included "Formats" file for detail on file formats.


Example usage:
  readseq
      -- for interactive use

  readseq my.1st.seq my.2nd.seq -all -format=genbank -output=my.gb
      -- convert all of two input files to one genbank format output file

  readseq my.seq -all -form=pretty -nameleft=3 -numleft -numright -numtop -match
      -- output to standard output a file in a pretty format

  readseq my.seq -item=9,8,3,2 -degap -CASE -rev -f=msf -out=my.rev
      -- select 4 items from input, degap, reverse, and uppercase them

  cat *.seq | readseq -pipe -all -format=asn > bunch-of.asn
      -- pipe a bunch of data thru readseq, converting all to asn
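
For readers who want to see what the -degap, -CASE and -rev options in
the examples above do to one sequence, here is a minimal Python sketch.
It is not taken from readseq's source: the function names are made up
for illustration, "-" is assumed to be the gap symbol, and only the
plain bases A, C, G, T and U are complemented (any other symbol is
passed through unchanged).

```python
# Illustrative only; readseq itself is written in C and may differ in detail.

# Translation table: each base maps to its complement (U is treated like T).
_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def degap(seq, gap="-"):
    """Remove every occurrence of the gap symbol (assumed to be '-')."""
    return seq.replace(gap, "")


def reverse_complement(seq):
    """Complement each base, then reverse; unknown symbols pass through."""
    return seq.translate(_COMPLEMENT)[::-1]


def transform(seq):
    """Degap, reverse-complement and uppercase one sequence."""
    return reverse_complement(degap(seq)).upper()


if __name__ == "__main__":
    print(transform("ac-gt-tn"))
```

The order of the three steps does not change the result in this sketch,
since none of them depends on case or on the gaps that were removed.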


The brief usage of readseq is as follows. The "[]" denote
optional parts of the syntax:

  readseq -help
  readSeq (27Dec92), multi-format molbio sequence reader.
  usage: readseq [-options] in.seq > out.seq
   options
      -a[ll]                     select All sequences
      -c[aselower]               change to lower case
      -C[ASEUPPER]               change to UPPER CASE
      -degap[=-]                 remove gap symbols
      -i[tem=2,3,4]              select Item number(s) from several
      -l[ist]                    List sequences only
      -o[utput=]out.seq          redirect Output
      -p[ipe]                    Pipe (command line, <stdin, >stdout)
      -r[everse]                 change to Reverse-complement
      -v[erbose]                 Verbose progress
      -f[ormat=]#                Format number for output, or
      -f[ormat=]Name             Format name for output:
|         1. IG/Stanford         10. Olsen (in-only)
|         2. GenBank/GB          11. Phylip3.2
|         3. NBRF                12. Phylip
|         4. EMBL                13. Plain/Raw
|         5. GCG                 14. PIR/CODATA
|         6. DNAStrider          15. MSF
|         7. Fitch               16. ASN.1
|         8. Pearson/Fasta       17. PAUP
|         9. Zuker               18. Pretty (out-only)

   Pretty format options:
      -wid[th]=#                 sequence line width
      -tab=#                     left indent
      -col[space]=#              column space within sequence line on output
      -gap[count]                count gap chars in sequence numbers
      -nameleft, -nameright[=#]  name on left/right side [=max width]
      -nametop                   name at top/bottom
      -numleft, -numright        seq index on left/right side
      -numtop, -numbot           index on top/bottom
      -match[=.]                 use match base for 2..n species
      -inter[line=#]             blank line(s) between sequence blocks
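
The -match option above is terse, so here is a small Python sketch of
one plausible reading of it: the first sequence is printed as is, and in
sequences 2..n every base equal to the first sequence at that position
is replaced by the match character. The function name is hypothetical,
and the exact comparison rules (case, gaps) are an assumption; this
sketch compares characters exactly.

```python
def apply_match(seqs, match_char="."):
    """Return a copy of seqs where, in every sequence after the first,
    characters identical to the first sequence at the same position
    are replaced by match_char."""
    if not seqs:
        return []
    reference = seqs[0]
    out = [reference]
    for seq in seqs[1:]:
        masked = "".join(
            match_char if i < len(reference) and base == reference[i] else base
            for i, base in enumerate(seq)
        )
        out.append(masked)
    return out


if __name__ == "__main__":
    for line in apply_match(["ACGT", "ACCT", "TCGT"]):
        print(line)
```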


Notes:

Readseq responds to command line arguments, or runs interactively
when given none. Command line options cannot be combined; each must
follow its own switch character (-). In this release, the command
line options are words, with an equals sign (=) to separate the
parameter(s) from the option. Unlike most Unix programs, readseq
does not allow a space between an option and its parameter (this
is to preserve compatibility with VMS). The command line syntax of
the earlier versions is still supported.
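
Scripts that call readseq have to follow these rules (one switch per
option, "=" and no space before the parameter). The Python sketch below
builds such an argument list. The helper name and its parameters are
invented for this example; only the option spellings (-all, -item=,
-format=, -output=) come from the usage text above.

```python
def build_readseq_args(inputs, fmt=None, output=None, items=None, all_seqs=False):
    """Build a readseq argument list in the word-style syntax.

    Each option gets its own '-' switch, and a parameter is joined to
    its option with '=' (never a space).
    """
    args = ["readseq", *inputs]
    if all_seqs:
        args.append("-all")
    if items:
        args.append("-item=" + ",".join(str(i) for i in items))
    if fmt:
        args.append(f"-format={fmt}")
    if output:
        args.append(f"-output={output}")
    return args


if __name__ == "__main__":
    print(" ".join(build_readseq_args(
        ["my.1st.seq", "my.2nd.seq"], fmt="genbank", output="my.gb", all_seqs=True)))
```

If a readseq executable is on the search path (an assumption, it is not
checked here), the returned list can be handed to subprocess.run().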

See the file Formats for details of the sequence formats which
are supported by readseq. The auto-detection feature of readseq
distinguishes these formats by looking for some of the unique
keywords and symbols that are found in each format. It is not
infallible at this, though it attempts to exclude unknown
formats. In general, if you feed readseq a sequence file that
you know is in one of these common formats, you are okay. If you
feed it data that might be in an oddball format, or non-sequence
data, you might well get garbage results. Also, different developers
are always thinking up minor twists on these common formats
(like PAUP requiring a blank line between blocks of Phylip format,
or IG adding form feeds between sequences), which may cause hassles.
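
To make the idea of keyword-based detection concrete, here is a small
Python sketch. It is NOT readseq's detection logic: the function name is
hypothetical, only the first non-blank line is inspected, and the
markers used (LOCUS for GenBank, "ID   " for EMBL, ">XX;" for NBRF, ">"
for Pearson/Fasta, ";" for IG, #NEXUS for PAUP) come from general
knowledge of those formats, not from the readseq source.

```python
import re


def guess_format(text):
    """Guess a sequence format from the first non-blank line.

    Returns a format name, or None when nothing known is recognized.
    """
    for line in text.splitlines():
        if not line.strip():
            continue  # skip leading blank lines
        if line.startswith("#NEXUS"):
            return "PAUP"
        if line.startswith("LOCUS"):
            return "GenBank/GB"
        if line.startswith("ID   "):
            return "EMBL"
        # NBRF headers look like ">P1;name"; test this before plain ">"
        if re.match(r">[A-Z][A-Z0-9];", line):
            return "NBRF"
        if line.startswith(">"):
            return "Pearson/Fasta"
        if line.startswith(";"):
            return "IG/Stanford"
        return None  # first non-blank line matched no known marker
    return None  # empty input


if __name__ == "__main__":
    print(guess_format(">seq1 test\nACGT\n"))
```

Even this toy version shows why detection can go wrong: a Fasta file
whose first name happens to look like ">AB;..." would be taken for NBRF.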

In general, output supports only minimal subsets of each format
needed for sequence data exchanges. Features, descriptions
and other format-unique information are discarded.

The pretty format requires additional options to generate a
nice output. Try the various pretty options to see what you like.
Pretty format is OUTPUT only; readseq cannot read a Pretty format
file.

Readseq is NOT optimized for LARGE files. It generally makes several
reads thru each input file (one per sequence output at present; a future
version may optimize this). It should handle input and output files
and sequences of any size, but will slow down quite a bit for very large
(multi megabyte) sized files. It is NOT recommended for converting
databanks or large subsets thereof. It is primarily directed at the
small files that researchers use to maintain their personal data, which
they frequently need to interconvert for the various analysis programs
which so frequently require a special format.

Users of the Olsen multi sequence editor (VMS): the Olsen format
here is produced with the print command
  print/out=some.file
Use Genbank output from readseq to produce a format that this
editor can read, and use the command
  load/genbank some.file
Dan Davison has a VMS program that will convert to/from the
Olsen native binary data format. E-mail davison@uh.edu

Warning: Phylip format input is now supported (30Dec92); however, the
auto-detection of Phylip format is very probabilistic and messy,
especially distinguishing sequential from interleaved versions. It
is not recommended that one use readseq to convert files from Phylip
format to others unless essential.


This program is available thru Internet gopher, as

  gopher ftp.bio.indiana.edu
  browse into the IUBio-Software+Data/molbio/readseq/ folder
  select the readseq.shar document

Or thru anonymous FTP in this manner:
  my_computer> ftp ftp.bio.indiana.edu (or IP address 129.79.224.25)
  username: anonymous
  password: my_username@my_computer
  ftp> cd molbio/readseq
  ftp> get readseq.shar
  ftp> bye

readseq.shar is a Unix shell archive of the readseq files.
This file can be edited by any text editor to reconstitute the
original files, for those who do not have a Unix system or an
Unshar program. Read the top of this .shar file for further
instructions.

There are also pre-compiled executables for the following computers:
Silicon Graphics Iris, Sparc (Sun Sparcstation & clones), VMS-Vax,
Macintosh. Use binary ftp to transfer these, except Macintosh. The
Mac version is just the command-line program in a window, not very
handy.

C source files:
  readseq.c ureadseq.c ureadasn.c ureadseq.h

Document files:
  Readme (this doc)
  Formats (description of sequence file formats)
  add.gdemenu (GDE program users can add this to the .GDEmenu file)
  Stdfiles -- test sequence files
  Makefile -- Unix make file
  Make.com -- VMS make file
  *.std -- files for testing validity of readseq


Recent changes (see also readseq.c for all history of changes):

4 May 92

+ added 32 bit CRC checksum as alternative to GCG 6.5bit checksum

Aug 92

= fixed Olsen format input to handle files w/ more sequences,
  not to mess up when more than one seq has same identifier,
  and to convert number masks to symbols.
= IG format fix to understand ^L

30 Dec 92

* revised command-line & interactive interface. Suggested form is now

  readseq infile -format=genbank -output=outfile -item=1,3,4 ...

  but remains compatible with prior commandlines:

  readseq infile -f2 -ooutfile -i3 ...

+ added GCG MSF multi sequence file format
+ added PIR/CODATA format
+ added NCBI ASN.1 sequence file format
+ added Pretty, multi sequence pretty output (only)
+ added PAUP multi seq format
+ added degap option
+ added Gary Williams (GWW, G.Williams@CRC.AC.UK) reverse-complement option.
+ added support for reading Phylip formats (interleave & sequential)
* string fixes, dropped need for compiler flags NOSTR, FIXTOUPPER, NEEDSTRCASECMP
* changed 32bit checksum to default, -DSMALLCHECKSUM for GCG version
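
The change notes above mention two kinds of sequence checksum. As a
rough illustration of the difference, the Python sketch below computes a
small GCG-style checksum and a 32-bit CRC for the same sequence. Both
functions are assumptions made for this example: the GCG formula is the
commonly described one (position weight cycling 1..57, result modulo
10000), the CRC is Python's zlib.crc32, and upper-casing the sequence
first is a choice made here. No claim is made that either matches
readseq's output exactly.

```python
import zlib


def gcg_checksum(seq):
    """GCG-style checksum: weighted character sum, modulo 10000.

    The weight of each position cycles through 1..57.
    """
    total = 0
    for i, ch in enumerate(seq.upper()):
        total += (i % 57 + 1) * ord(ch)
    return total % 10000


def crc32_checksum(seq):
    """32-bit CRC of the upper-cased sequence, as an unsigned integer."""
    return zlib.crc32(seq.upper().encode("ascii")) & 0xFFFFFFFF


if __name__ == "__main__":
    seq = "ACGT"
    print(gcg_checksum(seq), format(crc32_checksum(seq), "08X"))
```

The small checksum can take only 10000 different values, so unrelated
sequences collide far more often than with a 32-bit CRC.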