source: branches/stable/READSEQ/Readme

Last change on this file was 10842, checked in by westram, 11 years ago
 |  * ReadSeq  -- 1 Feb 93
 |  *
 |  * Reads and writes nucleic/protein sequences in various
 |  * formats. Data files may have multiple sequences.
 |  *
 |  * Copyright 1990 by d.g.gilbert
 |  * biology dept., indiana university, bloomington, in 47405
 |  * e-mail: gilbertd@bio.indiana.edu
 |  *
 |  * This program may be freely copied and used by anyone.
 |  * Developers are encouraged to incorporate parts in their
 |  * programs, rather than devise their own private sequence
 |  * format.
 |  *
 |  * This should compile and run with any ANSI C compiler.
 |  * Please advise me of any bugs, additions or corrections.

Readseq has been updated.   There have been a number of enhancements
and a few bug corrections since the previous general release in Nov 91
(see below).  If you are using earlier versions, I recommend you update to
this release.

Readseq is particularly useful as it automatically detects many
sequence formats, and interconverts among them.
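
As a rough illustration of what automatic detection involves, the C sketch
below guesses a format from the first line of a file.  It is not readseq's
own detection code: the function name guess_format and the handful of line
signatures it checks are assumptions chosen for illustration, and the return
values are the format numbers printed by "readseq -help".

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not part of readseq: guess a sequence format from
   the first line of a file.  Returns a format number as listed by
   "readseq -help", or 0 when no signature matches. */
int guess_format(const char *line)
{
    assert(line != NULL);
    if (strncmp(line, "LOCUS", 5) == 0)
        return 2;                   /* GenBank entries open with LOCUS */
    if (strncmp(line, "ID   ", 5) == 0)
        return 4;                   /* EMBL entries open with an ID line */
    if (strncmp(line, "#NEXUS", 6) == 0)
        return 17;                  /* PAUP (NEXUS) files open with #NEXUS */
    if (line[0] == '>') {
        /* NBRF headers look like ">P1;name": two type letters, then ';' */
        if (line[1] != '\0' && line[2] != '\0' && line[3] == ';')
            return 3;
        return 8;                   /* otherwise treat as Pearson/Fasta */
    }
    if (line[0] == ';')
        return 1;                   /* IG/Stanford starts with ';' comments */
    return 0;                       /* no signature recognized */
}
```

A real detector has to look at more than one line, since a format such as
Plain/Raw has no signature at all.
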
Formats added to this release include
  + MSF multi sequence format used by GCG software
  + PAUP's multiple sequence (NEXUS) format
  + PIR/CODATA format used by PIR
  + ASN.1 format used by NCBI
  + Pretty print with various options for nice looking output.

As well, Phylip format can now be used as input.  Options to
reverse-complement and to degap sequences have been added.  A menu
addition for users of the GDE sequence editor is included.
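
For readers who want to see what degapping and reverse-complementing amount
to, here is a minimal C sketch.  It is not the code readseq uses: the
function names degap and reverse_complement are made up for this example,
and the complement table (the usual IUPAC nucleotide pairs, with U treated
like T) is an assumption about what a reasonable implementation would cover.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: remove every occurrence of the gap symbol from seq,
   in place.  Returns the new length. */
size_t degap(char *seq, char gap)
{
    assert(seq != NULL);
    size_t out = 0;
    for (size_t in = 0; seq[in] != '\0'; in++) {
        if (seq[in] != gap)
            seq[out++] = seq[in];
    }
    seq[out] = '\0';
    return out;
}

/* Complement of one nucleotide code, preserving case.  Symbols without a
   distinct complement (S, W, N, gaps, anything unknown) pass through. */
static char complement(char base)
{
    static const char from[] = "ACGTURYKMBDHV";
    static const char to[]   = "TGCAAYRMKVHDB";
    if (base == '\0')
        return base;
    const char *hit = strchr(from, toupper((unsigned char)base));
    if (hit == NULL)
        return base;
    char c = to[hit - from];
    return islower((unsigned char)base) ? (char)tolower((unsigned char)c) : c;
}

/* Hypothetical helper: reverse-complement seq in place by swapping the
   complemented symbols from both ends toward the middle. */
void reverse_complement(char *seq)
{
    assert(seq != NULL);
    size_t i = 0, j = strlen(seq);
    while (i < j) {
        j--;
        char left = complement(seq[i]);
        char right = complement(seq[j]);
        seq[i] = right;
        seq[j] = left;
        i++;
    }
}
```

With these definitions, degapping "AC-G-T-" with '-' leaves "ACGT", and
reverse-complementing "AACG" gives "CGTT".
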

This program is available thru Internet gopher, as

  gopher ftp.bio.indiana.edu
  browse into the IUBio-Software+Data/molbio/readseq/ folder
  select the readseq.shar document

Or thru anonymous FTP in this manner:
  my_computer> ftp  ftp.bio.indiana.edu  (or IP address 129.79.224.25)
    username:  anonymous
    password:  my_username@my_computer
  ftp> cd molbio/readseq
  ftp> get readseq.shar
  ftp> bye

readseq.shar is a Unix shell archive of the readseq files.
This file can be edited by any text editor to reconstitute the
original files, for those who do not have a Unix system or an
Unshar program.  Read the top of this .shar file for further
instructions.

There are also pre-compiled executables for the following computers:
Silicon Graphics Iris, Sparc (Sun Sparcstation & clones), VMS-Vax,
Macintosh. Use binary ftp to transfer these, except Macintosh.  The
Mac version is just the command-line program in a window, not very
handy.

C source files:
  readseq.c ureadseq.c ureadasn.c ureadseq.h
Document files:
  Readme (this doc)
  Readseq.help (longer than this doc)
  Formats (description of sequence file formats)
  add.gdemenu (GDE program users can add this to the .GDEmenu file)
  Stdfiles -- test sequence files
  Makefile -- Unix make file
  Make.com -- VMS make file
  *.std    -- files for testing validity of readseq


Example usage:
  readseq
      -- for interactive use
  readseq my.1st.seq  my.2nd.seq  -all  -format=genbank  -output=my.gb
      -- convert all of two input files to one genbank format output file
  readseq my.seq -all -form=pretty -nameleft=3 -numleft -numright -numtop -match
      -- output to standard output a file in a pretty format
  readseq my.seq -item=9,8,3,2 -degap -CASE -rev -f=msf -out=my.rev
      -- select 4 items from input, degap, reverse, and uppercase them
  cat *.seq | readseq -pipe -all -format=asn > bunch-of.asn
      -- pipe a bunch of data thru readseq, converting all to asn
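
The examples above spell the same option several ways (-format=genbank,
-form=pretty, -f=msf).  One way such abbreviations could be matched is
sketched below in C.  This is not readseq's actual parser: match_option and
format_number are hypothetical names, exact case-sensitive matching is an
assumption, and the name table lists only the four format names used in the
examples, with numbers taken from the "readseq -help" listing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return 1 if arg is "-" followed by at least min_len
   leading characters of name, and store a pointer to whatever follows
   (after an optional '=') in *value.  Return 0 otherwise. */
int match_option(const char *arg, const char *name, size_t min_len,
                 const char **value)
{
    assert(arg != NULL && name != NULL && value != NULL);
    if (arg[0] != '-')
        return 0;
    const char *p = arg + 1;
    size_t n = 0;
    while (name[n] != '\0' && p[n] == name[n])
        n++;                        /* length of the shared prefix */
    if (n < min_len)
        return 0;
    p += n;
    if (*p == '=')
        p++;                        /* "-format=x" and "-fx" both work */
    *value = p;
    return 1;
}

/* Hypothetical helper: map a -format value to its format number.
   Accepts a number from 1 to 18 or one of the names used in the examples.
   Returns 0 for anything else. */
int format_number(const char *value)
{
    static const struct { const char *name; int number; } known[] = {
        { "genbank", 2 }, { "msf", 15 }, { "asn", 16 }, { "pretty", 18 },
    };
    assert(value != NULL);
    char *end;
    long n = strtol(value, &end, 10);
    if (end != value && *end == '\0' && n >= 1 && n <= 18)
        return (int)n;
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++) {
        if (strcmp(value, known[i].name) == 0)
            return known[i].number;
    }
    return 0;
}
```

Under this scheme "-format=genbank" and the glued form "-f2" both resolve to
format 2.  The glued form is ambiguous when a value happens to continue the
option name (for instance "-outfile" would be read as -out with the value
"file"), which is one reason to prefer the "=" spelling.
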


The brief usage of readseq is as follows. The "[]" denote
optional parts of the syntax:

  readseq -help
readSeq (27Dec92), multi-format molbio sequence reader.
usage: readseq [-options] in.seq > out.seq
 options
    -a[ll]         select All sequences
    -c[aselower]   change to lower case
    -C[ASEUPPER]   change to UPPER CASE
    -degap[=-]     remove gap symbols
    -i[tem=2,3,4]  select Item number(s) from several
    -l[ist]        List sequences only
    -o[utput=]out.seq  redirect Output
    -p[ipe]        Pipe (command line, <stdin, >stdout)
    -r[everse]     change to Reverse-complement
    -v[erbose]     Verbose progress
    -f[ormat=]#    Format number for output,  or
    -f[ormat=]Name Format name for output:
       |  1. IG/Stanford           10. Olsen (in-only)
       |  2. GenBank/GB            11. Phylip3.2
       |  3. NBRF                  12. Phylip
       |  4. EMBL                  13. Plain/Raw
       |  5. GCG                   14. PIR/CODATA
       |  6. DNAStrider            15. MSF
       |  7. Fitch                 16. ASN.1
       |  8. Pearson/Fasta         17. PAUP
       |  9. Zuker                 18. Pretty (out-only)

   Pretty format options:
    -wid[th]=#            sequence line width
    -tab=#                left indent
    -col[space]=#         column space within sequence line on output
    -gap[count]           count gap chars in sequence numbers
    -nameleft, -nameright[=#]   name on left/right side [=max width]
    -nametop              name at top/bottom
    -numleft, -numright   seq index on left/right side
    -numtop, -numbot      index on top/bottom
    -match[=.]            use match base for 2..n species
    -inter[line=#]        blank line(s) between sequence blocks
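
One plausible reading of -match is that sequences 2..n show the match
character wherever they agree with the first sequence.  The C sketch below
shows that idea only; mark_matches is a hypothetical name, and comparing
symbols exactly (case-sensitive, gaps included) is an assumption rather than
documented readseq behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not taken from readseq: overwrite each symbol of
   other that equals the symbol in the same column of first with
   match_char.  Stops at the end of the shorter string and returns the
   number of symbols replaced. */
size_t mark_matches(const char *first, char *other, char match_char)
{
    assert(first != NULL && other != NULL);
    size_t replaced = 0;
    for (size_t i = 0; first[i] != '\0' && other[i] != '\0'; i++) {
        if (other[i] == first[i]) {
            other[i] = match_char;
            replaced++;
        }
    }
    return replaced;
}
```

With "ACGTAT" as the first sequence and '.' as the match character, a second
sequence "ACGA-T" would be shown as "...A-.".
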



Recent changes:

4 May 92

+ added 32 bit CRC checksum as alternative to GCG 6.5bit checksum

Aug 92

= fixed Olsen format input to handle files w/ more sequences,
  not to mess up when more than one seq has same identifier,
  and to convert number masks to symbols.
= IG format fix to understand ^L

30 Dec 92

* revised command-line & interactive interface.  Suggested form is now

    readseq infile -format=genbank -output=outfile -item=1,3,4 ...

  but remains compatible with prior commandlines:

    readseq infile -f2 -ooutfile -i3 ...

+ added GCG MSF multi sequence file format
+ added PIR/CODATA format
+ added NCBI ASN.1 sequence file format
+ added Pretty, multi sequence pretty output (only)
+ added PAUP multi seq format
+ added degap option
+ added Gary Williams (GWW, G.Williams@CRC.AC.UK) reverse-complement option.
+ added support for reading Phylip formats (interleave & sequential)
* string fixes, dropped need for compiler flags NOSTR, FIXTOUPPER, NEEDSTRCASECMP
* changed 32bit checksum to default, -DSMALLCHECKSUM for GCG version

1 Feb 93

= reverted Genbank output format to fixed left margin
  (change in 30 Dec release), so GDE and others relying on fixed margin
  can read this.