 * ReadSeq -- 1 Feb 93
 *
 * Reads and writes nucleic/protein sequences in various
 * formats. Data files may have multiple sequences.
 *
 * Copyright 1990 by d.g.gilbert
 * biology dept., indiana university, bloomington, in 47405
 * e-mail: gilbertd@bio.indiana.edu
 *
 * This program may be freely copied and used by anyone.
 * Developers are encouraged to incorporate parts in their
 * programs, rather than devise their own private sequence
 * format.
 *
 * This should compile and run with any ANSI C compiler.
 * Please advise me of any bugs, additions or corrections.

Readseq has been updated. There have been a number of enhancements
and a few bug corrections since the previous general release in Nov 91
(see below). If you are using earlier versions, I recommend you update to
this release.

Readseq is particularly useful because it automatically detects many
sequence formats and interconverts among them.
Formats added to this release include:
  + MSF multi sequence format used by GCG software
  + PAUP's multiple sequence (NEXUS) format
  + PIR/CODATA format used by PIR
  + ASN.1 format used by NCBI
  + Pretty print with various options for nice-looking output.

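Automatic format detection can be approximated by looking at how a file begins. The C sketch below is a simplified illustration, not readseq's actual detection code: guess_format and the FMT_ constants are hypothetical names (the numbers follow the -format table), and it examines only the first non-blank line, whereas formats such as MSF or Phylip need more context to recognize.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical format codes, numbered as in the -format table. */
enum {
    FMT_UNKNOWN = 0,
    FMT_IG = 1,       /* IG/Stanford: comment lines start with ';' */
    FMT_GENBANK = 2,  /* GenBank: first line starts with "LOCUS" */
    FMT_NBRF = 3,     /* NBRF: titles like ">P1;" or ">DL;" */
    FMT_EMBL = 4,     /* EMBL: first line starts with "ID   " */
    FMT_PEARSON = 8,  /* Pearson/Fasta: '>' followed by a title */
    FMT_PAUP = 17     /* PAUP/NEXUS: file starts with "#NEXUS" */
};

/* Nonzero if line begins with prefix. */
static int starts_with(const char *line, const char *prefix)
{
    return strncmp(line, prefix, strlen(prefix)) == 0;
}

/* Guess the format from the first non-blank line of a file (sketch only). */
int guess_format(const char *line)
{
    while (isspace((unsigned char)*line))   /* skip leading white space */
        line++;
    if (starts_with(line, "LOCUS"))  return FMT_GENBANK;
    if (starts_with(line, "ID   "))  return FMT_EMBL;
    if (starts_with(line, "#NEXUS")) return FMT_PAUP;
    /* NBRF titles look like ">P1;NAME": ';' is the fourth character. */
    if (line[0] == '>' && strlen(line) >= 4 && line[3] == ';') return FMT_NBRF;
    if (line[0] == '>') return FMT_PEARSON;
    if (line[0] == ';') return FMT_IG;
    return FMT_UNKNOWN;
}
```

Checking NBRF before Pearson matters, since both begin with '>'.
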
As well, Phylip format can now be used as input. Options to
reverse-complement and to degap sequences have been added. A menu
addition for users of the GDE sequence editor is included.

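A minimal C sketch of the two new options, assuming sequences are held as NUL-terminated strings. The names complement_base, reverse_complement and degap are hypothetical and do not come from the readseq sources. complement_base maps the standard IUPAC nucleotide codes, keeps the original case, and leaves gap and unknown symbols unchanged. Treating the value after -degap= as the list of gap characters is an assumption about that option.

```c
#include <ctype.h>
#include <string.h>

/* Complement one IUPAC nucleotide code, keeping its case.
   Gaps and unrecognized symbols are returned unchanged. */
static char complement_base(char c)
{
    static const char from[] = "ACGTUMRWSYKVHDBN";
    static const char to[]   = "TGCAAKYWSRMBDHVN";
    const char *p;
    int lower = islower((unsigned char)c);

    if (c == '\0')
        return c;
    p = strchr(from, toupper((unsigned char)c));
    if (p == NULL)
        return c;
    c = to[p - from];
    return lower ? (char)tolower((unsigned char)c) : c;
}

/* Reverse-complement seq in place. */
void reverse_complement(char *seq)
{
    size_t i = 0, j = strlen(seq);

    while (i < j) {
        char left;
        j--;                                /* j: last position not yet done */
        left = complement_base(seq[i]);
        seq[i] = complement_base(seq[j]);
        seq[j] = left;                      /* when i == j, the middle base is complemented once */
        i++;
    }
}

/* Remove, in place, every character of seq that appears in gaps. */
void degap(char *seq, const char *gaps)
{
    char *out = seq;

    for (; *seq != '\0'; seq++)
        if (strchr(gaps, *seq) == NULL)     /* keep non-gap symbols */
            *out++ = *seq;
    *out = '\0';
}
```

For example, reverse_complement turns "AAGC" into "GCTT", and degap with gaps "-" turns "AC--G.T" into "ACG.T".
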
This program is available through Internet gopher, as follows:

    gopher ftp.bio.indiana.edu
    browse into the IUBio-Software+Data/molbio/readseq/ folder
    select the readseq.shar document

Or through anonymous FTP in this manner:
    my_computer> ftp ftp.bio.indiana.edu   (or IP address 129.79.224.25)
    username: anonymous
    password: my_username@my_computer
    ftp> cd molbio/readseq
    ftp> get readseq.shar
    ftp> bye

readseq.shar is a Unix shell archive of the readseq files.
For those who do not have a Unix system or an unshar program, this
file can be edited with any text editor to reconstitute the original
files. Read the top of this .shar file for further instructions.

There are also pre-compiled executables for the following computers:
Silicon Graphics Iris, Sparc (Sun Sparcstation & clones), VMS-Vax, and
Macintosh. Use binary FTP to transfer these, except the Macintosh
version. The Mac version is just the command-line program in a window,
which is not very handy.

C source files:
    readseq.c  ureadseq.c  ureadasn.c  ureadseq.h
Document files:
    Readme        (this doc)
    Readseq.help  (longer than this doc)
    Formats       (description of sequence file formats)
    add.gdemenu   (GDE program users can add this to the .GDEmenu file)
    Stdfiles   -- test sequence files
    Makefile   -- Unix make file
    Make.com   -- VMS make file
    *.std      -- files for testing validity of readseq


Example usage:
  readseq
      -- for interactive use
  readseq my.1st.seq my.2nd.seq -all -format=genbank -output=my.gb
      -- convert all of two input files to one genbank format output file
  readseq my.seq -all -form=pretty -nameleft=3 -numleft -numright -numtop -match
      -- write all sequences to standard output in pretty format
  readseq my.seq -item=9,8,3,2 -degap -CASE -rev -f=msf -out=my.rev
      -- select 4 items from input, degap, reverse-complement, and uppercase them
  cat *.seq | readseq -pipe -all -format=asn > bunch-of.asn
      -- pipe a bunch of data through readseq, converting all to asn


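The -item value in the fourth example is a comma-separated list of item numbers. One way to parse such a list is with strtol, shown here as a hypothetical helper (parse_item_list is not a readseq function). It keeps the numbers in the order given and rejects non-numeric or non-positive entries; whether readseq writes the selected items in that order or in file order is not stated here.

```c
#include <stdlib.h>

/* Parse a comma-separated list such as "9,8,3,2" into items[], keeping the
   given order. Returns the number of items stored, or -1 if the list is
   malformed, contains a number below 1, or has more than max_items entries. */
int parse_item_list(const char *list, long items[], int max_items)
{
    int n = 0;
    const char *p = list;

    while (*p != '\0') {
        char *end;
        long v = strtol(p, &end, 10);

        if (end == p || v < 1 || n >= max_items)
            return -1;                  /* not a number, out of range, or too many */
        items[n++] = v;
        if (*end == ',')
            end++;                      /* step over the separator */
        else if (*end != '\0')
            return -1;                  /* unexpected character */
        p = end;
    }
    return n;
}
```
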
The brief usage of readseq is as follows. The "[]" denote
optional parts of the syntax:

  readseq -help
  readSeq (27Dec92), multi-format molbio sequence reader.
  usage: readseq [-options] in.seq > out.seq
   options
    -a[ll]             select All sequences
    -c[aselower]       change to lower case
    -C[ASEUPPER]       change to UPPER CASE
    -degap[=-]         remove gap symbols
    -i[tem=2,3,4]      select Item number(s) from several
    -l[ist]            List sequences only
    -o[utput=]out.seq  redirect Output
    -p[ipe]            Pipe (command line, <stdin, >stdout)
    -r[everse]         change to Reverse-complement
    -v[erbose]         Verbose progress
    -f[ormat=]#        Format number for output, or
    -f[ormat=]Name     Format name for output:
         1. IG/Stanford           10. Olsen (in-only)
         2. GenBank/GB            11. Phylip3.2
         3. NBRF                  12. Phylip
         4. EMBL                  13. Plain/Raw
         5. GCG                   14. PIR/CODATA
         6. DNAStrider            15. MSF
         7. Fitch                 16. ASN.1
         8. Pearson/Fasta         17. PAUP
         9. Zuker                 18. Pretty (out-only)

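The bracket notation means an option may be abbreviated down to its first letter, so -f2, -f=msf, -form=pretty and -format=genbank all name the same option. The hypothetical helper option_value sketches one way to match such arguments; it is not readseq's parser. It is case-sensitive, so -c and -C stay distinct, but it does not resolve options that share a first letter (for example -caselower and the Pretty option -col[space]); a real parser has to test those in a suitable order.

```c
#include <stddef.h>

/* Match a command-line argument against an option written as "-f[ormat=]":
   the first letter is required, further letters of the name are optional,
   an '=' may follow, and the rest of the argument is the value.
   Returns a pointer to the value (possibly ""), or NULL if arg is not this option. */
const char *option_value(const char *arg, const char *name)
{
    size_t n = 0;

    if (*arg == '-')
        arg++;                                     /* skip the leading dash */
    while (name[n] != '\0' && arg[n] == name[n])   /* longest matching prefix */
        n++;
    if (n == 0)
        return NULL;                               /* first letter must match */
    if (arg[n] == '=')
        n++;                                       /* optional '=' */
    return arg + n;
}
```

This also covers the older forms such as -f2 and -ooutfile, where the value follows the first letter directly.
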
Pretty format options:
    -wid[th]=#                  sequence line width
    -tab=#                      left indent
    -col[space]=#               column space within sequence line on output
    -gap[count]                 count gap chars in sequence numbers
    -nameleft, -nameright[=#]   name on left/right side [=max width]
    -nametop                    name at top/bottom
    -numleft, -numright         seq index on left/right side
    -numtop, -numbot            index on top/bottom
    -match[=.]                  use match base for 2..n species
    -inter[line=#]              blank line(s) between sequence blocks


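The -match option prints a match character wherever species 2..n agree with the first one. A minimal sketch, assuming the sequences are already aligned and stored as strings; apply_match is a hypothetical name, the comparison is exact and case-sensitive, and gap symbols that agree are replaced too, which may differ from readseq's own behavior.

```c
#include <stddef.h>

/* In every sequence after the first, replace each symbol equal to the
   first sequence's symbol in the same column with match_char.
   seqs[0] is the reference and is left unchanged; others change in place. */
void apply_match(char *seqs[], int nseq, char match_char)
{
    const char *ref = seqs[0];

    for (int s = 1; s < nseq; s++) {
        char *seq = seqs[s];
        for (size_t i = 0; seq[i] != '\0' && ref[i] != '\0'; i++)
            if (seq[i] == ref[i])
                seq[i] = match_char;
    }
}
```

With reference "ACGT-A", the sequence "ACCT-A" becomes "..C...".
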
Recent changes:

4 May 92

  + added 32 bit CRC checksum as alternative to GCG 6.5bit checksum

Aug 92

  = fixed Olsen format input to handle files with more sequences,
    to avoid mix-ups when more than one seq has the same identifier,
    and to convert number masks to symbols.
  = IG format fix to understand ^L (form feed)

30 Dec 92

  * revised command-line & interactive interface. The suggested form is now

      readseq infile -format=genbank -output=outfile -item=1,3,4 ...

    but it remains compatible with prior command lines:

      readseq infile -f2 -ooutfile -i3 ...

  + added GCG MSF multi sequence file format
  + added PIR/CODATA format
  + added NCBI ASN.1 sequence file format
  + added Pretty, multi sequence pretty output (only)
  + added PAUP multi seq format
  + added degap option
  + added Gary Williams (GWW, G.Williams@CRC.AC.UK) reverse-complement option
  + added support for reading Phylip formats (interleave & sequential)
  * string fixes, dropped need for compiler flags NOSTR, FIXTOUPPER, NEEDSTRCASECMP
  * changed 32bit checksum to default; compile with -DSMALLCHECKSUM for the GCG version

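The two checksums mentioned in these change notes can be sketched in C. gcg_checksum follows the usual GCG definition: uppercased residues weighted by position, with weights cycling through 1..57, summed mod 10000. crc32_checksum is the common reflected CRC-32 (polynomial 0xEDB88320, as used by zlib) applied to uppercased residues; whether readseq's 32-bit CRC uses exactly this variant is an assumption, so its values may not match readseq output. Both function names are hypothetical.

```c
#include <ctype.h>
#include <stdint.h>

/* GCG-style checksum: residue i (counting from 1) is weighted by
   ((i - 1) mod 57) + 1 times its uppercase character code; the
   weighted sum is kept mod 10000. */
long gcg_checksum(const char *seq)
{
    long check = 0, count = 0;

    for (; *seq != '\0'; seq++) {
        count++;
        check = (check + count * toupper((unsigned char)*seq)) % 10000;
        if (count == 57)
            count = 0;                   /* weights cycle 1..57 */
    }
    return check;
}

/* Reflected CRC-32 (polynomial 0xEDB88320, zlib variant) over the
   uppercased residues, computed bit by bit without a lookup table. */
uint32_t crc32_checksum(const char *seq)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (; *seq != '\0'; seq++) {
        crc ^= (uint32_t)toupper((unsigned char)*seq);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return crc ^ 0xFFFFFFFFu;
}
```

For example, gcg_checksum("ACGT") is 65*1 + 67*2 + 71*3 + 84*4 = 748. The 32-bit value changes far more with small edits, which is the usual reason to prefer it; the GCG sum fits GCG's own file headers.
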
1 Feb 93

  = reverted Genbank output format to a fixed left margin
    (changed in the 30 Dec release), so GDE and other programs relying
    on a fixed margin can read it.

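For reference, a GenBank sequence line has a fixed layout: the position of its first base right-justified in the first nine columns, followed by up to six blocks of ten bases, each preceded by a space. The C sketch below formats one such line; genbank_seq_line is a hypothetical name and this is not readseq's writer.

```c
#include <stdio.h>
#include <string.h>

/* Format one GenBank-style sequence line into buf (at least 76 bytes):
   the 1-based position of the first base right-justified in 9 columns,
   then up to 6 blocks of 10 bases, each preceded by a space.
   start is the 0-based offset of the line's first base.
   Returns the number of bases placed on the line (0 when start is past the end). */
size_t genbank_seq_line(char *buf, const char *seq, size_t start)
{
    size_t len = strlen(seq), end, pos;

    if (start >= len) {
        buf[0] = '\0';
        return 0;
    }
    end = start + 60 < len ? start + 60 : len;        /* at most 60 bases per line */
    pos = (size_t)sprintf(buf, "%9lu", (unsigned long)(start + 1));
    for (size_t j = start; j < end; j += 10) {
        int k = (int)(end - j < 10 ? end - j : 10);   /* bases in this block */
        pos += (size_t)sprintf(buf + pos, " %.*s", k, seq + j);
    }
    return end - start;
}
```

Calling it with start = 0, 60, 120, ... until it returns 0 yields the successive lines; for a 23-base sequence the single line is "        1 gatcctccat atacaacggt atc".
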