 * ReadSeq -- 1 Feb 93
 *
 * Reads and writes nucleic/protein sequences in various
 * formats. Data files may have multiple sequences.
 *
 * Copyright 1990 by d.g.gilbert
 * biology dept., indiana university, bloomington, in 47405
 * e-mail: gilbertd@bio.indiana.edu
 *
 * This program may be freely copied and used by anyone.
 * Developers are encouraged to incorporate parts in their
 * programs, rather than devise their own private sequence
 * format.
 *
 * This should compile and run with any ANSI C compiler.
 * Please advise me of any bugs, additions or corrections.

Readseq has been updated. There have been a number of enhancements
and a few bug corrections since the previous general release in Nov 91
(see below). If you are using earlier versions, I recommend you update to
this release.

Readseq is particularly useful as it automatically detects many
sequence formats, and interconverts among them.
Formats added to this release include:
  + MSF multi sequence format used by GCG software
  + PAUP's multiple sequence (NEXUS) format
  + PIR/CODATA format used by PIR
  + ASN.1 format used by NCBI
  + Pretty print with various options for nice looking output.
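Readseq's own detection code is not reproduced in this Readme. As a rough illustration only, a detector can key on characteristic markers such as a leading `>` (Pearson/Fasta), `LOCUS` (GenBank) or `ID   ` (EMBL). The function `guess_format()` below is hypothetical: it looks at a single line and returns the format numbers from the -help table, whereas readseq examines more of the file. Treat it as a sketch under those assumptions.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive test: does line begin with prefix? */
static int starts_with_ci(const char *line, const char *prefix)
{
    for (; *prefix != '\0'; line++, prefix++)
        if (tolower((unsigned char)*line) != tolower((unsigned char)*prefix))
            return 0;
    return 1;
}

/* Hypothetical guess_format(): returns a format number from the -help
   table (e.g. 2 = GenBank, 8 = Pearson/Fasta) judged from one line of
   input, or 0 if nothing matches. More specific markers are tested first. */
int guess_format(const char *line)
{
    static const char *nbrf_codes[] = {
        ">P1;", ">F1;", ">DL;", ">DC;", ">RL;", ">RC;", ">N1;", ">N3;", NULL
    };
    size_t i;

    for (i = 0; nbrf_codes[i] != NULL; i++)
        if (starts_with_ci(line, nbrf_codes[i]))
            return 3;                                   /* NBRF */
    if (line[0] == '>')                    return 8;   /* Pearson/Fasta */
    if (strstr(line, "DNA Strider"))       return 6;   /* DNAStrider */
    if (line[0] == ';')                    return 1;   /* IG/Stanford */
    if (starts_with_ci(line, "LOCUS"))     return 2;   /* GenBank */
    if (starts_with_ci(line, "ID   "))     return 4;   /* EMBL */
    if (starts_with_ci(line, "ENTRY"))     return 14;  /* PIR/CODATA */
    if (starts_with_ci(line, "#NEXUS"))    return 17;  /* PAUP */
    if (starts_with_ci(line, "Seq-entry")) return 16;  /* ASN.1 */
    if (strstr(line, "MSF:"))              return 15;  /* GCG MSF */
    if (strstr(line, "Check:") && strstr(line, ".."))
        return 5;                                       /* single GCG */
    return 0;                                           /* unknown */
}
```

Ordering matters in this sketch: NBRF lines such as `>P1;` also begin with `>`, and DNA Strider files begin with `;` like IG/Stanford comments, so the more specific tests run first.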

As well, Phylip format can now be used as input. Options to
reverse-complement and to degap sequences have been added. A menu
addition for users of the GDE sequence editor is included.
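The reverse-complement and degap options can be pictured as two in-place string edits. This C sketch assumes IUPAC nucleotide codes, keeps the letter case, leaves gap symbols untouched while complementing, and takes `-` as the default gap symbol (as the `-degap[=-]` help line suggests). The function names are illustrative, not readseq's internals.

```c
#include <ctype.h>
#include <string.h>

/* Complement one IUPAC nucleotide code, preserving case.
   Gap symbols and other characters are returned unchanged. */
static char complement_base(char c)
{
    static const char from[] = "ACGTUMRWSYKVHDBN";
    static const char to[]   = "TGCAAKYWSRMBDHVN";
    const char *p;
    char out;

    if (c == '\0')
        return c;
    p = strchr(from, toupper((unsigned char)c));
    if (p == NULL)
        return c;                        /* gap or non-nucleotide */
    out = to[p - from];
    return islower((unsigned char)c) ? (char)tolower((unsigned char)out) : out;
}

/* Reverse the sequence and complement every base, in place. */
void reverse_complement(char *seq)
{
    size_t n = strlen(seq), i;

    for (i = 0; i < n / 2; i++) {
        char front = complement_base(seq[i]);
        char back  = complement_base(seq[n - 1 - i]);
        seq[i] = back;
        seq[n - 1 - i] = front;
    }
    if (n % 2 == 1)                      /* middle base of odd length */
        seq[n / 2] = complement_base(seq[n / 2]);
}

/* Remove every character listed in gapchars (e.g. "-"), in place. */
void degap(char *seq, const char *gapchars)
{
    char *src, *dst;

    for (src = dst = seq; *src != '\0'; src++)
        if (strchr(gapchars, *src) == NULL)
            *dst++ = *src;
    *dst = '\0';
}
```

Since degapping only deletes characters and complementing leaves gaps alone, the two edits can be applied in either order.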

This program is available thru Internet gopher, as

     gopher ftp.bio.indiana.edu
     browse into the IUBio-Software+Data/molbio/readseq/ folder
     select the readseq.shar document

Or thru anonymous FTP in this manner:
     my_computer> ftp ftp.bio.indiana.edu  (or IP address 129.79.224.25)
     username: anonymous
     password: my_username@my_computer
     ftp> cd molbio/readseq
     ftp> get readseq.shar
     ftp> bye

readseq.shar is a Unix shell archive of the readseq files.
This file can be edited with any text editor to reconstitute the
original files, for those who do not have a Unix system or an
Unshar program. Read the top of this .shar file for further
instructions.

There are also pre-compiled executables for the following computers:
Silicon Graphics Iris, Sparc (Sun Sparcstation & clones), VMS-Vax,
and Macintosh. Use binary ftp to transfer all of these except the
Macintosh version. The Mac version is just the command-line program
in a window, not very handy.
| 62 | |
---|
| 63 | C source files: |
---|
| 64 | readseq.c ureadseq.c ureadasn.c ureadseq.h |
---|
| 65 | Document files: |
---|
| 66 | Readme (this doc) |
---|
| 67 | Readseq.help (longer than this doc) |
---|
| 68 | Formats (description of sequence file formats) |
---|
| 69 | add.gdemenu (GDE program users can add this to the .GDEmenu file) |
---|
| 70 | Stdfiles -- test sequence files |
---|
| 71 | Makefile -- Unix make file |
---|
| 72 | Make.com -- VMS make file |
---|
| 73 | *.std -- files for testing validity of readseq |
---|


Example usage:
  readseq
      -- for interactive use
  readseq my.1st.seq my.2nd.seq -all -format=genbank -output=my.gb
      -- convert all of two input files to one genbank format output file
  readseq my.seq -all -form=pretty -nameleft=3 -numleft -numright -numtop -match
      -- write all sequences to standard output in pretty format
  readseq my.seq -item=9,8,3,2 -degap -CASE -rev -f=msf -out=my.rev
      -- select 4 items from input, degap, reverse-complement, and uppercase
         them, writing MSF format to my.rev
  cat *.seq | readseq -pipe -all -format=asn > bunch-of.asn
      -- pipe a bunch of data thru readseq, converting all to asn
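The `-item=9,8,3,2` example selects items by number. A minimal sketch of parsing such a list follows. `parse_item_list()` is a hypothetical helper; it assumes item numbers start at 1 and keeps the order given on the command line (this Readme does not say whether readseq writes the items in that order or in file order).

```c
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse a comma-separated list such as "9,8,3,2"
   into items[] in the order given. Returns the number of items stored,
   or -1 for a malformed entry or more than maxitems entries. */
int parse_item_list(const char *spec, int items[], int maxitems)
{
    int count = 0;
    const char *p = spec;

    while (*p != '\0') {
        char *end;
        long v = strtol(p, &end, 10);

        if (end == p || v < 1 || v > INT_MAX)
            return -1;                   /* not a positive item number */
        if (count == maxitems)
            return -1;                   /* caller's array is full */
        items[count++] = (int)v;
        if (*end == ',')
            end++;                       /* step over the separator */
        else if (*end != '\0')
            return -1;                   /* junk after a number */
        p = end;
    }
    return count;
}
```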


The brief usage of readseq is as follows; square brackets "[]" mark
optional parts of the syntax:

  readseq -help
readSeq (27Dec92), multi-format molbio sequence reader.
usage: readseq [-options] in.seq > out.seq
 options
    -a[ll]             select All sequences
    -c[aselower]       change to lower case
    -C[ASEUPPER]       change to UPPER CASE
    -degap[=-]         remove gap symbols
    -i[tem=2,3,4]      select Item number(s) from several
    -l[ist]            List sequences only
    -o[utput=]out.seq  redirect Output
    -p[ipe]            Pipe (command line, <stdin, >stdout)
    -r[everse]         change to Reverse-complement
    -v[erbose]         Verbose progress
    -f[ormat=]#        Format number for output, or
    -f[ormat=]Name     Format name for output:
        1. IG/Stanford           10. Olsen (in-only)
        2. GenBank/GB            11. Phylip3.2
        3. NBRF                  12. Phylip
        4. EMBL                  13. Plain/Raw
        5. GCG                   14. PIR/CODATA
        6. DNAStrider            15. MSF
        7. Fitch                 16. ASN.1
        8. Pearson/Fasta         17. PAUP
        9. Zuker                 18. Pretty (out-only)

 Pretty format options:
    -wid[th]=#                 sequence line width
    -tab=#                     left indent
    -col[space]=#              column space within sequence line on output
    -gap[count]                count gap chars in sequence numbers
    -nameleft, -nameright[=#]  name on left/right side [=max width]
    -nametop                   name at top/bottom
    -numleft, -numright        seq index on left/right side
    -numtop, -numbot           index on top/bottom
    -match[=.]                 use match base for 2..n species
    -inter[line=#]             blank line(s) between sequence blocks
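Because `-f` accepts either a number or a name from the table above, a lookup like the following sketch maps both to one format number. The name table and the extra `asn` alias (taken from the `-format=asn` example) are assumptions. Matching here is exact and case-insensitive, and the sketch does not reject input-only formats such as Olsen when used for output.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical name table built from the -help listing; several names
   may map to one number (e.g. "GenBank" and "GB"). */
struct format_name { const char *name; int number; };

static const struct format_name formats[] = {
    {"ig", 1}, {"stanford", 1}, {"genbank", 2}, {"gb", 2}, {"nbrf", 3},
    {"embl", 4}, {"gcg", 5}, {"dnastrider", 6}, {"fitch", 7},
    {"pearson", 8}, {"fasta", 8}, {"zuker", 9}, {"olsen", 10},
    {"phylip3.2", 11}, {"phylip", 12}, {"plain", 13}, {"raw", 13},
    {"pir", 14}, {"codata", 14}, {"msf", 15}, {"asn.1", 16}, {"asn", 16},
    {"paup", 17}, {"pretty", 18}
};

/* Case-insensitive string equality. */
static int same_ci(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0' &&
           tolower((unsigned char)*a) == tolower((unsigned char)*b)) {
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Map "-f" values such as "2", "genbank" or "GB" to 1..18; -1 if unknown. */
int format_number(const char *arg)
{
    size_t i;

    if (isdigit((unsigned char)arg[0])) {
        char *end;
        long n = strtol(arg, &end, 10);
        return (*end == '\0' && n >= 1 && n <= 18) ? (int)n : -1;
    }
    for (i = 0; i < sizeof formats / sizeof formats[0]; i++)
        if (same_ci(arg, formats[i].name))
            return formats[i].number;
    return -1;
}
```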


Recent changes:

4 May 92

  + added 32 bit CRC checksum as alternative to GCG 6.5bit checksum
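For reference, the GCG checksum weights each upper-cased residue by its position, cycling the weights 1..57, and keeps the sum modulo 10000. The 32-bit alternative is shown here as the common IEEE CRC-32. Readseq's own table-driven CRC routine may differ in details (for example, whether it folds case first), so this is a sketch, not its exact code.

```c
#include <ctype.h>

/* GCG-style checksum: sum of position weight (1..57, repeating) times the
   upper-cased residue, reduced mod 10000 as it goes to avoid overflow. */
long gcg_checksum(const char *seq)
{
    long check = 0, count = 0;

    for (; *seq != '\0'; seq++) {
        count++;
        check = (check + count * toupper((unsigned char)*seq)) % 10000;
        if (count == 57)
            count = 0;                   /* weights cycle 1..57 */
    }
    return check;
}

/* Bitwise IEEE CRC-32: reflected polynomial 0xEDB88320, initial value and
   final XOR 0xFFFFFFFF. No case folding is done in this sketch. */
unsigned long crc32_checksum(const char *seq)
{
    unsigned long crc = 0xFFFFFFFFUL;
    int k;

    for (; *seq != '\0'; seq++) {
        crc ^= (unsigned char)*seq;
        for (k = 0; k < 8; k++)
            crc = (crc & 1UL) ? (crc >> 1) ^ 0xEDB88320UL : crc >> 1;
    }
    return crc ^ 0xFFFFFFFFUL;
}
```

The GCG value fits in 4 decimal digits, so unrelated sequences collide far more often than with a 32-bit CRC.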

Aug 92

  = fixed Olsen format input to handle files w/ more sequences,
    not to mess up when more than one seq has the same identifier,
    and to convert number masks to symbols.
  = IG format fix to understand ^L (form feed)

30 Dec 92

  * revised command-line & interactive interface. Suggested form is now

      readseq infile -format=genbank -output=outfile -item=1,3,4 ...

    but remains compatible with prior command lines:

      readseq infile -f2 -ooutfile -i3 ...

  + added GCG MSF multi sequence file format
  + added PIR/CODATA format
  + added NCBI ASN.1 sequence file format
  + added Pretty, multi sequence pretty output (only)
  + added PAUP multi seq format
  + added degap option
  + added Gary Williams (GWW, G.Williams@CRC.AC.UK) reverse-complement option.
  + added support for reading Phylip formats (interleave & sequential)
  * string fixes, dropped need for compiler flags NOSTR, FIXTOUPPER, NEEDSTRCASECMP
  * made the 32bit checksum the default; compile with -DSMALLCHECKSUM for the GCG version
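Supporting both the new long form and the older glued form mostly comes down to how an argument is split into option name and value. The sketch below is a hypothetical helper, not readseq's parser: it accepts any abbreviation of at least `minlen` letters, takes the value after `=`, and, for the old style, treats everything after the shortest abbreviation as the value. Matching is case-sensitive, so `-c` and `-C` stay distinct.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: if arg names option `name` (abbreviated to at least
   minlen letters), return a pointer to its value, else NULL. Accepts
     new style:  -format=genbank  -form=genbank  -f=genbank
     old style:  -f2  -ooutfile  -i3  (value glued to the minimal abbreviation)
   A bare flag such as -all returns "" (an empty value). */
const char *option_value(const char *arg, const char *name, size_t minlen)
{
    const char *p;
    size_t n = 0;

    if (arg[0] != '-')
        return NULL;
    p = arg + 1;
    while (name[n] != '\0' && p[n] == name[n])
        n++;                             /* length of matched prefix */
    if (n < minlen)
        return NULL;                     /* not this option */
    if (p[n] == '=')
        return p + n + 1;                /* -format=genbank */
    if (p[n] == '\0')
        return p + n;                    /* -all, -format */
    if (n == minlen)
        return p + n;                    /* -f2, -ooutfile */
    return NULL;                         /* e.g. -outfile: ambiguous */
}
```

For example, `option_value("-ooutfile", "output", 1)` yields `"outfile"`, while `-outfile`, which matches more than the minimal abbreviation but has no `=`, is rejected as ambiguous.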

1 Feb 93

  = reverted GenBank output format to a fixed left margin
    (changed in the 30 Dec release), so GDE and other programs relying
    on the fixed margin can read it.
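In a GenBank entry, a fixed left margin means each line of the ORIGIN (sequence) section starts with the residue position right-justified in a 9-column field, followed by blocks of 10 residues, 60 per line. The sketch below writes only that part of an entry (no LOCUS header or features) and assumes the sequence is already in the desired case. It is illustrative, not readseq's writer.

```c
#include <stdio.h>
#include <string.h>

/* Write seq as a GenBank-style ORIGIN section: position right-justified
   in a fixed 9-column margin, then up to 60 residues in blocks of 10. */
void write_genbank_origin(FILE *out, const char *seq)
{
    size_t len = strlen(seq), i, j;

    fputs("ORIGIN\n", out);
    for (i = 0; i < len; i += 60) {
        fprintf(out, "%9lu", (unsigned long)(i + 1));   /* fixed margin */
        for (j = i; j < len && j < i + 60; j += 10)
            fprintf(out, " %.10s", seq + j);             /* one block */
        fputc('\n', out);
    }
    fputs("//\n", out);
}
```

Because the margin width never depends on sequence length, programs that read sequence data from fixed columns can parse the output.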