 * ReadSeq  -- 1 Feb 93
 *
 * Reads and writes nucleic/protein sequences in various
 * formats. Data files may have multiple sequences.
 *
 * Copyright 1990 by d.g.gilbert
 * biology dept., indiana university, bloomington, in 47405
 * e-mail: gilbertd@bio.indiana.edu
 *
 * This program may be freely copied and used by anyone.
 * Developers are encouraged to incorporate parts in their
 * programs, rather than devise their own private sequence
 * format.
 *
 * This should compile and run with any ANSI C compiler.
 * Please advise me of any bugs, additions or corrections.

Readseq has been updated.  There have been a number of enhancements
and a few bug corrections since the previous general release in Nov 91
(see below).  If you are using earlier versions, I recommend you update to
this release.

Readseq is particularly useful as it automatically detects many
sequence formats, and interconverts among them.
Formats added to this release include:
  + MSF multi sequence format used by GCG software
  + PAUP's multiple sequence (NEXUS) format
  + PIR/CODATA format used by PIR
  + ASN.1 format used by NCBI
  + Pretty print with various options for nice looking output.

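As a rough illustration of what automatic format detection involves, the
Python sketch below guesses a format from the first non-blank line of a
file.  It is not readseq's own detection code, which is written in C and
covers many more formats.  The function name guess_format and the choice
of just four signatures (">XX;" for NBRF, ">" for Pearson/Fasta, "LOCUS"
for GenBank, "ID   " for EMBL) are assumptions made for this example.

```python
def guess_format(text):
    """Guess a sequence format from the first non-blank line (illustrative only)."""
    for line in text.splitlines():
        if not line.strip():
            continue  # skip leading blank lines
        # NBRF entries start with ">", a two-letter type code, then ";"
        if line.startswith(">") and len(line) > 3 and line[3] == ";":
            return "NBRF"
        if line.startswith(">"):
            return "Pearson/Fasta"
        if line.startswith("LOCUS"):
            return "GenBank/GB"
        if line.startswith("ID   "):
            return "EMBL"
        return "unknown"  # first real line matched none of the signatures
    return "unknown"  # empty input


print(guess_format(">seq1 test\nACGT\n"))
```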
As well, Phylip format can now be used as input.  Options to
reverse-complement and to degap sequences have been added.  A menu
addition for users of the GDE sequence editor is included.

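For readers unfamiliar with the two operations, here is a minimal Python
sketch of what reverse-complementing and degapping a nucleotide sequence
mean.  It is an illustration, not readseq's C implementation.  The
function names are hypothetical, and the complement table covers only
A, C, G, T, U and N; how readseq treats other ambiguity codes is not
assumed here.

```python
# Complement table for A, C, G, T/U and N in both cases.  Characters not
# listed (gaps, other ambiguity codes) pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTUNacgtun", "TGCAANtgcaan")


def reverse_complement(seq):
    """Complement every base, then reverse the order."""
    return seq.translate(_COMPLEMENT)[::-1]


def degap(seq, gap="-"):
    """Remove every occurrence of the gap symbol ("-" unless told otherwise)."""
    return seq.replace(gap, "")


print(reverse_complement("AACG"))
print(degap("AC--GT-"))
```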
This program is available thru Internet gopher, as

  gopher ftp.bio.indiana.edu
  browse into the IUBio-Software+Data/molbio/readseq/ folder
  select the readseq.shar document

Or thru anonymous FTP in this manner:
  my_computer> ftp  ftp.bio.indiana.edu  (or IP address 129.79.224.25)
  username:  anonymous
  password:  my_username@my_computer
  ftp> cd molbio/readseq
  ftp> get readseq.shar
  ftp> bye

readseq.shar is a Unix shell archive of the readseq files.
This file can be edited by any text editor to reconstitute the
original files, for those who do not have a Unix system or an
Unshar program.  Read the top of this .shar file for further
instructions.

There are also pre-compiled executables for the following computers:
Silicon Graphics Iris, Sparc (Sun Sparcstation & clones), VMS-Vax,
Macintosh. Use binary ftp to transfer these, except Macintosh.  The
Mac version is just the command-line program in a window, not very
handy.

C source files:
  readseq.c ureadseq.c ureadasn.c ureadseq.h
Document files:
  Readme (this doc)
  Readseq.help (longer than this doc)
  Formats (description of sequence file formats)
  add.gdemenu (GDE program users can add this to the .GDEmenu file)
  Stdfiles -- test sequence files
  Makefile -- Unix make file
  Make.com -- VMS make file
  *.std    -- files for testing validity of readseq


Example usage:
  readseq
      -- for interactive use
  readseq my.1st.seq  my.2nd.seq  -all  -format=genbank  -output=my.gb
      -- convert all of two input files to one genbank format output file
  readseq my.seq -all -form=pretty -nameleft=3 -numleft -numright -numtop -match
      -- output to standard output a file in a pretty format
  readseq my.seq -item=9,8,3,2 -degap -CASE -rev -f=msf -out=my.rev
      -- select 4 items from input, degap, reverse, and uppercase them
  cat *.seq | readseq -pipe -all -format=asn > bunch-of.asn
      -- pipe a bunch of data thru readseq, converting all to asn


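If you drive readseq from a script rather than by hand, the command
lines above can be assembled programmatically.  The Python sketch below
builds an argument list using only the long option names shown in the
examples (-all, -item=, -format=, -output=).  The helper name
readseq_args and its parameters are hypothetical, and the sketch only
builds the list; it does not run readseq.

```python
def readseq_args(inputs, fmt=None, output=None, all_seqs=False, items=None):
    """Build a readseq argument list from the long option names in this Readme."""
    args = ["readseq"] + list(inputs)
    if all_seqs:
        args.append("-all")
    if items:
        # -item takes a comma-separated list of sequence numbers
        args.append("-item=" + ",".join(str(i) for i in items))
    if fmt is not None:
        args.append("-format=%s" % fmt)  # format name or number
    if output is not None:
        args.append("-output=%s" % output)
    return args


cmd = readseq_args(["my.1st.seq", "my.2nd.seq"],
                   fmt="genbank", output="my.gb", all_seqs=True)
print(" ".join(cmd))
```

With readseq installed and on the search path, the resulting list could
be handed to subprocess.run(cmd).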
The brief usage of readseq is as follows. The "[]" denote
optional parts of the syntax:

  readseq -help
  readSeq (27Dec92), multi-format molbio sequence reader.
  usage: readseq [-options] in.seq > out.seq
   options
      -a[ll]         select All sequences
      -c[aselower]   change to lower case
      -C[ASEUPPER]   change to UPPER CASE
      -degap[=-]     remove gap symbols
      -i[tem=2,3,4]  select Item number(s) from several
      -l[ist]        List sequences only
      -o[utput=]out.seq  redirect Output
      -p[ipe]        Pipe (command line, <stdin, >stdout)
      -r[everse]     change to Reverse-complement
      -v[erbose]     Verbose progress
      -f[ormat=]#    Format number for output,  or
      -f[ormat=]Name Format name for output:
          1. IG/Stanford           10. Olsen (in-only)
          2. GenBank/GB            11. Phylip3.2
          3. NBRF                  12. Phylip
          4. EMBL                  13. Plain/Raw
          5. GCG                   14. PIR/CODATA
          6. DNAStrider            15. MSF
          7. Fitch                 16. ASN.1
          8. Pearson/Fasta         17. PAUP
          9. Zuker                 18. Pretty (out-only)

   Pretty format options:
      -wid[th]=#            sequence line width
      -tab=#                left indent
      -col[space]=#         column space within sequence line on output
      -gap[count]           count gap chars in sequence numbers
      -nameleft, -nameright[=#]   name on left/right side [=max width]
      -nametop              name at top/bottom
      -numleft, -numright   seq index on left/right side
      -numtop, -numbot      index on top/bottom
      -match[=.]            use match base for 2..n species
      -inter[line=#]        blank line(s) between sequence blocks



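The -match option is described tersely above.  The Python sketch below
shows one reading of it: in sequences 2..n, any residue equal to the
residue at the same position in the first sequence is replaced by the
match character (default ".").  This is an interpretation of the help
text, not readseq's C code.  The function name apply_match is
hypothetical, and comparing case-insensitively and treating gap symbols
like any other character are assumptions.

```python
def apply_match(seqs, match_char="."):
    """Mask residues in sequences 2..n that equal the first sequence's residue."""
    if not seqs:
        return []
    ref = seqs[0]
    out = [ref]  # the first sequence is always shown in full
    for seq in seqs[1:]:
        masked = []
        for i, ch in enumerate(seq):
            # Assumption: compare case-insensitively, position by position
            same = i < len(ref) and ch.upper() == ref[i].upper()
            masked.append(match_char if same else ch)
        out.append("".join(masked))
    return out


for row in apply_match(["ACGT", "ACCT", "TCGA"]):
    print(row)
```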
Recent changes:

4 May 92

  + added 32 bit CRC checksum as alternative to GCG 6.5bit checksum

Aug 92

  = fixed Olsen format input to handle files w/ more sequences,
    not to mess up when more than one seq has same identifier,
    and to convert number masks to symbols.
  = IG format fix to understand ^L

30 Dec 92

  * revised command-line & interactive interface.  Suggested form is now

      readseq infile -format=genbank -output=outfile -item=1,3,4 ...

    but remains compatible with prior command lines:

      readseq infile -f2 -ooutfile -i3 ...

  + added GCG MSF multi sequence file format
  + added PIR/CODATA format
  + added NCBI ASN.1 sequence file format
  + added Pretty, multi sequence pretty output (only)
  + added PAUP multi seq format
  + added degap option
  + added Gary Williams (GWW, G.Williams@CRC.AC.UK) reverse-complement option.
  + added support for reading Phylip formats (interleave & sequential)
  * string fixes, dropped need for compiler flags NOSTR, FIXTOUPPER, NEEDSTRCASECMP
  * changed 32bit checksum to default, -DSMALLCHECKSUM for GCG version

1 Feb 93

  = reverted Genbank output format to fixed left margin
    (change in 30 Dec release), so GDE and others relying on fixed margin
    can read this.

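The two checksum styles mentioned in the 4 May 92 and 30 Dec 92 entries
can be compared with the short Python sketch below.  The GCG-style
function follows the commonly described scheme: each uppercased
character is weighted by its position (cycling 1..57), and the sum is
taken modulo 10000.  The 32-bit version uses Python's zlib.crc32 as a
stand-in.  Whether that yields the same numbers as readseq's own CRC
routine is not assumed here, and both function names are hypothetical.

```python
import zlib


def gcg_checksum(seq):
    """GCG-style checksum: position-weighted character sum, modulo 10000."""
    check = 0
    for i, ch in enumerate(seq):
        # weight cycles 1, 2, ..., 57, 1, 2, ...
        check += ((i % 57) + 1) * ord(ch.upper())
    return check % 10000


def crc32_checksum(seq):
    """32-bit CRC of the uppercased sequence (zlib stand-in, see note above)."""
    return zlib.crc32(seq.upper().encode("ascii")) & 0xFFFFFFFF


print(gcg_checksum("ACGT"))
print("%08X" % crc32_checksum("ACGT"))
```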