| 1 | SEQUENCE DATA SUBMISSION FORM |
|---|
| 2 | |
|---|
| 3 | |
|---|
| 4 | This form solicits the information needed for a nucleotide or amino acid |
|---|
| 5 | sequence database entry. It can be filled in using any text editor or |
|---|
| 6 | printed and filled in by hand. By completing and returning it to us |
|---|
| 7 | promptly you help us to enter your data in the database accurately and |
|---|
| 8 | rapidly. These data will be shared among the following databases: EMBL |
|---|
| 9 | Data Library (Heidelberg, W. Germany); GenBank (Los Alamos, NM, U.S.A. and |
|---|
| 10 | Mountain View, CA, U.S.A.); DNA Data Bank of Japan (DDBJ; Tokyo, Japan); |
|---|
| 11 | National Biomedical Research Foundation Protein Identification Resource |
|---|
| 12 | (NBRF-PIR; Washington, D.C., U.S.A.); Martinsried Institute for Protein |
|---|
| 13 | Sequence Data (MIPS; Martinsried, W. Germany) and International Protein |
|---|
| 14 | Information Database in Japan (JIPID; Tokyo). |
|---|
| 15 | |
|---|
| 16 | Please answer all questions which apply to your data. If you submit 2 or |
|---|
| 17 | more non-contiguous sequences, copy and fill out this form for each |
|---|
| 18 | additional sequence. When submitting nucleic acid sequences containing |
|---|
| 19 | protein coding regions, please include a translation (SEPARATELY from the |
|---|
| 20 | nucleic acid sequence). Then send us (1) this form, (2) a pre- or reprint |
|---|
| 21 | of any manuscript which pertains to these data, and (3) your sequence data. |
|---|
| 22 | You can send these materials (a) electronically via computer network, (b) on |
|---|
| 23 | magnetic tape, or (c) on a floppy diskette. More detailed information about |
|---|
| 24 | formats for submitted data is included at the end of this form. |
|---|
| 25 | |
|---|
| 26 | our mailing address: EMBL Data Library Submissions, Postfach 10.2209 |
|---|
| 27 | D-6900 Heidelberg, West Germany |
|---|
| 28 | telephone: (06221) 387 258 |
|---|
| 29 | computer network: datasubs@embl.earn (for data submissions) |
|---|
| 30 | datalib@embl.earn (for general inquiries) |
|---|
| 31 | |
|---|
| 32 | Please include in your submission any additional sequence data which is not |
|---|
| 33 | reported in your manuscript but which has been reliably determined (for |
|---|
| 34 | example, introns or flanking sequences). |
|---|
| 35 | |
|---|
| 36 | When we receive this material we will assign the data an accession number, |
|---|
| 37 | which serves as a reference that permanently identifies them in the |
|---|
| 38 | database. We will inform you what accession number your data have been |
|---|
| 39 | given and we recommend that you cite this number when referring to these |
|---|
| 40 | data in publications. |
|---|
| 41 | |
|---|
| 42 | If new data become available which would make the database entry more |
|---|
| 43 | informative (e.g., function of the gene product or location of important |
|---|
| 44 | sites within the sequence), or if you discover errors in the sequence, we |
|---|
| 45 | urge you to contact us so that we can update your entry. |
|---|
| 46 | |
|---|
| 47 | Thank you. |
|---|
| 48 | |
|---|
| 49 | |
|---|
| 50 | I. GENERAL INFORMATION |
|---|
| 51 | ============================================================================== |
|---|
| 52 | Your name $(YOUR NAME) |
|---|
| 53 | ------------------------------------------------------------------------------ |
|---|
| 54 | Institution $(INSTITUTION) |
|---|
| 55 | ------------------------------------------------------------------------------ |
|---|
| 56 | Address $(ADDRESS) |
|---|
| 57 | ------------------------------------------------------------------------------ |
|---|
| 58 | Computer mail address $(MAIL) Telex number |
|---|
| 59 | ------------------------------------------------------------------------------ |
|---|
| 60 | Telephone $(PHONE) Telefax number $(TELEFAX) |
|---|
| 61 | ============================================================================== |
|---|
| 62 | On what medium and in what format are you sending us your sequence data? |
|---|
| 63 | (see instructions at the end of this form) |
|---|
| 64 | [X] electronic mail |
|---|
| 65 | [ ] diskette |
|---|
| 66 | computer:Commodore operating system:MS DOS editor: |
|---|
| 67 | [ ] magnetic tape |
|---|
| 68 | record length: blocksize: label type: |
|---|
| 69 | density [ ] 800 [ ] 1600 [ ] 6250 |
|---|
| 70 | character code [ ] ASCII [ ] EBCDIC |
|---|
| 71 | ============================================================================== |
|---|
| 72 | |
|---|
| 73 | |
|---|
| 74 | II. CITATION INFORMATION |
|---|
| 75 | ============================================================================== |
|---|
| 76 | These data are [ ] published [X] in press [ ] submitted [ ] in preparation |
|---|
| 77 | [ ] no plans to publish |
|---|
| 78 | ------------------------------------------------------------------------------ |
|---|
| 79 | authors $(author) |
|---|
| 80 | ------------------------------------------------------------------------------ |
|---|
| 81 | title of paper $(title) |
|---|
| | ------------------------------------------------------------------------------ |
|---|
| 82 | journal volume, first-last pages, $(journal) |
|---|
| | ------------------------------------------------------------------------------ |
|---|
| 83 | Do you agree that these data can be made available in the database before |
|---|
| 84 | they appear in print? |
|---|
| 85 | [x] yes [ ] no, they should be made available only after publication. |
|---|
| 86 | estimated date: $(DATE) |
|---|
| 87 | ============================================================================== |
|---|
| 88 | Does the sequence which you are sending with this form include data that |
|---|
| 89 | do NOT appear in the above citation? |
|---|
| 90 | [X] no [ ] yes, from position ______ to ______ [ ] base pairs OR |
|---|
| 91 | [ ] amino acid residues |
|---|
| 92 | (If your sequence contains 2 or more such spans, use the feature |
|---|
| 93 | table in section IV to indicate their positions) |
|---|
| 94 | If so, how should these data be cited in the database? |
|---|
| 95 | [ ] published [ ] in press [ ] submitted [ ] in preparation |
|---|
| 96 | [ ] no plans to publish |
|---|
| 97 | ------------------------------------------------------------------------------ |
|---|
| 98 | authors |
|---|
| 99 | ------------------------------------------------------------------------------ |
|---|
| 100 | address (if different from that given in section I) |
|---|
| 101 | |
|---|
| 102 | |
|---|
| 103 | ------------------------------------------------------------------------------ |
|---|
| 104 | title of paper |
|---|
| 105 | |
|---|
| 106 | ------------------------------------------------------------------------------ |
|---|
| 107 | journal volume, first-last pages, year |
|---|
| 108 | ============================================================================== |
|---|
| 109 | List references to papers and/or database entries which report sequences |
|---|
| 110 | overlapping with that submitted here. |
|---|
| 111 | |
|---|
| 112 | 1st author journal, vol., pages, year and/or database, accession number |
|---|
| 113 | ------------------------------------------------------------------------------ |
|---|
| 114 | |
|---|
| 115 | ------------------------------------------------------------------------------ |
|---|
| 116 | |
|---|
| 117 | ============================================================================== |
|---|
| 118 | |
|---|
| 119 | |
|---|
| 120 | III. DESCRIPTION OF SEQUENCED SEGMENT |
|---|
| 121 | |
|---|
| 122 | Wherever possible, please use standard nomenclature or conventions. If a |
|---|
| 123 | question is not applicable to your sequence, answer by writing N.A. in the |
|---|
| 124 | appropriate space; if the information is relevant but not available, write |
|---|
| 125 | a question mark (?). |
|---|
| 126 | ============================================================================== |
|---|
| 127 | What kind of molecule did you sequence? (check all boxes which apply) |
|---|
| 128 | |
|---|
| 129 | [X] genomic DNA [ ] genomic RNA [ ] virus or [ ] provirus |
|---|
| 130 | [ ] cDNA to mRNA [ ] cDNA to genomic RNA |
|---|
| 131 | [ ] organelle DNA [ ] organelle RNA please specify organelle: |
|---|
| 132 | [ ] tRNA [ ] rRNA [ ] snRNA [ ] scRNA |
|---|
| 133 | [ ] other nucleic acid. please specify: |
|---|
| 134 | [ ] peptide [ ] sequence assembled by [ ] overlap of sequenced fragments |
|---|
| 135 | [ ] homology with related sequence |
|---|
| 136 | [ ] other. please specify: |
|---|
| 137 | |
|---|
| 138 | [ ] partial: [ ] N-terminal |
|---|
| 139 | [ ] C-terminal |
|---|
| 140 | [ ] internal fragment |
|---|
| 141 | ============================================================================== |
|---|
| 142 | length of sequence $(SEQ_LEN) [X] base pairs or [ ] amino acid residues |
|---|
| 143 | ------------------------------------------------------------------------------ |
|---|
| 144 | gene name(s) (e.g., lacZ) $(gene) |
|---|
| 145 | ------------------------------------------------------------------------------ |
|---|
| 146 | gene product name(s) (e.g., beta-D-galactosidase) $(gene) |
|---|
| 147 | ------------------------------------------------------------------------------ |
|---|
| 148 | Enzyme Commission number (e.g., EC 3.2.1.23) |
|---|
| 149 | ------------------------------------------------------------------------------ |
|---|
| 150 | gene product subunit structure (e.g., hemoglobin alpha-2 beta-2) |
|---|
| 151 | ============================================================================== |
|---|
| 152 | The following items refer to the original source of the molecule you have |
|---|
| 153 | sequenced. |
|---|
| 154 | organism ---- name $(full_name) |
|---|
| 155 | ------------------------------------------------------------------------------ |
|---|
| 156 | sub-species strain $(strain) |
|---|
| 157 | ------------------------------------------------------------------------------ |
|---|
| 158 | name/number of individual or isolate (e.g., patient 123; influenza virus |
|---|
| 159 | A/PR/8/34) |
|---|
| 160 | ------------------------------------------------------------------------------ |
|---|
| 161 | developmental stage [ ] germ line [ ] rearranged |
|---|
| 162 | ------------------------------------------------------------------------------ |
|---|
| 163 | haplotype tissue type cell type |
|---|
| 164 | ============================================================================== |
|---|
| 165 | The following items refer to the immediate experimental source of the |
|---|
| 166 | submitted sequence. |
|---|
| 167 | name of cell line (e.g., HeLa; 3T3-L1) |
|---|
| 168 | ------------------------------------------------------------------------------ |
|---|
| 169 | library (type; name) clone(s) |
|---|
| 170 | ============================================================================== |
|---|
| 171 | The following items refer to the position of the submitted sequence in the |
|---|
| 172 | genome. |
|---|
| 173 | chromosome (or segment) name/number |
|---|
| 174 | ------------------------------------------------------------------------------ |
|---|
| 175 | map position units: [ ] genome % [ ] nucleotide number |
|---|
| 176 | [ ] other: |
|---|
| 177 | ============================================================================== |
|---|
| 178 | Using single words or short phrases, describe the properties of the sequence |
|---|
| 179 | in terms of: |
|---|
| 180 | |
|---|
| 181 | - its associated phenotype(s); |
|---|
| 182 | - the biological/enzymatic activity of its product; |
|---|
| 183 | - the general functional classification of the gene and/or gene product |
|---|
| 184 | - macromolecules to which the gene product can bind (e.g., DNA, calcium, |
|---|
| 185 | other proteins); |
|---|
| 186 | - subcellular localization of the gene product; |
|---|
| 187 | - any other relevant information. |
|---|
| 188 | |
|---|
| 189 | Example (for the viral erbB nucleotide sequence): transforming capacity; EGF |
|---|
| 190 | receptor-related; tyrosine kinase; oncogene; transmembrane protein. |
|---|
| 191 | |
|---|
| 192 | |
|---|
| 193 | ============================================================================== |
|---|
| 194 | |
|---|
| 195 | |
|---|
| 196 | IV. FEATURES OF THE SEQUENCE |
|---|
| 197 | |
|---|
| 198 | Please list below the types and locations of all significant features |
|---|
| 199 | experimentally identified within the sequence. Be sure that your sequence |
|---|
| 200 | is numbered beginning with "1." |
|---|
| 201 | |
|---|
| 202 | In the column marked fill in |
|---|
| 203 | |
|---|
| 204 | feature type of feature (see information below) |
|---|
| 205 | from number of first base/amino acid in the feature |
|---|
| 206 | to number of last base/amino acid in the feature |
|---|
| 207 | bp x, if numbering refers to position of a base pair in |
|---|
| 208 | a nucleotide sequence |
|---|
| 209 | aa x, if numbering refers to position of an amino acid |
|---|
| 210 | residue in a peptide sequence |
|---|
| 211 | id indicate method by which the feature was identified. |
|---|
| 212 | E = experimentally; S = by similarity to known |
|---|
| 213 | sequence or to an established consensus sequence; P = |
|---|
| 214 | by similarity to some other pattern, such as an |
|---|
| 215 | open reading frame |
|---|
| 216 | comp x, if feature is located on the nucleic acid strand |
|---|
| 217 | complementary to that reported here |
|---|
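The column rules above can be sketched as a small record check. This is only an illustration, not part of the form or of any database software: the `Feature` class, its field names, and the `problems` method are hypothetical, and a blank `id` column is assumed to be acceptable because the second example row of the table leaves it empty.

```python
from dataclasses import dataclass

# Identification methods named in the column description:
# E = experimentally, S = similarity to known/consensus sequence,
# P = similarity to some other pattern. "" means the column was left blank
# (assumption: blank is allowed, as in the "exon 1" example row).
VALID_ID_METHODS = {"E", "S", "P", ""}


@dataclass
class Feature:
    """One row of the feature table (hypothetical representation)."""
    feature: str        # "feature" column
    start: int          # "from" column
    end: int            # "to" column
    unit: str           # "bp" or "aa", whichever column carries the x
    method: str = ""    # "id" column
    comp: bool = False  # "comp" column

    def problems(self):
        """Return a list of human-readable problems; empty if none found."""
        issues = []
        if self.start < 1:
            issues.append("positions are numbered beginning with 1")
        if self.end < self.start:
            issues.append("'to' is smaller than 'from'")
        if self.unit not in ("bp", "aa"):
            issues.append("mark exactly one of bp or aa")
        if self.method not in VALID_ID_METHODS:
            issues.append("id must be E, S or P")
        if self.comp and self.unit == "aa":
            issues.append("comp applies only to nucleic acid strands")
        return issues


# The two example rows from the table in this section.
examples = [
    Feature("TATA box", 1, 8, "bp", "S"),
    Feature("exon 1", 9, 264, "bp"),
]
for row in examples:
    print(row.feature, row.problems())
```

A row that comes back with an empty list satisfies the column rules as read here; anything else is worth correcting before the form is sent.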
| 218 | |
|---|
| 219 | Significant features include: |
|---|
| 220 | |
|---|
| 221 | - regulatory signals (e.g., promoters, attenuators, enhancers) |
|---|
| 222 | - transcribed regions (e.g., mRNA, rRNA, tRNA). (indicate reading frame |
|---|
| 223 | if start and stop codons are not present) |
|---|
| 224 | - regions subject to post-transcriptional modification (e.g., introns, |
|---|
| 225 | modified bases) |
|---|
| 226 | - translated regions |
|---|
| 227 | - extent of signal peptide, prepropeptide, propeptide, mature peptide |
|---|
| 228 | - regions subject to post-translational modification (e.g., glycosylated |
|---|
| 229 | or phosphorylated sites) |
|---|
| 230 | - other domains/sites of interest (e.g., extracellular domain, DNA- |
|---|
| 231 | binding domain, active site, inhibitory site) |
|---|
| 232 | - sites involved in bonding (disulfide, thiolester, intrachain, interchain) |
|---|
| 233 | - regions of protein secondary structure (e.g., alpha helix or beta sheet) |
|---|
| 234 | - conflicts with sequence data reported by other authors |
|---|
| 235 | - variations and polymorphisms |
|---|
| 236 | |
|---|
| 237 | The first 2 lines of the table are filled in with examples. |
|---|
| 238 | |
|---|
| 239 | ============================================================================== |
|---|
| 240 | Numbering for features on submitted sequence [X] matches manuscript |
|---|
| 241 | [ ] does not match manuscript |
|---|
| 242 | ============================================================================== |
|---|
| 243 | feature from to bp aa id comp |
|---|
| 244 | ------------------------------------------------------------------------------ |
|---|
| 245 | EXAMPLE TATA box 1 8 x S |
|---|
| 246 | ------------------------------------------------------------------------------ |
|---|
| 247 | EXAMPLE exon 1 9 264 x |
|---|
| 248 | ============================================================================== |
|---|
| 249 | $(gene) 1 $(SEQ_LEN) x |
|---|
| 250 | |
|---|
| 251 | ------------------------------------------------------------------------------ |
|---|
| 252 | $(tax) |
|---|
| 253 | ------------------------------------------------------------------------------ |
|---|
| 254 | |
|---|
| 255 | ------------------------------------------------------------------------------ |
|---|
| 256 | |
|---|
| 257 | ------------------------------------------------------------------------------ |
|---|
| 258 | |
|---|
| 259 | ------------------------------------------------------------------------------ |
|---|
| 260 | |
|---|
| 261 | ------------------------------------------------------------------------------ |
|---|
| 262 | |
|---|
| 263 | ------------------------------------------------------------------------------ |
|---|
| 264 | |
|---|
| 265 | ============================================================================== |
|---|
| 266 | |
|---|
| 267 | |
|---|
| 268 | |
|---|
| 269 | FORMATS FOR SUBMITTED DATA |
|---|
| 270 | |
|---|
| 271 | We are happy to accept data submitted in any of the following formats: |
|---|
| 272 | |
|---|
| 273 | (1) Electronic file transfer: files can be sent via computer network to: |
|---|
| 274 | DATASUBS@EMBL.EARN. This BITNET/EARN address can be reached via various |
|---|
| 275 | gateways from Arpanet, Usenet, JANET, etc. Ask your local network expert |
|---|
| 276 | for help or phone us. |
|---|
| 277 | |
|---|
| 278 | (2) Magnetic tapes: 9-track only (fixed-length records preferred); 800, |
|---|
| 279 | 1600 or 6250 bpi (any blocksize); ASCII or EBCDIC character codes; any label |
|---|
| 280 | type or unlabelled. |
|---|
| 281 | |
|---|
| 282 | (3) Floppy disks: we can read Macintosh diskettes and 5-1/4" diskettes from |
|---|
| 283 | MS-DOS systems. |
|---|
| 284 | |
|---|
| 285 | Whatever format you choose, we would appreciate receiving the sequence data |
|---|
| 286 | in a form which conforms as closely as possible to the following standards. |
|---|
| 287 | |
|---|
| 288 | - Each sequence should include the names of the authors. |
|---|
| 289 | |
|---|
| 290 | - Each distinct sequence should be listed separately using the same number |
|---|
| 291 | of bases/residues per line. The length of each sequence in bases/ |
|---|
| 292 | residues should be clearly indicated. |
|---|
| 293 | |
|---|
| 294 | - Enumeration should begin with a "1" and continue in the direction 5' to |
|---|
| 295 | 3' (or amino- to carboxy-terminus). |
|---|
| 296 | |
|---|
| 297 | - Amino acid sequences should be listed using the one-letter code. |
|---|
| 298 | |
|---|
| 299 | - Translations of protein coding regions in nucleotide sequences should |
|---|
| 300 | be submitted in a separate computer file from the nucleotide sequences |
|---|
| 301 | themselves. |
|---|
| 302 | |
|---|
| 303 | - The code for representing the sequence characters should conform to the |
|---|
| 304 | IUPAC-IUB standards, which are described in: Nucl. Acids Res. 13: 3021- |
|---|
| 305 | 3030 (1985) (for nucleic acids) and J. Biol. Chem. 243: 3557-3559 |
|---|
| 306 | (1968) and Eur. J. Biochem. 5: 151-153 (1968) (for amino acids). |
|---|
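As a rough illustration of the standards listed above, a nucleotide sequence could be checked and laid out before submission along the following lines. This is a minimal sketch under stated assumptions: the function names, the 60-bases-per-line default, and the exact line layout are hypothetical and are not prescribed by this form. The character set is the IUPAC-IUB nucleotide code from the 1985 recommendation cited above.

```python
# IUPAC-IUB single-letter nucleotide codes, including ambiguity codes.
IUPAC_NUCLEOTIDES = set("ACGTURYMKSWHBVDN")


def invalid_positions(sequence):
    """Return the 1-based positions of characters that are not IUPAC codes."""
    return [pos for pos, ch in enumerate(sequence.upper(), start=1)
            if ch not in IUPAC_NUCLEOTIDES]


def format_sequence(sequence, per_line=60):
    """Lay out a sequence with the same number of bases on every line.

    The first line states the total length; every following line starts
    with the 1-based position of its first base (hypothetical layout).
    """
    lines = ["length: %d" % len(sequence)]
    for offset in range(0, len(sequence), per_line):
        chunk = sequence[offset:offset + per_line]
        lines.append("%8d %s" % (offset + 1, chunk))
    return lines


if __name__ == "__main__":
    seq = "ACGTACGTACGGTTNNACGT"
    print(invalid_positions(seq))
    for line in format_sequence(seq, per_line=10):
        print(line)
```

Only the final line may be shorter than the others. A translation of any coding region would go through a similar layout step but be written to its own file, as requested above.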
| 307 | |
|---|
| 308 | $(SEQUENCE) |
|---|
| 309 | |
|---|
| 310 | |
|---|