| 1 | ------------------------------------------------------ |
|---|
| 2 | |
|---|
| 3 | EMBL NUCLEOTIDE SEQUENCE DATABASE SUBMISSION FORM |
|---|
| 4 | |
|---|
| 5 | HOW TO USE THIS FORM - PLEASE READ FIRST |
|---|
| 6 | |
|---|
| 7 | 1) WEBIN: THE WORLD WIDE WEB SUBMISSION TOOL |
|---|
| 8 | ============================================ |
|---|
| 9 | If you have access to the World Wide Web then DO NOT use this form. Use the |
|---|
| 10 | WebIn form on the World Wide Web at |
|---|
| 11 | |
|---|
| 12 | ############################################## |
|---|
| 13 | # http://www.ebi.ac.uk/submission/webin.html # |
|---|
| 14 | ############################################## |
|---|
| 15 | |
|---|
| 16 | If you do not have access to the World Wide Web then please use this form |
|---|
| 17 | and email it to DATASUBS@EBI.AC.UK. |
|---|
| 18 | |
|---|
| 19 | It is only necessary to submit to one database. Public data are exchanged |
|---|
| 20 | between EMBL, GenBank and DDBJ on a daily basis. |
|---|
| 21 | |
|---|
| 22 | 2) MULTIPLE SUBMISSIONS |
|---|
| 23 | ======================= |
|---|
| 24 | If you have more than one but fewer than 25 sequences to submit, copy this |
|---|
| 25 | form and send all the submissions together in one email with a note saying |
|---|
| 26 | how many sequences you are sending. |
|---|
| 27 | |
|---|
| 28 | 3) BULK SUBMISSIONS |
|---|
| 29 | =================== |
|---|
| 30 | If you have more than 25 related sequences to submit DO NOT send them all |
|---|
| 31 | using this form. Instead email DATASUBS@EBI.AC.UK and include the following |
|---|
| 32 | information: |
|---|
| 33 | a) how many sequences you are going to submit |
|---|
| 34 | b) a short explanation of how the sequences are related |
|---|
| 35 | c) what type of differences there are between the entries (e.g. isolate) |
|---|
| 36 | d) one completed email submission form as an example |
|---|
| 37 | You will be contacted by a curator who will create a template for you which |
|---|
| 38 | you should then use to submit all of the sequences. |
|---|
| 39 | |
|---|
| 40 | 4) UPDATES |
|---|
| 41 | ========== |
|---|
| 42 | DO NOT use this form for submitting updates or corrections. |
|---|
| 43 | If you are sending an update please complete the update form available on |
|---|
| 44 | the web at: http://www.ebi.ac.uk/ebi_docs/update.html or get a copy of the |
|---|
| 45 | update form via anonymous FTP: |
|---|
| 46 | ftp://ftp.ebi.ac.uk/pub/databases/embl/release/update.doc |
|---|
| 47 | If you need help with updates contact UPDATE@EBI.AC.UK |
|---|
| 48 | |
|---|
| 49 | 5) PROTEIN SEQUENCES |
|---|
| 50 | ==================== |
|---|
| 51 | DO NOT use this form to submit protein sequences. |
|---|
| 52 | For submissions to the SWISS-PROT protein sequence databank access the |
|---|
| 53 | World Wide Web at http://www.ebi.ac.uk/ebi_docs/swissprot_db/swisshome.html |
|---|
| 54 | or email DATALIB@EBI.AC.UK |
|---|
| 55 | |
|---|
| 56 | 6) ACCESSION NUMBERS AND CONFIDENTIALITY |
|---|
| 57 | ======================================== |
|---|
| 58 | Your data can be made public immediately, or they can be kept confidential |
|---|
| 59 | until a release date which you provide. Confidential data are ALWAYS made |
|---|
| 60 | available to the public after publication. |
|---|
| 61 | |
|---|
| 62 | If your data contain all the information we require we will assign unique |
|---|
| 63 | accession numbers within two working days. We will email you to tell you |
|---|
| 64 | the new accession numbers. |
|---|
| 65 | |
|---|
| 66 | You should submit your sequence data BEFORE you have galley proofs. We |
|---|
| 67 | suggest that the following text be used to cite the accession number(s) in |
|---|
| 68 | publication(s): "The nucleotide sequence data reported in this paper will |
|---|
| 69 | appear in the DDBJ/EMBL/GenBank Nucleotide Sequence Database under the |
|---|
| 70 | accession number(s) ________" |
|---|
| 71 | |
|---|
| 72 | 7) FORM FILLING INSTRUCTIONS |
|---|
| 73 | ============================ |
|---|
| 74 | |
|---|
| 75 | <============== DO NOT EXCEED THIS LINE WIDTH IN YOUR REPLY ==============> |
|---|
| 76 | |
|---|
| 77 | To display this form properly choose a fixed width font (e.g. Courier) in |
|---|
| 78 | your editor. If you are saving files in a word processing program then |
|---|
| 79 | please save the file as TEXT ONLY WITH LINE BREAKS. (To do this in |
|---|
| 80 | Microsoft Word you will need to choose File, Save as, Save file type as, |
|---|
| 81 | and select Text only with line breaks). Please do not send files that are |
|---|
| 82 | saved in Word or Wordperfect format. Processing of the submission may be |
|---|
| 83 | delayed if your email is text wrapped, encoded or binhexed. |
|---|
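The line-width rule above can be checked before sending. The following is a minimal sketch, not part of the official form: the function name `find_long_lines` is hypothetical, and it assumes the limit is simply the length of the ruler line copied from this form.

```python
def find_long_lines(text: str, ruler: str) -> list[tuple[int, int]]:
    """Return (line_number, length) for every line longer than the ruler.

    Assumption: the allowed width equals the length of the ruler line
    ("<=== DO NOT EXCEED THIS LINE WIDTH ... ===>") taken from the form.
    """
    limit = len(ruler)
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > limit
    ]
```

Pass the completed form as `text` and the ruler line as `ruler`; an empty list means no line is wider than the ruler.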
| 84 | |
|---|
| 85 | ######################################################################## |
|---|
| 86 | # Fill in the form as follows: # |
|---|
| 87 | # a) if there is a colon : then enter text (e.g. Last name : Smith) # |
|---|
| 88 | # b) if there is an empty box [ ] and if the answer is yes then fill # |
|---|
| 89 | # the box with an X (e.g. Genomic DNA [X]) # |
|---|
| 90 | # c) if the option is not relevant then do not enter any text and/or # |
|---|
| 91 | # do not write an X in the box. # |
|---|
| 92 | # d) DO NOT delete lines from this form. # |
|---|
| 93 | ######################################################################## |
|---|
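Rules a) and b) in the box above can be applied mechanically when a form is filled in by a script. This is a hedged sketch only: the helper names `fill_text_field` and `mark_box` are hypothetical, and it assumes each form line holds at most one field.

```python
def fill_text_field(line: str, value: str) -> str:
    """Rule a): write the value after the first colon on the line."""
    label, colon, _ = line.partition(":")
    if not colon:
        raise ValueError("line has no colon field")
    return f"{label}:{value}"


def mark_box(line: str) -> str:
    """Rule b): replace the first empty box [ ] with [X]."""
    if "[ ]" not in line:
        raise ValueError("line has no empty box")
    return line.replace("[ ]", "[X]", 1)
```

For example, `fill_text_field("Last name :", "Smith")` gives `Last name :Smith`, and `mark_box("Genomic DNA [ ]")` gives `Genomic DNA [X]`. Under rule c), a field that is not relevant is simply left untouched, and under rule d) no line is removed.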
| 94 | |
|---|
| 95 | 8) ENTERING FEATURES AND LOCATIONS |
|---|
| 96 | ================================== |
|---|
| 97 | Enter the feature key from the list given in Appendix I at the end of this |
|---|
| 98 | document. Enter the locations, gene name, product name, and EC number, |
|---|
| 99 | where appropriate. Use < and > in the locations to show whether the feature |
|---|
| 100 | is partial at the 5' end and/or the 3' end. Mark with an X in the box [ ] |
|---|
| 101 | if the feature is on the complementary strand and if you have experimental |
|---|
| 102 | evidence for the feature. |
|---|
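The < and > markers described above can be read back as follows. This is an illustrative sketch under stated assumptions: the function name `parse_location` is hypothetical, and it assumes the From value may carry a leading < and the To value a leading >, as in the feature examples in this form.

```python
def parse_location(from_field: str, to_field: str) -> tuple[int, int, bool, bool]:
    """Return (start, end, partial_at_5_prime, partial_at_3_prime)."""
    from_field = from_field.strip()
    to_field = to_field.strip()
    partial_5 = from_field.startswith("<")  # feature starts before the sequence
    partial_3 = to_field.startswith(">")    # feature continues past the sequence
    start = int(from_field.lstrip("<"))
    end = int(to_field.lstrip(">"))
    if start < 1 or end < start:
        raise ValueError("locations must satisfy 1 <= From <= To")
    return start, end, partial_5, partial_3
```

With the values from the second feature example, `parse_location("<1", ">1500")` reports a feature spanning bases 1 to 1500 that is partial at both ends.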
| 103 | |
|---|
| 104 | If you do not provide any features or adequate locations and names for the |
|---|
| 105 | features you will be contacted for more information before an accession |
|---|
| 106 | number is assigned to the sequence. For CDS features you must provide a gene |
|---|
| 107 | name AND a product name, even if the product name is putative. |
|---|
| 108 | |
|---|
| 109 | If a CDS is partial at the 5' end then write the codon start number. This |
|---|
| 110 | is the number (1,2 or 3) of the first base of the first complete codon of |
|---|
| 111 | the translation. For example, the following CDS is partial and the codon |
|---|
| 112 | start is 2 because the first complete codon, aca (translated as T), starts |
|---|
| 113 | with the base a, which is the second base in the feature. |
|---|
| 114 | DNA tacatcgatg... |
|---|
| 115 | Translation T S M... |
|---|
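The codon start rule can be verified with a short script. This is a minimal sketch, assuming the standard genetic code and unambiguous bases; the function name `translate_partial_cds` is hypothetical, and any codon containing other characters is shown as X.

```python
from itertools import product

# Standard genetic code, with codons enumerated in t, c, a, g order.
BASES = "tcag"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_partial_cds(dna: str, codon_start: int) -> str:
    """Translate a 5'-partial CDS, skipping the bases before codon_start."""
    if codon_start not in (1, 2, 3):
        raise ValueError("codon start must be 1, 2 or 3")
    sequence = dna.lower()[codon_start - 1:]
    usable = len(sequence) - len(sequence) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(sequence[i:i + 3], "X") for i in range(0, usable, 3)
    )
```

Calling `translate_partial_cds("tacatcgatg", 2)` skips the first base and reads aca, tcg, atg, which gives TSM as in the example above.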
| 116 | |
|---|
| 117 | FEATURE EXAMPLE NO.1 |
|---|
| 118 | Feature key :CDS |
|---|
| 119 | From :201 |
|---|
| 120 | To :500 |
|---|
| 121 | Gene name :abcD |
|---|
| 122 | Product name :ABC repressor protein |
|---|
| 123 | Codon start 1,2 or 3 : |
|---|
| 124 | EC number : |
|---|
| 125 | Complementary strand [ ] |
|---|
| 126 | Experimental evidence [X] |
|---|
| 127 | |
|---|
| 128 | FEATURE EXAMPLE NO.2 |
|---|
| 129 | Feature key :rRNA |
|---|
| 130 | From :<1 |
|---|
| 131 | To :>1500 |
|---|
| 132 | Gene name :16S rRNA |
|---|
| 133 | Product name :16S ribosomal RNA |
|---|
| 134 | Codon start 1,2 or 3 : |
|---|
| 135 | EC number : |
|---|
| 136 | Complementary strand [ ] |
|---|
| 137 | Experimental evidence [ ] |
|---|
| 138 | |
|---|
| 139 | If you have further questions after reading this form please contact |
|---|
| 140 | DATASUBS@EBI.AC.UK |
|---|
| 141 | |
|---|
| 142 | I. CONFIDENTIAL STATUS |
|---|
| 143 | |
|---|
| 144 | Enter an X if you want these data to be confidential [ ] |
|---|
| 145 | If confidential write the release date here : |
|---|
| 146 | (Date format DD-MMM-YYYY e.g. 30-JUN-1998) |
|---|
| 147 | |
|---|
| 148 | |
|---|
| 149 | II. CONTACT INFORMATION |
|---|
| 150 | |
|---|
| 151 | Last name :$(LAST_NAME) |
|---|
| 152 | First name :$(FIRST_NAME) |
|---|
| 153 | Middle initials : |
|---|
| 154 | Department :$(DEPT) |
|---|
| 155 | Institution :$(INSTITUTION) |
|---|
| 156 | Address :$(ADDRESS) |
|---|
| 157 | : |
|---|
| 158 | : |
|---|
| 159 | Country :$(COUNTRY) |
|---|
| 160 | Telephone :$(PHONE) |
|---|
| 161 | Fax :$(TELEFAX) |
|---|
| 162 | Email :$(MAIL) |
|---|
| 163 | |
|---|
| 164 | |
|---|
| 165 | III. CITATION INFORMATION |
|---|
| 166 | |
|---|
| 167 | Author 1 :$(author_1) |
|---|
| 168 | Author 2 :$(author_2) |
|---|
| 169 | Author 3 :$(author_3) |
|---|
| 170 | Author 4 :$(author_4) |
|---|
| 171 | Author 5 :$(author_5) |
|---|
| 172 | Author 6 :$(author_6) |
|---|
| 173 | Author 7 :$(author_7) |
|---|
| 174 | Author 8 :$(author_8) |
|---|
| 175 | Author 9 :$(author_9) |
|---|
| 176 | Author 10 :$(author_10) |
|---|
| 177 | Author 11 :$(author_11) |
|---|
| 178 | Author 12 :$(author_12) |
|---|
| 179 | (e.g. Smith A.B.) |
|---|
| 180 | (Copy line for extra authors) |
|---|
| 181 | Title :$(title) |
|---|
| 182 | Journal :$(journal) |
|---|
| 183 | Volume :$(volume) |
|---|
| 184 | First page :$(page_1) |
|---|
| 185 | Last page :$(page_2) |
|---|
| 186 | Year :$(year_pub) |
|---|
| 187 | Institute (if thesis): |
|---|
| 188 | |
|---|
| 189 | Publication status |
|---|
| 190 | Mark one of the following |
|---|
| 191 | In preparation [ ] |
|---|
| 192 | Accepted [x] |
|---|
| 193 | Published [ ] |
|---|
| 194 | Thesis/Book [ ] |
|---|
| 195 | No plans to publish [ ] |
|---|
| 196 | |
|---|
| 197 | |
|---|
| 198 | IV. SEQUENCE INFORMATION |
|---|
| 199 | |
|---|
| 200 | Sequence length (bp) :$(SEQ_LEN) |
|---|
| 201 | |
|---|
| 202 | Molecule type |
|---|
| 203 | Mark one of the following |
|---|
| 204 | Genomic DNA [ ] |
|---|
| 205 | cDNA to mRNA [ ] |
|---|
| 206 | rRNA [x] |
|---|
| 207 | tRNA [ ] |
|---|
| 208 | Genomic RNA [ ] |
|---|
| 209 | cDNA to genomic RNA [ ] |
|---|
| 210 | |
|---|
| 211 | Mark if either of these apply |
|---|
| 212 | Circular [ ] |
|---|
| 213 | Checked for vector |
|---|
| 214 | contamination [ ] |
|---|
| 215 | |
|---|
| 216 | |
|---|
| 217 | V. SOURCE INFORMATION |
|---|
| 218 | |
|---|
| 219 | Organism :$(full_name) |
|---|
| 220 | Sub species : |
|---|
| 221 | Strain :$(strain) |
|---|
| 222 | Cultivar : |
|---|
| 223 | Variety : |
|---|
| 224 | Isolate/individual : |
|---|
| 225 | Developmental stage : |
|---|
| 226 | Tissue type : |
|---|
| 227 | Cell type : |
|---|
| 228 | Cell line : |
|---|
| 229 | Clone :$(clone) |
|---|
| 230 | Clone (if >1) : |
|---|
| 231 | Clone library : |
|---|
| 232 | Chromosome : |
|---|
| 233 | Map position : |
|---|
| 234 | Haplotype : |
|---|
| 235 | Natural host : |
|---|
| 236 | Laboratory host : |
|---|
| 237 | Macronuclear [ ] |
|---|
| 238 | |
|---|
| 239 | Mark one if immunoglobulin |
|---|
| 240 | or T cell receptor |
|---|
| 241 | Germline [ ] |
|---|
| 242 | Rearranged [ ] |
|---|
| 243 | |
|---|
| 244 | Mark one if viral |
|---|
| 245 | Proviral [ ] |
|---|
| 246 | Virion [ ] |
|---|
| 247 | |
|---|
| 248 | Mark one if from an organelle |
|---|
| 249 | Chloroplast [ ] |
|---|
| 250 | Mitochondrion [ ] |
|---|
| 251 | Chromoplast [ ] |
|---|
| 252 | Kinetoplast [ ] |
|---|
| 253 | Cyanelle [ ] |
|---|
| 254 | Plasmid (not clone) [ ] |
|---|
| 255 | |
|---|
| 256 | Further source information |
|---|
| 257 | (e.g. taxonomy, specimen voucher etc) |
|---|
| 258 | Note :$(tax) |
|---|
| 259 | |
|---|
| 260 | |
|---|
| 261 | VI. FEATURES OF THE SEQUENCE |
|---|
| 262 | |
|---|
| 263 | |
|---|
| 264 | YOU MUST DESCRIBE AT LEAST ONE FEATURE OF THE SEQUENCE OR THERE WILL BE A |
|---|
| 265 | DELAY IN THE PROCESSING OF YOUR SUBMISSION |
|---|
| 266 | |
|---|
| 267 | |
|---|
| 268 | Complete the block below for every feature you need to describe. If you |
|---|
| 269 | have more than one feature copy the block as many times as you require. For |
|---|
| 270 | help see 8) ENTERING FEATURES AND LOCATIONS above. |
|---|
| 271 | |
|---|
| 272 | |
|---|
| 273 | FEATURE NO.1 |
|---|
| 274 | Feature key :$(seq_type) |
|---|
| 275 | From :$(start) |
|---|
| 276 | To :$(end) |
|---|
| 277 | Gene name :$(gene) |
|---|
| 278 | Product name :$(gene_prod) |
|---|
| 279 | Codon start 1,2 or 3 : |
|---|
| 280 | EC number : |
|---|
| 281 | Complementary strand [ ] |
|---|
| 282 | Experimental evidence [ ] |
|---|
| 283 | |
|---|
| 284 | |
|---|
| 285 | VII. SEQUENCE INFORMATION |
|---|
| 286 | |
|---|
| 287 | Enter the sequence data below |
|---|
| 288 | (IUPAC nucleotide base codes, Nucl. Acids Res. 13: 3021-3030, 1985) |
|---|
| 289 | |
|---|
| 290 | BEGINNING OF SEQUENCE: |
|---|
| 291 | $(SEQUENCE) |
|---|
| 292 | |
|---|
| 293 | END OF SEQUENCE |
|---|
| 294 | |
|---|
| 295 | |
|---|
| 296 | Include the translation for each CDS feature below. |
|---|
| 297 | |
|---|
| 298 | |
|---|
| 299 | BEGINNING OF TRANSLATION: |
|---|
| 300 | |
|---|
| 301 | |
|---|
| 302 | END OF TRANSLATION |
|---|
| 303 | |
|---|
| 304 | |
|---|
| 305 | --------------------------------------------------------------------------- |
|---|
| 306 | These data will be shared among the following databases: DDBJ Database |
|---|
| 307 | (DNA Data Bank of Japan; Mishima, Japan); EMBL Nucleotide Sequence Database |
|---|
| 308 | (EBI, Cambridge, UK); GenBank (NCBI, Bethesda, USA); SWISS-PROT Protein |
|---|
| 309 | Sequence Database (Geneva, Switzerland and Heidelberg, FRG); International |
|---|
| 310 | Protein Information Database in Japan (JIPID; Noda, Japan); Martinsried |
|---|
| 311 | Institute For Protein Sequence Data (MIPS; Martinsried, FRG); National |
|---|
| 312 | Biomedical Research Foundation Protein Identification Resource (NBRF-PIR; |
|---|
| 313 | Washington, D.C., USA). |
|---|
| 314 | |
|---|
| 315 | EMBL Data Submissions E-mail datasubs@ebi.ac.uk |
|---|
| 316 | European Bioinformatics Inst. Telephone +44 (0)1223 494499 |
|---|
| 317 | Hinxton Hall, Hinxton Telefax +44 (0)1223 494472 |
|---|
| 318 | Cambridge CB10 1SD, UK |
|---|
| 319 | --------------------------------------------------------------------------- |
|---|
| 320 | |
|---|
| 321 | |
|---|
| 322 | |
|---|
| 323 | |
|---|
| 324 | |
|---|
| 325 | |
|---|
| 326 | |
|---|
| 327 | |
|---|
| 328 | |
|---|
| 329 | |
|---|
| 330 | |
|---|
| 331 | |
|---|
| 332 | |
|---|
| 333 | |
|---|
| 334 | |
|---|
| 335 | |
|---|
| 336 | |
|---|
| 337 | |
|---|
| 338 | |
|---|
| 339 | |
|---|
| 340 | |
|---|
| 341 | |
|---|
| 342 | |
|---|
| 343 | |
|---|
| 344 | |
|---|
| 345 | |
|---|
| 346 | |
|---|
| 347 | |
|---|
| 348 | |
|---|
| 349 | |
|---|
| 350 | |
|---|
| 351 | |
|---|
| 352 | |
|---|
| 353 | APPENDIX I FEATURE KEYS |
|---|
| 354 | ======================= |
|---|
| 355 | A full description of features is found in the DDBJ/EMBL/GenBank Feature |
|---|
| 356 | Table Definition Document at |
|---|
| 357 | ftp://ftp.ebi.ac.uk/pub/databases/embl/release/ftable.doc |
|---|
| 358 | and on the EBI website at |
|---|
| 359 | http://www.ebi.ac.uk/ebi_docs/embl_db/ft/feature_table.html |
|---|
| 360 | An abbreviated list of feature keys is given below. |
|---|
| 361 | |
|---|
| 362 | C_region constant region of immunoglobulin light and heavy chain, |
|---|
| 363 | and T-cell receptor alpha, beta and gamma chains |
|---|
| 364 | CAAT_signal eukaryotic promoter element; consensus=GG(C or T)CAATCT |
|---|
| 365 | CDS protein coding sequence (includes stop codon) |
|---|
| 366 | conflict the "same" sequence reported by different laboratories |
|---|
| 367 | differ at this site or region |
|---|
| 368 | D_segment diversity segment of immunoglobulin heavy chain and |
|---|
| 369 | T-cell receptor beta-chain |
|---|
| 370 | enhancer cis-acting enhancer of eukaryotic promoter function |
|---|
| 371 | exon region that codes for part of spliced mRNA |
|---|
| 372 | GC_signal eukaryotic promoter element; consensus=GGGCGG |
|---|
| 373 | intron transcribed region excised by mRNA splicing |
|---|
| 374 | J_segment joining segment of immunoglobulin light and heavy chains, |
|---|
| 375 | T-cell receptor alpha, beta and gamma-chains |
|---|
| 376 | LTR long terminal repeat |
|---|
| 377 | mat_peptide mature peptide coding region (does not include stop codon |
|---|
| 378 | or signal peptide) |
|---|
| 379 | misc_feature region of biological interest which cannot be described |
|---|
| 380 | by any other known feature |
|---|
| 381 | mRNA messenger RNA |
|---|
| 382 | mutation a related strain has an abrupt, inheritable change in the |
|---|
| 383 | sequence |
|---|
| 384 | polyA_signal polyadenylation signal recognition region |
|---|
| 385 | polyA_site polyadenylation site to which adenine residues are added |
|---|
| 386 | primer_bind non-covalent primer binding site |
|---|
| 387 | promoter promoter region involved in transcription initiation |
|---|
| 388 | protein_bind non-covalent protein binding site on DNA or RNA |
|---|
| 389 | RBS ribosome binding site |
|---|
| 390 | rep_origin origin of replication |
|---|
| 391 | repeat_region region of genome containing repeating units |
|---|
| 392 | repeat_unit single repeat element |
|---|
| 393 | rRNA ribosomal RNA |
|---|
| 394 | S_region switch region of immunoglobulin heavy chains |
|---|
| 395 | satellite many tandem repeats of a short basic repeating unit |
|---|
| 396 | sig_peptide signal peptide coding region |
|---|
| 397 | stem_loop hair-pin loop structure in DNA or RNA |
|---|
| 398 | STS sequence tagged site |
|---|
| 399 | TATA_signal eukaryotic promoter element; consensus=TATA(A or T)A(A or T) |
|---|
| 400 | terminator transcription termination signal |
|---|
| 401 | transit_peptide transit peptide coding region |
|---|
| 402 | tRNA transfer RNA |
|---|
| 403 | V_region variable region of immunoglobulin light and heavy chains, |
|---|
| 404 | and T-cell receptor alpha, beta, and gamma chains |
|---|
| 405 | V_segment variable segment of immunoglobulin light and heavy chains, |
|---|
| 406 | and T-cell receptor alpha, beta, and gamma chains. |
|---|
| 407 | variation a related strain contains stable mutations from the same |
|---|
| 408 | gene (e.g., RFLPs, polymorphisms) |
|---|
| 409 | 3'UTR region at the 3' end of a mature transcript, following the |
|---|
| 410 | stop codon |
|---|
| 411 | 5'UTR region at the 5' end of a mature transcript, preceding the |
|---|
| 412 | initiation codon |
|---|
| 413 | -10_signal prokaryotic promoter element, consensus=TAtAaT |
|---|
| 414 | -35_signal prokaryotic promoter element, consensus=TTGACa or TGTTGACA |
|---|
| 415 | |
|---|
| 416 | (Last change: 08-DEC-1998) |
|---|
| 417 | (Wendy Baker, EMBL nucleotide sequence database curator) |
|---|
| 418 | |
|---|
| 419 | |
|---|
| 420 | |
|---|
| 421 | |
|---|
| 422 | |
|---|
| 423 | Agnes Leyen |
|---|
| 424 | EMBL Outstation - The European Bioinformatics Institute |
|---|
| 425 | Wellcome Trust Genome Campus |
|---|
| 426 | Cambridge CB10 1SD |
|---|
| 427 | UK |
|---|
| 428 | |
|---|
| 429 | |
|---|
| 430 | DATASUBMISSIONS: |
|---|
| 431 | +44 1223 494499 |
|---|
| 432 | datasubs@ebi.ac.uk |
|---|
| 433 | |
|---|
| 434 | UPDATES: |
|---|
| 435 | +44 1223 494499 |
|---|
| 436 | updates@ebi.ac.uk |
|---|
| 437 | |
|---|
| 438 | PERSONAL: |
|---|
| 439 | +44 1223 494411 |
|---|
| 440 | leyen@ebi.ac.uk |
|---|
| 441 | |
|---|