Opened 16 years ago

Closed 15 years ago

#139 closed defect (wontfix)

genbank_multi.ift

Reported by: guest
Owned by: devel
Priority: normal
Milestone:
Component: no idea
Version: release_20071207
Keywords: import filter
Cc:

Description

I ran into a problem when I tried to import the following GenBank (.gb) file into ARB. The import only works if I remove the "sig_peptide" feature that comes before "ORIGIN". Maybe this is a new feature key in GenBank and the import filter needs to be updated?

Best, Yan

LOCUS       NC_010511                759 bp    DNA     linear   BCT 29-JUL-2008
DEFINITION  Methylobacterium sp. 4-46, complete genome.
ACCESSION   NC_010511 REGION: 637214..637972
VERSION     NC_010511.1  GI:170738367
PROJECT     GenomeProject:18809
KEYWORDS    .
SOURCE      Methylobacterium sp. 4-46
  ORGANISM  Methylobacterium sp. 4-46
            Bacteria; Proteobacteria; Alphaproteobacteria; Rhizobiales;
            Methylobacteriaceae; Methylobacterium.
REFERENCE   1  (bases 1 to 759)
  AUTHORS   Copeland,A., Lucas,S., Lapidus,A., Glavina del Rio,T., Dalin,E.,
            Tice,H., Bruce,D., Goodwin,L., Pitluck,S., Chertkov,O., Brettin,T.,
            Detter,J.C., Han,C., Kuske,C.R., Schmutz,J., Larimer,F., Land,M.,
            Hauser,L., Kyrpides,N., Ivanova,N., Marx,C.J. and Richardson,P.
  CONSRTM   US DOE Joint Genome Institute
  TITLE     Complete sequence of chromosome of Methylobacterium sp. 4-46
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 759)
  CONSRTM   NCBI Genome Project
  TITLE     Direct Submission
  JOURNAL   Submitted (24-MAR-2008) National Center for Biotechnology
            Information, NIH, Bethesda, MD 20894, USA
REFERENCE   3  (bases 1 to 759)
  AUTHORS   Copeland,A., Lucas,S., Lapidus,A., Glavina del Rio,T., Dalin,E.,
            Tice,H., Bruce,D., Goodwin,L., Pitluck,S., Chertkov,O., Brettin,T.,
            Detter,J.C., Han,C., Kuske,C.R., Schmutz,J., Larimer,F., Land,M.,
            Hauser,L., Kyrpides,N., Ivanova,N., Marx,C.J. and Richardson,P.
  CONSRTM   US DOE Joint Genome Institute
  TITLE     Direct Submission
  JOURNAL   Submitted (12-FEB-2008) US DOE Joint Genome Institute, 2800
            Mitchell Drive B100, Walnut Creek, CA 94598-1698, USA
COMMENT     PROVISIONAL REFSEQ: This record has not yet been subject to final
            NCBI review. The reference sequence was derived from CP000943.
            URL -- http://www.jgi.doe.gov
            JGI Project ID:  4003784
            Source DNA and bacteria available from Christopher J. Marx
            (cmarx@oeb.harvard.edu)
            Contacts: Christopher J. Marx (cmarx@oeb.harvard.edu)
                      Paul Richardson (microbes@cuba.jgi-psf.org)
            Quality assurance done by JGI-Stanford
            Annotation done by JGI-ORNL and JGI-PGF
            Finishing done by JGI-LANL
            Finished microbial genomes have been curated to close all gaps with
            greater than 98% coverage of at least two independent clones. Each
            base pair has a minimum q (quality) value of 30 and the total error
            rate is less than one per 50000.
            The JGI and collaborators endorse the principles for the
            distribution and use of large scale sequencing data adopted by the
            larger genome sequencing community and urge users of this data to
            follow them. it is our intention to publish the work of this
            project in a timely fashion and we welcome collaborative
            interaction on the project and analysis.
            (http://www.genome.gov/page.cfm?pageID=10506376).
            COMPLETENESS: full length.
FEATURES             Location/Qualifiers
     source          1..759
                     /organism="Methylobacterium sp. 4-46"
                     /mol_type="genomic DNA"
                     /strain="4-46"
                     /db_xref="taxon:426117"
     gene            complement(1..759)
                     /locus_tag="M446_0537"
                     /db_xref="GeneID:6135235"
     CDS             complement(1..759)
                     /locus_tag="M446_0537"
                     /note="PFAM: rhodopsin;
                     KEGG: rxy:Rxyl_2037 rhodopsin"
                     /codon_start=1
                     /transl_table=11
                     /product="rhodopsin"
                     /protein_id="YP_001767533.1"
                     /db_xref="GI:170738878"
                     /db_xref="InterPro:IPR000425"
                     /db_xref="InterPro:IPR001425"
                     /db_xref="GeneID:6135235"
                     /translation="MTVQTWLWLTLFAMSLGAAAILFTAKRRTPEEETDGILHGIVPL
                     IAAASYLAMACGQGAIRLPLGADPAAQWDFYFARYIDWTFTTPILLYALATDAMHSGM
                     RRHGAVFGMLAADVLMIATALFFGASATAWIKWTWYAVSCGAFLGVYYVIWVPLLEES
                     RREREDVRAAFRRNAAFLSVVWLIYPLVLIVGTDGLKLVSPVLTTALIAVLDVVAKVV
                     FGLMAVGERARIVDRDLHETRPVRRSAPSLAPAE"
     sig_peptide     complement(697..759)
                     /locus_tag="M446_0537"
                     /note="Signal predicted by SignalP 3.0 HMM (Signal peptide
                     probability 0.944) with cleavage site probability 0.699 at
                     residue 21"
ORIGIN      
        1 ctactcggcc ggcgcgaggc tcggcgcgga gcgccggacg ggccgggtct cgtggaggtc
       61 gcggtccacg atcctcgccc gctcgccgac cgccatgaga ccgaacacca ccttggcgac
      121 gacgtcgagg acggcgatca gcgccgtggt gaggaccggg ctcacgagct tcagcccgtc
      181 ggtcccgacg atgagcacga gggggtagat gagccagacc accgacagga aggccgcgtt
      241 gcgccggaac gcggcccgca cgtcctcgcg ctccctgcga ctctcctcga gcagcggcac
      301 ccagatcacg tagtagacgc cgagaaaggc gccgcaggag acggcgtacc aggtccactt
      361 gatccacgcc gtcgccgagg cgccgaagaa cagggcggtc gcgatcatca gcacgtcggc
      421 ggcgagcatg ccgaacacgg cgccgtgccg gcgcatgccg gaatgcatcg cgtccgtcgc
      481 gagggcgtag agcaggatgg gggtggtgaa ggtccagtcg atgtagcgcg cgaagtagaa
      541 gtcccactgc gccgcgggat ccgccccgag cggaagccgg atcgcgccct gaccgcacgc
      601 catcgccagg tacgacgcgg ccgcgatgag ggggacgatc ccgtgcagga tgccgtccgt
      661 ctcctcttcc ggagttcggc gcttggctgt gaagagaatg gccgctgcgc cgagggacat
      721 cgcgaacagg gtcaaccaca gccaagtctg cacggtcat
//
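To see which feature keys a record like the one above contains (here a sig_peptide entry follows the CDS, and the reporter had to remove it before the import worked), a minimal Python sketch can walk the FEATURES table. It assumes the standard GenBank column layout (feature key indented 5 spaces, qualifier lines indented 21 spaces and starting with "/") and only captures the first line of each location. The function name and the abbreviated SAMPLE are illustrative; this is not how ARB's .ift filters work internally.

```python
import re

# Assumed GenBank layout: a feature key starts after a 5-space indent and is
# followed by its location; qualifier lines are indented 21 spaces, so the
# character after the first 5 spaces is blank and they do not match here.
FEATURE_LINE = re.compile(r"^ {5}(\S+)\s+(\S.*)$")

def list_features(record_text):
    """Return (key, first location line) pairs from one record's FEATURES table."""
    features = []
    in_features = False
    for line in record_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if in_features and line[:1].strip():
            # Any keyword starting in column 1 (e.g. ORIGIN) ends the table.
            break
        if in_features:
            m = FEATURE_LINE.match(line)
            if m:
                features.append((m.group(1), m.group(2).strip()))
    return features

# Abbreviated excerpt of the record above.
SAMPLE = """FEATURES             Location/Qualifiers
     source          1..759
                     /organism="Methylobacterium sp. 4-46"
     gene            complement(1..759)
                     /locus_tag="M446_0537"
     CDS             complement(1..759)
                     /product="rhodopsin"
     sig_peptide     complement(697..759)
                     /locus_tag="M446_0537"
ORIGIN
        1 ctactcggcc ggcgcgaggc
//
"""

print([key for key, _ in list_features(SAMPLE)])
# prints ['source', 'gene', 'CDS', 'sig_peptide']
```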

Change History (2)

comment:1 Changed 16 years ago by eissler

The import of this file works with SVN revision 5962 on Ubuntu 9.04 (32- and 64-bit).

Maybe this was already fixed by [5937]?

comment:2 Changed 15 years ago by meierh

  • Resolution set to wontfix
  • Status changed from new to closed

In principle, the import works with SVN revision 6016 on OpenSuSE 11.1 (64-bit), although not all information is imported.

However, I think genbank_multi.ift is the wrong filter for this file. genbank_multi was written to import flat files that contain multiple GenBank entries (each with one feature, mainly a CDS) and to produce multiple ARBDB entries from them.

For this file, the correct import routine would be "Import genome data" (otherwise, a very special import filter would have to be written).
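The distinction drawn above (several entries with one main feature each, versus one entry carrying several features) can be illustrated with a small Python heuristic. This is a hypothetical sketch that only mirrors the description in this comment; it is not ARB's actual logic. It assumes entries end with "//", that feature keys are indented 5 spaces, and, as a labeled assumption, that "source" and "gene" accompany the one main feature and should not be counted.

```python
import re

# Feature keys follow a 5-space indent in the FEATURES table.
FEATURE_KEY = re.compile(r"^ {5}(\S+)\s")

# Assumption: "source" and "gene" normally accompany the single main
# feature (e.g. a CDS) of an entry, so they are not counted.
IGNORED_KEYS = {"source", "gene"}

def split_entries(text):
    """Split a flat file into entries (lists of lines) at '//' terminators."""
    entries, current = [], []
    for line in text.splitlines():
        if line.strip() == "//":
            entries.append(current)
            current = []
        else:
            current.append(line)
    if any(line.strip() for line in current):
        entries.append(current)
    return entries

def main_feature_keys(entry_lines):
    """Feature keys of one entry's FEATURES table, minus IGNORED_KEYS."""
    keys, in_table = [], False
    for line in entry_lines:
        if line.startswith("FEATURES"):
            in_table = True
        elif in_table and line[:1].strip():
            break  # ORIGIN (or any column-1 keyword) ends the table
        elif in_table:
            m = FEATURE_KEY.match(line)
            if m and m.group(1) not in IGNORED_KEYS:
                keys.append(m.group(1))
    return keys

def suggest_importer(text):
    """Hypothetical heuristic: at most one main feature per entry -> genbank_multi."""
    entries = split_entries(text)
    if entries and all(len(main_feature_keys(e)) <= 1 for e in entries):
        return "genbank_multi.ift"
    return "Import genome data"

# Skeleton of the record from this ticket: one entry with CDS and sig_peptide.
SAMPLE = """LOCUS       NC_010511
FEATURES             Location/Qualifiers
     source          1..759
     gene            complement(1..759)
     CDS             complement(1..759)
     sig_peptide     complement(697..759)
ORIGIN
//
"""

print(suggest_importer(SAMPLE))  # prints Import genome data
```

Dropping the sig_peptide line from SAMPLE leaves a single main feature, and the sketch then suggests genbank_multi.ift, which matches the reporter's observation that the file imported once that feature was removed.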
