Opened 7 years ago

Closed 7 years ago

Last modified 15 months ago

#702 closed defect (fixed)

format sequences is too slow

Reported by: epruesse Owned by: westram
Priority: normal Milestone: arb7.0
Component: Library (other) Version: arb-6.0
Keywords: Cc:

Description (last modified by westram)

Formatting a large database after importing a few sequences takes a very long time (minutes) and is usually unnecessary.

Currently, all sequences are always formatted after an import.

It should suffice to check with GBT_check_alignment whether the alignment was OK before the import; if it was, and the imported sequences are not longer than the alignment, only the imported sequences need to be formatted.
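
The decision proposed above could be sketched roughly as follows. This is only an illustration: `FormatScope` and `choose_format_scope` are hypothetical names, and the result of the pre-import check (the ticket suggests GBT_check_alignment) is passed in as a plain flag rather than obtained by calling into ARB.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical result type: which sequences need to be looked at after import.
enum class FormatScope { ImportedOnly, All };

// ali_ok_before_import: outcome of a consistency check done BEFORE the import
// (assumed to be available; how it is obtained is not shown here).
// ali_len: alignment length before the import.
// imported_lengths: lengths of the newly imported sequences.
FormatScope choose_format_scope(bool ali_ok_before_import,
                                std::size_t ali_len,
                                const std::vector<std::size_t>& imported_lengths) {
    // If the alignment was already inconsistent, everything has to be checked.
    if (!ali_ok_before_import) return FormatScope::All;

    // If any imported sequence is longer than the alignment, the alignment
    // grows and the existing sequences are affected as well.
    bool all_fit = std::all_of(imported_lengths.begin(), imported_lengths.end(),
                               [ali_len](std::size_t len) { return len <= ali_len; });

    return all_fit ? FormatScope::ImportedOnly : FormatScope::All;
}
```

With an alignment of length 100 that was consistent before the import, importing sequences of length 50 and 100 would select `ImportedOnly`, while a single sequence of length 101 would select `All`.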

Change History (7)

comment:1 Changed 7 years ago by westram

  • Owner changed from devel to westram
  • Status changed from new to accepted

comment:2 in reply to: ↑ description Changed 7 years ago by westram

  • Description modified (diff)
  • Status changed from accepted to _started

Replying to epruesse:

Formatting a large database after importing a few sequences takes a very long time (minutes) and is usually unnecessary.

Confirmed.

Currently, all sequences are always formatted after an import.

No. Only sequences with the wrong length get formatted, but all sequences are read from the DB (i.e. they get decompressed), although checking their length would be enough.

Proposed solution: perform a lazy read, i.e. only read a sequence from the DB when its data is actually needed.
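
A minimal sketch of what such a lazy read might look like. It assumes the uncompressed length can be queried without decompressing the data; `StoredSequence`, `LazySequence` and `needs_formatting` are made-up names for illustration and are not ARB's API.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical stand-in for a stored DB entry (not ARB's actual interface).
// Assumption: the length is known without touching the compressed payload.
struct StoredSequence {
    std::size_t length;                       // cheap to read
    std::function<std::string()> decompress;  // expensive: yields the full data
};

// Wraps a stored entry and defers decompression until data() is called.
class LazySequence {
    const StoredSequence& stored_;
    mutable bool loaded_;
    mutable std::string data_;

public:
    explicit LazySequence(const StoredSequence& stored)
        : stored_(stored), loaded_(false) {}

    // Never triggers decompression.
    std::size_t length() const { return stored_.length; }

    bool loaded() const { return loaded_; }

    // Decompresses on first access only; later calls reuse the cached data.
    const std::string& data() const {
        if (!loaded_) {
            data_ = stored_.decompress();
            loaded_ = true;
        }
        return data_;
    }
};

// The length check alone decides whether a sequence has to be formatted.
inline bool needs_formatting(const LazySequence& seq, std::size_t ali_len) {
    return seq.length() != ali_len;
}
```

In this sketch a sequence whose length already matches the alignment is never decompressed; only a caller that goes on to request `data()` pays for the read.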

comment:3 Changed 7 years ago by westram

sketched class LazyAliData:

  • not able to do lazy-load (yet)
  • use it in EditedTerminal
    • manually formatting alignment (from ali-admin) in SSURef_NR99_119_SILVA_14_07_14_opt.arb (~0.5 MSpec)
      • passes in ~1 sec (DEBUG)
      • needed ~1 min before (NDEBUG)
      • does not wake up from lazy state (in this case)

comment:4 Changed 7 years ago by westram

  • Component changed from No idea to Library (other)
  • Milestone set to arb6.1
  • Summary changed from format only new sequences after import to format sequences is too slow
  • Type changed from enhancement to defect
  • Version changed from SVN to arb-6.0

Bug history:

inside branch 'sai':

  • the new way of formatting alignments was basically introduced with [9508] in branch 'sai'.
    (but everything was still working fine then)
  • ok in [9590]; broken by [9591]
    • with this patch, AliEditCommand lost its ability to decide whether it might modify the data
    • in the chunk applied at lines 1100ff of this patch, EditedTerminal is instantiated unconditionally (i.e. the sequence is always read from the DB)
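
The kind of guard that was lost could be pictured like this. It is a hypothetical sketch, not the code from the changesets mentioned above: `EditCommand`, `FormatTo`, `Entry` and `apply_guarded` are invented names, and the pad-or-truncate rule only stands in for the real formatting logic.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical command interface (illustrative only, not ARB's AliEditCommand).
struct EditCommand {
    virtual ~EditCommand() = default;
    // Cheap pre-check: can this command change a sequence of the given length?
    virtual bool might_modify(std::size_t current_len) const = 0;
    virtual std::string apply(const std::string& data) const = 0;
};

// Placeholder formatting rule: pad with '.' or truncate to the target length.
struct FormatTo : EditCommand {
    std::size_t target;
    explicit FormatTo(std::size_t t) : target(t) {}

    bool might_modify(std::size_t current_len) const override {
        return current_len != target;
    }
    std::string apply(const std::string& data) const override {
        std::string out = data;
        out.resize(target, '.');
        return out;
    }
};

// Toy DB entry that counts how often its data is read.
struct Entry {
    std::size_t length;
    std::string data;
    int reads;

    explicit Entry(std::string d) : length(d.size()), data(std::move(d)), reads(0) {}

    const std::string& read() { ++reads; return data; }
    void write(std::string d) { data = std::move(d); length = data.size(); }
};

// Reads the data only if the command says it might modify it.
// Returns true if the entry was read and rewritten.
bool apply_guarded(const EditCommand& cmd, Entry& e) {
    if (!cmd.might_modify(e.length)) return false;  // skip the expensive read
    e.write(cmd.apply(e.read()));
    return true;
}
```

Without the `might_modify` check, every entry would be read regardless of whether the command changes it, which matches the slowdown described in this ticket.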

comment:5 Changed 7 years ago by westram

  • Resolution set to fixed
  • Status changed from _started to closed

Fixed by [14841]

comment:6 Changed 7 years ago by epruesse

Fix confirmed, works nicely. :)

(Many thanks for the rapid fix!)

comment:7 Changed 15 months ago by westram

  • Milestone changed from arb6.1 to arb7.0

Milestone renamed
