#702 closed defect (fixed)
format sequences is too slow
| Reported by: | epruesse | Owned by: | westram |
|---|---|---|---|
| Priority: | normal | Milestone: | arb7.0 |
| Component: | Library (other) | Version: | arb-6.0 |
| Keywords: | | Cc: | |
Description (last modified by westram)
Formatting a large database after importing a few sequences takes several minutes and is usually unnecessary.
Currently, all sequences are always formatted after an import.
It should suffice to check with GBT_check_alignment whether the alignment was ok before import, and if the imported sequences are not longer than the alignment, only format the imported sequences.
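The proposal above can be illustrated with a minimal, self-contained sketch. It does not use the real ARB database API: `Sequence`, `format_sequence` and `format_after_import` are hypothetical names, and the `alignment_was_ok` flag merely stands in for the result of a pre-import check such as GBT_check_alignment, whose actual signature is not shown in this ticket.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a sequence entry; not ARB's real data model.
struct Sequence {
    std::string name;
    std::string data;
};

// Pad one sequence with gap characters up to the alignment length.
inline void format_sequence(Sequence& seq, std::size_t ali_len, char gap = '.') {
    if (seq.data.size() < ali_len) seq.data.resize(ali_len, gap);
}

// Formats sequences after an import and returns how many sequences were visited.
// 'alignment_was_ok' is assumed to be the result of a consistency check
// performed BEFORE the import.
inline std::size_t format_after_import(std::vector<Sequence>& existing,
                                       std::vector<Sequence>& imported,
                                       std::size_t&           ali_len,
                                       bool                   alignment_was_ok)
{
    std::size_t longest_import = 0;
    for (const Sequence& s : imported) {
        longest_import = std::max(longest_import, s.data.size());
    }

    // Fast path: the old data was consistent and no import exceeds the
    // alignment length, so only the imported sequences need formatting.
    if (alignment_was_ok && longest_import <= ali_len) {
        for (Sequence& s : imported) format_sequence(s, ali_len);
        return imported.size();
    }

    // Slow path: determine the new alignment length and format everything.
    std::size_t new_len = std::max(ali_len, longest_import);
    for (const Sequence& s : existing) new_len = std::max(new_len, s.data.size());
    ali_len = new_len;

    for (Sequence& s : existing) format_sequence(s, ali_len);
    for (Sequence& s : imported) format_sequence(s, ali_len);
    return existing.size() + imported.size();
}
```

With a consistent alignment and short imports, the cost depends only on the number of imported sequences; the full pass over the database is needed only when an import is longer than the alignment or the alignment was already inconsistent.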
Change History (7)
comment:1 Changed 8 years ago by westram
- Owner changed from devel to westram
- Status changed from new to accepted
comment:2 in reply to: ↑ description Changed 8 years ago by westram
- Description modified (diff)
- Status changed from accepted to _started
comment:3 Changed 8 years ago by westram
sketched class LazyAliData:
- not able to do lazy-load (yet)
- use it in EditedTerminal ⇒
- manually formatting alignment (from ali-admin) in SSURef_NR99_119_SILVA_14_07_14_opt.arb (~0.5 MSpec)
- passes in ~1sec (DEBUG)
- needed ~1min before (NDEBUG)
- does not wake-up from lazy state (in this case)
comment:4 Changed 8 years ago by westram
- Component changed from No idea to Library (other)
- Milestone set to arb6.1
- Summary changed from format only new sequences after import to format sequences is too slow
- Type changed from enhancement to defect
- Version changed from SVN to arb-6.0
Bug history:
inside branch 'sai':
- new way to format alignment was basically introduced with [9508] in branch 'sai' (but everything was still working fine then)
- ok in [9590]; broken by [9591]
- with this patch AliEditCommand lost its ability to decide whether it might modify the data
- in the chunk applied at lines 1100ff of this patch, EditedTerminal is instantiated unconditionally (i.e. the sequence will always be read from the db)
comment:5 Changed 8 years ago by westram
- Resolution set to fixed
- Status changed from _started to closed
by [14841]
comment:6 Changed 8 years ago by epruesse
Fix confirmed, works nicely.
(Many thanks for rapid fix!)
Replying to epruesse:
Confirmed.
No. Only sequences with wrong length get formatted, but all sequences are read from the database (i.e. get decompressed), although it would be enough to check their length.
Proposed solution: perform lazy read
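A lazy read along these lines might look like the following sketch. It is not the LazyAliData class from the ARB sources: `LazySequence`, its `loader` callback and the assumption that the stored length is available without decompressing are all hypothetical, and the sketch only pads sequences that are too short. It needs C++17 for `std::optional`.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <utility>

// Hypothetical lazy wrapper: the (expensive) loader is only called when the
// sequence data is really needed. The length is assumed to be known cheaply.
class LazySequence {
    std::size_t                  stored_len_;
    std::function<std::string()> loader_;   // e.g. read + decompress from db
    std::optional<std::string>   data_;     // empty while in lazy state

public:
    LazySequence(std::size_t stored_len, std::function<std::string()> loader)
        : stored_len_(stored_len), loader_(std::move(loader)) {}

    // Cheap: does not trigger the loader.
    std::size_t length() const { return data_ ? data_->size() : stored_len_; }
    bool        loaded() const { return data_.has_value(); }

    // Wakes up from the lazy state on first access.
    std::string& data() {
        if (!data_) data_ = loader_();
        return *data_;
    }

    // Pads the sequence to 'ali_len'; returns true if the data was modified.
    // Sequences that already have the wanted length are never loaded.
    bool format(std::size_t ali_len, char gap = '.') {
        if (length() >= ali_len) return false;
        data().resize(ali_len, gap);
        return true;
    }
};
```

In this sketch, a sequence whose length already matches stays in the lazy state, so formatting a database where nearly all sequences are fine would trigger the loader only for the few sequences that need padding.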