# main topics:
UP arb.hlp
UP glossary.hlp

#SUB subtopic.hlp


TITLE           Optimize database compression

OCCURRENCE      ARB/File/Optimize database compression

DESCRIPTION     Sequence data normally needs a lot of memory. To handle
                thousands of sequences, ARB compresses its data on the
                fly: all data is kept compressed most of the time and is
                uncompressed only on demand. For the user, the only
                visible effect is smaller database files.
                Without knowledge of what the data means, the program can
                compress it only by a limited factor. With the help of a
                tree, aligned sequences can be compressed much better by
                storing only their differences to a consensus sequence.
                Once a sequence has been compressed using a tree, it keeps
                this better compression until the sequence is changed;
                from then on, only the older, generic method is used for it.
                As long as you change only a few sequences (up to 100),
                the database will not grow much.
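
The idea of storing a sequence as its differences to a consensus, decompressing it only when it is read, and falling back to generic compression once it is edited can be sketched in Python as follows. This is a hypothetical illustration, not ARB's actual storage format: the names `delta_encode`, `delta_decode` and `StoredSequence` are invented here, and `zlib` merely stands in for whatever generic compressor ARB really uses.

```python
import zlib
from typing import List, Optional, Tuple


def delta_encode(seq: str, consensus: str) -> List[Tuple[int, str]]:
    """Return (column, character) pairs where seq differs from the consensus."""
    if len(seq) != len(consensus):
        raise ValueError("sequence and consensus must be aligned to the same length")
    return [(i, c) for i, (c, k) in enumerate(zip(seq, consensus)) if c != k]


def delta_decode(diffs: List[Tuple[int, str]], consensus: str) -> str:
    """Rebuild the sequence by applying the stored differences to the consensus."""
    chars = list(consensus)
    for i, c in diffs:
        chars[i] = c
    return "".join(chars)


class StoredSequence:
    """Keeps one sequence compressed; it is decompressed only when read."""

    def __init__(self, seq: str, consensus: Optional[str] = None) -> None:
        if consensus is not None:
            # Tree-based method: keep only the differences to the consensus.
            self.mode, self.consensus = "delta", consensus
            self.payload = delta_encode(seq, consensus)
        else:
            self._store_generic(seq)

    def _store_generic(self, seq: str) -> None:
        # Generic method: compress the raw characters without using a tree.
        self.mode, self.consensus = "generic", None
        self.payload = zlib.compress(seq.encode("ascii"))

    def get(self) -> str:
        """Decompress on demand."""
        if self.mode == "delta":
            return delta_decode(self.payload, self.consensus)
        return zlib.decompress(self.payload).decode("ascii")

    def set(self, seq: str) -> None:
        """Store an edited sequence; it loses the tree-based compression."""
        self._store_generic(seq)


s = StoredSequence("ACGA-ACGUU", consensus="ACGU-ACGUA")
print(s.mode, s.payload)  # delta [(3, 'A'), (9, 'U')]
```

Only two (column, character) pairs are kept instead of ten characters. After `s.set(...)` the sequence is stored generically until a later optimization supplies a consensus again.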

                To compress the entire database, the program needs a tree
                that covers most of the sequences. The larger and better
                the tree, the better the compression.
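
Why a tree helps can be sketched with a per-column majority consensus. If similar sequences (a clade) are grouped under their own consensus, far fewer differences remain to be stored than with a single consensus for everything. This is an illustration under assumptions: `majority_consensus` and `total_differences` are hypothetical helpers, and ARB's real consensus rules are not described in this help text.

```python
from collections import Counter
from typing import List


def majority_consensus(seqs: List[str]) -> str:
    """Most frequent character per alignment column (ties: first one seen)."""
    if not seqs:
        raise ValueError("need at least one sequence")
    if any(len(s) != len(seqs[0]) for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def total_differences(seqs: List[str], consensus: str) -> int:
    """Count all positions where a sequence differs from the consensus."""
    return sum(a != b for s in seqs for a, b in zip(s, consensus))


# Two hypothetical clades of a tree, each containing similar sequences.
clade_1 = ["ACGUA", "ACGUU", "ACGUA"]
clade_2 = ["UGCAA", "UGCAU", "UGCAA"]

# One consensus for all sequences versus one consensus per clade.
everything = clade_1 + clade_2
single = total_differences(everything, majority_consensus(everything))
per_clade = sum(total_differences(c, majority_consensus(c)) for c in (clade_1, clade_2))
print(single, per_clade)  # 14 2
```

A tree that covers more sequences, and groups them more accurately, gives more sequences a close consensus. That is one way to read "the larger and better the tree, the better the compression".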

EXAMPLE         10,000 aligned 16S sequences need 50 MB of memory.
                Without your help, ARB reduces them to 10 MB; given a
                tree, no more than 2 MB are needed.
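
The figures in the example work out to the following per-sequence averages and compression ratios. This assumes 1 MB = 10^6 bytes, which the help text does not specify:

```latex
\begin{aligned}
\frac{50\ \text{MB}}{10\,000} &= 5\,000\ \text{bytes per sequence (uncompressed)}\\
\frac{10\ \text{MB}}{10\,000} &= 1\,000\ \text{bytes per sequence}
  &&\Rightarrow\ 50/10 = 5\times\ \text{(without tree)}\\
\frac{2\ \text{MB}}{10\,000} &\le 200\ \text{bytes per sequence}
  &&\Rightarrow\ 50/2 \ge 25\times\ \text{(with tree)}
\end{aligned}
```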

NOTES           Any major database update, especially inserting or
                deleting gaps in an alignment, should be followed by a new
                optimization run.
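
The following sketch shows one plausible reason why gap edits call for re-optimization. It assumes the differences are stored per alignment column. Inserting a gap column shifts every later position, so a consensus computed before the edit no longer lines up with the sequence. The helpers `diff_count` and `insert_gap_column` are hypothetical and only illustrate the effect.

```python
def diff_count(seq: str, consensus: str) -> int:
    """Count columns where seq differs from consensus; surplus columns count too."""
    mismatches = sum(a != b for a, b in zip(seq, consensus))
    return mismatches + abs(len(seq) - len(consensus))


def insert_gap_column(seq: str, pos: int) -> str:
    """Insert a gap character '-' before alignment column pos."""
    return seq[:pos] + "-" + seq[pos:]


consensus = "ACGUACGU"
seq = "ACGUACGA"
before = diff_count(seq, consensus)  # 1 difference

edited = insert_gap_column(seq, 2)  # "AC-GUACGA"
stale = diff_count(edited, consensus)  # columns after the gap are now shifted
fresh = diff_count(edited, insert_gap_column(consensus, 2))  # with an updated consensus
print(before, stale, fresh)  # 1 7 1
```

With the stale consensus, a single gap column turns one stored difference into seven. Recomputing the consensus, which is what a new optimization run would provide, brings it back to one.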

BUGS            No bugs known.