1 | #Please insert up references in the next lines (line starts with keyword UP) |
---|
2 | UP arb.hlp |
---|
3 | UP glossary.hlp |
---|
4 | |
---|
5 | #SUB subtopic.hlp |
---|
6 | |
---|
7 | |
---|
8 | TITLE Optimize database compression |
---|
9 | |
---|
10 | OCCURRENCE ARB_NT |
---|
11 | |
---|
12 | DESCRIPTION Sequence data normally needs a lot of memory. To be able to |
---|
13 | handle thousands of sequences, we implemented on-the-fly |
---|
14 | compression. All data is kept compressed most of the time and is only |
---|
15 | uncompressed on demand. As a user, the only effect you will notice |
---|
16 | is smaller database files. |
---|
17 | Without understanding the data, the program can compress data only |
---|
18 | by a limited factor. With the help of a tree, aligned sequences |
---|
19 | can be compressed much better by storing only the differences |
---|
20 | to a consensus sequence. |
---|
21 | Once a sequence is compressed using a tree, it keeps |
---|
22 | the better compression method until the sequence is changed. After that, only the |
---|
23 | older method is used for it. |
---|
24 | As long as you change only a few (up to 100) sequences, the |
---|
25 | database won't grow very much. |
---|
26 | |
---|
27 | To compress the entire database, the program needs a tree, |
---|
28 | which should cover most of the sequences. The larger and better |
---|
29 | the tree, the better the compression. |
---|
30 | |
---|
31 | EXAMPLE 10000 aligned 16S sequences need 50 megabytes of memory. |
---|
32 | Without your help, ARB will reduce them to 10 megabytes, |
---|
33 | and given a tree, no more than 2 megabytes will be needed. |
---|
34 | |
---|
35 | |
---|
36 | NOTES Any major database update, especially inserting or deleting |
---|
37 | gaps in an alignment, should be followed by a new optimization |
---|
38 | step. |
---|
39 | |
---|
40 | BUGS No bugs known |
---|
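
The idea from the DESCRIPTION of storing only the differences to a consensus sequence can be illustrated with a minimal sketch. This is not ARB's actual implementation: the function names (consensus, encode, decode) are hypothetical, and the sketch builds one consensus over all sequences, whereas the text above implies that a tree is used to find closely related sequences first.

```python
from collections import Counter


def consensus(seqs):
    """Most common character in each alignment column (assumes equal lengths)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def encode(seq, cons):
    """Keep only the (position, character) pairs where seq differs from cons."""
    return [(i, c) for i, (c, k) in enumerate(zip(seq, cons)) if c != k]


def decode(diffs, cons):
    """Rebuild the original sequence from the consensus plus its differences."""
    out = list(cons)
    for i, c in diffs:
        out[i] = c
    return "".join(out)


if __name__ == "__main__":
    # Toy aligned sequences; '-' marks an alignment gap.
    seqs = ["ACGU-ACGU", "ACGU-ACGA", "ACGUUACGU"]
    cons = consensus(seqs)
    for s in seqs:
        diffs = encode(s, cons)
        assert decode(diffs, cons) == s  # round trip is lossless
        print(s, "->", diffs)
```

The more similar the sequences that share a consensus, the shorter the difference lists become. This is consistent with the statement above that a larger and better tree gives better compression.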