# main topics:
UP arb.hlp
UP glossary.hlp

# sub topics:
#SUB subtopic.hlp

# format described in ../help.readme


TITLE ARB: Database

OCCURRENCE ARB_NT

DESCRIPTION
        A central database of sequences and additional information
        (taken from public databases or supplied by the user) is stored
        in a binary or ASCII file (*.arb) and, in future releases, in
        archive and delta files.
        The database reader auto-detects binary or ASCII mode.
        Advantages (+) and disadvantages (-) of the different file types:

        binary with fast-load file:

            (+) very fast
            (+) runs on slow and old computers
            (-) needs a lot of hard disk space
            => for normal operation on old machines

        binary:

            (+) very fast
            (+) small (compression rate: 60%-95%)
            => for normal operation

        ASCII:

            (+) editable by standard text editors
            (+) information can be extracted by hand
            (-) needs an extreme amount of hard disk space
            => to check and correct a database


        All ARB tools for database handling and most of the ARB tools
        for data analysis act directly upon the database. The database
        is kept consistent at all times: any local modification made by
        an individual ARB tool is immediately propagated to the database
        and to all other active tools.

        All changes exist in memory only until you save the database
        (see LINK{save.hlp}).

NOTES ASCII format

        DATA FORMAT

            [xxx]      means xxx is optional
            [xxx]*     means xxx is optional and can occur any number of times
            xxx|yyy    means xxx or yyy
            //         starts a comment

        ARBDB HIERARCHY

        ARBDB is a hierarchical database system; its hierarchy is
        summarized below:

        ARBDB ::= species_data                 // container holding all species
                  presets                      // global alignment and field information
                  [extended_data]              // all SAIs
                  [tmp]                        // temporary data
                  [tree_data]                  // all trees
                  ...                          // user-defined entries (programmers)

        species_data ::= [species]*

        extended_data ::= [extended]*

        gene_data ::= [gene]*                  // container for genes (local to a species)


        species ::= 'name'                     // species identifier
                    ['full_name']
                    ...                        // (end-)user-defined fields
                    [ali_xxx]                  // the alignment container(s)
                    [gene_data]                // container holding genes

        extended ::=                           // analogous to species

        gene ::=                               // analogous to species

        ali_xxx ::= 'data'                     // the sequence
                    ...                        // additional sequence information

        presets ::= 'use'                      // default alignment
                    [alignment]*
                    [key_data]                 // description of the user-defined keys

        alignment ::= 'alignment_name'           // name of the alignment (prefix 'ali_')
                      'alignment_len'            // length of the longest sequence
                      'alignment_write_security' // default write security
                      'alignment_type'           // e.g. dna, rna or pro
                      'aligned'                  // 1 if all sequences have the same
                                                 // length, else 0
        key_data ::= [key]*

        key ::= 'key_name'                     // name of a user-defined field
                'key_type'                     // type (e.g. 12=string, 3=int)

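        To make the hierarchy concrete, here is a minimal Python sketch
        that models it with nested dicts and lists. This model is an
        assumption for illustration only, not the ARBDB C++ API. It derives
        the presets fields 'alignment_len' (length of the longest sequence)
        and 'aligned' (1 if all sequences have the same length) from the
        'data' fields of one alignment, defaulting to the alignment named in
        presets/'use'. The name alignment_summary is hypothetical.

```python
from typing import Optional

# Hypothetical in-memory model of the hierarchy (NOT the ARBDB API):
# containers are dicts, repeated containers such as [species]* are lists.
example_db = {
    "species_data": [
        {"name": "EscCol10", "ali_23all": {"data": "..ACGU.."}},
        {"name": "EscCol11", "ali_23all": {"data": "..ACGUGG.."}},
        {"name": "NoSeq"},                      # [ali_xxx] is optional
    ],
    "presets": {"use": "ali_23all"},            # default alignment
}


def alignment_summary(db: dict, ali_name: Optional[str] = None) -> dict:
    """Compute 'alignment_len' and 'aligned' for one alignment."""
    ali = ali_name or db["presets"]["use"]      # fall back to presets/use
    lengths = [
        len(species[ali]["data"])
        for species in db["species_data"]
        if ali in species                        # skip species without this alignment
    ]
    return {
        "alignment_len": max(lengths, default=0),       # longest sequence
        "aligned": 1 if len(set(lengths)) <= 1 else 0,  # all lengths equal?
    }


print(alignment_summary(example_db))   # {'alignment_len': 10, 'aligned': 0}
```

        With no sequences at all, this sketch reports aligned = 1; how ARB
        itself handles that case is not described here.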
        *******************************************
        *************** ASCII BASIC **************
        *******************************************

        Note:

        - /* xxx */ marks a comment, which is ignored when reading.

        - The data format is described by a grammar.
          All terminal symbols are enclosed in single quotes (').

        ASCII ::= ['/*ARBDB ASCII*/']
                  [FIELD]*

        FIELD ::= KEY [PROTECTION] [TYPE] VALUE
                  |
                  KEY [PROTECTION] '%%' '(%'
                      [FIELD]*
                  '%)' /* Comment */


        KEY ::= 'any string of a-z|A-Z|0-9|"_"'
                2 < |KEY| < 256                // length of KEY in characters

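        A KEY can be checked directly against this rule. The Python sketch
        below takes the length bounds literally as written (more than 2 and
        fewer than 256 characters); whether ARB itself enforces exactly
        these limits is not stated here. The name is_valid_key is
        hypothetical. Note that strings such as "ali_23all/data" in the
        key_data example are values of 'key_name', not KEYs themselves.

```python
import re

# Allowed characters per the KEY rule: a-z, A-Z, 0-9 and '_' (ASCII only).
KEY_CHARS = re.compile(r"[A-Za-z0-9_]+")


def is_valid_key(key: str) -> bool:
    """Return True if `key` satisfies the KEY rule as written above."""
    # 2 < |KEY| < 256, i.e. 3 to 255 characters
    return 2 < len(key) < 256 and KEY_CHARS.fullmatch(key) is not None


for candidate in ("full_name", "ali_23all", "id", "ali-23all"):
    print(candidate, is_valid_key(candidate))
```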
        PROTECTION ::= ':' 'delete protection level' 'write protection level' '00'
                                               // '00' is reserved for future use

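        Read this way, the token ':7600' on the 'name' fields in the example
        below means delete protection level 7 and write protection level 6,
        and ':5000' on the 'species' containers means delete level 5 and
        write level 0. A minimal Python sketch of that split follows; it
        assumes each level is a single decimal digit, as in every example
        here. The name parse_protection is hypothetical.

```python
import re
from typing import NamedTuple

# ':' + delete level digit + write level digit + two reserved digits.
# Assumption: each protection level is a single decimal digit.
PROTECTION_RE = re.compile(r":([0-9])([0-9])([0-9]{2})")


class Protection(NamedTuple):
    delete_level: int
    write_level: int
    reserved: str       # '00' in all examples; reserved for future use


def parse_protection(token: str) -> Protection:
    """Split a PROTECTION token such as ':7600' into its components."""
    m = PROTECTION_RE.fullmatch(token)
    if m is None:
        raise ValueError(f"not a PROTECTION token: {token!r}")
    return Protection(int(m.group(1)), int(m.group(2)), m.group(3))


print(parse_protection(":7600"))
# Protection(delete_level=7, write_level=6, reserved='00')
```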
        TYPE ::=   '%s'                        // STRING
                 | '%i'                        // INTEGER
                 | '%f'                        // FLOAT
                 | '%N'                        // BYTES
                 | '%I'                        // BITS
                 | '%F'                        // FLOATS


        VALUE ::=   '"string"' | '"^Astring^A"' | 'string'   // type = STRING
                  | 'int_number'                            // type = INTEGER
                  | 'real_number'                           // type = FLOAT
                  | 'coded bytestring'                      // type = BYTES, FLOATS,
                                                            //        BITS


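        For single-line fields such as those in the example below, TYPE and
        VALUE can be handled together. The Python sketch below uses the
        standard shlex module to split a line and converts INTEGER and FLOAT
        values; the coded bytestring types (BYTES, BITS, FLOATS) are
        returned unchanged because their encoding is not described here. It
        assumes that a field without a TYPE tag is a STRING, which matches
        the example. The name parse_simple_field is hypothetical.

```python
import shlex

TYPE_NAMES = {"%s": "STRING", "%i": "INTEGER", "%f": "FLOAT",
              "%N": "BYTES", "%I": "BITS", "%F": "FLOATS"}


def parse_simple_field(line: str):
    """Parse one 'KEY [PROTECTION] [TYPE] VALUE' line (no containers).

    Returns (key, protection or None, type tag, converted value).
    """
    parts = shlex.split(line)            # splits on blanks, strips "..." quotes
    if len(parts) < 2:
        raise ValueError(f"expected KEY and VALUE in {line!r}")
    key, rest = parts[0], parts[1:]
    protection = None
    if rest[0].startswith(":"):          # optional PROTECTION, e.g. ':7600'
        protection, rest = rest[0], rest[1:]
    type_tag = "%s"                      # assumption: no TYPE means STRING
    if rest and rest[0] in TYPE_NAMES:   # optional TYPE tag
        type_tag, rest = rest[0], rest[1:]
    if len(rest) != 1:
        raise ValueError(f"expected exactly one VALUE in {line!r}")
    raw = rest[0]
    if type_tag == "%i":
        value = int(raw)
    elif type_tag == "%f":
        value = float(raw)
    else:
        value = raw                      # STRING, or coded bytestring kept as-is
    return key, protection, type_tag, value


print(parse_simple_field('name :7600 "EscCol10"'))
# ('name', ':7600', '%s', 'EscCol10')
print(parse_simple_field("alignment_len %i 4205"))
# ('alignment_len', None, '%i', 4205)
```

        Because shlex removes the quotes, a quoted STRING that looks like a
        PROTECTION or TYPE token would be misread, and the "^Astring^A"
        form is not handled; lines with an unterminated quote make shlex
        raise ValueError.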
EXAMPLES None

        *******************************************
        ************** ASCII EXAMPLE *************
        *******************************************

# PREFORMATTED WIDTH 110
        /*ARBDB ASCII*/
        species_data %% (%
            species :5000 %% (%
                name :7600 "EscCol10"
                file "ecrna3.empro"
                full_name "Escherichia coli"
                acc "V00331;"
                ali_23all :5000 %% (%
                    data :7500 "...........ACGTUUU...........
                    mark %I "---------------++++---------
                %) /*ali_23all*/
            %) /*species*/

            species :5000 %% (%
                name :7600 "EscCol11"
                file "ecrr23s.empro"
                full_name "Escherichia coli"
                ali_23all :5000 %% (%
                    data :7500 "...........ACGTUUUGGG.......
                    mark %I "---------------++++---------
                %) /*ali_23all*/
            %) /*species*/
        %) /*species_data*/
        presets %% (%
            use "ali_23all"
            max_alignment_len %i 2000
            alignment_len %i 0
            max_name_len %i 9
            alignment %% (%
                alignment_name "ali_23all"
                alignment_len %i 4205
                aligned %i 1
                alignment_write_security %i 5
                alignment_type "rna"
            %) /*alignment*/
            key_data %% (%
                key %% (%
                    key_name "name"
                    key_type %i 12
                %) /*key*/
                key %% (%
                    key_name "group_name"
                    key_type %i 12
                %) /*key*/
                key %% (%
                    key_name "acc"
                    key_type %i 12
                %) /*key*/
                key %% (%
                    key_name "ali_23all/data"
                    key_type %i 12
                %) /*key*/
                key %% (%
                    key_name "ali_23all/mark"
                    key_type %i 6
                %) /*key*/
                key %% (%
                    key_name "aligned"
                    key_type %i 12
                %) /*key*/
                key %% (%
                    key_name "author"
                    key_type %i 12
                %) /*key*/
            %) /*key_data*/
        %) /*presets*/
        tree_data %% (%
            tree_main :4400 %% (%
                nnodes %i 2
                tree "N0.014808,0.015168;N0.000360,0.000360;LEscCol10^ALEscColi^ALEscCol11^A"
                ruler %% (%
                    size %f 0.100000
                    RADIAL %% (%
                        ruler_y %f 0.341577
                        ruler_x %f 0.000000
                    %) /*RADIAL*/
                    text_x %f 0.000000
                    text_y %f 0.000000
                    ruler_width %i 0
                    LIST %% (%
                        ruler_y %f 0.000000
                        ruler_x %f 0.000000
                    %) /*LIST*/
                %) /*ruler*/
            %) /*tree_main*/
        %) /*tree_data*/
        extended_data :7000 %% (%
            extended %% (%
                name "HELIX_PAIRS"
                ali_23all %% (%
                    data "............................1a..
                %) /*ali_23all*/
            %) /*extended*/
            extended %% (%
                name "gpl5rr"
                ali_23all %% (%
                    phyl_options %N 10000106D02:0C03.0D02-07.87.DB6
                    bits %I "-----------------------+++++++++-+-+++
                    floats %F 10000106D04:0A.C816.425C03.5D.802F.BF03
                %) /*ali_23all*/
            %) /*extended*/
        %) /*extended_data*/
        tmp %% (%
            focus %% (%
                species_name "EscColi"
                cursor_position %i 323
            %) /*focus*/
            message ""
        %) /*tmp*/
# PREFORMATTED RESET


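        The nesting rule of FIELD ('%%' '(%' [FIELD]* '%)') maps naturally
        onto a recursive-descent parser. Below is a minimal Python sketch,
        not ARB's own reader: it tokenizes with a regular expression, skips
        /* */ comments, and returns each FIELD as a (key, protection, type,
        value) tuple, with nested containers as lists of such tuples.
        Assumptions: quoted strings contain no double quotes (the
        "^Astring^A" form is not handled), a field without a TYPE is a
        STRING, and values are left as raw text. The sequence strings in the
        example above are shown truncated (without a closing quote), so the
        sketch is run on a shortened, well-formed excerpt instead. The names
        tokenize and parse_fields are hypothetical.

```python
import re

# One alternative per token kind; whitespace and /* */ comments have no
# named group and are skipped. Assumption: quoted strings contain no '"'.
TOKEN_RE = re.compile(r'''
      \s+                         # whitespace
    | /\*.*?\*/                   # comment
    | (?P<open>\(%)               # '(%'
    | (?P<close>%\))              # '%)'
    | (?P<container>%%)           # '%%'
    | (?P<type>%[sifNIF])         # TYPE tag
    | (?P<prot>:[0-9]{4})         # PROTECTION
    | (?P<string>"[^"]*")         # quoted string
    | (?P<word>[^\s"]+)           # KEY or unquoted VALUE
''', re.VERBOSE | re.DOTALL)


def tokenize(text: str) -> list:
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:                            # e.g. an unterminated quote
            raise ValueError(f"cannot tokenize at offset {pos}")
        pos = m.end()
        if m.lastgroup is not None:              # None for whitespace/comments
            tokens.append((m.lastgroup, m.group()))
    return tokens


def parse_fields(tokens: list, i: int = 0, nested: bool = False):
    """Parse [FIELD]* from tokens[i:]; return (fields, index after them)."""
    fields = []
    while i < len(tokens):
        kind, text = tokens[i]
        if kind == "close":                      # end of a container body
            if not nested:
                raise ValueError("unexpected '%)'")
            return fields, i + 1
        if kind != "word":
            raise ValueError(f"expected KEY, found {text!r}")
        key, i = text, i + 1
        prot = None
        if i < len(tokens) and tokens[i][0] == "prot":
            prot, i = tokens[i][1], i + 1
        if i < len(tokens) and tokens[i][0] == "container":
            if i + 1 >= len(tokens) or tokens[i + 1][0] != "open":
                raise ValueError("expected '(%' after '%%'")
            children, i = parse_fields(tokens, i + 2, nested=True)
            fields.append((key, prot, "container", children))
            continue
        type_tag = "%s"                          # assumption: default STRING
        if i < len(tokens) and tokens[i][0] == "type":
            type_tag, i = tokens[i][1], i + 1
        if i >= len(tokens) or tokens[i][0] not in ("string", "word"):
            raise ValueError(f"missing VALUE for {key!r}")
        kind, value = tokens[i]
        if kind == "string":
            value = value[1:-1]                  # drop the surrounding quotes
        fields.append((key, prot, type_tag, value))
        i += 1
    if nested:
        raise ValueError("missing '%)'")
    return fields, i


SAMPLE = """/*ARBDB ASCII*/
species_data %% (%
    species :5000 %% (%
        name :7600 "EscCol10"
        full_name "Escherichia coli"
    %) /*species*/
%) /*species_data*/
presets %% (%
    use "ali_23all"
    max_name_len %i 9
%) /*presets*/
"""

fields, _ = parse_fields(tokenize(SAMPLE))
print(fields[1])
# ('presets', None, 'container', [('use', None, '%s', 'ali_23all'), ('max_name_len', None, '%i', '9')])
```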
WARNINGS Loading a database in ASCII format requires a lot of virtual
        memory.
