| 1 | # main topics: |
|---|
| 2 | UP arb.hlp |
|---|
| 3 | UP glossary.hlp |
|---|
| 4 | UP mg_names.hlp |
|---|
| 5 | UP rename.hlp |
|---|
| 6 | |
|---|
| 7 | # sub topics: |
|---|
| 8 | #SUB subtopic.hlp |
|---|
| 9 | |
|---|
| 10 | # format described in ../help.readme |
|---|
| 11 | |
|---|
| 12 | |
|---|
| 13 | TITLE ARB NAMESERVER / Synchronize IDs |
|---|
| 14 | |
|---|
| 15 | OCCURRENCE ARB_NT/Species/Synchronize IDs |
|---|
| 16 | ARB_MERGE/Check IDs/Synchronize |
|---|
| 17 | |
|---|
| 18 | It's also used by several functions that create new species (e.g. after import). |
|---|
| 19 | |
|---|
| 20 | DESCRIPTION Automatically creates unique identifiers (=shortnames stored in field 'name') for |
|---|
| 21 | species entries in the database. |
|---|
| 22 | ARB requires that all species have distinct, unique IDs; otherwise |
|---|
| 23 | ARB will misbehave in many ways! |
|---|
| 24 | |
|---|
| 25 | Individual species entries are normally distinguished and identified by their |
|---|
| 26 | accession numbers. |
|---|
| 27 | |
|---|
| 28 | The unique IDs are created using information from the 'full_name'. |
|---|
| 29 | |
|---|
| 30 | Usually, the first three letters are taken from the genus designation, |
|---|
| 31 | the remaining letters from the species name. |
|---|
| 32 | |
|---|
| 33 | These tasks (identification and ID generation) are handled by the |
|---|
| 34 | so-called NAMESERVER. |
|---|
| 35 | |
|---|
| 36 | If there are duplicate (i.e. indistinguishable) species entries, |
|---|
| 37 | the different versions are indicated by appending a dot followed |
|---|
| 38 | by a running number, e.g. "DicTherm.2", "DicTherm.3", ... |
|---|
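The naming pattern described above can be illustrated with a minimal Python sketch. It is not the NAMESERVER's actual algorithm: the helper `make_short_id`, the 3+5 letter split (inferred from the "DicTherm" example) and the rule that the first entry keeps the plain ID are all assumptions made for illustration.

```python
def make_short_id(full_name, taken):
    """Build an ID from a 'Genus species' full name (hypothetical helper)."""
    parts = full_name.split()
    genus = parts[0] if parts else ""
    species = parts[1] if len(parts) > 1 else ""
    # Assumption: 3 letters of the genus + up to 5 letters of the species,
    # as suggested by the "DicTherm" example.
    base = genus[:3].capitalize() + species[:5].capitalize()
    count = taken.get(base, 0) + 1
    taken[base] = count
    # Assumption: the first entry keeps the plain ID, later
    # indistinguishable entries get ".2", ".3", ...
    return base if count == 1 else f"{base}.{count}"


taken = {}  # base ID -> number of entries seen so far
ids = [make_short_id("Dictyoglomus thermophilum", taken) for _ in range(3)]
print(ids)
```

The sketch only counts repeated entries. How the real NAMESERVER assigns IDs to distinguishable species whose names happen to abbreviate identically is not covered here.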
| 39 | |
|---|
| 40 | NOTES The IDs are stored with the database. They are protected against change |
|---|
| 41 | to avoid assigning the same ID to different species. |
|---|
| 42 | |
|---|
| 43 | Accession numbers (stored in the field 'acc') are normally imported |
|---|
| 44 | from public databases together with the sequence data. |
|---|
| 45 | If no accession number is found during import (e.g. because the sequence |
|---|
| 46 | has not yet been published), ARB will automatically generate an accession |
|---|
| 47 | number ("ARB_" followed by a CRC-32 checksum of the sequence data). |
|---|
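To show why a checksum-based accession number is identical for identical sequences, here is a small Python sketch. The function name `generated_acc`, the ASCII encoding of the sequence and the 8-digit uppercase hex formatting are assumptions. The text only states that the number is "ARB_" followed by a CRC-32 checksum of the sequence data.

```python
import zlib


def generated_acc(sequence):
    """Return an 'ARB_' + CRC-32 accession for a sequence (illustrative only)."""
    # Assumption: the checksum is taken over the raw sequence characters
    # and written as 8 uppercase hex digits.
    checksum = zlib.crc32(sequence.encode("ascii")) & 0xFFFFFFFF
    return f"ARB_{checksum:08X}"


a = generated_acc("ACGUACGU")
b = generated_acc("ACGUACGU")  # identical sequence -> identical accession
c = generated_acc("ACGUACGA")  # one base differs -> different accession
print(a == b, a == c)
```

Two entries with the same sequence and no imported accession number therefore end up with the same generated accession number and cannot be told apart by it.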
| 48 | |
|---|
| 49 | SECTION Duplicate IDs |
|---|
| 50 | |
|---|
| 51 | "Synchronize IDs" will create duplicate names whenever it fails to distinguish |
|---|
| 52 | between two or more species. |
|---|
| 53 | If you get a warning about duplicate entries, you REALLY should try |
|---|
| 54 | to understand why it happens! |
|---|
| 55 | |
|---|
| 56 | The following are some situations in which you will run into that problem, |
|---|
| 57 | with instructions on how to solve it: |
|---|
| 58 | |
|---|
| 59 | 1. You've imported multiple IDENTICAL sequences without accession numbers. The |
|---|
| 60 | accession numbers generated by ARB will be identical as well, and |
|---|
| 61 | "Synchronize IDs" will complain about duplicate species. |
|---|
| 62 | |
|---|
| 63 | Consider removing the duplicated species. Normally duplicated information |
|---|
| 64 | isn't very useful. If this is not an option for you, you may instead manually |
|---|
| 65 | change the accession numbers of the duplicated species (if you understand |
|---|
| 66 | the implications). |
|---|
| 67 | |
|---|
| 68 | 2. You've imported several genes from one organism and each of them |
|---|
| 69 | was assigned the same accession number (the acc of the organism). |
|---|
| 70 | |
|---|
| 71 | Use an additional field to make your species entries distinguishable |
|---|
| 72 | (e.g. a field containing the start-position of each gene). |
|---|
| 73 | You can configure whether the NAMESERVER uses an additional field, |
|---|
| 74 | and which one (see LINK{namesadmin.hlp}). |
|---|
| 75 | |
|---|
| 76 | SECTION NAMESERVER |
|---|
| 77 | |
|---|
| 78 | The NAMESERVER stores the associations between the unique IDs and |
|---|
| 79 | species entries (represented by the accession number and optionally |
|---|
| 80 | an additional field) in the NAMESERVER-database. The standard nameserver |
|---|
| 81 | uses the file '$ARBHOME/lib/nas/names.dat' as its database. |
|---|
| 82 | |
|---|
| 83 | For more details refer to the active arb_tcp.dat (Tools/Nameserver admin/Configure arb_tcp.dat). |
|---|
| 84 | |
|---|
| 85 | If you have multiple databases containing common species, synchronizing IDs for |
|---|
| 86 | all these databases will generate identical IDs for |
|---|
| 87 | identical species (as long as you use the same NAMESERVER-database). |
|---|
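The association described in this section can be pictured as a lookup table keyed by accession number plus the optional additional field. The following Python sketch is a simplified stand-in, not the real NAMESERVER or the names.dat format. The class `NameServerSketch`, its `lookup` method, the accession "X12345" and the proposed IDs are all hypothetical.

```python
class NameServerSketch:
    """Hypothetical in-memory stand-in for the NAMESERVER-database."""

    def __init__(self):
        self.ids = {}  # (acc, additional field value) -> unique ID

    def lookup(self, acc, extra, proposed_id):
        # A known entry always gets the ID stored earlier;
        # an unknown entry is stored under the proposed ID.
        return self.ids.setdefault((acc, extra), proposed_id)


server = NameServerSketch()
# Same species seen in two databases: the stored ID wins the second time.
first = server.lookup("X12345", "", "EscColi")
second = server.lookup("X12345", "", "SomeOthr")
# Same accession but a different additional field value (e.g. a gene
# start position) is treated as a separate entry.
gene = server.lookup("X12345", "1500", "EscColi2")
print(first, second, gene)
```

This is why two databases synchronized against the same NAMESERVER-database receive identical IDs for identical species, and why an additional field makes entries with a shared accession number distinguishable.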
| 88 | |
|---|
| 89 | SECTION Central NAMESERVER |
|---|
| 90 | |
|---|
| 91 | It is possible to link names.dat to a central names.dat, but you should |
|---|
| 92 | be aware that temporary inconsistencies may occur if multiple users |
|---|
| 93 | use the NAMESERVER at the same time. |
|---|
| 94 | |
|---|
| 95 | The NAMESERVER examines names.dat and terminates within 5-10 seconds if |
|---|
| 96 | the file changes. A message is written to the console window in either case. |
|---|
| 97 | |
|---|
| 98 | Another way to use a central NAMESERVER is to edit '$ARBHOME/lib/arb_tcp.dat' |
|---|
| 99 | and to specify a central host for ARB_NAME_SERVER. |
|---|
| 100 | This completely avoids any inconsistencies, but if too many users try to access |
|---|
| 101 | that nameserver at the same time, you'll run into DoS (denial-of-service) problems. |
|---|
| 102 | |
|---|
| 103 | EXAMPLES None |
|---|
| 104 | |
|---|
| 105 | WARNINGS None |
|---|
| 106 | |
|---|
| 107 | BUGS No bugs known |
|---|