| 1 | # main topics: |
|---|
| 2 | UP arb.hlp |
|---|
| 3 | UP arb_ntree.hlp |
|---|
| 4 | UP arb_import.hlp |
|---|
| 5 | UP pfold.hlp |
|---|
| 6 | UP glossary.hlp |
|---|
| 7 | |
|---|
| 8 | # sub topics: |
|---|
| 9 | |
|---|
| 10 | # format described in ../help.readme |
|---|
| 11 | |
|---|
| 12 | |
|---|
| 13 | TITLE NOTES: dssp |
|---|
| 14 | |
|---|
| 15 | OCCURRENCE ARB_IMPORT |
|---|
| 16 | |
|---|
| 17 | DESCRIPTION |
|---|
| 18 | |
|---|
| 19 | See NOTES in LINK{arb_import.hlp} for how to reactivate disabled import filters. |
|---|
| 20 | |
|---|
| 21 | The three filters 'dssp_all.ift', 'dssp_2nd_struct.ift' and 'dssp_sequence.ift' |
|---|
| 22 | import protein secondary structure information and/or amino acid sequences |
|---|
| 23 | from DSSP files. Some of the associated information is also extracted. |
|---|
| 24 | The following fields are created (see also the example below): |
|---|
| 25 | - name: [PDB ID]_[Chain char] (extracted from 'HEADER' and the optional chain |
|---|
| 26 | character in 'RESIDUE') |
|---|
| 27 | - full_name: [PDB ID] (extracted from 'HEADER') Chain [Chain char] (extracted |
|---|
| 28 | from the optional chain character in 'RESIDUE'); [Description] (extracted |
|---|
| 29 | from 'HEADER' and 'COMPND') |
|---|
| 30 | - tax: [Organism description] (extracted from 'SOURCE') |
|---|
| 31 | - author: [Author(s)] (extracted from 'AUTHOR') |
|---|
| 32 | - date: [Date] (extracted from 'HEADER') |
|---|
| 33 | - remark: [Remark] (extracted from headline and 'REFERENCE') |
|---|
| 34 | - ali_[alignment name]/data: [Amino acid sequence or secondary structure] |
|---|
| 35 | (extracted from 'AA' or 'STRUCTURE') |
|---|
| 36 | - sec_struct: [Secondary structure] (extracted from 'STRUCTURE') |
|---|
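As an illustration of the field list above, the following is a minimal Python sketch of how 'name', 'full_name' and 'date' could be composed from a 'HEADER' line. It is not the filter implementation: the function names 'parse_header' and 'make_fields' are hypothetical, and the sketch assumes whitespace-separated tokens with the PDB ID last and the date second to last (as in the example in this file). It does not cover the case where the optional chain character is missing.

```python
def parse_header(line):
    """Split a DSSP 'HEADER' line into (description, date, pdb_id).

    Assumption: tokens are whitespace-separated, the line may end with a
    lone '.', the PDB ID is the last token and the date the one before it.
    """
    tokens = line.split()
    if tokens and tokens[-1] == ".":
        tokens = tokens[:-1]  # drop the trailing '.' marker
    if len(tokens) < 4 or tokens[0] != "HEADER":
        raise ValueError("not a HEADER line: %r" % line)
    pdb_id = tokens[-1]
    date = tokens[-2]
    description = " ".join(tokens[1:-2])
    return description, date, pdb_id


def make_fields(header_line, compnd_text, chain):
    """Build the 'name', 'full_name' and 'date' fields for one chain."""
    description, date, pdb_id = parse_header(header_line)
    return {
        "name": "%s_%s" % (pdb_id, chain),
        "full_name": "%s Chain %s; %s; %s"
                     % (pdb_id, chain, description, compnd_text),
        "date": date,
    }


fields = make_fields(
    "HEADER RNA BINDING PROTEIN 22-NOV-99 1DG1 .",
    "MOLECULE: ELONGATION FACTOR TU",
    "G",
)
print(fields["name"])
print(fields["full_name"])
```

With the 'HEADER' and 'COMPND' values from the example in this file, the sketch yields the same 'name' and 'full_name' strings as the example database entry.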
| 37 | |
|---|
| 38 | SECTION The DSSP code |
|---|
| 39 | |
|---|
| 40 | - H = alpha helix |
|---|
| 41 | - B = residue in isolated beta-bridge |
|---|
| 42 | - E = extended strand, participates in beta ladder |
|---|
| 43 | - G = 3-helix (3/10 helix) |
|---|
| 44 | - I = 5-helix (pi helix) |
|---|
| 45 | - T = hydrogen bonded turn |
|---|
| 46 | - S = bend |
|---|
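The code table above can be written down as a small lookup, for instance to summarize an imported 'sec_struct' string. This is only a sketch: the names 'DSSP_CODES' and 'summarize' are hypothetical and not part of ARB, and characters that are not in the table are simply skipped.

```python
from collections import Counter

# One-letter DSSP codes as listed in the section above.
DSSP_CODES = {
    "H": "alpha helix",
    "B": "residue in isolated beta-bridge",
    "E": "extended strand, participates in beta ladder",
    "G": "3-helix (3/10 helix)",
    "I": "5-helix (pi helix)",
    "T": "hydrogen bonded turn",
    "S": "bend",
}


def summarize(sec_struct):
    """Count how often each known DSSP code occurs in a structure string."""
    return Counter(ch for ch in sec_struct if ch in DSSP_CODES)


counts = summarize("--EEEEEEE-STTSSHHHH")
for code, n in sorted(counts.items()):
    print(code, n, DSSP_CODES[code])
```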
| 47 | |
|---|
| 48 | NOTES |
|---|
| 49 | |
|---|
| 50 | - If a protein consists of several chains, these are extracted individually |
|---|
| 51 | and stored as different species. |
|---|
| 52 | - The filter 'dssp_2nd_struct.ift' fills 'ali_[alignment name]/data' with |
|---|
| 53 | the protein secondary structure, whereas 'dssp_sequence.ift' and |
|---|
| 54 | 'dssp_all.ift' fill it with the amino acid sequence. |
|---|
| 55 | - The field 'sec_struct' is only used by the filter 'dssp_all.ift'. |
|---|
| 56 | - Gap characters ('-') are inserted where no secondary structure is present. |
|---|
| 57 | - The DSSP files are first piped through the script 'format_dssp.pl' |
|---|
| 58 | (in "$ARBHOME/ARB/PERL_SCRIPTS/ARBTOOLS/IFTHELP") to format the files |
|---|
| 59 | for use with the filters 'dssp_all2.ift2', 'dssp_2nd_struct2.ift2' and |
|---|
| 60 | 'dssp_sequence2.ift2'. |
|---|
| 61 | - The reference for DSSP can be found in LINK{pfold.hlp}, section |
|---|
| 62 | 'REFERENCES' [2]. |
|---|
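Two of the notes above (one species per chain, and '-' where no secondary structure is present) can be sketched in Python as follows. This is an illustration under assumptions, not the code of 'format_dssp.pl' or of the filters: the function name 'split_chains' is hypothetical, and the residue records are assumed to be already parsed into (chain, amino acid, structure code) tuples.

```python
def split_chains(residues):
    """Group residue records by chain.

    'residues' is an iterable of (chain, amino_acid, structure_code)
    tuples; an empty or blank structure code means that no secondary
    structure was assigned. Returns a dict mapping each chain character
    to a (sequence, sec_struct) pair of strings.
    """
    chains = {}
    for chain, aa, ss in residues:
        seq, struct = chains.setdefault(chain, ([], []))
        seq.append(aa)
        # Insert a gap character where no secondary structure is present.
        struct.append(ss if ss.strip() else "-")
    return {c: ("".join(s), "".join(t)) for c, (s, t) in chains.items()}


records = [
    ("G", "K", " "),
    ("G", "P", " "),
    ("G", "H", "E"),
    ("A", "M", "H"),  # hypothetical second chain, for illustration only
]
for chain, (seq, struct) in split_chains(records).items():
    print(chain, seq, struct)
```

Each key of the returned dict corresponds to one chain, that is, to one species entry in the sense of the first note.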
| 63 | |
|---|
| 64 | EXAMPLES |
|---|
| 65 | |
|---|
| 66 | The DSSP format looks like this: |
|---|
| 67 | |
|---|
| 68 | # PREFORMATTED WIDTH 144 |
|---|
| 69 | ==== Secondary Structure Definition by the program DSSP, updated CMBI version by ElmK / April 1,2000 ==== DATE=27-JUN-2003 . |
|---|
| 70 | REFERENCE W. KABSCH AND C.SANDER, BIOPOLYMERS 22 (1983) 2577-2637 . |
|---|
| 71 | HEADER RNA BINDING PROTEIN 22-NOV-99 1DG1 . |
|---|
| 72 | COMPND 2 MOLECULE: ELONGATION FACTOR TU; . |
|---|
| 73 | SOURCE 2 ORGANISM_SCIENTIFIC: ESCHERICHIA COLI; . |
|---|
| 74 | AUTHOR K.ABEL,M.YODER,R.HILGENFELD,F.JURNAK . |
|---|
| 75 | ... |
|---|
| 76 | ... |
|---|
| 77 | # RESIDUE AA STRUCTURE BP1 BP2 ACC N-H-->O O-->H-N N-H-->O O-->H-N TCO KAPPA ALPHA PHI PSI X-CA Y-CA Z-CA |
|---|
| 78 | 1 9 G K 0 0 143 0, 0.0 65,-0.2 0, 0.0 64,-0.1 0.000 360.0 360.0 360.0 143.2 13.7 48.3 -15.2 |
|---|
| 79 | 2 10 G P - 0 0 38 0, 0.0 65,-2.6 0, 0.0 2,-0.6 -0.404 360.0-137.6 -64.4 148.4 12.2 51.7 -14.1 |
|---|
| 80 | 3 11 G H E -a 67 0A 88 63,-0.2 2,-0.3 191,-0.1 65,-0.2 -0.949 24.0-180.0-114.6 117.8 10.1 51.4 -10.9 |
|---|
| 81 | 4 12 G V E -a 68 0A 0 63,-2.0 65,-2.3 -2,-0.6 2,-0.5 -0.855 18.2-141.7-116.0 149.5 6.8 53.4 -10.8 |
|---|
| 82 | 5 13 G N E +a 69 0A 36 -2,-0.3 86,-2.5 63,-0.2 87,-1.2 -0.949 31.8 154.0-113.1 127.7 4.2 53.5 -8.0 |
|---|
| 83 | 6 14 G V E -ab 70 92A 0 63,-2.5 65,-2.0 -2,-0.5 2,-0.3 -0.820 16.5-171.5-139.8-179.7 0.5 53.6 -8.8 |
|---|
| 84 | 7 15 G G E -ab 71 93A 0 85,-0.5 87,-2.2 63,-0.3 2,-0.3 -0.969 28.1-103.8-167.9 164.5 -2.7 52.6 -7.2 |
|---|
| 85 | 8 16 G T E + b 0 94A 0 63,-1.9 65,-0.4 -2,-0.3 2,-0.3 -0.735 37.7 175.7 -97.6 147.4 -6.4 52.2 -7.9 |
|---|
| 86 | 9 17 G I E + b 0 95A 3 85,-1.9 87,-2.6 -2,-0.3 2,-0.2 -0.962 21.1 98.6-147.2 156.0 -8.8 54.9 -6.7 |
|---|
| 87 | 10 18 G G - 0 0 0 -2,-0.3 87,-0.1 85,-0.2 97,-0.1 -0.829 69.1 -41.6 148.3 177.9 -12.6 55.4 -7.1 |
|---|
| 88 | 11 19 G H S > S- 0 0 35 85,-0.4 3,-1.4 -2,-0.2 5,-0.3 -0.330 72.6 -81.6 -74.1 153.0 -15.9 54.9 -5.4 |
|---|
| 89 | 12 20 G V T 3 S+ 0 0 58 1,-0.2 -1,-0.1 2,-0.1 94,-0.1 -0.094 111.7 15.7 -52.8 150.0 -16.8 51.7 -3.5 |
|---|
| 90 | 13 21 G D T 3 S+ 0 0 145 1,-0.1 -1,-0.2 -3,-0.1 -2,-0.1 0.575 91.3 115.4 59.3 12.4 -17.9 48.6 -5.4 |
|---|
| 91 | 14 22 G H S < S- 0 0 4 -3,-1.4 -2,-0.1 82,-0.1 85,-0.1 0.789 92.2 -97.1 -80.7 -25.6 -16.7 50.1 -8.6 |
|---|
| 92 | 15 23 G G S > S+ 0 0 11 -4,-0.2 4,-2.5 81,-0.1 5,-0.2 0.577 75.0 138.4 123.2 16.6 -14.0 47.5 -9.1 |
|---|
| 93 | 16 24 G K H > S+ 0 0 12 -5,-0.3 4,-2.7 1,-0.2 5,-0.1 0.931 82.3 40.1 -55.3 -48.3 -10.7 48.7 -7.8 |
|---|
| 94 | 17 25 G T H > S+ 0 0 17 2,-0.2 4,-2.3 1,-0.2 -1,-0.2 0.887 114.0 51.1 -70.8 -40.4 -9.8 45.4 -6.2 |
|---|
| 95 | 18 26 G T H > S+ 0 0 30 2,-0.2 4,-2.4 1,-0.2 -1,-0.2 0.899 113.8 47.6 -64.1 -39.4 -11.1 43.1 -9.0 |
|---|
| 96 | 19 27 G L H X S+ 0 0 0 -4,-2.5 4,-2.6 2,-0.2 -2,-0.2 0.955 107.8 53.8 -66.2 -49.6 -9.1 |
|---|
| 97 | ... |
|---|
| 98 | ... |
|---|
| 99 | # PREFORMATTED RESET |
|---|
| 100 | |
|---|
| 101 | The extracted ARB database entry looks like this (for alignment with the |
|---|
| 102 | name 'ali_prot' and imported with 'dssp_all.ift'): |
|---|
| 103 | |
|---|
| 104 | # PREFORMATTED WIDTH 149 |
|---|
| 105 | name S6: 1DG1_G |
|---|
| 106 | full_name S0: 1DG1 Chain G; RNA BINDING PROTEIN; MOLECULE: ELONGATION FACTOR TU |
|---|
| 107 | tax S0: ORGANISM_SCIENTIFIC: ESCHERICHIA COLI |
|---|
| 108 | author S0: K.ABEL,M.YODER,R.HILGENFELD,F.JURNAK |
|---|
| 109 | date S0: 22-NOV-99 |
|---|
| 110 | ali_prot %0: |
|---|
| 111 | ali_prot/data S0: KPHVNVGTIGHVDHGKTTL... |
|---|
| 112 | sec_struct S0: --EEEEEEE-STTSSHHHH... |
|---|
| 113 | remark S0: === Secondary Structure Definition by the program DSSP, updated CMBI version by ElmK / April 1,2000 ==== DATE=22-FEB-2008 |
|---|
| 114 | DSSP program by: W. KABSCH AND C.SANDER, BIOPOLYMERS 22 (1983) 2577-2637 |
|---|
| 115 | ... |
|---|
| 116 | ... |
|---|
| 117 | # PREFORMATTED RESET |
|---|
| 118 | |
|---|
| 119 | WARNINGS None |
|---|
| 120 | |
|---|
| 121 | BUGS No bugs known |
|---|