#663 closed defect (fixed)
consensus calculation is defective/inconsistent
Reported by: | westram | Owned by: | westram |
---|---|---|---|
Priority: | major | Milestone: | arb7.0 |
Component: | Library (other) | Version: | SVN |
Keywords: | Cc: |
Description (last modified by westram)
Arb has two ways to calculate consensi:
- NTREE/SAI/Create SAI using …/Consensus
- Consensus displayed in EDIT4
The code is not shared, because the EDIT4 consensus calculation is intertwined with alignment compression (i.e. hiding gap columns) for better performance.
Results of both differ!
Reproduce:
- load attached
- start EDIT4 with config 'consensus' (contains all 41 marked species)
- compare:
- SAI:consensus (created with method 1.)
- Consensus of group 'all' (method 2.)
- ConsAll2 (same; converted to species; use with 'View/Show only differences to selected')
Consensus-settings used: gaps=off; iupac=on,26%; upper/lower=0/75
Current differences in detail:
Pos | SAI | Group | Col.content | Reason |
12 | M | Y | 1x'Y' | 'Y' is correct |
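The arithmetic used in the "Reason" columns of this ticket (gaps=off; iupac=on,26%; upper/lower=0/75) can be sketched as follows. This is a minimal illustration, not ARB's code: the function name `consensus_char` is hypothetical, the equal splitting of an ambiguity code such as 'Y' over its bases is an assumption, and so is the fallback to the most frequent base when no base reaches the IUPAC threshold. The lower bound of 0 is read here as "always at least lowercase".

```python
from collections import Counter

# Nucleotide IUPAC codes (RNA alphabet), keyed by the set of bases they cover.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("U"): "U",
    frozenset("AG"): "R", frozenset("CU"): "Y", frozenset("AC"): "M", frozenset("GU"): "K",
    frozenset("CG"): "S", frozenset("AU"): "W",
    frozenset("CGU"): "B", frozenset("AGU"): "D", frozenset("ACU"): "H", frozenset("ACG"): "V",
    frozenset("ACGU"): "N",
}
EXPAND = {code: bases for bases, code in IUPAC.items()}


def consensus_char(column, iupac_min=26.0, upper_min=75.0):
    """Consensus character for one alignment column with gaps=off.

    Percentages are relative to the non-gap content of the column.
    Returns None for a column without any base (what ARB emits there is not modelled).
    """
    weights = Counter()
    for ch in column.upper().replace("T", "U"):
        bases = EXPAND.get(ch)
        if bases is None:          # gap or unknown character: ignored (gaps=off)
            continue
        for base in bases:         # ASSUMPTION: ambiguity codes count equally for each base
            weights[base] += 1.0 / len(bases)

    total = sum(weights.values())
    if total == 0:
        return None

    percent = {base: 100.0 * w / total for base, w in weights.items()}
    chosen = frozenset(b for b, p in percent.items() if p >= iupac_min)
    if not chosen:                 # ASSUMPTION: fall back to the most frequent base
        chosen = frozenset([max(percent, key=percent.get)])

    code = IUPAC[chosen]
    covered = sum(percent[b] for b in chosen)
    return code if covered >= upper_min else code.lower()


# Column contents taken from the tables in this ticket:
print(consensus_char("Y"))                                   # pos 12
print(consensus_char("G" * 14 + "C" * 10 + "U" * 3))         # pos 115
print(consensus_char("A" * 14 + "C" * 3 + "G" * 6 + "U" * 9))  # pos 120
```

Under these assumptions the sketch yields the values the ticket treats as correct for gaps=off, e.g. 'Y' for pos 12 and 'S' for pos 115.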
Current differences with gaps=on:
Pos | SAI | Group | Col.content | Reason |
115 | g | s | 14x'G'+10x'C'+3x'U' | C(37%)+G(51%)=S; 24/41=58%⇒lowercase |
120 | a | w | 14x'A'+3x'C'+6x'G'+9x'U' | A(43%)+U(28%)=W; 23/41=56%⇒lowercase |
153 | G | g | 23x'G' | 23/41=56%⇒lowercase |
159 | R | r | 13x'G'+8x'A'+1x'U'+1x'C' | G(56%)+A(34%)=R; 21/41=51%⇒lowercase |
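The upper/lowercase arithmetic in the "Reason" column above divides by all 41 sequences (gaps included), whereas the base percentages are relative to the non-gap content of the column. A minimal sketch of just that case decision, with a hypothetical helper name `cased` and no modelling of when a gap character itself would be emitted:

```python
def cased(code, bases_in_code, counts, n_sequences, upper_min=75.0):
    """Return `code` in upper- or lowercase.

    counts       -- base counts of the column, e.g. {"G": 14, "C": 10, "U": 3}
    n_sequences  -- denominator for the case decision (ASSUMPTION: with gaps=on
                    this is the number of all sequences, gaps included)
    """
    covered = sum(counts.get(base, 0) for base in bases_in_code)
    share = 100.0 * covered / n_sequences
    return code if share >= upper_min else code.lower()


pos115 = {"G": 14, "C": 10, "U": 3}
print(cased("S", "CG", pos115, 41))   # 24/41 = 58% -> lowercase
print(cased("S", "CG", pos115, 27))   # 24/27 = 88% -> uppercase (non-gap denominator)
```

Only the denominator changes between the two calls, which is enough to flip the case for position 115.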
More facts:
- if consensus SAI is calculated with 'gaps=on'
- the differences listed above disappear (i.e. EDIT4 group consensus seems to use gaps=on implicitly)
- starting with pos=154 several positions are uppercase in SAI and lowercase in EDIT4
- make sure remark of CloTyro2 is ignored in method 1!
Differences before [14387]:
Pos | SAI | Group | Col.content | Reason |
6 | G | g | 14x'G'+3x'A'+1x'U' | G(77%) ⇒ 'G' |
12 | M | n | 1x'Y' | both wrong, should be 'Y' |
21 | M | m | 2x'A'+1x'C' | A(66%)+C(33%)=M |
32 | R | r | 18x'A'+12x'G'+6x'C'+3x'U' | A(46%)+G(30%)=R; 46+30=76%⇒uppercase |
69 | G | g | 2x'G' | only 'G' ⇒ should be uppercase |
115 | S | s | 14x'G'+10x'C'+3x'U' | C(37%)+G(51%)=S; 37+51=88%⇒uppercase |
Differences before [14319]:
Pos | SAI | Group | Col.content | Reason |
12 | M | - | 1x'Y' | |
21 | M | - | 2x'A'+1x'C' | A(66%)+C(33%)=M; gaps are off ⇒ '-' is wrong |
69 | G | - | 2x'G' | gaps are off ⇒ '-' is wrong! |
Differences before [14321]:
Pos | SAI | Group | Col.content | Reason |
21 | M | a | 2x'A'+1x'C' | A(66%)+C(33%)=M |
115 | S | g | 14x'G'+10x'C'+3x'U' | C(37%)+G(51%)=S; 37+51=88%⇒uppercase; gaps are off ⇒ 'g' is wrong! |
120 | w | a | 14x'A'+3x'C'+6x'G'+9x'U' | A(43%)+U(28%)=W; 43+28=71%⇒lowercase |
Amino acids
Consensus calculation for amino acids does completely different things in the two implementations:
- EDIT4 calculates simplified amino acid code (as currently shown in IUPAC info window; i.e. 'A' for any of "PAGST" etc.)
- NTREE consensus calculation does not seem to handle any amino-IUPAC-codes
Common IUPAC codes for amino acids (both supported in ARB; but probably insufficient):
- B = D | N
- Z = E | Q
- X = <any>
Some sources also mention J = L | I.
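As a rough illustration of what handling these amino acid ambiguity codes could look like, here is a small sketch. The function name `amino_code` is hypothetical and the behaviour (falling back to 'X' for any other mixture) is an assumption, not a description of what NTREE or EDIT4 do.

```python
# Ambiguity codes listed above; J is included although only some sources mention it.
AMINO_AMBIGUITY = {
    frozenset("DN"): "B",
    frozenset("EQ"): "Z",
    frozenset("IL"): "J",
}


def amino_code(residues):
    """Map the residues found in one column to a single code.

    A single residue is returned unchanged, a known pair gets its
    ambiguity code, anything else becomes 'X' (<any>).
    """
    found = frozenset(r.upper() for r in residues)
    if not found:
        raise ValueError("no residues given")
    if len(found) == 1:
        return next(iter(found))
    return AMINO_AMBIGUITY.get(found, "X")


print(amino_code("DN"))   # B
print(amino_code("DE"))   # X
```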
Attachments (1)
Change History (12)
Changed 9 years ago by westram
comment:1 Changed 9 years ago by westram
- Owner changed from devel to westram
- Status changed from new to _started
comment:2 Changed 9 years ago by westram
- Component changed from NoIdea to Library (other)
- Description modified (diff)
comment:3 Changed 9 years ago by westram
- Description modified (diff)
comment:4 Changed 9 years ago by westram
- Description modified (diff)
comment:5 Changed 9 years ago by westram
- Description modified (diff)
comment:6 Changed 9 years ago by westram
- Description modified (diff)
comment:7 Changed 9 years ago by westram
- Description modified (diff)
comment:8 Changed 9 years ago by westram
- Description modified (diff)
comment:9 Changed 9 years ago by westram
- Description modified (diff)
comment:10 Changed 9 years ago by westram
- Resolution set to fixed
- Status changed from _started to closed
with [14396] (in branch consensus)
demonstrate behavior