#144 closed defect (fixed)
Treeing for protein partial sequence
Reported by: | guest | Owned by: | westram |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | ARB_PARSIMONY | Version: | SVN |
Keywords: | arb6.0-hotfix | Cc: | syan@… |
Description
Hello,
I have a database for a protein, based on nucleotide sequences that were translated to amino acids (aa). About half of the sequences are full-length and half are partial, and the partial sequences all cover a similar range of columns.
I want to first create an aa full-length tree, then add the partial sequences using "ARB parsimony (quick add marked)" with the full-length filter. When I do this, nearly all the partial sequences are grouped together instead of with their full-length relatives.
However, if I build the same full-length tree from the nucleotide sequences and add the partial sequences using the full-length nucleotide filter (just as I do for 16S), the result is reasonable: all the partial sequences end up where they should be.
My guess for the reason: in nucleotide treeing, ARB omits the dots "." (which is what I want), while in aa treeing, ARB treats dots as gaps, which causes the partial sequences to cluster.
In the sample file (I haven't tried to attach it yet; if that doesn't work, I can send it to you), the marked sequences are the partial sequences, and 4 trees are included.
Thanks, Yan Shi (syan@…)
Attachments (2)
Change History (7)
Changed 16 years ago by guest
comment:1 Changed 10 years ago by westram
- Owner changed from devel to westram
- Status changed from new to accepted
comment:2 Changed 10 years ago by westram
- Status changed from accepted to _started
comment:3 in reply to: ↑ description Changed 10 years ago by westram
> I want to first create an aa full-length tree, then add the partial sequences using "ARB parsimony (quick add marked)" with the full-length filter. When I do this, nearly all the partial sequences are grouped together instead of with their full-length relatives.
> However, if I build the same full-length tree from the nucleotide sequences and add the partial sequences using the full-length nucleotide filter (just as I do for 16S), the result is reasonable: all the partial sequences end up where they should be.
ARB provides a special function (in ARB_PARSIMONY) for adding "partial sequences", which
- ignores gap penalties and
- detects the best-matching full sequence for each partial sequence
Using this function on your AA alignment results in reasonable placement of the partial sequences. See the new trees and their comments in the upcoming attachment.
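To illustrate the idea behind the two points above, here is a minimal Python sketch. It is not ARB's implementation: the function names, the toy sequences, and the plain mismatch count are all assumptions made for the example. It scores a partial sequence against each full sequence only in columns where the partial actually has data, then picks the full sequence with the fewest mismatches.

```python
# Illustrative only: names and scoring are assumptions, not ARB code.
NO_DATA = {"-", "."}  # assumed: both characters mean "no residue here"


def mismatches(partial, full):
    """Count differing columns, skipping columns where the partial has no data."""
    if len(partial) != len(full):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for p, f in zip(partial, full)
        if p not in NO_DATA and p != f  # missing columns cost nothing
    )


def best_full_match(partial, full_seqs):
    """Return the name of the full sequence with the fewest mismatches."""
    return min(full_seqs, key=lambda name: mismatches(partial, full_seqs[name]))


# Hypothetical toy alignment (10 columns).
full_seqs = {
    "fullA": "MKTAYIAKQR",
    "fullB": "MKSAYLAKHR",
}
partial = "...AYLAK.."

print(best_full_match(partial, full_seqs))
```

Because the flanking columns of the partial are skipped rather than penalized, the partial is attracted to its closest full-length relative instead of to other partials that merely share the same missing regions.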
Changed 10 years ago by westram
comment:4 in reply to: ↑ description Changed 10 years ago by westram
- Component changed from !NoIdea to ARB_PARSIMONY
- Resolution set to fixed
- Status changed from _started to closed
- Version set to SVN
Fixed by [12490].
> My guess for the reason: in nucleotide treeing, ARB omits the dots "." (which is what I want), while in aa treeing, ARB treats dots as gaps, which causes the partial sequences to cluster.
Your guess was correct. ARB treated dots
- as gaps in protein reconstruction and
- as 'N's in nucleotide reconstruction
It now treats dots as 'X's.
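The effect of this change can be sketched with a toy pairwise cost in Python. This is an assumption-laden illustration, not ARB's parsimony code: the cost function, the `dots_as` parameter, and the sequences are all hypothetical. It compares two partial sequences and one full sequence, once with dots counted as gaps and once with dots counted as the unknown residue 'X'.

```python
# Illustrative only: a toy pairwise mismatch cost, not ARB's parsimony scoring.
def column_cost(a, b, dots_as):
    """Cost of one aligned column; dots_as is 'gap' or 'unknown'."""
    if dots_as == "gap":
        replacement = "-"
    elif dots_as == "unknown":
        replacement = "X"
    else:
        raise ValueError("dots_as must be 'gap' or 'unknown'")
    a = replacement if a == "." else a
    b = replacement if b == "." else b
    if a == "X" or b == "X":
        return 0  # an unknown residue is compatible with anything
    return 0 if a == b else 1  # a gap is just another state here


def pair_cost(s, t, dots_as):
    """Sum of column costs over two aligned sequences."""
    return sum(column_cost(a, b, dots_as) for a, b in zip(s, t))


full = "MKTAYIAKQR"
partial_1 = "...AYIAK.."  # fragment of `full`
partial_2 = "...AWLAK.."  # a different fragment covering the same columns

for mode in ("gap", "unknown"):
    print(mode,
          pair_cost(partial_1, full, mode),
          pair_cost(partial_1, partial_2, mode))
```

With dots as gaps, `partial_1` looks closer to the other partial than to the full sequence it was cut from, because the shared dotted columns count as agreement. With dots as 'X', those columns carry no signal, and `partial_1` matches its full-length relative best.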
soxB showing arb add marked parsimony for aa sequence