Opened 16 years ago

Closed 11 years ago

Last modified 9 years ago

#144 closed defect (fixed)

Treeing for protein partial sequence

Reported by: guest
Owned by: westram
Priority: normal
Milestone:
Component: ARB_PARSIMONY
Version: SVN
Keywords: arb6.0-hotfix
Cc: syan@…

Description

Hello,

I have a database for a protein, built from nucleotide sequences that were translated to amino acids (aa). About half of the sequences are full-length and half are partial, and the partial sequences cover a similar range of columns.

I want to first create a full-length aa tree and then add the partial sequences using "ARB parsimony (quick add marked)" with the full-length filter. When I do this, nearly all the partial sequences are grouped together instead of with their full-length relatives.

However, if I build the same full-length tree from nucleotide sequences and add the partial sequences using the full-length nucleotide filter (just as I do for 16S), the result is reasonable: all the partial sequences end up where they should be.

My guess at the reason: in nucleotide treeing, ARB omits the dots "." (which is what I want), while in aa treeing, ARB treats dots as gaps, which causes the partial sequences to cluster.

In the sample file (I haven't tried to attach it yet; if that doesn't work, I can send it to you), the marked sequences are the partial sequences, and it contains 4 trees.

Thanks, Yan Shi (syan@…)

Attachments (2)

soxB20081124test.arb (293.3 KB) - added by guest 16 years ago.
soxB showing arb add marked parsimony tree for aa sequence
soxB20081124test_add_partial.arb (305.2 KB) - added by westram 11 years ago.

Change History (7)

Changed 16 years ago by guest

soxB showing arb add marked parsimony tree for aa sequence

comment:1 Changed 11 years ago by westram

  • Owner changed from devel to westram
  • Status changed from new to accepted

comment:2 Changed 11 years ago by westram

  • Status changed from accepted to _started

comment:3 in reply to: ↑ description Changed 11 years ago by westram

I want to first create a full-length aa tree and then add the partial sequences using "ARB parsimony (quick add marked)" with the full-length filter. When I do this, nearly all the partial sequences are grouped together instead of with their full-length relatives.

However, if I build the same full-length tree from nucleotide sequences and add the partial sequences using the full-length nucleotide filter (just as I do for 16S), the result is reasonable: all the partial sequences end up where they should be.

ARB provides a special function (in ARB_PARSIMONY) to add "partial sequences", which

  • ignores gap penalties and
  • detects the best-matching full sequence for each partial sequence

Using this function on your AA alignment results in reasonable placement of the partial sequences. See the new trees and their comments in the upcoming attachment.
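As a rough illustration of the two points above, here is a minimal Python sketch. It is not ARB's actual implementation: the function names, the toy sequences, and the choice to treat both "." and "-" in the partial sequence as "no data" are assumptions made only for this example.

```python
# Hypothetical sketch, not ARB code: place a partial sequence next to the
# full-length sequence it matches best, without penalising missing columns.

MISSING = {".", "-"}  # assumption: both mean "no data" in the partial sequence


def mismatches(partial, full):
    """Count differing columns, skipping columns where the partial has no data."""
    if len(partial) != len(full):
        raise ValueError("aligned sequences must have equal length")
    return sum(
        1
        for p, f in zip(partial, full)
        if p not in MISSING and p != f
    )


def best_full_match(partial, full_seqs):
    """Return the name of the full-length sequence with the fewest mismatches."""
    return min(full_seqs, key=lambda name: mismatches(partial, full_seqs[name]))


# Toy aligned protein sequences (invented for illustration).
full_seqs = {
    "fullA": "MKTAYIAKQR",
    "fullB": "MRSGYLAKHR",
}
partial = "...AYIAK.."

print(best_full_match(partial, full_seqs))
```

Because the columns missing from the partial sequence are skipped rather than scored, the partial sequence is compared only over the region it actually covers.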

Changed 11 years ago by westram

comment:4 in reply to: ↑ description Changed 11 years ago by westram

  • Component changed from !NoIdea to ARB_PARSIMONY
  • Resolution set to fixed
  • Status changed from _started to closed
  • Version set to SVN

Fixed by [12490].

My guess at the reason: in nucleotide treeing, ARB omits the dots "." (which is what I want), while in aa treeing, ARB treats dots as gaps, which causes the partial sequences to cluster.

Your guess was correct. ARB treated dots

  • as gaps in protein reconstruction and
  • as 'N's in nucleotide reconstruction

ARB now treats dots as 'X's.
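To see why this matters, the following toy Python sketch compares the two interpretations of dots. It uses a simple column-wise mismatch count, not ARB's parsimony code; the function name, the `dot_as` parameter, and the sequences are hypothetical and exist only for this illustration.

```python
# Hypothetical sketch, not ARB code: how the meaning of '.' changes which
# sequences look similar to a partial protein sequence.


def distance(a, b, dot_as):
    """Column-wise mismatch count between two aligned protein sequences.

    dot_as="gap": '.' is an ordinary gap state, identical to '-'
    dot_as="X":   '.' is an unknown residue that is compatible with anything
    """
    if dot_as not in ("gap", "X"):
        raise ValueError("dot_as must be 'gap' or 'X'")
    replacement = "-" if dot_as == "gap" else "X"
    cost = 0
    for x, y in zip(a, b):
        x = replacement if x == "." else x
        y = replacement if y == "." else y
        if "X" in (x, y):
            continue  # unknown residue: no evidence for or against a match
        if x != y:
            cost += 1
    return cost


# Toy aligned sequences (invented): part1 is a fragment of full1.
full1 = "MKTAYIAKQR"
part1 = "...AYIAK.."
part2 = "...GYLAK.."  # fragment of a different, unrelated protein variant

for mode in ("gap", "X"):
    print(mode, distance(part1, full1, mode), distance(part1, part2, mode))
```

With dots read as gaps, the shared dot columns make the two partial sequences look closer to each other than `part1` is to its own full-length relative, which matches the clustering described in the report. With dots read as 'X', those columns carry no weight and `part1` is closest to `full1`.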

comment:5 Changed 9 years ago by westram

  • Keywords arb6.0-hotfix added

mark arb6.0.x hotfixes