Opened 9 months ago

Last modified 9 months ago

#755 new enhancement

probe match shall support more types of "weak matches"

Reported by: westram Owned by: devel
Priority: major Milestone: wishlist
Component: ARB_PT_SERVER Version: SVN
Keywords: Cc:

Description (last modified by westram)

Types of weak matches currently supported by PT-Server:

  • mismatches
  • N-mismatches

It should also support these types:

  1. IUPAC codes in 'Target string'
  2. insertions
  3. deletions

Needed additional parameters:

  • max. number of insertions and deletions allowed
  • mismatch-penalties for insertions and deletions (two penalties each, e.g. one for opening an insertion and another for extending it)
  • mismatch-penalty for IUPAC-matches?
    • maybe specify a general weight-factor (for IUPAC and N-matches),
    • calculate some kind of weighted mismatch from
      • the probabilities of the specific bases (defined by IUPAC/N) and
      • their transition/transversion penalties and
    • add weight-factor * mismatch-weight to the current (weighted) mismatch value. That would allow IUPAC-mismatches either to be ignored (by setting the weight to zero) or to be counted/weighted as mismatches.
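
A rough sketch of how the weighting proposed above might be computed. Everything here is an assumption for illustration: the function names, the penalty values, the uniform probability over the bases an IUPAC code stands for, and the use of 'U' rather than 'T'. None of it is taken from the PT-Server code.

```python
# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU", "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG", "N": "ACGU",
}
PURINES = set("AG")


def base_penalty(a, b, transition=0.5, transversion=1.0):
    """Penalty for one concrete base pair (hypothetical default values)."""
    if a == b:
        return 0.0
    same_class = (a in PURINES) == (b in PURINES)
    return transition if same_class else transversion


def iupac_mismatch(code, seq_base, weight_factor=1.0):
    """Weighted mismatch of an IUPAC code against one sequence base.

    Assumes every base covered by the code is equally probable.
    """
    bases = IUPAC[code]
    expected = sum(base_penalty(b, seq_base) for b in bases) / len(bases)
    return weight_factor * expected


def gap_penalty(length, open_penalty=1.5, extend_penalty=0.5):
    """Two-part penalty: one for opening a gap, one per extension."""
    if length == 0:
        return 0.0
    return open_penalty + (length - 1) * extend_penalty
```

With weight_factor set to zero, iupac_mismatch() always returns 0.0, which corresponds to ignoring IUPAC-mismatches; with 1.0 they are counted at their full expected weight.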

Prefix tree (PT) traversal:

  • needs to carry much more state-information, e.g. number of inserts/deletes performed (absolute + extends),
  • has to be able to properly undo state-modifications (performed during descent) while ascending and
  • should descend into all additional prefixes that become possible by inserting/deleting bases.
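
The three traversal requirements above could look roughly like this. It is only a sketch on a toy trie built from nested dicts, not the PT-Server data structure; State, build_trie(), apply_gap(), undo_gap() and search() are hypothetical names, and gaps at the very ends of the target are skipped to keep the example small.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    mismatches: int = 0
    gap_opens: int = 0      # number of insert/delete runs started
    gap_extends: int = 0    # number of bases extending such a run
    last_op: str = ""       # "", "M" (match/mismatch), "I" or "D"
    path: list = field(default_factory=list)


def build_trie(words):
    """Toy prefix tree: nested dicts keyed by base."""
    root = {}
    for word in words:
        node = root
        for base in word:
            node = node.setdefault(base, {})
    return root


def apply_gap(st, kind):
    """Count a gap step as open or extend; return what is needed to undo it."""
    undo = (st.last_op, st.gap_opens, st.gap_extends)
    if st.last_op == kind:
        st.gap_extends += 1
    else:
        st.gap_opens += 1
    st.last_op = kind
    return undo


def undo_gap(st, undo):
    st.last_op, st.gap_opens, st.gap_extends = undo


def search(node, target, i, st, max_mismatches, max_indels, hits):
    if i == len(target):
        # Keep the first state found for each trie path.
        hits.setdefault("".join(st.path),
                        (st.mismatches, st.gap_opens, st.gap_extends))
        return
    gaps_left = st.gap_opens + st.gap_extends < max_indels
    for base, child in node.items():
        # Match or mismatch: consume one target base and one trie edge.
        mm = 0 if base == target[i] else 1
        if st.mismatches + mm <= max_mismatches:
            prev_op = st.last_op
            st.mismatches += mm
            st.last_op = "M"
            st.path.append(base)
            search(child, target, i + 1, st, max_mismatches, max_indels, hits)
            st.path.pop()               # undo while ascending
            st.last_op = prev_op
            st.mismatches -= mm
        # Insertion: extra base in the sequence, target position stays.
        if gaps_left and i > 0:
            undo = apply_gap(st, "I")
            st.path.append(base)
            search(child, target, i, st, max_mismatches, max_indels, hits)
            st.path.pop()
            undo_gap(st, undo)
    # Deletion: target base missing in the sequence, trie node stays.
    if gaps_left and 0 < i < len(target) - 1:
        undo = apply_gap(st, "D")
        search(node, target, i + 1, st, max_mismatches, max_indels, hits)
        undo_gap(st, undo)
```

For example, searching "ACGU" in a trie built from "ACGU", "AGU", "ACCGU" and "UUUU" with max_mismatches=0 and max_indels=1 reaches the first three paths and leaves the State object exactly as it was before the call.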

While traversing the PT, matches are collected and need to be categorized into

  • impossible matches,
  • definite matches and
  • possible matches (which need further inspection).

Reasons why a hit may be classified as possible match:

  • target string is longer than PT-depth (⇒ reaches cut-off at tips of PT)
  • whenever any other cut-off in PT is reached (e.g. 'N' or IUPAC-code occurred in input sequence data)

For possible matches, the actual sequence data has to be inspected and the "rest" of the target string evaluated against it.
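
A minimal sketch of the categorization and the follow-up check, restricted to plain mismatches for brevity. MatchClass, classify() and verify_rest() are hypothetical names, and the sequence is assumed to be available as a plain string.

```python
from enum import Enum


class MatchClass(Enum):
    IMPOSSIBLE = "impossible"
    DEFINITE = "definite"
    POSSIBLE = "possible"


def classify(mismatches, consumed, target_len, max_mismatches):
    """Categorize a hit at the point where PT traversal stopped.

    consumed is the number of target bases already compared in the PT.
    """
    if mismatches > max_mismatches:
        return MatchClass.IMPOSSIBLE
    if consumed == target_len:
        return MatchClass.DEFINITE
    # Traversal hit a cut-off before the whole target was compared.
    return MatchClass.POSSIBLE


def verify_rest(sequence, start, target, consumed, mismatches, max_mismatches):
    """Compare target[consumed:] against the sequence data.

    start is the sequence position where the hit begins. Returns the total
    number of mismatches, or None if the hit turns out to be impossible.
    """
    for offset in range(consumed, len(target)):
        pos = start + offset
        if pos >= len(sequence):
            return None                 # sequence ends before the target does
        if sequence[pos] != target[offset]:
            mismatches += 1
            if mismatches > max_mismatches:
                return None
    return mismatches
```

A hit classified as POSSIBLE is passed to verify_rest(); a non-None result turns it into a definite match with the returned mismatch count.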


Related: #746, #747
Supersedes: #543

Change History (1)

comment:1 Changed 9 months ago by westram

  • Description modified (diff)