Longest common sequence problem

Oct 7, 2012 at 10:07 AM
Edited Oct 7, 2012 at 10:09 AM

Hi, please forgive me for what might be a newbie question. I am attempting to use .NET Bio to solve some simple problems posted at Rosalind, hoping the library will help me with most of them. Is it possible to use MultiWaySuffixTree to find the longest common substring among multiple sequences (exact match)?

My first thought is that it should be possible to set up an extended alphabet [ACGT + IDs], where each ID is a byte in [1..255] (excluding A, C, G, T and $) that terminates one string. The terminated strings would then be concatenated and used to instantiate the suffix tree. Afterwards, it should be easy to do a tree traversal to find the deepest node whose subtree contains a suffix ending in every ID; the path from the root to that node is the longest common substring.
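To make the idea concrete, here is a minimal sketch of the terminator approach. It is written in Python rather than against .NET Bio, and it uses a naively built suffix array over the concatenated text instead of MultiWaySuffixTree, so none of it reflects the library's actual API. The function name is hypothetical, and the terminators are negative integers instead of spare byte values, which is an assumption made to sidestep the 255-ID limit.

```python
def longest_common_substring(seqs):
    """Return one longest substring shared by every sequence (exact match)."""
    k = len(seqs)
    if k == 0:
        return ""
    if k == 1:
        return seqs[0]

    # Concatenate the sequences, ending each with its own terminator.
    # Symbols are stored as code points (>= 0); terminators are negative,
    # so they can never collide with a real symbol or with each other.
    text, owner = [], []
    for idx, s in enumerate(seqs):
        text.extend(ord(c) for c in s)
        text.append(-(idx + 1))
        owner.extend([idx] * (len(s) + 1))  # which sequence each position came from

    # Naive suffix array: sort start positions by suffix content.
    # Fine for a sketch, far too slow for genome-sized input.
    sa = sorted(range(len(text)), key=lambda i: text[i:])

    def lcp(a, b):
        # Length of the common prefix of the suffixes starting at a and b.
        # It cannot run past a terminator because every terminator is unique.
        n = 0
        while (a + n < len(text) and b + n < len(text)
               and text[a + n] == text[b + n]):
            n += 1
        return n

    # adj[i] = common prefix length of suffixes sa[i - 1] and sa[i].
    adj = [0] + [lcp(sa[i - 1], sa[i]) for i in range(1, len(sa))]

    # Slide a window over the sorted suffixes. Whenever the window holds a
    # suffix from every sequence, the smallest adjacent value inside it is
    # the length of a substring common to all of them.
    count = [0] * k
    covered = 0
    best_len, best_pos = 0, 0
    lo = 0
    for hi in range(len(sa)):
        o = owner[sa[hi]]
        count[o] += 1
        if count[o] == 1:
            covered += 1
        # Drop suffixes on the left whose sequence is still represented.
        while count[owner[sa[lo]]] > 1:
            count[owner[sa[lo]]] -= 1
            lo += 1
        if covered == k:
            shared = min(adj[lo + 1:hi + 1])
            if shared > best_len:
                best_len, best_pos = shared, sa[hi]

    return "".join(chr(c) for c in text[best_pos:best_pos + best_len])
```

When several substrings tie for the longest length, this returns just one of them. A real suffix tree would do the same job with a traversal that tracks which terminators appear below each node.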


Oct 17, 2012 at 2:23 PM

Hi nyurik,

Posts, newbie or otherwise, are always welcome. Apologies for not replying earlier; I saw your post a few days ago but have only now come back to it. Thanks in particular for also pointing everyone to Rosalind, which is a great initiative. There are a number of ways of solving the substring problem posed. The alphabet extension is one, and this data structure is certainly useful, but there may be more elegant ways. Think carefully about whether you really need to define the new alphabet.
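The reply does not say which alternative it has in mind. One possibility, sketched here purely as an assumption, is to leave the alphabet alone and instead tag each node with the set of sequences that pass through it. The sketch below is Python, not .NET Bio, and uses a naive suffix trie (quadratic memory) rather than a true suffix tree; the function name is hypothetical.

```python
def longest_common_substring_tagged(seqs):
    """Longest substring shared by all sequences, with no extra terminators."""
    if not seqs:
        return ""

    # Each node keeps its children and a bitmask of the sequences whose
    # suffixes pass through it. Bit i is set for sequence number i.
    root = {"kids": {}, "mask": 0}
    for idx, s in enumerate(seqs):
        bit = 1 << idx
        for start in range(len(s)):      # insert every suffix of s
            node = root
            for ch in s[start:]:
                nxt = node["kids"].get(ch)
                if nxt is None:
                    nxt = {"kids": {}, "mask": 0}
                    node["kids"][ch] = nxt
                nxt["mask"] |= bit
                node = nxt

    # Depth-first walk that only follows nodes reached by every sequence.
    # The deepest such node spells out a longest common substring.
    full = (1 << len(seqs)) - 1
    best = ""
    stack = [(root, "")]
    while stack:
        node, path = stack.pop()
        if len(path) > len(best):
            best = path
        for ch, kid in node["kids"].items():
            if kid["mask"] == full:
                stack.append((kid, path + ch))
    return best
```

The bitmask plays the role the per-string ID bytes would have played, so the number of input sequences is no longer capped by the number of unused byte values.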

Did you try to implement your solution in the meantime?