User-reported bugs

Coordinator
May 30, 2012 at 4:11 PM

(forwarded on behalf of a commercial user - copied to the discussions so everyone can see and comment)

I have used the .NET Bio toolbox to perform SNP detection from Sanger reads (ab1 format).

Development was not always straightforward, since a few annoying bugs still exist in the library. Here are my notes on this:

 

1. The ID of the input ISequence is not transferred to the output ISequence of Bio.Algorithms.Alignment.MultipleSequenceAlignment.MsaUtils.UnAlign.

2. Ab1Parser.Parse does not support sequences containing Ns: no sequence is created.

=> fixing this would require using Alphabets.AmbiguousDNA in the GetSequence method of the Ab1Parser.

3. QualitativeSequence.GetSubSequence should return a QualitativeSequence (it currently loses the quality data in the process).

4. The FirstOffset and SecondOffset properties are swapped in PairwiseAlignedSequence (First should be Second, and vice versa).

5. Strange behavior of alignment offsets in PairwiseAlignedSequence: they are often off by 512 or 1024 (an issue with the Int64 representation?). For instance, I have seen an offset reported as 37 while the true offset was 1024 + 37 = 1061.

a. Sample C# code showing the bug:

using System;
using System.Collections.Generic;
using Bio;
using Bio.Algorithms.Alignment;

namespace Project1
{
   class Class1
   {
       public static void Main()
       {
           ISequence seq1 = new Sequence(Alphabets.DNA, "ATGTCTGCCCCTAAGAAGATCGTCGTTTTGCCAGGTGACCACGTTGGTCAAGAAATCACAGCCGAAGCCATTAAGGTTCTTAAAGCTATTTCTGATGTTCGTTCCAATGTCAAGTTCGATTTCGAAAATCATTTAATTGGTGGTGCTGCTATCGATGCTACAGGTGTTCCACTTCCAGATGAGGCGCTGGAAGCCTCCAAGAAGGCTGATGCCGTTTTGTTAGGTGCTGTGGGTGGTCCTAAATGGGGTACCGGTAGTGTTAGACCTGAACAAGGTTTACTAAAAATCCGTAAAGAACTTCAATTGTACGCCAACTTAAGACCATGTAACTTTGCATCCGACTCTCTTTTAGACTTATCTCCAATCAAGCCACAATTTGCTAAAGGTACTGACTTCGTTGTTGTCAGAGAATTAGTGGGAGGTATTTACTTTGGTAAGAGAAAGGAAGACGATGGTGATGGTGTCGCTTGGGATAGTGAACAATACACCGTTCCAGAAGTGCAAAGAATCACAAGAATGGCCGCTTTCATGGCCCTACAACATGAGCCACCATTGCCTATTTGGTCCTTGGATAAAGCTAATGTTTTGGCCTCTTCAAGATTATGGAGAAAAACTGTGGAGGAAACCATCAAGAACGAATTCCCTACATTGAAGGTTCAACATCAATTGATTGATTCTGCCGCCATGATCCTAGTTAAGAACCCAACCCACCTAAATGGTATTATAATCACCAGCAACATGTTTGGTGATATCATCTCCGATGAAGCCTCCGTTATCCCAGGTTCCTTGGGTTTGTTGCCATCTGCGTCCTTGGCCTCTTTGCCAGACAAGAACACCGCATTTGGTTTGTACGAACCATGCCACGGTTCTGCTCCAGATTTGCCAAAGAATAAGGTCAACCCTATCGCCACTATCTTGTCTGCTGCAATGATGTTGAAATTGTCATTGAACTTGCCTGAAGAAGGTAAGGCCATTGAAGATGCAGTTAAAAAGGTTTTGGATGCAGGTATCAGAACTGGTGATTTAGGTGGTTCCAACAGTACCACCGAAGTCGGTGATGCTGTCGCCGAAGAAGTTAAGAAAATCCTTGCTTAA");

           // seq2 is the 250-base subsequence of seq1 starting at position 800,
           // so the expected offsets are known in advance.
           ISequence seq2 = seq1.GetSubSequence(800L, 250L);
           SmithWatermanAligner sw = new SmithWatermanAligner();
           IList<IPairwiseSequenceAlignment> aln = sw.Align(seq1, seq2);
           foreach (IPairwiseSequenceAlignment pwaln in aln)
           {
               foreach (PairwiseAlignedSequence pwalnseq in pwaln)
               {
                   Console.WriteLine("First offset (should be 800): " + pwalnseq.FirstOffset);
                   Console.WriteLine("Second offset (should be 0): " + pwalnseq.SecondOffset);
               }
           }
           Console.ReadLine();
       }
   }
}
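
The behavior expected in item 1 can be illustrated without the library: "unaligning" a sequence strips the alignment gap symbols, and the result should keep the input's ID. The GappedSeq record and UnAlignSketch class below are hypothetical stand-ins, not the .NET Bio ISequence or MsaUtils API, and '-' is assumed to be the only gap symbol.

```csharp
using System;

// Hypothetical minimal sequence record; NOT the .NET Bio ISequence interface.
public sealed record GappedSeq(string ID, string Symbols);

public static class UnAlignSketch
{
    // Removes the (assumed) gap symbol '-' and carries the ID over to the result.
    public static GappedSeq UnAlign(GappedSeq aligned) =>
        new GappedSeq(aligned.ID, aligned.Symbols.Replace("-", string.Empty));

    public static void Main()
    {
        var aligned = new GappedSeq("sample_42", "AC--GT-A");
        GappedSeq plain = UnAlign(aligned);
        Console.WriteLine(plain.ID + " " + plain.Symbols); // sample_42 ACGTA
    }
}
```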
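
Item 2 comes down to alphabet validation: a parser that builds its sequence with a strict A/C/G/T alphabet rejects any read containing N, whereas an ambiguous-DNA alphabet (IUPAC codes such as N, R, Y) accepts it. The sketch below shows that difference with plain character sets. The helper names, the strict-then-ambiguous fallback, and the exact symbol sets are assumptions for illustration; they do not reproduce .NET Bio's Alphabets.DNA or Alphabets.AmbiguousDNA definitions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AlphabetCheckSketch
{
    // Assumed symbol sets (uppercase only, gap symbols omitted).
    static readonly HashSet<char> StrictDna = new HashSet<char>("ACGT");
    static readonly HashSet<char> AmbiguousDna = new HashSet<char>("ACGTRYKMSWBDHVN");

    // Hypothetical helper: true if every symbol of the read belongs to the alphabet.
    public static bool Fits(string read, HashSet<char> alphabet) =>
        read.All(c => alphabet.Contains(char.ToUpperInvariant(c)));

    // Hypothetical policy: use the strict alphabet when possible,
    // otherwise fall back to the ambiguous one instead of failing.
    public static string ChooseAlphabet(string read)
    {
        if (Fits(read, StrictDna)) return "DNA";
        if (Fits(read, AmbiguousDna)) return "AmbiguousDNA";
        throw new ArgumentException("Read contains symbols outside the IUPAC DNA codes.");
    }

    public static void Main()
    {
        Console.WriteLine(ChooseAlphabet("ACGTACGT")); // DNA
        Console.WriteLine(ChooseAlphabet("ACGNNACG")); // AmbiguousDNA
    }
}
```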
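
The contract requested in item 3 is that taking a subsequence of a quality-scored read slices the bases and the per-base quality scores over the same range, returning the same kind of object. The QualSeqSketch class below is a hypothetical illustration of that contract, not the .NET Bio QualitativeSequence implementation; scores are assumed to be plain integers, one per base.

```csharp
using System;
using System.Linq;

// Hypothetical quality-scored read; NOT the .NET Bio QualitativeSequence class.
public sealed class QualSeqSketch
{
    public string ID { get; }
    public string Bases { get; }
    public int[] Scores { get; }

    public QualSeqSketch(string id, string bases, int[] scores)
    {
        if (bases.Length != scores.Length)
            throw new ArgumentException("Each base needs exactly one quality score.");
        ID = id;
        Bases = bases;
        Scores = scores;
    }

    // Returns another QualSeqSketch, so the quality data survives the slice.
    public QualSeqSketch GetSubSequence(long start, long length)
    {
        if (start < 0 || length < 0 || start + length > Bases.Length)
            throw new ArgumentOutOfRangeException(nameof(start));
        int s = checked((int)start);
        int n = checked((int)length);
        return new QualSeqSketch(ID, Bases.Substring(s, n), Scores.Skip(s).Take(n).ToArray());
    }
}

public static class Program
{
    public static void Main()
    {
        var read = new QualSeqSketch("read1", "ACGTNACGT", new[] { 30, 32, 35, 28, 5, 40, 38, 36, 20 });
        QualSeqSketch sub = read.GetSubSequence(2, 4);
        Console.WriteLine(sub.Bases);                    // GTNA
        Console.WriteLine(string.Join(",", sub.Scores)); // 35,28,5,40
    }
}
```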

 

Coordinator
May 30, 2012 at 4:12 PM
This discussion has been copied to a work item.