
Ambiguous DNA

Sep 17, 2015 at 2:40 AM
So I've been trying to run Padena, but it keeps reporting ambiguous DNA. I grabbed one of the reads to see what it was finding, and the letters listed in its alphabet clearly aren't in the sequence. Does anyone know why this would happen?
As you can see below, the actual sequence doesn't contain what the Alphabet says it does.
This is from the Immediate Window:
reads.First().ConvertToString(0,reads.First().Count) //my first sequence from the "reads" collection
"NNNNNAGGGCCATAATAATACCCTTCCTGACAGAGGAATTACAAGTGAGGGTAGGTCAATGATTTAATATATCACACAACAGAAGAAAAACTGAAGCTTAA"
reads.First().Alphabet
{ACGT-MRSWYKVHDBN}
    [Bio.AmbiguousDnaAlphabet]: {ACGT-MRSWYKVHDBN}
    Count: 16
    HasAmbiguity: true
Sep 18, 2015 at 7:12 AM
Hi Punky,

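You can't have an N (ambiguous base) in sequences for assembly. The five leading N bases are what cause this: because of them the read is assigned the ambiguous DNA alphabet, and the Alphabet property lists every symbol that alphabet allows, not just the symbols that occur in the read. A solution would be to trim the leading "N" bases from the sequence.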

Cheers,
N
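The trimming suggested above can be sketched on the raw read string. This is a minimal, language-neutral illustration in Python, not .NET Bio code; the helper name trim_leading_n is hypothetical, and it assumes the reads are available as plain strings before they are parsed into sequence objects.

```python
def trim_leading_n(read: str) -> str:
    """Drop the run of N bases at the start of a read (assumed to be a plain string)."""
    return read.lstrip("Nn")


# The read from the original post.
read = (
    "NNNNNAGGGCCATAATAATACCCTTCCTGACAGAGGAATTACAAGTGAGGGTAGGTCAATGATTTAATATATCACACAACAGAAGAAAAACTGAAGCTTAA"
)
trimmed = trim_leading_n(read)

# After trimming, check whether only unambiguous bases remain.
only_acgt = set(trimmed) <= set("ACGT")
print(len(read) - len(trimmed), "bases trimmed; unambiguous:", only_acgt)
```

Note that this only removes Ns at the start of the read. A read with an N in the middle would still count as ambiguous and would need to be split or discarded instead.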
Aug 26, 2016 at 4:23 AM
If you're looking for perfect matches, don't index kmers that contain Ns.

Are you coding the kmer index yourself? You might want to take a look at Tallymer, which creates an index similar to what you have in mind.
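If you do write the index yourself, skipping kmers that contain Ns could look roughly like the sketch below. It is a plain Python illustration under assumed names (build_kmer_index and the (read_id, position) layout are hypothetical), and it is not how Tallymer or Padena build their indexes.

```python
from collections import defaultdict


def build_kmer_index(reads, k):
    """Map each N-free kmer to the (read_id, position) pairs where it occurs."""
    index = defaultdict(list)
    for read_id, read in enumerate(reads):
        for pos in range(len(read) - k + 1):
            kmer = read[pos:pos + k]
            if "N" in kmer:
                # An ambiguous base can never give a perfect match, so skip it.
                continue
            index[kmer].append((read_id, pos))
    return index


index = build_kmer_index(["NNACGT", "ACGNA"], k=3)
for kmer, hits in sorted(index.items()):
    print(kmer, hits)
```

With this approach the reads themselves are left untouched; only the windows that overlap an N are left out of the index.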