
Using the Needleman-Wunsch algorithm

Sep 8, 2013 at 6:08 PM
Hello, everyone!
I have a question. How can I show the results of a pairwise global alignment of two protein sequences?
Sep 9, 2013 at 3:31 PM
It's actually pretty simple. There are four basic steps:

1) Load the two sequences.
2) Create the aligner and set up its properties as needed (gap open/extension costs, similarity matrix, etc.).
3) Run the alignment.
4) Process the results.
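
To see what step 3 actually computes, here is a minimal, self-contained sketch of the Needleman-Wunsch recurrence with traceback. It is not .NET Bio's implementation: `NwSketch` and its parameters are hypothetical names, and it assumes gap-free input strings, a flat match/mismatch score and a linear gap penalty, rather than a PAM250 matrix with separate gap-open and gap-extension costs.

```csharp
using System;
using System.Text;

// Hypothetical sketch of Needleman-Wunsch global alignment (not part of .NET Bio).
// Assumptions: flat match/mismatch scores and a linear gap penalty.
public static class NwSketch
{
    public static (string First, string Second, int Score) Align(
        string a, string b, int match = 1, int mismatch = -1, int gap = -2)
    {
        int n = a.Length, m = b.Length;
        var f = new int[n + 1, m + 1];

        // First column/row: aligning a prefix against nothing costs one gap per residue.
        for (int i = 1; i <= n; i++) f[i, 0] = i * gap;
        for (int j = 1; j <= m; j++) f[0, j] = j * gap;

        // Fill: best of diagonal (match/mismatch), up (gap in b), left (gap in a).
        for (int i = 1; i <= n; i++)
            for (int j = 1; j <= m; j++)
            {
                int diag = f[i - 1, j - 1] + (a[i - 1] == b[j - 1] ? match : mismatch);
                int up = f[i - 1, j] + gap;
                int left = f[i, j - 1] + gap;
                f[i, j] = Math.Max(diag, Math.Max(up, left));
            }

        // Traceback from the bottom-right cell; rows are built in reverse.
        var x = new StringBuilder();
        var y = new StringBuilder();
        int r = n, c = m;
        while (r > 0 || c > 0)
        {
            if (r > 0 && c > 0 &&
                f[r, c] == f[r - 1, c - 1] + (a[r - 1] == b[c - 1] ? match : mismatch))
            {
                x.Append(a[r - 1]); y.Append(b[c - 1]); r--; c--;
            }
            else if (r > 0 && f[r, c] == f[r - 1, c] + gap)
            {
                x.Append(a[r - 1]); y.Append('-'); r--;
            }
            else
            {
                x.Append('-'); y.Append(b[c - 1]); c--;
            }
        }

        return (Reverse(x), Reverse(y), f[n, m]);
    }

    static string Reverse(StringBuilder sb)
    {
        char[] chars = sb.ToString().ToCharArray();
        Array.Reverse(chars);
        return new string(chars);
    }
}
```

With the default scores, `NwSketch.Align("ACGT", "AGT")` returns `("ACGT", "A-GT", 1)`: three identities (+3) and one gap (-2). When several alignments tie, the traceback prefers the diagonal, then a gap in the second sequence, then a gap in the first, so that order decides which co-optimal alignment you get.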

The only tricky part is that .NET Bio currently won't align protein sequences that contain gap characters. Gaps in the input don't make a lot of sense for a protein alignment, but I can't recall exactly why this restriction was enforced; removing it has been discussed at various times, since other aligners don't enforce it. So if your input has gaps, remove them before passing the sequences to the aligner.

Here's some sample code:
using Bio;
using Bio.Algorithms.Alignment;
using Bio.Extensions;
using Bio.IO;
using System;
using System.Collections.Generic;
using System.Linq;
using Bio.SimilarityMatrices;

namespace AlignProteins
{
    class Program
    {
        static void Main(string[] args)
        {
            if (args.Length == 0)
            {
                Console.WriteLine("Missing filename.");
                return;
            }

            // Step 1: load in sequences.
            ISequenceParser parser = SequenceParsers.FindParserByFileName(args[0]);
            if (parser == null)
            {
                Console.WriteLine("Failed to locate parser for {0}", args[0]);
                return;
            }

            IList<ISequence> sequences = parser.Parse().Take(2).ToList();
            if (sequences.Count != 2)
            {
                Console.WriteLine("There are not two sequences in the input file.");
                return;
            }

            IAlphabet alphabet = sequences[0].Alphabet;
            if (sequences[1].Alphabet != alphabet)
            {
                Console.WriteLine("Must use same alphabet for first and second sequence.");
                return;
            }

            // Remove the gaps - .NET Bio currently does not align proteins with gaps.
            ISequence sequence1 = new Sequence(alphabet, sequences[0].Where(b => !alphabet.CheckIsGap(b)).ToArray());
            ISequence sequence2 = new Sequence(alphabet, sequences[1].Where(b => !alphabet.CheckIsGap(b)).ToArray());

            Console.ForegroundColor = ConsoleColor.Green;
            Console.WriteLine("Sequence 1: [{0}] {1}", sequence1.Alphabet.Name, sequence1.ConvertToString());
            Console.ForegroundColor = ConsoleColor.Cyan;
            Console.WriteLine("Sequence 2: [{0}] {1}", sequence2.Alphabet.Name, sequence2.ConvertToString());
            Console.WriteLine();

            // Step 2: setup the aligner
            NeedlemanWunschAligner aligner = new NeedlemanWunschAligner();
            aligner.GapOpenCost = -11;
            aligner.GapExtensionCost = -1;
            aligner.SimilarityMatrix = new SimilarityMatrix(SimilarityMatrix.StandardSimilarityMatrix.Pam250);
            aligner.ConsensusResolver = new SimpleConsensusResolver(Alphabets.Protein);

            // Step 3: Do the alignment
            IList<IPairwiseSequenceAlignment> results = aligner.Align(sequence1, sequence2);

            // Step 4: Process the results
            Console.WriteLine(new string('=', 60));

            Console.ForegroundColor = ConsoleColor.Yellow;
            Console.WriteLine("{0}\r\nFirst offset: {1}\r\nSecond offset: {2}\r\nScore: {3}\r\n",
                aligner.Name,
                results[0].PairwiseAlignedSequences[0].FirstOffset,
                results[0].PairwiseAlignedSequences[0].SecondOffset,
                results[0].PairwiseAlignedSequences[0].Score);

            Console.ForegroundColor = ConsoleColor.Magenta;
            Console.WriteLine("Consensus:\r\n{0}", results[0].PairwiseAlignedSequences[0].Consensus.ConvertToString());
            Console.ResetColor();
        }
    }
}
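
The sample above only prints the consensus. To show the alignment itself, the usual approach is to print the two gapped rows one above the other with a match line between them. The sketch below does only the text layout and has no .NET Bio dependency; `AlignmentPrinter`, `Format`, and its `width`/`gap` parameters are hypothetical names, and it assumes the gap symbol is `-`.

```csharp
using System;
using System.Text;

// Hypothetical display helper (not part of .NET Bio). It takes the two gapped
// rows of a pairwise alignment and lays them out in blocks of `width` columns,
// with a middle line marking identical, non-gap columns with '|'.
public static class AlignmentPrinter
{
    public static string Format(string first, string second, int width = 60, char gap = '-')
    {
        if (first.Length != second.Length)
            throw new ArgumentException("Aligned rows must have the same length.");
        if (width <= 0)
            throw new ArgumentOutOfRangeException(nameof(width));

        var output = new StringBuilder();
        for (int start = 0; start < first.Length; start += width)
        {
            // Separate consecutive blocks with a blank line.
            if (start > 0) output.Append('\n');

            int len = Math.Min(width, first.Length - start);
            string top = first.Substring(start, len);
            string bottom = second.Substring(start, len);

            var middle = new StringBuilder(len);
            for (int k = 0; k < len; k++)
                middle.Append(top[k] == bottom[k] && top[k] != gap ? '|' : ' ');

            output.Append(top).Append('\n')
                  .Append(middle.ToString()).Append('\n')
                  .Append(bottom).Append('\n');
        }
        return output.ToString();
    }
}
```

For example, `AlignmentPrinter.Format("ACGT", "A-GT")` returns `"ACGT\n| ||\nA-GT\n"`. To use it in step 4, call `AlignmentPrinter.Format(aligned.FirstSequence.ConvertToString(), aligned.SecondSequence.ConvertToString())` for each `aligned` in `results[0].PairwiseAlignedSequences`, assuming your .NET Bio version exposes the gapped rows as the `FirstSequence` and `SecondSequence` properties of `PairwiseAlignedSequence`.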
I hope that helps!
mark