Multiple sequence/genome alignment.

Jan 29, 2015 at 3:20 AM
Hello, I'm very new to .NET Bio and am currently working through the cookbook and tutorials. However, I have yet to find how to run whole-genome or multiple sequence alignments, which I will need for an upcoming project. It would also be very helpful to be able to see the SNPs across all the sequences in my data pool. Is this possible in .NET Bio? Thanks!
Jan 31, 2015 at 11:27 PM
OK, I have MSA working now, but how would I go about taking the alignments and building a phylogenetic tree?
Coordinator
Feb 2, 2015 at 2:19 PM
Hi
A quick welcome. I'm answering on my phone, so a proper response will follow tomorrow. The alignment object's properties give you access to distances, but I would need to check on tree construction. I'm travelling, but will be in touch.
Feb 4, 2015 at 9:41 PM
Browsing the library, there does appear to be a Newick tree formatter, but how to use it is beyond me. Of all the tree formats, Newick seems to be the only one with a formatter in the library; Nexus and Phylip only have parsers.
Feb 5, 2015 at 2:36 AM
Hmm, also on a side note, I seem to be having a problem when dealing with large pools of sequences. If I try to do an alignment on a pool of about 200 sequences, it returns an out of bounds error. The most it seems willing to handle is about 76 sequences...
Developer
Feb 5, 2015 at 6:53 AM
Hi Rooper,

Regarding building a phylogeny, the library at present does not contain any heavy-weight phylogenetic modeling implementations, but you could use the tools in the MSA to build a neighbor joining tree, as I believe there is code in there to do that.
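
As a rough illustration of the neighbor-joining idea (this is not the MSA code in the library), here is a minimal sketch in plain C#. It assumes you already have a symmetric pairwise distance matrix for your sequences. The class and method names are hypothetical, and nothing in it calls .NET Bio. It returns the tree directly as a Newick string, which is enough for a quick look in any tree viewer.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper, not part of .NET Bio.
public static class NeighborJoiningSketch
{
    // Builds an unrooted neighbor-joining tree and returns it as a Newick string.
    // names[i] labels row/column i of the symmetric distance matrix d.
    public static string BuildNewick(IList<string> names, double[,] d)
    {
        int n = names.Count;
        if (n < 3 || d.GetLength(0) != n || d.GetLength(1) != n)
            throw new ArgumentException("Need a square matrix for at least three taxa.");

        // Working copies: labels hold Newick fragments, dist holds current distances.
        var labels = new List<string>(names);
        var dist = new List<List<double>>();
        for (int i = 0; i < n; i++)
        {
            var row = new List<double>();
            for (int j = 0; j < n; j++) row.Add(d[i, j]);
            dist.Add(row);
        }

        while (labels.Count > 3)
        {
            int m = labels.Count;
            var r = new double[m]; // row sums
            for (int i = 0; i < m; i++)
                for (int k = 0; k < m; k++) r[i] += dist[i][k];

            // Pick the pair (a, b) that minimises the Q criterion.
            int a = 0, b = 1;
            double best = double.MaxValue;
            for (int i = 0; i < m; i++)
                for (int j = i + 1; j < m; j++)
                {
                    double q = (m - 2) * dist[i][j] - r[i] - r[j];
                    if (q < best) { best = q; a = i; b = j; }
                }

            // Branch lengths from a and b to their new parent node.
            double dab = dist[a][b];
            double la = 0.5 * dab + (r[a] - r[b]) / (2.0 * (m - 2));
            double lb = dab - la;

            // Distances from the new parent to every remaining node.
            var newRow = new List<double>();
            for (int k = 0; k < m; k++)
                if (k != a && k != b)
                    newRow.Add(0.5 * (dist[a][k] + dist[b][k] - dab));

            string joined = "(" + labels[a] + ":" + F(la) + "," + labels[b] + ":" + F(lb) + ")";

            // Remove b first (b > a) so that index a stays valid.
            RemoveNode(dist, labels, b);
            RemoveNode(dist, labels, a);

            // Append the new node as the last row and column.
            for (int k = 0; k < newRow.Count; k++) dist[k].Add(newRow[k]);
            newRow.Add(0.0);
            dist.Add(newRow);
            labels.Add(joined);
        }

        // Three nodes left: join them at a single internal node.
        double l0 = 0.5 * (dist[0][1] + dist[0][2] - dist[1][2]);
        double l1 = 0.5 * (dist[0][1] + dist[1][2] - dist[0][2]);
        double l2 = 0.5 * (dist[0][2] + dist[1][2] - dist[0][1]);
        return "(" + labels[0] + ":" + F(l0) + "," + labels[1] + ":" + F(l1) + ","
             + labels[2] + ":" + F(l2) + ");";
    }

    private static void RemoveNode(List<List<double>> dist, List<string> labels, int index)
    {
        dist.RemoveAt(index);
        foreach (var row in dist) row.RemoveAt(index);
        labels.RemoveAt(index);
    }

    // Culture-independent formatting so branch lengths always use '.' as the decimal mark.
    private static string F(double x)
    {
        return x.ToString("0.####", CultureInfo.InvariantCulture);
    }
}
```

For the usual five-taxon textbook matrix (rows a to e: 0 5 9 9 8 / 5 0 10 10 9 / 9 10 0 8 7 / 9 10 8 0 3 / 8 9 7 3 0), BuildNewick returns "(d:2,e:1,(c:4,(a:2,b:3):3):2);". Note that neighbor joining can produce small negative branch lengths on noisy distances; this sketch does not clamp them.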

Were you able to figure out how to use the Newick formatter? The problem of constructing MSAs with a great many sequences is a general one, but the error shouldn't be 'out of bounds'. Do you happen to have a test case?

Cheers,
N
Feb 5, 2015 at 4:57 PM
Edited Feb 5, 2015 at 5:04 PM
No, I have not figured out how to use the Newick formatter. As for the other issue, it was not an out of bounds error (my bad); here is the exact issue:

System.ArgumentOutOfRangeException: Specified argument was out of the range of valid values.

This occurs when I pull all my sequences from a private database and turn them into Bio.Sequence objects (they come out of the db as strings).
Here is my method that builds them into Bio.Sequence objects:
public ISequence buildSequence(string seq)
{
    return new Sequence(Alphabets.DNA, seq);
}
I simply get a List of strings from the db and then, via a foreach loop, call the method above to turn the strings into Bio.Sequence objects.

Now, if I set validation to false, it works fine until I go to perform the MSA; then I get the same issue when performing the alignment. Here are my alignment methods:
public List<ISequence> AlignSequences(List<ISequence> sequences, int openGapPenalty, int extendedGapPenalty, int type, int matrix, int para, int kmer)
{
    PAMSAMMultipleSequenceAligner.parallelOption = new ParallelOptions() { MaxDegreeOfParallelism = para };

    // Map the integer selector to one of the standard similarity matrices.
    // Any other value leaves similarityMatrix as null.
    SimilarityMatrix similarityMatrix = null;
    switch (matrix)
    {
        case 0: similarityMatrix = new SimilarityMatrix(SimilarityMatrix.StandardSimilarityMatrix.Blosum45); break;
        case 1: similarityMatrix = new SimilarityMatrix(SimilarityMatrix.StandardSimilarityMatrix.Blosum50); break;
        case 2: similarityMatrix = new SimilarityMatrix(SimilarityMatrix.StandardSimilarityMatrix.Blosum62); break;
        case 3: similarityMatrix = new SimilarityMatrix(SimilarityMatrix.StandardSimilarityMatrix.Blosum80); break;
        case 4: similarityMatrix = new SimilarityMatrix(SimilarityMatrix.StandardSimilarityMatrix.Blosum90); break;
        case 5: similarityMatrix = new SimilarityMatrix(SimilarityMatrix.StandardSimilarityMatrix.Pam30); break;
        case 6: similarityMatrix = new SimilarityMatrix(SimilarityMatrix.StandardSimilarityMatrix.Pam70); break;
        case 7: similarityMatrix = new SimilarityMatrix(SimilarityMatrix.StandardSimilarityMatrix.Pam250); break;
        case 8: similarityMatrix = new SimilarityMatrix(SimilarityMatrix.StandardSimilarityMatrix.DiagonalScoreMatrix); break;
        case 9: similarityMatrix = new SimilarityMatrix(SimilarityMatrix.StandardSimilarityMatrix.EDnaFull); break;
        case 10: similarityMatrix = new SimilarityMatrix(SimilarityMatrix.StandardSimilarityMatrix.AmbiguousDna); break;
        case 11: similarityMatrix = new SimilarityMatrix(SimilarityMatrix.StandardSimilarityMatrix.AmbiguousRna); break;
    }

    // type 1 = Smith-Waterman profile aligner, type 0 = Needleman-Wunsch profile aligner.
    if (type == 1)
        return TestProfileAligner(new SmithWatermanProfileAlignerParallel(similarityMatrix, ProfileScoreFunctionNames.PearsonCorrelation, openGapPenalty, extendedGapPenalty, 2), sequences, kmer);
    if (type == 0)
        return TestProfileAligner(new NeedlemanWunschProfileAlignerParallel(similarityMatrix, ProfileScoreFunctionNames.PearsonCorrelation, openGapPenalty, extendedGapPenalty, 2), sequences, kmer);

    return null;
}

static public List<ISequence> TestProfileAligner(IProfileAligner profileAligner, List<ISequence> lstSequences, int kmer)
{
    int kmerLength = kmer;

    const DistanceFunctionTypes distanceFunctionName = DistanceFunctionTypes.EuclideanDistance;
    const UpdateDistanceMethodsTypes hierarchicalClusteringMethodName = UpdateDistanceMethodsTypes.Average;

    MsaUtils.SetProfileItemSets(lstSequences[0].Alphabet);

    // Generate DistanceMatrix
    KmerDistanceMatrixGenerator kmerDistanceMatrixGenerator = new KmerDistanceMatrixGenerator(lstSequences, kmerLength, lstSequences[0].Alphabet, distanceFunctionName);

    // Hierarchical clustering
    IHierarchicalClustering hierarchicalClustering = new HierarchicalClusteringParallel(kmerDistanceMatrixGenerator.DistanceMatrix, hierarchicalClusteringMethodName);

    // Generate Guide Tree
    BinaryGuideTree binaryGuideTree = new BinaryGuideTree(hierarchicalClustering);

    // Progressive Alignment
    IProgressiveAligner progressiveAlignerA = new ProgressiveAligner(profileAligner);

    progressiveAlignerA.Align(lstSequences, binaryGuideTree);

    float profileScore = MsaUtils.MultipleAlignmentScoreFunction(progressiveAlignerA.AlignedSequences, profileAligner.SimilarityMatrix,
                                                                 profileAligner.GapOpenCost, profileAligner.GapExtensionCost);

    return progressiveAlignerA.AlignedSequences;
}
The issue then occurs during the alignment, at the line "progressiveAlignerA.Align(lstSequences, binaryGuideTree);"
Feb 5, 2015 at 9:44 PM
Edited Feb 5, 2015 at 9:55 PM
On the alignment I get a System.AggregateException as well, assuming I created the sequences with validation = false.
Feb 5, 2015 at 10:04 PM
Ahh, never mind about the aligning; it seems to work perfectly fine when setting it to ambiguous DNA! But I am still curious whether anyone knows how to use the Newick formatter, or has another way to produce a tree based on my MSA. Thanks!
Feb 10, 2015 at 3:48 AM
So does anyone know how to use the Newick formatter? Or at least how to build some kind of phylogenetic tree (even the simplest) in .NET Bio?
Feb 20, 2015 at 3:41 AM
Is there any form of phylogenetic support for building trees, or at least for running Kimura/Jukes-Cantor/HKY? Maximum parsimony would be great, but I see nothing that looks like that in the library. If .NET Bio does not support these kinds of things, does anyone know of any libraries that do (preferably ones that can be interfaced with from C#)?
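For reference, the simplest of those corrections, Jukes-Cantor, is small enough to write by hand. The following is only a sketch in plain C#: it works on two already-aligned strings, skips any column containing a gap, and uses hypothetical class and method names. It does not call .NET Bio and says nothing about what the library itself provides.

```csharp
using System;

// Hypothetical helper, not part of .NET Bio.
public static class DistanceSketch
{
    // Proportion of differing sites between two aligned sequences.
    // Columns where either sequence has a gap ('-') are skipped.
    public static double PDistance(string a, string b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Sequences must be aligned to equal length.");

        int compared = 0, different = 0;
        for (int i = 0; i < a.Length; i++)
        {
            char x = char.ToUpperInvariant(a[i]);
            char y = char.ToUpperInvariant(b[i]);
            if (x == '-' || y == '-') continue;
            compared++;
            if (x != y) different++;
        }

        if (compared == 0)
            throw new ArgumentException("No ungapped columns to compare.");
        return (double)different / compared;
    }

    // Jukes-Cantor corrected distance: d = -(3/4) * ln(1 - (4/3) * p).
    // The formula is undefined once p reaches 0.75, so infinity is returned there.
    public static double JukesCantor(string a, string b)
    {
        double p = PDistance(a, b);
        if (p >= 0.75) return double.PositiveInfinity;
        return -0.75 * Math.Log(1.0 - 4.0 * p / 3.0);
    }
}
```

Calling JukesCantor on every pair of rows in an alignment gives a distance matrix that a distance-based tree method could consume. Ambiguity codes are treated as ordinary mismatches here, which is a simplification.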
Developer
Mar 11, 2015 at 7:03 AM
Hi rooper,

Phylogenetic inference is a very nuanced thing, and largely depends on what you are trying to accomplish (in the "understand the history of the domestic dog" sense rather than the "apply maximum likelihood to this problem" sense). Can I ask what your larger goal is? It might make recommendations easier.

-N
Mar 11, 2015 at 5:23 PM
Hi, currently I am just using Clustal to perform my maximum parsimony analysis, basically to see how the sequences in my pool relate to each other. This helps me better understand how they are behaving as they evolve. I would prefer to integrate this into the software I am writing (in C# with .NET Bio), but for now I am just using Clustal, and it is a nuisance to keep switching between programs. Thanks!