New tool for read cluster analysis and filtering (for ddRADseq pipeline, or more general usage)

Oct 1, 2013 at 9:23 AM
Hi everyone, I'm new here :)

I'm a current Software Dev student, and this semester I'm working on a project with .NET Bio. I'm coming to this as a developer without much bioinformatics/biology background, so any suggestions or input on this project would be most welcome!

Essentially, I'll be working on an add-in to .NET Bio, and possibly also a stand-alone application, capable of processing clusters of read sequences and calculating various metrics from them. The goal is to analyse clusters (produced by a clustering algorithm such as MCL) to determine the likelihood that all reads in a cluster come from the same genetic locus (i.e., to determine how accurate the clustering process was).

I'm currently processing SAMAlignedSequences from a BAM file and will be calculating things like the number of haplotypes and genotypes represented in each sample and cluster. Based on this (and potentially on other metrics), as well as the quality score of each read and the number of individuals represented in each cluster, etc., I hope to generate a rating score or scores to indicate for each cluster whether it is "good" or "bad". Based on this rating, "bad" clusters can be filtered out before performing downstream analysis on the data.
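
As a rough illustration of the kind of rating described above, here is a minimal Python sketch. It is not .NET Bio code and not the project's actual method: the read tuple layout `(cluster_id, sample_id, sequence, mean_quality)`, the function names, and the thresholds (`ploidy=2`, `min_samples=2`, `min_quality=20.0`) are all assumptions made for the example. It treats each distinct read sequence as a haplotype, which real data with sequencing errors would overcount.

```python
from collections import defaultdict


def haplotypes_per_sample(reads, min_quality=20.0):
    """Return cluster -> sample -> set of distinct sequences.

    Distinct sequences are used as a crude stand-in for haplotypes
    (an assumption of this sketch). Reads below min_quality are ignored.
    """
    table = defaultdict(lambda: defaultdict(set))
    for cluster_id, sample_id, sequence, mean_quality in reads:
        if mean_quality >= min_quality:
            table[cluster_id][sample_id].add(sequence)
    return table


def rate_clusters(reads, ploidy=2, min_samples=2, min_quality=20.0):
    """Label each cluster "good" or "bad" using two hypothetical rules.

    A cluster is "bad" if any sample shows more haplotypes than the
    expected ploidy, or if too few individuals are represented.
    Clusters whose reads all fail the quality filter get no rating.
    """
    ratings = {}
    table = haplotypes_per_sample(reads, min_quality)
    for cluster_id, samples in table.items():
        too_many_haplotypes = any(len(h) > ploidy for h in samples.values())
        too_few_samples = len(samples) < min_samples
        bad = too_many_haplotypes or too_few_samples
        ratings[cluster_id] = "bad" if bad else "good"
    return ratings


# Hypothetical input: (cluster_id, sample_id, sequence, mean_quality)
example_reads = [
    ("c1", "s1", "ACGT", 35.0),
    ("c1", "s1", "ACGA", 33.0),
    ("c1", "s1", "ACGC", 5.0),   # low quality, filtered out
    ("c1", "s2", "ACGT", 30.0),
    ("c2", "s1", "TTGA", 31.0),
    ("c2", "s1", "TTGC", 32.0),
    ("c2", "s1", "TTGG", 34.0),  # third haplotype in a diploid sample
    ("c2", "s2", "TTGA", 30.0),
    ("c3", "s1", "GGCA", 36.0),  # only one individual represented
]

ratings = rate_clusters(example_reads)
```

With the example data, only `c1` passes both rules. A real score would presumably weight these metrics rather than apply hard cut-offs, and would need some tolerance for sequencing errors before counting a sequence as a separate haplotype.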

As part of a ddRADseq pipeline, the purpose of this project is to improve on the cluster ploidy detection step mentioned in this article.

By adding this functionality to .NET Bio, it will be more accessible for others to use, and can perhaps be extended to be useful for a wider range of applications. In posting here, I particularly wanted to ask whether anyone has suggestions for other uses a cluster ploidy/accuracy calculation tool could be put to, or whether there are any particular metrics you would be interested in seeing included.

I'm looking forward to fun times working on .NET Bio. :)


Project outline (if interested)
Oct 3, 2013 at 8:50 PM
Fun times indeed! Thanks for posting your plans on the forum - and welcome!