NGS Related Projects

Aug 18, 2013 at 7:29 AM
As discussed on another thread, we have been looking at a project to add capabilities to the library in support of NGS work. I approached Kirt Haden for his thoughts, and he told me that they were working on:
  1. Reference finishing
    a. Compare references to aligned/assembled data to identify errors
  2. Structural variation
    a. Compare de novo alignment to the reference and identify insertions, deletions, translocations, inversions,…
  3. Visualization of Genomic information
    a. We have also put a lot of work into visualization and I am sure these concepts can be generalized
  4. Image Processing and image compression
    a. We are identifying features, stitching images, and combining multiple channels
  5. Cloud processing using a cluster
    a. Manage a data queue of raw data and select a pipeline to process the data. Report/monitor results with a dashboard
  6. Strain typing and genome identification
    a. This is basically looking for genomic fingerprints and statistical measures of uniqueness
We have previously done and published some work on genomic fingerprints, and this is an active area for us. However, the student group is more oriented toward CS development, so it would be best focused on some clear-cut implementations. That would be our starting contribution; this is meant as a more general thread. I think we might look pretty closely at parsers (see the other thread) and also at items 1 and 2, followed by 3.

Other thoughts on direction?

Aug 23, 2013 at 6:22 PM
Have you thought about primer design? It is a solved problem and there must be some good guides out there that describe the approaches and pitfalls, which would make it more accessible to your CS students while still being a biological task. It also fits the idea of .NET Bio being a basic library of utilities.

Of course the 'basic library' part is a blessing and a curse - we all need 'glue' to stick together the various parts of an analysis pipeline (or whatever) but glue-makers don't get the credit they deserve, even if we all rely on them. That might affect how attractive your students would find the project.

Another thought - support for RNAseq? This is a workflow a lot of people use, so improved support would appeal to a broad range of users.