Adding Contig File Formats ACE & AGP to enable De Novo Assembly

Jun 21, 2012 at 2:28 AM

I have recently moved from a reference-guided read alignment world to a contig-based de novo assembly world, and I would like to use .NET Bio to visualize the results. The key difference between the two methods is that de novo assembly avoids using a reference genome for alignment. As I understand it, a contig is a fragment that has been assembled from reads but not yet aligned to a reference, so it does not yet have a chromosome or position. Tools exist to view, compare and edit contigs, but their code is not always provided. Because standardized file formats support these tools, it should be easy to create simple viewing applications.

I have seen SAM/BAM used, which is well supported by .NET Bio, but there are two more useful formats for contigs: ACE and AGP. ACE describes, in a single file, a collection of contigs, the sequence of each contig, the quality of that sequence and the reads used to form the contigs. AGP uses a two-step pointer mechanism that separates the contig description from the contig alignment, so each can be varied independently, which is very useful and efficient when editing. Contig editing seems to be required for finishing an assembly, which seems to be the real strength of de novo assembly. I see this as a gap in the .NET Bio IO library that can be plugged fairly quickly, and it starts to open up a new area for .NET Bio.
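To illustrate the AGP idea, here is a minimal sketch of a reader, written in Python for brevity rather than as .NET Bio code. It assumes the tab-separated layout of the AGP specification: five shared columns (object, object start, object end, part number, component type), followed either by a component ID, component start, component end and orientation, or, for gap rows of type N or U, by a gap length, gap type and linkage flag. The names `AgpRow`, `parse_agp` and `SAMPLE_AGP`, and the sample rows themselves, are hypothetical and only for illustration. The sequences of the components are not in the AGP file at all; each row only points at a component ID that is stored elsewhere (typically a FASTA file), which is the indirection described above.

```python
from dataclasses import dataclass
from typing import List, Optional

# Component types that mark a gap row rather than a sequence component.
GAP_COMPONENT_TYPES = {"N", "U"}


@dataclass
class AgpRow:
    """One row of an AGP file (hypothetical helper type for this sketch)."""
    object_id: str
    object_beg: int
    object_end: int
    part_number: int
    component_type: str
    # Filled in for sequence components only.
    component_id: Optional[str] = None
    component_beg: Optional[int] = None
    component_end: Optional[int] = None
    orientation: Optional[str] = None
    # Filled in for gap rows only.
    gap_length: Optional[int] = None
    gap_type: Optional[str] = None
    linkage: Optional[str] = None

    @property
    def is_gap(self) -> bool:
        return self.component_type in GAP_COMPONENT_TYPES


def parse_agp(text: str) -> List[AgpRow]:
    """Parse AGP text into rows, skipping comment and blank lines."""
    rows: List[AgpRow] = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 8:
            raise ValueError(f"expected at least 8 tab-separated columns: {line!r}")
        row = AgpRow(cols[0], int(cols[1]), int(cols[2]), int(cols[3]), cols[4])
        if row.is_gap:
            row.gap_length = int(cols[5])
            row.gap_type = cols[6]
            row.linkage = cols[7]
        else:
            if len(cols) < 9:
                raise ValueError(f"component row needs 9 columns: {line!r}")
            row.component_id = cols[5]
            row.component_beg = int(cols[6])
            row.component_end = int(cols[7])
            row.orientation = cols[8]
        rows.append(row)
    return rows


# Made-up example: two contigs placed on one scaffold with a gap between them.
SAMPLE_AGP = "\n".join([
    "##agp-version 2.0",
    "\t".join(["scaffold1", "1", "500", "1", "W", "contigA", "1", "500", "+"]),
    "\t".join(["scaffold1", "501", "600", "2", "N", "100", "scaffold", "yes", "paired-ends"]),
    "\t".join(["scaffold1", "601", "900", "3", "W", "contigB", "1", "300", "-"]),
])

agp_rows = parse_agp(SAMPLE_AGP)
for row in agp_rows:
    target = f"gap of {row.gap_length}" if row.is_gap else f"{row.component_id} ({row.orientation})"
    print(row.object_id, row.object_beg, row.object_end, target)
```

Because the rows only reference contigs by ID, reordering or flipping a contig during editing means changing a row here, not rewriting any sequence data.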

I did not find any available code to read or write ACE or AGP, so I have started to build an ACE class, but I may not be the right person to complete it. I would be willing to share this code, but I know it will need “finishing”, and it may be better to start from scratch. More importantly, I would like to know whether this is the right direction and whether there are other file formats to consider. Give me your thoughts.
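For anyone weighing up the effort, here is a rough sketch of what a minimal ACE reader involves. It is written in Python for brevity and is not the class mentioned above, nor anything from .NET Bio. It assumes the usual ACE record tags: a `CO` line (contig name, padded base count, read count, ...) followed by the padded consensus, a `BQ` block of consensus qualities, `AF` lines giving each read's padded start position, and `RD` blocks holding the read sequences. The names `AceContig`, `parse_ace` and `SAMPLE_ACE` are hypothetical, the sample data is made up, and everything beyond those four tags is ignored.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AceContig:
    """Consensus-level view of one ACE contig (hypothetical helper type)."""
    name: str
    padded_length: int
    read_count: int
    consensus: str = ""                                   # padded; '*' marks a pad
    qualities: List[int] = field(default_factory=list)   # from the BQ block
    read_starts: Dict[str, int] = field(default_factory=dict)  # from AF lines


def parse_ace(text: str) -> List[AceContig]:
    """Collect contigs, consensus, qualities and read offsets from ACE text."""
    contigs: List[AceContig] = []
    current: Optional[AceContig] = None
    mode: Optional[str] = None  # "CO", "BQ" or "RD" while inside a multi-line block
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            mode = None          # a blank line ends any multi-line block
            continue
        if mode == "CO":
            current.consensus += line
            continue
        if mode == "BQ":
            current.qualities.extend(int(q) for q in line.split())
            continue
        if mode == "RD":
            continue             # read bases are skipped in this sketch
        parts = line.split()
        tag = parts[0]
        if tag == "CO":
            current = AceContig(parts[1], int(parts[2]), int(parts[3]))
            contigs.append(current)
            mode = "CO"
        elif tag == "BQ" and current is not None:
            mode = "BQ"
        elif tag == "AF" and current is not None:
            current.read_starts[parts[1]] = int(parts[3])
        elif tag == "RD":
            mode = "RD"
        # Other tags (AS, BS, QA, DS, ...) are ignored here.
    return contigs


# Made-up example with one contig built from two reads.
SAMPLE_ACE = """AS 1 2

CO Contig1 8 2 1 U
ACGT*ACG

BQ
 20 20 30 30 30 40 40

AF read1 U 1
AF read2 C 3
BS 1 8 read1

RD read1 8 0 0
ACGT*ACG

QA 1 8 1 8
"""

ace_contigs = parse_ace(SAMPLE_ACE)
for contig in ace_contigs:
    print(contig.name, contig.consensus, contig.read_starts)
```

A complete reader would also need to keep the read sequences, clipping ranges and tags, and a writer would have to reproduce the blank-line block structure exactly, which is where most of the “finishing” work would go.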

Aug 21, 2012 at 10:02 PM

Hi khaden,

I'm afraid I can't answer your question (I'm not skilled in this area). I'm really replying to bring your message back to the top of people's inboxes. I'm intrigued that you think you've identified a potential enhancement to .NET Bio. If the community agrees and you feel that you can help implement this enhancement, it would be a real shame to see your post go unanswered.

Ross
