Split BLAST XML results into individual XML files per hit

May 14, 2013 at 3:56 PM
Edited May 14, 2013 at 6:12 PM
Dear all,

I'm currently working on visualizing biological data with Microsoft technologies, specifically .NET Bio. I'm developing a tool called BLIP (https://blip.codeplex.com/ , https://github.com/vforget/blip , https://github.com/leomont/blip) that visualizes BLAST results as a Pivot collection. As part of this development, I take a single BLAST XML output file as input and split it into individual XML files, one per hit, using the <Iteration> tag. The number of output files equals the number of <Iteration> tags, and each file also carries a "static part" that acts as a header (more details in this image: http://postimg.org/image/umanjvoox/). I have two questions on this subject:
  1. I have developed a solution for this, and it seems efficient. Would it be useful as part of the .NET Bio code? (I can share the code on GitHub if you want.)
  2. While searching for another solution I found the BlastXmlParser in .NET Bio (formerly MBF). I don't know whether I can use this class to split the file and put the individual results into an IList object. Something like this:
using System.Collections.Generic;
using MBF;
using MBF.IO.Fasta;
using MBF.Web;
using MBF.Web.Blast;
using MBF.IO.GenBank;
// Path to one of the split result files
string blastFile = "one of your split result.xml";
BlastXmlParser blastParser = new BlastXmlParser();
IList<BlastResult> blastResults = blastParser.Parse(blastFile);
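
For readers following along, the splitting step described above (one output document per <Iteration>, each keeping the shared header) could be sketched roughly as follows. This is only a minimal sketch using System.Xml.Linq; it is not the BLIP implementation and not a .NET Bio API. The names BlastXmlSplitter and SplitByIteration are hypothetical, and the code assumes the standard NCBI BLAST XML layout, in which the <Iteration> elements sit inside <BlastOutput_iterations> under the <BlastOutput> root. DOCTYPE handling is not addressed.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical helper, for illustration only (not part of .NET Bio or BLIP).
public static class BlastXmlSplitter
{
    // Returns one XDocument per <Iteration>; each keeps the shared header elements.
    public static List<XDocument> SplitByIteration(XDocument source)
    {
        var results = new List<XDocument>();

        // Assumed layout: <BlastOutput> / <BlastOutput_iterations> / <Iteration>
        XElement container = source.Root.Element("BlastOutput_iterations");
        if (container == null)
        {
            return results; // nothing to split
        }

        // Build the "static part" once: a copy of the root with no iterations in it.
        XElement template = new XElement(source.Root);
        template.Element("BlastOutput_iterations").RemoveNodes();

        foreach (XElement iteration in container.Elements("Iteration"))
        {
            // Copy the header template and add a copy of the current iteration.
            XElement root = new XElement(template);
            root.Element("BlastOutput_iterations").Add(new XElement(iteration));
            results.Add(new XDocument(root));
        }

        return results;
    }
}
```

Each returned document could then be written to its own file with XDocument.Save. The header is copied from a template built once, so the full input is not re-copied for every iteration.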


I hope this is clear; I'm happy to answer any questions and to work together on visualization tools built on .NET Bio.

Regards,

Leo
Developer
May 14, 2013 at 4:14 PM
Hi Leo,

Just a couple comments from my end.

1- There is no MBF.Web.Blast namespace as far as I know, as it has all been renamed over to Bio.Web.Blast.

2- My understanding is that the namespace should definitely be able to read the XML output (and I suspect given its name also do the query).

Have you tried using the .NET Bio parsers? If they can't do what you want, your splitter would likely be a great addition. If they can (or should), debugging or working with the current code base would probably be more useful; after all, the library is extensible, and it's best not to reinvent the wheel!

Cheers,
Nigel
May 14, 2013 at 7:13 PM
Hi Nigel,
  1. MBF.Web.Blast did exist, or at least that is what someone who worked with MBF told me. Honestly, I don't know, which is why I'm asking for help. I mentioned MBF because the project I'm working on, BLIP, uses it. We know we must update the program, but that will come after we finish other issues. The main question for me in this post is whether .NET Bio, or the old MBF, can split the BLAST XML output.
  2. As for reading the XML, C# has the XmlDocument class, so I think that is enough to read the file, unless the .NET Bio namespace also validates the XML structure, which is very straightforward anyway.
I have tried some of the parsers, but not the one for BLAST results. I will try it, but it would help if someone could confirm whether this functionality exists.

Thanks for your answer,

Leo
Developer
May 14, 2013 at 7:36 PM
Leo,

You may want to look into the NuGet packaging system for staying up to date. I haven't used it myself, but it should be a simple add followed by a quick find/replace of the outdated references to the older MBF. Mark wrote a blog entry about this here: http://julmar.com/blog/mark/?p=185.

Looking at the source code, it appears that the functionality exists and can be called (e.g. with the type of code you referenced above). You can check out BlastWebTests.cs for some example code. I would strongly avoid going to the underlying XmlDocument class to parse this, and would use the .NET Bio namespaces instead.

-Nigel
May 14, 2013 at 8:18 PM
Great!

I didn't know about NuGet, but I think it may be a solution for referencing the .NET Bio library. The problem is that some methods have been renamed since MBF.

As for the XML parsers, I'm definitely going to check them out. I don't quite understand the reason to avoid the XmlDocument class; is it mainly a matter of convention?

Cheers,

Leo

PS: Thanks for your answer
Coordinator
May 17, 2013 at 1:21 AM
Edited May 17, 2013 at 1:21 AM
Hi Leo,

Just to reiterate Nigel's point -- MBF changed to .NET Bio when it moved to Outercurve. All the functionality that was in MBF is still here, it just has different namespaces. So you should be able to just change all the "MBF" namespaces to "Bio" and it should all work. So, given your code, try this:

using System.Collections.Generic;
using Bio;
using Bio.IO.Fasta;
using Bio.Web;
using Bio.Web.Blast;
using Bio.IO.GenBank;

// Path to one of the split result files
string blastFile = "one of your split result.xml";
BlastXmlParser blastParser = new BlastXmlParser();
IList<BlastResult> blastResults = blastParser.Parse(blastFile);

mark
May 17, 2013 at 1:38 AM
Hi Mark,

Thanks for your answer. I'm currently working on a new approach for BLIP (a BLAST visualization tool built on MBF that also uses Pivot and Silverlight). I have found that some method names have changed, but I'm going to try anyway.

About the code, I have a question: I now have all the individual BLAST results, split per hit, stored in a specific directory. Should I put each of these files into its own position in the IList<BlastResult> object?
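
One way to picture this: since Parse in the snippet above returns a list for each file, every split file can be parsed in turn and its results appended to a single combined list. The following is a minimal sketch under that assumption. SplitResultLoader and LoadAll are hypothetical names, and the parser is passed in as a delegate so the sketch does not depend on any particular .NET Bio signature.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, for illustration only (not part of .NET Bio or BLIP).
public static class SplitResultLoader
{
    // Parses every *.xml file in a directory and gathers the results in one list.
    // parseFile is supplied by the caller: it takes a file path and returns that file's results.
    public static List<T> LoadAll<T>(string directory, Func<string, IList<T>> parseFile)
    {
        string[] paths = Directory.GetFiles(directory, "*.xml");

        // Sort so the combined list has a predictable order.
        Array.Sort(paths, StringComparer.Ordinal);

        var all = new List<T>();
        foreach (string path in paths)
        {
            // One file may yield several results, so append them all.
            all.AddRange(parseFile(path));
        }
        return all;
    }
}
```

With the parser from the snippet above, the call might look like LoadAll<BlastResult>(directory, path => blastParser.Parse(path)), assuming Parse accepts a file path as it does there.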

Cheers,

Leo

PS: (Mark) I sent you an email using the CodePlex contact form. Please check it; I'm very interested in talking to you.