Android, JAVA, Bio?

May 17, 2012 at 6:25 PM

Dear .Neters,

I just read this blog post: http://blog.xamarin.com/2012/05/01/android-in-c-sharp/

And I came upon this idea: since they could machine-translate a million lines of Java into C#, if we do the same with the GATK code, we'll have a great tool set in no time.

(Of course, by no means do I mean using GATK to replace .NET Bio; having two guns is better than just one, right? :) And we could grow the almost invisible community of Windows-running bioinformaticians faster.)

Comments please,

Best,

dong

Coordinator
May 17, 2012 at 7:28 PM

That would be great - more is always better regarding tools.

Let me know how you get on with the port - I'd be happy to test it, and we would also be happy to host it here on CodePlex!

May 18, 2012 at 10:25 AM

Thanks Simon, that's a lot of trust from you. :)

I only recently finished porting BWA and SAMtools to Win64, all in C. If I take on the above task, my brain will need to switch again from C to Java and C#, which might hurt.

There is one serious problem I do need your help with, though: bwa and samtools are released under GPLv3, yet CodePlex refuses to support this license type, so I was forced to move the source code to GitHub. This is not good in the long run for either CodePlex or Outercurve.

GATK's licensing: https://github.com/broadgsa/gatk/tree/master/licensing

We need to make sure the licensing allows what we have in mind, say, translating the Java code into C# and releasing it on CodePlex.

P.S. My port of bwa and samtools is at http://bow.codeplex.com. Would you mind linking to it from .NET Bio?

Best,

dong

Coordinator
May 18, 2012 at 4:15 PM

No problem - your BOW page is now listed under 'Related Projects' on the .NET Bio homepage - good work.

I see your project includes a port of BWA, which is certainly a useful addition, and something that has also been requested from the .NET Bio project; users looking for fast alignment on Windows might also like to look at SNAP (http://snap.cs.berkeley.edu/) - which requires a lot of memory but is a LOT faster than BWA.

On the topic of Outercurve and GPL licensing: I can't speak for the Foundation, but my understanding is that they support the type of open source licensing that allows you to take free code, use it to build a tool of your own, and then choose whatever license works for you. GPL requires you to release any derivative work you distribute under GPL as well, which is arguably not very 'open' open source; licenses like Apache 2.0 and BSD more truly fit what I think of as open. Just my $0.02...

May 22, 2012 at 3:14 PM

This is the first time I have heard of this aligner, and it's not only me: on THE sequencing forum of the universe, nobody had ever discussed it. So I asked, and got some replies (most importantly from BWA's author). Read them here: http://seqanswers.com/forums/showthread.php?t=20172

Regarding licensing, I only recently learnt a bit about it, and I kind of understand why GPL demands GPL. But then why does CodePlex allow GPLv2 but not v3? I thought v2 was also very GPL?

Best,

dong

Jul 7, 2012 at 5:49 AM

Hi Guys,

I got the Mono Sharpen code converter to work and ran it over the Core part of BioJava, and I am still fixing porting issues. BioJava uses a lot of generics, but Java's implementation of generics leaves a lot to be desired, and Java also does some interesting things with generic type conversions. This has made it very hard to port the code to C#. I just thought you might like to know this.
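For readers wondering what such generics issues can look like, here is a minimal sketch of two well-known Java behaviours that have no direct C# counterpart: type erasure and use-site wildcards. This is only an illustration; the class and method names (`ErasureDemo`, `sameRuntimeClass`, `sum`) are hypothetical and are not taken from BioJava, and the post above does not say which specific conversions caused trouble.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not part of BioJava.
class ErasureDemo {

    // Java erases type arguments at compile time, so both lists share one
    // runtime class. In C#, List<string> and List<int> are distinct types.
    static boolean sameRuntimeClass() {
        List<String> names = new ArrayList<>();
        List<Integer> counts = new ArrayList<>();
        return names.getClass() == counts.getClass();
    }

    // A use-site wildcard: accepts a list of any Number subtype.
    // C# has no wildcards; a port would typically use a generic method
    // with a constraint, or a covariant interface such as IEnumerable<T>.
    static double sum(List<? extends Number> values) {
        double total = 0;
        for (Number n : values) {
            total += n.doubleValue();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());          // prints "true"
        System.out.println(sum(Arrays.asList(1, 2, 3))); // prints "6.0"
    }
}
```

Code that relies on either behaviour usually needs manual attention after an automatic translation, because the converted C# has to pick a concrete generic signature.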

 

Blair

Coordinator
Jul 9, 2012 at 6:19 PM

Blair,

very cool - thanks for letting us know; it would be very interesting to me (and probably others) if you could post details of your experiences here as your work progresses.

Good luck,

Simon

 

Coordinator
Jul 11, 2012 at 2:44 PM

Very interesting thread. I'd be especially keen to hear how easy or otherwise it is to have .NET Bio talking to the ported BioJava - as I recall, the BJ inheritance hierarchy was significant and might present obstacles to these playing nicely. *Very* happy to be proven wrong.

 

jh 

Developer
Jan 15, 2013 at 5:54 PM

I thought I would chime in here a bit, as I have been working with the GATK lately. It sadly reminded me how backwards Java seems compared to C#, so in the hope of ending up with better code I tried to port it over to C# using the Sharpen tool. In general, the Google-y Map/Reduce framework and the ability to distribute jobs in the GATK are quite nice, but the lack of the easy coding and visualization tools you get with .NET is terribly annoying.

In terms of conversion, here are some issues I found that would make a direct port to .NET Bio difficult.

1 - The Sharpen Java to C# conversion tool runs into some problems that make direct translation of the code difficult. The GATK makes extensive, though somewhat uncommon, use of Java enum classes (quite distinct from C# enums) in ways that don't automatically translate well. Additionally, the GATK accesses the Javadoc strings to make programmatic decisions, and these would have to be translated to .NET attributes. The GATK Java code itself is also not cross-platform compatible. All of this can be fixed, but it is somewhat challenging.

2 - A more central design issue is that the GATK makes extensive use of the picard package (http://picard.sourceforge.net/) to interface with the data files. The picard package does translate pretty readily to C#, but much of it is redundant with current .NET Bio implementations. For implementing GATK-like functionality it would be simpler to just translate all of picard, but I think those redundancies would clutter the .NET Bio interface too much, so more conversion work will be needed.

In any event, I will be playing with the GATK SNP calling tools over the next month, and if it seems reasonable to pipe them to .NET Bio in some form I will give it a go. I worry I may just have to regress to Java, but if not I will try to make it into a functioning variant calling package for .NET. Happy to talk with anyone else who might be interested in helping.
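To illustrate the enum point in item 1: a Java enum is a full class, so its constants can carry fields and even override methods, whereas a C# enum is just a set of named integral values. Below is a minimal sketch of that pattern. The `Strand` type and its members are hypothetical and are not taken from the GATK source.

```java
// Hypothetical example, not GATK code: a Java enum whose constants
// carry state and constant-specific behaviour.
enum Strand {
    FORWARD('+') {
        @Override
        Strand opposite() { return REVERSE; }
    },
    REVERSE('-') {
        @Override
        Strand opposite() { return FORWARD; }
    };

    private final char symbol;

    // Enum constructors run once per constant.
    Strand(char symbol) { this.symbol = symbol; }

    char symbol() { return symbol; }

    // Each constant supplies its own implementation above.
    abstract Strand opposite();
}

class StrandDemo {
    public static void main(String[] args) {
        System.out.println(Strand.FORWARD.opposite());        // prints "REVERSE"
        System.out.println(Strand.FORWARD.opposite().symbol()); // prints "-"
    }
}
```

A C# enum cannot hold the `symbol` field or the overridden `opposite()` method, so a port would have to turn something like this into a class with static readonly instances, or an enum plus extension methods. That is one plausible reason an automatic translation would struggle here.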

Cheers,

Nigel 

Jan 16, 2013 at 1:32 PM

Nice to see you back Nigel,

When you say Variant Calling Package for .NET, what strategy (i.e. algorithm-wise) do you have in mind? Samtools, GATK, or something new?

Best,

dong

Developer
Jan 16, 2013 at 3:39 PM

Hey Dong!  Going to spin this off as a new post as the topic is digressing some, but hopefully you will see it there.