What do *you* want .net Bio to be?

Jun 20, 2012 at 9:32 AM

.net Bio is open source. That means *you* own it. You set its strategy and you decide what features it has and how it is used.

However, to date the project has, in my humble opinion, failed to realise its full potential. This is, in part, because whilst it has an open source licence it does not operate fully as a community led project. This is not because of some controlling influence limiting engagement. It is because leading an open source project is very hard to do.

I recently blogged on this topic at http://www.outercurve.org/Blogs/EntryId/55/Guest-post-Open-Development-by-Ross-Gardler

This is where I come in. Please allow myself to introduce myself...

My name is Ross Gardler. I'm not a Bio-informatician, I didn't even study Biology at high school. I'm a computer scientist who has made his way, over many years, in the world of open source (that's all that really matters here, you can find me on LinkedIn if you want to know more about my background).

A couple of months ago the .net Bio team invited me to help out as a mentor here. .net Bio is a project I had first engaged with a couple of years ago after I keynoted at the Open Bioinformatics Foundation conference (part of BOSC 2010). The presentation was about how and why the Apache Software Foundation works (slides at http://www.slideshare.net/bosc2010/gardler-bosc2010-communitydevelopmentattheasf).

My role as a mentor is not technical. It is entirely advisory. I'm hoping to help this project become a true community-led, open source project that serves *your* needs. For this to happen you need to get involved. It's OK to just lurk on this forum and to consume the software passively, but to get the most out of this project you will need to get involved. Over the coming months we will work towards understanding how and when you can get involved.

Don't worry, we don't expect anything of you. Your involvement will be managed entirely by yourself. You can give as much or as little of your time as you need to in order to get your own work done. 

When you are ready, please take a moment to introduce yourself and provide a brief summary of why you are here and what you want to see .net Bio become.

Ross

Jun 21, 2012 at 7:12 PM

Hi, Ross,

That's a very warm post. Thanks!

So I'm the first one to reply, just as usual.

I think there are many factors behind the problem you mentioned. I have worked in a genetics centre for 12 years and have survived without turning into a penguin or switching to a Mac, which I think is a rare event in the universe, so I might have some say. (Talking about Macs, that's the first thing you ask your boss for when you join as a PhD student, postdoc, etc., which is sad, and funny.)

I'll just list some points in random order for further discussion:

1, University/Research is not Microsoft friendly; this is a fact, not a feeling.

2, Microsoft can do more to be friendly to University/Research. Yes, I know how much less we pay for commercial software like SQL Server, but then again, it's not free like MySQL. There should be a dividing line here though: if a university is buying SQL Server for its payroll, accounts, student management, etc., then yes, charge them however you like, and if they are angry they can go and buy Oracle. But if I want SQL Server Enterprise Edition because I need those advanced features to analyse my research data and perhaps make a disease more curable, then how about charging me 0 pounds? The same could go for Windows Server, cluster or HPC. Try to draw that line. For everything you charge for, there will be an OSS/FSF alternative to replace you.

3, Journals are not Microsoft friendly, so you can't easily publish your paper if you use MS; reviewers will jump out and ask: why don't you use Linux? Many funding bodies even have rules on this. Nobody wants trouble for themselves, especially PIs.

4, The science game might be broken, but no one admits it, at least not the people who already have positions; I said this in one of the NSF task force meetings (by chance I was there). Imagine a group of professors/directors having a meeting to discuss how the current game is no longer fair or productive; it's kind of like wanting a dictator to think harder about making his people happy.

5, Who is really going to use anything like .Net Bio? Programmers. Programmers in University/Research are categorized as CONSUMABLE, i.e. just some budget numbers to put in the grant proposal; when the money is gone, you should disappear; if you are not happy for any reason, just go away, there are plenty more to hire from the market.

6, There is no career ladder here for programmers, FULL STOP. There is only one person climbing the ladder, the PI himself.

7, In some ways, postdocs are probably even less hopeful: out of how many of them can one become a new PI? Yet they at least have the potential to be one someday, but if you are a programmer, who is going to promote you to PI? Do you know the science? (Well, even if you know a lot it doesn't help, because your job title has already fixed you.)

8, Bioinformatics as a subject area is itself a compromise between "scientists" and "programmers", if anyone bothers to write a history of this. Pure scientists have no interest in knowing the difference between MySQL and MS SQL; all they want is papers and money.

9, If many people still argue about whether Computer Science is really a science, then Bioinfo is even further from being one, but the title makes people feel better.

10, In the old times, even learning to be a carpenter would take several years; but nowadays you can find plenty of bioinformaticians who have studied programming for maybe half a year in total and only know one of these: Perl, Python, or whatever is in fashion at the time. And most of their working time is spent on text-file-in-text-file-out, i.e. the UNIX spirit.

11, A similar job title is Data Miner; don't mix this up with data mining. Data Miners know a bunch of popular tools (which again are text-in-text-out) and a bunch of popular web databases; their time is spent on information retrieval by hand.

12, Even though Perl is mentioned here, they only use it for scripting purposes.

13, They don't know the difference between programming and scripting; they think they are good programmers. Well, most PIs also think scripting is programming, and programming is scripting. They will never understand what MVC is, what MVVM is, etc.

14, If scripters are the main population in the bio world, then .NET Bio is useless for them. Of course .NET Bio can be loaded into PowerShell to be very powerful, but do you think they know PowerShell? Even many Unix-haters admit that PowerShell's object-in-object-out is better than the Unix spirit, yet who cares; scripters don't have that deep an understanding of CS in the first place.

15, So I did this to bridge the gap: http://bow.codeplex.com, but if 99.99% of tools can still only be run on Linux, then it won't help the scripters to move; even if in fact they want mouse clicks and drag-and-drop, they still need to live in the terminal.

16, To prove that, this is exactly why people buy Macs: very expensive; feel good; get away from the terminal when they can. Otherwise, they would buy open source/free-as-in-beer hardware and run Linux most of the time. I guess in a 400-staff building you might be able to find 1 real geek of this kind.

17, In some ways it is true that places like this http://seqanswers.com/forums/forumdisplay.php?f=18 are very welcoming and warm; you can easily see how people are helping each other. Many people put great effort into helping others.

18, In the MS world I guess only the MSDN forums have a similar feeling.

19, I have the feeling that commercial companies are more interested in using .NET Bio. They might have a business model since this is BSD licensed. Individual scripters lack that motivation. Yet companies normally don't contribute back.

20, OSS is not mature in the MS world, for MS programmers; there needs to be a business model.

21, Then why does OSS work in the Linux world? Because these guys work very hard to get their code/tools published in journals to establish themselves.

22, But if you produce OSS tools that run on Windows, then mostly you are useless to the PI, so he had better replace you with a penguin. Or maybe Microsoft should pay the salary?

I'll end here on a catch-22. Hopefully there will be some replies, unlike other threads where I'm always the one to end the conversation.

Best,

dong

Jun 22, 2012 at 10:08 AM

Hi,

wow, that's a lot. Here's my point of view.

C# is a nice programming language, but it also has some very advanced features. That's one reason why a lot of students learn Java during their studies. Another reason is this: Visual Studio's Express Edition is free, yes, but compared to Eclipse it is far from a productive development environment. You get more IDE for no money in the non-.Net world. Students become researchers and they take their experiences with them.

Papers: Writing with Word is not really fun. I am a fan of Office products, believe me, but I don't like Word for long texts. It is still not the tool to write a book.

Bioinformatics: I disagree with you. Like other sciences, informatics has been split into special branches, because one needs special know-how to do bioinformatics or media informatics or -- you name it. Many years ago informatics was done by physicist nerds. Some biochemists, or whoever felt an affinity for IT, also did the job, but in the meantime this topic has grown up and it is not that easy anymore to meet the users' requirements. They not only want data, they want a report. Also, the studies are bigger now. Gigabytes or even terabytes of data need to be analyzed, so one needs a database. It is only natural that you need professionals for that. I think this is coming more and more into view and there will be a career path (if there isn't one yet).

I think the computer scientists among the people doing bioinformatics should know how to do work apart from Perl, Python and scripting. But I think they tend to use Java (and BioJava), R (and Bioconductor), or whatever suits them. Microsoft is a little late here.

A reason to use C# would be Desktop GUI. It is not easy to get a nice Desktop system using .Net, but try that with other platforms and you struggle even more. But it costs a lot of time/is too expensive, and research projects often cannot afford that.

Regards,
Dirk

Jun 22, 2012 at 11:49 AM
Thanks Dong,

In a previous life I ran an open source advisory service to the UK
academic research sector. Many of your observations regarding academic
researchers ring true with my experience there. However, there are
success stories too. Our job is to figure out what made those projects
a success and adapt their approach to this project. Our job, as a
community, is to find the potential for .net Bio and capitalise on it.

One thing I really like about your post is your acknowledgement that
it's not only about academic research. We need to engage with
commercial organisations too. Whilst this project originally came out
of Microsoft Research this is not a Microsoft project. We need to
demonstrate that and encourage other organisations to become a part of
the long term sustainability plans for this project. Our goal should
not be to convert the world to .net Bio, it should be to empower and
support those who feel that the .net Bio environment is the optimal
solution.

It's great to have you here Dong. Your insights are really valuable.
Your http://bow.codeplex.com/ project looks interesting, although as I
said in my intro, I'm not a bio-informatician (I'm not even a .net
guy) so I can't comment specifically. It would be interesting to
explore synergies between the two projects. But, unfortunately, I'm
not equipped to do that, so I'll defer to the more bio-aware people on
this forum. If you see any specific and achievable opportunities
please post them to this forum.

Ross

Jun 22, 2012 at 11:53 AM
On 22 June 2012 10:08, dstarke <notifications@codeplex.com> wrote:

...

> I think the computer scientists among the people doing bioinformatics should
> know how to do work apart from Perl, Python and scripting. But I think they
> tend to use Java (and BioJava), R (and Bioconductor), or whatever suits
> them. Microsoft is a little late here.
>
> A reason to use C# would be Desktop GUI. It is not easy to get a nice
> Desktop system using .Net, but try that with other platforms and you
> struggle even more.

Is this an observation that others would support? Are there other
advantages that .net Bio has over some of the other platforms?

Ross
Jun 28, 2012 at 12:19 PM

http://lemire.me/blog/archives/2011/06/06/why-i-still-program/

Jun 28, 2012 at 1:04 PM
On 28 June 2012 12:20, xied75 <notifications@codeplex.com> wrote:
> From: xied75
>
> http://lemire.me/blog/archives/2011/06/06/why-i-still-program/

This is a good read for both the researchers and commercial
organisations here. One interesting quote I find relevant is "For
example, recently a team from Facebook integrated one of my compressed
bitmap index library in Apache Hive: the Hadoop-based framework for
data warehousing. I am willing to bet good money that nobody at
Facebook read the original paper for which I wrote this software"

This is exactly what I believe we should be striving for here in the
.net Bio world. A collaboration which ensures researchers get an
increased level of visibility and additional (and practical)
applications of their work (along with the validation and feedback
this brings). In return commercial organisations get to work with the
best researchers whilst enhancing their core software deliverables.

There are many good examples, another one which researchers might find
useful is Apache Wookie (a part of the outputs of an EC funded
project) which in turn led to the creation of Apache Rave - the
unification of three pre-existing solutions to a single problem
involving Mitre Corp (USA), SurfNet (Netherlands) and the Open Gateway
Computing Environment (USA, NSF funded). This team was joined by some
commercial interests and is now a vibrant and healthy community. See
http://osswatch.jiscinvolve.org/wp/2011/02/22/rave-proposal-brings-together-us-and-european-partners/

Does anyone have any insights as to how the "horizontal collaboration"
discussed in the article originally linked could work for them here in
.net Bio?

Ross
Coordinator
Jun 28, 2012 at 7:41 PM

Hi Ross,

First, a direct response to your mail:

To introduce myself: my name is Simon Mercer, and I have worked as a bioinformatician at various places including the Max-Planck Institute, Wellcome Trust Sanger Institute and National Research Council of Canada. Over time, I came to realize that the field would be greatly accelerated if we could more effectively bridge the gap between computer science and experimental biology, and much of what I do is an attempt to bring these two sides closer together. I now work for Microsoft, and .NET Bio (formerly known as the Microsoft Biology Foundation - MBF) is one of the projects I was responsible for starting.

What I want to see .NET Bio become - when it was first created, the goal of the project was to provide well-written code to educate beginners, ready-made functionality for the use of developers, and a community for anyone interested in bioinformatics on Microsoft platforms. The core of .NET Bio is to provide basic functions that the majority of bioinformatics developers would need (so it provides the most value to the greatest number of possible users), but as an open source community-led project, new features will be determined by the needs of the users and contributors the project attracts.


I also have some comments on the discussion so far:

Paid versus free software: this is one of those 'religious' debates that will never end. What I can say is that .NET Bio is completely free, and if you want to use Eclipse as the IDE plus the range of OSS alternatives elsewhere you are free to do so, and will have a free software stack on the Microsoft platform. Nonetheless, .NET Bio is not for everyone - but if it does solve a problem you have, or fit well with your existing infrastructure (a common scenario), so much the better. This project is unlikely to take over the world - but so long as it provides value to those involved, that's just fine.

Barriers to publication: this is not generally the case. Of course any paper is at the mercy of reviewers and editors, and some have strong biases, but I have found that reviewer objections about the choice of platform are usually simple to resolve since this is an Apache 2.0 licensed project with plenty of interoperability and cross-platform features.

.NET Bio is not useless for scripting; it works just fine with any scripting approach that can access .NET - for example Python. Examples of usage in this way are in the SDK. I use it in Sho and Python. Certainly integration could be improved - or even just better-documented.

C# is a pretty good choice as a full-strength language, and in our training courses we have found that Java programmers can migrate to it in a few hours. Like any other language or package, it is not always the best choice, for example some very large memory-management issues are best handled with unmanaged code such as C or C++ (.NET Bio can still be used with these). I would say that the main advantage of .NET Bio is its ability to be used with other languages, and I have seen Visual Basic, F# and C# used.

I very much agree with the idea that bioinformatics is changing; the leading edge of research should always be moving forward, causing some areas of work to enter into the domain of standard practice while new areas are opened up. The informatics needs of research and standard practice are different, and while ad-hoc scripting and 'postdoc-quality' software might be the best way to produce agile prototypes for research support, code supporting standard practices should be well-architected, modular, scalable and robust. These requirements are better served by the standard architect/developer/test model of professional software development and by full-strength languages. What I have observed is that the research community is not always aware of this dynamic, and many postdocs are still supporting common lab processes with idiosyncratic, custom and undocumented code. There are many sociological reasons for this of course - coding is fun and postdocs are cheap - but as a domain of research, bioinformatics would advance more rapidly if we reinvented fewer wheels and used more common solutions.

Simon

Jun 29, 2012 at 11:17 AM

Hello Simon,

You are right. .Net Bio is free. And I also agree: the debate about which world is more free is somewhat religious. But according to the developer documentation one does need a Professional Edition (!) of Visual Studio to work on .Net Bio. I don't know whether it works with Mono. Does it? If not, you need Windows as well. If a developer lives in the *nix world (according to Dong a lot of PIs do), this is far from attractive.

Yes, for Java developers it should be quite easy to switch to C#. (I have been a professional Java developer for eight years and switched to C# three years ago.) For beginners, it is not the best language to start with. (Operator overloading, extension methods, ... nice for the advanced, not for beginners.) I have the feeling a lot of universities in Germany focus on Java. How about the rest of the world?

I have seldom seen attractive Desktop GUIs written in Java. I love WPF. But I have the feeling more than half of the world is web today. And Java is very strong on the server side. All this might draw the developer community away from the Microsoft world.

Regards
Dirk

Coordinator
Jul 1, 2012 at 2:49 PM

You can get the professional edition of MS Visual Studio through the Dreamspark program - https://www.dreamspark.com/Product/Product.aspx?productid=4 and I believe the only requirements in .NET Bio for the professional edition are some of the testing suites used. Let me know if you find that not to be the case or have trouble with getting Visual Studio. It might be good to add that reference to the Dreamspark program to our FAQ. I thought it was more prominent but it appears under the training TAB of the double menu system which I know is not ideal. 

As to working with Mono - that is covered in the FAQ which I believe is still accurate.

Is .NET Bio compatible with Mono?
No current member of the community has tested it specifically under Mono, but others have. See their work here or review the items below:

  • In two places there is a call to CredentialCache.DefaultCredentials. This is only used for NTLM, negotiate, and Kerberos-based authentication, so it can be ignored when those aren’t being used. But if you need them, there isn’t really any good workaround.
  • There are four calls to Assembly.GetName(Boolean), which has the rather uninteresting ability to change the Assembly.CodeBase when an assembly is shadow-copied.
  • Another security related feature being used is HttpTransportSecurity.ClientCredentialType from Windows Communication Foundation. Since this is just used for calling web services via WCF, an alternate web service layer could be used until Mono catches up.
  • In one of the add-in packages there are a couple of calls to the Win32 function GetTickCount. This merely returns the number of milliseconds since the system was last started, so it is rather odd that Mono doesn’t already have a translation layer for the Linux and OS X equivalents.
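
For anyone hitting the GetTickCount item in the list above, here is a minimal sketch of how such a call could be swapped for purely managed code. It assumes the add-in only needs a millisecond tick value or an elapsed-time measurement; the class and method names (PortableTicks, UptimeMilliseconds, MeasureMilliseconds) are hypothetical and are not part of .NET Bio. Environment.TickCount and System.Diagnostics.Stopwatch are standard framework APIs, so no Win32 call is involved.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper, not part of .NET Bio: managed stand-ins for
// code that currently P/Invokes the Win32 GetTickCount function.
public static class PortableTicks
{
    // Milliseconds since the system started. Environment.TickCount is a
    // signed 32-bit value that becomes negative after roughly 24.9 days
    // of uptime, so the sign bit is masked off to keep the result >= 0.
    public static int UptimeMilliseconds()
    {
        return Environment.TickCount & int.MaxValue;
    }

    // For timing a piece of work, Stopwatch sidesteps the wrap-around
    // issue because it measures an interval rather than system uptime.
    public static long MeasureMilliseconds(Action work)
    {
        Stopwatch timer = Stopwatch.StartNew();
        work();
        timer.Stop();
        return timer.ElapsedMilliseconds;
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine("Uptime (ms): " + PortableTicks.UptimeMilliseconds());

        long elapsed = PortableTicks.MeasureMilliseconds(() => Thread.Sleep(50));
        Console.WriteLine("Sleep took about " + elapsed + " ms");
    }
}
```

If the original code only uses GetTickCount to time an operation, the Stopwatch form is probably the safer replacement; the masked TickCount form is closer to a drop-in substitute where an absolute tick value is expected.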

Rick

Coordinator
Jul 2, 2012 at 5:03 PM

Hi Dirk,

Oops, I had forgotten the requirement for the Professional edition of Visual Studio for .NET Bio development - and while it is true that this is needed for extension of .NET Bio, you can use any version of Visual Studio to build apps using .NET Bio - you can also use Eclipse if you use MONO (it has an Eclipse plugin) or you can use other Eclipse plug-ins like Emonic if you want to use .NET and C# but not Visual Studio. Also, as Rick mentioned above, it is possible for many users to obtain much of the Microsoft technology stack for free or close-to-free.

Nonetheless, while these are possibilities I should be clear here - life will be simpler if you are at least prepared to pay for some software components - and while we tried to make sure there is a solution that is as free as possible, our real focus was on remaining as open as possible, to support as wide a range of usage as we could - whether in terms of languages (.NET supports over 70), usage modalities (native app, cross-platform apps, plug-ins, script integration, workflow integration, commandline, web, SharePoint integration, etc.) or platforms (Silverlight VM version, or MONO provide LINUX and OSX support, etc).

Anyway; I am aware that my responses on this forum make me look very Microsoft-partisan, but the reality is, I believe you should use whatever technology gets the job done for you, regardless of platform. What we are showing with .NET Bio however, is that Windows (and associated technologies and applications) has a combination of features that can make it attractive to the life sciences and is a viable platform for bioinformatics software development. At the very least, it deserves to be considered alongside other options as part of any solution - and in fact it frequently is, although more often in the commercial than the academic world.

Simon

Jul 3, 2012 at 10:28 PM
Whilst some of this conversation isn't really answering the question
of "what do you want from .Net Bio" I do think some of the information
is useful.

Rick mentions a way for academic staff to get Visual Studio but also
notes it is not well documented. Simon notes that it's possible to use
Eclipse and MONO or Eclipse and "plugins like Emonic" for those who
want to use .NET

It would be good to see these options at least mentioned, even if not
fully documented on the website. I'm a great believer in incomplete
docs being better than no docs, at least people can come to the forum
and ask for guidance, responses are then archived for future people
with the same question. Furthermore, improving incomplete
documentation is an easy way for people to make their first valuable
contribution to the project. Such contributions might seem trivial
but, in fact, they are very valuable as they help attract other new
community members.

There are, however, some observations that I think are useful in
helping me (as the dumb newcomer) to understand .NET Bio's role. For
example:

Simon defines that a goal of .NET Bio is to be as open as possible.
This, I imagine, will be very important to a set of people in the
bio-informatics space (probably a sub-set right now, but if it truly
is as open as possible who knows). It would be good to see this
inclusivity goal clearly articulated on the project homepage. It kind
of says it at present but I've never grasped this "open as possible"
objective until I read Simon's email. I wonder if others are the same?

Dirk mentions that one of the advantages of .NET Bio over some other
toolkits is the ease with which good GUIs can be built. Given the
desire of .NET Bio to be as open as possible in terms of the
technologies that can be leveraged within it, I wonder if this is not
something that is unique to .NET Bio? If it is, then is this something
that should also be highlighted on the project homepage?

Dong observes that there is no obvious business model around open
source development on a Windows platform. I don't think this is true
in a general sense but it may well be true of .NET Bio. As evidence of
the existence of open source business models around the MS ecosystem
take a look at one of the other projects I mentor - Apache OpenOffice
(coming up for 7 million downloads of its first Apache release just a
couple of months ago, the vast majority, >90%, being on Windows).
However, it is not the role of this project or this community to
define the business models participants can engage in. Our job is to
make sure that the community members are able to build viable business
models around the software. That is, we need to be as open as possible,
not just in terms of technology but also in terms of the way the
project is managed.

Similarly the problems faced by academics cannot, and should not, be
resolved by this project. Academics, like businesses, need to be
empowered to address their own needs in efficient and productive ways.
Open source is becoming a very important part of research careers. For
example, a team I worked with at the University of Bolton has secured
an additional 700,000 EUR research funding (probably more now) as a
result of their open source work (see
http://www.oss-watch.ac.uk/resources/cs-wookie.xml). Another case in
Nottingham points to £1.4M funding resulting from their open source
work (see http://www.oss-watch.ac.uk/resources/cs-texgen.xml).

Being maximally open in both technology and governance allows
businesses to work alongside their competitors, researchers to
collaborate more efficiently, businesses to innovate on the latest
research tools and students to learn more quickly. For example,
Rick points out that whilst this community has not tested under Mono,
other people have. The goal of an open project is to serve the needs
of the participants in that project. So if someone is lurking on this
forum wondering about experimenting with .NET Bio on Mono, speak up -
tell us your needs, ask for some help and guidance (there may be
others to collaborate with).

Similarly, if you are like Dirk and see one of the major advantages
being the ease of GUI design, speak up. What would you like to see
happening in .NET Bio to make your GUI life easier?

My role here is to help the project team build an environment in which
all this can happen. Building such an environment is not easy. It
takes time. Often people spend a long time working in a vacuum
wondering why they are putting all this effort into building a
collaborative culture when there is no-one to collaborate with. But
then, if we are successful, one day a contribution from a new
community member lands on our forum or issue tracker. It might just be
a clarification of some documentation (this thread has unearthed some
useful Mono info for example), it might be a feature request (no
guarantee it will be delivered, but at least it will help people
prioritise), it might be an offer to implement that feature (maybe a
little pointer in the right direction is needed) etc.

This thread has already been very educational for me. We've not
produced any code yet, but we have identified a number of ways that
new community members can start on the road to new code. Over the next
couple of weeks I'm travelling a great deal. I hope the idea of
openness in both technology and governance continues to grow here
while I'm desperately fighting broken sleep patterns in some random
time-zone - it'll give me something to read.

Ross

Jul 7, 2012 at 6:04 AM

What do I want .NET BIO to be? OK, let's start here:

  1. I do large data analysis, so can .NET BIO be deployed to an Azure cluster?
    1. If it can, do we use "Cloud Numerics" from Microsoft Research?
  2. If it can, let me do the analysis locally and have the cluster just run the "heavy lifting" jobs.
  3. Allow access to as many public bio data sites as possible.
    1. Ease of access via config or parameters is a great option.
    2. Allow access to S3.
      1. Sorry, the 1000 Genomes data resides there.
  4. Tell me what I need to do to join.
    1. I've been doing bioinformatics for over 20 years; they just did not call it that back then :D
    2. I've been programming C# professionally for over a decade.
      1. Mostly for bioinformatics companies.
    3. Before that I programmed Java.
    4. And before that I was a Mol Bio lab rat for almost 15 years.
  5. So if it cannot do #1, then can I extend it to do #2?
    1. It might take me a little while; I do have a 9 to 5 job that takes up more than 9 to 5.
  6. Hmm, need to think more about this.
Jul 7, 2012 at 11:15 AM

Hi, Blair,

It's really nice to have you here. If we have a bunch of people like you on this site, we are ready to turn the tide. :)

.NET Bio is a framework, just as .NET is a framework; reduced to the extreme, it's just a single DLL file. So you can 'deploy' it to Azure nodes/VMs with the rest of your code, as a package. But if you need this DLL to be able to fan out the work to many nodes automatically: 1, I'm not sure this is a solved problem in computer science; 2, I'm not sure this is the correct level of abstraction, i.e. putting all the Azure bits/plumbing inside this DLL.

If you are talking about a cluster as in Windows HPC or the Azure HPC SDK, then that's the batch mode/job management stuff; .NET Bio can be called within jobs/batches. It might have some code calling MPI or using SOA (I'm not sure), but I don't think that's a general design goal, just as .NET is not designed 'for' HPC.

I'm in the middle of making a Windows Azure pipeline to handle TBs of sequence data, so it will be good to learn more from you. :)

As for the 1000G project on S3, it has FTP running and thus should be easy to access, e.g. you can use samtools with an FTP URL to directly view/process BAM files. It would be nice if Azure could hold these popular datasets in their datacentres so that we can 1, access them very fast; 2, not incur cost to our Azure account; 3, avoid saving duplicated files, thus also reducing our cost. Can anyone suggest/raise this issue with either the Azure team or MSR?

May I ask you two questions here? 1, you have been using C# for a decade, which means you started with C# 1.0 in 2002. What made you decide to take up that new language? Is that because you have always been running Windows, instead of Unix/Linux, in the lab? 2, why are companies interested in using C# rather than Perl, at that time or all the time? (I'm curious about this because I have only worked in universities, so I only know half the story.)

Best,

dong

Jul 7, 2012 at 4:50 PM

Hi dong,

Yes, I started writing C# code in 2002. I was working for a bioinformatics company and the product we had hit a scalability problem in its original Java version. So we did an analysis and decided to use C#. It took us less than 6 months to port a large Java system to .NET. That change also caused the product to take off (it did ADME predictions from virtual compounds).

I have never been a Perl hack and I still try to avoid it at all costs. For scripting I prefer Python. As to companies preferring C# or any other compiled language, it tends to come down to being able to obfuscate the code so that the company's IP is not easily lost. Scripting languages do not offer this kind of protection.

As to using UNIX/Linux vs Windows, I move back and forth all the time. I have just set up a CloudBioLinux VM in the Azure cloud. The reason for this is that I wanted to see how hard it would be to do, and the base instance (I have an MSDN Subscription) is 10 times larger than the AWS base instance. So I can actually load a full Ubuntu image without running out of hard drive space, and not have to pay for what is a play zone for me to learn in. Also, I write code in .NET and Mono mostly for performance reasons; I have found that Java just is not up to my standards for heavy math work.

Blair

Coordinator
Jul 9, 2012 at 7:07 PM
blair0011 wrote:

What do I want .NET BIO to be? OK, let's start here:

  1. I do large data analysis, so can .NET BIO be deployed to an Azure cluster?
    1. If it can, do we use "Cloud Numerics" from Microsoft Research?
  2. If it can, let me do the analysis locally and have the cluster just run the "heavy lifting" jobs.
  3. Allow access to as many public bio data sites as possible.
    1. Ease of access via config or parameters is a great option.
    2. Allow access to S3.
      1. Sorry, the 1000 Genomes data resides there.
  4. Tell me what I need to do to join.
    1. I've been doing bioinformatics for over 20 years; they just did not call it that back then :D
    2. I've been programming C# professionally for over a decade.
      1. Mostly for bioinformatics companies.
    3. Before that I programmed Java.
    4. And before that I was a Mol Bio lab rat for almost 15 years.
  5. So if it cannot do #1, then can I extend it to do #2?
    1. It might take me a little while; I do have a 9 to 5 job that takes up more than 9 to 5.
  6. Hmm, need to think more about this.


Okay - let me try to help with some of these:

1. As Dong says, .NET Bio is an extension to the .NET framework and exists as a .dll, so you can use it on Azure without modification. At the same time, I wouldn't say that .NET Bio is 'Azure aware', in that it does not take advantage of the capabilities of a cloud. If you wanted to use .NET Bio to create an Azure service, you could use it for its implementations of common parsers, algorithms, etc. - but you would have to write code to intelligently scale your workload across Azure nodes, schedule jobs, perhaps slice up large datasets and distribute them - or whatever you have in mind; the Azure SDK is available and can help you there.
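
To make the "slice up large datasets and distribute them" step a little more concrete, here is a minimal sketch of the general pattern in Python. It is only an illustration under stated assumptions: the function names (`slice_dataset`, `gc_count`, `run_distributed`) are hypothetical, local threads stand in for Azure nodes, and none of this is .NET Bio or Azure SDK code.

```python
from concurrent.futures import ThreadPoolExecutor


def slice_dataset(records, chunk_size):
    """Split a list of records into consecutive chunks of at most chunk_size items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def gc_count(chunk):
    """Hypothetical per-chunk job: count G and C bases across the sequences in a chunk."""
    return sum(seq.upper().count("G") + seq.upper().count("C") for seq in chunk)


def run_distributed(records, chunk_size, workers=4):
    """Slice the dataset, run one job per chunk, then combine the partial results.

    Threads are an assumption standing in for remote nodes; on a real cluster
    each chunk would be shipped to a worker by a scheduler instead.
    """
    chunks = slice_dataset(records, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(gc_count, chunks))
    return sum(partials)


if __name__ == "__main__":
    reads = ["ACGT", "GGCC", "ATAT", "CCGA", "TTGC"]
    print(run_distributed(reads, chunk_size=2))
```

The point of the sketch is the shape of the plumbing (split, dispatch, combine), which is the part you would have to write yourself; the per-chunk job is where a library such as .NET Bio would do the actual parsing or analysis.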

Does this work with 'Cloud Numerics'? I haven't tried it but I expect so - the requirements are compatible and it is also a .NET assembly. Of course the devil will be in the details, and you will have to do all the plumbing to make your app work - but it should.

If you have Big Data needs, you may also be interested in Hadoop on Azure technical preview - more details are available at https://www.hadooponazure.com/.

2. Local client with large jobs handled remotely. Yes, you can do this - see the DistributeApp.exe sample application available in the 1.01 distribution for how this can be done on Windows HPC clusters. You could use this code as a starting point for any app you wanted to distribute on HPC, or take a similar approach to distribution on Azure (or use the webservice connectors to call into an Azure service).

3. .NET Bio has the capacity to link to any number of web services - and ships with pre-written functionality for BLAST at NCBI, EBI and on Azure and (I think) ClustalW - if further connectivity is desired, you can use this code as a starting point. Over time, we hope that the community contributes more of these connectors and expands what is available 'out of the box'.

As far as S3 connectivity goes - so long as S3 data can be accessed through web services, you won't have a problem, except perhaps for efficiency.
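
As a rough illustration of what "accessed through web services" can look like, the Python sketch below builds (but does not send) an HTTPS request for part of a publicly readable S3 object. The bucket and key names are placeholders, the helper names are hypothetical, and asking for a byte range is just one possible way to ease the efficiency concern for large files; this is not part of .NET Bio.

```python
from urllib.parse import quote
from urllib.request import Request


def s3_object_url(bucket, key):
    """Build the virtual-hosted-style HTTPS URL for a publicly readable S3 object."""
    return f"https://{bucket}.s3.amazonaws.com/{quote(key)}"


def ranged_request(bucket, key, start, end):
    """Prepare a GET request for bytes start..end (inclusive) of an object.

    Only the request object is built here; passing it to
    urllib.request.urlopen would perform the actual download.
    """
    if start < 0 or end < start:
        raise ValueError("need 0 <= start <= end")
    return Request(s3_object_url(bucket, key), headers={"Range": f"bytes={start}-{end}"})


if __name__ == "__main__":
    # "example-bucket" and the key are placeholders, not real dataset locations.
    req = ranged_request("example-bucket", "data/sample reads.bam", 0, 65535)
    print(req.full_url)
    print(req.get_header("Range"))
```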

4. Joining? You already did! :-) There is no formal 'membership' process for participation in this open source project; just post on the forums, ask questions and play around with the code. If you would like to add functionality, fix bugs or otherwise make contributions, the bar is a little higher: one of the current contributors would like to vet your first contributions to make sure they meet quality standards. It is also good practice to discuss extensions on this forum, to get input and build consensus around your planned approach. When you are ready to make a contribution, take a look at the contribution guide on the 'documentation' tab at the top of this page, which covers coding practices and conventions used in the project. There are plenty of people on this forum who can help out as well - just ask!

Simon

 

Coordinator
Jul 9, 2012 at 7:38 PM

As Ross asked 'what do you want .NET Bio to be?' - I'll add my view:

.NET Bio is intended to provide a base set of functionality to simplify the creation of life science apps on the Microsoft platform; I'd like to see this basic set of functionality extended to include:

  1. A broader range of file parsers and formatters, providing access to further data types and instrumentation. In many cases, however, it may make more sense to support generic file formats such as BAM/SAM and FASTQ, since these should be supported by new instrumentation and applications anyway. In my view, it is important that these new functions are written by community members who will use them - if they originate elsewhere, the authors have little incentive to maintain the code, fix bugs and answer user questions.
  2. Additional web service connectors; in common with support for new file formats, adding access to further web services extends the generic aspects of .NET Bio, making it valuable to a wider range of users. At the same time, new file formats and web services are likely to store data types not currently represented in the .NET Bio object model - and this should also be extended. Again, the best contributions would be from those needing the functionality and therefore with an incentive to keep it current and working.
  3. Additional algorithms: we have a reasonable set of assembly algorithms, and while SNAP (snap.cs.berkeley.edu) is not a part of .NET Bio, it fills the need for a BWA-like aligner - but more would certainly be better, and again attract a wider community.
  4. More scalability: using .NET as the framework restricts storage classes to 2bn objects, and this turns out to limit some uses such as whole-genome de novo assembly. A workaround with a storage class implemented as unmanaged code would be a good way to go, but would take skill and time.
  5. Better leverage of Azure: the cloud may not be the best solution for all bioinformatics issues, but the technology is in early stages and already offers interesting scalability, parallelism and data-sharing capabilities. Better cloud support would be an obvious way to increase the .NET Bio value-add.
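
On the scalability point (item 4), the arithmetic behind one possible workaround can be sketched in a few lines of Python. Note that this shows a different technique from the unmanaged storage class suggested above: it simply spreads items over several blocks so that no single container has to hold everything, and maps a global index to a (block, offset) pair. The class name `ChunkedStore` and the tiny block size are assumptions for illustration only.

```python
class ChunkedStore:
    """Append-only store that spreads items over fixed-size blocks.

    Illustrative only: block_size stands in for a per-container limit
    (such as the roughly 2bn-object limit mentioned above).
    """

    def __init__(self, block_size):
        if block_size <= 0:
            raise ValueError("block_size must be positive")
        self.block_size = block_size
        self.blocks = []
        self.count = 0

    def append(self, item):
        # Start a new block when the store is empty or the last block is full.
        if self.count % self.block_size == 0:
            self.blocks.append([])
        self.blocks[-1].append(item)
        self.count += 1

    def locate(self, index):
        """Map a global index to (block number, offset within that block)."""
        if not 0 <= index < self.count:
            raise IndexError("index out of range")
        return divmod(index, self.block_size)

    def __getitem__(self, index):
        block, offset = self.locate(index)
        return self.blocks[block][offset]

    def __len__(self):
        return self.count


if __name__ == "__main__":
    store = ChunkedStore(block_size=3)
    for value in range(8):
        store.append(value * 10)
    print(store.locate(7), store[7], len(store.blocks))
```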

Simon

 

Coordinator
Jul 10, 2012 at 3:59 PM

Quick update - following Ross' comments and the discussion so far, I have extended the FAQ and updated the homepage to clarify the overall project goals, provide more information on Visual Studio and alternatives, etc.

S.

 

Coordinator
Jul 12, 2012 at 6:31 AM

I've found this to be a fascinating thread, both for the strategic ideas about what a community should be and for the nuts and bolts specifics of why .NET Bio might struggle, survive at some subsistence level in spite of all the odds, or actually flourish in a niche that suits its strengths and limits the exposure to its weaknesses. There is far too much in the posts from Dong, Blair and Dirk to address line-by-line without annoying people. But there are a number of points I'd love to see explored far more fully. However, I will try to keep my posts bite-size. 

The role of MS in engaging with universities is a very broad topic, and since I work for a university rather than Microsoft, or for that matter, their sworn enemies, I'll leave that largely to Simon et al. Suffice to say that I have found them easier to deal with than many. That said, neither MS nor their direct competitors are going to stand or fall on the basis of university sales and university engagement. They provide an ecosystem and we try to take best advantage of it.

Dirk notes correctly that MS was late coming to the bioinformatics party, in the sense that there was no BioC# equivalent to BioJava or BioPerl or whatever. But of course plenty of people used MS tools in a pretty ordinary way, Excel being the prime example here. To me the opportunity for .NET Bio comes not from doing exactly the same things as others, but from the particular mix offered - serendipitously for us - by the .NET environment. The comment that people should not just be scripting is, I think, right as data grows; or they should be scripting using tools that scale. Some of the work that Don Syme and his people have been doing around Type Providers [Blog post here: http://blogs.msdn.com/b/fsharpteam/archive/2011/09/24/developing-f-type-providers-with-the-f-3-0-developer-preview-an-introductory-guide-and-samples.aspx ; more detail here: http://msdn.microsoft.com/en-us/library/hh156509(v=vs.110).aspx ] is very exciting in this space. Type providers and information-rich programming + F# terseness + .NET Bio libraries to do specialised work seems a pretty promising combination to me.

But horses for courses, and I must agree that MS Word is better suited to the sprints than the distance events.

cheers

jh