BioWF - .NET Bio workflow engine

Jul 27, 2013 at 1:31 AM
I've always thought Trident was cool - but not really usable in the real world. The WF change from 3.5 to 4.0 pretty much killed it. One of my pet projects for a little while has been to replicate that experience using .NET 4.5 in a maintainable fashion. Here's my result: http://julmar.com/blog/mark/?p=323

It's very early right now, but it executes and is capable of simple tasks. I need to work on the designers more - not all the activities have them, and it needs some more "wizard" style connectivity and perhaps some sample workflows to process. It uses flat XML files (rather than SQL Server, which Trident uses) and I ported most of the Bio.Workflow activities over to it. It uses WF 4.5 so it's a lot faster and has a better design experience.

If anyone is interested in playing with it or potentially adding to it, let me know. Feel free to download the source and build it just to take it for a spin. You can get it from GitHub at http://markjulmar.github.io/BioWF/
Jul 30, 2013 at 3:15 AM
I just updated the project to include a simple form of IntelliSense - it's not as robust as Visual Studio's but it will help in finding types and variables. I have a list of additional enhancements I want to add, including a bunch of new activities. I'd love some other ideas too if anyone has some.

mark
Aug 2, 2013 at 8:13 PM
Interesting!

For those unfamiliar with Trident, it is another open source project built on top of Workflow Foundation, and available here:
https://tridentworkflow.codeplex.com/

(the project is averaging 24 downloads a day, so I believe rumors of its death are exaggerated)

Personally, I really see the argument for scientific workflow engines. Given the appropriate set of primitives including flow control and standard bioinformatics components, workflow provides the ability to rapidly compose, prototype and deploy standard processing or analysis pipelines. This would seem to offer a more intelligent solution to supporting the evolving research environment than bespoke programming which rapidly becomes obsolete, or ad hoc scripting which can become unmaintainable and by default lacks built-in good practices such as recording provenance.

An interesting question then is why scientific workflow engines haven't taken off more widely. Taverna and Kepler seem to be the most prominent in bioinformatics, but even these appear to have a restricted user base; KNIME appears to be popular currently and may be more broadly used, perhaps due to its focus on data cleaning.

So, why are workflow engines rarely used? Are there ways in which BioWF can increase their popularity?

Mark - in direct answer to your question, some features you might like to consider include:
  • Adding a facility to invoke an external application and capture output, permitting the encapsulation of external apps as workflow components
  • Permitting execution of different steps in different environments - allowing the inclusion of Linux and Windows stages in the same workflow perhaps, or combining Cloud, HPC and desktop execution
  • The ability to define sub-workflows as their own workflow components, so they can be reused
  • A mechanism to support provenance - by recording execution timestamp, duration, input files, application versions, etc. Provenance should be recorded in a standard format
  • The ability to save a defined workflow and then execute it as a standalone service outside any graphical designer
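
To make the first and fourth suggestions a bit more concrete, here is a minimal Python sketch (not BioWF or Trident code) of running an external application, capturing its output, and recording basic provenance alongside it. The names `run_with_provenance` and `sha256_of` and the layout of the record are hypothetical and only for illustration; the record is not in any standard provenance format.

```python
import hashlib
import json
import subprocess
import sys
import time
from datetime import datetime, timezone


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def run_with_provenance(command, input_files=()):
    """Run an external command, capture its output, and describe the run.

    Hypothetical helper: the record layout is illustrative, not a standard.
    """
    started = datetime.now(timezone.utc)
    clock = time.perf_counter()
    result = subprocess.run(command, capture_output=True, text=True)
    duration = time.perf_counter() - clock
    record = {
        "command": list(command),
        "started_utc": started.isoformat(),
        "duration_seconds": duration,
        "exit_code": result.returncode,
        # Hash each input so a later reader can tell which data was used.
        "inputs": {path: sha256_of(path) for path in input_files},
    }
    return result.stdout, record


if __name__ == "__main__":
    # Stand-in for a real tool: a child Python process that prints a sequence.
    output, record = run_with_provenance([sys.executable, "-c", "print('ACGT')"])
    print(output.strip())
    print(json.dumps(record, indent=2))
```

A real implementation would presumably also record application versions and write the record in an agreed format; both are left out of this sketch.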
Aug 8, 2013 at 4:49 PM
Edited Aug 8, 2013 at 4:49 PM
I've created a new project which is linked to .NET Bio at biowf.codeplex.com so I can have some discussion topics there. I've also updated the code with a few new features and bug fixes:

1) The ability to execute an external program (with parameters) and capture the results to feed into future steps.
2) Some updates to IntelliSense to make it a little smarter.
3) Fixes to the ForEach<T> activity so that it persists properly.
4) The ability to read input from both the GUI and the console command line.
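
As a rough illustration of the idea behind item 1, here is a short Python sketch, not the actual BioWF activity. It runs an external program with parameters, captures its standard output, and hands that text to a later step. `run_external` and `count_sequences` are hypothetical names, and a child Python process stands in for a real bioinformatics tool.

```python
import subprocess
import sys


def run_external(program, *arguments):
    """Run an external program and return its standard output as text.

    Raises subprocess.CalledProcessError if the program exits with a
    non-zero code, so a failed step does not silently feed the next one.
    """
    completed = subprocess.run(
        [program, *arguments], capture_output=True, text=True, check=True
    )
    return completed.stdout


def count_sequences(fasta_text):
    """Downstream step: count FASTA records by their '>' header lines."""
    return sum(1 for line in fasta_text.splitlines() if line.startswith(">"))


# Stand-in for a real tool: a child Python process that emits two FASTA records.
emit_fasta = "print('>seq1\\nACGT\\n>seq2\\nGGCC')"
fasta = run_external(sys.executable, "-c", emit_fasta)
print(count_sequences(fasta))
```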

I have some ideas on making the editing experience easier - e.g. auto-creating variables to map inputs to outputs so you don't need to type them in or know the .NET types you are working with, like ISequence.

Simon - thanks for the ideas, I'll look into these. It already has a command-line execution capability: you can persist the workflows as XML and then use a second tool to run them right from the command line. I moved your ideas into a discussion topic on the new project site. Thanks!

mark
Oct 1, 2014 at 8:05 AM
Hi Guys,

I'm currently implementing this tool at BIOS, and I'm looking for some help with the deployment.

Cheers,

Leo
Nov 5, 2014 at 8:17 PM
Hi Leo,

How's it coming? What kinds of issues have you run into?

mark
Nov 5, 2014 at 9:11 PM
Hi Mark,

I'm glad to hear from you. At the moment I'm implementing BioWF as part of the BIOS services. I'm working on three different aspects:
  1. I'm incorporating some visualization tools for Genome Assembly and BLAST operations. The first visualizes contigs individually using the Sequence Assembler model; the other uses Pivot for BLAST results (BLAST in Pivot).
  2. I'm making a complete migration to a web-based tool.
  3. I'm including new bioinformatics software for high-performance operations (Linux tools). This software will run on the BIOS HPC infrastructure as another of the left-hand activities in BioWF.
Any help with these points would be extremely welcome.

Best regards,