Move Bio.PAMSAM project into Bio project

Developer
Sep 28, 2013 at 6:13 PM
Changes related to dropping in the Bio.Selectome namespace.

There is currently no MultipleSequenceAlignment class in the main Bio.dll. It only exists in Bio.PamSam, which seems like a real shame, since an MSA is a basic thing a biologist needs.

There is an older IMultipleSequenceAlignment interface, but it is currently entirely commented out.

I think the MUSCLE aligner and the MSA are pretty important classes, and they should be moved into the core Bio project rather than kept separate.

Votes/thoughts?
Coordinator
Sep 28, 2013 at 6:19 PM
My only concern would be if it has additional requirements that would get dragged in. PAMSAM is an implementation, so perhaps only the base class or interfaces need to be moved (and that may be exactly what you are suggesting here)? I like keeping Bio.dll reasonably small, since more types means longer load times in .NET. That's the main reason .NET 4 did all the refactoring and split lots of mscorlib functionality out into other assemblies. But I don't have any strong objections beyond that.

mark
Developer
Sep 28, 2013 at 6:24 PM
Edited Sep 28, 2013 at 6:24 PM
Not at all what I suggested, but your idea is really good for the reasons you give.

Very interesting comment on the assembly load times. I suppose the CLR needs to verify all the types when they are loaded? (And .NET Bio is signed, so this is even worse, right?)

How about I move only the MSA type and keep all the heavy alignment code elsewhere? I am also going to add Bio.Selectome as a separate project, to keep the original .dll small as well.
Coordinator
Sep 28, 2013 at 6:36 PM
I think signing isn't really as impactful as it used to be, at least not for full-trust apps, which most desktop apps will be. And, to clarify my knee-jerk response, I honestly doubt that merging PAMSAM in would impact load time significantly, as there just aren't THAT many types in the dll as I look at it. A more compelling reason (for me anyway) to keep them separate is maintainability: changes to core types tend to ripple through the library more than changes to ancillary assemblies do. I like well-factored, loosely coupled class hierarchies, and separate assemblies tend to enforce that design.

I think the criterion for Bio.dll should be anything that is commonly used in most bioinformatics scenarios. If MUSCLE is a big part of that, then by all means we should include it. However, if it's more abstract, i.e., it's really just Multiple Sequence Aligners in general (of which MUSCLE is one), then maybe only the base infrastructure should be moved. The same goes for your Selectome work: if it's something very common that would be useful to most people, then it should be in the core assembly. I know I probably just muddied the water, sorry about that :-)
Coordinator
Sep 30, 2013 at 1:34 AM
MSA is pretty core. It sounds as though the basic typing should be there, doesn't it? Much of this does seem to belong in Bio.Algorithms.Alignment, with the resulting service residing in Bio.dll. Here, for example, is an excerpt from the IAlignedSequence doc; that interface has members for sequences and metadata. I would favour following this same model.

.NET Bio
IAlignedSequence Interface
...

Interface to hold single aligned unit of alignment.
Namespace: Bio.Algorithms.Alignment
Assembly: Bio (in Bio.dll) Version: 1.1.4948.24704 (1.1.0.0)