Bio.Selectome, Tabix and Fast BAMParsers

Developer
Oct 14, 2013 at 2:45 AM
I just added a project for interfacing with Bio.Selectome to the repository. I think this is working just fine, however am a bit concerned that the upload might have missed something else. If anyone gets a chance and can run the unit tests with "selectome" in the name on a different machine I would definitely appreciate it. Any takers? (amber, mark?)

Also, I played around a bit with making the BAM parser faster this weekend. The old version was very slow in mono but is now much faster (http://evolvedmicrobe.com/blogs/), and I think with Amber's changes will be a much improved class.

Related to this, I will probably start putting the VCF parser together in a more usable format. VCF files are usually indexed with the TABIX program to make accessing them a bit easier to access qucikly, does anyone know if we have a .NET version of this available?
Coordinator
Oct 15, 2013 at 5:28 AM
Hi Nigel,
I believe you omitted Bio.Selectome.csproj in revision 79967.
the project shows as unavailable in solution explorer.
Developer
Oct 17, 2013 at 2:58 PM
Eck, okay thanks lawrence, will try to add that when I get back to my computer later.
Developer
Oct 19, 2013 at 3:34 AM
okay, just uploaded changes from home computer and downloaded them via remote desktop on work computer, it seems to be working this time. Lawrence, can i trouble you to give it another go?
Coordinator
Oct 21, 2013 at 2:11 PM
Thanks Nigel.

The project now loads and builds successfully and all unit tests pass.

One minor point: on rebuild I get a handful of warnings, with codes CS3003 and CS3001, which relate to CLS (Common Language Specification) compliance. They appear to be related to identifiers or types which are not guaranteed to work as expected in languages other than C#.

The root cause traces back to a line in AssemblyInfo.cs, where the [assembly: CLSCompliant(true)] attribute appears.

On the face of it, CLS compliance looks like a Good Thing, but it seems to be more than we promise elsewhere. For example, the Bio assembly does not include this attribute. I think we could remove this annotation and make the warnings go away without adverse impact (and in the process gain an empty error list, which is something I always enjoy ;^)

Unless you have some particular reason why the assembly ought to be CLS compliant? In which case we should fix the warnings...
Coordinator
Oct 21, 2013 at 3:51 PM
Personally, I don't think CLS compliance is an issue anymore. The original reason for it was to ensure libraries built with C# would work with other languages like VB.NET which didn't support unsigned values, overloaded methods, etc. Most modern languages today that you would use with .NET now have all these features. There likely is some odd language out there which might not work with .NET Bio, but putting those restrictions on the API surface just to support Erlang or COBOL.NET is probably overkill.

The only exception I think we should address is differing case methods which should be avoid at all costs (because it's silly). By this I mean:

void SomeFunction();
void someFunction();

Which is perfectly valid in C# but would fail in VB.NET which is not case-sensitive. Beyond that, I think we can ignore CLS compliance - it's restrictive and unnecessary. My .02$ :-)

mark


Developer
Oct 21, 2013 at 4:25 PM
Edited Oct 21, 2013 at 4:27 PM
Lawerence, thanks again for taking a look at this. It seems I have a knack for finding ways to not have the messages that do not appear on my system show up on others after committing :).

So an FxCop told me to put that attribute there, I wasn't really sure about it but it seemed to stop giving errors after that, definitely not the best basis to make a decision with I suppose. Hadn't seen the other errors, I gather they may have been caused by the generic methods or something?

Completely agreed with Mark though, no need to support cobol.net! But case differing methods seems like a terrible idea.

I am going to remove this attribute, would anyone object if I also unchecked in BioFramework_Code_Rules.ruleset CA1014 "Mark assemblies with CLSCompliantAttribute?

There is also CA2102 which has not deal with catching non-clscompliant exceptions, though am not sure how applicable that is.

Another problem I found with code analysis is that it often suggests using ICollection instead of IEnumerable, but ICollection doesn't play as well with F#.

Thanks again,
N
Coordinator
Oct 21, 2013 at 4:41 PM
I have no issue with these suggestions -- I prefer IEnumerable<T> if the consumer won't be editing the data and there is no need for random access. I think that's why code analysis flags it - there's no way to tell what the consumer does with the data and if the underlying thing is a collection type then returning IEnumerable<T> could produce inefficient code for the consumer. For most cases it can probably be ignored but we should think about what people want to do with the returning data and decide if ICollection<T> or IList<T> is a better approach - we can even use IReadOnlyList<T> (new for 4.5) if we want to ensure the data stays read-only.

mark