The .NET Bio Sequence class has taken the (not uncommon) approach of representing sequences as bytes instead of chars, or even simply as strings. I have read a couple of .NET Bio discussion threads that cite this as a source of complexity and confusion in the system. I have to admit that it is not the first way I would have thought of storing sequence data, and it confused me at first too.
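For reference, here is a minimal sketch of what I understand the three candidate representations to look like for a short DNA fragment (my own illustration, not the actual .NET Bio Sequence implementation):

    using System;

    class SequenceRepresentations
    {
        static void Main()
        {
            // The same DNA fragment stored three ways (illustrative only).
            string asString = "GATTACA";               // immutable; each char is a 2-byte UTF-16 code unit
            char[] asChars  = "GATTACA".ToCharArray(); // mutable, but still 2 bytes per symbol
            byte[] asBytes  = { (byte)'G', (byte)'A', (byte)'T', (byte)'T',
                                (byte)'A', (byte)'C', (byte)'A' };
            // byte[] is 1 byte per symbol; the IUPAC nucleotide and
            // amino acid codes all fit within ASCII.

            Console.WriteLine($"{asString.Length} / {asChars.Length} / {asBytes.Length} symbols");
        }
    }
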
What I have not been able to find (in .NET Bio or anywhere else on the web) is an explanation of why this is a good way to store sequence data in memory. I have found several sources that claim bytes lead to better performance, but I have never found evidence, or even an explanation, to support this claim.
Since .NET Bio chose to represent sequences as bytes, I am hoping that someone involved with the project can talk me through the design constraints, tradeoffs, and overall thinking that went into this decision. Specifically, at some point someone presumably had to decide between representing sequences as strings, chars, or bytes, and I am interested in the pros and cons of each and in what may have led the original .NET Bio developer(s) to choose bytes over the other options.