Design constraints leading to sequences as byte[]

Jan 13, 2017 at 7:10 PM
Edited Jan 13, 2017 at 7:11 PM
Hello all,
The .NET Bio Sequence class has taken the (not uncommon) approach of representing sequences as byte[]’s instead of char[]’s or even just as strings. I have read a couple of .NET Bio discussion threads that cite this as a source of complexity and confusion in the system. I have to admit that it is not the first way I would have thought of storing sequence data, and it confused me at first too.
What I have not been able to find (in the .NET Bio project or anywhere else on the web) is an explanation of why this is a good way to store sequence data in memory. I have found several sources that claim byte[]’s lead to better performance, but I have never found evidence, or even an explanation, to support this claim.
Since .NET Bio chose to represent sequences as byte[]’s, I am hoping that someone involved with this project can talk me through the design constraints, tradeoffs, and overall thinking that went into this decision. Specifically, at some point someone may have been deciding between representing sequences as strings, char[]’s, or byte[]’s, and I am interested in knowing the pros and cons of each and what may have led the original .NET Bio developer(s) to choose byte[]’s over the other options.
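To make the three options concrete, here is a minimal sketch of how I picture each one. This is my own illustration, not code from the .NET Bio source: the class name SequenceStorageSketch and its helper methods are hypothetical, and I am assuming plain ASCII symbols. The element sizes are the only difference I can state with certainty. Whether that difference explains the claimed performance benefit is exactly what I am asking about.

```csharp
using System;
using System.Text;

// Hypothetical illustration only; none of these names come from .NET Bio.
public static class SequenceStorageSketch
{
    // Option 2: char[] stores UTF-16 code units, 2 bytes per symbol.
    public static char[] ToChars(string sequence) => sequence.ToCharArray();

    // Option 3: byte[] stores one ASCII code per symbol, 1 byte per symbol.
    // Assumes the sequence contains only ASCII characters.
    public static byte[] ToBytes(string sequence) => Encoding.ASCII.GetBytes(sequence);

    public static void Main()
    {
        // Option 1: string, an immutable run of UTF-16 code units.
        string asString = "ACGTACGT";
        char[] asChars = ToChars(asString);
        byte[] asBytes = ToBytes(asString);

        // Payload size of each array, ignoring object headers.
        Console.WriteLine(asChars.Length * sizeof(char)); // prints 16
        Console.WriteLine(asBytes.Length * sizeof(byte)); // prints 8

        // Reading a byte[] as text needs an explicit decoding step.
        Console.WriteLine(Encoding.ASCII.GetString(asBytes)); // prints ACGTACGT
    }
}
```

The last line is the kind of extra conversion step that I suspect the other discussion threads found confusing.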

Thank you,