
System.OutOfMemoryException

Jan 18, 2015 at 2:29 AM
When trying to run _sequences = parser.Parse().ToList(); on a FASTQ file that's about 3 GB, I get an out-of-memory exception. Does anyone know how to parse large files without running out of memory?
Jan 19, 2015 at 1:00 PM
Hi,

It depends. First, make sure you are building a 64-bit application. The default in .NET is almost always 32-bit, because 32-bit apps tend to be smaller and run a little faster, but they are also limited to about 2 GB of application memory. 64-bit apps have a much larger address space, and you should be able to load a 3 GB file easily as long as you have the physical memory on the machine to back it up.
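If you're not sure which mode you're actually ending up in, a quick runtime check (this is just standard .NET, nothing .NET Bio specific) is:

// Prints False if the process is running 32-bit, even on a 64-bit OS;
// in that case the ~2 GB limit applies.
Console.WriteLine("64-bit process: " + Environment.Is64BitProcess);
Console.WriteLine("64-bit OS:      " + Environment.Is64BitOperatingSystem);

In Visual Studio that usually means setting the project's Platform target to x64, or unchecking "Prefer 32-bit" if you build as AnyCPU.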

However, .NET Bio also has a limit on how many sequences you can hold once you turn the results into a list. It's really a limitation of .NET itself: arrays are indexed by integer, so an array can hold at most Int32.MaxValue items. Since List<T> is backed by an array under the covers, the same limitation applies. That's why .NET Bio returns everything as IEnumerable<T>; then there's no limit except available memory.
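In fact, if your processing can be done one record at a time, you can stay with the IEnumerable<T> and never build the list at all. Here's a rough sketch - it assumes the 1.x-style FastQParser that takes the file path in its constructor (to match the parser.Parse() call in your post), and "reads.fastq" is just a placeholder:

using System;
using Bio;
using Bio.IO.FastQ;

class StreamingExample
{
    static void Main()
    {
        // Construct the parser against the file; Parse() yields sequences lazily.
        var parser = new FastQParser("reads.fastq");

        long count = 0;
        long totalBases = 0;

        // foreach over the IEnumerable keeps only one sequence in memory at a time,
        // so no giant List<T> is ever built.
        foreach (ISequence seq in parser.Parse())
        {
            count++;
            totalBases += seq.Count;
        }

        Console.WriteLine("Sequences: " + count + ", total bases: " + totalBases);
    }
}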

If you still can't get it to work, then do what many algorithms do: break the data up and process it in pieces.

_sequences = parser.Parse().Take(50).ToList();
.. do processing ..
_sequences = parser.Parse().Skip(50).Take(50).ToList();
.. do processing ..
etc.
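One caveat with the Skip/Take approach: each Skip typically re-enumerates the file from the beginning, so the later chunks get slower and slower. If that becomes a problem, you can enumerate once and group the sequences into fixed-size batches yourself. A rough sketch - InBatches is a hypothetical helper, not part of .NET Bio, and the batch size of 50 is just an example:

using System;
using System.Collections.Generic;
using Bio.IO.FastQ;

static class BatchingExample
{
    // Hypothetical helper: groups a single enumeration into fixed-size batches,
    // so the file is only read once instead of once per Skip/Take.
    static IEnumerable<List<T>> InBatches<T>(IEnumerable<T> source, int batchSize)
    {
        var batch = new List<T>(batchSize);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize);
            }
        }
        if (batch.Count > 0)
            yield return batch;
    }

    static void Run(string path)
    {
        var parser = new FastQParser(path); // same 1.x-style constructor assumed above

        foreach (var chunk in InBatches(parser.Parse(), 50))
        {
            // .. do processing on this chunk of up to 50 sequences ..
            Console.WriteLine("Processed a chunk of " + chunk.Count + " sequences");
        }
    }
}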

I hope that helps!

Mark
Jan 19, 2015 at 1:06 PM
Mark beat me to it, though I was only going to cover the list issue. He has the order right as well: check that you have a 64-bit app first.

All the best, and if it persists, please post again with the error message.

cheers
jh
Jan 20, 2015 at 12:52 AM
Edited Jan 20, 2015 at 12:53 AM
Aha!
That was it. I just made it 64-bit and all is right with the world. Thanks so much for your help.