getting header info

May 28, 2015 at 12:00 AM
I'm trying to parse a whole FASTA file and save some info from each sequence's header into the ISequence object, presumably in its Metadata. Is there a way to do this at the point of parsing?
Jun 9, 2015 at 10:04 PM
No, I think you have to parse the file to deal with this, but check the calls. Have a look in the cookbook at the recipe for the GenBank version; the FASTA metadata is simpler, of course. If you look at the members of the parser classes, you will find that they work with collections of sequences. Feel free to propose a feature.
Jun 10, 2015 at 2:21 AM
Jim is correct - the FASTA parser does "just-in-time" parsing, so when you iterate through the sequences, it's parsing as it goes. That means you would need to parse the entire file in order to "see" all the sequences. However, if you only wanted to grab the first sequence (to determine the type of sequences, for example), you could do that and only incur the cost of reading and parsing that first sequence, plus a little overhead to open the file on disk.
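
To illustrate the "just-in-time" behaviour in general terms, here is a minimal sketch in Python. It is not the .NET Bio API: `FastaRecord`, `parse_fasta`, and the sample data are hypothetical names made up for this example. The generator only reads as far as it needs to in order to hand back the next record, so asking for the first sequence does not parse the rest of the file.

```python
import io
from typing import Iterator, List, NamedTuple, Optional, TextIO


class FastaRecord(NamedTuple):
    """Hypothetical record type: header text (without '>') plus residues."""
    header: str
    sequence: str


def parse_fasta(handle: TextIO) -> Iterator[FastaRecord]:
    """Lazily yield one record at a time as the caller iterates."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield FastaRecord(header, "".join(chunks))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    # Emit the final record once the input is exhausted.
    if header is not None:
        yield FastaRecord(header, "".join(chunks))


# Made-up sample data standing in for a file on disk.
SAMPLE = ">seq1 human sample\nACGT\nACGT\n>seq2 mouse sample\nGGCC\n"

# Only the first record is requested; parsing stops at the next header line.
first = next(parse_fasta(io.StringIO(SAMPLE)))
all_records = list(parse_fasta(io.StringIO(SAMPLE)))
```

Iterating the generator to the end (as `all_records` does) is the equivalent of parsing the whole file, which is what you would need in order to see every header.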

There is no separate scan step that collects metadata (count, etc.) about the sequences up front.
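
If all you need is the header info and a count, one workaround is to make your own single pass over the file and look only at the header lines. The sketch below is in Python and is not part of .NET Bio; `scan_headers` is a hypothetical helper, and it assumes the common convention that the first whitespace-delimited token after `>` is the sequence ID and the rest is a free-text description.

```python
import io
from typing import Dict, TextIO


def scan_headers(handle: TextIO) -> Dict[str, str]:
    """Map each sequence ID to its description, reading header lines only."""
    info: Dict[str, str] = {}
    for line in handle:
        if line.startswith(">"):
            # Assumed convention: ">ID description ..."
            seq_id, _, description = line[1:].strip().partition(" ")
            info[seq_id] = description
    return info


# Made-up sample data standing in for a file on disk.
SAMPLE = ">seq1 human sample\nACGT\nACGT\n>seq2 mouse sample\nGGCC\n"

headers = scan_headers(io.StringIO(SAMPLE))
sequence_count = len(headers)
```

This still reads the whole file once, but it skips building sequence objects, and the resulting dictionary could then be attached to each parsed sequence by ID.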