How do I use GenBank sequence metadata?

Unlike sequence file formats such as FASTA, which primarily communicate sequence data, the GenBank (.gbk) format supports detailed annotation of features such as genes. The sequence produced by a GenBank parser is an object that contains, along with the raw sequence data, the full collection of feature annotations. This example shows how to access those annotations.

First, load a sequence from a GenBank file:

const string genbankFileName = @"...";
ISequence sequence = null;

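// FirstOrDefault() requires "using System.Linq;".
using ( ISequenceParser parser = new GenBankParser( genbankFileName ) )
{
	// Parse the file once and keep the first sequence (null if the file contains none).
	IEnumerable<ISequence> sequences = parser.Parse();
	sequence = sequences.FirstOrDefault();
}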

Some basic information about the genome can be obtained directly from the ISequence object, but feature annotations are stored in a metadata object within the ISequence. To get a reference to the GenBank metadata:

GenBankMetadata metadata = sequence.Metadata["GenBank"] as GenBankMetadata;
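Note that the `as` cast yields null when the stored value is not a GenBankMetadata, and a dictionary indexer throws when the key is missing, for example when the sequence came from a non-GenBank file. The following is a minimal sketch of a defensive lookup, assuming the Metadata collection behaves like a Dictionary<string, object>. The helper name TryGetMetadata is hypothetical, and a plain string stands in for GenBankMetadata so that the sketch runs without the library.

```csharp
using System;
using System.Collections.Generic;

static class MetadataLookup
{
    // Hypothetical helper: returns true only if the key exists and its value has type T.
    public static bool TryGetMetadata<T>( IDictionary<string, object> metadata, string key, out T result ) where T : class
    {
        object raw;
        if ( metadata.TryGetValue( key, out raw ) )
        {
            result = raw as T;
            return result != null;
        }

        result = null;
        return false;
    }

    static void Main()
    {
        // Stand-in for sequence.Metadata; a string takes the place of GenBankMetadata.
        var metadata = new Dictionary<string, object> { { "GenBank", "stand-in metadata" } };

        string found;
        Console.WriteLine( TryGetMetadata( metadata, "GenBank", out found ) ); // True
        Console.WriteLine( TryGetMetadata( metadata, "Other", out found ) );   // False
    }
}
```

With the real library, the same pattern would be called with GenBankMetadata as the type argument before any of its properties are read.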

The metadata object has several properties that provide more detail about the genome:

// Get general information about the sequence.
Console.WriteLine( "Locus: {0}", metadata.Locus );
Console.WriteLine( "Accession: {0}", metadata.Accession );
Console.WriteLine( "Version: {0}", metadata.Version );
Console.WriteLine( "Definition: {0}", metadata.Definition );
Console.WriteLine( "Common name: {0}", metadata.Source.CommonName );
Console.WriteLine( "Species: {0}", metadata.Source.Organism.Species );
Console.WriteLine( "Genus: {0}", metadata.Source.Organism.Genus );
Console.WriteLine( "Class levels: {0}", metadata.Source.Organism.ClassLevels );

To examine all annotated features in a genome:

// List all features in the collection.
foreach ( FeatureItem feature in metadata.Features.All )
{
	Console.WriteLine( "Feature: {0}", feature.Key );
	Console.WriteLine( "\tLocation: {0}", feature.Location );
	Console.WriteLine( "\tSubsequence: {0}", feature.GetSubSequence( sequence ) );
	Console.WriteLine( "\tQualifiers:" );

	// Each qualifier key maps to a list of values; print one line per value.
	foreach ( KeyValuePair<string,List<string>> qual in feature.Qualifiers )
	{
		foreach ( string value in qual.Value )
		{
			Console.WriteLine( "\t\t{0}: {1}", qual.Key, value );
		}
	}
}
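Because the qualifiers are exposed as key/value pairs whose values are lists of strings, a single qualifier can also be looked up by key instead of enumerating all of them. The following is a minimal sketch that works on any IDictionary<string, List<string>>. The helper name FirstQualifier and the sample data are hypothetical, and trimming quotation marks is an assumption about how values may be stored.

```csharp
using System;
using System.Collections.Generic;

static class QualifierLookup
{
    // Hypothetical helper: first value for a qualifier key, or null if absent or empty.
    public static string FirstQualifier( IDictionary<string, List<string>> qualifiers, string key )
    {
        List<string> values;
        if ( qualifiers.TryGetValue( key, out values ) && values != null && values.Count > 0 )
        {
            // Assumption: values may keep the quotation marks used in the file.
            return values[0].Trim( '"' );
        }

        return null;
    }

    static void Main()
    {
        // Made-up qualifier data standing in for feature.Qualifiers.
        var qualifiers = new Dictionary<string, List<string>>
        {
            { "gene", new List<string> { "\"exampleGene\"" } },
            { "note", new List<string>() }
        };

        Console.WriteLine( FirstQualifier( qualifiers, "gene" ) );            // exampleGene
        Console.WriteLine( FirstQualifier( qualifiers, "note" ) == null );    // True
        Console.WriteLine( FirstQualifier( qualifiers, "product" ) == null ); // True
    }
}
```

Returning null for a missing key keeps the caller's check simple, since many features carry only a subset of the possible qualifiers.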

Last edited Jun 24, 2014 at 2:05 PM by lbuckingham, version 2

