To download a GenBank-formatted genome from the NCBI FTP server, you need to know the name of the folder in which the file is stored and the accession code for the genome. In this example we download a bacterial genome from the folder genomes/Bacteria.

First, build a string containing the address of the file, and create a WebClient to download its content:
// Requires: using System; using System.Net;
// {0} is the folder name, {1} is the accession code.
const string bacteriaTemplate = "ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/{0}/{1}.gbk";
const string accession = "NC_002182";
const string folder = "Chlamydia_muridarum_Nigg_uid57785";
string url = String.Format( bacteriaTemplate, folder, accession );

WebClient downloader = new WebClient();

Once you have the WebClient, you can open a Stream on the remote file and parse the sequence from it:
using ( Stream stream = downloader.OpenRead( url ) )
{
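	// Requires: using System.IO; using System.Linq; (FirstOrDefault is a LINQ extension method)
	StreamReader reader = new StreamReader( stream );
	ISequenceParser parser = new GenBankParser();

	// Parse returns a collection of sequences; take the first (null if there are none).
	ISequence sequence = parser.Parse( reader ).FirstOrDefault();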

	// Process sequence here...
}
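
Putting the two steps together, a complete console program might look like the following. This is a minimal sketch rather than a definitive implementation: the namespaces Bio, Bio.IO and Bio.IO.GenBank are assumed to be where ISequence, ISequenceParser and GenBankParser live in your version of the library, the Parse( StreamReader ) overload is the one used in the snippet above and may differ between versions, and GenBankDownload and BuildUrl are hypothetical names introduced for illustration. It enumerates every sequence in the file instead of taking only the first one.

```csharp
using System;
using System.IO;
using System.Net;
using Bio;              // ISequence (assumed namespace)
using Bio.IO;           // ISequenceParser (assumed namespace)
using Bio.IO.GenBank;   // GenBankParser (assumed namespace)

public static class GenBankDownload
{
    // Same template as above: {0} is the folder, {1} is the accession code.
    private const string BacteriaTemplate =
        "ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/{0}/{1}.gbk";

    // Hypothetical helper: builds the file address from folder and accession.
    public static string BuildUrl( string folder, string accession )
    {
        return String.Format( BacteriaTemplate, folder, accession );
    }

    public static void Main()
    {
        string url = BuildUrl( "Chlamydia_muridarum_Nigg_uid57785", "NC_002182" );

        try
        {
            // Dispose the client, the stream and the reader when finished.
            using ( WebClient downloader = new WebClient() )
            using ( Stream stream = downloader.OpenRead( url ) )
            using ( StreamReader reader = new StreamReader( stream ) )
            {
                ISequenceParser parser = new GenBankParser();

                // Enumerate every record in the file, not just the first.
                foreach ( ISequence sequence in parser.Parse( reader ) )
                {
                    Console.WriteLine( "{0}: {1} symbols", sequence.ID, sequence.Count );
                }
            }
        }
        catch ( WebException e )
        {
            // Raised when the server is unreachable or the file is missing.
            Console.Error.WriteLine( "Download failed: " + e.Message );
        }
    }
}
```

The sequences are enumerated inside the using block because the reader must still be open while the parser is reading from it.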

Notes:
  • If the file is very large or you are developing an interactive application, you can use the WebClient's OpenReadAsync method to do a non-blocking download.
  • The parser can be used in several ways to access data. In this example, we use a StreamReader to access the downloaded data stream.
  • A file may contain more than one sequence, so the Parse operation produces a list of sequences. In this case, we know there is only one sequence, so we use FirstOrDefault() to grab the first element of the list (it returns null if the list is empty).
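
The non-blocking download mentioned in the first note could be sketched as follows. This example uses only WebClient's event-based API (OpenReadAsync with the OpenReadCompleted event) and, to stay self-contained, prints the first line of the file instead of parsing it. The class name AsyncGenBankDownload is hypothetical, and the ManualResetEvent is there only so that a console program does not exit before the download completes; an interactive application would normally just return to its message loop.

```csharp
using System;
using System.IO;
using System.Net;
using System.Threading;

public static class AsyncGenBankDownload
{
    public static void Main()
    {
        const string url = "ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/"
            + "Chlamydia_muridarum_Nigg_uid57785/NC_002182.gbk";

        // Signals the main thread once the handler has finished (console demo only).
        using ( ManualResetEvent done = new ManualResetEvent( false ) )
        using ( WebClient downloader = new WebClient() )
        {
            downloader.OpenReadCompleted += ( sender, e ) =>
            {
                try
                {
                    if ( e.Cancelled || e.Error != null )
                    {
                        Console.Error.WriteLine( "Download failed: " + e.Error );
                        return;
                    }

                    using ( Stream stream = e.Result )
                    using ( StreamReader reader = new StreamReader( stream ) )
                    {
                        // Hand 'reader' to the parser here, as in the blocking example.
                        Console.WriteLine( reader.ReadLine() );
                    }
                }
                finally
                {
                    done.Set();
                }
            };

            // Returns immediately; the handler runs once the stream is ready.
            downloader.OpenReadAsync( new Uri( url ) );

            Console.WriteLine( "Download started..." );
            done.WaitOne();
        }
    }
}
```

Checking e.Error before touching e.Result matters, because reading Result after a failed download rethrows the underlying exception.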

Last edited Jun 24, 2014 at 1:58 PM by lbuckingham, version 3
