Synthetic sequences can be generated with the read simulator tool, which ships as part of the download. The read simulator classes are not part of the .NET Bio API, but they are available in the overall source code drop. The classes needed are:
  • ReadSimulator.cs
  • SimulatorSettings.cs

Start with a Sequence object, which can be created from data on the NCBI FTP server. See "How do I download a sequence directly from NCBI?"

// Assign the sequence obtained from NCBI (see above) before running the simulation.
ISequence sequence;

// Configure the simulation settings.
// SimulatorSettings can be initialised from one of three defaults, as outlined
// in the notes below.
SimulatorSettings settings = new SimulatorSettings();
settings.SetDefaults(DefaultSettings.SangerDideoxy);

// The ReadSimulator constructor must be altered to accept the settings object
// as a parameter (see the notes below).
var simulation = new ReadSimulator(settings);

// Set the sequence to use in the simulation.
simulation.SequenceToSplit = sequence;

// Perform the simulation.
// A file containing the synthetic sequence data is created in the project's bin/Debug folder.
simulation.DoSimulation("outputFile");


Example output is given below. Only the first ‘read’ is shown in its entirety, followed by the first line of the second ‘read’. This example generates 100 reads. A second example output, further below, uses the ShortRead settings from the SimulatorSettings class, which produce much shorter reads.

>NC_002182 (Split 1, 1061bp)
atgtgtgcatagtagatactccacctagtcttggtggattaacaaaagaagcctttattgcaggagacaaactaatcgta
tgtttgattcctgagccattttctattctcgggctgcagaaaattagagaatttttaatttctataggcaaacctgagga
agaacatattcttggggtagcactatctttttgggatgaccggagttcgactaatcaaacgtacatagatatcattgagt
caatttacgaaaataagattttttcaacaaaaatacgcagagatatttctttgagtcgttcccttcttaaagaggattct
gtgatcaatgtatatccaacttcaagagctgcaacagatattctgaatttaacacacgaaatatctgctcttttaaattc
taaacacaaacaagacttttcccagaggacactgtgaataaactggaaaaggaagctagcgtcttttttaaaaaaaatca
ggaatccgtttctcaagactttaagaaaaaggtttcttcaattgagatgttttcaacttctttaaattcggaggaaaacc
agagtctggatcggctttttttgtctgagactcagaatttatcagatgaagaatcttaccaagaagatgttttgtcagta
aaacttctgacaagtcaaataaaggctattcaaaaacaacacgtgctccttcttggagagaagatttacaatgcgagaaa
gatactaagtaaaagttgtttctcttcaacaaccttttcatcttggctagatttagttttcaggactaaatcatccgcct
ataatgcgttggcttattatgaacttttcataagtctaccaagcacaactttgcagaaagagttccaatcaatcccgtat
aagtctgcatatattttagctgctaggaaaggagacttaaaaacaaaagtctctgttatagggaaagtttgtggaatgtc
caatgcatctgctatccgggttatggaccaacttcttccttcatctagaagtaaagataatcaaagatttttcgaatctg
atttagagaaaaatcgacagt

>NC_002182 (Split 2, 1053bp)
caaaacatctagaaacaaatacgaatttagtgggaaagaatctgaaacagctttagaggctctgtatcatttaggacatc
....


Output using the ShortRead settings, showing the first two reads from one of the output files. These settings generated 20272 reads, written to separate, sequentially numbered output files:
>NC_002182 (Split 1001, 41bp)
acctcttacagatcaacaaataatacttgggacatcgacaa
>NC_002182 (Split 1002, 32bp)
agccttgcgtatattttaaggatgaatcgata
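To sanity-check a simulation run, the reads in an output file can be counted and their lengths compared with the "bp" value in each header. The following is a minimal sketch and not part of the read simulator. It assumes each read starts with a FASTA-style header line beginning with '>' (as in the Split 2 header above) and that the sequence may wrap over several lines. The class and method names (ReadFileSummary, ReadLengths) are hypothetical, and "outputFile" is a placeholder path.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper for inspecting simulator output; not part of .NET Bio.
public static class ReadFileSummary
{
    // Returns one (header, length in bases) pair per read, in file order.
    public static List<KeyValuePair<string, int>> ReadLengths(IEnumerable<string> lines)
    {
        var reads = new List<KeyValuePair<string, int>>();
        string header = null;
        int length = 0;

        foreach (string raw in lines)
        {
            string line = raw.Trim();
            if (line.Length == 0) continue;              // skip blank lines

            if (line[0] == '>')
            {
                // A new header closes the previous read, if there was one.
                if (header != null) reads.Add(new KeyValuePair<string, int>(header, length));
                header = line.Substring(1);
                length = 0;
            }
            else if (header != null)
            {
                length += line.Length;                   // sequence may span several lines
            }
        }

        if (header != null) reads.Add(new KeyValuePair<string, int>(header, length));
        return reads;
    }

    public static void Main(string[] args)
    {
        // Placeholder path: pass the real output file as the first argument.
        string path = args.Length > 0 ? args[0] : "outputFile";
        if (!File.Exists(path))
        {
            Console.WriteLine("File not found: " + path);
            return;
        }

        var reads = ReadLengths(File.ReadLines(path));
        Console.WriteLine("Reads: {0}", reads.Count);
        if (reads.Count > 0)
            Console.WriteLine("Mean length: {0:F1} bp", reads.Average(r => r.Value));
    }
}
```

Because the ShortRead run writes several output files, a helper like this would need to be called once per file and the counts summed.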


Note:
  • The example code above omits the ReadSimulator constructor parameters that relate to updating the GUI.
  • The ReadSimulator class constructor (and its SimulatorSettings field) needs to be altered as shown below.

public SimulatorSettings Settings;

public ReadSimulator(SimulatorSettings settings)
{
    _seqRandom = new Random();
    Settings = settings;
}


The SimulatorSettings class has three default settings, listed below. They differ in depth of coverage, sequence length, length variation, error frequency, and distribution type:
  • PyroSequencing
  • SangerDideoxy
  • ShortRead
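To show how parameters of this kind typically interact in a read simulator, here is a hedged sketch. It is not the SimulatorSettings class: the type ExampleReadSettings, its members, and every numeric value are hypothetical and are not the shipped defaults. It assumes a uniform length distribution only, and it uses the usual approximation that the number of reads is about coverage times sequence length divided by mean read length.

```csharp
using System;

// Hypothetical illustration only. These names and values are NOT the
// SimulatorSettings members or the defaults shipped with the read simulator.
public sealed class ExampleReadSettings
{
    public int DepthOfCoverage { get; set; }     // average number of reads covering each base
    public int MeanReadLength { get; set; }      // mean read length in bases
    public int LengthVariation { get; set; }     // maximum deviation from the mean, in bases
    public double ErrorFrequency { get; set; }   // per-base probability of a simulated error

    // Approximate read count: coverage * sequence length / mean read length.
    public long ExpectedReadCount(long sequenceLength)
    {
        if (MeanReadLength <= 0)
            throw new InvalidOperationException("MeanReadLength must be positive.");
        return (DepthOfCoverage * sequenceLength) / MeanReadLength;
    }

    // Draws a read length uniformly from [mean - variation, mean + variation],
    // clamped so that it is never shorter than one base.
    public int NextReadLength(Random random)
    {
        int min = Math.Max(1, MeanReadLength - LengthVariation);
        int max = Math.Max(min, MeanReadLength + LengthVariation);
        return random.Next(min, max + 1);        // upper bound of Next is exclusive
    }

    // Decides whether the next base should be replaced by a simulated error.
    public bool IsError(Random random)
    {
        return random.NextDouble() < ErrorFrequency;
    }
}

public static class ExampleReadSettingsDemo
{
    public static void Main()
    {
        // Illustrative values only.
        var settings = new ExampleReadSettings
        {
            DepthOfCoverage = 10,
            MeanReadLength = 500,
            LengthVariation = 100,
            ErrorFrequency = 0.01
        };

        // 10 * 10000 / 500 = 200 reads for a 10000 bp sequence.
        Console.WriteLine(settings.ExpectedReadCount(10000));
    }
}
```

Under this approximation, shorter reads at higher coverage yield far more reads for the same sequence, which is in line with the ShortRead run above producing many more reads than the 100 in the first example.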

Last edited Sep 16, 2014 at 7:29 AM by jamesmhogan, version 1
