cookbook entry review

Editor
Aug 26, 2014 at 11:18 PM
As part of a student project I have compiled the following proposed cookbook entry on 'how to generate synthetic sequence data'. Any feedback would be appreciated.

Synthetic sequences can be generated using the read simulator. The read simulator classes are not part of the .NET Bio API, but they are available in the source code. The classes needed are:
  • ReadSimulator.cs
  • SimulatorSettings.cs
Start with a Sequence object, which can be created from a sequence downloaded from the NCBI FTP server. See ‘How do I download a sequence directly from NCBI?’.

// The sequence to split; assign it before use, e.g. a sequence downloaded from NCBI
ISequence sequence;

// Configure the simulation settings.
// SimulatorSettings can be initialised from one of three defaults (outlined in the notes below).
SimulatorSettings settings = new SimulatorSettings();
settings.SetDefaults(DefaultSettings.SangerDideoxy);

// Create the simulator.
// The ReadSimulator constructor must first be altered to take the settings object as a parameter (see the notes below).
var simulation = new ReadSimulator(settings);

// Set the sequence to use in the simulation
simulation.SequenceToSplit = sequence;

// Perform the simulation.
// A file containing the synthetic sequence data is created in the project's bin/Debug folder.
simulation.DoSimulation("outputFile");

Example output is given below. Only the first ‘read’ is shown in its entirety, followed by the first line of the second ‘read’. This example generates 100 reads. A second example output, produced with the ShortRead settings from the SimulatorSettings class, follows it.
NC_002182 (Split 1, 1061bp)
atgtgtgcatagtagatactccacctagtcttggtggattaacaaaagaagcctttattgcaggagacaaactaatcgta
tgtttgattcctgagccattttctattctcgggctgcagaaaattagagaatttttaatttctataggcaaacctgagga
agaacatattcttggggtagcactatctttttgggatgaccggagttcgactaatcaaacgtacatagatatcattgagt
caatttacgaaaataagattttttcaacaaaaatacgcagagatatttctttgagtcgttcccttcttaaagaggattct
gtgatcaatgtatatccaacttcaagagctgcaacagatattctgaatttaacacacgaaatatctgctcttttaaattc
taaacacaaacaagacttttcccagaggacactgtgaataaactggaaaaggaagctagcgtcttttttaaaaaaaatca
ggaatccgtttctcaagactttaagaaaaaggtttcttcaattgagatgttttcaacttctttaaattcggaggaaaacc
agagtctggatcggctttttttgtctgagactcagaatttatcagatgaagaatcttaccaagaagatgttttgtcagta
aaacttctgacaagtcaaataaaggctattcaaaaacaacacgtgctccttcttggagagaagatttacaatgcgagaaa
gatactaagtaaaagttgtttctcttcaacaaccttttcatcttggctagatttagttttcaggactaaatcatccgcct
ataatgcgttggcttattatgaacttttcataagtctaccaagcacaactttgcagaaagagttccaatcaatcccgtat
aagtctgcatatattttagctgctaggaaaggagacttaaaaacaaaagtctctgttatagggaaagtttgtggaatgtc
caatgcatctgctatccgggttatggaccaacttcttccttcatctagaagtaaagataatcaaagatttttcgaatctg
atttagagaaaaatcgacagt

NC_002182 (Split 2, 1053bp)
caaaacatctagaaacaaatacgaatttagtgggaaagaatctgaaacagctttagaggctctgtatcatttaggacatc

Output using the ShortRead settings, showing the first two reads from an output file. These settings generated 20272 reads, spread across several sequential output files.
NC_002182 (Split 1001, 41bp)
acctcttacagatcaacaaataatacttgggacatcgacaa
NC_002182 (Split 1002, 32bp)
agccttgcgtatattttaaggatgaatcgata
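
One way to sanity-check output like the excerpts above is to parse it and compare each header's stated length with the number of bases that follow. The sketch below is only an illustration: the header layout is inferred from the excerpts shown here (an optional leading '>' is tolerated in case the real file is FASTA-style), and SimulatedRead and ReadOutputParser are hypothetical names, not part of .NET Bio or the read simulator source.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical container for one parsed read (not a .NET Bio type).
public sealed class SimulatedRead
{
    public string SourceId;      // e.g. "NC_002182"
    public int SplitNumber;      // e.g. 1001
    public int DeclaredLength;   // length stated in the header, in bp
    public string Bases;         // concatenated sequence lines
}

public static class ReadOutputParser
{
    // Matches headers such as "NC_002182 (Split 1001, 41bp)".
    // Assumption: the layout is inferred from the example output only.
    private static readonly Regex Header =
        new Regex(@"^>?(\S+) \(Split (\d+), (\d+)bp\)$");

    public static List<SimulatedRead> Parse(IEnumerable<string> lines)
    {
        var reads = new List<SimulatedRead>();
        SimulatedRead current = null;
        var bases = new StringBuilder();

        foreach (string raw in lines)
        {
            string line = raw.Trim();
            if (line.Length == 0) continue;          // skip blank separator lines

            Match m = Header.Match(line);
            if (m.Success)
            {
                // A new header closes the previous read, if there was one.
                if (current != null)
                {
                    current.Bases = bases.ToString();
                    reads.Add(current);
                }
                current = new SimulatedRead
                {
                    SourceId = m.Groups[1].Value,
                    SplitNumber = int.Parse(m.Groups[2].Value),
                    DeclaredLength = int.Parse(m.Groups[3].Value)
                };
                bases.Clear();
            }
            else if (current != null)
            {
                bases.Append(line);                  // sequence may wrap over several lines
            }
        }

        if (current != null)
        {
            current.Bases = bases.ToString();
            reads.Add(current);
        }
        return reads;
    }
}
```

Passing the lines of an output file to ReadOutputParser.Parse (for example via File.ReadLines) gives a list in which read.Bases.Length can be compared with read.DeclaredLength for every read.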
Note:
  • The above example code excludes parameters from the ReadSimulator class constructor that pertain to a GUI update.
  • The ReadSimulator class constructor (and the SimulatorSettings field) needs to be altered as follows:
public SimulatorSettings Settings;

public ReadSimulator(SimulatorSettings settings)
{
    _seqRandom = new Random();
    this.Settings = settings;
}
  • The SimulatorSettings class has three default settings, listed below. They differ in depth of coverage, sequence length, length variation, error frequency, and distribution type:
    • PyroSequencing
    • SangerDideoxy
    • ShortRead
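
To make the role of these settings more concrete, here is a minimal sketch of how depth of coverage, sequence length, and length variation could drive a read simulation. It is not the ReadSimulator implementation: SketchSettings and ReadSketch are hypothetical names, the default values are made up for illustration, read lengths and start positions are drawn uniformly, and error frequency is ignored.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the kind of values SimulatorSettings holds;
// the names and defaults here are illustrative only.
public sealed class SketchSettings
{
    public int DepthOfCoverage = 10;  // average number of reads covering each base
    public int SequenceLength = 50;   // mean read length in bp
    public int LengthVariation = 5;   // read length varies by up to +/- this many bp
}

public static class ReadSketch
{
    // Number of reads needed so that total read length is about
    // coverage * genome length.
    public static int ReadCount(int genomeLength, SketchSettings s)
    {
        return (int)Math.Ceiling((double)s.DepthOfCoverage * genomeLength / s.SequenceLength);
    }

    // Draws reads at uniformly random start positions. Each read length is
    // capped at the genome length, and the start is chosen so the read fits.
    public static List<string> Simulate(string genome, SketchSettings s, Random rng)
    {
        var reads = new List<string>();
        if (genome.Length == 0) return reads;

        int count = ReadCount(genome.Length, s);
        for (int i = 0; i < count; i++)
        {
            // Random.Next's upper bound is exclusive, hence the + 1.
            int length = s.SequenceLength + rng.Next(-s.LengthVariation, s.LengthVariation + 1);
            length = Math.Max(1, Math.Min(length, genome.Length));
            int start = rng.Next(0, genome.Length - length + 1);
            reads.Add(genome.Substring(start, length));
        }
        return reads;
    }
}
```

With these illustrative values, a 1000 bp sequence yields 10 × 1000 / 50 = 200 reads of 45 to 55 bp each. A larger depth of coverage or a shorter sequence length raises the read count in the same way.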