recommended kmer length for PaDeNA utility

Coordinator
Nov 15, 2011 at 6:30 PM

A user brought this up to me directly, so I thought I would share it with the community. His question was: what kmer length should I use for the PadenaUtil utility?

First off, it's best to read through the .pdf of the PaDeNA paper hosted on this site, and you should also read through the technical user guide documentation. Beyond that, here are some other good guidelines:

  • Kmers can be any length from 2 to 32 in PadenaUtil.
  • A single sequencing error corrupts every kmer that overlaps it (up to K of them), so with a shorter kmer length each error is represented in fewer kmers. Shorter values are therefore better for noisy data.
  • If a longer kmer length is used, there is a higher chance of each kmer being unique, so there is less ambiguity in the assembly.
  • Since each dataset is different, the best approach is to evaluate the quality of the assembly using several different kmer lengths, to see what works best. If a run is likely to take a long time, do this on a subset of the data.
  • Failing that, choose K=32. Assembly will be slower with longer kmers, but for the reasons noted above it is more likely to produce reasonable contigs.
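
To make the trade-off in the guidelines above a bit more concrete, here is a small Python sketch. It is not part of PadenaUtil and does not reflect how PadenaUtil works internally; the function names (kmers, unique_fraction, kmers_touching) are made up for illustration. It shows roughly how the fraction of unique kmers grows with K, and how many kmers a single bad base ends up in.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping k-mer of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def unique_fraction(seq, k):
    """Fraction of distinct k-mers that occur exactly once in seq."""
    counts = Counter(kmers(seq, k))
    if not counts:
        return 0.0
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(counts)


def kmers_touching(seq_len, pos, k):
    """Number of k-mers that cover position pos in a sequence of length seq_len.

    This is how many k-mers a single erroneous base at pos would corrupt.
    """
    first = max(0, pos - k + 1)      # leftmost k-mer start that still covers pos
    last = min(pos, seq_len - k)     # rightmost k-mer start that fits in the sequence
    return max(0, last - first + 1)


if __name__ == "__main__":
    # Hypothetical toy sequence with an obvious repeat, for illustration only.
    toy = "ACGTACGTAC"
    for k in (2, 4, 8):
        print(k, unique_fraction(toy, k), kmers_touching(len(toy), 5, k))
```

On real data you would run something like this over a subset of reads for a few values of K, but the actual assembly quality from PadenaUtil at each K is still the thing to compare.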

Rick Benge - community program manager