Some time ago one of our people wrote a fairly elegant pattern language, specified in XML, which unifies a whole bunch of other standard pattern types: regular expressions, motifs, gaps, simple matches and so on. The language has now been ported to sit on top of .NET Bio,
and we will launch both the codebase and a web-based app in the coming days. The public site is under construction.
Some blurb follows:
BioPatML.NET is an application library for the .NET framework which integrates the BioPatML pattern definition and search engine with the .NET Bio bioinformatics library.
BioPatML is an XML-based pattern description language providing support for a broad range of component patterns, and a rich grammatical structure for their combination. The language defines a common representation for patterns which may be used to describe
biologically significant sequences and sequence structures, including motifs, position weight matrices and regular expressions as well as hierarchical structures containing sequences or sets of arbitrarily complex patterns. Pattern iteration, repeat and Boolean
operators permit the construction of patterns with much greater specificity than that provided by regular expression matching. The language provides an elegant mechanism for the definition and reuse of named sub-patterns, enabling the construction of pattern
libraries which may be used to build concrete pattern instances.
What makes this cute is that one can specify hierarchical structures of patterns. .NET Bio reads the files and does the actual parsing of the sequences, and we can get all the matches we want. Here is an example specification for the three components of the promoter for
sigma 70, the housekeeping sigma factor in _E. coli_:
<Definition name="sigma70">
  <Definition name="-10element">
    <Motif motif="TATAAT" alphabet="DNA" threshold="0.7" />
  </Definition>
  <Definition name="-35element">
    <Motif motif="TTGACA" alphabet="DNA" threshold="0.7" />
  </Definition>
  <Gap impact="0.2" minimum="15" maximum="21" threshold="0.0" />
  <Series mode="BEST" threshold="0.0">
    <Use definition="-10element"/>
  </Series>
</Definition>
So here we can combine the hexamers at -10 and -35 with a specified gap of 15 to 21 bases between them. Far more sophisticated versions are possible. The port is the result of some very good work by an Indonesian student called Lalu Yazikri, and we also have a nascent web-based
JS editor to allow construction of these patterns (full credit to Sadeen, our summer student from Saudi Arabia, who has done a fine job of starting that work).
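For readers who want a feel for what a motif, gap, motif search does, here is a minimal Python sketch. It is not the BioPatML.NET engine and does not use the .NET Bio API. The motifs, gap bounds and the 0.7 threshold are taken from the example specification; everything else is an assumption made for illustration: the names `Hit`, `similarity` and `find_promoters` are hypothetical, the threshold is read as "fraction of matching positions", and the `impact` attribute and the `BEST` series mode are not modelled at all.

```python
from typing import Iterator, NamedTuple


class Hit(NamedTuple):
    start: int    # 0-based index of the first base of the -35 hexamer
    end: int      # index one past the last base of the -10 hexamer
    gap: int      # number of bases between the two hexamers
    score: float  # mean of the two motif similarities (an assumed scoring rule)


def similarity(window: str, motif: str) -> float:
    """Fraction of positions at which window and motif carry the same base."""
    return sum(a == b for a, b in zip(window, motif)) / len(motif)


def find_promoters(
    seq: str,
    minus35: str = "TTGACA",
    minus10: str = "TATAAT",
    min_gap: int = 15,
    max_gap: int = 21,
    threshold: float = 0.7,
) -> Iterator[Hit]:
    """Yield every -35 / gap / -10 arrangement whose hexamers both reach threshold."""
    seq = seq.upper()
    n35, n10 = len(minus35), len(minus10)
    for i in range(len(seq) - n35 + 1):
        s35 = similarity(seq[i:i + n35], minus35)
        if s35 < threshold:
            continue
        # Try every allowed spacer length after a good -35 hexamer.
        for gap in range(min_gap, max_gap + 1):
            j = i + n35 + gap
            if j + n10 > len(seq):
                break  # longer gaps would run past the end of the sequence
            s10 = similarity(seq[j:j + n10], minus10)
            if s10 >= threshold:
                yield Hit(i, j + n10, gap, (s35 + s10) / 2)


if __name__ == "__main__":
    # Synthetic sequence: both consensus hexamers separated by a 17-base spacer.
    demo = "GG" + "TTGACA" + "A" * 17 + "TATAAT" + "CC"
    for hit in find_promoters(demo):
        print(hit)
```

With a six-base motif, a 0.7 threshold under this reading tolerates at most one mismatch per hexamer (5/6 passes, 4/6 does not). The real language scores and combines components in its own way, so treat this purely as a mental model.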
Anyway, these thoughts and images will whet the appetite, with more to come and plenty of scope for contribution. Drag and drop to create a pattern:
Then run and list the results:
Enjoy, and watch this space.