How to Translate Ambiguous DNA to a Protein Sequence?

Jun 20, 2012 at 10:35 AM
Edited Jun 20, 2012 at 10:36 AM

Hi there,

I have DNA sequences from sequencing that I would like to translate into protein sequences. I have to use the AmbiguousDNA alphabet because the DNA alphabet does not work. With the AmbiguousDNA alphabet, however, the transcription results in ambiguous RNA, and ProteinTranslation.Translate throws an exception. Is there another way to translate ambiguous RNA?

Regards,
Dirk

Coordinator
Jun 20, 2012 at 3:04 PM

Hi Dirk,

An ambiguous DNA alphabet includes the IUPAC codes, which represent the different possible bases at a specific location in a sequence. As such, it is not possible to translate an ambiguous DNA sequence into a single protein, because the ambiguities may result in several different possible protein translations.

It would be better to work out why your input sequences don't work in the DNA alphabet - perhaps you have some non-standard characters? If they are ambiguities, you will have to resolve them before the DNA sequence object is created, then you can translate it.

Simon

Jun 20, 2012 at 8:30 PM

Hello Simon,

Thank you for the quick reply. The Expasy Tools (web.expasy.org/translate), for example, do translate ambiguous DNA; the result is an ambiguous protein sequence. That is quite common, because not all sequencing results are "clean". So, what can I do?

Regards,

Dirk

Coordinator
Jun 20, 2012 at 11:00 PM

I wasn't aware of this - and I'm not sure .NET Bio directly supports it 'out of the box'. It would seem that translating ambiguous DNA to ambiguous protein could be done in a number of ways. The Expasy tool you provided a link to does a basic job: it translates any codon containing an ambiguous basecall as 'X'. Taking 'MTT' as input, where M is ambiguous for A or C, the possible translations are ATT (Ile) or CTT (Leu) forward in frame 1, and AAT (Asn) or AAG (Lys) in reverse. It would be possible to represent these ambiguities with actual translations, and presumably the output would then be the set of sequences covering all possible combinations of ambiguities. I can see why they went with X, although it does lose information.
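
To make the 'MTT' example concrete, here is a minimal sketch in plain Python. It is not the .NET Bio API; the names IUPAC, CODON_TABLE, possible_amino_acids and reverse_complement are made up for illustration, and the standard genetic code is assumed. It expands each ambiguity code into the bases it can stand for and collects the amino acids that result.

```python
from itertools import product

# IUPAC nucleotide codes mapped to the bases they can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Complement of every IUPAC code (e.g. M = A/C pairs with K = T/G).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def possible_amino_acids(codon):
    """Return the set of amino acids an ambiguous DNA codon could encode."""
    expansions = product(*(IUPAC[base] for base in codon.upper()))
    return {CODON_TABLE["".join(bases)] for bases in expansions}


def reverse_complement(seq):
    """Reverse-complement a DNA sequence, keeping ambiguity codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


print(sorted(possible_amino_acids("MTT")))                      # ['I', 'L']
print(reverse_complement("MTT"))                                # AAK
print(sorted(possible_amino_acids(reverse_complement("MTT"))))  # ['K', 'N']
```

With several ambiguous codons in one sequence, the number of candidate proteins is the product of the set sizes, which is why this route only seems practical for short sequences or those with few ambiguities.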

I'll defer to someone more technical than me here, but I assume you would extend ProteinTranslation.Translate to accept ambiguous sequences - it should be relatively straightforward to encode X for any ambiguity, or take the more complex route for short sequences or those with few ambiguities.

Simon

Jun 21, 2012 at 8:19 AM

Hello Simon,

Thanks again. If I understand you correctly, I should change the source code and commit it to your project? If that's possible, I will give that a try.

I would suggest translating ambiguities to 'X', because providing more than one possible output sequence is not a solution: it might not lose information, but it fakes precision.

Dirk

Coordinator
Jun 25, 2012 at 3:03 PM

Hi Dirk,

Yes - this is an open-source project and we welcome people making changes to the source to improve it for everyone. You will find details on how to make a contribution on the website.

I agree that the substitution of 'X' as an amino acid for any codon containing an ambiguity is the simplest way to go.
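
As a rough illustration of that rule, here is a minimal sketch in plain Python. It does not show the .NET Bio implementation; translate_with_x is a hypothetical helper and the standard genetic code is assumed. Any codon that is not made up solely of A, C, G and T is emitted as 'X'.

```python
# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_with_x(dna):
    """Translate frame 1 of a DNA string; ambiguous codons become 'X'."""
    dna = dna.upper()
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for start in range(0, len(dna) - 2, 3):
        codon = dna[start:start + 3]
        # Codons containing an ambiguity code are not in the table.
        protein.append(CODON_TABLE.get(codon, "X"))
    return "".join(protein)


print(translate_with_x("ATGMTTGGN"))  # MXX
print(translate_with_x("ATGATTTAA"))  # MI*
```

Note that under this rule GGN also becomes 'X', although GGA, GGC, GGG and GGT all encode glycine. That is the information loss mentioned earlier in the thread, traded for a single unambiguous output sequence.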

If you have any trouble creating your contribution, just post to this list and someone will help.

Simon

Coordinator
Jul 2, 2012 at 10:41 PM

Hi Dirk,

Thanks for the patch upload to add this feature. The changes looked good and are now checked in. I did make one change to the tests you submitted. The patch had an offset of 1 for the AmbiguousRNA going to the translator, but the expected protein output sequence was looking for a translation starting at 0. I set the offset to 0 and all related tests pass. The patch is associated with work item 8082 and resolved in change list #75638.
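
To show why the offset matters for the expected output, here is a minimal sketch in plain Python. It is not the actual test from the patch; translate_rna and the sample sequence are made up for illustration, and the standard genetic code is assumed. Shifting the start by one base changes every codon that gets read.

```python
# Standard genetic code for RNA, laid out in the conventional U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_rna(rna, offset=0):
    """Translate RNA starting at `offset`; ambiguous codons become 'X'."""
    rna = rna.upper()
    return "".join(
        CODON_TABLE.get(rna[start:start + 3], "X")
        for start in range(offset, len(rna) - 2, 3)
    )


rna = "AUGMUUGGC"  # hypothetical ambiguous RNA; M stands for A or C
print(translate_rna(rna, offset=0))  # MXG  (codons AUG, MUU, GGC)
print(translate_rna(rna, offset=1))  # XL   (codons UGM, UUG)
```

An expected sequence written for offset 0 therefore cannot match the output produced with offset 1.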

Thanks again!
-bobd-

Aug 21, 2012 at 10:04 PM

Hi Dirk,

I wonder if you have had any time to work on the contribution you proposed? If you've not had the time yet, fair enough. If you've hit a problem please feel free to share it here, someone may be able to help.

Ross

Sep 13, 2012 at 4:00 PM

Hello Ross,

Sorry for the late reply.

Actually, I have already posted a patch that implements this. I received a mail telling me it is all right apart from some minor changes, but the patch is still in the state "Being evaluated". When I last updated my working copy (I access the source repository through SVN), my changes were not in there. Today, updating my working copy does not work at all (error message "Could not read status line: connection was closed by server").

I have also implemented some improvements locally, which I haven't uploaded as a patch yet. Since I cannot update my working copy, the new patch would include the changes I already provided...

Regards,
Dirk

Coordinator
Sep 13, 2012 at 4:22 PM

Oops - THAT shouldn't have happened!

Let me look into it...

Coordinator
Sep 14, 2012 at 9:09 PM
Edited Sep 14, 2012 at 9:10 PM
Dirk,

Although I made the changes for the patch back in July, I didn't tell CodePlex that the patch had been applied. It was changeset 75638, checked in on 7/2/2012.
You should be able to pick up the changes with any sync/get after that time.

I use TFS for the source control and talk directly to the CodePlex source control system.  I created a branch in early Aug to support getting VS2012 ready w/o impacting the live tree, however that branch seems to be causing problems for people using different source control options.  I want to look a bit deeper, but if the branch is the problem, I will kill the branch later today and hopefully some of the other issues people are seeing with other source control systems will be 'resolved'. 

Best,
-bobd-
