BAM file issues

Oct 13, 2015 at 7:09 AM
Hi,

I am trying to parse a BAM file exported from Geneious. The parser fails with the error "Invalid QName" followed by a fragment of the file. Any idea why this would be the case? The same happens when I try to parse a .SAM file exported from Geneious. Below is the top of the file; it crashes on the line starting with M03443.

@HD VN:1.0 SO:unsorted
@SQ SN:AY228557_POLDRM LN:9840
@RG ID:Unpaired reads assembled against AY228557 POLDRM SM:XXXX
M03443:9:000000000-AE7WG:1:1118:4103:17309 2:N:0:80 16 AY228557_POLDRM 1 255 177I1P10I2P16I29M * 0 0 TGTGTAAATTTGACTGTGCTGACATTGTTGCATGGCGCTGTTGCATTGAATGTCTCATTATTACACTTTAGAAGCGCATAACCAGCTGGAAGCACAAGAGGAAGAGGAAGTAGGCTTTCCAGTCACACCTCAGGTGCCTTTAAGAACAATGACTTATAAGGCTGCATTCGATCTCGGCTTCTTTTTAAAAGAAAAGGGGGGACTGGATGGGTTAATTTACTCTAAGAAAAGA 66C9>70A1385@A907;E@A55@CE=0:303)+FFED5D;9FFGFFD98;=C8DC9+?8FDD8CFA3++=C=@+=,FFEC@;5,9EDEE9DE=D>,FFEE9,9D48E8FDACEDEA,,7FBA?FF?=CFF=9GFCEFGFC,CF9AFEF@@@FDGGGFGDCGGF7CD:7CCF<DEC7E9AGGGGGGGGGGGFDDGGGGGFGGGGGGFGFFGGGGGFC<FGGGFGGGGG RG:Z:Unpaired reads assembled against AY228557 POLDRM
Developer
Oct 13, 2015 at 5:55 PM
According to the SAM spec, the name has to adhere to the regex [!-?A-~]{1,254}, so your example looks fine to me. Can you post a reproducible example?
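For readers following along, here is a small sketch of how a read name can be checked against the pattern quoted above. It uses only `System.Text.RegularExpressions` and is not the library's own validation; the class and method names (`QNameCheck`, `IsValid`, `FirstIllegalIndex`) are made up for illustration. One thing it makes visible: the range `!-?` starts at `!` (0x21), so a space (0x20) falls outside the pattern.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of Bio.dll.
public static class QNameCheck
{
    // QNAME pattern as quoted from the SAM spec, anchored so the whole name must match.
    // \z is used instead of $ so that a trailing newline is not silently accepted.
    private static readonly Regex Pattern = new Regex(@"^[!-?A-~]{1,254}\z");

    // True if the name is 1 to 254 characters long and every character is allowed.
    public static bool IsValid(string qname)
    {
        return qname != null && Pattern.IsMatch(qname);
    }

    // Index of the first character the pattern rejects
    // (outside '!'..'~', or '@'), or -1 if there is none.
    public static int FirstIllegalIndex(string qname)
    {
        for (int i = 0; i < qname.Length; i++)
        {
            char c = qname[i];
            if (c < '!' || c > '~' || c == '@')
            {
                return i;
            }
        }
        return -1;
    }
}
```

With the read name from the first post, `QNameCheck.IsValid` returns false and `FirstIllegalIndex` points at the space before `2:N:0:80`. The same name with the space removed passes the check.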
Oct 14, 2015 at 6:56 AM
Edited Oct 14, 2015 at 6:57 AM
Thank you for the reply. The code seems fine, as it works on a sample BAM file I downloaded off the internet. The issue seems to be with this file, but I cannot understand why.
The error and source are below, in case they shed any more light on the issue. The file fragment is similar to the one above, and it seems to fail on the first line.

The precise error is :

An unhandled exception of type 'System.ArgumentException' occurred in Bio.dll
Additional information: Invalid QName: "M03443:9:000000000-AE7WG:1:1105:11897:23575 2:N:0:82". The allowed pattern is: [^ \t\n\r]+.

Code is :
        Bio.IO.SAM.SequenceAlignmentMap F = new Bio.IO.SAM.SequenceAlignmentMap();
        // Bio.IO.BAM.BAMParser x = new Bio.IO.BAM.BAMParser();

        Bio.IO.SAM.SAMParser x = new Bio.IO.SAM.SAMParser();

        // F = x.Parse(@"C:\test\Example.bam");
        F = x.Parse(@"c:\test\TASP_112066.sam");   // Fails on this line
        // x.Parse(@"C:\test\112066\TASP_112066.bam");
Developer
Oct 14, 2015 at 8:20 AM
You appear to be using an outdated version of the library, since you are parsing directly from the filename (are you getting this from NuGet?).

In any event, I converted your pasted example data into a .sam file (below) and ran it through the following code. It appears to work without error. Can you confirm the same is true on your end? If your file was corrupted, this would explain why my cut/paste version worked but yours did not.
    Bio.IO.SAM.SAMParser x = new Bio.IO.SAM.SAMParser();
    var stream = File.OpenRead(@"test.sam");
    var F = x.Parse(stream); 
test.sam file contents (no line breaks for non @ lines)
@HD VN:1.0 SO:unsorted 
@SQ SN:AY228557_POLDRM LN:9840 
@RG ID:Unpaired reads assembled against AY228557 POLDRM SM:XXXX 
M03443:9:000000000-AE7WG:1:1118:4103:173092:N:0:80  16  AY228557_POLDRM 1   255 177I1P10I2P16I29M   *   0   0   TGTGTAAATTTGACTGTGCTGACATTGTTGCATGGCGCTGTTGCATTGAATGTCTCATTATTACACTTTAGAAGCGCATAACCAGCTGGAAGCACAAGAGGAAGAGGAAGTAGGCTTTCCAGTCACACCTCAGGTGCCTTTAAGAACAATGACTTATAAGGCTGCATTCGATCTCGGCTTCTTTTTAAAAGAAAAGGGGGGACTGGATGGGTTAATTTACTCTAAGAAA   66C9>70A1385@A907;E@A55@CE=0:303)+FFED5D;9FFGFFD98;=C8DC9+?8FDD8CFA3++=C=@+=,FFEC@;5,9EDEE9DE=D>,FFEE9,9D48E8FDACEDEA,,7FBA?FF?=CFF=9GFCEFGFC,CF9AFEF@@@FDGGGFGDCGGF7CD:7CCF<DEC7E9AGGGGGGGGGGGFDDGGGGGFGGGGGGFGFFGGGGGFC<FGGGFGGGGGG
Oct 14, 2015 at 9:02 AM
I downloaded the .NET version 1.1 MSI from this site, but I seem to be using a different version from yours.
I am targeting framework 4; is that correct? When I run your code, x.Parse refuses to accept a stream.
.Parse will only accept a filename or a TextReader. So I modified the code as below, ran it on your test.sam, and got the same error, i.e.
An unhandled exception of type 'System.ArgumentException' occurred in Bio.dll
Additional information: Invalid QName: "M03443:9:000000000-AE7WG:1:1105:11897:23575 2:N:0:82". The allowed pattern is: [^

What version of Bio.dll are you using? Which OS and framework?
I am running Visual Studio Express 2015 (on Win 8), also with framework 4 (4.5).
        // Bio.IO.SAM.SequenceAlignmentMap F = new Bio.IO.SAM.SequenceAlignmentMap();

        Bio.IO.SAM.SAMParser x = new Bio.IO.SAM.SAMParser();
        // var stream = File.OpenRead(@"c:\test\TASP_112066_2.sam");

        TextReader stream = File.OpenText(@"c:\test\TASP_112066_2.sam");
        var F = x.Parse(stream);
Oct 14, 2015 at 9:34 AM
OK, I have it sorted. I installed from NuGet; I had an old version. Thank you for the help. You were right: it was an old version of the library. Apologies, I am new at this, so I did a bunch of stupid stuff before finding the correct version.
Developer
Oct 14, 2015 at 6:45 PM
Excellent! Great to hear it works. We may consider taking down the old tools if they are causing problems, but glad you have this working.
Oct 15, 2015 at 7:09 AM
Well, it almost works. Now it comes up with the error
"Query name: M03443:9:000000000-AE7WG:1:1118:4103:17309 2:N:0:80 contains illegal characters". Both .SAM and .BAM parsing of the same file produce this error.
It does not appear to be the code, as I can read test files (downloaded off the internet); I only have the issue with files exported from Geneious. I know the SAM/BAM format has been refined over time. Does the library work with a particular version?
Regarding the library version being incorrect on my side: what threw me was the download section of this CodePlex website. There is no obvious way of knowing that the version has been updated. A note that the latest version was elsewhere would have led me to search on NuGet, so taking it down might be overkill, but a note would be helpful.
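One possible workaround, sketched under assumptions rather than confirmed in this thread: if the parser is objecting to the space inside the read name (the part after the space, `2:N:0:80`, looks like a FASTQ-style comment carried over into the name), the names could be trimmed before the file is parsed. The helper below is hypothetical and not part of the library. It uses only `System.IO`, assumes alignment lines are tab-delimited as the SAM format requires, and cuts the first field at its first space. Header lines starting with `@` pass through untouched.

```csharp
using System;
using System.IO;

// Hypothetical pre-processing helper for illustration; not part of Bio.dll.
public static class SamNameFixer
{
    // Truncates the QNAME (first tab-separated field) at its first space.
    // Header lines ('@' prefix) and empty lines are returned unchanged.
    public static string FixLine(string line)
    {
        if (line.Length == 0 || line[0] == '@')
        {
            return line;
        }

        int tab = line.IndexOf('\t');
        string qname = tab < 0 ? line : line.Substring(0, tab);
        string rest = tab < 0 ? string.Empty : line.Substring(tab);

        int space = qname.IndexOf(' ');
        if (space > 0)
        {
            qname = qname.Substring(0, space);
        }
        return qname + rest;
    }

    // Copies SAM text from input to output, fixing each alignment line on the way.
    public static void Fix(TextReader input, TextWriter output)
    {
        string line;
        while ((line = input.ReadLine()) != null)
        {
            output.WriteLine(FixLine(line));
        }
    }
}
```

This only applies to the text .sam export, not the binary .bam, and it only touches read names; it does nothing about spaces elsewhere in the file, such as in the `@RG` line. Dropping the suffix could also make names collide if a file holds both mates of a pair, so it is worth checking the output before relying on it.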