Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data



ApplyBQSRSpark and ApplyBQSR produce different results

3 comments

  • Genevieve Brandt (she/her)

    AMN, do you have any more information about the extent of these differences?

    This has been discussed on the forum before, though we did not reach a resolution: https://gatk.broadinstitute.org/hc/en-us/community/posts/360073320632-BQSR-Spark-Why-Beta-

  • AMN

    Hi Genevieve, 

    We are still looking into the differences. However, a preliminary look at the first few lines already shows some. Here are the 42nd and 43rd lines of each BAM:

    spark-enabled:

    $ samtools view ../spark/sample.bam | head -43 | tail -2
    <instrumentID>:292:<flowcellID>:4:1265:20808:18458 133 1 10022 0 * = 10022 0 TAAAAGAATAGTAATAACCATACATTTAAACATACACTAAAACAAACTGTTACTCAAATATTTAAAATTCACTTAAGTATTACTGTAATGATTAAGTAAATTCAAAACAAAATGAATAAGTTTAATAACATCTAAACAGAGAATAATAAA <?+D,H'*,BH<BF<&:+-7,&;(,G>%*:G'E%G'*HB):F*7F:;HIC-;*B>E):E%EG-6:F:+>-7GH-6:I,6+>B*BIC6:EI9E=%F?CB9F;GH'9F9G6FF)E/EFEBF?;--5F+B9G6+=HB9)*6IE08)+B)<6*E MC:Z:87M1I39M PG:Z:MarkDuplicates RG:Z:<instrumentID>@292@<flowcellID>@4 AS:i:0 XS:i:0
    <instrumentID>:292:<flowcellID>:4:1265:20808:18458 1097 1 10022 15 87M1I39M = 10022 0 CCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAAACCCTAACCCTAACCCAAACCATAACCCTAACCCTAACCCTAACCCAAACACTAACCCAAACCCTAAACCAAACCCTAAACCTCAC CDEGBEFF/HBF=FFHBF.FFHBF=FFDBF,=FDB*+-F(FFGF-CBFG-FHBFGFF(FF<F(EBFGF<37F<-FCBFGF<HB;+F-(*FG(<HBFGF-(FF+F-H7F;GF(FF+F,2BF*+FC-E; XA:Z:1,+10070,39M1I38M1D28M1I20M,11;hs37d5,+10061289,39M1I51M36S,4;4,-191043982,38S67M1D22M,4;3,-197900264,43S21M1I62M,3;hs37d5,-10060120,44S41M1I41M,3; MD:Z:39T17T4C27C7T8C2T8C3A2 PG:Z:MarkDuplicates RG:Z:<instrumentID>@292@<flowcellID>@4 NM:i:10 AS:i:77 XS:i:68

    spark-disabled:

    $ samtools view sample.bam | head -43 | tail -2
    <instrumentID>:292:<flowcellID>:4:1265:20808:18458 1097 1 10022 15 87M1I39M = 10022 0 CCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAAACCCTAACCCTAACCCAAACCATAACCCTAACCCTAACCCTAACCCAAACACTAACCCAAACCCTAAACCAAACCCTAAACCTCAC CDEGBEFF/HBF=FFHBF.FFHBF=FFDBF,=FDB*+-F(FFGF-CBFG-FHBFGFF(FF<F(EBFGF<37F<-FCBFGF<HB;+F-(*FG(<HBFGF-(FF+F-H7F;GF(FF+F,2BF*+FC-E; XA:Z:1,+10070,39M1I38M1D28M1I20M,11;hs37d5,+10061289,39M1I51M36S,4;4,-191043982,38S67M1D22M,4;3,-197900264,43S21M1I62M,3;hs37d5,-10060120,44S41M1I41M,3; MD:Z:39T17T4C27C7T8C2T8C3A2 PG:Z:MarkDuplicates RG:Z:<instrumentID>@292@<flowcellID>@4 NM:i:10 AS:i:77 XS:i:68
    <instrumentID>:292:<flowcellID>:4:1265:20808:18458 133 1 10022 0 * = 10022 0 TAAAAGAATAGTAATAACCATACATTTAAACATACACTAAAACAAACTGTTACTCAAATATTTAAAATTCACTTAAGTATTACTGTAATGATTAAGTAAATTCAAAACAAAATGAATAAGTTTAATAACATCTAAACAGAGAATAATAAA <?+D,H'*,BH<BF<&:+-7,&;(,G>%*:G'E%G'*HB):F*7F:;HIC-;*B>E):E%EG-6:F:+>-7GH-6:I,6+>B*BIC6:EI9E=%F?CB9F;GH'9F9G6FF)E/EFEBF?;--5F+B9G6+=HB9)*6IE08)+B)<6*E MC:Z:87M1I39M PG:Z:MarkDuplicates RG:Z:<instrumentID>@292@<flowcellID>@4 AS:i:0 XS:i:0

    It seems that they are sorted differently. However, I ran `samtools sort -n sample.bam | samtools view | md5sum` on each BAM and got different md5sums, so I am not confident that sort order is the only difference.
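
Both records shown above sit on reference 1 at position 10022, so a coordinate sort can place them in either order. An md5sum only says that something differs, not what. One way to find out what remains once order is ignored is to compare the records as multisets. The following is a minimal sketch under stated assumptions, not part of GATK or samtools: `diff_records` is a hypothetical helper, and it assumes each BAM has first been dumped to text with `samtools view`.

```python
from collections import Counter


def diff_records(lines_a, lines_b):
    """Compare two streams of SAM text records while ignoring their order.

    Hypothetical helper for illustration only.
    Returns (only_in_a, only_in_b) as sorted lists of record lines.
    """
    def records(lines):
        for line in lines:
            line = line.rstrip("\n")
            # Skip blank lines and header lines; a QNAME cannot start with "@".
            if line and not line.startswith("@"):
                yield line

    count_a = Counter(records(lines_a))
    count_b = Counter(records(lines_b))
    # Counter subtraction keeps only the surplus on each side,
    # so duplicated records are counted correctly.
    only_in_a = sorted((count_a - count_b).elements())
    only_in_b = sorted((count_b - count_a).elements())
    return only_in_a, only_in_b


if __name__ == "__main__":
    # Toy records (truncated to six columns) standing in for real
    # `samtools view` output.
    spark = ["read1\t133\t1\t10022\t0\t*", "read1\t1097\t1\t10022\t15\t87M1I39M"]
    plain = list(reversed(spark))
    print(diff_records(spark, plain))
```

If both returned lists are empty, the two files contain the same records and differ only in order. Anything left in either list is a record whose fields or tags really do differ.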

  • Genevieve Brandt (she/her)

    Hi AMN,

    Please keep us up to date if you find any differences besides the sort order.

    We did find one difference between ApplyBQSR and ApplyBQSRSpark: ApplyBQSRSpark performs an extra sort. Were both inputs sorted with the same tool?

    Let us know what you find.

    Best,

    Genevieve
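
To help answer the question about how each input was sorted, the BAM header can be inspected: the `@HD` line carries the declared sort order in its `SO` tag, and the `@PG` lines list the programs that processed the file. The sketch below is illustrative only. `sort_info` is a hypothetical helper, and it assumes the header text was obtained with `samtools view -H`.

```python
def sort_info(header_lines):
    """Return (declared sort order, program names) from SAM header lines.

    Hypothetical helper for illustration only. The sort order is None when
    the header has no @HD line or no SO tag.
    """
    sort_order = None
    programs = []
    for line in header_lines:
        fields = line.rstrip("\n").split("\t")
        # Header tags look like "SO:coordinate"; split on the first colon only.
        tags = dict(f.split(":", 1) for f in fields[1:] if ":" in f)
        if fields[0] == "@HD":
            sort_order = tags.get("SO")
        elif fields[0] == "@PG":
            # Prefer the program name; fall back to the record ID.
            programs.append(tags.get("PN") or tags.get("ID"))
    return sort_order, programs


if __name__ == "__main__":
    # Toy header standing in for real `samtools view -H` output.
    header = [
        "@HD\tVN:1.6\tSO:coordinate",
        "@SQ\tSN:1\tLN:249250621",
        "@PG\tID:MarkDuplicates\tPN:MarkDuplicates",
    ]
    print(sort_info(header))
```

Running this on the headers of both inputs shows whether they declare the same sort order and which tools appear in each processing history. The header only records what each tool declared, so it is a starting point rather than proof of the actual record order.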

