Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Number of mutations differs between WES kits

  • Gökalp Çelik

    Hi Matti Meikäläinen

    Apart from coverage differences between the exome capture kits, there could be differences in how each library was prepared and in the sequencing artifacts that arose in the first and second batches.

    If the kits give roughly the same coverage and the mapping, recalibration, and calling parameters are identical, I would suspect that the first sequenced batch may have other issues, such as cross-sample contamination, insert size, mapping quality, or base-calling quality.

    Without knowing any of these parameters, it is hard to tell where the problem is.
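As a rough illustration of the per-batch metrics mentioned above, here is a minimal Python sketch that summarizes median insert size, mean mapping quality, and mean base quality from SAM-formatted text records using only the standard library. It assumes the standard SAM column order (FLAG in column 2, MAPQ in column 5, TLEN in column 9, QUAL in column 11) and Phred+33 base qualities. The function name `summarize_sam` and the toy records are hypothetical. For real data, dedicated tools such as Picard CollectInsertSizeMetrics and CollectAlignmentSummaryMetrics, and GATK GetPileupSummaries plus CalculateContamination for cross-sample contamination, are the usual choice.

```python
import statistics
from typing import Iterable


def summarize_sam(lines: Iterable[str]) -> dict:
    """Hypothetical helper: a crude QC summary from SAM text lines.

    Assumes the 11 mandatory tab-separated SAM columns and Phred+33
    base qualities. Header lines (starting with '@') are skipped.
    """
    insert_sizes, mapqs, base_quals = [], [], []
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x4:  # unmapped read: MAPQ/TLEN are not meaningful
            continue
        mapqs.append(int(fields[4]))
        tlen = int(fields[8])
        # Count each proper pair (flag 0x2) once by keeping only the positive TLEN.
        if flag & 0x2 and tlen > 0:
            insert_sizes.append(tlen)
        qual = fields[10]
        if qual != "*":
            base_quals.extend(ord(c) - 33 for c in qual)  # Phred+33 decoding
    return {
        "median_insert_size": statistics.median(insert_sizes) if insert_sizes else None,
        "mean_mapq": statistics.mean(mapqs) if mapqs else None,
        "mean_base_quality": statistics.mean(base_quals) if base_quals else None,
    }


if __name__ == "__main__":
    toy = [
        "@HD\tVN:1.6",
        "r1\t99\tchr1\t100\t60\t4M\t=\t300\t204\tACGT\tIIII",
        "r1\t147\tchr1\t300\t60\t4M\t=\t100\t-204\tACGT\tIIII",
        "r2\t99\tchr1\t500\t20\t4M\t=\t650\t154\tACGT\t####",
        "r2\t147\tchr1\t650\t40\t4M\t=\t500\t-154\tACGT\t5555",
    ]
    print(summarize_sam(toy))
```

Running the same summary on a sample of reads from each batch gives a like-for-like comparison; large gaps in any of these numbers would point to a library or sequencing issue rather than a calling issue.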

    I hope this helps.

    Regards. 

  • Gökalp Çelik

    Hi again. 

    To help us understand this issue a little better, could you generate a stats file for the FilterMutectCalls step and share it with us?

    You can use the parameter 

    --filtering-stats <String>    The output filtering stats file  Default value: null.

    to generate the files for both batches. 
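A minimal Python sketch of doing this for both batches: it only builds and prints the command lines, using the `-R`, `-V`, `-O`, and `--filtering-stats` arguments of FilterMutectCalls. The file names and the one-VCF-per-batch layout are hypothetical placeholders; adapt them to however the samples were actually called, and keep any other arguments from the original run (for example a contamination table) unchanged so the stats reflect the real filtering.

```python
import shlex


def filter_mutect_calls_cmd(reference: str, unfiltered_vcf: str,
                            filtered_vcf: str, stats_out: str) -> list:
    """Build (but do not run) a FilterMutectCalls command that also
    writes the filtering stats file. All paths are placeholders."""
    return [
        "gatk", "FilterMutectCalls",
        "-R", reference,
        "-V", unfiltered_vcf,
        "-O", filtered_vcf,
        "--filtering-stats", stats_out,
    ]


# Hypothetical per-batch inputs.
BATCHES = {
    "batch1": "batch1.unfiltered.vcf.gz",
    "batch2": "batch2.unfiltered.vcf.gz",
}

if __name__ == "__main__":
    for name, vcf in BATCHES.items():
        cmd = filter_mutect_calls_cmd("reference.fasta", vcf,
                                      f"{name}.filtered.vcf.gz",
                                      f"{name}.filtering.stats")
        print(shlex.join(cmd))
        # To actually execute it: subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems if any path contains spaces.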

    Regards. 

  • Matti Meikäläinen

    Batch1

    #<METADATA>Ln prior of deletion of length 10=-15.827017252826378
    #<METADATA>Ln prior of deletion of length 9=-15.497109486538026
    #<METADATA>Ln prior of deletion of length 8=-16.393915418597032
    #<METADATA>Ln prior of deletion of length 7=-15.66790322077101
    #<METADATA>Ln prior of deletion of length 6=-15.353893951625777
    #<METADATA>Ln prior of deletion of length 5=-14.570599842294246
    #<METADATA>Ln prior of deletion of length 4=-14.4944209504161
    #<METADATA>Ln prior of deletion of length 3=-14.936435650367912
    #<METADATA>Ln prior of deletion of length 2=-14.496066098316097
    #<METADATA>Ln prior of deletion of length 1=-13.67045815850774
    #<METADATA>Ln prior of SNV=-9.399366101557503
    #<METADATA>Ln prior of insertion of length 1=-13.644941956702018
    #<METADATA>Ln prior of insertion of length 2=-13.827249693329136
    #<METADATA>Ln prior of insertion of length 3=-13.638134009967798
    #<METADATA>Ln prior of insertion of length 4=-13.421273395630884
    #<METADATA>Ln prior of insertion of length 5=-13.383772129736975
    #<METADATA>Ln prior of insertion of length 6=-13.702401584466164
    #<METADATA>Ln prior of insertion of length 7=-13.841408476520511
    #<METADATA>Ln prior of insertion of length 8=-13.862980888251581
    #<METADATA>Ln prior of insertion of length 9=-13.768158788905456
    #<METADATA>Ln prior of insertion of length 10=-13.85699460219151
    #<METADATA>Background beta-binomial cluster=weight = 0.0485, alpha = 2.69, beta = 5.29
    #<METADATA>High-AF beta-binomial cluster=weight = 0.0110, alpha = 14.10, beta = 0.50
    #<METADATA>Binomial cluster=weight = 0.8798, mean = 0.059
    #<METADATA>Binomial cluster=weight = 0.0422, mean = 0.494
    #<METADATA>Binomial cluster=weight = 0.0185, mean = 0.063
    #<METADATA>threshold=0.544
    #<METADATA>fdr=0.109
    #<METADATA>sensitivity=0.933
    filter    FP    FDR    FN    FNR
    weak_evidence    983.35    0.08    472.21    0.04
    strand_bias    23.3    0.0    0.36    0.0
    contamination    1.2    0.0    0.23    0.0
    slippage    0.18    0.0    0.02    0.0
    haplotype    291.31    0.02    160.04    0.01
    germline    83.47    0.01    54.16    0.0

    Batch2

    #<METADATA>Ln prior of deletion of length 10=-18.78648629378458
    #<METADATA>Ln prior of deletion of length 9=-17.48838437502819
    #<METADATA>Ln prior of deletion of length 8=-17.883935544363286
    #<METADATA>Ln prior of deletion of length 7=-20.72326583694641
    #<METADATA>Ln prior of deletion of length 6=-16.430624011474748
    #<METADATA>Ln prior of deletion of length 5=-17.172333034600175
    #<METADATA>Ln prior of deletion of length 4=-16.424962391442882
    #<METADATA>Ln prior of deletion of length 3=-16.49907094439751
    #<METADATA>Ln prior of deletion of length 2=-16.147987548061447
    #<METADATA>Ln prior of deletion of length 1=-15.318926279851622
    #<METADATA>Ln prior of SNV=-10.280649777210495
    #<METADATA>Ln prior of insertion of length 1=-16.51437129595656
    #<METADATA>Ln prior of insertion of length 2=-17.025941633551284
    #<METADATA>Ln prior of insertion of length 3=-20.72326583694641
    #<METADATA>Ln prior of insertion of length 4=-20.72326583694641
    #<METADATA>Ln prior of insertion of length 5=-18.18202614926411
    #<METADATA>Ln prior of insertion of length 6=-20.72326583694641
    #<METADATA>Ln prior of insertion of length 7=-20.72326583694641
    #<METADATA>Ln prior of insertion of length 8=-20.72326583694641
    #<METADATA>Ln prior of insertion of length 9=-20.72326583694641
    #<METADATA>Ln prior of insertion of length 10=-20.72326583694641
    #<METADATA>Background beta-binomial cluster=weight = 0.1406, alpha = 1.71, beta = 4.18
    #<METADATA>High-AF beta-binomial cluster=weight = 0.0044, alpha = 10.07, beta = 0.50
    #<METADATA>Binomial cluster=weight = 0.6376, mean = 0.029
    #<METADATA>Binomial cluster=weight = 0.1652, mean = 0.490
    #<METADATA>Binomial cluster=weight = 0.0309, mean = 0.990
    #<METADATA>Binomial cluster=weight = 0.0213, mean = 0.124
    #<METADATA>threshold=0.563
    #<METADATA>fdr=0.153
    #<METADATA>sensitivity=0.903
    filter    FP    FDR    FN    FNR
    weak_evidence    452.87    0.13    237.78    0.07
    strand_bias    7.46    0.0    1.07    0.0
    slippage    0.23    0.0    0.01    0.0
    haplotype    30.96    0.01    19.65    0.01
    germline    39.33    0.01    45.29    0.01

  • Gökalp Çelik

    Hi Matti Meikäläinen

    After reviewing these reports with our team, it is clear that the two sets of samples have quite different characteristics in terms of tumor purity: the libraries show different distributions of allele fractions. Therefore, if both kits perform as they should and equally well, differences in how the samples were collected could explain the difference between the two results.
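One way to see the allele-fraction difference described above is to tabulate the binomial clusters from each report. The sketch below copies the (weight, mean) pairs of the binomial clusters from the two posted files and sums their weight in three allele-fraction bands. The band cutoffs are arbitrary illustrative choices, not GATK parameters, and the beta-binomial clusters are left out. For intuition only: in a copy-neutral diploid region, a clonal heterozygous somatic mutation is expected at an allele fraction of about purity/2, so, all else being equal, purer samples should put more weight near 0.5.

```python
# (weight, mean allele fraction) of each binomial cluster, copied from the posted reports.
CLUSTERS = {
    "Batch1": [(0.8798, 0.059), (0.0422, 0.494), (0.0185, 0.063)],
    "Batch2": [(0.6376, 0.029), (0.1652, 0.490), (0.0309, 0.990), (0.0213, 0.124)],
}

# Illustrative allele-fraction bands: (label, lower bound inclusive, upper bound exclusive).
BANDS = [
    ("low (<0.2)", 0.0, 0.2),
    ("mid (0.2-0.8)", 0.2, 0.8),
    ("high (>=0.8)", 0.8, float("inf")),
]


def band_weights(clusters):
    """Sum the weights of binomial clusters whose mean AF falls in each band."""
    return {
        label: sum(w for w, mean in clusters if lo <= mean < hi)
        for label, lo, hi in BANDS
    }


if __name__ == "__main__":
    for batch, clusters in CLUSTERS.items():
        weights = band_weights(clusters)
        print(batch, ", ".join(f"{k}: {v:.4f}" for k, v in weights.items()))
```

With the posted values, about 0.90 of Batch1's weight sits in binomial clusters below AF 0.2 versus about 0.66 for Batch2, while Batch2 places about 0.165 in the cluster near 0.49 compared with about 0.042 for Batch1. Because the beta-binomial clusters are excluded, the band totals do not sum to 1.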

    Alternatively, if sample preparation is essentially the same and the tumor source and tumor purities are similar, the actual problem may be that one kit performs worse than the other.

    In either case, we believe Mutect2 is behaving as it should, and the results are as expected given the circumstances.

    I hope this helps. 
    Regards.
