Number of mutations differs between WES kits
Hi,
I used Mutect2 to call variants from WES data in tumor-only mode. The first batch of samples was generated using KAPA HyperExome v1 and the second batch using KAPA HyperExome v2.
I am getting three times as many variants for the first batch as for the second. Using samtools depth on the recalibrated BAM files, I confirmed that the issue is not related to sequencing depth or coverage: both were roughly the same for the two batches. The issue seems to arise at the Mutect2 step, because the unfiltered VCF files (those without a FILTER status) already have different numbers of rows: about 150k for the first batch and 50k for the second.
I was wondering if you have any idea what could be causing this.
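For reference, the row counts can be reproduced with something like the sketch below. It is only a minimal example: the function name is made up for illustration, and it simply counts every non-header line in a plain or gzip/bgzip-compressed VCF.

```python
import gzip


def count_vcf_records(path):
    """Count variant records (non-empty lines not starting with '#') in a VCF."""
    # bgzip output is gzip-compatible, so gzip.open can read .vcf.gz files.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sum(1 for line in handle if line.strip() and not line.startswith("#"))
```

Calling `count_vcf_records` on the unfiltered VCF of each sample gives the per-sample numbers to compare between the two batches.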
-
Apart from coverage differences between the exome capture kits, there could be differences in how each library was prepared and in which sequencing artifacts arose in the first and second batches.
If the kit coverages are roughly the same and the mapping, recalibration and calling parameters are identical, I would suspect that the first batch has other issues, such as cross-sample contamination or differences in insert size, mapping quality or base-calling quality.
Without knowing these metrics, it is hard to tell where the problem is.
I hope this helps.
Regards.
-
Hi again.
If you would like to help us understand this issue a little better, please generate a filtering stats file at the FilterMutectCalls step and share it with us.
You can use the parameter
--filtering-stats <String> The output filtering stats file Default value: null.
to generate the files for both batches.
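As a rough illustration, the sketch below builds one FilterMutectCalls command line per batch with the `--filtering-stats` parameter set. The file names are placeholders (assumptions, not your actual paths), the helper function is hypothetical, and the commands are only printed, not executed; add whatever other arguments your pipeline already passes.

```python
import shlex


def filter_mutect_calls_cmd(reference, unfiltered_vcf, filtered_vcf, stats_out):
    """Return the argument list for one FilterMutectCalls run (hypothetical helper)."""
    return [
        "gatk", "FilterMutectCalls",
        "-R", reference,
        "-V", unfiltered_vcf,
        "-O", filtered_vcf,
        "--filtering-stats", stats_out,
    ]


# Placeholder file names, one sample per batch for illustration.
for batch in ("batch1", "batch2"):
    cmd = filter_mutect_calls_cmd(
        "reference.fasta",
        f"{batch}.unfiltered.vcf.gz",
        f"{batch}.filtered.vcf.gz",
        f"{batch}.filtering_stats.tsv",
    )
    print(shlex.join(cmd))
```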
Regards.
-
Batch1
#<METADATA>Ln prior of deletion of length 10=-15.827017252826378
#<METADATA>Ln prior of deletion of length 9=-15.497109486538026
#<METADATA>Ln prior of deletion of length 8=-16.393915418597032
#<METADATA>Ln prior of deletion of length 7=-15.66790322077101
#<METADATA>Ln prior of deletion of length 6=-15.353893951625777
#<METADATA>Ln prior of deletion of length 5=-14.570599842294246
#<METADATA>Ln prior of deletion of length 4=-14.4944209504161
#<METADATA>Ln prior of deletion of length 3=-14.936435650367912
#<METADATA>Ln prior of deletion of length 2=-14.496066098316097
#<METADATA>Ln prior of deletion of length 1=-13.67045815850774
#<METADATA>Ln prior of SNV=-9.399366101557503
#<METADATA>Ln prior of insertion of length 1=-13.644941956702018
#<METADATA>Ln prior of insertion of length 2=-13.827249693329136
#<METADATA>Ln prior of insertion of length 3=-13.638134009967798
#<METADATA>Ln prior of insertion of length 4=-13.421273395630884
#<METADATA>Ln prior of insertion of length 5=-13.383772129736975
#<METADATA>Ln prior of insertion of length 6=-13.702401584466164
#<METADATA>Ln prior of insertion of length 7=-13.841408476520511
#<METADATA>Ln prior of insertion of length 8=-13.862980888251581
#<METADATA>Ln prior of insertion of length 9=-13.768158788905456
#<METADATA>Ln prior of insertion of length 10=-13.85699460219151
#<METADATA>Background beta-binomial cluster=weight = 0.0485, alpha = 2.69, beta = 5.29
#<METADATA>High-AF beta-binomial cluster=weight = 0.0110, alpha = 14.10, beta = 0.50
#<METADATA>Binomial cluster=weight = 0.8798, mean = 0.059
#<METADATA>Binomial cluster=weight = 0.0422, mean = 0.494
#<METADATA>Binomial cluster=weight = 0.0185, mean = 0.063
#<METADATA>threshold=0.544
#<METADATA>fdr=0.109
#<METADATA>sensitivity=0.933
filter FP FDR FN FNR
weak_evidence 983.35 0.08 472.21 0.04
strand_bias 23.3 0.0 0.36 0.0
contamination 1.2 0.0 0.23 0.0
slippage 0.18 0.0 0.02 0.0
haplotype 291.31 0.02 160.04 0.01
germline 83.47 0.01 54.16 0.0
Batch2
#<METADATA>Ln prior of deletion of length 10=-18.78648629378458
#<METADATA>Ln prior of deletion of length 9=-17.48838437502819
#<METADATA>Ln prior of deletion of length 8=-17.883935544363286
#<METADATA>Ln prior of deletion of length 7=-20.72326583694641
#<METADATA>Ln prior of deletion of length 6=-16.430624011474748
#<METADATA>Ln prior of deletion of length 5=-17.172333034600175
#<METADATA>Ln prior of deletion of length 4=-16.424962391442882
#<METADATA>Ln prior of deletion of length 3=-16.49907094439751
#<METADATA>Ln prior of deletion of length 2=-16.147987548061447
#<METADATA>Ln prior of deletion of length 1=-15.318926279851622
#<METADATA>Ln prior of SNV=-10.280649777210495
#<METADATA>Ln prior of insertion of length 1=-16.51437129595656
#<METADATA>Ln prior of insertion of length 2=-17.025941633551284
#<METADATA>Ln prior of insertion of length 3=-20.72326583694641
#<METADATA>Ln prior of insertion of length 4=-20.72326583694641
#<METADATA>Ln prior of insertion of length 5=-18.18202614926411
#<METADATA>Ln prior of insertion of length 6=-20.72326583694641
#<METADATA>Ln prior of insertion of length 7=-20.72326583694641
#<METADATA>Ln prior of insertion of length 8=-20.72326583694641
#<METADATA>Ln prior of insertion of length 9=-20.72326583694641
#<METADATA>Ln prior of insertion of length 10=-20.72326583694641
#<METADATA>Background beta-binomial cluster=weight = 0.1406, alpha = 1.71, beta = 4.18
#<METADATA>High-AF beta-binomial cluster=weight = 0.0044, alpha = 10.07, beta = 0.50
#<METADATA>Binomial cluster=weight = 0.6376, mean = 0.029
#<METADATA>Binomial cluster=weight = 0.1652, mean = 0.490
#<METADATA>Binomial cluster=weight = 0.0309, mean = 0.990
#<METADATA>Binomial cluster=weight = 0.0213, mean = 0.124
#<METADATA>threshold=0.563
#<METADATA>fdr=0.153
#<METADATA>sensitivity=0.903
filter FP FDR FN FNR
weak_evidence 452.87 0.13 237.78 0.07
strand_bias 7.46 0.0 1.07 0.0
slippage 0.23 0.0 0.01 0.0
haplotype 30.96 0.01 19.65 0.01
germline 39.33 0.01 45.29 0.01
-
After reviewing these reports with our team, it is clear that the two sets of samples have quite different characteristics in terms of tumor purity. The libraries show different allele-fraction distributions, so if both kits are performing equally well, a difference in how the samples were collected could explain the difference between the two results.
Alternatively, if sample preparation, tumor source and tumor purity are more or less the same, then the problem could lie in one of the kits performing worse than the other.
In either case, we believe Mutect2 is behaving as it should, and the results are as expected given the circumstances.
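If you want to compare the allele-fraction clusters of the two stats files side by side, something like the following may help. This is only a minimal sketch: the function names are ours, it assumes the `#<METADATA>` cluster lines look exactly like the ones posted above, and it summarizes each beta-binomial cluster by its mean allele fraction, alpha / (alpha + beta).

```python
PREFIX = "#<METADATA>"


def cluster_summary(stats_text):
    """Return a list of (cluster name, weight, mean allele fraction)."""
    summary = []
    for line in stats_text.splitlines():
        line = line.strip()
        # Only the "... cluster=weight = ..., ..." metadata lines are of interest.
        if not line.startswith(PREFIX) or "cluster=" not in line:
            continue
        name, _, params = line[len(PREFIX):].partition("=")
        values = {}
        for item in params.split(","):
            key, _, value = item.partition("=")
            values[key.strip()] = float(value)
        if "mean" in values:
            mean_af = values["mean"]  # binomial cluster
        else:
            # Mean of a beta distribution with parameters alpha and beta.
            mean_af = values["alpha"] / (values["alpha"] + values["beta"])
        summary.append((name, values["weight"], mean_af))
    return summary


def weight_below(summary, max_af):
    """Total weight of clusters whose mean allele fraction is below max_af."""
    return sum(weight for _, weight, mean_af in summary if mean_af < max_af)


# Three lines copied from the Batch1 stats posted above.
batch1_lines = """\
#<METADATA>Background beta-binomial cluster=weight = 0.0485, alpha = 2.69, beta = 5.29
#<METADATA>Binomial cluster=weight = 0.8798, mean = 0.059
#<METADATA>threshold=0.544
"""
for name, weight, mean_af in cluster_summary(batch1_lines):
    print(f"{name}\tweight={weight:.4f}\tmean AF={mean_af:.3f}")
```

Running it on the full stats file of each batch puts the weight and mean allele fraction of every cluster next to each other, which makes the difference in allele-fraction distributions easier to see.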
I hope this helps.
Regards.