Mutect2 - Less somatic calls with WGS sample when compared with WES
AnsweredDear GATK Team,
I am using GATK 4.1.8.1 Mutect2 to call somatic variants in both Whole genome & Whole Exome Tumor with matched normal data from same individual.
Mutect2 calling is resticted to Agilent V6 target intervals in both WGS & WES.
Preprocessing is done on bam files prior to somatic call as per GATK best practices.
Since i have matched normal i ran only Mutect2 and FilterMutectCalls steps.
Here are the commands i used
gatk Mutect2 -R $ref -I tumorbam -I normalbam -normal testN -L Agilent_V6.bed --germline-resource af-only-gnomad.raw.sites.vcf -pon somatic-b37_Mutect2-WGS-panel-b37.vcf -O test.unfiltered.vcf
gatk FilterMutectCalls -V test.unfiltered.vcf -R hs37d5.fa -O test.filtered.vcf.gz --filtering-stats test.unfiltered.vcf.stats
And Coverage of WES Normal & Tumor is 140X and WGS Normal (30X) and WGS Tumor (140X).
I found huge difference in PASS calls in WES (13544 variants) & WGS (382 variants). Even though WGS Tumor sample has high coverage at exome regions I am not sure why Mutect2 is not able to call somatic calls that WES called.
I did the same run with Mutect (v1.1.7) and mutect also has less calls in WGS when compared with WES,
Could you please let me know whether Mutect2 is able to handle WGS Tumor/Normal data.?
Thanks In Advance
Fazulur Rehaman
-
Hi Vempalli Fazulur,
Yes, Mutect2 is able to handle WGS tumor/normal data. There can be many different reasons why a variant passes filtering or not.
Please take a look at this resource on why Mutect2 may call a variant, and troubleshooting steps you can take while looking into certain sites.
I am going to move your post into our Community Discussions -> General Discussion topic. Please take a look and try out those troubleshooting tips, and then let us know if you find specific examples where Mutect2 is not acting as expected and we can look into possible issues. Here is an explanation of the information we need. You can also look around the forum for other users with similar questions and how they solved them as well as our documentation about Mutect2.
Best,
Genevieve
-
Vempalli Fazulur did any of those resources help to answer your question, please let us know what you find.
Genevieve
-
Dear Genevieve,
Thanks a lot for providing the resources to troubleshoot the issue.
I tried running Mutect2 with options --debug, --linked-de-bruijn-graph, --bam-output & --recover-all-dangling-branches and there is no change in output.
In GVCF mode, Here is an example record from WGS where it shows as homozygous reference in both normal & tumor but in WES it is called as a variant.
WGS Normal:
12 109639292 . G <NON_REF> . . END=109639297 GT:DP:MIN_DP:TLOD 0/0:41:40:-1.633e+00
WGS Tumor:
12 109639296 . C <NON_REF> . . END=109639296 GT:DP:MIN_DP:TLOD 0/0:101:101:-3.589e+00
WES Tumor/Normal:
12 109639296 . C A . PASS AS_FilterStatus=SITE;AS_SB_TABLE=109,64|4,4;DP=184;ECNT=1;GERMQ=93;MBQ=20,20;MFRL=174,170;MMQ=60,60;MPOS=31;NALOD=1.87;NLOD=21.36;POPAF=6;TLOD=12.08 GT:AD:AF:DP:F1R2:F2R1:SB 0/1:67,8:0.098:75:36,0:30,8:41,26,4,4 0/0:106,0:0.013:106:42,0:64,0:68,38,0,0
I Added --debug-graph-transformations option to Mutect2 command to generate .dot files for WGS normal & tumor separately on below three regions (WES sample called as variants and WGS no call).
12:109639296-109639296
15:51201293-51201293
18:76873311-76873311
Here is the command i used
gatk Mutect2 -R hs37d5.fa -I test-tumor.bam -L wgsfailed-intervals.bed -germline-resource af-only-gnomad.raw.sites.vcf -pon somatic-b37_Mutect2-WGS-panel-b37.vcf -O test-tumor.unfiltered.vcf --debug-graph-transformations true --emit-ref-confidence GVCF --bam-output test-tumor.bam
I uploaded test data (normal & tumor WGS .dot files) to gatk ftp (https://gatk.broadinstitute.org/hc/en-us/articles/360035889671) with name "Mutect2_WGS_Testdata.zip" archive.
Kindly let us know how can i proceed further to resolve this issue with WGS T/N data.
Thanks In Advance
Fazulur Rehaman
-
Hi Vempalli Fazulur,
Please follow the instructions for our forum, do not upload test data unless you have explicitly asked to do so. There are a few more steps to look at before looking at the .dot files.
Could you show IGV screenshots showing the -bamout of the area where your WGS shows a reference block and the WES shows a variant? Also please have the input bam shown.
Genevieve
-
Dear Genevieve,
Sorry for uploading the testdata and thanks for looking into this.
As per your suggestion, I generated igvscreenshots of bamout files for regions 12:109639296 & 15:51201293 from Whole genome & Whole exome normal & tumor bam files.
Here is the first region below attached IGV sceeenshot. In this WES tumor has a variant and WGS is showing as reference.

Below is 2nd screenshot, Actual bam for the above 2 regions have reads at position 15:51201293 in normal whole exome, normal whole genome & tumor whole genome sample. but bamout showing only reads for whole exome tumor. I am not sure why it is not showing.

Kindly check once & let me know how can i proceed further to resolve this.
Thanks In Advance
Fazulur Rehaman
-
Hi Vempalli Fazulur,
Could you show what these sites look like in the input BAMs in IGV?
Thanks,
Genevieve
-
Dear Genevieve,
Please find below the igv screenshots of both positions from input bam files.
1. 12:109639296

2. 15:51201293

Thanks In Advance
Fazulur Rehaman
-
Hi Vempalli Fazulur,
I don't see any evidence that these sites in the WGS samples should be called as variant sites. In the GVCF, they span reference block regions with good coverage:
WGS Normal:
12 109639292 . G <NON_REF> . . END=109639297 GT:DP:MIN_DP:TLOD 0/0:41:40:-1.633e+00
WGS Tumor:
12 109639296 . C <NON_REF> . . END=109639296 GT:DP:MIN_DP:TLOD 0/0:101:101:-3.589e+00
And, in the images you shared, there are no reads that support a variant site. It doesn't look like there are issues with GATK here because there is not enough evidence to support a variant site, even though there is evidence in your other WES samples. There are more factors than the variant calling algorithm that can cause these differences, for example library prep, sequencing methods, or quality control. It is hard to know exactly why you are seeing this.
One other thing you could try would be to run your WGS without the Agilent V6 target intervals. You are going to lose some of your reads and depth by using these intervals, and it could be changing these results. GATK does local re-assembly, and if you are removing reads with the intervals option, this reassembly could be less successful, especially on the edge of the intervals.
Best,
Genevieve
-
Dear Genevieve,
Sorry for late response and thanks for your suggestion on running WGS without intervals. I tried this way and got more variants but not the above ones which I got with whole exome.
Thanks In Advance
Fazulur Rehaman
-
Hi Vempalli Fazulur,
It may be the case that there are not any reads supporting the variant allele with your WGS sample, which would indicate an issue other than GATK.
Best,
Genevieve
-
Dear Genevieve,
Thanks for your quick response. Yes, I agree that there are not enough reads to support variant allele in our WGS sample.
Thanks & Regards
Fazulur Rehaman
Please sign in to leave a comment.
11 comments