Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data


Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Mutect2 - Fewer somatic calls with WGS sample when compared with WES

Answered
11 comments

  • Genevieve Brandt (she/her)

    Hi Vempalli Fazulur,

    Yes, Mutect2 is able to handle WGS tumor/normal data. There can be many different reasons why a variant passes filtering or not. 

    Please take a look at this resource on why Mutect2 may call a variant, and troubleshooting steps you can take while looking into certain sites.

    I am going to move your post into our Community Discussions -> General Discussion topic. Please take a look and try out those troubleshooting tips, and then let us know if you find specific examples where Mutect2 is not acting as expected and we can look into possible issues. Here is an explanation of the information we need. You can also look around the forum for other users with similar questions and how they solved them as well as our documentation about Mutect2.

    Best,

    Genevieve

     

  • Genevieve Brandt (she/her)

    Vempalli Fazulur, did any of those resources help answer your question? Please let us know what you find.

    Genevieve

  • Vempalli Fazulur

    Dear Genevieve,

    Thanks a lot for providing the resources to troubleshoot the issue.

    I tried running Mutect2 with the options --debug, --linked-de-bruijn-graph, --bam-output and --recover-all-dangling-branches, but the output did not change.

    In GVCF mode, here is an example record from WGS that shows as homozygous reference in both normal and tumor, whereas in WES the same site is called as a variant.

    WGS Normal:

    12      109639292       .       G       <NON_REF>       .       .       END=109639297   GT:DP:MIN_DP:TLOD       0/0:41:40:-1.633e+00

    WGS Tumor:

    12      109639296       .       C       <NON_REF>       .       .       END=109639296   GT:DP:MIN_DP:TLOD       0/0:101:101:-3.589e+00

    WES  Tumor/Normal:

    12 109639296 . C A . PASS AS_FilterStatus=SITE;AS_SB_TABLE=109,64|4,4;DP=184;ECNT=1;GERMQ=93;MBQ=20,20;MFRL=174,170;MMQ=60,60;MPOS=31;NALOD=1.87;NLOD=21.36;POPAF=6;TLOD=12.08 GT:AD:AF:DP:F1R2:F2R1:SB 0/1:67,8:0.098:75:36,0:30,8:41,26,4,4 0/0:106,0:0.013:106:42,0:64,0:68,38,0,0
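To read the WES record above, it can help to split the per-sample fields and compare allele depths directly. The following is a minimal Python sketch, not part of GATK. The helper names `sample_fields` and `alt_support` are hypothetical, and the sketch assumes the first sample column is the tumor and the second is the normal, which the 0/1 and 0/0 genotypes suggest but the record alone does not confirm.

```python
def sample_fields(format_keys: str, sample_column: str) -> dict:
    """Pair FORMAT keys with one sample's colon-separated values."""
    return dict(zip(format_keys.split(":"), sample_column.split(":")))


def alt_support(fields: dict) -> tuple:
    """Return (alt depth, ref + alt depth, raw alt fraction) from the AD field."""
    ref_depth, alt_depth = (int(x) for x in fields["AD"].split(","))
    total = ref_depth + alt_depth
    fraction = alt_depth / total if total else 0.0
    return alt_depth, total, fraction


# FORMAT string and sample columns copied from the WES record above.
FORMAT = "GT:AD:AF:DP:F1R2:F2R1:SB"
first = sample_fields(FORMAT, "0/1:67,8:0.098:75:36,0:30,8:41,26,4,4")
second = sample_fields(FORMAT, "0/0:106,0:0.013:106:42,0:64,0:68,38,0,0")

alt_first = alt_support(first)    # assumed tumor: 8 alt reads out of 75
alt_second = alt_support(second)  # assumed normal: 0 alt reads out of 106

print(alt_first)
print(alt_second)
```

The raw fraction computed here is simply alt depth over total depth. The AF value Mutect2 reports (0.098 in this record) is its own estimate and need not equal that ratio.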

    I added the --debug-graph-transformations option to the Mutect2 command to generate .dot files for the WGS normal and tumor separately over the three regions below (called as variants in the WES sample, with no call in WGS).

    12:109639296-109639296

    15:51201293-51201293

    18:76873311-76873311
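If sites like these are written to a BED file for -L (as with the wgsfailed-intervals.bed file used in this post), the coordinate convention matters: BED uses a 0-based start and an exclusive end, while the sites above read as 1-based inclusive positions, like VCF POS. The Python sketch below shows the conversion under that assumption. The function name `site_to_bed` is hypothetical.

```python
def site_to_bed(site: str) -> str:
    """Convert a 1-based inclusive 'contig:start-end' site to a BED line.

    BED stores a 0-based start and an exclusive end, so only the start shifts by one.
    """
    contig, span = site.split(":")
    start, end = (int(x) for x in span.split("-"))
    return f"{contig}\t{start - 1}\t{end}"


# The three sites listed above, assumed to be 1-based inclusive.
sites = [
    "12:109639296-109639296",
    "15:51201293-51201293",
    "18:76873311-76873311",
]
bed_lines = [site_to_bed(s) for s in sites]
print("\n".join(bed_lines))
```

A single-base site therefore becomes a BED row whose end is exactly one greater than its start. A row with equal start and end would describe an empty interval.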

    Here is the command I used:

    gatk Mutect2 -R hs37d5.fa -I test-tumor.bam -L wgsfailed-intervals.bed -germline-resource af-only-gnomad.raw.sites.vcf -pon somatic-b37_Mutect2-WGS-panel-b37.vcf -O test-tumor.unfiltered.vcf --debug-graph-transformations true --emit-ref-confidence GVCF --bam-output test-tumor.bam

    I uploaded the test data (normal and tumor WGS .dot files) to the GATK FTP (https://gatk.broadinstitute.org/hc/en-us/articles/360035889671) as the archive "Mutect2_WGS_Testdata.zip".

    Kindly let us know how I can proceed further to resolve this issue with the WGS T/N data.

    Thanks in advance

    Fazulur Rehaman

  • Genevieve Brandt (she/her)

    Hi Vempalli Fazulur,

    Please follow the instructions for our forum: do not upload test data unless you have been explicitly asked to do so. There are a few more steps to work through before looking at the .dot files.

    Could you share IGV screenshots of the -bamout for the area where your WGS shows a reference block and the WES shows a variant? Please also show the input BAM.

    Genevieve

  • Vempalli Fazulur

    Dear Genevieve,

    Sorry for uploading the test data, and thanks for looking into this.

    As per your suggestion, I generated IGV screenshots of the bamout files for regions 12:109639296 and 15:51201293 from the whole genome and whole exome normal and tumor BAM files.

    The IGV screenshot for the first region is attached below. Here the WES tumor has a variant and the WGS shows reference.

    The second screenshot is below. The actual BAMs have reads at position 15:51201293 in the normal whole exome, normal whole genome and tumor whole genome samples, but the bamout shows reads only for the whole exome tumor. I am not sure why the others are not shown.

    Kindly check and let me know how I can proceed further to resolve this.

    Thanks in advance

    Fazulur Rehaman

  • Genevieve Brandt (she/her)

    Hi Vempalli Fazulur,

    Could you show what these sites look like in the input BAMs in IGV? 

    Thanks,

    Genevieve

  • Vempalli Fazulur

    Dear Genevieve,

    Please find below the IGV screenshots of both positions from the input BAM files.

    1. 12:109639296

    2. 15:51201293

    Thanks In Advance

    Fazulur Rehaman

  • Genevieve Brandt (she/her)

    Hi Vempalli Fazulur,

    I don't see any evidence that these sites in the WGS samples should be called as variant sites. In the GVCF, they span reference block regions with good coverage:

    WGS Normal:

    12      109639292       .       G       <NON_REF>       .       .       END=109639297   GT:DP:MIN_DP:TLOD       0/0:41:40:-1.633e+00

    WGS Tumor:

    12      109639296       .       C       <NON_REF>       .       .       END=109639296   GT:DP:MIN_DP:TLOD       0/0:101:101:-3.589e+00

    And, in the images you shared, there are no reads that support a variant site. It doesn't look like there are issues with GATK here because there is not enough evidence to support a variant site, even though there is evidence in your other WES samples. There are more factors than the variant calling algorithm that can cause these differences, for example library prep, sequencing methods, or quality control. It is hard to know exactly why you are seeing this.
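The reference-block reading described above can be checked mechanically. The Python sketch below parses the two WGS GVCF records as they were first posted earlier in the thread and confirms that each block spans 12:109639296 with a 0/0 genotype. The helper names `parse_ref_block` and `covers` are hypothetical and this is not a GATK utility. A negative TLOD is generally read as the data favoring the reference over a variant, which is the interpretation assumed here.

```python
import re


def parse_ref_block(record: str) -> dict:
    """Parse a single-sample GVCF reference-block line into a flat dict."""
    cols = record.split()
    contig, pos, info, fmt, sample = cols[0], int(cols[1]), cols[7], cols[8], cols[9]
    end = int(re.search(r"END=(\d+)", info).group(1))
    fields = dict(zip(fmt.split(":"), sample.split(":")))
    return {"contig": contig, "start": pos, "end": end, **fields}


def covers(block: dict, contig: str, pos: int) -> bool:
    """True if the 1-based position falls inside the block (inclusive ends)."""
    return block["contig"] == contig and block["start"] <= pos <= block["end"]


# Records copied from the thread (whitespace-separated).
normal = parse_ref_block(
    "12 109639292 . G <NON_REF> . . END=109639297 GT:DP:MIN_DP:TLOD 0/0:41:40:-1.633e+00"
)
tumor = parse_ref_block(
    "12 109639296 . C <NON_REF> . . END=109639296 GT:DP:MIN_DP:TLOD 0/0:101:101:-3.589e+00"
)

site = ("12", 109639296)
for name, block in (("normal", normal), ("tumor", tumor)):
    print(name, covers(block, *site), block["GT"], int(block["DP"]), float(block["TLOD"]))
```

Both blocks cover the site with depths of 41 and 101, so the absence of a call is not explained by missing coverage at this position.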

    One other thing you could try would be to run your WGS without the Agilent V6 target intervals. You are going to lose some of your reads and depth by using these intervals, and it could be changing these results. GATK does local re-assembly, and if you are removing reads with the intervals option, this reassembly could be less successful, especially on the edge of the intervals.
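To see whether a site sits close to an interval edge, where reassembly may have fewer reads to work with, one can compute its distance to the nearer boundary. The Python sketch below is illustrative only. The target interval coordinates are hypothetical (the actual Agilent V6 targets are not given in this thread), and the function names are made up for the example. GATK also offers an --interval-padding argument that widens each interval, and the `pad` helper mimics that effect.

```python
def distance_to_edge(pos: int, start: int, end: int) -> int:
    """Bases from a 1-based position to the nearer end of an inclusive interval."""
    if not start <= pos <= end:
        raise ValueError("position lies outside the interval")
    return min(pos - start, end - pos)


def pad(start: int, end: int, padding: int) -> tuple:
    """Widen an interval by `padding` bases on each side, clamping the start at 1."""
    return max(1, start - padding), end + padding


# Hypothetical capture target around the first site discussed in this thread.
target = (109639250, 109639300)
site = 109639296

edge = distance_to_edge(site, *target)
padded = pad(*target, 100)
edge_padded = distance_to_edge(site, *padded)

print(edge, padded, edge_padded)
```

With these made-up coordinates the site is 4 bases from the target edge, and 104 bases from the edge after 100 bases of padding. Running without intervals, as suggested above, removes the edge effect entirely.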

    Best,

    Genevieve

  • Vempalli Fazulur

    Dear Genevieve,

    Sorry for the late response, and thanks for your suggestion to run WGS without intervals. I tried this and got more variants, but not the ones above that I got with whole exome.

    Thanks in advance

    Fazulur Rehaman

  • Genevieve Brandt (she/her)

    Hi Vempalli Fazulur,

    It may be that there are no reads supporting the variant allele in your WGS sample, which would point to a cause other than GATK.

    Best,

    Genevieve

  • Vempalli Fazulur

    Dear Genevieve,

    Thanks for your quick response. Yes, I agree that there are not enough reads to support the variant allele in our WGS sample.

    Thanks & Regards

    Fazulur Rehaman
