Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data


Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Reducing false positives in somatic variant calling


5 comments

  • Gökalp Çelik

    Hi Fia

    Unless you are performing matched tumor-normal calling, your variants will always contain more false positives. You can try to reduce them to some extent by adjusting the minimum AF to match the tumor purity; however, this may still be superficial compared to what an actual matched normal can provide.

    For best results with Mutect2, our team suggests using the PoN we created, the gnomAD AF-only resource, read orientation metrics, and a matched normal if one is available.

    I hope this helps. 
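
As a rough sketch of how the inputs suggested above (the PoN, the gnomAD AF-only resource, read orientation metrics, and a matched normal) might be combined, the shell function below chains Mutect2, LearnReadOrientationModel, and FilterMutectCalls. The function name, argument order, and output file names are hypothetical, and the `GATK` variable is only an assumed convenience for pointing at a specific launcher. Other steps, such as contamination estimation, are left out for brevity.

```shell
# Sketch only: function name, argument order, and output names are assumptions.
# Usage: run_mutect2_with_resources REF TUMOR_BAM NORMAL_BAM NORMAL_SAMPLE PON GNOMAD OUT_PREFIX
run_mutect2_with_resources() (
    set -eu
    if [ "$#" -ne 7 ]; then
        echo "expected 7 arguments, got $#" >&2
        exit 1
    fi
    ref=$1; tumor_bam=$2; normal_bam=$3; normal_sample=$4
    pon=$5; gnomad=$6; prefix=$7
    gatk_cmd=${GATK:-gatk}   # set GATK to use a different launcher

    # 1. Call variants with the matched normal, PoN, and germline resource,
    #    collecting F1R2 counts for the read orientation model.
    "$gatk_cmd" Mutect2 \
        -R "$ref" \
        -I "$tumor_bam" \
        -I "$normal_bam" \
        -normal "$normal_sample" \
        --panel-of-normals "$pon" \
        --germline-resource "$gnomad" \
        --f1r2-tar-gz "$prefix.f1r2.tar.gz" \
        -O "$prefix.unfiltered.vcf.gz"

    # 2. Learn the read orientation artifact priors from the F1R2 counts.
    "$gatk_cmd" LearnReadOrientationModel \
        -I "$prefix.f1r2.tar.gz" \
        -O "$prefix.read-orientation-model.tar.gz"

    # 3. Filter the raw calls, passing in the learned priors.
    "$gatk_cmd" FilterMutectCalls \
        -R "$ref" \
        -V "$prefix.unfiltered.vcf.gz" \
        --orientation-bias-artifact-priors "$prefix.read-orientation-model.tar.gz" \
        -O "$prefix.filtered.vcf.gz"
)
```

Running it with `GATK=echo` prints the three commands instead of executing them, which is a cheap way to check the arguments before committing compute time.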

  • Fia

    Thank you for the feedback. I have been doing a lot of reading to figure out how to filter out false positives. One thing that came up while I was reading was the formatting of the arguments for Mutect2. I saw a website suggest that the names of the tumor and normal samples must be specified directly after the respective input BAM, or else the results will be affected, but I didn't see this stated explicitly on the GATK website. I ran both of the scripts below and got different total numbers of output variants. Can you please confirm which is correct?

    Additionally, I believe the PoN that GATK provides is derived from blood samples, but I couldn't find which sequencing platform was used. Was it Illumina?

    Many thanks. 

    gatk --java-options "-Xmx${command_mem}m" Mutect2 \
                -R ${ref_fasta} \
                -I ${tumor_bam} \
                -I ${normal_bam} \
                -normal B_111_1111 \
                -tumor M_111_1111 \
                ${"--germline-resource " + gnomad} \
                ${"-L " + intervals} \
                -O "${output_vcf}" \
                -bamout bamout.bam \
                ${true='--f1r2-tar-gz f1r2.tar.gz' false='' run_ob_filter} \
                -pairHMM AVX_LOGLESS_CACHING \
                --native-pair-hmm-threads 1 \
                --smith-waterman AVX_ENABLED \
                ${m2_extra_args}
     
    or 
     
    gatk --java-options "-Xmx${command_mem}m" Mutect2 \
                -R ${ref_fasta} \
                -I ${tumor_bam} \
                -tumor M_111_1111 \
                -I ${normal_bam} \
                -normal B_111_1111 \
                ${"--germline-resource " + gnomad} \
                ${"-L " + intervals} \
                -O "${output_vcf}" \
                -bamout bamout.bam \
                ${true='--f1r2-tar-gz f1r2.tar.gz' false='' run_ob_filter} \
                -pairHMM AVX_LOGLESS_CACHING \
                --native-pair-hmm-threads 1 \
                --smith-waterman AVX_ENABLED \
                ${m2_extra_args}

     

  • David Benjamin

    @Fia It is totally normal for FilterMutectCalls to filter 90% or more of Mutect2 variant calls.  Mutect2 is designed to be very permissive and naive, leaving almost all the responsibility for filtering to FilterMutectCalls.

    Our public PoNs are derived from Illumina sequencing and work pretty well for all Illumina samples, regardless of tissue type etc.

    The order of arguments is irrelevant in the GATK, so your two commands really shouldn't differ.  In any case, though, the -tumor argument is unnecessary and deprecated.  Mutect2 only uses the -normal argument and assumes everything else is a tumor.
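
To illustrate the point about `-tumor`, here is a minimal sketch of a tumor-normal call that passes only `-normal`, with the normal sample name read from the BAM header rather than typed by hand. Both function names are hypothetical, the sketch assumes `samtools` is installed, and it assumes the normal BAM holds exactly one sample.

```shell
# Print the unique SM (sample name) values found in the @RG lines of a
# SAM/BAM header supplied on stdin.
sample_names_from_header() {
    grep '^@RG' | tr '\t' '\n' | sed -n 's/^SM://p' | sort -u
}

# Hypothetical wrapper: tumor-normal calling that passes only -normal.
# Assumes the normal BAM contains a single sample.
# Usage: call_tumor_normal REF TUMOR_BAM NORMAL_BAM OUT_VCF
call_tumor_normal() (
    set -eu
    normal_sample=$(samtools view -H "$3" | sample_names_from_header)
    "${GATK:-gatk}" Mutect2 \
        -R "$1" \
        -I "$2" \
        -I "$3" \
        -normal "$normal_sample" \
        -O "$4"
)
```

The `-normal` value has to match the SM tag in the normal BAM's read groups, so reading it from the header avoids a mismatch between the file and a hand-typed name.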

  • Michelle

    Hello everyone,

    I'm currently working with exome sequencing data in tumor-only mode using GATK's Mutect2 and FilterMutectCalls. I'm facing an issue where over 90% of the variants are being filtered out by FilterMutectCalls, leaving only about 3% of the variants with the PASS flag.

    I'm wondering why this is happening and what might be causing such a high filtering rate. Additionally, which filters could be adjusted or relaxed to retain more variants without compromising the integrity of the results? Any suggestions or insights would be greatly appreciated.

    Thank you!

  • Gökalp Çelik

    Hi Michelle

    Mutect2 and FilterMutectCalls apply a number of filters. They are explained in detail in the document below.

    https://github.com/broadinstitute/gatk/blob/master/docs/mutect/mutect.pdf 

    Each of these filters has its own model, and the combination of all of them causes many of the calls to be filtered out. Mutect2 is a quite sensitive caller, so any small change that shows up as a variation in the final assembly will appear in the raw calls. FilterMutectCalls evaluates each of those filters at every site, so a non-PASS site ends up annotated with the combination of filters it failed.

    Since you are using the tumor-only approach, our suggestion would be to use the Panel of Normals and germline resource we provide as supplementary filters, so that you do not capture artifacts and possible germline events as somatic variation. Of course, having a matched normal is the best approach.

    For allele fraction filtering, you need to know the fraction of tumor cells in your sample, which can help you decide whether to remove or include more variants in your data.

    The clustered events filter can be adjusted to include or exclude more variants; however, you need to make sure that your known valid variants or false positives are not adversely affected by the change. Our defaults are usually quite balanced for many of these filters.

    I hope this helps. 
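
Before relaxing any filter, it can help to see which ones account for most of the non-PASS sites. The following is a minimal sketch that tallies the FILTER column of the FilterMutectCalls output; the function name is hypothetical, and it assumes a standard tab-delimited VCF with semicolon-separated filter names.

```shell
# Count how often each filter name occurs in the FILTER column (column 7)
# of an uncompressed VCF read from stdin. Sites carrying several filters
# (for example "germline;panel_of_normals") contribute to each of them.
# Output: "<filter name><TAB><count>", most frequent first.
tally_filters() {
    tab=$(printf '\t')
    awk -F '\t' '
        !/^#/ {
            n = split($7, names, ";")
            for (i = 1; i <= n; i++) count[names[i]]++
        }
        END {
            for (name in count) printf "%s\t%d\n", name, count[name]
        }
    ' | LC_ALL=C sort -t "$tab" -k2,2nr -k1,1
}
```

A typical call would be `gunzip -c filtered.vcf.gz | tally_filters`, where `filtered.vcf.gz` is a placeholder for your own output file. Because a site can fail several filters at once, the counts can add up to more than the number of sites.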

