Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Reducing false positives in somatic variant calling

3 comments

  • Gökalp Çelik

    Hi Fia

    Unless you are performing matched tumor-normal calling, your variants will always contain more false positives. You may be able to reduce them somewhat by adjusting the minimum AF to match tumor purity levels; however, this may still be superficial compared to what an actual matched normal can provide.

    Our team suggests using the PoN we created, the gnomAD AF-only resource, read orientation metrics, and, if possible, a matched normal for best results with Mutect2.

    I hope this helps. 
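The recommendation above (PoN, gnomAD AF-only resource, read orientation metrics, matched normal) could be wired together roughly as follows. This is a minimal sketch, not an official pipeline: the function name `call_and_filter`, every file name, and the output prefix are hypothetical placeholders, and it assumes `gatk` is on the `PATH`. The value given to `-normal` is assumed to be the sample name from the normal BAM's read group (the `SM` tag), not the file name.

```shell
# Sketch: Mutect2 with a PoN, a germline resource, and read orientation
# filtering. All paths and names passed in are hypothetical placeholders.
call_and_filter() {
    local ref="$1" tumor_bam="$2" normal_bam="$3" normal_name="$4"
    local pon="$5" gnomad="$6" prefix="$7"

    # 1. Call candidate variants and collect F1R2 counts for the
    #    read orientation model.
    gatk Mutect2 \
        -R "$ref" \
        -I "$tumor_bam" \
        -I "$normal_bam" \
        -normal "$normal_name" \
        --panel-of-normals "$pon" \
        --germline-resource "$gnomad" \
        --f1r2-tar-gz "${prefix}.f1r2.tar.gz" \
        -O "${prefix}.unfiltered.vcf.gz" || return 1

    # 2. Learn the read orientation model from the F1R2 counts.
    gatk LearnReadOrientationModel \
        -I "${prefix}.f1r2.tar.gz" \
        -O "${prefix}.read-orientation-model.tar.gz" || return 1

    # 3. Filter the raw calls, passing in the orientation bias priors.
    gatk FilterMutectCalls \
        -R "$ref" \
        -V "${prefix}.unfiltered.vcf.gz" \
        --orientation-bias-artifact-priors "${prefix}.read-orientation-model.tar.gz" \
        -O "${prefix}.filtered.vcf.gz"
}
```

A call might look like `call_and_filter ref.fasta tumor.bam normal.bam NORMAL_SAMPLE pon.vcf.gz af-only-gnomad.vcf.gz sample1`, with each argument replaced by your own files. Contamination estimation is left out of this sketch for brevity.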

  • Fia

    Thank you for the feedback. I have been doing a lot of reading to figure out how to filter out false positives. One thing that came up was the formatting of the arguments for Mutect2. One website suggested that the tumor and normal sample names must be specified immediately after their respective input BAMs, or the results will be affected, but I didn't see this stated explicitly on the GATK website. I ran both of the scripts below and got different total numbers of output variants. Can you please confirm which is correct?

    Additionally, I believe the PoN that GATK provides is derived from blood samples, but I couldn't find which sequencing platform was used. Was it Illumina?

    Many thanks. 

    gatk --java-options "-Xmx${command_mem}m" Mutect2 \
                -R ${ref_fasta} \
                -I ${tumor_bam} \
                -I ${normal_bam} \
                -normal B_111_1111 \
                -tumor M_111_1111 \
                ${"--germline-resource " + gnomad} \
                ${"-L " + intervals} \
                -O "${output_vcf}" \
                -bamout bamout.bam \
                ${true='--f1r2-tar-gz f1r2.tar.gz' false='' run_ob_filter} \
                -pairHMM AVX_LOGLESS_CACHING \
                --native-pair-hmm-threads 1 \
                --smith-waterman AVX_ENABLED \
                ${m2_extra_args}
     
    or 
     
    gatk --java-options "-Xmx${command_mem}m" Mutect2 \
                -R ${ref_fasta} \
                -I ${tumor_bam} \
                -tumor M_111_1111 \
                -I ${normal_bam} \
                -normal B_111_1111 \
                ${"--germline-resource " + gnomad} \
                ${"-L " + intervals} \
                -O "${output_vcf}" \
                -bamout bamout.bam \
                ${true='--f1r2-tar-gz f1r2.tar.gz' false='' run_ob_filter} \
                -pairHMM AVX_LOGLESS_CACHING \
                --native-pair-hmm-threads 1 \
                --smith-waterman AVX_ENABLED \
                ${m2_extra_args}

     

  • David Benjamin

    @Fia It is totally normal for FilterMutectCalls to filter 90% or more of Mutect2 variant calls.  Mutect2 is designed to be very permissive and naive, leaving almost all the responsibility for filtering to FilterMutectCalls.

    Our public PoNs are derived from Illumina sequencing and work pretty well for all Illumina samples, regardless of tissue type etc.

    The order of arguments is irrelevant in the GATK, so your two commands really shouldn't differ. In any case, though, the -tumor argument is unnecessary and deprecated. Mutect2 only uses the -normal argument and assumes every other sample is a tumor.
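To see how much FilterMutectCalls removes, or to compare the outputs of two runs, it can help to count total and PASS records in each filtered VCF. The helper below is a minimal sketch: the name `count_pass` is hypothetical, and it only assumes the standard VCF layout in which the seventh tab-separated column is FILTER.

```shell
# Sketch: count all records and PASS records in an uncompressed VCF.
# Reads the file given as an argument, or standard input if none is given.
count_pass() {
    awk -F'\t' '
        !/^#/ { total++; if ($7 == "PASS") pass++ }   # skip header lines
        END   { printf "total=%d pass=%d\n", total, pass }
    ' "$@"
}
```

For a compressed file, something like `gzip -dc filtered.vcf.gz | count_pass` should work. Running it on the filtered output of both commands is a quick way to check whether they really differ.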
