Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Fake panel of normals performs better than the publicly available one when calling variants in tumor-only samples

  • David Benjamin

    Could you share your Mutect2 and FilterMutectCalls commands and describe what you are using for the -germline-resource argument?  Also could you describe your PCR validation and how your samples were sequenced?

  • mariesmith

    Dear GATK team,

    Thanks for your response.
     
    We did not include a germline resource as we are interested in detecting all possible variants. The commands used are the following:
     
    gatk Mutect2 \
        --output "sample_mutect2.vcf.gz" \
        --verbosity ERROR \
        --tumor-sample "sample.bam" \
        --reference "human_g1k_v37.fasta" \
        --input "sample.bam" \
        --panel-of-normals "pon.vcf.gz" \
        --intervals "panel_custom.bed"
     
    gatk FilterMutectCalls \
        --output "sample_mutect2_filtered.vcf.gz" \
        --reference "human_g1k_v37.fasta" \
        --variant "sample_mutect2.vcf.gz"
     
    The samples are custom amplicon panels sequenced on NextSeq machines, and the variants were validated using Droplet Digital PCR.
     
    Thanks in advance,
     
    Marie
     
  • David Benjamin

    A few thoughts...

    1.  Although the panel of normals specializes in artifacts, it does also contain some germline variants.  Have you verified that the missed calls are not germline variants?

    2.  What are the probe sizes of your PCR validation?  Are they, combined with the mappability of your targeted variants, sufficient to rule out mapping error?  The panel of normals is full of apparently good variants that turn out to be alignment artifacts, although I'm not presuming that this is true in our case.

    3.  Do you have a sense of how many false positives result from using the fake panel?  If you essentially turn off the panel of normals, sensitivity will inevitably increase by some amount, but the question is at what price to precision.

    4.  If you want Mutect2's output to contain variants in the panel of normals (rather than silently excluding them from the VCF) you can turn on the -genotype-pon-sites argument.

    5.  Unrelated but important: NextSeq machines may exhibit a couple of sequencing artifacts that our read orientation filter catches.  Please see section II, subsection G of our documentation: https://github.com/broadinstitute/gatk/blob/master/docs/mutect/mutect.pdf.
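
Points 4 and 5 can be combined in a single tumor-only run. The following is a minimal sketch, assuming a GATK 4.1.1 or later release in which Mutect2 can write F1R2 counts itself. Input file names are reused from the commands earlier in this thread; the function name and the intermediate file names are hypothetical.

```shell
# Sketch only: tumor-only Mutect2 that keeps PoN sites in the VCF and
# enables the read orientation filter. The function name and the
# *_f1r2.tar.gz / *_ob_priors.tar.gz file names are hypothetical.
call_with_orientation_filter() {
    local sample="$1"   # file prefix, e.g. "sample" for sample.bam

    # 1. Call variants, genotype PoN sites instead of skipping them,
    #    and collect F1R2 counts for the orientation model.
    gatk Mutect2 \
        --reference "human_g1k_v37.fasta" \
        --input "${sample}.bam" \
        --panel-of-normals "pon.vcf.gz" \
        --genotype-pon-sites \
        --intervals "panel_custom.bed" \
        --f1r2-tar-gz "${sample}_f1r2.tar.gz" \
        --output "${sample}_mutect2.vcf.gz" || return 1

    # 2. Learn the read orientation artifact priors from those counts.
    gatk LearnReadOrientationModel \
        --input "${sample}_f1r2.tar.gz" \
        --output "${sample}_ob_priors.tar.gz" || return 1

    # 3. Filter, passing the learned priors to FilterMutectCalls.
    gatk FilterMutectCalls \
        --reference "human_g1k_v37.fasta" \
        --variant "${sample}_mutect2.vcf.gz" \
        --orientation-bias-artifact-priors "${sample}_ob_priors.tar.gz" \
        --output "${sample}_mutect2_filtered.vcf.gz"
}
```

Calling `call_with_orientation_filter sample` would run the three steps in order and stop at the first failure. Argument names can differ between GATK releases, so it is worth checking them against `gatk Mutect2 --help` and `gatk FilterMutectCalls --help` for the installed version.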

     

  • mariesmith

    Hello David, thanks for your response!


    Unfortunately, we have no proper way to check whether they are germline variants, as we do not have access to matched normal samples. We checked probe size and mappability, and the calls do not seem to be alignment artifacts.

    We are interested in reporting all possible variants (somatic and germline), although the number of artifacts detected as variants will certainly increase. We will rely on alternative methods to confirm that they are not artifacts. As mentioned, we cannot generate a panel of normals, so we decided to use your publicly available panel of normals with the recommended -genotype-pon-sites argument.

    Importantly, could you please tell us whether any other parameters affect the detection of all possible variants?


    Thanks for the information related to the read orientation filter, we will try to include it!


    Looking forward to your response!


    Marie

  • David Benjamin

    To get a general sense of whether these are germline variants, you could look at the allele fractions of filtered variants.  You could also intersect them with gnomAD.  You could also remove variants that appear in gnomAD from the panel using the GATK tool SelectVariants.
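
Two of these suggestions can be sketched with GATK tools. This is only an illustration under stated assumptions: all file names and both function names are hypothetical, the gnomAD VCF is assumed to be indexed and on the same reference build as the panel, and the filtered variants are assumed to be present in the VCF (which, for panel-of-normals sites, requires the -genotype-pon-sites argument mentioned earlier).

```shell
# Sketch only; function names and all file names are hypothetical.

# Write a copy of the panel of normals without the records that also
# appear in a gnomAD VCF, using SelectVariants' discordance track.
pon_without_gnomad() {
    local pon="$1" gnomad="$2" out="$3"
    gatk SelectVariants \
        --variant "$pon" \
        --discordance "$gnomad" \
        --output "$out"
}

# Tabulate the per-sample allele fraction (FORMAT/AF) next to the FILTER
# column, keeping filtered records so that they can be inspected.
allele_fraction_table() {
    local vcf="$1" out="$2"
    gatk VariantsToTable \
        --variant "$vcf" \
        -F CHROM -F POS -F REF -F ALT -F FILTER \
        -GF AF \
        --show-filtered \
        --output "$out"
}
```

For example, `pon_without_gnomad pon.vcf.gz gnomad.vcf.gz pon_no_gnomad.vcf.gz` followed by `allele_fraction_table sample_mutect2_filtered.vcf.gz sample_af.tsv`. In the resulting table, allele fractions clustering near 0.5 or 1.0 would be consistent with germline variants, although copy number changes and amplicon bias can shift these values.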

    I also want to warn you very strongly against relying only on probe size and mappability to be confident that variants are not alignment artifacts.  First, mappability is measured with respect to the known reference and thus gives a false sense of security in the face of reference gaps.  Second, there are individual-specific alignment artifacts due to structural variation.  Our panel of normals gives significant protection from mapping errors that otherwise look like very good variants.  Most validations are not very useful in this regard, but long reads (PacBio, of course, but also 10X and other technologies) are a very good complement to Illumina sequencing.

    If you want Mutect2 to emit as much as possible, you can adjust the -init-lod and -emit-lod parameters, but you will get a lot of garbage to sort through.  Even with default settings Mutect2 often outputs VCFs where maybe 1% of calls end up being good.
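
As a rough sketch of what adjusting these thresholds could look like, the function below passes both values through to Mutect2. The function name and the output file name are hypothetical, the other file names are reused from the commands earlier in this thread, and the values in the usage note are illustrative rather than recommended.

```shell
# Sketch only: lower the two LOD thresholds so that Mutect2 considers and
# emits more candidate sites. The function name and the "_sensitive"
# output name are hypothetical.
emit_more_candidates() {
    local sample="$1" init_lod="$2" emit_lod="$3"
    gatk Mutect2 \
        --reference "human_g1k_v37.fasta" \
        --input "${sample}.bam" \
        --panel-of-normals "pon.vcf.gz" \
        --intervals "panel_custom.bed" \
        -init-lod "$init_lod" \
        -emit-lod "$emit_lod" \
        --output "${sample}_mutect2_sensitive.vcf.gz"
}
```

For example, `emit_more_candidates sample 1.0 1.0`. The defaults for the installed version are listed by `gatk Mutect2 --help`; values below those defaults emit more sites, most of which would be expected to be artifacts.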

