Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

How to compensate for the absence of normals for a few samples?



  • David Benjamin

    All you can do is be honest about the fact that these 15 samples will have a lot of false positives. To give a rough idea, suppose you hard-filtered every variant in gnomAD, even singletons, from your tumor-only calls. You would still expect several tens of thousands of unique germline variants per genome to remain in your calls, with little recourse. If your tumor sample is very impure, you can distinguish somatic calls from germline ones by their allele fractions, which FilterMutectCalls tries to do, but this is usually not very effective.
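
As a rough illustration of the allele-fraction argument (not part of the original reply): in a copy-neutral diploid region, a clonal heterozygous somatic variant is carried on one of two copies in tumor cells only, so its expected allele fraction is about purity / 2, while a heterozygous germline variant sits near 0.5 in every cell. The sketch below assumes exactly that simplified model. It ignores copy-number changes, subclonality, and sampling noise, and it is not how FilterMutectCalls models the problem. The function name is hypothetical.

```python
def expected_somatic_af(purity: float) -> float:
    """Expected allele fraction of a clonal heterozygous somatic variant.

    Assumes a copy-neutral diploid region: only tumor cells carry the
    variant, and each carries it on one of two copies.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    return purity / 2.0


# A heterozygous germline variant is present in tumor and normal cells alike.
GERMLINE_HET_AF = 0.5

for purity in (0.9, 0.5, 0.2):
    somatic = expected_somatic_af(purity)
    gap = GERMLINE_HET_AF - somatic
    print(f"purity={purity:.1f}  somatic AF ~{somatic:.2f}  gap to germline het ~{gap:.2f}")
```

Under this model the gap between somatic and germline allele fractions grows as purity falls, which is why the approach only has a chance in impure samples.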

    Whatever you do, do not simulate normals. There is no point. Just use the panel of normals and the af-only gnomAD resource from the GATK best practices Google bucket.
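
A minimal sketch of what that advice could look like in practice: the helper below only assembles tumor-only `Mutect2` and `FilterMutectCalls` command lines, it does not run them. The function names and every file path are hypothetical placeholders. Substitute the reference, BAM, panel of normals, and af-only gnomAD files you actually downloaded, and check the options against the documentation for your GATK version.

```python
import shlex
from typing import List


def mutect2_tumor_only_cmd(reference: str, tumor_bam: str, pon_vcf: str,
                           germline_vcf: str, out_vcf: str) -> List[str]:
    """Build a tumor-only Mutect2 command (no matched normal is supplied)."""
    return [
        "gatk", "Mutect2",
        "-R", reference,
        "-I", tumor_bam,
        "--panel-of-normals", pon_vcf,         # panel of normals
        "--germline-resource", germline_vcf,   # af-only gnomAD
        "-O", out_vcf,
    ]


def filter_mutect_calls_cmd(reference: str, unfiltered_vcf: str,
                            out_vcf: str) -> List[str]:
    """Build the follow-up FilterMutectCalls command."""
    return ["gatk", "FilterMutectCalls",
            "-R", reference, "-V", unfiltered_vcf, "-O", out_vcf]


# Placeholder paths, for illustration only.
call = mutect2_tumor_only_cmd("reference.fasta", "tumor.bam", "pon.vcf.gz",
                              "af-only-gnomad.vcf.gz", "unfiltered.vcf.gz")
filt = filter_mutect_calls_cmd("reference.fasta", "unfiltered.vcf.gz",
                               "filtered.vcf.gz")
print(shlex.join(call))
print(shlex.join(filt))
```

The printed lines can be pasted into a shell or passed to `subprocess.run` once the paths point at real files.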

  • rohit satyam

    Hi David Benjamin

    Thanks for your prompt response. How about calling variants with several tools, such as DeepVariant, Mutect2, and VarScan, and then intersecting the VCF files to rule out false positives? Would that help? We are planning to do this for all the samples anyway.

    I would like to understand whether that would help those 15 samples.
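
To make the proposal concrete, here is a minimal sketch of what intersecting call sets means mechanically: each variant is keyed on (chromosome, position, REF, ALT), and only keys present in every call set are kept. The function names and the toy records are made up for illustration. The sketch assumes all callers report variants in the same normalized representation, and a real pipeline would more likely use a dedicated tool such as `bcftools isec`.

```python
from typing import Iterable, Set, Tuple

VariantKey = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def variant_keys(vcf_lines: Iterable[str]) -> Set[VariantKey]:
    """Collect one key per ALT allele from uncompressed VCF text lines."""
    keys: Set[VariantKey] = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        for alt in alts.split(","):  # split multi-allelic records
            keys.add((chrom, int(pos), ref, alt))
    return keys


def consensus(*callsets: Set[VariantKey]) -> Set[VariantKey]:
    """Keep only the variants reported in every call set."""
    return set.intersection(*callsets) if callsets else set()


# Toy records, invented for this example.
caller_a = ["#CHROM\tPOS\tID\tREF\tALT", "chr1\t100\t.\tA\tT",
            "chr1\t200\t.\tG\tC", "chr2\t50\t.\tC\tT,G"]
caller_b = ["chr1\t100\t.\tA\tT", "chr2\t50\t.\tC\tT"]
caller_c = ["chr1\t100\t.\tA\tT", "chr1\t200\t.\tG\tC", "chr2\t50\t.\tC\tT"]

shared = consensus(*(variant_keys(c) for c in (caller_a, caller_b, caller_c)))
print(sorted(shared))
```

By construction, a variant that every caller reports survives the intersection whether it is somatic or germline, so the sketch shows the mechanics only and says nothing about how many germline calls would remain.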

