Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Haplotypecaller - log/list of SNPs that are present but were not retained in VCF

    James Emery

    Hello Wannes Dermauw. We do have an argument that closely matches what you are asking for. A number of calling thresholds and heuristics can cause variants to be assembled by HaplotypeCaller's local assembly but not make it into the VCF. The argument `--output-mode EMIT_ALL_ACTIVE_SITES` forces the VCF output to include a line (though not necessarily much more in the genotype fields) for every position that the assembly engine saw as a variant. It has some documented limitations:

    /** Produces calls at any region over the activity threshold regardless of confidence. On occasion, this will output
    * HOM_REF records where no call could be confidently made. This does not necessarily output calls for all sites in
    * a region. This argument is intended only for point mutations (SNPs); it will not produce a comprehensive set of
    * indels. */
    EMIT_ALL_ACTIVE_SITES

    This mode is limited to events that were seen by the assembly, which can sometimes fail. If you really care about getting a VCF line for everything that shows up in the pileups on visual inspection, we also have an argument, `--pileup-detection`, that attempts to supplement the assembly engine with pileups. This will necessarily introduce some false positives, but it should do a much better job of capturing all of the events a human could pick out of an IGV screenshot.
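As a rough illustration of how the two arguments mentioned above might be combined, here is a minimal Python sketch that only assembles a command line. The helper name `build_haplotypecaller_command` and the file names are hypothetical; `-R`, `-I` and `-O` are the usual reference, input and output arguments, and whether `--pileup-detection` is available depends on your GATK version.

```python
import shlex
from typing import List


def build_haplotypecaller_command(
    reference: str,
    bam: str,
    out_vcf: str,
    emit_all_active_sites: bool = True,
    pileup_detection: bool = False,
) -> List[str]:
    """Assemble (but do not run) a HaplotypeCaller command line.

    Hypothetical helper for illustration; paths are supplied by the caller.
    """
    cmd = ["gatk", "HaplotypeCaller", "-R", reference, "-I", bam, "-O", out_vcf]
    if emit_all_active_sites:
        # Ask for a VCF line at every site the assembly engine saw as variant.
        cmd += ["--output-mode", "EMIT_ALL_ACTIVE_SITES"]
    if pileup_detection:
        # Supplement assembly with pileup-based events (expect more false positives).
        cmd.append("--pileup-detection")
    return cmd


if __name__ == "__main__":
    # Placeholder file names; print the command so it can be reviewed before running.
    command = build_haplotypecaller_command(
        "ref.fasta", "sample.bam", "sample.all_active.vcf.gz", pileup_detection=True
    )
    print(shlex.join(command))
```

Printing the command rather than executing it keeps the sketch side-effect free; the returned list could be passed to `subprocess.run` once the paths are real.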

    However, we do not have a good way of displaying the exact reason that an event was dropped. Such a feature would be useful, and the reason can be extracted from the various HaplotypeCaller debugging outputs if you know where to look, but we currently don't have any unified way to view all of those output events.
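One possible workaround for getting a list of sites (though not the reasons they were dropped) is to run HaplotypeCaller twice, once with default settings and once with `--output-mode EMIT_ALL_ACTIVE_SITES`, and compare the two VCFs. The sketch below is not a GATK feature; it is a plain-Python comparison that assumes uncompressed, tab-delimited VCF text and identical allele representation in both files. The function names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

# (chromosome, 1-based position, reference allele, alternate allele)
Site = Tuple[str, int, str, str]


def vcf_sites(lines: Iterable[str]) -> Set[Site]:
    """Collect one key per ALT allele from the data lines of a VCF."""
    sites: Set[Site] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):
            sites.add((chrom, int(pos), ref, allele))
    return sites


def dropped_sites(permissive: Iterable[str], default: Iterable[str]) -> List[Site]:
    """Sites present in the permissive run but absent from the default run."""
    return sorted(vcf_sites(permissive) - vcf_sites(default))
```

With real files, each argument could be an open text file handle (or `gzip.open(path, "rt")` for compressed output). Records that the permissive run emits as HOM_REF will also show up in the difference, so the result is a list of candidates to inspect rather than a definitive set of filtered variants.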

    Hope this answers your question.
