Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

GATK process banner

Need Help?

Search our documentation

Community Forum

Hi, How can we help?

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size. Learn more

Active Regions Mutect2 Theory

1

3 comments

  • Avatar
    Gökalp Çelik

    Hi Kate Newcomer

    Mutect2 uses entrophy based active region detection which you can read more about its mathematical details in the document below

    https://github.com/broadinstitute/gatk/blob/master/docs/mutect/mutect.pdf 

    For those missing mutations in mutect2 there are bunch of parameters that you may modify to change the sensitivity of calling. 

    --initial-tumor-lod 0.0
    --tumor-lod-to-emit 0.0

    These 2 parameters are used in our mitochondria-mode for sensitive calling for ultra rare variants which can also be used for high depth tumor data for increased sensitivity. 

    We also suggest users to disable downsampling by setting 

    --max-reads-per-alignment-start 0

    to ensure all reads are considered not a portion of them for very high coverage regions. 

    We also have additional recommendations in the document below for missing variants. 

    https://gatk.broadinstitute.org/hc/en-us/articles/360043491652-When-HaplotypeCaller-and-Mutect2-do-not-call-an-expected-variant 

    Finally we can also recommend you to try our pileup detection algorithm which can utilize both the aligned reads data along with the reassembled haplotypes for a better hmm formation

    --pileup-detection <Boolean>  If enabled, the variant caller will create pileup-based haplotypes in addition to the
                                  assembly-based haplotype generation.  Default value: false. Possible values: {true, false}
    --use-pdhmm <Boolean>         Partially Determined HMM, an alternative to the regular assembly haplotypes where we
                                  instead construct artificial haplotypes out of the union of the assembly and pileup
                                  alleles.  Default value: false. Possible values: {true, false} 

    Make sure you use both of these parameters to enable pileup detection mode to its fullest. PDHMM code is currently not accelerated and may be slow but if intervals are run in parallel scatters you may gain some speed to make up for the missing performance. PDHMM enables Mutect2 to integrate all variants detected in the aligned reads into the reassembly graph to ensure generation of better graphs for variant detection. 

    I hope these would help solve your problem. 

    Regards. 

    0
    Comment actions Permalink
  • Avatar
    Kate Newcomer

    Hi Gökalp Çelik ,

    Thank you. For the Mutect2 paper, section B, I have a few questions:

    1. The equation below seems to be the only instance of bernouli entropy. My understanding is that this equation calculates the log odds of one of the two cases: that an alt allele exists or that only ref alleles exist. That would assume this is applied per base not per region. Am I mistaken?
    2. If my assumption in 1 is true, how is active region detection calculated?
    3. Could variability in sequencing depth even at high levels (ie peaks that vary from 400-1000+ reads) affect active region calculation?

    1
    Comment actions Permalink
  • Avatar
    Gökalp Çelik

    Hi again.

    Yes you are absolutely right that lod is calculated per base not per region. Mutect2/HaplotypeCaller walks over the genome with a given window size (which can be extended based on assembly region extension parameter) and a single active base is enough to set the region as active. Once a region has sites greater than the 

    --initial-tumor-lod

    default then Mutect2 starts detecting variants within the active region. Setting this variable 0 practically makes it quite permissive for Mutect2 to become more sensitive. Once reassembly and pairHMM is done sites with lod larger than the default value of 

    --tumor-lod-to-emit

    are emitted as variation in the VCF along with many other parameters calculated based on PoN, germline resource, matched normal, read orientation, sequencing artifact detection and PCR artifact detection, contamination, segmentation etc..

    Too high depths will reduce the initial tumor lod or the calculated tumor lod for ultra rare sites below the default threshold therefore it will limit the detection sensitivity for Mutect2. However if ultra rare allele fractions are a concern and needs detection we recommend setting the lod to 0.0. Beware that this also will bring more false positives along with rare variants. 

    I hope this helps. 

    0
    Comment actions Permalink

Please sign in to leave a comment.

Powered by Zendesk