Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

--interval-padding range differs called variants


  • Gökalp Çelik

    Hi Sinem Selvi

    This is a known issue with how intervals affect variant calling in HaplotypeCaller and Mutect2.

    When an interval is provided to HaplotypeCaller or Mutect2, the region of interest and the active-region search are automatically cropped to that interval, so only the reads and the reference sequence inside it are available to the local reassembly engine. Changing the amount of reference sequence provided changes all of the reassembly inputs. A different assembly may therefore result, and the weights and evidence for a haplotype found with the shorter interval may be invalidated by the additional data. Our recommendation is to keep intervals as long as possible, but be aware that complex regions and repeats will always show this discrepancy. The best practice is to provide intervals split at long stretches of Ns and afterwards limit the set of discovered variants with the SelectVariants tool.
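As a rough illustration of the recommendation above, the two-step approach (call on wide intervals, then subset) might be scripted as follows. This is only a sketch: it builds the command lines and prints them without running GATK, and every file name (`ref.fasta`, `sample.bam`, `wide_regions.interval_list`, `targets.interval_list`, and the VCF names) is a hypothetical placeholder, as are the helper function names.

```python
import shlex


def haplotypecaller_cmd(reference, bam, intervals, out_vcf, padding=0):
    """Build (but do not run) a HaplotypeCaller command as an argv list."""
    cmd = [
        "gatk", "HaplotypeCaller",
        "-R", reference,
        "-I", bam,
        "-L", intervals,
        "-O", out_vcf,
    ]
    if padding > 0:
        # Padding widens every interval, which also changes what the
        # local reassembly engine sees.
        cmd += ["--interval-padding", str(padding)]
    return cmd


def selectvariants_cmd(reference, in_vcf, targets, out_vcf):
    """Build (but do not run) a SelectVariants command that subsets a VCF."""
    return [
        "gatk", "SelectVariants",
        "-R", reference,
        "-V", in_vcf,
        "-L", targets,
        "-O", out_vcf,
    ]


# Hypothetical file names, for illustration only.
call = haplotypecaller_cmd(
    "ref.fasta", "sample.bam", "wide_regions.interval_list", "wide.vcf.gz"
)
subset = selectvariants_cmd(
    "ref.fasta", "wide.vcf.gz", "targets.interval_list", "targets.vcf.gz"
)

print(shlex.join(call))
print(shlex.join(subset))
```

Calling once on the wide intervals and subsetting afterwards keeps the assembly inputs identical no matter which target list is applied later.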

    Looking at the call you have there, the allele depths (AD) are quite unbalanced because of the region's complexity and repetitiveness, both of which work against variant call quality.
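To make "unbalanced ADs" concrete, here is a minimal sketch that turns a VCF `AD` value (comma-separated read depths, reference allele first) into the fraction of reads supporting non-reference alleles. The depths used are invented for illustration and are not taken from the call discussed in this thread; the function name is also hypothetical, and the sketch assumes every depth is an integer (no missing `.` values).

```python
def alt_allele_fraction(ad):
    """Return the fraction of reads supporting non-reference alleles.

    `ad` is a VCF AD string such as "18,4" (reference depth first).
    Returns None when no informative reads are present.
    """
    depths = [int(value) for value in ad.split(",")]
    total = sum(depths)
    if total == 0:
        return None
    return sum(depths[1:]) / total


# Invented example depths: 18 reference reads, 4 alternate reads.
fraction = alt_allele_fraction("18,4")
print(f"alt allele fraction: {fraction:.2f}")
```

For a heterozygous germline site one would normally expect a fraction near 0.5, so a value far from that is one sign of the imbalance mentioned above.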

    Additionally, you may want to check the article below for any variant that you think is real but was not called by HaplotypeCaller/Mutect2.

    https://gatk.broadinstitute.org/hc/en-us/articles/360043491652-When-HaplotypeCaller-and-Mutect2-do-not-call-an-expected-variant 

    Finally, the latest versions of GATK include a parameter that adds pileup-based calling to HaplotypeCaller. If you are interested, you can try the parameter below.

    --pileup-detection <Boolean>  If enabled, the variant caller will create pileup-based haplotypes in addition to the
                                  assembly-based haplotype generation.  Default value: false. Possible values: {true, false}
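A sketch of how the parameter above might be added to an existing HaplotypeCaller command. It only assembles and prints the command line. The file names and the helper function are hypothetical, and passing an explicit `true` after the flag is an assumption based on the `<Boolean>` help text quoted above.

```python
import shlex


def with_pileup_detection(base_cmd, enabled=True):
    """Return a copy of base_cmd with --pileup-detection appended."""
    value = "true" if enabled else "false"
    return list(base_cmd) + ["--pileup-detection", value]


# Hypothetical base command, for illustration only.
base = [
    "gatk", "HaplotypeCaller",
    "-R", "ref.fasta",
    "-I", "sample.bam",
    "-L", "wide_regions.interval_list",
    "-O", "pileup.vcf.gz",
]

pileup_cmd = with_pileup_detection(base)
print(shlex.join(pileup_cmd))
```

Comparing the output of this run with a run that omits the flag shows whether the pileup-based haplotypes recover the variant in question.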

     

    I hope this helps. 
