Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Mutect2, force-calling-mode, and restricting calls to positions in force-call-alleles.vcf

5 comments

  • Genevieve Brandt (she/her)

    Would the -L parameter work? You can pass a VCF file as the interval list, and the tool will then operate only on the sites in that file.

    Intervals and Interval Lists: https://gatk.broadinstitute.org/hc/en-us/articles/360035531852-Intervals-and-interval-lists

    Mutect2 parameters: https://gatk.broadinstitute.org/hc/en-us/articles/360045800552-Mutect2

  • Robert Dubin

    Thank you very much for the suggestion. I just tried running Mutect2 with the -L parameter, supplying it with a VCF file. It worked perfectly: only the positions in the VCF file were examined.

  • Genevieve Brandt (she/her)

    Thanks for posting your solution, Robert Dubin! Glad it worked.

  • TA

    Hi,

    I wanted to follow up on this question, Genevieve, as I am also interested in restricting the output to ONLY the alleles present in force-call-alleles.vcf.

    I understand that using this .vcf file as both the -L parameter and the --alleles parameter will achieve this.

    But I saw this in another GATK thread:

    "In general, using intervals (-L) can introduce artifacts if you choose the intervals unwisely. Reads will get discarded that are outside the interval when the reads may have been used in making variant calls (for example, lost mates). When we use intervals in our production pipelines with targeted sequencing, we make sure to give sufficient padding around the targeted sites (100 bp on each side). There are many GATK tools which require you to be careful regarding interval usage; this is because it can change the result, depending on how you use intervals."

    https://gatk.broadinstitute.org/hc/en-us/articles/360035531852-Intervals-and-interval-lists

    Do you think restricting both -L and --alleles to the VCF of sites I am interested in might cause problems with call accuracy or with the REF/ALT counts at those sites? Should I also turn on padding when I run this command?

    Thank you!

    TA

  • Gökalp Çelik

    Hi TA

    Our recommendation would be to use padded intervals even if you provide the alleles as a parameter, perform your calls, and then remove the unwanted sites from the final VCF once you are done.

    This would be the most sane way of calling variants without losing read and reference information.

    Keep in mind that if you are using Mutect2 to call somatic variants, this may require additional attention to detail, since Mutect2 has more detailed filters, any of which may filter your sites of interest in or out depending on the haplotype context.

    For HaplotypeCaller this is less of an issue. 

    I hope this helps. 

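The two ways of running Mutect2 discussed in this thread can be sketched as a small command builder. This is only an illustrative sketch, not an official recipe: the function name and the file names ref.fasta, tumor.bam and out.vcf.gz are hypothetical, and the code only assembles the argument list without running GATK. With padding=0 it mirrors the run that passes the same VCF to both -L and --alleles; with padding=100 it adds the engine's --interval-padding argument, matching the 100 bp quoted above.

```python
import shlex


def mutect2_command(reference, bam, sites_vcf, output_vcf, padding=0):
    """Build (but do not run) a Mutect2 command limited to the sites in one VCF.

    All file names are placeholders supplied by the caller (hypothetical).
    """
    args = [
        "gatk", "Mutect2",
        "-R", reference,
        "-I", bam,
        "--alleles", sites_vcf,  # force-call the alleles listed in this VCF
        "-L", sites_vcf,         # use the same VCF as the interval list
        "-O", output_vcf,
    ]
    if padding > 0:
        # Extend each interval by this many bases on both sides.
        args += ["--interval-padding", str(padding)]
    return args


unpadded = mutect2_command("ref.fasta", "tumor.bam",
                           "force-call-alleles.vcf", "out.vcf.gz")
padded = mutect2_command("ref.fasta", "tumor.bam",
                         "force-call-alleles.vcf", "out.vcf.gz", padding=100)

# Print shell-ready command lines for inspection.
print(shlex.join(unpadded))
print(shlex.join(padded))
```

With padding enabled, the output can contain calls in the padded flanks as well as at the requested sites, so a later filtering step is still needed.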
    0
    Comment actions Permalink

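The final step recommended in the last comment, removing unwanted sites from the VCF after calling with padded intervals, might look roughly like the sketch below. It rests on several assumptions that are not from the thread: both VCFs are read as uncompressed text lines, a "site" is matched on CHROM and POS only (not on REF/ALT), and the helper names and the tiny records are made up for illustration.

```python
def site_keys(vcf_lines):
    """Collect (CHROM, POS) for every record line, skipping header lines."""
    keys = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos = line.split("\t")[:2]
        keys.add((chrom, int(pos)))
    return keys


def keep_requested_sites(called_lines, wanted):
    """Yield headers unchanged, plus only the records whose site is in `wanted`."""
    for line in called_lines:
        if line.startswith("#"):
            yield line
            continue
        if not line.strip():
            continue
        chrom, pos = line.split("\t")[:2]
        if (chrom, int(pos)) in wanted:
            yield line


# Made-up records, for illustration only.
force_call = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t1000\t.\tA\tG\t.\t.\t.\n",
]
called = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t950\t.\tC\tT\t.\t.\t.\n",   # called inside the padding, not requested
    "chr1\t1000\t.\tA\tG\t.\t.\t.\n",  # requested site
]

kept = list(keep_requested_sites(called, site_keys(force_call)))
```

Matching on position alone keeps every allele reported at a requested site; restricting to specific alleles would additionally require comparing the REF and ALT columns.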
