Mutect2, force-calling-mode, and restricting calls to positions in force-call-alleles.vcf
I am using GATK v4.1.7.0 and would like to run Mutect2 with an alleles file (force-call-alleles.vcf). The example on the website that demonstrates the --alleles parameter states that force-calling mode calls the alleles in force-call-alleles.vcf in addition to any other variants Mutect2 discovers. Is it possible, however, to restrict the output to ONLY the variants present in force-call-alleles.vcf?
-
Would the -L parameter work? You can pass a VCF file as the interval list, and the tool will then operate only on the positions in that file.
Intervals and Interval Lists: https://gatk.broadinstitute.org/hc/en-us/articles/360035531852-Intervals-and-interval-lists
Mutect2 parameters: https://gatk.broadinstitute.org/hc/en-us/articles/360045800552-Mutect2
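For anyone landing here later, here is a minimal sketch of what that invocation might look like, written as a small Python helper that assembles the argument list. The file names (ref.fasta, tumor.bam, out.vcf.gz) and the helper name build_mutect2_command are hypothetical placeholders, not taken from the original post; only the -R, -I, -L, --alleles and -O arguments are real Mutect2 options. It assumes tumor-only mode and that the alleles VCF is indexed.

```python
import shlex
from typing import List


def build_mutect2_command(
    reference: str,
    bam: str,
    alleles_vcf: str,
    output_vcf: str,
) -> List[str]:
    """Assemble a Mutect2 argument list (hypothetical helper).

    The same VCF is passed twice: once as --alleles so that its alleles
    are force-called, and once as -L so that only its positions are
    traversed.
    """
    return [
        "gatk", "Mutect2",
        "-R", reference,
        "-I", bam,
        "--alleles", alleles_vcf,  # alleles to force-call
        "-L", alleles_vcf,         # restrict traversal to these sites
        "-O", output_vcf,
    ]


if __name__ == "__main__":
    # Placeholder paths; replace with your own files.
    cmd = build_mutect2_command(
        "ref.fasta", "tumor.bam", "force-call-alleles.vcf", "out.vcf.gz"
    )
    # Print the command for inspection; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell-quoting problems if any path contains spaces.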
-
Thank you very much for the suggestion. I just tried running Mutect2 with the -L parameter, supplying it with a VCF file, and it worked perfectly. Only the positions in the VCF file were examined.
-
Thanks for posting your solution Robert Dubin! Glad it worked.
-
Hi,
I wanted to follow up on this question, Genevieve, as I am also interested in restricting the output to alleles ONLY present in force-call-alleles.vcf.
I understand that passing this VCF file to both the -L parameter and the --alleles parameter will achieve this.
However, I saw this in another GATK thread:
"In general, using intervals (-L) can introduce artifacts if you choose the intervals unwisely. Reads will get discarded that are outside the interval when the reads may have been used in making variant calls (for example, lost mates). When we use intervals in our production pipelines with targeted sequencing, we make sure to give sufficient padding around the targeted sites (100 bp on each side). There are many GATK tools which require you to be careful regarding interval usage; this is because it can change the result, depending on how you use intervals."
https://gatk.broadinstitute.org/hc/en-us/articles/360035531852-Intervals-and-interval-lists
Do you think restricting both -L and --alleles to the VCF of sites I am interested in might compromise the accuracy of the calls or the REF/ALT counts at those sites? Should I also turn on interval padding when I run this command?
Thank you!
TA
-
Hi TA
We recommend using padded intervals even if you provide the alleles as a parameter: make your calls, then remove the unwanted sites from the final VCF once you are done.
This is the most sensible way to call variants without losing read and reference information.
Keep in mind that if you are using Mutect2 to call somatic variants, this may require additional attention to detail, since Mutect2 has more detailed filters, any of which may filter your sites of interest in or out depending on the haplotype context.
For HaplotypeCaller this is less of an issue.
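As a rough illustration of the "pad, call, then remove unwanted sites" approach, here is a minimal Python sketch. The function names are hypothetical, and the 100 bp default simply mirrors the figure quoted earlier in this thread; --interval-padding is the standard GATK engine argument for padding -L intervals. The filter assumes uncompressed VCF text and an exact CHROM/POS/REF/ALT match. That assumption can miss a forced allele if Mutect2 emits it with a different REF representation (for example, merged with an overlapping indel), so treat it as a starting point only.

```python
from typing import Iterable, Iterator, List, Set, Tuple

Allele = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT)


def padded_mutect2_args(alleles_vcf: str, padding_bp: int = 100) -> List[str]:
    """Arguments to add to a Mutect2 command (hypothetical helper)."""
    return [
        "--alleles", alleles_vcf,
        "-L", alleles_vcf,
        "--interval-padding", str(padding_bp),  # pad each site on both sides
    ]


def _alleles_of(line: str) -> List[Allele]:
    """Split one VCF data line into one tuple per ALT allele."""
    chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
    return [(chrom, int(pos), ref, a) for a in alt.split(",")]


def read_alleles(vcf_lines: Iterable[str]) -> Set[Allele]:
    """Collect the alleles listed in the force-call VCF."""
    forced: Set[Allele] = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        forced.update(_alleles_of(line))
    return forced


def keep_forced_only(call_lines: Iterable[str], forced: Set[Allele]) -> Iterator[str]:
    """Yield header lines plus records carrying at least one forced allele."""
    for line in call_lines:
        if line.startswith("#"):
            yield line
        elif line.strip() and any(a in forced for a in _alleles_of(line)):
            yield line
```

With padding turned on, Mutect2 can also emit calls in the padded flanks, which is why the second step is needed. For Mutect2 output it is probably best to apply this to the VCF produced after FilterMutectCalls, so that the filters see the full context first.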
I hope this helps.