Why doesn't HaplotypeCaller use a PON?
I am new to GATK and have just run HaplotypeCaller several times following the Best Practices. I am now planning to try Mutect2 to compare my normal and tumor samples, but I am rather confused by how different the HaplotypeCaller and Mutect2 procedures are. I have read several times on the GATK website that "HaplotypeCaller and Mutect2 are quite different and employ distinct algorithms", but I could not find answers to the questions below.
(1) Why doesn't HaplotypeCaller use a PON, if a PON is good at detecting technical artifacts (especially when a single sample is analyzed by HaplotypeCaller)?
(2) Why doesn't Mutect2 have a VQSR step (or does it have one)? Also, is it necessary to run BQSR before Mutect2 analyses?
(3) Why does HaplotypeCaller use 1000G, HapMap, etc. (training and truth sets) while Mutect2 uses gnomAD as its germline resource, and not the other way around?
Thank you in advance.
-
Hi sinc,
Thanks for trying out GATK and HaplotypeCaller! We have an article that talks about the differences between Mutect2 and HaplotypeCaller that you might find very helpful: https://gatk.broadinstitute.org/hc/en-us/articles/360035890491-Somatic-calling-is-NOT-simply-a-difference-between-two-callsets
To answer your more specific questions:
- HaplotypeCaller doesn't use a PON because it isn't responsible for filtering. Technical artifacts show up in the variant annotations, and suspicious sites are filtered out afterwards by whichever filtering method you choose. Many of our users use VQSR, while in some cases CNN might be better at finding the technical artifacts. Mutect2 cannot take this approach and needs a PON because there are not enough observations to give enough statistical power to find the artifacts.
- Mutect2 has no VQSR step. For somatic calling we recommend running FilterMutectCalls after Mutect2 instead. BQSR is recommended for both somatic and germline calling.
- gnomAD contains everything that looks like germline variation, and Mutect2 does not want germline variants. Even bad variants in gnomAD are filtered out because they are most likely artifacts. 1000G and HapMap are stable resources that we have used for germline calling for a long time.
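To make the somatic side concrete, here is a rough sketch of how the pieces above (PON, gnomAD germline resource, then FilterMutectCalls) fit together on the command line. It only builds and prints the two commands and does not run GATK. All file names and the sample name are placeholders I made up for illustration, and the helper functions are hypothetical, so please adapt them to your own data.

```python
import shlex
from typing import List


def mutect2_cmd(reference: str, tumor_bam: str, normal_bam: str,
                normal_sample: str, germline_resource: str,
                panel_of_normals: str, out_vcf: str) -> List[str]:
    """Build a tumor/normal Mutect2 call that uses a germline resource and a PON."""
    return [
        "gatk", "Mutect2",
        "-R", reference,
        "-I", tumor_bam,
        "-I", normal_bam,
        "-normal", normal_sample,                  # sample name of the matched normal
        "--germline-resource", germline_resource,  # e.g. an allele-frequency-only gnomAD VCF
        "--panel-of-normals", panel_of_normals,
        "-O", out_vcf,
    ]


def filter_mutect_calls_cmd(reference: str, unfiltered_vcf: str,
                            out_vcf: str) -> List[str]:
    """Build the FilterMutectCalls step that follows Mutect2 (in place of VQSR)."""
    return [
        "gatk", "FilterMutectCalls",
        "-R", reference,
        "-V", unfiltered_vcf,
        "-O", out_vcf,
    ]


def to_shell(cmd: List[str]) -> str:
    """Render an argument list as a copy-pasteable shell line."""
    return " ".join(shlex.quote(arg) for arg in cmd)


# Placeholder inputs (assumptions, not real paths).
call = mutect2_cmd("ref.fasta", "tumor.bam", "normal.bam", "NORMAL_SAMPLE",
                   "af-only-gnomad.vcf.gz", "pon.vcf.gz", "unfiltered.vcf.gz")
filt = filter_mutect_calls_cmd("ref.fasta", "unfiltered.vcf.gz", "filtered.vcf.gz")

print(to_shell(call))
print(to_shell(filt))
```

The printed lines can be pasted into a shell once the placeholders are replaced. Depending on your data you may want further steps around these two commands, so treat this as a starting point only.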
Hope this helps!
Genevieve