I analyzed whole-genome sequencing data from my sample (approx. 30x coverage) with HaplotypeCaller (single sample) and then with Mutect2 (no matched control). The number of mutations (SNVs and indels) detected differed between the two callers (HaplotypeCaller = 4724972, Mutect2 = 55964), and that in itself is OK.
When I plotted histograms of the allele frequencies (calculated from AD) of the mutations detected by each caller, the patterns were quite distinct, which confused me. The allele frequencies of mutations detected by HaplotypeCaller were unimodal, with a peak around 0.5, while those detected by Mutect2 were bimodal, with one peak around 0.1-0.2 and another around 1. I used the 1000G and HapMap data, etc., for HaplotypeCaller and the gnomAD and PON data for Mutect2, all provided by the GATK team. It appears to me that Mutect2 removed most of the mutations detected by HaplotypeCaller that had allele frequencies around 0.5. Is that correct, and if so, why?
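To make clear what "calculated from AD" means here, the calculation can be sketched roughly as below. This is a minimal illustration rather than the exact script used: the function name `allele_fractions_from_ad` and the example record are hypothetical, and it assumes AD lists the reference depth first, followed by one depth per ALT allele.

```python
def allele_fractions_from_ad(format_field, sample_field):
    """Return one allele fraction per ALT allele, computed from AD.

    Assumes AD is 'ref_depth,alt1_depth[,alt2_depth...]'.
    Returns None if AD is absent, missing ('.'), or the total depth is zero.
    """
    keys = format_field.split(":")
    values = sample_field.split(":")
    if "AD" not in keys:
        return None
    idx = keys.index("AD")
    # Trailing FORMAT values may be dropped from the sample column.
    if idx >= len(values):
        return None
    depths = values[idx].split(",")
    if "." in depths:
        return None
    depths = [int(d) for d in depths]
    total = sum(depths)
    if total == 0:
        return None
    # Fraction of AD reads supporting each ALT allele.
    return [alt / total for alt in depths[1:]]


# Hypothetical single-sample VCF record (tab-separated), for illustration only.
record = "chr1\t12345\t.\tA\tG\t.\tPASS\t.\tGT:AD:DP\t0/1:14,16:30"
fields = record.split("\t")
print(allele_fractions_from_ad(fields[8], fields[9]))
```

The values returned by this function, collected over every record in each VCF, are what went into the histograms.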
This kind of question may have been asked before, but I could not find the relevant links. Thank you in advance.