GATK HaplotypeCaller struggling to find a variant that exists
Hi, I have been trying to call variants on my BAM file in a specific region, because some variants I expected were missing when I called other parts of the genome. I have been trying to get GATK to find one particular SNP by disabling filters that might have removed it and by lowering the minimum quality for detection, but I still cannot find it, even though in IGV it is quite clear the SNP is there.
When creating a GVCF, it reports a very large block over the region that may contain the SNP, but even after genotyping it still does not call the SNP.
a) GATK version used: 4.4
b) Exact command used:
bowtie2 -p 50 --rg-id SRR11006622 --rg "PL:Illumina" --rg "SM:SRR11006622" \
  -x reference/NCBI_ref/GCF_002204515.2_AaegL5.0_genomic \
  -U nokraken/SRR11006622/SRR11006622_1.fastq.gz nokraken/SRR11006622/SRR11006622_2.fastq.gz \
  --fast-local --no-unal --no-discordant \
  | samtools view -@ 50 -b - \
  | samtools fixmate -@ 50 -m - - \
  | samtools sort -@ 50 - \
  | samtools markdup -@ 50 - SRR11006622.mkdup.bam -

gatk HaplotypeCaller \
  -R "reference/NCBI_ref/GCF_002204515.2_AaegL5.0_genomic.fna" \
  -I "SRR11006622.mkdup.bam" \
  -O "SRR11006622_region.vcf.gz" \
  -L "NC_035109.1:315933098-315946078" \
  --dont-use-soft-clipped-bases true \
  --native-pair-hmm-threads 50 \
  --minimum-mapping-quality 1 \
  --disable-read-filter NotDuplicateReadFilter
-
Just to add: if I change the mapping quality of that read to above 20, HC calls the variant, so the --disable-read-filter parameter doesn't seem to do anything. In the original post I didn't include disabling the MappingQualityFilter, but I have tried that too and it still doesn't call the SNP.
-
We have some generic advice for finding missing variants in the HC here: https://gatk.broadinstitute.org/hc/en-us/articles/360043491652-When-HaplotypeCaller-and-Mutect2-do-not-call-an-expected-variant.
Your note about the mapping quality is likely instructive here. HaplotypeCaller has a second argument that sets the mapping-quality threshold for reads passed on to the genotyper: `--mapping-quality-threshold-for-genotyping`, which defaults to 20. We generally consider reads with MQ below 20 to be suspect and don't normally pass them to the genotyper, because doing so carries a very high likelihood of false positives. In DRAGEN-GATK we added an algorithm called Base Quality Dropout (BQD) that attempts to factor mapping quality into the genotyper, so that more low-quality reads can be rescued when there is strong evidence.
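A quick way to check this explanation is to count how many reads overlapping the SNP fall below the default genotyping threshold of 20. The following is a minimal sketch, assuming `samtools` and `awk` are available; the helper name `mq_summary` and the `POS` placeholder are hypothetical, since the thread does not give the SNP coordinate.

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarise MAPQ (SAM column 5) for SAM records on stdin.
# Prints the number of reads below the threshold and at or above it.
mq_summary() {
  # Default of 20 mirrors the default --mapping-quality-threshold-for-genotyping.
  local threshold="${1:-20}"
  awk -F '\t' -v t="$threshold" '
    /^@/ { next }                          # skip header lines if present
    { if (($5 + 0) < (t + 0)) low++; else high++ }
    END { printf "MAPQ<%d: %d\nMAPQ>=%d: %d\n", t, low + 0, t, high + 0 }
  '
}

# Usage (POS is a placeholder for the SNP position, not taken from the thread):
#   samtools view SRR11006622.mkdup.bam "NC_035109.1:POS-POS" | mq_summary 20
```

`samtools view` also returns secondary and supplementary alignments; adding `-F 0x900` to the `samtools view` call excludes them if you only want primary alignments in the counts.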
You could try the argument `--dragen-mode` or `--dragen-378-concordance-mode`, either of which lowers the above threshold and activates `-bqd`. If you want to run with only this algorithm and not the other DRAGEN algorithms, include the following arguments together. You might see good results, though this is not a tested path and we only vouch for the full DRAGEN-GATK code paths:
--apply-bqd
--minimum-mapping-quality 1
--mapping-quality-threshold-for-genotyping 1
--disable-cap-base-qualities-to-map-quality
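For reference, here is a sketch of the HaplotypeCaller command from the original post with these arguments added. It is wrapped in a hypothetical `run_hc_bqd` function that only prints the command unless `DRY_RUN=0` is set. The output file name is an assumption; the other paths and the region come from the original post. As stated in the reply, running BQD outside full DRAGEN-GATK is not a tested path.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical wrapper: the original HaplotypeCaller command plus the
# BQD-related arguments suggested in the reply.
run_hc_bqd() {
  local ref="reference/NCBI_ref/GCF_002204515.2_AaegL5.0_genomic.fna"
  local bam="SRR11006622.mkdup.bam"
  local region="NC_035109.1:315933098-315946078"
  local out="SRR11006622_region.bqd.vcf.gz"   # hypothetical output name

  # Original arguments first (--minimum-mapping-quality 1 was already there),
  # then the three additional suggested arguments.
  local -a args=(
    HaplotypeCaller -R "$ref" -I "$bam" -O "$out" -L "$region"
    --dont-use-soft-clipped-bases true
    --native-pair-hmm-threads 50
    --disable-read-filter NotDuplicateReadFilter
    --minimum-mapping-quality 1
    --apply-bqd
    --mapping-quality-threshold-for-genotyping 1
    --disable-cap-base-qualities-to-map-quality
  )

  # Default is a dry run that prints a shell-quoted command; DRY_RUN=0 runs it.
  if [[ "${DRY_RUN:-1}" == "1" ]]; then
    printf '%q ' gatk "${args[@]}"
    printf '\n'
  else
    gatk "${args[@]}"
  fi
}

# Usage: preview with `run_hc_bqd`, run for real with `DRY_RUN=0 run_hc_bqd`.
```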
-
Thank you very much, this worked.