Why does GATK HaplotypeCaller call SNPs from soft-clipped bases?
Versions of GATK used: 3.8-1-0-gf15c1c3ef and 4.1.8.1.
Command used for local indel realignment: java -Xmx8g -jar /home/software/AlignmentPipeline/GenomeAnalysisTK-3.8-1-0-gf15c1c3ef/GenomeAnalysisTK.jar -T IndelRealigner -R /home/data/Shared/References/reference.fna -I ./Dir_sample/sample_rh.rmDupli.bam -targetIntervals ./Dir_sample/sample_rh.intervals -o ./Dir_sample/sample_rh.rmDupliIndelRealigned.bam
Command used for variant calling (mainly interested in SNPs):
/home/software/VariantCallingPipeline/SnpCalling/gatk-4.1.8.1/gatk --java-options "-Xmx30G -Djava.library.path=/home/software/VariantCallingPipeline/SnpCalling/gatk-4.1.8.1/libs -XX:+UseParallelGC -XX:ParallelGCThreads=2" HaplotypeCaller -I '+argList[1]+" -O "+dest1+" -R "+argList[0]+" --sample-name "+argList[-1]+" --emit-ref-confidence GVCF -pairHMM FASTEST_AVAILABLE --native-pair-hmm-threads 2 -L "+argList[2]+":"+str(argList[3])+"-"+str(argList[4]))
Hi,
After running the two commands above and applying GATK's recommended filtering criteria, I noticed that thousands of SNPs have been called from soft-clipped bases. For example, in the attached images (JBrowse alignment view and text file), variants have been called at positions 242433 and 242435, but no read is aligned at these positions in the BAM file; they are covered only by soft-clipped bases. I could turn on the "--dont-use-soft-clipped-bases" parameter, but is that recommended? I would be grateful if someone could explain this behaviour of GATK HaplotypeCaller.
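To check which reads cover a given position only with soft-clipped bases, the clipped portion of each read can be projected onto reference coordinates from its POS and CIGAR fields. The following is a minimal sketch, assuming standard SAM CIGAR semantics. The function names and the example read (POS 242400, CIGAR 30M10S) are hypothetical and are not taken from the BAM file above. It only mimics how a genome browser displays clipped bases and is not a description of how HaplotypeCaller processes reads internally.

```python
import re

# One CIGAR element: a length followed by an operation code (SAM spec).
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clip_spans(pos, cigar):
    """Return (aligned, left_clip, right_clip) as 1-based inclusive intervals.

    `pos` is the 1-based leftmost aligned position (SAM POS field).
    Clip intervals are None when the read has no soft clip on that side.
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    # Operations that consume reference bases: M, D, N, = and X.
    ref_len = sum(n for n, op in ops if op in "MDN=X")
    aligned = (pos, pos + ref_len - 1)

    # Hard clips (H) are not stored in the read sequence; ignore them.
    core = [(n, op) for n, op in ops if op != "H"]
    left = right = None
    if core and core[0][1] == "S":
        left = (pos - core[0][0], pos - 1)
    if len(core) > 1 and core[-1][1] == "S":
        right = (aligned[1] + 1, aligned[1] + core[-1][0])
    return aligned, left, right


def in_soft_clip(position, pos, cigar):
    """True if `position` falls in the projected soft-clipped part of the read."""
    _, left, right = soft_clip_spans(pos, cigar)
    return any(s is not None and s[0] <= position <= s[1] for s in (left, right))


# Hypothetical read: aligned 242400-242429, then 10 soft-clipped bases.
print(in_soft_clip(242433, 242400, "30M10S"))
print(in_soft_clip(242420, 242400, "30M10S"))
```

Running this over the POS and CIGAR columns of the reads around 242433 and 242435 would show whether those sites are reached only by clipped bases.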
-
Hello AlibiKr, we do not support GATK3 anymore. Please re-run these commands with GATK4 and let us know if the issue persists.