HaplotypeCaller local Assembly
Can you please provide
a) GATK version used: GATK.4.1.3.0
b) Exact GATK commands used :
$path2gatk HaplotypeCaller \
-I $path2bam$1.bam \
-R $path2ref \
-O ${path2output}_$1.GATK.vcf \
-bamout $path2output$1_out.bam \
--force-active --disable-optimizations \
-ip 100
Hi,
The picture below shows two BAM files: at the bottom, the original BAM file; at the top, the bamout file produced by the command above, which contains the local assembly that HaplotypeCaller generated to call the variants. As you can see, the local assembly is clearly wrong in that region.
How could I avoid this kind of scenario?
Thank you very much!
-
Hi,
You said: "At the bottom, the original bam file and at the top, a bam file generated with the command above"?
So the original had no reads in the region but the bamout is showing reads with a lot of variation? Am I understanding this right?
-
Hi,
Thank you very much for your reply!
Sorry, I should have given you more information.
First, that region ("without" reads) is approximately 50 bp long.
Many of the reads that map to that region are soft-clipped, so if you don't show soft-clipped bases in IGV, the region appears to have no reads. We actually know (based on short- and long-read sequencing) that the sequenced sample carries an insertion and a deletion in that region compared to the reference. This structural variation clearly explains the pattern we see in IGV.
However, after the local assembly, HaplotypeCaller seems to force the soft-clipped bases to map to the reference.
Do you know how I can exclude this kind of call?
Thanks a lot!
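To check how many reads in a region carry soft clips, one option is to inspect the CIGAR strings directly. The following is a minimal sketch, not part of the original post: the function name `count_soft_clipped` is hypothetical, and it assumes uncompressed SAM text on standard input (for example from `samtools view` on a region of interest).

```shell
# Count SAM records whose CIGAR string (column 6) contains a soft clip ("S").
# Reads SAM text on stdin and skips header lines that start with "@".
# Prints: <soft-clipped records><TAB><total records>
count_soft_clipped() {
  awk -F'\t' '
    /^@/ { next }                    # skip SAM header lines
    { total++ }                      # every alignment record
    $6 ~ /[0-9]+S/ { clipped++ }     # CIGAR has at least one soft-clip operation
    END { printf "%d\t%d\n", clipped + 0, total + 0 }
  '
}

# Hypothetical usage (file name and region are placeholders):
#   samtools view sample.bam chr1:1000-1100 | count_soft_clipped
```

A high proportion of soft-clipped records in a small window is consistent with the pattern described above, where reads span a structural difference from the reference.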
-
Hi Etergemina
So if I am understanding this correctly, you want HaplotypeCaller to exclude the soft-clipped bases. For that purpose you can use this argument: --dont-use-soft-clipped-bases
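As an illustration, the command from the original post with this argument appended might look like the sketch below. The helper `print_hc_command` is hypothetical and only prints the command for review; it does not run GATK. The launcher, file names, and output prefix are placeholders.

```shell
# Assemble the HaplotypeCaller command from this thread with
# --dont-use-soft-clipped-bases appended, and print it for review.
# Nothing is executed here.
# Arguments (all placeholders): GATK launcher, input BAM, reference FASTA, output prefix.
print_hc_command() {
  printf '%s HaplotypeCaller -I %s -R %s -O %s.GATK.vcf -bamout %s_out.bam --force-active --disable-optimizations -ip 100 --dont-use-soft-clipped-bases\n' \
    "$1" "$2" "$3" "$4" "$4"
}

# Example with placeholder names:
print_hc_command gatk sample.bam ref.fasta sample
```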
-
Hi,
It worked! Thank you so much!
The enclosed picture shows the IGV browser view with the new bamout on top (generated with the --dont-use-soft-clipped-bases option). We can still see a few ArtificialHaplotypeRG reads in the middle, but at least there are no wrong calls.
Sorry, I have one last question. Is it fine to use this option genome-wide?
Do you recommend it?
Thanks!
Picture description:
- Top: bamout with "--dont-use-soft-clipped-bases"
- Middle: bamout without "--dont-use-soft-clipped-bases"
- Bottom: original bam (bwa mem)
-
Hi,
HaplotypeCaller and Mutect2 assume that aligners sometimes make mistakes, so they perform local reassembly in regions that show variation (aka active regions). They include soft-clipped bases in this step because excluding them would throw away evidence, for example for longer insertions and deletions. So we do not recommend turning on the --dont-use-soft-clipped-bases option.
Also, you mentioned: "We actually know (based on short and long-read sequencing) that the sample sequenced shows in that region an insertion and a deletion compared to the reference. This structural variation clearly explains the pattern we see on IGV." This means that HaplotypeCaller is detecting variation in that region and is not completely wrong, although the representation may not be accurate.
-
Hi,
Thank you very much for your answer.
I find this problematic because in this case (which I suppose is rare), HaplotypeCaller without the --dont-use-soft-clipped-bases option created 10 variants that do not exist, and I don't yet see how to remove them based on the resulting VCF.
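One way to at least list such calls is to compare the VCF produced without the option against the VCF produced with it. The sketch below is an assumption-laden illustration, not a GATK feature: the function names are hypothetical, it expects uncompressed VCF files, and it matches records only on CHROM, POS, REF and ALT.

```shell
# Print one key per record (CHROM:POS:REF:ALT) for an uncompressed VCF, sorted.
vcf_keys() {
  awk -F'\t' '!/^#/ { print $1 ":" $2 ":" $4 ":" $5 }' "$1" | LC_ALL=C sort -u
}

# List variant keys present in the first VCF but absent from the second.
vcf_only_in_first() {
  tmp_a=$(mktemp)
  tmp_b=$(mktemp)
  vcf_keys "$1" > "$tmp_a"
  vcf_keys "$2" > "$tmp_b"
  LC_ALL=C comm -23 "$tmp_a" "$tmp_b"   # lines unique to the first file
  rm -f "$tmp_a" "$tmp_b"
}

# Hypothetical usage (file names are placeholders):
#   vcf_only_in_first default.vcf no_softclip.vcf
```

Calls listed this way depend on soft-clipped bases, which makes them candidates for manual review in IGV rather than automatic removal.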
From what I understood (and perhaps I am wrong?), it looks like the artificial reads generated by the local assembly are more or less forced to map to the reference, which results in a profile we would not see with bwa mem, for example.
-
Not quite. Take a look at this doc: https://gatk.broadinstitute.org/hc/en-us/articles/360036227612-Local-re-assembly-and-haplotype-determination-HaplotypeCaller-and-Mutect2-
-
OK, Thanks