Dragen-mode seems dangerous
Hi. I have now encountered multiple instances where HaplotypeCaller with Dragen-mode misses or miscalls a real variant. These variants have been validated by orthogonal methods. The underlying issue may differ from variant to variant, but I'd like to work through one example to understand the problem or to report a bug properly.
As a side note, it is currently very confusing to work out what the current best practice pipeline is. For example, there is the multi-step generation of a BAM from FASTQ and there is the tool Dragmap, both of which claim to be the best practice. I have seen the problem with BAMs generated both ways. The problem also appears whether or not gVCF mode is used, and whether or not group calling is used.
Here, for example, are the VCF entries generated with Dragen-mode. First, the command line:
/work/soft/packages/gatk-4.2.2.0/gatk HaplotypeCaller -R Homo_sapiens_assembly38.fasta -I HM230077.bam -O HM230077.vcf --dragen-mode true --dragstr-params-path HM230077_dragstr_model.txt --bam-output HM230077bamout.bam
Then the VCF entry:
chr12 6018443 . G A 0 . AC=0;AF=0.00;AN=2;BaseQRankSum=1.142;DP=307;ExcessHet=3.0103;FS=1.997;MLEAC=1;MLEAF=0.500;MQ=89.70;MQRankSum=-4.967;ReadPosRankSum=0.541;SOR=0.606 GT:AD:DP:GP:GQ:PG:PL 0/0:191,109:300:0,0.77,51.78:1:0,34.77,37.78:34,0,48
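The genotype fields in the record above can be checked by hand. The sketch below assumes that PL holds phred-scaled genotype likelihoods, PG phred-scaled genotype priors, and GP phred-scaled genotype posteriors. That reading is inferred from the tag names and from the numbers themselves, not taken from the GATK source code. Under that assumption, adding PL and PG and shifting so the smallest value is 0 should reproduce GP.

```python
# Sample fields copied from the Dragen-mode record above.
FORMAT = "GT:AD:DP:GP:GQ:PG:PL"
SAMPLE = "0/0:191,109:300:0,0.77,51.78:1:0,34.77,37.78:34,0,48"

# VCF genotype ordering for one REF and one ALT allele.
GENOTYPES = ["0/0", "0/1", "1/1"]


def parse_sample(fmt, sample):
    """Map FORMAT keys to their raw string values."""
    return dict(zip(fmt.split(":"), sample.split(":")))


def posterior_from_pl_and_prior(pl, pg):
    """Add phred-scaled likelihoods and priors, then shift so the minimum is 0.

    ASSUMPTION: this relationship is inferred from the record, not from GATK code.
    """
    summed = [likelihood + prior for likelihood, prior in zip(pl, pg)]
    best = min(summed)
    return [round(value - best, 2) for value in summed]


fields = parse_sample(FORMAT, SAMPLE)
pl = [float(x) for x in fields["PL"].split(",")]
pg = [float(x) for x in fields["PG"].split(",")]
gp = [float(x) for x in fields["GP"].split(",")]

recomputed = posterior_from_pl_and_prior(pl, pg)
best_by_pl = GENOTYPES[pl.index(min(pl))]
best_by_gp = GENOTYPES[recomputed.index(min(recomputed))]

print("PL + PG (normalized):", recomputed)
print("reported GP:         ", gp)
print("most likely by PL:", best_by_pl, "| most likely by GP:", best_by_gp)
```

With these numbers the recomputed values match the reported GP. The likelihoods alone (PL) favour 0/1, while the sum with the prior favours 0/0 by only 0.77 phred units, which is consistent with the reported GQ of 1. If that reading is right, the 0/0 call at this site comes from the prior term rather than from the read evidence.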
If I remove --dragen-mode true from the command line above, I get the right entry:
chr12 6018443 . G A 2076.64 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.914;DP=282;ExcessHet=3.0103;FS=1.492;MLEAC=1;MLEAF=0.500;MQ=54.41;MQRankSum=-4.004;QD=7.61;ReadPosRankSum=0.843;SOR=0.624 GT:AD:DP:GQ:PL 0/1:168,105:273:99:2084,0,3657
In the BAM and the bamout, the alt allele is also clearly visible at a ref/alt ratio of about 60/40.
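The AD values in the two records quoted above give a similar picture to the read view. A minimal sketch, assuming a biallelic site where AD is written as "ref_depth,alt_depth":

```python
def alt_fraction(ad_field):
    """Return ALT depth / (REF depth + ALT depth) for a biallelic AD value."""
    ref_depth, alt_depth = (int(x) for x in ad_field.split(","))
    return alt_depth / (ref_depth + alt_depth)


# AD values copied from the two VCF records in this post.
ad_by_run = {
    "with --dragen-mode true": "191,109",
    "without --dragen-mode": "168,105",
}

for label, ad in ad_by_run.items():
    print(f"{label}: ALT fraction = {alt_fraction(ad):.3f}")
```

Both runs count a very similar share of alt-supporting reads (roughly 36% and 38%), so the difference in genotype does not come from the reads being counted differently.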
So HaplotypeCaller recognizes that something is going on there, otherwise there would be no entry, but it still decides that the genotype is 0/0. The quality of the alt reads and of the alt bases is good too. Can you help me track down what is going on and, more importantly, make sure that such evidently good variants are not missed? Is it better never to use Dragen-mode? Thank you.
-
More info on the subject. I had followed the steps described here: https://gatk.broadinstitute.org/hc/en-us/articles/4407897446939--How-to-Run-germline-single-sample-short-variant-discovery-in-DRAGEN-mode
No MarkDuplicates step is listed there. However, if I look at the pipeline here: https://app.terra.bio/#workspaces/warp-pipelines/DRAGEN-GATK-Whole-Genome-Germline-Pipeline
there is a MarkDuplicates step between aligning the reads and the HaplotypeCaller step. If I add that step, I do get the proper call with Dragen-mode activated. Case closed? No. When the same library was sequenced again and analysed with the right processing steps, the position is not even called, although the BAM looks exactly the same as the previous one in IGV. Removing the --dragen-mode switch from the command line recovers the variant call.
Running both sequencing runs of that sample through the actual physical Dragen pipeline (with FPGA) retrieves the variant in both cases. The two are supposed to be functionally equivalent but currently are not.
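The thread describes three outcomes at chr12:6018443: a 0/1 call, a record that is present but genotyped 0/0, and no record at all. A small helper like the one below can tell these apart when comparing runs. It is only a sketch: the function name is hypothetical, it reads plain tab-separated VCF text, and it looks at the first sample column only. The example records are shortened copies of the two entries posted above.

```python
def genotype_at(vcf_text, chrom, pos):
    """Return the first sample's GT at chrom:pos, or None if there is no record."""
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        cols = line.split("\t")
        if len(cols) < 10:
            continue  # no FORMAT/sample columns on this line
        if cols[0] == chrom and int(cols[1]) == pos:
            sample = dict(zip(cols[8].split(":"), cols[9].split(":")))
            return sample.get("GT")
    return None


# Shortened records from this thread (INFO column replaced by "." for brevity).
dragen_vcf = "\t".join(
    ["chr12", "6018443", ".", "G", "A", "0", ".", ".",
     "GT:AD:DP:GP:GQ:PG:PL",
     "0/0:191,109:300:0,0.77,51.78:1:0,34.77,37.78:34,0,48"])
default_vcf = "\t".join(
    ["chr12", "6018443", ".", "G", "A", "2076.64", ".", ".",
     "GT:AD:DP:GQ:PL", "0/1:168,105:273:99:2084,0,3657"])
empty_vcf = "##fileformat=VCFv4.2\n"  # stands in for a run with no record at the site

for label, text in [("dragen", dragen_vcf), ("default", default_vcf), ("missing", empty_vcf)]:
    print(label, genotype_at(text, "chr12", 6018443))
```

Running this over each combination of sequencing run and caller setting gives a compact table of where the call is made, where it is emitted as 0/0, and where it is absent.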
-
Hi Luc Marchand
Recent work on the HaplotypeCaller engine adds a pileup-based caller alongside local assembly and realignment. DRAGEN uses both methods simultaneously to generate its calls, but these improvements have only recently started making their way into the open-source GATK HaplotypeCaller. It may be worth trying the latest version of the tools to see whether you get these calls.
--pileup-detection <Boolean> If enabled, the variant caller will create pileup-based haplotypes in addition to the assembly-based haplotype generation. Default value: false. Possible values: {true, false}
I hope this helps.
-
Hello Luc Marchand.
I would like to add one more suggestion for recovering these variants. The recent GATK 4.5.0.0 release includes a lot of work on improving the accuracy and compatibility of the DRAGEN-GATK codebase. As part of that work we added a new argument to support it: --dragen-378-concordance-mode
This includes a significantly improved version of the "pileup-detection" mode that SkyWarrior suggested as well as other changes and bugfixes that should help improve the accuracy of calling overall. It is more likely to output the variant in your case.
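To try the two suggestions side by side, the original command can be rebuilt with each switch added in turn. The sketch below only prints the commands and does not run GATK. The helper name and the output file names are hypothetical, the input paths are copied from the original post, and the switches assume a GATK release that has them (4.5.0.0 for --dragen-378-concordance-mode). Passing "true" to --dragen-378-concordance-mode and keeping --dragen-mode alongside it are both assumptions to check against the tool documentation for your version.

```python
import shlex


def haplotypecaller_cmd(out_vcf, extra_args=()):
    """Build (but do not run) a HaplotypeCaller command based on the original post."""
    cmd = [
        "gatk", "HaplotypeCaller",
        "-R", "Homo_sapiens_assembly38.fasta",
        "-I", "HM230077.bam",
        "-O", out_vcf,
        "--dragen-mode", "true",
        "--dragstr-params-path", "HM230077_dragstr_model.txt",
    ]
    cmd.extend(extra_args)
    return cmd


# Output names are hypothetical; one VCF per setting so the runs can be compared.
variants = {
    "baseline": haplotypecaller_cmd("HM230077.dragen.vcf"),
    "pileup": haplotypecaller_cmd(
        "HM230077.pileup.vcf", ["--pileup-detection", "true"]),
    "concordance": haplotypecaller_cmd(
        "HM230077.concordance.vcf", ["--dragen-378-concordance-mode", "true"]),
}

for name, cmd in variants.items():
    print(f"# {name}")
    print(shlex.join(cmd))
```

Writing each run to its own VCF makes it straightforward to check afterwards whether chr12:6018443 is called in each output.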
-
Thanks both for the answers. I will test those new switches / features as soon as I can.