GVCF in DRAGEN mode and GenomicsDBImport
Hi,
I'm running HaplotypeCaller (HC) in DRAGEN mode (using the broadinstitute/gatk:4.6.1.0 Docker image):
gatk --java-options -Xmx4g HaplotypeCaller \
-R <reference> \
-I <input_bam> \
-O <output_gvcf> \
--dragen-mode true \
-ERC GVCF \
--native-pair-hmm-threads 1 \
-L <interval> \
--dragstr-params-path <dragstr_path> \
--alleles IRPv1.2.forcegt.sites.all.vcf.gz
(I'm using --alleles to force calls at the sites used for downstream imputation, as DRAGEN does on BaseSpace.)
With those GVCFs (15 of them), I'm running:
gatk --java-options -Xmx8g GenomicsDBImport \
--genomicsdb-workspace-path <workspace_path> \
-L <interval> \
--arguments_file <args_file>
where args_file contains 15 lines of the form -V <gvcf_path>.
I'm getting the following error message:
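One way to build that arguments file, as a minimal sketch: it assumes (hypothetically) that all per-sample GVCFs sit in one directory and end in .g.vcf.gz, and the helper name make_args_file is illustrative only.

```bash
# Hypothetical helper: write one "-V <gvcf>" line per GVCF into a GATK
# arguments file. Assumes the GVCFs share one directory and a .g.vcf.gz suffix.
make_args_file() {
  local gvcf_dir="$1" args_file="$2" gvcf
  : > "$args_file"                       # create or truncate the output file
  for gvcf in "$gvcf_dir"/*.g.vcf.gz; do
    [ -e "$gvcf" ] || continue           # glob matched nothing: skip the literal pattern
    printf '%s %s\n' -V "$gvcf" >> "$args_file"
  done
}
```

Usage: make_args_file gvcfs args_file, then pass --arguments_file args_file to GenomicsDBImport as in the command above.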
In file/stream <STREAM NAME>, at contig <CHR>, position <POS>, for sample <SAMPLE NAME>, the field DRAGstrInfo has 1 elements; expected 2
Looking at the header of the GVCF files, I see:
##INFO=<ID=DRAGstrInfo,Number=2,Type=Integer,Description="Indicates the period and repeat count">
But in the variant records it is:
chr1 45521 . G GA,<NON_REF> . . DP=6;DRAGstrInfo=1;DRAGstrParams=20.00;ExcessHet=0.00;RAW_MQandDP=2038,6
chr1 61350 . TA T,<NON_REF> . . BaseQRankSum=0.842;DP=7;DRAGstrInfo=1;DRAGstrParams=10.00;ExcessHet=0.00;MQ
So DRAGstrInfo is indeed a single number in the records but is defined with two values in the header. DRAGstrParams has the same problem (only 3 instead of 2).
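To find this kind of mismatch across many GVCFs, a small awk sketch can compare each INFO value count against the fixed Number declared in the header. It assumes an uncompressed, tab-delimited VCF on stdin and only checks integer Number values (A, R, G and "." are skipped); check_info_counts is a hypothetical helper name.

```bash
# Hypothetical helper: report INFO fields whose value count differs from the
# integer Number=<n> declared in the header. Reads an uncompressed VCF on stdin.
check_info_counts() {
  awk -F'\t' '
    /^##INFO=<ID=/ {
      # Extract the ID and the Number value from the header line.
      id = $0;  sub(/^##INFO=<ID=/, "", id); sub(/,.*/, "", id)
      num = $0; sub(/.*,Number=/, "", num);  sub(/,.*/, "", num)
      if (num ~ /^[0-9]+$/) expected[id] = num + 0
      next
    }
    /^#/ { next }                        # skip other header lines
    {
      n = split($8, kv, ";")             # column 8 is INFO
      for (i = 1; i <= n; i++) {
        key = kv[i]; val = ""
        eq = index(key, "=")
        if (eq > 0) { val = substr(key, eq + 1); key = substr(key, 1, eq - 1) }
        # Skip undeclared keys, non-integer Numbers and flags (Number=0).
        if (!(key in expected) || expected[key] == 0) continue
        got = split(val, parts, ",")
        if (got != expected[key])
          printf "%s:%s %s has %d value(s), header Number=%d\n", $1, $2, key, got, expected[key]
      }
    }'
}
```

For a bgzipped GVCF, something like gzip -dc sample.g.vcf.gz | check_info_counts | head would list the first offending records.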
I've been able to work around the issue by removing DRAGstrInfo and DRAGstrParams using bcftools.
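The bcftools workaround might look roughly like this, assuming bcftools is on PATH and the GVCFs are bgzipped; the function name and output paths are hypothetical. The output is re-indexed because GenomicsDBImport expects indexed inputs.

```bash
# Hypothetical helper: drop the two DRAGEN-specific INFO tags, write bgzipped
# output (-Oz), then create a tabix (.tbi) index for GenomicsDBImport.
strip_dragstr_info() {
  local in_gvcf="$1" out_gvcf="$2"
  bcftools annotate \
    -x INFO/DRAGstrInfo,INFO/DRAGstrParams \
    -Oz -o "$out_gvcf" \
    "$in_gvcf"
  bcftools index -t "$out_gvcf"
}

# Example loop (hypothetical directory layout):
# for g in gvcfs/*.g.vcf.gz; do
#   strip_dragstr_info "$g" "stripped/$(basename "$g")"
# done
```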
On a separate note, I wasn't able to get HaplotypeCaller to produce a GVCF (-ERC GVCF) with --dragen-378-concordance-mode, but I no longer have the exact logs.
The run failed with an error that went away when I used the exact same command line with --dragen-mode true instead.
(I can try to reproduce it if that would help.)
-
Certain DRAGEN INFO fields are known to cause problems with respect to the VCF standard, so removing them is probably your only option for getting your files to work.
Separately, the inability to produce a GVCF in DRAGEN concordance mode is a known issue: PDHMM mode is currently not compatible with GVCF output. We are working on a fix. We must add, however, that true concordance mode is not possible without PDHMM, since the fidelity of the pileup-detection mode relies on it. If you are only interested in calling known sites per sample, you can disable GVCF output, run DRAGEN concordance mode to produce per-sample VCFs, and combine those VCFs for your analysis.
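A rough sketch of that suggestion: call each sample in concordance mode without -ERC GVCF, then merge the per-sample VCFs. The function names, the argument order, and the use of bcftools merge for the combining step are assumptions, not part of the reply; the remaining HaplotypeCaller flags are copied from the original command.

```bash
# Hypothetical helper: one sample, DRAGEN 3.7.8 concordance mode, plain VCF
# output (no -ERC GVCF, since PDHMM does not support GVCF output yet).
call_concordance_vcf() {
  local ref="$1" bam="$2" dragstr="$3" sites="$4" interval="$5" out_vcf="$6"
  gatk --java-options -Xmx4g HaplotypeCaller \
    -R "$ref" \
    -I "$bam" \
    -O "$out_vcf" \
    --dragen-378-concordance-mode \
    --dragstr-params-path "$dragstr" \
    --alleles "$sites" \
    -L "$interval"
}

# Hypothetical helper: combine indexed per-sample VCFs into one multi-sample
# VCF (HaplotypeCaller indexes .vcf.gz output by default).
merge_sample_vcfs() {
  local out_vcf="$1"; shift
  bcftools merge -Oz -o "$out_vcf" "$@"
  bcftools index -t "$out_vcf"
}
```

Without GVCF reference blocks, a sample with no record at a site comes out as missing (./.) in the merged file rather than hom-ref; bcftools merge has a --missing-to-ref (-0) option if assuming hom-ref is acceptable for the data.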
I hope this helps.
Regards.
-
Hi Gökalp Çelik
Thanks!
I'm actually hoping to get both the forced known sites and the regular active-region discovery results. I thought that's what passing --alleles did?
I'm calling a number of lower-coverage genomes and was hoping to do joint calling followed by imputation on the joint call set. The current number of genomes is small, but it will grow into the thousands.
-
Hi again.
If joint calling is on your list, the current PDHMM implementation will not work for you. You may need to use the basic DRAGEN-compatibility parameters (e.g. --dragen-mode true) rather than the concordance-mode parameters.
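For the joint-calling route, a hedged sketch of the step after GenomicsDBImport: keep the per-sample HaplotypeCaller command with --dragen-mode true -ERC GVCF (no --dragen-378-concordance-mode), then genotype the workspace with GenotypeGVCFs through a gendb:// URI. The function name and memory setting are illustrative, and no DRAGEN-specific GenotypeGVCFs options are assumed.

```bash
# Hypothetical helper: joint-genotype a GenomicsDB workspace built by
# GenomicsDBImport from --dragen-mode GVCFs.
genotype_workspace() {
  local ref="$1" workspace="$2" interval="$3" out_vcf="$4"
  gatk --java-options -Xmx8g GenotypeGVCFs \
    -R "$ref" \
    -V "gendb://$workspace" \
    -L "$interval" \
    -O "$out_vcf"
}

# Example (placeholders as in the thread):
# genotype_workspace <reference> <workspace_path> <interval> joint.vcf.gz
```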
Regards.