GenomicsDBImport just gives IntelInflater - Zero Bytes Written : 0 and GenotypeGVCFs becomes blank vcf
If you are seeing an error, please provide(REQUIRED) :
a) GATK version used: 4.2.1.0
b) Exact command used:
/gatk/gatk --java-options "-Xmx4g -Xms4g" \
GenomicsDBImport \
--genomicsdb-workspace-path UKBB \
--batch-size 500 \
-L tp53.gata2.canonical.splice.1_index.interval_list \
--sample-name-map map.sample_map \
--tmp-dir tmp \
--reader-threads 20
c) Entire error log:
I get this for every sample in `map.sample_map`
05:09:37.541 WARN IntelInflater - Zero Bytes Written : 0
05:09:37.550 WARN IntelInflater - Zero Bytes Written : 0
05:09:37.558 WARN IntelInflater - Zero Bytes Written : 0
05:09:37.566 WARN IntelInflater - Zero Bytes Written : 0
05:09:37.574 WARN IntelInflater - Zero Bytes Written : 0
05:09:37.583 WARN IntelInflater - Zero Bytes Written : 0
05:09:37.591 WARN IntelInflater - Zero Bytes Written : 0
05:09:37.600 WARN IntelInflater - Zero Bytes Written : 0
05:09:37.609 WARN IntelInflater - Zero Bytes Written : 0
05:09:37.618 WARN IntelInflater - Zero Bytes Written : 0
05:09:37.627 WARN IntelInflater - Zero Bytes Written : 0
05:09:37.636 WARN IntelInflater - Zero Bytes Written : 0
05:09:37.645 WARN IntelInflater - Zero Bytes Written : 0
If not an error, choose a category for your question(REQUIRED):
I am not getting anything written to my GenomicsDB and so when I run GenotypeGVCFs I have a blank VCF. I am trying to just run it on GVCFs with variants for just 2 genes.
-
Could you share the complete program log?
-
Hi Genevieve,
I have attached the log from bsub command. I have also attached an example of one of the sample VCF files.
Log file: https://wustl.box.com/s/wqierf165hth9vf17ng4vjlr0agp6loa
GVCF1: https://wustl.box.com/s/hbawc84mk408bp8n1wvr42llqxaf6syb
GVCF2: https://wustl.box.com/s/ktzvvczlk4jqke9q1xzszsm7zudti037
-
I should note that I did not know you could split multiallelics before running GenomicsDBImport, so I had to reverse the bcftools multiallelic split to be able to run it.
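As a rough aid for anyone checking a GVCF that has been through splitting and re-joining: HaplotypeCaller in `-ERC GVCF` mode normally lists `<NON_REF>` among the ALT alleles of every record, so records that lack it can hint that the file was altered. The function below is only a sketch under that assumption; `count_records_without_non_ref` is a hypothetical helper name, not a GATK or bcftools command, and it expects an uncompressed, tab-delimited VCF on stdin.

```shell
# Hypothetical helper (not part of GATK or bcftools): count data lines whose
# ALT column (column 5) does not list the <NON_REF> allele.
count_records_without_non_ref() {
  awk -F'\t' '
    /^#/ { next }                         # skip header lines
    index($5, "<NON_REF>") == 0 { n++ }   # ALT has no <NON_REF> allele
    END { print n + 0 }                   # print 0 when nothing matched
  '
}
```

For a compressed file it could be fed with something like `gzip -dc sample.g.vcf.gz | count_records_without_non_ref`, where `sample.g.vcf.gz` is a placeholder name. A non-zero count is only a hint that the records were changed, not proof of what changed them.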
-
I can't access those files without making an account. If the program log is too long to be pasted, could you paste the part of the program log with the stack trace? I don't think these warning messages are necessarily related to the problem.
If that isn't possible, you can upload a bug report to our FTP site: https://gatk.broadinstitute.org/hc/en-us/articles/360035889671
Thank you!
-
Hi Genevieve,
The issue was that I had normalized the variants with bcftools, and un-normalizing does not give you back the same file. It works now, but I have a follow-up question: my variants that have a non-reference allele are missing four INFO fields (QD, FS, SOR, and MQ) that are used for hard filtering here: https://gatk.broadinstitute.org/hc/en-us/articles/360035890471-Hard-filtering-germline-short-variants
Is this because I used the below group annotation (-G) parameters in my call to HaplotypeCaller?
$ gatk HaplotypeCaller --java-options "-Xmx32g" \
-R GRCh38_full_analysis_set_plus_decoy_hla.fa \
-I ${sample}_23153_0_0.bqsr.bam \
-O ${sample}_23153_0_0.vcf.gz \
-L tp53.gata2.canonical.splice.1_index.interval_list \
-ERC GVCF \
-G AS_StandardAnnotation \
-G StandardAnnotation
Sorry, here is an example of my variant:
chr3 128486117 . G C,<NON_REF> 867.64 PASS AS_RAW_BaseQRankSum=|0.1,1|NaN;AS_RAW_MQ=140400.00|97200.00|0.00;AS_RAW_MQRankSum=|0.0,1|NaN;AS_RAW_ReadPosRankSum=|0.8,1|NaN;AS_SB_TABLE=15,24|15,12|0,0;BaseQRankSum=0.112;DP=69;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=0.000;RAW_MQandDP=248400,69;ReadPosRankSum=0.855 GT:AD:DP:GQ:PL:SB 0/1:39,27,0:66:99:875,0,1379,992,1461,2453:15,24,15,12
-
OK, so they are in the GenotypeGVCFs output after joint genotyping. So are these hard filters meant to be applied at the population level rather than at the individual level from HaplotypeCaller? If so, it may be a good idea to add a sentence about this to the hard-filtering documentation.
-
I'm glad that you were able to get to the bottom of your issue!
Yes, these annotations are at the population level: they are used to filter out sites whose data cannot be trusted to give accurate results. Even with VQSR, it is variant sites that are filtered out, not an individual sample's data. I'll pass that along as a note to our documentation team, though, to hopefully spare some people from this confusion in the future!
We also have documentation about these annotations, which can be found here: https://gatk.broadinstitute.org/hc/en-us/articles/4409907944219--Tool-Documentation-Index#VariantAnnotations
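To see at a glance whether a given VCF already carries the four annotations discussed in this thread, a small check over the INFO column can help. The sketch below is not a GATK tool: `missing_hard_filter_annotations` is a hypothetical helper name, and it assumes an uncompressed, tab-delimited VCF on stdin. For each record it prints the position and whichever of QD, FS, SOR and MQ are absent from INFO, or `none` if all four are present.

```shell
# Hypothetical helper (not part of GATK): for each VCF data line on stdin,
# report which of QD, FS, SOR and MQ are absent from the INFO column (column 8).
missing_hard_filter_annotations() {
  awk -F'\t' '
    /^#/ { next }                        # skip header lines
    {
      split("", seen)                    # reset the set of INFO keys
      n = split($8, fields, ";")
      for (i = 1; i <= n; i++) {
        split(fields[i], kv, "=")        # kv[1] is the INFO key
        seen[kv[1]] = 1
      }
      missing = ""
      m = split("QD FS SOR MQ", wanted, " ")
      for (j = 1; j <= m; j++)
        if (!(wanted[j] in seen))
          missing = missing (missing == "" ? "" : ",") wanted[j]
      # print CHROM:POS, a tab, then the missing keys (or "none")
      print $1 ":" $2 "\t" (missing == "" ? "none" : missing)
    }
  '
}
```

Run on a per-sample GVCF record like the one posted above, this would be expected to list all four keys as missing, since keys such as `MQRankSum` or `RAW_MQandDP` are matched exactly and do not count as `MQ`. A usage along the lines of `gzip -dc joint.vcf.gz | missing_hard_filter_annotations` (with `joint.vcf.gz` as a placeholder name) applies it to a compressed file.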