GenomicsDBImport error from # fields in AS_UNIQ_ALT_READ_COUNT
I'm trying to combine some gVCFs using GenomicsDBImport and am getting an error on the first gVCF line containing an explicit allele in addition to <NON_REF> (i.e. C,<NON_REF>). The error reads
"FicticiousSampName_stream", at contig "chr21", position 8530392, for sample "FicticiousSampName", the field AS_UNIQ_ALT_READ_COUNT has 1 elements; expected 2
The relevant gVCF line is
chr21 8530392 . A C,<NON_REF> 111.64 . AS_RAW_BaseQRankSum=|-0.8,1|NaN;AS_RAW_MQ=2628.00|3988.00|0.00;AS_RAW_MQRankSum=|0.5,1|NaN;AS_RAW_ReadPosRankSum=|1.2,1|NaN;AS_SB_TABLE=4,0|1,3|0,0;AS_UNIQ_ALT_READ_COUNT=4|0;BaseQRankSum=-0.792;ClippingRankSum=-1.006;DP=10;ExcessHet=3.0103;GQ_MEAN=97.00;LikelihoodRankSum=0.545;MBQ=38,38,0;MFRL=332,211,0;MLEAC=1,0;MLEAF=0.500,0.00;MMQ=27,31,60;MPOS=53,50;MQ0=0;MQRankSum=0.545;NCC=0;NCount=0;RAW_MQandDP=8074,10;REF_BASES=TTTCATTTCAATTAAAAGGAA;ReadPosRankSum=1.282;Samples=FicticiousSampName GT:AD:AF:DP:F1R2:F2R1:GQ:PL:SB 0/1:4,4,0:0.500,0.00:8:1,0,0:3,4,0:97:119,0,97,131,109,240:4,0,1,3
Right now I'm only using 12 gVCFs as a trial run, but I eventually plan to combine around 150 (which is why I'm not using CombineGVCFs).
My problem is similar to the one described here https://gatk.broadinstitute.org/hc/en-us/community/posts/360076974412-CombineGVCFs-splitting-MNPs-into-individual-positions except that I don't have any MNPs, unless "C,<NON_REF>" counts as an MNP. I can be sure the data have no true MNPs because HaplotypeCaller was run with the option "--max-mnp-distance 0."
Any help you could offer would be enormously appreciated. Specs are below. Thanks!
a) GATK version used:
4.1.9.0
b) Exact command used:
gatk-4.1.9.0/gatk --java-options "-Xmx4g -Xms4g" GenomicsDBImport \
--genomicsdb-workspace-path /my/db/path \
--V variants_0_chr21.g.vcf.gz \
--V variants_1_chr21.g.vcf.gz \
--V variants_2_chr21.g.vcf.gz \
--V variants_3_chr21.g.vcf.gz \
--V variants_4_chr21.g.vcf.gz \
--V variants_5_chr21.g.vcf.gz \
--V variants_6_chr21.g.vcf.gz \
--V variants_7_chr21.g.vcf.gz \
--V variants_8_chr21.g.vcf.gz \
--V variants_9_chr21.g.vcf.gz \
--V variants_10_chr21.g.vcf.gz \
--V variants_11_chr21.g.vcf.gz \
--tmp-dir /my/tmp/dir \
-L chr21:1-43000000
c) Entire program log:
Using GATK jar /gatkpath/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx4g -Xms4g -jar /igm/apps/gatk/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar GenomicsDBImport --genomicsdb-workspace-path /my/db/path --V /my/db/pathgap_031222/filt/dbgap_0_chr21.g.vcf.gz --V /my/db/pathgap_031222/filt/dbgap_1_chr21.g.vcf.gz --V /my/db/pathgap_031222/filt/dbgap_2_chr21.g.vcf.gz --V /my/db/pathgap_031222/filt/dbgap_3_chr21.g.vcf.gz --V /my/db/pathgap_031222/filt/dbgap_4_chr21.g.vcf.gz --V /my/db/pathgap_031222/filt/dbgap_5_chr21.g.vcf.gz --V /my/db/pathgap_031222/filt/dbgap_6_chr21.g.vcf.gz --V /my/db/pathgap_031222/filt/dbgap_7_chr21.g.vcf.gz --V /my/db/pathgap_031222/filt/dbgap_8_chr21.g.vcf.gz --V /my/db/pathgap_031222/filt/dbgap_9_chr21.g.vcf.gz --V /my/db/pathgap_031222/filt/dbgap_10_chr21.g.vcf.gz --V /my/db/pathgap_031222/filt/dbgap_11_chr21.g.vcf.gz --tmp-dir /igm/temp/genomicsdb_tmp_030922 -L chr21:1-43000000
16:32:14.470 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Mar 12, 2022 4:32:14 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
16:32:14.601 INFO GenomicsDBImport - ------------------------------------------------------------
16:32:14.602 INFO GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.1.9.0
16:32:14.602 INFO GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
16:32:14.602 INFO GenomicsDBImport - Executing as jeff@jeff_system on Linux v3.10.0-1160.53.1.el7.x86_64 amd64
16:32:14.602 INFO GenomicsDBImport - Java runtime: Eclipse OpenJ9 VM v1.8.0_242-b08
16:32:14.602 INFO GenomicsDBImport - Start Date/Time: March 12, 2022 4:32:14 PM EST
16:32:14.602 INFO GenomicsDBImport - ------------------------------------------------------------
16:32:14.602 INFO GenomicsDBImport - ------------------------------------------------------------
16:32:14.603 INFO GenomicsDBImport - HTSJDK Version: 2.23.0
16:32:14.603 INFO GenomicsDBImport - Picard Version: 2.23.3
16:32:14.603 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
16:32:14.603 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:32:14.603 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:32:14.603 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:32:14.603 INFO GenomicsDBImport - Deflater: IntelDeflater
16:32:14.603 INFO GenomicsDBImport - Inflater: IntelInflater
16:32:14.603 INFO GenomicsDBImport - GCS max retries/reopens: 20
16:32:14.603 INFO GenomicsDBImport - Requester pays: disabled
16:32:14.603 INFO GenomicsDBImport - Initializing engine
16:32:16.127 INFO IntervalArgumentCollection - Processing 43000000 bp from intervals
16:32:16.128 INFO GenomicsDBImport - Done initializing engine
16:32:16.434 INFO GenomicsDBLibLoader - GenomicsDB native library version : 1.3.2-e18fa63
16:32:16.447 INFO GenomicsDBImport - Vid Map JSON file will be written to /my/db/path/vidmap.json
16:32:16.447 INFO GenomicsDBImport - Callset Map JSON file will be written to /my/db/path/callset.json
16:32:16.447 INFO GenomicsDBImport - Complete VCF Header will be written to /my/db/path/vcfheader.vcf
16:32:16.447 INFO GenomicsDBImport - Importing to workspace - /my/db/path
16:32:16.447 INFO ProgressMeter - Starting traversal
16:32:16.447 INFO ProgressMeter - Current Locus Elapsed Minutes Batches Processed Batches/Minute
16:32:17.090 INFO GenomicsDBImport - Importing batch 1 with 12 samples
terminate called after throwing an instance of 'VCF2BinaryException'
what(): VCF2BinaryException : Mismatch in field length and field length descriptor:
Length descriptor in vid/VCF header specifies that field "AS_UNIQ_ALT_READ_COUNT" should contain A element(s).
In file/stream "FicticiousSampName_stream", at contig "chr21", position 8530392, for sample "FicticiousSampName", the field AS_UNIQ_ALT_READ_COUNT has 1 elements; expected 2
JVMDUMP039I Processing dump event "abort", detail "" at 2022/03/12 16:32:18 - please wait.
JVMDUMP032I JVM requested System dump using '/my/script/dir/core.20220312.163218.43926.0001.dmp' in response to an event
JVMDUMP012E Error in System dump: The core file created by child process with pid = 43995 was not found. Expected to find core file with name "/my/script/dir/core.43995"
JVMDUMP032I JVM requested Java dump using '/my/script/dir/javacore.20220312.163218.43926.0002.txt' in response to an event
JVMDUMP010I Java dump written to /my/script/dir/javacore.20220312.163218.43926.0002.txt
JVMDUMP032I JVM requested Snap dump using '/my/script/dir/Snap.20220312.163218.43926.0003.trc' in response to an event
JVMDUMP010I Snap dump written to /my/script/dir/Snap.20220312.163218.43926.0003.trc
JVMDUMP007I JVM Requesting JIT dump using '/my/script/dir/jitdump.20220312.163218.43926.0004.dmp'
JVMDUMP010I JIT dump written to /my/script/dir/jitdump.20220312.163218.43926.0004.dmp
JVMDUMP013I Processed dump event "abort", detail "".
-
Hi Jeff Gaither,
This issue is coming from a bug with the AS_UNIQ_ALT_READ_COUNT annotation. The annotation is not meant for germline use, so you won't need to have it in your files.
Here is another GitHub issue with that annotation; it's currently still open: https://github.com/broadinstitute/gatk/issues/7298. I commented there with your example.
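To illustrate what's going wrong (a minimal sketch, not GATK's actual parser): the error message shows the header declares AS_UNIQ_ALT_READ_COUNT with length descriptor "A", i.e. one comma-separated value per ALT allele, but the value in your record is pipe-delimited ("4|0"), so a comma-based element count comes up short.

```python
# Minimal illustration of why GenomicsDBImport rejects this record:
# the header says AS_UNIQ_ALT_READ_COUNT is Number=A (one value per
# ALT allele), but the value is pipe-delimited, so a comma-based
# parser sees only one element.
alt_field = "C,<NON_REF>"   # ALT column from the gVCF line above
info_value = "4|0"          # AS_UNIQ_ALT_READ_COUNT value from the same line

n_alts = len(alt_field.split(","))       # 2 ALT alleles
n_elements = len(info_value.split(","))  # 1 comma-separated element

print(n_alts, n_elements)  # -> 2 1, matching "has 1 elements; expected 2"
```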
For the rest of your files, I would recommend not adding this annotation. You can just stick to the standard annotations with -G StandardAnnotation and -G AS_StandardAnnotation in HaplotypeCaller.
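For example, a HaplotypeCaller invocation restricted to those annotation groups might look like this (a sketch only; the reference, input, and output paths are placeholders for your own, and I've kept the --max-mnp-distance 0 setting you mentioned):

```shell
# Sketch: emit only the standard annotation groups, which avoids
# AS_UNIQ_ALT_READ_COUNT entirely. Paths and filenames are placeholders.
gatk --java-options "-Xmx4g" HaplotypeCaller \
  -R reference.fasta \
  -I sample.bam \
  -O sample_chr21.g.vcf.gz \
  -ERC GVCF \
  --max-mnp-distance 0 \
  -G StandardAnnotation \
  -G AS_StandardAnnotation \
  -L chr21
```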
Please let me know if you have any more questions.
Best,
Genevieve
-
Hi Genevieve,
Cool, then I'll just cut the AS_UNIQ_ALT_READ_COUNT from future runs. Thanks for your help!
To anyone coming across this issue, removing the annotation from a VCF is as simple as
bcftools annotate -x INFO/AS_UNIQ_ALT_READ_COUNT your.g.vcf.gz -Oz -o fixed.g.vcf.gz
tabix -p vcf fixed.g.vcf.gz
(bcftools writes to stdout by default, so -Oz -o produces a compressed file, and tabix rebuilds the index that GenomicsDBImport needs.)
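And if anyone wants to do this over a whole batch, here's an untested sketch assuming the file-naming pattern from my command above:

```shell
# Untested sketch: strip the annotation from all 12 chr21 gVCFs and
# re-index each result. Filenames follow the pattern used above.
for i in $(seq 0 11); do
  bcftools annotate -x INFO/AS_UNIQ_ALT_READ_COUNT \
    -Oz -o "variants_${i}_chr21.noASuniq.g.vcf.gz" \
    "variants_${i}_chr21.g.vcf.gz"
  tabix -p vcf "variants_${i}_chr21.noASuniq.g.vcf.gz"
done
```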
Jeff
-
No problem! Glad we could figure out the issue.