GATK CombineGVCFs output contains only one chromosome
REQUIRED for all errors and issues:
a) GATK version used:
gatk4/4.4.0.0
b) Exact command used:
gatk CombineGVCFs \
-R ${gencode_path}/hg38.fa \
-V ${data_output}/15650_raw_variants.vcf \
-V ${data_output}/61850_LYG-3_raw_variants.vcf \
-V ${data_output}/752190_raw_variants.vcf \
-V ${data_output}/814546_raw_variants.vcf \
-O ${data_output}/cohort_g_raw_variants.vcf.gz
c) Entire program log:
--------------------------------------------------------------
Apply CombineGVCFs
--------------------------------------------------------------
Using GATK jar /risapps/rhel8/miniconda3/py39_4.12.0/envs/gatk4-4.4.0.0/share/gatk4-4.4.0.0-0/gatk-package-4.4.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /risapps/rhel8/miniconda3/py39_4.12.0/envs/gatk4-4.4.0.0/share/gatk4-4.4.0.0-0/gatk-package-4.4.0.0-local.jar CombineGVCFs -R /rsrch6/scratch/hema_bio-Malignan/fdarbaniyan/gencode/hg38/hg38.fa -V /rsrch6/scratch/hema_bio-Malignan/fdarbaniyan/Sattva_data/WES/P2007014_05162024/Bioinfirmagician_output/results/15650_raw_variants.vcf -V /rsrch6/scratch/hema_bio-Malignan/fdarbaniyan/Sattva_data/WES/P2007014_05162024/Bioinfirmagician_output/results/61850_LYG-3_raw_variants.vcf -V /rsrch6/scratch/hema_bio-Malignan/fdarbaniyan/Sattva_data/WES/P2007014_05162024/Bioinfirmagician_output/results/752190_raw_variants.vcf -V /rsrch6/scratch/hema_bio-Malignan/fdarbaniyan/Sattva_data/WES/P2007014_05162024/Bioinfirmagician_output/results/814546_raw_variants.vcf -O /rsrch6/scratch/hema_bio-Malignan/fdarbaniyan/Sattva_data/WES/P2007014_05162024/Bioinfirmagician_output/results/cohort_g_raw_variants.vcf.gz
11:55:49.507 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/risapps/rhel8/miniconda3/py39_4.12.0/envs/gatk4-4.4.0.0/share/gatk4-4.4.0.0-0/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
11:55:49.549 INFO CombineGVCFs - ------------------------------------------------------------
11:55:49.552 INFO CombineGVCFs - The Genome Analysis Toolkit (GATK) v4.4.0.0
11:55:49.552 INFO CombineGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
11:55:49.552 INFO CombineGVCFs - Executing as fdarbaniyan@ldragon3 on Linux v4.18.0-425.3.1.el8.x86_64 amd64
11:55:49.552 INFO CombineGVCFs - Java runtime: OpenJDK 64-Bit Server VM v17.0.3-internal+0-adhoc..src
11:55:49.552 INFO CombineGVCFs - Start Date/Time: June 25, 2024 at 11:55:49 AM CDT
11:55:49.553 INFO CombineGVCFs - ------------------------------------------------------------
11:55:49.553 INFO CombineGVCFs - ------------------------------------------------------------
11:55:49.553 INFO CombineGVCFs - HTSJDK Version: 3.0.5
11:55:49.553 INFO CombineGVCFs - Picard Version: 3.0.0
11:55:49.553 INFO CombineGVCFs - Built for Spark Version: 3.3.1
11:55:49.554 INFO CombineGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
11:55:49.554 INFO CombineGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
11:55:49.554 INFO CombineGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
11:55:49.554 INFO CombineGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
11:55:49.554 INFO CombineGVCFs - Deflater: IntelDeflater
11:55:49.554 INFO CombineGVCFs - Inflater: IntelInflater
11:55:49.554 INFO CombineGVCFs - GCS max retries/reopens: 20
11:55:49.554 INFO CombineGVCFs - Requester pays: disabled
11:55:49.555 INFO CombineGVCFs - Initializing engine
11:55:49.937 INFO FeatureManager - Using codec VCFCodec to read file file:///rsrch6/scratch/hema_bio-Malignan/fdarbaniyan/Sattva_data/WES/P2007014_05162024/Bioinfirmagician_output/results/15650_raw_variants.vcf
11:55:50.075 INFO FeatureManager - Using codec VCFCodec to read file file:///rsrch6/scratch/hema_bio-Malignan/fdarbaniyan/Sattva_data/WES/P2007014_05162024/Bioinfirmagician_output/results/61850_LYG-3_raw_variants.vcf
11:55:50.349 INFO FeatureManager - Using codec VCFCodec to read file file:///rsrch6/scratch/hema_bio-Malignan/fdarbaniyan/Sattva_data/WES/P2007014_05162024/Bioinfirmagician_output/results/752190_raw_variants.vcf
11:55:50.639 INFO FeatureManager - Using codec VCFCodec to read file file:///rsrch6/scratch/hema_bio-Malignan/fdarbaniyan/Sattva_data/WES/P2007014_05162024/Bioinfirmagician_output/results/814546_raw_variants.vcf
11:55:51.388 INFO CombineGVCFs - Done initializing engine
11:55:51.411 INFO ProgressMeter - Starting traversal
11:55:51.412 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
11:55:51.544 INFO CombineGVCFs - Shutting down engine
[June 25, 2024 at 11:55:51 AM CDT] org.broadinstitute.hellbender.tools.walkers.CombineGVCFs done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=1224736768
org.broadinstitute.hellbender.exceptions.GATKException: Exception thrown at chr1:960463 [VC /rsrch6/scratch/hema_bio-Malignan/fdarbaniyan/Sattva_data/WES/P2007014_05162024/Bioinfirmagician_output/results/15650_raw_variants.vcf @ chr1:960463 Q73.64 of type=SNP alleles=[G*, T] attr={AC=1, AF=0.500, AN=2, BaseQRankSum=-0.967, DP=3, ExcessHet=0.0000, FS=0.000, MLEAC=1, MLEAF=0.500, MQ=60.00, MQRankSum=0.000, QD=24.55, ReadPosRankSum=-0.967, SOR=0.223} GT=GT:AD:DP:GQ:PL 0/1:1,2:3:36:81,0,36 filters=
at org.broadinstitute.hellbender.engine.MultiVariantWalker.lambda$traverse$1(MultiVariantWalker.java:145)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1845)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
at org.broadinstitute.hellbender.engine.MultiVariantWalker.traverse(MultiVariantWalker.java:136)
at org.broadinstitute.hellbender.engine.MultiVariantWalkerGroupedOnStart.traverse(MultiVariantWalkerGroupedOnStart.java:165)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1098)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:149)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:198)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:217)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Caused by: java.lang.IllegalStateException: Key END found in VariantContext field INFO at chr1:959226 but this key isn't defined in the VCFHeader. We require all VCFs to have complete VCF headers by default.
at htsjdk.variant.vcf.VCFEncoder.fieldIsMissingFromHeaderError(VCFEncoder.java:215)
at htsjdk.variant.vcf.VCFEncoder.write(VCFEncoder.java:148)
at htsjdk.variant.variantcontext.writer.VCFWriter.add(VCFWriter.java:250)
at org.broadinstitute.hellbender.tools.walkers.CombineGVCFs.endPreviousStates(CombineGVCFs.java:420)
at org.broadinstitute.hellbender.tools.walkers.CombineGVCFs.createIntermediateVariants(CombineGVCFs.java:229)
at org.broadinstitute.hellbender.tools.walkers.CombineGVCFs.apply(CombineGVCFs.java:174)
at org.broadinstitute.hellbender.engine.MultiVariantWalkerGroupedOnStart.apply(MultiVariantWalkerGroupedOnStart.java:133)
at org.broadinstitute.hellbender.engine.MultiVariantWalkerGroupedOnStart.apply(MultiVariantWalkerGroupedOnStart.java:108)
at org.broadinstitute.hellbender.engine.MultiVariantWalker.lambda$traverse$1(MultiVariantWalker.java:139)
... 21 more
-
Looking at the log you provided, something raised a flag immediately.
This particular record does not look like a GVCF variant context: it lacks the GVCF-specific items, such as the <NON_REF> alternate allele.
chr1:960463 Q73.64 of type=SNP alleles=[G*, T] attr={AC=1, AF=0.500, AN=2, BaseQRankSum=-0.967, DP=3, ExcessHet=0.0000, FS=0.000, MLEAC=1, MLEAF=0.500, MQ=60.00, MQRankSum=0.000, QD=24.55, ReadPosRankSum=-0.967, SOR=0.223} GT=GT:AD:DP:GQ:PL 0/1:1,2:3:36:81,0,36 filters=
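The "Caused by" line points the same way: CombineGVCFs tried to write an END attribute (used by GVCF reference blocks), but END is not defined in your input headers. Can you check whether all your input files are actually GVCFs? You can inspect their header sections and confirm that they all contain the same field definitions.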
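As a rough way to run that check from the shell, here is a heuristic sketch (not a GATK tool). It assumes that a HaplotypeCaller GVCF defines the END key in an ##INFO header line and carries the <NON_REF> allele in its records, while a plain VCF usually has neither. The function names is_gvcf and check_gvcf_inputs are hypothetical.

```shell
#!/usr/bin/env bash
# Heuristic GVCF check (hypothetical helper, not part of GATK).
# Assumption: a HaplotypeCaller -ERC GVCF file defines INFO/END in its header
# and has <NON_REF> alleles in its records; a plain VCF usually has neither.
is_gvcf() {
  local f="$1" reader="cat" n_end n_nonref
  # Decompress .gz files on the fly (bgzip output is gzip-compatible).
  case "$f" in
    *.gz) reader="gzip -cd" ;;
  esac
  # Header lines defining INFO/END; grep -c reads the whole stream, and
  # "|| true" keeps a zero count from tripping `set -e` or `pipefail`.
  n_end=$($reader "$f" 2>/dev/null | grep -c '^##INFO=<ID=END,' || true)
  # Data records (non-header lines) that carry the <NON_REF> allele.
  n_nonref=$($reader "$f" 2>/dev/null | grep -v '^#' | grep -c '<NON_REF>' || true)
  [ "$n_end" -gt 0 ] && [ "$n_nonref" -gt 0 ]
}

# Report each input file as GVCF or not.
check_gvcf_inputs() {
  local f
  for f in "$@"; do
    if is_gvcf "$f"; then
      echo "GVCF:       $f"
    else
      echo "not a GVCF: $f"
    fi
  done
}
```

For example, check_gvcf_inputs "${data_output}"/*_raw_variants.vcf would be expected to list files that were called without -ERC GVCF as "not a GVCF".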
Regards.
-
Hello,
Thank you so much for your quick reply!
They are not GVCFs; they are regular VCFs. Does that cause an issue?
-
CombineGVCFs only accepts files in the GATK GVCF format, so anything that is not a GVCF cannot be combined with it.
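If you do go back to HaplotypeCaller, the per-sample step could look like the minimal sketch below. Running it with -ERC GVCF is what produces the reference blocks (with END), the <NON_REF> alleles and the matching header lines that CombineGVCFs expects. The helper names and the <sample>.bam naming pattern are hypothetical assumptions; adjust them to your own BAM files.

```shell
#!/usr/bin/env bash
# Sketch: re-call each sample with HaplotypeCaller in GVCF mode.
# ASSUMPTION (hypothetical): one BAM per sample, named <sample>.bam in bam_dir.

# Output path for a sample's GVCF (".g.vcf.gz" is the suffix the GATK docs use).
gvcf_path() {
  local outdir="$1" sample="$2"
  printf '%s/%s.g.vcf.gz\n' "$outdir" "$sample"
}

# Run HaplotypeCaller for one sample; -ERC GVCF emits reference-confidence
# blocks (with END) and <NON_REF> alleles instead of variant-only records.
call_gvcf() {
  local ref="$1" bam="$2" out="$3"
  gatk HaplotypeCaller -R "$ref" -I "$bam" -O "$out" -ERC GVCF
}

# Loop over sample names; stop at the first failure.
call_all_samples() {
  local ref="$1" bam_dir="$2" outdir="$3"
  shift 3
  local s
  for s in "$@"; do
    call_gvcf "$ref" "${bam_dir}/${s}.bam" "$(gvcf_path "$outdir" "$s")" || return
  done
}
```

Usage, with a placeholder BAM directory: call_all_samples "${gencode_path}/hg38.fa" /path/to/bams "${data_output}" 15650 61850_LYG-3 752190 814546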
Regards.
-
Thanks again!
Is there any way to combine VCF files in GATK? Or do I have to repeat the previous step (HaplotypeCaller) and create GVCF files?
-
There may be other tools for that, but the results may not be optimal.
I would suggest using bcftools merge.
GATK used to have a tool named CombineVariants in the GATK 3.x era, but it is deprecated and we no longer support it.
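A minimal sketch of the bcftools merge route for plain VCFs like yours, assuming bgzip and bcftools are installed. bcftools merge expects bgzip-compressed, indexed inputs whose sample names do not collide, so each file is compressed and indexed first. One likely reason the result is less than optimal: a plain VCF does not record whether a non-variant site was covered, so a site absent from one sample's VCF ends up as a missing genotype rather than a confident homozygous-reference call. The function name and output filename are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: merge plain (non-GVCF) per-sample VCFs with bcftools merge.
# Hypothetical helper; assumes bgzip and bcftools (htslib) are on PATH and
# that every input has a distinct sample name in its #CHROM header line.
merge_plain_vcfs() {
  if [ "$#" -lt 3 ]; then
    echo "usage: merge_plain_vcfs OUT.vcf.gz IN1.vcf IN2.vcf [IN3.vcf ...]" >&2
    return 2
  fi
  local out="$1"
  shift
  local -a gz=()
  local f
  for f in "$@"; do
    # bcftools merge wants bgzip-compressed, indexed input.
    bgzip -c "$f" > "${f}.gz" || return      # keeps the original .vcf
    bcftools index -t "${f}.gz" || return    # writes ${f}.gz.tbi
    gz+=("${f}.gz")
  done
  # Sample columns are placed side by side; a site missing from one input
  # gets a missing genotype for that sample, not a hom-ref call.
  bcftools merge -O z -o "$out" "${gz[@]}"
}
```

For example: merge_plain_vcfs "${data_output}/cohort_merged.vcf.gz" "${data_output}"/{15650,61850_LYG-3,752190,814546}_raw_variants.vcf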
Regards.
-
Thank you!
-
If I rerun GATK HaplotypeCaller to get GVCF files, what is the best way to turn the combined GVCF into VCF format?
-
Once the GVCFs are collected into a combined GVCF (with CombineGVCFs) or a GenomicsDB workspace (with GenomicsDBImport), you can use the GenotypeGVCFs tool to get a multi-sample VCF file ready for further filtering and analysis.
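Both routes can be sketched as below, assuming per-sample GVCFs from HaplotypeCaller -ERC GVCF. The function names, output prefix and interval list are hypothetical; GenomicsDBImport requires intervals (-L), which for exome data would typically be your capture target list, and it creates the workspace directory itself.

```shell
#!/usr/bin/env bash
# Sketch: joint genotyping of per-sample GVCFs (hypothetical helper names).

# Route 1: CombineGVCFs -> GenotypeGVCFs.
genotype_via_combine() {
  local ref="$1" prefix="$2"
  shift 2
  local -a v_args=()
  local g
  for g in "$@"; do v_args+=(-V "$g"); done
  # Merge the GVCFs into a single cohort GVCF.
  gatk CombineGVCFs -R "$ref" "${v_args[@]}" -O "${prefix}.g.vcf.gz" || return
  # Joint-genotype the cohort GVCF into a multi-sample VCF.
  gatk GenotypeGVCFs -R "$ref" -V "${prefix}.g.vcf.gz" -O "${prefix}.vcf.gz"
}

# Route 2: GenomicsDBImport -> GenotypeGVCFs (workspace read via gendb://).
genotype_via_genomicsdb() {
  local ref="$1" intervals="$2" workspace="$3" out_vcf="$4"
  shift 4
  local -a v_args=()
  local g
  for g in "$@"; do v_args+=(-V "$g"); done
  gatk GenomicsDBImport "${v_args[@]}" \
    --genomicsdb-workspace-path "$workspace" -L "$intervals" || return
  gatk GenotypeGVCFs -R "$ref" -V "gendb://${workspace}" -O "$out_vcf"
}
```

For example: genotype_via_combine "${gencode_path}/hg38.fa" "${data_output}/cohort" "${data_output}"/*.g.vcf.gz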
Regards.