GenotypeGVCFs
I wanted to get a merged.vcf file, but no output file was produced after I ran GenotypeGVCFs.
I am using GATK version gatk-4.2.0.0.
The samples are 35 human short-read WGS datasets.
I created the GenomicsDB workspace with these commands:
$gatk VcfToIntervalList
I=bwa_bam/gvcf/sample1_markdup.addRG.g.vcf
O=sample.interval_list
$gatk GenomicsDBImport \
--genomicsdb-workspace-path bwa_bam/GenomicDB \
--intervals bwa_bam/sample.interval_list \
--sample-name-map bwa_bam/gvcf_sample_name.txt
After that, I ran this command and got this log:
$gatk GenotypeGVCFs \
-R bwa_index/hs38DH.fa \
-V gendb://GenomicDB \
-O mergedVCF/merged.vcf
Entire program log:
12:02:24.354 INFO GenotypeGVCFs - ------------------------------------------------------------
12:02:24.354 INFO GenotypeGVCFs - The Genome Analysis Toolkit (GATK) v4.2.0.0
12:02:24.355 INFO GenotypeGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
12:02:24.355 INFO GenotypeGVCFs - Executing as hashimoto@gpu02 on Linux v3.10.0-1062.9.1.el7.x86_64 amd64
12:02:24.355 INFO GenotypeGVCFs - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_252-b09
12:02:24.355 INFO GenotypeGVCFs - Start Date/Time: 2022/08/17 12:02:23 JST
12:02:24.355 INFO GenotypeGVCFs - ------------------------------------------------------------
12:02:24.355 INFO GenotypeGVCFs - ------------------------------------------------------------
12:02:24.356 INFO GenotypeGVCFs - HTSJDK Version: 2.24.0
12:02:24.356 INFO GenotypeGVCFs - Picard Version: 2.25.0
12:02:24.356 INFO GenotypeGVCFs - Built for Spark Version: 2.4.5
12:02:24.356 INFO GenotypeGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
12:02:24.356 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
12:02:24.356 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
12:02:24.356 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
12:02:24.356 INFO GenotypeGVCFs - Deflater: IntelDeflater
12:02:24.356 INFO GenotypeGVCFs - Inflater: IntelInflater
12:02:24.356 INFO GenotypeGVCFs - GCS max retries/reopens: 20
12:02:24.356 INFO GenotypeGVCFs - Requester pays: disabled
12:02:24.356 INFO GenotypeGVCFs - Initializing engine
12:02:25.562 INFO GenomicsDBLibLoader - GenomicsDB native library version : 1.3.2-e18fa63
12:02:25.564 INFO GenotypeGVCFs - Shutting down engine
[2022/08/17 12:02:25 JST] org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=1106247680
java.lang.IllegalStateException: There is no genome data stored in the database
at org.genomicsdb.reader.GenomicsDBFeatureReader.<init>(GenomicsDBFeatureReader.java:84)
at org.broadinstitute.hellbender.engine.FeatureDataSource.getGenomicsDBFeatureReader(FeatureDataSource.java:409)
at org.broadinstitute.hellbender.engine.FeatureDataSource.getFeatureReader(FeatureDataSource.java:328)
at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:284)
at org.broadinstitute.hellbender.engine.VariantLocusWalker.initializeDrivingVariants(VariantLocusWalker.java:76)
at org.broadinstitute.hellbender.engine.VariantWalkerBase.initializeFeatures(VariantWalkerBase.java:67)
at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:707)
at org.broadinstitute.hellbender.engine.VariantLocusWalker.onStartup(VariantLocusWalker.java:63)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
-----------------------------
Do you have any ideas on how to solve this problem?
Which file is incorrect?
Sincerely,
-
Hi Yumi Hashi,
Thank you for writing to the GATK forum. I hope that we can help you sort this out.
Looking at your command history, your second command is inconsistent with the first. VcfToIntervalList wrote sample.interval_list to the current directory, which is the parent of bwa_bam. GenomicsDBImport, however, was told to read the interval list from bwa_bam/sample.interval_list, which is one directory below where the file was actually created.
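A quick way to catch this kind of path mismatch before running the import is to check that the interval list exists and is non-empty at the path you pass to --intervals. The following is only a minimal sketch: `require_file` and the `INTERVALS` variable are hypothetical names introduced for illustration and are not part of GATK.

```shell
# Hypothetical helper (not part of GATK): return non-zero if a file
# is missing or empty, so a bad path is caught before a tool runs.
require_file() {
    if [ ! -s "$1" ]; then
        echo "missing or empty: $1" >&2
        return 1
    fi
}

# Define the interval list path once so that the command that writes it
# and the command that reads it cannot disagree.
INTERVALS=bwa_bam/sample.interval_list

# With the original commands this check would fail, because the file
# was written to ./sample.interval_list instead.
if require_file "$INTERVALS"; then
    echo "ok: $INTERVALS"
fi
```

Using the same variable for both O= in VcfToIntervalList and --intervals in GenomicsDBImport keeps the two paths in sync.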
Try rerunning with the corrected commands and see if that works. If not, please let me know, and we can investigate further.
Original:
$gatk VcfToIntervalList
I=bwa_bam/gvcf/sample1_markdup.addRG.g.vcf
O=sample.interval_list
$gatk GenomicsDBImport \
--genomicsdb-workspace-path bwa_bam/GenomicDB \
--intervals bwa_bam/sample.interval_list \
--sample-name-map bwa_bam/gvcf_sample_name.txt
Corrected:
$gatk VcfToIntervalList \
I=bwa_bam/gvcf/sample1_markdup.addRG.g.vcf \
O=bwa_bam/sample.interval_list
$gatk GenomicsDBImport \
--genomicsdb-workspace-path bwa_bam/GenomicDB \
--intervals bwa_bam/sample.interval_list \
--sample-name-map bwa_bam/gvcf_sample_name.txt
Best,
Anthony