GenomicsDBImport: are all intervals being processed or is there an error?
Dear GATK Team,
I am running GenomicsDBImport, with a list of intervals, as part of the (How to) Call somatic mutations using GATK4 Mutect2 workflow to create a panel of normals.
However, ProgressMeter reports only one genomic location in the log, and I am concerned that not all intervals in the interval list are being processed, especially as the tool finishes so quickly. If that is the case, how can I fix it?
GATK version: 4.1.9.0.
Command used:
gatk GenomicsDBImport \
--variant normal1.vcf \
--variant normal2.vcf \
--variant normal3.vcf \
--variant normal4.vcf \
--variant normal5.vcf \
--variant normal6.vcf \
--variant normal7.vcf \
--variant normal8.vcf \
--variant normal9.vcf \
--variant normal10.vcf \
--variant normal11.vcf \
--variant normal12.vcf \
--variant normal13.vcf \
--variant normal14.vcf \
--variant normal15.vcf \
--variant normal16.vcf \
--variant normal17.vcf \
-R ucsc.hg19.fasta \
-L intervals.bed \
--interval-padding 100 \
--genomicsdb-workspace-path pon_db \
--merge-input-intervals true \
--tmp-dir $TMPDIR
General error log:
03:44:21.982 INFO GenomicsDBImport - ------------------------------------------------------------
03:44:21.982 INFO GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.1.9.0
03:44:21.982 INFO GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
03:44:21.985 INFO GenomicsDBImport - Executing as … on Linux v3.10.0-1127.el7.x86_64 amd64
03:44:21.985 INFO GenomicsDBImport - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_92-b14
03:44:21.985 INFO GenomicsDBImport - Start Date/Time: 21 November 2020 03:44:21 GMT
03:44:21.985 INFO GenomicsDBImport - ------------------------------------------------------------
03:44:21.985 INFO GenomicsDBImport - ------------------------------------------------------------
03:44:21.985 INFO GenomicsDBImport - HTSJDK Version: 2.23.0
03:44:21.985 INFO GenomicsDBImport - Picard Version: 2.23.3
03:44:21.985 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
03:44:21.985 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
03:44:21.985 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
03:44:21.985 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
03:44:21.985 INFO GenomicsDBImport - Deflater: IntelDeflater
03:44:21.986 INFO GenomicsDBImport - Inflater: IntelInflater
03:44:21.986 INFO GenomicsDBImport - GCS max retries/reopens: 20
03:44:21.986 INFO GenomicsDBImport - Requester pays: disabled
03:44:21.986 INFO GenomicsDBImport - Initializing engine
03:44:23.758 INFO FeatureManager - Using codec BEDCodec to read file file:///intervals.bed
03:44:23.945 INFO IntervalArgumentCollection - Processing … bp from intervals
03:44:24.178 INFO GenomicsDBImport - Done initializing engine
03:44:24.536 INFO GenomicsDBLibLoader - GenomicsDB native library version : 1.3.2-e18fa63
03:44:24.537 INFO GenomicsDBImport - Vid Map JSON file will be written to /…/pon_db/vidmap.json
03:44:24.537 INFO GenomicsDBImport - Callset Map JSON file will be written to /…/pon_db/callset.json
03:44:24.537 INFO GenomicsDBImport - Complete VCF Header will be written to /…/pon_db/vcfheader.vcf
03:44:24.537 INFO GenomicsDBImport - Importing to workspace - /…/pon_db
03:44:24.537 INFO ProgressMeter - Starting traversal
03:44:24.538 INFO ProgressMeter - Current Locus Elapsed Minutes Batches Processed Batches/Minute
03:44:25.266 INFO GenomicsDBImport - Importing batch 1 with 17 samples
03:44:31.477 INFO GenomicsDBImport - Importing batch 1 with 17 samples
03:44:36.251 INFO GenomicsDBImport - Importing batch 1 with 17 samples
03:44:37.374 INFO GenomicsDBImport - Importing batch 1 with 17 samples
03:44:38.162 INFO GenomicsDBImport - Importing batch 1 with 17 samples
03:44:38.960 INFO GenomicsDBImport - Importing batch 1 with 17 samples
03:44:39.983 INFO GenomicsDBImport - Importing batch 1 with 17 samples
03:44:40.790 INFO GenomicsDBImport - Importing batch 1 with 17 samples
03:44:41.359 INFO GenomicsDBImport - Importing batch 1 with 17 samples
03:44:42.136 INFO GenomicsDBImport - Importing batch 1 with 17 samples
03:44:42.794 INFO GenomicsDBImport - Importing batch 1 with 17 samples
03:44:43.674 INFO GenomicsDBImport - Importing batch 1 with 17 samples
03:44:44.260 INFO GenomicsDBImport - Importing batch 1 with 17 samples
03:44:44.772 INFO GenomicsDBImport - Importing batch 1 with 17 samples
03:44:45.325 INFO GenomicsDBImport - Importing batch 1 with 17 samples
03:44:46.693 INFO GenomicsDBImport - Importing batch 1 with 17 samples
03:44:47.195 INFO GenomicsDBImport - Importing batch 1 with 17 samples
03:44:47.833 INFO GenomicsDBImport - Importing batch 1 with 17 samples
03:44:48.293 INFO GenomicsDBImport - Importing batch 1 with 17 samples
03:44:48.976 INFO GenomicsDBImport - Importing batch 1 with 17 samples
03:44:49.481 INFO GenomicsDBImport - Importing batch 1 with 17 samples
03:44:50.032 INFO GenomicsDBImport - Importing batch 1 with 17 samples
03:44:50.687 INFO GenomicsDBImport - Importing batch 1 with 17 samples
03:44:51.393 INFO ProgressMeter - chr1:11167338 0.4 1 2.2
03:44:51.393 INFO GenomicsDBImport - Done importing batch 1/1
03:44:51.394 INFO ProgressMeter - chr1:11167338 0.4 1 2.2
03:44:51.394 INFO ProgressMeter - Traversal complete. Processed 1 total batches in 0.4 minutes.
03:44:51.394 INFO GenomicsDBImport - Import completed!
03:44:51.394 INFO GenomicsDBImport - Shutting down engine
[21 November 2020 03:44:51 GMT] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.50 minutes.
I also converted the BED file to a Picard-style interval list to see whether that would make a difference, but the log output was the same.
Thank you for your time and help.
Kind regards.
-
Hi ISmolicz, you can check the data in the GenomicsDB with SelectVariants.
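One way to run that check is sketched below. It reuses the reference and workspace names from the command in the question; the output name pon_db_dump.vcf and the helper count_by_contig are hypothetical and only there for illustration. The per-contig count is a rough sanity check, not a full comparison against the interval list.

```shell
# Summarise how many records a VCF holds on each contig.
# (Hypothetical helper, not part of GATK.)
count_by_contig() {
  grep -v '^#' "$1" | cut -f1 | sort | uniq -c
}

# Skip the GATK step on machines where gatk is not on PATH.
if command -v gatk >/dev/null 2>&1; then
  # Read everything stored in the workspace back out as a VCF.
  gatk SelectVariants \
    -R ucsc.hg19.fasta \
    -V gendb://pon_db \
    -O pon_db_dump.vcf

  # Records on several contigs or positions indicate that more than
  # the single locus shown by ProgressMeter was imported.
  count_by_contig pon_db_dump.vcf
fi
```

If the counts cover the contigs listed in intervals.bed, the import most likely processed the whole interval list even though the log showed a single locus.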
-
Thank you for your help Genevieve Brandt. I checked the data and multiple genomic locations are present although only one genomic location was listed in the error log.
However, please may I ask how data for only one sample can be extracted from GenomicsDB using SelectVariants? I used the --sample-name option but the 'tumor_sample' name in the output VCF header was different to the sample I was requesting data for. Is this due to the 'tumor_sample' name automatically being set to the first sample imported into GenomicsDB?
Thank you again.
-
The GATK support team is focused on resolving questions about GATK tool-specific errors and abnormal results from the tools. For all other questions, such as this one, we are building a backlog to work through when we have the capacity.
Please continue to post your questions because we will be mining them for improvements to documentation, resources, and tools.
We cannot guarantee a reply, however, we ask other community members to help out if you know the answer.
For context, check out our support policy.
-
The progress meter does not update at every site, so the GenomicsDB workspace could have been created just fine, depending on the size of your input VCFs. I'm not sure what happened with your sample names. It is probably easiest to explicitly specify the sample names you want in the GenomicsDB by using the --sample-name-map argument.
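A minimal sketch of that approach, assuming the 17 input files from the question. The sample names normal1 to normal17, the file name sample_map.txt, and the workspace name pon_db_named are placeholders chosen for illustration; substitute the names you actually want stored. The map is a tab-separated file with one sample name and one VCF path per line, and --sample-name-map is used in place of the repeated --variant arguments.

```shell
# Build the sample map: sample name, a tab, then the path to that sample's VCF.
: > sample_map.txt
for i in $(seq 1 17); do
  printf 'normal%d\tnormal%d.vcf\n' "$i" "$i" >> sample_map.txt
done

# Skip the GATK step on machines where gatk is not on PATH.
if command -v gatk >/dev/null 2>&1; then
  # Same options as the original command, with --sample-name-map replacing
  # the --variant list and a fresh (hypothetical) workspace path.
  gatk GenomicsDBImport \
    --sample-name-map sample_map.txt \
    -R ucsc.hg19.fasta \
    -L intervals.bed \
    --interval-padding 100 \
    --genomicsdb-workspace-path pon_db_named \
    --merge-input-intervals true \
    --tmp-dir "$TMPDIR"
fi
```

A new workspace path is used here because the import is expected to write into a location that does not already hold a workspace.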