CombineGVCFs slows down at a certain region on chromosome 1
Hi
I have 36 GVCFs (from a non-model arthropod species) and I would like to combine them using CombineGVCFs.
However, at a certain region on chromosome 1, CombineGVCFs becomes extremely slow (progressing by about 1 kb at a time instead of about 500 kb, see below).
I tried running CombineGVCFs on fewer samples (a batch of 6 samples instead of 36), but the problem occurs at exactly the same position. What could be causing this slowdown?
Any help would be much appreciated.
Regards WD
#CombineGVCFs command:
gatk CombineGVCFs -R Reference/uT.fasta --variant output/B7_2.g.vcf.gz --variant output/B7.g.vcf.gz --variant output/B8_7.g.vcf.gz --variant output/B8_5.g.vcf.gz --variant output/B8_17.g.vcf.gz --variant output/B8_9.g.vcf.gz -O RC_batch1.vcf
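The batching described above (6 gVCFs per CombineGVCFs run instead of 36) can be scripted rather than typed by hand. The bash sketch below only prints one command per batch, so the commands can be checked before anything is run. The function name print_batch_commands is hypothetical, and the RC_batchN.vcf output names are an assumption chosen to match the command above; paths containing spaces are not quoted in the printed commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a list of gVCFs into batches of BATCH_SIZE and
# print one CombineGVCFs command per batch. It only prints the commands;
# nothing is executed.
# Usage: print_batch_commands REFERENCE BATCH_SIZE GVCF [GVCF ...]
print_batch_commands() {
  local ref="$1" size="$2"
  shift 2
  (( size > 0 )) || return 1          # a batch size of 0 would loop forever
  local -a files=("$@")
  local i j line batch=1
  for ((i = 0; i < ${#files[@]}; i += size)); do
    line="gatk CombineGVCFs -R $ref"
    # Add up to BATCH_SIZE --variant arguments for this batch.
    for ((j = i; j < i + size && j < ${#files[@]}; j++)); do
      line+=" --variant ${files[j]}"
    done
    # Output names follow the RC_batch1.vcf pattern used above.
    echo "$line -O RC_batch${batch}.vcf"
    batch=$((batch + 1))
  done
}
```

For example, `print_batch_commands Reference/uT.fasta 6 output/*.g.vcf.gz > batches.sh` would write six commands for 36 files, which could then be reviewed and run with `bash batches.sh`.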
#CombineGVCFs progress for 6 samples; the slowdown always starts at position 3729316, regardless of the number of samples used:
19:51:16.950 INFO CombineGVCFs - ------------------------------------------------------------
19:51:16.951 INFO CombineGVCFs - The Genome Analysis Toolkit (GATK) v4.1.6.0
19:51:16.951 INFO CombineGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
19:51:16.951 INFO CombineGVCFs - Executing as xxxxxx@galaxy on Linux v4.4.0-133-generic amd64
19:51:16.951 INFO CombineGVCFs - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_192-b01
19:51:16.952 INFO CombineGVCFs - Start Date/Time: 26 May 2020 19:51:16 CEST
19:51:16.952 INFO CombineGVCFs - ------------------------------------------------------------
19:51:16.952 INFO CombineGVCFs - ------------------------------------------------------------
19:51:16.952 INFO CombineGVCFs - HTSJDK Version: 2.21.2
19:51:16.953 INFO CombineGVCFs - Picard Version: 2.21.9
19:51:16.953 INFO CombineGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
19:51:16.953 INFO CombineGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
19:51:16.953 INFO CombineGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
19:51:16.953 INFO CombineGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
19:51:16.953 INFO CombineGVCFs - Deflater: IntelDeflater
19:51:16.953 INFO CombineGVCFs - Inflater: IntelInflater
19:51:16.953 INFO CombineGVCFs - GCS max retries/reopens: 20
19:51:16.953 INFO CombineGVCFs - Requester pays: disabled
19:51:16.953 INFO CombineGVCFs - Initializing engine
19:51:17.530 INFO FeatureManager - Using codec VCFCodec to read file file:///data/wannesd/output/B7_2.g.vcf.gz
19:51:17.602 INFO FeatureManager - Using codec VCFCodec to read file file:///data/wannesd/output/B7_6.g.vcf.gz
19:51:17.622 INFO FeatureManager - Using codec VCFCodec to read file file:///data/wannesd/output/B8_7.g.vcf.gz
19:51:17.636 INFO FeatureManager - Using codec VCFCodec to read file file:///data/wannesd/output/B8_5.g.vcf.gz
19:51:17.654 INFO FeatureManager - Using codec VCFCodec to read file file:///data/wannesd/output/B8_17.g.vcf.gz
19:51:17.673 INFO FeatureManager - Using codec VCFCodec to read file file:///data/wannesd/output/B8_9.g.vcf.gz
19:51:23.378 INFO CombineGVCFs - Done initializing engine
19:51:23.424 INFO ProgressMeter - Starting traversal
19:51:23.424 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
19:51:33.440 INFO ProgressMeter - pseudochromosome_1:585040 0.2 255000 1527708.4
19:51:43.465 INFO ProgressMeter - pseudochromosome_1:1281402 0.3 537000 1607784.4
19:51:53.480 INFO ProgressMeter - pseudochromosome_1:2202988 0.5 903000 1802635.1
19:52:03.480 INFO ProgressMeter - pseudochromosome_1:3152945 0.7 1250000 1872378.7
19:52:46.745 INFO ProgressMeter - pseudochromosome_1:3729316 1.4 1496000 1077279.4
19:53:33.746 INFO ProgressMeter - pseudochromosome_1:3732320 2.2 1497000 689215.9
19:54:23.073 INFO ProgressMeter - pseudochromosome_1:3736234 3.0 1498000 500311.7
19:55:18.306 INFO ProgressMeter - pseudochromosome_1:3738826 3.9 1499000 382915.7
19:56:08.226 INFO ProgressMeter - pseudochromosome_1:3742604 4.7 1500000 316009.0
19:56:55.211 INFO ProgressMeter - pseudochromosome_1:3745000 5.5 1501000 271439.2
19:57:43.918 INFO ProgressMeter - pseudochromosome_1:3747892 6.3 1502000 236850.6
19:58:26.673 INFO ProgressMeter - pseudochromosome_1:3748622 7.1 1503000 213066.1
19:59:14.785 INFO ProgressMeter - pseudochromosome_1:3750333 7.9 1504000 191446.0
20:00:02.121 INFO ProgressMeter - pseudochromosome_1:3763872 8.6 1505000 174090.1
20:00:53.798 INFO ProgressMeter - pseudochromosome_1:3765013 9.5 1506000 158422.4
20:01:35.450 INFO ProgressMeter - pseudochromosome_1:3766193 10.2 1507000 147738.8
20:02:22.420 INFO ProgressMeter - pseudochromosome_1:3769257 11.0 1508000 137299.8
20:03:04.423 INFO ProgressMeter - pseudochromosome_1:3772624 11.7 1509000 129158.7
20:03:49.334 INFO ProgressMeter - pseudochromosome_1:3776483 12.4 1510000 121462.4
20:04:44.180 INFO ProgressMeter - pseudochromosome_1:3780777 13.3 1511000 113218.0
20:05:27.668 INFO ProgressMeter - pseudochromosome_1:3784815 14.1 1512000 107457.2
20:06:18.646 INFO ProgressMeter - pseudochromosome_1:3789093 14.9 1513000 101405.0
20:07:10.366 INFO ProgressMeter - pseudochromosome_1:3797024 15.8 1514000 95929.8
20:07:45.728 INFO ProgressMeter - pseudochromosome_1:3798075 16.4 1515000 92537.6
20:08:20.466 INFO ProgressMeter - pseudochromosome_1:3799179 17.0 1516000 89435.9
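In the log above, each progress update before position 3729316 covers hundreds of thousands of records, while after it each update covers about 1,000 records in roughly 50 seconds. One thing that may be worth checking (an assumption, not something established in this thread) is whether the records in that window look different from those elsewhere, for example in the number of alternate alleles per record. The hypothetical bash function region_stats below counts the records that start inside a window and reports the largest ALT allele count per file. It decompresses each file with gzip, which also reads bgzip-compressed gVCFs, and scans the whole file.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic: for each gVCF, count the records whose POS lies in
# [START, END] on CONTIG and report the largest number of ALT alleles seen in
# that window (the count includes the <NON_REF> symbolic allele).
# Reference blocks that start before START are not counted.
# Output: one "file<TAB>records<TAB>max_alt_alleles" line per input file.
# Usage: region_stats CONTIG START END GVCF [GVCF ...]
region_stats() {
  local contig="$1" start="$2" end="$3" f
  shift 3
  for f in "$@"; do
    gzip -dc "$f" | awk -F'\t' -v c="$contig" -v s="$start" -v e="$end" -v name="$f" '
      /^#/ { next }                          # skip header lines
      $1 == c && $2 + 0 >= s + 0 && $2 + 0 <= e + 0 {
        n++
        k = split($5, alts, ",")             # ALT column, comma-separated
        if (k > maxalt) maxalt = k
      }
      END { printf "%s\t%d\t%d\n", name, n, maxalt }'
  done
}
```

For example, `region_stats pseudochromosome_1 3729316 3799179 output/*.g.vcf.gz` covers the slow window from the log, and running it on a window of similar size before the slowdown (say 3000000 to 3070000) gives something to compare against. If the files are tabix-indexed, `tabix file.g.vcf.gz pseudochromosome_1:3729316-3799179` extracts the same region without reading the whole file.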
-
Update on the problem above.
Since neither smaller batches nor restricting the run to a single chromosome (with the intervals option, -L) sped up CombineGVCFs, I decided to try GenomicsDBImport instead.
I successfully created a GenomicsDB workspace for the 36 samples on chromosome 1 with the following command:
gatk GenomicsDBImport \
-V output/B7_2.g.vcf.gz \
-V (34 samples)... \
-V output/B8_7.g.vcf.gz \
--genomicsdb-workspace-path RC_chr1 \
--intervals chr_1
and would now like to run GenotypeGVCFs on this GenomicsDB workspace with the following command:
gatk GenotypeGVCFs \
-R Reference/uT.fasta \
-V gendb://RC_chr1 \
-O RC_chr1_36samples.vcf
but I get the following error:
22:57:31.195 INFO GenotypeGVCFs - ------------------------------------------------------------
22:57:31.196 INFO GenotypeGVCFs - The Genome Analysis Toolkit (GATK) v4.1.6.0
22:57:31.196 INFO GenotypeGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
22:57:31.196 INFO GenotypeGVCFs - Executing as xxxxx@galaxy on Linux v4.4.0-133-generic amd64
22:57:31.196 INFO GenotypeGVCFs - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_192-b01
22:57:31.196 INFO GenotypeGVCFs - Start Date/Time: 26 May 2020 22:57:30 CEST
22:57:31.196 INFO GenotypeGVCFs - ------------------------------------------------------------
22:57:31.197 INFO GenotypeGVCFs - ------------------------------------------------------------
22:57:31.197 INFO GenotypeGVCFs - HTSJDK Version: 2.21.2
22:57:31.197 INFO GenotypeGVCFs - Picard Version: 2.21.9
22:57:31.198 INFO GenotypeGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
22:57:31.198 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
22:57:31.198 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
22:57:31.198 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
22:57:31.198 INFO GenotypeGVCFs - Deflater: IntelDeflater
22:57:31.198 INFO GenotypeGVCFs - Inflater: IntelInflater
22:57:31.198 INFO GenotypeGVCFs - GCS max retries/reopens: 20
22:57:31.198 INFO GenotypeGVCFs - Requester pays: disabled
22:57:31.198 INFO GenotypeGVCFs - Initializing engine
22:57:32.260 INFO GenotypeGVCFs - Shutting down engine
[26 May 2020 22:57:32 CEST] org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=2313682944
***********************************************************************
A USER ERROR has occurred: Couldn't create GenomicsDBFeatureReader
***********************************************************************
Also, assuming I manage to run GenotypeGVCFs on chr1, how do I combine the resulting VCF with the VCFs for the other chromosomes?
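One way to combine per-chromosome call sets is GatherVcfs, the Picard tool bundled with GATK4, which concatenates VCFs that share the same samples and header. It expects the inputs to be listed in reference (sequence dictionary) order and not to overlap; bcftools concat is an alternative. The bash sketch below, with a hypothetical function name, builds the GatherVcfs command from the files in the order given and does not sort them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: concatenate per-chromosome VCFs with GatherVcfs.
# The inputs are passed in the order given; they are NOT sorted, so list
# them in the same order as the contigs in the reference.
# Set DRY_RUN=1 to print the command instead of running it.
# Usage: gather_vcfs OUTPUT VCF [VCF ...]
gather_vcfs() {
  local out="$1" v
  shift
  local -a cmd=(gatk GatherVcfs)
  for v in "$@"; do
    cmd+=(-I "$v")
  done
  cmd+=(-O "$out")
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, `DRY_RUN=1 gather_vcfs RC_all_36samples.vcf RC_chr1_36samples.vcf RC_chr2_36samples.vcf` prints the command without running it; RC_chr2_36samples.vcf and RC_all_36samples.vcf are placeholder names.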
Any help is more than welcome!
Thank you in advance.
Regards
Wannes
PS: I also tried the command below, based on the Gotchas/forum posts, but it did not work either:
TILEDB_DISABLE_FILE_LOCKING=1 gatk --java-options "-Xmx30g" GenotypeGVCFs \
-R Reference/uT.fasta \
-V gendb://RC_chr1 \
-O RC_chr1_36samples.vcf
-
Update on the problem above:
I added "-DGATK_STACKTRACE_ON_USER_EXCEPTION=true" to --java-options.
It turns out that "A USER ERROR has occurred: Couldn't create GenomicsDBFeatureReader"
is caused by:
"Caused by: java.io.IOException: GenomicsDB JNI Error: GenomicsDBConfigException : Syntax error in JSON file"
Any suggestions are more than welcome!
***********************************************************************
A USER ERROR has occurred: Couldn't create GenomicsDBFeatureReader
***********************************************************************
org.broadinstitute.hellbender.exceptions.UserException: Couldn't create GenomicsDBFeatureReader
at org.broadinstitute.hellbender.engine.FeatureDataSource.getGenomicsDBFeatureReader(FeatureDataSource.java:410)
at org.broadinstitute.hellbender.engine.FeatureDataSource.getFeatureReader(FeatureDataSource.java:326)
at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:282)
at org.broadinstitute.hellbender.engine.VariantLocusWalker.initializeDrivingVariants(VariantLocusWalker.java:76)
at org.broadinstitute.hellbender.engine.VariantWalkerBase.initializeFeatures(VariantWalkerBase.java:67)
at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:706)
at org.broadinstitute.hellbender.engine.VariantLocusWalker.onStartup(VariantLocusWalker.java:63)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:137)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
at org.broadinstitute.hellbender.Main.main(Main.java:292)
Caused by: java.io.IOException: GenomicsDB JNI Error: GenomicsDBConfigException : Syntax error in JSON file /data/wannesd/RG_chr1/callset.json
at org.genomicsdb.reader.GenomicsDBQueryStream.jniGenomicsDBInit(Native Method)
at org.genomicsdb.reader.GenomicsDBQueryStream.<init>(GenomicsDBQueryStream.java:209)
at org.genomicsdb.reader.GenomicsDBQueryStream.<init>(GenomicsDBQueryStream.java:182)
at org.genomicsdb.reader.GenomicsDBQueryStream.<init>(GenomicsDBQueryStream.java:91)
at org.genomicsdb.reader.GenomicsDBFeatureReader.generateHeadersForQuery(GenomicsDBFeatureReader.java:176)
at org.genomicsdb.reader.GenomicsDBFeatureReader.<init>(GenomicsDBFeatureReader.java:80)
at org.broadinstitute.hellbender.engine.FeatureDataSource.getGenomicsDBFeatureReader(FeatureDataSource.java:407)
... 12 more
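The root cause reported here is a syntax error in callset.json inside the GenomicsDB workspace. A quick way to confirm which of the workspace's JSON files fail to parse is to run them through any JSON validator; the hypothetical bash function below uses Python's built-in json.tool module (so it assumes python3 is installed) and only reports, it does not repair anything. It may also be worth confirming which workspace is actually being read: the path in the error (/data/wannesd/RG_chr1) is not the same as the workspace name used in the commands above (RC_chr1).

```shell
#!/usr/bin/env bash
# Hypothetical check: report whether each top-level .json file in a
# GenomicsDB workspace parses as valid JSON. python3 -m json.tool is used
# only as a validator; the files are not modified.
# Returns non-zero if any file fails to parse.
# Usage: check_workspace_json WORKSPACE_DIR
check_workspace_json() {
  local ws="$1" f status=0
  for f in "$ws"/*.json; do
    [[ -e "$f" ]] || continue          # no .json files matched the glob
    if python3 -m json.tool "$f" > /dev/null 2>&1; then
      echo "OK    $f"
    else
      echo "ERROR $f"
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_workspace_json RC_chr1` prints OK or ERROR for callset.json and any other .json file at the top of the workspace.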
-
Hi @wd, sorry for the delay.
It appears another user had this issue, and it was caused by the sample names being altered. Does this apply to your situation?
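To check whether the sample names were altered, one option is to list the sample name stored in the #CHROM header line of each gVCF and look for unexpected or duplicated names. The hypothetical bash function below prints name<TAB>path lines. For single-sample gVCFs this matches the tab-separated layout that GenomicsDBImport accepts through --sample-name-map, so the output could also serve as a sample map (an assumption worth checking against the GenomicsDBImport documentation for your GATK version).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "sample_name<TAB>path" for every sample listed
# in the #CHROM header line of each (b)gzipped gVCF, in input order, and
# report duplicated sample names on stderr.
# Usage: sample_names GVCF [GVCF ...] > sample_map.txt
sample_names() {
  local f
  for f in "$@"; do
    # awk stops reading at the #CHROM line; any broken-pipe message from
    # gzip is discarded.
    gzip -dc "$f" 2>/dev/null | awk -F'\t' -v path="$f" '
      /^#CHROM/ { for (i = 10; i <= NF; i++) printf "%s\t%s\n", $i, path; exit }'
  done | awk -F'\t' '
    { print; seen[$1]++ }
    END { for (s in seen) if (seen[s] > 1) print "duplicate sample name: " s > "/dev/stderr" }'
}
```

For example, `sample_names output/*.g.vcf.gz > sample_map.txt` lists the sample name of each of the 36 gVCFs, with any duplicated names reported on stderr.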