Core dumped when using IndexFeatureFile to generate an index for a dbSNP VCF file
Hi,
When trying to generate an index for my known-sites file, I consistently get a core dumped error. I assumed this was a memory issue, so I increased the memory allocation and the number of cores the job runs on, but the issue persists. The error message also appears to show the core being dumped on line 12 of the VCF file. I have included the first 20 lines of the file below. Thanks for your help!
Are there any alternative tools to create an index for dbSNP VCF files?
REQUIRED for all errors and issues:
a) GATK version used: 4.2.6.1
b) Exact command used:
gatk IndexFeatureFile -I ./hg19-v0-Homo_sapiens_assembly19.dbsnp.vcf.gz
c) Entire program log:
Using GATK jar /apps/lib/gatk/4.2.6.1/gatk-package-4.2.6.1-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /apps/lib/gatk/4.2.6.1/gatk-package-4.2.6.1-local.jar IndexFeatureFile --input hg19-v0-Homo_sapiens_assembly19.dbsnp.vcf.gz
13:33:36.480 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/apps/lib/gatk/4.2.6.1/gatk-package-4.2.6.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
13:33:36.951 INFO IndexFeatureFile - ------------------------------------------------------------
13:33:36.952 INFO IndexFeatureFile - The Genome Analysis Toolkit (GATK) v4.2.6.1
13:33:36.952 INFO IndexFeatureFile - For support and documentation go to https://software.broadinstitute.org/gatk/
13:33:36.954 INFO IndexFeatureFile - Executing as dr019@hn003.research.partners.org on Linux v3.10.0-1127.el7.x86_64 amd64
13:33:36.954 INFO IndexFeatureFile - Java runtime: Java HotSpot(TM) 64-Bit Server VM v20.0.2+9-78
13:33:36.954 INFO IndexFeatureFile - Start Date/Time: February 22, 2024, 1:33:36 PM EST
13:33:36.954 INFO IndexFeatureFile - ------------------------------------------------------------
13:33:36.955 INFO IndexFeatureFile - ------------------------------------------------------------
13:33:36.955 INFO IndexFeatureFile - HTSJDK Version: 2.24.1
13:33:36.956 INFO IndexFeatureFile - Picard Version: 2.27.1
13:33:36.956 INFO IndexFeatureFile - Built for Spark Version: 2.4.5
13:33:36.956 INFO IndexFeatureFile - HTSJDK Defaults.COMPRESSION_LEVEL : 2
13:33:36.956 INFO IndexFeatureFile - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
13:33:36.956 INFO IndexFeatureFile - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
13:33:36.956 INFO IndexFeatureFile - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
13:33:36.956 INFO IndexFeatureFile - Deflater: IntelDeflater
13:33:36.956 INFO IndexFeatureFile - Inflater: IntelInflater
13:33:36.957 INFO IndexFeatureFile - GCS max retries/reopens: 20
13:33:36.957 INFO IndexFeatureFile - Requester pays: disabled
13:33:36.957 INFO IndexFeatureFile - Initializing engine
13:33:36.957 INFO IndexFeatureFile - Done initializing engine
13:33:37.326 INFO FeatureManager - Using codec VCFCodec to read file file:///PHShome/dr019/prostate_wes/GATK/vcf_files/hg19-v0-Homo_sapiens_assembly19.dbsnp.vcf.gz
13:33:37.368 INFO ProgressMeter - Starting traversal
13:33:37.369 INFO ProgressMeter - Current Locus Elapsed Minutes Records Processed Records/Minute
13:33:47.375 INFO ProgressMeter - 1:53784785 0.2 919000 5512897.4
13:33:57.389 INFO ProgressMeter - 1:156934848 0.3 2265000 6791604.2
13:34:07.386 INFO ProgressMeter - 1:214246546 0.5 3293000 6582708.6
13:34:17.401 INFO ProgressMeter - 2:24142478 0.7 4371000 6553550.9
13:34:27.391 INFO ProgressMeter - 2:80233935 0.8 5412000 6491803.3
/PHShome/dr019/.lsbatch/1708626809.898580.shell: line 12: 53266 Quit (core dumped) gatk IndexFeatureFile --input hg19-v0-Homo_sapiens_assembly19.dbsnp.vcf.gz
First 20 lines of VCF file:
##fileformat=VCFv4.1
##FILTER=<ID=NC,Description="Inconsistent Genotype Submission For At Least One Sample">
##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
##INFO=<ID=ASP,Number=0,Type=Flag,Description="Is Assembly specific. This is set if the variant only maps to one assembly">
##INFO=<ID=ASS,Number=0,Type=Flag,Description="In acceptor splice site FxnCode = 73">
##INFO=<ID=CDA,Number=0,Type=Flag,Description="Variation is interrogated in a clinical diagnostic assay">
##INFO=<ID=CFL,Number=0,Type=Flag,Description="Has Assembly conflict. This is for weight 1 and 2 variant that maps to different chromosomes on different assemblies.">
##INFO=<ID=CLN,Number=0,Type=Flag,Description="Variant is Clinical(LSDB,OMIM,TPA,Diagnostic)">
##INFO=<ID=DSS,Number=0,Type=Flag,Description="In donor splice-site FxnCode = 75">
##INFO=<ID=G5,Number=0,Type=Flag,Description=">5% minor allele frequency in 1+ populations">
##INFO=<ID=G5A,Number=0,Type=Flag,Description=">5% minor allele frequency in each and all populations">
##INFO=<ID=GCF,Number=0,Type=Flag,Description="Has Genotype Conflict Same (rs, ind), different genotype. N/N is not included.">
##INFO=<ID=GENEINFO,Number=1,Type=String,Description="Pairs each of gene symbol:gene id. The gene symbol and id are delimited by a colon (:) and each pair is delimited by a vertical bar (|)">
##INFO=<ID=GMAF,Number=1,Type=Float,Description="Global Minor Allele Frequency [0, 0.5]; global population is 1000GenomesProject phase 1 genotype data from 629 individuals, released in the 08-04-2010 dataset">
##INFO=<ID=GNO,Number=0,Type=Flag,Description="Genotypes available. The variant has individual genotype (in SubInd table).">
##INFO=<ID=HD,Number=0,Type=Flag,Description="Marker is on high density genotyping kit (50K density or greater). The variant may have phenotype associations present in dbGaP.">
##INFO=<ID=INT,Number=0,Type=Flag,Description="In Intron FxnCode = 6">
##INFO=<ID=KGPROD,Number=0,Type=Flag,Description="1000 Genome production phase">
##INFO=<ID=KGPilot1,Number=0,Type=Flag,Description="1000 Genome discovery(pilot1) 2009">
##INFO=<ID=KGPilot123,Number=0,Type=Flag,Description="1000 Genome discovery all pilots 2010(1,2,3)">
-
The "line 12" in the core dump message refers to line 12 of the job's shell script, which is the gatk command, not to a line in the VCF file. If the JVM really did crash, it may also have left a Java error file (typically named hs_err_pid<pid>.log) in the execution folder. If there is none, we recommend enabling verbose GATK error messages with the option below.
--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true'
Increasing the Java heap size with --java-options "-Xmx<size>" (for example -Xmx8g) could solve the problem if it is a heap size issue. However, this kind of failure can also be caused by the temporary folder assignment. Can you also try changing the temporary folder to another location where you have plenty of space and read/write access?
--tmp-dir /path/to/tmp
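The options above can be combined in a single call. The following is a minimal sketch, not a verified fix: the 8g heap size and /path/to/tmp are placeholder values, and the `run` helper with its `DRY_RUN` switch is a hypothetical wrapper added here only so the command can be inspected before it is executed.

```shell
#!/bin/sh
# Placeholder values (assumptions): adjust the heap size and temp dir for your system.
VCF="hg19-v0-Homo_sapiens_assembly19.dbsnp.vcf.gz"
HEAP="8g"
TMP_DIR="/path/to/tmp"
JAVA_OPTS="-Xmx${HEAP} -DGATK_STACKTRACE_ON_USER_EXCEPTION=true"

# Hypothetical helper: print the command, and run it only when DRY_RUN=0.
run() {
  printf '+ %s\n' "$*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Heap size, stack traces and temp dir all set in one invocation.
run gatk --java-options "$JAVA_OPTS" IndexFeatureFile \
  --input "$VCF" --tmp-dir "$TMP_DIR"
```

With the default DRY_RUN=1 this only prints the command; run the script with DRY_RUN=0 to execute it. The printed line drops the quotes around the Java options, so treat it as a preview rather than something to paste back into a shell.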
If none of these work, you can index a bgzipped VCF file with tabix:
tabix -p vcf vcffile.vcf.gz
Alternatively, you can try the bcftools index command.
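Both tools expect BGZF (bgzip) compression rather than plain gzip, so it can be worth checking the file first. Here is a rough sketch under that assumption: `is_bgzf` and `index_vcf` are hypothetical helper names, and the check only inspects the first 16 bytes for the gzip magic plus the "BC" extra field that bgzip writes.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file starts with a BGZF block header.
is_bgzf() {
  # First 16 bytes as one hex string, e.g. 1f8b0804...060042430200
  hdr=$(od -An -tx1 -N16 "$1" | tr -d ' \n')
  case "$hdr" in
    # gzip magic + FEXTRA flag, 6 bytes we ignore, XLEN=6, 'B' 'C', SLEN=2
    1f8b0804????????????060042430200) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: index only when the compression looks right.
index_vcf() {
  vcf=$1
  if ! is_bgzf "$vcf"; then
    echo "$vcf is not BGZF-compressed; recompress it with bgzip first" >&2
    return 1
  fi
  tabix -p vcf "$vcf"          # writes "$vcf.tbi"
  # or: bcftools index --tbi "$vcf"
}

# Usage (uncomment to run):
# index_vcf hg19-v0-Homo_sapiens_assembly19.dbsnp.vcf.gz
```

If the check fails, the file would need to be recompressed before either indexer accepts it, for example with `gunzip -c in.vcf.gz | bgzip -c > out.vcf.gz` (file names here are placeholders).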
I hope one of these helps.
-
Hi,
Thanks for your reply. I'm going to try generating the index with tabix and will let you know if that works. I'll also try your other suggestions and report back on what works. Thanks!