ApplyVQSR: "[W::bgzf_read_block] EOF marker is absent" and "[tabix] the index file exists." errors
Hi GATK team,
I've been trying to use a Hail Batch VQSR pipeline from Lindo Nkambule, but we've both been stuck on the following errors across multiple similar jobs. Any hints or pointers would be greatly appreciated!
a) GATK version used:
The Genome Analysis Toolkit (GATK) v4.2.6.1
b) Exact command used:
'/bin/bash' '-c' '
set -e
mkdir -p /io/batch/1d7deb/VQSR__ApplyRecalibration-aHIm3
{
{
set -euo pipefail
gatk --java-options "-Xms5g" \
  ApplyVQSR \
  -O tmp.indel.recalibrated.vcf \
  -V gs://vqsr-test/subset_chr1_emptier_vqsr-ready.vcf.bgz \
  --recal-file ${BATCH_TMPDIR}/VQSR__INDELsVariantRecalibratorScattered-whsBo/recalibration \
  --tranches-file ${BATCH_TMPDIR}/VQSR__INDELGatherTranches-Oui6g/out_tranches \
  --truth-sensitivity-filter-level 99.0 \
  --create-output-variant-index true \
  -L ${BATCH_TMPDIR}/Make_500_intervals-S4qc7/intervals/0002-scattered.interval_list \
  --use-allele-specific-annotations \
  -mode INDEL
rm ${BATCH_TMPDIR}/VQSR__INDELsVariantRecalibratorScattered-whsBo/recalibration ${BATCH_TMPDIR}/VQSR__INDELGatherTranches-Oui6g/out_tranches
gatk --java-options "-Xms5g" \
  ApplyVQSR \
  -O ${BATCH_TMPDIR}/VQSR__ApplyRecalibration-aHIm3/output_vcf.vcf.gz \
  -V tmp.indel.recalibrated.vcf \
  --recal-file ${BATCH_TMPDIR}/VQSR__SNPsVariantRecalibratorScattered-4rnLh/recalibration \
  --tranches-file ${BATCH_TMPDIR}/VQSR__SNPGatherTranches-nmBSc/out_tranches \
  --truth-sensitivity-filter-level 99.7 \
  --create-output-variant-index true \
  -L ${BATCH_TMPDIR}/Make_500_intervals-S4qc7/intervals/0002-scattered.interval_list \
  --use-allele-specific-annotations \
  -mode SNP
}
{
interval=$(cat ${BATCH_TMPDIR}/Make_500_intervals-S4qc7/intervals/0002-scattered.interval_list | tail -n1 | awk '{print $1":"$2"-"$3}')
bcftools view -t $interval ${BATCH_TMPDIR}/VQSR__ApplyRecalibration-aHIm3/output_vcf.vcf.gz --output-file ${BATCH_TMPDIR}/VQSR__ApplyRecalibration-aHIm3/output_vcf.vcf.gz --output-type z
tabix ${BATCH_TMPDIR}/VQSR__ApplyRecalibration-aHIm3/output_vcf.vcf.gz
}
}
'
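For reference, here is a minimal sketch of what the `interval=` line in the last block of the command computes. The interval_list contents and coordinates below are made up for illustration and are not taken from the actual job; only the `tail -n1 | awk` step mirrors the command above.

```shell
# Hypothetical interval_list: '@' header lines followed by tab-separated
# contig, start, end, strand, name columns (values are illustrative only).
interval_list=$(mktemp)
printf '@HD\tVN:1.6\n@SQ\tSN:chr1\tLN:248956422\n' > "$interval_list"
printf 'chr1\t1000\t2000\t+\t.\n' >> "$interval_list"

# Same step as in the job: take the last line and join columns 1-3
# into a contig:start-end region string.
interval=$(tail -n1 "$interval_list" | awk '{print $1":"$2"-"$3}')
echo "$interval"

rm -f "$interval_list"
```

Because only the last line is read, a scattered interval_list that holds more than one interval would presumably yield a region covering just its final interval.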
c) Entire program log:
Using GATK jar /opt/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xms5g -jar /opt/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar ApplyVQSR -O tmp.indel.recalibrated.vcf -V gs://vqsr-test/subset_chr1_emptier_vqsr-ready.vcf.bgz --recal-file /io/batch/1d7deb/VQSR__INDELsVariantRecalibratorScattered-whsBo/recalibration --tranches-file /io/batch/1d7deb/VQSR__INDELGatherTranches-Oui6g/out_tranches --truth-sensitivity-filter-level 99.0 --create-output-variant-index true -L /io/batch/1d7deb/Make_500_intervals-S4qc7/intervals/0002-scattered.interval_list --use-allele-specific-annotations -mode INDEL
22:50:10.378 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/opt/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
22:50:10.603 INFO ApplyVQSR - ------------------------------------------------------------
22:50:10.604 INFO ApplyVQSR - The Genome Analysis Toolkit (GATK) v4.2.6.1
22:50:10.604 INFO ApplyVQSR - For support and documentation go to https://software.broadinstitute.org/gatk/
22:50:10.604 INFO ApplyVQSR - Executing as root@hostname-032da1825c on Linux v5.4.0-1042-gcp amd64
22:50:10.605 INFO ApplyVQSR - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
22:50:10.605 INFO ApplyVQSR - Start Date/Time: January 25, 2023 10:50:10 PM GMT
22:50:10.605 INFO ApplyVQSR - ------------------------------------------------------------
22:50:10.605 INFO ApplyVQSR - ------------------------------------------------------------
22:50:10.606 INFO ApplyVQSR - HTSJDK Version: 2.24.1
22:50:10.606 INFO ApplyVQSR - Picard Version: 2.27.1
22:50:10.606 INFO ApplyVQSR - Built for Spark Version: 2.4.5
22:50:10.606 INFO ApplyVQSR - HTSJDK Defaults.COMPRESSION_LEVEL : 2
22:50:10.606 INFO ApplyVQSR - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
22:50:10.606 INFO ApplyVQSR - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
22:50:10.606 INFO ApplyVQSR - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
22:50:10.606 INFO ApplyVQSR - Deflater: IntelDeflater
22:50:10.606 INFO ApplyVQSR - Inflater: IntelInflater
22:50:10.606 INFO ApplyVQSR - GCS max retries/reopens: 20
22:50:10.606 INFO ApplyVQSR - Requester pays: disabled
22:50:10.607 INFO ApplyVQSR - Initializing engine
22:50:11.153 INFO FeatureManager - Using codec VCFCodec to read file file:///io/batch/1d7deb/VQSR__INDELsVariantRecalibratorScattered-whsBo/recalibration
22:50:12.940 INFO FeatureManager - Using codec VCFCodec to read file gs://vqsr-test/subset_chr1_emptier_vqsr-ready.vcf.bgz
22:50:16.195 INFO FeatureManager - Using codec IntervalListCodec to read file file:///io/batch/1d7deb/Make_500_intervals-S4qc7/intervals/0002-scattered.interval_list
22:50:16.271 INFO IntervalArgumentCollection - Processing 6169884 bp from intervals
22:50:16.370 INFO ApplyVQSR - Done initializing engine
22:50:16.373 INFO ApplyVQSR - Read tranche TruthSensitivityTranche targetTruthSensitivity=90.00 minVQSLod=0.0900 known=(17246 @ 0.0000) novel=(533030 @ 0.0000) truthSites(8218 accessible, 7397 called), name=VQSRTrancheINDEL0.00to90.00]
22:50:16.374 INFO ApplyVQSR - Read tranche TruthSensitivityTranche targetTruthSensitivity=99.00 minVQSLod=-10.0000 known=(20272 @ 0.0000) novel=(1120347 @ 0.0000) truthSites(8218 accessible, 7759 called), name=VQSRTrancheINDEL90.00to99.00]
22:50:16.403 INFO ApplyVQSR - Keeping all variants in tranche TruthSensitivityTranche targetTruthSensitivity=99.00 minVQSLod=-10.0000 known=(20272 @ 0.0000) novel=(1120347 @ 0.0000) truthSites(8218 accessible, 7759 called), name=VQSRTrancheINDEL90.00to99.00]
22:50:16.473 INFO ProgressMeter - Starting traversal
22:50:16.474 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
22:50:26.490 INFO ProgressMeter - chr1:16576294 0.2 306000 1833067.1
22:50:28.381 INFO ProgressMeter - chr1:18485011 0.2 437427 2204402.8
22:50:28.381 INFO ProgressMeter - Traversal complete. Processed 437427 total variants in 0.2 minutes.
22:50:28.396 INFO ApplyVQSR - Shutting down engine
[January 25, 2023 10:50:28 PM GMT] org.broadinstitute.hellbender.tools.walkers.vqsr.ApplyVQSR done. Elapsed time: 0.30 minutes.
Runtime.totalMemory()=5189795840
Using GATK jar /opt/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xms5g -jar /opt/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar ApplyVQSR -O /io/batch/1d7deb/VQSR__ApplyRecalibration-aHIm3/output_vcf.vcf.gz -V tmp.indel.recalibrated.vcf --recal-file /io/batch/1d7deb/VQSR__SNPsVariantRecalibratorScattered-4rnLh/recalibration --tranches-file /io/batch/1d7deb/VQSR__SNPGatherTranches-nmBSc/out_tranches --truth-sensitivity-filter-level 99.7 --create-output-variant-index true -L /io/batch/1d7deb/Make_500_intervals-S4qc7/intervals/0002-scattered.interval_list --use-allele-specific-annotations -mode SNP
22:50:31.331 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/opt/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
22:50:31.502 INFO ApplyVQSR - ------------------------------------------------------------
22:50:31.502 INFO ApplyVQSR - The Genome Analysis Toolkit (GATK) v4.2.6.1
22:50:31.502 INFO ApplyVQSR - For support and documentation go to https://software.broadinstitute.org/gatk/
22:50:31.502 INFO ApplyVQSR - Executing as root@hostname-032da1825c on Linux v5.4.0-1042-gcp amd64
22:50:31.503 INFO ApplyVQSR - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
22:50:31.503 INFO ApplyVQSR - Start Date/Time: January 25, 2023 10:50:31 PM GMT
22:50:31.503 INFO ApplyVQSR - ------------------------------------------------------------
22:50:31.503 INFO ApplyVQSR - ------------------------------------------------------------
22:50:31.504 INFO ApplyVQSR - HTSJDK Version: 2.24.1
22:50:31.504 INFO ApplyVQSR - Picard Version: 2.27.1
22:50:31.504 INFO ApplyVQSR - Built for Spark Version: 2.4.5
22:50:31.504 INFO ApplyVQSR - HTSJDK Defaults.COMPRESSION_LEVEL : 2
22:50:31.504 INFO ApplyVQSR - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
22:50:31.504 INFO ApplyVQSR - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
22:50:31.504 INFO ApplyVQSR - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
22:50:31.504 INFO ApplyVQSR - Deflater: IntelDeflater
22:50:31.504 INFO ApplyVQSR - Inflater: IntelInflater
22:50:31.504 INFO ApplyVQSR - GCS max retries/reopens: 20
22:50:31.504 INFO ApplyVQSR - Requester pays: disabled
22:50:31.504 INFO ApplyVQSR - Initializing engine
22:50:31.914 INFO FeatureManager - Using codec VCFCodec to read file file:///io/batch/1d7deb/VQSR__SNPsVariantRecalibratorScattered-4rnLh/recalibration
22:50:32.015 INFO FeatureManager - Using codec VCFCodec to read file file:///tmp.indel.recalibrated.vcf
22:50:32.288 INFO FeatureManager - Using codec IntervalListCodec to read file file:///io/batch/1d7deb/Make_500_intervals-S4qc7/intervals/0002-scattered.interval_list
22:50:32.374 INFO IntervalArgumentCollection - Processing 6169884 bp from intervals
22:50:32.458 INFO ApplyVQSR - Done initializing engine
22:50:32.461 INFO ApplyVQSR - Read tranche TruthSensitivityTranche targetTruthSensitivity=90.00 minVQSLod=4.2700 known=(285661 @ 3.4366) novel=(2264200 @ 1.2189) truthSites(36950 accessible, 33259 called), name=VQSRTrancheSNP0.00to90.00]
22:50:32.461 INFO ApplyVQSR - Read tranche TruthSensitivityTranche targetTruthSensitivity=99.00 minVQSLod=-7.7000 known=(369813 @ 3.2645) novel=(12604411 @ 0.9987) truthSites(36950 accessible, 36580 called), name=VQSRTrancheSNP90.00to99.00]
22:50:32.461 INFO ApplyVQSR - Read tranche TruthSensitivityTranche targetTruthSensitivity=99.90 minVQSLod=-10.0000 known=(371594 @ 3.2621) novel=(12750020 @ 0.9989) truthSites(36950 accessible, 36628 called), name=VQSRTrancheSNP99.00to99.90]
22:50:32.480 INFO ApplyVQSR - Keeping all variants in tranche TruthSensitivityTranche targetTruthSensitivity=99.90 minVQSLod=-10.0000 known=(371594 @ 3.2621) novel=(12750020 @ 0.9989) truthSites(36950 accessible, 36628 called), name=VQSRTrancheSNP99.00to99.90]
22:50:32.547 INFO ProgressMeter - Starting traversal
22:50:32.547 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
22:50:42.550 INFO ProgressMeter - chr1:15728569 0.2 199000 1193880.6
22:50:51.580 INFO ProgressMeter - chr1:18485011 0.3 437427 1379098.3
22:50:51.580 INFO ProgressMeter - Traversal complete. Processed 437427 total variants in 0.3 minutes.
22:50:51.586 INFO ApplyVQSR - Shutting down engine
[January 25, 2023 10:50:51 PM GMT] org.broadinstitute.hellbender.tools.walkers.vqsr.ApplyVQSR done. Elapsed time: 0.34 minutes.
Runtime.totalMemory()=5189795840
[W::bgzf_read_block] EOF marker is absent. The input is probably truncated
[tabix] the index file exists. Please use '-f' to overwrite.
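A possible explanation, offered as an assumption rather than a confirmed diagnosis: the final block passes the same path to `bcftools view` as both input and `--output-file`, so the file may be truncated while it is still being read. Also, ApplyVQSR was run with `--create-output-variant-index true`, so an index may already exist when `tabix` runs. The sketch below shows one way that step could be restructured. The function name `restrict_vcf_in_place` and the temporary file name are hypothetical and not part of the pipeline.

```shell
# Hypothetical helper: restrict a bgzipped VCF to one region without
# reading and writing the same path at the same time.
restrict_vcf_in_place() {
    vcf="$1"        # path to the bgzipped VCF to rewrite
    region="$2"     # region string such as chr1:1000-2000
    tmp="${vcf}.tmp.vcf.gz"

    # Write the filtered records to a separate temporary file first.
    bcftools view -t "$region" --output-type z --output "$tmp" "$vcf" || return 1

    # Replace the original only after bcftools has finished reading it.
    mv "$tmp" "$vcf" || return 1

    # -f overwrites an index that an earlier step may have left behind.
    tabix -f -p vcf "$vcf"
}
```

It would be called as `restrict_vcf_in_place "${BATCH_TMPDIR}/VQSR__ApplyRecalibration-aHIm3/output_vcf.vcf.gz" "$interval"` in place of the current `bcftools view` and `tabix` lines.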