Base recalibration
Hello
I need advice on creating an index and a sequence dictionary for VCF files. For the base recalibration step, I downloaded Homo_sapiens_assembly38.known_indels.vcf.gz from this link: https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0;tab=objects?prefix=&forceOnObjectsSortingFiltering=false
I am using GATK version 4.2.0.0
I used the command below to create a dictionary for the indels VCF file:
gatk UpdateVCFSequenceDictionary -V Homo_sapiens_assembly38.known_indels.vcf.gz -R "/scicore/home/cichon/GROUP/memory_optimization/data/reference/gch38.fa" --output Homo_sapiens_assembly38.known_indels.vcf.gz
I got the error below:
Using GATK jar /scicore/soft/apps/GATK/4.2.0.0-foss-2018b-Java-1.8/gatk-package-4.2.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /scicore/soft/apps/GATK/4.2.0.0-foss-2018b-Java-1.8/gatk-package-4.2.0.0-local.jar UpdateVCFSequenceDictionary -V Homo_sapiens_assembly38.known_indels.vcf.gz -R /scicore/home/cichon/GROUP/memory_optimization/data/reference/gch38.fa --output Homo_sapiens_assembly38.known_indels.vcf.gz
10:10:32.980 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/scicore/soft/apps/GATK/4.2.0.0-foss-2018b-Java-1.8/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Sep 30, 2021 10:10:33 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
10:10:33.303 INFO UpdateVCFSequenceDictionary - ------------------------------------------------------------
10:10:33.304 INFO UpdateVCFSequenceDictionary - The Genome Analysis Toolkit (GATK) v4.2.0.0
10:10:33.304 INFO UpdateVCFSequenceDictionary - For support and documentation go to https://software.broadinstitute.org/gatk/
10:10:33.304 INFO UpdateVCFSequenceDictionary - Executing as thirun0000@login20.cluster.bc2.ch on Linux v3.10.0-1160.41.1.el7.x86_64 amd64
10:10:33.304 INFO UpdateVCFSequenceDictionary - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_212-b03
10:10:33.304 INFO UpdateVCFSequenceDictionary - Start Date/Time: September 30, 2021 10:10:32 AM CEST
10:10:33.304 INFO UpdateVCFSequenceDictionary - ------------------------------------------------------------
10:10:33.304 INFO UpdateVCFSequenceDictionary - ------------------------------------------------------------
10:10:33.305 INFO UpdateVCFSequenceDictionary - HTSJDK Version: 2.24.0
10:10:33.305 INFO UpdateVCFSequenceDictionary - Picard Version: 2.25.0
10:10:33.305 INFO UpdateVCFSequenceDictionary - Built for Spark Version: 2.4.5
10:10:33.305 INFO UpdateVCFSequenceDictionary - HTSJDK Defaults.COMPRESSION_LEVEL : 2
10:10:33.305 INFO UpdateVCFSequenceDictionary - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
10:10:33.305 INFO UpdateVCFSequenceDictionary - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
10:10:33.305 INFO UpdateVCFSequenceDictionary - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
10:10:33.305 INFO UpdateVCFSequenceDictionary - Deflater: IntelDeflater
10:10:33.306 INFO UpdateVCFSequenceDictionary - Inflater: IntelInflater
10:10:33.306 INFO UpdateVCFSequenceDictionary - GCS max retries/reopens: 20
10:10:33.306 INFO UpdateVCFSequenceDictionary - Requester pays: disabled
10:10:33.306 INFO UpdateVCFSequenceDictionary - Initializing engine
10:10:33.786 INFO UpdateVCFSequenceDictionary - Shutting down engine
[September 30, 2021 10:10:33 AM CEST] org.broadinstitute.hellbender.tools.walkers.variantutils.UpdateVCFSequenceDictionary done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=162267136
***********************************************************************
A USER ERROR has occurred: Cannot read file:///scicore/home/cichon/GROUP/memory_optimization/data/index_dict/Homo_sapiens_assembly38.known_indels.vcf.gz because no suitable codecs found
Has anyone come across this error? Is there an indel VCF that is used for masking in the base recalibration step?
-
Hi Priyadarshini Thirunavukkarasu,
Could you check that the known indels VCF file is not malformed by running ValidateVariants? If it is not malformed, you can also try re-indexing the VCF with IndexFeatureFile.
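As a rough sketch, the two checks could be run as follows. The paths `reference.fa` and `known_indels.vcf.gz` and the function name `check_known_sites` are placeholders for illustration, not names from this thread, so substitute your own files.

```shell
# Sketch only: REF and VCF are placeholder paths, not the ones from this thread.
REF="${REF:-reference.fa}"
VCF="${VCF:-known_indels.vcf.gz}"

check_known_sites() {
    local ref="$1" vcf="$2"
    # Stop early if an input is missing or gatk is not on the PATH.
    [ -f "$ref" ] && [ -f "$vcf" ] || { echo "missing input file" >&2; return 2; }
    command -v gatk >/dev/null 2>&1 || { echo "gatk not found on PATH" >&2; return 3; }
    # 1) Check that the VCF is well-formed with respect to the reference.
    gatk ValidateVariants -R "$ref" -V "$vcf" || return 1
    # 2) Rebuild the index next to the VCF.
    gatk IndexFeatureFile -I "$vcf"
}

# Usage: check_known_sites "$REF" "$VCF"
```

The function returns a non-zero status as soon as one step fails, so the index is only rebuilt for a file that passed validation.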
Let me know if these solve the issue.
Best,
Genevieve
-
Hello Genevieve
I couldn't index the VCF file. I get the error:
Cannot read file:///scicore/home/cichon/GROUP/memory_optimization/data/index_dict/Homo_sapiens_assembly38.known_indels.vcf.gz because no suitable codecs found
I also tried running ValidateVariants and I got the same error:
Cannot read file:///scicore/home/cichon/GROUP/memory_optimization/data/index_dict/Homo_sapiens_assembly38.known_indels.vcf.gz because no suitable codecs found
Thanks
-
This file might be malformed. Try deleting it then redownloading it.
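Before redownloading, it may help to confirm what is actually on disk. The "no suitable codecs found" message usually means GATK could not recognise the file as any format it supports, which can happen if a download was truncated or saved something other than the VCF. The sketch below uses only `gzip` and `head`; the function name and messages are hypothetical.

```shell
# Sketch only: the function name and messages are hypothetical.
looks_like_gzipped_vcf() {
    local vcf="$1" first_line
    # gzip -t reads the whole file and fails on truncated or non-gzip data.
    gzip -t "$vcf" 2>/dev/null || { echo "not a valid gzip stream: $vcf"; return 1; }
    # A VCF must begin with a "##fileformat=VCF..." header line.
    first_line=$(gzip -dc "$vcf" 2>/dev/null | head -n 1) || true
    case "$first_line" in
        '##fileformat=VCF'*)
            echo "looks like a gzipped VCF: $vcf" ;;
        *)
            echo "gzip is fine, but the first line is not a VCF header: $vcf"
            return 1 ;;
    esac
}

# Usage: looks_like_gzipped_vcf Homo_sapiens_assembly38.known_indels.vcf.gz
```

This only checks the compression and the first header line. It does not replace ValidateVariants, but it works even when GATK refuses to open the file.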
-
Thanks. Can you suggest a website or link with indel VCF files? I have downloaded this file several times before, and it seems to cause the same error each time. Anyway, I will download it again and repeat the same step.
-
Hello
I downloaded the Homo_sapiens_assembly38.known_indels.vcf.gz file from https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0?pli=1. I was able to create an index for the file. When I tried to validate the VCF using the command below, I got the error: Input files reference and features have incompatible contigs: No overlapping contigs found
gatk ValidateVariants \
-R "/scicore/home/cichon/GROUP/memory_optimization/data/reference/gch38.fa" \
-V "/scicore/home/cichon/GROUP/memory_optimization/data/index_dict/resources_broad_hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf"
Using GATK jar /scicore/soft/apps/GATK/4.2.2.0-foss-2018b-Java-1.8/gatk-package-4.2.2.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /scicore/soft/apps/GATK/4.2.2.0-foss-2018b-Java-1.8/gatk-package-4.2.2.0-local.jar ValidateVariants -R /scicore/home/cichon/GROUP/memory_optimization/data/reference/gch38.fa -V /scicore/home/cichon/GROUP/memory_optimization/data/index_dict/resources_broad_hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf
12:56:48.650 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/scicore/soft/apps/GATK/4.2.2.0-foss-2018b-Java-1.8/gatk-package-4.2.2.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Oct 04, 2021 12:56:48 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
12:56:48.795 INFO ValidateVariants - ------------------------------------------------------------
12:56:48.796 INFO ValidateVariants - The Genome Analysis Toolkit (GATK) v4.2.2.0
12:56:48.796 INFO ValidateVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
12:56:48.796 INFO ValidateVariants - Executing as thirun0000@login20.cluster.bc2.ch on Linux v3.10.0-1160.41.1.el7.x86_64 amd64
12:56:48.796 INFO ValidateVariants - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_212-b03
12:56:48.797 INFO ValidateVariants - Start Date/Time: October 4, 2021 12:56:48 PM CEST
12:56:48.797 INFO ValidateVariants - ------------------------------------------------------------
12:56:48.797 INFO ValidateVariants - ------------------------------------------------------------
12:56:48.797 INFO ValidateVariants - HTSJDK Version: 2.24.1
12:56:48.797 INFO ValidateVariants - Picard Version: 2.25.4
12:56:48.798 INFO ValidateVariants - Built for Spark Version: 2.4.5
12:56:48.798 INFO ValidateVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
12:56:48.798 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
12:56:48.798 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
12:56:48.798 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
12:56:48.798 INFO ValidateVariants - Deflater: IntelDeflater
12:56:48.798 INFO ValidateVariants - Inflater: IntelInflater
12:56:48.798 INFO ValidateVariants - GCS max retries/reopens: 20
12:56:48.798 INFO ValidateVariants - Requester pays: disabled
12:56:48.799 INFO ValidateVariants - Initializing engine
12:56:49.262 INFO FeatureManager - Using codec VCFCodec to read file file:///scicore/home/cichon/GROUP/memory_optimization/data/index_dict/resources_broad_hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf
12:56:49.956 INFO ValidateVariants - Shutting down engine
[October 4, 2021 12:56:49 PM CEST] org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=162267136
***********************************************************************
A USER ERROR has occurred: Input files reference and features have incompatible contigs: No overlapping contigs found.
The VCF seems to be mapped to GRCh37, whereas the reference is GRCh38, so I am not able to create a dictionary for this VCF. Is there an indel VCF mapped to GRCh38?
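One way to see why no contigs overlap is to compare the names declared in the VCF header with the names in the reference index. A difference in naming convention (for example `chr1` versus `1`) produces the same error as a different genome build, so it may be worth checking which of the two applies. The sketch below assumes the reference has a `.fai` index (as created by `samtools faidx`) and that the VCF header contains `##contig` lines; the function names are hypothetical.

```shell
# Sketch only: function names are hypothetical.
# Print the contig IDs declared in a VCF header (plain or gzipped).
vcf_header_contigs() {
    local vcf="$1"
    case "$vcf" in
        *.gz) gzip -dc "$vcf" ;;
        *)    cat "$vcf" ;;
    esac | awk '
        !/^#/ { exit }                 # header ends at the first record
        /^##contig=<ID=/ {
            sub(/^##contig=<ID=/, "")  # drop everything before the ID
            sub(/[,>].*/, "")          # drop everything after the ID
            print
        }'
}

# Print the contig names present in both the VCF header and the .fai index.
shared_contigs() {
    local vcf="$1" fai="$2"
    vcf_header_contigs "$vcf" |
        awk -F'\t' 'NR == FNR { in_ref[$1] = 1; next } ($1 in in_ref)' "$fai" -
}

# Usage: shared_contigs my.vcf reference.fa.fai
```

An empty result from `shared_contigs` corresponds to the "No overlapping contigs found" situation. Comparing the first few lines of `vcf_header_contigs` with the first column of the `.fai` file then shows how the names differ.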
-
One of our users located the available files and listed them here: https://gatk.broadinstitute.org/hc/en-us/community/posts/360075305092/comments/360014557672
Hope you can find what you need!
-
Thanks