af-only-gnomad.raw.sites.vcf
Hello, GATK team
I am using GATK 4.2.2.0 and trying to use af-only-gnomad.raw.sites.vcf (from the GATK best-practices storage -> somatic-b37) as my germline resource for somatic analysis, with the following command:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /workspace/Somatico_hg19/gatk-4.2.2.0/gatk-package-4.2.2.0-local.jar GetPileupSummaries -I amostra1_sorted_rmdup_F4.bam -V af-only-gnomad.raw.sites.vcf -L amostra1_coverageBed20x.interval_list -O amostra2.table
Picked up JAVA_TOOL_OPTIONS: -Xmx12884m
18:54:22.004 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/workspace/Somatico_hg19/gatk-4.2.2.0/gatk-package-4.2.2.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Mar 04, 2024 6:54:22 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
18:54:22.130 INFO GetPileupSummaries - ------------------------------------------------------------
18:54:22.130 INFO GetPileupSummaries - The Genome Analysis Toolkit (GATK) v4.2.2.0
18:54:22.131 INFO GetPileupSummaries - For support and documentation go to https://software.broadinstitute.org/gatk/
18:54:22.131 INFO GetPileupSummaries - Executing as gitpod@mcollodetti-somaticohg1-fxf17lnuh64 on Linux v6.1.75-060175-generic amd64
18:54:22.131 INFO GetPileupSummaries - Java runtime: OpenJDK 64-Bit Server VM v11.0.22+7-LTS
18:54:22.131 INFO GetPileupSummaries - Start Date/Time: March 4, 2024 at 6:54:21 PM UTC
18:54:22.131 INFO GetPileupSummaries - ------------------------------------------------------------
18:54:22.131 INFO GetPileupSummaries - ------------------------------------------------------------
18:54:22.132 INFO GetPileupSummaries - HTSJDK Version: 2.24.1
18:54:22.132 INFO GetPileupSummaries - Picard Version: 2.25.4
18:54:22.132 INFO GetPileupSummaries - Built for Spark Version: 2.4.5
18:54:22.132 INFO GetPileupSummaries - HTSJDK Defaults.COMPRESSION_LEVEL : 2
18:54:22.132 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
18:54:22.132 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
18:54:22.132 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
18:54:22.132 INFO GetPileupSummaries - Deflater: IntelDeflater
18:54:22.132 INFO GetPileupSummaries - Inflater: IntelInflater
18:54:22.132 INFO GetPileupSummaries - GCS max retries/reopens: 20
18:54:22.133 INFO GetPileupSummaries - Requester pays: disabled
18:54:22.133 INFO GetPileupSummaries - Initializing engine
18:54:22.276 INFO GetPileupSummaries - Shutting down engine
[March 4, 2024 at 6:54:22 PM UTC] org.broadinstitute.hellbender.tools.walkers.contamination.GetPileupSummaries done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=1054867456
***********************************************************************
A USER ERROR has occurred: Cannot read file:///workspace/Somatico_hg19/af-only-gnomad.raw.sites.vcf because no suitable codecs found
***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
Is there any processing that needs to be done before using the af-only-gnomad.raw.sites.vcf file?
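For anyone hitting the same message: as far as I understand the htsjdk VCF codec, GATK decides whether it can read a file by looking at its contents, and a VCF (plain or bgzipped) is expected to start with `##fileformat=VCFv4`. An index file, an empty download, or an HTML error page saved under a .vcf name fails that check. A minimal shell sketch of the same check, assuming gzip, head and tr are on the PATH; `looks_like_vcf` is a hypothetical helper, not a GATK tool:

```shell
# looks_like_vcf FILE
# Hypothetical helper (not part of GATK): returns 0 if FILE starts with the
# VCF magic header "##fileformat=VCFv4", 1 otherwise. Assumes this mirrors
# the header check done during GATK's codec lookup.
# `gzip -cdf` decompresses gzip/bgzip input and copies plain text unchanged.
looks_like_vcf() {
  local f="$1" magic
  [ -s "$f" ] || return 1   # missing or empty file
  magic=$(gzip -cdf -- "$f" 2>/dev/null | head -c 18 | tr -d '\000')
  [ "$magic" = '##fileformat=VCFv4' ]
}

# Usage:
#   looks_like_vcf af-only-gnomad.raw.sites.vcf || echo "not a readable VCF"
```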
-
Changed GATK version to 4.2.6.1 and tried the following:
./gatk-4.2.6.1/gatk GetPileupSummaries \
  --disable-sequence-dictionary-validation true \
  -I amostra1_sorted_rmdup_F4.bam \
  -V af-only-gnomad.raw.sites.vcf \
  -L amostra1_coverageBed20x.interval_list \
  -O amostra1.table
But the same error occurred.
-
I also tried the rename step that Field-Ye-Tian suggested (https://gatk.broadinstitute.org/hc/en-us/community/posts/360073450391-A-USER-ERROR-has-occurred-af-only-gnomad-hg38-vcf-gz-because-no-suitable-codecs-found) but it did not work.
-
I also tried:
wget https://storage.googleapis.com/gatk-best-practices/somatic-b37/af-only-gnomad.raw.sites.vcf
bgzip af-only-gnomad.raw.sites.vcf
bcftools index -t af-only-gnomad.raw.sites.vcf.gz
Then I tried:
./gatk-4.2.6.1/gatk GetPileupSummaries \
  --disable-sequence-dictionary-validation true \
  -I amostra1_sorted_rmdup_F4.bam \
  -V af-only-gnomad.raw.sites.vcf.1.gz.tbi \
  -L amostra1_coverageBed20x.interval_list \
  -O amostra1.table
and got the same error: A USER ERROR has occurred: Cannot read file:///workspace/Somatico_hg19/af-only-gnomad.raw.sites.vcf.1.gz.tbi because no suitable codecs found
-
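On the .tbi attempt: .tbi and .idx files are indexes rather than feature files, so GATK finds no codec for them. The index only needs to sit next to the VCF, and the VCF path is what goes after -V. A small sketch that maps an index path back to its data file, assuming the usual naming convention (X.vcf.gz.tbi belongs to X.vcf.gz, X.vcf.idx belongs to X.vcf); `vcf_for_index` is a hypothetical helper:

```shell
# vcf_for_index PATH
# Hypothetical helper: strips a trailing .tbi or .idx so the VCF itself,
# not its index, is passed to -V. Other paths are echoed unchanged.
vcf_for_index() {
  local p="$1"
  case "$p" in
    *.tbi) printf '%s\n' "${p%.tbi}" ;;
    *.idx) printf '%s\n' "${p%.idx}" ;;
    *)     printf '%s\n' "$p" ;;
  esac
}

# Example with the file name from this thread:
#   vcf_for_index af-only-gnomad.raw.sites.vcf.1.gz.tbi
#   # -> af-only-gnomad.raw.sites.vcf.1.gz
```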
New try with GATK 4.2.2.0 and af-only-gnomad.raw.sites.vcf:
gatk-4.2.2.0/gatk IndexFeatureFile -I af-only-gnomad.raw.sites.vcf
After indexing finished:
./gatk-4.2.2.0/gatk GetPileupSummaries \
  -I amostra1_sorted_rmdup_F4.bam \
  -V af-only-gnomad.raw.sites.vcf.idx \
  -L amostra1_coverageBed20x.interval_list \
  -O amostra1.table
Got the same error: A USER ERROR has occurred: Cannot read file:///workspace/Somatico_hg19/af-only-gnomad.raw.sites.vcf.idx because no suitable codecs found
-
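Related to the IndexFeatureFile step: GATK expects to find the index next to the VCF, typically X.vcf.idx for a plain-text VCF (what IndexFeatureFile wrote here) and X.vcf.gz.tbi for a bgzipped one. A sketch of a pre-flight check under that naming assumption; `expected_index` and `check_vcf_input` are hypothetical helpers, and .csi indexes are not handled:

```shell
# expected_index PATH
# Hypothetical helper: prints the index name usually expected next to a VCF,
# assuming standard naming (.gz -> .gz.tbi, anything else -> .idx).
expected_index() {
  case "$1" in
    *.gz) printf '%s\n' "$1.tbi" ;;
    *)    printf '%s\n' "$1.idx" ;;
  esac
}

# check_vcf_input PATH
# Hypothetical helper: returns 0 if the VCF exists and its expected index
# sits next to it; otherwise reports what is missing on stderr.
check_vcf_input() {
  local vcf="$1" idx
  idx=$(expected_index "$vcf")
  if [ ! -f "$vcf" ]; then
    echo "missing VCF: $vcf" >&2
    return 1
  fi
  if [ ! -f "$idx" ]; then
    echo "missing index: $idx (try: gatk IndexFeatureFile -I $vcf)" >&2
    return 1
  fi
  return 0
}
```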
IT FINALLY WORKED:
./gatk-4.2.6.1/gatk GetPileupSummaries \
  --disable-sequence-dictionary-validation true \
  -I amostra1_sorted_rmdup_F4.bam \
  -V af-only-gnomad.raw.sites.vcf.1 \
  -L amostra1_coverageBed20x.interval_list \
  -O amostra1.table
-
Thanks for sharing your resolution. It looks like your earlier download was cut off and the file was downloaded again under a different name, which is probably what is happening in many of the similar cases posted here.
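On the renamed download: by default, when the target file already exists, wget keeps it and saves the new download as name.1 (then name.2, and so on), which matches the af-only-gnomad.raw.sites.vcf.1 name seen above. Passing `-O af-only-gnomad.raw.sites.vcf` to wget writes to a fixed name instead. A sketch that lists the original and any such numbered copies with their sizes in bytes, so a cut-off download stands out; `list_wget_copies` is a hypothetical helper:

```shell
# list_wget_copies BASENAME
# Hypothetical helper: prints "<size in bytes><TAB><path>" for BASENAME and
# for numbered copies named BASENAME.<digits> (e.g. BASENAME.1). Files such
# as BASENAME.1.gz or BASENAME.idx are skipped.
list_wget_copies() {
  local base="$1" f
  for f in "$base" "$base".*; do
    [ -f "$f" ] || continue
    if [ "$f" != "$base" ]; then
      case "${f#"$base".}" in
        ''|*[!0-9]*) continue ;;   # suffix is not purely numeric
      esac
    fi
    printf '%s\t%s\n' "$(wc -c < "$f" | tr -d ' ')" "$f"
  done
}

# Usage:
#   list_wget_copies af-only-gnomad.raw.sites.vcf
```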
Regards.
-
You're welcome.
Basically, what needs to be done is:
- Re-download the file from https://storage.googleapis.com/gatk-best-practices/somatic-b37/af-only-gnomad.raw.sites.vcf and check which name it was actually saved under
- Index the raw file with:
gatk-4.2.2.0/gatk IndexFeatureFile -I af-only-gnomad.raw.sites.vcf
- Use GATK 4.2.6.1 with --disable-sequence-dictionary-validation to skip the sequence dictionary check that fails on the 'chr' prefix:
./gatk-4.2.6.1/gatk GetPileupSummaries \
  --disable-sequence-dictionary-validation true \
  -I amostra1_sorted_rmdup_F4.bam \
  -V af-only-gnomad.raw.sites.vcf.1 \
  -L amostra1_coverageBed20x.interval_list \
  -O amostra1.table
-
Melcar Collodetti, a few comments on the workflow that I hope will be helpful:
- The -L argument in GetPileupSummaries should usually be the same VCF file as the -V argument.
- We recommend using the small_exac_common VCF from the same best-practices Google bucket as the -V argument, not the af-only-gnomad VCF.
- GATK tools require VCF inputs to be indexed, but it is the VCF itself, not the .idx or .tbi index file, that is passed as the input.
- If you install gsutil (Google Cloud utilities), you can run GATK tools using the gs:// bucket path for the -V argument (likewise for -I BAM/CRAM files and -R reference files) without downloading the files to your local machine. GATK uses the Java NIO library to read only the parts of each file it needs. This also works when some files are in a Google bucket and others are stored locally.
- The workflow https://github.com/broadinstitute/gatk/blob/master/scripts/mutect2_wdl/mutect2.wdl in the GATK github repo does all of this for you.
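Following the first two points above, a dry-run sketch that prints a GetPileupSummaries command using the same common-variant VCF for both -V and -L. The launcher path ./gatk-4.2.6.1/gatk is taken from this thread; the helper name `pileup_cmd` and the VCF file name in the usage line are hypothetical placeholders (check the bucket for the actual small_exac_common file name). Per the comment above, the VCF argument could also be a gs:// path.

```shell
# pileup_cmd BAM COMMON_VCF OUT_TABLE
# Hypothetical helper: prints (does not run) a GetPileupSummaries command
# that passes the same common-variant VCF to -V and -L. Paths are not
# quoted in the output, so this assumes they contain no spaces.
pileup_cmd() {
  local bam="$1" vcf="$2" out="$3"
  printf '%s GetPileupSummaries -I %s -V %s -L %s -O %s\n' \
    "./gatk-4.2.6.1/gatk" "$bam" "$vcf" "$vcf" "$out"
}

# Usage (small_exac_common.vcf is a hypothetical file name):
#   pileup_cmd amostra1_sorted_rmdup_F4.bam small_exac_common.vcf amostra1.table
```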