Funcotator Misconfigured Error (All IGR Annotations)
I am experiencing an issue with Funcotator. Here is the full log:
17:10:41.501 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gpfs/gsfs10/users/NICHD-core0/analysis/klubo-gwiezdzinska/mptc-variant-calling/env/share/gatk4-4.2.3.0-1/gatk-package-4.2.3.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
May 20, 2022 5:10:41 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
17:10:41.824 INFO Funcotator - ------------------------------------------------------------
17:10:41.825 INFO Funcotator - The Genome Analysis Toolkit (GATK) v4.2.3.0
17:10:41.825 INFO Funcotator - For support and documentation go to https://software.broadinstitute.org/gatk/
17:10:41.826 INFO Funcotator - Executing as fridellsa@cn0859 on Linux v3.10.0-862.14.4.el7.x86_64 amd64
17:10:41.826 INFO Funcotator - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_282-b08
17:10:41.826 INFO Funcotator - Start Date/Time: May 20, 2022 5:10:41 PM EDT
17:10:41.826 INFO Funcotator - ------------------------------------------------------------
17:10:41.826 INFO Funcotator - ------------------------------------------------------------
17:10:41.827 INFO Funcotator - HTSJDK Version: 2.24.1
17:10:41.827 INFO Funcotator - Picard Version: 2.25.4
17:10:41.827 INFO Funcotator - Built for Spark Version: 2.4.5
17:10:41.827 INFO Funcotator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
17:10:41.828 INFO Funcotator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
17:10:41.828 INFO Funcotator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
17:10:41.828 INFO Funcotator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
17:10:41.828 INFO Funcotator - Deflater: IntelDeflater
17:10:41.828 INFO Funcotator - Inflater: IntelInflater
17:10:41.828 INFO Funcotator - GCS max retries/reopens: 20
17:10:41.828 INFO Funcotator - Requester pays: disabled
17:10:41.828 INFO Funcotator - Initializing engine
17:10:42.801 INFO FeatureManager - Using codec VCFCodec to read file file:///gpfs/gsfs10/users/NICHD-core0/analysis/klubo-gwiezdzinska/mptc-variant-calling/./results/filtered/all.final.vcf.gz
17:10:43.188 INFO Funcotator - Done initializing engine
17:10:43.188 INFO Funcotator - Validating sequence dictionaries...
17:10:43.198 INFO Funcotator - Processing user transcripts/defaults/overrides...
17:10:43.199 INFO Funcotator - Initializing data sources...
17:10:43.205 INFO DataSourceUtils - Initializing data sources from directory: ./references/funcotator_dataSources.v1.7.20200521g
17:10:43.208 INFO DataSourceUtils - Data sources version: 1.7.2020521g
17:10:43.208 INFO DataSourceUtils - Data sources source: ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/funcotator/funcotator_dataSources.v1.7.20200521g.tar.gz
17:10:43.209 INFO DataSourceUtils - Data sources alternate source: gs://broad-public-datasets/funcotator/funcotator_dataSources.v1.7.20200521.tar.gz
17:10:43.234 INFO DataSourceUtils - Resolved data source file path: file:///gpfs/gsfs10/users/NICHD-core0/analysis/klubo-gwiezdzinska/mptc-variant-calling/clinvar_20180429_hg38.vcf -> file:///gpfs/gsfs10/users/NICHD-core0/analysis/klubo-gwiezdzinska/mptc-variant-calling/./references/funcotator_dataSources.v1.7.20200521g/clinvar/hg38/clinvar_20180429_hg38.vcf
17:10:43.238 INFO DataSourceUtils - Resolved data source file path: file:///gpfs/gsfs10/users/NICHD-core0/analysis/klubo-gwiezdzinska/mptc-variant-calling/acmg59_test_cleaned.txt -> file:///gpfs/gsfs10/users/NICHD-core0/analysis/klubo-gwiezdzinska/mptc-variant-calling/./references/funcotator_dataSources.v1.7.20200521g/acmg_rec/hg38/acmg59_test_cleaned.txt
17:10:43.242 INFO DataSourceUtils - Resolved data source file path: file:///gpfs/gsfs10/users/NICHD-core0/analysis/klubo-gwiezdzinska/mptc-variant-calling/acmg_lof.tsv -> file:///gpfs/gsfs10/users/NICHD-core0/analysis/klubo-gwiezdzinska/mptc-variant-calling/./references/funcotator_dataSources.v1.7.20200521g/acmg_lof/hg38/acmg_lof.tsv
17:10:43.248 INFO DataSourceUtils - Resolved data source file path: file:///gpfs/gsfs10/users/NICHD-core0/analysis/klubo-gwiezdzinska/mptc-variant-calling/LMM_Path_LP_VUS5-variants-6-12-18.sorted_liftover_b38.corrected.vcf -> file:///gpfs/gsfs10/users/NICHD-core0/analysis/klubo-gwiezdzinska/mptc-variant-calling/./references/funcotator_dataSources.v1.7.20200521g/lmm_known/hg38/LMM_Path_LP_VUS5-variants-6-12-18.sorted_liftover_b38.corrected.vcf
17:10:43.252 INFO DataSourceUtils - Resolved data source file path: file:///gpfs/gsfs10/users/NICHD-core0/analysis/klubo-gwiezdzinska/mptc-variant-calling/gencode.v34.annotation.REORDERED.gtf -> file:///gpfs/gsfs10/users/NICHD-core0/analysis/klubo-gwiezdzinska/mptc-variant-calling/./references/funcotator_dataSources.v1.7.20200521g/gencode/hg38/gencode.v34.annotation.REORDERED.gtf
17:10:43.254 INFO DataSourceUtils - Resolved data source file path: file:///gpfs/gsfs10/users/NICHD-core0/analysis/klubo-gwiezdzinska/mptc-variant-calling/gencode.v34.pc_transcripts.fa -> file:///gpfs/gsfs10/users/NICHD-core0/analysis/klubo-gwiezdzinska/mptc-variant-calling/./references/funcotator_dataSources.v1.7.20200521g/gencode/hg38/gencode.v34.pc_transcripts.fa
17:10:43.255 INFO Funcotator - Finalizing data sources (this step can be long if data sources are cloud-based)...
17:10:43.258 INFO DataSourceUtils - Resolved data source file path: file:///gpfs/gsfs10/users/NICHD-core0/analysis/klubo-gwiezdzinska/mptc-variant-calling/clinvar_20180429_hg38.vcf -> file:///gpfs/gsfs10/users/NICHD-core0/analysis/klubo-gwiezdzinska/mptc-variant-calling/./references/funcotator_dataSources.v1.7.20200521g/clinvar/hg38/clinvar_20180429_hg38.vcf
17:10:43.258 INFO DataSourceUtils - Setting lookahead cache for data source: ClinVar_VCF : 100000
17:10:43.305 INFO FeatureManager - Using codec VCFCodec to read file file:///gpfs/gsfs10/users/NICHD-core0/analysis/klubo-gwiezdzinska/mptc-variant-calling/./references/funcotator_dataSources.v1.7.20200521g/clinvar/hg38/clinvar_20180429_hg38.vcf
17:10:44.267 INFO DataSourceUtils - Resolved data source file path: file:///gpfs/gsfs10/users/NICHD-core0/analysis/klubo-gwiezdzinska/mptc-variant-calling/clinvar_20180429_hg38.vcf -> file:///gpfs/gsfs10/users/NICHD-core0/analysis/klubo-gwiezdzinska/mptc-variant-calling/./references/funcotator_dataSources.v1.7.20200521g/clinvar/hg38/clinvar_20180429_hg38.vcf
17:10:44.335 INFO FeatureManager - Using codec VCFCodec to read file file:///gpfs/gsfs10/users/NICHD-core0/analysis/klubo-gwiezdzinska/mptc-variant-calling/./references/funcotator_dataSources.v1.7.20200521g/clinvar/hg38/clinvar_20180429_hg38.vcf
17:10:44.404 INFO DataSourceUtils - Resolved data source file path: file:///gpfs/gsfs10/users/NICHD-core0/analysis/klubo-gwiezdzinska/mptc-variant-calling/acmg59_test_cleaned.txt -> file:///gpfs/gsfs10/users/NICHD-core0/analysis/klubo-gwiezdzinska/mptc-variant-calling/./references/funcotator_dataSources.v1.7.20200521g/acmg_rec/hg38/acmg59_test_cleaned.txt
17:10:44.407 INFO DataSourceUtils - Resolved data source file path: file:///gpfs/gsfs10/users/NICHD-core0/analysis/klubo-gwiezdzinska/mptc-variant-calling/acmg_lof.tsv -> file:///gpfs/gsfs10/users/NICHD-core0/analysis/klubo-gwiezdzinska/mptc-variant-calling/./references/funcotator_dataSources.v1.7.20200521g/acmg_lof/hg38/acmg_lof.tsv
17:10:44.410 INFO DataSourceUtils - Resolved data source file path: file:///gpfs/gsfs10/users/NICHD-core0/analysis/klubo-gwiezdzinska/mptc-variant-calling/LMM_Path_LP_VUS5-variants-6-12-18.sorted_liftover_b38.corrected.vcf -> file:///gpfs/gsfs10/users/NICHD-core0/analysis/klubo-gwiezdzinska/mptc-variant-calling/./references/funcotator_dataSources.v1.7.20200521g/lmm_known/hg38/LMM_Path_LP_VUS5-variants-6-12-18.sorted_liftover_b38.corrected.vcf
17:10:44.410 INFO DataSourceUtils - Setting lookahead cache for data source: LMMKnown : 100000
17:10:44.426 INFO FeatureManager - Using codec VCFCodec to read file file:///gpfs/gsfs10/users/NICHD-core0/analysis/klubo-gwiezdzinska/mptc-variant-calling/./references/funcotator_dataSources.v1.7.20200521g/lmm_known/hg38/LMM_Path_LP_VUS5-variants-6-12-18.sorted_liftover_b38.corrected.vcf
17:10:44.446 INFO DataSourceUtils - Resolved data source file path: file:///gpfs/gsfs10/users/NICHD-core0/analysis/klubo-gwiezdzinska/mptc-variant-calling/LMM_Path_LP_VUS5-variants-6-12-18.sorted_liftover_b38.corrected.vcf -> file:///gpfs/gsfs10/users/NICHD-core0/analysis/klubo-gwiezdzinska/mptc-variant-calling/./references/funcotator_dataSources.v1.7.20200521g/lmm_known/hg38/LMM_Path_LP_VUS5-variants-6-12-18.sorted_liftover_b38.corrected.vcf
17:10:44.451 INFO FeatureManager - Using codec VCFCodec to read file file:///gpfs/gsfs10/users/NICHD-core0/analysis/klubo-gwiezdzinska/mptc-variant-calling/./references/funcotator_dataSources.v1.7.20200521g/lmm_known/hg38/LMM_Path_LP_VUS5-variants-6-12-18.sorted_liftover_b38.corrected.vcf
17:10:44.454 INFO DataSourceUtils - Resolved data source file path: file:///gpfs/gsfs10/users/NICHD-core0/analysis/klubo-gwiezdzinska/mptc-variant-calling/gencode.v34.annotation.REORDERED.gtf -> file:///gpfs/gsfs10/users/NICHD-core0/analysis/klubo-gwiezdzinska/mptc-variant-calling/./references/funcotator_dataSources.v1.7.20200521g/gencode/hg38/gencode.v34.annotation.REORDERED.gtf
17:10:44.454 INFO DataSourceUtils - Setting lookahead cache for data source: Gencode : 100000
17:10:44.480 INFO FeatureManager - Using codec GencodeGtfCodec to read file file:///gpfs/gsfs10/users/NICHD-core0/analysis/klubo-gwiezdzinska/mptc-variant-calling/./references/funcotator_dataSources.v1.7.20200521g/gencode/hg38/gencode.v34.annotation.REORDERED.gtf
17:10:44.516 INFO DataSourceUtils - Resolved data source file path: file:///gpfs/gsfs10/users/NICHD-core0/analysis/klubo-gwiezdzinska/mptc-variant-calling/gencode.v34.pc_transcripts.fa -> file:///gpfs/gsfs10/users/NICHD-core0/analysis/klubo-gwiezdzinska/mptc-variant-calling/./references/funcotator_dataSources.v1.7.20200521g/gencode/hg38/gencode.v34.pc_transcripts.fa
17:10:51.522 INFO Funcotator - Initializing Funcotator Engine...
17:10:51.524 INFO FuncotatorEngine - Using given VCF and Reference. No conversion required.
17:10:51.527 INFO Funcotator - Creating a VCF file for output: file:/gpfs/gsfs10/users/NICHD-core0/analysis/klubo-gwiezdzinska/mptc-variant-calling/./results/annotated/variants.funcotated.vcf
17:10:51.595 INFO ProgressMeter - Starting traversal
17:10:51.595 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
17:11:01.889 INFO ProgressMeter - 1:24734265 0.2 7000 40808.4
17:11:12.356 INFO ProgressMeter - 1:77461640 0.3 21000 60693.6
17:11:22.538 INFO ProgressMeter - 1:163138580 0.5 41000 79503.6
17:11:32.681 INFO ProgressMeter - 1:248527573 0.7 63000 92004.4
17:11:42.743 INFO ProgressMeter - 10:61287718 0.9 84000 98539.5
17:11:52.764 INFO ProgressMeter - 11:13484795 1.0 106000 103977.6
17:12:02.869 INFO ProgressMeter - 11:94383232 1.2 128000 107754.7
17:12:12.925 INFO ProgressMeter - 12:39741050 1.4 150000 110661.6
17:12:22.948 INFO ProgressMeter - 12:123528759 1.5 172000 112969.6
17:12:32.962 INFO ProgressMeter - 13:97209953 1.7 194000 114831.4
17:12:42.966 INFO ProgressMeter - 14:85715778 1.9 216000 116368.9
17:12:53.005 INFO ProgressMeter - 15:79143791 2.0 238000 117619.0
17:13:03.422 INFO ProgressMeter - 16:56908272 2.2 260000 118337.8
17:13:13.427 INFO ProgressMeter - 17:42318366 2.4 282000 119296.9
17:13:23.659 INFO ProgressMeter - 18:39234210 2.5 304000 119950.3
17:13:33.715 INFO ProgressMeter - 19:36984135 2.7 326000 120652.1
17:13:44.150 INFO ProgressMeter - 2:59426982 2.9 349000 121353.3
17:13:54.254 INFO ProgressMeter - 2:150643634 3.0 371000 121867.1
17:14:04.323 INFO ProgressMeter - 2:241333661 3.2 393000 122349.2
17:14:14.426 INFO ProgressMeter - 21:9000718 3.4 415000 122762.9
17:14:24.548 INFO ProgressMeter - 22:46521181 3.5 437000 123126.3
17:14:34.621 INFO ProgressMeter - 3:78947119 3.7 459000 123483.9
17:14:44.747 INFO ProgressMeter - 3:168064602 3.9 481000 123783.0
17:14:54.829 INFO ProgressMeter - 4:49655655 4.1 503000 124078.6
17:15:04.849 INFO ProgressMeter - 4:135472268 4.2 525000 124381.5
17:15:14.870 INFO ProgressMeter - 5:33123255 4.4 547000 124661.0
17:15:24.987 INFO ProgressMeter - 5:121922385 4.6 569000 124876.5
17:15:35.082 INFO ProgressMeter - 6:25331244 4.7 591000 125085.5
17:15:45.201 INFO ProgressMeter - 6:106395208 4.9 613000 125270.3
17:15:55.285 INFO ProgressMeter - 7:23776325 5.1 635000 125457.3
17:16:05.719 INFO ProgressMeter - 7:107198462 5.2 658000 125683.3
17:16:16.164 INFO ProgressMeter - 8:33013581 5.4 681000 125890.8
17:16:26.611 INFO ProgressMeter - 8:132700274 5.6 704000 126083.9
17:16:37.040 INFO ProgressMeter - 9:95747829 5.8 727000 126272.3
17:16:47.258 INFO ProgressMeter - X:55084250 5.9 749000 126355.9
17:16:57.696 INFO ProgressMeter - Y:26653627 6.1 771000 126358.9
17:16:58.179 WARN IntelInflater - Zero Bytes Written : 0
17:16:58.261 INFO ProgressMeter - Y:56858213 6.1 772191 126359.1
17:16:58.261 INFO ProgressMeter - Traversal complete. Processed 772191 total variants in 6.1 minutes.
17:16:58.262 WARN Funcotator - ================================================================================
17:16:58.262 WARN Funcotator - _ _ _ __ __ _ _ _ _
17:16:58.262 WARN Funcotator - | || || | \ \ / /_ _ _ __ _ __ (_)_ __ __ _ | || || |
17:16:58.262 WARN Funcotator - | || || | \ \ /\ / / _` | '__| '_ \| | '_ \ / _` | | || || |
17:16:58.262 WARN Funcotator - |_||_||_| \ \V V / (_| | | | | | | | | | | (_| | |_||_||_|
17:16:58.262 WARN Funcotator - (_)(_)(_) \_/\_/ \__,_|_| |_| |_|_|_| |_|\__, | (_)(_)(_)
17:16:58.262 WARN Funcotator - |___/
17:16:58.262 WARN Funcotator - --------------------------------------------------------------------------------
17:16:58.262 WARN Funcotator - Only IGRs were produced for this dataset. This STRONGLY indicates that this
17:16:58.262 WARN Funcotator - run was misconfigured.
17:16:58.262 WARN Funcotator - You MUST check your data sources to make sure they are correct for these data.
17:16:58.262 WARN Funcotator - ================================================================================
17:16:58.263 INFO VcfFuncotationFactory - ClinVar_VCF 20180429_hg38 cache hits/total: 0/0
17:16:58.263 INFO VcfFuncotationFactory - LMMKnown 20180618 cache hits/total: 0/0
17:16:58.593 INFO Funcotator - Shutting down engine
[May 20, 2022 5:16:58 PM EDT] org.broadinstitute.hellbender.tools.funcotator.Funcotator done. Elapsed time: 6.29 minutes.
Runtime.totalMemory()=1228406784
Tool returned:
true
Using GATK jar /gpfs/gsfs10/users/NICHD-core0/analysis/klubo-gwiezdzinska/mptc-variant-calling/env/share/gatk4-4.2.3.0-1/gatk-package-4.2.3.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -jar /gpfs/gsfs10/users/NICHD-core0/analysis/klubo-gwiezdzinska/mptc-variant-calling/env/share/gatk4-4.2.3.0-1/gatk-package-4.2.3.0-local.jar Funcotator --variant ./results/filtered/all.final.vcf.gz --reference ./references/GRCh38.fa.gz --ref-version hg38 --data-sources-path ./references/funcotator_dataSources.v1.7.20200521g --output ./results/annotated/variants.funcotated.vcf --output-file-format VCF
This is the command I ran:
gatk --java-options "-DGATK_STACKTRACE_ON_USER_EXCEPTION=true" Funcotator \
--variant ./results/filtered/all.final.vcf.gz \
--reference ./references/GRCh38.fa.gz \
--ref-version hg38 \
--data-sources-path ./references/funcotator_dataSources.v1.7.20200521g \
--output ./results/annotated/variants.funcotated.vcf \
--output-file-format VCF
This is the workflow I followed: https://github.com/snakemake-workflows/dna-seq-gatk-variant-calling (very standard GATK)
Here's where I believe the issue may lie:
- I did not alter any of the config files for the data sources after I downloaded them using the Funcotator data source downloader for germline https://gatk.broadinstitute.org/hc/en-us/articles/5358893070491-FuncotatorDataSourceDownloader (maybe I should have?)
- I called the germline variants using intervals from a BED file supplied by a collaborator. Unfortunately, the collaborator couldn't tell me which genome build was used to make the BED file.
- The reference genome used for calling was GRCh38. The naming convention for GRCh38 is to call chromosomes by number, so 1 and not chr1. However, the BED file had chr1, so I had to change these names to do the interval calling.
- I was able to determine from aligning the BED file with the UCSC Genome Browser that the coordinates are in hg38.
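One way to check the contig-naming suspicion above is to compare the contig names in the called VCF against those in the Gencode GTF that ships with the data sources. The following is only a minimal sketch, not part of GATK: the helper names `list_contigs` and `contigs_missing_from` are made up for illustration, and the paths in the usage comment are the ones that appear in the log.

```shell
#!/bin/sh
# Hypothetical helpers (not part of GATK) for comparing contig names.

# Print the unique contig names found in column 1 of a VCF or GTF file,
# skipping '#' header lines. Accepts plain or gzip-compressed input.
list_contigs() {
    case "$1" in
        *.gz) gzip -dc "$1" ;;
        *)    cat "$1" ;;
    esac | grep -v '^#' | cut -f1 | LC_ALL=C sort -u
}

# Print the contig names that occur in the first file but not in the second.
contigs_missing_from() {
    a=$(mktemp) && b=$(mktemp) || return 1
    list_contigs "$1" > "$a"
    list_contigs "$2" > "$b"
    LC_ALL=C comm -23 "$a" "$b"
    rm -f "$a" "$b"
}

# Usage with the paths from the log (uncomment to run):
# contigs_missing_from ./results/filtered/all.final.vcf.gz \
#     ./references/funcotator_dataSources.v1.7.20200521g/gencode/hg38/gencode.v34.annotation.REORDERED.gtf
```

If this prints every contig in the VCF (for example 1, 2, ..., X, Y), then none of the variant contigs exist in the gene annotation under that name, which would be consistent with every variant being reported as IGR.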
-
Hi Gus Fridell,
It looks like your resource files are mismatched in terms of their reference versions. There are usually more differences between hg38 and GRCh38 than just changing the naming from "chr1" to "1". I would recommend looking more closely to confirm that your reference versions are consistent. For changing reference versions, we recommend the LiftOver tool to be sure that all the information is changed properly.
I would also recommend that you update your Funcotator data sources, because I believe there were some issues in the 1.7 data sources.
Please let me know if you have any other questions.
Best,
Genevieve
-
Hi Genevieve,
Thanks for your response! Which data sources do you recommend using? I checked the Google share and the FTP server, and based on the filenames, v1.7 is the latest (2020).
I will try to investigate the references more.
-
Yeah, it does look like that is the latest version published online. It might have just been an issue with the somatic data sources. The person I would need to ask about this is on vacation for a couple of weeks, so I'll find out and get back to you in a few weeks.
-
Thank you, Genevieve.
-
Hi Gus Fridell,
I have followed up with my co-worker and have some notes you should consider:
- The data sources should be fine; you're using the most up-to-date version of the germline data sources.
- Our data sources all use "chr1", not "1", for both hg38 and GRCh38. So the reference genome you initially used for calling might be different from the one the data sources are built on, which is why you are running into this issue.
- Take a look at the reference you used for the initial calling and either do a LiftOver or create Funcotator data sources for the reference genome you used.
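If the check of the calling reference shows that it differs from the data sources only in contig naming, one possible shortcut is to rename the contigs in the VCF. This is a rough sketch under that assumption and not a substitute for LiftOver: the function name `add_chr_prefix` is hypothetical, it only handles names that need a plain "chr" prefix (plus MT to chrM), and unplaced or alternate contigs are named differently in Ensembl-style and UCSC-style references and would need an explicit mapping instead.

```shell
#!/bin/sh
# Hypothetical helper (not part of GATK): add a "chr" prefix to contig names
# in an uncompressed VCF read from stdin, in both the ##contig=<ID=...>
# header lines and column 1 of every record. "MT" becomes "chrM"; names that
# already start with "chr" are left unchanged.
add_chr_prefix() {
    awk 'BEGIN { FS = OFS = "\t" }
    function fix(name) {
        if (name == "MT") return "chrM"
        if (name ~ /^chr/) return name
        return "chr" name
    }
    /^##contig=<ID=/ {
        rest = substr($0, 14)       # text after "##contig=<ID="
        n = match(rest, /[,>]/)     # position where the ID value ends
        print "##contig=<ID=" fix(substr(rest, 1, n - 1)) substr(rest, n)
        next
    }
    /^#/ { print; next }            # other header lines pass through as-is
    { $1 = fix($1); print }'
}

# Usage with the input path from the post (uncomment to run; the output
# file name is only an example):
# gzip -dc ./results/filtered/all.final.vcf.gz | add_chr_prefix > all.final.chr.vcf
```

The renamed VCF would also need a reference FASTA that uses the same "chr" names when it is passed to Funcotator, since the log shows the tool validating sequence dictionaries at startup.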
Let me know if you have any other questions.
Best,
Genevieve