Funcotator - all IGR classification
Hi, I saw this post but wasn't sure it's the same problem.
I'm running GATK 4.2.6.1 with this command:
gatk Funcotator --variant new_vars.vcf --reference Homo_sapiens_assembly19.fasta --ref-version hg19 --data-sources-path funcotator_dataSources.v1.7.20200521g --output variants.funcotated.vcf --output-file-format VCF
and I get the warning "Only IGRs were produced for this dataset".
I should mention that my variants are of mixed type (somatic and germline), but the warning occurred with both data sources.
I downloaded the data sources using the tool from the tutorial.
10:52:32.698 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/student/Downloads/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
10:52:32.921 INFO Funcotator - ------------------------------------------------------------
10:52:32.921 INFO Funcotator - The Genome Analysis Toolkit (GATK) v4.2.6.1
10:52:32.922 INFO Funcotator - For support and documentation go to https://software.broadinstitute.org/gatk/
10:52:32.922 INFO Funcotator - Executing as student@ubuntu18 on Linux v4.15.0-60-generic amd64
10:52:32.922 INFO Funcotator - Java runtime: OpenJDK 64-Bit Server VM v11.0.15+10-Ubuntu-0ubuntu0.18.04.1
10:52:32.922 INFO Funcotator - Start Date/Time: May 25, 2022 at 10:52:32 AM UTC
10:52:32.922 INFO Funcotator - ------------------------------------------------------------
10:52:32.923 INFO Funcotator - ------------------------------------------------------------
10:52:32.923 INFO Funcotator - HTSJDK Version: 2.24.1
10:52:32.923 INFO Funcotator - Picard Version: 2.27.1
10:52:32.924 INFO Funcotator - Built for Spark Version: 2.4.5
10:52:32.924 INFO Funcotator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
10:52:32.924 INFO Funcotator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
10:52:32.924 INFO Funcotator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
10:52:32.924 INFO Funcotator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
10:52:32.924 INFO Funcotator - Deflater: IntelDeflater
10:52:32.925 INFO Funcotator - Inflater: IntelInflater
10:52:32.925 INFO Funcotator - GCS max retries/reopens: 20
10:52:32.925 INFO Funcotator - Requester pays: disabled
10:52:32.925 INFO Funcotator - Initializing engine
10:52:33.136 INFO FeatureManager - Using codec VCFCodec to read file file:///home/student/Downloads/gatk-4.2.6.1/new_vars.vcf
10:52:33.159 INFO Funcotator - Done initializing engine
10:52:33.159 INFO Funcotator - Validating sequence dictionaries...
10:52:33.160 INFO Funcotator - Processing user transcripts/defaults/overrides...
10:52:33.161 INFO Funcotator - Initializing data sources...
10:52:33.163 INFO DataSourceUtils - Initializing data sources from directory: funcotator_dataSources.v1.7.20200521g
10:52:33.165 INFO DataSourceUtils - Data sources version: 1.7.2020521g
10:52:33.165 INFO DataSourceUtils - Data sources source: ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/funcotator/funcotator_dataSources.v1.7.20200521g.tar.gz
10:52:33.165 INFO DataSourceUtils - Data sources alternate source: gs://broad-public-datasets/funcotator/funcotator_dataSources.v1.7.20200521.tar.gz
10:52:33.178 INFO DataSourceUtils - Resolved data source file path: file:///home/student/Downloads/gatk-4.2.6.1/gencode.v34lift37.annotation.REORDERED.gtf -> file:///home/student/Downloads/gatk-4.2.6.1/funcotator_dataSources.v1.7.20200521g/gencode/hg19/gencode.v34lift37.annotation.REORDERED.gtf
10:52:33.179 INFO DataSourceUtils - Resolved data source file path: file:///home/student/Downloads/gatk-4.2.6.1/gencode.v34lift37.pc_transcripts.fa -> file:///home/student/Downloads/gatk-4.2.6.1/funcotator_dataSources.v1.7.20200521g/gencode/hg19/gencode.v34lift37.pc_transcripts.fa
10:52:33.180 INFO DataSourceUtils - Resolved data source file path: file:///home/student/Downloads/gatk-4.2.6.1/acmg59_test_cleaned.txt -> file:///home/student/Downloads/gatk-4.2.6.1/funcotator_dataSources.v1.7.20200521g/acmg_rec/hg19/acmg59_test_cleaned.txt
10:52:34.644 INFO DataSourceUtils - Resolved data source file path: file:///home/student/Downloads/gatk-4.2.6.1/clinvar_20180401.vcf -> file:///home/student/Downloads/gatk-4.2.6.1/funcotator_dataSources.v1.7.20200521g/clinvar/hg19/clinvar_20180401.vcf
10:52:34.644 INFO DataSourceUtils - Resolved data source file path: file:///home/student/Downloads/gatk-4.2.6.1/acmg_lof.tsv -> file:///home/student/Downloads/gatk-4.2.6.1/funcotator_dataSources.v1.7.20200521g/acmg_lof/hg19/acmg_lof.tsv
10:52:34.645 INFO DataSourceUtils - Resolved data source file path: file:///home/student/Downloads/gatk-4.2.6.1/LMM_Path_LP_VUS5-variants-6-12-18.sorted.vcf -> file:///home/student/Downloads/gatk-4.2.6.1/funcotator_dataSources.v1.7.20200521g/lmm_known/hg19/LMM_Path_LP_VUS5-variants-6-12-18.sorted.vcf
10:52:35.247 INFO Funcotator - Finalizing data sources (this step can be long if data sources are cloud-based)...
10:52:35.248 INFO DataSourceUtils - Resolved data source file path: file:///home/student/Downloads/gatk-4.2.6.1/gencode.v34lift37.annotation.REORDERED.gtf -> file:///home/student/Downloads/gatk-4.2.6.1/funcotator_dataSources.v1.7.20200521g/gencode/hg19/gencode.v34lift37.annotation.REORDERED.gtf
10:52:35.248 INFO DataSourceUtils - Setting lookahead cache for data source: Gencode : 100000
10:52:35.252 INFO FeatureManager - Using codec GencodeGtfCodec to read file file:///home/student/Downloads/gatk-4.2.6.1/funcotator_dataSources.v1.7.20200521g/gencode/hg19/gencode.v34lift37.annotation.REORDERED.gtf
10:52:35.318 INFO DataSourceUtils - Resolved data source file path: file:///home/student/Downloads/gatk-4.2.6.1/gencode.v34lift37.pc_transcripts.fa -> file:///home/student/Downloads/gatk-4.2.6.1/funcotator_dataSources.v1.7.20200521g/gencode/hg19/gencode.v34lift37.pc_transcripts.fa
10:52:40.205 INFO DataSourceUtils - Resolved data source file path: file:///home/student/Downloads/gatk-4.2.6.1/acmg59_test_cleaned.txt -> file:///home/student/Downloads/gatk-4.2.6.1/funcotator_dataSources.v1.7.20200521g/acmg_rec/hg19/acmg59_test_cleaned.txt
10:52:40.208 INFO DataSourceUtils - Setting lookahead cache for data source: gnomAD_exome : 100000
10:52:45.627 INFO FeatureManager - Using codec VCFCodec to read file gs://broad-public-datasets/funcotator/gnomAD_2.1_VCF_INFO_AF_Only/hg19/gnomad.exomes.r2.1.sites.INFO_ANNOTATIONS_FIXED.vcf.bgz
10:53:08.830 INFO FeatureManager - Using codec VCFCodec to read file gs://broad-public-datasets/funcotator/gnomAD_2.1_VCF_INFO_AF_Only/hg19/gnomad.exomes.r2.1.sites.INFO_ANNOTATIONS_FIXED.vcf.bgz
10:53:10.502 INFO DataSourceUtils - Resolved data source file path: file:///home/student/Downloads/gatk-4.2.6.1/clinvar_20180401.vcf -> file:///home/student/Downloads/gatk-4.2.6.1/funcotator_dataSources.v1.7.20200521g/clinvar/hg19/clinvar_20180401.vcf
10:53:10.502 INFO DataSourceUtils - Setting lookahead cache for data source: ClinVar_VCF : 100000
10:53:10.514 INFO FeatureManager - Using codec VCFCodec to read file file:///home/student/Downloads/gatk-4.2.6.1/funcotator_dataSources.v1.7.20200521g/clinvar/hg19/clinvar_20180401.vcf
10:53:10.601 INFO DataSourceUtils - Resolved data source file path: file:///home/student/Downloads/gatk-4.2.6.1/clinvar_20180401.vcf -> file:///home/student/Downloads/gatk-4.2.6.1/funcotator_dataSources.v1.7.20200521g/clinvar/hg19/clinvar_20180401.vcf
10:53:10.661 INFO FeatureManager - Using codec VCFCodec to read file file:///home/student/Downloads/gatk-4.2.6.1/funcotator_dataSources.v1.7.20200521g/clinvar/hg19/clinvar_20180401.vcf
10:53:10.791 INFO DataSourceUtils - Resolved data source file path: file:///home/student/Downloads/gatk-4.2.6.1/acmg_lof.tsv -> file:///home/student/Downloads/gatk-4.2.6.1/funcotator_dataSources.v1.7.20200521g/acmg_lof/hg19/acmg_lof.tsv
10:53:10.792 INFO DataSourceUtils - Resolved data source file path: file:///home/student/Downloads/gatk-4.2.6.1/LMM_Path_LP_VUS5-variants-6-12-18.sorted.vcf -> file:///home/student/Downloads/gatk-4.2.6.1/funcotator_dataSources.v1.7.20200521g/lmm_known/hg19/LMM_Path_LP_VUS5-variants-6-12-18.sorted.vcf
10:53:10.792 INFO DataSourceUtils - Setting lookahead cache for data source: LMMKnown : 100000
10:53:10.794 INFO FeatureManager - Using codec VCFCodec to read file file:///home/student/Downloads/gatk-4.2.6.1/funcotator_dataSources.v1.7.20200521g/lmm_known/hg19/LMM_Path_LP_VUS5-variants-6-12-18.sorted.vcf
10:53:10.799 INFO DataSourceUtils - Resolved data source file path: file:///home/student/Downloads/gatk-4.2.6.1/LMM_Path_LP_VUS5-variants-6-12-18.sorted.vcf -> file:///home/student/Downloads/gatk-4.2.6.1/funcotator_dataSources.v1.7.20200521g/lmm_known/hg19/LMM_Path_LP_VUS5-variants-6-12-18.sorted.vcf
10:53:10.813 INFO FeatureManager - Using codec VCFCodec to read file file:///home/student/Downloads/gatk-4.2.6.1/funcotator_dataSources.v1.7.20200521g/lmm_known/hg19/LMM_Path_LP_VUS5-variants-6-12-18.sorted.vcf
10:53:10.827 INFO DataSourceUtils - Setting lookahead cache for data source: gnomAD_genome : 100000
10:53:16.074 INFO FeatureManager - Using codec VCFCodec to read file gs://broad-public-datasets/funcotator/gnomAD_2.1_VCF_INFO_AF_Only/hg19/gnomad.genomes.r2.1.sites.INFO_ANNOTATIONS_FIXED.vcf.bgz
10:53:33.486 INFO FeatureManager - Using codec VCFCodec to read file gs://broad-public-datasets/funcotator/gnomAD_2.1_VCF_INFO_AF_Only/hg19/gnomad.genomes.r2.1.sites.INFO_ANNOTATIONS_FIXED.vcf.bgz
10:53:35.773 INFO Funcotator - Initializing Funcotator Engine...
10:53:35.781 INFO FuncotatorUtils - Input VCF has been determined to not based on b37:
10:53:35.781 INFO FuncotatorUtils - The following contigs are present in b37 and missing in the input VCF sequence dictionary:
10:53:35.781 INFO FuncotatorUtils - GL000207.1 (len=4262,assembly=GRCh37)
10:53:35.781 INFO FuncotatorUtils - GL000226.1 (len=15008,assembly=GRCh37)
10:53:35.782 INFO FuncotatorUtils - GL000229.1 (len=19913,assembly=GRCh37)
10:53:35.782 INFO FuncotatorUtils - GL000231.1 (len=27386,assembly=GRCh37)
10:53:35.782 INFO FuncotatorUtils - GL000210.1 (len=27682,assembly=GRCh37)
10:53:35.782 INFO FuncotatorUtils - GL000239.1 (len=33824,assembly=GRCh37)
10:53:35.782 INFO FuncotatorUtils - GL000235.1 (len=34474,assembly=GRCh37)
10:53:35.782 INFO FuncotatorUtils - GL000201.1 (len=36148,assembly=GRCh37)
10:53:35.782 INFO FuncotatorUtils - GL000247.1 (len=36422,assembly=GRCh37)
10:53:35.782 INFO FuncotatorUtils - GL000245.1 (len=36651,assembly=GRCh37)
10:53:35.782 INFO FuncotatorUtils - GL000197.1 (len=37175,assembly=GRCh37)
10:53:35.782 INFO FuncotatorUtils - GL000203.1 (len=37498,assembly=GRCh37)
10:53:35.782 INFO FuncotatorUtils - GL000246.1 (len=38154,assembly=GRCh37)
10:53:35.782 INFO FuncotatorUtils - GL000249.1 (len=38502,assembly=GRCh37)
10:53:35.782 INFO FuncotatorUtils - GL000196.1 (len=38914,assembly=GRCh37)
10:53:35.782 INFO FuncotatorUtils - GL000248.1 (len=39786,assembly=GRCh37)
10:53:35.782 INFO FuncotatorUtils - GL000244.1 (len=39929,assembly=GRCh37)
10:53:35.783 INFO FuncotatorUtils - GL000238.1 (len=39939,assembly=GRCh37)
10:53:35.783 INFO FuncotatorUtils - GL000202.1 (len=40103,assembly=GRCh37)
10:53:35.783 INFO FuncotatorUtils - GL000234.1 (len=40531,assembly=GRCh37)
10:53:35.783 INFO FuncotatorUtils - GL000232.1 (len=40652,assembly=GRCh37)
10:53:35.783 INFO FuncotatorUtils - GL000206.1 (len=41001,assembly=GRCh37)
10:53:35.783 INFO FuncotatorUtils - GL000240.1 (len=41933,assembly=GRCh37)
10:53:35.783 INFO FuncotatorUtils - GL000236.1 (len=41934,assembly=GRCh37)
10:53:35.783 INFO FuncotatorUtils - GL000241.1 (len=42152,assembly=GRCh37)
10:53:35.783 INFO FuncotatorUtils - GL000243.1 (len=43341,assembly=GRCh37)
10:53:35.783 INFO FuncotatorUtils - GL000242.1 (len=43523,assembly=GRCh37)
10:53:35.783 INFO FuncotatorUtils - GL000230.1 (len=43691,assembly=GRCh37)
10:53:35.784 INFO FuncotatorUtils - GL000237.1 (len=45867,assembly=GRCh37)
10:53:35.784 INFO FuncotatorUtils - GL000233.1 (len=45941,assembly=GRCh37)
10:53:35.784 INFO FuncotatorUtils - GL000204.1 (len=81310,assembly=GRCh37)
10:53:35.784 INFO FuncotatorUtils - GL000198.1 (len=90085,assembly=GRCh37)
10:53:35.784 INFO FuncotatorUtils - GL000208.1 (len=92689,assembly=GRCh37)
10:53:35.784 INFO FuncotatorUtils - GL000191.1 (len=106433,assembly=GRCh37)
10:53:35.784 INFO FuncotatorUtils - GL000227.1 (len=128374,assembly=GRCh37)
10:53:35.784 INFO FuncotatorUtils - GL000228.1 (len=129120,assembly=GRCh37)
10:53:35.784 INFO FuncotatorUtils - GL000214.1 (len=137718,assembly=GRCh37)
10:53:35.784 INFO FuncotatorUtils - GL000221.1 (len=155397,assembly=GRCh37)
10:53:35.784 INFO FuncotatorUtils - GL000209.1 (len=159169,assembly=GRCh37)
10:53:35.784 INFO FuncotatorUtils - GL000218.1 (len=161147,assembly=GRCh37)
10:53:35.784 INFO FuncotatorUtils - GL000220.1 (len=161802,assembly=GRCh37)
10:53:35.784 INFO FuncotatorUtils - GL000213.1 (len=164239,assembly=GRCh37)
10:53:35.784 INFO FuncotatorUtils - GL000211.1 (len=166566,assembly=GRCh37)
10:53:35.784 INFO FuncotatorUtils - GL000199.1 (len=169874,assembly=GRCh37)
10:53:35.784 INFO FuncotatorUtils - GL000217.1 (len=172149,assembly=GRCh37)
10:53:35.784 INFO FuncotatorUtils - GL000216.1 (len=172294,assembly=GRCh37)
10:53:35.784 INFO FuncotatorUtils - GL000215.1 (len=172545,assembly=GRCh37)
10:53:35.784 INFO FuncotatorUtils - GL000205.1 (len=174588,assembly=GRCh37)
10:53:35.784 INFO FuncotatorUtils - GL000219.1 (len=179198,assembly=GRCh37)
10:53:35.784 INFO FuncotatorUtils - GL000224.1 (len=179693,assembly=GRCh37)
10:53:35.784 INFO FuncotatorUtils - GL000223.1 (len=180455,assembly=GRCh37)
10:53:35.784 INFO FuncotatorUtils - GL000195.1 (len=182896,assembly=GRCh37)
10:53:35.784 INFO FuncotatorUtils - GL000212.1 (len=186858,assembly=GRCh37)
10:53:35.784 INFO FuncotatorUtils - GL000222.1 (len=186861,assembly=GRCh37)
10:53:35.785 INFO FuncotatorUtils - GL000200.1 (len=187035,assembly=GRCh37)
10:53:35.785 INFO FuncotatorUtils - GL000193.1 (len=189789,assembly=GRCh37)
10:53:35.785 INFO FuncotatorUtils - GL000194.1 (len=191469,assembly=GRCh37)
10:53:35.785 INFO FuncotatorUtils - GL000225.1 (len=211173,assembly=GRCh37)
10:53:35.785 INFO FuncotatorUtils - GL000192.1 (len=547496,assembly=GRCh37)
10:53:35.785 INFO FuncotatorUtils - NC_007605 (len=171823,assembly=NC_007605.1)
10:53:35.785 INFO FuncotatorUtils - The following contigs are present in both b37 and the input VCF sequence dictionary, but have conflicting length information:
10:53:35.785 INFO FuncotatorUtils - 22 (len=0,assembly=null): VCF Length: 0, b37 Length: 51304566
10:53:35.785 INFO FuncotatorUtils - X (len=0,assembly=null): VCF Length: 0, b37 Length: 155270560
10:53:35.785 INFO FuncotatorUtils - Y (len=0,assembly=null): VCF Length: 0, b37 Length: 59373566
10:53:35.785 INFO FuncotatorUtils - 10 (len=0,assembly=null): VCF Length: 0, b37 Length: 135534747
10:53:35.786 INFO FuncotatorUtils - 11 (len=0,assembly=null): VCF Length: 0, b37 Length: 135006516
10:53:35.787 INFO FuncotatorUtils - 12 (len=0,assembly=null): VCF Length: 0, b37 Length: 133851895
10:53:35.787 INFO FuncotatorUtils - 13 (len=0,assembly=null): VCF Length: 0, b37 Length: 115169878
10:53:35.787 INFO FuncotatorUtils - 14 (len=0,assembly=null): VCF Length: 0, b37 Length: 107349540
10:53:35.787 INFO FuncotatorUtils - 15 (len=0,assembly=null): VCF Length: 0, b37 Length: 102531392
10:53:35.787 INFO FuncotatorUtils - 16 (len=0,assembly=null): VCF Length: 0, b37 Length: 90354753
10:53:35.787 INFO FuncotatorUtils - 17 (len=0,assembly=null): VCF Length: 0, b37 Length: 81195210
10:53:35.787 INFO FuncotatorUtils - 18 (len=0,assembly=null): VCF Length: 0, b37 Length: 78077248
10:53:35.787 INFO FuncotatorUtils - MT (len=0,assembly=null): VCF Length: 0, b37 Length: 16569
10:53:35.787 INFO FuncotatorUtils - 19 (len=0,assembly=null): VCF Length: 0, b37 Length: 59128983
10:53:35.787 INFO FuncotatorUtils - 1 (len=0,assembly=null): VCF Length: 0, b37 Length: 249250621
10:53:35.788 INFO FuncotatorUtils - 2 (len=0,assembly=null): VCF Length: 0, b37 Length: 243199373
10:53:35.788 INFO FuncotatorUtils - 3 (len=0,assembly=null): VCF Length: 0, b37 Length: 198022430
10:53:35.788 INFO FuncotatorUtils - 4 (len=0,assembly=null): VCF Length: 0, b37 Length: 191154276
10:53:35.788 INFO FuncotatorUtils - 5 (len=0,assembly=null): VCF Length: 0, b37 Length: 180915260
10:53:35.788 INFO FuncotatorUtils - 6 (len=0,assembly=null): VCF Length: 0, b37 Length: 171115067
10:53:35.788 INFO FuncotatorUtils - 7 (len=0,assembly=null): VCF Length: 0, b37 Length: 159138663
10:53:35.788 INFO FuncotatorUtils - 8 (len=0,assembly=null): VCF Length: 0, b37 Length: 146364022
10:53:35.788 INFO FuncotatorUtils - 9 (len=0,assembly=null): VCF Length: 0, b37 Length: 141213431
10:53:35.788 INFO FuncotatorUtils - 20 (len=0,assembly=null): VCF Length: 0, b37 Length: 63025520
10:53:35.788 INFO FuncotatorUtils - 21 (len=0,assembly=null): VCF Length: 0, b37 Length: 48129895
10:53:35.788 INFO FuncotatorEngine - Using given VCF and Reference. No conversion required.
10:53:35.788 INFO Funcotator - Creating a VCF file for output: file:/home/student/Downloads/gatk-4.2.6.1/variants.funcotated.vcf
10:53:35.832 INFO ProgressMeter - Starting traversal
10:53:35.832 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
10:53:53.962 INFO ProgressMeter - unmapped 0.3 173 572.5
10:53:53.966 INFO ProgressMeter - Traversal complete. Processed 173 total variants in 0.3 minutes.
10:53:53.966 WARN Funcotator - ================================================================================
10:53:53.967 WARN Funcotator - _ _ _ __ __ _ _ _ _
10:53:53.967 WARN Funcotator - | || || | \ \ / /_ _ _ __ _ __ (_)_ __ __ _ | || || |
10:53:53.967 WARN Funcotator - | || || | \ \ /\ / / _` | '__| '_ \| | '_ \ / _` | | || || |
10:53:53.967 WARN Funcotator - |_||_||_| \ \V V / (_| | | | | | | | | | | (_| | |_||_||_|
10:53:53.967 WARN Funcotator - (_)(_)(_) \_/\_/ \__,_|_| |_| |_|_|_| |_|\__, | (_)(_)(_)
10:53:53.967 WARN Funcotator - |___/
10:53:53.967 WARN Funcotator - --------------------------------------------------------------------------------
10:53:53.967 WARN Funcotator - Only IGRs were produced for this dataset. This STRONGLY indicates that this
10:53:53.967 WARN Funcotator - run was misconfigured.
10:53:53.968 WARN Funcotator - You MUST check your data sources to make sure they are correct for these data.
10:53:53.968 WARN Funcotator - ================================================================================
10:53:53.968 INFO VcfFuncotationFactory - ClinVar_VCF 20180401 cache hits/total: 0/0
10:53:53.968 INFO VcfFuncotationFactory - LMMKnown 20180612 cache hits/total: 0/0
10:53:53.968 INFO VcfFuncotationFactory - gnomAD_exome 2.1 cache hits/total: 0/11
10:53:53.968 INFO VcfFuncotationFactory - gnomAD_genome 2.1 cache hits/total: 0/131
10:53:53.985 INFO Funcotator - Shutting down engine
[May 25, 2022 at 10:53:58 AM UTC] org.broadinstitute.hellbender.tools.funcotator.Funcotator done. Elapsed time: 1.43 minutes.
Runtime.totalMemory()=832569344
Tool returned:
true
-
Hi Noam Rudberg,
Yes, this looks like the same issue from the previous post you linked. I would recommend looking closer into your VCF file to verify that it matches the reference version you are using for your data sources, because Funcotator did not find any matches.
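One quick way to start that check is to look at what the VCF header itself says about its reference. The snippet below is a minimal sketch, not an official GATK procedure: the function name `vcf_reference_info` is made up for illustration, and it assumes an uncompressed VCF. The Funcotator log above reports "VCF Length: 0" for every contig, which may mean the header carries no usable contig lengths, so this output is worth reading.

```shell
# Print the VCF header lines that describe the reference the variants were
# called against: ##reference and ##contig entries (the latter may carry
# length= and assembly= fields). Assumes an uncompressed VCF.
vcf_reference_info() {
    grep -E '^##(reference|contig)=' "$1" \
        || echo "no ##reference or ##contig header lines in $1" >&2
}

# Example usage with the file from the original post:
# vcf_reference_info new_vars.vcf
```

Contig IDs such as `1` versus `chr1`, and any `assembly=` value, give a first hint about which build the file follows.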
Let me know if you have any other questions.
Best,
Genevieve
-
Hi Genevieve,
Thanks for your response.
I'm not sure I fully understand what you mean by looking into the VCF file. I used the ValidateVariants tool this way:
gatk ValidateVariants -R Homo_sapiens_assembly19.fasta -V unique_variants.vcf
and got this output which seems fine?
16:19:09.125 INFO ValidateVariants - ------------------------------------------------------------
16:19:09.126 INFO ValidateVariants - The Genome Analysis Toolkit (GATK) v4.2.6.1
16:19:09.126 INFO ValidateVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
16:19:09.127 INFO ValidateVariants - Executing as student@ubuntu18 on Linux v4.15.0-60-generic amd64
16:19:09.127 INFO ValidateVariants - Java runtime: OpenJDK 64-Bit Server VM v11.0.15+10-Ubuntu-0ubuntu0.18.04.1
16:19:09.128 INFO ValidateVariants - Start Date/Time: May 25, 2022 at 4:19:08 PM UTC
16:19:09.128 INFO ValidateVariants - ------------------------------------------------------------
16:19:09.129 INFO ValidateVariants - ------------------------------------------------------------
16:19:09.129 INFO ValidateVariants - HTSJDK Version: 2.24.1
16:19:09.130 INFO ValidateVariants - Picard Version: 2.27.1
16:19:09.130 INFO ValidateVariants - Built for Spark Version: 2.4.5
16:19:09.130 INFO ValidateVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
16:19:09.130 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:19:09.130 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:19:09.131 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:19:09.131 INFO ValidateVariants - Deflater: IntelDeflater
16:19:09.131 INFO ValidateVariants - Inflater: IntelInflater
16:19:09.131 INFO ValidateVariants - GCS max retries/reopens: 20
16:19:09.131 INFO ValidateVariants - Requester pays: disabled
16:19:09.132 INFO ValidateVariants - Initializing engine
16:19:09.341 INFO FeatureManager - Using codec VCFCodec to read file file:///home/student/Downloads/gatk-4.2.6.1/unique_variants.vcf
16:19:09.370 INFO ValidateVariants - Done initializing engine
16:19:09.370 WARN ValidateVariants - IDS validation cannot be done because no DBSNP file was provided
16:19:09.370 WARN ValidateVariants - Other possible validations will still be performed
16:19:09.375 INFO ProgressMeter - Starting traversal
16:19:09.375 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
16:19:19.406 INFO ProgressMeter - 7:72424622 0.2 537000 3212683.2
16:19:29.406 INFO ProgressMeter - 22:29184280 0.3 1341000 4016974.5
16:19:30.690 INFO ProgressMeter - Y:15363045 0.4 1403333 3950268.8
16:19:30.690 INFO ProgressMeter - Traversal complete. Processed 1403333 total variants in 0.4 minutes.
16:19:30.691 INFO ValidateVariants - Shutting down engine
[May 25, 2022 at 4:19:30 PM UTC] org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants done. Elapsed time: 0.37 minutes.
Runtime.totalMemory()=161480704
Do you have any other ideas for verifying the match between the files?
Thanks,
Noam
-
Hi Noam Rudberg,
Yes, this tool is a great first step. I noticed, though, that here you are validating unique_variants.vcf instead of new_vars.vcf, the file you used for Funcotator. Could you try this command with new_vars.vcf?
Another troubleshooting option is to manually look at the VCF file and the annotation files to verify that the chromosome naming conventions match and make sure there are matching variants if possible.
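The manual comparison of naming conventions can be sketched in a few lines of shell. This is an illustrative sketch under stated assumptions: the helper names `list_contigs` and `contigs_missing_from` are hypothetical, both files are assumed to be uncompressed and tab-delimited with the contig name in column 1 (true for VCF and GTF), and the example paths are the ones that appear in the Funcotator log above.

```shell
# Print the distinct contig names found in column 1 of a tab-delimited
# file (VCF or GTF), skipping header/comment lines that start with '#'.
list_contigs() {
    grep -v '^#' "$1" | cut -f1 | sort -u
}

# Print contig names that appear in the first file but not in the second.
contigs_missing_from() {
    tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 1
    list_contigs "$1" > "$tmp_a"
    list_contigs "$2" > "$tmp_b"
    comm -23 "$tmp_a" "$tmp_b"
    rm -f "$tmp_a" "$tmp_b"
}

# Example usage (paths taken from the log above):
# contigs_missing_from new_vars.vcf \
#     funcotator_dataSources.v1.7.20200521g/gencode/hg19/gencode.v34lift37.annotation.REORDERED.gtf
```

If every contig in the VCF is reported as missing from the annotation file, the two files most likely use different naming conventions.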
Also, I see that in your unique_variants.vcf file, this is what the positions look like: 7:72424622. Generally hg19 has a naming convention of "chr1" instead of "1". This indicates to me that your variants might have been called against an alternative version of hg19 that differs from the Funcotator hg19. You can take a look at this article for more information: https://gatk.broadinstitute.org/hc/en-us/articles/360035890951-Human-genome-reference-builds-GRCh38-or-hg38-b37-hg19
Let me know if you have any further questions.
Best,
Genevieve
-
new_vars.vcf is just a subset of the full VCF file I use to try Funcotator with, to save me some time until it works well :)
Anyway, here's what you asked for:
16:54:29.554 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/student/Downloads/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
16:54:29.806 INFO ValidateVariants - ------------------------------------------------------------
16:54:29.807 INFO ValidateVariants - The Genome Analysis Toolkit (GATK) v4.2.6.1
16:54:29.807 INFO ValidateVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
16:54:29.807 INFO ValidateVariants - Executing as student@ubuntu18 on Linux v4.15.0-60-generic amd64
16:54:29.808 INFO ValidateVariants - Java runtime: OpenJDK 64-Bit Server VM v11.0.15+10-Ubuntu-0ubuntu0.18.04.1
16:54:29.808 INFO ValidateVariants - Start Date/Time: May 25, 2022 at 4:54:29 PM UTC
16:54:29.808 INFO ValidateVariants - ------------------------------------------------------------
16:54:29.808 INFO ValidateVariants - ------------------------------------------------------------
16:54:29.809 INFO ValidateVariants - HTSJDK Version: 2.24.1
16:54:29.809 INFO ValidateVariants - Picard Version: 2.27.1
16:54:29.809 INFO ValidateVariants - Built for Spark Version: 2.4.5
16:54:29.810 INFO ValidateVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
16:54:29.810 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:54:29.810 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:54:29.810 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:54:29.810 INFO ValidateVariants - Deflater: IntelDeflater
16:54:29.810 INFO ValidateVariants - Inflater: IntelInflater
16:54:29.810 INFO ValidateVariants - GCS max retries/reopens: 20
16:54:29.810 INFO ValidateVariants - Requester pays: disabled
16:54:29.811 INFO ValidateVariants - Initializing engine
16:54:30.052 INFO FeatureManager - Using codec VCFCodec to read file file:///home/student/Downloads/gatk-4.2.6.1/new_vars.vcf
16:54:30.071 INFO ValidateVariants - Done initializing engine
16:54:30.072 WARN ValidateVariants - IDS validation cannot be done because no DBSNP file was provided
16:54:30.072 WARN ValidateVariants - Other possible validations will still be performed
16:54:30.080 INFO ProgressMeter - Starting traversal
16:54:30.080 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
16:54:30.125 INFO ProgressMeter - unmapped 0.0 173 235909.1
16:54:30.125 INFO ProgressMeter - Traversal complete. Processed 173 total variants in 0.0 minutes.
16:54:30.125 INFO ValidateVariants - Shutting down engine
[May 25, 2022 at 4:54:30 PM UTC] org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=161480704
Regarding the chromosome convention: the reference file I'm using appears to be GRCh37.
From the .fasta file:
I can easily change the "1" to "chr1" in my VCF file, but I'm not sure it will work with the reference that way?
-
Yeah, it looks like your reference for this VCF file is GRCh37, which is different than hg19. There are probably more differences than just renaming the chromosomes. We recommend using LiftOver to change the reference version of VCF files.
-
Thanks!
On the LiftOver page, there's a "b37tohg38.chain" chain file while there's no mention of b37 on the chain file download page. So I have two questions:
1. Do you know another source of chain files?
2. Did you mean that I should convert my GRCh37 to hg19?
BTW, from this article, it seems that my version is actually b37 (based on the reference file name and the chromosome naming).
-
1. The file you are referring to is just an example; we don't maintain these chain files. Information about our resources can be found on our resource bundle page.
2. Yes, you can convert your file or you can re-call your variants with a reference version that will be more compatible with Funcotator. Whatever works best for your goals. You can also create your own Funcotator data sources with the b37 reference.
-
Noam Rudberg actually, I think it's possible to run Funcotator out of the box with b37. Check out this other forum post: https://gatk.broadinstitute.org/hc/en-us/community/posts/360060979451-Funcotator-b37-and-hg19-contig-compatibility-issue
-
Genevieve, thanks a lot!
Adding the --force-b37-to-hg19-reference-contig-conversion flag worked.
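For later readers, the fix can be sketched as the command from the first post with that flag appended. The thread does not show the exact command that was rerun, so treat this as an assumption; the file names are the ones from the original post. The `command -v` guard is only there so the snippet prints the command instead of failing on a machine without GATK installed.

```shell
# Sketch only: the arguments from the first post plus the flag that resolved
# the warning. The exact command that was rerun is not shown in the thread.
set -- Funcotator \
    --variant new_vars.vcf \
    --reference Homo_sapiens_assembly19.fasta \
    --ref-version hg19 \
    --data-sources-path funcotator_dataSources.v1.7.20200521g \
    --output variants.funcotated.vcf \
    --output-file-format VCF \
    --force-b37-to-hg19-reference-contig-conversion

if command -v gatk >/dev/null 2>&1; then
    gatk "$@"
else
    # Dry run on machines without GATK installed.
    echo "gatk $*"
fi
```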
-
Great news!