VCF input example
Hi, I'm using GATK 4.2.6.1 to create a MAF file from a VCF of variants. My problem is that many fields in the output MAF file are empty (such as t_ref_count, t_alt_count, tumor_sample_barcode...). I was wondering whether the problem might be in my input VCF, but I couldn't find an example on this website (I found the VCF v4.2 specification at https://samtools.github.io/hts-specs/VCFv4.2.pdf but couldn't find a solution there). Is there an example VCF file you could provide?
Additionally, the log suggests there might be a problem with the dbSNP annotations in my VCF, but I couldn't figure that out either.
The command I'm using is:
gatk Funcotator --variant somatics.vcf --reference Homo_sapiens_assembly19.fasta --ref-version hg19 \
--data-sources-path funcotator_dataSources.v1.7.20200521g --output somatics.maf --output-file-format MAF \
--java-options '-Xmx6G' --force-b37-to-hg19-reference-contig-conversion --QUIET true
and this is the log:
Using GATK jar /Local/md_keren/anaconda3/share/gatk4-4.2.6.1-1/gatk-package-4.2.6.1-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx6G -jar /Local/md_keren/anaconda3/share/gatk4-4.2.6.1-1/gatk-package-4.2.6.1-local.jar Funcotator --variant /storage/md_keren/noamrud/RNA-Mutect_WMN_data/PAAD_test/output/11_new_S2_somatics.vcf --reference /storage/md_keren/noamrud/RNA-Mutect_WMN_data/resource/reference/Homo_sapiens_assembly19.fasta --ref-version hg19 --data-sources-path /storage/md_keren/noamrud/RNA-Mutect_WMN_data/resource/funcotator/funcotator_dataSources.v1.7.20200521g --output /storage/md_keren/noamrud/RNA-Mutect_WMN_data/PAAD_test/output/11_new_S2_somatics.maf --output-file-format MAF --force-b37-to-hg19-reference-contig-conversion --QUIET true
12:26:10.462 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/Local/md_keren/anaconda3/share/gatk4-4.2.6.1-1/gatk-package-4.2.6.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
12:26:10.603 INFO Funcotator - Initializing engine
12:26:10.948 INFO FeatureManager - Using codec VCFCodec to read file file:///storage/md_keren/noamrud/RNA-Mutect_WMN_data/PAAD_test/output/11_new_S2_somatics.vcf
12:26:10.960 INFO Funcotator - Done initializing engine
12:26:10.960 INFO Funcotator - Validating sequence dictionaries...
12:26:10.961 INFO Funcotator - Processing user transcripts/defaults/overrides...
12:26:10.961 INFO Funcotator - Initializing data sources...
12:26:10.963 INFO DataSourceUtils - Initializing data sources from directory: /storage/md_keren/noamrud/RNA-Mutect_WMN_data/resource/funcotator/funcotator_dataSources.v1.7.20200521g
12:26:10.963 INFO DataSourceUtils - Data sources version: 1.7.2020521g
12:26:10.963 INFO DataSourceUtils - Data sources source: ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/funcotator/funcotator_dataSources.v1.7.20200521g.tar.gz
12:26:10.963 INFO DataSourceUtils - Data sources alternate source: gs://broad-public-datasets/funcotator/funcotator_dataSources.v1.7.20200521.tar.gz
12:26:10.968 INFO DataSourceUtils - Resolved data source file path: file:///storage/md_keren/noamrud/RNA-Mutect_WMN_data/PAAD_test/gencode.v34lift37.annotation.REORDERED.gtf -> file:///storage/md_keren/noamrud/RNA-Mutect_WMN_data/resource/funcotator/funcotator_dataSources.v1.7.20200521g/gencode/hg19/gencode.v34lift37.annotation.REORDERED.gtf
12:26:10.968 INFO DataSourceUtils - Resolved data source file path: file:///storage/md_keren/noamrud/RNA-Mutect_WMN_data/PAAD_test/gencode.v34lift37.pc_transcripts.fa -> file:///storage/md_keren/noamrud/RNA-Mutect_WMN_data/resource/funcotator/funcotator_dataSources.v1.7.20200521g/gencode/hg19/gencode.v34lift37.pc_transcripts.fa
12:26:10.969 INFO DataSourceUtils - Resolved data source file path: file:///storage/md_keren/noamrud/RNA-Mutect_WMN_data/PAAD_test/acmg_lof.tsv -> file:///storage/md_keren/noamrud/RNA-Mutect_WMN_data/resource/funcotator/funcotator_dataSources.v1.7.20200521g/acmg_lof/hg19/acmg_lof.tsv
12:26:10.970 INFO DataSourceUtils - Resolved data source file path: file:///storage/md_keren/noamrud/RNA-Mutect_WMN_data/PAAD_test/acmg59_test_cleaned.txt -> file:///storage/md_keren/noamrud/RNA-Mutect_WMN_data/resource/funcotator/funcotator_dataSources.v1.7.20200521g/acmg_rec/hg19/acmg59_test_cleaned.txt
12:26:10.971 INFO DataSourceUtils - Resolved data source file path: file:///storage/md_keren/noamrud/RNA-Mutect_WMN_data/PAAD_test/clinvar_20180401.vcf -> file:///storage/md_keren/noamrud/RNA-Mutect_WMN_data/resource/funcotator/funcotator_dataSources.v1.7.20200521g/clinvar/hg19/clinvar_20180401.vcf
12:26:10.973 INFO DataSourceUtils - Resolved data source file path: file:///storage/md_keren/noamrud/RNA-Mutect_WMN_data/PAAD_test/LMM_Path_LP_VUS5-variants-6-12-18.sorted.vcf -> file:///storage/md_keren/noamrud/RNA-Mutect_WMN_data/resource/funcotator/funcotator_dataSources.v1.7.20200521g/lmm_known/hg19/LMM_Path_LP_VUS5-variants-6-12-18.sorted.vcf
12:26:10.973 INFO Funcotator - Finalizing data sources (this step can be long if data sources are cloud-based)...
12:26:10.973 INFO DataSourceUtils - Resolved data source file path: file:///storage/md_keren/noamrud/RNA-Mutect_WMN_data/PAAD_test/gencode.v34lift37.annotation.REORDERED.gtf -> file:///storage/md_keren/noamrud/RNA-Mutect_WMN_data/resource/funcotator/funcotator_dataSources.v1.7.20200521g/gencode/hg19/gencode.v34lift37.annotation.REORDERED.gtf
12:26:10.973 INFO DataSourceUtils - Setting lookahead cache for data source: Gencode : 100000
12:26:10.977 INFO FeatureManager - Using codec GencodeGtfCodec to read file file:///storage/md_keren/noamrud/RNA-Mutect_WMN_data/resource/funcotator/funcotator_dataSources.v1.7.20200521g/gencode/hg19/gencode.v34lift37.annotation.REORDERED.gtf
12:26:11.004 INFO DataSourceUtils - Resolved data source file path: file:///storage/md_keren/noamrud/RNA-Mutect_WMN_data/PAAD_test/gencode.v34lift37.pc_transcripts.fa -> file:///storage/md_keren/noamrud/RNA-Mutect_WMN_data/resource/funcotator/funcotator_dataSources.v1.7.20200521g/gencode/hg19/gencode.v34lift37.pc_transcripts.fa
12:26:15.057 INFO DataSourceUtils - Resolved data source file path: file:///storage/md_keren/noamrud/RNA-Mutect_WMN_data/PAAD_test/acmg_lof.tsv -> file:///storage/md_keren/noamrud/RNA-Mutect_WMN_data/resource/funcotator/funcotator_dataSources.v1.7.20200521g/acmg_lof/hg19/acmg_lof.tsv
12:26:15.060 INFO DataSourceUtils - Resolved data source file path: file:///storage/md_keren/noamrud/RNA-Mutect_WMN_data/PAAD_test/acmg59_test_cleaned.txt -> file:///storage/md_keren/noamrud/RNA-Mutect_WMN_data/resource/funcotator/funcotator_dataSources.v1.7.20200521g/acmg_rec/hg19/acmg59_test_cleaned.txt
12:26:15.061 INFO DataSourceUtils - Resolved data source file path: file:///storage/md_keren/noamrud/RNA-Mutect_WMN_data/PAAD_test/clinvar_20180401.vcf -> file:///storage/md_keren/noamrud/RNA-Mutect_WMN_data/resource/funcotator/funcotator_dataSources.v1.7.20200521g/clinvar/hg19/clinvar_20180401.vcf
12:26:15.061 INFO DataSourceUtils - Setting lookahead cache for data source: ClinVar_VCF : 100000
12:26:15.065 INFO FeatureManager - Using codec VCFCodec to read file file:///storage/md_keren/noamrud/RNA-Mutect_WMN_data/resource/funcotator/funcotator_dataSources.v1.7.20200521g/clinvar/hg19/clinvar_20180401.vcf
12:26:15.105 INFO DataSourceUtils - Resolved data source file path: file:///storage/md_keren/noamrud/RNA-Mutect_WMN_data/PAAD_test/clinvar_20180401.vcf -> file:///storage/md_keren/noamrud/RNA-Mutect_WMN_data/resource/funcotator/funcotator_dataSources.v1.7.20200521g/clinvar/hg19/clinvar_20180401.vcf
12:26:15.142 INFO FeatureManager - Using codec VCFCodec to read file file:///storage/md_keren/noamrud/RNA-Mutect_WMN_data/resource/funcotator/funcotator_dataSources.v1.7.20200521g/clinvar/hg19/clinvar_20180401.vcf
12:26:15.178 INFO DataSourceUtils - Resolved data source file path: file:///storage/md_keren/noamrud/RNA-Mutect_WMN_data/PAAD_test/LMM_Path_LP_VUS5-variants-6-12-18.sorted.vcf -> file:///storage/md_keren/noamrud/RNA-Mutect_WMN_data/resource/funcotator/funcotator_dataSources.v1.7.20200521g/lmm_known/hg19/LMM_Path_LP_VUS5-variants-6-12-18.sorted.vcf
12:26:15.178 INFO DataSourceUtils - Setting lookahead cache for data source: LMMKnown : 100000
12:26:15.181 INFO FeatureManager - Using codec VCFCodec to read file file:///storage/md_keren/noamrud/RNA-Mutect_WMN_data/resource/funcotator/funcotator_dataSources.v1.7.20200521g/lmm_known/hg19/LMM_Path_LP_VUS5-variants-6-12-18.sorted.vcf
12:26:15.185 INFO DataSourceUtils - Resolved data source file path: file:///storage/md_keren/noamrud/RNA-Mutect_WMN_data/PAAD_test/LMM_Path_LP_VUS5-variants-6-12-18.sorted.vcf -> file:///storage/md_keren/noamrud/RNA-Mutect_WMN_data/resource/funcotator/funcotator_dataSources.v1.7.20200521g/lmm_known/hg19/LMM_Path_LP_VUS5-variants-6-12-18.sorted.vcf
12:26:15.189 INFO FeatureManager - Using codec VCFCodec to read file file:///storage/md_keren/noamrud/RNA-Mutect_WMN_data/resource/funcotator/funcotator_dataSources.v1.7.20200521g/lmm_known/hg19/LMM_Path_LP_VUS5-variants-6-12-18.sorted.vcf
12:26:15.192 INFO Funcotator - Initializing Funcotator Engine...
12:26:15.193 INFO FuncotatorEngine - Forcing B37 -> HG19 Variant conversion.
12:26:15.193 WARN FuncotatorEngine - WARNING: You are using B37 as a reference. Funcotator will convert your variants to GRCh37, and this will be fine in the vast majority of cases. There MAY be some errors (e.g. in the Y chromosome, but possibly in other places as well) due to changes between the two references.
12:26:15.193 INFO Funcotator - Creating a MAF file for output: file:/storage/md_keren/noamrud/RNA-Mutect_WMN_data/PAAD_test/output/11_new_S2_somatics.maf
12:26:15.203 INFO ProgressMeter - Starting traversal
12:26:15.203 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
12:26:15.309 WARN MafOutputRenderer - No dbSNP annotations exist for this variant. Cannot render the dbSNP fields in the MAF. These fields will not be correct. [VC Unknown @ chr1:949780 Q. of type=SNP alleles=[C*, T] attr={} GT=[] filters=
12:26:15.358 WARN MafOutputRenderer - No dbSNP annotations exist for this variant. Cannot render the dbSNP fields in the MAF. These fields will not be correct. [VC Unknown @ chr4:668083 Q. of type=SNP alleles=[G*, A] attr={} GT=[] filters=
12:26:15.376 WARN MafOutputRenderer - No dbSNP annotations exist for this variant. Cannot render the dbSNP fields in the MAF. These fields will not be correct. [VC Unknown @ chr5:16453126 Q. of type=SNP alleles=[G*, A] attr={} GT=[] filters=
12:26:15.430 WARN MafOutputRenderer - No dbSNP annotations exist for this variant. Cannot render the dbSNP fields in the MAF. These fields will not be correct. [VC Unknown @ chr7:6442134 Q. of type=SNP alleles=[A*, G] attr={} GT=[] filters=
12:26:15.620 WARN MafOutputRenderer - No dbSNP annotations exist for this variant. Cannot render the dbSNP fields in the MAF. These fields will not be correct. [VC Unknown @ chr7:94293573 Q. of type=SNP alleles=[T*, A] attr={} GT=[] filters=
12:26:15.821 WARN MafOutputRenderer - No dbSNP annotations exist for this variant. Cannot render the dbSNP fields in the MAF. These fields will not be correct. [VC Unknown @ chr7:105732273 Q. of type=SNP alleles=[A*, T] attr={} GT=[] filters=
12:26:16.147 WARN MafOutputRenderer - No dbSNP annotations exist for this variant. Cannot render the dbSNP fields in the MAF. These fields will not be correct. [VC Unknown @ chr9:130476414 Q. of type=SNP alleles=[G*, A] attr={} GT=[] filters=
12:26:16.613 WARN MafOutputRenderer - No dbSNP annotations exist for this variant. Cannot render the dbSNP fields in the MAF. These fields will not be correct. [VC Unknown @ chr9:140328777 Q. of type=SNP alleles=[C*, T] attr={} GT=[] filters=
12:26:16.784 WARN MafOutputRenderer - No dbSNP annotations exist for this variant. Cannot render the dbSNP fields in the MAF. These fields will not be correct. [VC Unknown @ chr10:75008731 Q. of type=SNP alleles=[T*, C] attr={} GT=[] filters=
12:26:17.150 WARN MafOutputRenderer - No dbSNP annotations exist for this variant. Cannot render the dbSNP fields in the MAF. These fields will not be correct. [VC Unknown @ chr11:62457792 Q. of type=SNP alleles=[G*, C] attr={} GT=[] filters=
12:26:17.352 WARN MafOutputRenderer - No dbSNP annotations exist for this variant. Cannot render the dbSNP fields in the MAF. These fields will not be correct. [VC Unknown @ chr12:95365146 Q. of type=SNP alleles=[G*, A] attr={} GT=[] filters=
12:26:17.354 INFO ProgressMeter - unmapped 0.0 11 306.8
12:26:17.354 INFO ProgressMeter - Traversal complete. Processed 11 total variants in 0.0 minutes.
12:26:17.355 INFO VcfFuncotationFactory - ClinVar_VCF 20180401 cache hits/total: 0/0
12:26:17.355 INFO VcfFuncotationFactory - LMMKnown 20180612 cache hits/total: 0/0
12:26:17.356 INFO Funcotator - Shutting down engine
Tool returned:
true
Finished!
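Every WARN line above prints the variant as Funcotator sees it, and each one shows `attr={} GT=[]`, i.e. no INFO attributes and no genotypes. One possible explanation (an assumption, not confirmed in this thread) is that the input VCF has no sample columns, so there is no per-sample data to derive t_ref_count/t_alt_count from. The hypothetical Python helper below (`inspect_vcf` and `inspect_vcf_lines` are illustrative names, not part of GATK) summarises the input VCF: sample names from the `#CHROM` line, any `##tumor_sample=` / `##normal_sample=` header lines (as Mutect2 writes them), and whether records carry a FORMAT `AD` (allelic depth) field. That Funcotator reads the barcode and counts from exactly these places is our assumption; check the Funcotator documentation before relying on it.

```python
import gzip
import sys
from typing import Dict, Iterable, List, Optional


def inspect_vcf_lines(lines: Iterable[str]) -> Dict[str, object]:
    """Summarise sample columns, tumor/normal header lines, and FORMAT AD usage."""
    samples: List[str] = []
    header_samples: Dict[str, Optional[str]] = {"tumor_sample": None, "normal_sample": None}
    ad_declared = False
    n_records = 0
    n_with_ad = 0
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("##"):
            # Meta-information line: ##key=value
            key, _, value = line[2:].partition("=")
            if key in header_samples:
                header_samples[key] = value
            elif key == "FORMAT" and value.startswith("<ID=AD,"):
                ad_declared = True
        elif line.startswith("#CHROM"):
            # Fixed columns are CHROM..INFO (8) plus FORMAT; sample names follow.
            samples = line.split("\t")[9:]
        elif line:
            n_records += 1
            fields = line.split("\t")
            # Count records that have sample columns and list AD in FORMAT.
            if len(fields) > 9 and "AD" in fields[8].split(":"):
                n_with_ad += 1
    return {
        "samples": samples,
        **header_samples,
        "format_AD_declared": ad_declared,
        "records": n_records,
        "records_with_AD": n_with_ad,
    }


def inspect_vcf(path: str) -> Dict[str, object]:
    """Open a plain or gzip-compressed VCF and summarise it."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return inspect_vcf_lines(handle)


if __name__ == "__main__" and len(sys.argv) > 1:
    for key, value in inspect_vcf(sys.argv[1]).items():
        print(f"{key}: {value}")
```

Run it as `python inspect_vcf.py 11_new_S2_somatics.vcf`. An empty `samples` list or `records_with_AD: 0` would be consistent with the empty count columns in the MAF.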
Hi Noam Rudberg,
Thank you for writing to the GATK forum. I hope that we can help you sort this out.
Firstly, could you please clarify whether all t_ref_count and t_alt_count fields are empty?
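One way to answer this is to count, per column, how many MAF rows are blank. The hypothetical Python helper below (`count_empty_fields` is an illustrative name) skips the `#` comment lines we assume Funcotator writes at the top of its MAF, reads the tab-delimited header row, and counts empty values in t_ref_count, t_alt_count and Tumor_Sample_Barcode (standard MAF column names). If a column's count equals `total_rows`, that field is empty for every variant.

```python
import csv
import sys
from typing import Dict, Iterable, Sequence

MAF_COLUMNS = ("t_ref_count", "t_alt_count", "Tumor_Sample_Barcode")


def count_empty_fields(lines: Iterable[str],
                       columns: Sequence[str] = MAF_COLUMNS) -> Dict[str, int]:
    """Count MAF rows whose value in each of `columns` is blank or missing."""
    # Drop comment lines (assumed to start with '#') before the header row.
    data = (line for line in lines if not line.startswith("#"))
    reader = csv.DictReader(data, delimiter="\t", quoting=csv.QUOTE_NONE)
    empty = {col: 0 for col in columns}
    rows = 0
    for row in reader:
        rows += 1
        for col in columns:
            # row.get() returns None if the column is absent from the header.
            if not (row.get(col) or "").strip():
                empty[col] += 1
    empty["total_rows"] = rows
    return empty


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], newline="") as handle:
        print(count_empty_fields(handle))
```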
For some inputs, the VCF probably does not contain these fields in its header (e.g. tumor_sample_barcode). Funcotator lets users set default annotations that fill in some of these fields for every variant when they are not already present. See the argument below:
--annotation-default
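As a sketch of how this argument is passed, the hypothetical Python helper below (`funcotator_maf_command` is an illustrative name, not part of GATK) assembles the Funcotator MAF command from the original post and appends one `--annotation-default NAME:VALUE` pair per default. The annotation names `tumor_barcode` and `normal_barcode` are an assumption based on how the GATK Mutect2 WDL calls Funcotator, and the sample values are placeholders; confirm the accepted names in the Funcotator documentation. A default is a single constant applied to every variant, so it suits per-sample fields such as the barcode rather than per-variant read counts.

```python
import shlex
from typing import Dict, List


def funcotator_maf_command(vcf: str, reference: str, data_sources: str,
                           output: str, defaults: Dict[str, str]) -> List[str]:
    """Build a Funcotator MAF command with one --annotation-default per entry."""
    cmd = [
        "gatk", "Funcotator",
        "--variant", vcf,
        "--reference", reference,
        "--ref-version", "hg19",
        "--data-sources-path", data_sources,
        "--output", output,
        "--output-file-format", "MAF",
    ]
    for name, value in defaults.items():
        # Each default is a separate argument in NAME:VALUE form.
        cmd += ["--annotation-default", f"{name}:{value}"]
    return cmd


if __name__ == "__main__":
    # Placeholder sample names (hypothetical); substitute your own IDs.
    cmd = funcotator_maf_command(
        "somatics.vcf", "Homo_sapiens_assembly19.fasta",
        "funcotator_dataSources.v1.7.20200521g", "somatics.maf",
        {"tumor_barcode": "TUMOR_SAMPLE", "normal_barcode": "NORMAL_SAMPLE"},
    )
    # Print a shell-ready line; subprocess.run(cmd, check=True) would run it.
    print(shlex.join(cmd))
```

Building the argument list in code keeps each NAME:VALUE pair as its own token, which avoids quoting mistakes when sample names come from a pipeline.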
For more details on Funcotator and its arguments, see the Funcotator tool documentation.
Please let me know about the t_ref_count and t_alt_count fields. Feel free to reach out with any other questions that pop up!
Best,
Anthony