Funcotator missing ClinVar annotations
I am using GATK version 4.5.0.0 and am trying to annotate variants using Funcotator with the v1.8 somatic data source. My reads are aligned to hg19.
The output included most annotations without issue, but all ClinVar columns are blank.
I do not face this issue when using the v1.7 data source.
I downloaded the v1.8 data source from this site: https://42basepairs.com/search?query=funcotator_dataSources.v1.8.hg19.20230908s&bucket=gs/broad-public-datasets
The command I used for Funcotator:
~/tools/gatk-4.5.0.0/gatk Funcotator \
--variant tmp.vcf \
--reference ~/references/hs37d6.fa \
--ref-version hg19 \
--data-sources-path ~/references/funcotator_dataSources.v1.8.hg19.20230908s \
--output variants.funcotated.v1.8s..maf \
--output-file-format MAF
Please advise me on how to resolve this issue, as v1.7 ClinVar information is outdated. Thank you!
-
Hi Ong Zhi Xuan
We do not recognize the source of this resource file and we have no affiliation with this website. Can you try downloading our resource files using the tool named below?
gatk FuncotatorDataSourceDownloader
Regards.
-
Hi Gökalp Çelik, thank you for your reply. I would need to download the resources locally before using them on my institution's HPC server, as the HPC server does not have Internet access.
Could you suggest an alternative to using gatk FuncotatorDataSourceDownloader?
-
Hi again.
You may still use the tool on a computer with an internet connection and then move the resource files that you download to the HPC of your preference.
Alternatively you may use the google cloud bucket for resource files.
https://console.cloud.google.com/storage/browser/broad-public-datasets/funcotator
I hope this helps.
-
Hi!
I found that the ClinVar annotation fields were empty when using data sources downloaded with:
gatk FuncotatorDataSourceDownloader --somatic --validate-integrity --extract-after-download -hg38
Is there any possible solution for this case?
Besides that, there are a lot of UNKNOWN fields in my output. Is that common?
E.g., Entrez ID, Validation status, etc.
Thanks!
-
Hi sernyan lim
Are you using the latest gatk and funcotator resource with it? My tests show that clinvar annotations work fine with version 4.6.1.0.
-
Hi Gökalp Çelik
Yes, I'm using the latest version of GATK, 4.6.1.0, and my Funcotator data sources are v1.8.hg38.20230908s. Are those the latest Funcotator resources?
-
Hi again.
ClinVar annotations are added only for variants that are present in the ClinVar source, and the current version in the resource is from 20230717. You may be able to update the source yourself; however, certain ClinVar fields might have been modified since then, so pay attention to those annotation tags.
Do you not observe any ClinVar entries in the output for sites that are known to be in ClinVar?
I am able to see those sites when I check for lines containing ClinVar text such as likely_benign.
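The check described above can be sketched as a small shell function. This is only a rough sketch: the function name `count_clinvar_hits` and the list of significance terms are illustrative assumptions, not part of Funcotator, and the term list is not exhaustive.

```shell
#!/usr/bin/env bash

# Count non-header lines in a Funcotator output file (MAF or VCF) that mention
# a ClinVar significance term. The term list is illustrative, not exhaustive.
count_clinvar_hits() {
    local annotated_file="$1"
    # grep -c exits non-zero when the count is 0, so "|| true" keeps the
    # function from reporting a failure in that case; the 0 is still printed.
    grep -v '^#' "$annotated_file" \
        | grep -ciE 'likely_benign|benign|pathogenic|uncertain_significance' || true
}

# Example usage (file name taken from the command earlier in this thread):
#   count_clinvar_hits variants.funcotated.v1.8s..maf
```

A count of 0 on a file that should contain known ClinVar sites suggests the ClinVar data source is not being matched at all, rather than the variants simply being absent from ClinVar.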
-
Hi,
It seems I have the latest version of the Funcotator resources.
To answer your question: I did not observe ClinVar annotations for any of the variants. I think the fields really are empty, as none of the lines contain ClinVar text. Thanks for replying so fast!
-
Another quick question: why do my Funcotator resources seem newer than the one you mentioned?
-
What is your clinvar version?
-
Not sure if this is what you want, but clinvar/hg38/clinvar_20230717_hg38.vcf is the ClinVar VCF I found in my Funcotator log.
-
This is what I meant. We are using the same source files.
-
Hmm, it seems we are using the same version of everything. I wonder why I have empty ClinVar annotations and a lot of UNKNOWN fields. Is there any possible solution that I could try?
-
Hi sernyan lim
It looks like I was able to track down the problem with ClinVar. We use the default ClinVar VCF files for annotation, but the ClinVar hg38 resource files are posted with contig names that lack the "chr" prefix. Funcotator is therefore unable to look up variant info in the ClinVar VCF file if your hg38 contig names start with "chr".
To fix this issue temporarily on your end you may use
bcftools annotate
to add the chr prefix to the ClinVar VCF file in the Funcotator data sources and rerun your annotation. We will post a fix on our end to our data sources.
UNKNOWN fields are normal behavior for Funcotator: they indicate that those entries were not found in the resource or in the VCF input itself.
I hope this helps.
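The contig mismatch and the rename step described above can be sketched with plain grep and awk, which do not depend on contig lines in the VCF header. This is a hedged sketch and not an official fix: the function names `list_contigs` and `add_chr_prefix` are hypothetical, the mapping of `MT` to `chrM` is an assumption about how your reference names the mitochondrial contig, and header lines are passed through unchanged.

```shell
#!/usr/bin/env bash

# Print the distinct contig names used in the records of an uncompressed VCF.
list_contigs() {
    grep -v '^#' "$1" | cut -f1 | sort -u
}

# Write a copy of a VCF whose record contigs carry the "chr" prefix.
add_chr_prefix() {
    local in_vcf="$1" out_vcf="$2"
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/         { print; next }              # header lines are left unchanged
         $1 == "MT"   { $1 = "chrM"; print; next } # assumption: reference uses chrM
         $1 !~ /^chr/ { $1 = "chr" $1 }            # prefix contigs that lack it
                      { print }' "$in_vcf" > "$out_vcf"
}

# Example usage (my_input.vcf and the .chr.vcf output name are placeholders):
#   list_contigs my_input.vcf | head
#   list_contigs clinvar_20230717_hg38.vcf | head
#   add_chr_prefix clinvar_20230717_hg38.vcf clinvar_20230717_hg38.chr.vcf
```

If the two `list_contigs` outputs differ only by the "chr" prefix, the mismatch described above is the likely cause. Keep a backup of the original ClinVar VCF. Any existing index for the old file would no longer match the rewritten one, so it probably needs to be regenerated (for example with gatk IndexFeatureFile) before rerunning Funcotator.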
-
Hello, I am wondering if a fix ended up being posted to your DataSources? I am having this same issue and was wondering how to work around it. I attempted using bcftools annotate as a temporary fix but, since the clinvar vcf files do not define the contigs and their lengths in the header, bcftools is unable to read the files and edit them.
For reference, I am trying to use clinvar_20230717_hg38.vcf within funcotator data sources.
[W::vcf_parse] Contig '1' is not defined in the header. (Quick workaround: index the file with tabix.)
Warning: Encountered an error, proceeding only because --force was given.
Note that this can result in a segfault or a silent corruption of the output file!
[E::vcf_format] Invalid BCF, CONTIG id=0 not present in the header
-
Hi Kaina Millan
You can block-gzip your ClinVar VCF file using bgzip and index it with tabix. Once you have done this, you need to modify the config file for ClinVar in the same folder, named clinvar_vcf.config, so that it points to the compressed file:
name = ClinVar_VCF
version = 20230717_hg38
src_file = clinvar_20230717_hg38.vcf.gz
We are working on a new data source and until then you can use this workaround to fix your issue.
I hope this helps.
Regards.
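The workaround above could look roughly like the following. This is a minimal sketch: the helper name `set_src_file` is hypothetical, and the bgzip and tabix calls are left commented out because they assume both tools are installed and that you are inside the ClinVar hg38 folder of the data sources.

```shell
#!/usr/bin/env bash

# Point the src_file entry of a Funcotator data source config at a new file.
# Only the src_file line is rewritten; all other lines are kept as they are.
set_src_file() {
    local config="$1" new_src="$2" tmp
    tmp="$(mktemp)"
    sed "s|^src_file *=.*|src_file = ${new_src}|" "$config" > "$tmp" \
        && cat "$tmp" > "$config"   # overwrite in place, keeping permissions
    rm -f "$tmp"
}

# Example usage (run inside the folder that holds clinvar_vcf.config):
#   bgzip clinvar_20230717_hg38.vcf            # replaces the file with .vcf.gz
#   tabix -p vcf clinvar_20230717_hg38.vcf.gz  # writes the .tbi index
#   set_src_file clinvar_vcf.config clinvar_20230717_hg38.vcf.gz
```

Editing the config by hand works just as well; the function only makes the change repeatable if several data source folders need the same update.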