How to provide custom exon-intron annotation source in Funcotator
Hello,
I have used Mutect2 to call somatic variants in the HLA region. Since I had the HLA type for my samples, I aligned the reads for each sample to a reference consisting of the type-specific HLA allele genomic sequences (from IMGT-HLA) for that sample. Thus my called variant locations refer to these specific alleles (e.g. CHROM: pos = HLA06606:5223).
I would like to use Funcotator to classify my calls (SILENT, MISSENSE, etc.), but I'm not sure whether this is possible. I thought I could provide a data source consisting of a GTF file with the exon/UTR information for these specific HLA alleles. However, the documentation at https://gatk.broadinstitute.org/hc/en-us/articles/360035889931#1.1.5 appears to indicate that GTF files are only accepted for GENCODE data sources, and that one must specify an NCBI build from a limited set of choices (screenshot below). Is there a way to achieve my goal, with GTF or, failing that, with another of the allowed file formats?
Thanks,
Elisabetta
# Required field for GENCODE files.
# NCBI build version (either hg19 or hg38):
ncbi_build_version =
-
Yes, Funcotator can do what you are looking to do. You will need to make your own data sources. There is some information on how to do that in the Funcotator tutorial here: https://gatk.broadinstitute.org/hc/en-us/articles/360035889931-Funcotator-Information-and-Tutorial.
Please let me know if you have any further questions.
Best,
Genevieve
-
Thanks for responding Genevieve!
The tutorial you linked is the same one I mentioned in my message, where I found the snippet in the screenshot I sent. Reading the user-defined data sources section, I got the impression that the only way to provide exon-intron annotation is with a data source in GTF format, whose type is referred to as 'gencode'. For these, however, it appears that the required NCBI build field has to be either hg19 or hg38 (as per the screenshot). In my case it would be neither, as I'm using HLA type-specific allele sequences as the reference, which are not tied to either of these two builds. Maybe I misunderstood, and there is a way to provide exon-intron annotation that is not tied to hg19 or hg38?
Best,
Elisabetta
-
Hi Elisabetta,
Yes, it is possible to provide any reference that you want. The reference version you specify just needs to match the sub-folder names inside the data sources directory.
The documentation we have currently is not up to date - I've put in a ticket so that our documentation team can work on updating that article.
Some other notes:
- The supported types technically only include gencode and not gtf, but most GTF files will still work. Specify type = gencode and point it at your GTF file. Any problems you run into are most likely caused by the GTF header.
- When using your own reference, make sure the variants were called against that reference and that the sequence dictionaries match. The --ref-version flag corresponds to the sub-folder name for a specific data source. Here is how that will look:
In the top level data sources directory, you'll have folders for specific data sources:
$ ls funcotator_dataSources.v1.7.20200521s
MANIFEST.txt cancer_gene_census/ cosmic/ dbsnp/ gencode/ gnomAD_exome.tar.gz oreganno/ transcriptList.exact_uniprot_matches.AKT1_CRLF2_FGFR1.txt
README.txt clinvar/ cosmic_fusion/ dna_repair_genes/ gencode_xhgnc/ gnomAD_genome.tar.gz simple_uniprot/
achilles/ clinvar_hgmd/ cosmic_tissue/ familial/ gencode_xrefseq/ hgnc/ template.config*
Every folder for a specific data source has sub-folders for each reference supported by that data source. The value of the --ref-version argument should correspond to one of these sub-folders:
$ ls funcotator_dataSources.v1.7.20200521s/gencode
hg19 hg38
$ ls funcotator_dataSources.v1.7.20200521s/dbsnp
hg19 hg38
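On the sequence dictionary point, one quick sanity check is to compare the contig names declared in the VCF header with the @SQ names in the reference .dict file. The snippet below is only a minimal sketch and not part of GATK. The file names, the toy file contents, and the contig length are made up so that it runs on its own.

```shell
#!/usr/bin/env bash
set -eu

# Toy inputs (made up) so the sketch is self-contained; use your real files instead.
workdir=$(mktemp -d)
printf '@HD\tVN:1.6\n@SQ\tSN:HLA06606\tLN:6000\n' > "$workdir/ref.dict"
printf '##fileformat=VCFv4.2\n##contig=<ID=HLA06606,length=6000>\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > "$workdir/calls.vcf"

# Contig names from VCF header lines of the form ##contig=<ID=name,...>
vcf_contigs=$(grep '^##contig=' "$workdir/calls.vcf" | sed -E 's/^##contig=<ID=([^,>]+).*/\1/' | sort)

# Sequence names from the SN: field of the @SQ lines in the dictionary
dict_contigs=$(grep '^@SQ' "$workdir/ref.dict" | tr '\t' '\n' | grep '^SN:' | cut -d: -f2- | sort)

# The two sorted name lists should be identical
if [ "$vcf_contigs" = "$dict_contigs" ]; then
    contig_check="match"
else
    contig_check="mismatch"
fi
echo "contig names: $contig_check"
```

This compares names only. Contig lengths and ordering are also part of the sequence dictionary, so a name match is a necessary check rather than a complete one.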
One of our GATK devs is going to respond to this thread with an example data sources directory for a bacterium that you can use as a template.
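In the meantime, here is a rough sketch of how a custom data sources directory could be laid out for a non-human reference. It is not an official template. The folder name hla_custom, all file names, and the config values are placeholders invented for illustration. The config keys are written from memory of the gencode config in the data sources bundle, so please check them against template.config, which lists further fields. Whether ncbi_build_version accepts an arbitrary value is an unverified assumption and may depend on your GATK version. The gatk command is only printed, not run.

```shell
#!/usr/bin/env bash
set -eu

# Placeholder names (assumptions, not from GATK): adjust to your own data.
ds_root=$(mktemp -d)/my_dataSources
ref_version="hla_custom"   # must equal the value passed to --ref-version
ds_dir="$ds_root/gencode/$ref_version"

# One folder per data source, one sub-folder per reference version
mkdir -p "$ds_dir"

# Partial config for a gencode-type data source that points at a custom GTF.
# Start from template.config in the bundle; the values below are hypothetical.
cat > "$ds_dir/gencode.config" <<'EOF'
name = Gencode
version = custom
src_file = hla_alleles.gtf
type = gencode
gencode_fasta_path = hla_transcripts.fasta
# Assumption: value for a custom reference is unverified
ncbi_build_version = hla_custom
EOF

# Print the command you would run; input and output file names are placeholders.
funcotator_cmd="gatk Funcotator --variant calls.vcf --reference hla_alleles.fasta --ref-version $ref_version --data-sources-path $ds_root --output calls.funcotated.vcf --output-file-format VCF"
echo "$funcotator_cmd"
```

The GTF and transcript FASTA named in the config would sit in the same sub-folder as the config file. The key point is that the sub-folder name and the --ref-version value are the same string.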
Let me know if you have any other questions.
Best,
Genevieve
-
Great, thanks for the clarification!
Elisabetta