Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

How to provide custom exon-intron annotation source in Funcotator



  • Genevieve Brandt (she/her)

    Hi Elisabetta Manduchi,

    Yes, Funcotator can do what you are looking to do. You will need to make your own data sources. There is some information on how to do that in the Funcotator tutorial here:

    Please let me know if you have any further questions.

  • Elisabetta Manduchi

    Thanks for responding Genevieve!

    The tutorial you point to is the same one I mentioned in my message; that is where I found the snippet whose screenshot I sent. Reading the user-defined data sources section, I got the impression that the only way to provide exon-intron annotation is with a data source in GTF format, whose type is referred to as 'gencode'. But for these, the required NCBI field apparently has to be either hg19 or hg38 (as per the screenshot), and in my case it would be neither: I'm using HLA type-specific allele sequences as the reference, which are not tied to either of these builds. Maybe I misunderstood and there is a way to provide exon-intron annotation that is not tied to hg19 or hg38?

  • Genevieve Brandt (she/her)

    Hi Elisabetta,

    Yes, it is possible to use any reference you want. The reference name just needs to match the sub-folder names inside the data sources directory.
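
    As a rough pre-flight check (a sketch, not something shipped with GATK), a short bash function can confirm that every data source folder contains a sub-folder named after the reference version you intend to pass to --ref-version. The function name check_ref_version and any example directory names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report data source folders that lack a sub-folder
# named after the reference version (the value you would give --ref-version).
# Returns 0 if every data source folder has it, 1 otherwise.
check_ref_version() {
  local data_sources="$1" ref_version="$2" missing=0
  local dir
  # The trailing slash makes the glob match directories only, so loose files
  # such as MANIFEST.txt or *.tar.gz archives are skipped.
  for dir in "$data_sources"/*/; do
    [ -d "$dir" ] || continue   # glob matched nothing
    if [ ! -d "${dir}${ref_version}" ]; then
      echo "missing: ${dir}${ref_version}"
      missing=1
    fi
  done
  return "$missing"
}
```

    For example, check_ref_version my_dataSources hla_custom prints each data source folder that has no hla_custom sub-folder and returns non-zero if any are missing.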

    The documentation we have currently is not up to date - I've put in a ticket so that our documentation team can work on updating that article.

    Some other notes:

    1. Technically, the only supported type is gencode, not gtf, but most GTF files will still work: specify type = gencode and point it at your GTF file. Any problems you run into are most likely caused by the GTF header.
    2. If you are using your own reference, make sure the variants were called with that reference and that the sequence dictionaries match. The --ref-version argument corresponds to the sub-folder name for a specific data source. Here is how that looks:

    In the top level data sources directory, you'll have folders for specific data sources:

    $ ls funcotator_dataSources.v1.7.20200521s
    MANIFEST.txt  cancer_gene_census/  cosmic/         dbsnp/             gencode/          gnomAD_exome.tar.gz   oreganno/         transcriptList.exact_uniprot_matches.AKT1_CRLF2_FGFR1.txt
    README.txt    clinvar/             cosmic_fusion/  dna_repair_genes/  gencode_xhgnc/    gnomAD_genome.tar.gz  simple_uniprot/
    achilles/     clinvar_hgmd/        cosmic_tissue/  familial/          gencode_xrefseq/  hgnc/                 template.config*

    Every data source folder has a sub-folder for each reference that data source supports. The value of the --ref-version argument should match one of these sub-folders:

    $ ls funcotator_dataSources.v1.7.20200521s/gencode
    hg19   hg38
    $ ls funcotator_dataSources.v1.7.20200521s/dbsnp
    hg19   hg38
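
    To combine items 1 and 2 for a custom reference, the layout could be created with a bash function like the sketch below. Everything in it is an illustrative assumption: the function name make_custom_source, the folder names (my_dataSources, my_genes, hla_custom), and the config keys, which only loosely follow the pattern of the bundled template.config and should be checked against it (including any NCBI build field it still requires).

```shell
#!/usr/bin/env bash
# Hypothetical helper: create <root>/<source_name>/<ref_version>/ containing a
# copy of a GTF, a transcript FASTA, and a minimal config with type = gencode.
# Config key names are assumptions; verify them against template.config.
make_custom_source() {
  local root="$1" source_name="$2" ref_version="$3" gtf="$4" fasta="$5"
  local dest="$root/$source_name/$ref_version"
  mkdir -p "$dest"
  # Place the annotation files in the reference sub-folder next to the config.
  cp "$gtf" "$fasta" "$dest/"
  # Unquoted heredoc delimiter so $(basename ...) is expanded.
  cat > "$dest/$source_name.config" <<EOF
name = $source_name
version = 1
src_file = $(basename "$gtf")

# Per the advice in this thread: type = gencode even though src_file is a GTF.
type = gencode
gencode_fasta = $(basename "$fasta")
EOF
  echo "$dest"
}
```

    Called as make_custom_source my_dataSources my_genes hla_custom my_genes.gtf my_transcripts.fasta, it would create my_dataSources/my_genes/hla_custom/, and the Funcotator run would then pass the matching name, e.g. gatk Funcotator --variant calls.vcf --reference my_reference.fasta --ref-version hla_custom --data-sources-path my_dataSources --output calls.funcotated.vcf --output-file-format VCF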

    One of our GATK devs is going to respond to this thread with an example data sources directory for a bacterium that you can use as a template.

    Let me know if you have any other questions.

  • Elisabetta Manduchi

    Great, thanks for the clarification!
