Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data


Funcotator chromosome mismatch in somatic variant calling pipeline

4 comments

  • Thomas Stevens

    Looking into this further, it's more confusing. Both my variants and the data stored in funcotator_dataSources.v1.8.hg38.20230908s use the chr notation. Why is it searching for the non-chr format?

     

     

  • David Roazen

    Hi Thomas Stevens,

    The message about "The following contigs are present in b37 and missing in the input VCF sequence dictionary" is just Funcotator's confusing way of telling you that your input VCF does not appear to have a b37/GRCh37 sequence dictionary. The contigs that it prints (1, 2, 3, etc.) are just the b37/GRCh37 contigs that are not present in your VCF's hg38 sequence dictionary.

    You say that you've confirmed that both your VCF and the Funcotator datasources are using the chr contig naming convention. Could you also confirm that your reference (Homo_sapiens_assembly38.fasta) has the same chr prefix in its contig names?
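One way to double-check the naming convention across files is sketched below. It is only an illustration, not part of GATK: the helper names (`vcf_contigs`, `fai_contigs`, `naming_style`) are hypothetical, and it assumes the VCF header carries `##contig=<ID=...>` lines and that the reference has a standard `.fai` index next to it.

```python
import gzip


def vcf_contigs(path):
    """Return contig IDs from the ##contig header lines of a plain or gzipped VCF."""
    opener = gzip.open if str(path).endswith(".gz") else open
    names = []
    with opener(path, "rt") as fh:
        for line in fh:
            if not line.startswith("##"):
                break  # header meta lines are over
            if line.startswith("##contig=<"):
                body = line.strip()[len("##contig=<"):-1]
                for field in body.split(","):
                    if field.startswith("ID="):
                        names.append(field[3:])
    return names


def fai_contigs(path):
    """Return contig names from a FASTA index (.fai); the name is the first column."""
    with open(path) as fh:
        return [line.split("\t", 1)[0] for line in fh if line.strip()]


def naming_style(names):
    """Classify naming by looking for chromosome 1 as 'chr1' or '1'."""
    names = set(names)
    has_chr, has_plain = "chr1" in names, "1" in names
    if has_chr and has_plain:
        return "mixed"
    if has_chr:
        return "chr"
    if has_plain:
        return "no-chr"
    return "unknown"
```

Calling `naming_style(vcf_contigs("input.vcf.gz"))` and `naming_style(fai_contigs("Homo_sapiens_assembly38.fasta.fai"))` (the VCF file name here is a placeholder) should give the same answer if the two files agree. Only chromosome 1 is inspected because hg38 also contains contigs, such as the HLA ones, that never carry a chr prefix.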

    How long did you leave Funcotator running? Does the traversal eventually finish (after several hours, for example)? Does the GATK process ever abort/exit?

    Regards,

    David

  • Thomas Stevens

    Hi David,

    The reference does have the chr prefix as well. 

    I left Funcotator running for around 5 days and it never progressed. GATK also never aborts; it just stays stuck right at the beginning.

    From experimenting, I found that if I delete the other databases in the Funcotator folder and keep only dbsnp and the folders required by Funcotator, it runs to the end in about 20 minutes. So it's either an issue with it trying to use all the databases or something within one of them. I'm not sure how to fix it so that I can use more of the resources.
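The elimination approach described above could be made more systematic by testing half of the optional data sources at a time. The following is a rough sketch only: both function names are hypothetical, and which folders count as required is left as a parameter, since the thread does not list them.

```python
from pathlib import Path


def list_datasources(root):
    """Return the sorted names of the top-level sub-directories of a data sources folder."""
    return sorted(p.name for p in Path(root).iterdir() if p.is_dir())


def split_for_bisection(names, required=()):
    """Split the optional data sources into two halves, each keeping the required ones."""
    required = set(required)
    optional = [n for n in names if n not in required]
    mid = len(optional) // 2
    first = sorted(required | set(optional[:mid]))
    second = sorted(required | set(optional[mid:]))
    return first, second
```

Running Funcotator once against a copy of the folder containing only the first half, and once with only the second half, would show which half contains the slow data source; repeating the split on that half narrows it down further.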

     

    Regards,

    Tommy 

  • David Roazen

    Hi Thomas Stevens,

    Do you have the gnomAD datasource activated (see https://gatk.broadinstitute.org/hc/en-us/articles/360035889931-Funcotator-Information-and-Tutorial#1.1.2.2)? The gnomAD datasource is very large and involves remote network accesses, and so can slow down Funcotator quite a bit on a slow network connection.
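A quick way to see whether gnomAD is switched on is to look at the data sources folder itself. The sketch below assumes, following the linked tutorial, that gnomAD ships as compressed archives inside the folder and only becomes active once an archive is extracted into a directory; the function name `gnomad_status` is hypothetical and the check relies purely on entry names starting with "gnomAD".

```python
from pathlib import Path


def gnomad_status(root):
    """Map each gnomAD entry in a data sources folder to 'extracted' or 'archive'."""
    status = {}
    for entry in sorted(Path(root).iterdir()):
        if not entry.name.lower().startswith("gnomad"):
            continue
        if entry.is_dir():
            status[entry.name] = "extracted"  # assumed to mean the data source is active
        elif entry.name.endswith((".tar.gz", ".tgz")):
            status[entry.name] = "archive"  # still packed, assumed inactive
    return status
```

If any entry is reported as "extracted", moving that directory out of the data sources folder for one test run would show whether gnomAD is what stalls the traversal.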

    Regards,

    David

