Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

HaplotypeCaller traversing all chromosomes, but they aren't in the output gvcf...

3 comments

  • Gökalp Çelik

    Hi Charity Goeckeritz

    We cannot change what your Slurm cluster permits, but we can offer our recommendations. Your runs may be going slowly and exceeding the time limit Slurm allows, in which case they are stopped prematurely and you get no contigs beyond the first two.

    We recommend running HaplotypeCaller in parallel per contig, or even per subcontig, with the splits placed at long runs of hard-masked (N) bases. Once every shard is complete, you can gather the pieces into a complete callset per sample, which you can later combine with GenomicsDBImport.
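As a rough illustration of the per-contig scatter and gather described above, here is a minimal Python sketch that only builds the command lines; it does not submit anything to Slurm. The file names, the sample name, and the `shards` output directory are hypothetical placeholders. Using `MergeVcfs` for the gather step is an assumption on my part, not something specified in this thread, so check it against your own pipeline before relying on it.

```python
import shlex


def read_contigs(fai_path):
    """Return contig names from a samtools .fai index (first tab-separated column)."""
    contigs = []
    with open(fai_path) as handle:
        for line in handle:
            if line.strip():
                contigs.append(line.split("\t")[0])
    return contigs


def haplotypecaller_commands(reference, bam, sample, contigs, out_dir="shards"):
    """Build one HaplotypeCaller command per contig.

    Returns a list of (command_string, shard_output_path) tuples.
    The output naming scheme is a hypothetical convention for this sketch.
    """
    commands = []
    for contig in contigs:
        out = f"{out_dir}/{sample}.{contig}.g.vcf.gz"
        args = [
            "gatk", "HaplotypeCaller",
            "-R", reference,
            "-I", bam,
            "-L", contig,        # restrict this run to a single contig
            "-ERC", "GVCF",      # emit a GVCF shard
            "-O", out,
        ]
        commands.append((shlex.join(args), out))
    return commands


def merge_command(shard_paths, merged_out):
    """Build a single command that gathers the per-contig shards (assumed: MergeVcfs)."""
    args = ["gatk", "MergeVcfs"]
    for path in shard_paths:
        args += ["-I", path]
    args += ["-O", merged_out]
    return shlex.join(args)
```

Each command string could then be submitted as its own Slurm job, so that a single slow contig no longer holds the whole sample hostage to one time limit. The merge command should only run after every shard has finished successfully.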

    I hope this helps. 

  • Charity Goeckeritz

    Hi Gökalp, 

    Thanks so much for your response! I was checking the files this morning and found that some of them are getting further: one had reached chromosome 8. As far as I can tell, it doesn't seem to be a memory or runtime issue. I can't find any such error in the log files, and the jobs quit earlier than the time allotted for each. I'm unsure what's going on, so I'll talk about this with our cluster manager too. Wouldn't GATK report some kind of out-of-memory (OOM) or time limit error at the end of the log file?

    Thanks for your time, and I appreciate any other ideas you may have!

    Kindly,
    Charity 

  • Gökalp Çelik

    GATK can only report Java or user errors. Slurm errors appear only in the Slurm logs, for which you may need help from your IT support.
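To make the point about Slurm-side errors concrete, here is a hedged Python sketch that asks Slurm's accounting tool, `sacct`, how a job ended and flags states where the scheduler rather than GATK stopped it. It assumes job accounting is enabled on the cluster and that `sacct` is on the path. The set of states treated as scheduler kills is my own selection for illustration.

```python
import subprocess

# Slurm job states that indicate the scheduler, not GATK, ended the job.
# This selection is an assumption made for the sketch.
SCHEDULER_KILLS = {"TIMEOUT", "OUT_OF_MEMORY", "CANCELLED", "NODE_FAIL", "PREEMPTED"}


def sacct_command(job_id):
    """Build an sacct call that prints pipe-delimited rows without a header."""
    return [
        "sacct", "-j", str(job_id),
        "--noheader", "--parsable2",
        "--format=JobID,State,ExitCode,Elapsed,MaxRSS",
    ]


def parse_sacct(text):
    """Parse sacct --parsable2 output into a list of dicts, one per job step."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        job_id, state, exit_code, elapsed, max_rss = line.split("|")
        # States can carry a suffix such as "CANCELLED by 1234"; keep the first word.
        state = state.split()[0] if state.strip() else ""
        rows.append({
            "job_id": job_id,
            "state": state,
            "exit_code": exit_code,
            "elapsed": elapsed,
            "max_rss": max_rss,
        })
    return rows


def scheduler_kills(rows):
    """Return only the job steps that Slurm itself terminated."""
    return [row for row in rows if row["state"] in SCHEDULER_KILLS]


def check_job(job_id):
    """Run sacct for one job and return the steps the scheduler terminated."""
    result = subprocess.run(sacct_command(job_id), capture_output=True, text=True, check=True)
    return scheduler_kills(parse_sacct(result.stdout))
```

A job killed this way usually leaves a GATK log that simply stops mid-traversal with no error at the end, which would be consistent with a truncated GVCF.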

    If your read depth and genomic complexity are high, then long runtimes and larger heap size requirements are expected for HaplotypeCaller. It would be better to split your genome into manageable shards and call them separately.
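One way to picture the sharding is the small Python sketch below, which cuts a contig at long runs of N bases and emits `contig:start-end` interval strings that can be passed to `-L`. The `min_n_run` threshold and the function names are hypothetical. It also holds the whole sequence in memory, so treat it as an illustration of the idea rather than a tool for large genomes. GATK bundles the Picard tool ScatterIntervalsByNs, which I believe serves this purpose and is likely the better choice in practice.

```python
import re


def split_on_n_runs(contig, sequence, min_n_run=100):
    """Split a contig at runs of N of at least min_n_run bases.

    Returns 1-based, inclusive (contig, start, end) tuples covering
    the stretches between those runs.
    """
    intervals = []
    start = 0  # 0-based start of the current stretch
    for match in re.finditer(r"[Nn]{%d,}" % min_n_run, sequence):
        if match.start() > start:
            intervals.append((contig, start + 1, match.start()))
        start = match.end()
    if start < len(sequence):
        intervals.append((contig, start + 1, len(sequence)))
    return intervals


def to_interval_strings(intervals):
    """Format intervals as contig:start-end strings."""
    return [f"{contig}:{start}-{end}" for contig, start, end in intervals]
```

Shorter N runs are left inside a shard on purpose, so that small gaps do not fragment the genome into thousands of tiny jobs.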

