Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Speeding up GATK4 DepthOfCoverage

  • Genevieve Brandt (she/her)

    Hi Michael Franklin, unfortunately there is no Spark implementation planned for this tool. It is still a BETA tool, so there is still work to be done on its functionality. And yes, GATK4 does not have the -nt and -nct parameters.

  • Genevieve Brandt (she/her)

    Hi Michael Franklin, I got more information about this tool that may help you get some runtime improvements. One question to consider: are you running this on a shared-resource cluster with slow disk reads and writes? DepthOfCoverage writes a lot of files, so slow disk I/O can lead to a long runtime with this tool.

    One improvement is to use --omit-depth-output-at-each-base. By default, DepthOfCoverage writes a line for every base it analyzes (every base in the genome, for a whole-genome run), which can greatly increase the runtime. If you do not need per-base depth, using that option will save you a lot of time.
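    A minimal sketch of building such a command in Python follows. It assumes the usual GATK4 argument names (-R reference, -I input BAM, -L intervals, -O output base name). The helper name depth_of_coverage_cmd and all file paths are hypothetical. The helper returns an argument list, so you can print it or pass it to subprocess.run.

```python
import shlex


def depth_of_coverage_cmd(reference, bam, intervals, output_prefix,
                          per_base_output=False):
    """Build a GATK4 DepthOfCoverage command as an argument list.

    Hypothetical helper: paths are placeholders. Per-base output is
    omitted unless explicitly requested.
    """
    cmd = [
        "gatk", "DepthOfCoverage",
        "-R", reference,       # reference FASTA
        "-I", bam,             # input BAM/CRAM
        "-L", intervals,       # interval(s) to analyze, e.g. one chromosome
        "-O", output_prefix,   # base name for all output files
    ]
    if not per_base_output:
        # Skip the one-line-per-base table, typically the largest output.
        cmd.append("--omit-depth-output-at-each-base")
    return cmd


if __name__ == "__main__":
    cmd = depth_of_coverage_cmd("reference.fasta", "sample.bam",
                                "chr20", "coverage/sample_chr20")
    # Print a shell-ready version; to launch it instead, use
    # subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

    Keeping per-base output as an explicit opt-in means the expensive table is only produced when someone asks for it.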

    Also, I found that if you split the analysis into more intervals and run them separately, the interval statistics can be merged afterwards without any change to the results. However, at this point we do not provide an easy way to merge the outputs.
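    Since there is no built-in merge step, here is a minimal Python sketch of two kinds of merge. It assumes each interval was run with its own output prefix and that each output is a delimited text table with a single header line. The function names and column layouts are hypothetical, not GATK's. Tables with one row per interval (for example, a per-interval summary) can simply be stacked. Tables of base counts per coverage bin can be summed cell by cell. Averages and medians are not additive and would have to be recomputed from the summed counts.

```python
import csv
import io


def concat_rows(tables):
    """Stack per-shard tables that hold one row per interval.

    `tables` is a list of file contents (strings). Each table must begin
    with the same single header line; the header is written only once.
    """
    header, body = None, []
    for text in tables:
        lines = text.splitlines()
        if not lines:
            continue  # skip empty shards
        if header is None:
            header = lines[0]
        elif lines[0] != header:
            raise ValueError("header mismatch between shards")
        body.extend(lines[1:])
    if header is None:
        return ""
    return "\n".join([header, *body]) + "\n"


def sum_counts(tables, delimiter=","):
    """Add up count tables cell by cell across shards.

    Assumes the first column is a row label (e.g. a sample name) and every
    other cell is an integer count. Only additive counts can be merged this
    way; means and medians must be recomputed from the summed counts.
    """
    header, totals = None, {}
    for text in tables:
        reader = csv.reader(io.StringIO(text), delimiter=delimiter)
        rows = [row for row in reader if row]  # drop blank lines
        if not rows:
            continue
        if header is None:
            header = rows[0]
        elif rows[0] != header:
            raise ValueError("header mismatch between shards")
        for label, *cells in rows[1:]:
            counts = [int(c) for c in cells]
            previous = totals.get(label, [0] * len(counts))
            totals[label] = [a + b for a, b in zip(previous, counts)]
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    if header is not None:
        writer.writerow(header)
    for label, counts in totals.items():
        writer.writerow([label, *counts])
    return out.getvalue()
```

    The functions take file contents rather than paths, so reading is up to you. For example, use concat_rows([Path(p).read_text() for p in shard_files]) with from pathlib import Path, where shard_files is your own list of per-interval output files. Pass delimiter="\t" to sum_counts if your tables are tab-separated.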
