Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Spark

5 comments

  • Anup Agarwal

    Which options need to be specified when Spark runs on YARN in cluster mode? As I understand it, there is no Spark daemon process listening for jobs; one simply submits jobs with the spark-submit binary, setting the master to yarn, and Spark picks up the YARN configuration from $HADOOP_CONF_DIR. Is that mode not supported?

  • Anand

    Hi,

    It looks like the link below, listed on this page, is broken:

    See the example parameters below and the local-Spark tutorial for more information

  • yh guo

    "Some GATK tools only exist in a Spark-capable version

    Those tools don't have the "Spark" suffix."

    Does this mean that tools like CombineGVCFs can be run with a parameter such as --conf 'spark.executor.cores=8'?

  • Jiten Parmar

    Is it possible to run Spark workflows on multiple nodes? If so, how?

  • Changxin Lu

    Hi,

    This link about Spark is broken:

    https://gatk.broadinstitute.org/hc/en-us/articles/360035889831
