Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Error using FindBreakpointEvidenceSpark

Answered

7 comments

  • Pamela Bretscher

    Hi Heba B abusamra,

    This looks like it may be a memory issue. What kind of machine are you running this on? You may want to specify --java-options -Xmx to set how much memory to allocate to the tool, as well as the number of executor cores to use. You can find some information about running Spark tools here: https://github.com/broadinstitute/gatk#sparklocal. Running it with the GATK Docker image may also help. I'll also add that this tool is part of a fairly outdated pipeline; here is a link to a more up-to-date structural variant pipeline: https://github.com/broadinstitute/gatk-sv.

    Kind regards,

    Pamela
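
For reference, a minimal dry-run sketch of passing a maximum heap size through the gatk wrapper, using the input and output names from this thread. It assumes a hypothetical 150g heap and omits FindBreakpointEvidenceSpark's other arguments. -Xmx always needs a size (for example 150g); a bare -Xmx is rejected by the JVM. Following the GATK documentation examples, the JVM flags go in one quoted string after --java-options, ahead of the tool name. The command is only printed, not run:

```shell
#!/bin/sh
# Dry run: print, rather than execute, a gatk call with an explicit max heap.
# HEAP is a hypothetical size; -Xmx must always carry one (e.g. 150g).
HEAP=150g
GATK=/app/Genome/gatk-4.2.2.0/gatk

# JVM flags go in a single quoted string after --java-options, before the
# tool name. The remaining FindBreakpointEvidenceSpark arguments are omitted.
CMD="$GATK --java-options \"-Xmx$HEAP\" FindBreakpointEvidenceSpark -I Data/GM24385_2_S15.bam -O assemblies2.sam"
echo "$CMD"
```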

  • Heba B abusamra

    Thank you for your reply.

    I'm using an HPC system with ~200G of memory, and I tried your recommendation to specify the number of executor cores, but I still have the same problem.

  • Pamela Bretscher

    Hi Heba B abusamra,

    Thank you for trying my suggestion. Could you please post the full new command that you ran?

    Kind regards,

    Pamela

  • Heba B abusamra

    Here it is:

    /app/Genome/gatk-4.2.2.0/gatk FindBreakpointEvidenceSpark \
    -I Data/GM24385_2_S15.bam \
    --aligner-index-image reference.fasta.img \
    --kmers-to-ignore kmers_to_ignore.txt \
    --spark-runner LOCAL \
    --spark-master 'local[*]' \
    -L ../../recalset/nextera_dna_exome_targeted_regions_manifest_v1_2.bed \
    -O assemblies2.sam

  • Pamela Bretscher

    Thank you Heba B abusamra. Did you specify --java-options -Xmx in your command? I don't see it in your post and I believe specifying how much memory to give this job may help with your problem. I would recommend allocating about 70-80% of the available memory on your machine to the tool.

    Kind regards,

    Pamela
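
One way to turn the 70-80% guideline into a concrete -Xmx value is sketched here. It assumes Linux (it reads MemTotal, which /proc/meminfo reports in kB) and takes 75%, the middle of that range. On a shared HPC node the relevant figure is the memory granted to the job rather than the node's total, so substitute that if the two differ:

```shell
#!/bin/sh
# Linux only: MemTotal in /proc/meminfo is the machine's total RAM in kB.
total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)

# 75% of total RAM, converted kB -> GB (integer division rounds down).
heap_gb=$(( total_kb * 75 / 100 / 1024 / 1024 ))

# Keep at least 1g so the flag stays valid on very small machines.
if [ "$heap_gb" -lt 1 ]; then
    heap_gb=1
fi

JAVA_HEAP="-Xmx${heap_gb}g"
printf '%s\n' "$JAVA_HEAP"
```

The resulting value can then be passed as --java-options "$JAVA_HEAP".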

  • Heba B abusamra

    It gave me an error when I used --java-options -Xmx with the gatk script, so I passed the memory option to java directly and ran gatk-package-4.2.2.0-local.jar instead. Unfortunately, I still get the same problem:

    java -Xms200m -jar /app/Genome/gatk-4.2.2.0/gatk-package-4.2.2.0-local.jar FindBreakpointEvidenceSpark \
    -I Data/GM24385_2_S15.bam \
    --aligner-index-image reference.fasta.img \
    --kmers-to-ignore kmers_to_ignore.txt \
    -L ../../recalset/nextera_dna_exome_targeted_regions_manifest_v1_2.bed \
    -O assemblies2.sam 
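
Worth noting about the command above: -Xms sets only the JVM's initial heap size, while -Xmx sets the maximum, and the m suffix means megabytes. So -Xms200m requests a 200 MB starting heap and leaves the maximum at the JVM's default. A dry-run sketch of the same jar invocation with an explicit maximum follows; the 4g and 150g values are hypothetical figures for a ~200 GB node:

```shell
#!/bin/sh
# -Xms = initial heap, -Xmx = maximum heap; "m" = megabytes, "g" = gigabytes.
# Hypothetical sizes for a ~200 GB node: start at 4g, allow up to 150g.
JAR=/app/Genome/gatk-4.2.2.0/gatk-package-4.2.2.0-local.jar
JVM_OPTS="-Xms4g -Xmx150g"

# Dry run: build the command string and print it instead of executing it.
CMD="java $JVM_OPTS -jar $JAR FindBreakpointEvidenceSpark \
 -I Data/GM24385_2_S15.bam \
 --aligner-index-image reference.fasta.img \
 --kmers-to-ignore kmers_to_ignore.txt \
 -L ../../recalset/nextera_dna_exome_targeted_regions_manifest_v1_2.bed \
 -O assemblies2.sam"
printf '%s\n' "$CMD"
```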

  • Pamela Bretscher

    Hi Heba B abusamra,

    Okay, thank you. I still think this is likely a memory issue, as Spark tools tend to use a lot of memory and need the right amount of memory and cores allocated. Can you also confirm that you have adequate disk space available on your machine? I would recommend experimenting with the memory allocation and the number of executor cores you specify, in particular trying a very low number of cores.

    Example:

    --spark-runner LOCAL \
    --num-executors 5 --executor-cores 2 --executor-memory 4g \
    --conf spark.executor.memoryOverhead=600

    Kind regards,

    Pamela
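
In Spark local mode, --spark-master 'local[N]' runs N worker threads inside a single JVM, so all of those threads share that JVM's heap. A dry-run sketch that prints the command from earlier in this thread at a few low thread counts, lowest first, so each can be tried in turn; the thread counts and the 150g heap are hypothetical values:

```shell
#!/bin/sh
# Dry run: print one local-mode command per thread count, lowest first.
# The thread counts below and HEAP are hypothetical values.
GATK=/app/Genome/gatk-4.2.2.0/gatk
HEAP=150g

build_cmd() {
    # $1 = number of Spark worker threads in local mode ('local[N]').
    echo "$GATK --java-options \"-Xmx$HEAP\" FindBreakpointEvidenceSpark" \
        "-I Data/GM24385_2_S15.bam" \
        "--aligner-index-image reference.fasta.img" \
        "--kmers-to-ignore kmers_to_ignore.txt" \
        "--spark-runner LOCAL --spark-master 'local[$1]'" \
        "-L ../../recalset/nextera_dna_exome_targeted_regions_manifest_v1_2.bed" \
        "-O assemblies2.sam"
}

for threads in 1 2 4; do
    build_cmd "$threads"
done
```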
