Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data



ReadsPipelineSparkMulticore.wdl, Unrecognized runtime attribute keys: discs, cpu

Answered

13 comments

  • Genevieve Brandt (she/her)

    Hi Andrew,

    The java.lang.OutOfMemoryError: GC overhead limit exceeded error indicates that you ran out of memory. I would recommend using Java options to specify the memory you want to allocate:

    https://gatk.broadinstitute.org/hc/en-us/articles/360035532372-Java-is-using-too-many-resources-threads-memory-or-CPU-
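
    For example, to give the JVM an 8 GB heap (the value here is a placeholder; size it to your machine and the tool you are running):

    gatk --java-options "-Xmx8G" <ToolName> [tool arguments]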

    Let me know if this works!

    Best,

    Genevieve

  • Genevieve Brandt (she/her)

    Hi Andrew Erzunov,

    From the program log snippet you shared, it looks like this is a memory issue (java.lang.OutOfMemoryError: GC overhead limit exceeded).

    Our GATK support team does not support personal Cromwell instances, so I'm not familiar with troubleshooting this WDL (ReadsPipelineSparkMulticore.wdl) or fixing its memory issues. You can see if other users are able to help in the comments on this post, and you can also check out these other Cromwell resources:

    Alternatively, you can try running ReadsPipelineSpark within GATK, and then I will be better able to determine whether there is something you can do about the memory problem. Here is a section in our README about running GATK4 Spark tools locally: https://github.com/broadinstitute/gatk#sparklocal.
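
    A minimal local invocation might look like this (file names and the core count are placeholders):

    gatk ReadsPipelineSpark -I input.bam -R reference.fa --known-sites dbsnp.vcf -O output.vcf --spark-runner LOCAL --spark-master 'local[4]'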

    Please let me know if you have any other questions.

    Best,

    Genevieve

  • Andrew Erzunov

    Hi Genevieve,

    Thanks a lot for your response.
    I tried to run ReadsPipelineSpark within GATK locally.

    Here is the command I used:

    docker run -v /xchg/local/pipelines/references/cromwell/gatk_wdl/exome_data:/data -it broadinstitute/gatk:latest ./gatk ReadsPipelineSpark -I /data/exome_align.bam -R /data/hg38_no_alt.fa --known-sites /data/resources_broad_hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf -O /data/output_exome.vcf --spark-runner LOCAL --spark-master 'local[100]'

    But I got the following errors:

    I will be grateful for your help,

    Andrew

  • Andrew Erzunov

    Thank you very much for your response, Genevieve.
    Using the Java argument -Xmx really helped to solve the "java.lang.OutOfMemoryError" problem.

    ReadsPipelineSpark executed successfully on exome data, but when I passed genome data, I got the "The covariates table is missing ReadGroup someId in RecalTable0" error:

    I set the read group via bwa.
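
    For illustration, a typical bwa mem command that sets the read group looks like this (the tag values are illustrative):

    bwa mem -R '@RG\tID:someId\tSM:sample1\tLB:lib1\tPL:ILLUMINA' hg38_no_alt.fa reads_1.fastq reads_2.fastq > genome_align.sam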

    The BAM file has been successfully validated:

    The command I used for validation:

    java -jar picard.jar ValidateSamFile I=genome/genome_align.bam MODE=SUMMARY

    I will be grateful for your help,

    Andrew

  • Genevieve Brandt (she/her)

    Hi Andrew,

    Could you post the program log from when the pipeline runs BaseRecalibrator? It looks like this error might be related to this reported issue: https://github.com/broadinstitute/gatk/issues/6242, where all the reads from one read group get filtered out during BaseRecalibrator, causing an error with ApplyBQSR.

    Best,

    Genevieve

  • Andrew Erzunov

    Hello, Genevieve.

    I am attaching the log file after running the following command:

    ./gatk-4.2.6.1/gatk ReadsPipelineSpark -I /media/gene/sdb/cromwell/gatk_wdl/genome/genome_align.bam -R /media/gene/sdb/cromwell/gatk_wdl/reference/hg38_no_alt.fa --known-sites /media/gene/sdb/cromwell/gatk_wdl/reference/Homo_sapiens_assembly38.dbsnp138.vcf -O /media/gene/sdb/cromwell/gatk_wdl/genome/output_genome.vcf --spark-runner LOCAL --spark-master 'local[45]' --java-options "-Xmx90G" --tmp-dir /media/gene/sdb/cromwell/gatk_wdl/temp_files

    Here is the log file:

    https://drive.google.com/file/d/1SHNkuwBeYEZ48nsxUVsbdTkBBP4mVW3i/view?usp=sharing 

    I also tried using the "ReadGroupBlackListReadFilter" option with the above command:

    --read-filter ReadGroupBlackListReadFilter --read-group-black-list RG:someId

    But I got the following error:

    I will be grateful for your help,

    Andrew

  • Genevieve Brandt (she/her)

    Hi Andrew,

    I didn't see any information in the logs about reads being filtered, so I'm not sure about the cause. Could you verify that you successfully added read groups to your file with this command?

    samtools view -H sample.bam | grep '^@RG'
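
    If read groups are present, the output should contain a line of this general form (tag values are illustrative):

    @RG	ID:someId	SM:sample1	LB:lib1	PL:ILLUMINA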

    I don't think that option will be your best step forward; it will probably be most helpful to find the cause of the read group problem.

    Best,

    Genevieve

  • Andrew Erzunov

    Hello, Genevieve.

    After running that command, I got the following output:

    Faithfully,

    Andrew

  • Genevieve Brandt (she/her)

    Great, thank you! Since you only have one read group in your file, blacklisting that read group will not work: it would filter out every read.

    I noticed you don't have PU in your read group; you might want to add that. Here's more information about read groups: https://gatk.broadinstitute.org/hc/en-us/articles/360035890671-Read-groups
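
    One way to add it is Picard's AddOrReplaceReadGroups, roughly like this (all tag values are placeholders to replace with your own):

    java -jar picard.jar AddOrReplaceReadGroups I=genome_align.bam O=genome_align.rg.bam RGID=someId RGSM=sample1 RGLB=lib1 RGPL=ILLUMINA RGPU=unit1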

    I don't think the PU issue is causing your error, so I'm going to keep looking into the ReadsPipelineSpark error.

  • Andrew Erzunov

    Hi Genevieve,

    I tried to add the PU field:

    And after running the following command:
    I got this error:

    Faithfully,

    Andrew

  • Genevieve Brandt (she/her)

    Hi Andrew,

    It looks like your error is the same after you added the PU field. I'm having trouble finding out what is truly causing the problem during ApplyBQSR in this ReadsPipelineSpark pipeline. The pipeline does not seem to print out the read filter results during the BaseRecalibrator step, and the standalone tools have better error handling. Would you be able to run the tools that make up ReadsPipelineSpark separately, so we can figure out why you are getting this error? I know this is not ideal, but I don't think there is a better way to troubleshoot.
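
    For example, the BQSR steps can be run standalone roughly like this (file names are placeholders; substitute the paths from your earlier command):

    gatk BaseRecalibrator -I genome_align.bam -R hg38_no_alt.fa --known-sites Homo_sapiens_assembly38.dbsnp138.vcf -O recal.table
    gatk ApplyBQSR -I genome_align.bam -R hg38_no_alt.fa --bqsr-recal-file recal.table -O genome_align.recal.bam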

    Best,

    Genevieve

  • Andrew Erzunov

    Hello Genevieve,

    After running the BaseRecalibrator tool, I got the following result (BaseRecalibrator was able to recalibrate 0 reads):
    https://drive.google.com/file/d/1mVtbBjgUcA9WFNaI3bOZYiXc-yH-KqtK/view?usp=sharing
    And after running ApplyBQSR, I got the same error as after running ReadsPipelineSpark:
    https://drive.google.com/file/d/1Mipt7O0_CR1gosxdfsuSsR_mODzQ_F9t/view?usp=sharing

    Also, after using samtools coverage, I got the following result:

    Faithfully,

    Andrew


  • Genevieve Brandt (she/her)

    Hi Andrew,

    It looks like all your reads were filtered by the MappingQualityNotZeroReadFilter:

    15:32:48.714 INFO  BaseRecalibrator - 993014098 read(s) filtered by: MappingQualityNotZeroReadFilter 

    This indicates that something went wrong with your mapping, because all your reads have a mapping quality of 0. You can read more about the read filter here: https://gatk.broadinstitute.org/hc/en-us/articles/5358856018459-MappingQualityNotZeroReadFilter
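
    If you want to confirm this directly on the BAM, one quick check is to tabulate the MAPQ column (field 5) for a sample of reads, for example:

    samtools view genome_align.bam | head -n 100000 | awk '{print $5}' | sort -n | uniq -c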

    Best,

    Genevieve

