Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Mutect2 reports a generic Java error and exits quickly.



  • James Emery

    Hello UTao Cao. This error looks like a genuine bug that we probably have to fix. It would help us run this to ground if you could test a few things for us. First, can you try running PrintReads on these inputs to see if the files can be read? Next, could you try running ValidateSamFile on your input files to make sure they validate? Our particular concern is that there might be an ordering problem that is causing this exception.

    If both of those pass, we would appreciate it if you could post the BAM headers for these files, as there might be an undefined or poorly defined sort order that is causing this exception. If that is the problem, then running SortSam on your input files first and then running them through Mutect2 would probably fix the issue.
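
    The checks described above could be scripted roughly as follows. This is a sketch, not an official procedure: it assumes gatk and samtools are on the PATH, and the GATK and SAMTOOLS override variables, the function names and the scratch-directory argument are hypothetical conveniences introduced here.

    ```shell
    # Assumption: gatk and samtools are on PATH. Override with GATK=/path/to/gatk
    # or SAMTOOLS=/path/to/samtools (both variables are hypothetical conveniences).

    # check_bam INPUT.bam SCRATCH_DIR
    check_bam() {
        bam="$1"
        scratch="$2"
        # 1. Can the whole file be read and written back out?
        ${GATK:-gatk} PrintReads -I "$bam" -O "$scratch/printreads_check.bam"
        # 2. Does the file pass validation? SUMMARY mode prints counts per problem type.
        ${GATK:-gatk} ValidateSamFile -I "$bam" --MODE SUMMARY
        # 3. Print the header so the @HD sort-order field can be inspected or posted.
        ${SAMTOOLS:-samtools} view -H "$bam"
    }

    # sort_bam INPUT.bam OUTPUT.bam: coordinate-sort and index before running Mutect2
    sort_bam() {
        ${GATK:-gatk} SortSam -I "$1" -O "$2" --SORT_ORDER coordinate --CREATE_INDEX true
    }
    ```

    Each input file would be checked with something like check_bam tumor.bam /tmp/scratch, and sort_bam would only be needed if the header or the validation points to a sort-order problem.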

  • gabriele tosadori

    I am having a very similar problem, except that it takes more than 2 hours to show up.

    In case it helps, I am running the script with:

    openjdk 17.0.9 2023-10-17
    OpenJDK Runtime Environment (build 17.0.9+9-Ubuntu-122.04)
    OpenJDK 64-Bit Server VM (build 17.0.9+9-Ubuntu-122.04, mixed mode, sharing)

    This is the script I am using (and it did produce an output):

    cd /home/user/genomics/gatk/

    ./gatk Mutect2 \
            -R /home/user/data-02/reference/a_reference_genome.fna \
            -I /home/user/data-02/2023_11_17_cell_line_12615/12615_sorted.bam \
            -tumor 12615 \
            -I /home/user/data-02/2023_11_13_cell_line_3684/3684_sorted.bam \
            -normal 3684 \
            -O /home/user/data-02/cells_comparison_results/mutect/somatic.vcf.gz

    cd /home/user/data-02/scripts/

    These are the results:

    root@324579823:/home/user/data-02/cells_comparison_results/mutect# ls -lA
    total 4896
    -rw-r--r-- 1 root root 3883918 Feb  9 11:16 somatic.vcf.gz
    -rw-r--r-- 1 root root  127610 Feb  9 11:16 somatic.vcf.gz.tbi

    Here are the last few lines of the very long output I got:

    11:14:59.874 INFO  ProgressMeter - NW_003613811.1:1932257            129.8               2838510          21867.9
    11:16:31.353 INFO  VectorLoglessPairHMM - Time spent in setup for JNI call : 6.560276783000001
    11:16:31.353 INFO  PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 1693.8890707330002
    11:16:31.353 INFO  SmithWatermanAligner - Total compute time in native Smith-Waterman : 345.22 sec
    11:16:31.353 INFO  Mutect2 - Shutting down engine
    [February 9, 2024 at 11:16:31 AM CET] done. Elapsed time: 131.37 minutes.
    java.lang.OutOfMemoryError: Java heap space
            at htsjdk.samtools.SAMTextHeaderCodec$ParsedHeaderLine.<init>(
            at htsjdk.samtools.SAMTextHeaderCodec.decode(
            at htsjdk.samtools.reference.ReferenceSequenceFileFactory.loadDictionary(
            at htsjdk.samtools.reference.AbstractFastaSequenceFile.findAndLoadSequenceDictionary(
            at htsjdk.samtools.reference.AbstractFastaSequenceFile.lambda$new$9c19d50a$1(
            at htsjdk.samtools.reference.AbstractFastaSequenceFile$$Lambda$225/0x00007fa42458fd20.get(Unknown Source)
            at htsjdk.samtools.util.Lazy.get(
            at htsjdk.samtools.reference.AbstractFastaSequenceFile.getSequenceDictionary(
            at htsjdk.samtools.reference.IndexedFastaSequenceFile.getSequenceDictionary(
            at htsjdk.samtools.reference.AbstractIndexedFastaSequenceFile.<init>(
            at htsjdk.samtools.reference.IndexedFastaSequenceFile.<init>(
            at htsjdk.samtools.reference.IndexedFastaSequenceFile.<init>(
            at htsjdk.samtools.reference.ReferenceSequenceFileFactory.getReferenceSequenceFile(
            at org.broadinstitute.hellbender.utils.fasta.CachingIndexedFastaSequenceFile.<init>(
            at org.broadinstitute.hellbender.utils.fasta.CachingIndexedFastaSequenceFile.<init>(
            at org.broadinstitute.hellbender.utils.fasta.CachingIndexedFastaSequenceFile.<init>(
            at org.broadinstitute.hellbender.engine.ReferenceFileSource.<init>(
            at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(
            at org.broadinstitute.hellbender.engine.GATKTool.doWork(
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(
            at org.broadinstitute.hellbender.Main.runCommandLineProgram(
            at org.broadinstitute.hellbender.Main.mainEntry(
            at org.broadinstitute.hellbender.Main.main(

    Any idea what it may be?

  • UTao Cao

    Thanks for your suggestions. The input BAM files passed the PrintReads and ValidateSamFile tests. I have attached the header below.

    @HD    VN:1.6    SO:unknown    GO:query
    @SQ    SN:chr1    LN:248956422
    @SQ    SN:chrY_MU273398v1_fix    LN:865743
    @RG    ID:SRR9217692    SM:SRR9217692    PL:ILLUMINA
    @PG    ID:bwa-mem2    PN:bwa-mem2    VN:2.2.1    CL:bwa-mem2 mem -t 5 -M -R @RG\tID:SRR9217692\tSM:SRR9217692\tPL:ILLUMINA /media/bioinfo/reference/human/hg38/bwamem2/hg38.fa ../noncodingHCC/results/PRJNA504942_WGS/cleanqc/SRR9217692_clean_1.fq.gz ../noncodingHCC/results/PRJNA504942_WGS/cleanqc/SRR9217692_clean_2.fq.gz
    @PG    ID:samtools    PN:samtools    PP:bwa-mem2    VN:1.17    CL:samtools sort -O bam -@ 5 -o ../noncodingHCC/results/PRJNA504942_WGS/align/SRR9217692_sort.bam
    @PG    ID:MarkDuplicates    VN:Version:    CL:MarkDuplicates --INPUT ../noncodingHCC/results/PRJNA504942_WGS/align/SRR9217692_sort.bam --OUTPUT ../noncodingHCC/results/PRJNA504942_WGS/align/markdup/SRR9217692_sort_markdup.bam --METRICS_FILE ../noncodingHCC/results/PRJNA504942_WGS/align/markdup/SRR9217692_markdup_metrics.txt --REMOVE_DUPLICATES false --ASSUME_SORT_ORDER queryname --OPTICAL_DUPLICATE_PIXEL_DISTANCE 2500 --TMP_DIR ../noncodingHCC/results/PRJNA504942_WGS/temp/markdup/SRR9217692 --VALIDATION_STRINGENCY SILENT --CREATE_MD5_FILE false --MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP 50000 --MAX_FILE_HANDLES_FOR_READ_ENDS_MAP 8000 --SORTING_COLLECTION_SIZE_RATIO 0.25 --TAG_DUPLICATE_SET_MEMBERS false --REMOVE_SEQUENCING_DUPLICATES false --TAGGING_POLICY DontTag --CLEAR_DT true --DUPLEX_UMI false --FLOW_MODE false --FLOW_QUALITY_SUM_STRATEGY false --USE_END_IN_UNPAIRED_READS false --USE_UNPAIRED_CLIPPED_END false --UNPAIRED_END_UNCERTAINTY 0 --FLOW_SKIP_FIRST_N_FLOWS 0 --FLOW_Q_IS_KNOWN_END false --FLOW_EFFECTIVE_QUALITY_THRESHOLD 15 --ADD_PG_TAG_TO_READS true --ASSUME_SORTED false --DUPLICATE_SCORING_STRATEGY SUM_OF_BASE_QUALITIES --PROGRAM_RECORD_ID MarkDuplicates --PROGRAM_GROUP_NAME MarkDuplicates --READ_NAME_REGEX <optimized capture of last three ':' separated fields as numeric values> --MAX_OPTICAL_DUPLICATE_SET_SIZE 300000 --VERBOSITY INFO --QUIET false --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false    PN:MarkDuplicates
    @PG    ID:GATK ApplyBQSR    VN:    CL:ApplyBQSR --output ../noncodingHCC/results/PRJNA504942_WGS/align/bqsr/SRR9217692_sort_markdup_bqsr.bam --bqsr-recal-file ../noncodingHCC/results/PRJNA504942_WGS/align/bqsr/SRR9217692_bqsr.table --use-original-qualities true --static-quantized-quals 10 --static-quantized-quals 20 --static-quantized-quals 30 --input ../noncodingHCC/results/PRJNA504942_WGS/align/markdup/SRR9217692_sort_markdup.bam --reference /media/bioinfo/reference/human/hg38/bwamem2/hg38.fa --create-output-bam-index true --create-output-bam-md5 true --add-output-sam-program-record true --preserve-qscores-less-than 6 --quantize-quals 0 --round-down-quantized false --emit-original-quals false --global-qscore-prior -1.0 --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-variant-index true --create-output-variant-md5 false --max-variants-per-shard 0 --lenient false --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays  --disable-tool-default-read-filters false    PN:GATK ApplyBQSR
    @PG    ID:samtools.1    PN:samtools    PP:samtools    VN:1.17    CL:samtools view --header-only -o log/nohup_reportbug_headsoftumor.txt results/PRJNA504942_WGS/align/bqsr/T33M77y_SRR9217692SRR9208219_bqsr.bam

  • Can Kockan

    Hi UTao Cao, looking at your header, it might be a sorting issue, as James had previously suspected. I see that the sort order is unknown, and it is possible that your file is not coordinate-sorted. My suggestion would therefore be to sort your SAM/BAM file and retry.
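
    The sort order recorded in the header can also be read programmatically. A minimal sketch, assuming the header is available as tab-separated text (as samtools view -H prints it); header_sort_order is a hypothetical helper name. Note that the SO field only reports what the header declares, not whether the reads are actually in that order.

    ```shell
    # header_sort_order: read SAM header text on stdin and print the SO value
    # of the @HD line, or "missing" if there is no @HD line or no SO field.
    header_sort_order() {
        awk -F'\t' '
            $1 == "@HD" {
                for (i = 2; i <= NF; i++)
                    if ($i ~ /^SO:/) { print substr($i, 4); found = 1 }
                exit
            }
            END { if (!found) print "missing" }
        '
    }
    ```

    Typical use would be samtools view -H input.bam | header_sort_order. For the header posted above, this prints unknown.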

    gabriele tosadori, this looks like a different issue: your program ran out of Java heap memory, so I would retry with more memory (e.g. -Xmx64g).
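
    The heap limit is passed to the JVM through the gatk wrapper's --java-options argument. A minimal sketch; the gatk_with_heap function and the GATK override variable are hypothetical, and 64g is only the example value from above, which has to fit within the machine's physical memory.

    ```shell
    # gatk_with_heap HEAP TOOL [ARGS...]
    # Runs a GATK tool with the given maximum Java heap size (e.g. 64g).
    # Assumption: gatk is on PATH; set GATK=./gatk to use a local wrapper instead.
    gatk_with_heap() {
        heap="$1"
        shift
        ${GATK:-gatk} --java-options "-Xmx${heap}" "$@"
    }
    ```

    For example: gatk_with_heap 64g Mutect2 -R reference.fna -I tumor.bam -I normal.bam -normal NORMAL_SAMPLE -O somatic.vcf.gz, where the file and sample names are placeholders.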

  • gabriele tosadori

    Can Kockan, yes, it worked. Well, at least I think it did. Is it correct if the last line Mutect2 prints is something like this:

    10:13:08.683 INFO  ProgressMeter - NW_003613915.1:1166429            147.3               3587110          24347.4

    I have no idea what to expect from the standard output. I was actually expecting a message like "mutect finished" or something similar. So, can I assume it's done?
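
    For reference, one rough way to judge whether a run finished is to combine the exit status with a look at the log and the output file. The sketch below assumes the tool's console output was saved to a log file; the function name is hypothetical. It relies on the "done. Elapsed time" line that closes the earlier log in this thread, and on gzip -t, which only detects some kinds of truncation.

    ```shell
    # run_looks_complete LOG_FILE VCF_GZ
    # Returns 0 if the log and the compressed VCF both look like a finished run.
    run_looks_complete() {
        log="$1"
        vcf="$2"
        # The engine logs a "done. Elapsed time" line when the tool shuts down.
        if ! grep -q 'done\. Elapsed time' "$log"; then
            echo "no 'done. Elapsed time' line in log"
            return 1
        fi
        # A Java error in the log means the process did not exit cleanly,
        # even if the "done" line is present.
        if grep -q -E '^[[:space:]]*(java\.lang\.|Exception in thread)' "$log"; then
            echo "Java error found in log"
            return 1
        fi
        # Check that the compressed output can be read to the end of the stream.
        if ! gzip -t "$vcf" 2>/dev/null; then
            echo "compressed VCF failed the integrity test"
            return 1
        fi
        echo "looks complete"
    }
    ```

    Checking $? immediately after the gatk command remains the most direct signal: a non-zero value means the run failed.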

  • Can Kockan

    gabriele tosadori, that could still be an early termination; I'd expect an exit status as well. I'd check the output VCF to make sure, but I strongly suspect that this is similar to the following issue:

    See the last comment in that post by Louis Bergelson, where he recommends also leaving some memory free for non-heap use, which might help fix the issue.
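
    The advice about leaving room for non-heap memory can be turned into a small calculation. The sketch below is only illustrative: the 20% default headroom and both function names are assumptions made here, not values recommended in the thread, and total_mem_gb reads /proc/meminfo, so it works on Linux only.

    ```shell
    # heap_for_total_gb TOTAL_GB [HEADROOM_PERCENT]
    # Prints a heap size in whole GB that leaves HEADROOM_PERCENT of total memory
    # (default 20, an arbitrary illustrative value) for non-heap use.
    heap_for_total_gb() {
        total_gb="$1"
        headroom_pct="${2:-20}"
        echo $(( $total_gb * (100 - $headroom_pct) / 100 ))
    }

    # total_mem_gb: physical memory in whole GiB (Linux only, reads /proc/meminfo)
    total_mem_gb() {
        awk '/^MemTotal:/ { print int($2 / 1024 / 1024) }' /proc/meminfo
    }
    ```

    The result can then be passed on as, for example, --java-options "-Xmx$(heap_for_total_gb "$(total_mem_gb)")g".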

  • UTao Cao

    Can Kockan, the command used for sorting was:

    @PG    ID:samtools    PN:samtools    PP:bwa-mem2    VN:1.17    CL:samtools sort -O bam -@ 5 -o ../noncodingHCC/results/PRJNA504942_WGS/align/SRR9217692_sort.bam

    So the _sort.bam should be coordinate-sorted. (I also checked the first several lines of the BAM files.)
