HaplotypeCaller terminates with exit code 245, but without any error
Hi,
I was running HaplotypeCaller on one individual's HiFi reads. To reduce the running time, I split the genome into chromosomes and ran them simultaneously. However, the process always terminates on several chromosomes with an exit code of 245, and there is no error or warning at all in the log file.
I am using GATK/4.1.4.1, the command is:
gatk HaplotypeCaller -L ../list/h2_01.list -R ../index/$ref -I $output.bam -O gvcf/$output.01.g.vcf.gz -ERC GVCF 2>>gvcf/$output.01.gatk.error
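The per-chromosome scatter looked roughly like this (the chromosome names, paths, and file names below are illustrative placeholders, not my real ones):

```shell
# Illustrative scatter: one HaplotypeCaller job per chromosome interval list.
# Chromosome names, paths, and file names are placeholders for illustration.
for chr in HiC_scaffold_1 HiC_scaffold_2 HiC_scaffold_3; do
  echo "gatk HaplotypeCaller -L ../list/${chr}.list -R ../index/ref.fasta -I sample.bam -O gvcf/sample.${chr}.g.vcf.gz -ERC GVCF"
done > jobs.txt
```

Each generated line is then submitted as its own cluster job.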
The log file starts with
Using GATK jar /sw/bioinfo/GATK/4.1.4.1/rackham/gatk-package-4.1.4.1-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk
.compression_level=2 -Xms512M -Xmx5G -jar /sw/bioinfo/GATK/4.1.4.1/rackham/gatk-package-4.1.4.1-local.jar HaplotypeCaller --pcr-indel-model NON
E -L ../list/h2_01-2.list -R ../index/she_h2.fasta -I she_h2.3ddna.map-hifi.markdup.bam -O gvcf/she_h2.3ddna.map-hifi.01-2.g.vcf.gz -ERC GVCF
23:21:12.908 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/sw/bioinfo/GATK/4.1.4.1/rackham/gatk-package-4.1.4.1-loca
l.jar!/com/intel/gkl/native/libgkl_compression.so
May 14, 2023 11:21:13 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
23:21:13.221 INFO HaplotypeCaller - ------------------------------------------------------------
23:21:13.221 INFO HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.1.4.1
23:21:13.221 INFO HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
23:21:13.223 INFO HaplotypeCaller - Executing as maoxm@r1069.uppmax.uu.se on Linux v3.10.0-1160.88.1.el7.x86_64 amd64
23:21:13.223 INFO HaplotypeCaller - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_151-b12
23:21:13.223 INFO HaplotypeCaller - Start Date/Time: May 14, 2023 11:21:12 PM
23:21:13.223 INFO HaplotypeCaller - ------------------------------------------------------------
23:21:13.223 INFO HaplotypeCaller - ------------------------------------------------------------
23:21:13.224 INFO HaplotypeCaller - HTSJDK Version: 2.21.0
23:21:13.224 INFO HaplotypeCaller - Picard Version: 2.21.2
23:21:13.224 INFO HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2
23:21:13.224 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
23:21:13.224 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
23:21:13.224 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
23:21:13.224 INFO HaplotypeCaller - Deflater: IntelDeflater
23:21:13.224 INFO HaplotypeCaller - Inflater: IntelInflater
23:21:13.225 INFO HaplotypeCaller - GCS max retries/reopens: 20
23:21:13.225 INFO HaplotypeCaller - Requester pays: disabled
23:21:13.225 INFO HaplotypeCaller - Initializing engine
23:21:13.829 INFO IntervalArgumentCollection - Processing 29411906 bp from intervals
23:21:13.837 INFO HaplotypeCaller - Done initializing engine
23:21:13.839 INFO HaplotypeCallerEngine - Tool is in reference confidence mode and the annotation, the following changes will be made to any specified annotations: 'StrandBiasBySample' will be enabled. 'ChromosomeCounts', 'FisherStrand', 'StrandOddsRatio' and 'QualByDepth' annotations have been disabled
23:21:13.864 INFO HaplotypeCallerEngine - Standard Emitting and Calling confidence set to 0.0 for reference-model confidence output
23:21:13.864 INFO HaplotypeCallerEngine - All sites annotated with PLs forced to true for reference-model confidence output
23:21:13.878 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/sw/bioinfo/GATK/4.1.4.1/rackham/gatk-package-4.1.4.1-local.jar!/com/intel/gkl/native/libgkl_utils.so
23:21:13.897 INFO NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/sw/bioinfo/GATK/4.1.4.1/rackham/gatk-package-4.1.4.1-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
23:21:13.971 INFO IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
23:21:13.972 INFO IntelPairHmm - Available threads: 8
23:21:13.972 INFO IntelPairHmm - Requested threads: 4
23:21:13.972 INFO PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
23:21:14.016 INFO ProgressMeter - Starting traversal
23:21:14.016 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute
23:21:15.105 WARN InbreedingCoeff - InbreedingCoeff will not be calculated; at least 10 samples must have called genotypes
23:21:24.818 INFO ProgressMeter - HiC_scaffold_3:82073 0.2 690 3832.6
....
and ends with
10:35:27.778 INFO ProgressMeter - HiC_scaffold_3:14357395 736.7 351740 477.4
10:35:38.416 INFO ProgressMeter - HiC_scaffold_3:14369585 736.9 351850 477.5
10:35:49.036 INFO ProgressMeter - HiC_scaffold_3:14376496 737.1 351900 477.4
When I repeated the run, it stopped again nearby, but not at exactly the same locus:
12:06:58.739 INFO ProgressMeter - HiC_scaffold_3:14346512 716.6 351670 490.8
12:07:09.126 INFO ProgressMeter - HiC_scaffold_3:14357395 716.7 351740 490.7
12:07:20.825 INFO ProgressMeter - HiC_scaffold_3:14371140 716.9 351860 490.8
Troubleshooting I did:
1. First, I suspected it had run out of memory, so I requested more cores (and thus more memory), even over-booking the node, but it still failed with no error.
2. Maybe there were problems in the BAM file, so I ran ValidateSamFile, but no errors were found.
3. I checked the mapping quality and coverage of the BAM file to see whether there was a collapse at the breaking point. It turns out that ±20 kb around the point has even mapping quality and coverage, as normal:
HiC_scaffold_3 14371256 14371313 52
HiC_scaffold_3 14371313 14371314 51
HiC_scaffold_3 14371314 14371328 52
HiC_scaffold_3 14371328 14371329 51
HiC_scaffold_3 14371329 14371333 52
HiC_scaffold_3 14371333 14371334 50
HiC_scaffold_3 14371334 14371368 52
HiC_scaffold_3 14371368 14371369 51
HiC_scaffold_3 14371369 14371380 52
HiC_scaffold_3 14371380 14371381 51
HiC_scaffold_3 14371381 14371387 52
HiC_scaffold_3 14371387 14371388 51
HiC_scaffold_3 14371388 14371401 52
HiC_scaffold_3 14371401 14371404 22
HiC_scaffold_3 14371404 14371415 23
HiC_scaffold_3 14371415 14371416 24
HiC_scaffold_3 14371416 14371481 53
HiC_scaffold_3 14371481 14371482 52
HiC_scaffold_3 14371482 14371511 53
HiC_scaffold_3 14371511 14371512 52
HiC_scaffold_3 14371512 14371534 53
HiC_scaffold_3 14371534 14371535 52
HiC_scaffold_3 14371535 14371547 53
HiC_scaffold_3 14371547 14371548 52
HiC_scaffold_3 14371548 14371581 53
HiC_scaffold_3 14371581 14371582 52
HiC_scaffold_3 14371582 14371642 53
HiC_scaffold_3 14371642 14371643 52
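The table above can be summarized into a length-weighted mean depth with a one-liner (a sketch assuming a four-column chrom/start/end/depth table like the one shown; the toy file here reuses two of the rows for illustration):

```shell
# Toy depth table (chrom, start, end, depth), mimicking the table above;
# the real file would cover the +/-20 kb window around the breakpoint.
printf 'HiC_scaffold_3\t14371256\t14371313\t52\nHiC_scaffold_3\t14371401\t14371404\t22\n' > coverage.bed

# Length-weighted mean depth over the table.
awk '{ len = $3 - $2; sum += len * $4; tot += len }
     END { printf "mean depth %.1f over %d bp\n", sum / tot, tot }' coverage.bed
```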
4. Since the same code had been run several times for Illumina read calling, I suspected that HiFi reads might need different parameters, but I didn't find any suggestions on the forum. I added only '--pcr-indel-model NONE', which didn't help at all.
Could you please help me work out this HaplotypeCaller failure? Are there any updated suggestions specifically for HiFi reads?
Thanks!
Best,
Xiaomeng
-
Hello Xiaomeng Mao.
I'm sorry to hear about your crashing on these data inputs. From experience, very often when you get no-message failures from HaplotpeCaller it is an out-of memory failure of one kind or another (very often running out of memory on the VM instance). It looks like you are setting-Xms512M -Xmx5G
so if you increase the available memory on the instance without increasing the Java -Xmx value (the maximum memory Java will restrict itself to using), you aren't actually allowing Java to utilize the extra space. Furthermore, make sure that -Xmx is not too close to the total memory available on the machine: GATK has a number of non-Java components (compression/decompression libraries, Smith-Waterman accelerators, etc.) that require memory as well, so you must leave at least 1 GB free (and often 2 GB is necessary in very messy edge cases).
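As a rough sketch of that arithmetic (the 24 GB total and 2 GB overhead below are assumed values for illustration, not a recommendation for your specific node):

```shell
# Leave headroom for GATK's native code when sizing the Java heap.
TOTAL_MEM_G=24   # memory granted to the job (assumed value for illustration)
OVERHEAD_G=2     # reserve for compression libraries, native PairHMM, etc.
XMX_G=$(( TOTAL_MEM_G - OVERHEAD_G ))
echo "gatk --java-options \"-Xms1G -Xmx${XMX_G}G\" HaplotypeCaller ..."
```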
The next comment I would make is that you are trying to use long reads with HaplotypeCaller. HaplotypeCaller has been optimized for short reads but should be fine to run over long reads; however, some older versions (and the GATK/4.1.4.1 you are using is several years old at this point) had compatibility issues that caused poor behavior with long reads. I would recommend upgrading to a newer release (we are on 4.3.0.0 right now), as there have been a number of low-level improvements to HTSJDK in the intervening years that might result in better performance on long reads. -
Hi James Emery,
Thanks for your reply and suggestions!
I retried by adding the Java options below to the previous command line (requesting 4 cores, each with a maximum of 6.8 GB RAM). But it still crashed at the same locus, even though I set the starting point 100 kb before the breakpoint.
--java-options "-Xms1G -Xmx20G"
However, the log file contains both the new Java memory setting and the default one, so I am not sure whether the increased memory setting took effect.
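If I understand correctly, HotSpot JVMs normally honor the rightmost occurrence of a repeated flag, so -Xmx20G should win; a quick text-level check pulls the last -Xmx out of the logged command line (this inspects the command text only, not the running JVM):

```shell
# Pull the last -Xmx out of the logged java command line; HotSpot JVMs
# normally honor the rightmost occurrence of a repeated flag.
# CMD is left unquoted below on purpose, so it splits into words.
CMD='java -Xms512M -Xmx5G -Xms1G -Xmx20G -jar gatk-package-local.jar HaplotypeCaller'
LAST_XMX=$(printf '%s\n' $CMD | grep '^-Xmx' | tail -n 1)
echo "effective max heap: ${LAST_XMX#-Xmx}"
```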
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xms512M -Xmx5G -Xms1G -Xmx20G -jar /sw/bioinfo/GATK/4.2.0.0/rackham/gatk-package-4.2.0.0-local.jar HaplotypeCaller --pcr-indel-model NONE -L HiC_scaffold_13:11789610-13990640 -R ../index/she_h1_curated.fasta -I she_h1.3ddna.map-hifi.addgroup.bam -O gvcf/she_h1.3ddna.map-hifi.05-1-2.g.vcf.gz -ERC GVCF
The second suggestion didn't work in GATK/4.3.0.0. I came across the same problem reported at https://gatk.broadinstitute.org/hc/en-us/community/posts/11440622639387/comments/12925222020763 : the BAM file generated by Minimap2 looks incompatible with the latest version. When I changed to version 4.2.0.0, the problem disappeared.
Well, it is not surprising that calling works on the several-kb region after the breaking point, but this leaves a blank around the breaking point itself. I rechecked the input BAM file and still didn't find any problems. Here's how I generate the BAM file. Would you have other suggestions?
echo "##step1.mapping"
minimap2 -xasm20 -a -t 12 ../index/$ref $hifi_reads > $output.sam
samtools view -Sb -q 20 -@ 12 -T ../index/$ref $output.sam | samtools sort -@ 12 -o $output.sort.bam
echo "##step2.addgroup"
java -jar $PICARD_ROOT/picard.jar AddOrReplaceReadGroups I=$output.sort.bam O=$output.addgroup.bam RGID=$group RGLB=$group RGSM=$group RGPL=PacBio RGPU=none VALIDATION_STRINGENCY=LENIENT
Thanks!
Best wishes,
Xiaomeng
-
Interesting... a few comments:
- Regarding the multiple -Xmx/-Xms flags in your inputs: can you post the full command line you are using to run GATK? Are you running the jar directly, or via the launcher script (./gatk)? Do you see a corresponding increase in memory utilization to 20 GB in the profiling plots? Make sure you haven't set any Java-related environment variables in your shell (`JAVA_OPTS`, for example) with those -Xmx/-Xms values.
- Regarding your issue with 4.3.0.0: I apologize, that was a mistake on my part; 4.4.0.0 is actually the current version. However, from looking at the code, it appears the same bug is present in 4.4.0.0. We are going to try to get it fixed by the next release; you can see the GitHub ticket here: https://github.com/broadinstitute/gatk/issues/8335. If you want to use newer versions of GATK, you will need to remove the offending tp tags from your BAMs (or wait for 4.4.1.0, which will hopefully include the fix).
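In the meantime, a text-level sketch for stripping those tags might look like the following (recent samtools releases can also do this natively with `samtools view -x tp`; in any case, test on a copy of your data first):

```shell
# Toy SAM with a minimap2 tp:A: tag (header + one read), for illustration.
printf '@HD\tVN:1.6\nread1\t0\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII\ttp:A:P\tNM:i:0\n' > in.sam

# Drop tp:A:* tags from alignment records, leaving header lines untouched.
awk 'BEGIN { FS = OFS = "\t" }
     /^@/ { print; next }
     { out = $1
       for (i = 2; i <= NF; i++) if ($i !~ /^tp:A:/) out = out OFS $i
       print out }' in.sam > fixed.sam
```

For real BAMs you would pipe through `samtools view -h` on the way in and `samtools view -b` on the way out.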
I would not be surprised if the breakpoint is causing assembly to explode in GATK: if it has to assemble megabases worth of softclips/novel sequence from the long reads, it's possible that no amount of extra memory will get through it. First, it's worth determining whether that is truly what is going on here, or whether the tool is simply memory-starved. What do the reads around that breakpoint look like in IGV? If it still fails after you have confirmed that the memory bump is indeed being propagated (you should see Java allocating upwards of 20 GB), then we might want to look at where all of the memory is going in your case. -
Was this issue resolved? I am getting the same issue, even though I am using the latest release, GATK 4.5.0.0: the process always gets terminated at the same point (chr19) with an exit status of 245. I am using long-read data.
-
The issue should have been resolved by now, and 4.5.0.0 onwards should be OK to use. Can you provide more details of the problem you are facing? Can you post the logs of the error?
Regards.
-
Sure -
Command executed:
gatk --java-options "-Xmx9830M -XX:-UsePerfData" \
HaplotypeCaller \
--input m21009_241011_231041.sorted.bam \
--output m21009_241011_231041.vcf.gz \
--reference Homo_sapiens_assembly38.fasta \
--native-pair-hmm-threads 6 \
--dbsnp dbsnp_146.hg38.vcf.gz \
\
\
\
--tmp-dir . \
cat <<-END_VERSIONS > versions.yml
"NFCORE_PACVAR:PACVAR:BAM_SNP_VARIANT_CALLING:GATK4_HAPLOTYPECALLER":
gatk4: $(echo $(gatk --version 2>&1) | sed 's/^.*(GATK) v//; s/ .*$//')
END_VERSIONS
Command exit status:
245
Command output:
(empty)
Command error:
07:33:55.190 INFO ProgressMeter - chr18:61584966 743.8 14881240 20008.1
07:34:05.210 INFO ProgressMeter - chr18:62125822 743.9 14884480 20007.9
07:34:15.218 INFO ProgressMeter - chr18:62704969 744.1 14887800 20007.9
07:34:25.225 INFO ProgressMeter - chr18:63281102 744.3 14891030 20007.8
07:34:35.236 INFO ProgressMeter - chr18:63822402 744.4 14894200 20007.5
07:34:45.264 INFO ProgressMeter - chr18:64479591 744.6 14897730 20007.8
07:34:55.283 INFO ProgressMeter - chr18:65051914 744.8 14901110 20007.8
07:35:05.323 INFO ProgressMeter - chr18:65568712 744.9 14904160 20007.4
07:35:15.329 INFO ProgressMeter - chr18:66196826 745.1 14907590 20007.6
07:35:25.350 INFO ProgressMeter - chr18:66729421 745.3 14910830 20007.4
07:35:35.354 INFO ProgressMeter - chr18:67334591 745.4 14914280 20007.6
07:35:45.480 INFO ProgressMeter - chr18:67919545 745.6 14917640 20007.5
07:35:55.493 INFO ProgressMeter - chr18:68356184 745.8 14920420 20006.8
07:36:05.500 INFO ProgressMeter - chr18:68899768 745.9 14923720 20006.7
07:36:15.543 INFO ProgressMeter - chr18:69483439 746.1 14927150 20006.9
07:36:25.558 INFO ProgressMeter - chr18:69999103 746.3 14930250 20006.5
07:36:35.582 INFO ProgressMeter - chr18:70516866 746.4 14933330 20006.2
07:36:45.617 INFO ProgressMeter - chr18:71130787 746.6 14936950 20006.6
07:36:55.625 INFO ProgressMeter - chr18:71692998 746.8 14940240 20006.5
07:37:05.628 INFO ProgressMeter - chr18:72207527 746.9 14943360 20006.2
07:37:15.640 INFO ProgressMeter - chr18:72758107 747.1 14946780 20006.3
07:37:25.640 INFO ProgressMeter - chr18:73399854 747.3 14950470 20006.8
07:37:35.651 INFO ProgressMeter - chr18:73989062 747.4 14954030 20007.1
07:37:45.673 INFO ProgressMeter - chr18:74410500 747.6 14956820 20006.3
07:37:55.710 INFO ProgressMeter - chr18:75118828 747.8 14960820 20007.2
07:38:05.730 INFO ProgressMeter - chr18:75751537 747.9 14964530 20007.7
07:38:15.749 INFO ProgressMeter - chr18:76372481 748.1 14968200 20008.2
07:38:25.750 INFO ProgressMeter - chr18:76975058 748.3 14971840 20008.6
07:38:35.756 INFO ProgressMeter - chr18:77431794 748.4 14974790 20008.0
07:38:45.774 INFO ProgressMeter - chr18:78108644 748.6 14978950 20009.1
07:38:55.776 INFO ProgressMeter - chr18:78788990 748.8 14982800 20009.8
07:39:05.804 INFO ProgressMeter - chr18:79779727 748.9 14988010 20012.3
07:39:15.828 INFO ProgressMeter - chr19:173182 749.1 14991950 20013.1
07:39:25.834 INFO ProgressMeter - chr19:822325 749.3 14996110 20014.2
07:39:35.847 INFO ProgressMeter - chr19:1532160 749.4 15000880 20016.1
07:39:45.856 INFO ProgressMeter - chr19:2193534 749.6 15005210 20017.4
07:39:55.892 INFO ProgressMeter - chr19:2840168 749.8 15009330 20018.5
07:40:05.906 INFO ProgressMeter - chr19:3311870 749.9 15012570 20018.3
07:40:12.724 WARN HaplotypeCallerGenotypingEngine - At position chr19:3742670-3742670 removed alt alleles where ploidy is 2 and original allele count is 49, whereas after trimming the allele count becomes 44. Alleles kept are:[T*, TGGCTCACGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGCGGGCAGATCACGAGGTCAGGAGATCAAGACCATCCTGGCTAACACGGTGAAACCCCGTCTCTACTAAAAACACAAAAAATTAGCCGGGCGTGGTCGAGGGCGCCTGTGGTCCCAGCTACTCGGGAGGCTGAGGCAGGAGAATGGCGTGAACCC$
07:40:15.913 INFO ProgressMeter - chr19:3942947 750.1 15016690 20019.4
07:40:25.960 INFO ProgressMeter - chr19:4531990 750.3 15020550 20020.1
07:40:36.010 INFO ProgressMeter - chr19:5091661 750.4 15024220 20020.5
07:40:46.031 INFO ProgressMeter - chr19:5828332 750.6 15028710 20022.0
07:40:56.144 WARN HaplotypeCallerGenotypingEngine - At position chr19:6429376-6429376 removed alt alleles where ploidy is 2 and
I've run this a few times and it keeps stopping at chromosome 19 -
Hi again.
Are you using this on long reads?
There may be unforeseen bugs in our long-read compatibility, due to the assembly engine failing in very complex regions, and your sample might have hit such an edge case, so the tool may not be working properly. You may wish to divide chr19 into several parts to see exactly where the tool fails, and then omit that region completely to reach a conclusion.
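For example, something along these lines would generate ~5 Mb windows to pass to -L (the chr19 length below is the GRCh38 value as an assumption; take yours from the reference .fai index):

```shell
# Tile chr19 into ~5 Mb windows; feed each to -L to bisect the failure.
CHR=chr19
LEN=58617616     # GRCh38 chr19 length (assumption); check your .fai index
STEP=5000000
for (( s = 1; s <= LEN; s += STEP )); do
  e=$(( s + STEP - 1 ))
  if (( e > LEN )); then e=$LEN; fi
  echo "${CHR}:${s}-${e}"
done > chr19_windows.list
```

Each window can then be run as its own HaplotypeCaller job to narrow down the failing region.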
I hope this helps.
Regards.