gatk CollectAllelicCounts error
I am new to GATK and have a question about GATK CollectAllelicCounts.
I used gatk4-4.0.12.0-0 in a conda environment.
I followed the data pre-processing workflow and generated the recalibrated BAM file.
WD="/home/Desktop/CNV"
REF="${WD}/ref/hg38.fasta"
INT="${WD}/ref/wgs.hg38.interval_list"
DICT="${WD}/ref/hg38.fasta.dict"
TMPFILE="${WD}/tmp"
# NAME is the sample prefix of the recalibrated BAM and must be set beforehand
time gatk --java-options "-Xmx16g -Djava.io.tmpdir=${TMPFILE}" CollectAllelicCounts \
--intervals "${INT}" \
--input "${NAME}.addRG.mkdup.recal.bam" \
--reference "${REF}" \
--tmp-dir "${TMPFILE}" \
--sequence-dictionary "${DICT}" \
--output "${NAME}.allelic_counts.tsv"
The following error occurred:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3181)
at java.util.ArrayList.grow(ArrayList.java:265)
at java.util.ArrayList.ensureExplicitCapacity(ArrayList.java:239)
at java.util.ArrayList.ensureCapacityInternal(ArrayList.java:231)
at java.util.ArrayList.add(ArrayList.java:462)
at org.broadinstitute.hellbender.tools.copynumber.datacollection.AllelicCountCollector.collectAtLocus(AllelicCountCollector.java:72)
at org.broadinstitute.hellbender.tools.copynumber.CollectAllelicCounts.apply(CollectAllelicCounts.java:152)
at org.broadinstitute.hellbender.engine.LocusWalker.lambda$traverse$0(LocusWalker.java:176)
at org.broadinstitute.hellbender.engine.LocusWalker$$Lambda$91/1519482659.accept(Unknown Source)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at org.broadinstitute.hellbender.engine.LocusWalker.traverse(LocusWalker.java:174)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:966)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
When I changed -Xmx16g to -Xmx8g, I got another error:
[May 22, 2020 12:15:40 PM HKT] org.broadinstitute.hellbender.tools.copynumber.CollectAllelicCounts done. Elapsed time: 21.76 minutes.
Runtime.totalMemory()=15772155904
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3181)
at java.util.ArrayList.grow(ArrayList.java:265)
at java.util.ArrayList.ensureExplicitCapacity(ArrayList.java:239)
at java.util.ArrayList.ensureCapacityInternal(ArrayList.java:231)
at java.util.ArrayList.add(ArrayList.java:462)
at org.broadinstitute.hellbender.tools.copynumber.datacollection.AllelicCountCollector.collectAtLocus(AllelicCountCollector.java:72)
at org.broadinstitute.hellbender.tools.copynumber.CollectAllelicCounts.apply(CollectAllelicCounts.java:152)
at org.broadinstitute.hellbender.engine.LocusWalker.lambda$traverse$0(LocusWalker.java:176)
at org.broadinstitute.hellbender.engine.LocusWalker$$Lambda$91/2118482375.accept(Unknown Source)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at org.broadinstitute.hellbender.engine.LocusWalker.traverse(LocusWalker.java:174)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:966)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
real 21m49.559s
user 140m38.016s
sys 0m15.373s
When I increased the heap to -Xmx30g, the run hung at chr10 and made no further progress, so I stopped it with Ctrl+C. (My computer has 32 GB of RAM.)
-
Hi peterchung, you may have more success if you break up the interval list by contig and then run the tool on each contig separately. Could you try that and keep the max heap size at 16g?
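A minimal sketch of this suggestion, assuming bash and the GATK 4.x command line: each run passes the full interval list and a single contig as two -L arguments combined with --interval-set-rule INTERSECTION, so only that contig is traversed, with the heap kept at 16g. The function name run_per_contig, the GATK launcher variable, and the output naming scheme are illustrative choices, not something from this thread.

```shell
#!/usr/bin/env bash
set -euo pipefail

GATK="${GATK:-gatk}"   # launcher; set GATK=echo to print the commands instead of running them

# run_per_contig BAM REF INTERVALS PREFIX CONTIG...
# Hypothetical helper: one CollectAllelicCounts run per contig. The two -L
# arguments are intersected, so each run only covers that contig's intervals.
run_per_contig() {
  local bam="$1" ref="$2" intervals="$3" prefix="$4"
  shift 4
  local contig
  for contig in "$@"; do
    "$GATK" --java-options "-Xmx16g" CollectAllelicCounts \
      -I "$bam" \
      -R "$ref" \
      -L "$intervals" \
      -L "$contig" \
      --interval-set-rule INTERSECTION \
      -O "${prefix}.${contig}.allelic_counts.tsv"
  done
}
```

For example: run_per_contig "${NAME}.addRG.mkdup.recal.bam" "${REF}" "${INT}" "${NAME}" chr10 chr18. Setting GATK=echo first prints the commands without running them, which is a cheap way to check them.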
-
Thanks for your reply. May I ask how to split wgs_calling_regions.hg38.interval_list by chromosome so that I can run each one separately?
I looked at the GATK documentation:
https://gatk.broadinstitute.org/hc/en-us/articles/360035531852-Intervals-and-interval-lists
It says that -L/--intervals "allows you to specify an interval or list of intervals to include", e.g. -L chr20 for contig chr20. How would I apply that here? Thanks.
-
Oh, there is a tool for this:
https://gatk.broadinstitute.org/hc/en-us/articles/360036899592-SplitIntervals
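A sketch of two ways to do the split, assuming bash and awk. SplitIntervals divides a list into a requested number of roughly balanced pieces (BALANCING_WITHOUT_INTERVAL_SUBDIVISION keeps each interval whole), so a piece is not guaranteed to contain exactly one chromosome. If strictly one file per contig is wanted, the hypothetical split_by_contig helper does it with awk, copying the @-prefixed header of the interval list into every output file. Function names, the scatter count, and output directories are assumptions.

```shell
#!/usr/bin/env bash
set -euo pipefail

GATK="${GATK:-gatk}"   # launcher; set GATK=echo to print the command instead of running it

# scatter_intervals REF INTERVALS COUNT OUTDIR
# Uses SplitIntervals to write COUNT balanced interval lists into OUTDIR
# without cutting any interval in two.
scatter_intervals() {
  local ref="$1" intervals="$2" count="$3" outdir="$4"
  "$GATK" SplitIntervals \
    -R "$ref" \
    -L "$intervals" \
    --scatter-count "$count" \
    --subdivision-mode BALANCING_WITHOUT_INTERVAL_SUBDIVISION \
    -O "$outdir"
}

# split_by_contig INTERVALS OUTDIR
# Hypothetical helper: writes OUTDIR/<contig>.interval_list for every contig
# that has at least one interval, each starting with the original header.
split_by_contig() {
  local intervals="$1" outdir="$2"
  mkdir -p "$outdir"
  awk -v outdir="$outdir" '
    /^@/ { header = header $0 "\n"; next }   # collect the @HD/@SQ header lines
    {
      out = outdir "/" $1 ".interval_list"
      if (out != cur) {                      # contig changed: release the previous file
        if (cur != "") close(cur)
        cur = out
        if (!(out in seen)) {                # first interval on this contig
          seen[out] = 1
          printf "%s", header > out          # create the file with the header
          close(out)
        }
      }
      print >> out                           # append the interval line itself
    }
  ' "$intervals"
}
```

For example, split_by_contig wgs_calling_regions.hg38.interval_list per_contig/ would produce per_contig/chr18.interval_list, which can then be passed to --intervals for a single-chromosome run.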
-
Yes! I was just going to suggest that! Let us know if you run into any problems.
-
Yes. I tried GATK 4.1.7 and subset everything to chr18 (the 3.5 GB BAM file down to a 92 MB chr18.bam, and the interval list as well), but I still get a similar error:
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.broadinstitute.hellbender.utils.Nucleotide$Counter.<init>(Nucleotide.java:535)
at org.broadinstitute.hellbender.tools.copynumber.datacollection.AllelicCountCollector.collectAtLollelicCountCollector.java:60)
at org.broadinstitute.hellbender.tools.copynumber.CollectAllelicCounts.apply(CollectAllelicCounts.163)
at org.broadinstitute.hellbender.engine.LocusWalker.lambda$traverse$0(LocusWalker.java:162)
at org.broadinstitute.hellbender.engine.LocusWalker$$Lambda$107/516040753.accept(Unknown Source)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at org.broadinstitute.hellbender.engine.LocusWalker.traverse(LocusWalker.java:160)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1048)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLinePm.java:191)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:2
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
at org.broadinstitute.hellbender.Main.main(Main.java:292)
Using GATK jar /ubda/home/kcchung/anaconda3/share/gatk4-4.1.7.0-0/gatk-package-4.1.7.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use__io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx16g -Djava.io.tmpdir=/ubda/home/kcchung/cnv-analymp -jar /ubda/home/kcchung/anaconda3/share/gatk4-4.1.7.0-0/gatk-package-4.1.7.0-local.jar CollectAllelicCo--intervals /ubda/home/kcchung/cnv-analysis/ref/preprocessed.hg38.interval_list --input /ubda/home/kcchunganalysis/29406.bam --reference /ubda/home/kcchung/cnv-analysis/ref/hg38.fasta --sequence-dictionary /ubda/kcchung/cnv-analysis/ref/hg38.dict --output 29406.allelic_counts.tsv
-
Hi peterchung,
This tool is intended to be run over a list of common SNP sites (see e.g., https://gatk.broadinstitute.org/hc/en-us/articles/360035890011--How-to-part-II-Sensitively-detect-copy-ratio-alterations-and-allelic-segments), not regions like those contained in wgs_calling_regions.hg38.interval_list.
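A sketch of following that advice, assuming bash and awk. The hypothetical count_loci helper sums end - start + 1 over an interval list, which shows how many positions the tool is asked to visit; the traces above show the OutOfMemoryError raised while AllelicCountCollector.collectAtLocus appends to an ArrayList, which suggests memory use grows with that number. The hypothetical collect_at_sites helper then runs CollectAllelicCounts over a sites list; the file name common_snps.interval_list in the usage note is a placeholder for whatever common-SNP list you use (for example one prepared as in the linked tutorial), not a file from this thread.

```shell
#!/usr/bin/env bash
set -euo pipefail

GATK="${GATK:-gatk}"   # launcher; set GATK=echo to print the command instead of running it

# count_loci INTERVALS
# Prints the total number of positions in a Picard-style interval list
# (1-based, closed intervals), skipping @-prefixed header lines.
# Assumes the intervals do not overlap; overlaps would be counted twice.
count_loci() {
  awk '!/^@/ { n += $3 - $2 + 1 } END { printf "%.0f\n", n }' "$1"
}

# collect_at_sites BAM REF SITES OUT
# Runs CollectAllelicCounts restricted to a list of common SNP sites.
collect_at_sites() {
  local bam="$1" ref="$2" sites="$3" out="$4"
  "$GATK" --java-options "-Xmx16g" CollectAllelicCounts \
    -I "$bam" \
    -R "$ref" \
    -L "$sites" \
    -O "$out"
}
```

For example, compare count_loci wgs_calling_regions.hg38.interval_list with count_loci common_snps.interval_list, then run collect_at_sites 29406.bam hg38.fasta common_snps.interval_list 29406.allelic_counts.tsv.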