Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data


gatk CollectAllelicCounts error

6 comments

  • Tiffany Miller

    Hi peterchung, you may have more success if you break up the interval list by contig and then run the tool on each contig separately. Could you try that and keep the max heap size of 16g?

  • peterchung

    Thanks for your reply. How can I split wgs_calling_regions.hg38.interval_list into per-chromosome lists so that I can run the tool on each one?

    I looked at the GATK documentation on intervals:

    https://gatk.broadinstitute.org/hc/en-us/articles/360035531852-Intervals-and-interval-lists

    Should I do what it suggests and pass a contig name to -L?

    -L / --intervals allows you to specify an interval or list of intervals to include.

    -L chr20 for contig chr20.

    Thanks

  • Tiffany Miller

    Yes! I was just going to suggest that! Let us know if you run into any problems.

  • peterchung

    Yes. I tried GATK 4.1.7 and subset everything: the BAM file (3.5 GB) down to chr18.bam (92 MB), and the interval list as well. I still get a similar error:

    Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
            at org.broadinstitute.hellbender.utils.Nucleotide$Counter.<init>(Nucleotide.java:535)
            at org.broadinstitute.hellbender.tools.copynumber.datacollection.AllelicCountCollector.collectAtLollelicCountCollector.java:60)
            at org.broadinstitute.hellbender.tools.copynumber.CollectAllelicCounts.apply(CollectAllelicCounts.163)
            at org.broadinstitute.hellbender.engine.LocusWalker.lambda$traverse$0(LocusWalker.java:162)
            at org.broadinstitute.hellbender.engine.LocusWalker$$Lambda$107/516040753.accept(Unknown Source)
            at java.util.Iterator.forEachRemaining(Iterator.java:116)
            at org.broadinstitute.hellbender.engine.LocusWalker.traverse(LocusWalker.java:160)
            at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1048)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLinePm.java:191)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:2
            at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
            at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
            at org.broadinstitute.hellbender.Main.main(Main.java:292)
    Using GATK jar /ubda/home/kcchung/anaconda3/share/gatk4-4.1.7.0-0/gatk-package-4.1.7.0-local.jar
    Running:
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use__io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx16g -Djava.io.tmpdir=/ubda/home/kcchung/cnv-analymp -jar /ubda/home/kcchung/anaconda3/share/gatk4-4.1.7.0-0/gatk-package-4.1.7.0-local.jar CollectAllelicCo--intervals /ubda/home/kcchung/cnv-analysis/ref/preprocessed.hg38.interval_list --input /ubda/home/kcchunganalysis/29406.bam --reference /ubda/home/kcchung/cnv-analysis/ref/hg38.fasta --sequence-dictionary /ubda/kcchung/cnv-analysis/ref/hg38.dict --output 29406.allelic_counts.tsv

  • Samuel Lee

    Hi peterchung,

    This tool is intended to be run over a list of common SNP sites (see e.g., https://gatk.broadinstitute.org/hc/en-us/articles/360035890011--How-to-part-II-Sensitively-detect-copy-ratio-alterations-and-allelic-segments), not regions like those contained in wgs_calling_regions.hg38.interval_list.

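The thread asks how to split an interval list per chromosome, and the last reply points out that the tool expects a list of sites rather than broad regions. The following is a minimal sketch of both checks in plain shell and awk, not an official GATK utility. It assumes a Picard-style interval list (header lines starting with "@", then tab-separated contig, start, end, strand, name, with 1-based inclusive coordinates) and a modest number of contigs. The function names split_interval_list_by_contig and count_interval_loci are hypothetical helpers made up for this illustration.

```shell
#!/usr/bin/env bash

# Hypothetical helper: split a Picard-style interval list into one file per
# contig. Every "@" header line is copied to the top of each output file.
split_interval_list_by_contig() {
    local input="$1" outdir="$2"
    mkdir -p "$outdir"
    awk -F'\t' -v outdir="$outdir" '
        /^@/    { header = header $0 "\n"; next }   # collect header lines
        NF == 0 { next }                            # skip blank lines
        {
            out = outdir "/" $1 ".interval_list"
            if (!(out in seen)) {                   # first interval on this contig
                printf "%s", header > out
                seen[out] = 1
            }
            print > out                             # file stays open, so this appends
        }
    ' "$input"
}

# Hypothetical helper: count the positions covered by an interval list.
# Coordinates are treated as 1-based and inclusive, so each interval
# contributes end - start + 1 positions.
count_interval_loci() {
    awk -F'\t' '
        !/^@/ && NF >= 3 { total += $3 - $2 + 1 }
        END              { printf "%.0f\n", total }
    ' "$1"
}
```

For example, split_interval_list_by_contig wgs_calling_regions.hg38.interval_list per_contig would write per_contig/chr18.interval_list, which could then be passed to -L. Running count_interval_loci on a list gives a rough sense of how many positions a locus-by-locus traversal (as the LocusWalker frames in the stack trace suggest) would visit. A whole-genome calling-regions list covers far more positions than a list of common SNP sites, which is consistent with the advice in the last reply.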
