Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Problem with Germline CNV CollectReadCounts

9 comments

  • SkyWarrior

    Your test1.preprocessed.interval_list file is empty. Could you check that?

    If it is not empty, the file format may be malformed so that GATK cannot recognize it.

  • firat zahid

    Thank you for your answer. No, it is not empty.

    Note: I used the same BED file both when generating the BAM file (in the GATK BaseRecalibrator step) and in PreprocessIntervals, the first step of CNV calling.

  • Samuel Lee

    Can you check again whether test1.preprocessed.interval_list contains intervals (i.e., it may be non-empty because it only contains a sequence dictionary, but does not contain any intervals)?

    Given the following lines in your log, I suspect this might be the case (and I'm guessing @SkyWarrior did, too):

    09:46:21.932 INFO FeatureManager - Using codec IntervalListCodec to read file file:///mnt/d/gatk-4.1.4.1/test1.preprocessed.interval_list
    09:46:21.940 INFO IntervalArgumentCollection - Processing 0 bp from intervals

    Is it possible that the preprocessing steps performed by PreprocessIntervals on your original BED file resulted in an empty interval list?
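
The distinction drawn above, a file that is non-empty only because it carries a sequence dictionary, can be checked by counting the lines that are not header lines. A minimal sketch in Python, assuming the Picard-style interval_list layout in which header lines start with `@` and each interval is one tab-separated record. `count_intervals` is a hypothetical helper, not part of GATK, and the sample lines are made-up data.

```python
def count_intervals(lines):
    """Count interval records, skipping '@' header lines and blank lines."""
    count = 0
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("@"):
            continue  # header / sequence dictionary line, or blank
        count += 1
    return count


# A file with a header but no interval records (made-up example data)
header_only = [
    "@HD\tVN:1.6\n",
    "@SQ\tSN:chr1\tLN:248956422\n",
]
# Same header plus one interval record: contig, start, end, strand, name
with_interval = header_only + ["chr1\t10001\t11000\t+\t.\n"]

print(count_intervals(header_only))    # 0
print(count_intervals(with_interval))  # 1
```

For a real file, the same function can be applied to an open file handle, e.g. `count_intervals(open("test1.preprocessed.interval_list"))`. A count of 0 would be consistent with the "Processing 0 bp from intervals" line in the log.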

  • Sara Franzelli

    Hello, I'm having a similar problem. Have you found a solution? I checked my preprocessed.interval_list file: it is not empty and it does contain intervals. I also have these lines in my log:

    17:13:15.144 INFO  CollectReadCounts - Initializing engine
    17:13:18.773 INFO  IntervalArgumentCollection - Processing 3117292070 bp from intervals
    17:13:18.847 INFO  CollectReadCounts - Done initializing engine
    17:13:18.850 INFO  CollectReadCounts - Shutting down engine
    [October 11, 2023 5:13:18 PM CEST] org.broadinstitute.hellbender.tools.copynumber.CollectReadCounts done. Elapsed time: 0.09 minutes.
    Runtime.totalMemory()=2076049408
    java.lang.IllegalArgumentException: The string is null: string must not be null or empty
  • Gökalp Çelik

    Hi Sara Franzelli

    Can you post your whole command line and log here as well?

    This exception message is quite generic and does not seem like a regular GATK exception to us. 

     

  • Sara Franzelli

    I think I found the problem: my CRAM files didn't have read groups (RG), so I used samtools addreplacerg to add them. I'm still testing, but it seems to work.
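
For reference, a sketch of how such a read group could be added. It only assembles the `samtools addreplacerg` argument list, passing each tag through its own `-r` option, which samtools joins with tabs into a single @RG header line. The read group ID, sample name, platform, and file names are placeholders, and `build_addreplacerg_cmd` is a hypothetical helper, not something from this thread.

```python
import shlex


def build_addreplacerg_cmd(in_path, out_path, rg_id, sample, platform="ILLUMINA"):
    """Assemble a samtools addreplacerg command with one -r option per @RG tag."""
    if not rg_id or not sample:
        raise ValueError("read group ID and sample name (SM) must be non-empty")
    cmd = ["samtools", "addreplacerg"]
    # One -r per tag, so the tags are never fused into a single value
    for tag in (f"ID:{rg_id}", f"SM:{sample}", f"PL:{platform}"):
        cmd += ["-r", tag]
    cmd += ["-o", out_path, in_path]
    return cmd


# Placeholder names; substitute the real files and sample
cmd = build_addreplacerg_cmd("input.bam", "output.bam", "rg1", "sample1")
print(shlex.join(cmd))
```

Where samtools is installed, the list can be handed to `subprocess.run(cmd, check=True)`. Running `samtools view -H` on the output then shows whether the @RG line and its SM tag made it into the header.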

  • Gökalp Çelik

    Great that you solved your problem.

    Thank you for your feedback. 

  • Peter Lönnerberg

    Hello, I get the same error and cannot work out the reason. This is the command and output:

    INFO:    Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
    INFO:    Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
    Using GATK jar /opt/conda/share/gatk4-4.6.2.0-0/gatk-package-4.6.2.0-local.jar
    Running:
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx9830M -XX:-UsePerfData -jar /opt/conda/share/gatk4-4.6.2.0-0/gatk-package-4.6.2.0-local.jar CollectReadCounts --input P20911_115_S15_L003_R1_001.bam --intervals genome.interval_list --output SLPD003.hdf5 --reference Homo_sapiens_assembly38.fasta --tmp-dir . --format HDF5 --imr OVERLAPPING_ONLY
    08:50:34.540 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/opt/conda/share/gatk4-4.6.2.0-0/gatk-package-4.6.2.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    08:50:34.867 INFO  CollectReadCounts - ------------------------------------------------------------
    08:50:34.871 INFO  CollectReadCounts - The Genome Analysis Toolkit (GATK) v4.6.2.0
    08:50:34.871 INFO  CollectReadCounts - For support and documentation go to https://software.broadinstitute.org/gatk/
    08:50:34.871 INFO  CollectReadCounts - Executing as peter@monod33.mbb.ki.se on Linux v4.18.0-553.53.1.el8_10.x86_64 amd64
    08:50:34.871 INFO  CollectReadCounts - Java runtime: OpenJDK 64-Bit Server VM v17.0.11-internal+0-adhoc..src
    08:50:34.871 INFO  CollectReadCounts - Start Date/Time: November 27, 2025 at 8:50:34 AM GMT
    08:50:34.871 INFO  CollectReadCounts - ------------------------------------------------------------
    08:50:34.871 INFO  CollectReadCounts - ------------------------------------------------------------
    08:50:34.872 INFO  CollectReadCounts - HTSJDK Version: 4.2.0
    08:50:34.872 INFO  CollectReadCounts - Picard Version: 3.4.0
    08:50:34.872 INFO  CollectReadCounts - Built for Spark Version: 3.5.0
    08:50:34.874 INFO  CollectReadCounts - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    08:50:34.874 INFO  CollectReadCounts - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    08:50:34.874 INFO  CollectReadCounts - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    08:50:34.874 INFO  CollectReadCounts - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    08:50:34.874 INFO  CollectReadCounts - Deflater: IntelDeflater
    08:50:34.874 INFO  CollectReadCounts - Inflater: IntelInflater
    08:50:34.874 INFO  CollectReadCounts - GCS max retries/reopens: 20
    08:50:34.875 INFO  CollectReadCounts - Requester pays: disabled
    08:50:34.875 INFO  CollectReadCounts - Initializing engine
    08:50:35.395 INFO  FeatureManager - Using codec IntervalListCodec to read file file:///datf/sl/users/peter/oncoanalyser/b5/3c129c1b7ef7657db931becbb05d75/genome.interval_list
    08:50:42.804 INFO  IntervalArgumentCollection - Processing 3043969085 bp from intervals
    08:50:42.901 INFO  CollectReadCounts - Done initializing engine
    08:50:42.903 INFO  CollectReadCounts - Shutting down engine
    [November 27, 2025 at 8:50:42 AM GMT] org.broadinstitute.hellbender.tools.copynumber.CollectReadCounts done. Elapsed time: 0.14 minutes.
    Runtime.totalMemory()=2080374784
    java.lang.IllegalArgumentException: The string is null: string must not be null or empty
        at org.broadinstitute.hellbender.utils.Utils.nonNull(Utils.java:643)
        at org.broadinstitute.hellbender.utils.Utils.nonEmpty(Utils.java:699)
        at org.broadinstitute.hellbender.utils.Utils.nonEmpty(Utils.java:715)
        at org.broadinstitute.hellbender.tools.copynumber.formats.metadata.SimpleSampleLocatableMetadata.<init>(SimpleSampleLocatableMetadata.java:18)
        at org.broadinstitute.hellbender.tools.copynumber.formats.metadata.MetadataUtils.fromHeader(MetadataUtils.java:46)
        at org.broadinstitute.hellbender.tools.copynumber.CollectReadCounts.onTraversalStart(CollectReadCounts.java:157)
        at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1117)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:150)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:203)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:222)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:166)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:209)
        at org.broadinstitute.hellbender.Main.main(Main.java:306)

    The interval list:

    $ head /datf/sl/users/peter/oncoanalyser/b5/3c129c1b7ef7657db931becbb05d75/genome.interval_list
    @HD    VN:1.6
    @SQ    SN:chr1    LN:248956422    M5:6aef897c3d6ff0c78aff06ac189178dd    AS:38    UR:/seq/references/Homo_sapiens_assembly38/v0/Homo_sapiens_assembly38.fasta    SP:Homo sapiens
    @SQ    SN:chr2    LN:242193529    M5:f98db672eb0993dcfdabafe2a882905c    AS:38    UR:/seq/references/Homo_sapiens_assembly38/v0/Homo_sapiens_assembly38.fasta    SP:Homo sapiens
    @SQ    SN:chr3    LN:198295559    M5:76635a41ea913a405ded820447d067b0    AS:38    UR:/seq/references/Homo_sapiens_assembly38/v0/Homo_sapiens_assembly38.fasta    SP:Homo sapiens
    @SQ    SN:chr4    LN:190214555    M5:3210fecf1eb92d5489da4346b3fddc6e    AS:38    UR:/seq/references/Homo_sapiens_assembly38/v0/Homo_sapiens_assembly38.fasta    SP:Homo sapiens
    @SQ    SN:chr5    LN:181538259    M5:a811b3dc9fe66af729dc0dddf7fa4f13    AS:38    UR:/seq/references/Homo_sapiens_assembly38/v0/Homo_sapiens_assembly38.fasta    SP:Homo sapiens
    @SQ    SN:chr6    LN:170805979    M5:5691468a67c7e7a7b5f2a3a683792c29    AS:38    UR:/seq/references/Homo_sapiens_assembly38/v0/Homo_sapiens_assembly38.fasta    SP:Homo sapiens
    ......

    I do have read groups in the BAM file (added with samtools addreplacerg):

    $ samtools view P20911_115_S15_L003_R1_001.bam | head
    A00689:301:HCYTNDSX2:3:2408:30418:36479    81    chr1    9996    0    65S86M    chr7    149581093    0    CACCTTTCGTTATAGGATATTTTAATGATTACAGTGAGAGTGTCTGGTGTTCATACGTTTGCTCTTCCGATATCCCTTACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAA    ,,,,,,,,,:,,,F,FF,F,:FFFF,FF,,,FFFF,,F,,F,,,,F,:F,::,,,FFF:F::,FFF:FF:F,:,FFF,F:FFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF    NM:i:2    MD:Z:7A4A73    MC:Z:27S75M49S    AS:i:76    XS:i:75    RG:Z:HCYTNDSX2.3 SM:SLPD003 PL:illumina LB:SLPD003L003
    A00689:301:HCYTNDSX2:3:2516:6162:3615    81    chr1    9996    15    108S43M    chr2    32916518    0    CCCCCCCCCCCCACCCCCCCCCCCAACCCTAACCCTATCCATTAACATTACGCAAACCGATACCCTTTGGCTAACCGTAAGAGTCACCGGATCTCTTACGTTTTCTCTTCCGATAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTA    ,:FFF,,:FFF,,FFFF,,FFFF,:,F:F,::F::F,,,,,F,:,,:F,F::,,,,,F:,,,,F:FF:,,,F::,,:F:F,F:F,F,,F,:,,,FF,,:,,FF,F,F:::F,,F,,F,:FF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFF    NM:i:0    MD:Z:43    MC:Z:112S39M    AS:i:43    XS:i:39    RG:Z:HCYTNDSX2.3 SM:SLPD003 PL:illumina LB:SLPD003L003
    A00689:301:HCYTNDSX2:3:2563:8748:7435    81    chr1    9996    14    114S37M    chr2    32916486    0    CCCCCTCCCCCCCCCCCCACCCCCACCCCCCCCCCCACCCCTACTCCTGCCGCTCACGGTATACCTGATCGTACTGCTAACAGTCAGAGTTACTGTAGCCCTGACGGTTTCTCTTCCGATAACCCTAACCCTAACCCTAACCCTAACCCTA    :,FFF,,FFFF,:,FFF,,,FFF,,,FFF,,,FF:,,,F:FF,,,F,,,,,,,F,FFF,,F:,F,,,:,F:,:,,FF,FFF,:FF,,,,F,FF,:,F:,,F:,FF,,,F,F,F:,,F,,F,,FFFFFFFFFFFFFFFFFFFFFFFFFFFFF    NM:i:0    MD:Z:37    MC:Z:106S45M    AS:i:37    XS:i:33    RG:Z:HCYTNDSX2.3 SM:SLPD003 PL:illumina LB:SLPD003L003

    Thanks for any help on this!

  • Samuel Lee

    Peter Lönnerberg can you check whether your BAM file has a corresponding @RG line in the header with the appropriate SM tag?
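
One way to run that check: `samtools view -H` prints only the header, and the sketch below scans that text for @RG lines and their SM tags. It assumes the standard SAM header layout of tab-separated TAG:VALUE fields. `read_group_samples` is a hypothetical helper, and the header strings are made-up example data, not taken from the file in this thread.

```python
def read_group_samples(header_text):
    """Map each @RG ID to its SM value (None when the SM tag is missing)."""
    samples = {}
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # Fields after the record type are tab-separated TAG:VALUE pairs
        fields = dict(
            field.split(":", 1) for field in line.split("\t")[1:] if ":" in field
        )
        samples[fields.get("ID")] = fields.get("SM")
    return samples


good = "@HD\tVN:1.6\n@RG\tID:rg1\tSM:sample1\tPL:illumina\n"
# Tags separated by spaces instead of tabs end up inside the ID value
bad = "@HD\tVN:1.6\n@RG\tID:rg1 SM:sample1 PL:illumina\n"

print(read_group_samples(good))  # {'rg1': 'sample1'}
print(read_group_samples(bad))   # {'rg1 SM:sample1 PL:illumina': None}
```

If no @RG line carries an SM value, that would be consistent with the "string must not be null or empty" exception, which the stack trace shows being raised while sample metadata is built from the header. The second case may be worth ruling out here: the RG:Z: values in the reads shown above appear to contain spaces, although that could also be an artifact of how the output was pasted.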


