Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

IlluminaBasecallsToSam does not demultiplex NovaSeq barcoded reads

Answered

5 comments

  • Genevieve Brandt (she/her)

    Hello myourshaw,

    You are comparing how a few different Picard tools handle this data. It looks like IlluminaBasecallsToSam and IlluminaBasecallingMetrics both have trouble detecting the barcodes. So do you think the issue lies with both of those tools, while IlluminaBarcodesMetrics doesn't have the problem?

    Also, is your suggestion that the problem is with the detection of the barcodes?

    Thank you,

    Genevieve

  • myourshaw

    My guess is that the problem starts with IlluminaBasecallsToSam, and that it lies in matching the barcodes in the BASECALLS_DIR/BARCODES_DIR against those in the LibraryParams.

  • myourshaw

    Additional information.

    I tried to demultiplex the same data with IlluminaBasecallsToFastq, which showed the same failure to demultiplex. All reads ended up in the UNKNOWN fastq files, and none in any of the barcode fastq files.

    Command:

    docker run -it -v /mnt/hdd/dnanexus/tmp/0062:/home/dnanexus myourshaw/gatk:latest
    cd /home/dnanexus
    gatk --java-options -Xmx26g \
    IlluminaBasecallsToFastq \
    --READ_STRUCTURE 151T8B9M8B151T \
    --BASECALLS_DIR /home/dnanexus/210323_A00886_0062_BH5JM3DRXY/Data/Intensities/BaseCalls \
    --LANE 1 \
    --MULTIPLEX_PARAMS /home/dnanexus/210323_A00886_0062_BH5JM3DRXY.1.MultiplexParams.txt \
    --RUN_BARCODE 210323_A00886_0062_BH5JM3DRXY \
    --FLOWCELL_BARCODE BH5JM3DRXY \
    --APPLY_EAMSS_FILTER false \
    --BARCODES_DIR /home/dnanexus/barcodes \
    --IGNORE_UNEXPECTED_BARCODES false \
    --INCLUDE_NON_PF_READS false \
    --MACHINE_NAME A00886 \
    --NUM_PROCESSORS 1 \
    ;

    Log:

    Using GATK jar /gatk/gatk-package-4.2.0.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx26g -jar /gatk/gatk-package-4.2.0.0-local.jar IlluminaBasecallsToFastq --READ_STRUCTURE 151T8B9M8B151T --BASECALLS_DIR /home/dnanexus/210323_A00886_0062_BH5JM3DRXY/Data/Intensities/BaseCalls --LANE 1 --MULTIPLEX_PARAMS /home/dnanexus/210323_A00886_0062_BH5JM3DRXY.1.MultiplexParams.txt --RUN_BARCODE 210323_A00886_0062_BH5JM3DRXY --FLOWCELL_BARCODE BH5JM3DRXY --APPLY_EAMSS_FILTER false --BARCODES_DIR /home/dnanexus/barcodes --IGNORE_UNEXPECTED_BARCODES false --INCLUDE_NON_PF_READS false --MACHINE_NAME A00886 --NUM_PROCESSORS 1
    WARNING 2021-05-12 15:16:29 IlluminaBasecallsToFastq ADAPTERS_TO_CHECK is not used
    15:16:29.833 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    [Wed May 12 15:16:29 GMT 2021] IlluminaBasecallsToFastq --BASECALLS_DIR /home/dnanexus/210323_A00886_0062_BH5JM3DRXY/Data/Intensities/BaseCalls --BARCODES_DIR /home/dnanexus/barcodes --LANE 1 --RUN_BARCODE 210323_A00886_0062_BH5JM3DRXY --MACHINE_NAME A00886 --FLOWCELL_BARCODE BH5JM3DRXY --READ_STRUCTURE 151T8B9M8B151T --MULTIPLEX_PARAMS /home/dnanexus/210323_A00886_0062_BH5JM3DRXY.1.MultiplexParams.txt --NUM_PROCESSORS 1 --APPLY_EAMSS_FILTER false --INCLUDE_NON_PF_READS false --IGNORE_UNEXPECTED_BARCODES false --FORCE_GC true --SORT true --MAX_READS_IN_RAM_PER_TILE -1 --MINIMUM_QUALITY 2 --READ_NAME_FORMAT CASAVA_1_8 --COMPRESS_OUTPUTS false --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
    May 12, 2021 3:16:30 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    [Wed May 12 15:16:30 GMT 2021] Executing as root@2096fc387ca2 on Linux 4.18.0-240.1.1.el8_3.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_242-8u242-b08-0ubuntu3~18.04-b08; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.2.0.0
    INFO 2021-05-12 15:16:30 IlluminaBasecallsToFastq READ STRUCTURE IS 151T8B9M8B151T
    INFO 2021-05-12 15:16:30 CbclReader Processing tile 2101
    INFO 2021-05-12 15:17:06 UnsortedBasecallsConverter Read 1,000,000 records. Elapsed time: 00:00:35s. Time for last 1,000,000: 17s. Last read position: */*
    INFO 2021-05-12 15:17:23 UnsortedBasecallsConverter Read 2,000,000 records. Elapsed time: 00:00:53s. Time for last 1,000,000: 17s. Last read position: */*
    INFO 2021-05-12 15:17:41 CbclReader Processing tile 2102
    INFO 2021-05-12 15:17:51 UnsortedBasecallsConverter Read 3,000,000 records. Elapsed time: 00:01:21s. Time for last 1,000,000: 28s. Last read position: */*
    INFO 2021-05-12 15:17:53 UnsortedBasecallsConverter Write 1,000,000 records. Elapsed time: 00:01:23s. Time for last 1,000,000: 12s. Last read position: */*
    INFO 2021-05-12 15:18:04 UnsortedBasecallsConverter Write 2,000,000 records. Elapsed time: 00:01:33s. Time for last 1,000,000: 10s. Last read position: */*
    INFO 2021-05-12 15:18:09 UnsortedBasecallsConverter Read 4,000,000 records. Elapsed time: 00:01:39s. Time for last 1,000,000: 17s. Last read position: */*
    INFO 2021-05-12 15:18:27 UnsortedBasecallsConverter Read 5,000,000 records. Elapsed time: 00:01:56s. Time for last 1,000,000: 17s. Last read position: */*
    ...
    INFO 2021-05-12 17:47:28 CbclReader Processing tile 2278
    INFO 2021-05-12 17:47:33 UnsortedBasecallsConverter Write 434,000,000 records. Elapsed time: 02:31:03s. Time for last 1,000,000: 37s. Last read position: */*
    INFO 2021-05-12 17:47:44 UnsortedBasecallsConverter Read 437,000,000 records. Elapsed time: 02:31:13s. Time for last 1,000,000: 26s. Last read position: */*
    INFO 2021-05-12 17:47:44 UnsortedBasecallsConverter Write 435,000,000 records. Elapsed time: 02:31:13s. Time for last 1,000,000: 10s. Last read position: */*
    INFO 2021-05-12 17:47:54 UnsortedBasecallsConverter Write 436,000,000 records. Elapsed time: 02:31:24s. Time for last 1,000,000: 10s. Last read position: */*
    INFO 2021-05-12 17:48:01 UnsortedBasecallsConverter Read 438,000,000 records. Elapsed time: 02:31:31s. Time for last 1,000,000: 17s. Last read position: */*
    INFO 2021-05-12 17:48:19 UnsortedBasecallsConverter Read 439,000,000 records. Elapsed time: 02:31:48s. Time for last 1,000,000: 17s. Last read position: */*
    INFO 2021-05-12 17:48:31 UnsortedBasecallsConverter Write 437,000,000 records. Elapsed time: 02:32:01s. Time for last 1,000,000: 36s. Last read position: */*
    INFO 2021-05-12 17:48:41 UnsortedBasecallsConverter Write 438,000,000 records. Elapsed time: 02:32:11s. Time for last 1,000,000: 9s. Last read position: */*
    INFO 2021-05-12 17:48:51 UnsortedBasecallsConverter Write 439,000,000 records. Elapsed time: 02:32:20s. Time for last 1,000,000: 9s. Last read position: */*
    [Wed May 12 17:48:55 GMT 2021] picard.illumina.IlluminaBasecallsToFastq done. Elapsed time: 152.44 minutes.
    Runtime.totalMemory()=9941024768
    Tool returned:
    0
  • myourshaw

    This problem appears to be a bug in the version of Picard that is contained in GATK 4.2.0.0.

    When will GATK be updated with this fix?

    "This bug has been fixed in Picard release https://github.com/broadinstitute/picard/releases/tag/2.25.4 - The version of gatk that you are using (4.2.0.0) was packaged with Picard https://github.com/broadinstitute/picard/releases/tag/2.25.0 in it (which has the bug)."

    See https://github.com/broadinstitute/picard/issues/1679 

  • Genevieve Brandt (she/her)

    Hi myourshaw,

    Glad you were able to find the problem! We made a pull request to incorporate the newest Picard changes into the next release of GATK: https://github.com/broadinstitute/gatk/pull/7255

    Once that pull request is merged, you will be able to get the fix by using the nightly Docker build or by building GATK yourself. We are planning to release the next version of GATK in the next few weeks, and this pull request should be included in that version as well.

    Best,

    Genevieve

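The READ_STRUCTURE used in the thread, 151T8B9M8B151T, packs five segments into one string, and it helps to see how many barcode bases a demultiplexer has to match. The following is a minimal shell sketch, assuming only the T (template), B (sample barcode), M (molecular index) and S (skip) segment types. The function names parse_read_structure and segment_total are illustrative and are not part of GATK or Picard.

```shell
#!/usr/bin/env bash
# Split a Picard-style read structure into "<length> <type>" lines.
# Assumed segment types: T = template, B = sample barcode,
# M = molecular index, S = skip.
parse_read_structure() {
  printf '%s\n' "$1" \
    | grep -oE '[0-9]+[TBMS]' \
    | sed -E 's/^([0-9]+)([TBMS])$/\1 \2/'
}

# Sum the lengths of all segments of one type (for example B).
segment_total() {
  parse_read_structure "$1" \
    | awk -v t="$2" '$2 == t { n += $1 } END { print n + 0 }'
}

rs=151T8B9M8B151T
parse_read_structure "$rs"
echo "sample barcode bases: $(segment_total "$rs" B)"
```

For this read structure the sketch lists two 151-base template reads, two 8-base sample barcode reads and one 9-base molecular index, so each sample is identified by 16 barcode bases spread over two index reads.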

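According to the thread, the fix landed in Picard 2.25.4, while GATK 4.2.0.0 bundles Picard 2.25.0. A small sketch of how such a version check could be scripted is shown here. It assumes a `sort` that supports `-V` (GNU coreutils does), and the helper name version_at_least is hypothetical. The version numbers are the ones quoted in the thread, not values read from an installed GATK.

```shell
#!/usr/bin/env bash
# Succeed when version $1 is at least version $2.
# Relies on "sort -V" (version sort), assumed to be available.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

fixed=2.25.4     # Picard release containing the fix, per the thread
bundled=2.25.0   # Picard bundled in GATK 4.2.0.0, per the thread

if version_at_least "$bundled" "$fixed"; then
  echo "bundled Picard $bundled includes the fix"
else
  echo "bundled Picard $bundled predates the fix in $fixed"
fi
```

With the two versions from the thread, the check takes the second branch, which matches the observation that GATK 4.2.0.0 still shows the demultiplexing problem.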