IlluminaBasecallsToSam does not demultiplex NovaSeq barcoded reads
If you are seeing an error, please provide (REQUIRED):
a) GATK version used: 4.2.0.0
b) Exact command used:
gatk --java-options -Xmx26g \
IlluminaBasecallsToSam \
--BASECALLS_DIR /home/dnanexus/210323_A00886_0062_BH5JM3DRXY/Data/Intensities/BaseCalls \
--LANE 1 \
--LIBRARY_PARAMS /home/dnanexus/210323_A00886_0062_BH5JM3DRXY.1.LibraryParams.txt \
--READ_STRUCTURE 151T8B9M8B151T \
--RUN_BARCODE 210323_A00886_0062_BH5JM3DRXY \
--ADAPTERS_TO_CHECK PAIRED_END \
--ADAPTERS_TO_CHECK INDEXED \
--ADAPTERS_TO_CHECK DUAL_INDEXED \
--APPLY_EAMSS_FILTER false \
--BARCODES_DIR /home/dnanexus/barcodes \
--IGNORE_UNEXPECTED_BARCODES false \
--INCLUDE_NON_PF_READS false \
--PLATFORM ILLUMINA \
--READ_GROUP_ID 210323_A00886_0062_BH5JM3DRXY.1 \
--RUN_START_DATE 2021/03/23 \
--SEQUENCING_CENTER cmoco@cuanschutz.edu \
--NUM_PROCESSORS 1 \
;
c) Entire error log:
+ gatk --java-options -Xmx26g IlluminaBasecallsToSam --BASECALLS_DIR 210323_A00886_0062_BH5JM3DRXY/Data/Intensities/BaseCalls --LANE 1 --LIBRARY_PARAMS /home/dnanexus/inputs/input8146229603807897736/210323_A00886_0062_BH5JM3DRXY.1.LibraryParams.txt --READ_STRUCTURE 151T8B9M8B151T --RUN_BARCODE 210323_A00886_0062_BH5JM3DRXY --ADAPTERS_TO_CHECK PAIRED_END --ADAPTERS_TO_CHECK INDEXED --ADAPTERS_TO_CHECK DUAL_INDEXED --APPLY_EAMSS_FILTER false --BARCODES_DIR barcodes --IGNORE_UNEXPECTED_BARCODES false --INCLUDE_NON_PF_READS false --PLATFORM ILLUMINA --READ_GROUP_ID 210323_A00886_0062_BH5JM3DRXY.1 --RUN_START_DATE 2021/03/23 --SEQUENCING_CENTER cmoco@cuanschutz.edu --NUM_PROCESSORS 1
NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
[Mon May 03 01:03:48 GMT 2021] IlluminaBasecallsToSam --BASECALLS_DIR 210323_A00886_0062_BH5JM3DRXY/Data/Intensities/BaseCalls --BARCODES_DIR barcodes --LANE 1 --RUN_BARCODE 210323_A00886_0062_BH5JM3DRXY --READ_GROUP_ID 210323_A00886_0062_BH5JM3DRXY.1 --SEQUENCING_CENTER cmoco@cuanschutz.edu --RUN_START_DATE Tue Mar 23 00:00:00 GMT 2021 --PLATFORM ILLUMINA --READ_STRUCTURE 151T8B9M8B151T --LIBRARY_PARAMS /home/dnanexus/inputs/input8146229603807897736/210323_A00886_0062_BH5JM3DRXY.1.LibraryParams.txt --ADAPTERS_TO_CHECK INDEXED --ADAPTERS_TO_CHECK DUAL_INDEXED --ADAPTERS_TO_CHECK NEXTERA_V2 --ADAPTERS_TO_CHECK FLUIDIGM --ADAPTERS_TO_CHECK PAIRED_END --ADAPTERS_TO_CHECK INDEXED --ADAPTERS_TO_CHECK DUAL_INDEXED --NUM_PROCESSORS 1 --APPLY_EAMSS_FILTER false --INCLUDE_NON_PF_READS false --IGNORE_UNEXPECTED_BARCODES false --INCLUDE_BC_IN_RG_TAG false --MAX_READS_IN_RAM_PER_TILE -1 --MINIMUM_QUALITY 2 --MOLECULAR_INDEX_TAG RX --MOLECULAR_INDEX_BASE_QUALITY_TAG QX --BARCODE_POPULATION_STRATEGY ORPHANS_ONLY --INCLUDE_BARCODE_QUALITY false --SORT true --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
May 03, 2021 1:03:48 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
Executing as root@job-G27ZFFj05V61858y1vv1xgv0 on Linux 5.4.0-1045-aws amd64; OpenJDK 64-Bit Server VM 1.8.0_242-8u242-b08-0ubuntu3~18.04-b08; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.2.0.0
INFO 2021-05-03 01:03:49 IlluminaBasecallsToSam DONE_READING STRUCTURE IS 151T8B9M8B151T
INFO 2021-05-03 01:03:50 CbclReader Processing tile 2101
INFO 2021-05-03 01:04:20 UnsortedBasecallsConverter Read 1,000,000 records. Elapsed time: 00:00:31s. Time for last 1,000,000: 23s. Last read position: */*
INFO 2021-05-03 01:04:41 UnsortedBasecallsConverter Read 2,000,000 records. Elapsed time: 00:00:52s. Time for last 1,000,000: 21s. Last read position: */*
INFO 2021-05-03 01:05:01 CbclReader Processing tile 2102
...
INFO 2021-05-03 03:59:38 UnsortedBasecallsConverter Read 437,000,000 records. Elapsed time: 02:55:49s. Time for last 1,000,000: 28s. Last read position: */*
INFO 2021-05-03 04:00:00 UnsortedBasecallsConverter Read 438,000,000 records. Elapsed time: 02:56:11s. Time for last 1,000,000: 22s. Last read position: */*
INFO 2021-05-03 04:00:22 UnsortedBasecallsConverter Read 439,000,000 records. Elapsed time: 02:56:33s. Time for last 1,000,000: 22s. Last read position: */*
INFO 2021-05-03 04:00:32 UnsortedBasecallsConverter Write 437,000,000 records. Elapsed time: 02:56:43s. Time for last 1,000,000: 65s. Last read position: */*
INFO 2021-05-03 04:00:35 UnsortedBasecallsConverter Write 438,000,000 records. Elapsed time: 02:56:46s. Time for last 1,000,000: 2s. Last read position: */*
INFO 2021-05-03 04:00:38 UnsortedBasecallsConverter Write 439,000,000 records. Elapsed time: 02:56:49s. Time for last 1,000,000: 2s. Last read position: */*
2021-05-02 22:00:41 IlluminaBasecallsToSam:body STDOUT Tool returned:
[Mon May 03 04:00:41 GMT 2021] picard.illumina.IlluminaBasecallsToSam done. Elapsed time: 176.88 minutes.
Runtime.totalMemory()=10531897344
If not an error, choose a category for your question(REQUIRED):
a)How do I (......)?
b) What does (......) mean?
c) Why do I see (no demultiplexed reads from NovaSeq but not NextSeq)?
d) Where do I find (......)?
e) Will (......) be in future releases?
IlluminaBasecallsToSam does not demultiplex NovaSeq barcoded reads. bcl2fastq successfully demultiplexes the same Illumina run directory.
The library is a somatic panel using barcodes from the IDT xGen Dual UMI Adapters.
With a smaller library sequenced on a NextSeq that contains a subset of the same adapters, IlluminaBasecallsToSam successfully demultiplexes the reads.
With NovaSeq data:
- IlluminaBarcodesMetrics successfully recognizes the barcodes and counts the expected large number of reads for each.
- IlluminaBasecallsToSam does not demultiplex the barcodes and places all reads in the UNKNOWN uBAM file.
- IlluminaBasecallingMetrics does not recognize the barcodes and counts 0 bases for each.
- Inspection of barcodes (BC tag) on reads in the UNKNOWN uBAM file shows the correct expected barcodes.
- Successful demultiplexing by bcl2fastq suggests there may be a problem with IlluminaBasecallsToSam.
- Successful demultiplexing of a NextSeq library suggests that the problem is specific to NovaSeq.
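For anyone who wants to repeat the BC-tag inspection mentioned in the list above, here is a minimal Python sketch. It assumes the UNKNOWN uBAM has first been dumped to SAM text (for example with `samtools view`, which is not part of the commands in this post); the function name `count_bc_tags` and the input file are hypothetical.

```python
import sys
from collections import Counter


def count_bc_tags(sam_lines):
    """Count BC:Z: tag values across SAM records, skipping header lines."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        # Optional tags start at the 12th column of a SAM record.
        for field in fields[11:]:
            if field.startswith("BC:Z:"):
                counts[field[5:]] += 1
                break
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python count_bc.py unknown.sam
    with open(sys.argv[1]) as handle:
        for barcode, n in count_bc_tags(handle).most_common(20):
            print(f"{n}\t{barcode}")
```

If the most frequent values are the same barcodes listed in the LibraryParams file, the reads carry the right barcodes and the failure is in the matching step rather than in barcode extraction.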
I'll be happy to send additional details if needed.
Thanks,
Michael
-
Hello myourshaw,
You are comparing a few different Picard tools and how they handle this data. It looks like IlluminaBasecallsToSam and IlluminaBasecallingMetrics both have issues detecting the barcodes. So do you think the issue is with both of those tools, while IlluminaBarcodesMetrics doesn't have the problem?
Also, is your suggestion that the problem is with the detection of the barcodes?
Thank you,
Genevieve
-
My guess is that the problem starts with IlluminaBasecallsToSam, and that it lies in matching the barcodes in the BASECALLS_DIR/BARCODES_DIR against those in the LibraryParams file.
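One way to rule out a simple input mismatch before suspecting the tool is to check that every barcode in the LibraryParams file has the length implied by the read structure. The Python sketch below is only an illustration: it assumes a tab-separated file with a header row containing `BARCODE_1`, `BARCODE_2`, ... columns and treats a value of `N` as the row for unmatched reads. The function names are hypothetical, and the check cannot detect a bug inside Picard itself.

```python
import csv
import io
import re


def barcode_lengths(read_structure):
    """Return the lengths of the barcode (B) segments in a read structure string."""
    segments = re.findall(r"(\d+)([A-Z])", read_structure)
    return [int(n) for n, kind in segments if kind == "B"]


def check_library_params(handle, read_structure):
    """Yield (line_number, column, value) for barcodes of unexpected length."""
    expected = barcode_lengths(read_structure)
    reader = csv.DictReader(handle, delimiter="\t")
    # The header is line 1, so the first data row is line 2.
    for line_number, row in enumerate(reader, start=2):
        for i, length in enumerate(expected, start=1):
            column = f"BARCODE_{i}"
            value = (row.get(column) or "").strip()
            if value.upper() == "N":
                continue  # assumed to be the row for unmatched reads
            if len(value) != length:
                yield (line_number, column, value)


# Made-up example data, not taken from the run discussed in this thread.
example = (
    "BARCODE_1\tBARCODE_2\tOUTPUT\tSAMPLE_ALIAS\tLIBRARY_NAME\n"
    "ACGTACGT\tTGCATGCA\ta.bam\ta\ta\n"
    "ACGTACG\tTGCATGCA\tb.bam\tb\tb\n"
    "N\tN\tunknown.bam\tunknown\tunknown\n"
)
for problem in check_library_params(io.StringIO(example), "151T8B9M8B151T"):
    print(problem)
```

An empty result only means the inputs are self-consistent; it says nothing about whether the tool matches them correctly.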
-
Additional information.
I tried to demultiplex the same data with IlluminaBasecallsToFastq, which had the same failure to demultiplex. All reads were in the UNKNOWN fastq files. No reads were in any of the barcode fastq files.
Command:
docker run -it -v /mnt/hdd/dnanexus/tmp/0062:/home/dnanexus myourshaw/gatk:latest
cd /home/dnanexus
gatk --java-options -Xmx26g \
IlluminaBasecallsToFastq \
--READ_STRUCTURE 151T8B9M8B151T \
--BASECALLS_DIR /home/dnanexus/210323_A00886_0062_BH5JM3DRXY/Data/Intensities/BaseCalls \
--LANE 1 \
--MULTIPLEX_PARAMS /home/dnanexus/210323_A00886_0062_BH5JM3DRXY.1.MultiplexParams.txt \
--RUN_BARCODE 210323_A00886_0062_BH5JM3DRXY \
--FLOWCELL_BARCODE BH5JM3DRXY \
--APPLY_EAMSS_FILTER false \
--BARCODES_DIR /home/dnanexus/barcodes \
--IGNORE_UNEXPECTED_BARCODES false \
--INCLUDE_NON_PF_READS false \
--MACHINE_NAME A00886 \
--NUM_PROCESSORS 1 \
;
Log:
Using GATK jar /gatk/gatk-package-4.2.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx26g -jar /gatk/gatk-package-4.2.0.0-local.jar IlluminaBasecallsToFastq --READ_STRUCTURE 151T8B9M8B151T --BASECALLS_DIR /home/dnanexus/210323_A00886_0062_BH5JM3DRXY/Data/Intensities/BaseCalls --LANE 1 --MULTIPLEX_PARAMS /home/dnanexus/210323_A00886_0062_BH5JM3DRXY.1.MultiplexParams.txt --RUN_BARCODE 210323_A00886_0062_BH5JM3DRXY --FLOWCELL_BARCODE BH5JM3DRXY --APPLY_EAMSS_FILTER false --BARCODES_DIR /home/dnanexus/barcodes --IGNORE_UNEXPECTED_BARCODES false --INCLUDE_NON_PF_READS false --MACHINE_NAME A00886 --NUM_PROCESSORS 1
WARNING 2021-05-12 15:16:29 IlluminaBasecallsToFastq ADAPTERS_TO_CHECK is not used
15:16:29.833 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
[Wed May 12 15:16:29 GMT 2021] IlluminaBasecallsToFastq --BASECALLS_DIR /home/dnanexus/210323_A00886_0062_BH5JM3DRXY/Data/Intensities/BaseCalls --BARCODES_DIR /home/dnanexus/barcodes --LANE 1 --RUN_BARCODE 210323_A00886_0062_BH5JM3DRXY --MACHINE_NAME A00886 --FLOWCELL_BARCODE BH5JM3DRXY --READ_STRUCTURE 151T8B9M8B151T --MULTIPLEX_PARAMS /home/dnanexus/210323_A00886_0062_BH5JM3DRXY.1.MultiplexParams.txt --NUM_PROCESSORS 1 --APPLY_EAMSS_FILTER false --INCLUDE_NON_PF_READS false --IGNORE_UNEXPECTED_BARCODES false --FORCE_GC true --SORT true --MAX_READS_IN_RAM_PER_TILE -1 --MINIMUM_QUALITY 2 --READ_NAME_FORMAT CASAVA_1_8 --COMPRESS_OUTPUTS false --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
May 12, 2021 3:16:30 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
[Wed May 12 15:16:30 GMT 2021] Executing as root@2096fc387ca2 on Linux 4.18.0-240.1.1.el8_3.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_242-8u242-b08-0ubuntu3~18.04-b08; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.2.0.0
INFO 2021-05-12 15:16:30 IlluminaBasecallsToFastq READ STRUCTURE IS 151T8B9M8B151T
INFO 2021-05-12 15:16:30 CbclReader Processing tile 2101
INFO 2021-05-12 15:17:06 UnsortedBasecallsConverter Read 1,000,000 records. Elapsed time: 00:00:35s. Time for last 1,000,000: 17s. Last read position: */*
INFO 2021-05-12 15:17:23 UnsortedBasecallsConverter Read 2,000,000 records. Elapsed time: 00:00:53s. Time for last 1,000,000: 17s. Last read position: */*
INFO 2021-05-12 15:17:41 CbclReader Processing tile 2102
INFO 2021-05-12 15:17:51 UnsortedBasecallsConverter Read 3,000,000 records. Elapsed time: 00:01:21s. Time for last 1,000,000: 28s. Last read position: */*
INFO 2021-05-12 15:17:53 UnsortedBasecallsConverter Write 1,000,000 records. Elapsed time: 00:01:23s. Time for last 1,000,000: 12s. Last read position: */*
INFO 2021-05-12 15:18:04 UnsortedBasecallsConverter Write 2,000,000 records. Elapsed time: 00:01:33s. Time for last 1,000,000: 10s. Last read position: */*
INFO 2021-05-12 15:18:09 UnsortedBasecallsConverter Read 4,000,000 records. Elapsed time: 00:01:39s. Time for last 1,000,000: 17s. Last read position: */*
INFO 2021-05-12 15:18:27 UnsortedBasecallsConverter Read 5,000,000 records. Elapsed time: 00:01:56s. Time for last 1,000,000: 17s. Last read position: */*
...
INFO 2021-05-12 17:47:28 CbclReader Processing tile 2278
INFO 2021-05-12 17:47:33 UnsortedBasecallsConverter Write 434,000,000 records. Elapsed time: 02:31:03s. Time for last 1,000,000: 37s. Last read position: */*
INFO 2021-05-12 17:47:44 UnsortedBasecallsConverter Read 437,000,000 records. Elapsed time: 02:31:13s. Time for last 1,000,000: 26s. Last read position: */*
INFO 2021-05-12 17:47:44 UnsortedBasecallsConverter Write 435,000,000 records. Elapsed time: 02:31:13s. Time for last 1,000,000: 10s. Last read position: */*
INFO 2021-05-12 17:47:54 UnsortedBasecallsConverter Write 436,000,000 records. Elapsed time: 02:31:24s. Time for last 1,000,000: 10s. Last read position: */*
INFO 2021-05-12 17:48:01 UnsortedBasecallsConverter Read 438,000,000 records. Elapsed time: 02:31:31s. Time for last 1,000,000: 17s. Last read position: */*
INFO 2021-05-12 17:48:19 UnsortedBasecallsConverter Read 439,000,000 records. Elapsed time: 02:31:48s. Time for last 1,000,000: 17s. Last read position: */*
INFO 2021-05-12 17:48:31 UnsortedBasecallsConverter Write 437,000,000 records. Elapsed time: 02:32:01s. Time for last 1,000,000: 36s. Last read position: */*
INFO 2021-05-12 17:48:41 UnsortedBasecallsConverter Write 438,000,000 records. Elapsed time: 02:32:11s. Time for last 1,000,000: 9s. Last read position: */*
INFO 2021-05-12 17:48:51 UnsortedBasecallsConverter Write 439,000,000 records. Elapsed time: 02:32:20s. Time for last 1,000,000: 9s. Last read position: */*
[Wed May 12 17:48:55 GMT 2021] picard.illumina.IlluminaBasecallsToFastq done. Elapsed time: 152.44 minutes.
Runtime.totalMemory()=9941024768
Tool returned:
0
-
This problem appears to be a bug in the version of Picard that is contained in GATK 4.2.0.0.
When will GATK be updated with this fix?
"This bug has been fixed in Picard release https://github.com/broadinstitute/picard/releases/tag/2.25.4 - The version of gatk that you are using (4.2.0.0) was packaged with Picard https://github.com/broadinstitute/picard/releases/tag/2.25.0 in it (which has the bug)."
-
Hi myourshaw,
Glad you were able to find the problem! We made a pull request to incorporate the newest Picard changes into the next release of GATK: https://github.com/broadinstitute/gatk/pull/7255
Once that pull request is merged, you will be able to use GATK with the nightly docker build or build GATK yourself. We are planning to release the next version of GATK in the next few weeks and this pull request should be accessible in that version as well.
Best,
Genevieve