IlluminaBasecallsToSam - barcode was not expected
Hi!
I am trying to demultiplex some Illumina data using IlluminaBasecallsToSam. These reads have an unusual architecture. Unfortunately I systematically run into the same error whatever I do (see log below):
picard.PicardException: Read records with barcode ACAGCTTAATCGCCGTACTAG, but this barcode was not expected. (Is it referenced in the parameters file?)
It looks like IlluminaBasecallsToSam struggles to make sense of the dual barcoding. It seems to always report an error based on the last barcode in the barcode sample sheet (here ACAGCTTAATCGC+CGTACTAG).
The error has been reported previously, but the solution provided there (adding an unassigned N N row to the sample sheet) doesn't work in my case.
Can someone help me solve this problem? Thank you in advance!
Java version
java -version
openjdk version "11.0.7" 2020-04-14
OpenJDK Runtime Environment (build 11.0.7+10-post-Ubuntu-2ubuntu219.10)
OpenJDK 64-Bit Server VM (build 11.0.7+10-post-Ubuntu-2ubuntu219.10, mixed mode, sharing)
Picard version: 2.23.0
Illumina folder is ok according to Picard CheckIlluminaDirectory.
Error Message:
[Wed Jun 10 20:54:47 CEST 2020] IlluminaBasecallsToSam BASECALLS_DIR=/home/smith/Desktop/200608_NB551561_0022_AHNFG3BGXF/Data/Intensities/BaseCalls BARCODES_DIR=/home/smith/Desktop/200608_NB551561_0022_AHNFG3BGXF/barcodes LANE=1 RUN_BARCODE=AHNFG3BGXF_08062020 SEQUENCING_CENTER=IOB READ_STRUCTURE=8M13B8B16T78T LIBRARY_PARAMS=/home/smith/Desktop/GITHUB/SampleSheet/LIB_08062020_LIBRARYPARAM_SAMPLESHEET.txt ADAPTERS_TO_CHECK=[INDEXED, DUAL_INDEXED, NEXTERA_V2, FLUIDIGM, INDEXED] NUM_PROCESSORS=1 INCLUDE_NON_PF_READS=false MOLECULAR_INDEX_TAG=RX TMP_DIR=[/home/smith/Desktop/tmp] PLATFORM=ILLUMINA INCLUDE_BC_IN_RG_TAG=false FORCE_GC=true APPLY_EAMSS_FILTER=true MAX_READS_IN_RAM_PER_TILE=1200000 MINIMUM_QUALITY=2 IGNORE_UNEXPECTED_BARCODES=false MOLECULAR_INDEX_BASE_QUALITY_TAG=QX BARCODE_POPULATION_STRATEGY=ORPHANS_ONLY INCLUDE_BARCODE_QUALITY=false VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Wed Jun 10 20:54:47 CEST 2020] Executing as smith on Linux 5.3.0-1020-gcp amd64; OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu219.10; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.23.0
INFO 2020-06-10 20:54:47 IlluminaBasecallsToSam DONE_READING STRUCTURE IS 8M13B8B16T78T
Exception in thread "pool-2-thread-1" ERROR 2020-06-10 20:54:48 IlluminaBasecallsConverter Failure encountered in worker thread; attempting to shut down remaining worker threads and terminate ...
java.lang.InterruptedException
at java.base/java.lang.Object.wait(Native Method)
at java.base/java.lang.Object.wait(Object.java:328)
at picard.illumina.IlluminaBasecallsConverter$TileReadAggregator.awaitWorkComplete(IlluminaBasecallsConverter.java:609)
at picard.illumina.IlluminaBasecallsConverter.doTileProcessing(IlluminaBasecallsConverter.java:234)
at picard.illumina.IlluminaBasecallsToSam.doWork(IlluminaBasecallsToSam.java:276)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
picard.PicardException: Read records with barcode ACAGCTTAATCGCCGTACTAG, but this barcode was not expected. (Is it referenced in the parameters file?)
at picard.illumina.IlluminaBasecallsConverter$TileProcessingRecord.addRecord(IlluminaBasecallsConverter.java:359)
at picard.illumina.IlluminaBasecallsConverter$TileReader.process(IlluminaBasecallsConverter.java:472)
at picard.illumina.IlluminaBasecallsConverter$TileReadAggregator$1.run(IlluminaBasecallsConverter.java:560)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
[Wed Jun 10 20:54:48 CEST 2020] picard.illumina.IlluminaBasecallsToSam done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=2147483648
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" picard.PicardException: Failure encountered in worker thread; see log for details.
at picard.illumina.IlluminaBasecallsConverter.doTileProcessing(IlluminaBasecallsConverter.java:237)
at picard.illumina.IlluminaBasecallsToSam.doWork(IlluminaBasecallsToSam.java:276)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
Exception in thread "pool-2-thread-2" picard.PicardException: IOException opening cluster binary file /home/smith/Desktop/200608_NB551561_0022_AHNFG3BGXF/Data/Intensities/BaseCalls/L001/0001.bcl.bgzf.bci
at picard.illumina.parser.readers.MMapBackedIteratorFactory.getBuffer(MMapBackedIteratorFactory.java:119)
at picard.illumina.parser.readers.MMapBackedIteratorFactory.getLongIterator(MMapBackedIteratorFactory.java:82)
at picard.illumina.parser.readers.BclIndexReader.<init>(BclIndexReader.java:47)
at picard.illumina.parser.readers.BclReader.seek(BclReader.java:267)
at picard.illumina.parser.MultiTileBclParser.makeReader(MultiTileBclParser.java:60)
at picard.illumina.parser.MultiTileBclParser.access$000(MultiTileBclParser.java:38)
at picard.illumina.parser.MultiTileBclParser$MultiTileBclDataCycleFileParser.<init>(MultiTileBclParser.java:134)
at picard.illumina.parser.MultiTileBclParser.makeCycleFileParser(MultiTileBclParser.java:77)
at picard.illumina.parser.MultiTileBclParser.makeCycleFileParser(MultiTileBclParser.java:71)
at picard.illumina.parser.PerTileCycleParser.seekToTile(PerTileCycleParser.java:133)
at picard.illumina.parser.MultiTileBclParser.seekToTile(MultiTileBclParser.java:38)
at picard.illumina.parser.MultiTileBclParser.initialize(MultiTileBclParser.java:53)
at picard.illumina.parser.MultiTileBclParser.<init>(MultiTileBclParser.java:47)
at picard.illumina.parser.IlluminaDataProviderFactory.makeParser(IlluminaDataProviderFactory.java:436)
at picard.illumina.parser.IlluminaDataProviderFactory.makeDataProvider(IlluminaDataProviderFactory.java:292)
at picard.illumina.IlluminaBasecallsConverter$TileReader.process(IlluminaBasecallsConverter.java:463)
at picard.illumina.IlluminaBasecallsConverter$TileReadAggregator$1.run(IlluminaBasecallsConverter.java:560)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.nio.channels.ClosedByInterruptException
at java.base/java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:199)
at java.base/sun.nio.ch.FileChannelImpl.endBlocking(FileChannelImpl.java:162)
at java.base/sun.nio.ch.FileChannelImpl.size(FileChannelImpl.java:388)
at picard.illumina.parser.readers.MMapBackedIteratorFactory.getBuffer(MMapBackedIteratorFactory.java:113)
... 19 more
Code:
ExtractIlluminaBarcodes
java -Xmx8g -jar $PICARD ExtractIlluminaBarcodes \
BASECALLS_DIR=$BASECALLDIR \
BARCODE_FILE=$BARCODE_SAMPLESHEET \
READ_STRUCTURE=8M13B8B16T78T \
MAX_MISMATCHES=3 \
NUM_PROCESSORS=16 \
LANE=1 \
OUTPUT_DIR=$RAWDIR/barcodes/ \
METRICS_FILE=$RAWDIR/barcode_metrics.txt
where BARCODE_SAMPLESHEET refers to:
barcode_name library_name barcode_sequence_1 barcode_sequence_2
A_001 sampleA_1 AAGTGATTAGCAA TAAGGCGA
A_002 sampleA_2 AGAATCCCCCTAA TAAGGCGA
A_003 sampleA_3 ACCTGGGAAACTA TAAGGCGA
A_004 sampleA_4 ATACCTCCCAGGA TAAGGCGA
A_005 sampleA_5 AATTTGTGGTATA TAAGGCGA
A_006 sampleA_6 ACCCGAGAGATCA TAAGGCGA
A_007 sampleA_7 AGAGTATAGGGTA TAAGGCGA
A_008 sampleA_8 ATCTTAATTGAGA TAAGGCGA
A_033 sampleA_33 TCAGCTTAATCGC TAAGGCGA
B_001 sampleB_1 AAGTGATTAGCAA CGTACTAG
B_002 sampleB_2 AGAATCCCCCTAA CGTACTAG
B_003 sampleB_3 ACCTGGGAAACTA CGTACTAG
B_004 sampleB_4 ATACCTCCCAGGA CGTACTAG
B_005 sampleB_5 AATTTGTGGTATA CGTACTAG
B_006 sampleB_6 ACCCGAGAGATCA CGTACTAG
B_007 sampleB_7 AGAGTATAGGGTA CGTACTAG
B_008 sampleB_8 ATCTTAATTGAGA CGTACTAG
B_033 sampleB_33 ACAGCTTAATCGC CGTACTAG
Giving results that make sense (metrics file):
## METRICS CLASS picard.illumina.ExtractIlluminaBarcodes$BarcodeMetric
BARCODE BARCODE_WITHOUT_DELIMITER BARCODE_NAME LIBRARY_NAME READS PF_READS PERFECT_MATCHES PF_PERFECT_MATCHES ONE_MISMATCH_MATCHES PF_ONE_MISMATCH_MATCHES PCT_MATCHES RATIO_THIS_BARCODE_TO_BEST_BARCODE_PCT PF_PCT_MATCHES PF_RATIO_THIS_BARCODE_TO_BEST_BARCODE_PCT PF_NORMALIZED_MATCHES
AAGTGATTAGCAA-TAAGGCGA AAGTGATTAGCAATAAGGCGA A_001 sampleA_1 8962093 8962093 4995594 4995594 2176873 2176873 0.056937 0.794616 0.06801 0.794616 1.287568
AGAATCCCCCTAA-TAAGGCGA AGAATCCCCCTAATAAGGCGA A_002 sampleA_2 11278526 11278526 5728142 5728142 2674654 2674654 0.071653 1 0.085588 1 1.620365
ACCTGGGAAACTA-TAAGGCGA ACCTGGGAAACTATAAGGCGA A_003 sampleA_3 8239697 8239697 1699871 1699871 3197162 3197162 0.052347 0.730565 0.062528 0.730565 1.183782
ATACCTCCCAGGA-TAAGGCGA ATACCTCCCAGGATAAGGCGA A_004 sampleA_4 9201382 9201382 3129099 3129099 2384232 2384232 0.058457 0.815832 0.069826 0.815832 1.321946
AATTTGTGGTATA-TAAGGCGA AATTTGTGGTATATAAGGCGA A_005 sampleA_5 10593798 10593798 5810101 5810101 2735450 2735450 0.067303 0.939289 0.080392 0.939289 1.521992
ACCCGAGAGATCA-TAAGGCGA ACCCGAGAGATCATAAGGCGA A_006 sampleA_6 8814476 8814476 1532248 1532248 4126522 4126522 0.055999 0.781527 0.06689 0.781527 1.26636
AGAGTATAGGGTA-TAAGGCGA AGAGTATAGGGTATAAGGCGA A_007 sampleA_7 3639243 3639243 1855789 1855789 1164088 1164088 0.02312 0.32267 0.027617 0.32267 0.522843
ATCTTAATTGAGA-TAAGGCGA ATCTTAATTGAGATAAGGCGA A_008 sampleA_8 4336915 4336915 2180522 2180522 1062017 1062017 0.027553 0.384529 0.032911 0.384529 0.623077
TCAGCTTAATCGC-TAAGGCGA TCAGCTTAATCGCTAAGGCGA A_033 sampleA_33 1238 1238 123 123 155 155 0.000008 0.00011 0.000009 0.00011 0.000178
AAGTGATTAGCAA-CGTACTAG AAGTGATTAGCAACGTACTAG B_001 sampleB_1 6380036 6380036 3562844 3562844 1512472 1512472 0.040533 0.56568 0.048416 0.56568 0.916608
AGAATCCCCCTAA-CGTACTAG AGAATCCCCCTAACGTACTAG B_002 sampleB_2 10055160 10055160 5141138 5141138 2355990 2355990 0.063881 0.891531 0.076305 0.891531 1.444607
ACCTGGGAAACTA-CGTACTAG ACCTGGGAAACTACGTACTAG B_003 sampleB_3 6529373 6529373 1330831 1330831 2489997 2489997 0.041481 0.578921 0.049549 0.578921 0.938063
ATACCTCCCAGGA-CGTACTAG ATACCTCCCAGGACGTACTAG B_004 sampleB_4 6607731 6607731 2261297 2261297 1716711 1716711 0.041979 0.585868 0.050143 0.585868 0.949321
AATTTGTGGTATA-CGTACTAG AATTTGTGGTATACGTACTAG B_005 sampleB_5 8886022 8886022 4890866 4890866 2230611 2230611 0.056453 0.787871 0.067433 0.787871 1.276639
ACCCGAGAGATCA-CGTACTAG ACCCGAGAGATCACGTACTAG B_006 sampleB_6 4125193 4125193 704612 704612 1944509 1944509 0.026208 0.365756 0.031304 0.365756 0.592659
AGAGTATAGGGTA-CGTACTAG AGAGTATAGGGTACGTACTAG B_007 sampleB_7 4180357 4180357 2163069 2163069 1297418 1297418 0.026558 0.370647 0.031723 0.370647 0.600584
ATCTTAATTGAGA-CGTACTAG ATCTTAATTGAGACGTACTAG B_008 sampleB_8 6716766 6716766 3398337 3398337 1631045 1631045 0.042672 0.595536 0.050971 0.595536 0.964986
ACAGCTTAATCGC-CGTACTAG ACAGCTTAATCGCCGTACTAG B_033 sampleB_33 6740693 6740693 1026 1026 4829987 4829987 0.042824 0.597657 0.051152 0.597657 0.968423
NNNNNNNNNNNNN-NNNNNNNN NNNNNNNNNNNNNNNNNNNNN 32116068 6487769 0 0 0 0 0.204035 2.847541 0.049233 0.575232 0
and a subset of barcodes/s_1_11101_barcode.txt:
AGACAACCCA........... N atacctcccaggataaggcga 3 3
AATTTGTGGTATATAAGGCGA Y AATTTGTGGTATATAAGGCGA 0 2
CCCTGGGAAACTATAAGGCGA Y ACCTGGGAAACTATAAGGCGA 1 3
CCCCGAGAGATCATAAGGCGA Y ACCCGAGAGATCATAAGGCGA 1 3
GGAATCCCCCTAGCGTACTAG Y AGAATCCCCCTAACGTACTAG 2 4
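The barcode in the error message is the two index reads joined without a delimiter, as in the first and third columns above. As a minimal sketch (not Picard's own code), the snippet below splits such a string back into its parts using the READ_STRUCTURE from the commands in this post. The helper names `barcode_lengths` and `split_barcode` are hypothetical, and the parser assumes a read structure made only of `<number><letter>` segments.

```python
import re


def barcode_lengths(read_structure):
    """Return the lengths of the B (sample barcode) segments, in order.

    Assumes the read structure is a plain run of <number><letter> segments,
    e.g. 8M13B8B16T78T.
    """
    segments = re.findall(r"(\d+)([A-Z])", read_structure)
    return [int(length) for length, kind in segments if kind == "B"]


def split_barcode(concatenated, read_structure):
    """Split a delimiter-free barcode into one piece per B segment."""
    lengths = barcode_lengths(read_structure)
    if sum(lengths) != len(concatenated):
        raise ValueError(
            f"barcode has {len(concatenated)} bases, "
            f"read structure expects {sum(lengths)}"
        )
    parts, start = [], 0
    for length in lengths:
        parts.append(concatenated[start:start + length])
        start += length
    return parts


# Barcode taken from the error message in this post.
parts = split_barcode("ACAGCTTAATCGCCGTACTAG", "8M13B8B16T78T")
print("+".join(parts))  # ACAGCTTAATCGC+CGTACTAG
```

Splitting the string this way makes it easier to look up each half in the BARCODE_1 and BARCODE_2 columns of the sheets.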
IlluminaBasecallsToSam
java -Xmx10G -jar $PICARD IlluminaBasecallsToSam \
NUM_PROCESSORS=1 \
BASECALLS_DIR=$BASECALLDIR \
BARCODES_DIR=$RAWDIR/barcodes/ \
LANE=1 \
READ_STRUCTURE=8M13B8B16T78T \
RUN_BARCODE=AHNFG3BGXF \
LIBRARY_PARAMS=$LIBRARYPARAM_SAMPLESHEET \
TMP_DIR=/home/smith/Desktop/tmp/ \
MOLECULAR_INDEX_TAG=RX \
ADAPTERS_TO_CHECK=INDEXED \
INCLUDE_NON_PF_READS=false \
SEQUENCING_CENTER=Test
where LIBRARYPARAM_SAMPLESHEET refers to
OUTPUT SAMPLE_ALIAS LIBRARY_NAME BARCODE_1 BARCODE_2
sampleA_1.unmapped.bam sampleA_1 sampleA AAGTGATTAGCAA TAAGGCGA
sampleA_2.unmapped.bam sampleA_2 sampleA AGAATCCCCCTAA TAAGGCGA
sampleA_3.unmapped.bam sampleA_3 sampleA ACCTGGGAAACTA TAAGGCGA
sampleA_4.unmapped.bam sampleA_4 sampleA ATACCTCCCAGGA TAAGGCGA
sampleA_5.unmapped.bam sampleA_5 sampleA AATTTGTGGTATA TAAGGCGA
sampleA_6.unmapped.bam sampleA_6 sampleA ACCCGAGAGATCA TAAGGCGA
sampleA_7.unmapped.bam sampleA_7 sampleA AGAGTATAGGGTA TAAGGCGA
sampleA_8.unmapped.bam sampleA_8 sampleA ATCTTAATTGAGA TAAGGCGA
sampleA_33.unmapped.bam sampleA_33 sampleA TCAGCTTAATCGC TAAGGCGA
sampleB_1.unmapped.bam sampleB_1 sampleB AAGTGATTAGCAA CGTACTAG
sampleB_2.unmapped.bam sampleB_2 sampleB AGAATCCCCCTAA CGTACTAG
sampleB_3.unmapped.bam sampleB_3 sampleB ACCTGGGAAACTA CGTACTAG
sampleB_4.unmapped.bam sampleB_4 sampleB ATACCTCCCAGGA CGTACTAG
sampleB_5.unmapped.bam sampleB_5 sampleB AATTTGTGGTATA CGTACTAG
sampleB_6.unmapped.bam sampleB_6 sampleB ACCCGAGAGATCA CGTACTAG
sampleB_7.unmapped.bam sampleB_7 sampleB AGAGTATAGGGTA CGTACTAG
sampleB_8.unmapped.bam sampleB_8 sampleB ATCTTAATTGAGA CGTACTAG
sampleB_33.unmapped.bam sampleB_33 sampleB TCAGCTTAATCGC CGTACTAG
unassiged.unmapped.bam unassiged unassiged_lib N N
-
Official comment
Hi Vincent Hahaut, thanks for your question.
It looks like the barcode from the error message may be missing from LIBRARYPARAM_SAMPLESHEET.
The LIBRARYPARAM_SAMPLESHEET needs to list every barcode associated with the reads in the input. This is required so that the tool can assign the appropriate sample names to the reads based on the barcodes in the LIBRARYPARAM_SAMPLESHEET.
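One way to check this is to compare the barcode sets in the two sheets. The sketch below is a hypothetical helper, not part of Picard. It embeds a few rows copied from the sheets in this post, and it assumes whitespace-delimited fields with no spaces inside a field; a real tab-delimited file would need to be read from disk instead. The N N row is excluded on the assumption that it is only a catch-all for unassigned reads.

```python
def barcode_set(sheet_text, barcode_columns):
    """Return the set of concatenated barcodes found in a sheet.

    sheet_text: header line plus rows, whitespace-delimited (assumption).
    barcode_columns: header names of the barcode columns, in read order.
    """
    rows = [line.split() for line in sheet_text.strip().splitlines()]
    header, body = rows[0], rows[1:]
    indices = [header.index(name) for name in barcode_columns]
    return {"".join(row[i] for i in indices) for row in body}


# Rows copied from BARCODE_SAMPLESHEET as posted (subset).
barcode_sheet = """\
barcode_name library_name barcode_sequence_1 barcode_sequence_2
B_008 sampleB_8 ATCTTAATTGAGA CGTACTAG
B_033 sampleB_33 ACAGCTTAATCGC CGTACTAG
"""

# Rows copied from LIBRARYPARAM_SAMPLESHEET as posted (subset).
library_params = """\
OUTPUT SAMPLE_ALIAS LIBRARY_NAME BARCODE_1 BARCODE_2
sampleB_8.unmapped.bam sampleB_8 sampleB ATCTTAATTGAGA CGTACTAG
sampleB_33.unmapped.bam sampleB_33 sampleB TCAGCTTAATCGC CGTACTAG
unassiged.unmapped.bam unassiged unassiged_lib N N
"""

extracted = barcode_set(barcode_sheet, ["barcode_sequence_1", "barcode_sequence_2"])
# Drop the N N catch-all row before comparing.
expected = barcode_set(library_params, ["BARCODE_1", "BARCODE_2"]) - {"NN"}

missing_from_params = sorted(extracted - expected)
only_in_params = sorted(expected - extracted)
print("In barcode file but not in library params:", missing_from_params)
print("In library params but not in barcode file:", only_in_params)
```

Run on the rows as posted, this flags ACAGCTTAATCGC+CGTACTAG as present in the barcode file but absent from the library parameters, where the sampleB_33 row reads TCAGCTTAATCGC instead. Which of the two spellings is the intended one has to be checked against the library design.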