[Repost] ReadGroup missing when Applying BQSR
Dear all,
I am currently using GATK 4.1.9.0 to run BQSR on my sorted and deduplicated BAM files. I had no problems with my previous two files, but this run fails with the error below:
Command used:
./gatk ApplyBQSR -R cf3.1toplevel.fa -I Kuu_sorted_dedup.bam -bqsr recal_kuu.table -O recal_reads_kuu.bam
Error log:
$ ./gatk ApplyBQSR -R cf3.1toplevel.fa -I Kuu_sorted_dedup.bam -bqsr recal_kuu.table -O recal_reads_kuu.bam
Using GATK jar /mnt/d/Docker/WGS/gatk4/gatk-package-4.1.9.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /mnt/d/Docker/WGS/gatk4/gatk-package-4.1.9.0-local.jar ApplyBQSR -R cf3.1toplevel.fa -I Kuu_sorted_dedup.bam -bqsr recal_kuu.table -O recal_reads_kuu.bam
12:30:09.851 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/mnt/d/Docker/WGS/gatk4/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jan 01, 2021 12:30:09 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
12:30:09.993 INFO ApplyBQSR - ------------------------------------------------------------
12:30:09.993 INFO ApplyBQSR - The Genome Analysis Toolkit (GATK) v4.1.9.0
12:30:09.993 INFO ApplyBQSR - For support and documentation go to https://software.broadinstitute.org/gatk/
12:30:09.993 INFO ApplyBQSR - Executing as naika@DESKTOP-TMKR2LT on Linux v4.19.128-microsoft-standard amd64
12:30:09.993 INFO ApplyBQSR - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
12:30:09.993 INFO ApplyBQSR - Start Date/Time: January 1, 2021 12:30:09 PM JST
12:30:09.993 INFO ApplyBQSR - ------------------------------------------------------------
12:30:09.993 INFO ApplyBQSR - ------------------------------------------------------------
12:30:09.994 INFO ApplyBQSR - HTSJDK Version: 2.23.0
12:30:09.994 INFO ApplyBQSR - Picard Version: 2.23.3
12:30:09.994 INFO ApplyBQSR - HTSJDK Defaults.COMPRESSION_LEVEL : 2
12:30:09.994 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
12:30:09.994 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
12:30:09.994 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
12:30:09.994 INFO ApplyBQSR - Deflater: IntelDeflater
12:30:09.994 INFO ApplyBQSR - Inflater: IntelInflater
12:30:09.994 INFO ApplyBQSR - GCS max retries/reopens: 20
12:30:09.994 INFO ApplyBQSR - Requester pays: disabled
12:30:09.994 INFO ApplyBQSR - Initializing engine
12:30:10.354 INFO ApplyBQSR - Done initializing engine
12:30:10.407 INFO ProgressMeter - Starting traversal
12:30:10.407 INFO ProgressMeter - Current Locus Elapsed Minutes Reads Processed Reads/Minute
12:30:10.619 INFO ApplyBQSR - Shutting down engine
[January 1, 2021 12:30:10 PM JST] org.broadinstitute.hellbender.tools.walkers.bqsr.ApplyBQSR done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=982515712
java.lang.IllegalStateException: The covariates table is missing ReadGroup H753JDSXY.2 in RecalTable0
at org.broadinstitute.hellbender.utils.Utils.validate(Utils.java:750)
at org.broadinstitute.hellbender.utils.recalibration.covariates.ReadGroupCovariate.keyForReadGroup(ReadGroupCovariate.java:81)
at org.broadinstitute.hellbender.utils.recalibration.covariates.ReadGroupCovariate.recordValues(ReadGroupCovariate.java:53)
at org.broadinstitute.hellbender.utils.recalibration.covariates.StandardCovariateList.recordAllValuesInStorage(StandardCovariateList.java:133)
at org.broadinstitute.hellbender.utils.recalibration.RecalUtils.computeCovariates(RecalUtils.java:546)
at org.broadinstitute.hellbender.utils.recalibration.RecalUtils.computeCovariates(RecalUtils.java:527)
at org.broadinstitute.hellbender.transformers.BQSRReadTransformer.apply(BQSRReadTransformer.java:145)
at org.broadinstitute.hellbender.transformers.BQSRReadTransformer.apply(BQSRReadTransformer.java:27)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
at org.broadinstitute.hellbender.engine.ReadWalker.traverse(ReadWalker.java:94)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1049)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
To be safe, I will also attach the command and the completion log from when I first created the recalibration table:
Command:
./gatk BaseRecalibrator -R cf3.1toplevel.fa -I Kuu_sorted_dedup.bam --known-sites bqsr_kuu_snps.vcf --known-sites bqsr_kuu_indels.vcf -O recal_kuu.table
Completion log:
12:09:57.665 INFO ProgressMeter - AAEX03025150.1:3340 121.9 524369000 4301180.3
12:10:12.594 INFO BaseRecalibrator - 43702519 read(s) filtered by: MappingQualityNotZeroReadFilter
0 read(s) filtered by: MappingQualityAvailableReadFilter
0 read(s) filtered by: MappedReadFilter
0 read(s) filtered by: NotSecondaryAlignmentReadFilter
120929799 read(s) filtered by: NotDuplicateReadFilter
0 read(s) filtered by: PassesVendorQualityCheckReadFilter
0 read(s) filtered by: WellformedReadFilter
164632318 total reads filtered
12:10:12.595 INFO ProgressMeter - AAEX03026059.1:1598 122.2 524603691 4294340.3
12:10:12.595 INFO ProgressMeter - Traversal complete. Processed 524603691 total reads in 122.2 minutes.
12:10:12.614 INFO BaseRecalibrator - Calculating quantized quality scores...
12:10:12.621 INFO BaseRecalibrator - Writing recalibration report...
12:10:13.454 INFO BaseRecalibrator - ...done!
12:10:13.454 INFO BaseRecalibrator - BaseRecalibrator was able to recalibrate 524603691 reads
12:10:13.455 INFO BaseRecalibrator - Shutting down engine
[January 1, 2021 12:10:13 PM JST] org.broadinstitute.hellbender.tools.walkers.bqsr.BaseRecalibrator done. Elapsed time: 122.19 minutes.
Runtime.totalMemory()=693108736
Tool returned:
SUCCESS
I have had no problems with my previous ApplyBQSR runs using the same workflow, parameters, and command, but somehow this run fails. I have also realized that I hit this problem with all three WGS datasets from the Illumina platform, but not with the data from the MGI platform. The workflow (alignment, samtools fixmate, samtools sort, and Picard MarkDuplicates) was identical, and when I checked whether my BAM files were corrupted using samtools, the results came back clean.
Any ideas what might be wrong, given that 524603691 reads were actually recalibrated?
Thanks.
-
Hi Robinn Teoh,
Would you be able to run ValidateSamFile on your input BAM file?
Thanks,
Genevieve
-
Dear Genevieve Brandt ,
I have only run it on one of the files so far; it completed fine and the log reported that no errors were found.
I will run it on the other two files and let you know whether they are okay too.
Thanks.
Robinn
-
Dear Genevieve Brandt ,
I have run ValidateSamFile on all three BAM files, and all of them came back with no errors found after about 25 minutes in Picard.
Any idea what else I should do?
Best wishes,
Robinn
-
Hi Robinn,
Thanks for running that to confirm there were no issues with the input! I looked this over with my team and we think that the issue might be that an entire read group is getting filtered out by the MappingQualityNotZeroReadFilter (43702519 read(s) filtered by: MappingQualityNotZeroReadFilter). BaseRecalibrator has stricter read filters than ApplyBQSR and we have seen this occur before but don't have a fix yet.
You can use -RF MappingQualityNotZeroReadFilter with BaseRecalibrator to confirm that this is causing the issue. If ApplyBQSR is then successful, you know what is causing the problem and can fix it from there. Either continue with BQSR after disabling the read filter (those reads will get filtered out later in the GATK pipeline), or remove the read group before BQSR so that ApplyBQSR does not hit the issue.
Genevieve
-
Dear Genevieve,
Thank you for the heads-up! I had come across that post, and I could not find any fix for the problem there. Most of those users had 0 recalibrated reads rather than 524603691 recalibrated reads in their results log, so I thought my problem was different from what they encountered.
So, to confirm: I should apply -RF MappingQualityNotZeroReadFilter to BaseRecalibrator, get the recalibration table, and then run ApplyBQSR with it, right? I will get on to that and let you know if it works.
Robinn
-
Okay, so I tried using -RF MappingQualityNotZeroReadFilter with BaseRecalibrator with the following command:
./gatk BaseRecalibrator -R cf3.1toplevel.fa -I Kuu_sorted_dedup.bam --known-sites bqsr_kuu_snps.vcf --known-sites bqsr_kuu_indels.vcf -RF MappingQualityNotZeroReadFilter -O recal_kuu.table
This was the progress meter:
14:05:06.118 INFO BaseRecalibrator - 43702519 read(s) filtered by: MappingQualityNotZeroReadFilter
0 read(s) filtered by: MappingQualityAvailableReadFilter
0 read(s) filtered by: MappedReadFilter
0 read(s) filtered by: NotSecondaryAlignmentReadFilter
120929799 read(s) filtered by: NotDuplicateReadFilter
0 read(s) filtered by: PassesVendorQualityCheckReadFilter
0 read(s) filtered by: WellformedReadFilter
164632318 total reads filtered
14:05:06.118 INFO ProgressMeter - AAEX03026059.1:1598 129.1 524603691 4062622.7
14:05:06.119 INFO ProgressMeter - Traversal complete. Processed 524603691 total reads in 129.1 minutes.
14:05:06.139 INFO BaseRecalibrator - Calculating quantized quality scores...
14:05:06.146 INFO BaseRecalibrator - Writing recalibration report...
14:05:06.960 INFO BaseRecalibrator - ...done!
14:05:06.960 INFO BaseRecalibrator - BaseRecalibrator was able to recalibrate 524603691 reads
14:05:06.960 INFO BaseRecalibrator - Shutting down engine
[January 12, 2021 2:05:06 PM JST] org.broadinstitute.hellbender.tools.walkers.bqsr.BaseRecalibrator done. Elapsed time: 129.16 minutes.
Runtime.totalMemory()=409993216
Tool returned:
SUCCESS
When I ran ApplyBQSR, nothing seemed to be different; the same error came back:
./gatk ApplyBQSR -R cf3.1toplevel.fa -I Kuu_sorted_dedup.bam -bqsr recal_kuu.table -O recal_reads_kuu.bam
Using GATK jar /mnt/d/docker/WGS/gatk4/gatk-package-4.1.9.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /mnt/d/docker/WGS/gatk4/gatk-package-4.1.9.0-local.jar ApplyBQSR -R cf3.1toplevel.fa -I Kuu_sorted_dedup.bam -bqsr recal_kuu.table -O recal_reads_kuu.bam
14:18:40.691 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/mnt/d/docker/WGS/gatk4/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jan 12, 2021 2:18:40 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
14:18:40.835 INFO ApplyBQSR - ------------------------------------------------------------
14:18:40.835 INFO ApplyBQSR - The Genome Analysis Toolkit (GATK) v4.1.9.0
14:18:40.835 INFO ApplyBQSR - For support and documentation go to https://software.broadinstitute.org/gatk/
14:18:40.835 INFO ApplyBQSR - Executing as naika@DESKTOP-TMKR2LT.localdomain on Linux v4.19.128-microsoft-standard amd64
14:18:40.835 INFO ApplyBQSR - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
14:18:40.835 INFO ApplyBQSR - Start Date/Time: January 12, 2021 2:18:40 PM JST
14:18:40.835 INFO ApplyBQSR - ------------------------------------------------------------
14:18:40.835 INFO ApplyBQSR - ------------------------------------------------------------
14:18:40.836 INFO ApplyBQSR - HTSJDK Version: 2.23.0
14:18:40.836 INFO ApplyBQSR - Picard Version: 2.23.3
14:18:40.836 INFO ApplyBQSR - HTSJDK Defaults.COMPRESSION_LEVEL : 2
14:18:40.836 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
14:18:40.836 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
14:18:40.836 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
14:18:40.836 INFO ApplyBQSR - Deflater: IntelDeflater
14:18:40.836 INFO ApplyBQSR - Inflater: IntelInflater
14:18:40.836 INFO ApplyBQSR - GCS max retries/reopens: 20
14:18:40.836 INFO ApplyBQSR - Requester pays: disabled
14:18:40.836 INFO ApplyBQSR - Initializing engine
14:18:41.212 INFO ApplyBQSR - Done initializing engine
14:18:41.249 INFO ProgressMeter - Starting traversal
14:18:41.250 INFO ProgressMeter - Current Locus Elapsed Minutes Reads Processed Reads/Minute
14:18:41.468 INFO ApplyBQSR - Shutting down engine
[January 12, 2021 2:18:41 PM JST] org.broadinstitute.hellbender.tools.walkers.bqsr.ApplyBQSR done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=983564288
java.lang.IllegalStateException: The covariates table is missing ReadGroup H753JDSXY.2 in RecalTable0
at org.broadinstitute.hellbender.utils.Utils.validate(Utils.java:750)
at org.broadinstitute.hellbender.utils.recalibration.covariates.ReadGroupCovariate.keyForReadGroup(ReadGroupCovariate.java:81)
at org.broadinstitute.hellbender.utils.recalibration.covariates.ReadGroupCovariate.recordValues(ReadGroupCovariate.java:53)
at org.broadinstitute.hellbender.utils.recalibration.covariates.StandardCovariateList.recordAllValuesInStorage(StandardCovariateList.java:133)
at org.broadinstitute.hellbender.utils.recalibration.RecalUtils.computeCovariates(RecalUtils.java:546)
at org.broadinstitute.hellbender.utils.recalibration.RecalUtils.computeCovariates(RecalUtils.java:527)
at org.broadinstitute.hellbender.transformers.BQSRReadTransformer.apply(BQSRReadTransformer.java:145)
at org.broadinstitute.hellbender.transformers.BQSRReadTransformer.apply(BQSRReadTransformer.java:27)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
at org.broadinstitute.hellbender.engine.ReadWalker.traverse(ReadWalker.java:94)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1049)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
And when you say disable the read filter, should I actually be applying -DF MappingQualityNotZeroReadFilter instead? I am a little confused about how -RF would help me solve this issue.
Robinn
-
Robinn Teoh you are right! I made a mistake; to disable the read filter you would use -DF. Sorry about that!
-
Dear Genevieve,
I have tried disabling the filter in question and then applying BQSR, but the same error came back. This time it was the MappedReadFilter that filtered out reads, as shown in the log below.
The command I used for BaseRecalibrator was:
./gatk BaseRecalibrator -R cf3.1toplevel.fa -I Kuu_sorted_dedup.bam --known-sites bqsr_kuu_snps.vcf --known-sites bqsr_kuu_indels.vcf -DF MappingQualityNotZeroReadFilter -O recal_kuu.table
The log for the BQSR table was:
11:25:56.578 INFO BaseRecalibrator - 0 read(s) filtered by: MappingQualityAvailableReadFilter
3635899 read(s) filtered by: MappedReadFilter
0 read(s) filtered by: NotSecondaryAlignmentReadFilter
132251119 read(s) filtered by: NotDuplicateReadFilter
0 read(s) filtered by: PassesVendorQualityCheckReadFilter
0 read(s) filtered by: WellformedReadFilter
135887018 total reads filtered
11:25:56.578 INFO ProgressMeter - AAEX03026067.1:1145 126.9 553348991 4362157.5
11:25:56.578 INFO ProgressMeter - Traversal complete. Processed 553348991 total reads in 126.9 minutes.
11:25:56.597 INFO BaseRecalibrator - Calculating quantized quality scores...
11:25:56.604 INFO BaseRecalibrator - Writing recalibration report...
11:25:57.421 INFO BaseRecalibrator - ...done!
11:25:57.421 INFO BaseRecalibrator - BaseRecalibrator was able to recalibrate 553348991 reads
11:25:57.421 INFO BaseRecalibrator - Shutting down engine
[January 13, 2021 11:25:57 AM JST] org.broadinstitute.hellbender.tools.walkers.bqsr.BaseRecalibrator done. Elapsed time: 126.88 minutes.
Runtime.totalMemory()=727711744
Tool returned:
SUCCESS
And the log for ApplyBQSR was:
./gatk ApplyBQSR -R cf3.1toplevel.fa -I Kuu_sorted_dedup.bam -bqsr recal_kuu.table -O recal_reads_kuu.bam
Using GATK jar /mnt/d/Docker/WGS/gatk4/gatk-package-4.1.9.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /mnt/d/Docker/WGS/gatk4/gatk-package-4.1.9.0-local.jar ApplyBQSR -R cf3.1toplevel.fa -I Kuu_sorted_dedup.bam -bqsr recal_kuu.table -O recal_reads_kuu.bam
16:27:38.042 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/mnt/d/Docker/WGS/gatk4/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jan 13, 2021 4:27:38 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
16:27:38.181 INFO ApplyBQSR - ------------------------------------------------------------
16:27:38.181 INFO ApplyBQSR - The Genome Analysis Toolkit (GATK) v4.1.9.0
16:27:38.181 INFO ApplyBQSR - For support and documentation go to https://software.broadinstitute.org/gatk/
16:27:38.181 INFO ApplyBQSR - Executing as naika@DESKTOP-TMKR2LT on Linux v4.19.128-microsoft-standard amd64
16:27:38.181 INFO ApplyBQSR - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
16:27:38.181 INFO ApplyBQSR - Start Date/Time: January 13, 2021 4:27:38 PM JST
16:27:38.181 INFO ApplyBQSR - ------------------------------------------------------------
16:27:38.181 INFO ApplyBQSR - ------------------------------------------------------------
16:27:38.182 INFO ApplyBQSR - HTSJDK Version: 2.23.0
16:27:38.182 INFO ApplyBQSR - Picard Version: 2.23.3
16:27:38.182 INFO ApplyBQSR - HTSJDK Defaults.COMPRESSION_LEVEL : 2
16:27:38.182 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:27:38.182 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:27:38.182 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:27:38.182 INFO ApplyBQSR - Deflater: IntelDeflater
16:27:38.182 INFO ApplyBQSR - Inflater: IntelInflater
16:27:38.182 INFO ApplyBQSR - GCS max retries/reopens: 20
16:27:38.182 INFO ApplyBQSR - Requester pays: disabled
16:27:38.182 INFO ApplyBQSR - Initializing engine
16:27:38.557 INFO ApplyBQSR - Done initializing engine
16:27:38.591 INFO ProgressMeter - Starting traversal
16:27:38.591 INFO ProgressMeter - Current Locus Elapsed Minutes Reads Processed Reads/Minute
16:27:38.805 INFO ApplyBQSR - Shutting down engine
[January 13, 2021 4:27:38 PM JST] org.broadinstitute.hellbender.tools.walkers.bqsr.ApplyBQSR done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=946339840
java.lang.IllegalStateException: The covariates table is missing ReadGroup H753JDSXY.2 in RecalTable0
at org.broadinstitute.hellbender.utils.Utils.validate(Utils.java:750)
at org.broadinstitute.hellbender.utils.recalibration.covariates.ReadGroupCovariate.keyForReadGroup(ReadGroupCovariate.java:81)
at org.broadinstitute.hellbender.utils.recalibration.covariates.ReadGroupCovariate.recordValues(ReadGroupCovariate.java:53)
at org.broadinstitute.hellbender.utils.recalibration.covariates.StandardCovariateList.recordAllValuesInStorage(StandardCovariateList.java:133)
at org.broadinstitute.hellbender.utils.recalibration.RecalUtils.computeCovariates(RecalUtils.java:546)
at org.broadinstitute.hellbender.utils.recalibration.RecalUtils.computeCovariates(RecalUtils.java:527)
at org.broadinstitute.hellbender.transformers.BQSRReadTransformer.apply(BQSRReadTransformer.java:145)
at org.broadinstitute.hellbender.transformers.BQSRReadTransformer.apply(BQSRReadTransformer.java:27)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
at org.broadinstitute.hellbender.engine.ReadWalker.traverse(ReadWalker.java:94)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1049)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Should I be omitting ALL the filters for BQSR? If that is the case, is there any point in performing BQSR in the first place?
Thank you!
Robinn
-
Hi Robinn,
It looks like one of your read groups has problems, which is why it is getting filtered out at the BaseRecalibrator step. Unfortunately, we don't have a good fix for it yet. Go ahead and comment on this post with your example so that when the developers are able to address this problem, they have more information.
The goal of BQSR is to correct for systematic bias from the sequencer. While it's not ideal to disable read filters, if you are going to use that read group in your analysis, you will need to run ApplyBQSR on it as well. If the read group is getting filtered by BaseRecalibrator, it may also be filtered out in another step in the pipeline and it may not make much of a difference.
The workaround for now would still be to either disable the read filters or remove this read group. (We don't generally recommend disabling read filters; however, if these same read filters will be applied by HaplotypeCaller anyway, it may work out fine.)
Genevieve
-
Dear Genevieve,
Unfortunately, removing these read groups from the analysis is not a viable option, because these three samples are my main samples to be analyzed. I actually ran an analysis on the FASTQ files from the same three samples using a commercial software package, which also performs BQSR. The commercial software's support team told me that Illumina samples may not require BQSR these days because of their high base quality; is that true?
What I noticed is that running and applying BQSR on the MGI sequences works fine but not on the Illumina ones, and I wonder what the problem may be. I also noticed that, after the samtools fixmate step, the BAM files converted from the Illumina SAM files were significantly smaller than their MGI counterparts. Could this be related to the complete filtering seen at the BQSR step?
It is extremely odd that files from two different sequencing companies behave so differently at the BQSR step, even though I ran nearly identical commands for both, apart from the different read groups I set when running the bwa-mem aligner.
-
I think there is something important to clarify: the goal of BQSR is not to filter out reads; it changes the quality scores of the reads to correct systematic bias from the sequencer. If you haven't read it yet, I would recommend this article, where the algorithm and logic are discussed.
What occurred for you is that read filters are applied before processing by BaseRecalibrator and an entire read group got filtered out. This is pretty uncommon, which is why we haven't run into this bug very often.
I would recommend looking at the reads that were filtered out and determining why those filters were applied, to figure out the root of the issue with the read group. Many of those same read filters are applied by HaplotypeCaller, so you may run into the same problem again.
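For example, a rough, unofficial sketch of how you could peek at those reads with samtools (the file name is the one from this thread; the MAPQ and flag checks are my assumptions about what those two filters look at):
# Reads with mapping quality 0 (the ones MappingQualityNotZeroReadFilter should be dropping)
samtools view Kuu_sorted_dedup.bam | awk '$5 == 0' | head -n 5
samtools view Kuu_sorted_dedup.bam | awk '$5 == 0' | wc -l
# Unmapped reads (the ones MappedReadFilter should be dropping); flag 0x4 = read unmapped
samtools view -c -f 4 Kuu_sorted_dedup.bam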
-
Dear Genevieve,
I think I might have a hunch about what is going on. All this while, apart from GATK, I have also been referring to this site:
to run my GATK analysis. For BQSR, the author deploys an "auto-BQSR" approach: he first calls variants, extracts the SNPs and indels from the BAM file, and then uses those calls as the known sites for BQSR on the same BAM file.
Could that be the problem? Should I instead just run BQSR with a reference SNP file (such as the NCBI or Ensembl sets) without calling variants first, and then continue with variant calling? I feel like I messed up this step and that is why I am having problems.
But weirdly, auto-BQSR was successful on my MGI sequences while it failed on my Illumina sequences.
-
Our best available information is in our Best Practices, where we go over the steps we recommend and their purpose. If you are doing germline short variant discovery, you can find the information here.
-
Dear Genevieve,
I have reviewed the whole process and stuck to the GATK Best Practices for the entire pipeline, including BQSR, and the same problem still exists. I guess it really is a bug between BaseRecalibrator and my reads? In that case, does it mean I cannot proceed with any variant calling in GATK from here on?
-
Robinn Teoh did you try disabling the read filters yet with any success? That may work for you.
-
Dear Genevieve,
I have tried disabling two filters with the following command:
./gatk BaseRecalibrator -R cf3.1toplevel.fa -I Kuu_sorted_dedup.bam --known-sites enscan101.vcf --known-sites ncbibuild151.vcf -DF MappingQualityNotZeroReadFilter -DF MappedReadFilter -O recal_kuu.table
but BaseRecalibrator seemed to shut down before the read filter summary was printed:
11:18:25.773 INFO ProgressMeter - AAEX03025817.1:4676 134.9 552926000 4098051.2
11:18:32.065 INFO BaseRecalibrator - Shutting down engine
[January 20, 2021 11:18:32 AM JST] org.broadinstitute.hellbender.tools.walkers.bqsr.BaseRecalibrator done. Elapsed time: 135.05 minutes.
Runtime.totalMemory()=563609600
java.lang.IllegalStateException: No cigar elements left after removing leading and trailing deletions.
at org.broadinstitute.hellbender.utils.Utils.validate(Utils.java:741)
at org.broadinstitute.hellbender.utils.read.CigarBuilder.make(CigarBuilder.java:138)
at org.broadinstitute.hellbender.utils.read.CigarBuilder.make(CigarBuilder.java:143)
at org.broadinstitute.hellbender.utils.recalibration.BaseRecalibrationEngine.consolidateCigar(BaseRecalibrationEngine.java:293)
at org.broadinstitute.hellbender.transformers.ReadTransformer.lambda$andThen$f85d1091$1(ReadTransformer.java:20)
at org.broadinstitute.hellbender.transformers.ReadTransformer.lambda$andThen$f85d1091$1(ReadTransformer.java:20)
at org.broadinstitute.hellbender.transformers.ReadTransformer.lambda$andThen$f85d1091$1(ReadTransformer.java:20)
at org.broadinstitute.hellbender.transformers.ReadTransformer.lambda$andThen$f85d1091$1(ReadTransformer.java:20)
at org.broadinstitute.hellbender.utils.recalibration.BaseRecalibrationEngine.processRead(BaseRecalibrationEngine.java:118)
at org.broadinstitute.hellbender.tools.walkers.bqsr.BaseRecalibrator.apply(BaseRecalibrator.java:191)
at org.broadinstitute.hellbender.engine.ReadWalker.lambda$traverse$0(ReadWalker.java:96)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
at org.broadinstitute.hellbender.engine.ReadWalker.traverse(ReadWalker.java:94)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1049)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Am I only allowed to disable one filter at a time?
-
Hi Robinn Teoh,
I had my team look at this again, and we are wondering if you could look more into what the issue is with the read group that is getting filtered out.
Could you run these commands and post the output of the second?
1) gatk FlagStat -I {INPUT BAM} -RF ReadGroupReadFilter --keep-read-group {PROBLEMATIC READ GROUP ID}
2) gatk FlagStat -I {INPUT BAM} -RF ReadGroupReadFilter --keep-read-group {PROBLEMATIC READ GROUP ID} -RF NotSecondaryAlignmentReadFilter
Thank you!
Genevieve
-
Dear Genevieve
I have run the commands; here are the outputs:
./gatk FlagStat -I Kuu_sorted_dedup.bam -RF ReadGroupReadFilter --keep-read-group H753JDSXY.2
Using GATK jar /mnt/d/Docker/WGS/gatk4/gatk-package-4.1.9.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /mnt/d/Docker/WGS/gatk4/gatk-package-4.1.9.0-local.jar FlagStat -I Kuu_sorted_dedup.bam -RF ReadGroupReadFilter --keep-read-group H753JDSXY.2
13:11:50.357 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/mnt/d/Docker/WGS/gatk4/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jan 23, 2021 1:11:50 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
13:11:50.557 INFO FlagStat - ------------------------------------------------------------
13:11:50.557 INFO FlagStat - The Genome Analysis Toolkit (GATK) v4.1.9.0
13:11:50.557 INFO FlagStat - For support and documentation go to https://software.broadinstitute.org/gatk/
13:11:50.557 INFO FlagStat - Executing as naika@DESKTOP-TMKR2LT on Linux v4.19.128-microsoft-standard amd64
13:11:50.557 INFO FlagStat - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
13:11:50.557 INFO FlagStat - Start Date/Time: January 23, 2021 1:11:50 PM JST
13:11:50.557 INFO FlagStat - ------------------------------------------------------------
13:11:50.557 INFO FlagStat - ------------------------------------------------------------
13:11:50.558 INFO FlagStat - HTSJDK Version: 2.23.0
13:11:50.558 INFO FlagStat - Picard Version: 2.23.3
13:11:50.558 INFO FlagStat - HTSJDK Defaults.COMPRESSION_LEVEL : 2
13:11:50.558 INFO FlagStat - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
13:11:50.558 INFO FlagStat - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
13:11:50.558 INFO FlagStat - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
13:11:50.558 INFO FlagStat - Deflater: IntelDeflater
13:11:50.558 INFO FlagStat - Inflater: IntelInflater
13:11:50.558 INFO FlagStat - GCS max retries/reopens: 20
13:11:50.558 INFO FlagStat - Requester pays: disabled
13:11:50.558 INFO FlagStat - Initializing engine
13:11:50.821 INFO FlagStat - Done initializing engine
13:11:50.821 INFO ProgressMeter - Starting traversal
13:11:50.821 INFO ProgressMeter - Current Locus Elapsed Minutes Reads Processed Reads/Minute
13:39:13.773 INFO FlagStat - 0 read(s) filtered by: WellformedReadFilter
689238729 read(s) filtered by: ReadGroupReadFilter
689238729 total reads filtered
13:39:13.774 INFO ProgressMeter - unmapped 27.4 0 0.0
13:39:13.774 INFO ProgressMeter - Traversal complete. Processed 0 total reads in 27.4 minutes.
13:39:13.774 INFO FlagStat - Shutting down engine
[January 23, 2021 1:39:13 PM JST] org.broadinstitute.hellbender.tools.FlagStat done. Elapsed time: 27.39 minutes.
Runtime.totalMemory()=367525888
Tool returned:
0 in total
0 QC failure
0 duplicates
0 mapped (�%)
0 paired in sequencing
0 read1
0 read2
0 properly paired (�%)
0 with itself and mate mapped
0 singletons (�%)
0 with mate mapped to a different chr
0 with mate mapped to a different chr (mapQ>=5)
And the second one:
./gatk FlagStat -I Kuu_sorted_dedup.bam -RF ReadGroupReadFilter --keep-read-group H753JDSXY.2 -RF NotSecondaryAlignmentReadFilter
Using GATK jar /mnt/d/Docker/WGS/gatk4/gatk-package-4.1.9.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /mnt/d/Docker/WGS/gatk4/gatk-package-4.1.9.0-local.jar FlagStat -I Kuu_sorted_dedup.bam -RF ReadGroupReadFilter --keep-read-group H753JDSXY.2 -RF NotSecondaryAlignmentReadFilter
14:46:10.596 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/mnt/d/Docker/WGS/gatk4/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jan 23, 2021 2:46:10 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
14:46:10.748 INFO FlagStat - ------------------------------------------------------------
14:46:10.749 INFO FlagStat - The Genome Analysis Toolkit (GATK) v4.1.9.0
14:46:10.749 INFO FlagStat - For support and documentation go to https://software.broadinstitute.org/gatk/
14:46:10.749 INFO FlagStat - Executing as naika@DESKTOP-TMKR2LT on Linux v4.19.128-microsoft-standard amd64
14:46:10.749 INFO FlagStat - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
14:46:10.749 INFO FlagStat - Start Date/Time: January 23, 2021 2:46:10 PM JST
14:46:10.749 INFO FlagStat - ------------------------------------------------------------
14:46:10.749 INFO FlagStat - ------------------------------------------------------------
14:46:10.750 INFO FlagStat - HTSJDK Version: 2.23.0
14:46:10.750 INFO FlagStat - Picard Version: 2.23.3
14:46:10.750 INFO FlagStat - HTSJDK Defaults.COMPRESSION_LEVEL : 2
14:46:10.750 INFO FlagStat - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
14:46:10.750 INFO FlagStat - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
14:46:10.750 INFO FlagStat - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
14:46:10.750 INFO FlagStat - Deflater: IntelDeflater
14:46:10.750 INFO FlagStat - Inflater: IntelInflater
14:46:10.750 INFO FlagStat - GCS max retries/reopens: 20
14:46:10.750 INFO FlagStat - Requester pays: disabled
14:46:10.750 INFO FlagStat - Initializing engine
14:46:10.996 INFO FlagStat - Done initializing engine
14:46:10.996 INFO ProgressMeter - Starting traversal
14:46:10.997 INFO ProgressMeter - Current Locus Elapsed Minutes Reads Processed Reads/Minute
15:08:38.292 INFO FlagStat - 0 read(s) filtered by: WellformedReadFilter
689238729 read(s) filtered by: ReadGroupReadFilter
0 read(s) filtered by: NotSecondaryAlignmentReadFilter
689238729 total reads filtered
15:08:38.293 INFO ProgressMeter - unmapped 22.5 0 0.0
15:08:38.293 INFO ProgressMeter - Traversal complete. Processed 0 total reads in 22.5 minutes.
15:08:38.293 INFO FlagStat - Shutting down engine
[January 23, 2021 3:08:38 PM JST] org.broadinstitute.hellbender.tools.FlagStat done. Elapsed time: 22.46 minutes.
Runtime.totalMemory()=571473920
Tool returned:
0 in total
0 QC failure
0 duplicates
0 mapped (�%)
0 paired in sequencing
0 read1
0 read2
0 properly paired (�%)
0 with itself and mate mapped
0 singletons (�%)
0 with mate mapped to a different chr
0 with mate mapped to a different chr (mapQ>=5)
Let me know if you found anything! Thanks!
Robinn
-
Hi Robinn Teoh,
Thank you for running that! It looks like there are no reads in your BAM from that read group. You can see that all the other reads are being filtered out (689238729 read(s) filtered by: ReadGroupReadFilter) and that no reads are being kept from the read group you specified.
I know from our discussion above that you wanted to keep this read group. You will have to go back to the previous steps in your analysis and figure out where the read group was lost.
Otherwise, to remove the read group from the header and proceed without the error, you can first get the header:
samtools view -H INPUT.bam > header.txt
Then manually delete the @RG line with the problematic read group in header.txt. Then, replace the header:
samtools reheader header.txt INPUT.bam > OUTPUT.bam
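Putting those steps together, a minimal sketch might look like this (the file name is from this thread; the grep pattern is just one non-interactive way to drop an @RG line and assumes you have already checked which line should go):
# 1. Dump the current header
samtools view -H Kuu_sorted_dedup.bam > header.txt
# 2. Remove the @RG line (this drops every @RG line; edit header.txt by hand if only one of several should be removed)
grep -v '^@RG' header.txt > header_norg.txt
# 3. Write a new BAM with the edited header; the reads themselves are copied unchanged
samtools reheader header_norg.txt Kuu_sorted_dedup.bam > Kuu_reheadered.bam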
Best,
Genevieve
-
Dear Genevieve,
Thank you for the explanation! It is very strange to see the whole read group filtered out before the BQSR step. I do notice, though, that compared to my DNBSEQ BAM files, my Illumina BAM files are much smaller after the SAM-to-BAM conversion with the following command:
samtools fixmate -O BAM -@32 Kuu.aln.sam Kuu.fix.bam
Will the fixmate tool actually filter reads out? The subsequent steps were just sorting and MarkDuplicates, and filtering only comes into play at the BQSR step, right?
On the other hand, if I remove the read group from the header and proceed with the analysis, what exactly does that mean? That the filtered read group were all duplicates and the remaining reads are the true reads? Will proceeding with BQSR and HaplotypeCaller be successful?
So sorry for the silly questions; I am still pretty new to NGS and you are literally my only teachers.
Best wishes,
Robinn
-
Hi Robinn,
I can really only provide information and support for GATK tools. I do know that MarkDuplicates does not remove reads; it just marks them.
If you remove the read group from the header, you should be able to proceed just fine with BQSR and HaplotypeCaller. I'm not sure whether those reads were duplicates, though; you'll have to look more closely at your files and what happened in your analysis.
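If it helps, one quick (unofficial) way to see whether any earlier step dropped reads is simply to count records at each stage, assuming the intermediate files mentioned in this thread are still on disk:
samtools view -c Kuu.aln.sam            # records straight out of bwa
samtools view -c Kuu.fix.bam            # after samtools fixmate
samtools view -c Kuu_sorted_dedup.bam   # after sorting and MarkDuplicates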
Best,
Genevieve
-
Dear Genevieve,
Understood. I will retry the method which you have suggested above and get back to you when I am done with it.
Thank you for your support throughout the whole conundrum!
Best wishes,
Robinn
-
No problem, glad we could help!
-
Dear Genevieve,
Just to confirm: was I supposed to apply the BQSR table from my previous BAM file to the new reheadered BAM file?
When I try to run BaseRecalibrator on the reheadered BAM file, I get this error:
A USER ERROR has occurred: Number of read groups must be >= 1, but is 0
Best wishes,
Robinn
-
Hi Robinn,
I'm not sure I understand your question, could you clarify?
Genevieve
-
Dear Genevieve,
So sorry for the confusion. I will clarify the situation:
I have a BAM file (A) which holds the original header with the read group. I ran BaseRecalibrator on it, which produced a recalibration table, but ApplyBQSR then failed.
Then I have a BAM file (B) whose read group header line has been deleted with samtools reheader. When I run BaseRecalibrator on it, it does not work and produces the error below:
A USER ERROR has occurred: Number of read groups must be >= 1, but is 0
So my question is: should I (1) apply BQSR to BAM file (B) using the recalibration table produced by BaseRecalibrator on BAM file (A), or (2) create a new recalibration table from BAM file (B), where I might be making a mistake?
In case (1), where I apply the recalibration table from the original BAM file (A) to the reheadered BAM file (B), I get the log shown below:
09:29:58.753 INFO ApplyBQSR - ------------------------------------------------------------
09:29:58.753 INFO ApplyBQSR - The Genome Analysis Toolkit (GATK) v4.1.9.0
09:29:58.754 INFO ApplyBQSR - For support and documentation go to https://software.broadinstitute.org/gatk/
09:29:58.754 INFO ApplyBQSR - Executing as naika@DESKTOP-TMKR2LT on Linux v4.19.128-microsoft-standard amd64
09:29:58.754 INFO ApplyBQSR - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
09:29:58.754 INFO ApplyBQSR - Start Date/Time: January 29, 2021 9:29:58 AM JST
09:29:58.754 INFO ApplyBQSR - ------------------------------------------------------------
09:29:58.754 INFO ApplyBQSR - ------------------------------------------------------------
09:29:58.754 INFO ApplyBQSR - HTSJDK Version: 2.23.0
09:29:58.754 INFO ApplyBQSR - Picard Version: 2.23.3
09:29:58.754 INFO ApplyBQSR - HTSJDK Defaults.COMPRESSION_LEVEL : 2
09:29:58.754 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
09:29:58.755 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
09:29:58.755 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
09:29:58.755 INFO ApplyBQSR - Deflater: IntelDeflater
09:29:58.755 INFO ApplyBQSR - Inflater: IntelInflater
09:29:58.755 INFO ApplyBQSR - GCS max retries/reopens: 20
09:29:58.755 INFO ApplyBQSR - Requester pays: disabled
09:29:58.755 INFO ApplyBQSR - Initializing engine
09:29:59.191 INFO ApplyBQSR - Done initializing engine
09:29:59.226 INFO ProgressMeter - Starting traversal
09:29:59.227 INFO ProgressMeter - Current Locus Elapsed Minutes Reads Processed Reads/Minute
09:29:59.500 INFO ApplyBQSR - Shutting down engine
[January 29, 2021 9:29:59 AM JST] org.broadinstitute.hellbender.tools.walkers.bqsr.ApplyBQSR done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=972029952
java.lang.NullPointerException
at org.broadinstitute.hellbender.utils.recalibration.covariates.ReadGroupCovariate.getID(ReadGroupCovariate.java:65)
at org.broadinstitute.hellbender.utils.recalibration.covariates.ReadGroupCovariate.recordValues(ReadGroupCovariate.java:52)
at org.broadinstitute.hellbender.utils.recalibration.covariates.StandardCovariateList.recordAllValuesInStorage(StandardCovariateList.java:133)
at org.broadinstitute.hellbender.utils.recalibration.RecalUtils.computeCovariates(RecalUtils.java:546)
at org.broadinstitute.hellbender.utils.recalibration.RecalUtils.computeCovariates(RecalUtils.java:527)
at org.broadinstitute.hellbender.transformers.BQSRReadTransformer.apply(BQSRReadTransformer.java:145)
at org.broadinstitute.hellbender.transformers.BQSRReadTransformer.apply(BQSRReadTransformer.java:27)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
at org.broadinstitute.hellbender.engine.ReadWalker.traverse(ReadWalker.java:94)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1049)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
I also feel like the recalibration table itself is much smaller, at 64 KB, compared to the tables from my successful runs, which were 900 to 1400 KB. Or does that not actually matter?
Please do let me know if it is still confusing.
Thank you!
Best wishes,
Robinn
-
Hi Robinn,
I believe option 2) is what you want to do. It is possible you are making a mistake. Make sure you are only removing the problematic read groups when you do the reheader, not all read groups.
Genevieve
-
Dear Genevieve,
What if that is the only read group I have in my BAM file?
@RG ID:PG3847_01_a LB:PG3847_01_A3535_H1 PL:Illumina PU:H753JDSXY.2 PM:Novaseq6000 SM:KuuKuu_tm_R1.fastq.gz
@PG ID:bwa PN:bwa VN:0.7.12-r1039 CL:bwa mem cf3.1toplevel -t 16 -R @RG\tID:PG3847_01_a\tLB:PG3847_01_A3535_H1\tPL:Illumina\tPU:H753JDSXY.2 \tPM:Novaseq6000\tSM:KuuKuu_tm_R1.fastq.gz Kuu_tm_R2.fastq.gz
@PG ID:MarkDuplicates VN:2.23.8 CL:MarkDuplicates INPUT=[Kuu.sorted.bam] OUTPUT=Kuu_sorted_dedup.bam METRICS_FILE=Kuu_markeddup.txt MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 TAG_DUPLICATE_SET_MEMBERS=false REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag CLEAR_DT=true DUPLEX_UMI=false ADD_PG_TAG_TO_READS=true REMOVE_DUPLICATES=false ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture of last three ':' separated fields as numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 MAX_OPTICAL_DUPLICATE_SET_SIZE=300000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false PN:MarkDuplicates
@PG ID:samtools PN:samtools PP:bwa VN:1.10 CL:samtools view -H Kuu_sorted_dedup.bam
@PG ID:samtools.1 PN:samtools PP:MarkDuplicates VN:1.10 CL:samtools view -H Kuu_sorted_dedup.bam
If I remove the @RG line from this header, I am indeed left with no read group at all, as there is only one read group in each BAM file. Or perhaps there is something wrong with my read group input during the bwa alignment?
Robinn
-
Hi Robinn Teoh,
That doesn't seem to make sense. Earlier in our thread, when you ran FlagStat, it showed that many of the reads in your BAM were from a different read group.
14:46:10.996 INFO ProgressMeter - Starting traversal
14:46:10.997 INFO ProgressMeter - Current Locus Elapsed Minutes Reads Processed Reads/Minute
15:08:38.292 INFO FlagStat - 0 read(s) filtered by: WellformedReadFilter
689238729 read(s) filtered by: ReadGroupReadFilter
0 read(s) filtered by: NotSecondaryAlignmentReadFilter
689238729 total reads filtered
15:08:38.293 INFO ProgressMeter - unmapped 22.5 0 0.0
15:08:38.293 INFO ProgressMeter - Traversal complete. Processed 0 total reads in 22.5 minutes.
15:08:38.293 INFO FlagStat - Shutting down engine
[January 23, 2021 3:08:38 PM JST] org.broadinstitute.hellbender.tools.FlagStat done. Elapsed time: 22.46 minutes.
Runtime.totalMemory()=571473920
Tool returned:
0 in total
The command:
./gatk FlagStat -I Kuu_sorted_dedup.bam -RF ReadGroupReadFilter --keep-read-group H753JDSXY.2
In the output above, I can see that there are other read groups, because the ReadGroupReadFilter is removing all reads that are not in the read group you specified: 689238729 reads. There are no reads left once all the other read groups are filtered out.
Could you investigate this further and try to determine what is going on?
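One way to start (just a sketch, not an official GATK procedure; it assumes the reads carry RG tags and that samtools is available) is to tally the RG tags actually attached to the reads and compare them with the @RG ID line in the header:
# Count reads per RG tag (this scans the whole BAM, so it may take a while)
samtools view Kuu_sorted_dedup.bam | grep -o 'RG:Z:[^[:space:]]*' | sort | uniq -c
# List the read groups declared in the header
samtools view -H Kuu_sorted_dedup.bam | grep '^@RG'
As far as I know, --keep-read-group matches the @RG ID rather than the platform unit (PU), so it may also be worth double-checking which identifier was passed to FlagStat.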
Genevieve
-
Dear Genevieve Brandt (she/her),
I am not sure how I should go about investigating, but I will show you what my command outputs:
samtools view -H Kuu_sorted_dedup.bam > Kuuheader.txt
which gives the following text file (you probably want to scroll down to the end):
@HD VN:1.6 SO:coordinate
~ summarized
@SQ SN:AAEX03026071.1 LN:1573
@SQ SN:AAEX03026072.1 LN:1115
@RG ID:PG3847_01_a LB:PG3847_01_A3535_H1 PL:Illumina PU:H753JDSXY.2 PM:Novaseq6000 SM:KuuKuu_tm_R1.fastq.gz
@PG ID:bwa PN:bwa VN:0.7.12-r1039 CL:bwa mem cf3.1toplevel -t 16 -R @RG\tID:PG3847_01_a\tLB:PG3847_01_A3535_H1\tPL:Illumina\tPU:H753JDSXY.2 \tPM:Novaseq6000\tSM:KuuKuu_tm_R1.fastq.gz Kuu_tm_R2.fastq.gz
@PG ID:MarkDuplicates VN:2.23.8 CL:MarkDuplicates INPUT=[Kuu.sorted.bam] OUTPUT=Kuu_sorted_dedup.bam METRICS_FILE=Kuu_markeddup.txt MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 TAG_DUPLICATE_SET_MEMBERS=false REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag CLEAR_DT=true DUPLEX_UMI=false ADD_PG_TAG_TO_READS=true REMOVE_DUPLICATES=false ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture of last three ':' separated fields as numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 MAX_OPTICAL_DUPLICATE_SET_SIZE=300000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false PN:MarkDuplicates
@PG ID:samtools PN:samtools PP:bwa VN:1.10 CL:samtools view -H Kuu_sorted_dedup.bam
@PG ID:samtools.1 PN:samtools PP:MarkDuplicates VN:1.10 CL:samtools view -H Kuu_sorted_dedup.bam
I don't see another @RG line anywhere, though... Any idea what I should do next?
Robinn