Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

GATK process banner

Need Help?

Search our documentation

Community Forum

Hi, How can we help?

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size. Learn more

[Repost] ReadGroup missing when Applying BQSR

Answered
0

44 comments

  • Avatar
    Genevieve Brandt (she/her)

    Hi Robinn Teoh

    Would you be able to run ValidateSamFile on your input BAM file?

    Thanks,

    Genevieve

     

    0
    Comment actions Permalink
  • Avatar
    Robinn Teoh

    Dear Genevieve Brandt ,

    I have only executed on one of the files, and it was okay and the log said no error was found.

    I will run on the subsequent 2 files and let you know if the others are okay too.

    Thanks.

    Robinn 

    0
    Comment actions Permalink
  • Avatar
    Robinn Teoh

    Dear Genevieve Brandt ,

    I have ran ValidateSamFile on all three BAM files and all of them came back with no errors found after running on picard for 25 minutes.

    Any idea what else I should do?

    Best wishes,

    Robinn

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Hi Robinn,

    Thanks for running that to confirm there were no issues with the input! I looked this over with my team and we think that the issue might be that an entire read group is getting filtered out by the MappingQualityNotZeroReadFilter (43702519 read(s) filtered by: MappingQualityNotZeroReadFilter). BaseRecalibrator has stricter read filters than ApplyBQSR and we have seen this occur before but don't have a fix yet. 

    You can use -RF MappingQualityNotZeroReadFilter with BaseRecalibrator to confirm that this is causing the issue. If ApplyBQSR is successful, then you know what is causing the issue and can fix it from there. Either continuing with BQSR after disabling the read filter and the reads will get filtered out later in GATK or remove the read group before BQSR so that ApplyBQSR does not have the issue.

    Genevieve

    0
    Comment actions Permalink
  • Avatar
    Robinn Teoh

    Dear Genevieve,

    Thank you for the heads up! I have came across with that post and yea I couldn't find any fix around the problem, and most users have had 0 recalibrated reads instead of 524603691 recalibrated reads in their results log so I thought the problem was different from what I encountered.

    So to confirm again, I should be applying -RF MappingQualityNotZeroReadFilter on BaseRecalibrator, get the recal_table and apply to BQSR right? I will get on to that and let you know if it works.

    Robinn 

    0
    Comment actions Permalink
  • Avatar
    Robinn Teoh

    Okay so I tried using -RF MappingQualityNotZeroReadFilter with BaseRecalibrator with the following command,

    ./gatk BaseRecalibrator -R cf3.1toplevel.fa -I Kuu_sorted_dedup.bam --known-sites bqsr_kuu_snps.vcf --known-sites bqsr_kuu_indels.vcf -RF MappingQualityNotZeroReadFilter -O recal_kuu.table

    This was the progress meter:

    14:05:06.118 INFO BaseRecalibrator - 43702519 read(s) filtered by: MappingQualityNotZeroReadFilter
    0 read(s) filtered by: MappingQualityAvailableReadFilter
    0 read(s) filtered by: MappedReadFilter
    0 read(s) filtered by: NotSecondaryAlignmentReadFilter
    120929799 read(s) filtered by: NotDuplicateReadFilter
    0 read(s) filtered by: PassesVendorQualityCheckReadFilter
    0 read(s) filtered by: WellformedReadFilter
    164632318 total reads filtered
    14:05:06.118 INFO ProgressMeter - AAEX03026059.1:1598 129.1 524603691 4062622.7
    14:05:06.119 INFO ProgressMeter - Traversal complete. Processed 524603691 total reads in 129.1 minutes.
    14:05:06.139 INFO BaseRecalibrator - Calculating quantized quality scores...
    14:05:06.146 INFO BaseRecalibrator - Writing recalibration report...
    14:05:06.960 INFO BaseRecalibrator - ...done!
    14:05:06.960 INFO BaseRecalibrator - BaseRecalibrator was able to recalibrate 524603691 reads
    14:05:06.960 INFO BaseRecalibrator - Shutting down engine
    [January 12, 2021 2:05:06 PM JST] org.broadinstitute.hellbender.tools.walkers.bqsr.BaseRecalibrator done. Elapsed time: 129.16 minutes.
    Runtime.totalMemory()=409993216
    Tool returned:
    SUCCESS

    When I ApplyBQSR nothing seemed to be different except of the error though...

    ./gatk ApplyBQSR -R cf3.1toplevel.fa -I Kuu_sorted_dedup.bam -bqsr recal_kuu.table -O recal_reads_kuu.bam
    Using GATK jar /mnt/d/docker/WGS/gatk4/gatk-package-4.1.9.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /mnt/d/docker/WGS/gatk4/gatk-package-4.1.9.0-local.jar ApplyBQSR -R cf3.1toplevel.fa -I Kuu_sorted_dedup.bam -bqsr recal_kuu.table -O recal_reads_kuu.bam
    14:18:40.691 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/mnt/d/docker/WGS/gatk4/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Jan 12, 2021 2:18:40 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    14:18:40.835 INFO ApplyBQSR - ------------------------------------------------------------
    14:18:40.835 INFO ApplyBQSR - The Genome Analysis Toolkit (GATK) v4.1.9.0
    14:18:40.835 INFO ApplyBQSR - For support and documentation go to https://software.broadinstitute.org/gatk/
    14:18:40.835 INFO ApplyBQSR - Executing as naika@DESKTOP-TMKR2LT.localdomain on Linux v4.19.128-microsoft-standard amd64
    14:18:40.835 INFO ApplyBQSR - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
    14:18:40.835 INFO ApplyBQSR - Start Date/Time: January 12, 2021 2:18:40 PM JST
    14:18:40.835 INFO ApplyBQSR - ------------------------------------------------------------
    14:18:40.835 INFO ApplyBQSR - ------------------------------------------------------------
    14:18:40.836 INFO ApplyBQSR - HTSJDK Version: 2.23.0
    14:18:40.836 INFO ApplyBQSR - Picard Version: 2.23.3
    14:18:40.836 INFO ApplyBQSR - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    14:18:40.836 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    14:18:40.836 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    14:18:40.836 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    14:18:40.836 INFO ApplyBQSR - Deflater: IntelDeflater
    14:18:40.836 INFO ApplyBQSR - Inflater: IntelInflater
    14:18:40.836 INFO ApplyBQSR - GCS max retries/reopens: 20
    14:18:40.836 INFO ApplyBQSR - Requester pays: disabled
    14:18:40.836 INFO ApplyBQSR - Initializing engine
    14:18:41.212 INFO ApplyBQSR - Done initializing engine
    14:18:41.249 INFO ProgressMeter - Starting traversal
    14:18:41.250 INFO ProgressMeter - Current Locus Elapsed Minutes Reads Processed Reads/Minute
    14:18:41.468 INFO ApplyBQSR - Shutting down engine
    [January 12, 2021 2:18:41 PM JST] org.broadinstitute.hellbender.tools.walkers.bqsr.ApplyBQSR done. Elapsed time: 0.01 minutes.
    Runtime.totalMemory()=983564288
    java.lang.IllegalStateException: The covariates table is missing ReadGroup H753JDSXY.2 in RecalTable0
    at org.broadinstitute.hellbender.utils.Utils.validate(Utils.java:750)
    at org.broadinstitute.hellbender.utils.recalibration.covariates.ReadGroupCovariate.keyForReadGroup(ReadGroupCovariate.java:81)
    at org.broadinstitute.hellbender.utils.recalibration.covariates.ReadGroupCovariate.recordValues(ReadGroupCovariate.java:53)
    at org.broadinstitute.hellbender.utils.recalibration.covariates.StandardCovariateList.recordAllValuesInStorage(StandardCovariateList.java:133)
    at org.broadinstitute.hellbender.utils.recalibration.RecalUtils.computeCovariates(RecalUtils.java:546)
    at org.broadinstitute.hellbender.utils.recalibration.RecalUtils.computeCovariates(RecalUtils.java:527)
    at org.broadinstitute.hellbender.transformers.BQSRReadTransformer.apply(BQSRReadTransformer.java:145)
    at org.broadinstitute.hellbender.transformers.BQSRReadTransformer.apply(BQSRReadTransformer.java:27)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
    at org.broadinstitute.hellbender.engine.ReadWalker.traverse(ReadWalker.java:94)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1049)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)

    And when you say disable read filter should I actually be applying -DF MappingQualityNotZeroReadFilter instead? I got a little confused on how -RF would help me solve this issue though.

    Robinn

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Robinn Teoh you are right! I made a mistake, to disable the read filter you would use -DF. Sorry about that!

    0
    Comment actions Permalink
  • Avatar
    Robinn Teoh

    Dear Genevieve,

    I have tried disabling the said filter and applied BQSR, but the same error came back. This time round it was the MappedReadFilter which have filtered most of the reads with the following log:

    The command I used for BQSR was :

    ./gatk BaseRecalibrator -R cf3.1toplevel.fa -I Kuu_sorted_dedup.bam --known-sites bqsr_kuu_snps.vcf --known-sites bqsr_kuu_indels.vcf -DF MappingQualityNotZeroReadFilter -O recal_kuu.table

    The log for the BQSR table was:

    11:25:56.578 INFO BaseRecalibrator - 0 read(s) filtered by: MappingQualityAvailableReadFilter
    3635899 read(s) filtered by: MappedReadFilter
    0 read(s) filtered by: NotSecondaryAlignmentReadFilter
    132251119 read(s) filtered by: NotDuplicateReadFilter
    0 read(s) filtered by: PassesVendorQualityCheckReadFilter
    0 read(s) filtered by: WellformedReadFilter
    135887018 total reads filtered
    11:25:56.578 INFO ProgressMeter - AAEX03026067.1:1145 126.9 553348991 4362157.5
    11:25:56.578 INFO ProgressMeter - Traversal complete. Processed 553348991 total reads in 126.9 minutes.
    11:25:56.597 INFO BaseRecalibrator - Calculating quantized quality scores...
    11:25:56.604 INFO BaseRecalibrator - Writing recalibration report...
    11:25:57.421 INFO BaseRecalibrator - ...done!
    11:25:57.421 INFO BaseRecalibrator - BaseRecalibrator was able to recalibrate 553348991 reads
    11:25:57.421 INFO BaseRecalibrator - Shutting down engine
    [January 13, 2021 11:25:57 AM JST] org.broadinstitute.hellbender.tools.walkers.bqsr.BaseRecalibrator done. Elapsed time: 126.88 minutes.
    Runtime.totalMemory()=727711744
    Tool returned:
    SUCCESS

    And the log for apply BQSR was:

    ./gatk ApplyBQSR -R cf3.1toplevel.fa -I Kuu_sorted_dedup.bam -bqsr recal_kuu.table -O recal_reads_kuu.bam
    Using GATK jar /mnt/d/Docker/WGS/gatk4/gatk-package-4.1.9.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /mnt/d/Docker/WGS/gatk4/gatk-package-4.1.9.0-local.jar ApplyBQSR -R cf3.1toplevel.fa -I Kuu_sorted_dedup.bam -bqsr recal_kuu.table -O recal_reads_kuu.bam
    16:27:38.042 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/mnt/d/Docker/WGS/gatk4/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Jan 13, 2021 4:27:38 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    16:27:38.181 INFO ApplyBQSR - ------------------------------------------------------------
    16:27:38.181 INFO ApplyBQSR - The Genome Analysis Toolkit (GATK) v4.1.9.0
    16:27:38.181 INFO ApplyBQSR - For support and documentation go to https://software.broadinstitute.org/gatk/
    16:27:38.181 INFO ApplyBQSR - Executing as naika@DESKTOP-TMKR2LT on Linux v4.19.128-microsoft-standard amd64
    16:27:38.181 INFO ApplyBQSR - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
    16:27:38.181 INFO ApplyBQSR - Start Date/Time: January 13, 2021 4:27:38 PM JST
    16:27:38.181 INFO ApplyBQSR - ------------------------------------------------------------
    16:27:38.181 INFO ApplyBQSR - ------------------------------------------------------------
    16:27:38.182 INFO ApplyBQSR - HTSJDK Version: 2.23.0
    16:27:38.182 INFO ApplyBQSR - Picard Version: 2.23.3
    16:27:38.182 INFO ApplyBQSR - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    16:27:38.182 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    16:27:38.182 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    16:27:38.182 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    16:27:38.182 INFO ApplyBQSR - Deflater: IntelDeflater
    16:27:38.182 INFO ApplyBQSR - Inflater: IntelInflater
    16:27:38.182 INFO ApplyBQSR - GCS max retries/reopens: 20
    16:27:38.182 INFO ApplyBQSR - Requester pays: disabled
    16:27:38.182 INFO ApplyBQSR - Initializing engine
    16:27:38.557 INFO ApplyBQSR - Done initializing engine
    16:27:38.591 INFO ProgressMeter - Starting traversal
    16:27:38.591 INFO ProgressMeter - Current Locus Elapsed Minutes Reads Processed Reads/Minute
    16:27:38.805 INFO ApplyBQSR - Shutting down engine
    [January 13, 2021 4:27:38 PM JST] org.broadinstitute.hellbender.tools.walkers.bqsr.ApplyBQSR done. Elapsed time: 0.01 minutes.
    Runtime.totalMemory()=946339840
    java.lang.IllegalStateException: The covariates table is missing ReadGroup H753JDSXY.2 in RecalTable0
    at org.broadinstitute.hellbender.utils.Utils.validate(Utils.java:750)
    at org.broadinstitute.hellbender.utils.recalibration.covariates.ReadGroupCovariate.keyForReadGroup(ReadGroupCovariate.java:81)
    at org.broadinstitute.hellbender.utils.recalibration.covariates.ReadGroupCovariate.recordValues(ReadGroupCovariate.java:53)
    at org.broadinstitute.hellbender.utils.recalibration.covariates.StandardCovariateList.recordAllValuesInStorage(StandardCovariateList.java:133)
    at org.broadinstitute.hellbender.utils.recalibration.RecalUtils.computeCovariates(RecalUtils.java:546)
    at org.broadinstitute.hellbender.utils.recalibration.RecalUtils.computeCovariates(RecalUtils.java:527)
    at org.broadinstitute.hellbender.transformers.BQSRReadTransformer.apply(BQSRReadTransformer.java:145)
    at org.broadinstitute.hellbender.transformers.BQSRReadTransformer.apply(BQSRReadTransformer.java:27)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
    at org.broadinstitute.hellbender.engine.ReadWalker.traverse(ReadWalker.java:94)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1049)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)

    Should I be omitting ALL the filters for BQSR? If that is the case is there any point of performing BQSR in the first place?

     

    Thank you!

    Robinn 

     

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Hi Robinn,

    It looks like one of your read groups has problems which is why it is getting filtered out at the BaseRecalibrator step. Unfortunately, we don't have a good fix for it yet. Go ahead and comment on this post with your example so that when the developers are able to address this problem they have more information.

    The goal of BQSR is to correct for systematic bias from the sequencer. While it's not ideal to disable read filters, if you are going to use that read group in your analysis, you will need to run ApplyBQSR on it as well. If the read group is getting filtered by BaseRecalibrator, it may also be filtered out in another step in the pipeline and it may not make much of a difference.

    The workaround now would still be to either disable the read filters or to remove this read group. (We don't generally recommend disabling read filters, however if these same read filters will be applied by HaplotypeCaller, then it may work out fine).

    Genevieve

    0
    Comment actions Permalink
  • Avatar
    Robinn Teoh

    Dear Genevieve,

    Unfortunately removing these read groups from the analysis is not a viable option because these three samples are my main samples to be analyzed. I actually ran my analysis using the fastq files from the same three samples using a commercial software and there was also BQSR in the software too. What I was told my the support team from the commercial software was that Illumina samples may not require BQSR in their current sequences because of their superior quality, is that true?

    What I realized from my BQSR was that running BQSR and applying them on MGI sequences were fine but not with Illumina, I do wonder what may be the problem though. What I realized was also after samfix using samtools, the BAM files retrieved from the SAM files  from Illumina were significantly reduced in size as compared their MGI counterparts. Will this be a source of problem to this complete filtering in the BQSR step as well?

    It is extremely odd, seeing how files from two different sequencing companies can result in such disparity in the BQSR step, even though I ran close to exact similar commands for files from both companies, except for the difference in read groups when I first set them up in the bwa-mem aligner.

     

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    I think there is something important to clarify - the goal of BQSR is not to filter out reads, it changes the quality scores in the reads to correct systematic bias from the sequencer. If you haven't read it yet, I would recommend this article, where the algorithm and logic is discussed. 

    What occurred for you is that read filters are applied before processing by BaseRecalibrator and an entire read group got filtered out. This is pretty uncommon, which is why we haven't run into this bug very often. 

    I would recommend looking at those reads that were filtered out and determine why those filters were applied to figure out the root of the issue with the read group. Because many of those same read filters are applied by HaplotypeCaller so you may run into the same problem again.

    0
    Comment actions Permalink
  • Avatar
    Robinn Teoh

    Dear Genevieve,

    I think I might have a hunch on what is going on. All this while apart from GATK I have also referred to this site:

    https://gencore.bio.nyu.edu/variant-calling-pipeline-gatk4/

    to run my GATK analysis, and on BQSR the author have deployed auto-BQSR where he first called for the variants, extract the snps and indels from the BAM files and use them for the BQSR plotting of the same BAM file.

    May that be the problem? Should I be just running BQSR without calling the variants first with a reference snp file such as ncbi and ensembl, then continue with the variant calling instead? I feel like I messed up this step and that is why I am having problems.

    But weirdly, auto-BQSR was successful in my MGI sequences while it failed in my Illumina sequences.

     

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Our best available information is in our best practices, where we go over the steps we recommend and the purpose. If you are doing germline short variant discovery, you can find the information here.

    0
    Comment actions Permalink
  • Avatar
    Robinn Teoh

    Dear Genevieve,

    I have reviewed the whole process and sticked to GATK best practices for the whole line and for BQSR, and the same problem still exist. I guess it is really a bug with BaseRecalibrator and my reads? In this case, it means that I may not proceed with any base calling with GATK for hereon right?

     

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Robinn Teoh did you try disabling the read filters yet with any success? That may work for you.

    0
    Comment actions Permalink
  • Avatar
    Robinn Teoh

    Dear Genevieve,

    I have tried disabling two filters with the following command:

    ./gatk BaseRecalibrator -R cf3.1toplevel.fa -I Kuu_sorted_dedup.bam --known-sites enscan101.vcf --known-sites ncbibuild151.vcf -DF MappingQualityNotZeroReadFilter -DF MappedReadFilter -O recal_kuu.table

    but BQSR seemed to shut down automatically before the read details were available:

    11:18:25.773 INFO ProgressMeter - AAEX03025817.1:4676 134.9 552926000 4098051.2
    11:18:32.065 INFO BaseRecalibrator - Shutting down engine
    [January 20, 2021 11:18:32 AM JST] org.broadinstitute.hellbender.tools.walkers.bqsr.BaseRecalibrator done. Elapsed time: 135.05 minutes.
    Runtime.totalMemory()=563609600
    java.lang.IllegalStateException: No cigar elements left after removing leading and trailing deletions.
    at org.broadinstitute.hellbender.utils.Utils.validate(Utils.java:741)
    at org.broadinstitute.hellbender.utils.read.CigarBuilder.make(CigarBuilder.java:138)
    at org.broadinstitute.hellbender.utils.read.CigarBuilder.make(CigarBuilder.java:143)
    at org.broadinstitute.hellbender.utils.recalibration.BaseRecalibrationEngine.consolidateCigar(BaseRecalibrationEngine.java:293)
    at org.broadinstitute.hellbender.transformers.ReadTransformer.lambda$andThen$f85d1091$1(ReadTransformer.java:20) at org.broadinstitute.hellbender.transformers.ReadTransformer.lambda$andThen$f85d1091$1(ReadTransformer.java:20) at org.broadinstitute.hellbender.transformers.ReadTransformer.lambda$andThen$f85d1091$1(ReadTransformer.java:20) at org.broadinstitute.hellbender.transformers.ReadTransformer.lambda$andThen$f85d1091$1(ReadTransformer.java:20) at org.broadinstitute.hellbender.utils.recalibration.BaseRecalibrationEngine.processRead(BaseRecalibrationEngine.java:118)
    at org.broadinstitute.hellbender.tools.walkers.bqsr.BaseRecalibrator.apply(BaseRecalibrator.java:191)
    at org.broadinstitute.hellbender.engine.ReadWalker.lambda$traverse$0(ReadWalker.java:96)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
    at org.broadinstitute.hellbender.engine.ReadWalker.traverse(ReadWalker.java:94)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1049)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)

    Am I only allowed to disable one filter at a time only?

     

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Hi Robinn Teoh,

    I had my team look at this again and we are wondering if you could look more into what is the issue with the read groups that are getting filtered out.

    Could you run these commands and post the output of the second?

    1) gatk FlagStat -I {INPUT BAM} -RF ReadGroupReadFilter --keep-read-group {PROBLEMATIC READ GROUP ID} 
    2) gatk FlagStat -I {INPUT BAM} -RF ReadGroupReadFilter --keep-read-group {PROBLEMATIC READ GROUP ID} -RF NotSecondaryAlignmentReadFilter

    Thank you!

    Genevieve

    0
    Comment actions Permalink
  • Avatar
    Robinn Teoh

    Dear Genevieve

    I have ran the commands and here are the outputs:

    ./gatk FlagStat -I Kuu_sorted_dedup.bam -RF ReadGroupReadFilter --keep-read-group H753JDSXY.2
    Using GATK jar /mnt/d/Docker/WGS/gatk4/gatk-package-4.1.9.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /mnt/d/Docker/WGS/gatk4/gatk-package-4.1.9.0-local.jar FlagStat -I Kuu_sorted_dedup.bam -RF ReadGroupReadFilter --keep-read-group H753JDSXY.2
    13:11:50.357 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/mnt/d/Docker/WGS/gatk4/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Jan 23, 2021 1:11:50 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    13:11:50.557 INFO FlagStat - ------------------------------------------------------------
    13:11:50.557 INFO FlagStat - The Genome Analysis Toolkit (GATK) v4.1.9.0
    13:11:50.557 INFO FlagStat - For support and documentation go to https://software.broadinstitute.org/gatk/
    13:11:50.557 INFO FlagStat - Executing as naika@DESKTOP-TMKR2LT on Linux v4.19.128-microsoft-standard amd64
    13:11:50.557 INFO FlagStat - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
    13:11:50.557 INFO FlagStat - Start Date/Time: January 23, 2021 1:11:50 PM JST
    13:11:50.557 INFO FlagStat - ------------------------------------------------------------
    13:11:50.557 INFO FlagStat - ------------------------------------------------------------
    13:11:50.558 INFO FlagStat - HTSJDK Version: 2.23.0
    13:11:50.558 INFO FlagStat - Picard Version: 2.23.3
    13:11:50.558 INFO FlagStat - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    13:11:50.558 INFO FlagStat - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    13:11:50.558 INFO FlagStat - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    13:11:50.558 INFO FlagStat - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    13:11:50.558 INFO FlagStat - Deflater: IntelDeflater
    13:11:50.558 INFO FlagStat - Inflater: IntelInflater
    13:11:50.558 INFO FlagStat - GCS max retries/reopens: 20
    13:11:50.558 INFO FlagStat - Requester pays: disabled
    13:11:50.558 INFO FlagStat - Initializing engine
    13:11:50.821 INFO FlagStat - Done initializing engine
    13:11:50.821 INFO ProgressMeter - Starting traversal
    13:11:50.821 INFO ProgressMeter - Current Locus Elapsed Minutes Reads Processed Reads/Minute
    13:39:13.773 INFO FlagStat - 0 read(s) filtered by: WellformedReadFilter
    689238729 read(s) filtered by: ReadGroupReadFilter
    689238729 total reads filtered
    13:39:13.774 INFO ProgressMeter - unmapped 27.4 0 0.0
    13:39:13.774 INFO ProgressMeter - Traversal complete. Processed 0 total reads in 27.4 minutes.
    13:39:13.774 INFO FlagStat - Shutting down engine
    [January 23, 2021 1:39:13 PM JST] org.broadinstitute.hellbender.tools.FlagStat done. Elapsed time: 27.39 minutes.
    Runtime.totalMemory()=367525888
    Tool returned:
    0 in total
    0 QC failure
    0 duplicates
    0 mapped (�%)
    0 paired in sequencing
    0 read1
    0 read2
    0 properly paired (�%)
    0 with itself and mate mapped
    0 singletons (�%)
    0 with mate mapped to a different chr
    0 with mate mapped to a different chr (mapQ>=5)

    And the second one:

    ./gatk FlagStat -I Kuu_sorted_dedup.bam -RF ReadGroupReadFilter --keep-read-group H753JDSXY.2 -RF NotSecondaryAlignmentReadFilter
    Using GATK jar /mnt/d/Docker/WGS/gatk4/gatk-package-4.1.9.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /mnt/d/Docker/WGS/gatk4/gatk-package-4.1.9.0-local.jar FlagStat -I Kuu_sorted_dedup.bam -RF ReadGroupReadFilter --keep-read-group H753JDSXY.2 -RF NotSecondaryAlignmentReadFilter
    14:46:10.596 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/mnt/d/Docker/WGS/gatk4/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Jan 23, 2021 2:46:10 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    14:46:10.748 INFO FlagStat - ------------------------------------------------------------
    14:46:10.749 INFO FlagStat - The Genome Analysis Toolkit (GATK) v4.1.9.0
    14:46:10.749 INFO FlagStat - For support and documentation go to https://software.broadinstitute.org/gatk/
    14:46:10.749 INFO FlagStat - Executing as naika@DESKTOP-TMKR2LT on Linux v4.19.128-microsoft-standard amd64
    14:46:10.749 INFO FlagStat - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
    14:46:10.749 INFO FlagStat - Start Date/Time: January 23, 2021 2:46:10 PM JST
    14:46:10.749 INFO FlagStat - ------------------------------------------------------------
    14:46:10.749 INFO FlagStat - ------------------------------------------------------------
    14:46:10.750 INFO FlagStat - HTSJDK Version: 2.23.0
    14:46:10.750 INFO FlagStat - Picard Version: 2.23.3
    14:46:10.750 INFO FlagStat - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    14:46:10.750 INFO FlagStat - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    14:46:10.750 INFO FlagStat - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    14:46:10.750 INFO FlagStat - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    14:46:10.750 INFO FlagStat - Deflater: IntelDeflater
    14:46:10.750 INFO FlagStat - Inflater: IntelInflater
    14:46:10.750 INFO FlagStat - GCS max retries/reopens: 20
    14:46:10.750 INFO FlagStat - Requester pays: disabled
    14:46:10.750 INFO FlagStat - Initializing engine
    14:46:10.996 INFO FlagStat - Done initializing engine
    14:46:10.996 INFO ProgressMeter - Starting traversal
    14:46:10.997 INFO ProgressMeter - Current Locus Elapsed Minutes Reads Processed Reads/Minute
    15:08:38.292 INFO FlagStat - 0 read(s) filtered by: WellformedReadFilter
    689238729 read(s) filtered by: ReadGroupReadFilter
    0 read(s) filtered by: NotSecondaryAlignmentReadFilter
    689238729 total reads filtered
    15:08:38.293 INFO ProgressMeter - unmapped 22.5 0 0.0
    15:08:38.293 INFO ProgressMeter - Traversal complete. Processed 0 total reads in 22.5 minutes.
    15:08:38.293 INFO FlagStat - Shutting down engine
    [January 23, 2021 3:08:38 PM JST] org.broadinstitute.hellbender.tools.FlagStat done. Elapsed time: 22.46 minutes.
    Runtime.totalMemory()=571473920
    Tool returned:
    0 in total
    0 QC failure
    0 duplicates
    0 mapped (�%)
    0 paired in sequencing
    0 read1
    0 read2
    0 properly paired (�%)
    0 with itself and mate mapped
    0 singletons (�%)
    0 with mate mapped to a different chr
    0 with mate mapped to a different chr (mapQ>=5)

    Let me know if you found anything! Thanks!

    Robinn

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Hi Robinn Teoh,

    Thank you for running that! It looks like there are no reads in your BAM from those read groups. You can see that all other reads are being filtered out (689238729 read(s) filtered by: ReadGroupReadFilter) and no reads are being kept from those read groups.

    I know from our discussion above, you wanted to keep this read group. You will have to go back to your previous steps in your analysis and figure out when the read group was lost.

    Otherwise, to remove the read group from the header to proceed without the error, you can get the header:

    samtools view -H INPUT.bam > header.txt

    Then manually delete the @RG line with the problematic read group in header.txt. Then, replace the header:

    samtools reheader header.txt INPUT.bam > OUTPUT.bam

    Best,

    Genevieve

    0
    Comment actions Permalink
  • Avatar
    Robinn Teoh

    Dear Genevieve,

    Thank you for the explanation! It is very weird to see the whole read group got filtered out before the BQSR step though, but I do realize compared to my DNBSEQ bam files, my Illumina bam files are way smaller post conversion from sam to bam with the following command

    samtools fixmate -O BAM -@32 Kuu.aln.sam Kuu.fix.bam

    will the fixmate tools actually filter the reads off? Because the subsequent steps were just sortsam and mark duplicates, which only imposes filtering functions in BQSR right?

    On the other hand, if I were to remove the read group from the header and proceed with the analysis, what exactly does it mean? That the filtered read group were all duplicates and the remaining ones are the true reads? Will proceeding with bqsr and haploytype caller be successful?

    So sorry for the silly questions, I am still pretty new in dealing with NGS and you guys are literally my only teachers.

    Best wishes,

    Robinn

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Hi Robinn,

    I can really only provide information/support for GATK Tools. I do know that MarkDuplicates does not remove reads, it just marks them

    If you remove the read group from the header, you should be able to proceed just fine with BQSR and HaplotypeCaller. I'm not sure if they were duplicates though, you'll have to look more in depth into your files and what happened in your analysis.

    Best,

    Genevieve

     

    0
    Comment actions Permalink
  • Avatar
    Robinn Teoh

    Dear Genevieve,

    Understood. I will retry the method which you have suggested above and get back to you when I am done with it.

    Thank you for your support throughout the whole conundrum!

    Best wishes,

    Robinn 

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    No problem, glad we could help!

    0
    Comment actions Permalink
  • Avatar
    Robinn Teoh

    Dear Genevieve,

    Just to confirm, I was supposed to apply BQSR table from my previous bam file to the new reheaded bam file, yes no?

    As when I try to run BaseRecalibrator on my reheaded bam file i got an error of:

    A USER ERROR has occurred: Number of read groups must be >= 1, but is 0

    Best wishes,

    Robinn

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Hi Robinn, 

    I'm not sure I understand your question, could you clarify?

    Genevieve

    0
    Comment actions Permalink
  • Avatar
    Robinn Teoh

    Dear Genevieve,

    So sorry for the confusion. I will clarify the situation:

    I have a BAM file (A) which holds the original header for the read group, one which I performed BaseRecalibrator on which produced a recalibrated table, yet failed to be applied with BQSR.

    Then I have a BAM file (B) where the read group header has been removed via deletion using samtools rehead, one which when I perform BaseRecalibrator does not work and produces the error below:

    A USER ERROR has occurred: Number of read groups must be >= 1, but is 0

    So my question now is, should I be (1) applying BQSR to BAM file (B) using the recal table retrieved from BaseRecalibrator of BAM file (A), or (2) should I be creating a new recal table using BAM file (B) which I might be making a mistake?

    In the case of (1), where I apply the recalibrated table of my original readgroup BAM file(A) to the deheaded BAM file(B), I got the logs shown below:

    09:29:58.753 INFO ApplyBQSR - ------------------------------------------------------------
    09:29:58.753 INFO ApplyBQSR - The Genome Analysis Toolkit (GATK) v4.1.9.0
    09:29:58.754 INFO ApplyBQSR - For support and documentation go to https://software.broadinstitute.org/gatk/
    09:29:58.754 INFO ApplyBQSR - Executing as naika@DESKTOP-TMKR2LT on Linux v4.19.128-microsoft-standard amd64
    09:29:58.754 INFO ApplyBQSR - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
    09:29:58.754 INFO ApplyBQSR - Start Date/Time: January 29, 2021 9:29:58 AM JST
    09:29:58.754 INFO ApplyBQSR - ------------------------------------------------------------
    09:29:58.754 INFO ApplyBQSR - ------------------------------------------------------------
    09:29:58.754 INFO ApplyBQSR - HTSJDK Version: 2.23.0
    09:29:58.754 INFO ApplyBQSR - Picard Version: 2.23.3
    09:29:58.754 INFO ApplyBQSR - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    09:29:58.754 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    09:29:58.755 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    09:29:58.755 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    09:29:58.755 INFO ApplyBQSR - Deflater: IntelDeflater
    09:29:58.755 INFO ApplyBQSR - Inflater: IntelInflater
    09:29:58.755 INFO ApplyBQSR - GCS max retries/reopens: 20
    09:29:58.755 INFO ApplyBQSR - Requester pays: disabled
    09:29:58.755 INFO ApplyBQSR - Initializing engine
    09:29:59.191 INFO ApplyBQSR - Done initializing engine
    09:29:59.226 INFO ProgressMeter - Starting traversal
    09:29:59.227 INFO ProgressMeter - Current Locus Elapsed Minutes Reads Processed Reads/Minute
    09:29:59.500 INFO ApplyBQSR - Shutting down engine
    [January 29, 2021 9:29:59 AM JST] org.broadinstitute.hellbender.tools.walkers.bqsr.ApplyBQSR done. Elapsed time: 0.02 minutes.
    Runtime.totalMemory()=972029952
    java.lang.NullPointerException
    at org.broadinstitute.hellbender.utils.recalibration.covariates.ReadGroupCovariate.getID(ReadGroupCovariate.java:65)
    at org.broadinstitute.hellbender.utils.recalibration.covariates.ReadGroupCovariate.recordValues(ReadGroupCovariate.java:52)
    at org.broadinstitute.hellbender.utils.recalibration.covariates.StandardCovariateList.recordAllValuesInStorage(StandardCovariateList.java:133)
    at org.broadinstitute.hellbender.utils.recalibration.RecalUtils.computeCovariates(RecalUtils.java:546)
    at org.broadinstitute.hellbender.utils.recalibration.RecalUtils.computeCovariates(RecalUtils.java:527)
    at org.broadinstitute.hellbender.transformers.BQSRReadTransformer.apply(BQSRReadTransformer.java:145)
    at org.broadinstitute.hellbender.transformers.BQSRReadTransformer.apply(BQSRReadTransformer.java:27)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
    at org.broadinstitute.hellbender.engine.ReadWalker.traverse(ReadWalker.java:94)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1049)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)

    I feel like the recalibrated table itself is somehow much smaller in size at 64kb compared to the files in my successful runs at 900 to 1400kb in size? Or does it actually not matter?

    Please do let me know if it is still confusing.

    Thank you!

    Best wishes,

    Robinn

     

     

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Hi Robinn,

    I believe option 2) is what you want to do. It is possible you are making a mistake. Make sure you are only removing the problematic read groups when you do the reheader, not all read groups.

    Genevieve

    0
    Comment actions Permalink
  • Avatar
    Robinn Teoh

    Dear  Genevieve,

    What if that is the only read group I have in my BAM file?

    @RG ID:PG3847_01_a LB:PG3847_01_A3535_H1 PL:Illumina PU:H753JDSXY.2 PM:Novaseq6000 SM:KuuKuu_tm_R1.fastq.gz
    @PG ID:bwa PN:bwa VN:0.7.12-r1039 CL:bwa mem cf3.1toplevel -t 16 -R @RG\tID:PG3847_01_a\tLB:PG3847_01_A3535_H1\tPL:Illumina\tPU:H753JDSXY.2 \tPM:Novaseq6000\tSM:KuuKuu_tm_R1.fastq.gz Kuu_tm_R2.fastq.gz
    @PG ID:MarkDuplicates VN:2.23.8 CL:MarkDuplicates INPUT=[Kuu.sorted.bam] OUTPUT=Kuu_sorted_dedup.bam METRICS_FILE=Kuu_markeddup.txt MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 TAG_DUPLICATE_SET_MEMBERS=false REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag CLEAR_DT=true DUPLEX_UMI=false ADD_PG_TAG_TO_READS=true REMOVE_DUPLICATES=false ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture of last three ':' separated fields as numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 MAX_OPTICAL_DUPLICATE_SET_SIZE=300000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false PN:MarkDuplicates
    @PG ID:samtools PN:samtools PP:bwa VN:1.10 CL:samtools view -H Kuu_sorted_dedup.bam
    @PG ID:samtools.1 PN:samtools PP:MarkDuplicates VN:1.10 CL:samtools view -H Kuu_sorted_dedup.bam

    If I remove @RG from this header I am indeed left with no read group any more though, as there is only 1 read group in each BAM file. Or perhaps there is something wrong with my read group input during bwa alignment? 

    Robinn

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Hi Robinn Teoh,

    That doesn't seem to make sense. Earlier in our thread, when you ran FlagStat, it showed that many of the reads in your BAM were from a different read group. 

    14:46:10.996 INFO ProgressMeter - Starting traversal
    14:46:10.997 INFO ProgressMeter - Current Locus Elapsed Minutes Reads Processed Reads/Minute
    15:08:38.292 INFO FlagStat - 0 read(s) filtered by: WellformedReadFilter
    689238729 read(s) filtered by: ReadGroupReadFilter
    0 read(s) filtered by: NotSecondaryAlignmentReadFilter
    689238729 total reads filtered
    15:08:38.293 INFO ProgressMeter - unmapped 22.5 0 0.0
    15:08:38.293 INFO ProgressMeter - Traversal complete. Processed 0 total reads in 22.5 minutes.
    15:08:38.293 INFO FlagStat - Shutting down engine
    [January 23, 2021 3:08:38 PM JST] org.broadinstitute.hellbender.tools.FlagStat done. Elapsed time: 22.46 minutes.
    Runtime.totalMemory()=571473920
    Tool returned:
    0 in total

    The command:

    ./gatk FlagStat -I Kuu_sorted_dedup.bam -RF ReadGroupReadFilter --keep-read-group H753JDSXY.2

    In that above stack trace, I can see that there are other read groups because the ReadGroupReadFilter is removing all reads not in the read group you specified. This is 689238729 reads. There are no reads left once all other read groups are filtered out.

    Could you investigate this further and try to determine what is going on?

    Genevieve

    0
    Comment actions Permalink
  • Avatar
    Robinn Teoh

    Dear Genevieve Brandt (she/her),

    I am not too sure how should I be going about to investigate, but I will show you what the logs of my command exports:

    samtools view -H Kuu_sorted_dedup.bam > Kuuheader.txt

    which gives the txt file: (you probably want to scroll down to the end)

    @HD VN:1.6 SO:coordinate
    ~ summarized
    @SQ SN:AAEX03026071.1 LN:1573
    @SQ SN:AAEX03026072.1 LN:1115
    @RG ID:PG3847_01_a LB:PG3847_01_A3535_H1 PL:Illumina PU:H753JDSXY.2 PM:Novaseq6000 SM:KuuKuu_tm_R1.fastq.gz
    @PG ID:bwa PN:bwa VN:0.7.12-r1039 CL:bwa mem cf3.1toplevel -t 16 -R @RG\tID:PG3847_01_a\tLB:PG3847_01_A3535_H1\tPL:Illumina\tPU:H753JDSXY.2 \tPM:Novaseq6000\tSM:KuuKuu_tm_R1.fastq.gz Kuu_tm_R2.fastq.gz
    @PG ID:MarkDuplicates VN:2.23.8 CL:MarkDuplicates INPUT=[Kuu.sorted.bam] OUTPUT=Kuu_sorted_dedup.bam METRICS_FILE=Kuu_markeddup.txt MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 TAG_DUPLICATE_SET_MEMBERS=false REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag CLEAR_DT=true DUPLEX_UMI=false ADD_PG_TAG_TO_READS=true REMOVE_DUPLICATES=false ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture of last three ':' separated fields as numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 MAX_OPTICAL_DUPLICATE_SET_SIZE=300000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false PN:MarkDuplicates
    @PG ID:samtools PN:samtools PP:bwa VN:1.10 CL:samtools view -H Kuu_sorted_dedup.bam
    @PG ID:samtools.1 PN:samtools PP:MarkDuplicates VN:1.10 CL:samtools view -H Kuu_sorted_dedup.bam

     

    I dont' see another @RG line anywhere though... Any idea what else I should do next?

    Robinn

    0
    Comment actions Permalink

Please sign in to leave a comment.

Powered by Zendesk