
My recalibration table from BaseRecalibrator is empty


  • Genevieve Brandt (she/her)

    Hi Calum Tattersfield,

    Could you please share your complete program log output from BaseRecalibrator?
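
    If you ran the tool from a script, redirecting standard output and standard error to a file will capture the complete log. As a rough sketch (with your own arguments in place of the "..."):

    gatk BaseRecalibrator ... > baserecalibrator.log 2>&1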

    Best,

    Genevieve

  • Calum Tattersfield

    Thanks for your reply, Genevieve! Here is what I'm seeing from BaseRecalibrator; the table it produces is empty (as shown), and when I then go to ApplyBQSR I get the read group error.

    Recalibrating bases
    Using GATK jar /n/app/gatk/4.1.9.0/gatk-package-4.1.9.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /n/app/gatk/4.1.9.0/gatk-package-4.1.9.0-local.jar BaseRecalibrator -I /n/scratch3/users/c/ct194/diskUsageTest/102_002_002_test/MDuBQSR_outputs/BQSR_temp/102_002_002_test_001_markedDupes.bam -R /n/scratch3/users/c/ct194/gatk/index/GRCh38.primary_assembly.genome.fa --known-sites /n/scratch3/users/c/ct194/gatk/common_vcfs/dbsnp_GRch38-common_all.vcf -O /n/scratch3/users/c/ct194/diskUsageTest/102_002_002_test/MDuBQSR_outputs/BQSR_temp/102_002_002_test_001_recal_data.table
    12:57:49.870 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/n/app/gatk/4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Aug 26, 2021 12:57:50 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    12:57:50.054 INFO BaseRecalibrator - ------------------------------------------------------------
    12:57:50.055 INFO BaseRecalibrator - The Genome Analysis Toolkit (GATK) v4.1.9.0
    12:57:50.055 INFO BaseRecalibrator - For support and documentation go to https://software.broadinstitute.org/gatk/
    12:57:50.055 INFO BaseRecalibrator - Executing as ct194@compute-e-16-231.o2.rc.hms.harvard.edu on Linux v3.10.0-1062.el7.x86_64 amd64
    12:57:50.055 INFO BaseRecalibrator - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_112-b15
    12:57:50.055 INFO BaseRecalibrator - Start Date/Time: August 26, 2021 12:57:49 PM EDT
    12:57:50.055 INFO BaseRecalibrator - ------------------------------------------------------------
    12:57:50.055 INFO BaseRecalibrator - ------------------------------------------------------------
    12:57:50.055 INFO BaseRecalibrator - HTSJDK Version: 2.23.0
    12:57:50.056 INFO BaseRecalibrator - Picard Version: 2.23.3
    12:57:50.056 INFO BaseRecalibrator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    12:57:50.056 INFO BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    12:57:50.056 INFO BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    12:57:50.056 INFO BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    12:57:50.056 INFO BaseRecalibrator - Deflater: IntelDeflater
    12:57:50.056 INFO BaseRecalibrator - Inflater: IntelInflater
    12:57:50.056 INFO BaseRecalibrator - GCS max retries/reopens: 20
    12:57:50.056 INFO BaseRecalibrator - Requester pays: disabled
    12:57:50.056 INFO BaseRecalibrator - Initializing engine
    12:57:50.593 INFO FeatureManager - Using codec VCFCodec to read file file:///n/scratch3/users/c/ct194/gatk/common_vcfs/dbsnp_GRch38-common_all.vcf
    12:57:50.623 INFO BaseRecalibrator - Done initializing engine
    12:57:50.630 INFO BaseRecalibrationEngine - The covariates being used here:
    12:57:50.630 INFO BaseRecalibrationEngine - ReadGroupCovariate
    12:57:50.630 INFO BaseRecalibrationEngine - QualityScoreCovariate
    12:57:50.630 INFO BaseRecalibrationEngine - ContextCovariate
    12:57:50.630 INFO BaseRecalibrationEngine - CycleCovariate
    12:57:50.639 INFO ProgressMeter - Starting traversal
    12:57:50.639 INFO ProgressMeter - Current Locus Elapsed Minutes Reads Processed Reads/Minute
    12:57:58.426 INFO BaseRecalibrator - 8000000 read(s) filtered by: MappingQualityNotZeroReadFilter
    0 read(s) filtered by: MappingQualityAvailableReadFilter
    0 read(s) filtered by: MappedReadFilter
    0 read(s) filtered by: NotSecondaryAlignmentReadFilter
    0 read(s) filtered by: NotDuplicateReadFilter
    0 read(s) filtered by: PassesVendorQualityCheckReadFilter
    0 read(s) filtered by: WellformedReadFilter
    8000000 total reads filtered
    12:57:58.428 INFO ProgressMeter - unmapped 0.1 0 0.0
    12:57:58.428 INFO ProgressMeter - Traversal complete. Processed 0 total reads in 0.1 minutes.
    12:57:58.428 INFO BaseRecalibrator - Calculating quantized quality scores...
    12:57:58.441 INFO BaseRecalibrator - Writing recalibration report...
    12:57:58.484 INFO BaseRecalibrator - ...done!
    12:57:58.485 INFO BaseRecalibrator - BaseRecalibrator was able to recalibrate 0 reads
    12:57:58.485 INFO BaseRecalibrator - Shutting down engine
    [August 26, 2021 12:57:58 PM EDT] org.broadinstitute.hellbender.tools.walkers.bqsr.BaseRecalibrator done. Elapsed time: 0.15 minutes.
    Runtime.totalMemory()=2076049408
    Tool returned:
    SUCCESS

    I realized that I made a typo above. My input file is actually the bam file after marking duplicates.

  • Genevieve Brandt (she/her)

    Thanks for sharing that! It looks like a large number of your reads (8,000,000) were filtered out by the MappingQualityNotZeroReadFilter, and BaseRecalibrator is finishing very quickly (0.15 minutes). How many reads did you start with in your marked-duplicates input bam? If no reads are left after filtering, the recalibration table has no entry for your read group, which is probably why ApplyBQSR reports the read group error.
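
    A quick way to check both numbers, assuming samtools is available, is to compare the total read count with the number of reads that have a non-zero mapping quality (using the bam path from your BaseRecalibrator command):

    # Total reads in the marked-duplicates bam
    samtools view -c 102_002_002_test_001_markedDupes.bam

    # Reads with MAPQ >= 1, i.e. roughly what MappingQualityNotZeroReadFilter would keep
    samtools view -c -q 1 102_002_002_test_001_markedDupes.bam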

  • Calum Tattersfield

    So it looks like I have 8,000,000 reads in the marked-duplicates bam file as well, matching the number that was filtered.

    # Read count
    samtools view -c 102_002_002_test_001_markedDupes.bam
    8000000

    # First 3 rows
    samtools view 102_002_002_test_001_markedDupes.bam | head -n 3
    BFC08P1:47:C5KAUACXX:1:1101:10000:14834 77 * 0 0 * * 0 0 GTGTATGCTCCCAGCAGCAACGGAGGTTCAGGCAAGATGCCCGAAGGAGGGAAGGGTGACAAGGGCAGTGGGGAGA BB@FFFFFHHHHHJJJJJJJJJJJJJ?DHIIGJIJJJJJJJJJGHGI@FHIEHEBD?BBCEEEDDDBB<CDDD?B5 PG:Z:bwa RG:Z:C5KAU.1
    BFC08P1:47:C5KAUACXX:1:1101:10000:14834 141 * 0 0 * * 0 0 GGTGACAGAGCGAGACTCTGTCTCAAAAAATACAATACAATACAATACAATACAGAAAGAAAGGTGTGTTCCTCCC @@?DFFFFHHHHGJJJJJJJGHIJJJJJJJJJJJJJJJJIGIIJJJJJJJGGIIJIIJJGHIJHAEAHFFFFFFDD PG:Z:bwa RG:Z:C5KAU.1
    BFC08P1:47:C5KAUACXX:1:1101:10000:18447 77 * 0 0 * * 0 0 TAGCATTATATGAAAAATCCCGTTTCCAACGAAGGCCACAAAGAGGTCCAAATATCCACTTGCAGATTCTGCAAAA CCCFFFFFFFHHHJDGIJJGIHCFGGGHIJGIIIIGGIJJ<HGHGI<DFFDHIIJBGHIGIIJBHGIGBDEHD:AC PG:Z:bwa RG:Z:C5KAU.1

    Is it usual for that many reads to be filtered by MappingQualityNotZeroReadFilter?

    Additionally, my mergedAlignmentBam.bam, which I used as input for marking duplicates, also has 8,000,000 reads:

    samtools view -c 102_002_002_test_001_mergedAlignmentBam.bam
    8000000
  • Genevieve Brandt (she/her)

    This read filter removed all of your reads because they have a mapping quality of zero. You can see it in the records you posted as well: FLAG values 77 and 141 together with RNAME "*" and MAPQ 0 mean those reads are unmapped. Most likely something went wrong during your mapping step, so I would check there for issues.
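
    If it helps, two quick ways to confirm this on the marked-duplicates bam (a sketch; adjust the path to your setup):

    # Mapping summary: the "mapped" line should be close to 100% for a healthy alignment
    samtools flagstat 102_002_002_test_001_markedDupes.bam

    # Tally the MAPQ column (field 5); if every read shows MAPQ 0, the alignment step needs to be revisited
    samtools view 102_002_002_test_001_markedDupes.bam | cut -f5 | sort -n | uniq -c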

