My recalibration table from BaseRecalibrator is empty
GATK version 4.1.9
c) Why do I see (......)? Nothing in my recalibration table
I have been following this pipeline. But my baserecalibrator does not seem to be working. See below.
My function call is:
gatk BaseRecalibrator \
-I 102_002_002_test_001_mergedAlignmentBam.bam \
-R GRCh38.primary_assembly.genome.fa \
--known-sites dbsnp_GRch38-common_all.vcf \
-O 102_002_002_test_001_recal_data.table
My table looks like this:
#:GATKReport.v1.1:5
#:GATKTable:2:17:%s:%s:;
#:GATKTable:Arguments:Recalibration argument collection values used in this run
Argument Value
binary_tag_name null
covariate ReadGroupCovariate,QualityScoreCovariate,ContextCovariate,CycleCovariate
default_platform null
deletions_default_quality 45
force_platform null
indels_context_size 3
insertions_default_quality 45
low_quality_tail 2
maximum_cycle_value 500
mismatches_context_size 2
mismatches_default_quality -1
no_standard_covs false
quantizing_levels 16
recalibration_report null
run_without_dbsnp false
solid_nocall_strategy THROW_EXCEPTION
solid_recal_mode SET_Q_ZERO
#:GATKTable:3:94:%d:%d:%d:;
#:GATKTable:Quantized:Quality quantization map
QualityScore Count QuantizedScore
0 0 93
1 0 93
2 0 93
3 0 93
4 0 93
5 0 93
6 0 93
7 0 93
8 0 93
9 0 93
10 0 93
11 0 93
12 0 93
13 0 93
14 0 93
15 0 93
16 0 93
17 0 93
18 0 93
19 0 93
20 0 93
21 0 93
22 0 93
23 0 93
24 0 93
25 0 93
26 0 93
27 0 93
28 0 93
29 0 93
30 0 93
31 0 93
32 0 93
33 0 93
34 0 93
35 0 93
36 0 93
37 0 93
38 0 93
39 0 93
40 0 93
41 0 93
42 0 93
43 0 93
44 0 93
45 0 93
46 0 93
47 0 93
48 0 93
49 0 93
50 0 93
51 0 93
52 0 93
53 0 93
54 0 93
55 0 93
56 0 93
57 0 93
58 0 93
59 0 93
60 0 93
61 0 93
62 0 93
63 0 93
64 0 93
65 0 93
66 0 93
67 0 93
68 0 93
69 0 93
70 0 93
71 0 93
72 0 93
73 0 93
74 0 93
75 0 93
76 0 93
77 0 93
78 0 93
79 0 79
80 0 80
81 0 81
82 0 82
83 0 83
84 0 84
85 0 85
86 0 86
87 0 87
88 0 88
89 0 89
90 0 90
91 0 91
92 0 92
93 0 93
#:GATKTable:6:0:%s:%s:%.4f:%.4f:%d:%.2f:;
#:GATKTable:RecalTable0:
ReadGroup EventType EmpiricalQuality EstimatedQReported Observations Errors
#:GATKTable:6:0:%s:%d:%s:%.4f:%d:%.2f:;
#:GATKTable:RecalTable1:
ReadGroup QualityScore EventType EmpiricalQuality Observations Errors
#:GATKTable:8:0:%s:%d:%s:%s:%s:%.4f:%d:%.2f:;
#:GATKTable:RecalTable2:
ReadGroup QualityScore CovariateValue CovariateName EventType EmpiricalQuality Observations Errors
While my input file looks like this
samtools view 102_002_002_test_001_mergedAlignmentBam.bam | head -n 3
BFC08P1:47:C5KAUACXX:1:1101:10000:14834 77 * 0 0 * * 0 0 GTGTATGCTCCCAGCAGCAACGGAGGTTCAGGCAAGATGCCCGAAGGAGGGAAGGGTGACAAGGGCAGTGGGGAGA BB@FFFFFHHHHHJJJJJJJJJJJJJ?DHIIGJIJJJJJJJJJGHGI@FHIEHEBD?BBCEEEDDDBB<CDDD?B5 PG:Z:bwa RG:Z:C5KAU.1
BFC08P1:47:C5KAUACXX:1:1101:10000:14834 141 * 0 0 * * 0 0 GGTGACAGAGCGAGACTCTGTCTCAAAAAATACAATACAATACAATACAATACAGAAAGAAAGGTGTGTTCCTCCC @@?DFFFFHHHHGJJJJJJJGHIJJJJJJJJJJJJJJJJIGIIJJJJJJJGGIIJIIJJGHIJHAEAHFFFFFFDD PG:Z:bwa RG:Z:C5KAU.1
BFC08P1:47:C5KAUACXX:1:1101:10000:18447 77 * 0 0 * * 0 0 TAGCATTATATGAAAAATCCCGTTTCCAACGAAGGCCACAAAGAGGTCCAAATATCCACTTGCAGATTCTGCAAAA CCCFFFFFFFHHHJDGIJJGIHCFGGGHIJGIIIIGGIJJ<HGHGI<DFFDHIIJBGHIGIIJBHGIGBDEHD:AC PG:Z:bwa RG:Z:C5KAU.1
When I ApplyBQSR it isn't able to recognize readgroups, but they're in the file - RG:Z:C5KAU.1. Is there something obvious that I've messed up here, or would cause this kind of an output?
-
Could you please share your complete program log output from BaseRecalibrator?
Best,
Genevieve
-
Thanks for your reply Genevieve! Here is what I'm seeing for after BaseRecalibrator but the table is empty (as shown), and then when I go to ApplyBQSR I get the read group error.
Recalibrating bases
Using GATK jar /n/app/gatk/4.1.9.0/gatk-package-4.1.9.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /n/app/gatk/4.1.9.0/gatk-package-4.1.9.0-local.jar BaseRecalibrator -I /n/scratch3/users/c/ct194/diskUsageTest/102_002_002_test/MDuBQSR_outputs/BQSR_temp/102_002_002_test_001_markedDupes.bam -R /n/scratch3/users/c/ct194/gatk/index/GRCh38.primary_assembly.genome.fa --known-sites /n/scratch3/users/c/ct194/gatk/common_vcfs/dbsnp_GRch38-common_all.vcf -O /n/scratch3/users/c/ct194/diskUsageTest/102_002_002_test/MDuBQSR_outputs/BQSR_temp/102_002_002_test_001_recal_data.table
12:57:49.870 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/n/app/gatk/4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Aug 26, 2021 12:57:50 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
12:57:50.054 INFO BaseRecalibrator - ------------------------------------------------------------
12:57:50.055 INFO BaseRecalibrator - The Genome Analysis Toolkit (GATK) v4.1.9.0
12:57:50.055 INFO BaseRecalibrator - For support and documentation go to https://software.broadinstitute.org/gatk/
12:57:50.055 INFO BaseRecalibrator - Executing as ct194@compute-e-16-231.o2.rc.hms.harvard.edu on Linux v3.10.0-1062.el7.x86_64 amd64
12:57:50.055 INFO BaseRecalibrator - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_112-b15
12:57:50.055 INFO BaseRecalibrator - Start Date/Time: August 26, 2021 12:57:49 PM EDT
12:57:50.055 INFO BaseRecalibrator - ------------------------------------------------------------
12:57:50.055 INFO BaseRecalibrator - ------------------------------------------------------------
12:57:50.055 INFO BaseRecalibrator - HTSJDK Version: 2.23.0
12:57:50.056 INFO BaseRecalibrator - Picard Version: 2.23.3
12:57:50.056 INFO BaseRecalibrator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
12:57:50.056 INFO BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
12:57:50.056 INFO BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
12:57:50.056 INFO BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
12:57:50.056 INFO BaseRecalibrator - Deflater: IntelDeflater
12:57:50.056 INFO BaseRecalibrator - Inflater: IntelInflater
12:57:50.056 INFO BaseRecalibrator - GCS max retries/reopens: 20
12:57:50.056 INFO BaseRecalibrator - Requester pays: disabled
12:57:50.056 INFO BaseRecalibrator - Initializing engine
12:57:50.593 INFO FeatureManager - Using codec VCFCodec to read file file:///n/scratch3/users/c/ct194/gatk/common_vcfs/dbsnp_GRch38-common_all.vcf
12:57:50.623 INFO BaseRecalibrator - Done initializing engine
12:57:50.630 INFO BaseRecalibrationEngine - The covariates being used here:
12:57:50.630 INFO BaseRecalibrationEngine - ReadGroupCovariate
12:57:50.630 INFO BaseRecalibrationEngine - QualityScoreCovariate
12:57:50.630 INFO BaseRecalibrationEngine - ContextCovariate
12:57:50.630 INFO BaseRecalibrationEngine - CycleCovariate
12:57:50.639 INFO ProgressMeter - Starting traversal
12:57:50.639 INFO ProgressMeter - Current Locus Elapsed Minutes Reads Processed Reads/Minute
12:57:58.426 INFO BaseRecalibrator - 8000000 read(s) filtered by: MappingQualityNotZeroReadFilter
0 read(s) filtered by: MappingQualityAvailableReadFilter
0 read(s) filtered by: MappedReadFilter
0 read(s) filtered by: NotSecondaryAlignmentReadFilter
0 read(s) filtered by: NotDuplicateReadFilter
0 read(s) filtered by: PassesVendorQualityCheckReadFilter
0 read(s) filtered by: WellformedReadFilter
8000000 total reads filtered
12:57:58.428 INFO ProgressMeter - unmapped 0.1 0 0.0
12:57:58.428 INFO ProgressMeter - Traversal complete. Processed 0 total reads in 0.1 minutes.
12:57:58.428 INFO BaseRecalibrator - Calculating quantized quality scores...
12:57:58.441 INFO BaseRecalibrator - Writing recalibration report...
12:57:58.484 INFO BaseRecalibrator - ...done!
12:57:58.485 INFO BaseRecalibrator - BaseRecalibrator was able to recalibrate 0 reads
12:57:58.485 INFO BaseRecalibrator - Shutting down engine
[August 26, 2021 12:57:58 PM EDT] org.broadinstitute.hellbender.tools.walkers.bqsr.BaseRecalibrator done. Elapsed time: 0.15 minutes.
Runtime.totalMemory()=2076049408
Tool returned:
SUCCESSI realized that I made a typo above. My input file is actually the bam file after marking duplicates.
-
Thanks for sharing that! It looks like a lot of your reads (8,000,000) were filtered by the MappingQualityNotZeroReadFilter. BaseRecalibrator is also finishing very quickly (0.15 minutes). How many reads did you start with in your marked duplicates input bam? If you don't have any reads left after filtering, that is probably why you are getting the read group error because it does not exist.
-
So it looks like I have 8000000 reads in the marked duplicates bam file as well.
# Read count
samtools view -c 102_002_002_test_001_markedDupes.bam
8000000
# First 3 rows
samtools view 102_002_002_test_001_markedDupes.bam | head -n 3
BFC08P1:47:C5KAUACXX:1:1101:10000:14834 77 * 0 0 * * 0 0 GTGTATGCTCCCAGCAGCAACGGAGGTTCAGGCAAGATGCCCGAAGGAGGGAAGGGTGACAAGGGCAGTGGGGAGA BB@FFFFFHHHHHJJJJJJJJJJJJJ?DHIIGJIJJJJJJJJJGHGI@FHIEHEBD?BBCEEEDDDBB<CDDD?B5 PG:Z:bwa RG:Z:C5KAU.1
BFC08P1:47:C5KAUACXX:1:1101:10000:14834 141 * 0 0 * * 0 0 GGTGACAGAGCGAGACTCTGTCTCAAAAAATACAATACAATACAATACAATACAGAAAGAAAGGTGTGTTCCTCCC @@?DFFFFHHHHGJJJJJJJGHIJJJJJJJJJJJJJJJJIGIIJJJJJJJGGIIJIIJJGHIJHAEAHFFFFFFDD PG:Z:bwa RG:Z:C5KAU.1
BFC08P1:47:C5KAUACXX:1:1101:10000:18447 77 * 0 0 * * 0 0 TAGCATTATATGAAAAATCCCGTTTCCAACGAAGGCCACAAAGAGGTCCAAATATCCACTTGCAGATTCTGCAAAA CCCFFFFFFFHHHJDGIJJGIHCFGGGHIJGIIIIGGIJJ<HGHGI<DFFDHIIJBGHIGIIJBHGIGBDEHD:AC PG:Z:bwa RG:Z:C5KAU.1Is it usual for it to filter that many reads with MappingQualityNotZero?
Additionally my mergedAlignmentBam.bam which I used as input for marking duplicates also has 8000000 reads.
samtools view -c 102_002_002_test_001_mergedAlignmentBam.bam
8000000 -
This read filter filtered out all your reads because they have a mapping quality of zero. Most likely something went wrong during your mapping step, so you should check at that step for issues.
Please sign in to leave a comment.
5 comments