BaseRecalibrator tool recalibration table
Dear GATK,
I have run the GATK BaseRecalibrator tool on my BAM files, and I noticed that the Recal_data.table files generated for all 97 BAM files are almost exactly the same size, between 111005 and 111100 bytes. This seems very strange: when I processed other samples, the Recal_data.table files always varied considerably in size.
For recalibration I used the following files from the GATK bundle:
--known-sites Homo_sapiens_assembly38.known_indels.vcf
--known-sites 1000G_phase1.snps.high_confidence.hg38.vcf
--known-sites Mills_and_1000G_gold_standard.indels.hg38.vcf
The only thing I changed in my pipeline compared to previous runs is the FASTA file used to align the FASTQ files to the reference genome. Previously I used the hg38 FASTA available in the GATK bundle; this time I used a FASTA file that does not contain the ALT haplotypes: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.26_GRCh38/GRCh38_major_release_seqs_for_alignment_pipelines/
Are the known sites compatible with my new fasta file (hg38)?
REQUIRED for all errors and issues:
a) GATK version used: The Genome Analysis Toolkit (GATK) v4.2.3.0
b) Exact command used:
/lustrehome/sharonnatashacox/.conda/envs/venv2/bin/gatk BaseRecalibrator -R /lustrehome/sharonnatashacox/ex_storage_2021/Chromosomes/NO_ALT_GENOME/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna -I /lustre/parkinsongiudice/PESCE/110_EGF/OUT_sorted_nodup_prefix.bam -O Recal_data110_manua,table --known-sites /lustrehome/sharonnatashacox/ex_storage_2021/Chromosomes/GATKbundle/Homo_sapiens_assembly38.known_indels.vcf --known-sites /lustrehome/sharonnatashacox/ex_storage_2021/Chromosomes/GATKbundle/1000G_phase1.snps.high_confidence.hg38.vcf --known-sites /lustrehome/sharonnatashacox/ex_storage_2021/Chromosomes/GATKbundle/Mills_and_1000G_gold_standard.indels.hg38.vcf --java-options -DGATK_STACKTRACE_ON_USER_EXCEPTION=true; done
c) Entire program log:
/OUT_sorted_nodup_prefix.bam
Using GATK jar /lustrehome/sharonnatashacox/.conda/envs/venv2/share/gatk4-4.2.3.0-1/gatk-package-4.2.3.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -jar /lustrehome/sharonnatashacox/.conda/envs/venv2/share/gatk4-4.2.3.0-1/gatk-package-4.2.3.0-local.jar BaseRecalibrator -R /lustrehome/sharonnatashacox/ex_storage_2021/Chromosomes/NO_ALT_GENOME/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna -I /lustre/parkinsongiudice/PESCE/110_EGF/OUT_sorted_nodup_prefix.bam -O /lustre/parkinsongiudice/PESCE/110_EGF/Recal_data.table --known-sites /lustrehome/sharonnatashacox/ex_storage_2021/Chromosomes/GATKbundle/Homo_sapiens_assembly38.known_indels.vcf --known-sites /lustrehome/sharonnatashacox/ex_storage_2021/Chromosomes/GATKbundle/1000G_phase1.snps.high_confidence.hg38.vcf --known-sites /lustrehome/sharonnatashacox/ex_storage_2021/Chromosomes/GATKbundle/Mills_and_1000G_gold_standard.indels.hg38.vcf
17:32:55.690 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/lustrehome/sharonnatashacox/.conda/envs/venv2/share/gatk4-4.2.3.0-1/gatk-package-4.2.3.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jul 22, 2022 5:32:55 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
17:32:55.958 INFO BaseRecalibrator - ------------------------------------------------------------
17:32:55.959 INFO BaseRecalibrator - The Genome Analysis Toolkit (GATK) v4.2.3.0
17:32:55.959 INFO BaseRecalibrator - For support and documentation go to https://software.broadinstitute.org/gatk/
17:32:55.959 INFO BaseRecalibrator - Executing as sharonnatashacox@ui02.recas.ba.infn.it on Linux v3.10.0-1160.71.1.el7.x86_64 amd64
17:32:55.959 INFO BaseRecalibrator - Java runtime: OpenJDK 64-Bit Server VM v11.0.8-internal+0-adhoc..src
17:32:55.959 INFO BaseRecalibrator - Start Date/Time: July 22, 2022 at 5:32:55 PM CEST
17:32:55.960 INFO BaseRecalibrator - ------------------------------------------------------------
17:32:55.960 INFO BaseRecalibrator - ------------------------------------------------------------
17:32:55.961 INFO BaseRecalibrator - HTSJDK Version: 2.24.1
17:32:55.961 INFO BaseRecalibrator - Picard Version: 2.25.4
17:32:55.961 INFO BaseRecalibrator - Built for Spark Version: 2.4.5
17:32:55.961 INFO BaseRecalibrator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
17:32:55.961 INFO BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
17:32:55.961 INFO BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
17:32:55.961 INFO BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
17:32:55.962 INFO BaseRecalibrator - Deflater: IntelDeflater
17:32:55.962 INFO BaseRecalibrator - Inflater: IntelInflater
17:32:55.962 INFO BaseRecalibrator - GCS max retries/reopens: 20
17:32:55.962 INFO BaseRecalibrator - Requester pays: disabled
17:32:55.962 INFO BaseRecalibrator - Initializing engine
17:32:56.615 INFO FeatureManager - Using codec VCFCodec to read file file:///lustrehome/sharonnatashacox/ex_storage_2021/Chromosomes/GATKbundle/Homo_sapiens_assembly38.known_indels.vcf
17:32:56.755 INFO FeatureManager - Using codec VCFCodec to read file file:///lustrehome/sharonnatashacox/ex_storage_2021/Chromosomes/GATKbundle/1000G_phase1.snps.high_confidence.hg38.vcf
17:32:57.266 INFO FeatureManager - Using codec VCFCodec to read file file:///lustrehome/sharonnatashacox/ex_storage_2021/Chromosomes/GATKbundle/Mills_and_1000G_gold_standard.indels.hg38.vcf
17:32:57.456 INFO BaseRecalibrator - Done initializing engine
17:32:57.461 INFO BaseRecalibrationEngine - The covariates being used here:
17:32:57.461 INFO BaseRecalibrationEngine - ReadGroupCovariate
17:32:57.461 INFO BaseRecalibrationEngine - QualityScoreCovariate
17:32:57.461 INFO BaseRecalibrationEngine - ContextCovariate
17:32:57.461 INFO BaseRecalibrationEngine - CycleCovariate
17:32:57.470 INFO ProgressMeter - Starting traversal
17:32:57.471 INFO ProgressMeter - Current Locus Elapsed Minutes Reads Processed Reads/Minute
17:33:07.509 INFO ProgressMeter - chr1:6072168 0.2 177000 1058296.0
17:33:17.532 INFO ProgressMeter - chr1:11112939 0.3 382000 1142572.3
......
18:32:17.262 INFO ProgressMeter - chrX:153949283 59.3 72859000 1228032.8
18:32:27.295 INFO ProgressMeter - chr22_KI270734v1_random:82127 59.5 73109000 1228783.3
18:32:29.924 WARN IntelInflater - Zero Bytes Written : 0
18:32:29.927 INFO BaseRecalibrator - 5103426 read(s) filtered by: MappingQualityNotZeroReadFilter
0 read(s) filtered by: MappingQualityAvailableReadFilter
0 read(s) filtered by: MappedReadFilter
0 read(s) filtered by: NotSecondaryAlignmentReadFilter
0 read(s) filtered by: NotDuplicateReadFilter
0 read(s) filtered by: PassesVendorQualityCheckReadFilter
0 read(s) filtered by: WellformedReadFilter
5103426 total reads filtered
18:32:29.927 INFO ProgressMeter - chrUn_GL000218v1:160879 59.5 73176003 1229003.3
18:32:29.927 INFO ProgressMeter - Traversal complete. Processed 73176003 total reads in 59.5 minutes.
18:32:29.999 INFO BaseRecalibrator - Calculating quantized quality scores...
18:32:30.012 INFO BaseRecalibrator - Writing recalibration report...
18:32:30.814 INFO BaseRecalibrator - ...done!
18:32:30.814 INFO BaseRecalibrator - BaseRecalibrator was able to recalibrate 73176003 reads
18:32:30.814 INFO BaseRecalibrator - Shutting down engine
[July 22, 2022 at 6:32:30 PM CEST] org.broadinstitute.hellbender.tools.walkers.bqsr.BaseRecalibrator done. Elapsed time: 59.59 minutes.
Runtime.totalMemory()=1581252608
Tool returned:
SUCCESS
Thank you for your help.
-
Thanks for writing into the GATK forum! Let's see if we can figure out if there is something wrong here.
It is normal for your recalibration tables to be similar or even identical in size; what matters is that their contents differ. You can run a diff command between two of the files to verify that the contents are different.
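As a rough sketch of that check, the helper below compares two tables with diff and reports the result. The function name compare_tables and the file paths in the usage comment are placeholders for illustration, not anything from your pipeline.

```shell
# Hypothetical helper: compare the contents of two recalibration tables.
# Prints "identical", "different", or "error" (e.g. a file is missing).
compare_tables() {
    # diff -q only reports whether the files differ; its output is discarded
    # and the exit status is used instead (0 = same, 1 = differ, >1 = trouble).
    diff -q "$1" "$2" >/dev/null 2>&1
    case $? in
        0) echo "identical" ;;
        1) echo "different" ;;
        *) echo "error"; return 2 ;;
    esac
}

# Usage sketch (placeholder paths):
#   compare_tables sampleA/Recal_data.table sampleB/Recal_data.table
```

If two samples report "identical", it would be worth checking that the same input BAM was not passed to both runs.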
You can also run the tool AnalyzeCovariates to analyze your recalibration tables to determine if there were issues during BQSR: https://gatk.broadinstitute.org/hc/en-us/articles/5358816130587-AnalyzeCovariates.
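A minimal sketch of such a run is shown here, wrapped in a small function so the table and output paths can be swapped per sample. The function name plot_recal_table and the paths in the usage comment are placeholders; the -bqsr and -plots arguments are the ones described on the AnalyzeCovariates documentation page linked above. As far as I know, producing the plots also requires a working R installation with the plotting packages listed in that documentation.

```shell
# Hypothetical wrapper around AnalyzeCovariates for a single recalibration table.
#   $1 = recalibration table written by BaseRecalibrator
#   $2 = PDF file to write the covariate plots to
plot_recal_table() {
    gatk AnalyzeCovariates -bqsr "$1" -plots "$2"
}

# Usage sketch (placeholder paths):
#   plot_recal_table 110_EGF/Recal_data.table 110_EGF/Recal_data.plots.pdf
```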
I also noticed a possible typo in your output argument (a comma where a period was probably intended), so you may want to check your command: -O Recal_data110_manua,table.
Let me know if you have any questions.
Best,
Genevieve