gatk 4.2.6.1 BaseRecalibrator cycle covariate
Hello, below is the entire program log from running BaseRecalibrator, which shows the GATK version and the full command I ran.
Some background on my project: I have 8 cell cultures from the same individual, plus the HiFi reads that went into the reference genome, and I am using all of them for SNP calling (performed separately for each sample, then jointly).
Because this is a non-model organism, I am using the final SNPs as a stand-in for dbSNP and plan to do 3 rounds of recalibration for each individual sample before joint calling one last time. On the first round, right after joint calling, I get the log below for all sample types.
The problem is that no matter how much I increase the maximum cycle value, the cycle detected is always 1 higher than the value I set. Why is that? How can I fix it?
Using GATK jar /path/to/gatk/4.2.6.1/gatk-package-4.2.6.1-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx4g -Xms4g -XX:ParallelGCThreads=2 -jar /path/to/gatk/4.2.6.1/gatk-package-4.2.6.1-local.jar BaseRecalibrator -I RESULTS/MAPPED_READS/HIFI/hifi_reads.bam -R /share/pool/CompGenomVert/StrixAssembly/RESULTS/CURATED/ref_genome.inter.fa --known-sites RESULTS/MAPPED_READS/HC/allsites.vcf.gz --maximum-cycle-value 20000 -O RESULTS/CALLED/HIFI/recalibration1/hifi_reads.recalc.table
12:21:41.777 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/path/to/gatk/4.2.6.1/gatk-package-4.2.6.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
12:21:41.896 INFO BaseRecalibrator - ------------------------------------------------------------
12:21:41.896 INFO BaseRecalibrator - The Genome Analysis Toolkit (GATK) v4.2.6.1
12:21:41.896 INFO BaseRecalibrator - For support and documentation go to https://software.broadinstitute.org/gatk/
12:21:41.896 INFO BaseRecalibrator - Executing as ychrysostomakis@compute-0-9.local on Linux v3.10.0-1160.105.1.el7.x86_64 amd64
12:21:41.896 INFO BaseRecalibrator - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_231-b11
12:21:41.897 INFO BaseRecalibrator - Start Date/Time: 2. August 2024 12:21:41 MESZ
12:21:41.897 INFO BaseRecalibrator - ------------------------------------------------------------
12:21:41.897 INFO BaseRecalibrator - ------------------------------------------------------------
12:21:41.897 INFO BaseRecalibrator - HTSJDK Version: 2.24.1
12:21:41.897 INFO BaseRecalibrator - Picard Version: 2.27.1
12:21:41.897 INFO BaseRecalibrator - Built for Spark Version: 2.4.5
12:21:41.897 INFO BaseRecalibrator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
12:21:41.897 INFO BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
12:21:41.897 INFO BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
12:21:41.897 INFO BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
12:21:41.897 INFO BaseRecalibrator - Deflater: IntelDeflater
12:21:41.898 INFO BaseRecalibrator - Inflater: IntelInflater
12:21:41.898 INFO BaseRecalibrator - GCS max retries/reopens: 20
12:21:41.898 INFO BaseRecalibrator - Requester pays: disabled
12:21:41.898 INFO BaseRecalibrator - Initializing engine
12:21:42.276 INFO FeatureManager - Using codec VCFCodec to read file file:///path/to/RESULTS/MAPPED_READS/HC/allsites.vcf.gz
12:21:42.359 INFO BaseRecalibrator - Done initializing engine
12:21:42.360 WARN BaseRecalibrator - This tool has only been well tested on ILLUMINA-based sequencing data. For other data use at your own risk.
12:21:42.362 INFO BaseRecalibrationEngine - The covariates being used here:
12:21:42.362 INFO BaseRecalibrationEngine - ReadGroupCovariate
12:21:42.362 INFO BaseRecalibrationEngine - QualityScoreCovariate
12:21:42.362 INFO BaseRecalibrationEngine - ContextCovariate
12:21:42.362 INFO BaseRecalibrationEngine - CycleCovariate
12:21:42.374 INFO ProgressMeter - Starting traversal
12:21:42.374 INFO ProgressMeter - Current Locus Elapsed Minutes Reads Processed Reads/Minute
12:21:43.421 INFO BaseRecalibrator - Shutting down engine
[2. August 2024 12:21:43 MESZ] org.broadinstitute.hellbender.tools.walkers.bqsr.BaseRecalibrator done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=4116185088
***********************************************************************
A USER ERROR has occurred: The maximum allowed value for the cycle is 20000, but a larger cycle (20001) was detected. Please use the --maximum-cycle-value argument (when creating the recalibration table in BaseRecalibrator) to increase this value (at the expense of requiring more memory to run)
***********************************************************************
I get the same error whether or not I use
--java-options "-Xmx4g -Xms4g -XX:ParallelGCThreads=2"
-
Hi Yanis Chrys
HiFi data would definitely have reads longer than this value. Did you check the length distribution of your data using a tool such as FastQC?
-
Hi Gökalp Çelik
Thank you very much for your reply!
I think this speaks to my complete misunderstanding of what the cycle covariate is. I was under the impression that it had to do with model iterations, as it was said to affect memory/runtime. I have checked the read lengths so I have an idea of each sample's read size distribution.
Should this value be set to the maximum read size in the dataset?
-
Hi again.
Yes, that would be appropriate. The default value is 500, which covers most short-read sequencing data types unless one chooses to use 2x300 cycles.
I hope this helps.
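For anyone landing here with the same error, one way to find a suitable value is to scan the BAM for its longest read and pass that number to --maximum-cycle-value. The snippet below is only a sketch: the helper name max_read_length is made up for illustration, it assumes samtools and awk are installed, and it measures the SEQ column of each record, so hard-clipped bases are not counted.

```shell
# max_read_length (hypothetical helper): reads SAM records on stdin and
# prints the length of the longest SEQ field (column 10).
# Header lines (starting with "@") and records with no stored sequence ("*") are skipped.
max_read_length() {
    awk -F'\t' '!/^@/ && $10 != "*" {
        n = length($10)
        if (n > max) max = n
    } END { print max + 0 }'
}

# Example usage with the paths from the original command (needs samtools):
#   MAX_LEN=$(samtools view RESULTS/MAPPED_READS/HIFI/hifi_reads.bam | max_read_length)
#   echo "Longest read: $MAX_LEN"
# Then rerun BaseRecalibrator with the same arguments as before, replacing
#   --maximum-cycle-value 20000
# with
#   --maximum-cycle-value "$MAX_LEN"
```

As the error message itself notes, a larger maximum cycle value requires more memory, so the -Xmx setting may need to be raised for very long reads.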
-
Thank you, this worked and it's running fine now.