Switching GATK versions during the workflow
I have generated g.vcf files for each sample with GATK 3.4. I am now in the middle of joint genotyping 6092 samples, which is taking too long, so I would like to switch to GATK 4.2 in order to use GenomicsDBImport.
Is it okay to do so? If not, does GATK 3.4 have an option to run joint genotyping on a single chromosome only?
REQUIRED for all errors and issues:
a) GATK version used: gatk-3.4
b) Exact command used:
java -Xmx100g -XX:ParallelGCThreads=4 -jar $GATKJAR -T GenotypeGVCFs -R $REF \
-stand_emit_conf 10 --disable_auto_index_creation_and_locking_when_reading_rods \
-V /home/jc2545/palmer_scratch/jy/WES_gvcf/AT1.1/AT1.1.g.vcf.gz \
-V /home/jc2545/palmer_scratch/jy/WES_gvcf/AT1.2/AT1.2.g.vcf.gz \
-V /home/jc2545/palmer_scratch/jy/WES_gvcf/AT10.1/AT10.1.g.vcf.gz \
-V /home/jc2545/palmer_scratch/jy/WES_gvcf/AT10.2/AT10.2.g.vcf.gz \
...
c) Entire program log:
-
Our team does not usually recommend switching GATK versions partway through an analysis unless we have documented that recommendation. However, you may be able to run GATK 4.2 or later on a CombineGVCFs or GenomicsDBImport product by adding the
--allow-old-rms-mapping-quality-annotation-data
parameter, which enables genotyping of old GVCFs that store RAW_MQ and DP as separate tags. Because of code changes and bug fixes made between these versions, this combination may still yield problematic results, so your mileage may vary.
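As a rough sketch of what that GATK 4 route could look like for one contig: the script below only prints the two commands so they can be reviewed before running. The sample map file, workspace name, contig name, and output name are hypothetical placeholders, and passing the flag to GenotypeGVCFs (rather than to the import step) is an assumption based on the description above.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print GATK 4 commands for importing old GVCFs and genotyping them.
# All file names and the contig name are placeholders (assumptions), not from the original post.
set -euo pipefail

REF="${REF:-reference.fasta}"
SAMPLE_MAP="${SAMPLE_MAP:-samples.map}"   # hypothetical: sample name <TAB> path to its g.vcf.gz
CONTIG="${CONTIG:-chr1}"                  # must match a contig name in the reference
WORKSPACE="genomicsdb_${CONTIG}"

# Step 1: consolidate the per-sample GVCFs for one contig into a GenomicsDB workspace.
import_cmd() {
  echo gatk GenomicsDBImport \
    --sample-name-map "$SAMPLE_MAP" \
    --genomicsdb-workspace-path "$WORKSPACE" \
    -L "$CONTIG"
}

# Step 2: joint-genotype from the workspace, allowing the old MQ annotation format.
genotype_cmd() {
  echo gatk GenotypeGVCFs \
    -R "$REF" \
    -V "gendb://${WORKSPACE}" \
    -L "$CONTIG" \
    --allow-old-rms-mapping-quality-annotation-data \
    -O "joint.${CONTIG}.vcf.gz"
}

import_cmd
genotype_cmd
```

Once the printed commands look right, removing the `echo` in each function would execute them. Given the caveats above, it may be worth comparing the result on a small subset against the GATK 3 output before committing to the full cohort.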
In addition, GenotypeGVCFs in GATK version 3 should also accept the -L parameter, so you can use
-L contig_name
to run the tool on a single chromosome only.
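A minimal sketch of splitting the GATK 3 run by contig, reusing the options from the command in the question. It only prints one command per contig for review. The contig names, the `gvcfs.list` file (assumed to hold one g.vcf.gz path per line), and the output names are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print one GATK 3 GenotypeGVCFs command per contig using -L.
# GVCF_LIST, CONTIGS, and the output names are placeholders (assumptions).
set -euo pipefail

GATKJAR="${GATKJAR:-GenomeAnalysisTK.jar}"
REF="${REF:-reference.fasta}"
GVCF_LIST="${GVCF_LIST:-gvcfs.list}"   # hypothetical file listing the per-sample g.vcf.gz paths
CONTIGS=(chr1 chr2)                    # hypothetical; use the contig names from your reference

# Build the command for a single contig; options mirror the command in the question.
build_cmd() {
  local contig="$1"
  echo java -Xmx100g -XX:ParallelGCThreads=4 -jar "$GATKJAR" \
    -T GenotypeGVCFs -R "$REF" \
    -stand_emit_conf 10 \
    -V "$GVCF_LIST" \
    -L "$contig" \
    -o "joint.${contig}.vcf.gz"
}

for contig in "${CONTIGS[@]}"; do
  build_cmd "$contig"
done
```

Each printed command could then be submitted as its own cluster job, so the contigs are genotyped in parallel and the per-contig VCFs are combined afterwards.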
Below is the output of the tool for those parameters.
------------------------------------------------------------------------------------
The Genome Analysis Toolkit (GATK) v3.8-1-0-gf15c1c3ef, Compiled 2018/02/19 05:43:50
Copyright (c) 2010-2016 The Broad Institute
For support and documentation go to https://software.broadinstitute.org/gatk
[Mon Nov 06 16:42:13 UTC 2023] Executing on Linux 6.5.0-10-generic amd64
OpenJDK 64-Bit Server VM 1.8.0_111-8u111-b14-2~bpo8+1-b14
------------------------------------------------------------------------------------
-L,--intervals <intervals>  One or more genomic intervals over which to operate
-XL,--excludeIntervals <excludeIntervals>  One or more genomic intervals to exclude from processing
-isr,--interval_set_rule <interval_set_rule>  Set merging approach to use for combining interval inputs (UNION|INTERSECTION)
-im,--interval_merging <interval_merging>  Interval merging rule for abutting intervals (ALL|OVERLAPPING_ONLY)
-ip,--interval_padding <interval_padding>  Amount of padding (in bp) to add to each interval
I hope this helps.