--interval-padding range differs called variants
REQUIRED for all errors and issues:
a) GATK version used: gatk-4.3.0.0
b) Exact command used:$GATK --java-options "-Xmx"$memory"G -XX:ParallelGCThreads=$thread" HaplotypeCaller -R $GENOME_FASTA -I $BQSR_DIR/recal_reads_"$pid".bam --max-reads-per-alignment-start 0 \
-L $VBED -ip 100 -bamout $VAR_DIR/hc_"$pid".bam -O $VAR_DIR/raw_variants_"$pid".vcf
c) Entire program log:Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx160G -XX:ParallelGCThreads=44 -jar /gpfs/shared/WES_analyses/tools/gatk-4.3.0.0/gatk-package-4.3.0.0-local.jar HaplotypeCaller -R /gpfs/shared/WES_analyses/refs/refs/broad_human_bundle_hg37/Homo_sapiens_assembly19.fasta -I /gpfs/shared/WES_analyses/Sets-annalysis/BSH/hg19/BQSR_output/recal_reads_51_S51.bam --max-reads-per-alignment-start 0 -L /gpfs/shared/WES_analyses/bedfiles/BSH_hg19/PAH.bed -ip 100 -bamout /gpfs/shared/WES_analyses/Sets-annalysis/BSH_TGDA_QiaSeqControl/51_S51/hg19/Variants_output/hc_51_S51.bam -O /gpfs/shared/WES_analyses/Sets-annalysis/BSH/hg19/Variants_output/raw_variants_51_S51.vcf
12:27:36.922 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gpfs/shared/WES_analyses/tools/gatk-4.3.0.0/gatk-package-4.3.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
12:27:37.219 INFO HaplotypeCaller - ------------------------------------------------------------
12:27:37.220 INFO HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.3.0.0
12:27:37.220 INFO HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
12:27:37.220 INFO HaplotypeCaller - Executing as sselvi@server001.mesten.local on Linux v3.10.0-1062.12.1.el7.x86_64 amd64
12:27:37.220 INFO HaplotypeCaller - Java runtime: IBM J9 VM v8.0.5.30 - pxa6480sr5fp30-20190207_01(SR5 FP30)
12:27:37.220 INFO HaplotypeCaller - Start Date/Time: March 12, 2024 12:27:36 PM TRT
12:27:37.220 INFO HaplotypeCaller - ------------------------------------------------------------
12:27:37.220 INFO HaplotypeCaller - ------------------------------------------------------------
12:27:37.221 INFO HaplotypeCaller - HTSJDK Version: 3.0.1
12:27:37.221 INFO HaplotypeCaller - Picard Version: 2.27.5
12:27:37.221 INFO HaplotypeCaller - Built for Spark Version: 2.4.5
12:27:37.221 INFO HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2
12:27:37.221 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
12:27:37.221 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
12:27:37.221 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
12:27:37.221 INFO HaplotypeCaller - Deflater: IntelDeflater
12:27:37.221 INFO HaplotypeCaller - Inflater: IntelInflater
12:27:37.221 INFO HaplotypeCaller - GCS max retries/reopens: 20
12:27:37.221 INFO HaplotypeCaller - Requester pays: disabled
12:27:37.221 INFO HaplotypeCaller - Initializing engine
12:27:37.681 INFO FeatureManager - Using codec BEDCodec to read file file:///gpfs/shared/WES_analyses/bedfiles/BSH_hg19/PAH.bed
12:27:37.694 INFO IntervalArgumentCollection - Processing 4089 bp from intervals
12:27:37.698 INFO HaplotypeCaller - Done initializing engine
12:27:37.715 INFO HaplotypeCallerEngine - Disabling physical phasing, which is supported only for reference-model confidence output
12:27:37.749 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/gpfs/shared/WES_analyses/tools/gatk-4.3.0.0/gatk-package-4.3.0.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
12:27:37.750 INFO NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/gpfs/shared/WES_analyses/tools/gatk-4.3.0.0/gatk-package-4.3.0.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
12:27:37.774 INFO IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
12:27:37.775 INFO IntelPairHmm - Available threads: 44
12:27:37.775 INFO IntelPairHmm - Requested threads: 4
12:27:37.775 INFO PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
12:27:37.807 INFO ProgressMeter - Starting traversal
12:27:37.807 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute
12:27:55.315 WARN InbreedingCoeff - InbreedingCoeff will not be calculated at position chr12:103237468 and possibly subsequent; at least 10 samples must have called genotypes
12:27:59.212 INFO ProgressMeter - chr12:103238009 0.4 10 28.0
12:28:11.823 INFO ProgressMeter - chr12:103248874 0.6 20 35.3
12:28:28.877 INFO ProgressMeter - chr12:103288708 0.9 30 35.2
12:28:41.697 INFO HaplotypeCaller - 26 read(s) filtered by: MappingQualityReadFilter
0 read(s) filtered by: MappingQualityAvailableReadFilter
0 read(s) filtered by: MappedReadFilter
0 read(s) filtered by: NotSecondaryAlignmentReadFilter
0 read(s) filtered by: NotDuplicateReadFilter
0 read(s) filtered by: PassesVendorQualityCheckReadFilter
0 read(s) filtered by: NonZeroReferenceLengthAlignmentReadFilter
0 read(s) filtered by: GoodCigarReadFilter
0 read(s) filtered by: WellformedReadFilter
26 total reads filtered
12:28:41.697 INFO ProgressMeter - chr12:103288708 1.1 35 32.9
12:28:41.697 INFO ProgressMeter - Traversal complete. Processed 35 total regions in 1.1 minutes.
12:28:41.824 INFO VectorLoglessPairHMM - Time spent in setup for JNI call : 0.144322589
12:28:41.824 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 29.350178015
12:28:41.824 INFO SmithWatermanAligner - Total compute time in java Smith-Waterman : 5.06 sec
12:28:42.208 INFO HaplotypeCaller - Shutting down engine
[March 12, 2024 12:28:42 PM TRT] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 1.09 minutes.
Runtime.totalMemory()=721682432
Hello,
I am experiencing an issue with calling the expected variant. The variant is not detected when using the -ip 100 parameter as shown above, whereas it is detected when using the -ip 20. Why the range of interval padding affect calling variant? How can I trust to use -ip parameter?
variant called with "-ip 20":
chr12 103237568 . C T 5896.64 . AC=1;AF=0.500;AN=2;BaseQRankSum=-14.555;DP=2155;ExcessHet=0.0000;FS=141.529;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.000;QD=4.18;ReadPosRankSum=-1.444;SOR=4.155 GT:AD:DP:GQ:PL 0/1:1052,357:1409:99:5904,0,24615
Thank you in advance
-
Hi Sinem Selvi
This is a known issue about how intervals affect HaplotypeCaller and Mutect2's performance in calling variants.
When a certain amount of interval provided to HaplotypeCaller/Mutect2, region of interest and active region search is automatically cropped to that size. So only reads that are captured and reference genome within the region of interest is available to the local reassembly engine. Changing the amount of reference genome provided changes all the the reassembly inputs therefore a different assembly may occur and the weights and evidences for the haplotype found with the shorter interval may be invalidated with the additional data. Therefore our recommendation is to keep intervals as long as possible but beware that complex regions and repeats will always show this discrepancy. The best practice is to provide intervals split by long stretches of Ns and later on limiting the number of variants discovered by SelectVariants tool.
Looking at the call you have there it is quite unbalanced in terms of ADs due to the regions complexity and repetitiveness both of which is counteracting the variant call quality.
Additionally you may want to check the article below for any variant that you think is real but not captured by HaplotypeCaller/Mutect2.
Finally. Our latest versions of GATK includes a parameter to add pileup based calling in HaplotypeCaller. If you are interested in you can try the parameter below.
--pileup-detection <Boolean> If enabled, the variant caller will create pileup-based haplotypes in addition to the
assembly-based haplotype generation. Default value: false. Possible values: {true, false}I hope this helps.
Please sign in to leave a comment.
1 comment