Why does HaplotypeCaller split one variant into two variants? (Originally one variant)
REQUIRED for all errors and issues:
a) GATK version used: 4.4.0.0
b) Exact command used: GATKCommandLine=<ID=GenotypeGVCFs,CommandLine="GenotypeGVCFs --output 3.vcf --variant 3.g.vcf.gz --intervals bed --interval-padding 1000 --reference ucsc_hg19.fa --include-non-variant-sites false --merge-input-intervals false --input-is-somatic false --tumor-lod-to-emit 3.5 --allele-fraction-error 0.001 --keep-combined-raw-annotations false --use-posteriors-to-calculate-qual false --dont-use-dragstr-priors false --use-new-qual-calculator true --annotate-with-num-discovered-alleles false --heterozygosity 0.001 --indel-heterozygosity 1.25E-4 --heterozygosity-stdev 0.01 --standard-min-confidence-threshold-for-calling 30.0 --max-alternate-alleles 6 --max-genotype-count 1024 --sample-ploidy 2 --num-reference-samples-if-no-call 0 --genotype-assignment-method USE_PLS_TO_ASSIGN --genomicsdb-max-alternate-alleles 50 --call-genotypes false --genomicsdb-use-bcf-codec false --genomicsdb-shared-posixfs-optimizations false --genomicsdb-use-gcs-hdfs-connector false --only-output-calls-starting-in-intervals false --interval-set-rule UNION --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --max-variants-per-shard 0 --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays --disable-tool-default-read-filters false --disable-tool-default-annotations false --enable-all-annotations false --allow-old-rms-mapping-quality-annotation-data false",Version="4.4.0.0"
c) Entire program log:
I have identified two variants that share the same PID, indicating a correlation between them. Upon examining the sample's BAM file using IGV, it appears to be a single variant (15bp deletion) rather than two separate ones, which HaplotypeCaller has split into two calls. How should this be interpreted? Should we consider the single variant shown in IGV as the actual variant, and the two called variants as artifacts of the calling process?
Are there any specific options or parameters that could help achieve variant calls that more accurately reflect the actual variant structure?
I will attach the variants information and IGV screenshot below.
Thank you!
chr4 187122302 . ATCATACAGGTCATC A 3082.03 PASS AC=2;AF=1;AN=2;DP=70;ExcessHet=0;FS=0;MLEAC=2;MLEAF=1;MQ=60;QD=31.53;SOR=0.844 GT:AD:DP:GQ:PGT:PID:PL:PS 1|1:0,69:69:99:1|1:187122302_ATCATACAGGTCATC_A:3096,208,0:187122302
chr4 187122318 . CT C 3084.03 PASS AC=2;AF=1;AN=2;DP=75;ExcessHet=0;FS=0;MLEAC=2;MLEAF=1;MQ=60;QD=28.35;SOR=0.844 GT:AD:DP:GQ:PGT:PID:PL:PS 1|1:0,69:69:99:1|1:187122302_ATCATACAGGTCATC_A:3098,209,0:187122302
-
Hi S
If you are using default parameters for HaplotypeCaller this behavior is expected for sites that may have alternative representations which may cost less in Smith-Waterman realignment after local assembly. HaplotypeCaller by default is not aware of how pileups look in IGV and its realignment default parameters may cause representations like these ones and technically they are not wrong. Our newer versions of HaplotypeCaller can utilize pileups and add them to the reassembly path as Partially Determined Haplotypes and may bring a more refined outcome based on PDHMMs. This requires users to activate pileup calling and pdhmm however these parameters currently have other downsides. The best implementation of these parameters are in Dragen 3.7.8 compatibility mode which we call as Functional Equivalence with Illumina DRAGEN. This mode may produce variants that correspond to what pileups also support but it is much slower and currently not compatible with GVCF output.
We are working to get native accelerated version of PDHMM along with GVCF compatibility but that may take sometime. In the meantime if GVCF is not your target and you are working with single sample workflows you may activate functional equivalence for GATK and try to see if it works for your data. One thing to note that this mode is calibrated only for data generated on Illumina NextSeq/NovaSeq sequencers. We cannot guarantee fidelity with other short read sequencing technologies.
I hope this helps.
Regards.
-
Gökalp Çelik Thank you! Gökalp! Functional equivalence is only available for GATK-Dragen?
-
Hi again.
FE is only for DRAGEN-GATK compatibility. However enabling pdhmm and pileup calling is not restricted to FE. You may enable those parameters individually. Yet again enabling pdhmm breaks GVCF compatibility. So your mileage may vary.
Regards.
Please sign in to leave a comment.
3 comments