HaplotypeCaller produces variant lines with no depth information in GVCF mode
Versions used:
- GATK 4.1.8.1
- OpenJDK 11.0.8
Commands used:
HaplotypeCaller --emit-ref-confidence GVCF --output denemegatkvar.gvcf --input Documents/sorted.bam --reference Documents/hg19.fa
HaplotypeCaller --output denemegatkvar.vcf --input Documents/sorted.bam --reference Documents/hg19.fa
Greetings GATK community!
I need some help with an issue; I can't tell whether it is a known feature or a bug.
While working on a project, I noticed some differences in the variant lines between the GVCF and VCF files, both produced by HaplotypeCaller (using the first and second commands above, respectively).
These differences include quality and DP values, as well as some additional variant lines in the GVCF file that carry no genotype/coverage information and are completely absent from the VCF file.
When I look at these positions in IGV, it seems that -ERC GVCF found the variants (but failed to report any genotype/coverage information, even though there appear to be many reads), whereas the VCF call either did not find them or filtered them out because of the quality score.
--difference screenshot
--vcf lines screenshot
--IGV screenshot
For my project, I need to extract coverage information, so these lines pose a problem. If this is a known issue, please explain why it happens and whether there is a way to work around it.
Thanks in advance
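To see exactly which GVCF records lack a per-sample depth, a small awk filter may help. This is only a minimal sketch: the function name sample_dp is hypothetical, and it assumes an uncompressed, single-sample, tab-delimited VCF/GVCF with the sample in column 10. It prints "." whenever the FORMAT column has no DP key or the sample value is missing.

```shell
# Hypothetical helper: print CHROM, POS, ALT and the per-sample DP for each
# record of a single-sample VCF/GVCF. Prints "." when FORMAT has no DP key
# or the sample carries no value for it.
sample_dp() {
  awk -F'\t' '
    /^#/ { next }                      # skip header lines
    {
      n = split($9, keys, ":")         # FORMAT keys, e.g. GT:AD:DP:GQ:PL
      m = split($10, vals, ":")        # values for the first (only) sample
      dp = "."
      for (i = 1; i <= n; i++)
        if (keys[i] == "DP" && i <= m && vals[i] != "") dp = vals[i]
      print $1 "\t" $2 "\t" $5 "\t" dp
    }
  ' "$1"
}
```

For example, `sample_dp denemegatkvar.gvcf | awk -F'\t' '$4 == "."'` would list only the records without a depth value.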
-
Hi Berk Gonenc, running HaplotypeCaller in these two modes produces different outputs. Please see the documentation links below for more information on why. Another point to keep in mind is that these results are intended to be filtered and are used in different cases. So when comparing specific variants, it is better to compare after filtering or after genotyping the GVCF.
GVCF - Genomic Variant Call Format
HaplotypeCaller Reference Confidence Model (GVCF mode)
Calling variants on cohorts of samples using the HaplotypeCaller in GVCF mode
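As a rough sketch of the "genotyping the GVCF" step mentioned above, the GVCF could be passed through GenotypeGVCFs before comparing it with the plain VCF. The wrapper function genotype_gvcf and the output file name used below are hypothetical, the `gatk` launcher is assumed to be on the PATH, and this has not been run against the data in this thread (GATK may also require the input GVCF to be indexed first).

```shell
# Hypothetical helper: genotype a single-sample GVCF into a regular VCF.
# Usage: genotype_gvcf <reference.fa> <input.gvcf> <output.vcf>
genotype_gvcf() {
  gatk GenotypeGVCFs \
    --reference "$1" \
    --variant "$2" \
    --output "$3"
}
```

With the files from the original post this might be called as `genotype_gvcf Documents/hg19.fa denemegatkvar.gvcf denemegatkvar.genotyped.vcf`, after which the genotyped output can be compared with denemegatkvar.vcf.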