Hi, as the topic says, I am writing about inconsistent ploidy in the GT field of a genotyped VCF when the allele call is missing. I scanned the genotyped VCF to check for patterns and their distribution: 97.5% of my samples have at least one site where the GT field is '.' instead of './.' or '.|.'.
My guess is that at the genotyping step, sites not found in a sample's gVCF are filled with dots/zeros, though I cannot confirm this from any docs. Is it expected that the ploidy of these missing calls does not match the ploidy seen elsewhere in the same sample? And if these are calls without enough support for ref/alt, would you recommend imputing them as wild type?
I would be happy to get any feedback on this. Thanks in advance!
GATK version used:
The Genome Analysis Toolkit (GATK) v184.108.40.206
HTSJDK Version: 2.21.0
Picard Version: 2.21.2
# Single call per sample
gatk HaplotypeCaller -I <input_bam> -R <ref.fa> -O <output.g.vcf.gz> -ERC GVCF
# Create DB
gatk GenomicsDBImport --genomicsdb-update-workspace-path <prefix_genomics_db> -L 20 --sample-name-map <map_file> --reader-threads <threads> --batch-size 500
# Genotype VCFs
gatk --java-options -Xmx40g GenotypeGVCFs -R <ref.fa> -V <prefix_genomics_db> -O <my.genotyped.vcf.gz>
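For anyone who wants to reproduce the check, a scan like the one described above could be sketched roughly as follows. This is only an illustrative sketch, not the exact script that was used: the function names (`gt_ploidy`, `is_missing`, `scan_vcf_lines`, `scan_vcf`) are hypothetical, and it assumes a standard tab-separated VCF with a `#CHROM` header line and a GT key in the FORMAT column.

```python
import gzip
import re
from collections import Counter, defaultdict


def gt_ploidy(gt):
    """Number of alleles written in a GT string: '.' -> 1, './.' -> 2."""
    return len(re.split(r"[/|]", gt))


def is_missing(gt):
    """True when every allele in the GT string is '.'."""
    return all(allele == "." for allele in re.split(r"[/|]", gt))


def scan_vcf_lines(lines):
    """Count missing-GT patterns per sample.

    Returns {sample_name: Counter({'.': n, './.': m, '.|.': k})}.
    """
    samples = []
    counts = defaultdict(Counter)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns start after FORMAT
            continue
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue  # no genotype at this record
        gt_idx = fmt.index("GT")
        for sample, value in zip(samples, fields[9:]):
            gt = value.split(":")[gt_idx]
            if is_missing(gt):
                counts[sample][gt] += 1
    return counts


def scan_vcf(path):
    """Scan a plain or gzip/bgzip-compressed VCF file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return scan_vcf_lines(handle)
```

Calling `scan_vcf("my.genotyped.vcf.gz")` (file name is a placeholder) returns one counter per sample; a sample that has a `'.'` key alongside `'./.'` or `'.|.'` shows the mixed ploidy described in the question.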