Using GATK 184.108.40.206, I'm trying to merge samples from two different VCF files. To make things a bit more challenging, the genotypes are of mixed ploidy (2 and 4).
Both VCFs already contain called genotypes with PL annotated (from two separate HaplotypeCaller runs). I'm running
gatk CombineGVCFs -R ref.fasta -V tetra.vcf.gz -V mixed.vcf.gz -O merged.vcf --tmp-dir $TMPDIR
to combine the two files. What's already strange here is that, at every locus, the known ALT allele is removed and replaced with <NON_REF>. This happens even if I run CombineGVCFs on just one of the files (gatk CombineGVCFs -R ref.fasta -V tetra.vcf.gz -O foo.vcf).
When I then run
gatk GenotypeGVCFs -R ref.fasta -V merged.vcf -O merged_GT.vcf --tmp-dir $TMPDIR
I get only the header of the new, called VCF, with nothing below the #CHROM line.
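For anyone wanting to reproduce the two observations, here is a minimal shell sketch of how they could be checked. The helper names record_count and alt_summary are hypothetical (they are not GATK commands), and the sketch assumes standard tab-separated VCF with ALT in column 5 and gzip/bgzip-compressed or plain-text input.

```shell
# Hypothetical helpers (not part of GATK) for inspecting a VCF.
# gzip -cdf decompresses gzip/bgzip input and passes plain text through unchanged.

# Count data records, i.e. lines below the #CHROM header line.
record_count() {
  gzip -cdf "$1" | grep -vc '^#' || true   # grep exits 1 when the count is 0
}

# Tabulate the ALT column (column 5) of all data records, most frequent first.
alt_summary() {
  gzip -cdf "$1" | grep -v '^#' | cut -f5 | sort | uniq -c | sort -rn
}

# Example usage with the file names from this post:
#   record_count merged_GT.vcf       # 0 would confirm a header-only file
#   alt_summary tetra.vcf.gz | head  # ALT alleles before CombineGVCFs
#   alt_summary merged.vcf | head    # ALT alleles after CombineGVCFs
```

Comparing the ALT summary before and after CombineGVCFs should show whether the original alleles are really gone or only joined by the symbolic allele.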
I can narrow this problem down to two questions:
1. Can I run CombineGVCFs / GenotypeGVCFs twice in this pipeline (calling the individual cohorts first, then merging)?
2. Does ALT: <NON_REF> in the combined gVCF cause the missing genotypes after GenotypeGVCFs?
I do not have access to the BAMs of the first cohort, so joint calling directly after HaplotypeCaller is not possible.