HaplotypeCaller: is it possible to generate a phased VCF from phased BAMs?
Hi, I have two phased BAM files (paternal.bam and maternal.bam). I tried to generate two haploid GVCFs (e.g., GT=0 in paternal.vcf and GT=1 in maternal.vcf) and then combine them into a phased diploid VCF (e.g., GT=0|1 in combined.vcf). However, this does not seem to be what HaplotypeCaller does:
gatk HaplotypeCaller -R reference.fasta -I paternal.bam -O paternal.g.vcf.gz -ERC GVCF
gatk HaplotypeCaller -R reference.fasta -I maternal.bam -O maternal.g.vcf.gz -ERC GVCF
gatk CombineGVCFs -R reference.fasta --variant paternal.g.vcf.gz --variant maternal.g.vcf.gz -O combined.g.vcf.gz
These commands seem to treat paternal.bam and maternal.bam as two samples and generate a multi-sample GVCF. Did I make a mistake somewhere? Thank you very much for your help.
-
Hi Weichen Song,
HaplotypeCaller does phase variants that lie close enough together to be spanned by individual reads and that are supported by enough reads. If you wish to phase by transmission, however, you may need other tools such as CalculateGenotypePosteriors, which is part of our genotype refinement workflow.
Past versions of GATK included the PhaseByTransmission tool, which phased variants based on a pedigree and the proper inputs; however, that tool was discontinued in GATK 4.
Other tools and callers, such as WhatsHap and DeepVariant, may be able to call phased variants from phased BAM files, but they are outside our domain and your mileage may vary. For those tools, you may wish to consult the Biostars and SEQanswers forums for more information.
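For illustration only, the merge step described in the question (GT=0 from the paternal calls and GT=1 from the maternal calls becoming GT=0|1) can be sketched in a few lines of Python. This is not a GATK feature or a supported workflow. The function name merge_haploid_calls and the dictionary layout are hypothetical, and the sketch assumes both haplotypes were called against the same reference and that the allele indices at each site refer to the same ALT allele list, which a real merge would have to reconcile first. A site missing from one haplotype is written as "." because the sketch cannot tell a reference call from no coverage.

```python
from typing import Dict, Optional, Tuple

Site = Tuple[str, int]  # (chromosome, 1-based position)


def merge_haploid_calls(
    paternal: Dict[Site, int],
    maternal: Dict[Site, int],
) -> Dict[Site, str]:
    """Combine per-site haploid allele indices into phased diploid GT strings.

    Hypothetical helper: the paternal allele goes on the left of "|",
    the maternal allele on the right. Missing calls become ".".
    """
    merged: Dict[Site, str] = {}
    for site in sorted(set(paternal) | set(maternal)):
        p: Optional[int] = paternal.get(site)
        m: Optional[int] = maternal.get(site)
        p_str = "." if p is None else str(p)
        m_str = "." if m is None else str(m)
        merged[site] = f"{p_str}|{m_str}"
    return merged


if __name__ == "__main__":
    # Toy input: allele index per site for each haplotype (made-up values).
    paternal_calls = {("chr1", 100): 0, ("chr1", 200): 1}
    maternal_calls = {("chr1", 100): 1, ("chr1", 300): 1}
    for (chrom, pos), gt in merge_haploid_calls(paternal_calls, maternal_calls).items():
        print(chrom, pos, gt)
```

Producing a valid VCF from this would still need the header, REF/ALT columns, and a single sample name, so treat it as a description of the idea rather than a replacement for a dedicated phasing tool.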
I hope this helps.
Regards.