MuTect2 Phasing tags (PGT:PID:PS:) in FORMAT column: How to avoid/remove them?
Hi there, I used Mutect2 (M2), followed the SOP, and like many others ran into the insertion of phasing tags (PGT:PID:PS) in the FORMAT column. An unorthodox genotype field such as "0|1:10,2:0.216:12:6,2:3,0:0|1:161166384_G_T:161166384:4,6,1,1" caused parsing problems in my downstream analysis. I am raising this issue separately because I didn't see any meaningful solution in the earlier thread. Has it been solved in the latest version of GATK, or does nobody care to solve it? I don't understand which. Did "-kmer-size 18" work? Has gatk/4.1.7.0 solved this problem? How do I avoid the tags, or at least remove them afterwards? Thanks very much! Regards, SSH123
BTW,
a) GATK version used: gatk/4.1.4.1-python-3.7.4
b) Exact command used: gatk Mutect2 \
-R ref.fa \
-I tumor \
-I normal \
-normal \
--germline-resource somatic-b37-af-only-gnomad.raw.sites.vcf \
--panel-of-normals pon.vcf.gz \
-O tumor.vcf.gz
c) Entire error log:
-
Hi,
I am going to move your post into our Community Discussions -> General Discussion topic, as this topic is for reporting bugs with GATK.
You can read more about our forum guidelines and the topics here: Forum Guidelines.
Best,
Bhanu
-
Hi SSH123,
Thanks for bringing up this issue so that we can look into it. Could you clarify where this is causing downstream issues? Is it a GATK tool? I'm also not sure what you mean by the previous post. Could you provide a link?
Best,
Genevieve
-
Hi, Genevieve,
Thanks for reviewing my question. I was trying to load the M2 VCF files into Gemini and run some queries, as I had done with VarScan2 VCF files, but it didn't work. The error pointed to "cyvcf2/cyvcf2.pyx" with the message "gt_bases not implemented for ploidy > 2", and it appeared even after I used "bcftools merge -m none" (no new multiallelics, output multiple records instead). I then checked the GT fields of the merged VCF and found this special genotype type mixed with phasing tags, e.g., GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB 0|1:10,2:0.216:12:6,2:3,0:0|1:161166384_G_T:161166384:4,6,1,1. I searched the GATK Community Forum and saw this solution: https://gatk.broadinstitute.org/hc/en-us/community/posts/360056128991-Mutect2-miscalls-AF-and-AD-of-EGFR-G719S-on-a-reference-sample-and-FilterMutectCalls-flags-it-with-strand-bias-. But the phasing tags were still there after I applied the "-kmer-size 18" flag. I know Gemini is not a GATK tool, but I am wondering whether there is a way to suppress the phasing tags so that I can avoid possible parsing problems downstream, for example in Gemini and R packages.
Thanks very much!
SSH123
-
SSH123, thanks for sharing the previous post. It looks like what you are trying to do is slightly different from what was discussed there.
I looked this over with the team and we weren't entirely sure that these tags are causing the issue, but you would need to follow up with the Gemini people to figure that out. In the VCF lines you shared from your original post, we don't see any blatant issues that normally cause downstream problems, so it could potentially be a bug with Gemini.
One workaround you could try is to run SelectVariants with the --drop-genotype-annotation argument to drop those annotations (PGT, PID and PS). However, this isn't ideal, so you should definitely reach out to the Gemini developers to find out specifically what is going wrong.
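If a GATK run is not convenient, the same idea can be sketched in plain Python: remove the PGT, PID and PS keys from the FORMAT column and the matching values from each sample column. This is only a minimal illustration, not a GATK feature. The function name strip_phasing_tags is hypothetical, and the sketch assumes a tab-delimited, uncompressed VCF. It leaves the GT value itself (e.g. "0|1") and the ##FORMAT header definitions untouched.

```python
# Hypothetical helper, not part of GATK: drops phasing keys from the
# FORMAT column and the corresponding values from every sample column.
PHASING_KEYS = frozenset({"PGT", "PID", "PS"})


def strip_phasing_tags(line, drop=PHASING_KEYS):
    """Return one VCF line with the FORMAT keys in `drop` removed."""
    # Header lines (## meta lines and the #CHROM line) pass through unchanged.
    if line.startswith("#"):
        return line
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    # Columns 1-8 are fixed, column 9 is FORMAT, columns 10+ are samples.
    if len(fields) < 10:
        return line
    keys = fields[8].split(":")
    keep = [i for i, key in enumerate(keys) if key not in drop]
    fields[8] = ":".join(keys[i] for i in keep)
    for col in range(9, len(fields)):
        values = fields[col].split(":")
        # VCF lets a sample omit trailing values, hence the bounds check.
        fields[col] = ":".join(values[i] for i in keep if i < len(values))
    return "\t".join(fields) + newline
```

Applied to the record quoted above, the FORMAT column would become GT:AD:AF:DP:F1R2:F2R1:SB and the sample column 0|1:10,2:0.216:12:6,2:3,0:4,6,1,1. Whether this actually resolves the Gemini error is not established in this thread.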
Best,
Genevieve