Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

MuTect2 Phasing tags (PGT:PID:PS:) in FORMAT column: How to avoid/remove them?

Answered

4 comments

  • Bhanu Gandham

    Hi,

    I am going to move your post into our Community Discussions -> General Discussion topic, as this topic is for reporting bugs with GATK.

    You can read more about our forum guidelines and the topics here: Forum Guidelines.

    Best,

    Bhanu

  • Genevieve Brandt (she/her)

    Hi SSH123,

    Thanks for bringing up this issue so that we can look into it. Could you clarify where this is causing downstream issues? Is it a GATK tool? I'm also not sure what you mean by the previous post. Could you provide a link?

    Best,

    Genevieve

  • SSH123

    Hi, Genevieve,

    Thanks for reviewing my question. I was trying to load the Mutect2 (M2) VCF files into Gemini and run the same kind of queries I had run on VarScan2 VCF files, but the load failed. The error pointed to "cyvcf2/cyvcf2.pyx" with the message "gt_bases not implemented for ploidy > 2". It persisted even after I merged with "bcftools merge -m none", which creates no new multiallelic records and outputs multiple records instead.

    I then checked the genotype fields of the merged VCF and found that they include phasing tags, e.g.:

        GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB
        0|1:10,2:0.216:12:6,2:3,0:0|1:161166384_G_T:161166384:4,6,1,1

    I searched the GATK Community Forum and found this solution: https://gatk.broadinstitute.org/hc/en-us/community/posts/360056128991-Mutect2-miscalls-AF-and-AD-of-EGFR-G719S-on-a-reference-sample-and-FilterMutectCalls-flags-it-with-strand-bias-. However, the phasing tags were still present after I applied the "-kmer-size 18" flag.

    I know Gemini is not a GATK tool, but I am wondering whether there is a way to suppress the phasing tags so that I can avoid possible parsing problems in downstream tools such as Gemini and R packages.

    Thanks very much!

    SSH123
  • Genevieve Brandt (she/her)

    SSH123 Thanks for sharing the previous post. It looks like what you are trying to do is slightly different here.

    I looked this over with the team and we weren't entirely sure that these tags are causing the issue, but you would need to follow up with the Gemini people to figure that out. In the VCF lines you shared from your original post, we don't see any blatant issues that normally cause downstream problems, so it could potentially be a bug with Gemini.

    One workaround you could try is to run SelectVariants with the argument --drop-genotype-annotation to drop those annotations (PGT, PID, and PS). However, this isn't ideal, so you should definitely reach out to the Gemini developers to find out specifically what is going wrong.

    Best,

    Genevieve
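
The workaround suggested in this thread can be sketched in Python as follows. This is a minimal illustration, not a confirmed fix for the Gemini error: `drop_format_keys` shows what removing the PGT, PID, and PS keys does to the FORMAT/sample pair quoted in the question, and `select_variants_command` assembles a SelectVariants invocation. The function names, the file names `merged.vcf` and `merged.nophase.vcf`, and the assumption that `--drop-genotype-annotation` is repeated once per key are all illustrative and not taken from the thread.

```python
import shlex

# Phasing-related FORMAT keys named in the thread title.
PHASING_KEYS = ("PGT", "PID", "PS")


def drop_format_keys(format_field, sample_field, keys=PHASING_KEYS):
    """Return (format, sample) with the given FORMAT keys removed."""
    names = format_field.split(":")
    values = sample_field.split(":")
    # VCF allows trailing sample values to be omitted; pad with "." so
    # names and values stay aligned when zipped.
    values += ["."] * (len(names) - len(values))
    kept = [(n, v) for n, v in zip(names, values) if n not in keys]
    return ":".join(n for n, _ in kept), ":".join(v for _, v in kept)


def select_variants_command(in_vcf, out_vcf, keys=PHASING_KEYS):
    """Build (but do not run) a SelectVariants command line.

    Assumption: --drop-genotype-annotation takes one annotation key per
    occurrence, so it is repeated for each key.
    """
    cmd = ["gatk", "SelectVariants", "-V", in_vcf, "-O", out_vcf]
    for key in keys:
        cmd += ["--drop-genotype-annotation", key]
    return cmd


if __name__ == "__main__":
    # FORMAT and sample strings copied from the question above.
    fmt = "GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB"
    sample = "0|1:10,2:0.216:12:6,2:3,0:0|1:161166384_G_T:161166384:4,6,1,1"
    print(drop_format_keys(fmt, sample))
    # Hypothetical file names; print the command instead of executing it.
    print(shlex.join(select_variants_command("merged.vcf", "merged.nophase.vcf")))
```

Note that the GT value itself stays phased (`0|1`) after the keys are dropped; only the extra FORMAT fields are removed. Whether that is enough for Gemini to load the file would need to be checked with the Gemini developers, as suggested above.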

