Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

The output of Mutect2 cannot be accepted by FilterMutectCalls in GATK-4.2.0.0

Answered
10 comments

  • Qihan Long

    Hi, I changed all the separators in "AS_UNIQ_ALT_READ_COUNT=167|35|14" from "|" to "," and used "AS_UNIQ_ALT_READ_COUNT=167,35,14" as the input record. The error then disappeared, which indicates that the "|" separator caused the issue.

    So I dug into the Java source code (github link) for the AS_UNIQ_ALT_READ_COUNT annotation and found that it ultimately invokes the encodeAnyASListWithRawDelim function, which joins the list into a "|"-separated (ALLELE_SPECIFIC_RAW_DELIM) string that is emitted as the annotation value. I'm not sure whether ALLELE_SPECIFIC_REDUCED_DELIM (",") or ALLELE_SPECIFIC_RAW_DELIM ("|") is the better choice here, but the ","-separated annotation is accepted by FilterMutectCalls.

    P.S.: Several downstream annotations (CONTQ, SEQQ and STRANDQ) are missing from the FilterMutectCalls results. I hope you can explain why they disappear. Many thanks!
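
    The manual separator swap described above can be sketched in Python. This is only a minimal illustration: `swap_as_delimiter` is a hypothetical helper, not part of GATK or PyVCF, and treating "," as the delimiter FilterMutectCalls expects is an assumption based solely on the observation in this post. It rewrites the named INFO key in a single tab-separated VCF data line and leaves every other field alone.

```python
def swap_as_delimiter(vcf_line, key="AS_UNIQ_ALT_READ_COUNT", old="|", new=","):
    """Return vcf_line with `old` replaced by `new` inside INFO/<key> only.

    Hypothetical helper for illustration; assumes a tab-separated VCF data
    line with at least 8 columns.
    """
    if vcf_line.startswith("#"):
        return vcf_line  # header lines are left untouched
    newline = "\n" if vcf_line.endswith("\n") else ""
    fields = vcf_line.rstrip("\n").split("\t")
    info_entries = fields[7].split(";")  # column 8 is INFO
    prefix = key + "="
    for i, entry in enumerate(info_entries):
        if entry.startswith(prefix):
            # Only the value of the requested key is rewritten
            info_entries[i] = prefix + entry[len(prefix):].replace(old, new)
    fields[7] = ";".join(info_entries)
    return "\t".join(fields) + newline


# Made-up record for demonstration; other "|"-delimited fields stay unchanged
record = "chr1\t100\t.\tG\tA,C,T\t.\t.\tDP=10;AS_UNIQ_ALT_READ_COUNT=167|35|14;AS_SB_TABLE=1,2|3,4"
print(swap_as_delimiter(record).split("\t")[7])
```

    Restricting the replacement to one key matters because other allele-specific fields, such as AS_SB_TABLE, also use "|" and were not reported as a problem.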

  • Genevieve Brandt (she/her)

    Hi Qihan Long

    Thanks for your post and digging into this issue! This does indeed look like a bug in the 4.2.0.0 version of FilterMutectCalls, or a bug in Mutect2 for producing the output like this. I have created a ticket here where you can follow along for updates regarding a fix to the issue: https://github.com/broadinstitute/gatk/issues/7298

    For now you can either use the older version of FilterMutectCalls or you can remove this annotation from your output with -AX AS_UNIQ_ALT_READ_COUNT. 

    Regarding your P.S. message, there is another thread about the STRANDQ annotation and the others should fall in the same category as well: https://gatk.broadinstitute.org/hc/en-us/community/posts/360078035372-STRANDQ-missing-from-mutect2-vcf-records. Not all annotations are meant to be seen in the output of FilterMutectCalls.

    Best,

    Genevieve

  • Qihan Long

    Thanks for your quick reply and for looking into this!

    I'll try Mutect2 (4.2.0.0) combined with FilterMutectCalls (4.1.6.0) to finish my variant calling and annotation. I hope the bug is in FilterMutectCalls (4.2.0.0); otherwise I'll have to re-call all the variants or modify the VCF files accordingly, which would be messy work, haha.

  • Genevieve Brandt (she/her)

    Yeah, I hope so too!

    I'll try to get more information about this from the Mutect2 developers and will let you know.

  • Qihan Long

    Hi Genevieve,

    Thanks for your time~

    Much appreciated!

  • Qihan Long

    FYI, anyone who uses PyVCF to read the VCF records I mentioned above will hit the error "ValueError: could not convert string to float: '167|35|14'". PyVCF cannot parse the AS_UNIQ_ALT_READ_COUNT INFO field because the VCF header declares its type as Integer.

    Manually changing the field's INFO definition inside PyVCF works around the issue temporarily. The code is listed below:

    import vcf
    # Raw string so the backslashes in the Windows path are not treated as escapes
    vcf_reader = vcf.Reader(filename=r"C:\Users\-PC\Desktop\sftp\test.vcf.gz")
    # Override the declared type of the erroneous field in the reader's INFO definitions.
    # PyVCF stores each definition as a namedtuple, so _replace returns a modified copy.
    pre_info = vcf_reader.infos['AS_UNIQ_ALT_READ_COUNT']
    vcf_reader.infos['AS_UNIQ_ALT_READ_COUNT'] = pre_info._replace(type="String")
    # Records read from vcf_reader now keep this field as a string.
    # Feel free to continue your work!
  • Genevieve Brandt (she/her)

    Hi Qihan Long,

    Thank you for posting the solution you found for other users! It's very helpful.

    I confirmed with the developers that the AS annotations had formatting changes recently and it looks like they were not fully updated. The team will continue to look into this issue at the ticket I created. https://github.com/broadinstitute/gatk/issues/7298

    You can continue to use 4.1.6.0 for FilterMutectCalls. Or you could remove the annotation with bcftools annotate and then use 4.2.0.0 FilterMutectCalls. The AS_UNIQ_ALT_READ_COUNT annotation is really only used internally at Broad for special use cases or troubleshooting.
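
    For readers who want to see what removing the annotation amounts to, here is a minimal pure-Python sketch of the per-record operation. It is not how bcftools annotate is implemented; `drop_info_keys` is a hypothetical helper written for illustration, and it assumes tab-separated VCF data lines with at least 8 columns. In practice the dedicated tools remain the safer route (bcftools annotate offers a `-x`/`--remove` option for this purpose).

```python
def drop_info_keys(vcf_line, keys=("AS_UNIQ_ALT_READ_COUNT",)):
    """Return vcf_line with the given INFO keys removed.

    Hypothetical helper for illustration only; header lines pass through
    unchanged, so the ##INFO definition of a dropped key stays in the header.
    """
    if vcf_line.startswith("#"):
        return vcf_line
    newline = "\n" if vcf_line.endswith("\n") else ""
    fields = vcf_line.rstrip("\n").split("\t")
    kept = [
        entry for entry in fields[7].split(";")
        if entry.split("=", 1)[0] not in keys
    ]
    # An empty INFO column is written as "." in VCF
    fields[7] = ";".join(kept) if kept else "."
    return "\t".join(fields) + newline


# Made-up record for demonstration
record = "chr1\t100\t.\tG\tA\t.\t.\tDP=10;AS_UNIQ_ALT_READ_COUNT=167|35|14;ECNT=2"
print(drop_info_keys(record).split("\t")[7])
```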

    Thanks for writing in about this issue!

    Best,

    Genevieve

  • Qihan Long

    Hi Genevieve Brandt

    Thank you for the timely update. I successfully removed the annotation using GATK's SelectVariants with the --drop-info-annotation option, which shows GATK's versatility~

    Unfortunately, I seem to have spotted another bug in other Mutect2 records.

    chr22 22215158 . G A,C . . AC=1,1;AF=0.200,0.200;AN=5;AS_BaseQRankSum=-0.518;AS_MQ=60.00,;AS_MQRankSum=0.000;AS_ReadPosRankSum=1.087;AS_SB_TABLE=74,66|6,24|0,0;AS_UNIQ_ALT_READ_COUNT=30|0;BQHIST=13,0,0,1,17,0,0,1,19,1,0,0,20,0,0,13,21,0,0,1,22,0,0,2,23,0,0,1,24,2,0,2,25,0,0,1,26,0,0,4,27,0,0,7,28,2,0,8,29,6,0,24,30,7,0,11,31,9,0,10,32,0,0,4,33,0,0,1,35,0,0,11,36,0,0,14,37,0,0,3,38,0,0,1,40,0,0,2,42,0,0,2,43,0,0,3,45,0,0,4;BaseQRankSum=-0.518;ClippingRankSum=-3.599;DP=179;ECNT=29;FS=29.496;LikelihoodRankSum=0.564;MBQ=30,30,0;MFRL=142,160,0;MMQ=60,60,60;MPOS=16,50;MQ=60.00;MQ0=0;MQRankSum=0.000;NALOD=1.37,1.37;NCC=0;NCount=0;NLOD=13.55,13.24;OCM=0;POPAF=6.00,6.00;REF_BASES=GGATCTCAGAGAGATTCTCTG;ReadPosRankSum=1.087;SOR=2.607;Samples=TCGA-55-8092-01A-11R-2241-07;TLOD=95.63,14.73 GT:AD:AF:DP:F1R2:F2R1:SB 0/1/2:90,30,0:0.235,0.050:120:39,12,0:40,14,0:42,48,6,24 0/0:50,0,0:0.021,0.021:50:27,0,0:23,0,0:32,18,0,0

    The AS_MQ INFO field contains an erroneous value (60.00,) that PyVCF cannot parse, because the header declares the field as Float.

    All of my reports come from runs with the "--enable-all-annotations" option switched on, so the bugs I mentioned may never appear in a standard pipeline (GATK Best Practices). Sorry for taking up your time with such a rare bug.

    P.S.: I spent about 3 hours trying to locate the bug in the AS_RMSMappingQuality source code (Github link). All I found is that it is possibly related to the makeFinalizedAnnotationString function, since the output is missing the other allele's AS_MQ value, which should follow the comma (","). I hope this helps.

    P.P.S.: To make the VCF file readable by PyVCF, change the field's INFO definition within PyVCF as I described before; parsing can then continue.
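
    Values like the AS_MQ one quoted above (60.00,) can be located before parsing with a small scan. The following is a minimal sketch under stated assumptions: `find_empty_info_values` is a hypothetical helper, not a GATK or PyVCF function, and it treats both "," and "|" as list separators, which matches the fields shown in this thread but may not hold for every annotation.

```python
def find_empty_info_values(vcf_line):
    """Return INFO keys whose value list contains an empty element.

    Hypothetical helper for illustration; assumes a tab-separated VCF data
    line and treats both "," and "|" as list separators.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    bad_keys = []
    for entry in fields[7].split(";"):
        key, sep, value = entry.partition("=")
        if not sep:
            continue  # Flag-type entries carry no value
        parts = value.replace("|", ",").split(",")
        if any(part == "" for part in parts):
            bad_keys.append(key)
    return bad_keys


# Shortened version of the record above, keeping only a few INFO entries
record = "chr22\t22215158\t.\tG\tA,C\t.\t.\tAS_MQ=60.00,;AS_UNIQ_ALT_READ_COUNT=30|0;DP=179"
print(find_empty_info_values(record))  # ['AS_MQ']
```

    A trailing separator produces an empty final element after splitting, which is why "60.00," is flagged while "30|0" is not.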

     

  • Genevieve Brandt (she/her)

    Thank you for updating us about this other issue you found! I have added it to the github ticket I originally created, here: https://github.com/broadinstitute/gatk/issues/7298. The team will get this fixed as soon as possible within their timelines.

    Also - thank you for taking the time to find the related code! No worries about not finding it though :)

    I don't think many users run Mutect2 with --enable-all-annotations, so that must be why we had not identified this bug until now. I confirmed with the developer team that the AS_MQ annotation is not generally used other than for troubleshooting and special use cases. So it shouldn't be a problem to remove that one from your analysis as well.

    Best,

    Genevieve

  • Qihan Long

    Hi Genevieve Brandt

    I got it! Thanks for your clarification, it's of great help~
