Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Issue with GenotypeGVCFs after GenomicsDBImport, building vs. updating

  • Gökalp Çelik

    Hi Melissa Spear,

    Can you share your error messages from your logs so we can try to pinpoint the exact issue?

    Regards. 

  • Melissa Spear

    Hi @Gökalp Çelik,

    Thank you. Here is the error message I mentioned earlier in the post:

    "A USER ERROR has occurred: Bad input: Presence of '-RAW_MQ' annotation is detected. This GATK version expects key RAW_MQandDP with a tuple of sum of squared MQ values and total reads over variant genotypes as the value. This could indicate that the provided input was produced with an older version of GATK. Use the argument '--allow-old-rms-mapping-quality-annotation-data' to override and attempt the deprecated MQ calculation. There may be differences in how newer GATK versions calculate DP and MQ that may result in worse MQ results. Use at your own risk."
  • Gökalp Çelik

    Hi Melissa Spear,

    Does each of your samples have all of its variants in a single VCF, or are they split per chromosome?

    Can you check your VCF headers using 

    bcftools view -h

    to find out which VCFs have the new RAW_MQandDP field and which ones have the separate RAW_MQ and DP fields?
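    When the samples are split into many per-chromosome files, the same header check can be scripted. The Python sketch below is a minimal illustration, not a GATK or bcftools feature: it reads each (optionally bgzip-compressed) VCF/gVCF header, collects the declared INFO IDs, and reports whether the file declares the newer RAW_MQandDP key, the older RAW_MQ key, both, or neither. It assumes the header declarations reflect what the records actually contain, which is the same assumption as reading bcftools view -h output by eye. The function names are hypothetical.

```python
import gzip
import sys
from pathlib import Path


def info_ids_in_header(path):
    """Return the set of INFO IDs declared in a plain or (b)gzipped VCF header."""
    opener = gzip.open if str(path).endswith(".gz") else open
    ids = set()
    with opener(path, "rt") as handle:
        for line in handle:
            if not line.startswith("##"):
                break  # reached the #CHROM line; the meta-information header is over
            if line.startswith("##INFO=<ID="):
                # '##INFO=<ID=RAW_MQandDP,Number=2,...' -> 'RAW_MQandDP'
                ids.add(line[len("##INFO=<ID="):].split(",", 1)[0])
    return ids


def mq_annotation_style(path):
    """Classify a file as 'new' (RAW_MQandDP), 'old' (RAW_MQ), 'both' or 'none'."""
    ids = info_ids_in_header(path)
    has_new = "RAW_MQandDP" in ids
    has_old = "RAW_MQ" in ids
    if has_new and has_old:
        return "both"
    if has_new:
        return "new"
    if has_old:
        return "old"
    return "none"


if __name__ == "__main__":
    # One tab-separated line per input file: style, then path.
    for vcf in sys.argv[1:]:
        print(f"{mq_annotation_style(Path(vcf))}\t{vcf}")
```

    For example, running python check_mq.py *.g.vcf.gz (script name and file pattern are hypothetical) prints one line per file, so any file reported as old can be traced back to the sample and GATK version that produced it.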

    Starting with version 4.1, GATK tools use the RAW_MQandDP INFO field. If at least one of your samples was called with an older version that writes the separate RAW_MQ and DP fields instead, GenotypeGVCFs will throw that USER ERROR message, and the only ways to overcome it are to use the

    --allow-old-rms-mapping-quality-annotation-data

    parameter or to re-call those samples so that their INFO fields match.
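    In practice the parameter is simply appended to the usual GenotypeGVCFs invocation on the GenomicsDB workspace. The Python sketch below only assembles and prints that command line; the reference, workspace and output paths are hypothetical placeholders, it assumes the gatk wrapper script is on your PATH, and it launches GATK only when dry_run is set to False.

```python
import shlex
import subprocess


def genotype_gvcfs_command(reference, workspace, output, allow_old_mq=False):
    """Build a GenotypeGVCFs command line that reads from a GenomicsDB workspace.

    allow_old_mq appends --allow-old-rms-mapping-quality-annotation-data so that
    inputs carrying the pre-4.1 RAW_MQ annotation are accepted.
    """
    cmd = [
        "gatk", "GenotypeGVCFs",
        "-R", reference,
        "-V", f"gendb://{workspace}",
        "-O", output,
    ]
    if allow_old_mq:
        cmd.append("--allow-old-rms-mapping-quality-annotation-data")
    return cmd


def run(cmd, dry_run=True):
    """Print the shell-quoted command; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical paths: replace with your own reference, workspace and output.
    run(genotype_gvcfs_command("ref.fasta", "my_workspace", "cohort.vcf.gz",
                               allow_old_mq=True))
```

    Keeping the flag behind an explicit argument makes it easy to run the cohort with and without it and compare the resulting MQ values.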

    I hope this helps. 

     

  • Melissa Spear

    Hi Gökalp Çelik,

    Each sample has its VCFs split by chromosome. I looked into some of the samples, and yes, it is the 1000 Genomes samples that have the split RAW_MQ and DP fields. The other samples have the RAW_MQandDP field.

    Sure, I can use the flag

    --allow-old-rms-mapping-quality-annotation-data

    But do you know what the risk is? The original error says to ‘use at your own risk’.

    Thank you! 

  • Gökalp Çelik

    Hi Melissa Spear,

    Older versions of GATK (prior to 4.1) produce slightly different RAW_MQ and DP values (due to fixes and changes in the local reassembly code), but nothing different enough to substantially change your actual calculations. In fact, the codebase for this particular option indicates that some samples within gnomAD also have the old INFO tags, so I don't think you will face any issues unless some of those samples were called with a much older GATK version, such as one prior to 4.0.0.0. The image below shows one such site from a sample VCF.

    As you can see, the call is still the same, but the numbers are slightly different. Nothing too substantial.
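    To see where these numbers come from: as the error message describes, RAW_MQandDP stores a sum of squared mapping qualities together with a read count, and the reported MQ is the root mean square of those values. The Python sketch below assumes that per-sample tuples are summed when samples are combined and that MQ = sqrt(sum of squared MQ / read count); it illustrates the RMS formula only and is not GATK's own implementation. The function names and example values are hypothetical.

```python
import math


def combine_raw_mq_and_dp(per_sample):
    """Sum per-sample (sum_of_squared_MQ, read_count) pairs across samples."""
    total_sq = sum(sq for sq, _ in per_sample)
    total_reads = sum(n for _, n in per_sample)
    return total_sq, total_reads


def rms_mapping_quality(sum_sq_mq, read_count):
    """Root-mean-square mapping quality: sqrt(sum of squared MQ / read count)."""
    if read_count == 0:
        return float("nan")  # no reads counted, so MQ is undefined
    return math.sqrt(sum_sq_mq / read_count)


# Hypothetical site: sample A has 10 reads and sample B has 20 reads, all at MQ 60.
site = [(10 * 60**2, 10), (20 * 60**2, 20)]
print(rms_mapping_quality(*combine_raw_mq_and_dp(site)))  # 60.0
```

    Because the result depends on both the summed squares and the read count, two GATK versions that count slightly different sets of reads at a site will report slightly different MQ values even when the genotype call is unchanged.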

    If you still wish not to use this parameter you may need to revisit those samples and call variants again with a recent version of GATK. 

    Regards.

  • Melissa Spear

    Hi Gökalp Çelik,

    It's reassuring to hear that there aren't substantial changes in the outputs. Thank you so much again!
