Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

New version of GATK leads to VariantRecalibrator error.

Answered
10 comments

  • Genevieve Brandt (she/her)

    Hi woodword,

    The issue you linked to was not a GATK bug. A new check in GATK revealed an existing problem in the user's data, and the fix was to add more information to the error message so that users can find such problems more easily.

    I can update the issue with your example and determine if the fix was successful, since your error message does not have location information. Did this issue persist when you tried GATK 4.1.4.0? 

    Please double-check your data: this is not a GATK issue but a sign that your reference alleles are inconsistent, which indicates that inconsistent reference versions were used.

  • woodword

    I am not sure where the problem is. I've tried every version of GATK (4.1.4.0 to 4.1.9.0) with the same data and resource files. The issue disappeared when I ran 4.1.4.0 and 4.1.4.1; GATK 4.1.5.0 or newer leads to this problem.

    By the way, the dbSNP resource (dbsnp_138.b37.vcf.gz) I used has an MD5 of fb24e974627684d6a7e455a450a4d405. I hope I didn't download the wrong file.

  • Genevieve Brandt (she/her)

    Hi woodword, yes, you are correct: a new check introduced in GATK 4.1.5.0 throws an error when there are issues with the reference file. I have created an issue ticket here so that we can improve the error message. The improved error message will help you find the location of the issue so you can fix your file and run the tool.

  • Genevieve Brandt (she/her)

    Just wanted to follow up that we have merged a change to improve the error message and the fix will be in the next release.

  • Ahmed S. Chakroun

    Hi,

    I hit the same issue using VariantRecalibrator from GATK release 4.1.9.0, so I upgraded to 4.2.0.0 to find the position. The problem turned out to be at position 6:29857105, with the following error description:

    Caused by: java.lang.IllegalStateException: The provided reference alleles do not appear to represent the same position, AC* vs. AA

    Now, checking the dbsnp_138.b37.vcf.gz file for that position gave:

    6 29857105 rs201835144 A C . . OTHERKG;RS=201835144;RSPOS=29857105;SAO=0;SSR=0;VC=SNV;VLD;VP=0x050000000001040002000100;WGT=1;dbSNPBuildID=137
    6 29857105 rs9278395 AA A,AC . . GNO;NOC;OTHERKG;RS=9278395;RSPOS=29857106;SAO=0;SLO;SSR=0;VC=DIV;VP=0x050100000001000102000210;WGT=1;dbSNPBuildID=118
    6 29857105 rs202000432 AC A . . OTHERKG;RS=202000432;RSPOS=29857111;SAO=0;SSR=0;VC=DIV;VP=0x050000000001000002000200;WGT=1;dbSNPBuildID=137

    and checking my own vcf for that same position gave:

    6 29857105 . AC A 138.92 . AC=2;AF=1.00;AN=2;AS_BaseQRankSum=.;AS_FS=0.000;AS_MQ=22.28;AS_MQRankSum=.;AS_QD=27.80;AS_ReadPosRankSum=.;AS_SOR=3.611;DP=6;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=22.95;QD=27.78;SOR=3.611 GT:AD:DP:GQ:PL 1/1:0,5:5:15:153,15,0

    Please, could you help me to see what is going wrong?

    Thank you very much.
    Regards.
    Ahmed

  • Genevieve Brandt (she/her)

    Hi Ahmed,

    Thanks for giving this example. It looks like there is a problem with this dbSNP file that breaks the reference context: the 2nd and 3rd records conflict, since both start at 6:29857105 but give different reference alleles (AA vs. AC).

    You may want to use a newer dbSNP version to fix this issue.

    Best,

    Genevieve

  • Ahmed S. Chakroun

    Hi Genevieve,

    Thank you very much for your reply. Please, do you have any suggestions where to find a newer version of dbSNP for build37.2?

    All the best.
    Ahmed

  • Genevieve Brandt (she/her)

    I'm not sure. Do you know where you got this version of dbSNP? Is it in our data resources?

  • Ahmed S. Chakroun

    Absolutely, I downloaded it from the Broad Institute FTP bundle. The Google bucket is exclusively for hg38.

  • Genevieve Brandt (she/her)

    OK, I see. dbSNP versions are not always edited for these kinds of issues, so there is not a lot we can do for this. The GATK tool can't handle these sites because the entries conflict.

    I'll look into this from our end, but I can't guarantee that we will be able to provide a fix because this is a dbSNP resource, not a resource that we made. 

    woodword, did you ever find a workaround for this?

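For readers who want to locate sites like 6:29857105 in a resource VCF before running VariantRecalibrator, here is a minimal Python sketch. It rests on two assumptions that are not confirmed in the thread: that the consistency rule is "REF alleles of records starting at the same position must be prefixes of one another" (a reading of the error message, not GATK's actual implementation), and that the VCF is coordinate-sorted so records for one site are adjacent. The names `open_vcf`, `site_refs`, and `find_ref_conflicts` are hypothetical helpers, not part of GATK.

```python
import gzip
from itertools import groupby

# The three dbSNP records quoted in the thread; columns after ALT are
# trimmed to "." for brevity.
EXAMPLE_RECORDS = [
    "6\t29857105\trs201835144\tA\tC\t.\t.\t.",
    "6\t29857105\trs9278395\tAA\tA,AC\t.\t.\t.",
    "6\t29857105\trs202000432\tAC\tA\t.\t.\t.",
]


def open_vcf(path):
    """Open a plain-text or gzip/bgzip-compressed VCF in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def site_refs(lines):
    """Yield ((chrom, pos), REF) for each data line, skipping headers and blanks."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        parts = line.split(None, 4)  # CHROM, POS, ID, REF, remainder
        if len(parts) < 4:
            continue
        chrom, pos, _id, ref = parts[:4]
        yield (chrom, int(pos)), ref.upper()


def find_ref_conflicts(lines):
    """Return [(chrom, pos, [REF, ...])] for sites whose REF alleles disagree.

    Assumed rule: every REF at a site must be a prefix of the longest REF
    seen at that site. Assumes the input is coordinate-sorted.
    """
    conflicts = []
    for (chrom, pos), group in groupby(site_refs(lines), key=lambda item: item[0]):
        refs = sorted({ref for _, ref in group}, key=lambda r: (len(r), r))
        longest = refs[-1]
        if not all(longest.startswith(ref) for ref in refs):
            conflicts.append((chrom, pos, refs))
    return conflicts


if __name__ == "__main__":
    for chrom, pos, refs in find_ref_conflicts(EXAMPLE_RECORDS):
        print(f"{chrom}:{pos} conflicting REF alleles: {', '.join(refs)}")
```

To scan a whole resource file, pass an open handle instead of the example list, for instance `with open_vcf("dbsnp_138.b37.vcf.gz") as handle: find_ref_conflicts(handle)`. The records are streamed, so only one site is held in memory at a time. What to do with the flagged sites (for example, excluding them from the resource) is left open here, since the thread does not settle on a workaround.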
