Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

GATK4.1.3.0 HaplotypeCaller ERROR

Answered

8 comments

  • WenyaWang

    Besides that, I get the same error when I use the latest version.

  • Genevieve Brandt (she/her)

    If that is the only area with an M, you can use the option -XL to exclude the interval from processing. However, if that is not the only issue, you can check out this documentation to diagnose the problem: https://gatk.broadinstitute.org/hc/en-us/articles/360035891231-Errors-in-SAM-or-BAM-files-can-be-diagnosed-with-ValidateSamFile, then fix your file.

  • WenyaWang

    Thanks for your suggestion. I followed the instructions in this documentation: https://gatk.broadinstitute.org/hc/en-us/articles/360035891231-Errors-in-SAM-or-BAM-files-can-be-diagnosed-with-ValidateSamFile.

    Unfortunately, it returns "No errors found". 

    I wonder whether it is because I used Novoalign for the alignment. Have you seen this error before?

  • Genevieve Brandt (she/her)

    Hi WenyaWang, glad it found no other errors. You can look for non-GATK solutions to remove the M, or use the -XL option in HaplotypeCaller to exclude the region like I wrote above. Here is the documentation link.

    Unfortunately we only provide solutions for GATK issues. But if someone in the community has seen this issue with Novoalign, please let us know!

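For readers who want to work out which intervals to pass to -XL: the following is a minimal Python sketch, not a GATK tool, that scans a reference FASTA for bases other than A, C, G, T, or N and reports each run as a 1-based inclusive interval. It assumes the ambiguous base sits in the reference rather than in the reads. The names `ambiguous_intervals` and `format_interval` are hypothetical.

```python
import io
import re

# Anything other than A, C, G, T or N (either case) is treated as ambiguous.
AMBIGUOUS_RUN = re.compile(r"[^ACGTNacgtn]+")


def ambiguous_intervals(fasta_handle):
    """Yield (contig, start, end) for each run of ambiguous bases.

    Coordinates are 1-based and inclusive. Runs that continue across
    FASTA line breaks are merged into a single interval.
    """
    contig = None
    offset = 0        # bases already consumed on the current contig
    open_run = None   # [start, end] of a run that may continue on the next line
    for line in fasta_handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # New contig: flush any pending run and reset the position counter.
            if open_run:
                yield (contig, open_run[0], open_run[1])
                open_run = None
            contig = line[1:].split()[0]
            offset = 0
            continue
        for match in AMBIGUOUS_RUN.finditer(line):
            start = offset + match.start() + 1
            end = offset + match.end()
            if open_run and open_run[1] + 1 == start:
                open_run[1] = end  # run continues from the previous line
            else:
                if open_run:
                    yield (contig, open_run[0], open_run[1])
                open_run = [start, end]
        offset += len(line)
    if open_run:
        yield (contig, open_run[0], open_run[1])


def format_interval(contig, start, end):
    """Format an interval as contig:start-end."""
    return f"{contig}:{start}-{end}"


# Small in-memory example instead of a real reference file.
example = io.StringIO(">chr1 test\nACGTM\nRACGT\n>chr2\nNNNNACGT\nACYGT\n")
found = list(ambiguous_intervals(example))
```

Writing `format_interval(*iv)` for each result, one per line, gives `contig:start-end` strings of the kind GATK interval arguments accept. Excluding these positions only helps if the ambiguous bases are confined to a few regions.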
  • WenyaWang

    Hi,

    Here is the reply from NovoAlign:

    "Hi Wenya

     
    Thanks for your email.
     
    Yes it does appear that GATK does not like the ambiguous IUPAC base 'M' that exists and it is a known issue that HaplotypeCaller does not support IUPAC codes. In our recommended workflow with Novoalign you could build an IUPAC reference novoindex and align your reads to that. However when you do variant calling with GATK you would use a regular reference FASTA (not an IUPAC one).
    If you are still seeing this error after following the workflow above you may need to manipulate your BAM file before running GATK to replace the M's with an A or C in the SAM SEQ column, and then run GATK to see if that works. Also don't forget the other IUPAC codes should also be replaced.
    Picard validation probably does not check the actual sequence in SAM format for IUPAC codes which is why you're seeing no validation errors."
     
    So the problem is caused by the reference genome.
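
The BAM manipulation described in the NovoAlign reply above could look roughly like the following Python sketch. It works on SAM text (for example the output of `samtools view -h`), leaves header lines alone, and rewrites only the SEQ column (column 10). The replacement base chosen for each IUPAC code is an arbitrary assumption made here: the first base the code can stand for, so M becomes A. The names `IUPAC_TO_BASE`, `fix_sam_line`, and `fix_sam_stream` are hypothetical.

```python
import io

# Assumed mapping: each ambiguity code is replaced by one of the bases it
# can represent (the alphabetically first one). N is left as it is.
IUPAC_TO_BASE = {
    "R": "A", "Y": "C", "S": "C", "W": "A", "K": "G",
    "M": "A", "B": "C", "D": "A", "H": "A", "V": "A",
}

# Translation table covering upper- and lower-case codes.
_TABLE = str.maketrans({
    **IUPAC_TO_BASE,
    **{code.lower(): base.lower() for code, base in IUPAC_TO_BASE.items()},
})


def fix_sam_line(line):
    """Return the SAM line with IUPAC codes replaced in the SEQ column only."""
    if line.startswith("@"):
        return line  # header line, nothing to change
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return line  # not a complete alignment record, leave untouched
    fields[9] = fields[9].translate(_TABLE)  # SEQ is the 10th column
    return "\t".join(fields) + newline


def fix_sam_stream(src, dst):
    """Copy SAM text from src to dst, fixing the SEQ column of every record."""
    for line in src:
        dst.write(fix_sam_line(line))


# Small in-memory example: one header line and one read containing an M.
sam_in = io.StringIO(
    "@SQ\tSN:chrM\tLN:16569\n"
    "r1\t0\tchrM\t100\t60\t5M\t*\t0\t0\tACMGT\tIIIII\tNM:i:0\n"
)
sam_out = io.StringIO()
fix_sam_stream(sam_in, sam_out)
fixed_lines = sam_out.getvalue().splitlines()
```

Calling `fix_sam_stream(sys.stdin, sys.stdout)` would let the script sit in a pipe between `samtools view -h in.bam` and `samtools view -b -o fixed.bam -` (the file names are placeholders). Tags derived from the read sequence, such as NM and MD, are not recomputed by this sketch.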
  • Genevieve Brandt (she/her)

    Thank you for updating this thread, WenyaWang, it will definitely be useful to other users!

  • Divon Lan

    I ran into the same problem and developed a quick fix for it using Genozip, in case it is useful to anyone else: https://genozip.readthedocs.io/gatk-unexpected-base.html

  • Pamela Bretscher

    Divon Lan, thank you for providing this resource!
