Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

GATK process banner

Need Help?

Search our documentation

Community Forum

Hi, How can we help?

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size. Learn more

ArtificialHaplotypeRG makes variant heterozygous

Answered
0

5 comments

  • Avatar
    Genevieve Brandt (she/her)

    Hi Wiebe Kooistra,

    HaplotypeCaller does local realignment so the counts before and after at a specific site will not always match. Could you check how many of the reads are marked with ArtificialHaplotype in the RG field?

    Best,

    Genevieve

    0
    Comment actions Permalink
  • Avatar
    Wiebe Kooistra

    Hi Genevieve Brandt (she/her)

    I did read about the local realignment. There are 173 ArtificialHaplotypeRG's marked sequences. That the numbers don't add up I understand, I was just wondering why my homozygous SNP for the alternative allele is getting so much heterozygous scores in the Artificial Haplotype RGs?

    Regards,

    Wiebe

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Could you share the GVCF or VCF output line? It may help me better understand why the bamout appears like this.

    0
    Comment actions Permalink
  • Avatar
    Wiebe Kooistra

    Hey,

    Sure. Here is the VCF output line: SL4.0ch11 25548831 . T C,A 4.09785e+06 PASS AC=1,0;AF=0.523,0.006162;AN=2;BaseQRankSum=-0.295;DP=257674;ExcessHet=-0;FS=0;InbreedingCoeff=0.6408;MLEAC=819,8;MLEAF=0.713,0.006969;MQ=60;MQRankSum=0;QD=34.5;ReadPosRankSum=-0.269;SOR=2.688GT:AD:DP:FT:GQ:PL 0/1:41,91,0:132:PASS:99:1045,0,601,1162,800,1962

    The gVCF output line is: SL4.0ch11 25548831 . T C,<NON_REF> 1037.60 . BaseQRankSum=8.323;DP=374;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=0.000;RAW_MQandDP=1346400,374;ReadPosRankSum=-0.517 GT:AD:DP:GQ:PL:SB 0/1:41,91,0:132:99:1045,0,601,1162,800,1962:41,0,91,0

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Thanks for sharing these lines! 

    The bamout is a debugging tool meant to view the local reassembly of HaplotypeCaller and understand the called haplotypes. The reads marked with ArtificialHaplotypeRG are not real reads and are constructed by HaplotypeCaller during the local realignment step. You don't need to worry about these reads in the bamout and can remove them from the view if you want.

    Otherwise, it looks like this site ended up being called because of the local realignment here. After local realignment, HaplotypeCaller found 41 reads in support of of T and 91 reads in support of C. 

    If you think that this haplotype should not have been called, you can try some troubleshooting methods following this document: When HaplotypeCaller and Mutect2 do not call an expected variant. I would recommend trying --linked-de-bruijn graph and also making sure your GATK version is up to date.

    Hopefully this helps you to understand your results. Let me know if you have any follow up questions.

    Best,

    Genevieve

    0
    Comment actions Permalink

Please sign in to leave a comment.

Powered by Zendesk