Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

GATK4 VariantAnnotator doesn't support AlleleBalance

11 comments

  • Pamela Bretscher

    Hi astrinaki_maria,

    Yes, the AlleleBalance annotation is no longer calculated automatically in the newest versions of GATK. However, I believe the Picard tool FilterVcf can calculate allele balance and filter on it in your use case. Is this helpful?

    Kind regards,

    Pamela

  • astrinaki_maria

    Dear Pamela, thank you so much for your reply!

    Could you advise me on the best way to calculate allele balance using this tool?

    Thank you in advance,

    Maria

  • Pamela Bretscher

    Hi astrinaki_maria,

    The --MIN_AB option can be specified with FilterVcf, which will calculate the allele balance for each sample in your VCF and filter any sites where the allele balance is below the limit that you specify. Please let me know if you have any other questions or concerns.

    Kind regards,

    Pamela
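
The quantity that --MIN_AB is compared against can be sketched in a few lines. This is a minimal illustration, assuming the definition given in the Picard documentation: for a heterozygous genotype, allele balance is the count for the least-represented allele divided by the total number of observations. The helper names `allele_balance` and `fails_min_ab`, the example counts, and the 0.3 limit are all hypothetical; this is not Picard's actual implementation.

```python
def allele_balance(ad):
    """Return min(AD) / sum(AD) for the two alleles of a heterozygous genotype.

    `ad` holds the read counts for the two called alleles, e.g. (30, 10).
    Returns None when there are no reads, because the ratio is undefined.
    """
    total = sum(ad)
    if total == 0:
        return None
    return min(ad) / total


def fails_min_ab(ad, min_ab):
    """True if the allele balance is defined and below the chosen limit."""
    ab = allele_balance(ad)
    return ab is not None and ab < min_ab


if __name__ == "__main__":
    # Hypothetical counts: 30 reads for one allele, 10 for the other.
    print(allele_balance((30, 10)))     # 0.25
    print(fails_min_ab((30, 10), 0.3))  # True: 0.25 is below the 0.3 limit
```

Because the least-represented allele is used, the value never exceeds 0.5, so a limit above 0.5 would flag every heterozygous genotype.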

  • astrinaki_maria

    Dear Pamela,

    Thank you, I truly appreciate your help.

    I used the --MIN_AB parameter and it actually worked!

    However, I still have some questions about homozygosity and heterozygosity.

    If I am not mistaken, in files produced by GATK3 this information is given by AB_HET and AB_HOM, yet it does not exist in my file (which was produced with GATK4).

    Furthermore, I would like to ask whether you have any suggestions on how I could get this information and, if so, whether you know the limits that GATK uses for homozygosity and heterozygosity.

    Thanks in advance for your reply.
    My regards,
    Maria

  • Pamela Bretscher

    Hi astrinaki_maria,

    Yes, the AB_HET and AB_HOM filters are no longer available in GATK4. I would suggest looking into the AlleleFraction annotation, which should serve the same purpose. I'm not sure exactly what you mean by the limits that GATK uses for heterozygosity, but you can find the default values of hetero/homozygosity used for likelihood determination and variant calling in the HaplotypeCaller documentation. I hope this is helpful.

    Kind regards,

    Pamela

  • astrinaki_maria

    Hi Pamela, thank you very much for your reply.

    I ran the AlleleFraction annotation and got the record below for the three samples:

    6 31896527 . G A 10481.73 PASS AC=6;AF=1.00;AN=6;DP=299;ExcessHet=3.0103;FS=0.000;MLEAC=6;MLEAF=1.00;MQ=59.99;QD=32.96;SOR=0.878 GT:AD:AF:DP:GQ:PL 1/1:0,103:1.00:103:99:3783,310,0 1/1:0,88:1.00:88:99:3298,265,0 1/1:0,89:1.00:89:99:3414,268,0

    In the above case we have homozygosity, right? I am trying to understand how I can use the allele fraction to tell whether a variant is hom or het.

    Moreover, each sample has its own AF, right? But what is the AF=1 shown in bold?

    Thank you in advance,

    Maria

  • Pamela Bretscher

    Hi astrinaki_maria,

    Yes, you're correct that the AF of 1.00 would indicate homozygosity for this sample. The AF is calculated for each sample and represents the proportion of the reads that support the variant allele, which is a measure of the homozygosity. Please let me know if this is unclear.

    Kind regards,

    Pamela
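
The record above carries two different values named AF, which a short sketch can separate. It assumes a biallelic site, that the AF in the INFO column is the site-level allele frequency (AC/AN across the called genotypes), and that the per-sample AF in the FORMAT column is the allele fraction derived from AD. The helper names are hypothetical, and the INFO column is abridged to the fields used here.

```python
# Record from the post above; INFO is abridged to the fields used here.
RECORD = (
    "6 31896527 . G A 10481.73 PASS AC=6;AF=1.00;AN=6;DP=299 "
    "GT:AD:AF:DP:GQ:PL "
    "1/1:0,103:1.00:103:99:3783,310,0 "
    "1/1:0,88:1.00:88:99:3298,265,0 "
    "1/1:0,89:1.00:89:99:3414,268,0"
)


def parse_info(info):
    """Turn 'AC=6;AF=1.00' into {'AC': '6', 'AF': '1.00'}."""
    return dict(item.split("=", 1) for item in info.split(";") if "=" in item)


def site_allele_frequency(record):
    """Site-level frequency: alternate allele count over total called alleles."""
    info = parse_info(record.split()[7])
    return int(info["AC"]) / int(info["AN"])


def sample_allele_fractions(record):
    """Per-sample (GT, alt reads / total reads) pairs computed from AD.

    Assumes a biallelic site and at least one read per sample.
    """
    cols = record.split()
    keys = cols[8].split(":")
    result = []
    for sample in cols[9:]:
        fields = dict(zip(keys, sample.split(":")))
        ref_reads, alt_reads = (int(x) for x in fields["AD"].split(","))
        result.append((fields["GT"], alt_reads / (ref_reads + alt_reads)))
    return result


if __name__ == "__main__":
    print(site_allele_frequency(RECORD))    # AC/AN = 6/6
    print(sample_allele_fractions(RECORD))  # one (GT, fraction) pair per sample
```

Here both values happen to be 1.00: all six called alleles are the alternate allele, and every read in every sample supports it.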

  • astrinaki_maria

    Hi Pamela,

    thank you for your quick response.

    So, in the example below:

    1 1922670 . A G 2843.95 PASS AC=1;AF=0.500;AN=2;BaseQRankSum=-1.302e+00;DP=55;ExcessHet=1.5490;FS=10.558;MQ=60.00;MQRankSum=0.00;QD=26.58;ReadPosRankSum=0.204;SOR=0.608 GT:AD:AF:DP:GQ:PL 0/1:31,24:0.436:55:99:763,0,1105

    the allele fraction value is 0.436. This value arises from dividing the reads of the alternate allele by the total reads at the position (24/55), right? Is the allele fraction used to determine whether a variant is heterozygous or homozygous? If so, could you help me with the thresholds for filtering? For example, an allele fraction of 0.5 is heterozygous. If the allele fraction is 0.33, 0.22, or 0.60, for example, will the variant be called heterozygous or homozygous in the VCF?

    Are there any thresholds on the allele fraction value when we curate germline mutations?

    Thank you a lot for your help! 

    Maria

  • Pamela Bretscher

    Hi astrinaki_maria,

    You are correct that the AF value arises from the division of the number of reads supporting the variant allele by the total number of reads. I'm not sure you will be able to call variants as strictly homozygous or heterozygous in exactly the way you are looking for, as I don't think there is a set threshold AF value. I believe that any sample with an AF that is not 1 or close to 1 could be considered heterozygous. You can also see that in the INFO field near the beginning of the VCF line you have AF=0.500, which would denote that this variant is heterozygous. Do you have samples in your VCF where the AF is not clearly close to 0.5 or 1, such as 0.22?

    Kind regards,

    Pamela
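
On the question of thresholds, the GT in these records can be read off the PL field rather than the allele fraction. The following is a minimal sketch, assuming a biallelic diploid site where PL lists phred-scaled genotype likelihoods in the order 0/0, 0/1, 1/1 and the most likely genotype has PL 0. The helper `call_from_pl` is hypothetical, not a GATK function, and the GQ rule used here (gap between the two lowest PLs, capped at 99) is an assumption about GATK's convention that happens to match the records in this thread.

```python
# PL ordering for a biallelic diploid site, per the VCF specification.
GENOTYPES = ("0/0", "0/1", "1/1")


def call_from_pl(pl):
    """Return (genotype, GQ) from three phred-scaled genotype likelihoods.

    The genotype with the lowest PL is the most likely one. GQ is taken
    as the gap between the two lowest PL values, capped at 99.
    """
    best = min(range(len(pl)), key=pl.__getitem__)
    second_lowest = sorted(pl)[1]
    return GENOTYPES[best], min(second_lowest - pl[best], 99)


if __name__ == "__main__":
    # PL values copied from the two records quoted earlier in the thread.
    print(call_from_pl([763, 0, 1105]))  # ('0/1', 99)
    print(call_from_pl([3783, 310, 0]))  # ('1/1', 99)
```

In this view an allele fraction of 0.436 does not by itself decide the call; the heterozygous genotype is reported because it has the lowest PL.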

  • astrinaki_maria

    Hi Pamela,

    Thank you for your answer!

    Yes, it seems that we have allele fractions below 0.5, as you can see below:

    12 122945877 . CAAA C 47.60 PASS AC=1;AF=0.500;AN=2;BaseQRankSum=0.319;DP=17;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=57.58;MQRankSum=-1.150e+00;QD=3.97;ReadPosRankSum=-4.220e-01;SOR=0.148 GT:AD:AF:DP:GQ:PL 0/1:9,3:0.250:12:55:55,0,314

    In this example we have 12 total reads, of which 3 support the alternate allele and 9 support the reference allele. The algorithm reports this variant as heterozygous. For a heterozygous variant, we would expect half of the reads to support the alternate allele and the other half to support the reference allele. Can we trust this call when fewer than half of the reads support the alternate allele?

    Thank you for your help,

    Maria

  • Pamela Bretscher

    Hi astrinaki_maria,

    Thank you for providing this. Yes, I would say that you can trust that the variant is heterozygous, as the algorithms generally result in accurate detection. You can read more about the algorithms for germline variant calling here. Given that you only have 12 reads, the AF is more prone to chance events, which is why you are not seeing exactly 0.500. Since multiple reads support the variant allele, this is most likely a heterozygous variant and the low AF is just due to chance.

    Kind regards,

    Pamela
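
The effect of low depth can be made concrete with a binomial calculation. This is a minimal sketch, assuming each read at a truly heterozygous site independently comes from the alternate allele with probability 0.5. That is an idealization: it ignores mapping and alignment biases, which can pull the alternate fraction below 0.5, particularly for indels. The helper name `prob_alt_at_most` is hypothetical.

```python
from math import comb


def prob_alt_at_most(k, n, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p).

    Gives the chance of seeing k or fewer alternate reads out of n when
    each read supports the alternate allele with probability p.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


if __name__ == "__main__":
    # 3 alternate reads out of 12, as in the record above.
    print(f"{prob_alt_at_most(3, 12):.3f}")  # 0.073
```

Under that assumption, 3 or fewer alternate reads out of 12 occurs about 7% of the time at a heterozygous site, so an AF of 0.250 at this depth is not surprising on its own.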
