GATK4 VariantAnnotator doesn't support AlleleBalance
Hello,
I am trying to analyze a trio (mother, father, son), so I used the pipeline "Calling variants on cohorts of samples using the HaplotypeCaller in GVCF mode" and then ran variant recalibration. Now I need to filter my variants on allele balance, but I can't find it in my VCF. I think that GATK 4.2.2.0 doesn't calculate this annotation anymore, but filtering on it is very important for me.
What can I do? Any suggestions?
Thanks a lot,
Astrinaki Maria

Hi astrinaki_maria,
Yes, the AlleleBalance annotation is no longer calculated automatically in the newest versions of GATK. However, I believe the Picard tool FilterVcf can be used in your case to calculate and filter based on allele balance. Is this helpful?
Kind regards,
Pamela

Dear Pamela, thank you so much for your reply!
Could you advise me on the best way to calculate allele balance using this tool?
Thank you in advance,
Maria

Hi astrinaki_maria,
The MIN_AB option can be specified with FilterVcf, which will calculate the allele balance for each sample in your VCF and filter out any sites where the allele balance is below the limit that you specify. Please let me know if you have any other questions or concerns.
Kind regards,
Pamela
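
For readers who want to see what an allele balance cutoff does to a single genotype, here is a minimal Python sketch. It is not Picard's code. It assumes that allele balance for a heterozygous call is the read count of the less-represented allele divided by the reads on both called alleles, and it works one sample at a time; how FilterVcf combines samples at a site may differ. The function names and the 0.3 cutoff in the note below are hypothetical.

```python
def allele_balance(genotype, allele_depths):
    """Fraction of reads on the less-represented allele of a heterozygous call.

    genotype      -- GT string such as "0/1" or "0|1"
    allele_depths -- AD values as a list of ints, reference allele first
    Returns None for homozygous, missing, or uncovered genotypes
    (assumption: only heterozygous calls are tested).
    """
    alleles = genotype.replace("|", "/").split("/")
    if "." in alleles or len(set(alleles)) < 2:
        return None
    depths = [allele_depths[int(a)] for a in set(alleles)]
    total = sum(depths)
    if total == 0:
        return None
    return min(depths) / total


def fails_min_ab(genotype, allele_depths, min_ab):
    """True when a heterozygous call has an allele balance below min_ab."""
    ab = allele_balance(genotype, allele_depths)
    return ab is not None and ab < min_ab
```

With an illustrative cutoff of 0.3, a 0/1 call with AD 31,24 passes (24/55 is about 0.44), while a 0/1 call with AD 9,3 fails (3/12 = 0.25).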

Dear Pamela,
Thank you, I truly appreciate your help.
I used the MIN_AB parameter and it actually worked!
However, I still have some questions about homozygosity and heterozygosity.
If I am not mistaken, in the files produced by GATK3 this information is given by AB_HET and AB_HOM, yet it does not exist in my file, which was produced with GATK4.
Furthermore, I would like to ask whether you have any suggestions on how I could get this information and, if so,
whether you know the limits that GATK uses for homozygosity and heterozygosity. Thanks in advance for your reply,
My regards
Maria

Hi astrinaki_maria,
Yes, the AB_HET and AB_HOM filters are no longer available in GATK4. I would suggest looking into the AlleleFraction annotation which should work for the same purpose. I'm not sure exactly what you mean by the limits that GATK uses for heterozygosity but you can find the default values of hetero/homozygosity used for likelihood determination and variant calling in the HaplotypeCaller documentation. I hope this is helpful.
Kind regards,
Pamela

Hi Pamela, thank you very much for your reply.
I ran the AlleleFraction annotation and got the record below for the three samples:
6 31896527 . G A 10481.73 PASS AC=6;AF=1.00;AN=6;DP=299;ExcessHet=3.0103;FS=0.000;MLEAC=6;MLEAF=1.00;MQ=59.99;QD=32.96;SOR=0.878 GT:AD:AF:DP:GQ:PL 1/1:0,103:1.00:103:99:3783,310,0 1/1:0,88:1.00:88:99:3298,265,0 1/1:0,89:1.00:89:99:3414,268,0
In the above case we have homozygosity, right? I am trying to understand how I can use the allele fraction to tell whether a call is hom or het.
Moreover, each sample has its own AF, right? But what is the AF=1.00 in the INFO column?
Thank you in advance,
Maria

Hi astrinaki_maria,
Yes, you're correct that the AF of 1.00 would indicate homozygosity for this sample. The AF is calculated for each sample and represents the proportion of the reads that support the variant allele, which is a measure of the homozygosity. Please let me know if this is unclear.
Kind regards,
Pamela
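
The record above carries two different fields named AF, and a small Python sketch may help keep them apart. In the VCF specification the INFO-level AF is the allele frequency at the site, which for this record matches AC/AN over all called chromosomes; the per-sample AF in the FORMAT column is the allele fraction, the share of that sample's reads supporting the alternate allele. The helper names are hypothetical, and the code assumes a biallelic site and splits on whitespace for simplicity (real VCF files are tab-separated).

```python
# Record copied from the post above.
RECORD = (
    "6 31896527 . G A 10481.73 PASS "
    "AC=6;AF=1.00;AN=6;DP=299;ExcessHet=3.0103;FS=0.000;MLEAC=6;MLEAF=1.00;"
    "MQ=59.99;QD=32.96;SOR=0.878 "
    "GT:AD:AF:DP:GQ:PL "
    "1/1:0,103:1.00:103:99:3783,310,0 "
    "1/1:0,88:1.00:88:99:3298,265,0 "
    "1/1:0,89:1.00:89:99:3414,268,0"
)


def parse_record(line):
    """Split one VCF data line into an INFO dict and a list of per-sample dicts."""
    fields = line.split()
    info = dict(item.split("=", 1) for item in fields[7].split(";") if "=" in item)
    keys = fields[8].split(":")
    samples = [dict(zip(keys, sample.split(":"))) for sample in fields[9:]]
    return info, samples


def site_allele_frequency(info):
    """INFO-level AF: alternate allele count over called chromosomes (AC/AN)."""
    return int(info["AC"]) / int(info["AN"])


def sample_allele_fraction(sample):
    """FORMAT-level AF: alternate reads over all reads in AD (biallelic only)."""
    ref_depth, alt_depth = (int(x) for x in sample["AD"].split(","))
    total = ref_depth + alt_depth
    return alt_depth / total if total else None


if __name__ == "__main__":
    info, samples = parse_record(RECORD)
    print("site AF:", site_allele_frequency(info))
    print("sample AFs:", [sample_allele_fraction(s) for s in samples])
```

Here both values come out as 1.0, but for different reasons: all six chromosomes in the trio carry the alternate allele (AC=6, AN=6), and every read in each sample supports it (AD 0,103; 0,88; 0,89).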

Hi Pamela,
thank you for your quick response.
So, in the example below:
1 1922670 . A G 2843.95 PASS AC=1;AF=0.500;AN=2;BaseQRankSum=1.302e+00;DP=55;ExcessHet=1.5490;FS=10.558;MQ=60.00;MQRankSum=0.00;QD=26.58;ReadPosRankSum=0.204;SOR=0.608 GT:AD:AF:DP:GQ:PL 0/1:31,24:0.436:55:99:763,0,1105
the allele fraction value is 0.436. This value comes from dividing the reads of the alternate allele by the total reads at the position (24/55), right? Is the allele fraction used to determine whether a variant is heterozygous or homozygous? If so, could you help me with the thresholds for filtering? For example, an allele fraction of 0.5 is heterozygous. If the allele fraction is 0.33, 0.22, or 0.60, will the variant be called heterozygous or homozygous in the VCF?
Are there any thresholds on the allele fraction value when we curate germline mutations?
Thank you a lot for your help!
Maria

Hi astrinaki_maria,
You are correct that the AF value arises from the division of the number of reads supporting the variant allele by the total number of reads. I'm not sure you will be able to achieve exactly what you are looking for with calling variants as strictly homozygous or heterozygous as I don't think there is a set threshold AF value. I believe that any sample with an AF that is not 1 or close to 1 could be considered heterozygous. You can also see that at the beginning of the VCF line, you have AF=0.500, which would denote that this variant is heterozygous. Do you have samples in your VCF where the AF is not clearly close to 0.5 or 1, such as 0.22?
Kind regards,
Pamela
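
One way to see why there is no fixed AF cutoff: the genotype in the GT field follows the genotype likelihoods, which are reported in the PL field, rather than a threshold on the allele fraction. The Python sketch below reads a PL string and returns the genotype with the lowest PL together with a GQ value. It assumes a biallelic diploid site, where the VCF specification orders PL as 0/0, 0/1, 1/1, and it assumes GQ is the gap to the second-lowest PL capped at 99, which matches the records in this thread. The function name is hypothetical.

```python
# PL order for a biallelic diploid site, per the VCF specification.
GENOTYPES = ("0/0", "0/1", "1/1")


def call_from_pl(pl_field):
    """Return (genotype, GQ) for a comma-separated PL string.

    The called genotype is the one with the lowest PL. GQ is taken as the
    difference to the second-lowest PL, capped at 99 (assumption based on
    the values seen in the records above).
    """
    pls = [int(x) for x in pl_field.split(",")]
    best = min(range(len(pls)), key=pls.__getitem__)
    second_lowest = sorted(pls)[1]
    gq = min(second_lowest - pls[best], 99)
    return GENOTYPES[best], gq
```

For the heterozygous record above, call_from_pl("763,0,1105") gives ("0/1", 99), and for the first sample of the earlier homozygous record, call_from_pl("3783,310,0") gives ("1/1", 99). Both agree with the GT and GQ values in the VCF lines.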

Hi Pamela,
Thank you for your answer!
Yes, it seems that we have allele fractions below 0.5, as you can see below:
12 122945877 . CAAA C 47.60 PASS AC=1;AF=0.500;AN=2;BaseQRankSum=0.319;DP=17;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=57.58;MQRankSum=1.150e+00;QD=3.97;ReadPosRankSum=4.220e01;SOR=0.148 GT:AD:AF:DP:GQ:PL 0/1:9,3:0.250:12:55:55,0,314
In this example we have 12 total reads, of which 3 support the alternate allele and 9 support the reference allele. The algorithm returns this variant as heterozygous. For a heterozygous variant, we would expect half of the reads to carry the alternate allele and the other half the reference allele. Can we trust this call when fewer than half of the reads carry the alternate allele?
Thank you for your help,
Maria

Hi astrinaki_maria,
Thank you for providing this. Yes, I would say that you can trust that the variant is heterozygous as the algorithms generally result in accurate detection. You can read more about the algorithms for germline variant calling here. Given that you only have 12 reads, the AF is definitely more prone to chance events which is why you are not seeing exactly 0.500. Given that there are multiple reads that support the variant allele, this is most likely a heterozygous variant and the low AF is just due to chance.
Kind regards,
Pamela
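
To put a rough number on "due to chance", here is a small Python sketch. It assumes each read at a truly heterozygous site is an independent draw with a 50% chance of carrying the alternate allele. That is a simplification: reference bias, mapping problems around indels, and duplicate reads are ignored. The function name is hypothetical.

```python
from math import comb


def prob_alt_reads_at_most(k, n, p=0.5):
    """Binomial probability of seeing k or fewer alternate reads out of n.

    p is the assumed chance that a single read carries the alternate allele
    (0.5 for an idealized heterozygous site).
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


if __name__ == "__main__":
    # The 9,3 record above: 3 alternate reads out of 12.
    print(prob_alt_reads_at_most(3, 12))
```

Under this model, 3 or fewer alternate reads out of 12 happens in about 7% of true heterozygous sites (299/4096), so an AF of 0.25 at this depth is unusual but not surprising. At higher depth the same fraction becomes far less likely by chance alone.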