JEXL expression for filtering on AD (SelectVariants / FastaAlternateReferenceMaker)
Hi,
I am working on haploid bacterial data and I ran into a limitation of the program that I either can't solve or it would be nice to add a funtion for it in the future. I'll explain the issue:
Let's say I have (low coverage) data that I want to turn into an alternate fasta reference where:
REF: A
ALT: AAGT,T,CA
If I want to keep variants where the AD > [threshold] I can't do
-select 'vc.getGenotype("sample").getAD.1'
because for my sample it could be that the called ALT is getAD.2 and so far I haven't been able to use anything other than a number as an index to getAD.
This would be solved if we could do:
getAD.getGT OR getAD.IndexOfAlleleWithHighestCount
but to my knowledge none of these will work because JEXL will give an error. Maybe extending JEXL java operation to the AD array could fix it? Because even getAD[0] gives an error.
Do you have a solution to this?
PS. I am sorry if this should have been under General Questions
-
Additionally, I was wondering if you can tell me if FastaAlternateReferenceMaker can correctly choose the right alternate allele.
In the above example, if the 3rd one is called in one sample and the 1st alternate is called on another sample, can the tool map the correct one to each sample or would we have to change all called alleles to biallelic ones and map each samples called genotype to it and remove the others alternates? (If yes, do you have a way of doing that?)EDIT: this is solved with SelectVariants --remove-unused-alternates
-
Hi Yanis Chrys,
Are you blocked by your initial question in your post or do you have a workaround as of now?
We took a look and we couldn't figure out a JEXL one liner expression but we thought you could write a small code in a programming language to get the genotype and alt allele index to construct the JEXL expression.
Let me know,
Genevieve
-
Thank you for getting back to me. As it turns out my workaround was to use --remove-unused-alternates and then filter for the AD.1 of the SNP (because then only SNPs would have more than 1 elements in the AD array)
Although this technically worked, you inspired me to check my "solution" again (remove alternates) and I believe it caused me to have an Illegal base [ ] error that's been bothering me for days. Which means I will now have to figure out something else for this.Speaking of which, how would the JEXL expression inside -select accept arguments from the outside? I don't know how that would work - I was under the impression that getAD didn't accept anything other than a single number. Or do you mean to find the genomic positions that fail this filter and then add them to a file and remove them like intervals to exclude?
-
Hi Yanis Chrys,
It looks like you have found an issue that could be improved in how GATK uses JEXL expressions. I created a feature request ticket in our github at this link here: https://github.com/broadinstitute/gatk/issues/7448. Feel free to include additional information in the comments if you would like to describe in more detail what you are looking for.
Thanks for writing into our forum about this issue so we can make GATK even better!
Best,
Genevieve
Please sign in to leave a comment.
4 comments