VariantsToTable not extracting INFO 'sub-fields'
Hi all,
I am using GATK 4.1.6.0 to perform somatic variant calling with Mutect2. I obtained the final (single-sample) VCF files, annotated them with Funcotator, and would now like to convert them into tables containing the fields of interest for the analysis I am planning.
Everything works fine, but I am encountering some issues with the 'FUNCOTATION' INFO field. This field comprises dozens of "sub-fields", delimited by pipes ( | ) (e.g. Gencode_28_hugoSymbol | Gencode_28_ncbiBuild | Gencode_28_chromosome | etc etc).
Using "-F FUNCOTATION" correctly adds the FUNCOTATION column to the resulting table. What I would like to do, though, is extract specific "sub-fields" from the FUNCOTATION field and add the corresponding columns to the table. From what I gathered from the user guide, I expected "-ASF FUNCOTATION" to split FUNCOTATION into distinct fields, so that "-F" could then be used to extract the desired ones (e.g. "-ASF FUNCOTATION -F Gencode_28_hugoSymbol"). However, this does not work: the only effect of "-ASF FUNCOTATION" seems to be removing the square brackets from the FUNCOTATION column in the final table, which is still left as a single, undivided value.
Am I doing something wrong, or can the "FUNCOTATION" field not be divided into its 'sub-fields' in any way?
-
Hi Francesco Mazzarotto, what you are attempting to do with VariantsToTable is not possible. We don't have a GATK tool that does exactly what you want, but you could have Funcotator write MAF output instead ("--output-file-format MAF"); the results will be easier to manipulate into a table.
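Since VariantsToTable cannot do this, one workaround is to split the field yourself after annotation. Below is a minimal Python sketch, not a GATK feature. It assumes that the FUNCOTATION header line lists the sub-field names after the text "Funcotation fields are: ", that each record stores the values between square brackets separated by pipes, and that only the first bracketed annotation per record is wanted (multi-allelic or multi-transcript records and escaped characters are not handled). The function names and the example record are made up for illustration.

```python
import re

# Assumption: the FUNCOTATION header Description lists the sub-field names after this text.
FIELDS_MARKER = "Funcotation fields are: "


def funcotation_field_names(header_line):
    """Return the sub-field names listed in the ##INFO=<ID=FUNCOTATION,...> header line."""
    start = header_line.index(FIELDS_MARKER) + len(FIELDS_MARKER)
    end = header_line.index('"', start)  # closing quote of the Description
    return [name.strip() for name in header_line[start:end].split("|")]


def split_funcotation(info, names):
    """Map sub-field names to values for the first bracketed annotation in an INFO string."""
    for entry in info.split(";"):
        if entry.startswith("FUNCOTATION="):
            value = entry[len("FUNCOTATION="):]
            match = re.search(r"\[([^\]]*)\]", value)
            body = match.group(1) if match else value
            return dict(zip(names, body.split("|")))
    return {}


def funcotation_table(vcf_lines, wanted):
    """Return rows of CHROM, POS, REF, ALT plus the requested FUNCOTATION sub-fields."""
    names = []
    rows = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##INFO=<ID=FUNCOTATION,"):
            names = funcotation_field_names(line)
        elif line and not line.startswith("#"):
            cols = line.split("\t")
            annotation = split_funcotation(cols[7], names)  # column 8 is INFO
            rows.append(cols[:2] + cols[3:5] + [annotation.get(w, "") for w in wanted])
    return rows


if __name__ == "__main__":
    # Made-up two-line VCF fragment, for illustration only.
    demo = [
        '##INFO=<ID=FUNCOTATION,Number=A,Type=String,Description="Functional annotation. '
        'Funcotation fields are: Gencode_28_hugoSymbol|Gencode_28_ncbiBuild|Gencode_28_chromosome">',
        "chr1\t100\t.\tA\tT\t.\tPASS\tDP=10;FUNCOTATION=[TP53|hg38|chr1]",
    ]
    for row in funcotation_table(demo, ["Gencode_28_hugoSymbol"]):
        print("\t".join(row))
```

To run it on a real file, pass an open file handle instead of the demo list, e.g. funcotation_table(open("annotated.vcf"), ["Gencode_28_hugoSymbol"]), where "annotated.vcf" is a placeholder name. Reading the names from the header rather than hard-coding them keeps the sketch independent of the data-source version (Gencode_28, Gencode_34, and so on).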
-
But MAF output is somewhat different from VCF, and I think the VCF output format is better for germline variant annotation.
With Funcotator we get an integrated (and more detailed) "variant calling - annotation" workflow. The problem is that the vertical-bar-separated values in the FUNCOTATION INFO field are not easy to handle in downstream text processing.
I have two suggestions for the GATK Team:
You may want to develop a new tool (like VariantsToTable) to separate each "sub-info" in the FUNCOTATION INFO, and put them into separate columns with corresponding headers when creating the tab-delimited table.
Or add a feature to Funcotator to create multiple INFO fields with a FUNCOTATION prefix in their IDs; e.g.
##INFO=<ID=FUNCOTATION_Gencode_34_hugoSymbol,...>
##INFO=<ID=FUNCOTATION_Gencode_34_ncbiBuild,...>
instead of
##INFO=<ID=FUNCOTATION,...,Description="Funcotation fields are: Gencode_34_hugoSymbol|Gencode_34_ncbiBuild|...">
Thanks
-
Hi Shahryar Alavi,
The GATK support team is focused on resolving questions about GATK tool-specific errors and abnormal results from the tools. For all other questions, such as this one, we are building a backlog to work through when we have the capacity.
Please continue to post your questions because we will be mining them for improvements to documentation, resources, and tools.
We cannot guarantee a reply; however, we ask other community members to help out if they know the answer.
For context, check out our support policy.