VariantsToTable not extracting INFO 'sub-fields'
Hi all,
I am using GATK 4.1.6.0 to perform somatic variant calling with Mutect2. I obtained the final (single-sample) VCF file(s) and annotated it with Funcotator, and would now like to convert it into a table with the fields of interest for the analysis I am planning to do.
Everything works fine, but I am encountering some issues with the 'FUNCOTATION' INFO field. This field comprises dozens of "sub-fields", delimited by pipes ( | ) (e.g. Gencode_28_hugoSymbol | Gencode_28_ncbiBuild | Gencode_28_chromosome | etc etc).
Using "-F FUNCOTATION" correctly adds the FUNCOTATION column to the resulting table. What I would like to do, though, is extract specific "sub-fields" from the FUNCOTATION field and add the corresponding columns to the table. From what I gathered from the user guide, I expected "-ASF FUNCOTATION" to split FUNCOTATION into distinct fields and to allow "-F" to extract the desired ones (e.g. "-ASF FUNCOTATION -F Gencode_28_hugoSymbol"). However, this doesn't seem to work: the only effect of "-ASF FUNCOTATION" seems to be removing the square brackets from the FUNCOTATION column in the final table, leaving it as a single, undivided value.
Am I doing something wrong, or is the "FUNCOTATION" field not divisible in its 'sub-fields' in any way?
-
Hi Francesco Mazzarotto, what you are attempting to do with VariantsToTable is not possible. We don't have a GATK tool that will do exactly what you want, but you could look into getting MAF output format from Funcotator and the results will be easier to manipulate into a table.
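For anyone who wants to stay with the VCF output in the meantime, one possible workaround is to split the FUNCOTATION value with a small script. The following is a minimal Python sketch, not a GATK tool. It assumes that the sub-field names are listed after "Funcotation fields are:" in the header Description (as in the header line quoted later in this thread), and that each value is a single bracketed, pipe-delimited entry. The function names, the shortened header line and the example values are made up for illustration; records carrying several bracketed entries (for example for multiple alleles or transcripts) would need to be split into individual entries first.

```python
import re


def funcotation_field_names(header_line):
    """Return the sub-field names listed in the FUNCOTATION header Description."""
    match = re.search(r'Funcotation fields are:\s*([^">]+)', header_line)
    if match is None:
        raise ValueError("no FUNCOTATION field list found in header line")
    return [name.strip() for name in match.group(1).split("|")]


def split_funcotation(value, names):
    """Split one bracketed FUNCOTATION entry into a {name: value} dict."""
    # Assumption: the entry looks like "[v1|v2|...]" with one value per name.
    parts = value.strip().strip("[]").split("|")
    if len(parts) != len(names):
        raise ValueError(
            f"expected {len(names)} sub-fields, found {len(parts)}"
        )
    return dict(zip(names, parts))


# Illustrative (made-up) header line and value with only three sub-fields.
header = (
    '##INFO=<ID=FUNCOTATION,Number=A,Type=String,Description="Funcotation '
    'fields are: Gencode_28_hugoSymbol|Gencode_28_ncbiBuild|Gencode_28_chromosome">'
)
names = funcotation_field_names(header)
row = split_funcotation("[TP53|hg38|chr17]", names)

# Print a tab-delimited header row followed by the value row.
print("\t".join(names))
print("\t".join(row[name] for name in names))
```

The same two functions could be applied to the FUNCOTATION column of a table written by VariantsToTable with "-F FUNCOTATION", reading the field names once from the VCF header.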
-
But MAF output is somewhat different from VCF; and I think the VCF output format is better for germline variant annotation.
With Funcotator we get an integrated (and more detailed) "variant calling - annotation" workflow. The problem is that the vertical-bar-separated sub-fields of the FUNCOTATION INFO field are not easy to handle in downstream text processing.
I have two suggestions for the GATK Team:
You may want to develop a new tool (like VariantsToTable) to separate each "sub-info" in the FUNCOTATION INFO, and put them into separate columns with corresponding headers when creating the tab-delimited table.
Or add a feature to Funcotator to create multiple INFOs with FUNCOTATION prefix in their IDs; e.g.
##INFO=<ID=FUNCOTATION_Gencode_34_hugoSymbol,...>
##INFO=<ID=FUNCOTATION_Gencode_34_ncbiBuild,...>
instead of
##INFO=<ID=FUNCOTATION,...,Description="Funcotation fields are: Gencode_34_hugoSymbol|Gencode_34_ncbiBuild|...">
Thanks
-
Hi Shahryar Alavi,
I brought up your feature request to the GATK team and they agreed that it was a great idea and would be highly useful for Funcotator results. I created a feature request ticket on the GATK GitHub so that our team can add the tool when resources allow: https://github.com/broadinstitute/gatk/issues/7556. Feel free to chime in with your thoughts in the comment section of that ticket.
Best,
Genevieve