Details about the filters used in a previous version of FilterMutectCalls
Hey,
I am using mutational data (SNVs) downloaded from TCGA in 2021.
When examining the FILTER column in the MAF/VCF files, I see the following filters:
1. triallelic_site
2. homologous_mapping_event
3. multi_event_alt_allele_in_normal
4. str_contraction
5. clustered_events
6. panel_of_normals
7. t_lod_fstar
8. germline_risk
9. alt_allele_in_normal
10. oxog
11. bPcr
12. bSeq
I have several questions:
1. I found technical documentation on Mutect2 and FilterMutectCalls on GitHub, but it seems to describe a newer version. Is there any technical documentation about these filters, and more generally about the Mutect2 and FilterMutectCalls versions that were used for TCGA variants in 2021?
2. I also recently downloaded a VCF file from the TCGA website, and it had the same filters as my data from 2021. Does this mean that the variant calling pipeline for TCGA variants still does not use the most recent version of FilterMutectCalls?
3. Most importantly, the vast majority of variants (70-99%, depending on cancer type) seem to be tagged as errors or germline variants. How should I account for this in my analysis? And why are these variants still kept in TCGA if this is the case?
Thank you very much,
Tal.
-
Hi Tal Gutman
According to the response from our team, these filters are not related to the Mutect2 and FilterMutectCalls tools. You may be able to get more information from the TCGA admins.
We hope this helps.
-
So, I looked into it further: filters 1-9 are filters from GATK3 MuTect2
(10-12 are related to the GDC, not MuTect2).
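For question 3, a first step is usually to measure how many records are filtered and by which stage of the pipeline. The sketch below is a minimal, hypothetical helper (`tally_filters`, `origin`, and the filter-set names are our own, not part of GATK or the GDC). It assumes uncompressed VCF text (for a `.vcf.gz`, open it with `gzip.open(path, "rt")`), counts PASS records, and groups the semicolon-separated FILTER tags by the origin stated above (GATK3 MuTect2 versus GDC). Whether to keep only PASS records is an analysis decision that this code does not settle.

```python
from collections import Counter

# Assumed grouping, taken from the thread: filters 1-9 come from GATK3 MuTect2, 10-12 from the GDC.
MUTECT2_GATK3_FILTERS = {
    "triallelic_site", "homologous_mapping_event", "multi_event_alt_allele_in_normal",
    "str_contraction", "clustered_events", "panel_of_normals", "t_lod_fstar",
    "germline_risk", "alt_allele_in_normal",
}
GDC_FILTERS = {"oxog", "bPcr", "bSeq"}


def tally_filters(vcf_lines):
    """Return (total records, PASS records, Counter of failing FILTER tags)."""
    n_total = 0
    n_pass = 0
    tag_counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information lines, the header line and blank lines
        fields = line.rstrip("\n").split("\t")
        filter_field = fields[6]  # FILTER is the 7th column of a VCF data line
        n_total += 1
        if filter_field == "PASS":
            n_pass += 1
        elif filter_field != ".":  # "." means no filters were applied
            # A record that fails several filters lists them separated by semicolons
            tag_counts.update(filter_field.split(";"))
    return n_total, n_pass, tag_counts


def origin(tag):
    """Label a FILTER tag with the pipeline stage that (per the thread) produces it."""
    if tag in MUTECT2_GATK3_FILTERS:
        return "GATK3 MuTect2"
    if tag in GDC_FILTERS:
        return "GDC"
    return "unknown"


# Toy input with three records: one PASS, one failing two MuTect2 filters, one failing a GDC filter
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tT\t.\tPASS\t.",
    "1\t200\t.\tC\tG\t.\tpanel_of_normals;t_lod_fstar\t.",
    "2\t300\t.\tG\tA\t.\toxog\t.",
]
total, passed, counts = tally_filters(example)
print(total, passed)  # 3 1
for tag, n in counts.most_common():
    print(tag, origin(tag), n)
```

Note that a single record can carry several tags, so the per-tag counts can add up to more than the number of non-PASS records.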
Where can I find technical documentation that describes GATK3 MuTect2?
I want to understand the GATK3 MuTect2 pipeline and also exactly what each filter does.
Thank you,
Tal.
-
Hi Tal Gutman
In that case, here are the documents from the dusty depths of our repositories related to GATK3 MuTect2 and its differences from GATK4 Mutect2.
We hope it helps.
-
Thank you,
I have seen these and they help partially, but I was hoping for technical documentation of the model that explains, for example, how t_lod and n_lod are calculated and exactly how each filter is computed. Something like the "Notes on Mutect2" document that I found in the GATK4 GitHub repository, but for GATK3. I'm guessing there should be some technical documentation explaining MuTect2, right?
Thank you very much,
Tal.
-
Or a paper describing this version?
-
Hi Tal Gutman
GATK3 MuTect2 was developed as an extension of the original Mutect1. It added calling enhancements from HaplotypeCaller (local reassembly and the PairHMM), while the filter and likelihood calculations were carried over from the original Mutect1. Therefore, it may be best for you to refer to the original Mutect1 paper, which describes how the LOD calculations were made.
https://pubmed.ncbi.nlm.nih.gov/23396013/
We hope this helps.
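To illustrate the kind of tumor LOD the Mutect1 paper describes, here is a simplified sketch based on our reading of that paper, not the GATK3 MuTect2 implementation (which, per the reply above, used PairHMM read likelihoods from HaplotypeCaller rather than raw base qualities). Under these assumptions, each tumor read contributes an observed base b_i with error probability e_i derived from its Phred quality; P(b_i | e_i, x) is 1 - e_i when b_i = x and e_i/3 otherwise; and the LOD compares a model in which a fraction f of reads carry the alternate allele against a reference-only model, with f set to the observed alt fraction. The function names (`base_prob`, `phred_to_error`, `tumor_lod`) are hypothetical.

```python
import math


def phred_to_error(qual):
    """Convert a Phred-scaled base quality into an error probability."""
    return 10.0 ** (-qual / 10.0)


def base_prob(read_base, true_base, error_prob):
    """P(observed base | true base): 1 - e on a match, e/3 for each other base."""
    return 1.0 - error_prob if read_base == true_base else error_prob / 3.0


def tumor_lod(reads, ref, alt):
    """Simplified Mutect1-style tumor LOD: log10 L(alt at fraction f) / L(reference only).

    reads: list of (base, phred_quality) pairs from the tumor pileup at one site.
    f is approximated by the fraction of reads supporting the alt allele.
    """
    if not reads:
        return 0.0
    f = sum(1 for base, _ in reads if base == alt) / len(reads)
    lod = 0.0
    for base, qual in reads:
        e = phred_to_error(qual)
        p_ref = base_prob(base, ref, e)
        p_alt = base_prob(base, alt, e)
        # Per-read mixture likelihood under the mutation model, divided by the reference-only likelihood
        lod += math.log10(f * p_alt + (1.0 - f) * p_ref) - math.log10(p_ref)
    return lod


# Toy pileup: 8 reference reads and 2 alt reads, all at base quality 30
pileup = [("A", 30)] * 8 + [("T", 30)] * 2
print(round(tumor_lod(pileup, ref="A", alt="T"), 2))  # 4.78
```

With no alt-supporting reads, f is 0 and every term cancels, so the LOD is 0; adding alt reads raises it. The normal LOD (n_lod) in the paper follows the same pattern but compares different hypotheses in the normal sample, so it is not covered by this sketch.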
-
Thank you.