Is there a GATK tool for quantifying a particular base?
Apologies if this is a naive question, but I'm quite new to bioinformatics.
I'm working with targeted Illumina sequencing data generated with DNA from diseased and healthy tissue (this is an age-related disease and is not cancer/neoplastic). My hypothesis is that the diseased tissue might contain specific somatic mutations in one particular gene (let's call it GENE), while the healthy tissue will not (the same mutations cause autosomal-dominant versions of my disease of interest when present in the germline, whereas my diseased tissue is from sporadic disease). My coverage varies, but is in the range of 1000x - 50'000x. My sequencing protocol employed UMIs.
I have run Mutect2 in default settings and with lower thresholds (--tumor-lod-to-emit) and haven't found any calls that correspond to these pathogenic mutations, so my results are negative. Nevertheless, I have reason to believe that my pathogenic mutations might be present at very low VAF (around 0.05-0.1 %), based on other experiments.
My question: How do I quantify the number of reads that correspond to a particular variant of interest? Let's say that base 2000 in GENE is usually an A, whereas the mutation A>T would be pathogenic. How do I count the number of Ts in position 2000, perhaps with a GATK tool?
Why I want to do this: If I find that there are no such variants or that the numbers don't differ between diseased and healthy tissue, I won't try any further analyses and will simply conclude that the data are negative.
-
Hi M Carta
Welcome to bioinformatics.
In order to collect such data you can force calls of those alleles of interest by using Mutect2 and the parameter
--alleles
Output VCF files will contain allelic fractions of those sites in the FORMAT fields as well as number of reads supporting the reference and alternate alleles.
If you wish to obtain pileups for those positions of your interest you can use
gatk Pileup
or
samtools mpileup
and you may combine that output with an auxiliary tool such as
java -jar varscan.jar readcounts
to obtain a tabular list of reads supporting each nucleotide per position. As a warning the latter 2 are not directly within our reach to provide support and your mileage may vary.
I hope these would help your endeavors.
Please sign in to leave a comment.
1 comment