Huge number of variants in tumor only mode as compared to T/N paired mode using Mutect2
REQUIRED for all errors and issues:
a) GATK version used:4.2.2.0
b) Exact command used:
c) Entire program log:
Hi GATK team,
I want to call somatic variants for 60 samples in tumor-normal paired mode vs tumor-only mode. I ran variant calling with Mutect2 in T/N mode and got ~10,000, 900, ~170000, ~5000 variants for the first four cases and so on (60 VCF files in total). Then I also called variants on the tumor and normal samples separately with Mutect2 (tumor-only mode), which gave 60x2=120 VCF files. I am surprised to see such a huge difference in the number of variants between individual mode and paired mode.
Sample1 125000(T) 140000(N) 10000(T/N)
Sample2 98000(T) 95000(N) 900(T/N)
Sample3 115000(T) 90000(N) 17500(T/N)
Sample4 120000(T) 117000(N) 5000(T/N)
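As a rough check on the size of the gap, the fold difference between tumor-only (T) and paired (T/N) calls can be computed from the table above. This is only a sketch: `fold_difference` is a hypothetical helper name, and the input is the approximate counts listed in the table, taken at face value.

```shell
# Hypothetical helper: reads "sample tumor_only_count paired_count" lines
# on stdin and prints the tumor-only / paired ratio for each sample.
fold_difference() {
  awk '{ printf "%s %.1f\n", $1, $2 / $3 }'
}

# Counts copied from the table above (T column and T/N column).
printf '%s\n' \
  'Sample1 125000 10000' \
  'Sample2 98000 900' \
  'Sample3 115000 17500' \
  'Sample4 120000 5000' | fold_difference
```

With these numbers the ratio ranges from about 6.6 (Sample3) to about 108.9 (Sample2).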
There is a roughly 100-fold difference in the number of variants for sample 2 and a 10-40-fold difference for the other samples. The commands I used for both pipelines are given below:
##paired T/N variant calling
python3 gatk Mutect2 -R hg19.fasta -I Normal1.bam --max-mnp-distance 0 -O Normal1.vcf.gz
python3 gatk Mutect2 -R hg19.fasta -I Normal2.bam --max-mnp-distance 0 -O Normal2.vcf.gz
python3 gatk GenomicsDBImport -R hg19.fasta -L hg19bed.bed --genomicsdb-workspace-path pon_db -V Normal1.vcf.gz -V Normal2.vcf.gz ..... -V Normal60.vcf.gz
python3 gatk CreateSomaticPanelOfNormals -R hg19.fasta --germline-resource somatic-b37_af-only-gnomad.raw.sites.vcf -V gendb://pon_db -O pon.vcf.gz
python3 gatk Mutect2 -R hg19.fasta -I Tumor1.bam -I Normal1.bam -normal Normal1 -pon pon.vcf.gz -germline-resource somatic-b37_af-only-gnomad.raw.sites-3.vcf -O Tumor1-unfiltered.vcf
python3 gatk FilterMutectCalls -R hg19.fasta -V Tumor1-unfiltered.vcf -O Tumor1-filtered.vcf
python3 gatk Mutect2 -R hg19.fasta -I Tumor2.bam -I Normal2.bam -normal Normal2 -pon pon.vcf.gz -germline-resource somatic-b37_af-only-gnomad.raw.sites-3.vcf -O Tumor2-unfiltered.vcf
python3 gatk FilterMutectCalls -R hg19.fasta -V Tumor2-unfiltered.vcf -O Tumor2-filtered.vcf
####Individual variant calling for tumor and normal samples
python3 gatk Mutect2 -R hg19.fasta -I Tumor1.bam -O Tumor1-SS.vcf
python3 gatk Mutect2 -R hg19.fasta -I Normal1.bam -O Normal1-SS.vcf
python3 gatk FilterMutectCalls -R hg19.fasta -V Tumor1-SS.vcf -O Tumor1-SS-filtered.vcf
python3 gatk FilterMutectCalls -R hg19.fasta -V Normal1-SS.vcf -O Normal1-SS-filtered.vcf
I am wondering how the difference can be so huge. Please let me know if you need any further information. Hoping to hear from you soon.
Thanks,
Shivangi
-
Shivangi Agarwal We need to change a few things.
- You should always run Mutect2 with a germline resource. For human samples, we provide one in our best practices Google bucket: gs://gatk-best-practices/somatic-b37/af-only-gnomad.raw.sites.vcf, which is just gnomAD with all the annotations except AF (population allele frequency) stripped.
- Except in very rare circumstances it is best to use one of our public panels of normals, such as gs://gatk-best-practices/somatic-b37/Mutect2-WGS-panel-b37.vcf and gs://gatk-best-practices/somatic-b37/Mutect2-exome-panel.vcf.
It is always far preferable to run in tumor-normal mode when a matched normal is available. Your command should therefore be something like:
gatk Mutect2 -R hg19.fasta -I tumor.bam -I normal.bam -normal normal_sample -germline-resource af-only-gnomad.raw.sites.vcf -pon Mutect2-WGS-panel-b37.vcf -O unfiltered.vcf
gatk FilterMutectCalls -R hg19.fasta -V unfiltered.vcf -O filtered.vcf
Finally, even if you run everything perfectly, tumor-only mode will still yield many more variants. Almost all of this excess is attributable to rare germline variants not present in gnomAD, several tens of thousands of which occur in the average genome.
-
Hi David,
Thanks so much for your response. I called variants both ways (1st: no germline resource and no public PON; 2nd: germline resource somatic-b37_af-only-gnomad.raw.sites.vcf and PON somatic-b37_Mutect2-exome-panel.vcf). However, I did not find a difference in the number of variants called.
Please have a look over the commands here:
Ist Case:
gatk Mutect2 -R hg19.fasta -I sample1.bam -O sample1-SS-unfiltered.vcf
IInd Case:
gatk Mutect2 -R hg19.fasta -L hg19bed.bed -I sample1.bam --germline-resource somatic-b37_af-only-gnomad.raw.sites.vcf -pon somatic-b37_Mutect2-exome-panel.vcf --f1r2-tar-gz f1r2.tar.gz -O sample1-unfiltered.vcf
The number of variants called is 414,377 in the first case and 410,968 in the second case.
After filtering with FilterMutectCalls, the number of variants that passed the filters is 81,097 for the first case and 98,300 for the second case.
gatk FilterMutectCalls -R hg19.fasta -V sample1-SS-unfiltered.vcf -O sample1-SS-filtered-testing.vcf
gatk FilterMutectCalls -R hg19.fasta -V sample1-unfiltered.vcf -O sample1-filtered-testing.vcf
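For reference, counts like the ones quoted above can be reproduced from the VCFs with a short sketch along these lines. It assumes uncompressed, tab-delimited VCF files in which passing records carry `PASS` in the FILTER column (the 7th field); `count_total` and `count_pass` are hypothetical helper names, not GATK tools.

```shell
# Count all variant records (non-header lines) in an uncompressed VCF.
count_total() {
  awk '!/^#/ { n++ } END { print n + 0 }' "$1"
}

# Count records whose FILTER column (7th tab-separated field) is PASS.
count_pass() {
  awk -F '\t' '!/^#/ && $7 == "PASS" { n++ } END { print n + 0 }' "$1"
}

# Report both counts for each filtered VCF that exists in the current directory.
for vcf in sample1-SS-filtered-testing.vcf sample1-filtered-testing.vcf; do
  if [ -f "$vcf" ]; then
    echo "$vcf total=$(count_total "$vcf") pass=$(count_pass "$vcf")"
  fi
done
```

For bgzip-compressed output (`.vcf.gz`), the files would need to be decompressed first, for example by piping `gzip -dc` output into the same awk commands.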
I am surprised to see so many variants, and so little difference in the number of calls when using the germline resource and public PON. I hope to hear back from you soon.
Regards
-
Hi Shivangi,
That's about the difference I would expect. Using a matched normal will have more of an impact.
Regards,
David