Contamination calculation on a single file doesn't detect contaminated samples
GATK v4.2.2.0 (from docker image broadinstitute/gatk:4.2.2.0)
I used GetPileupSummaries and CalculateContamination with default parameters. CalculateContamination was run without a matched sample. The VCF file was obtained from the 1000G project; biallelic SNPs were filtered with SelectVariants.
I calculated contamination on a normal sample BAM and on a merged BAM of 2 different normal samples (contaminated in silico). CalculateContamination showed no contamination in both cases. At the same time, the distribution of SNP VAFs (calculated from pileup.table) showed obvious and significant contamination in the merged BAM:
1) a big peak at 0.25 VAF and a peak 3 times lower at 0.5
2) the total number of HOM ALT sites differs 10-fold between the contaminated and uncontaminated cases
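A VAF check like the one described above can be reproduced with a short script. This is only a minimal sketch, not the code actually used for the numbers in this post: it assumes the GetPileupSummaries table is tab-separated with `ref_count` and `alt_count` columns and `#`-prefixed metadata lines, and the function name `vaf_histogram`, the bin count and the minimum depth are illustrative choices.

```python
import csv
from collections import Counter


def vaf_histogram(lines, bins=20, min_depth=10):
    """Count sites per VAF bin from a GetPileupSummaries-style table.

    Assumption: tab-separated columns including `ref_count` and
    `alt_count`; lines starting with '#' are metadata and are skipped.
    """
    rows = csv.DictReader(
        (ln for ln in lines if not ln.startswith("#")), delimiter="\t"
    )
    counts = Counter()
    for row in rows:
        ref, alt = int(row["ref_count"]), int(row["alt_count"])
        depth = ref + alt
        if depth < min_depth:
            continue  # too few reads for a meaningful fraction
        vaf = alt / depth
        # Put VAF == 1.0 into the last bin instead of one past the end.
        counts[min(int(vaf * bins), bins - 1)] += 1
    return [counts.get(i, 0) for i in range(bins)]
```

Passing an open file handle for pileup.table (for example `vaf_histogram(open("pileup.table"))`) returns one count per bin, so a peak near 0.25 shows up as a large value in the bin that covers 0.25.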
I tested the GATK contamination calculation pipeline on 5 contaminated samples and faced the same problem each time. Such behavior of CalculateContamination looks like a bug.
What can I do to obtain a proper contamination value without a matched file?
-
Are you running these commands with tumor samples? These tools are meant to be run with the Somatic Best Practices along with Mutect2 and FilterMutectCalls.
Best,
Genevieve
-
Hi Genevieve Brandt!
Yes, I'm running CalculateContamination on tumor samples. Calculation on a single file works badly, as in the case I described above. Calculation with a matched normal performs well.
In general, my question is how to obtain a realistic contamination value from a single file. Is there a way to do it? Is calculation with a matched sample the only way to estimate contamination with GATK?
Vlad
-
I see, yes, CalculateContamination should be able to accurately give contamination estimates without a matched normal. Could you share what the results look like for your single tumor sample from CalculateContamination?
-
Hi,
I have run the contaminated tumor sample one more time, and here is the result:
sample  contamination  error
TUMOR   0.0            4.76837158203125E-7
-
Vladislav Maximov, could you also share your CalculateContamination command?
-
I ran it with default parameters:
gatk CalculateContamination -I tumor-cont.table -O contamination.txt
tumor-cont.table was obtained by running GetPileupSummaries on two merged tumor samples.
-
Vladislav Maximov, GetPileupSummaries should be run on only one tumor sample. The allele frequencies are very important, and merging two individuals would make the calculation wrong.
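To illustrate how merging two individuals shifts the observed allele fractions, here is a small sketch of the expected alt-read fraction at a single biallelic site. It assumes diploid genotypes, equal coverage per allele and no reference bias; the function name `expected_vaf` and the `fraction` parameter are hypothetical and are not part of GATK.

```python
from itertools import product


def expected_vaf(gt_a, gt_b, fraction):
    """Expected alt fraction when `fraction` of the reads come from sample B.

    gt_a and gt_b are alt-allele counts (0, 1 or 2) for diploid samples.
    """
    return (1 - fraction) * gt_a / 2 + fraction * gt_b / 2


# All genotype pairs for a 50/50 merge of two individuals.
mixed = sorted({expected_vaf(a, b, 0.5) for a, b in product((0, 1, 2), repeat=2)})
print(mixed)  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

Under these assumptions a 50/50 merge produces sites at 0.25 and 0.75, which a single diploid sample would not show. That is consistent with the 0.25 peak reported earlier in this thread.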