GATK v220.127.116.11 (from docker image broadinstitute/gatk:18.104.22.168)
I ran GetPileupSummaries and CalculateContamination with default parameters. CalculateContamination was run without a matched sample. The VCF file was obtained from the 1000 Genomes Project; biallelic SNPs were selected with SelectVariants.
I calculated contamination on a normal sample BAM and on a merged BAM of 2 different normal samples (contaminated in silico). CalculateContamination reported no contamination in both cases. At the same time, the distribution of SNP VAFs (calculated from pileup.table) showed obvious and significant contamination in the merged BAM:
1) a big peak at 0.25 VAF and a peak 3 times lower at 0.5
2) the total number of HOM ALT sites differs 10-fold between the contaminated and uncontaminated cases
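For reference, the VAF distribution can be computed from the pileup table roughly as follows. This is a minimal sketch, not the exact script I used. It assumes the GetPileupSummaries output is tab-separated, with a metadata line starting with "#" followed by a header row containing ref_count and alt_count columns. The function name, the min_depth cutoff and the number of bins are illustrative choices, and the example rows are made up.

```python
import csv
from collections import Counter


def vaf_histogram(lines, min_depth=20, n_bins=20):
    """Bin per-site VAF = alt / (ref + alt) into n_bins equal-width bins.

    `lines` is an iterable of text lines from a pileup summary table.
    Assumed layout: '#'-prefixed metadata lines, then a tab-separated
    header row that includes 'ref_count' and 'alt_count'.
    """
    data_lines = (ln for ln in lines if ln.strip() and not ln.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t")
    hist = Counter()
    for row in reader:
        ref = int(row["ref_count"])
        alt = int(row["alt_count"])
        depth = ref + alt
        if depth < min_depth:
            continue  # skip low-coverage sites, their VAF is too noisy
        vaf = alt / depth
        # VAF of exactly 1.0 goes into the last bin
        bin_idx = min(int(vaf * n_bins), n_bins - 1)
        hist[bin_idx] += 1
    return hist


# Made-up rows, only to show the expected input layout.
example_table = "\n".join([
    "#<METADATA>SAMPLE=example",
    "contig\tposition\tref_count\talt_count\tother_alt_count\tallele_frequency",
    "chr1\t100\t30\t10\t0\t0.2",
    "chr1\t200\t30\t10\t0\t0.3",
    "chr1\t300\t20\t20\t0\t0.3",
    "chr1\t400\t0\t40\t0\t0.1",
    "chr1\t500\t5\t5\t0\t0.4",
])

hist = vaf_histogram(example_table.splitlines())
for bin_idx in sorted(hist):
    low = bin_idx / 20
    print(f"VAF {low:.2f}-{low + 0.05:.2f}: {hist[bin_idx]} sites")
```

With a real table, pass the open file object instead of the example string, e.g. `vaf_histogram(open("pileup.table"))`. A peak in the bin starting at 0.25 is what I would expect when a heterozygous site from one sample is diluted 1:1 by a homozygous-reference site from the other.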
I tested the GATK contamination calculation pipeline on 5 contaminated samples and faced the same problem each time. This behavior of CalculateContamination looks like a bug.
What can I do to obtain a proper contamination value without a matched sample?