Contamination estimates from CalculateContamination differ drastically with and without a matched normal
a) GATK version used
- GATK version 4.0.6.0
b) Exact GATK commands used
# Running CalculateContamination with the matched normal
gatk --java-options "-Xmx${mem_java}m" \
GetPileupSummaries \
--input ${bamTumor} \
--variant ${pileupSummaries} \
--intervals ${bedFile} \
--output tumor-pileups.table
gatk --java-options "-Xmx${mem_java}m" \
GetPileupSummaries \
--input ${bamNormal} \
--variant ${pileupSummaries} \
--intervals ${bedFile} \
--output normal-pileups.table
gatk --java-options "-Xmx${mem_java}m" \
CalculateContamination \
--input tumor-pileups.table \
--matched-normal normal-pileups.table \
--output ${name}.${statusTumor}.contamination.table
# Running CalculateContamination without the matched normal (tumor pileup table only)
gatk --java-options "-Xmx${mem_java}m" \
GetPileupSummaries \
-I ${bamTumor} \
-V ${pileupSummaries} \
-L ${bedFile} \
-O tumor_getpileupsummaries.table
gatk --java-options "-Xmx${mem_java}m" \
CalculateContamination \
-I tumor_getpileupsummaries.table \
-O tumor_calculatecontamination.table
c) Question
When I ran CalculateContamination with and without the matched normal, the resulting contamination estimates were drastically different. For example, CalculateContamination reported a contamination of 0.58 when run with the matched-normal sample, but 0 when run without it.
According to the CalculateContamination documentation, the tool is designed to work in the presence of copy number variations and with an arbitrary number of contaminating samples. It is also designed to work well without matched-normal data; however, one can run GetPileupSummaries on a matched-normal BAM file and pass the result to the tool. Furthermore, the Mutect2 tutorial runs CalculateContamination using only tumor-pileups.table.
I would be very grateful if you could clarify the proper way to run CalculateContamination: should it be run on the tumor sample alone (as in the Mutect2 tutorial) or with both the tumor and the matched normal?
-
Hi ,
The GATK support team is focused on resolving questions about GATK tool-specific errors and abnormal results from the tools. For all other questions, such as this one, we are building a backlog to work through when we have the capacity.
Please continue to post your questions because we will be mining them for improvements to documentation, resources, and tools.
We cannot guarantee a reply; however, we encourage other community members to help out if they know the answer.
For context, check out our support policy.
-
Pubudu Saneth Samarakoon, I think your GetPileupSummaries commands may be incorrect. As in the tutorial, the --variant argument should be the "small_exac_common" VCF provided in the GATK best practices Google Cloud bucket (gs://gatk-best-practices/); I can't tell what ${pileupSummaries} refers to in your commands. The --intervals argument should usually not be specified, i.e., you usually want to calculate contamination over the entire BAM file.
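Following that advice, a GetPileupSummaries call might look like the sketch below. It is a minimal sketch, not an official recipe: `get_pileup_summaries` and `GATK_BIN` are hypothetical names, and the VCF argument is a placeholder for the small_exac_common file downloaded from gs://gatk-best-practices/ (choose the one matching your reference build). `--intervals` is deliberately omitted so that pileups are collected over the whole BAM. Setting `GATK_BIN=echo` prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch: run GetPileupSummaries against the small_exac_common VCF,
# without --intervals, so pileups cover the whole BAM.
# GATK_BIN is a hypothetical override; set it to "echo" for a dry run.
set -euo pipefail

get_pileup_summaries() {
  local bam="$1"          # tumor or normal BAM
  local common_vcf="$2"   # small_exac_common VCF from gs://gatk-best-practices/
  local out_table="$3"    # pileup table to write
  "${GATK_BIN:-gatk}" GetPileupSummaries \
    --input "$bam" \
    --variant "$common_vcf" \
    --output "$out_table"
}
```

Call it once with the tumor BAM (writing tumor-pileups.table) and once with the normal BAM (writing normal-pileups.table), using the same VCF for both so the two tables cover the same sites.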
To answer your other question, you should use the matched normal when you have it. It can never hurt.
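One way to apply this in a pipeline is to pass the matched normal to CalculateContamination whenever its pileup table exists and fall back to tumor-only mode otherwise. The sketch assumes CalculateContamination's `--matched-normal` argument (short name `-matched`); `run_calculate_contamination` and `GATK_BIN` are hypothetical names, and `GATK_BIN=echo` gives a dry run.

```shell
#!/usr/bin/env bash
# Sketch: use the matched normal when its pileup table exists,
# otherwise run CalculateContamination in tumor-only mode.
# GATK_BIN is a hypothetical override; set it to "echo" for a dry run.
set -euo pipefail

run_calculate_contamination() {
  local tumor_table="$1"    # GetPileupSummaries output for the tumor
  local normal_table="$2"   # GetPileupSummaries output for the normal ("" if none)
  local out_table="$3"      # contamination table to write
  local args=(CalculateContamination --input "$tumor_table")
  if [[ -n "$normal_table" && -f "$normal_table" ]]; then
    args+=(--matched-normal "$normal_table")
  else
    echo "No matched-normal table found; running tumor-only." >&2
  fi
  args+=(--output "$out_table")
  "${GATK_BIN:-gatk}" "${args[@]}"
}
```

For example, `run_calculate_contamination tumor-pileups.table normal-pileups.table contamination.table` uses the normal, while passing `""` as the second argument runs tumor-only. Building the arguments in an array keeps file names with spaces intact.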