Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Contamination calculation on a single file doesn't detect contaminated samples

  • Genevieve Brandt (she/her)

    Hi Vladislav Maximov,

    Are you running these commands with tumor samples? These tools are meant to be run with the Somatic Best Practices along with Mutect2 and FilterMutectCalls.

    Best,

    Genevieve

  • Vladislav Maximov

    Hi Genevieve Brandt!

    Yes, I'm running CalculateContamination on tumor samples. Calculation on a single file works poorly, as in the case I described above. Calculation with a matched normal performs well.

    In general, my question is: how can I obtain a realistic contamination estimate from a single file? Is there a way to do it? Is calculation with a matched normal the only way to estimate contamination with GATK?

    Vlad

  • Genevieve Brandt (she/her)

    I see. Yes, CalculateContamination should be able to give accurate contamination estimates without a matched normal. Could you share what the CalculateContamination results look like for your single tumor sample?

  • Vladislav Maximov

    Hi,
    I ran the contaminated tumor sample once more, and here is the result:

    sample   contamination   error
    TUMOR    0.0             4.76837158203125E-7
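The result above is a small table with three columns (sample, contamination, error). A minimal Python sketch for reading such a file follows, assuming it is tab-separated as GATK tables usually are. The helper `read_contamination_table` is hypothetical and not part of GATK, and skipping `#` lines is only a defensive assumption.

```python
import csv
from typing import Dict, List

# The result shown above, written out as a tab-separated file would contain it.
EXAMPLE_TABLE = "sample\tcontamination\terror\nTUMOR\t0.0\t4.76837158203125E-7\n"


def read_contamination_table(text: str) -> List[Dict[str, object]]:
    """Parse CalculateContamination output (hypothetical helper)."""
    # Drop blank lines and any '#' header lines, if present (an assumption).
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("#")]
    rows = []
    for record in csv.DictReader(lines, delimiter="\t"):
        rows.append({
            "sample": record["sample"],
            "contamination": float(record["contamination"]),
            "error": float(record["error"]),
        })
    return rows


if __name__ == "__main__":
    for row in read_contamination_table(EXAMPLE_TABLE):
        print(f"{row['sample']}: contamination={row['contamination']} +/- {row['error']}")
```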

  • Genevieve Brandt (she/her)

    Vladislav Maximov could you also share your CalculateContamination command?

  • Vladislav Maximov

    I ran it with default parameters:
    gatk CalculateContamination -I tumor-cont.table -O contamination.txt

    tumor-cont.table was produced by running GetPileupSummaries on two tumor samples merged into one file.
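The two-step workflow described here (GetPileupSummaries, then CalculateContamination) can be sketched as a small Python command builder. This is an illustrative sketch, not an official script. The function names and file names are hypothetical. It assumes `gatk` is on the PATH and that `sites_vcf` is a VCF of common biallelic germline variants carrying population allele frequencies. Each BAM passed to GetPileupSummaries should contain reads from a single individual.

```python
import shlex
import subprocess
from typing import List, Optional


def pileup_cmd(bam: str, sites_vcf: str, out_table: str) -> List[str]:
    # GetPileupSummaries over the common-variant sites; -L restricts it to those sites.
    return ["gatk", "GetPileupSummaries",
            "-I", bam, "-V", sites_vcf, "-L", sites_vcf, "-O", out_table]


def contamination_commands(tumor_bam: str, sites_vcf: str, prefix: str,
                           normal_bam: Optional[str] = None) -> List[List[str]]:
    """Build the GATK commands for one tumor sample (hypothetical helper).

    tumor_bam must hold reads from a single individual; do not merge samples.
    """
    tumor_table = f"{prefix}.tumor-pileups.table"
    cmds = [pileup_cmd(tumor_bam, sites_vcf, tumor_table)]
    calc = ["gatk", "CalculateContamination", "-I", tumor_table]
    if normal_bam is not None:
        # Optional matched normal; without it the tumor is analyzed on its own.
        normal_table = f"{prefix}.normal-pileups.table"
        cmds.append(pileup_cmd(normal_bam, sites_vcf, normal_table))
        calc += ["-matched", normal_table]
    calc += ["-O", f"{prefix}.contamination.table"]
    cmds.append(calc)
    return cmds


def run(cmds: List[List[str]], dry_run: bool = True) -> None:
    for cmd in cmds:
        print(shlex.join(cmd))  # show the exact command line
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(contamination_commands("tumor.bam", "common_sites.vcf.gz", "tumor"))
```

With `dry_run=True` the commands are only printed. In the Somatic Best Practices, the resulting contamination table is then passed to FilterMutectCalls via `--contamination-table`.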

  • Genevieve Brandt (she/her)

    Vladislav Maximov GetPileupSummaries should be run on only one tumor sample. The allele frequencies are very important, and having two individuals in the same file would make the calculation wrong.
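To see why allele frequencies matter, here is a deliberately simplified estimator. It is not GATK's actual model. It assumes that at sites where the sample is homozygous for the alternate allele, every reference read comes from contaminating DNA, and that a contaminant read shows the reference allele with probability 1 - f, where f is the population alternate-allele frequency. The `Site` type, the function name and the `min_alt_fraction` cutoff used to call a site homozygous-alt are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Site:
    ref_count: int        # reads supporting the reference allele
    alt_count: int        # reads supporting the alternate allele
    population_af: float  # population alt-allele frequency f at this site


def estimate_contamination(sites: List[Site], min_alt_fraction: float = 0.9) -> float:
    """Toy estimate of contamination c from apparently hom-alt sites.

    Model (a simplification): E[ref_count] ~= depth * c * (1 - f), so pooling
    sites gives c ~= sum(ref_count) / sum(depth * (1 - f)).
    """
    observed = 0
    expected = 0.0
    for s in sites:
        depth = s.ref_count + s.alt_count
        # Keep only sites that look homozygous-alt (hypothetical cutoff).
        if depth == 0 or s.alt_count / depth < min_alt_fraction:
            continue
        observed += s.ref_count
        expected += depth * (1.0 - s.population_af)
    return observed / expected if expected > 0 else 0.0


if __name__ == "__main__":
    sites = [Site(2, 98, 0.8), Site(1, 99, 0.9), Site(40, 60, 0.5)]
    print(estimate_contamination(sites))  # third site is skipped as not hom-alt
```

In this toy model, merging two individuals breaks the premise of one germline genotype per site. A site that is homozygous-alt in one person but not the other shows many reference reads, so it fails the homozygous-alt filter and is dropped, while sites where both are homozygous-alt show almost no reference reads. The toy estimate can therefore come out near zero even though the mix is far from clean.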
