Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data


Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

CalculateContamination inputs for paired tumor-normal data



  • Anthony DiCi

    Thank you for your post, Robin Mjelle! I want to let you know we have received your question and will be moving it to the Community Discussions -> General Discussion topic, as the Somatic topic is for reporting bugs and issues with GATK.

    We'll get back to you if we have any updates or follow-up questions. Please see our Support Policy for more details about how we prioritize responding to questions.

  • Philipp Hähnel

    Dear Robin,

    Based on the current best practices and tool versions, the CalculateContamination part of the variant calling workflow is independent of the Mutect2 calling step. The contamination model output is needed only for the FilterMutectCalls step, which identifies false positives in the variant calls.

    You run GetPileupSummaries on the input BAMs of the tumor sample and of the normal sample, which gives you tumor_pileups and normal_pileups. Then you call, roughly:

    gatk CalculateContamination \
    --input ~{tumor_pileups} \
    ~{"--matched-normal " + normal_pileups} \
    --output ~{output_contamination} \
    --tumor-segmentation ~{output_segments}
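
Outside of WDL, the same two steps can be sketched in plain shell. This is a dry run that only prints the commands, not a definitive implementation. All file names are placeholders. The -V/-L arguments to GetPileupSummaries (a common-variant VCF with allele frequencies) are not mentioned above and are an assumption about a typical setup. The helper names pileup_cmd and contamination_cmd are hypothetical.

```shell
# Dry-run sketch: prints the gatk commands instead of running them.
# All file names used here are placeholders (assumptions).

# Print a GetPileupSummaries command for one BAM.
# $1 = BAM, $2 = common-variant VCF with allele frequencies, $3 = output table
pileup_cmd() {
    echo "gatk GetPileupSummaries -I $1 -V $2 -L $2 -O $3"
}

# Print a CalculateContamination command.
# $1 = tumor pileups, $2 = contamination table, $3 = segments table,
# $4 = normal pileups (optional; omit for tumor-only data)
contamination_cmd() {
    cmd="gatk CalculateContamination --input $1"
    # Same idea as the optional WDL interpolation: add the flag only
    # when a matched normal was supplied.
    if [ -n "${4:-}" ]; then
        cmd="$cmd --matched-normal $4"
    fi
    echo "$cmd --output $2 --tumor-segmentation $3"
}

pileup_cmd tumor.bam common_variants.vcf.gz tumor_pileups.table
pileup_cmd normal.bam common_variants.vcf.gz normal_pileups.table
contamination_cmd tumor_pileups.table contamination.table segments.table normal_pileups.table
```

Dropping the fourth argument to contamination_cmd yields the tumor-only form of the command, without --matched-normal.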

    If you can read WDL, you can also check this link for a recent implementation of the multi-sample variant calling workflow. Feel free to adapt it as you see fit.
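
For orientation, the contamination table and tumor segmentation table produced by CalculateContamination are what FilterMutectCalls consumes downstream. The following dry-run sketch prints such a command. The reference and VCF file names are placeholders, and the helper name filter_cmd is hypothetical.

```shell
# Dry-run sketch: prints the gatk command instead of running it.
# $1 = reference FASTA, $2 = unfiltered Mutect2 VCF, $3 = contamination table,
# $4 = tumor segmentation table, $5 = filtered output VCF (all placeholders)
filter_cmd() {
    echo "gatk FilterMutectCalls -R $1 -V $2 --contamination-table $3 --tumor-segmentation $4 -O $5"
}

filter_cmd reference.fasta unfiltered.vcf.gz contamination.table segments.table filtered.vcf.gz
```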


