Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Mutect2 multi-sample pipeline

  • David Benjamin

    @manolis The discussion Genevieve pointed to is a good overview of multi-sample mode. For contamination, you need to:

    1.  Run GetPileupSummaries three times — on the normal and both tumors.

    2.  Run CalculateContamination twice — tumor1 vs normal and tumor2 vs normal

    3.  Run FilterMutectCalls once using the two files from step 2.  That is, specify the argument twice: "-contamination-table tumor1-contamination.table -contamination-table tumor2-contamination.table"

    You don't need to worry about tagging which input corresponds to which sample because CalculateContamination puts the sample name in the header of its output for FilterMutectCalls to read.

    If you wish to use the optional tumor segmentation output of CalculateContamination, it's the same idea as step 3.  You specify the argument once for each tumor sample.
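
The three steps above can be sketched as a dry-run command builder. This is a minimal illustration, not an official workflow: the sample names, file names, and the helper `contamination_commands` are hypothetical, the long `--contamination-table` and `--tumor-segmentation` forms are used in place of the single-dash spelling quoted above, and any other arguments your setup requires are omitted. It only prints the command lines and does not run GATK.

```python
import shlex


def contamination_commands(normal, tumors, common_sites, unfiltered_vcf, reference):
    """Build (but do not run) the GATK commands for multi-tumor contamination filtering.

    All file names derived here (e.g. "<sample>.bam") are hypothetical placeholders.
    """
    cmds = []

    # Step 1: one GetPileupSummaries run per sample (the normal plus each tumor).
    for sample in [normal] + list(tumors):
        cmds.append([
            "gatk", "GetPileupSummaries",
            "-I", f"{sample}.bam",
            "-V", common_sites,
            "-L", common_sites,
            "-O", f"{sample}-pileups.table",
        ])

    # Step 2: one CalculateContamination run per tumor, each against the same normal.
    for tumor in tumors:
        cmds.append([
            "gatk", "CalculateContamination",
            "-I", f"{tumor}-pileups.table",
            "-matched", f"{normal}-pileups.table",
            "--tumor-segmentation", f"{tumor}-segments.table",  # optional output
            "-O", f"{tumor}-contamination.table",
        ])

    # Step 3: a single FilterMutectCalls run; each table argument is repeated per tumor.
    filter_cmd = ["gatk", "FilterMutectCalls", "-R", reference, "-V", unfiltered_vcf]
    for tumor in tumors:
        filter_cmd += ["--contamination-table", f"{tumor}-contamination.table"]
        filter_cmd += ["--tumor-segmentation", f"{tumor}-segments.table"]
    filter_cmd += ["-O", "filtered.vcf.gz"]
    cmds.append(filter_cmd)

    return cmds


for cmd in contamination_commands(
    "normal", ["tumor1", "tumor2"],
    "common-sites.vcf.gz", "unfiltered.vcf.gz", "reference.fasta",
):
    print(shlex.join(cmd))
```

With one normal and two tumors this yields three GetPileupSummaries commands, two CalculateContamination commands, and one FilterMutectCalls command.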

  • manolis

    Many thanks Genevieve Brandt (she/her) and David Benjamin

  • GE

    Hi,

    How would the above process change with > 1 normal sample?

    Also, how would the above multi-sample pipeline work with FilterAlignmentArtifacts, given that it accepts only one input Mutect2 bamout file? Does FilterAlignmentArtifacts use the tags in the bamout file to filter per sample?

    Thanks.

  • GE

    Note also: the latest version of the Mutect2 pipeline on Terra (https://dockstore.org/api/ga4gh/trs/v2/tools/%23workflow%2Fgithub.com%2Fbroadinstitute%2Fgatk%2Fmutect2/versions/4.1.8.1/plain-WDL/descriptor//scripts/mutect2_wdl/mutect2.wdl) is configured incorrectly at the FilterAlignmentArtifacts step. It passes the original BAM as input instead of the Mutect2 bamout file.

    Per your FilterAlignmentArtifacts documentation (https://gatk.broadinstitute.org/hc/en-us/articles/360037226112-FilterAlignmentArtifacts-EXPERIMENTAL-), the input file should be the Mutect2 bamout.

  • Genevieve Brandt (she/her)

    Hello GE,

    I don't have an answer to your first question; our support team is focused on questions regarding GATK issues and abnormal requests. However, other GATK users may have experience with your question and may be able to offer insight. Please see our support policy for more information.

    I have forwarded your second request to our Terra support team and they will give it a look to determine the best solution.

    Genevieve

  • Beri

    GE, to answer your second question: the workflow you linked is version 4.1.8.1, but the tool doc you linked is for version 4.1.4.1. The tool doc for version 4.1.8.1 says the input for FilterAlignmentArtifacts "should be the same tumor bam that Mutect2 was run on".

  • David Benjamin

    GE We changed FilterAlignmentArtifacts last year to use the original bam, not the bamout, which explains the change in documentation from 4.1.4 to 4.1.8.

    The identity of the normal is not very important for the contamination workflow.  I would just choose the normal with the greatest depth for the above steps if I were you.

    The best way to run FilterAlignmentArtifacts with multiple tumor samples is to specify the multi-sample VCF for the -V argument and repeat the -I argument, once for each tumor BAM.  This will consider whether each variant is a mapping artifact by considering evidence from all samples together.  Thus it assumes that a variant is either a mapping artifact in all samples or a mapping artifact in none.
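
A minimal sketch of the multi-sample FilterAlignmentArtifacts invocation described above, again as a dry-run command builder. The helper name and all file names are hypothetical, and `--bwa-mem-index-image` is included on the assumption that your GATK version requires it, as the tool's usage example does.

```python
import shlex


def filter_alignment_artifacts_command(multisample_vcf, tumor_bams, reference,
                                       bwa_index_image, output_vcf):
    """Build (but do not run) one FilterAlignmentArtifacts command for several tumors."""
    cmd = ["gatk", "FilterAlignmentArtifacts", "-R", reference, "-V", multisample_vcf]
    # Repeat -I once per original tumor BAM (not the Mutect2 bamout).
    for bam in tumor_bams:
        cmd += ["-I", bam]
    cmd += ["--bwa-mem-index-image", bwa_index_image, "-O", output_vcf]
    return cmd


print(shlex.join(filter_alignment_artifacts_command(
    "filtered.vcf.gz", ["tumor1.bam", "tumor2.bam"],
    "reference.fasta", "reference.index_image", "artifact-filtered.vcf.gz",
)))
```

Because evidence from all BAMs is pooled, a single command covers every tumor; there is no per-sample run to merge afterwards.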

  • GE

    Thanks. The documentation is still confusing in 4.1.8 because the Usage example says: -I somatic_bamout.bam. A filename containing "bamout" implies Mutect2 output rather than the original input BAM. I suggest making this clearer.

    That's good to know that FilterAlignmentArtifacts can handle multi-sample situations. It would be nice if you all are able to publish a multi-sample wdl for Mutect2. I'll try to make one myself in the meantime anyway.

  • Genevieve Brandt (she/her)

    Hi GE, I'll make a note for our team to see if we can make a change to the FilterAlignmentArtifacts documentation so it is more clear. Thank you for the suggestion!

  • GE

    For CalculateContamination, can GatherPileupSummaries be used to combine the PileupSummaries from all normal samples?

    Note also that online GATK documentation does not document GatherPileupSummaries, even though it is in gatk and part of the best practices Mutect2 pipeline on Terra.

  • GE

    One more question for David Benjamin:

    You wrote: "3. Run FilterMutectCalls once using the two files from step 2. That is, specify the argument twice: "-contamination-table tumor1-contamination.table -contamination-table tumor2-contamination.table""

    Should FilterMutectCalls also receive the maf_segments (--tumor-segmentation) from all the tumor samples?

  • Genevieve Brandt (she/her)

    GE The purpose of GatherPileupSummaries is to combine the output from disjoint scatter jobs. It's not meant for the purpose you are describing so we would recommend that you stick to our best practices and David's advice above of choosing one normal. 
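
For contrast, the intended use of GatherPileupSummaries looks roughly like the sketch below: the inputs are pileup tables for the same sample produced by scatter jobs over disjoint intervals, not tables from different samples. The helper name and file names are hypothetical, and the command is only printed, not run.

```python
import shlex


def gather_pileup_summaries_command(shard_tables, sequence_dictionary, output_table):
    """Build (but do not run) a GatherPileupSummaries command for one sample's shards."""
    cmd = ["gatk", "GatherPileupSummaries", "--sequence-dictionary", sequence_dictionary]
    # One -I per scatter shard; every shard must come from the same sample.
    for table in shard_tables:
        cmd += ["-I", table]
    cmd += ["-O", output_table]
    return cmd


print(shlex.join(gather_pileup_summaries_command(
    ["normal-shard0-pileups.table", "normal-shard1-pileups.table"],
    "reference.dict", "normal-pileups.table",
)))
```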
