Mutect2 multi-sample pipeline
Can you please provide
a) GATK version used
GATK 4.1.7.0
b) Exact GATK commands used
c) The entire error log if applicable.
Hi GATK team,
I have 2 tumor samples and 1 normal sample, all from the same patient. How should I handle the contamination estimate (GetPileupSummaries, CalculateContamination)? Is it possible to calculate it with two tumor samples and use it at the end in the "FilterMutectCalls" step?
If yes, do I have to run 3 GetPileupSummaries commands (tumor1, tumor2, normal) and then use the two tumor pileup summary tables (tumor 1 and tumor 2) as inputs in the CalculateContamination step?
Many thanks
-
Please see this pipeline: https://gatk.broadinstitute.org/hc/en-us/articles/360035894731-Somatic-short-variant-discovery-SNVs-Indels-
As well as the discussion here: https://gatk.broadinstitute.org/hc/en-us/community/posts/360062528691-mutect2-multi-sample-
-
@manolis The discussion Genevieve pointed to is a good overview of multi-sample mode. For contamination you need to:
1. Run GetPileupSummaries three times: on the normal and on both tumors.
2. Run CalculateContamination twice: tumor1 vs normal and tumor2 vs normal.
3. Run FilterMutectCalls once using the two files from step 2. That is, specify the argument twice: "--contamination-table tumor1-contamination.table --contamination-table tumor2-contamination.table"
You don't need to worry about tagging which input corresponds to which sample because CalculateContamination puts the sample name in the header of its output for FilterMutectCalls to read.
If you wish to use the optional tumor segmentation output of CalculateContamination, it's the same idea as step 3. You specify the argument once for each tumor sample.
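For reference, the three steps above can be sketched as a small dry-run script. This is only an illustration under assumptions: every file name (the BAMs, reference.fasta, unfiltered.vcf.gz, the common-sites VCF) is a placeholder, and by default the script only prints the commands; setting GATK=gatk would actually run them. The --tumor-segmentation arguments are the optional segmentation output mentioned above.

```shell
#!/bin/sh
# Dry-run sketch: prints the GATK commands rather than executing them.
# All file names below are placeholders (assumptions), not from a real run.
set -eu
GATK="${GATK:-echo gatk}"              # export GATK=gatk to really run
COMMON_SITES="common_biallelic.vcf.gz" # placeholder common-variant resource

contamination_workflow() {
  # Step 1: one pileup summary table per sample (normal and both tumors)
  for sample in normal tumor1 tumor2; do
    $GATK GetPileupSummaries -I "${sample}.bam" \
      -V "$COMMON_SITES" -L "$COMMON_SITES" -O "${sample}-pileups.table"
  done
  # Step 2: one contamination estimate per tumor, each against the normal
  for tumor in tumor1 tumor2; do
    $GATK CalculateContamination -I "${tumor}-pileups.table" \
      -matched normal-pileups.table \
      -O "${tumor}-contamination.table" \
      --tumor-segmentation "${tumor}-segments.table"
  done
  # Step 3: a single FilterMutectCalls run, repeating each argument per tumor
  $GATK FilterMutectCalls -R reference.fasta -V unfiltered.vcf.gz \
    --contamination-table tumor1-contamination.table \
    --contamination-table tumor2-contamination.table \
    --tumor-segmentation tumor1-segments.table \
    --tumor-segmentation tumor2-segments.table \
    -O filtered.vcf.gz
}

contamination_workflow
```

The order of the repeated arguments should not matter, since the sample name is read from each table's header.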
-
Many thanks Genevieve Brandt (she/her) and David Benjamin
-
Hi,
How would the above process change with > 1 normal sample?
Also, how would the above multi-sample pipeline work with FilterAlignmentArtifacts, given that FilterAlignmentArtifacts only accepts one input Mutect2 bamout file? Does FilterAlignmentArtifacts take into account the tags in the bamout file to filter per sample?
Thanks.
-
Note also: the latest version of the Mutect2 pipeline on Terra (https://dockstore.org/api/ga4gh/trs/v2/tools/%23workflow%2Fgithub.com%2Fbroadinstitute%2Fgatk%2Fmutect2/versions/4.1.8.1/plain-WDL/descriptor//scripts/mutect2_wdl/mutect2.wdl)
is configured incorrectly at the FilterAlignmentArtifacts step. It passes in the original BAM instead of the Mutect2 bamout file.
Per your FilterAlignmentArtifacts documentation (https://gatk.broadinstitute.org/hc/en-us/articles/360037226112-FilterAlignmentArtifacts-EXPERIMENTAL-) the input file should be the Mutect2 bamout.
-
Hello GE,
I don't have an answer to your first question. Our support team is focused on questions regarding GATK issues and abnormal results. However, other GATK users may have experience with your question and may have insight! Please see our support policy for more information.
I have forwarded your second request to our Terra support team and they will give it a look to determine the best solution.
Genevieve
-
GE,
To answer the second question, the workflow that you linked is version 4.1.8.1 but the tool doc that was linked is for version 4.1.4.1. The tool doc which is associated with version 4.1.8.1 indicates the input for FilterAlignmentArtifacts "should be the same tumor bam that Mutect2 was run on".
-
GE We changed FilterAlignmentArtifacts last year to use the original bam, not the bamout, which explains the change in documentation from 4.1.4 to 4.1.8.
The identity of the normal is not very important for the contamination workflow. I would just choose the normal with the greatest depth for the above steps if I were you.
The best way to run FilterAlignmentArtifacts with multiple tumor samples is to specify the multi-sample VCF for the -V argument and repeat the -I argument, once for each tumor BAM. This will consider whether each variant is a mapping artifact by considering evidence from all samples together. Thus it assumes that a variant is either a mapping artifact in all samples or a mapping artifact in none.
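As a rough illustration of the multi-tumor invocation described above, here is a dry-run sketch. The file names are placeholders (assumptions), and reference.index_image stands in for the BWA-MEM index image that the tool also expects; by default the script only prints the command.

```shell
#!/bin/sh
# Dry-run sketch: prints the command rather than executing it.
# File names are placeholders (assumptions).
set -eu
GATK="${GATK:-echo gatk}"   # export GATK=gatk to really run

filter_alignment_artifacts() {
  # One -I per tumor BAM: the original BAMs Mutect2 was run on, not the bamout.
  # Plain string concatenation is used, so paths containing spaces would break.
  input_args=""
  for bam in tumor1.bam tumor2.bam; do
    input_args="$input_args -I $bam"
  done
  # -V takes the multi-sample VCF
  $GATK FilterAlignmentArtifacts -R reference.fasta \
    -V filtered.vcf.gz \
    $input_args \
    --bwa-mem-index-image reference.index_image \
    -O realignment-filtered.vcf.gz
}

filter_alignment_artifacts
```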
-
Thanks. The documentation is still confusing in 4.1.8 because the Usage example says: -I somatic_bamout.bam. A filename of "bamout" implies Mutect2 output rather than the original input BAM. I suggest making this clearer.
That's good to know that FilterAlignmentArtifacts can handle multi-sample situations. It would be nice if you all are able to publish a multi-sample wdl for Mutect2. I'll try to make one myself in the meantime anyway.
-
Hi GE, I'll make a note for our team to see if we can make a change to the FilterAlignmentArtifacts documentation so it is more clear. Thank you for the suggestion!
-
For CalculateContamination, can GatherPileupSummaries be used to combine the PileupSummaries from all normal samples?
Note also that online GATK documentation does not document GatherPileupSummaries, even though it is in gatk and part of the best practices Mutect2 pipeline on Terra.
-
One more question for David Benjamin:
You wrote: "3. Run FilterMutectCalls once using the two files from step 2. That is, specify the argument twice: "--contamination-table tumor1-contamination.table --contamination-table tumor2-contamination.table""
Should FilterMutectCalls also receive the maf_segments (--tumor-segmentation) from all the tumor samples?
-
GE The purpose of GatherPileupSummaries is to combine the output from disjoint scatter jobs. It's not meant for the purpose you are describing so we would recommend that you stick to our best practices and David's advice above of choosing one normal.
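To make the intended use concrete, here is a dry-run sketch of gathering scattered pileup tables for a single sample. It assumes GetPileupSummaries was run on the same normal BAM over three disjoint interval shards; the shard file names and reference.dict are placeholders (assumptions). By default it only prints the command.

```shell
#!/bin/sh
# Dry-run sketch: prints the command rather than executing it.
# Shard and dictionary file names are placeholders (assumptions).
set -eu
GATK="${GATK:-echo gatk}"   # export GATK=gatk to really run

gather_normal_pileups() {
  # Every shard comes from the SAME sample, each covering different intervals
  shard_args=""
  for shard in 0 1 2; do
    shard_args="$shard_args -I normal-pileups.shard${shard}.table"
  done
  $GATK GatherPileupSummaries --sequence-dictionary reference.dict \
    $shard_args -O normal-pileups.table
}

gather_normal_pileups
```

The gathered table then plays the same role as a table from a single unscattered GetPileupSummaries run on that sample.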
-
For the multisample Mutect2 calls:
According to the 4.4.0 documentation, one can now specify --orientation-bias-artifact-priors when running FilterMutectCalls in order to filter based on sequence context artifacts (the input is the tar.gz file from LearnReadOrientationModel). My question is: do I run LearnReadOrientationModel for each sample (derived from the same patient) separately, as is also done for GetPileupSummaries and CalculateContamination? In addition, do we follow the same approach in tumor-only mode (regarding the LearnReadOrientationModel, GetPileupSummaries, and CalculateContamination steps)?
Finally, for multi-sample tumor-only mode: does the resulting VCF contain variants that are present in some samples and absent in others (GT 0/0), or does it report only those found in all the samples?
Thank you!
-
Maria Kyriakidou You should run LearnReadOrientationModel separately for each tumor sample and specify --orientation-bias-artifact-priors multiple times in the FilterMutectCalls command. The tar.gz files contain the information as to which sample they come from. It's the same pipeline for tumor-only, the difference being that the commands for Mutect2, GetPileupSummaries, and CalculateContamination do not specify a normal.
In multi-sample mode (with or without a matched normal), a PASS variant means that FilterMutectCalls thinks the variant exists in at least one sample. It does not attempt to decide which samples exhibit the variant and which do not. The GT field is meaningless.
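A dry-run sketch of the orientation-prior advice above, plus the tumor-only variant of the contamination step. It rests on assumptions: the per-sample inputs tumor1-f1r2.tar.gz and tumor2-f1r2.tar.gz are placeholder names (how they are produced is not covered in this thread), and all other file names are placeholders too. By default it only prints the commands.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands rather than executing them.
# All file names are placeholders (assumptions).
set -eu
GATK="${GATK:-echo gatk}"   # export GATK=gatk to really run

orientation_filtering() {
  # One LearnReadOrientationModel run per tumor sample
  priors_args=""
  for tumor in tumor1 tumor2; do
    $GATK LearnReadOrientationModel -I "${tumor}-f1r2.tar.gz" \
      -O "${tumor}-priors.tar.gz"
    priors_args="$priors_args --orientation-bias-artifact-priors ${tumor}-priors.tar.gz"
  done
  # One FilterMutectCalls run, repeating the priors argument per tumor
  $GATK FilterMutectCalls -R reference.fasta -V unfiltered.vcf.gz \
    $priors_args -O filtered.vcf.gz
}

tumor_only_contamination() {
  # Tumor-only mode: same command, but no matched-normal table is given
  for tumor in tumor1 tumor2; do
    $GATK CalculateContamination -I "${tumor}-pileups.table" \
      -O "${tumor}-contamination.table"
  done
}

orientation_filtering
tumor_only_contamination
```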