Mutect2 Shared Variant PASS in one tumor sample, but not other samples
REQUIRED for all errors and issues:
a) GATK version used:
4.4.0.0
b) Exact command used:
I am running Mutect2 using the workflow nf-core/sarek. The main script and config file are found on github (linked below). As far as I can tell, Mutect2 via nf-core/sarek is using default run parameters with tumor-only samples for my usage case.
The inputs to the nf-core/sarek workflow are 1) paired-end read files for each sample and 2) an input metadata file containing sample names, tumor status (0/1), path to files. The reads undergo QC via FastQC (read checking) and FastP (read trimming). The reads are mapped to GRCh38 via BWA. Mutect2 is used for variant calling. Each variant is annotated by VEP.
Links:
c) Entire program log:
There is no Mutect2-specific log file provided. Mutect2 runs and is completed and detected several thousands of variants per sample. Overall workflow log files can be provided as needed.
d) Issue:
I am using mutect2 (tumor-only, using Mutect2 provided PoN) to run 3 tumor samples from the same patient. Mutect2 runs to completion and lists variants for each sample. However, we've noticed in some genes that a variant will PASS (under the FILTER column) in one sample, but not another sample. Additionally, this observation is seen when running multi-sample mode as well (i.e., no change).
Example: The variant at position 7675085 (see green) passes for all three samples, but 7667396 (see yellow) only passes for Sample 2, but not Sample 1 and Sample 3. Further confounding this is that 7667396 is labeled as germline for Sample 1 and Sample 3.
e) Questions:
- Is it expected for a shared variant to PASS/FAIL across different samples?
- What may be causing this variant discrepancy? Shouldn't a germline detected variant not pass for all samples?
- Because the variant passes in one sample, but not the others, does that affect our confidence in the passing variant call (i.e., how confident are we in reporting this variant)?
- What can be done to interpret this result or troubleshoot this observation?
-
Hi David Moline
Can you share those 3 whole variant lines from their respective VCFs so that we can take a look at a bit more carefully?
-
Gökalp Çelik Thank you for responding! Here are the mutect2 VCF outputs for each of the three variants.
I do want to note that this observation (variant calling discrepancy) is not exclusive to just this variant, but is occurring for many more variants as well (many of them germline in addition to other reasons for failing)
In general, I'm confused on how to analyze what happens when a variant's filter PASS in one sample, but not others. Any guidance you have would be appreciated!
Let me know if you need anything else!#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE_NAME Sample 1 chr17 7667396 . G C . germline AS_FilterStatus=SITE;AS_SB_TABLE=28,24|70,61;DP=183;ECNT=1;GERMQ=1;MBQ=20,20;MFRL=177,178;MMQ=60,60;MPOS=37;POPAF=3.91;ROQ=93;TLOD=328.89 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/1:52,131:0.724:183:18,43:11,34:29,78:28,24,70,61 Sample 2 chr17 7667396 . G C . PASS AS_FilterStatus=SITE;AS_SB_TABLE=2,2|78,68;DP=150;ECNT=1;GERMQ=23;MBQ=20,20;MFRL=141,176;MMQ=60,60;MPOS=41;POPAF=3.91;ROQ=93;TLOD=367.64 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/1:4,146:0.964:150:1,36:1,43:2,79:2,2,78,68 Sample 3 chr17 7667396 . G C . germline AS_FilterStatus=SITE;AS_SB_TABLE=2,1|51,56;DP=113;ECNT=1;GERMQ=2;MBQ=41,20;MFRL=332,215;MMQ=60,60;MPOS=37;POPAF=3.91;ROQ=93;TLOD=304.42 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/1:3,107:0.949:110:2,34:1,31:3,73:2,1,51,56 -
Here is a more readable screenshot:
-
Hi again.
I do not see anything to differentiate between calls just from these so I have a few more questions.
Can you run these instances without any nfcore scripts? Just from the command line if you can. And you may limit the region around this variant (+ - 500bps on each end) to make your calls just to make sure that we are not hindered by any other stuff along the way.
-
Hello Gökalp Çelik,
Here is a new screenshot using mutect2 via command line. The command I used is shown below (this basic script was repeated for each sample):
gatk Mutect2 \
--input /home/.../Moline_David_NEPC_Output_2-13-2024/preprocessing/mapped/TN21_S61/TN21_S61.sorted.bam \
--output /home/.../mutect2/mutect2_NEPC_2-13-24/TN21_mutect2_2-13-2024.vcf.gz \
--reference /home/.../igenomes_GRCh38/references/Homo_sapiens/GATK/GRCh38/Sequence/WholeGenomeFasta/Homo_sapiens_assembly38.fasta \
--germline-resource /home/.../igenomes_GRCh38/references/Homo_sapiens/GATK/GRCh38/Annotation/GATKBundle/af-only-gnomad.hg38.vcf.gz \
--panel-of-normals /home/.../igenomes_GRCh38/references/Homo_sapiens/GATK/GRCh38/Annotation/GATKBundle/1000g_pon.hg38.vcf.gz \Here is the screenshot of variant 7667396:
Two things I noticed:
- This variant calling run took several hours to complete and resulted in hundreds of thousands of variants. I think I'm missing an essential filtering step.
- The FILTER column lists no filters parameter for any variant.
Let me know what your thoughts are and how I can improve my variant calling more! Thank you so much!
-
Hi David Moline
You need to use FilterMutectCalls tool to get your filters applied. Can you run that tool and check the variant states?
Regards.
-
Hello Gökalp Çelik
Here are the results using FilterMutectCalls. I also ran the data using LearnReadOrientationModel, GetPileupSummaries, and Calculate Contamination. This script should be very close to what the nf-core/sarek Mutect2 process is (I tried my best to use the same inputs/programs according to the nf-core/sarek log files).
Code (repeated for each of the three samples):gatk Mutect2 \
--input /home/.../Desktop/NEPC/Moline_David_NEPC_Output_2-13-2024/preprocessing/mapped/TN21_S61/TN21_S61.sorted.bam \
--reference /home/.../Desktop/igenomes_GRCh38/references/Homo_sapiens/GATK/GRCh38/Sequence/WholeGenomeFasta/Homo_sapiens_assembly38.fasta \
--intervals /home/.../raw_data/WES_720G.30padded.V2.3.bed \
--germline-resource /home/.../Desktop/igenomes_GRCh38/references/Homo_sapiens/GATK/GRCh38/Annotation/GATKBundle/af-only-gnomad.hg38.vcf.gz \
--panel-of-normals /home/.../Desktop/igenomes_GRCh38/references/Homo_sapiens/GATK/GRCh38/Annotation/GATKBundle/1000g_pon.hg38.vcf.gz \
--f1r2-tar-gz f1r2.tar.gz \
--output unfiltered.vcf
gatk LearnReadOrientationModel \
--input f1r2.tar.gz \
--output read-orientation-model-artifactprior.tar.gz
gatk GetPileupSummaries \
--input /home/.../Desktop/NEPC/Moline_David_NEPC_Output_2-13-2024/preprocessing/mapped/TN21_S61/TN21_S61.sorted.bam \
--reference /home/.../Desktop/igenomes_GRCh38/references/Homo_sapiens/GATK/GRCh38/Sequence/WholeGenomeFasta/Homo_sapiens_assembly38.fasta \
--variant /home/.../Desktop/igenomes_GRCh38/references/Homo_sapiens/GATK/GRCh38/Annotation/GATKBundle/af-only-gnomad.hg38.vcf.gz \
--intervals /home/.../raw_data/WES_720G.30padded.V2.3.bed \
--output getpileupsummaries.table
gatk CalculateContamination \
--input getpileupsummaries.table \
--tumor-segmentation segments.table \
--output contamination.table
gatk FilterMutectCalls \
--variant unfiltered.vcf \
--reference /home/.../Desktop/igenomes_GRCh38/references/Homo_sapiens/GATK/GRCh38/Sequence/WholeGenomeFasta/Homo_sapiens_assembly38.fasta \
--tumor-segmentation segments.table \
--contamination-table contamination.table \
--orientation-bias-artifact-priors read-orientation-model-artifactprior.tar.gz \
--output filtered.vcf
Results:# Sample 1
chr17 7667396 . G C . germline AS_FilterStatus=SITE;AS_SB_TABLE=32,27|84,68;DP=211;ECNT=1;GERMQ=1;MBQ=20,20;MFRL=195,187;MMQ=60,60;MPOS=41;POPAF=3.91;ROQ=93;TLOD=386.66 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/1:59,152:0.732:211:22,52:11,39:33,92:32,27,84,68
# Sample 2
chr17 7667396 . G C . PASS AS_FilterStatus=SITE;AS_SB_TABLE=2,2|89,78;DP=171;ECNT=1;GERMQ=24;MBQ=20,20;MFRL=141,176;MMQ=60,60;MPOS=39;POPAF=3.91;ROQ=93;TLOD=419.84 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/1:4,167:0.968:171:1,39:1,50:2,90:2,2,89,78
# Sample 3
chr17 7667396 . G C . germline AS_FilterStatus=SITE;AS_SB_TABLE=7,1|81,72;DP=164;ECNT=1;GERMQ=1;MBQ=37,20;MFRL=332,228;MMQ=60,60;MPOS=40;POPAF=3.91;ROQ=93;TLOD=452.48 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/1:8,153:0.925:161:6,50:2,50:8,110:7,1,81,72Let me know what you think! Hope all is well.
-
Hi David Moline
It is good that the results you have without any external tools is consistent with that you had before. Something is seems to be different between these 3 samples in terms of that site or other similar sites. Is there another variant around this site in sample 2 vcf or sample1 and 3 vcfs?
Can you share IGV views of Bam files for this site and around if it's not a problem?
-
Hi again.
After a brief discussion with our team here is a possible situation. It looks like what Mutect2 and FilterMutectCalls doing is right here. Passing variant has a high allele frequency indicating that there could be a loss of heterozygosity due to copy number variation around that region, which is calculated by GetPileupSummaries and CalculateContamination tools, therefore raising the probability that the passing variant has higher GERMQ score which indicates the likelihood of not being a germline event.
I hope this helps.
-
Hi Gökalp Çelik,
So the results from GetPileupSummaries and CalculateContamination imply loss of heterozygosity potentially from copy number variation. Is there a way to visualize the copy number variation within a region around this variant using the Mutect2 data?
Thank you for all your help!
David
-
Hi David Moline
We have a Somatic CNV workflow that you can use to call and visualize somatic copy number variations.
We have workflows in wdls provided in the official GATK Repository.
https://github.com/broadinstitute/gatk/tree/4.5.0.0/scripts/cnv_wdl/somatic
I hope this helps.
Please sign in to leave a comment.
11 comments