Read-orientation filter removes tens of thousands of somatic variants after FilterMutectCalls
Hello,
I'm having a head-scratcher with a paired tumor-normal whole-exome sequencing sample where, if I disable the read-orientation prior filter (--ob-priors) in FilterMutectCalls, I get tens of thousands of PASS calls (39718), but if I include the filter I only get 197 PASS calls.
This is the command I'm using:
$gatk_path/gatk --java-options "-Djava.io.tmpdir=./tmp" FilterMutectCalls \
-R $reference \
-V mutect2/${tumor}__${normal}.mutect2.unfiltered.${mode}.merged.vcf \
--contamination-table contamination/${tumor}__${normal}.calculatecontamination.table \
--tumor-segmentation contamination/${tumor}__${normal}.tumorsegmentation.table \
--ob-priors mutect2/f1r2/${tumor}__${normal}.read-orientation-model.tar.gz \
-O mutect2/${tumor}__${normal}.mutect2.filtered.${mode}.vcf
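For anyone wanting to reproduce the PASS totals quoted above, here is a minimal sketch. It assumes the filtered VCFs are uncompressed text; `count_pass` is a hypothetical helper name, not part of GATK.

```shell
# Hypothetical helper: count VCF records whose FILTER column (7) is exactly PASS.
# Reads from the files given as arguments, or from stdin if none are given.
count_pass() {
  awk '!/^#/ && $7 == "PASS" { n++ } END { print n + 0 }' "$@"
}
```

Running `count_pass` once on the VCF filtered with --ob-priors and once on the VCF filtered without it gives the two numbers being compared.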
I'm using GATK version:
The Genome Analysis Toolkit (GATK) v4.2.3.0
HTSJDK Version: 2.24.1
Picard Version: 2.25.4
I should add that these samples are from individuals with replication-repair deficiency, so we do expect a high somatic mutation load. We are therefore worried that many true variants are being filtered out (false negatives).
Any help in getting this sorted out is appreciated!
-
Hi Santiago Sanchez,
Could you include some of the variant examples that you think are getting incorrectly filtered out? This would help us to troubleshoot the filter.
Best,
Genevieve
-
Hi Genevieve Brandt (she/her),
Thanks for getting back to me. Here are lists for the first 10 calls:
Without the --ob-priors filter (PASS):
chr1 847768 . G T . PASS AS_FilterStatus=SITE;AS_SB_TABLE=29,51|1,2;DP=87;ECNT=1;GERMQ=93;MBQ=24,20;MFRL=168,133;MMQ=60,60;MPOS=35;NALOD=1.58;NLOD=10.79;POPAF=6;TLOD=3.84 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:46,0:0.026:46:31,0:1,0:14,32,0,0 0/1:34,3:0.103:37:22,2:0,0:15,19,1,2
chr1 939045 . C A . PASS AS_FilterStatus=SITE;AS_SB_TABLE=92,87|3,2;DP=188;ECNT=2;GERMQ=93;MBQ=20,25;MFRL=165,184;MMQ=60,60;MPOS=17;NALOD=0.055;NLOD=15.9;POPAF=6;TLOD=4.73 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:99,1:0.03:100:2,0:55,1:51,48,1,0 0/1:80,4:0.066:84:2,0:48,3:41,39,2,2
chr1 939075 . C A . PASS AS_FilterStatus=SITE;AS_SB_TABLE=100,101|3,2;DP=212;ECNT=2;GERMQ=93;MBQ=20,24;MFRL=158,160;MMQ=60,60;MPOS=29;NALOD=1.89;NLOD=22.78;POPAF=6;TLOD=6.4 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:112,0:0.013:112:2,0:57,0:55,57,0,0 0/1:89,5:0.073:94:1,0:53,3:45,44,3,2
chr1 943068 . C A . PASS AS_FilterStatus=SITE;AS_SB_TABLE=36,38|1,2;DP=77;ECNT=1;GERMQ=93;MBQ=34,20;MFRL=195,122;MMQ=60,60;MPOS=51;NALOD=1.54;NLOD=10.19;POPAF=6;TLOD=3.89 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:41,0:0.028:41:1,0:31,0:21,20,0,0 0/1:33,3:0.107:36:1,0:23,2:15,18,1,2
chr1 956083 . C A . PASS AS_FilterStatus=SITE;AS_SB_TABLE=227,252|2,6;DP=505;ECNT=1;GERMQ=93;MBQ=20,23;MFRL=174,168;MMQ=60,60;MPOS=37;NALOD=-0.6425;NLOD=58.41;POPAF=6;TLOD=6.58 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:301,2:0.013:303:9,0:191,2:141,160,0,2 0/1:178,6:0.039:184:6,0:106,4:86,92,2,4
chr1 957086 . C T . PASS AS_FilterStatus=SITE;AS_SB_TABLE=60,91|2,3;DP=163;ECNT=1;GERMQ=93;MBQ=20,20;MFRL=173,194;MMQ=60,60;MPOS=29;NALOD=1.83;NLOD=19.77;POPAF=6;TLOD=7.31 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:91,0:0.015:91:2,0:54,0:36,55,0,0 0/1:60,5:0.083:65:1,0:37,3:24,36,2,3
chr1 961427 . C A . PASS AS_FilterStatus=SITE;AS_SB_TABLE=65,23|2,2;DP=97;ECNT=1;GERMQ=93;MBQ=34,20;MFRL=186,141;MMQ=60,60;MPOS=49;NALOD=1.56;NLOD=10.49;POPAF=6;TLOD=5.34 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:42,0:0.027:42:2,0:31,0:30,12,0,0 0/1:46,4:0.071:50:2,0:32,2:35,11,2,2
chr1 961521 . C A . PASS AS_FilterStatus=SITE;AS_SB_TABLE=128,98|2,3;DP=239;ECNT=1;GERMQ=93;MBQ=20,20;MFRL=169,150;MMQ=60,60;MPOS=51;NALOD=1.91;NLOD=23.73;POPAF=6;TLOD=6.09 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:114,0:0.012:114:5,0:71,0:67,47,0,0 0/1:112,5:0.051:117:6,0:60,3:61,51,2,3
chr1 961624 . C A . PASS AS_FilterStatus=SITE;AS_SB_TABLE=67,99|3,3;DP=178;ECNT=1;GERMQ=93;MBQ=27,20;MFRL=184,135;MMQ=60,60;MPOS=25;NALOD=1.89;NLOD=22.87;POPAF=6;TLOD=12.62 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:104,0:0.013:104:7,0:59,0:43,61,0,0 0/1:62,6:0.071:68:7,0:40,3:24,38,3,3
chr1 979368 . C A . PASS AS_FilterStatus=SITE;AS_SB_TABLE=20,29|2,1;DP=57;ECNT=1;GERMQ=93;MBQ=26,20;MFRL=179,175;MMQ=60,60;MPOS=50;NALOD=1.35;NLOD=6.02;POPAF=6;TLOD=3.64 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:25,0:0.043:25:1,0:16,0:10,15,0,0 0/1:24,3:0.141:27:2,0:14,2:10,14,2,1
With the --ob-priors filter:
chr1 847768 . G T . orientation;weak_evidence AS_FilterStatus=weak_evidence;AS_SB_TABLE=29,51|1,2;DP=87;ECNT=1;GERMQ=93;MBQ=24,20;MFRL=168,133;MMQ=60,60;MPOS=35;NALOD=1.58;NLOD=10.79;POPAF=6;ROQ=1;TLOD=3.84 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:46,0:0.026:46:31,0:1,0:14,32,0,0 0/1:34,3:0.103:37:22,2:0,0:15,19,1,2
chr1 939045 . C A . normal_artifact;orientation;weak_evidence AS_FilterStatus=weak_evidence;AS_SB_TABLE=92,87|3,2;DP=188;ECNT=2;GERMQ=93;MBQ=20,25;MFRL=165,184;MMQ=60,60;MPOS=17;NALOD=0.055;NLOD=15.9;POPAF=6;ROQ=1;TLOD=4.73 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:99,1:0.03:100:2,0:55,1:51,48,1,0 0/1:80,4:0.066:84:2,0:48,3:41,39,2,2
chr1 939075 . C A . orientation;weak_evidence AS_FilterStatus=weak_evidence;AS_SB_TABLE=100,101|3,2;DP=212;ECNT=2;GERMQ=93;MBQ=20,24;MFRL=158,160;MMQ=60,60;MPOS=29;NALOD=1.89;NLOD=22.78;POPAF=6;ROQ=1;TLOD=6.4 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:112,0:0.013:112:2,0:57,0:55,57,0,0 0/1:89,5:0.073:94:1,0:53,3:45,44,3,2
chr1 943068 . C A . orientation;weak_evidence AS_FilterStatus=weak_evidence;AS_SB_TABLE=36,38|1,2;DP=77;ECNT=1;GERMQ=93;MBQ=34,20;MFRL=195,122;MMQ=60,60;MPOS=51;NALOD=1.54;NLOD=10.19;POPAF=6;ROQ=1;TLOD=3.89 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:41,0:0.028:41:1,0:31,0:21,20,0,0 0/1:33,3:0.107:36:1,0:23,2:15,18,1,2
chr1 956083 . C A . normal_artifact;orientation AS_FilterStatus=SITE;AS_SB_TABLE=227,252|2,6;DP=505;ECNT=1;GERMQ=93;MBQ=20,23;MFRL=174,168;MMQ=60,60;MPOS=37;NALOD=-0.6425;NLOD=58.41;POPAF=6;ROQ=1;TLOD=6.58 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:301,2:0.013:303:9,0:191,2:141,160,0,2 0/1:178,6:0.039:184:6,0:106,4:86,92,2,4
chr1 957086 . C T . orientation AS_FilterStatus=SITE;AS_SB_TABLE=60,91|2,3;DP=163;ECNT=1;GERMQ=93;MBQ=20,20;MFRL=173,194;MMQ=60,60;MPOS=29;NALOD=1.83;NLOD=19.77;POPAF=6;ROQ=1;TLOD=7.31 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:91,0:0.015:91:2,0:54,0:36,55,0,0 0/1:60,5:0.083:65:1,0:37,3:24,36,2,3
chr1 961427 . C A . orientation;weak_evidence AS_FilterStatus=weak_evidence;AS_SB_TABLE=65,23|2,2;DP=97;ECNT=1;GERMQ=93;MBQ=34,20;MFRL=186,141;MMQ=60,60;MPOS=49;NALOD=1.56;NLOD=10.49;POPAF=6;ROQ=1;TLOD=5.34 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:42,0:0.027:42:2,0:31,0:30,12,0,0 0/1:46,4:0.071:50:2,0:32,2:35,11,2,2
chr1 961521 . C A . orientation;weak_evidence AS_FilterStatus=weak_evidence;AS_SB_TABLE=128,98|2,3;DP=239;ECNT=1;GERMQ=93;MBQ=20,20;MFRL=169,150;MMQ=60,60;MPOS=51;NALOD=1.91;NLOD=23.73;POPAF=6;ROQ=1;TLOD=6.09 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:114,0:0.012:114:5,0:71,0:67,47,0,0 0/1:112,5:0.051:117:6,0:60,3:61,51,2,3
chr1 961624 . C A . orientation AS_FilterStatus=SITE;AS_SB_TABLE=67,99|3,3;DP=178;ECNT=1;GERMQ=93;MBQ=27,20;MFRL=184,135;MMQ=60,60;MPOS=25;NALOD=1.89;NLOD=22.87;POPAF=6;ROQ=1;TLOD=12.62 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:104,0:0.013:104:7,0:59,0:43,61,0,0 0/1:62,6:0.071:68:7,0:40,3:24,38,3,3
chr1 979368 . C A . orientation;weak_evidence AS_FilterStatus=weak_evidence;AS_SB_TABLE=20,29|2,1;DP=57;ECNT=1;GERMQ=85;MBQ=26,20;MFRL=179,175;MMQ=60,60;MPOS=50;NALOD=1.35;NLOD=6.02;POPAF=6;ROQ=1;TLOD=3.64 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:25,0:0.043:25:1,0:16,0:10,15,0,0 0/1:24,3:0.141:27:2,0:14,2:10,14,2,1
Thank you!
-
Hi Santiago Sanchez,
These variants look like examples of orientation bias rather than true variants. They are all low-allele-fraction variants supported by only one strand, and most of them are C to A changes. Do you have any other evidence that real variants are being filtered out by the orientation bias filter?
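The substitution pattern in a set of records can be tallied with a short script. This is a minimal sketch that assumes one ALT allele per record and uncompressed VCF text on stdin; `substitution_counts` is a hypothetical helper name.

```shell
# Hypothetical helper: tally REF>ALT substitution types from VCF records on stdin.
# Output is "<count> <REF>><ALT>", most frequent first.
substitution_counts() {
  awk '!/^#/ { n[$4 ">" $5]++ } END { for (k in n) print n[k], k }' |
    sort -k1,1nr -k2,2
}
```

Applied to the ten records posted above, this gives 8 C>A, 1 C>T and 1 G>T (G>T being the reverse complement of C>A).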
You can read more about the orientation bias filtering here:
Let me know if you have further questions.
Best,
Genevieve
-
Hi Genevieve Brandt (she/her),
We do not have more data on this sample, but we do know the individual is replication-repair deficient, which leads to high mutation loads, and POLE and POLD1 mutations can lead to strand-specific mutations. With data produced without the filter, we are able to match COSMIC mutational signatures consistent with POLD1 deficiency (https://cancer.sanger.ac.uk/signatures/sbs/sbs10d/), which is characterized by a high frequency of C -> A changes.
I guess our worry is that, for some reason, the read-orientation-bias model is picking up the strand-specific signature of mutations generated by deficient POLE/POLD1. If there is a chance this is the case, is there a way to sort this out when LearnReadOrientationModel runs?
Also, this data comes from a relapse sample. I'll check if mutations from the original sample match the ones that are being filtered out here.
Thanks,
Santiago
-
I see, thanks for the extra information. I'll see if one of our developers has time to look at your question to determine the best solution for your issue.
-
Hi Santiago,
The read orientation filter was designed with the assumption that context-specific mutations (e.g. GCA -> GTA) that occur throughout the genome are not real and therefore should be filtered out. It is useful for FFPE samples, for example, where context- and strand-specific artifact mutations are often observed across the genome.
In your case, it sounds like the exact mutation signature you wish to detect is the same as the signature that the filter assumes is artifact, so I'm inclined to suggest that you simply not use the filter.
But I also noticed that your data is always skewed towards F1R2 or F2R1 reads; i.e. at a given locus most reads are either entirely F1R2 or F2R1, whether they support the reference or the alt allele. This would be normal in strand-specific RNA-seq, but not in WES. It suggests that one strand of the DNA is preferentially selected for sequencing over the other. Is that expected?
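The F1R2/F2R1 skew can be checked per site from the FORMAT fields. This is a minimal sketch that assumes the FORMAT column lists F1R2 and F2R1 as comma-separated per-allele counts, as in the records posted above; `f1r2_fraction` is a hypothetical helper name, and the sample column to inspect (10 or 11) has to be chosen by the user.

```shell
# Hypothetical helper: for each VCF record on stdin, print CHROM, POS and the
# share of orientation-tagged reads that are F1R2 in one sample column.
# $1 = 1-based sample column (10 = first sample, 11 = second sample).
f1r2_fraction() {
  awk -v col="$1" '
    /^#/ { next }                       # skip header lines
    {
      n = split($9, keys, ":")          # FORMAT keys, e.g. GT:AD:AF:DP:F1R2:F2R1:SB
      split($col, vals, ":")            # matching values for the chosen sample
      f1 = 0; f2 = 0
      for (i = 1; i <= n; i++) {
        m = split(vals[i], c, ",")      # per-allele counts (ref, alt, ...)
        if (keys[i] == "F1R2") for (j = 1; j <= m; j++) f1 += c[j]
        if (keys[i] == "F2R1") for (j = 1; j <= m; j++) f2 += c[j]
      }
      # sites with no orientation-tagged reads are skipped
      if (f1 + f2 > 0) printf "%s\t%s\t%.2f\n", $1, $2, f1 / (f1 + f2)
    }'
}
```

Values close to 0 or 1 at most sites indicate the skew described above, whereas a balanced library would be expected to sit near 0.5. For the first posted record (chr1 847768), the second sample column has F1R2=22,2 and F2R1=0,0, so the fraction is 1.00.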