Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Read-orientation filter removes tens of thousands of somatic variants after FilterMutectCalls

Answered

6 comments

  • Genevieve Brandt (she/her)

    Hi Santiago Sanchez,

    Could you include some of the variant examples that you think are getting incorrectly filtered out? This would help us to troubleshoot the filter.

    Best,

    Genevieve

  • Santiago Sanchez

    Hi Genevieve Brandt (she/her),

    Thanks for getting back to me. Here are lists for the first 10 calls:

    Without the --ob-priors filter (PASS):

    chr1 847768 . G T . PASS AS_FilterStatus=SITE;AS_SB_TABLE=29,51|1,2;DP=87;ECNT=1;GERMQ=93;MBQ=24,20;MFRL=168,133;MMQ=60,60;MPOS=35;NALOD=1.58;NLOD=10.79;POPAF=6;TLOD=3.84 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:46,0:0.026:46:31,0:1,0:14,32,0,0 0/1:34,3:0.103:37:22,2:0,0:15,19,1,2
    chr1 939045 . C A . PASS AS_FilterStatus=SITE;AS_SB_TABLE=92,87|3,2;DP=188;ECNT=2;GERMQ=93;MBQ=20,25;MFRL=165,184;MMQ=60,60;MPOS=17;NALOD=0.055;NLOD=15.9;POPAF=6;TLOD=4.73 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:99,1:0.03:100:2,0:55,1:51,48,1,0 0/1:80,4:0.066:84:2,0:48,3:41,39,2,2
    chr1 939075 . C A . PASS AS_FilterStatus=SITE;AS_SB_TABLE=100,101|3,2;DP=212;ECNT=2;GERMQ=93;MBQ=20,24;MFRL=158,160;MMQ=60,60;MPOS=29;NALOD=1.89;NLOD=22.78;POPAF=6;TLOD=6.4 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:112,0:0.013:112:2,0:57,0:55,57,0,0 0/1:89,5:0.073:94:1,0:53,3:45,44,3,2
    chr1 943068 . C A . PASS AS_FilterStatus=SITE;AS_SB_TABLE=36,38|1,2;DP=77;ECNT=1;GERMQ=93;MBQ=34,20;MFRL=195,122;MMQ=60,60;MPOS=51;NALOD=1.54;NLOD=10.19;POPAF=6;TLOD=3.89 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:41,0:0.028:41:1,0:31,0:21,20,0,0 0/1:33,3:0.107:36:1,0:23,2:15,18,1,2
    chr1 956083 . C A . PASS AS_FilterStatus=SITE;AS_SB_TABLE=227,252|2,6;DP=505;ECNT=1;GERMQ=93;MBQ=20,23;MFRL=174,168;MMQ=60,60;MPOS=37;NALOD=-0.6425;NLOD=58.41;POPAF=6;TLOD=6.58 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:301,2:0.013:303:9,0:191,2:141,160,0,2 0/1:178,6:0.039:184:6,0:106,4:86,92,2,4
    chr1 957086 . C T . PASS AS_FilterStatus=SITE;AS_SB_TABLE=60,91|2,3;DP=163;ECNT=1;GERMQ=93;MBQ=20,20;MFRL=173,194;MMQ=60,60;MPOS=29;NALOD=1.83;NLOD=19.77;POPAF=6;TLOD=7.31 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:91,0:0.015:91:2,0:54,0:36,55,0,0 0/1:60,5:0.083:65:1,0:37,3:24,36,2,3
    chr1 961427 . C A . PASS AS_FilterStatus=SITE;AS_SB_TABLE=65,23|2,2;DP=97;ECNT=1;GERMQ=93;MBQ=34,20;MFRL=186,141;MMQ=60,60;MPOS=49;NALOD=1.56;NLOD=10.49;POPAF=6;TLOD=5.34 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:42,0:0.027:42:2,0:31,0:30,12,0,0 0/1:46,4:0.071:50:2,0:32,2:35,11,2,2
    chr1 961521 . C A . PASS AS_FilterStatus=SITE;AS_SB_TABLE=128,98|2,3;DP=239;ECNT=1;GERMQ=93;MBQ=20,20;MFRL=169,150;MMQ=60,60;MPOS=51;NALOD=1.91;NLOD=23.73;POPAF=6;TLOD=6.09 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:114,0:0.012:114:5,0:71,0:67,47,0,0 0/1:112,5:0.051:117:6,0:60,3:61,51,2,3
    chr1 961624 . C A . PASS AS_FilterStatus=SITE;AS_SB_TABLE=67,99|3,3;DP=178;ECNT=1;GERMQ=93;MBQ=27,20;MFRL=184,135;MMQ=60,60;MPOS=25;NALOD=1.89;NLOD=22.87;POPAF=6;TLOD=12.62 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:104,0:0.013:104:7,0:59,0:43,61,0,0 0/1:62,6:0.071:68:7,0:40,3:24,38,3,3
    chr1 979368 . C A . PASS AS_FilterStatus=SITE;AS_SB_TABLE=20,29|2,1;DP=57;ECNT=1;GERMQ=93;MBQ=26,20;MFRL=179,175;MMQ=60,60;MPOS=50;NALOD=1.35;NLOD=6.02;POPAF=6;TLOD=3.64 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:25,0:0.043:25:1,0:16,0:10,15,0,0 0/1:24,3:0.141:27:2,0:14,2:10,14,2,1

    With the --ob-priors filter:

    chr1 847768 . G T . orientation;weak_evidence AS_FilterStatus=weak_evidence;AS_SB_TABLE=29,51|1,2;DP=87;ECNT=1;GERMQ=93;MBQ=24,20;MFRL=168,133;MMQ=60,60;MPOS=35;NALOD=1.58;NLOD=10.79;POPAF=6;ROQ=1;TLOD=3.84 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:46,0:0.026:46:31,0:1,0:14,32,0,0 0/1:34,3:0.103:37:22,2:0,0:15,19,1,2
    chr1 939045 . C A . normal_artifact;orientation;weak_evidence AS_FilterStatus=weak_evidence;AS_SB_TABLE=92,87|3,2;DP=188;ECNT=2;GERMQ=93;MBQ=20,25;MFRL=165,184;MMQ=60,60;MPOS=17;NALOD=0.055;NLOD=15.9;POPAF=6;ROQ=1;TLOD=4.73 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:99,1:0.03:100:2,0:55,1:51,48,1,0 0/1:80,4:0.066:84:2,0:48,3:41,39,2,2
    chr1 939075 . C A . orientation;weak_evidence AS_FilterStatus=weak_evidence;AS_SB_TABLE=100,101|3,2;DP=212;ECNT=2;GERMQ=93;MBQ=20,24;MFRL=158,160;MMQ=60,60;MPOS=29;NALOD=1.89;NLOD=22.78;POPAF=6;ROQ=1;TLOD=6.4 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:112,0:0.013:112:2,0:57,0:55,57,0,0 0/1:89,5:0.073:94:1,0:53,3:45,44,3,2
    chr1 943068 . C A . orientation;weak_evidence AS_FilterStatus=weak_evidence;AS_SB_TABLE=36,38|1,2;DP=77;ECNT=1;GERMQ=93;MBQ=34,20;MFRL=195,122;MMQ=60,60;MPOS=51;NALOD=1.54;NLOD=10.19;POPAF=6;ROQ=1;TLOD=3.89 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:41,0:0.028:41:1,0:31,0:21,20,0,0 0/1:33,3:0.107:36:1,0:23,2:15,18,1,2
    chr1 956083 . C A . normal_artifact;orientation AS_FilterStatus=SITE;AS_SB_TABLE=227,252|2,6;DP=505;ECNT=1;GERMQ=93;MBQ=20,23;MFRL=174,168;MMQ=60,60;MPOS=37;NALOD=-0.6425;NLOD=58.41;POPAF=6;ROQ=1;TLOD=6.58 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:301,2:0.013:303:9,0:191,2:141,160,0,2 0/1:178,6:0.039:184:6,0:106,4:86,92,2,4
    chr1 957086 . C T . orientation AS_FilterStatus=SITE;AS_SB_TABLE=60,91|2,3;DP=163;ECNT=1;GERMQ=93;MBQ=20,20;MFRL=173,194;MMQ=60,60;MPOS=29;NALOD=1.83;NLOD=19.77;POPAF=6;ROQ=1;TLOD=7.31 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:91,0:0.015:91:2,0:54,0:36,55,0,0 0/1:60,5:0.083:65:1,0:37,3:24,36,2,3
    chr1 961427 . C A . orientation;weak_evidence AS_FilterStatus=weak_evidence;AS_SB_TABLE=65,23|2,2;DP=97;ECNT=1;GERMQ=93;MBQ=34,20;MFRL=186,141;MMQ=60,60;MPOS=49;NALOD=1.56;NLOD=10.49;POPAF=6;ROQ=1;TLOD=5.34 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:42,0:0.027:42:2,0:31,0:30,12,0,0 0/1:46,4:0.071:50:2,0:32,2:35,11,2,2
    chr1 961521 . C A . orientation;weak_evidence AS_FilterStatus=weak_evidence;AS_SB_TABLE=128,98|2,3;DP=239;ECNT=1;GERMQ=93;MBQ=20,20;MFRL=169,150;MMQ=60,60;MPOS=51;NALOD=1.91;NLOD=23.73;POPAF=6;ROQ=1;TLOD=6.09 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:114,0:0.012:114:5,0:71,0:67,47,0,0 0/1:112,5:0.051:117:6,0:60,3:61,51,2,3
    chr1 961624 . C A . orientation AS_FilterStatus=SITE;AS_SB_TABLE=67,99|3,3;DP=178;ECNT=1;GERMQ=93;MBQ=27,20;MFRL=184,135;MMQ=60,60;MPOS=25;NALOD=1.89;NLOD=22.87;POPAF=6;ROQ=1;TLOD=12.62 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:104,0:0.013:104:7,0:59,0:43,61,0,0 0/1:62,6:0.071:68:7,0:40,3:24,38,3,3
    chr1 979368 . C A . orientation;weak_evidence AS_FilterStatus=weak_evidence;AS_SB_TABLE=20,29|2,1;DP=57;ECNT=1;GERMQ=85;MBQ=26,20;MFRL=179,175;MMQ=60,60;MPOS=50;NALOD=1.35;NLOD=6.02;POPAF=6;ROQ=1;TLOD=3.64 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:25,0:0.043:25:1,0:16,0:10,15,0,0 0/1:24,3:0.141:27:2,0:14,2:10,14,2,1

    Thank you!

  • Genevieve Brandt (she/her)

    Hi Santiago Sanchez,

    These variants look like examples of orientation bias rather than true variants. They are all low-allele-fraction variants whose alt reads come from a single read orientation (only F1R2 or only F2R1). Most of them are also C to A variants. These are likely signs of orientation bias. Do you have any other evidence that real variants are being filtered out by the orientation bias filter?
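The pattern described here can be checked directly from the F1R2 and F2R1 FORMAT fields of the posted records. The following is a minimal Python sketch, not part of GATK. It assumes biallelic records and that the second sample column (genotype 0/1) is the tumor; the helper name `alt_orientation_counts` is hypothetical.

```python
from typing import Tuple


def alt_orientation_counts(format_keys: str, sample: str) -> Tuple[int, int]:
    """Return (alt F1R2 count, alt F2R1 count) for one biallelic sample column."""
    fields = dict(zip(format_keys.split(":"), sample.split(":")))
    # F1R2 and F2R1 each hold a "ref,alt" pair of read counts at a biallelic site.
    alt_f1r2 = int(fields["F1R2"].split(",")[1])
    alt_f2r1 = int(fields["F2R1"].split(",")[1])
    return alt_f1r2, alt_f2r1


FORMAT = "GT:AD:AF:DP:F1R2:F2R1:SB"

# Tumor columns copied from three of the records posted in this thread
# (assumption: the second sample column is the tumor).
tumor_columns = {
    "chr1:847768 G>T": "0/1:34,3:0.103:37:22,2:0,0:15,19,1,2",
    "chr1:939045 C>A": "0/1:80,4:0.066:84:2,0:48,3:41,39,2,2",
    "chr1:961624 C>A": "0/1:62,6:0.071:68:7,0:40,3:24,38,3,3",
}

for site, column in tumor_columns.items():
    f1r2, f2r1 = alt_orientation_counts(FORMAT, column)
    # True when exactly one of the two orientations carries alt reads.
    single_orientation = (f1r2 == 0) != (f2r1 == 0)
    print(f"{site}: alt F1R2={f1r2}, alt F2R1={f2r1}, "
          f"single orientation={single_orientation}")
```

A real analysis would read the full VCF with a proper parser; the point here is only that the alt-supporting reads at these sites fall in one orientation.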

    You can read more about the orientation bias filtering here:

    Let me know if you have further questions.

    Best,

    Genevieve

  • Santiago Sanchez

    Hi Genevieve Brandt (she/her),

    We do not have more data on this sample, but we do know the individual is replication repair deficient, which leads to high mutation loads, and POLE and POLD1 mutations can lead to strand-specific mutations. With data produced without the filter, we are able to match COSMIC mutational signatures consistent with POLD1 deficiency (https://cancer.sanger.ac.uk/signatures/sbs/sbs10d/), which is characterized by a high frequency of C -> A changes.
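When comparing calls against COSMIC SBS signatures, substitutions are conventionally written with a pyrimidine (C or T) reference base, so a G>T call counts as C>A. The sketch below applies that convention to the REF/ALT pairs of the ten posted calls. It is an illustration only: the helper name `pyrimidine_substitution` is hypothetical, and a full SBS analysis would also need the flanking bases from the reference genome, which the VCF records do not carry.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def pyrimidine_substitution(ref: str, alt: str) -> str:
    """Write a single-base substitution with a pyrimidine (C or T) reference base."""
    if ref in ("A", "G"):
        # Purine reference: report the change as seen on the opposite strand.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


# (REF, ALT) of the ten posted calls, in the order they appear in the thread.
calls = [
    ("G", "T"), ("C", "A"), ("C", "A"), ("C", "A"), ("C", "A"),
    ("C", "T"), ("C", "A"), ("C", "A"), ("C", "A"), ("C", "A"),
]

substitution_counts = Counter(pyrimidine_substitution(r, a) for r, a in calls)
for substitution, count in sorted(substitution_counts.items()):
    print(substitution, count)
```

Under this convention, the ten calls tally as nine C>A and one C>T.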

    I guess our worry is that, for some reason, the read-orientation-bias model is picking up the strand-specific signature of mutations generated by deficient POLE/POLD1. If there is a chance this is the case, is there a way to sort this out when LearnReadOrientationModel runs?

    Also, this data comes from a relapse sample. I'll check if mutations from the original sample match the ones that are being filtered out here.

    Thanks,

    Santiago

  • Genevieve Brandt (she/her)

    I see, thanks for the extra information. I'll see if one of our developers has time to look at your question to determine the best solution for your issue.

  • Takuto Sato

    Hi Santiago,

    The read orientation filter was designed with the assumption that context-specific mutations (e.g. GCA -> GTA) that occur throughout the genome are not real and therefore should be filtered out. It is useful, for example, for FFPE samples, where context- and strand-specific artifact mutations are often observed across the genome.

    In your case, it sounds like the exact mutation signature you wish to detect is the same as the signature that the filter assumes is artifact, so I'm inclined to suggest that you simply not use the filter.
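In practice, not using the filter means running FilterMutectCalls without the --ob-priors argument, as in the first list of calls posted in this thread. The sketch below builds both command lines side by side for comparison. The file names are placeholders and the helper `filter_mutect_calls_args` is hypothetical; nothing is executed.

```python
import shlex
from typing import List, Optional


def filter_mutect_calls_args(
    reference: str,
    unfiltered_vcf: str,
    output_vcf: str,
    ob_priors: Optional[str] = None,
) -> List[str]:
    """Build a FilterMutectCalls command line, with or without orientation priors."""
    args = [
        "gatk", "FilterMutectCalls",
        "-R", reference,
        "-V", unfiltered_vcf,
        "-O", output_vcf,
    ]
    if ob_priors is not None:
        # Output of LearnReadOrientationModel; omit it to skip the orientation filter.
        args += ["--ob-priors", ob_priors]
    return args


# Placeholder file names, for illustration only.
with_filter = filter_mutect_calls_args(
    "ref.fasta", "unfiltered.vcf.gz", "filtered.ob.vcf.gz",
    ob_priors="read-orientation-model.tar.gz",
)
without_filter = filter_mutect_calls_args(
    "ref.fasta", "unfiltered.vcf.gz", "filtered.no_ob.vcf.gz",
)

print(shlex.join(with_filter))
print(shlex.join(without_filter))
```

Keeping both outputs makes it straightforward to list the calls whose status changes between the two runs.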

    But I also noticed that your data is always skewed towards F1R2 or F2R1 reads; i.e., at a given locus, the reads are almost entirely F1R2 or almost entirely F2R1, whether they support the reference or the alt allele. This would be normal in strand-specific RNA-seq, but not in WES. It suggests that one strand of the DNA is preferentially selected for sequencing over the other. Is that expected?
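The skew described above can be quantified per sample as the fraction of oriented reads that are F1R2, counting reference and alt reads together. This is a rough Python sketch using sample columns copied from two of the posted records. The helper name `f1r2_fraction` is hypothetical, and the normal/tumor labels assume the first column is the normal and the second the tumor.

```python
from typing import Optional


def f1r2_fraction(format_keys: str, sample: str) -> Optional[float]:
    """Fraction of F1R2 reads among all F1R2 + F2R1 reads (ref and alt combined)."""
    fields = dict(zip(format_keys.split(":"), sample.split(":")))
    f1r2 = sum(int(n) for n in fields["F1R2"].split(","))
    f2r1 = sum(int(n) for n in fields["F2R1"].split(","))
    total = f1r2 + f2r1
    # None when the site has no oriented reads at all.
    return f1r2 / total if total else None


FORMAT = "GT:AD:AF:DP:F1R2:F2R1:SB"

# (site, sample label, sample column) copied from the posted records.
columns = [
    ("chr1:847768", "normal", "0/0:46,0:0.026:46:31,0:1,0:14,32,0,0"),
    ("chr1:847768", "tumor", "0/1:34,3:0.103:37:22,2:0,0:15,19,1,2"),
    ("chr1:939045", "normal", "0/0:99,1:0.03:100:2,0:55,1:51,48,1,0"),
    ("chr1:939045", "tumor", "0/1:80,4:0.066:84:2,0:48,3:41,39,2,2"),
]

for site, label, column in columns:
    fraction = f1r2_fraction(FORMAT, column)
    print(f"{site} {label}: F1R2 fraction = {fraction:.3f}")
```

With roughly balanced orientations the fraction would sit near 0.5. At these two sites it is close to 1 (chr1:847768) or close to 0 (chr1:939045) in both samples.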

     

