FilterMutectCalls 'haplotype' filter value assigned to variants with different PGT tag
Hello,
I am using GATK v4.2.5.0 to process tumor-only samples sequenced with WES.
In one sample, a variant that has been confirmed by Sanger sequencing (chr14-45137087-C-T) is filtered out as non-PASS, in part because of the 'haplotype' filter value. As far as the 'haplotype' filter value is concerned, the 'guilty' variant seems to be another SNP 3bp upstream (chr14-45137084-C-T). There are no other variants called within 100bp of the Sanger-validated one (see below).
chr14 45136964 . C T . haplotype;weak_evidence AS_FilterStatus=weak_evidence;AS_SB_TABLE=3,0|1,0;DP=4;ECNT=2;GERMQ=25;MBQ=41,37;MFRL=360,390;MMQ=60,60;MPOS=69;POPAF=7.30;ROQ=17;TLOD=3.20 GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB 0|1:3,1:0.333:4:1,0:1,1:0|1:45136962_C_T:45136962:3,0,1,0
chr14 45137084 . C T . germline;haplotype;panel_of_normals AS_FilterStatus=SITE;AS_SB_TABLE=9,1|12,5;DP=27;ECNT=2;GERMQ=1;MBQ=41,41;MFRL=297,326;MMQ=60,60;MPOS=45;PON;POPAF=0.830;ROQ=90;TLOD=59.93 GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB 0|1:10,17:0.615:27:4,13:4,4:0|1:45137084_C_T:45137084:9,1,12,5
chr14 45137087 . C T . germline;haplotype AS_FilterStatus=SITE;AS_SB_TABLE=12,5|9,1;DP=27;ECNT=2;GERMQ=1;MBQ=41,41;MFRL=326,297;MMQ=60,60;MPOS=44;POPAF=2.33;ROQ=93;TLOD=31.76 GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB 1|0:17,10:0.385:27:13,4:4,6:1|0:45137084_C_T:45137084:12,5,9,1
chr14 45149295 . AC A . haplotype;weak_evidence AS_FilterStatus=weak_evidence;AS_SB_TABLE=0,0|0,0;DP=1;ECNT=2;GERMQ=8;MBQ=0,27;MFRL=0,407;MMQ=60,60;MPOS=15;POPAF=7.30;ROQ=93;TLOD=4.20 GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB 0|1:0,1:0.667:1:0,1:0,0:0|1:45149295_AC_A:45149295:0,0,0,1
However, the two variants have two different PGT tags (0|1 and 1|0). From the --distance-on-haplotype documentation, I gathered that the 'haplotype' filter value could be assigned only to variants with the same PGT and PID tags, within 100bp (default) and already filtered out for other reasons.
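To make that expectation concrete, here is a minimal Python sketch of the check as I understand it from the documentation. It is only an illustration, not GATK's implementation: the function names `phasing_tags` and `same_haplotype` are hypothetical, the INFO column of the two records is replaced by "." for brevity, and treating "within 100bp" as a distance of at most 100 is an assumption.

```python
# Trimmed copies of two records from this post (INFO column replaced by "." for brevity).
RECORDS = [
    "chr14\t45137084\t.\tC\tT\t.\tgermline;haplotype;panel_of_normals\t.\t"
    "GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB\t"
    "0|1:10,17:0.615:27:4,13:4,4:0|1:45137084_C_T:45137084:9,1,12,5",
    "chr14\t45137087\t.\tC\tT\t.\tgermline;haplotype\t.\t"
    "GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB\t"
    "1|0:17,10:0.385:27:13,4:4,6:1|0:45137084_C_T:45137084:12,5,9,1",
]


def phasing_tags(record):
    """Return (chrom, pos, PGT, PID) for the first sample of a VCF record line."""
    cols = record.rstrip("\n").split("\t")
    # Pair each FORMAT key with the corresponding value of the first sample column.
    sample = dict(zip(cols[8].split(":"), cols[9].split(":")))
    return cols[0], int(cols[1]), sample.get("PGT"), sample.get("PID")


def same_haplotype(a, b, max_distance=100):
    """True only if both calls share chromosome, PID and PGT and lie within max_distance bp."""
    return (
        a[0] == b[0]
        and a[3] is not None and a[3] == b[3]
        and a[2] is not None and a[2] == b[2]
        and abs(a[1] - b[1]) <= max_distance
    )


first, second = (phasing_tags(r) for r in RECORDS)
print(first)
print(second)
print("same PID:", first[3] == second[3])
print("same PID and PGT within 100bp:", same_haplotype(first, second))
```

Under this reading, the two variants share a PID but not a PGT, so they would not count as lying on the same haplotype.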
The command that I used is the following:
gatk FilterMutectCalls \
-R $RefGenome \
-V $TempSampleDir/$SampleName.unfiltered.vcf \
--tumor-segmentation $TempSampleDir/$SampleName.segments_table \
--contamination-table $TempSampleDir/$SampleName.contamination_table \
--ob-priors $TempSampleDir/$SampleName.tumor_artifact_prior.tar.gz \
-O $TempSampleDir/$SampleName.filtered.vcf
Am I missing something, or is the 'haplotype' filter value mistakenly assigned to these two variants?
-
Yes, this does look like a bug with the phasing. Could you send in a bug report with a small snippet of your files that recreates this issue? The instructions on how to do that are here: https://gatk.broadinstitute.org/hc/en-us/articles/360035889671.
Let me know once you have uploaded your files and I will take a look.
Best,
Genevieve
-
Hi Genevieve,
many thanks for this. I have tried to upload the file multiple times (the last attempt with a file called bug_report_fmazzarotto.tar.gz), but I cannot tell whether the upload was successful, as both FileZilla and the FTP upload via terminal behaved strangely: FileZilla reported that the upload was successful but then listed it among the "failed transfers", and the terminal seems to get stuck on the message "150 Opening Binary Mode data Connection". However, when I try to re-upload the file without renaming it, I get an "overwrite permission denied" message, as if the previous upload had actually worked.
Would you please be able to check whether the file was actually uploaded?
Best wishes
Francesco
-
Thanks Francesco, it was successful. We'll take a look.
-
I am also seeing this with version 4.2.1.0. I will see if I can create a bug report, but the instructions say to do so only if asked, so let me know.
I also get something like below for PGT:PID
0|1:2720441_C_A
1|0:2720441_C_A
chrX 2720441 . C A . haplotype;orientation;weak_evidence AS_FilterStatus=weak_evidence;AS_SB_TABLE=53,127|1,3;DP=189;ECNT=2;GERMQ=93;MBQ=20,20;MFRL=159,119;MMQ=60,60;MPOS=26;NALOD=1.78;NLOD=17.76;POPAF=6;ROQ=1;TLOD=3.2;AC=1;AN=2 GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB 0|1:83,4:0.048:87:48,0:20,3:0|1:2720441_C_A:2720441:10,73,1,3
chrX 2720445 . G T . haplotype;orientation AS_FilterStatus=SITE;AS_SB_TABLE=43,123|2,4;DP=181;ECNT=2;GERMQ=93;MBQ=30,20;MFRL=160,109;MMQ=60,60;MPOS=30;NALOD=1.75;NLOD=16.55;POPAF=6;ROQ=1;TLOD=6.37;AC=1;AN=2 GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB 1|0:79,6:0.062:85:41,3:24,0:1|0:2720441_C_A:2720441:7,72,2,4
-
Will I be able to see updates on the bug report as a viewer? No big deal if I cannot view it though :)
-
Brian Wiley thanks for letting us know! We are taking a look at Francesco's bug report and will let you know if we need any information from you. We aren't able to share the data.
-
Brian Wiley Francesco Mazzarotto Thank you for the bug report Francesco! We were able to identify that the haplotype filter only identifies PID matches and does not look at the PGT. I opened an issue ticket so that we can get the haplotype filter working properly: https://github.com/broadinstitute/gatk/issues/7809
Another aspect of this update is that we are planning to turn the --linked-de-bruijn-graph option on by default in Mutect2, because we think this will make the PGT more reliable and accurate.
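The difference between matching on PID alone and matching on PID plus PGT can be illustrated with a small Python sketch. This is only an illustration of the behaviour described in this reply, not the actual FilterMutectCalls code; the helper `group_calls` is hypothetical, and the tuples are copied from the two chrX records quoted earlier in the thread.

```python
from collections import defaultdict

# (chrom, pos, PGT, PID) taken from the two chrX records quoted earlier in the thread.
CALLS = [
    ("chrX", 2720441, "0|1", "2720441_C_A"),
    ("chrX", 2720445, "1|0", "2720441_C_A"),
]


def group_calls(calls, use_pgt):
    """Group call positions by (chrom, PID), optionally also splitting on PGT."""
    groups = defaultdict(list)
    for chrom, pos, pgt, pid in calls:
        key = (chrom, pid, pgt) if use_pgt else (chrom, pid)
        groups[key].append(pos)
    return dict(groups)


by_pid = group_calls(CALLS, use_pgt=False)
by_pid_and_pgt = group_calls(CALLS, use_pgt=True)

print("PID only:", by_pid)
print("PID and PGT:", by_pid_and_pgt)
```

Grouping on PID alone places both calls in a single group, whereas also keying on PGT separates them, which is the distinction the issue ticket is about.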
-
Hi Genevieve - thanks very much for this. Is there an approximate date for the update release?
-
Francesco Mazzarotto I don't have an estimate for when it will get done at this point. Our developers are really busy and there are other projects our developer team is actively working on before they take a look at this bug report.
When I discussed this with the developers, they noted that the variants that have this filter seem like true negatives. I think you can still use this tool even with this issue persisting.
Let me know if you have any other concerns.
-
Hi Genevieve, I have run into the same problem using Mutect2 in GATK 4.3.0.0. Here is an example. I haven't used FilterMutectCalls, but I don't know how to deal with such variants. Some of them seem to be true considering the AF, DP or AC.
As you can see from the example, MT15652 and MT15807 seem to be true variants considering their AFs.
MT15652 and MT15711 have the same PID (both are 15652_C_A).
MT15744 and MT15807 have the same PID (both are 15744_C_T). I have no idea which one I should keep. If I keep 15744, the AF indicates it is not a true variant.
Could you please give me some advice? I am a new user of GATK.