Missing documentation on strict_strand
Hi,
The strict_strand filter in FilterMutectCalls is listed in the output VCF header regardless of whether it is actually applied, but it is only applied when --min-reads-per-strand is set to a value greater than 0. This is confusing, because a FILTER should not be listed in the VCF header if it is not actually applied. Otherwise, users might wrongly conclude that the strict_strand filter was applied but did not flag any variants.
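To make the distinction concrete, one can compare the FILTER IDs declared in the header with the IDs that actually appear in the FILTER column. The Python sketch below does this. `declared_vs_used` is a hypothetical helper, not part of GATK, and it assumes an uncompressed, tab-delimited VCF read as a list of lines. The strict_strand description is a placeholder, not GATK's real text.

```python
import re

# Matches FILTER declarations such as ##FILTER=<ID=weak_evidence,Description="...">
FILTER_HEADER = re.compile(r'^##FILTER=<ID=([^,>]+)')

def declared_vs_used(vcf_lines):
    """Return (declared, used): FILTER IDs in the header vs. IDs seen in records."""
    declared, used = set(), set()
    for line in vcf_lines:
        if line.startswith('##'):
            m = FILTER_HEADER.match(line)
            if m:
                declared.add(m.group(1))
        elif line.startswith('#') or not line.strip():
            continue  # #CHROM column header or blank line
        else:
            filt = line.rstrip('\n').split('\t')[6]  # FILTER is the 7th column
            if filt not in ('PASS', '.'):
                used.update(filt.split(';'))
    return declared, used

# Toy example; the strict_strand description is a placeholder.
vcf = [
    '##fileformat=VCFv4.2',
    '##FILTER=<ID=strict_strand,Description="(description omitted)">',
    '##FILTER=<ID=weak_evidence,Description="Mutation does not meet likelihood threshold">',
    '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO',
    'chr2\t96552875\t.\tT\tG\t.\tweak_evidence\tDP=42',
]
declared, used = declared_vs_used(vcf)
print(sorted(declared - used))  # ['strict_strand']
```

From the file alone, a declared ID that never appears in any record (strict_strand here) is ambiguous. The filter may have been applied and never triggered, or it may never have been applied at all. That ambiguity is the issue raised above.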
See also my other recent post about strand_bias not being annotated on any variants. I suspect another undocumented FilterMutectCalls setting is needed to turn it on, but I'm not sure which one.
Note: I think there is also a related bug. When I run FilterMutectCalls with and without --min-reads-per-strand 1, a small number of variants are labeled weak_evidence in the run with that parameter, even though they are not flagged strict_strand. Without that parameter, the same variants are PASS.
Without --min-reads-per-strand 1:
chr2 96552875 . T G . PASS AS_FilterStatus=SITE;AS_SB_TABLE=20,13|6,1;DP=42;ECNT=1;GERMQ=31;MBQ=30,20;MFRL=476,394;MMQ=60,60;MPOS=62;POPAF=7.30;ROQ=21;TLOD=8.76 GT:AD:AF:DP:F1R2:F2R1:SB 0/1:33,7:0.190:40:11,4:14,0:20,13,6,1
With --min-reads-per-strand 1:
chr2 96552875 . T G . weak_evidence AS_FilterStatus=weak_evidence;AS_SB_TABLE=20,13|6,1;DP=42;ECNT=1;GERMQ=7;MBQ=30,20;MFRL=476,394;MMQ=60,60;MPOS=62;POPAF=7.30;ROQ=21;TLOD=8.76 GT:AD:AF:DP:F1R2:F2R1:SB 0/1:33,7:0.190:40:11,4:14,0:20,13,6,1
Based on the VCF header description of weak_evidence, this doesn't make sense: weak_evidence,Description="Mutation does not meet likelihood threshold"
Also, there is no reason for a variant to change from PASS to weak_evidence when the only parameter added was --min-reads-per-strand 1, so I suspect this is a bug.
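A field-by-field comparison of the two records shows exactly what changed between the runs. This is a minimal Python sketch. `parse_record` and `diff_records` are hypothetical helpers. INFO and FORMAT keys are prefixed (INFO/DP vs FORMAT/DP) to avoid collisions, and the records are split on any whitespace because the tabs were lost when they were pasted.

```python
def parse_record(line):
    """Flatten one VCF data line into {'FILTER': ..., 'INFO/key': ..., 'FORMAT/key': ...}."""
    cols = line.split()  # pasted records use spaces instead of tabs
    rec = {'FILTER': cols[6]}
    for kv in cols[7].split(';'):
        key, _, value = kv.partition('=')  # flag-type INFO entries get value ''
        rec['INFO/' + key] = value
    # Pair FORMAT keys (column 9) with the first sample's values (column 10)
    for key, value in zip(cols[8].split(':'), cols[9].split(':')):
        rec['FORMAT/' + key] = value
    return rec

def diff_records(a, b):
    """Return {field: (value_in_a, value_in_b)} for every field that differs."""
    ra, rb = parse_record(a), parse_record(b)
    return {k: (ra.get(k), rb.get(k))
            for k in sorted(set(ra) | set(rb)) if ra.get(k) != rb.get(k)}

# The two records posted above (without / with --min-reads-per-strand 1)
without_flag = ("chr2 96552875 . T G . PASS AS_FilterStatus=SITE;AS_SB_TABLE=20,13|6,1;"
                "DP=42;ECNT=1;GERMQ=31;MBQ=30,20;MFRL=476,394;MMQ=60,60;MPOS=62;POPAF=7.30;"
                "ROQ=21;TLOD=8.76 GT:AD:AF:DP:F1R2:F2R1:SB 0/1:33,7:0.190:40:11,4:14,0:20,13,6,1")
with_flag = ("chr2 96552875 . T G . weak_evidence AS_FilterStatus=weak_evidence;"
             "AS_SB_TABLE=20,13|6,1;DP=42;ECNT=1;GERMQ=7;MBQ=30,20;MFRL=476,394;MMQ=60,60;"
             "MPOS=62;POPAF=7.30;ROQ=21;TLOD=8.76 GT:AD:AF:DP:F1R2:F2R1:SB "
             "0/1:33,7:0.190:40:11,4:14,0:20,13,6,1")

for field, (before, after) in diff_records(without_flag, with_flag).items():
    print(f'{field}: {before} -> {after}')
```

For this site, only FILTER, INFO/AS_FilterStatus and INFO/GERMQ (31 vs 7) differ. The SB counts, AD and TLOD are identical in both runs.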
Thanks.
-
Hi GE,
Regarding the VCF header, we do not change the header based on whether or not a certain filter was applied with a tool. I can see how that may be confusing, but that is how GATK works currently.
The strict_strand filter is turned on when the argument --min-reads-per-strand is greater than 0. The integer sets the minimum number of reads on each strand (forward and reverse) that must support the alternate allele.
The strict_strand filter is a hard filter: it removes variants whose alternate allele is supported by fewer than the required number of reads on either strand (with a value of 1, variants supported by reads in only one direction). You can lose a lot of sensitivity with this filter, especially if you do not have high coverage. Your variant is filtered as weak_evidence with strict_strand turned on because of this loss of sensitivity: Mutect's internal model of whether the tumor is active loses evidence, which changes the outcome of the weak_evidence filter.
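The per-strand rule described above can be restated as a short Python sketch. This is not GATK's implementation. `fails_strict_strand` is a hypothetical helper, and it assumes the FORMAT/SB value is ordered ref-forward, ref-reverse, alt-forward, alt-reverse. That order is consistent with AS_SB_TABLE=20,13|6,1 and AD=33,7 in the posted records (20+13=33, 6+1=7).

```python
def parse_sb(sb_field):
    """Parse FORMAT/SB 'refFwd,refRev,altFwd,altRev' into four ints (assumed order)."""
    ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in sb_field.split(','))
    return ref_fwd, ref_rev, alt_fwd, alt_rev

def fails_strict_strand(sb_field, min_reads_per_strand):
    """Hypothetical restatement of the strict_strand rule, not GATK code.

    The filter is off unless min_reads_per_strand > 0. When on, the variant is
    filtered if either strand has fewer alt-supporting reads than the threshold.
    """
    if min_reads_per_strand <= 0:
        return False
    _, _, alt_fwd, alt_rev = parse_sb(sb_field)
    return alt_fwd < min_reads_per_strand or alt_rev < min_reads_per_strand

for threshold in (0, 1, 2):
    print(threshold, fails_strict_strand("20,13,6,1", threshold))
```

With the SB value from the posted site (20,13,6,1), the alternate allele has 6 forward reads and 1 reverse read. A threshold of 1 therefore does not trigger strict_strand, which matches the output, while a threshold of 2 would.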
The strict_strand filter is an advanced filter and we do not recommend it for general users.
Hope this helps you understand the output,
Genevieve
-
Thanks, that makes sense. The only part I think should be reconsidered in future GATK versions is: "Regarding the VCF header, we do not change the header based on whether or not a certain filter was applied with a tool."
This is because I think 'empty data' situations (the filter was not applied, which is why the variant doesn't carry it) should be distinguished from 'data = 0' situations (the filter was applied but did not flag the variant).
-
Yes, that is a confusing situation, but it is not what headers are meant to be used for. I discussed the header situation with the GATK team today and they noted that they are not planning to change this behavior. If other users chime in and also would like us to change this, however, we can continue the conversation.
-
Sure. If others think this is useful, a compromise might be to add "NOT-USED" to the tag, so that the description of the filter is still present but users can also see that it was not applied.
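The NOT-USED idea could look like the following Python sketch. It rewrites header lines so that filters not enabled in a run keep their entry but carry a marker in the Description. The `mark_unused_filters` helper and the "NOT-USED: " prefix are hypothetical, and GATK does not do this. The set of applied filters would have to come from the run's configuration (for example, strict_strand only when --min-reads-per-strand is greater than 0), not from the records.

```python
import re

# Captures: prefix up to the opening quote, the filter ID, the description, the closing '">'
FILTER_LINE = re.compile(r'^(##FILTER=<ID=([^,>]+),Description=")(.*)(">)$')

def mark_unused_filters(header_lines, applied):
    """Prefix the Description of every FILTER whose ID is not in `applied` with 'NOT-USED: '."""
    out = []
    for line in header_lines:
        m = FILTER_LINE.match(line)
        if m and m.group(2) not in applied:
            line = f'{m.group(1)}NOT-USED: {m.group(3)}{m.group(4)}'
        out.append(line)  # non-FILTER lines and applied filters pass through unchanged
    return out

# Toy header; the strict_strand description is a placeholder.
header = [
    '##FILTER=<ID=strict_strand,Description="(description omitted)">',
    '##FILTER=<ID=weak_evidence,Description="Mutation does not meet likelihood threshold">',
]
min_reads_per_strand = 0  # hypothetical run configuration
applied = {'weak_evidence'} | ({'strict_strand'} if min_reads_per_strand > 0 else set())
for line in mark_unused_filters(header, applied):
    print(line)
```

Editing the Description instead of dropping the line keeps the header valid VCF, so existing parsers still see every FILTER ID.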