### Overview

Fisher’s Exact Test is a statistical test used to analyze contingency tables, which are matrices that contain the frequencies of the variables in play. According to statistics lore, noted statistician R.A. Fisher invented the test to determine whether Dr. Muriel Bristol could actually tell the difference between milk being added to her tea and tea being added to her milk (she couldn’t). Fisher’s Exact Test is so named because it allows us to calculate the exact p-value for the experiment, rather than having to rely on an approximation. The p-value gives us the probability of observing results at least as extreme as the ones we obtained if the null hypothesis were true, *i.e.* of getting those results purely by chance.

### Mathematical theory

The Wolfram MathWorld article on Fisher’s Exact Test includes some very helpful information on the theoretical underpinnings of the test, as well as an example of how it can be applied.

### Use in GATK

In GATK, we use Fisher’s Exact Test to calculate the FisherStrand annotation, which is an indicator of strand bias, a common source of artifactual calls. The test determines whether there is a difference in the number of reads that support the reference allele and the alternate allele on each strand (*i.e.* the number of reads in forward and reverse orientation). The result is reported in the VCF as the FisherStrand annotation, abbreviated FS.

### Example: Fisher Strand in practice

*Note: This example follows the steps given in the Wolfram article linked above.*

In this example, we want to determine if there is a difference in the number of reads that support the reference allele and the alternate allele on each strand. Our null hypothesis is that there is no such difference, *i.e.* that there is no strand bias. We will calculate a p-value that tells us the probability of observing data at least as extreme as ours if the null hypothesis is true. The lower the p-value, the less likely we are to believe that there is no strand bias.

Let’s say we have 3 reads supporting the reference allele on the forward strand and 0 reads supporting the reference allele on the reverse strand. We also have 0 reads supporting the alternate allele on the forward strand and 3 reads supporting the alternate allele on the reverse strand.

The contingency table, or matrix, looks like this:

| | Forward Strand | Reverse Strand | Total |
|---|---|---|---|
| Reference Allele | 3 | 0 | 3 |
| Alternate Allele | 0 | 3 | 3 |
| Total | 3 | 3 | 6 |

At first glance, it seems obvious there is some bias going on here, because each allele is only seen either on the forward strand or the reverse strand. To determine with confidence whether there really is strand bias, we will perform Fisher’s Exact Test on this set of observations.

We first use the hypergeometric probability function to calculate the probability of getting the exact matrix we have above. The probability calculation for a 2 x 2 matrix is:

$$P = \frac{R_1!\,R_2!\,C_1!\,C_2!}{N!\,\prod_{i,j} a_{ij}!}$$

Let’s define the variables in that equation:

- $R_1$ = sum of row 1
- $R_2$ = sum of row 2
- $C_1$ = sum of column 1
- $C_2$ = sum of column 2
- $N = R_1 + R_2 = C_1 + C_2$
- $a_{ij}$ = value in the matrix at row $i$ and column $j$

Now, let’s calculate the probability P for our own matrix above:

$$P = \frac{3!\,3!\,3!\,3!}{6!\,(3!\,0!\,0!\,3!)} = \frac{1296}{25920} = 0.05$$
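
The same calculation can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name `table_probability` and its argument names are illustrative choices, not part of GATK.

```python
from math import factorial


def table_probability(a11, a12, a21, a22):
    """Hypergeometric probability of one 2 x 2 table, given its margins."""
    r1, r2 = a11 + a12, a21 + a22  # row sums
    c1, c2 = a11 + a21, a12 + a22  # column sums
    n = r1 + r2                    # total number of reads
    numerator = factorial(r1) * factorial(r2) * factorial(c1) * factorial(c2)
    denominator = (factorial(n) * factorial(a11) * factorial(a12)
                   * factorial(a21) * factorial(a22))
    return numerator / denominator


# Reference allele: 3 forward, 0 reverse; alternate allele: 0 forward, 3 reverse
print(table_probability(3, 0, 0, 3))  # 0.05
```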

That gives us the probability of observing our own data. However, for our test, we need the probability of observing our own data *and* more extreme data. So now we need to calculate the probability of observing more extreme data, which we'll define as any matrix that has the same row and column totals as our own, and also has a probability equal to or less than our matrix probability.

#### Matrix probability calculations

Let's find all possible matrices of non-negative integers that would be consistent with the given row and column totals (i.e. total number of observations) and calculate their probability using the formula above.

- Original matrix (our experimental observations)

| | Forward Strand | Reverse Strand | Total |
|---|---|---|---|
| Reference Allele | 3 | 0 | 3 |
| Alternate Allele | 0 | 3 | 3 |
| Total | 3 | 3 | 6 |

$$P = \frac{3!\,3!\,3!\,3!}{6!\,(3!\,0!\,0!\,3!)} = 0.05$$

- Hypothetical matrix 1

| | Forward Strand | Reverse Strand | Total |
|---|---|---|---|
| Reference Allele | 2 | 1 | 3 |
| Alternate Allele | 1 | 2 | 3 |
| Total | 3 | 3 | 6 |

$$P = \frac{3!\,3!\,3!\,3!}{6!\,(2!\,1!\,1!\,2!)} = 0.45$$

- Hypothetical matrix 2

| | Forward Strand | Reverse Strand | Total |
|---|---|---|---|
| Reference Allele | 1 | 2 | 3 |
| Alternate Allele | 2 | 1 | 3 |
| Total | 3 | 3 | 6 |

$$P = \frac{3!\,3!\,3!\,3!}{6!\,(1!\,2!\,2!\,1!)} = 0.45$$

- Hypothetical matrix 3

| | Forward Strand | Reverse Strand | Total |
|---|---|---|---|
| Reference Allele | 0 | 3 | 3 |
| Alternate Allele | 3 | 0 | 3 |
| Total | 3 | 3 | 6 |

$$P = \frac{3!\,3!\,3!\,3!}{6!\,(0!\,3!\,3!\,0!)} = 0.05$$
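
Drawing up the matrices by hand quickly becomes tedious, so here is a hedged sketch of how the enumeration could be automated. It relies on the fact that, once the row and column totals are fixed, the top-left cell determines the other three. The probability is written in its binomial-coefficient form, which is algebraically the same as the factorial formula. The name `enumerate_tables` is an illustrative assumption, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from math import comb


def enumerate_tables(r1, r2, c1):
    """Yield (table, probability) for every 2 x 2 table with these margins.

    r1, r2 are the row totals and c1 is the first column total.
    """
    n = r1 + r2
    lowest = max(0, c1 - r2)   # smallest feasible top-left value
    highest = min(r1, c1)      # largest feasible top-left value
    for a11 in range(lowest, highest + 1):
        a12 = r1 - a11
        a21 = c1 - a11
        a22 = r2 - a21
        # C(r1, a11) * C(r2, a21) / C(n, c1) equals the factorial formula
        probability = Fraction(comb(r1, a11) * comb(r2, a21), comb(n, c1))
        yield ((a11, a12), (a21, a22)), probability


for table, probability in enumerate_tables(3, 3, 3):
    print(table, float(probability))
```

For the margins in this example the loop produces four tables with probabilities 0.05, 0.45, 0.45 and 0.05, which sum to 1.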

#### Results

We see that the only matrix with a probability less than or equal to that of our own matrix is hypothetical matrix 3. We will now add the probabilities of our own matrix and matrix 3 to get the final p-value.

Summing all matrix probabilities less than or equal to 0.05 (the probability of our own matrix) gives the overall p-value:

$$P = 0.05 + 0.05 = 0.1$$

The p-value of 0.1 tells us that, if there were no strand bias, we would still expect to see a read distribution at least this lopsided 10% of the time. In other words, there is no statistically convincing evidence of bias, despite our strong intuition that the numbers look biased. This is because there are only 6 reads, and we can’t confidently say that there is really strand bias at work based on so few reads (observations). If we had seen more reads, we might have had enough evidence to confidently say there is bias, or we might have realized that there is no bias at this site and that the numbers we saw were an accidental effect. If you’d like to see how our confidence scales with read numbers, try working out several cases with larger numbers of reads. You’ll need to draw up a lot of possible matrices!
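
For readers who would rather not draw up the matrices by hand, the following is a minimal sketch of the two-sided test as described in this article: sum the probabilities of every matrix that is no more likely than the observed one. It is not GATK's own implementation, and the function name `fisher_two_sided` is an illustrative assumption. Exact fractions keep the "less than or equal" comparison free of floating-point surprises.

```python
from fractions import Fraction
from math import comb


def fisher_two_sided(a11, a12, a21, a22):
    """Two-sided Fisher's Exact Test p-value for a 2 x 2 table."""
    r1, r2 = a11 + a12, a21 + a22
    c1 = a11 + a21
    total = comb(r1 + r2, c1)  # shared denominator of every table probability

    def weight(x):
        # Numerator of the probability of the table whose top-left cell is x
        return comb(r1, x) * comb(r2, c1 - x)

    observed = weight(a11)
    lowest, highest = max(0, c1 - r2), min(r1, c1)
    extreme = sum(w for w in map(weight, range(lowest, highest + 1))
                  if w <= observed)
    return Fraction(extreme, total)


# The same all-or-nothing pattern with 3, 6 and 10 reads per allele
for reads in (3, 6, 10):
    print(reads, float(fisher_two_sided(reads, 0, 0, reads)))
```

With 3 reads per allele this reproduces the p-value of 0.1 from the example; with more reads showing the same pattern, the p-value drops sharply.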

In the GATK context, we still want to transform our FS annotation value to the Phred scale for convenience before writing it out to the output VCF. To get the Phred-scaled p-value, we simply plug the p-value of 0.1 into the Phred equation like this:

$$FS = -10 \log_{10}(0.1) = 10$$

So the value of FS at this site would be 10. Note that if we had a p-value of 1, meaning the observed read counts are entirely consistent with there being no bias, the Phred score would be 0. So, a Phred score closer to 0 means there is less evidence of bias, and higher FS values indicate more evidence of bias. See the documentation article on understanding hard-filtering recommendations for more commentary on how we interpret the value of FS in practice.
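
As a quick check of the Phred conversion, here is a small sketch. The function name `phred_scale` and the input validation are assumptions made for this illustration. In particular, rejecting a p-value of 0 is a choice of this sketch rather than a description of how GATK handles extremely small p-values.

```python
from math import log10


def phred_scale(p_value):
    """Convert a p-value in (0, 1] to the Phred scale: -10 * log10(p)."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p-value must be greater than 0 and at most 1")
    return -10.0 * log10(p_value)


print(round(phred_scale(0.1), 3))  # 10.0
```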
