In this practical, we will inspect the statistical tests used to compare a set of genes of interest to a set of reference genes. This practical is essentially a tutorial, based on the result returned by DAVID in the previous practical, "Handling genomic coordinates".

In that practical, we submitted a set of predicted E2F target genes (see file M00920_targets.txt) to the Web tool DAVID, in order to compare it to various catalogues of functional annotations (Gene Ontology, KEGG, …). DAVID returned a table reporting the functional classes for which our gene set showed significant enrichment.

The goal of this tutorial is to reproduce the calculation of the significance. We will show two distinct ways to model the problem:

  1. Hypergeometric test: drawing at random a certain number of balls from an urn containing marked and non-marked balls.
  2. Fisher’s exact test: testing the independence between regulation (genes belonging or not to the E2F predicted gene set) and class membership (genes annotated or not as involved in cell cycle, in the GO annotations).
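
As a rough illustration of how the two models relate, the sketch below computes an enrichment p-value both ways, using only the Python standard library. All counts are hypothetical placeholders (they are not taken from the DAVID result), and the function names `hypergeom_upper_tail` and `fisher_one_sided` are introduced here for illustration only. The second function implements only the one-sided (enrichment) version of Fisher's exact test.

```python
from math import comb, isclose


def hypergeom_upper_tail(x, m, n, k):
    """P(X >= x) when drawing k balls without replacement from an urn
    containing m marked and n non-marked balls."""
    total = comb(m + n, k)
    upper = min(m, k)  # cannot draw more marked balls than exist or than drawn
    favourable = sum(comb(m, i) * comb(n, k - i) for i in range(x, upper + 1))
    return favourable / total


def fisher_one_sided(a, b, c, d):
    """One-sided (enrichment) Fisher's exact test on the 2x2 table
    [[a, b], [c, d]]: with all margins fixed, sum the probabilities of
    every table whose top-left cell is at least a."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)
    favourable = sum(
        comb(row1, i) * comb(row2, col1 - i)
        for i in range(a, min(row1, col1) + 1)
    )
    return favourable / total


# Hypothetical counts, chosen only to make the example concrete.
n_genes = 20000      # reference genes (all balls in the urn)
n_class = 500        # genes annotated to the class (marked balls)
n_set = 100          # genes in the set of interest (balls drawn)
n_overlap = 10       # genes of the set that belong to the class

p_urn = hypergeom_upper_tail(n_overlap, n_class, n_genes - n_class, n_set)

# Same data as a contingency table: rows = in set / not in set,
# columns = in class / not in class.
a = n_overlap
b = n_set - n_overlap
c = n_class - n_overlap
d = n_genes - n_set - n_class + n_overlap
p_table = fisher_one_sided(a, b, c, d)

print(p_urn, p_table)
print(isclose(p_urn, p_table))
```

With the margins of the contingency table fixed, the one-sided Fisher test sums the same hypergeometric probabilities as the urn model, so the two p-values should agree. The two formulations differ in how the problem is described, not in the resulting enrichment p-value.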
