Web site for this course



Abbrev | Name
DP | Denis Puthier
JvH | Jacques van Helden




  • Second year of the Master's program in Bioinformatics, Structural Biochemistry and Genomics (BBSG).
  • Doctoral school


Tool | About
R | A free software environment for statistical computing and graphics
R markdown | Documentation for the R markdown language (used for the practicals)
Bioconductor | A set of R libraries dedicated to the statistical analysis of genomics data
MeV: MultiExperiment Viewer | A Java application designed for the analysis of expression data
Cluster 3.0 | Implements the most commonly used clustering methods for gene expression data
Java TreeView | Java-based tool to visualize trees produced by hierarchical clustering, together with a heatmap of the expression profiles


Students are expected to have taken the introductory statistics course in the first year of the Master's program.

We assume that students have already mastered the following concepts.

  • Discrete distributions (geometric, binomial)
  • Sampling and estimation
  • Mean comparison tests (Student, Welch)

A basic knowledge of the R language is expected.

  • handling variables and data frames (“tables”)
  • probability distributions
  • plotting (histograms, dot plots)
  • hypothesis testing

If you did not receive any such training, we will briefly review these concepts and practical skills during the first practical.

Schedule and contents

Date | From | To | Subject | Teacher | Concepts | Material
3/11 | 14:00 | 18:00 | Introduction to the course | JvH | | Slides (html, Rmd)
 | 14:00 | 18:00 | Detecting differentially expressed genes (DEG) with microarrays | DP | Hypothesis testing; Student \(t\) statistics; Unbiased estimation of variance; MA plot; Volcano plots; P-value distribution; E-value | Slides (html, Rmd); Basics about Student and Welch’s t test (html, Rmd); Practical: detecting differentially expressed genes in microarray data (html, Rmd); Practical: generating random control sets following a Normal distribution, incomplete (html, Rmd)
7/11 | 14:00 | 18:00 | About distances and clustering | DP | Distance metrics; Hierarchical clustering | Theory (html); Distance metrics and clustering, practical (html, Rmd)
20/11 | 9:00 | 12:30 | The multiple ways to correct for multiple testing | JvH | False positive risk (FPR); Expected number of false positives (E-value); Family-Wise Error Rate (FWER); False Discovery Rate (FDR) | Multiple testing corrections (slides); Multiple testing corrections (practical)
20/11 | 9:00 | 12:30 | Supervised classification | JvH | Discriminant analysis; Cross-validation (k-fold, LOO); Data dimensionality and overfitting; Variable selection | Introduction to multivariate analysis; Discriminant analysis (slides); Dimension reduction and PCA; Practical: supervised classification
21/11 | 14:00 | 18:00 | Functional enrichment of DEG | DP | Functional enrichment statistics; The hypergeometric distribution | Theory: on the whiteboard; Hypergeometric distribution and enrichment statistics, with an example application: DAVID (practical)
22/11 | 14:00 | 18:00 | Overview of discrete distributions, with applications to NGS | DP | Geometric; Binomial; Poisson; Negative binomial | Tutorial [html] [pdf] [Rmd]
22/11 | 14:00 | 18:00 | Detecting differentially expressed genes (DEG) with RNA-seq | DP/JvH | RNA-seq principles; Normalizing RNA-seq counts; Detecting differentially expressed genes (DEG) | RNA-seq DEG with DESeq2 [html] [pdf] [Rmd]
22/11 | 14:00 | 18:00 | Descriptive statistics with ggplot2 | DP | ggplot2 principles; Layout, creating diagrams | Introduction to ggplot2 [html]
 | | | RNA-seq analysis (continued) | JvH | |

Screen pictures