Web site for this course

https://dputhier.github.io/ASG/

Teachers

  • DP: Denis Puthier
  • JvH: Jacques van Helden

Year

2017-2018

Audience

  • Second year of the Master's in Bioinformatics, Structural Biochemistry and Genomics (BBSG).
  • Doctoral school

Resources

  • R: a free software environment for statistical computing and graphics
  • R Markdown: documentation for the R Markdown language (used for the practicals)
  • Bioconductor: a set of R libraries dedicated to the statistical analysis of genomics data
  • MeV (MultiExperiment Viewer): a Java application designed for the analysis of expression data
  • Cluster 3.0: implements the most commonly used clustering methods for gene expression data
  • Java TreeView: a Java-based tool to visualize trees produced by hierarchical clustering, together with a heatmap of the expression profiles

Prerequisites

Students are expected to have followed the introductory statistics course in the first year of the Master's.

We assume that the following concepts have been acquired.

  • Discrete distributions (geometric, binomial)
  • Sampling and estimation
  • Mean comparison tests (Student, Welch)

A basic knowledge of the R language is expected.

  • handling variables and data frames (“tables”)
  • probability distributions
  • plotting (histograms, dot plots)
  • hypothesis testing

If you have not received any such training, note that during the first practical we will briefly review these concepts and practical skills.

Schedule and contents

3/11, 14:00 to 18:00: Introduction to the course (JvH)
Material:
- Slides (html, Rmd)

3/11 and 6/11, 14:00 to 18:00: Detecting differentially expressed genes (DEG) with microarrays (DP)
Concepts:
- Hypothesis testing
- Student \(t\) statistics
- Unbiased estimation of variance
- MA plot
- Volcano plots
- P-value distribution
- E-value
Material:
- Slides (html, Rmd)
- Basics about Student and Welch’s t test (html, Rmd)
- Practical: detecting differentially expressed genes in microarray data (html, Rmd)
- Practical: generating random control sets following a Normal distribution (incomplete) (html, Rmd)

7/11, 14:00 to 18:00: About distances and clustering (DP)
Concepts:
- Distance metrics
- Hierarchical clustering
Material:
- Theory (html)
- Distance metrics and clustering (practical) (html, Rmd)

20/11, 9:00 to 12:30: The multiple ways to correct for multiple testing (JvH)
Concepts:
- False positive risk (FPR)
- Expected number of false positives (E-value)
- Family-Wise Error Rate (FWER)
- False Discovery Rate (FDR)
Material:
- Multiple testing corrections (slides)
- Multiple testing corrections (practical)

20/11, 9:00 to 12:30: Supervised classification (JvH)
Concepts:
- Discriminant analysis
- Cross-validation (k-fold, LOO)
- Data dimensionality and overfitting
- Variable selection
Material:
- Introduction to multivariate analysis
- Discriminant analysis (slides)
- Dimension reduction and PCA
- Practical: supervised classification

21/11, 14:00 to 18:00: Functional enrichment of DEG (DP)
Concepts:
- Functional enrichment statistics
- The hypergeometric distribution
Material:
- Theory: on the whiteboard
- Hypergeometric distribution and enrichment statistics. An example application: DAVID (practical)

22/11, 14:00 to 18:00: Overview of discrete distributions, with applications to NGS (DP)
Concepts:
- Geometric
- Binomial
- Poisson
- Hypergeometric
- Negative binomial
Material:
- Tutorial (html, pdf, Rmd)

22/11, 14:00 to 18:00: Detecting differentially expressed genes (DEG) with RNA-seq (DP/JvH)
Concepts:
- RNA-Seq principles
- Normalizing RNA-seq counts
- Detecting differentially expressed genes (DEG)
Material:
- RNA-Seq DEG with DESeq2 (html, pdf, Rmd)

22/11, 14:00 to 18:00: Descriptive statistics with ggplot2 (DP)
Concepts:
- ggplot2 principles
- Layout, creating diagrams
Material:
- Introduction to ggplot2 (html)

RNA-seq analysis (continued) (JvH)

Screen pictures

Multivariate data

Classification approaches (supervised or unsupervised).

3. Dimensionality reduction.

4. Evaluation of supervised classification results.

5. Some supervised classification methods.

Additional support

Introduction to R
- First steps with R and Siméon Denis Poisson (practical)
- R language: A quick tutorial (practical)

Occurrence statistics
- The Poisson distribution in the context of Peak-calling (practical)
- The Poisson distribution in the context of k-mer occurrence statistics (practical)
- Read mapping statistics and the binomial distribution (practical)
- Hypergeometric distribution and enrichment statistics. An example application: DAVID (practical)
- Application example: K-mer occurrences in ChIP-seq peaks (practical)

Microarray analysis
- Introduction to multivariate analysis (slides)
- Transcriptome microarrays: study cases (slides)
- Normalization of Affymetrix DNA chip (slides)
- Handling and normalizing Affymetrix data with Bioconductor (practical)
- Differential expression (slides)
- Basics about Student and Welch’s t test
- Microarray data: selecting differentially expressed genes with R or TmeV (practical)
- Sampling distributions (practical)
- Detecting differentially expressed genes in microarray data. Part I: exploring Student t statistics

The multiple ways to correct for multiple testing
- Multiple testing corrections (slides)
- Multiple testing corrections (practical)

Clustering (unsupervised classification)
- Correlation analysis (slides)
- Clustering (slides)
- Clustering (slides DP)
- About distances
- Handling clustering methods: artificial datasets (practical)
- Clustering of microarray data

RNA-seq data analysis
- RNA-Seq method (slides)
- RNA-seq read mapping (practical)
- The negative binomial and DESeq Bioconductor package

Supervised classification
- Introduction to multivariate analysis
- Discriminant analysis (slides)

Visualization
- Dimension reduction and PCA

Overview
- Discrete distributions for NGS data analysis

Personal work

The instructions for the 2017 personal work can be found here