Handling and normalizing Affymetrix data with Bioconductor

Affymetrix microarray data normalization and quality assessment

Denis Puthier and Jacques van Helden

This tutorial is only a brief tour of the language's capabilities, intended to give a few pointers for getting started with the R programming language. For a more detailed overview, see "R for Beginners" (E. Paradis).

Bioconductor
Installing Bioconductor
S4 objects
The dataset from Den Boer (2009)
Reading Affymetrix data
Loading phenotypic data
Affy library: graphics
Quality control of raw data
Present/absent calls
Data normalization
The ExpressionSet object
Checking the normalization results
Probe annotations
Writing data onto disk
References

Bioconductor

From Wikipedia:

Bioconductor is a free, open source and open development software project for the analysis and comprehension of genomic data generated by wet lab experiments in molecular biology.

Most Bioconductor components are distributed as R packages, which are add-on modules for R. Initially most of the Bioconductor software packages focused on the analysis of single channel Affymetrix and two or more channel cDNA/Oligo microarrays. As the project has matured, the functional scope of the software packages broadened to include the analysis of all types of genomic data, such as SAGE, X-seq data (RNA-Seq, ChIP-Seq, ...), or SNP data.

The broad goals of the projects are to:

Provide widespread access to a broad range of powerful statistical and graphical methods for the analysis of genomic data.
Facilitate the inclusion of biological metadata in the analysis of genomic data, e.g. annotation data from UCSC or GO database.
Provide a common software platform that enables the rapid development and deployment of plug-able, scalable, and interoperable software.
Further scientific understanding by producing high-quality documentation and reproducible research.

What is reproducible research? How can R contribute to reproducibility?
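
As one partial hint for the second question, here is a minimal sketch in base R: a script can fix the random seed and record the software versions it ran with, so that a rerun produces the same numbers. The variable names (`x`, `y`, `info`) are illustrative only, and this covers just one aspect of reproducibility.

```r
# Fix the random number generator so that "random" results are repeatable.
set.seed(123)
x <- rnorm(5)

# Resetting the same seed regenerates exactly the same values.
set.seed(123)
y <- rnorm(5)
print(identical(x, y))

# Record the R version and the loaded packages alongside the results.
info <- sessionInfo()
print(info$R.version$version.string)
```

Keeping the output of `sessionInfo()` with the results lets someone else see which R and package versions produced them.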

Some areas covered by the Bioconductor project, with some representative packages:

Affymetrix GeneChip analysis: affy, simpleaffy
Affymetrix exon arrays: xmapcore, xps
Probe Metadata: annotate, hgu133aprobe, hgu95av2probe, ABPkgBuilder
Microarray data filtering: genefilter
Statistical analysis of microarrays: samr, siggenes, multtest, DEDS, pickgene
Tiling arrays: AffyTiling, tilingArray
CGH array analysis: CGHbase, snapCGH
NGS quality control/filtering: ShortRead
RNA-Seq: easyRNASeq, DESeq
ChIP-Seq: chipseq
High level plotting functions: geneplotter
Functional enrichment analysis: GO, GOstats, goCluster, geneplotter
Genome coordinates: GenomicFeatures, genomeIntervals, GenomeGraphs, GenomicRanges
Graphs: graph, Rgraphviz, biocGraph
Flow cytometry: flowCore, flowViz
Variant calling: VariantTools
Proteomics: MassSpecWavelet
Image analysis: EBImage
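
Since these components are distributed as R packages, it can help to check which of them are already present in the local R library. The sketch below does this with base R for a few of the packages listed above, using their lowercase package names. The commented-out installation step assumes the `BiocManager` package from CRAN, which is the installer used by recent Bioconductor releases; older releases used a different mechanism, so treat it as an assumption about your setup.

```r
# A few package names taken from the list above; adjust to your needs.
wanted <- c("affy", "genefilter", "multtest")

# installed.packages() returns a matrix with one row per installed package.
installed <- rownames(installed.packages())
not_installed <- setdiff(wanted, installed)

if (length(not_installed) == 0) {
  message("All requested packages are installed.")
} else {
  message("Missing packages: ", paste(not_installed, collapse = ", "))
  # Assumption: installation goes through BiocManager (needs network access).
  # Uncomment the lines below to install the missing packages.
  # if (!requireNamespace("BiocManager", quietly = TRUE))
  #   install.packages("BiocManager")
  # BiocManager::install(not_installed)
}
```

The installation lines are left commented so that running the sketch only reports what is missing and changes nothing on the system.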

Contents

Bioconductor
Installing Bioconductor
S4 objects
The dataset from Den Boer (2009)
Reading Affymetrix data
Retrieving data
Loading data into R
Loading phenotypic data
Indexing an AffyBatch object
Affy library: graphics
The image function
The barplot.ProbeSet() function
Interpretation
Quality control of raw data
Descriptive statistics
Interpretation
AffyRNAdeg
Present/absent calls
Data normalization
The ExpressionSet object
Checking the normalization results
Relative Log Expression (RLE)
MA plot diagram
Probe annotations
Writing data onto disk
Additional exercises
References