Denis Puthier and Jacques van Helden

This tutorial is a brief tour of the language's capabilities and is intended to give some clues for getting started with the R programming language. For a more detailed overview, see *R for Beginners* (E. Paradis).

- Basic aspects of the language.
- Syntax for calling functions.
- Functions for creating vectors.
- Vector manipulation.
- Objects of class factor, matrix, data.frame and list.
- The apply family of functions.
- Graphics with R.
- Bioconductor and S4 objects.

R is an object-oriented programming language. You can easily create basic objects of class `vector`, `matrix`,
`data.frame`, `list`, `factor`,...

Below, we create a vector *x* that contains one value. You can see the content of *x* by simply calling it.

    x<-15
    x

Alternatively, you can assign a value to x by using the `=` operator. However, `<-` is generally preferred.

    x=22
    x

In R, anything on a line after a hash mark (#) is a comment and is ignored by the interpreter.

    #x<-57
    x

Instructions can be separated by semicolons (;) or newlines.

    x<-12; y<-13
    x; y

Once a value is assigned to an object, R stores this object in memory. Previously created objects can be **l**i**s**ted using the `ls` function.

    ls()

Objects can be deleted using the `rm` (**r**e**m**ove) function.

    rm(x)
    rm(y)
    ls()

In the above section we have created vectors containing numeric data. We have also used functions (`ls` and `rm`). We can use numerous functions to perform specific tasks. When calling a function, we will use this generic syntax:

    NameOfTheFunction(arg1=a, arg2=b, ...)

- `arg1` and `arg2` (...): the arguments of the function.
- `a` and `b`: the objects that will be passed to the function.

To access the documentation of a given function, use the `help` function (or the question mark). The documentation gives you an overview of the function:

- usage
- argument name and class
- returned values
- examples

For instance, to get information about the `substr` function (used to extract part of a character string), use one of the following instructions:

    help(substr)
    # or
    ?substr

When calling a function, the names of the arguments can be omitted if the arguments are given in the expected order. For instance, to extract characters 2 to 4 of the string "microarray":

    substr("microarray",2,4)

If the arguments are not given in the expected order, their names are mandatory. For convenience, names can be abbreviated, provided the abbreviation is unambiguous:

    substr(start=2,stop=4,x="microarray") # works
    substr(st=2,st=4,x="microarray") # ambiguous: R throws an error
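As a small illustration of abbreviated argument names, the sketch below shortens `start` and `stop` to `sta` and `sto`, which are unambiguous among the arguments of `substr`. The object names `res` and `err` are chosen only for this example, and `tryCatch` is used here to capture the error instead of stopping the script.

```r
# Unambiguous abbreviations: "sta" can only match "start", "sto" only "stop"
res <- substr(x = "microarray", sta = 2, sto = 4)
res   # "icr"

# Ambiguous abbreviation: "st" matches both "start" and "stop".
# tryCatch() returns the error object so that the script keeps running.
err <- tryCatch(substr(x = "microarray", st = 2, st = 4),
                error = function(e) e)
conditionMessage(err)   # the message reported by R
```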

The function `c` is used to **c**ombine values into a vector. A vector can contain several values of the same mode. Most frequently, the mode will be one of: "numeric", "character" or "logical".

    mic<-c("Agilent","Affy") # a character vector
    mic
    class(mic) # or is(mic)
    num<-c(1,2,3) # a numeric vector
    num
    class(num)
    bool<-c(T,F,T) # a logical vector
    class(bool)

    3:10 # the colon operator generates the integers from 3 to 10
    10:3 # the same values in decreasing order

The `rep` function **rep**eats a value as many times as requested.

The `seq` (**seq**uence) function is used to generate regular sequences of numbers.

    rep(3,5)
    seq(0,10,by=2)
    seq(0,10,length.out=7)

The `rnorm` (**r**andom **norm**al) function is used to generate normally distributed values with mean equal to `mean` (default 0) and standard deviation equal to `sd` (default 1).

Additional distributions are available, for instance `runif` (**r**andom **unif**orm) and `rpois` (**r**andom **pois**son).

    x<-rnorm(1000,mean=2,sd=2)
    hist(x)
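The other distributions mentioned above are used the same way. A minimal sketch, where the sample size and the parameter values (`min`, `max`, `lambda`) are arbitrary choices made for illustration:

```r
set.seed(1)                          # make the random draws reproducible
u <- runif(1000, min = 0, max = 1)   # uniform values between 0 and 1
p <- rpois(1000, lambda = 3)         # Poisson counts with mean 3
summary(u)                           # all values lie between 0 and 1
table(p)                             # counts are non-negative integers
```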

    set.seed(1)
    x<-round(rnorm(10),2)
    x
    x[2]
    x[1:3]
    x[c(2,6)]
    which(x > 0) # returns the positions containing positive values
    x[which(x > 0)] # returns the requested positive values (using a vector of integers)
    x > 0 # returns TRUE/FALSE for each position
    x[x > 0] # same result as x[which(x > 0)]
    nm<-paste("pos",1:10,sep="_")
    nm
    names(x)<-nm
    x
    x["pos_10"] # indexing with the names of the elements

To replace elements of a vector, simply combine indexing with the `<-` operator. Note that in R, missing values are denoted `NA` (Not Available).

    x[1:2]<-c(10,11)
    x
    x[4:6]<-NA
    x
    is.na(x) # returns TRUE if the position is NA
    x<-na.omit(x) # to delete NA values (or x[!is.na(x)])
    x
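Missing values also propagate through many summary functions. A minimal sketch, using a made-up vector `v`, of the `na.rm` argument accepted by functions such as `mean`:

```r
v <- c(2, 4, NA, 6)
mean(v)                 # NA, because one value is missing
mean(v, na.rm = TRUE)   # 4: the mean of 2, 4 and 6
sum(is.na(v))           # 1: the number of missing values
```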

R is intended to handle large data sets and to retrieve information using a concise syntax. Thanks to a feature of R called *vectorization*, many operations can be written without a loop:

    x<-0:10
    y<-20:30
    x+y
    x^2
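To make the benefit concrete, the sketch below computes the same sum twice, once with an explicit `for` loop and once in vectorized form. The names `s_loop` and `s_vec` are only illustrative.

```r
x <- 0:10
y <- 20:30

# Explicit loop: one addition per iteration
s_loop <- numeric(length(x))
for (i in seq_along(x)) {
  s_loop[i] <- x[i] + y[i]
}

# Vectorized form: a single expression
s_vec <- x + y

all(s_loop == s_vec)   # TRUE: both approaches give the same values
```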

An object of class `factor` looks like a vector and is used to store categorical variables. A vector can be converted to a factor using the `as.factor` function. The `levels` function can be used to extract the names of the categories and to rename them.

    x<-rep(c("good","bad"),5)
    x
    x<-as.factor(x)
    x # note that levels are displayed now
    levels(x)
    levels(x)<-0:1
    x
    table(x)
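Note that `as.factor` sorts the levels alphabetically, so in the example above "bad" becomes 0 and "good" becomes 1. When a specific order is needed, it can be given through the `levels` argument of `factor`. A minimal sketch (the object names `quality`, `f_default` and `f_ordered` are only illustrative):

```r
quality <- rep(c("good", "bad"), 5)

f_default <- as.factor(quality)
levels(f_default)    # "bad" "good" (alphabetical order)

f_ordered <- factor(quality, levels = c("good", "bad"))
levels(f_ordered)    # "good" "bad" (order given explicitly)
table(f_ordered)     # 5 observations in each category
```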

Matrix objects are intended to store 2-dimensional datasets. Each value will be of the same mode. As with vectors, one can use names, numeric vectors or a logical vector for indexing this object. One can index rows or columns or both.

    x<-matrix(1:10,ncol=2)
    colnames(x)<-c("ctrl","trmt")
    row.names(x)<-paste("gene",1:5,sep="_")
    x
    x[,1] # first column
    x[1,] # first row
    x[1,2] # row 1 and column 2
    x[c(T,F,T,T,T),] # rows selected with a logical vector

Note that the syntax below, which uses a logical matrix, is also frequently used to extract or replace part of a matrix.

    x > 2 & x < 8
    x[x > 2 & x < 8]<-NA

An object of class `data.frame` is very similar to a matrix, except that each column can have its own mode (a column of characters, a column of logicals, a column of numerics, ...).

Columns of a data.frame can also be extracted using the `$` operator.

    x <- as.data.frame(x)
    x
    x$ctrl
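The data.frame above was converted from a matrix, so all its columns share one mode. A minimal sketch of a data.frame whose columns have different modes follows; the object `genes` and its values are made up for illustration, and `stringsAsFactors = FALSE` is set explicitly so that the character column is not converted to a factor in older versions of R.

```r
genes <- data.frame(name    = c("g1", "g2", "g3"),    # character column
                    expr    = c(2.5, 0.1, 7.3),        # numeric column
                    induced = c(TRUE, FALSE, TRUE),    # logical column
                    stringsAsFactors = FALSE)
str(genes)               # shows the mode of each column
genes$expr               # extract a column with $
genes[genes$induced, ]   # keep the rows where induced is TRUE
```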

Objects of class `list` can store any type of object. Their elements should be extracted with the `[[` or `$` operators.

    l1<-list(A=x,B=rnorm(10))
    l1
    l1[[1]]
    l1[[2]]
    l1$A
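The reason for preferring `[[` is that single brackets return a sub-list rather than the element itself. A minimal sketch with a small made-up list `l2`:

```r
l2 <- list(A = 1:3, B = c("a", "b"))
l2["A"]            # single brackets: a list of length 1
l2[["A"]]          # double brackets: the element itself (an integer vector)
l2$B               # same as l2[["B"]]
class(l2["A"])     # "list"
class(l2[["A"]])   # "integer"
```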

The functions of the apply family are used to loop through the rows and columns of a matrix (or data.frame) or through the elements of a list.

    x<-matrix(rnorm(20),ncol=4)
    apply(x,MARGIN=1,min) # extract the min value of each row (MARGIN=1)
    apply(x,MARGIN=2,min) # extract the min value of each column (MARGIN=2)

The `lapply` function is used to apply a function to each element of a list; it returns a list of the same length.

    lapply(l1,is)
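A minimal sketch of `lapply` on a small made-up list (`scores` is a hypothetical object), shown together with `sapply`, a member of the same family that simplifies the result to a vector when possible:

```r
scores <- list(a = c(1, 2, 3), b = c(10, 20), d = 5)
res_list <- lapply(scores, mean)   # a list with one mean per element
res_vec  <- sapply(scores, mean)   # same computation, simplified to a named vector
res_list
res_vec
```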

The `tapply` function typically takes a vector and a factor as arguments. Let's say we have values (x) related to three categories ("good", "bad", "medium"). We can compute different statistics for each category:

    cat<-rep(c("good","bad","medium"),5)
    cat<-as.factor(cat)
    x<-rnorm(length(cat))
    x[cat=="good"]<-x[cat=="good"]+2
    x[cat=="medium"]<-x[cat=="medium"]+1
    boxplot(x~cat)
    tapply(x,cat,sd)
    tapply(x,cat,mean)
    tapply(x,cat,length)

R offers a large variety of high-level graphics functions (`plot`, `boxplot`, `barplot`, `hist`, `pairs`, `image`, ...). The generated graphics can be modified using low-level functions (`points`, `text`, `lines`, `abline`, `rect`, `legend`, ...).

    path<-system.file("swirldata",package="marray")
    path
    getwd() # the current working directory
    setwd(path) # set working directory to "path"
    getwd() # the working directory has changed
    dir() # list files and directories in the current working directory
    #file.show("swirl.1.spot") # this file contains a header
    d<-read.table("swirl.1.spot",header=T,sep="\t",row.names=1)
    is(d)
    colnames(d)
    G<-d$Gmedian
    R<-d$Rmedian
    plot(R,G,pch=16,cex=0.5,col="red")
    R<-log2(R)
    G<-log2(G)
    M<-R-G
    A<-R+G
    plot(A,M,pch=16,cex=0.5)
    low<-lowess(M~A)
    lines(low,col="blue",lwd=2) # lwd: line width
    abline(h=0,col="red") # h: horizontal
    abline(h=-1,col="green")
    abline(h=1,col="green")
    # We will only add gene names (here a number) for a subset of strongly induced/repressed genes
    subset<-abs(M) > 1
    points(A[subset],M[subset],col="red")
    gn<-1:nrow(d)
    text(A[subset],M[subset],lab=gn[subset],cex=0.4,pos=2)
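The example above depends on the swirl data shipped with the marray package. As a self-contained sketch, the code below applies the same pattern (one high-level call, then several low-level calls) to simulated intensities. The simulated values and the names `n` and `strong` are assumptions made for illustration and are not part of the original data set.

```r
set.seed(1)
n <- 500
G <- 2^rnorm(n, mean = 10, sd = 1.5)      # simulated green intensities
R <- G * 2^rnorm(n, mean = 0, sd = 0.7)   # simulated red intensities

M <- log2(R) - log2(G)   # log ratio, computed as in the example above
A <- log2(R) + log2(G)   # sum of log intensities, as in the example above

plot(A, M, pch = 16, cex = 0.5)              # high-level: creates the plot
lines(lowess(A, M), col = "blue", lwd = 2)   # low-level: adds a smoothed trend
abline(h = 0, col = "red")                   # low-level: horizontal reference line
abline(h = c(-1, 1), col = "green")          # low-level: two threshold lines
strong <- abs(M) > 1                         # points beyond the thresholds
points(A[strong], M[strong], col = "red")    # low-level: highlights these points
```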