About plotnine

When doing descriptive statistics, we frequently need to partition graphics according to categorical (i.e. qualitative) or ordinal variables. Producing such graphics can be particularly difficult with classical Python graphical libraries (e.g. matplotlib). The R software benefits from a very nice library for this task, the ggplot2 package developed by Hadley Wickham (Wickham 2016). This package quickly became very popular in the bioinformatics field, where categories may be genes, groups of genes, species, signaling pathways, epigenetic marks… and ordinal variables may be, for instance, discretized expression levels. The ggplot2 R package is an implementation of the graphical model proposed by Leland Wilkinson in his book The Grammar of Graphics (Wilkinson 2016). In this model, a graph is viewed as an entity composed of data, layers, scales, a coordinate system and facets. One creates a graphic and then adds the various components using the + operator. Although the syntax may appear a little tricky to beginners, the benefit of this approach quickly becomes clear when composing complex diagrams.

Several projects have proposed a port of ggplot2 to Python. The plotnine library is one of them and offers a rather stable and exhaustive port. In this tutorial, we will use the chickwts dataset, which is available in the R datasets package. The dataset is made available for download through a URL. The information we have about the chickwts dataset is the following:

Chicken Weights By Feed Type: An experiment was conducted to measure and compare the effectiveness of various feed supplements on the growth rate of chickens.

Downloading the dataset

We will download the dataset (a tabulated flat file) and load it into a pandas DataFrame. The DataFrame object is itself a port of a popular object from the R world (data.frame). A DataFrame can be viewed as a matrix whose columns may be of various types (object, int64, float64…). The DataFrame object provides various methods to perform operations on the dataset.

##     weight       feed
## 0      179  horsebean
## 1      160  horsebean
## 2      136  horsebean
## 3      227  horsebean
## 4      217  horsebean
## 5      168  horsebean
## 6      108  horsebean
## 7      124  horsebean
## 8      143  horsebean
## 9      140  horsebean
## 10     309    linseed
## 11     229    linseed
## 12     181    linseed
## 13     141    linseed
## 14     260    linseed
## 15     203    linseed
## 16     148    linseed
## 17     169    linseed
## 18     213    linseed
## 19     257    linseed
## 20     244    linseed
## 21     271    linseed
## 22     243    soybean
## 23     230    soybean
## 24     248    soybean
## 25     327    soybean
## 26     329    soybean
## 27     250    soybean
## 28     193    soybean
## 29     271    soybean
## ..     ...        ...
## 41     226  sunflower
## 42     320  sunflower
## 43     295  sunflower
## 44     334  sunflower
## 45     322  sunflower
## 46     297  sunflower
## 47     318  sunflower
## 48     325   meatmeal
## 49     257   meatmeal
## 50     303   meatmeal
## 51     315   meatmeal
## 52     380   meatmeal
## 53     153   meatmeal
## 54     263   meatmeal
## 55     242   meatmeal
## 56     206   meatmeal
## 57     344   meatmeal
## 58     258   meatmeal
## 59     368     casein
## 60     390     casein
## 61     379     casein
## 62     260     casein
## 63     404     casein
## 64     318     casein
## 65     352     casein
## 66     359     casein
## 67     216     casein
## 68     222     casein
## 69     283     casein
## 70     332     casein
## 
## [71 rows x 2 columns]
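
The loading step that produces a table like the one above can be sketched as follows. Since the download URL is not reproduced here, this sketch reads a handful of rows copied from the table out of an in-memory string; the tab separator and the variable names (excerpt, chickwts) are assumptions made for illustration, not the tutorial's original code.

```python
import io

import pandas as pd

# A few rows copied from the table above, written as a tab-separated file.
# The real file has 71 rows; the separator is an assumption.
excerpt = (
    "weight\tfeed\n"
    "179\thorsebean\n"
    "160\thorsebean\n"
    "309\tlinseed\n"
    "243\tsoybean\n"
    "226\tsunflower\n"
    "325\tmeatmeal\n"
    "368\tcasein\n"
)

# read_csv accepts a local path, a URL or any file-like object,
# so the same call works once the real URL is substituted.
chickwts = pd.read_csv(io.StringIO(excerpt), sep="\t")
print(chickwts)
```

With the real dataset, the io.StringIO(...) argument would simply be replaced by the URL string.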

What is the type of the object returned by pandas.read_csv()? What are the types of its columns?

## <class 'pandas.core.frame.DataFrame'>
## weight     int64
## feed      object
## dtype: object
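
Output of this kind can be obtained with the built-in type() function and the dtypes attribute of the DataFrame. The sketch below is only illustrative: it builds a three-row DataFrame by hand (values taken from the table above) instead of reading the full file, and the variable name chickwts is an assumption.

```python
import pandas as pd

# Small hand-built stand-in for the full dataset.
chickwts = pd.DataFrame(
    {
        "weight": [179, 309, 368],
        "feed": ["horsebean", "linseed", "casein"],
    }
)

# Class of the object returned by pandas.
print(type(chickwts))

# One dtype per column. Text columns are reported as "object" in the
# output above; recent pandas versions may report a dedicated string dtype.
print(chickwts.dtypes)
```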

Creating a basic diagram

Changing the global diagram theme

The diagram theme can be changed by calling plotnine functions whose names start with theme_.

  • Using tab completion, discover the functions that provide built-in themes for the diagrams.
  • Test some of them to change the global graphic rendering.

Theming your diagram

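The diagrams can be tweaked in more depth by passing arguments to the theme() function. Several aspects of the diagram are themeable in this way; the list of themeable elements is provided in the plotnine documentation. The values given to the themeables are objects of several classes (for instance element_text, element_line, element_rect and element_blank).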

For instance, the following code changes various elements of the theme:

  • The axis text.
  • The axis title.
  • The axis ticks.
  • The panel grid.