This function performs gene clustering using the MCL algorithm. The method starts by creating a graph with genes as nodes and edges connecting each gene to its nearest neighbors. Then the method use the MCL algorithm to detect clusters of co-expressed genes (method argument).

gene_clustering(
  object = NULL,
  s = 5,
  inflation = 2,
  method = c("closest_neighborhood", "reciprocal_neighborhood"),
  algorithm = c("MCL", "infomap", "louvain", "walktrap"),
  threads = 1,
  output_path = NULL,
  name = NULL,
  keep_nn = FALSE,
  louv_resolution = 5,
  walktrap_step = 4,
  infomap_nb = 10,
  infomap_modularity = TRUE,
  mcl_dir = NULL,
  delete_output_file = FALSE
)

Arguments

object

A ClusterSet object.

s

If method="closest_neighborhood", s is an integer value indicating the size of the neighbourhood used for graph construction. Default is 5.

inflation

A numeric value indicating the MCL inflation parameter. Default is 2.

method

Which method to use to build the graph. If "closest_neighborhood", creates an edge between two selected genes a and b if b is part of the kg closest nearest neighbors of a (with kg < k). If "reciprocal_neighborhood"), inspect the neighborhood of size k of all selected genes and put an edge between two genes a and b if they are reciprocally in the neighborhood of the other

algorithm

The algorihm to use for clustering: MCL", "infomap", "louvain" or "walktrap".

threads

An integer value indicating the number of threads to use for MCL.

output_path

a character indicating the path where the output files will be stored.

name

a character string giving the name for the output files. If NULL, a random name is generated.

keep_nn

Deprecated. Use 'method' instead.

louv_resolution

Resolution of Louvain algorithm if chosen.

walktrap_step

The length of the random walks to perform for walktrap algorithm.

infomap_nb

nb.trials parameter for igraph::cluster_infomap().

infomap_modularity

modularity parameter for igraph::cluster_infomap().

mcl_dir

A path to MCL (if issue with automated installation).

delete_output_file

Whether to delete output files (.graph_input.txt, .graph_out.txt).

Value

A ClusterSet object

References

- Van Dongen S. (2000) A cluster algorithm for graphs. National Research Institute for Mathematics and Computer Science in the 1386-3681.

Examples

# Restrict vebosity to info messages only.
library(Seurat)
set_verbosity(1)

# Load a dataset
load_example_dataset("7871581/files/pbmc3k_medium")
#> |-- INFO :  Dataset 7871581/files/pbmc3k_medium was already loaded. 

# Select informative genes
res <- select_genes(pbmc3k_medium,
                    distance = "pearson",
                    row_sum=5)
#> |-- INFO :  Number of selected rows/genes (row_sum): 1164 
#> |-- INFO :  Computing distances using selected method: pearson 
#> |-- INFO :  Computing distances to KNN. 
#> |-- INFO :  Computing simulated distances to KNN. 
#> |-- INFO :  Computing distances to KNN threshold (DKNN threshold). 
#> |-- INFO :  Selecting informative genes. 
#> |-- INFO :  Instantiating a ClusterSet object. 
                    
# Cluster informative features

## Method 1 - Construct a graph with a 
## novel neighborhood size
res <- gene_clustering(res, method="closest_neighborhood",
                       inflation = 1.5, threads = 4)
#> |-- INFO :  Computing distances between selected genes. 
#> |-- INFO :  Compute distances to KNN between selected genes. 
#> |-- INFO :  Computing distances to KNN. 
#> |-- INFO :  Creating the input file for graph partitioning. 
#> |-- INFO :  Deleting reciprocal edges. 
#> |-- INFO :  Converting distances into weights. 
#> |-- INFO :  Writing the input file. 
#> |-- INFO :  Input file saved in :/var/folders/zy/wl3dj2_n76zfc8sdvny1q06c0000gn/T//RtmpL90b6O/R14z6ff8TV.graph_input.txt 
#> |-- INFO :  MCL algorithm has been selected. 
#> |-- INFO :  MCL path :  
#> |-- INFO :  mcl_dir: 
#> |-- INFO :  Number of clusters found: 8 
                       
# Display the heatmap of gene clusters
res <- top_genes(res)
#> |-- INFO :  One or several clusters contain less than 20 genes. Retrieving all genes 
#> |-- INFO :  Extracting genes with the highest mean correlation to the others... 
#> |-- INFO :  Extracting genes with the highest mean correlation to the others... 
#> |-- INFO :  Extracting genes with the highest mean correlation to the others... 
#> |-- INFO :  Extracting genes with the highest mean correlation to the others... 
#> |-- INFO :  Extracting genes with the highest mean correlation to the others... 
#> |-- INFO :  Extracting genes with the highest mean correlation to the others... 
#> |-- INFO :  Extracting genes with the highest mean correlation to the others... 
#> |-- INFO :  Extracting genes with the highest mean correlation to the others... 
#> |-- INFO :  Results are stored in 'top_genes' slot of the object. 
plot_heatmap(res)
#> |-- INFO :  cell_clusters is not NULL. Setting show_dendro to FALSE 
#> |-- INFO :  Centering matrix. 
#> |-- INFO :  Ceiling matrix. 
#> |-- INFO :  Flooring matrix. 
#> |-- INFO :  Ordering cells/columns using hierarchical clustering. 
#> |-- INFO :  Plotting heatmap. 
#> |-- INFO :  Plot is interactive... 
plot_heatmap(res, cell_clusters = Seurat::Idents(pbmc3k_medium))
#> |-- INFO :  Extracting cell identity. 
#> |-- INFO :  Centering matrix. 
#> |-- INFO :  Ceiling matrix. 
#> |-- INFO :  Flooring matrix. 
#> |-- INFO :  Plotting heatmap. 
#> |-- INFO :  Plot is interactive... 


## Method 2 - Conserve the same neighborhood 
## size
res <- gene_clustering(res, 
                       inflation = 2.2,
                       method="reciprocal_neighborhood")
#> |-- INFO :  Creating the input file for MCL algorithm. 
#> |-- INFO :  Converting distances into weights. 
#> |-- INFO :  Selecting only reciprocal neighbors. 
#> |-- INFO :  Writing the input file. 
#> |-- INFO :  Input file saved in :/var/folders/zy/wl3dj2_n76zfc8sdvny1q06c0000gn/T//RtmpL90b6O/65vDYGyn24.graph_input.txt 
#> |-- INFO :  MCL algorithm has been selected. 
#> |-- INFO :  MCL path :  
#> |-- INFO :  mcl_dir: 
#> |-- INFO :  Number of clusters found: 16 
                       
# Display the heatmap of gene clusters
res <- top_genes(res)
#> |-- INFO :  One or several clusters contain less than 20 genes. Retrieving all genes 
#> |-- INFO :  Extracting genes with the highest mean correlation to the others... 
#> |-- INFO :  Extracting genes with the highest mean correlation to the others... 
#> |-- INFO :  Extracting genes with the highest mean correlation to the others... 
#> |-- INFO :  Extracting genes with the highest mean correlation to the others... 
#> |-- INFO :  Extracting genes with the highest mean correlation to the others... 
#> |-- INFO :  Extracting genes with the highest mean correlation to the others... 
#> |-- INFO :  Extracting genes with the highest mean correlation to the others... 
#> |-- INFO :  Extracting genes with the highest mean correlation to the others... 
#> |-- INFO :  Extracting genes with the highest mean correlation to the others... 
#> |-- INFO :  Extracting genes with the highest mean correlation to the others... 
#> |-- INFO :  Extracting genes with the highest mean correlation to the others... 
#> |-- INFO :  Extracting genes with the highest mean correlation to the others... 
#> |-- INFO :  Extracting genes with the highest mean correlation to the others... 
#> |-- INFO :  Extracting genes with the highest mean correlation to the others... 
#> |-- INFO :  Extracting genes with the highest mean correlation to the others... 
#> |-- INFO :  Extracting genes with the highest mean correlation to the others... 
#> |-- INFO :  Results are stored in 'top_genes' slot of the object. 
plot_heatmap(res)
#> |-- INFO :  cell_clusters is not NULL. Setting show_dendro to FALSE 
#> |-- INFO :  Centering matrix. 
#> |-- INFO :  Ceiling matrix. 
#> |-- INFO :  Flooring matrix. 
#> |-- INFO :  Ordering cells/columns using hierarchical clustering. 
#> |-- INFO :  Plotting heatmap. 
#> |-- INFO :  Plot is interactive... 
plot_heatmap(res, cell_clusters = Seurat::Idents(pbmc3k_medium))
#> |-- INFO :  Extracting cell identity. 
#> |-- INFO :  Centering matrix. 
#> |-- INFO :  Ceiling matrix. 
#> |-- INFO :  Flooring matrix. 
#> |-- INFO :  Plotting heatmap. 
#> |-- INFO :  Plot is interactive...