This function performs gene clustering using the MCL algorithm. The method starts by creating a graph with genes as nodes and edges connecting each gene to its nearest neighbors. Then the method use the MCL algorithm to detect clusters of co-expressed genes (method argument).

gene_clustering(
  object = NULL,
  s = 5,
  inflation = 2,
  method = c("closest_neighborhood", "reciprocal_neighborhood"),
  threads = 1,
  output_path = NULL,
  name = NULL,
  keep_nn = FALSE
)

Arguments

object

A ClusterSet object.

s

If method="closest_neighborhood", s is an integer value indicating the size of the neighbourhood used for graph construction. Default is 5.

inflation

A numeric value indicating the MCL inflation parameter. Default is 2.

method

Which method to use to build the graph. If "closest_neighborhood", creates an edge between two selected genes a and b if b is part of the kg closest nearest neighbors of a (with kg < k). If "reciprocal_neighborhood"), inspect the neighborhood of size k of all selected genes and put an edge between two genes a and b if they are reciprocally in the neighborhood of the other

threads

An integer value indicating the number of threads to use for MCL.

output_path

a character indicating the path where the output files will be stored.

name

a character string giving the name for the output files. If NULL, a random name is generated.

keep_nn

Deprecated. Use 'method' instead.

Value

A ClusterSet object

References

- Van Dongen S. (2000) A cluster algorithm for graphs. National Research Institute for Mathematics and Computer Science in the 1386-3681.

Examples

# Restrict vebosity to info messages only.
library(Seurat)
set_verbosity(1)

# Load a dataset
load_example_dataset("7871581/files/pbmc3k_medium")
#> |-- INFO :  Dataset 7871581/files/pbmc3k_medium was already loaded. 

# Select informative genes
res <- select_genes(pbmc3k_medium,
                    distance = "pearson",
                    row_sum=5)
#> |-- INFO :  Computing distances using selected method: pearson 
#> |-- INFO :  Computing distances to KNN. 
#> |-- INFO :  Computing simulated distances to KNN. 
#> |-- INFO :  Computing threshold of distances to KNN (DKNN threshold). 
#> |-- INFO :  Selecting informative genes. 
#> |-- INFO :  Creating the ClusterSet object. 
                    
# Cluster informative features

## Method 1 - Construct a graph with a 
## novel neighborhood size
res <- gene_clustering(res, method="closest_neighborhood",
                       inflation = 1.5, threads = 4)
#> |-- INFO :  Compute distances between selected genes. 
#> |-- INFO :  Compute distances to KNN between selected genes. 
#> |-- INFO :  Computing distances to KNN. 
#> |-- INFO :  Creating the input file for MCL algorithm. 
#> |-- INFO :  Writing the input file. 
#> |-- INFO :  The input file saved in '/var/folders/zy/wl3dj2_n76zfc8sdvny1q06c0000gn/T/Rtmp7s9US9/R14z6ff8TV.input_mcl.txt'. 
#> |-- INFO :  Adding results to a ClusterSet object. 
                       
# Display the heatmap of gene clusters
res <- top_genes(res)
#> |-- INFO :  Number of top genes is greater than the number of genes in cluster 7. All genes will be used. 
#> |-- INFO :  Number of top genes is greater than the number of genes in cluster 8. All genes will be used. 
#> |-- INFO :  Results are stored in top_genes slot of ClusterSet object. 
plot_heatmap(res)
#> |-- INFO :  cell_clusters is not NULL. Setting show_dendro to FALSE 
#> |-- INFO :  Centering matrix. 
#> |-- INFO :  Ceiling matrix. 
#> |-- INFO :  Flooring matrix. 
#> |-- INFO :  Ordering cells/columns using hierarchical clustering. 
#> |-- INFO :  Plotting heatmap. 
#> |-- INFO :  Plot is interactive... 
plot_heatmap(res, cell_clusters = Seurat::Idents(pbmc3k_medium))
#> |-- INFO :  Extracting cell identity. 
#> |-- INFO :  Centering matrix. 
#> |-- INFO :  Ceiling matrix. 
#> |-- INFO :  Flooring matrix. 
#> |-- INFO :  Plotting heatmap. 
#> |-- INFO :  Plot is interactive... 


## Method 2 - Conserve the same neighborhood 
## size
res <- gene_clustering(res, 
                       inflation = 2.2,
                       method="reciprocal_neighborhood")
#> |-- INFO :  Creating the input file for MCL algorithm. 
#> |-- INFO :  Writing the input file. 
#> |-- INFO :  The input file saved in '/var/folders/zy/wl3dj2_n76zfc8sdvny1q06c0000gn/T/Rtmp7s9US9/65vDYGyn24.input_mcl.txt'. 
#> |-- INFO :  Adding results to a ClusterSet object. 
                       
# Display the heatmap of gene clusters
res <- top_genes(res)
#> |-- INFO :  Number of top genes is greater than the number of genes in cluster 6. All genes will be used. 
#> |-- INFO :  Number of top genes is greater than the number of genes in cluster 7. All genes will be used. 
#> |-- INFO :  Number of top genes is greater than the number of genes in cluster 8. All genes will be used. 
#> |-- INFO :  Number of top genes is greater than the number of genes in cluster 9. All genes will be used. 
#> |-- INFO :  Number of top genes is greater than the number of genes in cluster 10. All genes will be used. 
#> |-- INFO :  Number of top genes is greater than the number of genes in cluster 11. All genes will be used. 
#> |-- INFO :  Number of top genes is greater than the number of genes in cluster 12. All genes will be used. 
#> |-- INFO :  Number of top genes is greater than the number of genes in cluster 13. All genes will be used. 
#> |-- INFO :  Number of top genes is greater than the number of genes in cluster 14. All genes will be used. 
#> |-- INFO :  Number of top genes is greater than the number of genes in cluster 15. All genes will be used. 
#> |-- INFO :  Number of top genes is greater than the number of genes in cluster 16. All genes will be used. 
#> |-- INFO :  Results are stored in top_genes slot of ClusterSet object. 
plot_heatmap(res)
#> |-- INFO :  cell_clusters is not NULL. Setting show_dendro to FALSE 
#> |-- INFO :  Centering matrix. 
#> |-- INFO :  Ceiling matrix. 
#> |-- INFO :  Flooring matrix. 
#> |-- INFO :  Ordering cells/columns using hierarchical clustering. 
#> |-- INFO :  Plotting heatmap. 
#> |-- INFO :  Plot is interactive... 
plot_heatmap(res, cell_clusters = Seurat::Idents(pbmc3k_medium))
#> |-- INFO :  Extracting cell identity. 
#> |-- INFO :  Centering matrix. 
#> |-- INFO :  Ceiling matrix. 
#> |-- INFO :  Flooring matrix. 
#> |-- INFO :  Plotting heatmap. 
#> |-- INFO :  Plot is interactive...