R/gene_clustering.R
gene_clustering.Rd
This function performs gene clustering using the MCL algorithm. The method starts by creating a graph with genes as nodes and edges connecting each gene to its nearest neighbors. Then the method use the MCL algorithm to detect clusters of co-expressed genes (method argument).
gene_clustering(
object = NULL,
s = 5,
inflation = 2,
method = c("closest_neighborhood", "reciprocal_neighborhood"),
threads = 1,
output_path = NULL,
name = NULL,
keep_nn = FALSE
)
A ClusterSet object.
If method="closest_neighborhood", s is an integer value indicating the size of the neighbourhood used for graph construction. Default is 5.
A numeric value indicating the MCL inflation parameter. Default is 2.
Which method to use to build the graph. If "closest_neighborhood", creates an edge between two selected genes a and b if b is part of the kg closest nearest neighbors of a (with kg < k). If "reciprocal_neighborhood"), inspect the neighborhood of size k of all selected genes and put an edge between two genes a and b if they are reciprocally in the neighborhood of the other
An integer value indicating the number of threads to use for MCL.
a character indicating the path where the output files will be stored.
a character string giving the name for the output files. If NULL, a random name is generated.
Deprecated. Use 'method' instead.
A ClusterSet object
- Van Dongen S. (2000) A cluster algorithm for graphs. National Research Institute for Mathematics and Computer Science in the 1386-3681.
# Restrict vebosity to info messages only.
library(Seurat)
set_verbosity(1)
# Load a dataset
load_example_dataset("7871581/files/pbmc3k_medium")
#> |-- INFO : Dataset 7871581/files/pbmc3k_medium was already loaded.
# Select informative genes
res <- select_genes(pbmc3k_medium,
distance = "pearson",
row_sum=5)
#> |-- INFO : Computing distances using selected method: pearson
#> |-- INFO : Computing distances to KNN.
#> |-- INFO : Computing simulated distances to KNN.
#> |-- INFO : Computing threshold of distances to KNN (DKNN threshold).
#> |-- INFO : Selecting informative genes.
#> |-- INFO : Creating the ClusterSet object.
# Cluster informative features
## Method 1 - Construct a graph with a
## novel neighborhood size
res <- gene_clustering(res, method="closest_neighborhood",
inflation = 1.5, threads = 4)
#> |-- INFO : Compute distances between selected genes.
#> |-- INFO : Compute distances to KNN between selected genes.
#> |-- INFO : Computing distances to KNN.
#> |-- INFO : Creating the input file for MCL algorithm.
#> |-- INFO : Writing the input file.
#> |-- INFO : The input file saved in '/var/folders/zy/wl3dj2_n76zfc8sdvny1q06c0000gn/T/Rtmp7s9US9/R14z6ff8TV.input_mcl.txt'.
#> |-- INFO : Adding results to a ClusterSet object.
# Display the heatmap of gene clusters
res <- top_genes(res)
#> |-- INFO : Number of top genes is greater than the number of genes in cluster 7. All genes will be used.
#> |-- INFO : Number of top genes is greater than the number of genes in cluster 8. All genes will be used.
#> |-- INFO : Results are stored in top_genes slot of ClusterSet object.
plot_heatmap(res)
#> |-- INFO : cell_clusters is not NULL. Setting show_dendro to FALSE
#> |-- INFO : Centering matrix.
#> |-- INFO : Ceiling matrix.
#> |-- INFO : Flooring matrix.
#> |-- INFO : Ordering cells/columns using hierarchical clustering.
#> |-- INFO : Plotting heatmap.
#> |-- INFO : Plot is interactive...
plot_heatmap(res, cell_clusters = Seurat::Idents(pbmc3k_medium))
#> |-- INFO : Extracting cell identity.
#> |-- INFO : Centering matrix.
#> |-- INFO : Ceiling matrix.
#> |-- INFO : Flooring matrix.
#> |-- INFO : Plotting heatmap.
#> |-- INFO : Plot is interactive...
## Method 2 - Conserve the same neighborhood
## size
res <- gene_clustering(res,
inflation = 2.2,
method="reciprocal_neighborhood")
#> |-- INFO : Creating the input file for MCL algorithm.
#> |-- INFO : Writing the input file.
#> |-- INFO : The input file saved in '/var/folders/zy/wl3dj2_n76zfc8sdvny1q06c0000gn/T/Rtmp7s9US9/65vDYGyn24.input_mcl.txt'.
#> |-- INFO : Adding results to a ClusterSet object.
# Display the heatmap of gene clusters
res <- top_genes(res)
#> |-- INFO : Number of top genes is greater than the number of genes in cluster 6. All genes will be used.
#> |-- INFO : Number of top genes is greater than the number of genes in cluster 7. All genes will be used.
#> |-- INFO : Number of top genes is greater than the number of genes in cluster 8. All genes will be used.
#> |-- INFO : Number of top genes is greater than the number of genes in cluster 9. All genes will be used.
#> |-- INFO : Number of top genes is greater than the number of genes in cluster 10. All genes will be used.
#> |-- INFO : Number of top genes is greater than the number of genes in cluster 11. All genes will be used.
#> |-- INFO : Number of top genes is greater than the number of genes in cluster 12. All genes will be used.
#> |-- INFO : Number of top genes is greater than the number of genes in cluster 13. All genes will be used.
#> |-- INFO : Number of top genes is greater than the number of genes in cluster 14. All genes will be used.
#> |-- INFO : Number of top genes is greater than the number of genes in cluster 15. All genes will be used.
#> |-- INFO : Number of top genes is greater than the number of genes in cluster 16. All genes will be used.
#> |-- INFO : Results are stored in top_genes slot of ClusterSet object.
plot_heatmap(res)
#> |-- INFO : cell_clusters is not NULL. Setting show_dendro to FALSE
#> |-- INFO : Centering matrix.
#> |-- INFO : Ceiling matrix.
#> |-- INFO : Flooring matrix.
#> |-- INFO : Ordering cells/columns using hierarchical clustering.
#> |-- INFO : Plotting heatmap.
#> |-- INFO : Plot is interactive...
plot_heatmap(res, cell_clusters = Seurat::Idents(pbmc3k_medium))
#> |-- INFO : Extracting cell identity.
#> |-- INFO : Centering matrix.
#> |-- INFO : Ceiling matrix.
#> |-- INFO : Flooring matrix.
#> |-- INFO : Plotting heatmap.
#> |-- INFO : Plot is interactive...