R/select_genes.R
select_genes.Rd
This function selects genes based on k-nearest neighbour analysis. It takes a Seurat object or a gene expression matrix as input and computes, for each gene/feature, the distance to its k-th nearest neighbour (DKNN). A DKNN threshold is then set based on permutation analysis and FDR computation.
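The idea described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the toy matrix, the helper `dknn()`, the use of `1 - Pearson correlation` as the distance, the per-gene shuffling used as the permutation step, and the 5% FDR cutoff are all hypothetical choices made for the example.

```r
# Minimal base-R sketch of DKNN-based gene selection (illustrative only;
# NOT the package's implementation).
set.seed(123)

# Toy expression matrix: 60 genes x 40 cells. The first 20 genes share a
# common pattern (signal); the remaining 40 genes are pure noise.
n_cells <- 40
pattern <- rnorm(n_cells)
signal <- t(replicate(20, pattern + rnorm(n_cells, sd = 0.3)))
noise <- matrix(rnorm(40 * n_cells), nrow = 40)
expr <- rbind(signal, noise)
rownames(expr) <- paste0("gene", seq_len(nrow(expr)))

# Hypothetical helper: distance from each gene (row) to its k-th nearest
# neighbour, using 1 - Pearson correlation as the distance.
dknn <- function(mat, k) {
  d <- 1 - cor(t(mat))
  diag(d) <- Inf                       # exclude the gene itself
  apply(d, 1, function(x) sort(x)[k])
}

k <- 5
observed <- dknn(expr, k)

# Null distribution: shuffle each gene's values across cells independently,
# then recompute the DKNN values on the permuted matrix.
permuted <- t(apply(expr, 1, sample))
simulated <- dknn(permuted, k)

# For each candidate threshold, estimate the FDR as the ratio of simulated
# to observed DKNN values at or below it; keep the largest threshold whose
# estimated FDR does not exceed the cutoff.
fdr_max <- 0.05
candidates <- sort(unique(observed))
fdr <- vapply(candidates, function(th) {
  sum(simulated <= th) / max(1, sum(observed <= th))
}, numeric(1))
ok <- candidates[fdr <= fdr_max]
threshold <- if (length(ok) > 0) max(ok) else -Inf

# Genes whose DKNN falls at or below the threshold are considered informative.
selected <- names(observed)[observed <= threshold]
```

Genes that belong to a co-expressed group have close neighbours and therefore a small DKNN, whereas noisy genes sit far from everything else; comparing the observed values against the permuted ones is what turns that intuition into a threshold.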
A matrix, data.frame or Seurat object.
A character string indicating the method for computing distances (one of "pearson", "cosine", "euclidean", "spearman" or "kendall").
This parameter controls the fraction of genes with high DKNN (i.e. noise) whose neighborhood (i.e. associated distances) will be used to compute simulated DKNN values. A value of 0 means that all genes are used. A value close to 1 means that only genes with high DKNN (i.e. close to noise) are used.
An integer specifying the size of the neighborhood.
A feature/gene whose row sum is below this threshold will be discarded. Use -Inf to keep all genes.
A numeric value indicating the false discovery rate threshold (range: 0 to 100).
A character string indicating which slot to use from the input scRNA-seq object (one of "data", "sct" or "counts").
A logical indicating whether to skip the k-nearest-neighbors (KNN) filter. If TRUE, the filter is skipped and all genes are kept for the next steps.
If TRUE, correlations below 0 are set to zero (applies to "pearson", "cosine", "spearman" and "kendall"). This may increase the relative weight of positive correlations, as true anti-correlation may be rare.
An integer specifying the random seed to use.
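To illustrate the option described above that sets correlations below 0 to zero, here is a small base-R sketch. The toy genes and the conversion `distance = 1 - correlation` are assumptions made for the example; the package's internal conversion may differ.

```r
# Illustrative only: clamp negative correlations before deriving distances.
# Three toy genes measured in five cells.
expr <- rbind(
  geneA = c(1, 2, 3, 4, 5),
  geneB = c(2, 4, 6, 8, 10),   # perfectly correlated with geneA
  geneC = c(5, 4, 3, 2, 1)     # perfectly anti-correlated with geneA
)

cor_mat <- cor(t(expr), method = "pearson")

# Assumed conversion (not confirmed by the package): distance = 1 - correlation
dist_raw <- 1 - cor_mat

# Set correlations below 0 to zero, then convert to distances
cor_clamped <- pmax(cor_mat, 0)
dist_clamped <- 1 - cor_clamped
```

Under this assumed conversion, the distance between `geneA` and `geneC` drops from 2 to 1 once negative correlations are clamped, so an anti-correlated pair is treated like an uncorrelated one, while the distance between `geneA` and `geneB` is unchanged.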
A ClusterSet class object.
- Lopez F., Textoris J., Bergon A., Didier G., Remy E., Granjeaud S., Imbert J., Nguyen C. and Puthier D. TranscriptomeBrowser: a powerful and flexible toolbox to explore productively the transcriptional landscape of the Gene Expression Omnibus database. PLoS ONE, 2008;3(12):e4001.
# Restrict verbosity to info messages only.
set_verbosity(1)
# Load a dataset
load_example_dataset("7871581/files/pbmc3k_medium")
#> |-- INFO : Dataset 7871581/files/pbmc3k_medium was already loaded.
# Select informative genes
res <- select_genes(pbmc3k_medium,
                    distance = "pearson",
                    row_sum = 5)
#> |-- INFO : Computing distances using selected method: pearson
#> |-- INFO : Computing distances to KNN.
#> |-- INFO : Computing simulated distances to KNN.
#> |-- INFO : Computing threshold of distances to KNN (DKNN threshold).
#> |-- INFO : Selecting informative genes.
#> |-- INFO : Creating the ClusterSet object.
# Result is a ClusterSet object
is(res)
#> [1] "ClusterSet"
slotNames(res)
#> [1] "data" "gene_clusters"
#> [3] "top_genes" "gene_clusters_metadata"
#> [5] "gene_cluster_annotations" "cells_metadata"
#> [7] "dbf_output" "parameters"
# The selected genes
nrow(res)
#> [1] 293
head(row_names(res))
#> [1] "PTCRA" "ACRBP" "TUBB1" "SDPR" "HIST1H2AC" "C2orf88"