Skip to contents

hc_auto_cluster performs a full clustering pipeline on an AnnDataR object, including PCA, distance calculation, SNN graph construction, consensus clustering, UMAP embedding, and cluster hull generation. It prints a UMAP plot of the resulting clusters.

Usage

hc_auto_cluster(
  AnnDatR,
  layer = NULL,
  pca_components = 50,
  pca_transform = "log1p",
  distance_components = NULL,
  distance_method = "spearman",
  snn_neighbors = 20,
  snn_similarity = "euclidean",
  cluster_resolution = 6,
  cluster_method = "louvain",
  verbose = TRUE
)

Arguments

AnnDatR

An AnnDataR object.

layer

Optional layer name within the AnnDatR object to use instead of the default data.

pca_components

Number of principal components to compute (default is 50).

pca_transform

Data transformation method to apply before PCA. Options are "log1p", "sqrt", or "none" (default is "log1p").

distance_components

Number of principal components to be used. If NULL, uses Kaiser’s rule to determine the number of components.

distance_method

Distance metric to use (default is 'spearman'). This must be one of "euclidean", "maximum", "manhattan", "canberra", "binary", "minkowski", "pearson", "spearman" or "kendall".

snn_neighbors

Number of nearest neighbors to consider (default is 20).

snn_similarity

Distance metric to use for nearest neighbor search (default is "euclidean"). Other options include "cosine", "manhattan", and "hamming".

cluster_resolution

Resolution parameter for clustering (default is 6).

cluster_method

Clustering method to use: "louvain" (default) or "leiden".

verbose

Logical indicating whether to print progress messages (default is TRUE).

Value

An AnnDataR object with clustering and intermediate results added. The UMAP plot of the clusters is also printed.

Examples

adata_res <- hc_auto_cluster(example_adata, cluster_resolution = 8)
#> [1] "Kaiser's rule is above 80% variation. Success"
#> Found more than one class "dist" in cache; using the first, from namespace 'BiocGenerics'
#> Also defined by ‘spam’
#> Found more than one class "dist" in cache; using the first, from namespace 'BiocGenerics'
#> Also defined by ‘spam’
#> Building SNN based on a provided distance matrix
#> Computing SNN
#> Iteration: 0 *** value: 947.234
#> Iteration: 1 *** value: 212.952
#> Iteration: 2 *** value: 42.2765
#> Iteration: 3 *** value: 22.1534
#> Iteration: 4 *** value: 22.1534
#> Minimum: 22.1534
#> Joining with `by = join_by(cons_cluster)`
#> The 'umap-learn' Python package is not installed. Installing it now...
#> Using virtual environment '/home/runner/.virtualenvs/r-reticulate' ...
#> + /home/runner/.virtualenvs/r-reticulate/bin/python -m pip install --upgrade --no-user umap-learn

head(adata_res$uns$consensus_clustering)
#> # A tibble: 6 × 2
#>   gene            cluster
#>   <chr>           <chr>  
#> 1 ENSG00000002745 30     
#> 2 ENSG00000004660 6      
#> 3 ENSG00000006047 29     
#> 4 ENSG00000006059 27     
#> 5 ENSG00000006453 22     
#> 6 ENSG00000006740 6