Type: | Package |
Title: | Single-Cell Decomposition using Hierarchical Autoencoder |
Version: | 1.2.3 |
Maintainer: | Ha Nguyen <hvn0006@auburn.edu> |
Description: | Provides a fast and accurate pipeline for single-cell analyses. The 'scDHA' software package can perform clustering, dimension reduction and visualization, classification, and time-trajectory inference on single-cell data (Tran et.al. (2021) <doi:10.1038/s41467-021-21312-2>). |
License: | GPL-3 |
Encoding: | UTF-8 |
LazyData: | true |
Depends: | R (≥ 3.4) |
Imports: | matrixStats, foreach, doParallel, igraph, Matrix, uwot, cluster, Rcpp, RcppParallel, RcppAnnoy, methods, torch (≥ 0.3.0), RhpcBLASctl, coro |
LinkingTo: | Rcpp, RcppArmadillo, RcppParallel, RcppAnnoy |
RoxygenNote: | 7.3.3 |
Suggests: | testthat, knitr, mclust |
NeedsCompilation: | yes |
VignetteBuilder: | knitr |
URL: | https://github.com/duct317/scDHA |
BugReports: | https://github.com/duct317/scDHA/issues |
Packaged: | 2025-09-23 15:05:46 UTC; hanguyen |
Author: | Ha Nguyen [cre], Duc Tran [aut], Tin Nguyen [fnd], Hao Chen [ctb] |
Repository: | CRAN |
Date/Publication: | 2025-09-23 22:40:03 UTC |
Goolam
Description
Goolam dataset in list format, include scRNA-seq data and cell type information.
Usage
Goolam
Format
An object of class list
of length 2.
Goolam_result
Description
Result of processing Goolam dataset using 'scDHA' function.
Usage
Goolam_result
Format
An object of class list
of length 4.
scDHA
Description
The main function to perform dimension deduction and clustering.
Usage
scDHA(
data = data,
k = NULL,
method = "scDHA",
sparse = FALSE,
n = 5000,
ncores = 10L,
gen_fil = TRUE,
do.clus = TRUE,
sample.prob = NULL,
seed = NULL
)
Arguments
data |
Gene expression matrix, with rows represent samples and columns represent genes. |
k |
Number of clusters, leave as default for auto detection. Has no effect when |
method |
Method used for clustering. It can be "scDHA" or "louvain". The default setting is "scDHA". |
sparse |
Boolen variable indicating whether data is a sparse matrix. The input must be a non negative sparse matrix. |
n |
Number of genes to keep after feature selection step. |
ncores |
Number of processor cores to use. |
gen_fil |
Boolean variable indicating whether to perform scDHA gene filtering before performing dimension deduction and clustering. |
do.clus |
Boolean variable indicating whether to perform scDHA clustering. If |
sample.prob |
Probability used for classification application only. Leave this parameter as default, no user input is required. |
seed |
Seed for reproducibility. |
Value
List with the following keys:
cluster - A numeric vector containing cluster assignment for each sample. If
do.clus = False
, this values is alwaysNULL
.latent - A matrix representing compressed data from the input data, with rows represent samples and columns represent latent variables.
Examples
library(scDHA)
#Load example data (Goolam dataset)
data('Goolam'); data <- t(Goolam$data); label <- as.character(Goolam$label)
#Log transform the data
data <- log2(data + 1)
if(torch::torch_is_installed()) #scDHA need libtorch installed
{
#Generate clustering result, the input matrix has rows as samples and columns as genes
result <- scDHA(data, ncores = 2, seed = 1)
#The clustering result can be found here
cluster <- result$cluster
}
scDHA classification
Description
Perform classification of new data based on available data.
Usage
scDHA.class(
train = train,
train.label = train.label,
test = test,
ncores = 10L,
seed = NULL
)
Arguments
train |
Expression matrix of available data, with rows represent samples and columns represent genes. |
train.label |
A vector containing label for each sample in training data. |
test |
Expression matrix new data for classification, with rows represent samples and columns represent genes. |
ncores |
Number of processor cores to use. |
seed |
Seed for reproducibility. |
Value
A vector contain classified labels for new data.
Examples
library(scDHA)
#Load example data (Goolam dataset)
data('Goolam'); data <- t(Goolam$data); label <- as.character(Goolam$label)
#Log transform the data
data <- log2(data + 1)
#Split data into training and testing sets
set.seed(1)
idx <- sample.int(nrow(data), size = round(nrow(data)*0.75))
train.x <- data[idx, ]; train.y <- label[idx]
test.x <- data[-idx, ]; test.y <- label[-idx]
if(torch::torch_is_installed()) #scDHA need libtorch installed
{
#Predict the labels of cells in testing set
prediction <- scDHA.class(train = train.x, train.label = train.y, test = test.x,
ncores = 2, seed = 1)
#Calculate accuracy of the predictions
acc <- round(sum(test.y == prediction)/length(test.y), 2)
print(paste0("Accuracy = ", acc))
}
scDHA pseudo time inference
Description
Inferring pseudo-time data.
Usage
scDHA.pt(sc = sc, start.point = 1, ncores = 10L, seed = NULL)
Arguments
sc |
Embedding object, produced by |
start.point |
Starting point of the trajectory. |
ncores |
Number of processor cores to use. |
seed |
Seed for reproducibility. |
Value
List with the following keys:
pt - Pseudo-time values for each sample.
Examples
library(scDHA)
#Load example data (Goolam dataset)
data('Goolam'); data <- t(Goolam$data); label <- as.character(Goolam$label)
#Log transform the data
data <- log2(data + 1)
if(torch::torch_is_installed()) #scDHA need libtorch installed
{
#Generate clustering result, the input matrix has rows as samples and columns as genes
result <- scDHA(data, ncores = 2, seed = 1)
#Cell stage order in Goolam dataset
cell.stages <- c("2cell", "4cell", "8cell", "16cell", "blast")
#Generate pseudo-time for each cell, the input is the output from scDHA function
result <- scDHA.pt(result, start.point = 1, ncores = 2, seed = 1)
#Calculate R-squared value
r2 <- round(cor(result$pt, as.numeric(factor(label, levels = cell.stages)))^2, digits = 2)
}
scDHA visulization
Description
Generating 2D embeded data for visulation.
Usage
scDHA.vis(sc = sc, method = "UMAP", ncores = 10L, seed = NULL)
Arguments
sc |
Embedding object produced by the |
method |
Visualization method to use. It can be "UMAP" or "scDHA". The default setting is "UMAP". |
ncores |
Number of processor cores to use. |
seed |
Seed for reproducibility. |
Value
a list with the following keys:
pred - A matrix representing the 2D projection of single-cell data, where rows represent samples and columns represent latent components.
Examples
library(scDHA)
#Load example data (Goolam dataset)
data('Goolam'); data <- t(Goolam$data); label <- as.character(Goolam$label)
#Log transform the data
data <- log2(data + 1)
if(torch::torch_is_installed()) #scDHA need libtorch installed
{
#Generate clustering result, the input matrix has rows as samples and columns as genes
result <- scDHA(data, ncores = 2, seed = 1)
#Generate 2D representation, the input is the output from scDHA function
result <- scDHA.vis(result, ncores = 2, seed = 1)
#Plot the representation of the dataset, different colors represent different cell types
plot(result$pred, col=factor(label), xlab = "scDHA1", ylab = "scDHA2")
}