cellink.tl.external.run_scdrs
- cellink.tl.external.run_scdrs(adata, gs_file=None, gene_sets=None, src_species='human', trait_name=None, n_pcs=50, n_ctrl=1000, weight_opt='vs', ctrl_match_key='mean_var', n_mean_bin=20, n_var_bin=20, flag_return_ctrl_raw_score=False, flag_return_ctrl_norm_score=True, encode_sex=True, encode_age=True, additional_covariates=None, group_analysis=None, corr_analysis=None, gene_analysis=False, knn_n_neighbors=15, knn_n_pcs=20, min_genes=250, min_cells=50, prefix=None, save_results=True, return_adata=False)
Run scDRS (single-cell disease-relevance score) analysis on AnnData.
scDRS associates individual cells in single-cell RNA-seq data with disease GWAS, computing cell-level disease scores and performing downstream analyses.
- Parameters:
adata (AnnData) – AnnData object containing single-cell expression data.
gs_file (str or Path, optional) – Path to scDRS gene set file (.gs format).
gene_sets (dict, optional) – Dictionary with trait names as keys and (gene_list, gene_weights) tuples as values. Either gs_file or gene_sets must be provided.
src_species (str, optional, default='human') – Species of the input gene sets.
trait_name (str, optional) – Name of specific trait to analyze from gs_file. If None, analyzes all traits.
n_pcs (int, default=50) – Number of principal components to compute if not already present.
n_ctrl (int, default=1000) – Number of control gene sets for null distribution.
weight_opt ({'vs', 'uniform'}, default='vs') – Weighting option: ‘vs’ for variance-stabilization, ‘uniform’ for equal weights.
ctrl_match_key (str, default='mean_var') – Key for matching control genes (stored in adata.var after preprocessing).
n_mean_bin (int, default=20) – Number of bins for gene expression mean when matching control genes.
n_var_bin (int, default=20) – Number of bins for gene expression variance when matching control genes.
flag_return_ctrl_raw_score (bool, default=False) – Whether to return raw control scores.
flag_return_ctrl_norm_score (bool, default=True) – Whether to return normalized control scores.
encode_sex (bool, default=True) – Whether to include sex as a covariate.
encode_age (bool, default=True) – Whether to include age as a covariate.
additional_covariates (list of str, optional) – Additional covariates from dd.C.obs to include.
group_analysis (list of str, optional) – List of cell group annotations in dd.C.obs for group-level analysis.
corr_analysis (list of str, optional) – List of cell-level continuous variables in dd.C.obs for correlation analysis.
gene_analysis (bool, default=False) – Whether to perform gene-level correlation analysis.
knn_n_neighbors (int, default=15) – Number of neighbors for KNN graph (used in heterogeneity analysis).
knn_n_pcs (int, default=20) – Number of PCs for computing KNN graph.
min_genes (int, default=250) – Minimum number of genes for cell filtering.
min_cells (int, default=50) – Minimum number of cells for gene filtering.
prefix (str, optional) – Prefix for output files. If None, “scdrs” is used.
save_results (bool, default=True) – Whether to save results to files.
return_adata (bool, default=False) – Whether to return the AnnData object with scDRS scores added.
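The ctrl_match_key, n_mean_bin and n_var_bin parameters describe how control genes are matched to trait genes. The following is only an illustrative sketch of mean-variance binning in plain Python, not the scDRS implementation: it assumes genes are first split into rank-based mean bins, then into variance bins within each mean bin, and that each control gene is drawn from the same bin as its trait gene. The helper names (quantile_bins, mean_var_keys, sample_control_set) and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def quantile_bins(values, n_bins):
    """Assign each value to one of n_bins roughly equal-sized rank bins."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins


def mean_var_keys(means, variances, n_mean_bin, n_var_bin):
    """Label each gene 'meanbin.varbin'; variance is binned within each mean bin."""
    mean_bin = quantile_bins(means, n_mean_bin)
    members = defaultdict(list)
    for gene_idx, mb in enumerate(mean_bin):
        members[mb].append(gene_idx)
    keys = [None] * len(means)
    for mb, idxs in members.items():
        var_bin = quantile_bins([variances[i] for i in idxs], n_var_bin)
        for i, vb in zip(idxs, var_bin):
            keys[i] = f"{mb}.{vb}"
    return keys


def sample_control_set(target_genes, keys, rng):
    """Draw one control gene per target gene from the same mean-variance bin.

    Simplification (assumption): sampling is with replacement and may
    return the target gene itself.
    """
    by_key = defaultdict(list)
    for gene_idx, key in enumerate(keys):
        by_key[key].append(gene_idx)
    return [rng.choice(by_key[keys[g]]) for g in target_genes]


# Toy data: 200 genes with random mean and variance (hypothetical values).
rng = random.Random(0)
means = [rng.random() for _ in range(200)]
variances = [rng.random() for _ in range(200)]

keys = mean_var_keys(means, variances, n_mean_bin=4, n_var_bin=5)
targets = [0, 1, 2]
controls = sample_control_set(targets, keys, rng)
```

Repeating the last step n_ctrl times would give one control gene set per repetition, which is the role the null distribution plays in the n_ctrl description above.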
- Return type:
pd.DataFrame or tuple or AnnData
- Returns:
Depending on the analysis performed:
    - If only scores are computed: a DataFrame with scDRS scores.
    - If downstream analyses are requested: a tuple of DataFrames (scores, group_stats, cell_corr, gene_corr).
    - If return_adata=True: the AnnData object with scores in .obs.
- Raises:
ImportError – If scdrs package is not installed.
ValueError – If neither gs_file nor gene_sets is provided.
Examples
>>> # Basic scDRS analysis
>>> from cellink.tl.external import run_scdrs
>>> results = run_scdrs(
...     dd,
...     gs_file="traits.gs",
...     group_analysis=["cell_type"],
... )
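If no .gs file is at hand, one can be written from Python. This is a minimal sketch, assuming the tab-separated layout used by scDRS gene set files: a TRAIT and GENESET header, with GENESET given as comma-separated gene:weight pairs. The helper name write_gs_file and the temporary output location are hypothetical and not part of cellink.

```python
import csv
import tempfile
from pathlib import Path


def write_gs_file(gene_sets, path):
    """Write {trait: (genes, weights)} to a tab-separated .gs-style file."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerow(["TRAIT", "GENESET"])
        for trait, (genes, weights) in gene_sets.items():
            if len(genes) != len(weights):
                raise ValueError(f"{trait}: genes and weights differ in length")
            # One row per trait: GENE1:w1,GENE2:w2,...
            pairs = ",".join(f"{g}:{w}" for g, w in zip(genes, weights))
            writer.writerow([trait, pairs])
    return Path(path)


gene_sets = {"MyDisease": (["GENE1", "GENE2", "GENE3"], [1.5, 2.0, 1.8])}
gs_path = write_gs_file(gene_sets, Path(tempfile.mkdtemp()) / "traits.gs")
gs_text = gs_path.read_text()
```

The resulting path can then be passed as gs_file, as in the basic example above.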
>>> # With custom gene sets
>>> gene_sets = {
...     "MyDisease": (["GENE1", "GENE2", "GENE3"], [1.5, 2.0, 1.8])
... }
>>> results = run_scdrs(dd, gene_sets=gene_sets)
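Because gene_sets must map each trait name to a (gene_list, gene_weights) tuple, it can help to check a hand-built dictionary before starting a long run. The following is a sketch of such a pre-flight check; check_gene_sets is a hypothetical helper, and the rules it enforces (two-element tuple, equal lengths, numeric weights) are assumptions drawn from the parameter description, not necessarily the validation run_scdrs performs itself.

```python
from numbers import Real


def check_gene_sets(gene_sets):
    """Raise ValueError unless gene_sets looks like {trait: (genes, weights)}."""
    if not gene_sets:
        raise ValueError("gene_sets is empty")
    for trait, entry in gene_sets.items():
        if not isinstance(entry, tuple) or len(entry) != 2:
            raise ValueError(f"{trait}: expected a (gene_list, gene_weights) tuple")
        genes, weights = entry
        if len(genes) != len(weights):
            raise ValueError(
                f"{trait}: {len(genes)} genes but {len(weights)} weights"
            )
        # bool is a subclass of int, so exclude it explicitly.
        if not all(isinstance(w, Real) and not isinstance(w, bool) for w in weights):
            raise ValueError(f"{trait}: all weights must be numeric")
    return gene_sets


# A well-formed dictionary passes through unchanged.
gene_sets = check_gene_sets(
    {"MyDisease": (["GENE1", "GENE2", "GENE3"], [1.5, 2.0, 1.8])}
)

# A mismatched entry is reported with the offending trait name.
try:
    check_gene_sets({"Broken": (["GENE1", "GENE2"], [1.0])})
except ValueError as err:
    message = str(err)
```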