Tutorial, code and examples of the D-SPIN framework for the preprint "D-SPIN constructs gene regulatory network models from multiplexed scRNA-seq data revealing organizing principles of cellular perturbation response" (bioRxiv)
D-SPIN is implemented in Python and MATLAB. The Python package is sufficient for most datasets with <100k cells and a few hundred conditions. The MATLAB implementation is more efficient on large datasets and can be deployed on clusters for parallelization using built-in parfor.
Install the Python package:
pip install dspin
The MATLAB code is available in the folder DSPIN_matlab and is executable after specifying paths to the saved data from the preprocessing with the Python package (see run_with_matlab=True below).
A tutorial intended for an invited book chapter is available at bioRxiv:
A guide to D-SPIN: constructing regulatory network models from single-cell RNA-seq perturbation data.
Two demos of D-SPIN are available on Google Colab:
- Demo1: reconstructs a regulatory network from simulated hematopoietic stem cell (HSC) TF expression data under single-gene perturbations simulated with the BEELINE framework (Pratapa et al., Nat. Methods, 2020).
- Demo2: reconstructs a regulatory network and response vectors using a subset of the immune dictionary dataset (Cui et al., Nature, 2024), where mice were treated with cytokines and lymph nodes were collected and profiled by scRNA-seq.
D-SPIN was tested with:
- python (3.9.18)
- anndata (0.10.3)
- matplotlib (3.8.2)
- scanpy (1.9.6)
- tqdm (4.65.0)
- leidenalg (0.10.1)
- igraph (0.10.8)
Note: other versions may work as well.
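To compare a local environment against the versions listed above, a small standard-library check along these lines may help. It is only a sketch: the helper name installed_versions is hypothetical, and it assumes the installed distribution names match the names in the list.

```python
import platform
from importlib.metadata import PackageNotFoundError, version

# Versions listed above as tested; other versions may also work.
TESTED_VERSIONS = {
    "anndata": "0.10.3",
    "matplotlib": "3.8.2",
    "scanpy": "1.9.6",
    "tqdm": "4.65.0",
    "leidenalg": "0.10.1",
    "igraph": "0.10.8",
}


def installed_versions(packages=TESTED_VERSIONS):
    """Return {package: installed version, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    print(f"python: installed {platform.python_version()}, tested 3.9.18")
    for name, have in installed_versions().items():
        print(f"{name}: installed {have}, tested {TESTED_VERSIONS[name]}")
```

A mismatch is not necessarily a problem, since the versions above are only the ones that were tested.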
D-SPIN can work with many perturbation types under the assumption that the perturbation conditions share the same core regulatory network:
- Genetic screens – Perturb-seq, CRISPR knockdown/activation, RNAi
- Chemical or signaling cues – drug treatments, growth-factor changes
- Physiological differences – healthy vs disease, different patients, time courses
- Spatial niches – local micro-environments from spatial transcriptomics
Because D-SPIN models the distribution of transcriptional states, it is designed for single-cell RNA-seq data. The model can also work with bulk RNA-seq data but the performance is limited.
Minimum fields expected by the examples below:
- adata.X: log-normalized (log1p) matrix after QC (filter low-quality cells and cells with high mitochondrial content; typically restrict to HVGs).
- adata.obs['sample_id']: condition label; cells with the same value form one perturbation group.
- adata.obs['batch']: batch label; perturbation effects are compared within each batch when controls exist.
- adata.obs['if_control']: True for controls, False otherwise.
Practical guidance:
- Aim for ≥ 25 cells per condition when possible.
- If no explicit controls exist, D-SPIN can still compute responses relative to the global average, but matched controls per batch are preferred.
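Before fitting, it can be useful to check the guidance above against the per-cell annotations. The following is a minimal sketch, not part of the dspin package: check_design is a hypothetical helper that takes any per-cell sequences (for example the adata.obs columns named above) and reports conditions below the cell-count guideline and batches without control cells.

```python
from collections import Counter


def check_design(sample_ids, batches, is_control, min_cells=25):
    """Flag small conditions and batches that lack control cells.

    Each argument is a per-cell sequence of equal length, for example
    adata.obs['sample_id'], adata.obs['batch'] and adata.obs['if_control'].
    Returns (small_conditions, batches_without_control).
    """
    sample_ids, batches, is_control = list(sample_ids), list(batches), list(is_control)
    if not (len(sample_ids) == len(batches) == len(is_control)):
        raise ValueError("per-cell inputs must have the same length")

    # Conditions with fewer cells than the guideline.
    cells_per_condition = Counter(sample_ids)
    small = {s: n for s, n in cells_per_condition.items() if n < min_cells}

    # Batches in which no cell is marked as control.
    batches_with_control = {b for b, c in zip(batches, is_control) if c}
    no_control = sorted(set(batches) - batches_with_control, key=str)
    return small, no_control


# Toy example: one small condition and one batch without controls.
samples = ["ctrl"] * 30 + ["drugA"] * 30 + ["drugB"] * 10
batch_labels = ["b1"] * 60 + ["b2"] * 10
control_flags = [True] * 30 + [False] * 40
small, no_control = check_design(samples, batch_labels, control_flags)
```

Conditions reported in small are candidates for merging or exclusion; batches reported in no_control would fall back to comparison against the global average.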
from dspin.dspin import DSPIN
model = DSPIN(adata, save_path, num_spin=adata.shape[1]) # gene-level
model.network_inference(
    sample_id_key='sample_id',
    method='pseudo_likelihood',
    directed=True,
    # optional priors / constraints:
    # sample_list=None,
    # perturb_matrix=None,  # shape: (n_samples, n_genes)
    # prior_network=None,   # binary matrix of likely edges (e.g., motif/ATAC prior)
    run_with_matlab=False,
    params={'stepsz': 0.05, 'lam_l1_j': 0.01}
)
model.response_relative_to_control(
    sample_id_key='sample_id',
    if_control_key='if_control',
    batch_key='batch'
)
Results saved to model:
- model.network – regulatory weights J
- model.responses – responses h for each condition
- model.relative_responses – responses relative to control (preferably within each batch)
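The idea behind relative responses can be illustrated with a short sketch. This is not the package's implementation; it assumes that the baseline for a condition is the mean response of the controls in the same batch, and that the global average over all conditions is used when a batch has no controls. The function and argument names are hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def relative_responses(responses, batch_of, control_samples):
    """Subtract a baseline from each condition's response vector.

    responses: {sample_id: list of floats}, one entry per condition
    batch_of: {sample_id: batch label}
    control_samples: set of sample_ids that are controls
    """
    global_mean = mean_vector(list(responses.values()))
    relative = {}
    for sample, h in responses.items():
        # Controls that share this sample's batch.
        controls = [responses[c] for c in control_samples
                    if batch_of[c] == batch_of[sample]]
        # Assumed fallback: global average when the batch has no controls.
        baseline = mean_vector(controls) if controls else global_mean
        relative[sample] = [x - b for x, b in zip(h, baseline)]
    return relative


# Toy example: batch b1 has a control, batch b2 does not.
h = {"ctrl1": [0.0, 1.0], "drugA": [1.0, 1.0],
     "drugB": [2.0, 4.0], "drugC": [1.0, 2.0]}
batches = {"ctrl1": "b1", "drugA": "b1", "drugB": "b2", "drugC": "b2"}
rel = relative_responses(h, batches, {"ctrl1"})
```

In this toy case drugA is compared with ctrl1, while drugB and drugC are compared with the average over all four conditions.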
Tips
- num_spin=adata.shape[1]: gene-level network
- directed=True: only supported with pseudo_likelihood
- prior_network (optional): prior knowledge on network edges (e.g., TF–motif binding, ATAC-seq data)
- perturb_matrix (optional): prior on direct perturbation targets (e.g., Perturb-seq)
- run_with_matlab=True: write variables to save_path for MATLAB inference on large datasets
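The optional priors are matrices indexed by samples and genes. The sketch below shows one way such matrices might be assembled from lists of names. The helper binary_matrix, the toy gene and sample names, the 0/1 encoding of perturb_matrix, and the row/column orientation of prior_network are all assumptions made for illustration; check the package documentation for the exact conventions it expects.

```python
def binary_matrix(row_names, col_names, pairs):
    """Return a len(row_names) x len(col_names) 0/1 matrix as a list of lists.

    pairs is an iterable of (row_name, col_name); each listed pair is set to 1.
    Pairs that mention an unknown name are skipped.
    """
    row_index = {name: i for i, name in enumerate(row_names)}
    col_index = {name: j for j, name in enumerate(col_names)}
    matrix = [[0] * len(col_names) for _ in row_names]
    for row, col in pairs:
        if row in row_index and col in col_index:
            matrix[row_index[row]][col_index[col]] = 1
    return matrix


# Hypothetical toy inputs; in practice genes would follow adata.var_names
# and samples would follow the order of conditions used for inference.
genes = ["GATA1", "SPI1", "CEBPA"]
samples = ["control", "GATA1_KO", "SPI1_KO"]

# Candidate edges, e.g. from motif or ATAC-seq evidence
# (assumed orientation: row = source gene, column = target gene).
prior_network = binary_matrix(genes, genes,
                              [("GATA1", "SPI1"), ("SPI1", "CEBPA")])

# Direct targets per condition, shape (n_samples, n_genes).
perturb_matrix = binary_matrix(samples, genes,
                               [("GATA1_KO", "GATA1"), ("SPI1_KO", "SPI1")])
```

The nested lists can be converted with numpy.asarray before being passed on, if an array is required.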
Program-level models first discover gene programs (via consensus oNMF), then infer a network over programs.
from dspin.dspin import DSPIN
model = DSPIN(adata, save_path, num_spin=20) # e.g., 20 programs
model.gene_program_discovery(
    num_repeat=10,
    seed=0,
    cluster_key='cell_type'  # optional: balance cell types
)
model.network_inference(
    sample_id_key='sample_id',
    method='mcmc_maximum_likelihood'
)
model.response_relative_to_control(
    sample_id_key='sample_id',
    if_control_key='if_control',
    batch_key='batch'
)
Extra outputs
- Program compositions under save_path/onmf
- Consensus program gene lists under save_path
Tips
- A practical heuristic is num_spin ≈ 5 × (number of major cell types/clusters), but keep num_spin ≤ 40 for interpretability.
- cluster_key lets D-SPIN down-sample over-represented cell types before oNMF to reduce overfitting.
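The two tips above can be made concrete with a small sketch. Neither function is part of dspin: suggest_num_spin simply encodes the heuristic, and balanced_subsample illustrates the general idea of capping over-represented cell types, which may differ from what the package does internally when cluster_key is set.

```python
import random
from collections import defaultdict


def suggest_num_spin(n_cell_types, per_type=5, cap=40):
    """Heuristic: about 5 programs per major cell type, at most 40 in total."""
    return min(per_type * n_cell_types, cap)


def balanced_subsample(labels, max_per_label, seed=0):
    """Return sorted cell indices with at most max_per_label cells per label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    keep = []
    for indices in by_label.values():
        # Only over-represented labels are down-sampled.
        if len(indices) > max_per_label:
            indices = rng.sample(indices, max_per_label)
        keep.extend(indices)
    return sorted(keep)


# Toy example: 100 T cells and 5 B cells, capped at 10 cells per type.
cell_types = ["T"] * 100 + ["B"] * 5
kept = balanced_subsample(cell_types, max_per_label=10)
```

With four major cell types the heuristic suggests 20 programs, and with ten it is capped at 40.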
Given a program-level model (model_program) and a gene-level model (model_gene), D-SPIN can regress program activities onto the gene network to nominate regulators.
# model_program : program-level DSPIN object
# model_gene : gene-level DSPIN object
model_gene.program_regulator_discovery(
    model_program.program_representation,
    sample_id_key='sample_id',
    params={'stepsz': 0.02, 'lam_l1_interaction': 0.01}
)
Outputs in model_gene:
- model_gene.program_interactions – regression coefficients (gene ↔ program)
- model_gene.program_activities – global activity for each program
Tips
- The two models should be built from adata objects with matching adata.obs (so each cell aligns between gene- and program-level representations).
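A quick sanity check for this alignment could look like the sketch below. The helper check_cells_aligned is hypothetical and compares cell identifiers only; it assumes the identifiers are unique, as AnnData obs_names normally are.

```python
def check_cells_aligned(names_gene, names_program):
    """Raise ValueError unless both inputs list the same cells in the same order."""
    names_gene, names_program = list(names_gene), list(names_program)
    if names_gene == names_program:
        return
    only_one = set(names_gene) ^ set(names_program)
    if only_one:
        raise ValueError(
            f"{len(only_one)} cell(s) are present in only one of the two objects"
        )
    raise ValueError("same cells but in a different order; reorder one object first")


# Typical use (adata_gene and adata_program are hypothetical variable names):
# check_cells_aligned(adata_gene.obs_names, adata_program.obs_names)
```

Running the check before program_regulator_discovery catches the case where one object was filtered or reordered after the other was created.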
The package includes helpers for module discovery (Leiden clustering) and plotting.
import dspin.plot as dp
# Example: program-level network
G, j_filt = dp.create_undirected_network(
    model.network,
    node_names=spin_name,  # list of program (or gene) names
    thres_strength=0.05
)
module_list = dp.compute_modules(G, resolution=1, seed=0)
dp.plot_network_heatmap(j_filt, module_list, spin_name_list=spin_name)
dp.plot_network_diagram(
    j_filt, module_list,
    pos=None,
    directed=False,
    weight_thres=0.15,
    spin_name_list_short=spin_name_short
)
dp.plot_response_heatmap(
    model.relative_responses,
    module_list,
    spin_name_list=spin_name,
    sample_list=model.sample_list
)
Tips
- resolution controls Leiden module granularity (higher values give smaller modules).
- For gene-level models, spin_name is typically gene names; for program-level models, use program labels or representative gene names.
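For intuition about what a strength threshold does to a network before module detection, here is a minimal sketch. It is an assumption, not a description of dp.create_undirected_network: it symmetrizes the matrix by averaging J[i][j] and J[j][i], ignores the diagonal, and keeps only edges whose absolute weight reaches the threshold. The function name filter_weak_edges is hypothetical.

```python
def filter_weak_edges(J, threshold):
    """Symmetrize a square matrix and zero out entries with |weight| < threshold."""
    n = len(J)
    filtered = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # self-interactions are ignored
            weight = (J[i][j] + J[j][i]) / 2
            if abs(weight) >= threshold:
                filtered[i][j] = weight
    return filtered


# Toy 3-node network: the weak 0-2 edge is removed, the others are kept.
J_toy = [
    [0.3, 0.2, 0.01],
    [0.2, 0.0, -0.1],
    [0.03, -0.1, 0.0],
]
J_filtered = filter_weak_edges(J_toy, threshold=0.05)
```

A higher threshold leaves a sparser graph, which tends to produce more and smaller modules at a fixed resolution.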
- Jiang, Jialong, et al. "D-SPIN constructs gene regulatory network models from multiplexed scRNA-seq data revealing organizing principles of cellular perturbation response." bioRxiv (2023).
- Jiang, Jialong, and Thomson, Matt. "A guide to D-SPIN: constructing regulatory network models from single-cell RNA-seq perturbation data." bioRxiv (2025).
- Pratapa, Aditya, et al. "Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data." Nature Methods 17.2 (2020): 147-154.
- Cui, Ang, et al. "Dictionary of immune responses to cytokines at single-cell resolution." Nature 625.7994 (2024): 377-384.


