SpatialAgent: An autonomous AI agent for spatial biology

Hanchen Wang; Yichun He; Paula P. Coelho; Matthew Bucci; Abbas Nazir; Bob Chen; Linh Trinh; Serena Zhang; Kexin Huang; Vineethkrishna Chandrasekar; Douglas C. Chung; Minsheng Hao; Ana Carolina Leote; Yongju Lee; Bo Li; Tianyu Liu; Jin Liu; Romain Lopez; Tawaun Lucas; Mingyu Ma; Nikita Makarov; Lisa McGinnis; Linna Peng; Stephen Ra; Gabriele Scalia; Avtar Singh; Liming Tao; Masatoshi Uehara; Chenyu Wang; Runmin Wei; Ryan Copping; Orit Rozenblatt-Rosen; Jure Leskovec; Aviv Regev

doi:10.1101/2025.04.03.646459

Abstract

Advances in AI are transforming scientific discovery, yet spatial biology, a field that deciphers the molecular organization within tissues, remains constrained by labor-intensive workflows. Here, we present SpatialAgent, a fully autonomous AI agent dedicated for spatial-biology research. SpatialAgent integrates large language models with dynamic tool execution and adaptive reasoning. SpatialAgent spans the entire research pipeline, from experimental design to multimodal data analysis and hypothesis generation. Tested on multiple datasets comprising two million cells from human brain, heart, and a mouse colon colitis model, SpatialAgent’s performance surpassed the best computational methods, matched or outperformed human scientists across key tasks, and scaled across tissues and species. By combining autonomy with human collaboration, SpatialAgent establishes a new paradigm for AI-driven discovery in spatial biology.

1. Introduction

Modern AI is becoming a key tool for scientific discovery (1). Recently, Large language models (LLMs) (2) have shown promises in accelerating this progress by enabling reasoning, planning, and tool integration, through autonomous LLM-equipped agents (3). Autonomous agents are AI systems that iteratively perceive, plan, and act, allowing them to dynamically adapt to tasks with minimal human intervention. This makes them particularly valuable in more open-ended scientific research, where they integrate pre-existing knowledge (via LLMs and retrieval-augmented generation) with analytical actions (in connected tools), and iterative reasoning. In this they are distinct from both prior automated pipelines, which had to be hard-coded and hence could not adapt to the dynamic needs of new data analysis, and human-driven analysis, where a seasoned data scientist must iteratively engage with analysis tools, results, and knowledge, including through prompting of LLMs. Indeed, very recent studies have demonstrated the effectiveness of autonomous agentic systems in scientific workflows. Multi-agent reasoning has been used to iteratively refine hypotheses in biomedical research (4), while domain-specific toolsets have enabled agents to automate complex tasks, such as molecular dynamics simulations (5) and therapeutic reasoning (6). These advances underscore the transformative potential of AI-driven systems in accelerating scientific discovery across disciplines.

Spatial genomics, a rapidly growing field, offers a particularly compelling case in point. Spatial genomics aims to deciphers how biomolecules and cells are spatially organized within tissues to in homeostasis and disease (7). As an emerging area, driven by novel lab technologies that generate complex high dimensional data, spatial genomics relies on computational methods that are quite fragmented, require extensive human intervention and do not yet generalize well across diverse datasets and biological contexts. Moreover, users have to reason over multiple analysis approaches and complex concepts spanning molecules, cells and multicellular interactions, that require deep biological knowledge across many cell types, gene programs and interaction mechanisms to interpret. Thus, in practice, most studies demand substantial commitment from expert analysts and time-consuming work. We reasoned that LLM-driven autonomous agents could help drive innovative work in spatial genomics, both accelerating progress and helping make new discoveries.

Here, we introduce SpatialAgent, an LLM agent that autonomously navigates the entire spatial-biology gamut, from experimental design to multimodal analysis and data-driven hypothesis generation. Unlike established computational approaches in spatial genomics, which rely on predefined workflows and rigid models, SpatialAgent employs adaptive reasoning and dynamic tool integration, allowing it to adjust to new datasets, tissue types, and biological questions. It processes multimodal inputs, incorporates external databases, and supports human-in-the-loop interactions, enabling both fully automated and collaborative discovery. We evaluate SpatialAgent on multiple datasets spanning human brain, heart, and mouse colon, with tasks such as gene panel design, cell and tissue annotation, and pattern inference in cell-cell communication and pathway analysis. SpatialAgent outperforms existing computational baselines, matches or surpasses human experts in key biological tasks, and accelerates scientific workflows. Beyond efficiency, SpatialAgent enhances the interpretability of its AI-driven discoveries, bridging the gap between automation and human intuition and legacy knowledge. SpatialAgent sets a new standard for autonomous and collaborative research in spatial biology, with broader implications for biomedical research and precision medicine.

2. Results

Overview of SpatialAgent

We designed SpatialAgent, an LLM-based agent specialized for fully autonomous spatial-biology workflows, with the flexibility to optionally incorporate human interactions (Fig. 1a, Methods). It operates through a self-governing loop that integrates LLMs with external tools and databases. SpatialAgent is equipped with both predefined planning templates (akin to those analysts train to use as informal “playbooks”) and a curated set of domain-specific tools, which are critical for enhancing LLM performance in scientific problem-solving (8). It also adapts dynamically to human feedback, newly introduced tools, and previously unseen tasks (Supplementary Fig. 8).

Figure 1: Overview and modular design of SpatialAgent.

(a) Overview. SpatialAgent is an LLM-equipped autonomous agent that tackles a broad set of tasks in spatial biology, including (1) gene panel design for the experimental stage, (2) cell-type and tissue-niche annotation for the observational analysis stage, and (3) findings summarization and (4) generating hypotheses related to cell–cell interactions. SpatialAgent supports multimodal inputs and cross-species analyses. (b) Key modules. The action module (left) executes tasks such as retrieving reference datasets, converting gene names, verifying ligand–receptor interactions using existing databases, processing data with established software packages (e.g., numpy) or generating and executing custom code, while reasoning over and aggregating information from multiple sources. The memory module (top right) maintains both semantic memory (high-level objectives) and episodic memory (short-term steps and context). The planning module (bottom) manages task planning via chain-of-thought reasoning and self-reflection, iteratively refining plans to achieve specific goals.

SpatialAgent consists of three key modules (Fig. 1b): memory, planning, and action. The memory module maintains both semantic memory of high-level objectives and available tools (e.g., “I want to design a panel of 500 genes for mouse models with prostate cancers, treated with T-cell infiltration.”) and episodic memory of short-term steps and evolving context, ensuring continuity in task execution.

When a user queries, the planning module employs chain-of-thought reasoning and self-reflection prompts (Supplementary Information 1.5, Supplementary Fig.7), optionally leveraging predefined templates (Supplementary Information 1.4, Supplementary Fig.1, 2, 3) or generating plans de novo (Supplementary Information 1.7), to break down complex tasks into manageable steps. As execution progresses, the memory module iteratively refines stepwise plans, dynamically adapting to new information, such as user modifications or tool execution failures.

The action module executes tasks by selecting appropriate tools, such as retrieving relevant scRNA-seq datasets, converting gene names between species, or verifying ligand–receptor interactions via external databases. It also processes data using established libraries (e.g., scikit-learn (9), PyTorch (10), Scanpy (11)) or generates and executes custom code when needed, integrating information from multiple sources. The agent’s modular toolbox is easily extensible, enabling customization to suit user needs (Supplementary Information 1.3, 6).

SpatialAgent operates in either autonomous or co-pilot mode. The autonomous mode executes complex tasks end-to-end, leveraging pre-defined plan templates and generalizing to unseen requests (Supplementary Information 1.7), requiring no human intervention beyond the initial query. The co-pilot mode enables interactive engagement, allowing users to refine task definitions dynamically (e.g., pose follow-up questions) as the process unfolds (Supplementary Information 1.8), facilitating more adaptable and user-guided task execution.

To assess SpatialAgent’s performance we benchmarked it on three core use cases: gene panel design, cell and niche annotation, and cell-cell interaction analysis, spanning three plan templates (one per core use case), 19 specialized tools and five datasets. Specifically, we tested SpatialAgent on gene panel design for spatial profiling of human brain regions, comparing it to four established computational methods and 10 human experts; on multimodal cell and niche annotation of developing human heart tissue compared to seven human scientists; and on discovery of cell-cell interactions in the tissue remodeling in mice, for open ended discovery of key tissue patterns and associated hypotheses. In addition, we assessed SpatialAgent’s ability to generalize to unseen user queries (Supplementary Information 1.7), and to engage in two ongoing ‘real world’ lab experiments.

SpatialAgent out-performed most tools and human experts in gene panel design

To evaluate SpatialAgent’s ability to autonomously design gene panels, we benchmarked its performance against computational baselines, human scientists, and a hybrid setting where SpatialAgent refined initial designs from human scientists (Methods). We used a Visium spatial transcriptomic dataset of 12 sections from the human dorsolateral prefrontal cortex (DLPFC) from three adult human donors (12), capturing spatial gene expression across the six-layered cortical architecture. When a user queries SpatialAgent to design gene panels for the DLPFC, the agent updates its memory with the objective and available tools (all tools were accessible throughout). It then formulates a plan (i.e. a sequence of tool calls and decisions), to fulfill the task. This planning is scaffolded by predefined templates, analogous to how human analysts learn from tutorials or recommended workflows: they offer a structured yet adaptable blueprint that guides decision-making while allowing for context-specific variation. These templates help the agent correctly invoke domain-specific tools and reduce failure modes common in self-generated code, such as hallucinations or improper tool usage (mostly self-generated code). In this case, since the gene panel design aligns with a known template, SpatialAgent follows it (Fig. 2a), while dynamically adjusting steps based on intermediate results.

Figure 2: Gene panel design by SpatialAgent.

(a) Illustrative step wise agent autonomous workflow. Schematic of the first few steps in the SpatialAgent workflow for designing a gene panel in the dorsolateral prefrontal cortex (DLPFC). (b-f) SpatialAgent outperforms established computational baselines for cell type and spatial coordinate prediction. (b,c) Cell-type prediction accuracy (b, y axis) and relative improvements over computational baselines (c, x axis) for designing 50-500 gene panels by SpatialAgent or each of several established methods. Box plots show medians (center lines), interquartile ranges (boxes), and 1.5× interquartile ranges (whiskers). Circles denote outliers. Results are averaged over 10 runs across all 12 DLPFC samples. (d,e) Spatial coordinate prediction performance (d, y axis) and relative improvements (e, x axis) by SpatialAgent or each of several established methods. Results are averaged over 10 runs across all 12 DLPFC samples. Box plots as in (b,c). (f,g) Cell-type prediction accuracy (f, y axis) and relative improvements (g, x axis) for SpatialAgent, human scientists, and hybrid approaches in which SpatialAgent incorporates human-designed templates. Solid bars represent individual scientists, and the striped bars to their right show performance gains when combined with SpatialAgent. (we report the minimum, median, mean, and maximum improvements) averaged across all three panel sizes.) (h,i) Spatial coordinate prediction performance (h, y axis) and relative improvements (i, x axis) for SpatialAgent, human scientists, and hybrid approaches. As in (f, g), each bar represents a human scientist, with the paired bar indicating combined performance.

Across panel sizes from 50 to 500 genes, SpatialAgent’s panels consistently outperformed those from established computational pipelines in terms of cell-type prediction accuracy (Fig. 2b,c, 6.0-19.1% improvement) and spatial coordinate prediction (Fig. 2d,e), with R² gains of up to 47.1% for some methods, and performing on par with others (HVG, −1.4%). Additional results are (Fig. 2b-e, Supplementary Fig.17-20). Thus, SpatialAgent autonomously identifies gene sets that capture biological variance and spatial organization better than standard approaches.

We also compared SpatialAgent’s performance to that of 10 human experts for the design of gene panels of three sizes (50, 100, 150 genes). SpatialAgent outperformed 90% of human designs in cell-type prediction and 95% in spatial Y-coordinate prediction. Importantly, despite variations in individual human performance and workflow (Supplementary Information 2.2), integrating SpatialAgent with human expertise in the hybrid setting consistently improved outcomes for each human (maximum improvement: 935.1% in predicting Y-coordinates). To support such integration, SpatialAgent was designed to accept human-drafted gene panels as input and initialize its internal plan accordingly. It then evaluated, revised, or extended the design using its reasoning loop and toolset, effectively integrating into diverse human workflows without requiring standardization. Specifically, enhancing human designs with SpatialAgent improved cell-type prediction accuracy in 80% of cases (Fig. 2f,g) and spatial coordinate prediction in 90% of cases (Fig. 2h,i). Interestingly, incorporating human-designed panels also enhanced the final design compared to the agent alone, with 55% of hybrid designs outperforming SpatialAgent’s autonomous runs, highlighting the benefit of human-machine collaboration (Fig. 2f–i).

Reasoning and efficiency of autonomous panel design

We next analyzed SpatialAgent’s reasoning process, spatial structure preservation, and computational / cost efficiency relative to baseline methods and human scientists.

As part of its output, SpatialAgent’s provides detailed reasoning for each gene included in the panel, revealing that the Agent assigned gene importance scores by combining reference datasets, external marker databases, internal knowledge, and human scientist–designed templates (Fig. 3a). The agent had appropriately increased its confidence in gene selection by prioritizing genes that are supported by multiple corroborating sources. This integration across knowledge bases increases the likelihood the agent will overcome gaps in individual databases, reduce bias, and ensure robust and generalizable performance. Moreover SpatialAgent produces a transparent aggregation of information, providing interpretable, gene-level rationales in natural language, explicitly connecting each gene in the panel to relevant cell types and functional roles (Fig. 3a). This enhances biological interpretability and can expedite human expert validation.

Figure 3:

Reasoning, performance, and efficiency of SpatialAgent in gene panel design. (a) Agent reasoning in annotation. Illustrative examples of multiple sources of information - reference datasets, external marker databases, internal knowledge, and human scientist–designed templates - used by SpatialAgent to assign importance scores in the “Estimate Importance of Each Gene” step for gene panel design in the DLPFC dataset. (b-e) Performance on cell type separation by cortical layer. (b) UMAP embedding of spatial transcriptomic profiles (dots) consisting of 500 selected genes from Spapros, SpatialAgent, or the full gene set (21,000 genes) and colored by cortical layer annotation. (c) Clustering evaluation metrics (y axes, labeled on top) with 500-gene panels selected by each method (x axis) and with all 21,000 genes. Arrows: desired direction of alignment with spatial organization. (d,e) Spatial expression patterns of the top three genes selected by SpatialAgent (d) and Spapros (e). (f,g) Runtime and cost performance. Runtime (f, y axis, log scale) and cost estimates (g, y axis, log scale) for each method (x axis) across three runs. Human time is averaged over self-reported usage. Cost estimates are based on standard vendor rates (e.g., Amazon Web Service, OpenAI) for computational baselines and SpatialAgent. Human labor costs are projected assuming an annual salary of $100,000 USD (assuming 8 work hours per weekday) (Methods).

Because the dorsolateral pre-frontal cortex is organized in distinct cortical layers that define functionally distinct cell types and circuits (12), we also evaluated the clustering structure and spatial coherence associated with the gene panels from SpatialAgent, standard pipelines and human experts. Even when reducing the transcriptome to 500 genes, SpatialAgent’s panels preserve strong spatial organization and produce well-differentiated clusters (Fig. 3b). These are closely aligned with the underlying spatial structure by multiple clustering metrics (Silhouette, Davies-Bouldin, Calinski-Harabasz, and Local Label Agreement), outperforming all baseline approaches (Fig. 3c).

Moreover, the genes prioritized by SpatialAgent reflect known functional activities of the DLPFC (e.g., working memory, attention, and executive control) and provide spatially informative signals. For example, the three genes with highest importance scores in the SpatialAgent’s panels are key neuronal markers (GAD2 (GABAergic inhibitory neurons), SLC17A7 (glutamatergic excitatory signaling), and GRIN2B (an NMDA receptor subunit)) and display distinct spatial distributions (Fig. 3d,e), spanning cortical layers, but absent from adjacent white matter. In contrast, the top three selected genes from Spapros (13) (PCDH15, ATP1A2, NKAIN3) yielded weaker spatial enrichment.

SpatialAgent also provides significant improvements in both runtime and cost, completing its analysis in around 30 minutes, far faster than human-driven panel design and notably quicker than advanced computational methods, such as Spapros (Fig. 3f). SpatialAgent offered substantial cost and time savings relative to some (not all) computational pipelines, and compared to human expert work. Thus, SpatialAgent allows users to quickly and efficiently benefit from iteration, seamlessly integrating diverse knowledge bases, with the potential to be combined with expert knowledge in order to design robust panels for spatial-omics experiments.

Enhanced cell type and tissue niche annotation

Accurate and scalable annotation of spatially-localized cells and tissue niches remains a bottleneck in spatial biology, particularly when it requires considering multimodal information, such as expression profiles, anatomical annotations or histology. To evaluate SpatialAgent’s capabilities, we use a high-resolution molecular and spatial cell atlas of the developing human heart (14), which integrates scRNA-seq and MERFISH spatial measurements across 6 heart samples collected at 9 to 16 post-conception weeks (142,946 single-cell profiles and >1.5 million MERFISH-detected cells). We assessed SpatialAgent’s ability to reconstruct cell-type organization with both cellular and spatial fidelity. For cell-type annotation, we compared SpatialAgent to GPTCellType (15), CellTypist (16, 17), our human experts, and author-provided ‘ground truth’ using shared UMAP coordinates and ontology terms (Fig. 4b,d, Methods, Supplementary Information 3.2, 3.3). For tissue niche annotation, we compared SpatialAgent to human experts and Author-provided annotations (Fig. 4c), as we are not aware of an existing standard pipelines for this task (Supplementary Information 3.2)).

Figure 4: Cell type and tissue niche annotation by SpatialAgent.

(a) Illustrative workflow. SpatialAgent integrates multimodal information (i.e., anatomical images, MERFISH data) for tissue niche annotation, followed by sample-wise aggregation and refinement via collective intelligence. (b–d) Cell type annotations. (b) UMAP of cells colored by annotations from GPTCellType, CellTypist, a representative human scientist (La, second-best in accuracy), SpatialAgent, and the original study. Colors indicate eight major cell types, where ‘VSMCs’ denotes ‘vascular smooth muscle cells’. Visualizations and scores are aggregated across all samples using the author annotations as ground truth (same rule applies below). (c) Annotation performance: accuracy, macro precision, and micro precision (y axis) across methods (x axis). (d) Confusion matrices comparing annotations from CellTypist, GPTCellType, human scientist La, and SpatialAgent to ground truth, sharing the same coloring scale of 0-1. (e-f) Tissue-niche annotations. (e) Annotated tissue niches by SpatialAgent, human scientist La, and the original author. Colors denote all seven major niches (legend); where ‘unmatched’ indicates regions with no correspondence to author annotations (Methods). (f) Accuracy, macro precision, and micro precision (y axis) across methods (x axis). (g, h) Cost and time. Estimated cost (g, y axis, USD, log scale) and time (h, y axis, hr) for SpatialAgent and human scientists (x axis), estimated as in Fig. 3.

Overall, SpatialAgent produced consistent, high-quality annotations that overall aligned closely with the author-provided ground truth, outperformed GPTCellType and CellTypist, and was largely on par with the best human experts, in terms of accuracy and precision at both coarse and fine scale (Fig. 4b,d). CellTypist struggles reflect its computational design (assigning per-cell annotations using a pre-trained neural network, leading to noisier predictions), training bias toward immune cells, and design for scRNA-seq, rather than MERFISH data with a more limited feature space. GPTCell-Type and human experts failed to identify immune and neuronal cells, and human experts also miss vascular smooth muscle cells. While SpatialAgent performed well, it did not fully recover epicardial cells, where the author’s annotations distinguish between “epicardium-derived progenitor cells” and “epicardial” (as did GPTCellType, see Supplementary Fig.23-26), even though epicardial cells are relatively abundant in the dataset, and were present in both the queries and instructions. Indeed, GPTCellType and SpatialAgent failed to fully capture this tissue-specific feature, even when reference datasets were provided. LLM-based methods, including GPTCellType and SpatialAgent, tended to default to the most common annotations in their internal knowledge rather than incorporating dataset-specific biological context. In contrast, some human experts leveraged this context to provide more accurate epicardial cell annotations (Supplementary Information 3.5).

For tissue niche annotation, we compared SpatialAgent to human experts and Author-provided annotations (Fig. 4c), because we are not aware of existing standard pipelines for this task. As possibly the first method for automated tissue niche annotation, SpatialAgent effectively delineated anatomical regions by leveraging multimodal information from anatomical reference images to enhance spatial context. This allowed SpatialAgent to capture fine-grained spatial structures consistently across samples. Its performance benefits from key design choices, such as the removal of small clusters in UTAG (18) (Supplementary Information 3.4) and iterative refinement through sample-wise aggregation (below). SpatialAgent performed on par with or better than human experts in terms of accuracy and precision at coarse and fine scale (Fig. 4f), generalizing well across tissue regions. Notably, some human experts introduced creative annotations, such as labeling the outer layer as “Epicardium”, which was absent in the original Author-provided annotations. Since this discrepancy lacks a direct reference, we categorize this region as “unmatched” (Supplementary Information 3.2), as some of these could be instances of enhanced performance of a new human annotator compared to the original ones. Overall, SpatialAgent matched the performance of the best human annotators and outperformed the rest (Supplementary Information 3.5). Moreover, SpatialAgent dramatically reduced both annotation costs and time (Fig. 4e,g). Thus, SpatialAgent maintains high annotation quality while offering a scalable, cost-effective alternative to manual expert annotation for large-scale spatial transcriptomic studies.

Behavior analysis in multimodal annotation

To understand the basis of SpatialAgent’s improved annotation quality we analyzed its behavior and choices. For example, GPTCellType annotated Leiden Cluster 18 as cardiac fibroblasts, whereas SpatialAgent annotated them as neurons, consistent with the Authors’ annotation (Fig. 5a). While GPTCellType took a gene centric approach that relied on expression of extracellular matrix-related genes (e.g., POSTN, COL2A1, and COL9A2), which are commonly associated with fibroblasts, SpatialAgent combined transferred cell-type composition and recognizes a high neuronal cell proportion in the region (0.54), along with neuronal or neural development markers (NRXN1). SpatialAgent also recognized an overlap with cardiac-related genes with neuronal markers highlighting a potential caveat, which gene-centric methods may overlook.

Figure 5: Multimodal cell and niche annotation with SpatialAgent.

(a) Comparison of GPT-CellType and SpatialAgent annotations. Reasoning underlying the annotation of Leiden Cluster 18 (red) from MERFISH data by GPTCelltype (middle) and SpatialAgent (bottom). GPTCellType assigns cell types based on the top 20 differentially expressed genes in the cluster (top), annotating the cluster’s cells as cardiac fibroblasts, while SpatialAgent incorporates additional transferred cell-type composition information, suggesting a neuronal identity (with caveats) and matching the authors’ annotation (Supplementary Fig.22). (b) Collective-intelligence design for tissue-niche annotation in SpatialAgent. Sample-wise annotations (left) are refined by aggregating reasoning across samples (right), improving final niche assignments. Incorrect annotations in the initial sample-wise outputs are marked in red (left). This process optionally integrates molecular and cell-type information to enhance the performance.

Similarly, SpatialAgent enhanced tissue niche annotation through a collective intelligence approach, refining spatial assignments across samples by integrating multimodal reasoning (Fig. 4b). The agent’s initial sample-wise annotations included several inconsistencies when transferring labels from the example anatomical image provided to the spatial transcriptomic data. For example, it mislabeled the ‘Right Atrium’ of one sample as ‘Tricuspid Valve’, and the ‘Left Ventricle’ of another sample as ‘Pulmonary Vein’. (Fig. 4b, left). These initial discrepancies highlight the challenge of localized annotation variability, partially due to the limitations of visual reasoning in current LLMs (19). However, by aggregating spatial reasoning across all samples, SpatialAgent systematically resolved these ambiguities, leading to a coherent and correct anatomical interpretation of these regions in all samples (Fig. 4b, right). Overall, SpatialAgent integrated anatomical reference images, molecular profiles, and cell-type information, to maintain annotation consistency across samples, surpassing traditional per-sample annotation approaches. The use of a standardized naming scheme further ensures comparability across datasets.

Thus, SpatialAgent can effectively automate and standardize cell and tissue niche annotation, providing a biologically meaningful and spatially coherent interpretation of spatial transcriptomics data.

Mining data patterns, summarizing findings, and proposing hypotheses with SpatialAgent

We next evaluated SpatialAgent’s capability to extract complex biological insights from spatial transcriptomics data. To avoid any knowledge contamination, we used a recent mouse colitis study (20), which was published after GPT4o’s knowledge cutoff date (Methods). This dataset consists of MERFISH analysis of 940 genes in 52 tissue sections from 15 mice across four conditions: prior to DSS-induced inflammation, early in colitis, during peak inflammation and after a recovery period. The study includes author-annotated cells and spatial neighborhoods.

Upon receiving a user query (e.g., “I want to infer cell-cell interactions in the tissue and how they change over time. <some additional descriptions on the data file>”), SpatialAgent autonomously executed a multi-step analysis pipeline: it first summarized condition-specific, cell-type compositions and spatiotemporal changes, and then computed cell–cell communication scores using LIANA+(21), a framework integrating cellChat (22), cellPhoneDB (23), and additional gene set enrichment and factor analysis steps. It culminated by generating a detailed 7,000-word report that uniquely operates in the interaction mode, enabling iterative refinements through user-specified follow-up queries and new tasks via “memory hacking” (Fig. 6a, Supplementary Fig. 15).

Figure 6: Mining complex data patterns with SpatialAgent.

(a) Overview of the workflow with the interaction mode of SpatialAgent (Methods and Supplementary Information). After SpatialAgent autonomously executes a sequence of actions, the user can ask follow-up questions or initiate a new task. (b) Comparison of interpretations between the original authors and SpatialAgent. Shared (top) and distinct (bottom) findings between the authors (left) SpatialAgent (right) analyses.

Comparing SpatialAgent’s interpretations with the original study, we observed broad agreement on key findings, particularly regarding inflammation-associated fibroblasts (IAFs) and significant immune infiltration (Fig. 6b), which are hallmarks of IBD (24) and were particularly highlighted in the original paper (20). Notably, SpatialAgent uncovered additional insights, such as enhanced TGF-β signaling, submucosal remodeling, and fibroblast–pericyte interactions—elements that were not explicitly emphasized in the original study. These findings suggest a pivotal role for TGF-β-induced fibroblast activation in tissue repair and fibrosis (25). In particular, SpatialAgent’s analysis proposes fibroblast polarization, mediated through TGF-β and IL-11 signaling, as a critical process orchestrating tissue regeneration (26). SpatialAgent proposes that the observed interplay between IAFs and pericytes appears to drive extracellular matrix remodeling, a mechanism that aligns with prior studies implicating IL-11 in fibrosis and tissue repair (27).

These insights could have significant implications. Identifying key signaling pathways, such as TGF-β and IL-11, offers potential therapeutic targets, and the distinct spatial transcriptomic signatures uncovered by SpatialAgent might be promising biomarker candidates for assessing disease severity and predicting treatment responses.

Adding customized genes to an existing panel for a specialized system

Finally, we applied SpatialAgent to optimize experimental design in an ongoing wet lab study. Specifically, we tasked SpatialAgent with selecting 100 additional genes to enhance the Xenium 5k pan-tissue panel for a study on prostate cancer mouse models under different treatment conditions (Fig. 7a). This process involved a hybrid collaboration between SpatialAgent, human scientists, and suppliers to ensure the selection was data-driven, biologically informed, and manufacturability-compliant.

Figure 7: Designing 100 customized genes alongside the Xenium 5k pan-tissue panel for mouse models of prostate cancer under various treatments.

(a) Overview of experimental workflow. (b,c) Improved cell type distinctions with Xenium+SpatialAgent panel. (b) UMAP embedding of cell profiles using different gene subsets, colored by cell-type annotations. Xenium: standard 5k pan-tissue panel; Xenium + Random: 5k panel combined with 100 randomly selected additional genes; Xenium + SpatialAgent: 5k panel with 100 genes selected by SpatialAgent; Full Set: complete profiles from the reference scRNA-seq dataset. (c) Clustering metrics and cell-type accuracy (y axis) for each method (x axis). Xenium + Random results are averaged over three independent random-gene selections. Arrows: direction of improved performance. (d) SpatialAgent -selected genes are enriched for key processes (Methods). Enrichment (node size, -log(P value)) and overlap (edge width, Jaccard index) of SpatialAgent selected genes with genes from different pathways (text boxes), and the specific overlapping gene names. (e) Cell–cell interaction scores are enhanced with Xenium+SpatialAgent panel. Strengths (from CellPhoneDB) of predicted cell-cell interactions from Basal and Fibroblast populations to immune cells, using Xenium (left) or Xenium + SpatialAgent genes (right). (Methods).

To systematically evaluate the impact of these customized gene panels, we first benchmarked their selection against a reference scRNA-seq dataset from a similar prostate cancer system (28). This ensured that the panel captured biologically relevant functions and pathways. The additional 100 SpatialAgent -selected genes improved the resolution of stromal, immune, and epithelial compartments, particularly in regions critical to prostate cancer progression and immune interactions (Fig. 7b,c, Supplementary Fig. 75-74).

Specifically, Xenium+SpatialAgent enhanced the definition of basal epithelial cells, fibroblasts, and prostate microvascular endothelial cells, leading to clearer population boundaries (Fig. 7b). While the standard Xenium panel performed slightly better in classifying classical monocytes and some T cell populations, Xenium+SpatialAgent provided superior resolution for stromal and epithelial compartments: key components in prostate cancer progression (Supplementary Information 5.2). Moreover, Xenium+SpatialAgent substantially improved clustering quality across multiple metrics (Fig. 7c). The selected genes spanned key overlapping processes (Fig. 7d) enriched in a reference dataset with notable T-cell infiltration. Analysis using the enhanced Xenium+SpatialAgent set also refined inferred cell–cell interactions within the tumor microenvironment, particularly communication between basal epithelial, fibroblast, and immune compartments (Fig. 7e). Interaction strengths, computed via CellPhoneDB, revealed distinct signaling differences, refining our understanding of immune–tumor crosstalk.

We further investigated how SpatialAgent’s customized gene selections enhance the detection of cell-cell interactions within the tumor microenvironment, specifically examining communication between basal epithelial cells, fibroblasts, and immune cells (Fig. 7e). By calculating interaction strengths using CellPhoneDB, we revealed distinct signaling patterns that provide putative insights into immune-tumor crosstalk. The Xenium-only panel (Supplementary Fig. 76) primarily identified fibroblast-immune interactions involving modulatory ligands such as Jag1-Notch2 and Vcan-Tlr2. In contrast, incorporating SpatialAgent’s customized design (Supplementary Fig. 77) highlighted the importance of laminin-integrin signaling networks previously undetected. This enhanced approach revealed cell type-specific communication patterns: fibroblasts engaged in numerous robust interactions across immune subtypes, while basal epithelial cells displayed fewer but functionally important integrin-based signals. Of particular interest, our integrated analysis uncovered distinct regulatory circuits in specific immune populations (e.g., Igf1-Insr in Tregs) and emphasized the role of basal epithelial cells in shaping the tumor microenvironment through targeted signaling pathways. These refined interaction maps demonstrate the critical importance of optimized gene panel selection for accurately dissecting complex cellular communications and may inform future therapeutic strategies targeting these pathways.

3. Discussion

Computational analysis in spatial genomics has been a bottleneck and SpatialAgent introduces a substantial shift that will accelerate progress and empower scientists. By uniting LLM-based reasoning, dynamic tool execution, and multi-modal data analysis within a self-governing system, it merges the interpretability and adaptability typically associated with expert-driven strategies, while offering the scalability and throughput gains of computational pipelines. In addition to substantially streamlining existing workflows, which were difficult to automate, it enables AI-driven hypothesis generation, guided experimentation, and faster data-to-discovery loops.

One of the most significant advantages of autonomous AI agents in this context is the potential to democratize sophisticated analyses. Current spatial biology pipelines often require specialized computational skills and substantial resources, creating barriers for labs with limited infrastructure or for bench scientists who may not be as well versed in coding or computational analysis. By automatically selecting tools, orchestrating analyses, and making interpretable decisions, SpatialAgent —and AI agents like it—can empower researchers to execute optimal workflows without extensive computational expertise, and combine them with their unique human abilities to formulate interesting questions, and engage with the agent. At the same time it liberates computational scientists from the need to perform relatively rote analyses, and allows them to focus on developing novel analytic strategies, new methods, and deeper analyses. Thus, this broader access has tangible benefits for the research community, from speeding up routine annotations to accelerating the identification of emergent biological patterns that might otherwise be overlooked.

An agent based approach also promises greater synergy between computational predictions and wet-lab validation. Traditionally, refining experimental designs is a lengthy, iterative process: initial hypotheses are tested in the lab, new data are generated, and computational analyses are performed offline. In contrast, AI agents can offer near-real-time feedback and suggest potential avenues for hypothesis refinement, enabling a more “closed loop” where computational insights more directly guide wet-lab experiments. By quickly highlighting the most promising markers or pathways, SpatialAgent can help researchers iterate more efficiently, saving time, resources, and, importantly, allowing investigators to focus on creative exploration rather than manual data wrangling.

Nevertheless, challenges remain. First, building deeper domain-specific knowledge into the system will be crucial, particularly for niche or emerging biological processes underrepresented in current training data. Second, unlike standard computational analyses, agent-driven approaches can have the same limitations as humans, including biases, hallucinations, and other errors. Automated uncertainty quantification would strengthen trust by distinguishing between high- and low-confidence predictions, allowing scientists to concentrate validation efforts where it is most needed. Furthermore, while multi-agent architectures hold promise for distributing specialized tasks among dedicated “expert” agents, their success depends on robust coordination protocols and reasoned consensus-building to avoid compounding errors or generating contradictory outputs (29).

Looking ahead, we envision AI autonomous agents evolving beyond passive analysts into active collaborators capable of context-aware reasoning and counterfactual experimentation. With refined planning algorithms, such agents could propose targeted experimental designs to test competing mechanistic hypotheses, serving as intelligent research partners who generate new questions rather than merely answering old ones. Coupled with advancements in large-scale single-cell multi-omics, clinical diagnostics, and real-time imaging, the next generation of AI-driven systems may redefine the boundaries of biomedical discovery. Ultimately, by reducing drudgery, enhancing interpretability, and unleashing creative potential, SpatialAgent illustrates how autonomous AI agents can save time, augment human ingenuity, and accelerate breakthroughs in spatial biology and beyond.

Methods Summary

SpatialAgent

SpatialAgent’s core framework consists of three key components: a memory module that maintains both semantic and episodic information, a planning module that employs chain-of-thought reasoning for task decomposition, and an action module that executes specialized functions through a curated toolbox. The agent operates through a self-governing loop where it first encodes objectives and available tools in memory, formulates a concrete plan with sequential steps, executes these actions while recording outcomes, and iteratively refines its approach based on results and potential execution failures. This architecture enables adaptation to new datasets, tissue types, and biological questions without requiring predefined rigid workflows. Full implementation details and prompt instructions are provided in Supplementary Information.

Co-pilot mode

To support a co-pilot mode for SpatialAgent, memory hacking was employed, where the agent’s memory is updated based on the user’s new input each time. This allows SpatialAgent to adjust its plans and actions or provide useful content based on the user’s queries. Further details are provided in Supplementary Information.

Computational Baselines

SpatialAgent was compared against established computational methods across each of three primary use cases. For gene panel design, it was benchmarked against (1) selection of highly variable genes (HVG) with Seurat (30), (2) GeneBasis (31), (3) Persist (32), and (4) Spapros (13). These range from simple variance-based methods to dedicated methods for spatial transcriptomics. For cell type annotation, SpatialAgent was compared with CellTypist (17) (a traditional approach), and GPTCellType (15) (an LLM based approach). For cell-cell interaction analysis, SpatialAgent was assessed for its ability to integrate tools from the LIANA framework and generate biologically meaningful insights. Detailed implementation parameters for all baseline methods are provided in Supplementary Information.

Human scientist performance

To benchmark SpatialAgent against human expertise, expert scientists were recruited with background in computational method development, experimental spatial transcriptomics, and biological data analysis. For the gene panel design task, 10 scientists were provided with the DLPFC dataset and asked to design gene panels of three different sizes (i.e., 50, 100, and 150 genes). For the cell type and tissue niche annotation task, 7 scientists annotated the developing heart dataset. Participants were explicitly instructed not to use LLMs during their analysis but were otherwise free to employ any tools or resources typically used in their workflow. The approaches, time spent, and final results were documented through a standardized form. Full task descriptions and instructions provided to participants are detailed in Supplementary Information.

Evaluation metrics

For gene panel design, metrics were employed as previously described for Persist and Spapros: cell-type prediction accuracy, spatial coordinate prediction performance, and clustering qualities. These metrics assess both the biological relevance of selected genes and their ability to capture spatial organization within tissues. For cell and niche annotation, performance was evaluated by accuracy, macro precision, and micro precision against author-provided annotations as ground truth. Metrics that are difficult to define or distinguish between baselines (e.g., expression constraints from Spapros) were avoided to ensure fair and interpretable comparisons.

Code availability

The implementation of SpatialAgent and tutorial notebooks for reproducing the results in this manuscript are available at https://github.com/Genentech/SpatialAgent.

Supplementary Information

1. SpatialAgent

1.1. Configurations of base LLMs

SpatialAgent development primarily utilized OpenAI’s GPT-4o model (version released on August 6th, 2024, knowledge cutoff October 2023). GPT-4o supports a context window of 128,000 tokens and can generate responses up to 16,384 tokens in length, as in OpenAI’s documentation: https://platform.openai.com/docs/models.

SpatialAgent was also adjusted for Anthropic’s Claude-3.5-Sonnet (released on October 22, 2024, training data cutoff April 2024). Claude-3.5-Sonnet supports a context window of 200,000 tokens and can generate up to 8,192 tokens per response, according to its documentation: https://docs.anthropic.com/en/docs/about-claude/models. Without further adaptations, we found that the current agent framework can directly work with Claude-3.7-Sonnet (released on Fed 19, 2025).

1.2. LangChain framework

To enable structured reasoning, dynamic tool invocation, and memory persistence within SpatialAgent, LangChain was used, as a modular framework for building LLM-powered applications. LangChain allows us to compose multi-step workflows that incorporate retrieval-augmented generation (RAG), structured decision-making, and domain-specific tools.

In SpatialAgent, LangChain is employed with two key aspects:

Tool curation and orchestration: LangChain’s agent framework is used to orchestrate calls between LLMs and specialized tools for spatial biology. Agents dynamically decide which tools to invoke based on user queries or analysis objectives, ensuring efficiency in various tasks.
Memory and context handling: To maintain consistency across multi-turn interactions and complex workflows, LangChain’s memory modules are used, allowing intermediate outputs (e.g., retrieved datasets, computed gene scores) to persist across reasoning steps.

1.3. Tools developments

Three main types of curated tools were used in SpatialAgent. The tool collection below can be readily extended, allowing for customization and integration of additional functionalities to meet specific research needs. This modular design enables researchers to incorporate their own specialized tools, databases, or analytical methods while maintaining compatibility with the existing framework.

1.3.1. Databases

Multiple databases were integrated to support retrieving reference datasets and conduct basic analysis, leveraging their structured metadata and large-scale annotations.

Chan-Zuckerberg Initiative (CZI) CELLxGENE is a scalable single-cell data platform offering access to over 50 million unique cell profiles with standardized metadata, facilitating meta-analyses, cross-dataset integration, and computational modeling (33). CELLxGENE was accessed via its Python API at https://chanzuckerberg.github.io/cellxgene-census/ on November 26, 2024.

The retrieved database, built on November 25, 2024, follows census schema version 2.1.0 and dataset schema version 5.2.0. It comprises 135,560,133 total cell profiles from 1003 datasets, including 71,730,590 unique cell profiles, annotated across 780 distinct cell types, 63 tissues, from 22,009 human donors and 4,752 mouse models.

PanglaoDB is a web-based resource for exploring mouse and human single-cell RNA-seq data, integrating over 1,000 experiments with more than 4 million cell profiles (34). It offers a user-friendly interface, standardized pre-processing pipelines, cell-type inference, and a curated marker gene compendium to enhance data accessibility and biological insights. The database was downloaded in TSV format from https://panglaodb.se/markers.html on May 20, 2024 (last update March 27, 2020 (10:44:00 CET)). The database contains 8,286 associations, spanning 178 cell types, 4,679 gene symbols, and 29 tissues.

CellMarker2 provides a manually curated collection of experimentally supported markers for 2,578 cell types across 656 tissues in humans and mice, totaling 83,361 tissue-cell type-marker associations (35), across 26 marker types (e.g., protein-coding genes, lncRNAs) derived from 24,591 studies. The database was downloaded in csv format from http://www.bio-bigdata.center/CellMarker_download.html on Oct 20, 2024.

PROGENy provides footprints of pathway activity by focusing on transcriptionally responsive genes rather than pathway components themselves (36). The database contains consensus gene signatures for 14 key signaling pathways (EGFR, Hypoxia, JAK-STAT, MAPK, NFkB, p53, PI3K, TGFb, TNFa, Trail, VEGF, WNT, Androgen, and Estrogen), each represented by 100 responsive genes determined from thousands of perturbation experiments. PROGENy was accessed via its R package (version 1.18.1) from Bioconductor on June 2, 2024, which provides model matrices for both human and mouse organisms and supports weight matrices of varying sizes (100, 300, and 500 genes per path-way) to balance specificity and coverage depending on analytical needs.

LIANA (LIgand-receptor ANalysis frAmework) is a comprehensive resource for cell-cell communication inference that integrates multiple databases and analytical methods into a unified framework (21). It aggregates ligand-receptor pairs from CellPhoneDB, CellChat, NATMI, ConnectomeDB, and other resources, resulting in a comprehensive interaction collection covering more than 3,000 unique interactions. Its curated, non-redundant ligand-receptor interactions are annotated for complex structures (including heteromeric proteins), enabling accurate inference of intercellular signaling networks. LIANA version 0.1.8 was accessed on May 25, 2024, via its Python interface.

1.3.2. Data Processing

CZIInfo is a retrieval tool designed to extract relevant cell-type and tissue information from the CZI reference dataset based on specified criteria. It allows users to query by tissue, organism (Homo sapiens or Mus musculus), disease state, and dataset identifier. The tool loads census metadata from czi_census_datasets_v4.csv (codes on how to curate this are provided in https://github.com/Genentech/SpatialAgentData) and leverages an LLM-based tissue name matching pipeline to refine search results. It then filters dataset entries based on tissue and disease annotations, extracting and saving reference single-cell data. Cell types and tissues are further summarized using an LLM, and results are stored as structured text files for downstream annotation. Queries are executed via API-based dataset retrieval and processed using structured filtering and embedding-based similarity search.

GeneVoting is an integration tool that consolidates multiple rounds of gene importance scoring to generate a refined gene panel. It aggregates importance scores from three iterations, normalizes gene names, and ranks genes based on cumulative importance. To ensure comprehensive coverage, it dynamically adjusts rankings to maintain representation across all cell types. The tool filters and selects the top genes while summarizing their associated reasons. The final gene panel is saved in a structured CSV format for downstream analysis, with automatic cleanup of intermediate files.

PreProcess automates the preprocessing of spatial transcriptomics data using Scanpy (11). It filters low-quality cells and genes, normalizes expression values, and applies log transformation to enhance downstream analyses. The tool performs dimensionality reduction via PCA, computes neighborhood graphs, and generates UMAP embeddings to capture spatial relationships. The processed dataset is saved in.h5ad format with standardized gene naming conventions, ensuring compatibility with subsequent computational workflows.

HarmonyTransfer enables the transfer of cell type annotations from CZI reference datasets to spatial transcriptomics data. It preprocesses and integrates single-cell RNA-seq and spatial datasets by identifying shared genes, normalizing expression values, and performing batch correction using Harmony integration. A multi-layer perceptron (MLP) classifier is trained on the reference dataset’s principal components and applied to predict cell types in the spatial dataset. The final transferred labels are saved in structured CSV format for downstream analysis, ensuring a robust and scalable approach to cell type annotation in spatial omics studies.

MainUTAG applies the UTAG computational method (18) to identify spatially coherent tissue niches based on transcriptomic similarity. It clusters spatial transcriptomics data using the Leiden algorithm with adaptive resolution selection, optimizing for niche structure while maintaining biological interpretability. Small clusters are reassigned based on nearest-neighbor similarity to ensure robustness. The tool generates and saves structured outputs, including niche annotations in CSV format and visualizations of spatial niche distributions. These results support downstream analyses by providing high-resolution spatial context for cell-type organization.

CellTypeAnnotation similar as GPTCellType (15), is an automated clustering and annotation tool for spatial transcriptomics data, leveraging gene markers and reference-transferred labels to infer main-level cell types. It clusters cells using the Leiden algorithm and identifies marker genes via Wilcoxon rank-sum testing. The tool calculates cell type compositions within each cluster and employs a language model to assign cell types based on marker profiles and ontology constraints. Annotations are further refined through LLM-driven organization. Results, including annotated cell types and justification reasons, are saved in structured formats. The tool also generates UMAP visualizations for both clustering and final cell-type assignments to facilitate interpretation.

TissueNicheAnnotation is a multimodal annotation tool designed to infer tissue niches by integrating spatial transcriptomics data, cell type labels, and anatomical priors. It accepts preprocessed spatial data in.h5ad format alongside main-level cell type annotations and niche clustering results. The tool first extracts spatial features and visualizes niche clusters, incorporating anatomical reference images to enhance niche understanding. Using a language model, it annotates tissue niches by considering spatial location, cell composition, and enriched gene sets. The tool also harmonizes annotations across multiple samples via an LLM-driven consensus mechanism. Final annotations, including tissue niche labels and supporting justifications, are saved in structured formats for down-stream analysis. Spatial distributions of annotated niches are visualized to facilitate interpretation.

DynamicsInference performs cross-condition dynamics analysis in spatial transcriptomics data, integrating factor, cell-type, and pathway analyses. It extracts factor loadings, maps sample conditions, and conducts pairwise statistical tests to identify significant changes across experimental groups. The tool ranks cell-type contributions to key factors and detects enriched pathways using structured scoring. Results are saved in a structured.pickle format, providing a comprehensive dataset for downstream interpretation and integrative analysis.

CCIContext infers context-specific cell-cell interactions from spatial transcriptomics data using the Tensor-Cell2cell framework. It integrates ligand-receptor analysis, accounts for batch effects, and identifies condition-specific interaction patterns. The tool constructs a four-dimensional interaction tensor across samples, cell types, and signaling pathways, applying tensor decomposition to extract interpretable factors. It further performs pathway enrichment analysis using PROGENy to link ligand-receptor activity with downstream signaling. Results include interaction heatmaps, factor analysis, and pathway scores, saved in structured formats for downstream interpretation.

1.3.3. Aggregation

GeneImportance is an automated pipeline for estimating the importance of marker genes within cell types based on reference datasets. It processes marker gene information from multiple sources, e.g., CZI, PanglaoDB, and CellMarker2, and utilizes an LLM to infer gene importance scores iteratively. The tool first extracts relevant cell-type and gene associations, then queries an LLM to rank genes based on their relevance. The inferred importance scores are reformatted and validated before aggregation into a structured CSV file. The final output provides a ranked list of genes with assigned importance scores, supporting downstream cell-type characterization and feature selection.

ReportGeneration automates the synthesis of spatial transcriptomics analysis results into a structured scientific report. It integrates findings from multiple analyses, including spatial patterns, cell-cell interactions, and condition-specific effects, filtering for statistically significant factors. The tool processes large-scale data using chunking strategies, applies language models for structured summarization, and generates coherent discussions and conclusions. The final report is formatted with background context, dataset descriptions, and analytical insights, ensuring comprehensive documentation for downstream interpretation.

SummarizeConditionTool, SummarizeCellTypeTool, and SummarizeTissueRegionTool facilitate the structured analysis of spatial transcriptomics data by summarizing condition-specific patterns, cell-type distributions, and tissue-region characteristics. The tools leverage an LLM to generate detailed descriptions both per sample and across conditions. They integrate contextual metadata, including tissue type and dataset information, to enhance interpretability. Condition summaries capture key differences across experimental groups, cell-type summaries contextualize distributions across tissue regions, and tissue-region summaries provide insights into spatial organization. The outputs are structured as JSON files for downstream analyses and reporting.

1.3.4. General purpose

Coding powers SpatialAgent to generate and execute custom functions for any analyses. It supports Python and shell scripts with emphasis on data analysis. The tool handles various data input / output formats, maintains execution context, and provides error handling. For spatial transcriptomics, Coding facilitates custom statistical tests, specialized visualizations, and integration with external packages not explicitly covered by other tools. Though it offers great flexibility (ideally all the tools can be generated by the LLM on-the-fly), this tool indeed requires a base LLM with exceptional coding performance (e.g., Claude-3.7-Sonnet far outperforms GPT-4o).

1.4. Plan templates

While SpatialAgent can generate a valid plan for many user queries, which can also be refined by additional user’s feedback, using plan templates for high-frequent tasks can largely improve the stability (i.e. completion rate) and performance. (Note that in other scientific agentic systems, such as Google’s Co-Scientist, most computation / agentic burden is spent on making a concrete research plan). Three plan templates we included for the respective three representative tasks (Supplementary Fig. 1, 2, 3). These are provided to the SpatialAgent during memory initialization stages and can be extended per user’s own need and creativity.

Supplementary Figure 1:

Templated plan for gene panel design.

1.5. Core prompts in the agentic framework

A sophisticated set of core prompts were incorporated in SpatialAgent to orchestrate its autonomous behavior in spatial biology analysis. These prompts form the cognitive architecture of the agent, beginning with a system prompt (Supplementary Fig.4) that establishes the agent’s identity as a computational biology analyst and defines its operational parameters with the zero-shot ReAct setting (37). The semantic memory initialization prompt (Supplementary Fig.5) structures the agent’s knowledge representation, incorporating task templates and available tools. The planning framework (Supplementary Fig.7) implements a structured thought process with five key components: task status assessment, plan review, action determination, action specification, and success criteria evaluation. This deliberative cycle enables the agent to monitor progress, adapt strategies, and determine appropriate next steps. Finally, the action execution prompt (Supplementary Fig.6) provides a streamlined template for tool calling, ensuring that each operation is precisely formatted with clear input parameters and expected outputs. Together, these prompts create a cohesive cognitive system that guides the agent through complex spatial biology workflows while maintaining contextual awareness and methodical execution.

Supplementary Figure 2:

Templated plan for cell type and tissue niche annotation.

Supplementary Figure 3:

Templated plan for inferring cell-cell communications.

Supplementary Figure 4:

System prompt on launching the agent.

Supplementary Figure 5:

Prompt on initializing semantic memory.

Supplementary Figure 6:

Prompt on making actions (tool calling).

Supplementary Figure 7:

Prompt on proposing and updating plan.

1.6. Extendability

SpatialAgent was designed to be extendable, so it can evolve through three primary mechanisms (Supplementary Fig.8). First, SpatialAgent’s memory continuously updates through human messages, enabling iterative refinement based on user feedback and new information. Second, SpatialAgent’s planning capabilities can be expanded by adding new plan templates that encode domain-specific knowledge and problem-solving approaches without modifying the underlying architecture. Third, SpatialAgent’s action abilities can be enhanced by incorporating new tools that connect to external services, computational resources, or specialized functions through a consistent interface.

Supplementary Figure 8:

Extendability of SpatialAgent.

This modular design creates a flexible cycle where users can extend the agent’s functionality through natural interaction rather than complex reprogramming. For example, researchers can introduce specialized templates for more specific or sophisticated analysis, add tools for spatial computation or visualization, and guide the traces through conversational feedback. The result is a system that adapts to increasingly complex spatial reasoning challenges while maintaining operational coherence across diverse applications.

1.7. Generalization

While all results in main-text use autonomous mode with predefined tools and templates, we believe a flexible decision-making architecture is beneficial for robust generalization. It first distinguishes between simple queries and complex tasks. For non-task queries, it updates memory and replies directly without planning. Task-oriented requests activate full capabilities: after memory updates, the agent checks for suitable plan templates. If available, it follows predefined strategies; otherwise, it dynamically composes tools (mostly coding) to form custom plans.

A continuous memory–action–planning loop enables in-execution refinement based on intermediate results. This adaptive loop supports generalization across tasks and incorporates human feedback, often outperforming purely autonomous runs. To test generalization, we next present two unseen tasks requiring novel plans and coding. SpatialAgent succeeds at cell type annotation harmonization (Supplementary Fig.10, 11, 12) but fails at spatial gene regulatory analysis—a task that remains difficult even for Claude-Sonnet-3.7 and GPT-O1 (Supplementary Fig.10, 14).

Supplementary Figure 9:

Generalizability.

1.7.1. Annotation harmonization

Supplementary Figure 10:

Task overview for cell type annotation harmonization. The agent is given two human lung scRNA-seq datasets with distinct label sets and asked to harmonize the annotations.

Supplementary Figure 11:

Initial user query and agent response. The agent correctly identifies the task and starts by inspecting the dataset metadata. ₃₄

Supplementary Figure 12:

Final result: a successful annotation mapping. After a few iterations using the coding tool, the agent produces a reasonable mapping across datasets.

1.7.2. Gene Regulatory Network Inference with the Spatial Context

Supplementary Figure 13:

Task overview for spatial GRN inference. The agent is asked to infer gene regulatory relationships within spatial transcriptomics data.

Supplementary Figure 14:

Initial query and failed agent response. The agent attempts to reason

1.7.3. Summary

These two tasks demonstrate both the strengths and current limitations of SpatialAgent’s generalization. In the annotation harmonization task, the agent adapts effectively through iterative planning and code generation, leveraging its flexible architecture. In contrast, the spatial GRN task reveals difficulties in multi-step reasoning and specific tool orchestration - challenges that remain unsolved even for the most advanced LLMs.

Addressing these limitations will require stronger coding capabilities, improved reasoning for dynamic plan construction, integration of domain-specific tools (e.g., PySCENIC+), and clearer user prompts that reflect first-hand know-hows. These improvements are critical for boosting the agent’s completion rates and performance on complex, open-ended scientific tasks.

1.8. Interaction design of SpatialAgent

SpatialAgent enables dynamic interaction between users and the agentic system (Supplementary Fig. 15, 16). A user query triggers the agent to update its memory and context. It then revises its execution plan, refining templates or generating new ones, and invokes external tools or databases as needed. Tool outputs are integrated with the agent’s knowledge to revise its context. The agent then responds with insights, visualizations, or recommendations based on the analysis.

This cyclical design enables continuous refinement through interaction, as each human query triggers a complete processing cycle that builds upon previous exchanges. The structured workflow maintains consistency while allowing for adaptive responses to diverse inquiries, creating an intuitive collaborative experience between researchers and the automated assistant.

Supplementary Figure 15:

Design on the interaction mode of SpatialAgent.

Supplementary Figure 16:

User interface of SpatialAgent, developed using Gradio package.

2. Gene panel design

2.1. Computational baselines

Several computational baseline methods were used. Each relies on scRNA-seq data as a reference from which to select a set of genes predicted to be most informative in spatial contexts.

HVG(Seurat) identifies genes with high expression variance across cells, typically after normalizing for the relationship between mean expression and variance. This approach prioritizes genes with biological heterogeneity over technical noise.

GeneBasis greedily selects genes that maximize mutual information with principal components of the full expression matrix. It iteratively builds a gene panel that preserves the overall structure of the dataset while minimizing redundancy among selected genes.

Persist uses a persistence-based approach to identify genes that contribute significantly to topological features in the data. It quantifies how each gene affects persistent homology metrics, selecting genes that maintain important structural information across different scales.

Spapros applies a sparse regression framework to identify a minimal set of genes that can reconstruct the full expression matrix with high fidelity. It leverages L1-regularization to enforce sparsity while optimizing for genes that capture maximum information about cell states and transitions.

2.2. Human scientists evaluation

2.2.1. Study design

Human expert performance in panel design was systematically evaluated using a two-stage approach. The human experts included both computational and experimental scientists, who were either pursuing a PhD or held an MD/PhD in a related field (e.g., cell biology, biostatistics, computer science) and had published work.

Stage 1: Domain experts selected 50, 100, or 150 genes for spatial transcriptomics experiments focusing on the DLPFC. They were provided with the same scRNA-seq dataset used by baseline methods and SpatialAgent, along with exploratory data analysis (EDA) notebooks. Experts documented their selections with gene symbols, rankings, and detailed reasoning. They also provided their research backgrounds and were allowed to consult literature (excluding original dataset papers) and online resources, except for AI tools like chatGPT. We then evaluated these panels using the same set of metrics as we benchmarked the computational baselines and SpatialAgent.

Stage 2: We implemented a comparative evaluation framework where experts assessed outputs from various sources (SpatialAgent, GPT-4o, human experts, and SpatialAgent + human combinations) using a structured 5-point rubric. The rubric evaluated accuracy, reasoning quality, completeness, and conciseness of gene selections. This approach enabled a quantitative comparison of human and AI performance while exploring the potential of human-AI collaboration in gene panel design for spatial transcriptomics.

View this table:

Supplementary Table 1: Comparison of panel design workflows for spatial transcriptomics

All materials, including instructions, EDA notebooks, and anonymized human scientists’ solutions, are provided in the supplementary files.

2.2.2. Overview of human solutions

Researchers employed several distinct strategies when selecting genes for spatial transcriptomics panels (Supplementary Table. 1). Most researchers relied on either the provided reference dataset or literature sources, with only a few integrating multiple data sources. Their selection methods typically fell into three categories: (1) algorithmic approaches (Persist algorithm, iterative greedy methods, or highly variable gene selection), (2) differential expression analysis to identify cell type-specific markers, or (3) knowledge-driven selection of known markers from literature. The more comprehensive approaches, such as those from P and V, combined multiple strategies by cross-referencing markers from different sources and including genes related to specific biological processes or pathways of interest. Some researchers focused exclusively on cell type identification, while others extended their panels to capture pathway activities or disease-relevant genes. We provide anonymized human expert designs in the supplement to support future refinement and improvement by other practitioners.

2.3. Pairwise comparisons on cell type prediction

We also conducted a pairwise statistical comparison of cell-type prediction performance across different methods on the DLPFC dataset, using gene panels of 50 and 100 genes. Each entry in the tables below reports the t-statistic and corresponding p-value (in parentheses) from independent two-sample t-tests. A positive t-statistic indicates that the method in the row outperforms the method in the column. Statistically significant differences (p < 0.05) are highlighted in bold. As panel size increases, performance gaps between SpatialAgent and other baselines become less pronounced.

Supplementary Figure 17:

Accuracy in predicting cell type.

Supplementary Figure 18:

F1 score in predicting cell type.

View this table:

Supplementary Table 2: Pairwise statistical comparison of cell-type prediction using 50-gene panels. Values represent test statistics from independent t-tests, with p-values in parentheses. Statistically significant differences are bolded (* p < 0.05), same notations below.

2.3.1. More metrics on location restoration

Essentially most of other metrics (clustering-based) as well as the spatially expressed variations are highly correlated with the cell type and location prediction performance, the reader can easily calculate on their ends as we release the data.

View this table:

Supplementary Table 3: Pairwise statistical comparison of cell-type prediction using 100-gene panels.

Supplementary Figure 19:

Correlation in predicting x coordinates.

Supplementary Figure 20:

Correlation in predicting y coordinates.

2.3.2. Performance via human expert rating

Human scientists were asked to rate anonymized gene panels that included a mixture of those designed by other scientists, the SpatialAgent and hybrid designs of SpatialAgent incorporating human-panels. Participants were asked to rate each panel across three categories: accuracy (can the given panel accurately map the DLPFC?), completeness (is the designed panel proposed complete?) and reasoning (is the reasoning given for the gene panel selection logical and insightful?). Scientist scored panels from 1 (Poor) to 5 (Excellent). Mean score for each panel are summarized in Supplementary Fig.21 below.

Supplementary Figure 21:

Human evaluation of panel design

3. Cell type and tissue niche annotation

3.1. Overview

Cell type and tissue niche annotation is a crucial step in spatial transcriptomics, enabling the biological interpretation of complex spatial gene expression patterns. Unlike scRNA-seq, spatial transcriptomics preserves tissue context, allowing researchers to identify both cell types and their spatial organization within micro-environments. This integration is essential for understanding cellular interactions, developmental processes, and disease mechanisms in their native spatial contexts.

The annotation process typically follows a hierarchical structure, from broad cell type categories (Tier 1) to finer subtypes (Tiers 2–3), alongside spatial domain identification. Annotations are derived from marker gene expression, spatial coordinates, and morphological features. The integration of these multimodal data presents computational and biological challenges, especially in developing or pathological tissues where cell states exist along a continuum rather than discrete categories.

3.2. Baselines

We compare SpatialAgent with two automated cell type annotation methods:

CellTypist is a supervised machine learning approach using multinomial logistic regression trained on curated reference datasets. It supports hierarchical classification with probabilistic outputs but is limited by the quality and scope of its training data.

GPTCellType is LLM-based method that interprets gene expression profiles and generates cell type annotations with explanatory rationales. It integrates diverse information sources and adapts to novel cell types but may hallucinate annotations when faced with ambiguous expression patterns. To enable side-by-side comparison, we run the GPTCellType directly on the leiden cluster calculated from SpatialAgent.

To ensure fair comparisons, we harmonize all annotations, including those from computational methods, human scientists, and SpatialAgent, to a common tier-1 reference set. We use the original author-provided annotations as ground truth, acknowledging the inherent bias in this setting.

3.3. Per cluster annotation between GPTCellType and SpatialAgent

We next compare SpatialAgent with GPTCellType on the same set of leiden clustering, noting that the authors did not use the same set cluster for annotation, but we notice that SpatialAgent can provide a descent annotation in Tier-1 annotations.

3.4. Removing small clusters in UTAG

Building on the original UTAG algorithm, we employ a nearest-neighbor approach to reassign cells from small clusters to larger, more established ones. Specifically, we identify clusters with fewer than a specified threshold (default: 100 cells) for removal. For each batch, we reassign cells in small clusters by finding their five nearest spatial neighbors among reference cells (i.e., those in larger clusters). The most common cluster label among these neighbors is then assigned to the cell, ensuring spatial coherence by favoring proximity-based reassignments over arbitrary ones.

Supplementary Figure 22:

Composition of author annotations across Leiden clusters used by SpatialAgent and GPTCellType.

3.5. Analysis of human expert annotations

Researchers employed consistent approaches for cell and niche annotation in spatial transcriptomics data (Supplementary Table. 4). For cell annotation, most relied on computational clustering (primarily Leiden) followed by either manual annotation with marker genes or automated label transfer methods like CellTypist or TACCO. Many provided hierarchical (multi-tier) annotations to capture both major cell types and subtypes. Only one researcher (TL) used a unique co-expression pattern approach instead of clustering.

For niche annotation, UTAG spatial clustering was the predominant method, showing remarkable consistency across researchers. The differences emerged primarily in how researchers assigned biological meaning to these spatial clusters - most used a combination of spatial positioning and reference to anatomical images, with some incorporating prior anatomical knowledge (particularly of heart structures). Overall, researchers balanced computational methods with biological knowledge, using spatial context to refine their annotations.

We provide the original human annotations in Supplementary Fig. 27-32. For cell annotations, we visualize them within the same UMAP clustering structure, where each scientist’s annotations represent their individual interpretations. For tissue niche annotations, we map them onto the spatial coordinates of each MERFISH spot.

3.6. Confusion matrix

We plot the confusion matrices for cell type annotations (Supplementary Fig.33–41) and tissue niche annotations (Supplementary Fig.42–50).

3.7. Summary

Accurate cell type and tissue niche annotation is key to interpreting spatial transcriptomics. With detailed comparisons, SpatialAgent outperforms existing methods in Tier-1 cell type annotation and aligns well with authors. We also improved UTAG clustering by removing small clusters to boost spatial coherence in niche annotation.

View this table:

Supplementary Table 4: Comparison of cell and niche annotation methodologies

Supplementary Figure 23:

Annotation of GPTCellType and SpatialAgent on Leiden clusters (1/4)

Supplementary Figure 24:

Annotation of GPTCellType and SpatialAgent on Leiden clusters (2/4)

Supplementary Figure 25:

Annotation of GPTCellType and SpatialAgent on Leiden clusters (3/4)

Supplementary Figure 26:

Annotation of GPTCellType and SpatialAgent on Leiden clusters (4/4)

Supplementary Figure 27:

Original Tier 1 cell type annotations.

Supplementary Figure 28:

Original Tier 2 cell type annotations.

Supplementary Figure 29:

Original Tier 1 tissue niche annotations (1/2).

Supplementary Figure 30:

Original Tier 1 tissue niche annotations (2/2).

Supplementary Figure 31:

Original Tier 2 tissue niche annotations (1/2).

Supplementary Figure 32:

Original Tier 2 tissue niche annotations (2/2).

Supplementary Figure 33:

Confusion matrix of SpatialAgent cell type annotation.

Supplementary Figure 34:

Confusion matrix of GPTCellType cell type annotation.

Supplementary Figure 35:

Confusion matrix of scientist HC (also CellTypist) cell type annotation.

Supplementary Figure 36:

Confusion matrix of human scientist L cell type annotation.

Supplementary Figure 37:

Confusion matrix of human scientist Lh cell type annotation.

Supplementary Figure 38:

Confusion matrix of human scientist Lu cell type annotation.

Supplementary Figure 39:

Confusion matrix of human scientist MH cell type annotation.

Supplementary Figure 40:

Confusion matrix of human scientist P cell type annotation.

Supplementary Figure 41:

Confusion matrix of human scientist TL tissue niche annotation.

Supplementary Figure 42:

Confusion matrix of SpatialAgent tissue niche annotation.

Supplementary Figure 43:

Confusion matrix of human scientist HC tissue niche annotation.

Supplementary Figure 44:

Confusion matrix of human scientist L tissue niche annotation.

Supplementary Figure 45:

Confusion matrix of human scientist Lh tissue niche annotation.

Supplementary Figure 46:

Confusion matrix of human scientist La tissue niche annotation.

Supplementary Figure 47:

Confusion matrix of human scientist Lu tissue niche annotation.

Supplementary Figure 48:

Confusion matrix of human scientist L tissue niche annotation.

Supplementary Figure 49:

Confusion matrix of human scientist P tissue niche annotation.

Supplementary Figure 50:

Confusion matrix of human scientist TL tissue niche annotation.

4. Mining cell-cell interaction and report generation

4.1. Overview

Advances in spatial transcriptomics (e.g., MERFISH) enable high-resolution mapping of cellular identity, location, and molecular output within tissues, providing critical insights into disease progression and tissue remodeling (Supplementary Fig.51). By capturing spatial and molecular changes across disease onset, peak inflammation, and recovery, these techniques facilitate a deeper understanding of inflammatory conditions such as colitis.

Supplementary Figure 51:

Overview of spatial transcriptomics data used for cell–cell interaction analysis. The dataset captures spatial and molecular profiles across disease stages—onset, peak inflammation, and recovery—enabling investigation of dynamic tissue remodeling and intercellular communication.

Comparative analysis plays a central role in extracting biological insights. Systematic comparisons between healthy and diseased states, or across disease stages, reveal key shifts in gene expression, cell populations, and spatial organization. These analyses highlight tissue remodeling events, such as immune infiltration or the emergence of inflammation-associated fibroblasts (IAFs), pinpointing mechanisms of damage and potential therapeutic windows.

Equally important is understanding cell-cell interaction (CCI), which governs tissue responses. In inflamed environments, fibroblasts and immune cells exchange molecular signals (e.g., cytokines, chemokines) that orchestrate inflammation and repair. Mapping these interactions uncovers regulatory networks that maintain or disrupt tissue balance, explaining immune escalation, tissue remodeling, and healing processes.

SpatialAgent leverages multimodal spatial transcriptomic datasets to systematically extract CCI insights. It integrates ligand-receptor analysis, neighborhood-based correlations, and factor-driven networks through a multi-step pipeline: preprocessing, interaction inference using LIANA+ (aggregating multiple ligand-receptor tools), and factor analysis to determine key drivers of intercellular communication. The findings are compiled into structured reports highlighting biological patterns, key signaling pathways, and potential therapeutic targets.

4.2. Summarization

This section presents a comparative summary of CCI across different conditions, focusing on interaction network shifts, cell type distributions, and tissue structures.

4.2.1. Condition specific analysis

SpatialAgent examines CCI changes across disease stages, such as increased pro-inflammatory signaling (e.g., TNF, IL-6) during peak inflammation and fibroblast-driven repair signals in recovery (Supplementary Fig. 52).

Supplementary Figure 52:

Summary of sample information.

4.2.2. Cell type specific analysis

Comparative analysis identifies dynamically interacting cell populations. For example, inflammatory fibroblasts in colitis show increased signaling with T cells and monocytes via TGF-β and IL-11, while epithelial-stromal interactions remain stable, underscoring immune-driven inflammation (Supplementary Fig. 53–59).

4.2.3. Tissue region specific analysis

SpatialAgent assesses spatial variation in CCI across tissue regions. In colitis, immune infiltration and fibroblast expansion localize to specific regions, refining insights into tissue heterogeneity and microenvironment-specific signaling. Summaries for representative tissue regions are shown in Supplementary Fig. 60–63.

4.3. Factor analysis in CCI

To interpret complex intercellular signaling patterns, SpatialAgent performs factor analysis to uncover underlying regulatory programs and communication modules. This approach identifies key drivers of cell–cell interactions by combining tensor decomposition, pathway enrichment, and regulatory network inference.

LIANA and Cell2Cell integration

SpatialAgent integrates LIANA for robust ligand-receptor interaction quantification and employs Cell2Cell to identify global communication patterns through tensor-based decomposition.

Supplementary Figure 53:

Summary of IAF2 cells within DSS3 samples.

Pathway and network analysis

SpatialAgent contextualizes communication modules within biological pathways using PROGENy, linking interactions to transcriptional responses. This enables:

Pathway enrichment analysis (e.g., TGF-β in fibroblast activation, IFN-γ in immune response).
Cross-species gene mapping for translational insights.
Regulatory network inference, predicting transcription factors (e.g., NF-κB, SMAD3) mediating key interactions.

4.4. Generated report

SpatialAgent produces structured reports from spatial transcriptomic analyses by integrating statistical significance filtering (p<0.01) with contextual scientific narratives. Reports summarize spatial pattern information, cellular interactions, and condition-specific effects, transforming complex computational results into accessible scientific insights (Supplementary Fig. 64–71).

Supplementary Figure 54:

Summary of Macrophage cells within DSS3 samples.

Supplementary Figure 55:

Summary of Treg cells within DSS21 samples.

Supplementary Figure 56:

Summary of IAF2 cells across all conditions.

Supplementary Figure 57:

Summary of Macrophage cells across all conditions.

Supplementary Figure 58:

Summary of IAE1 cells across all conditions.

Supplementary Figure 59:

Summary of Colonocyte cells across all conditions.

Supplementary Figure 60:

Summary of MU10 tissue region across all conditions.

Supplementary Figure 61:

Summary of LUM tissue region across all conditions.

Supplementary Figure 62:

Summary of MU1 tissue region across all conditions (1/2).

Supplementary Figure 63:

Summary of MU1 tissue region across all conditions (2/2).

Supplementary Figure 64:

SpatialAgent generated report (1/8)

Supplementary Figure 65:

SpatialAgent generated report (2/8)

Supplementary Figure 66:

SpatialAgent generated report (3/8)

Supplementary Figure 67:

SpatialAgent generated report (4/8)

Supplementary Figure 68:

SpatialAgent generated report (5/8)

Supplementary Figure 69:

SpatialAgent generated report (6/8)

Supplementary Figure 70:

SpatialAgent generated report (7/8)

Supplementary Figure 71:

SpatialAgent generated report (8/8)

5. Designing additional customized genes upon Xenium 5k panel

5.1. Overview

The standard Xenium 5k panel provides broad coverage but may lack key genes necessary for specialized biological contexts. Here, we designed a customized gene set to enhance spatial transcriptomic resolution in prostate cancer mouse models under different treatments. By selecting 100 additional genes using SpatialAgent, we improved the characterization of stromal, immune, and epithelial compartments: critical for understanding tumor progression and immune interactions. This customization refined cell-type resolution, enhanced clustering quality, and uncovered key ligand-receptor interactions within the tumor microenvironment.

5.2. Customized gene panel improves cell-type resolution and clustering

To assess the impact of the additional genes, we compared clustering performance across different gene selection strategies on the reference scRNA-seq data. Although improvements are subtle in visualizations (Supplementary Fig. 72-75), quantitative metrics indicate enhanced clustering resolution. Specifically, incorporating 100 genes selected via SpatialAgent led to improved clustering scores compared to both the Xenium 5k panel alone and a random selection of 100 genes. The random selection did not contribute meaningfully to the resolution, slightly degrading performance compared to the Xenium 5k panel alone.

5.3. Customized gene panel enhances the capture of cell-cell interactions

Beyond improving clustering, adding 100 genes via SpatialAgent enhanced detection of ligand–receptor interactions in the tumor microenvironment, based on a scRNA-seq reference dataset. Expanded gene coverage revealed richer cell–cell communication, especially in stromal and immune compartments, offering deeper insight into tumor–immune interactions and treatment responses. We compared interaction detection across gene selection strategies and found that SpatialAgent -selected genes yielded more significant ligand–receptor pairs than random panels, highlighting its value.

In the Xenium-only analysis (Supplementary Fig. 76), the strongest signals centered on immune-modulatory ligands such as Jag1–Notch2 and Vcan–Tlr2, which dominate fibroblast–immune crosstalk. In contrast, incorporating SpatialAgent (Supplementary Fig. 77) shifted the focus toward laminin - integrin pairs (e.g., Lamb1–Itga3_Itgb1), highlighting enhanced extracellular matrix remodeling and cell-adhesion pathways.

Distinct interaction patterns also emerged across immune subpopulations. Larger or more intensely colored circles for certain ligand-receptor pairs (e.g., Igf1–Insr in Tregs vs. myeloid DCs) suggest diverse regulatory circuits at play. Notably, fibroblasts engaged in multiple robust interactions, reinforcing their role in shaping immune infiltration and the local microenvironment.

Though basal epithelial cells had fewer strong signals, they mediated key integrin-based interactions potentially relevant for tumor maintenance and immune evasion. Overall, integrating Xenium with spatially informed gene panels sharpens our view of tumor–immune dynamics, emphasizing both immune-modulatory and ECM-related signaling.

5.4. Summary

Our results demonstrate that a customized gene panel can significantly improve spatial transcriptomic resolution, refining clustering and cell-cell interaction analyses. We will update the results as

Supplementary Figure 72:

UMAPs using Xenium 5k Pan Tissue panel wet-lab data finishes the collection progresses.

Supplementary Figure 73:

UMAPs using Xenium 5k panel and 100 random selected genes

Supplementary Figure 74:

UMAPs using Xenium 5k panel and 100 genes from SpatialAgent

Supplementary Figure 75:

UMAPs using full 23k genes

Supplementary Figure 76:

Ligand receptor analysis using Xenium

Supplementary Figure 77:

Ligand receptor analysis using Xenium + SpatialAgent

6. Tools

We attached the description (doc strings) of each tool (Supplementary Fig.78-92) and prompts (Supplementary Fig.93-108.)

Supplementary Figure 78:

Tool description (1/15)

Supplementary Figure 79:

Tool description (2/15)

Supplementary Figure 80:

Tool description (3/15)

Supplementary Figure 81:

Tool description (4/15)

Supplementary Figure 82:

Tool description (5/15)

Supplementary Figure 83:

Tool description (6/15)

Supplementary Figure 84:

Tool description (7/15)

Supplementary Figure 85:

Tool description (8/15)

Supplementary Figure 86:

Tool description (9/15)

Supplementary Figure 87:

Tool description (10/15)

Supplementary Figure 88:

Tool description (11/15)

Supplementary Figure 89:

Tool description (12/15)

Supplementary Figure 90:

Tool description (13/15)

Supplementary Figure 91:

Tool description (14/15)

Supplementary Figure 92:

Tool description (15/15)

Supplementary Figure 93:

Tool prompt (1/16)

Supplementary Figure 94:

Tool prompt (2/16)

Supplementary Figure 95:

Tool prompt (3/16)

Supplementary Figure 96:

Tool prompt (4/16)

Supplementary Figure 97:

Tool prompt (5/16)

Supplementary Figure 98:

Tool prompt (6/16)

Supplementary Figure 99:

Tool prompt (7/16)

Supplementary Figure 100:

Tool prompt (8/16)

Supplementary Figure 101:

Tool prompt (9/16)

Supplementary Figure 102:

Tool prompt (10/16)

Supplementary Figure 103:

Tool prompt (11/16)

Supplementary Figure 104:

Tool prompt (12/16)

Supplementary Figure 105:

Tool prompt (13/16)

Supplementary Figure 106:

Tool prompt (14/16)

Supplementary Figure 107:

Tool prompt (15/16)

Supplementary Figure 108:

Tool prompt (16/16)

Acknowledgments

We thank Leslie Gaffney, Yanay Rosen, Xiao Wang, Chuan Xu, Peng He, Russel Littman, Guohao Li, members of Hongyu Zhao’s and Mengdi Wang’s groups, as well as many colleagues at Genentech and Stanford University, for their invaluable insights. We are also grateful to members of the Leskovec and Regev labs, along with colleagues and friends from various places around the world, for their constructive and insightful discussions. J.L. was supported by NSF under Nos. OAC-1835598 (CINES), CCF-1918940 (Expeditions), DMS-2327709 (IHBEM); Stanford Data Applications Initiative, Wu Tsai Neurosciences Institute, Stanford Institute for Human-Centered AI, Chan Zuckerberg Initiative, Amazon, Genentech, GSK, Hitachi, SAP, and UCB.

Footnotes

↵+ Work done during internship at Genentech

References

[1].↵
Hanchen Wang, Tianfan Fu, Yuanqi Du, et al. Scientific discovery in the age of artificial intelligence. Nature, 620(7972):47–60, 2023.
OpenUrl CrossRef PubMed
[2].↵
Tom Brown et al. Language models are few-shot learners. In Advances in Neural Information Processing Systems, 2020.
[3].↵
Shanghua Gao, Ada Fang, Yepeng Huang, et al. Empowering biomedical discovery with ai agents. Cell, 187(22):6125–6151, 2024.
OpenUrl CrossRef PubMed
[4].↵
Juraj Gottweis, Wei-Hung Weng, Alexander Daryin, et al. Towards an ai co-scientist. arXiv:2502.18864, 2025.
[5].↵
Quintina Campbell, Sam Cox, Jorge Medina, et al. Mdcrow: Automating molecular dynamics workflows with large language models. arXiv:2502.09565, 2025.
[6].↵
Gao Shanghao, et al. Txagent: An ai agent for therapeutic reasoning across a universe of tools. arXiv:2503.10970, 2025.
[7].↵
Giovanni Palla, David Fischer, Aviv Regev, et al. Spatial components of molecular tissue biology. Nature Biotechnology, 40(3):308–318, 2022.
OpenUrl CrossRef PubMed
[8].↵
Botao Yu, Frazier N Baker, Ziru Chen, et al. Chemtoolagent: The impact of tools on language agents for chemistry problem solving. In NAACL Findings, 2025.
[9].↵
F. Pedregosa, G. Varoquaux, A. Gramfort, et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
OpenUrl
[10].↵
Adam Paszke, Sam Gross, Francisco Massa, et al. Pytorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems, 2019.
[11].↵
Alexander Wolf, Philipp Angerer, and Fabian Theis. Scanpy: large-scale single-cell gene expression data analysis. Genome biology, 19:1–5, 2018.
OpenUrl CrossRef PubMed
[12].↵
Kristen R Maynard, Leonardo Collado-Torres, Lukas M Weber, et al. Transcriptome-scale spatial gene expression in the human dorsolateral prefrontal cortex. Nature neuroscience, 24(3):425– 436, 2021.
OpenUrl CrossRef PubMed
[13].↵
Louis Kuemmerle, Malte Luecken, Alexandra Firsova, et al. Probe set selection for targeted spatial transcriptomics. Nature methods, pages 1–11, 2024.
[14].↵
Elie N Farah, Robert K Hu, Colin Kern, et al. Spatially organized cellular communities form the developing human heart. Nature, 627(8005):854–864, 2024.
OpenUrl CrossRef PubMed
[15].↵
Wenpin Hou and Zhicheng Ji. Assessing gpt-4 for cell type annotation in single-cell rna-seq analysis. Nature Methods, pages 1–4, 2024.
[16].↵
Chuan Xu, Martin Prete, Simone Webb, et al. Automatic cell-type harmonization and integration across human cell atlas datasets. Cell, 186(26):5876–5891, 2023.
OpenUrl CrossRef PubMed
[17].↵
C Domínguez Conde, Chao Xu, Louie B Jarvis, et al. Cross-tissue immune cell analysis reveals tissue-specific features in humans. Science, 376(6594):eabl5197, 2022.
OpenUrl CrossRef PubMed
[18].↵
Junbum Kim, Samir Rustam, Juan Miguel Mosquera, et al. Unsupervised discovery of tissue architecture in multiplexed imaging. Nature methods, 19(12):1653–1661, 2022.
OpenUrl CrossRef PubMed
[19].↵
Zane Durante, Qiuyuan Huang, Naoki Wake, et al. Agent ai: Surveying the horizons of multi-modal interaction. arXiv:2401.03568, 2024.
[20].↵
Paolo Cadinu, Kisha N Sivanathan, Aditya Misra, et al. Charting the cellular biogeography in colitis reveals fibroblast trajectories and coordinated spatial remodeling. Cell, 187(8):2010– 2028, 2024.
OpenUrl CrossRef PubMed
[21].↵
Daniel Dimitrov, Philipp Sven Lars Schäfer, Elias Farr, et al. Liana+ provides an all-in-one framework for cell–cell communication inference. Nature Cell Biology, 26(9):1613–1622, 2024.
OpenUrl PubMed
[22].↵
Suoqin Jin, Christian F Guerrero-Juarez, Lihua Zhang, et al. Inference and analysis of cell-cell communication using cellchat. Nature communications, 12(1):1088, 2021.
OpenUrl
[23].↵
Mirjana Efremova, Miquel Vento-Tormo, Sarah A Teichmann, et al. Cellphonedb: inferring cell– cell communication from combined expression of multi-subunit ligand–receptor complexes. Nature protocols, 15(4):1484–1506, 2020.
OpenUrl CrossRef PubMed
[24].↵
Christopher S Smillie, Moshe Biton, Jose Ordovas-Montanes, et al. Intra-and inter-cellular rewiring of the human colon during ulcerative colitis. Cell, 178(3):714–730, 2019.
OpenUrl CrossRef PubMed
[25].↵
Benjamin Ng, Stuart A Cook, and Sebastian Schafer. Interleukin-11 signaling underlies fibrosis, parenchymal dysfunction, and chronic inflammation of the airway. Experimental & Molecular Medicine, 52(12):1871–1878, 2020.
OpenUrl CrossRef PubMed
[26].↵
Srinivas Allanki, Boris Strilic, Lilly Scheinberger, et al. Interleukin-11 signaling promotes cellular reprogramming and limits fibrotic scarring during tissue regeneration. Science advances, 7(37):eabg6497, 2021.
OpenUrl CrossRef PubMed
[27].↵
Takashi Nishina, Yutaka Deguchi, Daisuke Ohshima, et al. Interleukin-11-expressing fibroblasts have a unique gene signature correlated with poor prognosis of colorectal cancer. Nature communications, 12(1):2281, 2021.
OpenUrl CrossRef PubMed
[28].↵
Diya B Joseph, Gervaise H Henry, Alicia Malewska, et al. Single-cell analysis of mouse and human prostate reveals novel fibroblasts with specialized distribution and microenvironment interactions. The Journal of pathology, 255(2):141–154, 2021.
OpenUrl CrossRef PubMed
[29].↵
Mert Cemri, Melissa Z Pan, Shuyi Yang, Lakshya A Agrawal, Bhavya Chopra, Rishabh Tiwari, Kurt Keutzer, Aditya Parameswaran, Dan Klein, Kannan Ramchandran, et al. Why do multiagent llm systems fail? arXiv:2503.13657, 2025.
[30].↵
Tim Stuart, Andrew Butler, Paul Hoffman, et al. Comprehensive integration of single-cell data. cell, 177(7):1888–1902, 2019.
OpenUrl CrossRef PubMed
[31].↵
Alsu Missarova, Jaison Jain, Andrew Butler, et al. genebasis: an iterative approach for unsupervised selection of targeted gene panels from scrna-seq. Genome biology, 22:1–22, 2021.
OpenUrl CrossRef PubMed
[32].↵
Ian Covert, Rohan Gala, Tim Wang, et al. Predictive and robust gene selection for spatial transcriptomics. Nature Communications, 14(1):2091, 2023.
OpenUrl CrossRef PubMed
[33].↵
CZI Single-Cell Biology Program, Shibla Abdulla, Brian Aevermann, et al. Cz cell× gene discover: A single-cell data platform for scalable exploration, analysis and modeling of aggregated data. BioRxiv, pages 2023–10, 2023.
[34].↵
Oscar Franzén, Li-Ming Gan, and Johan Björkegren. Panglaodb: a web server for exploration of mouse and human single-cell rna sequencing data. Database, 2019:baz046, 2019.
OpenUrl CrossRef PubMed
[35].↵
Congxue Hu, Tengyue Li, Yingqi Xu, et al. Cellmarker 2.0: an updated database of manually curated cell markers in human/mouse and web tools based on scrna-seq data. Nucleic acids research, 51(D1):D870–D876, 2023.
OpenUrl CrossRef PubMed
[36].↵
Michael Schubert, Bertram Klinger, Martina Klünemann, et al. Perturbation-response genes reveal signaling footprints in cancer gene expression. Nature communications, 9(1):20, 2018.
OpenUrl CrossRef PubMed
[37].↵
Shunyu Yao, Jeffrey Zhao, Dian Yu, et al. React: Synergizing reasoning and acting in language models. In International Conference on Learning Representations, 2023.

View the discussion thread.

Posted April 06, 2025.

Download PDF

Citation Tools

Subject Area

Bioinformatics

Subject Areas

All Articles

Animal Behavior and Cognition (7195)
Biochemistry (16561)
Bioengineering (12859)
Bioinformatics (39268)
Biophysics (20219)
Cancer Biology (17416)
Cell Biology (24013)
Clinical Trials (138)
Developmental Biology (12732)
Ecology (18841)
Epidemiology (2067)
Evolutionary Biology (23225)
Genetics (15018)
Genomics (21422)
Immunology (16641)
Microbiology (38091)
Molecular Biology (16152)
Neuroscience (83568)
Paleontology (631)
Pathology (2671)
Pharmacology and Toxicology (4525)
Physiology (7186)
Plant Biology (14291)
Scientific Communication and Education (1978)
Synthetic Biology (4046)
Systems Biology (9383)
Zoology (2172)

[1] [1].↵
Hanchen Wang, Tianfan Fu, Yuanqi Du, et al. Scientific discovery in the age of artificial intelligence. Nature, 620(7972):47–60, 2023.
OpenUrl CrossRef PubMed

[2] [2].↵
Tom Brown et al. Language models are few-shot learners. In Advances in Neural Information Processing Systems, 2020.

[3] [3].↵
Shanghua Gao, Ada Fang, Yepeng Huang, et al. Empowering biomedical discovery with ai agents. Cell, 187(22):6125–6151, 2024.
OpenUrl CrossRef PubMed

[4] [4].↵
Juraj Gottweis, Wei-Hung Weng, Alexander Daryin, et al. Towards an ai co-scientist. arXiv:2502.18864, 2025.

[5] [5].↵
Quintina Campbell, Sam Cox, Jorge Medina, et al. Mdcrow: Automating molecular dynamics workflows with large language models. arXiv:2502.09565, 2025.

[6] [6].↵
Gao Shanghao, et al. Txagent: An ai agent for therapeutic reasoning across a universe of tools. arXiv:2503.10970, 2025.

[7] [7].↵
Giovanni Palla, David Fischer, Aviv Regev, et al. Spatial components of molecular tissue biology. Nature Biotechnology, 40(3):308–318, 2022.
OpenUrl CrossRef PubMed

[8] [8].↵
Botao Yu, Frazier N Baker, Ziru Chen, et al. Chemtoolagent: The impact of tools on language agents for chemistry problem solving. In NAACL Findings, 2025.

[9] [9].↵
F. Pedregosa, G. Varoquaux, A. Gramfort, et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
OpenUrl

[10] [10].↵
Adam Paszke, Sam Gross, Francisco Massa, et al. Pytorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems, 2019.

[11] [11].↵
Alexander Wolf, Philipp Angerer, and Fabian Theis. Scanpy: large-scale single-cell gene expression data analysis. Genome biology, 19:1–5, 2018.
OpenUrl CrossRef PubMed

[12] [12].↵
Kristen R Maynard, Leonardo Collado-Torres, Lukas M Weber, et al. Transcriptome-scale spatial gene expression in the human dorsolateral prefrontal cortex. Nature neuroscience, 24(3):425– 436, 2021.
OpenUrl CrossRef PubMed

[13] [13].↵
Louis Kuemmerle, Malte Luecken, Alexandra Firsova, et al. Probe set selection for targeted spatial transcriptomics. Nature methods, pages 1–11, 2024.

[14] [14].↵
Elie N Farah, Robert K Hu, Colin Kern, et al. Spatially organized cellular communities form the developing human heart. Nature, 627(8005):854–864, 2024.
OpenUrl CrossRef PubMed

[15] [15].↵
Wenpin Hou and Zhicheng Ji. Assessing gpt-4 for cell type annotation in single-cell rna-seq analysis. Nature Methods, pages 1–4, 2024.

[16] [16].↵
Chuan Xu, Martin Prete, Simone Webb, et al. Automatic cell-type harmonization and integration across human cell atlas datasets. Cell, 186(26):5876–5891, 2023.
OpenUrl CrossRef PubMed

[17] [17].↵
C Domínguez Conde, Chao Xu, Louie B Jarvis, et al. Cross-tissue immune cell analysis reveals tissue-specific features in humans. Science, 376(6594):eabl5197, 2022.
OpenUrl CrossRef PubMed

[18] [18].↵
Junbum Kim, Samir Rustam, Juan Miguel Mosquera, et al. Unsupervised discovery of tissue architecture in multiplexed imaging. Nature methods, 19(12):1653–1661, 2022.
OpenUrl CrossRef PubMed

[19] [19].↵
Zane Durante, Qiuyuan Huang, Naoki Wake, et al. Agent ai: Surveying the horizons of multi-modal interaction. arXiv:2401.03568, 2024.

[20] [20].↵
Paolo Cadinu, Kisha N Sivanathan, Aditya Misra, et al. Charting the cellular biogeography in colitis reveals fibroblast trajectories and coordinated spatial remodeling. Cell, 187(8):2010– 2028, 2024.
OpenUrl CrossRef PubMed

[21] [21].↵
Daniel Dimitrov, Philipp Sven Lars Schäfer, Elias Farr, et al. Liana+ provides an all-in-one framework for cell–cell communication inference. Nature Cell Biology, 26(9):1613–1622, 2024.
OpenUrl PubMed

[22] [22].↵
Suoqin Jin, Christian F Guerrero-Juarez, Lihua Zhang, et al. Inference and analysis of cell-cell communication using cellchat. Nature communications, 12(1):1088, 2021.
OpenUrl

[23] [23].↵
Mirjana Efremova, Miquel Vento-Tormo, Sarah A Teichmann, et al. Cellphonedb: inferring cell– cell communication from combined expression of multi-subunit ligand–receptor complexes. Nature protocols, 15(4):1484–1506, 2020.
OpenUrl CrossRef PubMed

[24] [24].↵
Christopher S Smillie, Moshe Biton, Jose Ordovas-Montanes, et al. Intra-and inter-cellular rewiring of the human colon during ulcerative colitis. Cell, 178(3):714–730, 2019.
OpenUrl CrossRef PubMed

[25] [25].↵
Benjamin Ng, Stuart A Cook, and Sebastian Schafer. Interleukin-11 signaling underlies fibrosis, parenchymal dysfunction, and chronic inflammation of the airway. Experimental & Molecular Medicine, 52(12):1871–1878, 2020.
OpenUrl CrossRef PubMed

[26] [26].↵
Srinivas Allanki, Boris Strilic, Lilly Scheinberger, et al. Interleukin-11 signaling promotes cellular reprogramming and limits fibrotic scarring during tissue regeneration. Science advances, 7(37):eabg6497, 2021.
OpenUrl CrossRef PubMed

[27] [27].↵
Takashi Nishina, Yutaka Deguchi, Daisuke Ohshima, et al. Interleukin-11-expressing fibroblasts have a unique gene signature correlated with poor prognosis of colorectal cancer. Nature communications, 12(1):2281, 2021.
OpenUrl CrossRef PubMed

[28] [28].↵
Diya B Joseph, Gervaise H Henry, Alicia Malewska, et al. Single-cell analysis of mouse and human prostate reveals novel fibroblasts with specialized distribution and microenvironment interactions. The Journal of pathology, 255(2):141–154, 2021.
OpenUrl CrossRef PubMed

[29] [29].↵
Mert Cemri, Melissa Z Pan, Shuyi Yang, Lakshya A Agrawal, Bhavya Chopra, Rishabh Tiwari, Kurt Keutzer, Aditya Parameswaran, Dan Klein, Kannan Ramchandran, et al. Why do multiagent llm systems fail? arXiv:2503.13657, 2025.

[30] [30].↵
Tim Stuart, Andrew Butler, Paul Hoffman, et al. Comprehensive integration of single-cell data. cell, 177(7):1888–1902, 2019.
OpenUrl CrossRef PubMed

[31] [31].↵
Alsu Missarova, Jaison Jain, Andrew Butler, et al. genebasis: an iterative approach for unsupervised selection of targeted gene panels from scrna-seq. Genome biology, 22:1–22, 2021.
OpenUrl CrossRef PubMed

[32] [32].↵
Ian Covert, Rohan Gala, Tim Wang, et al. Predictive and robust gene selection for spatial transcriptomics. Nature Communications, 14(1):2091, 2023.
OpenUrl CrossRef PubMed

[33] [33].↵
CZI Single-Cell Biology Program, Shibla Abdulla, Brian Aevermann, et al. Cz cell× gene discover: A single-cell data platform for scalable exploration, analysis and modeling of aggregated data. BioRxiv, pages 2023–10, 2023.

[34] [34].↵
Oscar Franzén, Li-Ming Gan, and Johan Björkegren. Panglaodb: a web server for exploration of mouse and human single-cell rna sequencing data. Database, 2019:baz046, 2019.
OpenUrl CrossRef PubMed

[35] [35].↵
Congxue Hu, Tengyue Li, Yingqi Xu, et al. Cellmarker 2.0: an updated database of manually curated cell markers in human/mouse and web tools based on scrna-seq data. Nucleic acids research, 51(D1):D870–D876, 2023.
OpenUrl CrossRef PubMed

[36] [36].↵
Michael Schubert, Bertram Klinger, Martina Klünemann, et al. Perturbation-response genes reveal signaling footprints in cancer gene expression. Nature communications, 9(1):20, 2018.
OpenUrl CrossRef PubMed

[37] [37].↵
Shunyu Yao, Jeffrey Zhao, Dian Yu, et al. React: Synergizing reasoning and acting in language models. In International Conference on Learning Representations, 2023.

SpatialAgent: An autonomous AI agent for spatial biology

Abstract

1. Introduction

2. Results

Overview of SpatialAgent

SpatialAgent out-performed most tools and human experts in gene panel design

Reasoning and efficiency of autonomous panel design

Enhanced cell type and tissue niche annotation

Behavior analysis in multimodal annotation

Mining data patterns, summarizing findings, and proposing hypotheses with SpatialAgent

Adding customized genes to an existing panel for a specialized system

3. Discussion

Methods Summary

SpatialAgent

Co-pilot mode

Computational Baselines

Human scientist performance

Evaluation metrics

Code availability

Supplementary Information

1. SpatialAgent

1.1. Configurations of base LLMs

1.2. LangChain framework

1.3. Tools developments

1.3.1. Databases

1.3.2. Data Processing

1.3.3. Aggregation

1.3.4. General purpose

1.4. Plan templates

1.5. Core prompts in the agentic framework

1.6. Extendability

1.7. Generalization

1.7.1. Annotation harmonization

1.7.2. Gene Regulatory Network Inference with the Spatial Context

1.7.3. Summary

1.8. Interaction design of SpatialAgent

2. Gene panel design

2.1. Computational baselines

2.2. Human scientists evaluation

2.2.1. Study design

2.2.2. Overview of human solutions

2.3. Pairwise comparisons on cell type prediction

2.3.1. More metrics on location restoration

2.3.2. Performance via human expert rating

3. Cell type and tissue niche annotation

3.1. Overview

3.2. Baselines

3.3. Per cluster annotation between GPTCellType and SpatialAgent

3.4. Removing small clusters in UTAG

3.5. Analysis of human expert annotations

3.6. Confusion matrix

3.7. Summary

4. Mining cell-cell interaction and report generation

4.1. Overview

4.2. Summarization

4.2.1. Condition specific analysis

4.2.2. Cell type specific analysis

4.2.3. Tissue region specific analysis

4.3. Factor analysis in CCI

LIANA and Cell2Cell integration

Pathway and network analysis

4.4. Generated report

5. Designing additional customized genes upon Xenium 5k panel

5.1. Overview

5.2. Customized gene panel improves cell-type resolution and clustering

5.3. Customized gene panel enhances the capture of cell-cell interactions

5.4. Summary

6. Tools

Acknowledgments

Footnotes

References

Citation Manager Formats

Subject Area