Applications (stat.AP)

  • PDF
    Urban air quality forecasting is challenging because pollutant concentrations are nonlinear, nonstationary, spatiotemporally dependent, and often affected by anomalous observations caused by traffic congestion, industrial emissions, and seasonal meteorological variability. This study proposes a Graph Convolutional Support Vector Regression (GCSVR) framework for robust spatiotemporal forecasting of urban air pollution. The model combines graph convolutional learning to capture inter-station spatial dependence with support vector regression to model nonlinear temporal dynamics while reducing sensitivity to outlier observations. The proposed framework is evaluated using air quality records from 37 monitoring stations in Delhi and 18 stations in Mumbai, representing inland and coastal metropolitan environments in India. Forecasting performance is assessed across multiple horizons and compared with established temporal and spatiotemporal benchmarks. The results show that GCSVR consistently improves predictive accuracy and maintains stable performance across seasons and outlier-prone pollution episodes. Statistical tests further confirm the reliability of the proposed approach across the two cities. Finally, conformal prediction is integrated with GCSVR to generate calibrated prediction intervals, enhancing its practical value for uncertainty-aware air quality monitoring and public health decision-making.
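As a rough illustration of the GCSVR idea (not the authors' implementation; the station graph, lag window, and SVR hyperparameters below are all assumptions), one can propagate each station's lagged features over a normalized station adjacency and then fit an epsilon-insensitive SVR per station:

```python
import numpy as np
from sklearn.svm import SVR

# Hypothetical GCSVR-style sketch: one normalized-adjacency graph convolution
# mixes neighboring stations' lagged features, then a kernel SVR models the
# nonlinear temporal map. Data and settings are invented for illustration.
rng = np.random.default_rng(0)
n_stations, n_hours, n_lags = 5, 500, 24

pm25 = rng.gamma(shape=2.0, scale=30.0, size=(n_hours, n_stations))  # toy PM2.5 series
adj = (rng.random((n_stations, n_stations)) < 0.4).astype(float)
adj = np.maximum(adj, adj.T)                       # undirected station graph

# Symmetrically normalized adjacency with self-loops: D^{-1/2}(A+I)D^{-1/2}
a_hat = adj + np.eye(n_stations)
d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
prop = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]

# Lagged design matrices; the graph convolution smooths them spatially.
X = np.stack([pm25[t - n_lags:t].T for t in range(n_lags, n_hours)])  # (T, S, L)
X_conv = prop @ X                                  # spatial mixing per sample
y = pm25[n_lags:]                                  # one-step-ahead targets

models = []
for s in range(n_stations):
    m = SVR(kernel="rbf", C=10.0, epsilon=1.0)     # epsilon-insensitive loss
    m.fit(X_conv[:, s, :], y[:, s])                # dampens outlier influence
    models.append(m)

preds = np.column_stack([m.predict(X_conv[:, s, :]) for s, m in enumerate(models)])
print("in-sample RMSE:", np.sqrt(np.mean((preds - y) ** 2)))
```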
  • PDF
    We develop a new methodology for model-based clustering. Optimizing the log-likelihood provides a principled statistical framework for clustering, with solutions found via the EM algorithm. However, because the log-likelihood is nonconvex, only convergence to stationary points can be guaranteed, and practitioners often use multiple starting points in the hope that one will converge to the global solution. We consider a new loss function based on entropic optimal transport that shares the same global optimum as the log-likelihood but has a much better-behaved landscape, thereby avoiding the spurious local optima that are pervasive for the log-likelihood. Like the EM algorithm for the log-likelihood, this new loss can be optimized by the Sinkhorn-EM algorithm, which we show converges at a rate comparable to that of EM. Through extensive numerical experiments and two real-world applications, image segmentation in C. elegans microscopy and clustering in spatial transcriptomics, we show that this new loss outperforms log-likelihood optimization, indicating that it is a valuable clustering methodology for practitioners.
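A minimal Sinkhorn-EM sketch for a Gaussian location mixture with known weights, assuming the entropic-OT E-step enforces the mixture proportions as column marginals of the transport plan; the regularization strength, iteration counts, and data below are illustrative assumptions, not the paper's:

```python
import numpy as np
from scipy.stats import multivariate_normal

def sinkhorn_responsibilities(log_lik, pi, eps=1.0, n_iter=100):
    """Entropic-OT projection: rows sum to 1/n, columns sum to pi."""
    n, k = log_lik.shape
    K = np.exp(log_lik / eps)              # Gibbs kernel from log-likelihoods
    u, v = np.ones(n), np.ones(k)
    for _ in range(n_iter):                # alternating marginal scalings
        u = (1.0 / n) / (K @ v)
        v = pi / (K.T @ u)
    gamma = u[:, None] * K * v[None, :]    # coupling; responsibilities = n*gamma
    return n * gamma

def sinkhorn_em(x, pi, mus, n_steps=50):
    k = len(pi)
    for _ in range(n_steps):
        log_lik = np.column_stack(
            [multivariate_normal.logpdf(x, mean=m) for m in mus])
        r = sinkhorn_responsibilities(log_lik, pi)             # E-step (Sinkhorn)
        mus = [r[:, j] @ x / r[:, j].sum() for j in range(k)]  # M-step: means
    return mus

rng = np.random.default_rng(1)
x = np.vstack([rng.normal([-2, 0], 1, (150, 2)), rng.normal([2, 0], 1, (150, 2))])
print(sinkhorn_em(x, pi=np.array([0.5, 0.5]),
                  mus=[np.array([0.1, 0.0]), np.array([-0.1, 0.0])]))
```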
  • PDF
    Modeling latent clinical constructs from unconstrained clinical interactions is a unique challenge in affective computing. We present ADAPTS (Agentic Decomposition for Automated Protocol-agnostic Tracking of Symptoms), a framework for automated rating of depression and anxiety severity using a mixture-of-agents LLM architecture. This approach decomposes long-form clinical interviews into symptom-specific reasoning tasks, producing auditable justifications while preserving temporal and speaker alignment. Generalization was evaluated across two independent datasets ($N=204$) with distinct interview structures. On high-discrepancy interviews, automated ratings approximated expert benchmarks ($\text{absolute error}=22$) more closely than the original human ratings ($\text{absolute error}=26$). Implementing an "extended" protocol that incorporates qualitative clinical conventions significantly stabilized ratings, with absolute agreement reaching $\text{ICC(2,1)} = 0.877$. These findings suggest that the ADAPTS framework enables promising automated evaluation of psychiatric severity. While the current implementation is purely text-based, the underlying architecture is readily extensible to multimodal inputs, including acoustic and visual features. By approximating expert-level precision in a protocol-agnostic manner, this framework provides a foundation for objective and scalable psychiatric assessment, especially in resource-limited settings.
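For concreteness, the ICC(2,1) statistic the abstract reports (Shrout and Fleiss two-way random-effects, single-rater, absolute agreement) can be computed as follows; the rating matrix is the classic Shrout and Fleiss example, not the paper's data:

```python
import numpy as np

def icc_2_1(scores):
    """ICC(2,1) for a (n_subjects, k_raters) rating matrix."""
    n, k = scores.shape
    grand = scores.mean()
    row_m = scores.mean(axis=1)            # per-subject means
    col_m = scores.mean(axis=0)            # per-rater means
    msr = k * np.sum((row_m - grand) ** 2) / (n - 1)   # between-subjects MS
    msc = n * np.sum((col_m - grand) ** 2) / (k - 1)   # between-raters MS
    sse = np.sum((scores - row_m[:, None] - col_m[None, :] + grand) ** 2)
    mse = sse / ((n - 1) * (k - 1))                    # residual MS
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

ratings = np.array([[9, 2, 5, 8],
                    [6, 1, 3, 2],
                    [8, 4, 6, 8],
                    [7, 1, 2, 6],
                    [10, 5, 6, 9],
                    [6, 2, 4, 7]], dtype=float)
print(round(icc_2_1(ratings), 2))   # Shrout & Fleiss example data give ~0.29
```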
  • PDF
    Forensic gait analysis can aid criminal investigations by comparing features of gait captured in video footage. Modelling the probative value of gait evidence requires an understanding of how features of gait vary between individuals in the population and within the same individual. We address this question using a previously described population dataset and newly collected datasets with repeated observations of the same individuals on separate occasions. In addition to exploring the level of variability, the correlation between features of gait, and the effect of demographic factors, we developed a likelihood ratio model by recoding features of gait as dichotomous variables and reducing dimensionality with PCA. High correlations between some features were observed, confirming that they should not contribute independently to the weight of evidence. The likelihood ratio model produced misleading likelihood ratios in fewer than 10% of the comparisons using the first four principal components. However, the risk increases when within-individual variability is mis-specified. Therefore, while the current model can assist the judgement of gait experts, human expertise remains indispensable for deciding whether differences in walking and/or recording conditions between the reference and questioned footage could have caused any observed differences in the features of gait. We discuss future directions for understanding the sources of the variability and improving the statistical modelling, and note the need to consider carefully how to select the relevant population for model fitting.
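One hedged sketch of this kind of likelihood-ratio model: binary gait features are projected onto a few principal components, and the LR compares Gaussian within-individual and between-individual models of the reference-questioned difference vector. Every modeling choice, name, and data point below is an illustrative assumption, not the paper's model:

```python
import numpy as np
from sklearn.decomposition import PCA
from scipy.stats import multivariate_normal

rng = np.random.default_rng(2)
n_people, n_feats, n_pcs = 200, 30, 4

# Population of dichotomous gait features, reduced to a few PCs.
traits = (rng.random((n_people, n_feats)) < rng.random(n_feats)).astype(float)
pca = PCA(n_components=n_pcs).fit(traits)
scores = pca.transform(traits)

within_cov = np.eye(n_pcs) * 0.05          # assumed within-individual wobble
between_cov = np.cov(scores.T)             # between-individual spread

def lr(reference, questioned):
    diff = reference - questioned
    num = multivariate_normal.pdf(diff, cov=2 * within_cov)                    # same source
    den = multivariate_normal.pdf(diff, cov=2 * within_cov + 2 * between_cov)  # different sources
    return num / den

same = scores[0] + rng.multivariate_normal(np.zeros(n_pcs), within_cov)
print("same-source LR:", lr(scores[0], same))
print("diff-source LR:", lr(scores[0], scores[1]))
```

Note the sensitivity the abstract warns about: shrinking or inflating `within_cov` directly rescales both densities, so a mis-specified within-individual model can push LRs to the misleading side.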
  • PDF
    This paper introduces the ensemble directional Kalman filter (EnDKF), an ensemble-based Kalman filtering approach for pose tracking that jointly estimates an object's position and attitude using ideas from directional statistics. The EnDKF integrates a unit-quaternion attitude representation to move beyond the canonical Kalman filter mean and covariance assumptions, which poorly capture directional uncertainty. Experiments on a synthetic constant-velocity, constant-angular-velocity system and a digital-twin head-tracking scenario using the FoundationPose algorithm demonstrate a significant reduction in error relative to using the raw measurements alone.
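One directional-statistics ingredient such a filter needs is an ensemble mean for unit quaternions; a naive componentwise average fails because $q$ and $-q$ encode the same rotation. A standard fix (shown only as a sketch of what an EnDKF-style filter might use) is the principal eigenvector of the averaged outer-product matrix:

```python
import numpy as np

def quaternion_mean(quats):
    """quats: (n, 4) array of unit quaternions, possibly with mixed signs."""
    M = quats.T @ quats / len(quats)       # 4x4 symmetric accumulator
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, -1]                     # eigenvector of the largest eigenvalue

rng = np.random.default_rng(3)
base = np.array([0.9, 0.1, 0.3, 0.2])
base /= np.linalg.norm(base)
ens = base + 0.05 * rng.normal(size=(100, 4))
ens /= np.linalg.norm(ens, axis=1, keepdims=True)
ens[::2] *= -1                             # flip half the signs: same rotations
print(quaternion_mean(ens))                # recovers +/- base despite the flips
```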
  • PDF
    Semicontinuous outcomes occur frequently in health services, insurance, and cost studies. Standard nonparametric density estimators are not well suited to such data because they do not naturally accommodate the mixed structure, the nonnegative support, or the pronounced boundary effects near zero. To address these limitations, we introduce an asymmetric kernel estimator for mixed densities on $[0,\infty)$ based on the Tweedie distribution. For a power parameter $p\in(1,2)$, the Tweedie kernel itself has a point mass at zero and an absolutely continuous component on $(0,\infty)$, yielding a unified smoothing construction that preserves the atom at zero and smooths the positive component using the full semicontinuous sample. We establish pointwise bias and variance expansions, derive asymptotic formulae for the mean squared error and mean integrated squared error, obtain optimal bandwidth rates, and prove asymptotic normality. We propose a profile least-squares cross-validation procedure to jointly select the bandwidth and the power parameter. Simulation results show competitive performance, particularly in challenging boundary-spike and heavy-tailed settings, and an application to emergency department length-of-stay data illustrates the practical value of the method.
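In sketch form, with notation assumed rather than taken from the paper, the construction exploits the compound Poisson-gamma structure of the Tweedie family for $p \in (1,2)$:

```latex
% For power parameter $p \in (1,2)$ the Tweedie law $\mathrm{Tw}_p(\mu,\phi)$
% is compound Poisson--gamma, so it mixes an atom at zero with a continuous
% component on $(0,\infty)$:
\[
  \Pr\{\mathrm{Tw}_p(\mu,\phi) = 0\}
    \;=\; \exp\!\Big(-\frac{\mu^{2-p}}{\phi\,(2-p)}\Big).
\]
% A generic asymmetric-kernel estimator built from this family would take
% the form
\[
  \hat f_n(y) \;=\; \frac{1}{n} \sum_{i=1}^{n} k_{p,b}\big(y \,;\, X_i\big),
\]
% where $k_{p,b}(\cdot\,;x)$ denotes a Tweedie density located at $x$ with
% bandwidth $b$, so $\hat f_n$ inherits both the point mass at $0$ and the
% smooth part on $(0,\infty)$ directly from the kernel.
```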
  • PDF
    Safety assessment plays a fundamental role in the development of a new drug through clinical trials, owing to ethical considerations. Because of the complexity of the data, manual review of the totality of the data is typically conducted to draw safety conclusions. Some existing quantitative methods facilitate or tailor further medical review, offering a controlled error rate and the integration of clinical knowledge. In addition to those two key aspects, we emphasize the importance of relying on substantial evidence to draw robust conclusions on safety. Motivated by these three important properties, we propose a two-layer Synergy Area with FDR-controlled Evaluation (SAFE) structural framework to robustly assess the safety profile in clinical trials. In the first layer of SAFE, we investigate each clinically meaningful Synergy Area (SA) based on compelling evidence. In the second layer, the false discovery rate (FDR) is controlled for potential findings across all SAs. Simulation studies show that SAFE properly controls error rates within and across SAs at the nominal level. We further apply the proposed approach to two case studies based on real data from the Historical Trial Data (HTD) Sharing Initiative of the DataCelerate platform. Compared with some direct methods, SAFE demonstrates an appealing ability to screen out extreme data and reach solid safety conclusions. It can act either as a building block in another framework or as a platform that incorporates additional components.
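As a stand-in for the FDR-controlling layer (the paper's exact multiplicity procedure may differ), the standard Benjamini-Hochberg step-up rule applied to one p-value per synergy area would look like this; the p-values are invented:

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Return a boolean mask of discoveries at FDR level alpha."""
    p = np.asarray(pvals)
    order = np.argsort(p)
    m = len(p)
    thresh = alpha * np.arange(1, m + 1) / m       # step-up thresholds
    below = p[order] <= thresh
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    mask = np.zeros(m, dtype=bool)
    mask[order[:k]] = True                         # reject the k smallest p-values
    return mask

sa_pvals = [0.001, 0.008, 0.039, 0.041, 0.09, 0.205, 0.4, 0.74]
print(benjamini_hochberg(sa_pvals, alpha=0.05))    # first two SAs flagged
```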
  • PDF
    Planned and ongoing searches for life, both biological and technological, confront an epistemic barrier concerning false positives: namely, that we don't know what we don't know. The most defensible and agnostic approach is to adopt diffuse (uninformative) priors, not only for the prevalence of life, but also for the prevalence of confounders. We evaluate the resulting Bayes factors between the null and life hypotheses for an idealized experiment with $N_{pos}$ positive labels (biosignature detections) among $N_{tot}$ targets under various priors. With diffuse priors, the consequences are catastrophic for life detection, requiring at least ${\sim}10^4$ (for some priors ${\sim}10^{13}$) surveyed targets to ever obtain "strong evidence" for life. Accordingly, an HWO-scale survey with $N_{tot}{\sim}25$ would have no prospect of achieving this goal. A previously suggested workaround is to forgo the agnostic confounder prior, for example by asserting some upper limit on it, but we find that the results can be highly sensitive to this choice, as well as difficult to justify. Instead, we suggest a novel solution that retains agnosticism: dividing the sample into two groups for which the prevalence of life differs but the confounder rate is global. We show that a $N_{tot}=24$ survey could expect 24% of possible outcomes to produce strong life detections with this strategy, rising to $\geq 50\%$ for $N_{tot}\geq 76$. However, A/B testing introduces its own unique challenges to survey design, requiring two groups with differing life prevalence rates (ideally greatly so) but a global confounder rate.
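A hedged sketch of the idealized calculation, under assumed modeling choices: each target is positive with probability $f$ under the null (confounders only) and $1-(1-f)(1-\ell)$ under the life hypothesis with prevalence $\ell$, with uniform priors on both rates. The paper's exact likelihood and priors may differ:

```python
import numpy as np
from scipy.integrate import quad, dblquad
from scipy.stats import binom

def bayes_factor(n_pos, n_tot):
    # Evidence under the null: marginalize the confounder rate f ~ U(0,1).
    ev_null = quad(lambda f: binom.pmf(n_pos, n_tot, f), 0, 1)[0]
    # Evidence under life: marginalize both f and the life prevalence l.
    ev_life = dblquad(
        lambda l, f: binom.pmf(n_pos, n_tot, 1 - (1 - f) * (1 - l)),
        0, 1, 0, 1)[0]
    return ev_life / ev_null

# Even a clean-looking outcome yields only a weak Bayes factor under diffuse
# priors, illustrating the barrier the abstract describes for small surveys.
print(bayes_factor(n_pos=10, n_tot=25))
```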
  • PDF
    Peptide nanofibrils (PNFs) and peptide amphiphiles (PAs) are promising tools for enhancing viral transduction and gene transfer. However, quantitative insight into how their supramolecular architecture governs virion-cell interactions is limited. Here, we introduce a framework for the acquisition, processing, and statistical analysis of scanning transmission electron microscopy (STEM) tomograms to objectively quantify peptide-virion-cell interactions. Using four transduction-enhancing peptides (D4, Vectofusin-1, palmitic acid-PA (pal-PA), and eicosapentaenoic-PA (eic-PA)), we analyzed peptide aggregate morphology, interfacial contact areas, and the spatial organization of virions with respect to peptides and cells using advanced geometric descriptors. All peptides efficiently captured virions, leaving few free virions, but they differed in how strictly virions were spatially confined near the cell surface. These differences reflect alternative spatial organization strategies, which are likely crucial factors influencing transduction-enhancing efficacy. Our approach provides a novel, generalizable method to evaluate infection-enhancing nanomaterials and guides the rational design of next-generation peptide assemblies for therapeutic viral delivery.
