Physics-Informed BERT-style Transformer for Multiscale PDE Modeling
Side-by-side animations of ground truth vs. PIBERT predictions over time — Cylinder Wake benchmark
A novel framework for solving multiscale partial differential equations (PDEs) that integrates hybrid spectral embeddings, physics-biased attention mechanisms, and self-supervised pretraining.
Three major innovations enable PIBERT to outperform existing approaches:

- **Hybrid Spectral Embeddings:** combine Fourier and wavelet transforms to capture both global patterns and localized features, enabling accurate multiscale representation.
- **Physics-Biased Attention:** incorporates PDE residuals directly into the attention calculation for physically consistent predictions and improved convergence.
- **Self-Supervised Pretraining:** includes Masked Physics Prediction (MPP) and Equation Consistency Prediction (ECP) tasks for robust generalization.

Additional strengths:

- Designed specifically for PDEs with rich multiscale behavior, capturing dynamic structures in a stable latent space.
- Works across hardware configurations, from consumer GPUs to high-end accelerators.
- Provides interpretable latent-space representations and attention patterns for physical insight.
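The hybrid spectral embedding idea can be sketched numerically: concatenate low-frequency Fourier magnitudes (global structure) with wavelet detail coefficients (localized features). The following is an illustrative NumPy sketch, not PIBERT's actual embedding layer; the function name, the one-level Haar wavelet, and the `k_modes` cutoff are assumptions.

```python
import numpy as np

def hybrid_spectral_embedding(field, k_modes=8):
    """Sketch of a hybrid Fourier + wavelet embedding for a 1D field."""
    field = np.asarray(field, dtype=float)
    # Global part: magnitudes of the first k_modes Fourier coefficients.
    fourier = np.abs(np.fft.rfft(field))[:k_modes]
    # Local part: one-level Haar detail coefficients
    # (scaled differences of adjacent sample pairs).
    pairs = field.reshape(-1, 2)
    detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2.0)
    # Concatenate global and local views into one embedding vector.
    return np.concatenate([fourier, detail])

x = np.linspace(0, 2 * np.pi, 64, endpoint=False)
emb = hybrid_spectral_embedding(np.sin(x) + 0.1 * np.sin(16 * x))
print(emb.shape)  # 8 Fourier magnitudes + 32 Haar details
```

The Fourier half summarizes smooth, global behavior, while the Haar details flag sharp local variation, which is the complementarity the hybrid embedding exploits.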
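Physics-biased attention can likewise be sketched: subtract a penalty proportional to each token's PDE residual magnitude from the attention logits before the softmax, so tokens that violate the governing equation receive less weight. This is a minimal NumPy sketch under assumptions; the additive bias form and the `beta` scale are illustrative, not PIBERT's exact mechanism.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def physics_biased_attention(q, k, v, residual, beta=1.0):
    """Scaled dot-product attention with a residual-based bias on the keys."""
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)                       # standard attention logits
    logits = logits - beta * np.abs(residual)[None, :]  # penalize high-residual keys
    weights = softmax(logits, axis=-1)
    return weights @ v, weights

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(4, 8)) for _ in range(3))
# Pretend token 1 has a large PDE residual.
out, w = physics_biased_attention(q, k, v, residual=np.array([0.0, 5.0, 0.0, 0.0]))
print(out.shape, w[:, 1].max())
```

Because the bias enters the logits additively, the rows still sum to one after the softmax; the high-residual token is merely down-weighted rather than masked out.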
PIBERT demonstrates state-of-the-art performance across multiple benchmarks.

Relative L1/L2 error and MAE (lower is better):
| Model | Relative L1 | Relative L2 | MAE |
|---|---|---|---|
| PINN | 0.0651 | 0.0803 | 0.0581 |
| FNO | 0.0123 | 0.0150 | 0.0100 |
| Transformer | 0.0225 | 0.0243 | 0.0200 |
| PINNsFormer | 0.0065 | 0.0078 | 0.0060 |
| PIBERT | 0.0061 | 0.0074 | 0.0056 |
MSE on the velocity components (u, v) and pressure (p); lower is better:

| Model | MSE (u) | MSE (v) | MSE (p) |
|---|---|---|---|
| PINN | 0.0500 | 0.0300 | 0.01500 |
| Spectral PINN | 0.0200 | 0.0045 | 0.00085 |
| FNO | 0.0113 | 0.0012 | 0.00021 |
| PINNsFormer | 0.0065 | 0.0007 | 0.00003 |
| PIBERT (Lite) | 0.0103 | 0.0011 | 0.000046 |
Ablation study isolating PIBERT's components, with FNO and UNet baselines for reference (lower is better):

| Model Variant | MSE (Test) | NMSE (Test) |
|---|---|---|
| PIBERT (Full) | 0.4975 | 1.3409 |
| Fourier-only | 1.6520 | 12.4010 |
| Wavelet-only | 0.4123 | 1.1021 |
| Standard-attention | 1.3201 | 9.8760 |
| FNO | 1.8099 | 13.5830 |
| UNet | 3.7006 | 29.2627 |
Hardware requirements:

| Task | Minimum (RTX 3060) | Recommended (A100) | Notes |
|---|---|---|---|
| Model Inference (2D) | ✓ | ✓ | 64×64 grids work on both |
| Model Training (2D) | ✓ (small batches) | ✓ | RTX 3060 requires gradient checkpointing |
| 3D Problem Inference | ✗ | ✓ | Requires 40+ GB VRAM |
| Pretraining | ✗ | ✓ | Not feasible on consumer GPUs |
PIBERT was evaluated on two real-world fluid simulation benchmarks: Cylinder Wake and Fluid–Structure Interaction (FSI).
Cylinder Wake (aggregate metrics):

| Model | MSE (All) | NMSE (All) | LMAE (All) | LPCC (All) | R² (All) |
|---|---|---|---|---|---|
| PIBERT | 0.06280 | 0.05875 | 0.11448 | 0.97019 | 0.94123 |
| FourierFlow | 0.06341 | 0.05931 | 0.11797 | 0.96989 | 0.94066 |
| FNO2d | 0.06417 | 0.06003 | 0.13216 | 0.96959 | 0.93995 |
| PITT | 0.12538 | 0.11729 | 0.19095 | 0.94118 | 0.88267 |
| DeepONet2d | 0.21216 | 0.19846 | 0.26190 | 0.89658 | 0.80146 |
| PINN | 0.42694 | 0.39938 | 0.29292 | 0.77492 | 0.60046 |
PIBERT achieves the best scores on all aggregate Cylinder Wake metrics. Lower is better for MSE/NMSE/LMAE; higher is better for LPCC/R².
FSI (aggregate metrics):

| Model | MSE (All) | NMSE (All) | LMAE (All) | LPCC (All) | R² (All) |
|---|---|---|---|---|---|
| PIBERT | 0.000206 | 0.000270 | 0.008640 | 0.999864 | 0.999729 |
| FourierFlow | 0.000307 | 0.000402 | 0.010626 | 0.999797 | 0.999595 |
| PINN | 0.000442 | 0.000580 | 0.011629 | 0.999708 | 0.999416 |
| FNO2d | 0.001716 | 0.002248 | 0.027409 | 0.998873 | 0.997736 |
| PITT | 0.035803 | 0.046901 | 0.101337 | 0.976101 | 0.952772 |
| DeepONet2d | 0.080288 | 0.105176 | 0.172149 | 0.945567 | 0.894091 |
PIBERT achieves best performance on all FSI metrics. Significant margin over baselines on NMSE (0.000270 vs. next-best 0.000402).
Cylinder Wake (per-component metrics):

| Model | MSE (u) | MSE (v) | NMSE (u) | NMSE (v) | R² (u) | R² (v) |
|---|---|---|---|---|---|---|
| PIBERT | 0.03321 | 0.09240 | 0.02993 | 0.08984 | 0.97002 | 0.91016 |
| FourierFlow | 0.03023 | 0.09658 | 0.02725 | 0.09391 | 0.97271 | 0.90609 |
| FNO2d | 0.03439 | 0.09394 | 0.03100 | 0.09134 | 0.96895 | 0.90866 |
| PITT | 0.06295 | 0.18781 | 0.05673 | 0.18261 | 0.94317 | 0.81739 |
| DeepONet2d | 0.10205 | 0.32226 | 0.09198 | 0.31334 | 0.90787 | 0.68666 |
| PINN | 0.09934 | 0.75454 | 0.08954 | 0.73364 | 0.91032 | 0.26636 |
FSI (per-component metrics):

| Model | MSE (u) | MSE (v) | NMSE (u) | NMSE (v) | R² (u) | R² (v) |
|---|---|---|---|---|---|---|
| PIBERT | 0.000103 | 0.000309 | 0.000143 | 0.000381 | 0.999851 | 0.999619 |
| FourierFlow | 0.000141 | 0.000473 | 0.000197 | 0.000584 | 0.999796 | 0.999416 |
| PINN | 0.000156 | 0.000729 | 0.000218 | 0.000899 | 0.999774 | 0.999100 |
| FNO2d | 0.001486 | 0.001947 | 0.002074 | 0.002402 | 0.997846 | 0.997597 |
| PITT | 0.017610 | 0.053995 | 0.024581 | 0.066635 | 0.974474 | 0.933340 |
| DeepONet2d | 0.036055 | 0.124522 | 0.050325 | 0.153672 | 0.947739 | 0.846269 |
PIBERT's performance on the benchmark problems is also shown qualitatively, with side-by-side comparisons of ground truth vs. PIBERT predictions over time.
Get started with PIBERT in minutes:
```bash
# Basic installation
pip install pibert

# For development with testing and documentation tools
pip install "pibert[dev]"

# For full functionality including wavelet transforms
pip install "pibert[full]"
```
```python
from pibert import PIBERT
from pibert.utils import load_dataset

# Load a small sample dataset
dataset = load_dataset("reaction_diffusion")

# Initialize a small model
model = PIBERT(
    input_dim=1,
    hidden_dim=64,
    num_layers=2,
    num_heads=4,
)

# Perform prediction
pred = model.predict(dataset["test"]["x"][:1], dataset["test"]["coords"][:1])
print(f"Prediction shape: {pred.shape}")
```
All results in the paper can be reproduced using the provided code:
```bash
jupyter notebook examples/ablation_study_gpu.ipynb
```
If you find PIBERT useful in your research, please cite our paper:
```bibtex
@article{chakraborty2025pibert,
  title={PIBERT: A Physics-Informed Transformer with Hybrid Spectral Embeddings for Multiscale PDE Modeling},
  author={Chakraborty, Somyajit and Pan, Ming and Chen, Xizhong},
  year={2025}
}
```
For support and questions, please open an issue on GitHub or contact the authors:
- Somyajit Chakraborty: chksomyajit@sjtu.edu.cn
- Pan Ming: panming@sjtu.edu.cn
- Chen Xizhong: chenxizh@sjtu.edu.cn