A pure Rust implementation of TensorFlow, providing a full-featured machine learning framework with Rust's safety and performance.
v0.1.0 (2026-03-20)
TenfloweRS v0.1.0 is the first release, with 12,949 tests passing across 6 crates, zero clippy warnings, zero security vulnerabilities, and comprehensive documentation.
TenfloweRS is a native Rust machine learning framework inspired by TensorFlow, designed to bring the power of deep learning to the Rust ecosystem. It leverages Rust's memory safety, zero-cost abstractions, and excellent performance while maintaining compatibility with the broader ML ecosystem through ONNX support.
TenfloweRS adapts TensorFlow's proven architecture to Rust's strengths:
- Memory Safety First: All operations are memory-safe by design, eliminating segfaults and data races
- Zero-Cost Abstractions: High-level APIs compile down to efficient machine code
- Explicit over Implicit: Clear ownership and error handling following Rust conventions
- Modular Architecture: Organized as a workspace of focused, reusable crates
- Cross-Platform: Native support for Windows, macOS, and Linux with unified GPU abstraction
- Pure Rust: No C/Fortran dependencies in the default build -- the entire stack is 100% Rust
| TensorFlow Concept | TenfloweRS Implementation |
|---|---|
| `tf.Tensor` | `Tensor<T>` with static typing |
| `tf.Operation` | `Op` trait with registered kernels |
| `tf.Graph` | `Graph` struct with ownership semantics |
| `tf.Session` | `Session` trait for graph execution |
| `tf.GradientTape` | `GradientTape` for automatic differentiation |
| `tf.keras.Layer` | `Layer` trait with builder pattern |
| `tf.data.Dataset` | Iterator-based `Dataset` trait |
| `tf.device` | `Device` enum with placement control |
- Dual Execution Modes: Both eager execution (PyTorch-style) and static computation graphs (TensorFlow-style)
- Pure Rust Implementation: No C/C++ dependencies in the core, ensuring memory safety
- GPU Support: Cross-platform GPU acceleration via WGPU (Metal, Vulkan, DirectX)
- Rust Scientific Stack: Built on NumRS2 and SciRS2 for numerical computing
- Python Bindings: PyO3-based FFI crate with 48 passing tests
- Tensorboard Integration: Pure Rust implementation with no protobuf dependency
- ONNX Support: Import and export models for cross-framework compatibility
- Performance: SIMD vectorization, optional BLAS integration, and parallel execution
- 150+ Research Domains: From transformers and diffusion models to quantum ML and protein structure prediction
- Production Ready: 12,949 tests passing, 0 security vulnerabilities, comprehensive docs
Current Version: 0.1.0 (Released 2026-03-20)
First release with full-featured ML capabilities across all 6 crates.
- Tests: 12,949 passing (100% pass rate)
- Code: 1,453 Rust files, ~641K lines of Rust code
- Security: 0 vulnerabilities
- Clippy: 0 warnings, 0 errors
- Rustdoc: Builds clean with `-D warnings`
- TODO markers: 0 remaining
| Crate | Tests | Status | Description |
|---|---|---|---|
| tenflowers-core | 675 | Stable | Core tensor operations and GPU support |
| tenflowers-autograd | 334 | Stable | Automatic differentiation engine |
| tenflowers-neural | 11,407 | Stable | Neural network layers, models, and 150+ research domains |
| tenflowers-dataset | 472 | Stable | Data loading and preprocessing |
| tenflowers-ffi | 48 | Stable | Python bindings via PyO3 |
| tenflowers | 13 (doc) | Stable | Unified API and prelude |
- Core tensor operations fully tested and validated
- Automatic differentiation engine with comprehensive gradient support
- Neural network layers (Dense, Conv2D, BatchNorm, Dropout, Attention, RNN, GNN, Transformers, and many more)
- Training utilities (optimizers including SGD, Adam, AdamW, LAMB, Lion, Muon; loss functions; training loops; LR schedulers)
- Data loading pipeline with multi-format support
- GPU acceleration via WGPU (cross-platform)
- SciRS2/NumRS2 ecosystem integration
- Python bindings with PyO3 (48 tests passing)
- Tensorboard logging (pure Rust, no protobuf dependency)
- Security hardening (zero vulnerabilities)
- Comprehensive documentation
The neural crate alone has 11,407 tests covering:
- Core architectures: attention mechanisms (multi-head, flash, ALiBi, RoPE), RNN (LSTM, GRU, bidirectional), transformers (encoder, decoder, efficient variants including RetNet, Mamba-2, GQA), CNN, graph neural networks (GCN, GAT, GraphSAGE, GIN, and advanced variants)
- Generative models: normalizing flows, diffusion models, GANs, VAEs, energy-based models, neural rendering (3D Gaussian splatting, NeRF)
- Reinforcement learning: policy gradient, actor-critic, PPO, SAC, multi-agent RL, safe RL, inverse RL, reward shaping, world models
- Scientific ML: physics-informed neural networks (PINNs), neural ODEs/SDEs, operator learning (FNO, DeepONet, WNO, GNO), differentiable physics, simulation-based inference
- Domain-specific: molecular GNN, protein structure prediction, drug discovery, medical imaging, audio models, speech recognition, video understanding, geospatial ML, climate ML, satellite ML, digital pathology, bio ML
- Advanced methods: Bayesian deep learning, federated learning, meta-learning, NAS, knowledge distillation, quantum ML, geometric deep learning, causal inference, optimal transport, topological ML, continual learning, active learning, conformal prediction, and many more
Add TenfloweRS to your `Cargo.toml`:

```toml
[dependencies]
tenflowers-core = "0.1.0"
tenflowers-neural = "0.1.0"
```

For GPU support:

```toml
[dependencies]
tenflowers-core = { version = "0.1.0", features = ["gpu"] }
```

For the unified API:

```toml
[dependencies]
tenflowers = "0.1.0"
```

```rust
use tenflowers_core::{Tensor, Device, Context};
use tenflowers_autograd::GradientTape;

// Create a context for eager execution
let ctx = Context::new()?;

// Create tensors
let a = Tensor::<f32>::ones(&[2, 3]);
let b = Tensor::<f32>::from_vec(vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0], &[2, 3])?;

// Operations execute immediately in eager mode
let c = a.add(&b)?;
let d = c.matmul(&b.transpose()?)?;

// Move to GPU
let gpu_tensor = a.to(Device::Gpu(0))?;

// Automatic differentiation
let tape = GradientTape::new();
let x = Tensor::variable(vec![1.0, 2.0, 3.0], &[3]);
let y = tape.watch(x.clone());
let z = y.pow(2.0)?;
let grads = tape.gradient(&z, &[&x])?;
```

```rust
use tenflowers_core::{Graph, Session, Placeholder};

// Build a computation graph
let graph = Graph::new();
let a = graph.placeholder::<f32>("input_a", &[None, 784])?;
let w = graph.variable("weights", &[784, 10])?;
let b = graph.variable("bias", &[10])?;
let y = a.matmul(&w)?.add(&b)?;

// Create a session and run
let session = Session::new(&graph)?;
session.run(
    &[("input_a", input_tensor)],
    &["output"],
    &mut outputs,
)?;
```

```rust
use tenflowers_neural::{Sequential, Dense, Conv2D, Model};
use tenflowers_core::Tensor;

// Define a CNN for image classification
let mut model = Sequential::new(vec![
    Box::new(Conv2D::new(32, (3, 3)).with_activation("relu")),
    Box::new(Conv2D::new(64, (3, 3)).with_activation("relu")),
    Box::new(layers::GlobalAveragePooling2D::new()),
    Box::new(Dense::new(128, true).with_activation("relu")),
    Box::new(layers::Dropout::new(0.5)),
    Box::new(Dense::new(10, true).with_activation("softmax")),
]);

// Compile the model
model.compile(
    optimizer::Adam::new(0.001),
    loss::SparseCategoricalCrossentropy::new(),
    vec![metrics::Accuracy::new()],
)?;

// Train the model
model.fit(
    &train_dataset,
    epochs: 10,
    batch_size: 32,
    validation_data: Some(&val_dataset),
)?;
```

```rust
use tenflowers_dataset::{Dataset, DataLoader};

// Create a dataset from tensors
let dataset = Dataset::from_tensor_slices((images, labels))?
    .shuffle(1000)
    .batch(32)
    .prefetch(2);

// Iterate through batches
for (batch_images, batch_labels) in dataset.iter() {
    // Training step
}
```

TenfloweRS follows a modular architecture inspired by TensorFlow:
```
tenflowers/
├── tenflowers-core/       # Core tensor operations and device management
│   ├── tensor/            # Tensor implementation with device support
│   ├── ops/               # Operation registry and implementations
│   ├── kernels/           # CPU and GPU kernel implementations
│   ├── graph/             # Computation graph representation
│   └── device/            # Device abstraction and management
├── tenflowers-autograd/   # Automatic differentiation engine
│   ├── tape/              # GradientTape for eager mode
│   ├── graph_grad/        # Graph-based backpropagation
│   └── ops/               # Gradient definitions for operations
├── tenflowers-neural/     # Neural network layers, models, and research domains
│   ├── layers/            # Layer implementations (attention, RNN, GNN, etc.)
│   ├── optimizers/        # Training optimizers (SGD, Adam, LAMB, Lion, Muon)
│   ├── rl/                # Reinforcement learning
│   ├── federated/         # Federated learning
│   ├── diffusion/         # Diffusion models
│   ├── graph_neural_ode/  # Neural ODE on graphs
│   └── ...                # 150+ research domain modules
├── tenflowers-dataset/    # Data loading and preprocessing
│   ├── sources/           # Data source implementations
│   ├── transforms/        # Data transformation ops
│   └── iterators/         # Efficient iteration strategies
├── tenflowers-ffi/        # Python bindings via PyO3
│   └── src/               # Python-facing API
└── tenflowers/            # Unified API crate and prelude
```
- Reference-counted tensors with device placement
- Lazy allocation and memory pooling
- Zero-copy views and slicing
- Automatic broadcasting
- Extensible operation registry
- Multi-dispatch for device/dtype specialization
- Shape inference at graph construction time
- Automatic gradient registration
- Eager Mode: Operations execute immediately
- Graph Mode: Build once, run multiple times with optimization
- Unified API for CPU, GPU, and custom devices
- Automatic device placement with hints
- Cross-device memory transfers
- Multi-GPU support with collective operations
```bash
# Clone the repository
git clone https://github.com/cool-japan/tenflowers
cd tenflowers

# Build all crates
cargo build --workspace

# Run tests (requires cargo-nextest)
cargo nextest run --workspace

# Build with GPU support
cargo build --workspace --features gpu

# Build with BLAS acceleration (pure Rust)
cargo build --workspace --features blas-oxiblas

# Check for warnings (must pass -- no-warnings policy)
cargo check --workspace
cargo clippy --workspace -- -D warnings

# Build documentation
cargo doc --workspace --no-deps
```

Check out the examples directory for usage examples:

- `mnist_eager.rs` - MNIST classification with eager execution
TenfloweRS is designed for high performance:
- CPU: SIMD vectorization, optional BLAS integration (OxiBLAS), Rayon parallelization
- GPU: WGPU compute shaders, memory pooling, kernel fusion
- Memory: Zero-copy operations, buffer reuse, lazy allocation
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
Key areas where we need help:
- GPU kernel development and optimization
- Performance benchmarking
- Documentation and examples
- Testing edge cases
- Python API expansion (tenflowers-ffi)
- Open an issue to discuss your contribution
- Follow the no-warnings policy (clippy must pass with `-D warnings`)
- Write tests, including gradient checks where applicable
- Ensure zero `unwrap()` usage in production code
- Submit a PR with a clear description
- Core tensor operations and autograd
- 150+ neural network research domains
- GPU support via WGPU
- Python bindings via PyO3
- 12,949 tests, 0 warnings, 0 vulnerabilities
- Graph optimization passes (constant folding, operator fusion, dead code elimination)
- Expanded GPU kernel coverage
- Performance benchmarking suite with CI gates
- ONNX import/export finalization
- Multi-GPU orchestration improvements
- API stability improvements toward 1.0
- Stable public API with semantic versioning guarantees
- Comprehensive ONNX compatibility
- Production deployment tooling
- WASM compilation target
| Feature | TensorFlow | TenfloweRS |
|---|---|---|
| Language | C++ with Python API | Pure Rust with Python bindings |
| Memory Safety | Manual management | Guaranteed by Rust |
| Execution | Eager + Graph | Eager + Graph |
| GPU Support | CUDA, ROCm | WGPU (cross-platform) |
| Autodiff | Tape + Graph | Tape + Graph |
| Deployment | TFLite, TF.js | Native, WASM (planned) |
| Ecosystem | Mature, extensive | Growing, Rust-focused |
TenfloweRS is developed and maintained by COOLJAPAN OU (Team KitaSan).
If you find TenfloweRS useful, please consider sponsoring the project to support continued development of the Pure Rust ecosystem.
https://github.com/sponsors/cool-japan
Your sponsorship helps us:
- Maintain and improve the COOLJAPAN ecosystem
- Keep the entire ecosystem (OxiBLAS, OxiFFT, SciRS2, etc.) 100% Pure Rust
- Provide long-term support and security updates
This project is licensed under the Apache License, Version 2.0 (LICENSE).
TenfloweRS builds upon the excellent Rust scientific computing ecosystem:
- NumRS2 for n-dimensional arrays
- SciRS2 for scientific algorithms
- OxiBLAS for pure Rust BLAS
- OxiFFT for pure Rust FFT
- WGPU for GPU compute
Special thanks to the TensorFlow team for the inspiration and architectural patterns.
- GitHub Issues: Bug reports and feature requests
- Discussions: Community forum
Note: TenfloweRS is not affiliated with Google's TensorFlow. It is an independent project bringing ML capabilities to Rust.