DASMatrix is a high-performance Python framework specifically designed for Distributed Acoustic Sensing (DAS) data processing and analysis. This framework provides a comprehensive toolkit for reading, processing, analyzing, and visualizing DAS data, suitable for research and applications in geophysics, structural health monitoring, and security surveillance.
- High-Efficiency Data Reading: Support for 12+ data formats (DAT, HDF5, PRODML, Silixa, Febus, Terra15, APSensing, ZARR, NetCDF, SEG-Y, MiniSEED, TDMS) with lazy loading
- HPC Engine: Built on Xarray and Dask for TB-scale out-of-core processing with Numba JIT-optimized kernels and operator fusion
- Fluent Chainable API: Intuitive signal processing workflows through DASFrame
- AI Inference Integration: Native support for PyTorch and ONNX models with high-performance inference pipelines
- Professional Signal Processing: Comprehensive tools including spectral analysis, filtering, integration, and event detection
- Intelligent Agent Tools: AI agent toolkit supporting natural-language-driven deep analysis and automated discovery
- Scientific-Grade Visualization: Multiple plot types including time-domain waveforms, spectra, spectrograms, and waterfalls
- Unit System: First-class physical unit support via Pint integration
- Built-in Examples: Easy generation of synthetic data (sine waves, chirps, events) for testing
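The synthetic-signal generation mentioned above can be approximated in a few lines of standard-library Python. The sketch below (the `make_chirp` function is hypothetical, not DASMatrix's API) builds a linear chirp of the kind used for testing:

```python
import math

def make_chirp(fs=1000, duration=1.0, f0=1.0, f1=100.0):
    """Linear chirp sketch: instantaneous frequency sweeps f0 -> f1 Hz."""
    n = int(fs * duration)
    k = (f1 - f0) / duration  # sweep rate in Hz per second
    # Phase of a linear chirp: 2*pi*(f0*t + 0.5*k*t^2)
    return [math.sin(2 * math.pi * (f0 * t + 0.5 * k * t * t))
            for t in (i / fs for i in range(n))]

sig = make_chirp()
print(len(sig))  # 1000
```

A real generator would return a `DASFrame` with channel and time coordinates attached; this sketch only shows the signal math.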
```bash
# Clone the repository
git clone https://github.com/QIanGua/DASMatrix.git
cd DASMatrix

# Install with uv (automatically creates a virtual environment)
uv sync
```

```bash
# Clone the repository
git clone https://github.com/QIanGua/DASMatrix.git
cd DASMatrix

# Install with pip
pip install -e .
```

```python
from DASMatrix import df

# Create DASFrame with lazy loading
frame = df.read("data.h5")

# Build processing pipeline
processed = (
    frame
    .detrend(axis="time")  # Remove trend
    .bandpass(1, 500)      # Bandpass filter
    .normalize()           # Normalize
)

# Advanced analysis (modern STFT API)
stft_frame = frame.stft(nperseg=1024, noverlap=512)

# Execute computation (parallelized via Dask)
result = processed.collect()

# Optional: use HybridEngine for supported ops:
#   slice, detrend(time), demean(time), abs, scale, normalize,
#   bandpass, lowpass, highpass, notch, fft, hilbert,
#   fk_filter, median_filter, stft
result_hybrid = processed.collect(engine="hybrid")

# Scientific visualization (with auto-decimation protection)
processed.plot_heatmap(title="HPC Waterfall", max_samples=2000)
```

The project now standardizes on snake_case APIs. Legacy CamelCase methods remain available for one compatibility cycle and emit a `DeprecationWarning`.
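The auto-decimation guard behind `plot_heatmap(max_samples=...)` can be sketched as simple stride-based downsampling. This is a hypothetical standalone function, not DASMatrix's internal implementation:

```python
def decimate_for_plot(samples, max_samples=2000):
    """Keep at most max_samples points by striding over the input."""
    step = max(1, len(samples) // max_samples)
    return samples[::step]

data = list(range(10_000))
small = decimate_for_plot(data)
print(len(small))  # 2000
```

Production decimation for waterfalls typically also aggregates (e.g., min/max per bin) so short transients survive downsampling; plain striding is only the simplest form.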
| Legacy API | Preferred API |
|---|---|
| `reader.ReadRawData(path)` | `reader.read_raw_data(path)` |
| `processor.FKFilter(...)` | `processor.fk_filter(...)` |
| `processor.ComputeSpectrum(...)` | `processor.compute_spectrum(...)` |
| `processor.FindPeakFrequencies(...)` | `processor.find_peak_frequencies(...)` |
| `DASMatrix.api.stream_func(...)` | `DASMatrix.stream(...)` |
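During migration it can help to escalate `DeprecationWarning` to an error so any remaining legacy call sites fail loudly in tests. A minimal standard-library sketch; the legacy stand-in function here is hypothetical, standing in for DASMatrix's deprecated CamelCase methods:

```python
import warnings

def ReadRawData_legacy(path):
    # Stand-in for a legacy CamelCase method; per the table above,
    # DASMatrix's legacy methods emit DeprecationWarning like this.
    warnings.warn("ReadRawData is deprecated; use read_raw_data",
                  DeprecationWarning, stacklevel=2)
    return path

# Escalate deprecation warnings to errors to locate legacy call sites
with warnings.catch_warnings():
    warnings.simplefilter("error", DeprecationWarning)
    try:
        ReadRawData_legacy("data.dat")
    except DeprecationWarning as exc:
        print(f"legacy call found: {exc}")
```

In a pytest suite the same effect is usually achieved with a `filterwarnings = error::DeprecationWarning` setting rather than an explicit context manager.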
```python
from DASMatrix.acquisition import DASReader, DataType
from DASMatrix.config import SamplingConfig

# Configure sampling parameters
config = SamplingConfig(
    fs=10000,         # Sampling frequency: 10 kHz
    channels=512,     # 512 channels
    wn=5.0,           # 5 Hz high-pass filter
    byte_order="big"
)

# Read data
reader = DASReader(config, DataType.DAT)
raw_data = reader.read_raw_data("path/to/data.dat")
```

```python
from DASMatrix.visualization import DASVisualizer
import matplotlib.pyplot as plt

# Create visualizer
visualizer = DASVisualizer(
    output_path="./output",
    sampling_frequency=config.fs
)

# Time-domain waveform
visualizer.WaveformPlot(
    data[:, 100],                 # Channel 100 data
    time_range=(0, 10),           # Show 0-10 seconds
    amplitude_range=(-0.5, 0.5),
    title="Waveform Plot",
    file_name="waveform_ch100"
)

# Spectrum plot
visualizer.SpectrumPlot(
    data[:, 100],
    title="Spectrum Plot",
    db_range=(-80, 0),
    file_name="spectrum_ch100"
)

# Spectrogram
visualizer.SpectrogramPlot(
    data[:, 100],
    freq_range=(0, 500),
    time_range=(0, 10),
    cmap="inferno",
    file_name="spectrogram_ch100"
)

# Waterfall plot (time-channel)
visualizer.WaterfallPlot(
    data,
    title="Waterfall Plot",
    colorbar_label="Amplitude",
    value_range=(-0.5, 0.5),
    file_name="waterfall"
)

plt.show()
```

```python
from DASMatrix.ml.model import ONNXModel

# Load optimized model
model = ONNXModel("path/to/model.onnx")

# Predict directly in the processing chain
predictions = (
    df.read("data.h5")
    .bandpass(10, 100)
    .normalize()
    .predict(model)  # Returns inference results
)

# Use intelligent agent tools
from DASMatrix.agent import DASAgentTools

agent_tools = DASAgentTools()

# Inference orchestrated by an LLM-based agent via natural language
result = agent_tools.run_inference(data_id="...", model_path="...")
```

- Full Documentation: Complete API reference and tutorials
- Examples: Practical usage examples
- API Reference: Detailed API documentation
- 中文文档: Chinese documentation
```
DASMatrix/
├── acquisition/            # Data acquisition module
│   ├── formats/            # Format plugins
│   └── das_reader.py       # DAS data reader class
├── api/                    # Core API
│   ├── dasframe.py         # DASFrame (Xarray/Dask backend)
│   └── df.py               # Functional API entry points
├── ml/                     # [NEW] AI/machine learning module
│   ├── model.py            # Model wrappers (Torch/ONNX)
│   ├── pipeline.py         # Inference pipelines
│   └── exporter.py         # Model export utilities
├── agent/                  # [NEW] Agent engineering framework
│   ├── tools.py            # Intelligent analysis toolkit
│   └── session.py          # Task session management
├── config/                 # Configuration module
│   ├── sampling_config.py       # Sampling configuration
│   └── visualization_config.py  # Visualization configuration
├── processing/             # Data processing module
│   ├── das_processor.py    # DAS data processor class
│   ├── numba_filters.py    # Numba-optimized filters
│   └── engine.py           # Computation graph engine
├── visualization/          # Visualization module
│   └── das_visualizer.py   # DAS visualization class
├── units.py                # Unit system (Pint-based)
├── examples.py             # Example data generation
└── utils/                  # Utility functions
    └── time.py             # Time conversion tools
```
DASMatrix is engineered for massive DAS datasets:
- Zero-Copy Loading: Uses `np.memmap` for binary formats to index TBs of data in milliseconds.
- Kernel Fusion: Multiple operations (e.g., `demean -> filter -> abs`) are fused into a single machine-code loop via Numba, minimizing memory traffic.
- Lazy Computation Graph: Every operation returns a lazy `DASFrame`; real computation happens only when you explicitly call `collect()` or `plot()`.
- Auto-Decimation: Interactive visualization of huge datasets is protected by automatic downsampling to keep the UI responsive.
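The lazy-graph and kernel-fusion ideas combine naturally: operations are only recorded, then executed in a single pass over the data at `collect()` time. A toy standard-library sketch (the class name and op set are illustrative, not DASMatrix internals, which use Dask graphs and Numba-compiled kernels):

```python
class LazyFrame:
    """Minimal lazy pipeline: ops are recorded, not executed."""

    def __init__(self, data, ops=()):
        self.data = list(data)
        self.ops = list(ops)

    def demean(self):
        return LazyFrame(self.data, self.ops + ["demean"])

    def abs(self):
        return LazyFrame(self.data, self.ops + ["abs"])

    def scale(self, k):
        return LazyFrame(self.data, self.ops + [("scale", k)])

    def collect(self):
        # Reductions needed by recorded ops are computed up front
        mean = sum(self.data) / len(self.data) if "demean" in self.ops else 0.0
        out = []
        for x in self.data:  # single "fused" pass: one loop, no temporaries
            for op in self.ops:
                if op == "demean":
                    x -= mean
                elif op == "abs":
                    x = abs(x)
                elif isinstance(op, tuple) and op[0] == "scale":
                    x *= op[1]
            out.append(x)
        return out

frame = LazyFrame([1.0, 2.0, 3.0]).demean().abs().scale(10)
print(frame.collect())  # [10.0, 0.0, 10.0]
```

Note that building `frame` touches no data; each chained call just appends to the op list, and the fused loop runs only inside `collect()`.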
```bash
# Install development dependencies
uv sync --dev

# Run tests
just test

# If tests hang at "collecting ..." due to the Matplotlib font cache,
# use a writable cache directory:
MPLCONFIGDIR=/tmp/mplcache MPLBACKEND=Agg just test

# If uv crashes on macOS due to SystemConfiguration proxy access,
# ensure a system proxy is configured (example uses a local proxy):
sudo networksetup -setwebproxy "Wi-Fi" 127.0.0.1 7890
sudo networksetup -setsecurewebproxy "Wi-Fi" 127.0.0.1 7890
sudo networksetup -setsocksfirewallproxy "Wi-Fi" 127.0.0.1 7890

# Restore proxy settings:
sudo networksetup -setwebproxystate "Wi-Fi" off
sudo networksetup -setsecurewebproxystate "Wi-Fi" off
sudo networksetup -setsocksfirewallproxystate "Wi-Fi" off

# Run performance benchmarks
just benchmark

# Code quality checks
just check-all

# Quick fixes
just fix-all
```

- Ruff: Linting and formatting
- MyPy: Type checking
- Pre-commit hooks: Automatic code quality checks
- GitHub Actions: CI/CD pipeline
We welcome contributions, issues, and suggestions! Please participate in project development through GitHub Issues and Pull Requests.
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License.
