Elephant rumble denoising for recordings with mechanical noise (vehicles, aircraft, generators). Multi-stage DSP pipeline + U-Net spectrogram masking with harmonic template input and overlap-aware fusion.
Training phase (uses CSV annotations):
- `pipeline.py` runs Stages A–D (spectral gating, harmonic extraction, Wiener masking, optional U-Net) to produce the cleanest possible training targets
- `dataset.py --cache` builds training data (noisy/clean pairs + harmonic templates + overlap labels)
- `train_unet.py` trains the 2-channel U-Net to predict a soft mask (bounded below by `UNET_MASK_OUTPUT_FLOOR`), with log-magnitude L1 + rumble-band weighting + mask smoothness (see `config.py`)
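The training loss described above combines a rumble-band-weighted log-magnitude L1 term with a smoothness penalty on the predicted mask. A minimal NumPy sketch of that combination (the band edges, 3×/1.5× weights, `mask_floor`, and `tv_weight` values here are illustrative assumptions, not the exact constants in `config.py`):

```python
import numpy as np

def training_loss(pred_mask, noisy_mag, clean_mag, freqs,
                  mask_floor=0.14, tv_weight=0.10):
    """Weighted log-magnitude L1 + total-variation mask smoothness.

    pred_mask, noisy_mag, clean_mag: (freq_bins, frames) arrays.
    freqs: per-bin frequencies in Hz. Band weights below are assumptions:
    3x on 10-150 Hz (rumble fundamentals), 1.5x on 150-300 Hz.
    """
    mask = np.maximum(pred_mask, mask_floor)   # soft-mask output floor
    est_mag = mask * noisy_mag                 # masked spectrogram estimate

    # Per-frequency weights emphasising the rumble band.
    w = np.ones_like(freqs)
    w[(freqs >= 10) & (freqs < 150)] = 3.0
    w[(freqs >= 150) & (freqs < 300)] = 1.5

    eps = 1e-8
    log_l1 = np.mean(w[:, None] * np.abs(np.log(est_mag + eps)
                                         - np.log(clean_mag + eps)))

    # Total variation over frequency and time keeps the mask smooth.
    tv = np.mean(np.abs(np.diff(mask, axis=0))) + \
         np.mean(np.abs(np.diff(mask, axis=1)))

    return log_l1 + tv_weight * tv
```

When the predicted mask reproduces the clean magnitude exactly and is constant, both terms vanish; the output floor keeps all-zero masks from collapsing the estimate entirely.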
Inference phase (WAV-only, no CSV):
- Auto spectral gating (noise profiled from quietest segments)
- pyin F0 tracking generates harmonic template (annotation-free)
- U-Net predicts soft mask + overlap count from 2-channel input
- Fusion: mask + F0 tracks + overlap count protect elephant calls
- Output: cleaned WAV + mask overlay + F0 contour + stage spectrograms
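Given an F0 track (from `pyin` at inference time), the harmonic template channel can be sketched as a spectrogram-shaped map that lights up bins near each harmonic of the tracked fundamental. The number of harmonics and bandwidth here are illustrative assumptions:

```python
import numpy as np

def harmonic_template(f0_track, freqs, n_harmonics=8, bandwidth_hz=4.0):
    """Build a (freq_bins, frames) binary template from a per-frame F0 track.

    f0_track: fundamental frequency per frame in Hz (NaN = unvoiced frame).
    freqs: STFT bin centre frequencies in Hz.
    """
    template = np.zeros((len(freqs), len(f0_track)))
    for t, f0 in enumerate(f0_track):
        if np.isnan(f0):
            continue  # unvoiced frame: template column stays zero
        for h in range(1, n_harmonics + 1):
            # Mark bins within bandwidth_hz of the h-th harmonic.
            template[np.abs(freqs - h * f0) <= bandwidth_hz, t] = 1.0
    return template
```

Stacked with the noisy log-magnitude spectrogram, a template like this forms the 2-channel input the U-Net consumes.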
ElephantsNoise / ElephantVoices (hrhuynguyen/ElephantsNoise): model-based harmonic-plus-noise separation driven by rumble annotations (`Sound_file`, `Start_time`, `End_time`, `Call_type`). It uses the same master CSV schema as this repo's hackathon file, with WAVs under `config.AUDIO_DIR` (for example `drive-download-20260411T200349Z-3-001/`).
# One file (defaults: master CSV + drive-download audio dir from config.py)
python3 run_harmonic.py clean "drive-download-20260411T200349Z-3-001/090224-09_generator_01.wav"
# All annotated files in the audio folder
python3 run_harmonic.py batch --no-plots
# Explicit paths
python3 run_harmonic.py clean "path/to/file.wav" \
-a "Audio Files Master (04_10_2026) - 20260324_rumbles_in_noise_for_hackathon.csv" \
-o output/harmonic_elephantsnoise

The EleVox UI can run this path when you choose Harmonic EM (CSV); the uploaded filename must match a Sound_file entry in the master CSV. The deep-learning U-Net path stays annotation-free.
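The filename check the UI performs amounts to a lookup against the master CSV's `Sound_file` column. A sketch using only the standard library (the helper name and inline CSV stand-in are illustrative; the column names follow the schema above):

```python
import csv
import io

def annotated_files(master_csv_text):
    """Return the set of Sound_file names present in the master CSV."""
    reader = csv.DictReader(io.StringIO(master_csv_text))
    return {row["Sound_file"] for row in reader}

# Tiny stand-in for the real master CSV (schema: Sound_file, Start_time,
# End_time, Call_type).
csv_text = """Sound_file,Start_time,End_time,Call_type
090224-09_generator_01.wav,12.5,18.0,rumble
"""

files = annotated_files(csv_text)
print("090224-09_generator_01.wav" in files)  # uploaded name is annotated
```

An uploaded file whose name is absent from this set cannot take the Harmonic EM path and falls back to the annotation-free U-Net route.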
Vendored code lives in elephantvoices/; see elephantvoices/SOURCE.md for attribution and license.
pip3 install -r requirements.txt

Place hackathon WAV files in a folder and set `AUDIO_DIR` in `config.py` (or use `--audio-dir`).
# WAV-only inference (deployed path — no CSV needed)
python3 unet_denoise.py --no-csv "any_recording.wav"
python3 unet_denoise.py --no-csv --two-pass # all WAVs in audio dir
# Strong Stage A + post NR + 2-pass U-Net (mask refine stays harmonic-friendly; add --aggressive if needed)
python3 unet_denoise.py --no-csv --heavy-mechanical "noisy_recording.wav"
# API-friendly: cleaned WAV + single before/after PNG only
python3 unet_denoise.py --no-csv --heavy-mechanical --comparison-only "file.wav"
# Full DSP pipeline with annotations (training target generation)
python3 pipeline.py --unet
# Prepare training cache
python3 dataset.py --cache training_cache.npz
# Train U-Net (GPU recommended)
python3 train_unet.py --cache training_cache.npz --epochs 100 --patches-per-epoch 5000
# Evaluate (runs WAV-only inference, scores with CSV)
python3 evaluate.py
# Track F0 contours from raw audio
python3 f0_tracker.py "recording.wav"
# EleVox web UI (HackSMU frontend) + EleSonic API — see hacksmu/README.md
pip install flask flask-cors
python3 hacksmu/app.py

| File | Purpose |
|---|---|
| `config.py` | Centralized configuration (STFT params, paths, tuning knobs) |
| `pipeline.py` | 4-stage DSP pipeline (training target factory) |
| `unet_model.py` | 2-channel U-Net with mask + overlap count heads |
| `unet_denoise.py` | WAV-only inference with F0 fusion |
| `dataset.py` | Training data generation with harmonic templates |
| `train_unet.py` | Training loop: soft-mask loss + log-mag L1 + TV + overlap CE |
| `f0_tracker.py` | Multi-F0 tracking + harmonic validation |
| `evaluate.py` | Per-call metrics + category comparison |
| `utils.py` | Audio I/O, spectrogram helpers, visualization |
| `run_harmonic.py` | ElephantsNoise EM pipeline CLI (CSV + WAV) |
| `elephantvoices/` | Vendored harmonic-plus-noise implementation (upstream) |
Re-generate DSP cleaned targets after changing pipeline constants, then re-cache and train.
| Run | Settings |
|---|---|
| A — Baseline | Previous checkpoint, or train with `UNET_MASK_OUTPUT_FLOOR=0`, `LOSS_MASK_TV_WEIGHT=0`, `CALL_CENTERED_CROP_RATIO=0`, `TRAINING_IDEAL_MASK_MIN=0`, and standard loss weights (revert these temporarily in `config.py` for a fair comparison). |
| B — Soft mask + weighted loss | Defaults in `config.py`: `UNET_MASK_OUTPUT_FLOOR≈0.14`, log-mag L1 with 3× weight on 10–150 Hz and 1.5× on 150–300 Hz, `LOSS_MASK_TV_WEIGHT≈0.10`, `TRAINING_IDEAL_MASK_MIN≈0.07`. |
| C — B + call-centered crops | Same as B with `CALL_CENTERED_CROP_RATIO=0.70` (the default). |
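For a Run A baseline, the constants from the table can be reverted in `config.py` before re-caching and training (values are those listed above; everything else stays at defaults):

```python
# config.py overrides for ablation Run A (baseline); restore the defaults
# from Run B after the comparison.
UNET_MASK_OUTPUT_FLOOR = 0.0     # no lower bound on the predicted mask
LOSS_MASK_TV_WEIGHT = 0.0        # no mask-smoothness penalty
CALL_CENTERED_CROP_RATIO = 0.0   # uniform random crops only
TRAINING_IDEAL_MASK_MIN = 0.0    # no floor on the ideal training mask
```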
Protocol:
1. `python3 pipeline.py --unet` (or without U-Net) to refresh `output/cleaned/`.
2. `python3 dataset.py --cache training_cache.npz`.
3. Train ~30–50 epochs; compare spectrograms on generator-heavy files and `evaluate.py` harmonic preservation on call windows.
All outputs go under output/ (gitignored):
- `output/unet_cleaned/` — cleaned WAV files
- `output/unet_cleaned/spectrograms/` — before/after comparisons, mask overlays, F0 plots, stage views
- `output/model/` — trained model checkpoints
- `output/evaluation/` — evaluation metrics CSV
Hackathon / research use — verify ElephantVoices asset terms for redistribution of recordings.