dothanhtam91/EleVox

EleSonic

Elephant rumble denoising for recordings with mechanical noise (vehicles, aircraft, generators). Multi-stage DSP pipeline + U-Net spectrogram masking with harmonic template input and overlap-aware fusion.

Architecture

Training phase (uses CSV annotations):

  1. pipeline.py runs Stages A–D (spectral gating, harmonic extraction, Wiener masking, optional U-Net) to produce the cleanest possible training targets
  2. dataset.py --cache builds training data (noisy/clean pairs + harmonic templates + overlap labels)
  3. train_unet.py trains the 2-channel U-Net to predict a soft mask (bounded below by UNET_MASK_OUTPUT_FLOOR), with log-magnitude L1 + rumble-band weighting + mask smoothness (see config.py)
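
A minimal NumPy sketch of the loss in step 3, assuming the config.py defaults quoted in the ablation notes below (the function names here are illustrative, not the repo's API):

```python
import numpy as np

# Illustrative constants mirroring config.py defaults
# (UNET_MASK_OUTPUT_FLOOR and LOSS_MASK_TV_WEIGHT).
MASK_FLOOR = 0.14
TV_WEIGHT = 0.10

def band_weights(freqs):
    """Per-bin weights emphasizing the rumble band (3x for 10-150 Hz,
    1.5x for 150-300 Hz, 1x elsewhere)."""
    w = np.ones_like(freqs)
    w[(freqs >= 10) & (freqs < 150)] = 3.0
    w[(freqs >= 150) & (freqs < 300)] = 1.5
    return w

def soft_mask_loss(raw_mask, noisy_mag, clean_mag, freqs, eps=1e-8):
    # Bound the predicted mask below by the floor so quiet harmonics
    # are never fully zeroed out.
    mask = MASK_FLOOR + (1.0 - MASK_FLOOR) * raw_mask
    est_mag = mask * noisy_mag
    # Log-magnitude L1, weighted per frequency bin.
    l1 = np.abs(np.log(est_mag + eps) - np.log(clean_mag + eps))
    data_term = np.mean(band_weights(freqs)[:, None] * l1)
    # Total-variation smoothness penalty on the mask (freq and time).
    tv = (np.mean(np.abs(np.diff(mask, axis=0)))
          + np.mean(np.abs(np.diff(mask, axis=1))))
    return data_term + TV_WEIGHT * tv
```

In the actual trainer this sits alongside the overlap-count cross-entropy head; the sketch only shows the mask-side terms.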

Inference phase (WAV-only, no CSV):

  1. Auto spectral gating (noise profiled from quietest segments)
  2. pyin F0 tracking generates harmonic template (annotation-free)
  3. U-Net predicts soft mask + overlap count from 2-channel input
  4. Fusion: mask + F0 tracks + overlap count protect elephant calls
  5. Output: cleaned WAV + mask overlay + F0 contour + stage spectrograms
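
Steps 2 and 4 can be sketched as follows; this is an assumed NumPy illustration (the function names, harmonic count, and ridge bandwidth are not from the repo):

```python
import numpy as np

def harmonic_template(f0_hz, freqs, n_harmonics=8, bw_hz=4.0):
    """Soft spectrogram template: a Gaussian ridge at each harmonic
    k*f0 per frame; unvoiced frames (NaN f0) stay empty."""
    tmpl = np.zeros((len(freqs), len(f0_hz)))
    for t, f0 in enumerate(f0_hz):
        if not np.isfinite(f0) or f0 <= 0:
            continue
        for k in range(1, n_harmonics + 1):
            tmpl[:, t] += np.exp(-0.5 * ((freqs - k * f0) / bw_hz) ** 2)
    return np.clip(tmpl, 0.0, 1.0)

def fuse(unet_mask, template, protect_floor=0.5):
    """Fusion step: never let the final mask drop below protect_floor
    where the template indicates an elephant harmonic."""
    return np.maximum(unet_mask, protect_floor * template)
```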

Second approach — harmonic EM (CSV required)

ElephantsNoise / ElephantVoices (hrhuynguyen/ElephantsNoise) is a model-based harmonic-plus-noise separation that requires rumble annotations (Sound_file, Start_time, End_time, Call_type). It reads the same master CSV schema as this repo's hackathon file, with WAVs under config.AUDIO_DIR (for example drive-download-20260411T200349Z-3-001/).
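
A minimal sketch of consuming that annotation schema with the standard library (the helper name and example rows are hypothetical):

```python
import csv
import io

def calls_for_file(csv_text, wav_name):
    """Return (start_s, end_s, call_type) tuples annotated for one WAV,
    using the master CSV columns named above."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(float(r["Start_time"]), float(r["End_time"]), r["Call_type"])
            for r in reader if r["Sound_file"] == wav_name]
```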

# One file (defaults: master CSV + drive-download audio dir from config.py)
python3 run_harmonic.py clean "drive-download-20260411T200349Z-3-001/090224-09_generator_01.wav"

# All annotated files in the audio folder
python3 run_harmonic.py batch --no-plots

# Explicit paths
python3 run_harmonic.py clean "path/to/file.wav" \
  -a "Audio Files Master (04_10_2026) - 20260324_rumbles_in_noise_for_hackathon.csv" \
  -o output/harmonic_elephantsnoise

The EleVox UI can run this path when you choose Harmonic EM (CSV) — the uploaded filename must match a Sound_file entry in the master CSV. The deep-learning U-Net path stays annotation-free.

Vendored code lives in elephantvoices/; see elephantvoices/SOURCE.md for attribution and license.

Setup

pip3 install -r requirements.txt

Place hackathon WAV files in a folder and set AUDIO_DIR in config.py (or use --audio-dir).

Run

# WAV-only inference (deployed path — no CSV needed)
python3 unet_denoise.py --no-csv "any_recording.wav"
python3 unet_denoise.py --no-csv --two-pass       # all WAVs in audio dir
# Strong Stage A + post NR + 2-pass U-Net (mask refine stays harmonic-friendly; add --aggressive if needed)
python3 unet_denoise.py --no-csv --heavy-mechanical "noisy_recording.wav"
# API-friendly: cleaned WAV + single before/after PNG only
python3 unet_denoise.py --no-csv --heavy-mechanical --comparison-only "file.wav"

# Full DSP pipeline with annotations (training target generation)
python3 pipeline.py --unet

# Prepare training cache
python3 dataset.py --cache training_cache.npz

# Train U-Net (GPU recommended)
python3 train_unet.py --cache training_cache.npz --epochs 100 --patches-per-epoch 5000

# Evaluate (runs WAV-only inference, scores with CSV)
python3 evaluate.py

# Track F0 contours from raw audio
python3 f0_tracker.py "recording.wav"

# EleVox web UI (HackSMU frontend) + EleSonic API — see hacksmu/README.md
pip install flask flask-cors
python3 hacksmu/app.py

Key Files

File              Purpose
config.py         Centralized configuration (STFT params, paths, tuning knobs)
pipeline.py       4-stage DSP pipeline (training target factory)
unet_model.py     2-channel U-Net with mask + overlap count heads
unet_denoise.py   WAV-only inference with F0 fusion
dataset.py        Training data generation with harmonic templates
train_unet.py     Training loop: soft-mask loss + log-mag L1 + TV + overlap CE
f0_tracker.py     Multi-F0 tracking + harmonic validation
evaluate.py       Per-call metrics + category comparison
utils.py          Audio I/O, spectrogram helpers, visualization
run_harmonic.py   ElephantsNoise EM pipeline CLI (CSV + WAV)
elephantvoices/   Vendored harmonic-plus-noise implementation (upstream)

Ablation experiments (rumble preservation)

Re-generate the DSP-cleaned targets after changing pipeline constants, then re-cache and retrain.

Runs:

  • A — Baseline: previous checkpoint, or train with UNET_MASK_OUTPUT_FLOOR=0, LOSS_MASK_TV_WEIGHT=0, CALL_CENTERED_CROP_RATIO=0, TRAINING_IDEAL_MASK_MIN=0, and standard loss weights (revert these temporarily in config.py for a fair comparison).
  • B — Soft mask + weighted loss: the config.py defaults: UNET_MASK_OUTPUT_FLOOR≈0.14, log-magnitude L1 with 3×/1.5× weights on 10–150 Hz / 150–300 Hz, LOSS_MASK_TV_WEIGHT≈0.10, TRAINING_IDEAL_MASK_MIN≈0.07.
  • C — B + call-centered crops: same as B with CALL_CENTERED_CROP_RATIO=0.70 (the default).
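
The three runs differ only in a handful of config.py constants; collected here as data (values from the settings above, applied through a hypothetical helper):

```python
# Ablation settings per run; the keys are real config.py constants, but
# the dict and helper are illustrative (not the repo's API).
ABLATION_RUNS = {
    "A_baseline": {
        "UNET_MASK_OUTPUT_FLOOR": 0.0,
        "LOSS_MASK_TV_WEIGHT": 0.0,
        "CALL_CENTERED_CROP_RATIO": 0.0,
        "TRAINING_IDEAL_MASK_MIN": 0.0,
    },
    "B_soft_mask": {
        "UNET_MASK_OUTPUT_FLOOR": 0.14,
        "LOSS_MASK_TV_WEIGHT": 0.10,
        "CALL_CENTERED_CROP_RATIO": 0.0,
        "TRAINING_IDEAL_MASK_MIN": 0.07,
    },
    "C_call_crops": {
        "UNET_MASK_OUTPUT_FLOOR": 0.14,
        "LOSS_MASK_TV_WEIGHT": 0.10,
        "CALL_CENTERED_CROP_RATIO": 0.70,
        "TRAINING_IDEAL_MASK_MIN": 0.07,
    },
}

def apply_run(config_module, run_name):
    """Overwrite the named constants on an imported config module
    before re-caching and training."""
    for key, value in ABLATION_RUNS[run_name].items():
        setattr(config_module, key, value)
```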

Protocol:

  1. python3 pipeline.py --unet (or without the U-Net stage) to refresh output/cleaned/.
  2. python3 dataset.py --cache training_cache.npz.
  3. Train for ~30–50 epochs; compare spectrograms on generator-heavy files and evaluate.py harmonic preservation on call windows.

Outputs

All outputs go under output/ (gitignored):

  • output/unet_cleaned/ — cleaned WAV files
  • output/unet_cleaned/spectrograms/ — before/after comparisons, mask overlays, F0 plots, stage views
  • output/model/ — trained model checkpoints
  • output/evaluation/ — evaluation metrics CSV

License

Hackathon / research use — verify ElephantVoices asset terms for redistribution of recordings.
