HackSMU VII / iMasons · ElephantVoices challenge: remove overlapping mechanical noise (airplanes, cars, generators) from elephant field recordings without distorting the elephant calls.
EchoMap is a deterministic, ML-free DSP pipeline that exploits a structural difference rather than a frequency one — elephant rumbles are sustained harmonic bands; mechanical noise is broadband and stationary.
- Load at native sample rate (48 kHz): `sr=None` preserves 10–20 Hz infrasound.
- HPSS (Harmonic-Percussive Source Separation): median filters isolate the harmonic component.
- Noise fingerprint: built per recording from the silent gaps of the HPSS-harmonic signal, so it subtracts cleanly (this was a bug in earlier versions; see the `process.py` docstring).
- Spectral subtraction on the harmonic signal (`step4a`).
- 8–1200 Hz Butterworth bandpass (`step4b`): preserves infrasound down to 8 Hz.
- In-call energy ratio: a proxy metric (not a true SNR; see the evaluation below for real ground truth).
Every stage writes real audio + spectrograms (harmonic.wav, gated.wav, cleaned.wav), so the UI stage-toggle shows genuine intermediates.
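The stages listed above can be sketched with NumPy and SciPy alone. This is a minimal illustration under stated assumptions, not the code in `process.py`: the function names, the median kernel size, the 20% quiet-frame rule for the fingerprint, the spectral floor, the filter order, and the hop length are all hypothetical, and the in-call ratio is one plausible reading of the proxy metric. File loading is omitted; `sr=None` is presumably the `librosa.load` argument that keeps the file's native sample rate.

```python
import numpy as np
from scipy.ndimage import median_filter
from scipy.signal import butter, istft, sosfiltfilt, stft

N_FFT, HOP = 4096, 1024  # assumed STFT settings for this sketch


def harmonic_part(S, kernel=17):
    """Median-filter HPSS: keep energy that is sustained over time."""
    mag = np.abs(S)
    harm = median_filter(mag, size=(1, kernel))  # smooth along time
    perc = median_filter(mag, size=(kernel, 1))  # smooth along frequency
    mask = harm**2 / (harm**2 + perc**2 + 1e-12)  # soft mask in [0, 1]
    return S * mask


def noise_fingerprint(S, quiet_fraction=0.2):
    """Mean magnitude spectrum of the lowest-energy frames (assumed 'silent gaps')."""
    mag = np.abs(S)
    energy = (mag**2).sum(axis=0)
    n_quiet = max(1, int(quiet_fraction * mag.shape[1]))
    quiet = np.argsort(energy)[:n_quiet]
    return mag[:, quiet].mean(axis=1, keepdims=True)


def spectral_subtract(S, noise_mag, floor=0.05):
    """Subtract the fingerprint from every frame, keeping a small spectral floor."""
    mag = np.abs(S)
    cleaned = np.maximum(mag - noise_mag, floor * mag)
    return cleaned * np.exp(1j * np.angle(S))  # reuse the original phase


def bandpass(y, sr, lo=8.0, hi=1200.0, order=4):
    """Zero-phase Butterworth bandpass, in second-order sections for stability."""
    sos = butter(order, [lo, hi], btype="bandpass", fs=sr, output="sos")
    return sosfiltfilt(sos, y)


def clean_recording(y, sr):
    """HPSS -> fingerprint -> subtraction -> bandpass, returning each stage."""
    kw = dict(fs=sr, nperseg=N_FFT, noverlap=N_FFT - HOP)
    _, _, S = stft(y, **kw)
    H = harmonic_part(S)
    G = spectral_subtract(H, noise_fingerprint(H))  # fingerprint taken from H itself
    harmonic = istft(H, **kw)[1][: len(y)]
    gated = istft(G, **kw)[1][: len(y)]
    return {"harmonic": harmonic, "gated": gated, "cleaned": bandpass(gated, sr)}


def in_call_ratio_db(y, sr, start_s, end_s):
    """Mean power inside the labeled call over mean power outside it, in dB."""
    inside = np.zeros(len(y), dtype=bool)
    inside[int(start_s * sr): int(end_s * sr)] = True
    return 10 * np.log10(np.mean(y[inside] ** 2) / (np.mean(y[~inside] ** 2) + 1e-20))
```

The fingerprint is computed from the same harmonic STFT that it is subtracted from, which mirrors the point made in the noise-fingerprint step: a fingerprint taken from a different signal would not match the noise floor it is meant to remove.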
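The in-call/out-of-call energy ratio in the UI is a cleanup proxy and can be misleading on its own. For real ground-truth evaluation we built two offline harnesses: synthetic mixes scored against a clean reference, and a per-band comparison against a simulated speech denoiser.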
5 labeled elephant rumbles × 2 noise types (generator, vehicle) × 4 input SNRs (−10, −5, 0, +5 dB) = 24 mixes. We measure scale-invariant SDR improvement against the clean reference.
| Input SNR | SI-SDR Δ (dB) | Call-band cosine similarity |
|---|---|---|
| −10 dB (noise-dominated) | +1.9 ± 4.0 | 0.55 |
| −5 dB (heavy noise) | +2.1 ± 2.5 | 0.77 |
| 0 dB | −0.0 ± 1.6 | 0.91 |
| +5 dB (already clean-ish) | −3.8 ± 1.4 | 0.96 |
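Finding: the pipeline is a low-SNR rescue tool. The mean SI-SDR gain is about +2 dB at −10 and −5 dB input SNR, roughly zero at 0 dB, and −3.8 dB at +5 dB. It improves recordings that are currently unusable but damages already-clean audio, so do not apply it to every recording indiscriminately. The spread at −10 dB (±4.0 dB) is larger than the mean gain, so individual results vary.
Reproduce: `python scripts/eval/build_mixes.py && python scripts/eval/run_eval.py`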
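For readers unfamiliar with the metric, the two building blocks of this harness (mixing at a target input SNR and scoring with scale-invariant SDR) can be sketched as follows. The function names are hypothetical and the scripts in `scripts/eval/` may differ in detail, for example in how they trim or loop the noise clip.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db):
    """Scale noise so 10*log10(P_clean / P_noise) equals snr_db, then add it.

    Assumes the noise clip is at least as long as the clean clip.
    """
    noise = noise[: len(clean)]
    p_clean = np.mean(clean**2)
    p_noise = np.mean(noise**2)
    gain = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + gain * noise


def si_sdr(reference, estimate):
    """Scale-invariant signal-to-distortion ratio in dB."""
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Project the estimate onto the reference; the remainder is distortion.
    alpha = np.dot(estimate, reference) / np.dot(reference, reference)
    target = alpha * reference
    residual = estimate - target
    return 10 * np.log10(np.sum(target**2) / np.sum(residual**2))


def si_sdr_improvement(reference, noisy, processed):
    """SI-SDR of the processed signal minus SI-SDR of the unprocessed mix."""
    return si_sdr(reference, processed) - si_sdr(reference, noisy)
```

Because the estimate is projected onto the reference before scoring, a pipeline cannot raise its score just by changing the output gain. A negative improvement, as in the +5 dB row, means the processing added more distortion than the noise it removed.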
Generic speech denoisers (RNNoise / Meta Denoiser / Adobe Podcast) typically discard content below ~80 Hz because speech has nothing useful there, which would obliterate every elephant fundamental. We simulate that behavior with a 100 Hz Butterworth high-pass filter plus an aggressive spectral gate (clearly labeled as a simulation).
| Band | Ours | Speech-style proxy |
|---|---|---|
| 8–35 Hz (elephant fundamental) | −4.9 dB (preserved) | −44.3 dB (destroyed) |
| 35–200 Hz (harmonics) | +2.2 dB | +2.4 dB |
| 200–1200 Hz (mid) | −3.4 dB | +3.8 dB |
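Finding: a ~39 dB advantage in the elephant fundamental band (−4.9 dB vs. −44.3 dB). This is why a pipeline tuned for infrasound beats a speech-oriented denoiser for this use case. Note that the comparison is against the simulated proxy, not against the named products themselves.
Reproduce: `python scripts/compare/infrasound_comparison.py`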
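The per-band measurement behind a table like this can be sketched as below. It is an illustration only: it models the speech-style proxy with the 100 Hz high-pass filter alone (no spectral gate), the function names and filter order are assumptions, and it will not reproduce the exact figures above, which come from real recordings.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

# Band edges taken from the table above.
BANDS = {"8-35 Hz": (8, 35), "35-200 Hz": (35, 200), "200-1200 Hz": (200, 1200)}


def band_energy_db(y, sr, lo, hi):
    """Total spectral energy between lo and hi (Hz), in dB."""
    power = np.abs(np.fft.rfft(y)) ** 2
    freqs = np.fft.rfftfreq(len(y), d=1 / sr)
    in_band = (freqs >= lo) & (freqs < hi)
    return 10 * np.log10(power[in_band].sum() + 1e-20)


def band_change_db(before, after, sr):
    """Per-band energy change (after minus before), in dB."""
    return {
        name: band_energy_db(after, sr, lo, hi) - band_energy_db(before, sr, lo, hi)
        for name, (lo, hi) in BANDS.items()
    }


def speech_style_highpass(y, sr, cutoff=100.0, order=4):
    """Stand-in for a speech denoiser's low cut: zero-phase Butterworth high-pass."""
    sos = butter(order, cutoff, btype="highpass", fs=sr, output="sos")
    return sosfiltfilt(sos, y)


def infrasound_bandpass(y, sr, lo=8.0, hi=1200.0, order=4):
    """The 8-1200 Hz bandpass used as the last pipeline stage."""
    sos = butter(order, [lo, hi], btype="bandpass", fs=sr, output="sos")
    return sosfiltfilt(sos, y)
```

Calling `band_change_db(x, speech_style_highpass(x, sr), sr)` on any signal with a 20 Hz component shows the fundamental band collapsing, while the same call with `infrasound_bandpass` leaves it essentially unchanged.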
cd backend && python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt
cd ..
uvicorn backend.main:app --reload --port 8000

Batch process the dataset:

python -m backend.pipeline.process batch data/recordings data/calls.csv output

- `GET /calls`: all processed recordings
- `GET /calls/{id}`: single recording detail
- `GET /calls/{id}/audio/{stage}`: `hpss | gated | clean` (real intermediate files)
- `GET /calls/{id}/spectrogram/{stage}`: same stages
- `GET /stats`: aggregate metrics
- `POST /process`: upload a `.wav`, run the pipeline, return results
- `POST /query`: natural-language query over processed-recordings metrics (Gemini-backed chat; no vision, no classification)
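As a usage illustration, a small standard-library client for the per-stage endpoints might look like this. The helper names are hypothetical, the base URL assumes the `uvicorn` command above, and nothing is assumed about the response beyond it being a byte stream.

```python
from urllib.parse import quote
from urllib.request import urlopen

API_URL = "http://localhost:8000"  # assumed local dev server
STAGES = ("hpss", "gated", "clean")
KINDS = ("audio", "spectrogram")


def stage_url(call_id, kind, stage, base=API_URL):
    """Build the URL for GET /calls/{id}/{audio|spectrogram}/{stage}."""
    if kind not in KINDS:
        raise ValueError(f"kind must be one of {KINDS}, got {kind!r}")
    if stage not in STAGES:
        raise ValueError(f"stage must be one of {STAGES}, got {stage!r}")
    return f"{base}/calls/{quote(str(call_id), safe='')}/{kind}/{stage}"


def download_stage(call_id, kind, stage, dest, base=API_URL):
    """Fetch one intermediate file and write the raw bytes to dest."""
    with urlopen(stage_url(call_id, kind, stage, base)) as resp, open(dest, "wb") as fh:
        fh.write(resp.read())
```

For example, `download_stage(call_id, "audio", "gated", "gated.wav")` would save the post-subtraction audio for one recording, given a `call_id` returned by `GET /calls`.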
cd frontend
npm install
npm run dev

Set `VITE_API_URL=http://localhost:8000` in `frontend/.env`.
- Not an ML model. No training data, no GPU, no weights.
- Not a call classifier. `call_type` is passed through from the labeled CSV column.
- Not a universal denoiser. It's tuned for elephant rumbles in mechanical-noise-dominated recordings; it will hurt already-clean audio (see SI-SDR table).
- Not a fundamental-frequency estimator. At n_fft=4096 / sr=48 kHz, the 8–35 Hz search range has only ~2 FFT bins — previous versions produced a meaningless 2-valued output; this field has been removed.
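
The bin arithmetic behind the last point is easy to check. The sketch below lists the FFT bin centers that fall inside a search range and, as an illustrative aside, computes the window length that a given frequency resolution would require; both helper names are hypothetical.

```python
import numpy as np


def bins_in_range(sr, n_fft, f_lo, f_hi):
    """Center frequencies of the real-FFT bins that fall inside [f_lo, f_hi]."""
    freqs = np.fft.rfftfreq(n_fft, d=1 / sr)  # bin spacing is sr / n_fft
    return freqs[(freqs >= f_lo) & (freqs <= f_hi)]


def n_fft_for_resolution(sr, target_hz):
    """Smallest power-of-two FFT size whose bin spacing is at most target_hz."""
    return 2 ** int(np.ceil(np.log2(sr / target_hz)))


# At sr = 48 kHz and n_fft = 4096 the bin spacing is 48000 / 4096 = 11.71875 Hz,
# so only the bins at 11.71875 Hz and 23.4375 Hz land inside 8-35 Hz.
print(bins_in_range(48_000, 4096, 8, 35))
```

A peak-picking estimator restricted to those two bins can only ever return one of two values. Resolving 1 Hz steps at 48 kHz would need a 65,536-sample window (about 1.37 s), which costs time resolution.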