Inspiration
Mechanical noise routinely drowns out elephant rumbles in field recordings, so we built RumbleClean, a tool to amplify these infrasonic voices for science and conservation. Elephant rumbles, with fundamentals as low as 8 Hz and stacks of harmonics above them, are vital for understanding elephant cognition, yet noise from generators, aircraft, and cars overlaps the same frequency bands and renders recordings unusable. In a 24-hour hackathon, we set out to turn noisy data into measurable insights for ElephantVoices and global bioacoustics.
What it does
RumbleClean uses annotation-driven, semi-supervised source separation to isolate elephant vocalizations from overlapping mechanical noise in monophonic recordings. Given the provided WAV files and CSV-annotated call intervals, it builds a per-recording noise model via spectral factorization, applies soft ratio masking for harmonic-preserving reconstruction, and outputs denoised audio tracks, segmented clips, comparative spectrograms, and quantitative proxies for low-band noise attenuation and harmonic ridge continuity.
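The first step implied above, turning annotated call intervals into a set of noise-only frames, might look roughly like the sketch below. It assumes the CSV has already been parsed into `(start_s, end_s)` pairs in seconds and that frame `i` is centred at `i * hop_length` samples; the function name, the `pad_s` guard margin, and the example numbers are all hypothetical, not taken from the project.

```python
import numpy as np

def noise_only_frames(call_intervals, n_frames, hop_length, sample_rate, pad_s=0.0):
    """Return a boolean array that is True for frames outside every annotated call."""
    # Centre time of each frame in seconds (assumed framing convention).
    frame_times = np.arange(n_frames) * hop_length / sample_rate
    is_noise = np.ones(n_frames, dtype=bool)
    for start_s, end_s in call_intervals:
        # Optionally widen each call by pad_s so call onsets and tails
        # do not leak into the noise model.
        in_call = (frame_times >= start_s - pad_s) & (frame_times <= end_s + pad_s)
        is_noise[in_call] = False
    return is_noise

# Hypothetical example: 10 s of audio at 1000 Hz, hop of 250 samples,
# one annotated call from 2.0 s to 4.0 s.
mask = noise_only_frames([(2.0, 4.0)], n_frames=40, hop_length=250, sample_rate=1000)
print(int(mask.sum()), "of", mask.size, "frames treated as noise-only")
```

Frames flagged `True` would feed the noise model; everything else is treated as a mixture of call and noise.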
How we built it
The core pipeline starts with a Short-Time Fourier Transform (STFT) tuned for low frequencies: Hann windowing and FFT sizes chosen to resolve the 10-30 Hz band (e.g., 8192-point windows at the original sample rate, or audio downsampled to ≤1000 Hz for enhanced temporal localization). Semi-supervised Non-Negative Matrix Factorization (NMF) learns spectral basis vectors ("atoms") from annotation-defined noise-only frames, then holds them fixed while factorizing the full recording so that the remaining atoms capture the residual elephant components. A Wiener-inspired soft mask, with 2D Gaussian smoothing to reduce artifacts, is then applied, and the signal is reconstructed using the mixture phase. The implementation is in Python, with NumPy/SciPy for signal processing, Librosa for spectrogram utilities, and NMF fitted by minimizing the Frobenius-norm reconstruction error. It includes a CLI and an optional Gradio-based UI for interactive spectrogram visualization.
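To make the factorization step concrete, here is a minimal NumPy sketch of semi-supervised NMF with fixed noise atoms followed by a Wiener-style soft mask. It is an illustration under assumptions, not the project's code: it uses the standard multiplicative updates for the Frobenius objective, takes a magnitude spectrogram `V` (bins × frames) and a boolean `is_noise` frame mask as inputs, omits the Gaussian smoothing, and the atom counts and iteration count are arbitrary.

```python
import numpy as np

EPS = 1e-10  # guards against division by zero

def nmf(V, n_atoms, n_iter=200, W_fixed=None, seed=0):
    """Frobenius-norm NMF (V ~= W @ H) with multiplicative updates.

    Columns passed in W_fixed stay constant; n_atoms extra columns are learned.
    """
    rng = np.random.default_rng(seed)
    n_bins, n_frames = V.shape
    n_fixed = 0 if W_fixed is None else W_fixed.shape[1]
    W = rng.random((n_bins, n_fixed + n_atoms)) + EPS
    if n_fixed:
        W[:, :n_fixed] = W_fixed
    H = rng.random((n_fixed + n_atoms, n_frames)) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
        W_new = W * (V @ H.T) / (W @ H @ H.T + EPS)
        W[:, n_fixed:] = W_new[:, n_fixed:]  # only the free atoms are updated
    return W, H

def separate(V, is_noise, n_noise_atoms=4, n_call_atoms=4):
    """Return a soft mask in [0, 1] estimating the non-noise share of each bin."""
    # 1) Learn noise atoms from the frames annotated as noise-only.
    W_noise, _ = nmf(V[:, is_noise], n_noise_atoms)
    # 2) Factorize the whole recording with those atoms held fixed.
    W, H = nmf(V, n_call_atoms, W_fixed=W_noise)
    k = n_noise_atoms
    noise_est = W[:, :k] @ H[:k]
    call_est = W[:, k:] @ H[k:]
    # 3) Wiener-style ratio of estimated powers.
    return call_est**2 / (call_est**2 + noise_est**2 + EPS)
```

The mask would then be multiplied element-wise with the complex mixture STFT, which keeps the mixture phase, before inverting back to audio.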
Challenges we ran into
Resolving infrasonic harmonics demanded careful STFT parameterization to balance the time-frequency trade-off, while variable noise characteristics (e.g., stationary vs. Doppler-shifted sources) necessitated adaptive per-recording atom learning. Semi-supervised NMF convergence was sensitive to initialization, and soft masking required careful tuning to preserve phase coherence without introducing musical noise. Time constraints limited exhaustive validation across all noise categories.
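The time-frequency trade-off mentioned above comes down to simple arithmetic: bin spacing is the sample rate divided by the FFT size, and window duration is the inverse. The sketch below tabulates both for a few settings; the 44100 Hz rate is an assumed stand-in for "original sample rate", since the recordings' actual rate is not stated here.

```python
def stft_resolution(sample_rate, n_fft):
    """Return (frequency bin spacing in Hz, window duration in seconds)."""
    bin_spacing_hz = sample_rate / n_fft
    window_s = n_fft / sample_rate
    return bin_spacing_hz, window_s

# (sample_rate, n_fft) pairs; 44100 Hz is an assumption for illustration.
for sample_rate, n_fft in [(44100, 8192), (1000, 8192), (1000, 1024)]:
    spacing, duration = stft_resolution(sample_rate, n_fft)
    print(f"sr={sample_rate} Hz, n_fft={n_fft}: "
          f"{spacing:.3f} Hz per bin, {duration:.3f} s per window")
```

Under these assumptions, an 8192-point window at 44100 Hz gives bins roughly 5.4 Hz apart, while the same window at 1000 Hz gives about 0.12 Hz spacing but spans more than 8 seconds, which is why the two parameters have to be chosen together.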
Accomplishments that we're proud of
We developed a pipeline that achieves more than 80% low-band noise suppression while maintaining harmonic structure, as assessed through proxy metrics and side-by-side spectrograms. Weak supervision from annotations let us outperform unsupervised baselines, and we delivered a reproducible tool, tracked through 18 structured GitHub issues, in a 24-hour sprint.
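One way a low-band suppression proxy like the one quoted above could be computed is as the percentage drop in spectral energy within a low band, measured only over noise-only frames. The definition below is an assumption for illustration (the exact metric is not specified here), and the function name, the 10-30 Hz default band, and the inputs are hypothetical.

```python
import numpy as np

def low_band_suppression(mag_before, mag_after, freqs_hz, is_noise, band_hz=(10.0, 30.0)):
    """Percent reduction of in-band energy over noise-only frames.

    mag_before, mag_after: magnitude spectrograms, shape (bins, frames).
    freqs_hz: centre frequency of each bin; is_noise: boolean frame mask.
    """
    in_band = (freqs_hz >= band_hz[0]) & (freqs_hz <= band_hz[1])
    energy_before = np.sum(mag_before[in_band][:, is_noise] ** 2)
    energy_after = np.sum(mag_after[in_band][:, is_noise] ** 2)
    return 100.0 * (1.0 - energy_after / energy_before)
```

Because it is restricted to frames without annotated calls, a high value indicates that noise was removed rather than that call energy was lost; harmonic preservation still needs a separate check.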
What we learned
Infrasonic bioacoustics requires domain-specific signal processing; annotations enable effective semi-supervised learning in sparse-data scenarios. NMF's spectral factorization excels at monophonic separation when guided by prior knowledge, and soft masking preserves perceptual quality better than hard thresholding.
What's next for RumbleClean
Incorporate multi-resolution STFT for simultaneous low- and mid-band processing, integrate deep learning for automated transcription, and scale to multi-channel recordings. Collaborate with ElephantVoices for deployment in large-scale ethogram databases, enhancing conservation through AI-driven behavioral analysis.