hacksmu

Elephant rumble denoising workspace with a browser-first local web app.

The browser UI is the primary interface. The backend roadmap keeps the same scientific pipeline: annotation-aware ingestion, low-frequency spectrogram design, separation, reconstruction, and evidence generation.

Real project assets

The repo contains the actual hackathon data assets:

audioFiles/: real source WAV recordings
spectrograms/: per-selection spectrogram images
Audio Files Master (04_10_2026) - 20260324_rumbles_in_noise_for_hackathon.csv: canonical annotation spreadsheet export
audio_files_master_extracted.txt: text extraction of the same master file for reference and debugging
pdf_extracted.txt: extracted text from the project-method PDF

Use the CSV as the canonical interval source. The extracted text files are reference material only.

Method summary

For the short scientific rationale behind the denoising pipeline, see METHOD.md.

That note explains:

why the STFT is tuned specifically for sub-30 Hz elephant rumble structure
how annotation-guided per-recording NMF separates elephant energy from mechanical noise
why Wiener-style soft masking is better than simple filtering or generic deep separators for this task
where to find the current example outputs and proxy metrics

Current app shape

Backend

rumbleclean/ currently provides the ingestion layer:

discovers .wav recordings
parses .csv, .tsv, and .xlsx interval sheets
recognizes the real master CSV columns: Selection, Sound_file, Start_time, End_time, Call_type
returns joined recording bundles with audio, sample_rate, and intervals
computes low-frequency spectrograms for both direct and downsampled modes
derives complementary noise-only time spans and spectrogram frame selections from the annotation intervals
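
The trade-off between the direct and downsampled spectrogram modes comes down to how finely the STFT resolves the lowest frequencies. The sketch below works through that arithmetic. The sample rates and FFT sizes are illustrative assumptions, not the values rumbleclean uses, and both helper names are hypothetical.

```python
def bin_spacing_hz(sample_rate: int, n_fft: int) -> float:
    """Frequency spacing between adjacent STFT bins, in Hz."""
    return sample_rate / n_fft


def bins_at_or_below(cutoff_hz: float, sample_rate: int, n_fft: int) -> int:
    """Count STFT bins whose center frequency is at or below cutoff_hz."""
    # Bin k sits at k * spacing, so count k = 0 .. floor(cutoff / spacing).
    return int(cutoff_hz // bin_spacing_hz(sample_rate, n_fft)) + 1


# Illustrative values only (assumptions, not rumbleclean defaults).
direct_spacing = bin_spacing_hz(sample_rate=44_100, n_fft=32_768)
downsampled_spacing = bin_spacing_hz(sample_rate=500, n_fft=512)
rumble_bins = bins_at_or_below(30.0, sample_rate=500, n_fft=512)
```

With these assumed numbers, a 512-point FFT at 500 Hz gives slightly finer bin spacing than a 32,768-point FFT at 44.1 kHz, which is the sense in which the downsampled path is the faster one.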

Web API

webapp/ provides the API behind the browser workflow:

GET /api/health
GET /api/heroes
GET /api/heroes/<recording_id>
POST /api/runs/heroes/<recording_id>
POST /api/runs/uploads
GET /api/runs/<run_id>
POST /api/uploads/inspect
GET /artifacts/runs/<run_id>/<path:filename>

Hero presets and constrained uploads both end in the same manifest-backed run bundle under artifacts/runs/.
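
A minimal client sketch for the hero run route, using only the standard library. It assumes the route accepts an empty POST body and answers with JSON, and that a recording file stem works as the recording_id. Neither is confirmed here, and hero_run_url and start_hero_run are hypothetical helper names.

```python
import json
from urllib.parse import quote, urljoin
from urllib.request import Request, urlopen

API_BASE = "http://127.0.0.1:5000"  # Flask address from the local development steps


def hero_run_url(recording_id: str, base: str = API_BASE) -> str:
    """Build the POST /api/runs/heroes/<recording_id> URL."""
    return urljoin(base, "/api/runs/heroes/" + quote(recording_id, safe=""))


def start_hero_run(recording_id: str, base: str = API_BASE) -> dict:
    """POST to the hero run route and decode the reply as JSON.

    Assumption: an empty POST body is accepted and the reply is JSON.
    """
    request = Request(hero_run_url(recording_id, base), data=b"", method="POST")
    # Generous timeout (an assumption): the pipeline may run for a while.
    with urlopen(request, timeout=600) as response:
        return json.load(response)
```

start_hero_run(...) needs the Flask API running. The sketch returns the decoded JSON untouched because the response fields are not listed here.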

Frontend

frontend/ is a React + Vite app and is the primary interface. The current app supports:

a polished landing page
a hero recordings browser
a hero detail page that can run the backend pipeline
a constrained upload flow for .wav + .csv/.tsv/.xlsx
a manifest-backed results page with audio, plots, metrics, and clip links

Local development

1. Install Python dependencies

.\.venv\Scripts\python -m pip install -r requirements.txt

2. Install frontend dependencies

npm --prefix frontend install

3. Start the Flask API

.\.venv\Scripts\python -m webapp.app

This runs the local API on http://127.0.0.1:5000.

4. Start the React frontend

npm --prefix frontend run dev

This starts the browser UI on http://127.0.0.1:5173. The Vite dev server proxies /api and /spectrograms to Flask.

5. Run the app

Open http://127.0.0.1:5173
Choose a hero preset or upload your own files
The browser will run the pipeline and redirect to /results/<run_id>
Each run writes its exported bundle under artifacts/runs/<run_id>/

Backend usage example

from rumbleclean import (
    build_dataset,
    compute_downsampled_spectrogram,
    extract_noise_only_spectrogram,
    load_recording_bundle,
)

# Index every recording in audioFiles/ against the master CSV, then pick one by ID.
dataset = build_dataset(
    "audioFiles",
    "Audio Files Master (04_10_2026) - 20260324_rumbles_in_noise_for_hackathon.csv",
)
bundle = dataset["04-040920-02_vehicle_1"]

# Alternatively, load a single recording together with its annotation intervals.
audio, sample_rate, intervals = load_recording_bundle(
    "audioFiles/04-040920-02_vehicle_1.wav",
    "Audio Files Master (04_10_2026) - 20260324_rumbles_in_noise_for_hackathon.csv",
)
# Compute the low-frequency spectrogram on the downsampled path.
spectrogram = compute_downsampled_spectrogram(audio, sample_rate)
# Keep only the frames that fall outside the annotated call intervals.
noise_only = extract_noise_only_spectrogram(
    spectrogram,
    intervals,
    total_duration_seconds=len(audio) / sample_rate,
)

load_recording_bundle() returns (audio, sample_rate, intervals) where:

audio is a NumPy array of normalized PCM samples
sample_rate is the WAV sample rate as an integer
intervals is a tuple of typed annotation intervals sorted by start time
each interval may include a selection_id when the source sheet provides one
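
To illustrate the interval shape described above, here is a small standard-library sketch that reads the master CSV columns into typed, sorted intervals. The Interval class, its field names, and parse_intervals are hypothetical stand-ins rather than rumbleclean's own types. The sample rows are made up, and treating Start_time and End_time as seconds is an assumption.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Interval:  # hypothetical stand-in for rumbleclean's interval type
    start_seconds: float
    end_seconds: float
    call_type: str
    selection_id: Optional[str] = None


def parse_intervals(csv_text: str, sound_file: str) -> tuple[Interval, ...]:
    """Collect the rows for one recording and sort them by start time."""
    rows = csv.DictReader(io.StringIO(csv_text))
    intervals = [
        Interval(
            start_seconds=float(row["Start_time"]),
            end_seconds=float(row["End_time"]),
            call_type=row["Call_type"],
            selection_id=row.get("Selection") or None,  # blank cell becomes None
        )
        for row in rows
        if row["Sound_file"] == sound_file
    ]
    return tuple(sorted(intervals, key=lambda item: item.start_seconds))


# Made-up rows for illustration only.
SAMPLE = """Selection,Sound_file,Start_time,End_time,Call_type
2,demo.wav,12.5,15.0,rumble
1,demo.wav,3.0,6.5,rumble
7,other.wav,1.0,2.0,rumble
"""
intervals = parse_intervals(SAMPLE, "demo.wav")
```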

Noise-only selection uses the complement of annotated call intervals and maps those spans onto spectrogram frame-center timestamps. That logic works for both spectrogram modes, which means later NMF training can consume the same interface whether the project is using the original-rate large FFT path or the faster downsampled path.
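
The complement-and-map idea can be sketched in plain Python. This is an illustration of the technique under simple assumptions (half-open spans, times in seconds), not the code behind extract_noise_only_spectrogram. Both function names are hypothetical.

```python
def complement_spans(call_spans, total_duration_seconds):
    """Return the (start, end) spans not covered by any annotated call."""
    noise_spans = []
    cursor = 0.0
    for start, end in sorted(call_spans):
        # Clamp each call to the recording bounds before comparing.
        start = min(max(0.0, start), total_duration_seconds)
        end = min(max(0.0, end), total_duration_seconds)
        if start > cursor:
            noise_spans.append((cursor, start))
        cursor = max(cursor, end)  # also covers overlapping calls
    if cursor < total_duration_seconds:
        noise_spans.append((cursor, total_duration_seconds))
    return noise_spans


def noise_frame_mask(frame_times, noise_spans):
    """True for each frame whose center timestamp lies inside a noise span."""
    return [
        any(start <= t < end for start, end in noise_spans)
        for t in frame_times
    ]


calls = [(3.0, 6.5), (12.5, 15.0)]
spans = complement_spans(calls, total_duration_seconds=20.0)
mask = noise_frame_mask([1.0, 4.0, 7.0, 13.0, 19.0], spans)
```

Because the mask depends only on frame-center timestamps, the same two helpers apply to either spectrogram mode.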

Semi-supervised separation

rumbleclean/separation.py adds the first semi-supervised NMF slice for Phase 3:

fit_noise_basis(...) learns a compact non-negative noise_basis from noise-only spectrogram columns
semi_supervised_nmf(...) keeps that noise_basis fixed while learning:
- elephant_basis
- noise_activation
- elephant_activation
the result exposes separate noise_component and elephant_component magnitude estimates plus the combined reconstruction
the browser pipeline now passes the spectrogram frequency axis into a rumble-aware weighting path so the ~10-35 Hz band is less likely to be over-explained by the noise model

This means the pipeline does not need clean elephant-only training examples. Issue 3.4 can later consume the separated magnitude components to build soft masks and reconstruct audio with the mixture phase.
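
The fixed-noise-basis idea can be sketched with NumPy multiplicative updates. This is a minimal illustration that assumes a Euclidean (Frobenius) objective, and it omits the rumble-aware frequency weighting. The divergence, defaults, and return shape used in rumbleclean/separation.py may differ, and semi_supervised_nmf_sketch is a hypothetical name.

```python
import numpy as np

EPS = 1e-10  # guards against division by zero


def semi_supervised_nmf_sketch(V, noise_basis, n_elephant=4, n_iter=200, seed=0):
    """Factor V ~= [noise_basis | elephant_basis] @ H with noise_basis held fixed."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    n_noise = noise_basis.shape[1]
    elephant_basis = rng.random((n_freq, n_elephant)) + EPS
    H = rng.random((n_noise + n_elephant, n_frames)) + EPS

    for _ in range(n_iter):
        W = np.hstack([noise_basis, elephant_basis])
        # Update every activation row (noise and elephant).
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
        # Update only the elephant columns; the noise columns never change.
        H_e = H[n_noise:]
        elephant_basis *= (V @ H_e.T) / (W @ H @ H_e.T + EPS)

    noise_activation, elephant_activation = H[:n_noise], H[n_noise:]
    noise_component = noise_basis @ noise_activation
    elephant_component = elephant_basis @ elephant_activation
    return (elephant_basis, noise_activation, elephant_activation,
            noise_component, elephant_component)
```

Here V is a non-negative magnitude spectrogram of shape (frequency bins, frames), and noise_basis would come from a prior fit on noise-only columns.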

Phase-aware reconstruction

Phase 4.1 formalizes the reconstruction step on top of the Phase 3 mask:

reconstruct_elephant_signal(...) reuses the original mixture phase from SpectrogramResult.complex_spectrogram
the helper applies the elephant ratio mask to the complex mixture STFT rather than estimating a new phase
the returned PhaseReconstructionResult includes the reconstructed waveform, mask, masked complex spectrogram, and mode/config metadata

This works for both direct and downsampled spectrogram modes because inversion uses the stored spectrogram config. The reconstructed sample rate currently follows the analysis mode, which means downsampled reconstructions come back at the downsampled analysis rate until later export work decides how to upsample or package them.
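
The masking step can be sketched in a few lines of NumPy. The squared-magnitude (Wiener-style) form of the ratio mask is an assumption, and apply_ratio_mask is a hypothetical helper, not the signature of reconstruct_elephant_signal. Inverting the masked STFT back to a waveform is left out because it depends on the stored spectrogram config.

```python
import numpy as np


def apply_ratio_mask(complex_stft, elephant_mag, noise_mag, eps=1e-10):
    """Scale the complex mixture STFT by a Wiener-style elephant ratio mask.

    The mask is real and non-negative, so every bin keeps the mixture phase
    and only its magnitude is attenuated.
    """
    elephant_power = elephant_mag ** 2
    noise_power = noise_mag ** 2
    mask = elephant_power / (elephant_power + noise_power + eps)
    return mask, mask * complex_stft


# Tiny made-up example: 2 frequency bins x 2 frames.
mixture = np.array([[1 + 1j, -2j], [3 + 0j, -1 - 1j]])
elephant = np.array([[1.0, 2.0], [0.5, 1.0]])
noise = np.array([[1.0, 1.0], [2.0, 0.1]])
mask, masked = apply_ratio_mask(mixture, elephant, noise)
```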

Run artifact bundles

Phase 4.3 adds a unified export contract shared by hero runs and upload runs:

export_run_artifacts(...) writes each run under artifacts/runs/<run_id>/
manifest.json is the canonical entrypoint for the UI and download flows
the bundle includes:
- audio/cleaned_full_track.wav
- metrics.txt
- clips/*.wav for per-call exports
- plots/before_spectrogram.svg
- plots/after_spectrogram.svg
- plots/comparison_spectrogram.svg
- plots/baseline_reference.txt

Phase 5.2 adds a compact proxy-metric report to each run bundle:

Noise suppression (dB) compares 0-60 Hz energy outside annotated calls before vs after separation
Harmonic alignment score compares the dominant 10-40 Hz ridge inside annotated calls before vs after separation
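
As a rough illustration of the first metric, the sketch below measures the energy drop in the 0-60 Hz band over frames outside annotated calls. The exact definition in the project's metrics code may differ. Using squared magnitude as energy is an assumption, and noise_suppression_db is a hypothetical name.

```python
import numpy as np


def noise_suppression_db(before_mag, after_mag, freqs_hz, noise_frames,
                         max_hz=60.0, eps=1e-12):
    """Energy drop in dB within 0..max_hz over frames outside annotated calls."""
    band = np.asarray(freqs_hz) <= max_hz
    frames = np.asarray(noise_frames, dtype=bool)
    energy_before = np.sum(before_mag[band][:, frames] ** 2)
    energy_after = np.sum(after_mag[band][:, frames] ** 2)
    # Positive values mean the cleaned track has less noise-band energy.
    return 10.0 * np.log10((energy_before + eps) / (energy_after + eps))


# Made-up spectrograms: 5 frequency bins x 4 frames, amplitudes cut to 10%.
freqs = np.array([0.0, 20.0, 40.0, 60.0, 80.0])
before = np.ones((5, 4))
after = 0.1 * before
suppression = noise_suppression_db(before, after, freqs, [True, False, True, True])
```

Scaling every amplitude to 10% cuts the energy by a factor of 100, so this toy case reports about 20 dB of suppression.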

The manifest stores relative paths so the same bundle format works whether the source came from a hero preset or an uploaded recording.
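
A consumer can resolve those relative paths against the run directory. The sketch below assumes a flat manifest with hypothetical keys (cleaned_audio, metrics). The real manifest.json schema is not documented here, so treat the key names and resolve_manifest_paths as placeholders.

```python
import json
import tempfile
from pathlib import Path


def resolve_manifest_paths(run_dir, keys=("cleaned_audio", "metrics")):
    """Load manifest.json from run_dir and resolve selected relative paths."""
    run_dir = Path(run_dir)
    manifest = json.loads((run_dir / "manifest.json").read_text(encoding="utf-8"))
    # Relative entries stay portable: they resolve wherever the bundle lives.
    return {key: run_dir / manifest[key] for key in keys if key in manifest}


# Demo with a throwaway bundle; the manifest keys are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / "artifacts" / "runs" / "demo-run"
    run_dir.mkdir(parents=True)
    (run_dir / "manifest.json").write_text(
        json.dumps({
            "cleaned_audio": "audio/cleaned_full_track.wav",
            "metrics": "metrics.txt",
        }),
        encoding="utf-8",
    )
    resolved = resolve_manifest_paths(run_dir)
    relative = {key: path.relative_to(run_dir).as_posix() for key, path in resolved.items()}
```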

Verification

Run the Python test suites:

.\.venv\Scripts\python -m unittest tests.test_dataset tests.test_hero_files tests.test_spectrogram tests.test_noise_segments tests.test_noise_dictionary tests.test_separation tests.test_exports tests.test_metrics tests.test_webapp -v

Run the frontend tests:

npm --prefix frontend test -- --run

Run the frontend production build:

npm --prefix frontend run build

Roadmap

Use ROADMAP.md for phase tracking and issue-level progress.
