[codex] Skip active audio device restart checks #3963

Merged
louis030195 merged 1 commit into main from codex/skip-active-audio-device-restarts on Jun 10, 2026
Conversation

@louis030195

Collaborator

Summary

  • skip the audio device monitor's per-cycle start_device call when the parsed device already has a live, non-disconnected stream
  • keep the existing 2s monitor cadence and recovery paths for stale, missing, disconnected, or user-disabled devices

Why

The monitor sweep runs continuously while audio is active. Calling start_device on already-active devices every pass adds avoidable async/lock work in the idle path even though recovery is only needed when the stream is not active.
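
The skip condition described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `StreamState`, `should_skip_start`, and `devices_needing_start` are hypothetical names, and the real monitor's types and the `start_device` signature are not shown here. The idea is only that a device with a live, non-disconnected stream is filtered out before any async or lock work happens, while every other device still reaches the existing recovery path.

```rust
use std::collections::HashMap;

/// Hypothetical snapshot of one device's stream as the monitor sees it.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct StreamState {
    /// The stream is currently live.
    pub running: bool,
    /// The stream has been flagged as disconnected.
    pub disconnected: bool,
}

/// True only when the device already has a live, non-disconnected stream,
/// i.e. the one case where the per-cycle start call is pure overhead.
pub fn should_skip_start(state: Option<&StreamState>) -> bool {
    matches!(state, Some(s) if s.running && !s.disconnected)
}

/// One monitor pass: return the devices that should still be handed to the
/// existing start/recovery path (missing, stale, or disconnected streams).
pub fn devices_needing_start<'a>(
    configured: &[&'a str],
    streams: &HashMap<String, StreamState>,
) -> Vec<&'a str> {
    configured
        .iter()
        .copied()
        .filter(|name| !should_skip_start(streams.get(*name)))
        .collect()
}
```

With this shape, a sweep over devices that are all healthy returns an empty list, so the idle path does no start work at all, and a device with no stream entry is treated the same as a stale one.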

Validation

  • cargo fmt --package screenpipe-audio -- --check
  • cargo test -p screenpipe-audio device_monitor --lib
  • cargo check -p screenpipe-audio (passes; pre-existing unused mut warning in core/stream.rs)

@github-actions

Contributor

Diarization eval results

Source: crates/screenpipe-audio-eval/evals/ · VoxConverse dev (CC-BY-4.0) + composed workday templates + screenpipe-shaped LibriSpeech fixtures

| fixture | DER | VAD FA | VAD FN | boundary err (s) | continuity | predicted / true spk |
|---|---|---|---|---|---|---|
| interrupted_meeting | 0.186 | 0.01 | 0.063 | 20.286 | 0.833 | 9 / 5 |
| long_silence_day | 0.437 | 0.011 | 0.145 | 11.46 | 0.7 | 14 / 10 |
| screenpipe_meeting_rapid_handoffs | 0.241 | 0.196 | 0.099 | 2.305 | 1 | 5 / 3 |
| screenpipe_background_24_7_day | 0.315 | 0.025 | 0.159 | 2.203 | 1 | 4 / 3 |
| screenpipe_short_backchannels | 0.561 | 0.915 | 0.064 | 0.488 | n/a | 3 / 3 |
| screenpipe_mic_system_echo_leakage | 0.275 | 0.198 | 0.084 | 3.045 | 0.667 | 5 / 3 |
| screenpipe_overlap_crosstalk | 0.254 | 0.84 | 0.042 | 0.667 | n/a | 3 / 3 |
| abjxc | 0.016 | 0.098 | 0.002 | 1.151 | n/a | 2 / 1 |
| bxpwa | 0.111 | 0.453 | 0.029 | 20.793 | 0.714 | 8 / 5 |
| dhorc | 0.143 | 0.461 | 0.034 | 3.681 | 1 | 5 / 4 |

DER, VAD FA, VAD FN, boundary err: lower is better. Continuity: higher is better, 1.0 = same hyp cluster across all silence gaps. Composed workday rows and screenpipe_* rows exercise screenpipe-shaped usage: meetings, background gaps, backchannels, echo leakage, and crosstalk. Raw VoxConverse rows score broadcast-quality stems for comparison. See crates/screenpipe-audio-eval/evals/README.md for methodology.

Pipeline replay matrix

Source: generated screenpipe_* fixtures materialized into temp screenpipe SQLite DBs, then read back through search_audio. This catches storage/search regressions that pure DER scoring misses.

| scenarios | passed | failed | skipped | avg background DER | avg background speaker err | Deepgram |
|---|---|---|---|---|---|---|
| 41 | 40 | 0 | 1 | 0.329 | 0.183 | skip |

The no-secret CI matrix runs local diarization under Parakeet/Whisper engine labels across live/background and mic/system device profiles. Real Deepgram/screenpipe-cloud smoke can be run locally with --deepgram required when credentials are present.

Transcription quality

Source: LibriSpeech test-clean (CC-BY-4.0) · per-model utterance cap · normalized lowercased word-level Levenshtein

| model | utterances | WER | CER | throughput (samples/s) |
|---|---|---|---|---|
| tiny | 50 | 0.085 | 0.033 | 53760 |
| whisper-large-v3-turbo-quantized | 20 | 0.042 | 0.009 | 1839 |
| parakeet | 50 | 0.04 | 0.026 | 219930 |

WER + CER on read-aloud speech. Per-model utterance caps keep wall time bounded — tiny/parakeet at 50, the heavier large-v3-turbo-quantized at 20. See README for normalization rules.
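
For readers unfamiliar with the metric, word error rate over a word-level Levenshtein distance can be sketched as below. This is an illustration under stated assumptions, not the eval crate's implementation: the function names are hypothetical, and the only normalization applied is lowercasing and whitespace splitting, whereas the actual rules are the ones documented in the README.

```rust
/// Word-level Levenshtein distance: the minimum number of substitutions,
/// deletions, and insertions that turn `reference` into `hypothesis`.
fn edit_distance(reference: &[&str], hypothesis: &[&str]) -> usize {
    // prev[j] = distance between the reference prefix seen so far
    // and the first j hypothesis words.
    let mut prev: Vec<usize> = (0..=hypothesis.len()).collect();
    for (i, r) in reference.iter().enumerate() {
        let mut curr = vec![i + 1];
        for (j, h) in hypothesis.iter().enumerate() {
            let substitution = prev[j] + usize::from(r != h);
            let deletion = prev[j + 1] + 1;
            let insertion = curr[j] + 1;
            curr.push(substitution.min(deletion).min(insertion));
        }
        prev = curr;
    }
    prev[hypothesis.len()]
}

/// WER = word-level edit distance / number of reference words.
/// Returns None for an empty reference, where the ratio is undefined.
/// Assumption: normalization here is only lowercasing + whitespace split.
pub fn wer(reference: &str, hypothesis: &str) -> Option<f64> {
    let reference = reference.to_lowercase();
    let hypothesis = hypothesis.to_lowercase();
    let r: Vec<&str> = reference.split_whitespace().collect();
    let h: Vec<&str> = hypothesis.split_whitespace().collect();
    if r.is_empty() {
        return None;
    }
    Some(edit_distance(&r, &h) as f64 / r.len() as f64)
}
```

CER follows the same pattern with characters in place of words. Under this definition, dropping one word from a six-word reference gives a WER of 1/6.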

@louis030195 marked this pull request as ready for review June 10, 2026 04:58
@louis030195 merged commit 58fab8d into main Jun 10, 2026
23 checks passed