Skip to content

Fix overlay SSE EPIPE handling#2781

Merged
enoch85 merged 2 commits into
Maintainerr:developmentfrom
SmolSoftBoi:codex/issue-2780
Apr 27, 2026
Merged

Fix overlay SSE EPIPE handling#2781
enoch85 merged 2 commits into
Maintainerr:developmentfrom
SmolSoftBoi:codex/issue-2780

Conversation

@SmolSoftBoi

@SmolSoftBoi SmolSoftBoi commented Apr 27, 2026

Copy link
Copy Markdown
Collaborator

Description & Design

This change fixes a server crash during overlay Run Now runs when poster restore/revert work hits a broken stream or transient Plex upload failure.

The fix has two parts:

  • Adds a shared guarded SSE stream writer for logs/events so closed clients and EPIPE write failures clean up the client instead of reaching uncaughtException.
  • Hardens overlay revert handling so stale overlay-state restore failures are counted as errors, the original poster backup and state are kept for retry, and the processor continues with the remaining items.

This approach keeps the fix narrow and server-side. It avoids changing public APIs, contracts, database schema, or UI behaviour, and follows the existing retry-safe overlay design: only clear backup/state after a confirmed restore.

Trade-offs and edge cases:

  • Failed restores now report as overlay run errors rather than successful reverts.
  • If no backup exists, state is still cleared because there is nothing left to retry.
  • Reset/revert paths continue best-effort behaviour and log failures with useful error detail.
  • SSE clients that fail writes are removed immediately, which is preferable to keeping dead connections around.

Related issue

Fixes #2780

AI-Assisted Development

This PR was AI-assisted using Codex.

AI assistance was used to trace the issue, inspect the overlay revert and SSE streaming paths, draft the implementation, and add focused regression tests. I reviewed the relevant server code paths, kept the changes scoped to the existing architecture, validated that backup/state preservation still works, and ran the project quality gates.

This solution fits the project's standards because it uses existing NestJS/RxJS patterns, avoids new dependencies, keeps changes small and reviewable, preserves retry-safe overlay behaviour, and adds targeted tests for the reported failure mode.

Checklist

  • I have read the CONTRIBUTING.md document.
  • I understand the code I am submitting and can explain how it works
  • I have performed a self-review of my code
  • I have linted and formatted my code
  • My changes generate no new warnings
  • New and existing unit tests pass locally with my changes

How to test

  1. Run lint and type checks:
    yarn lint
    yarn check-types

  2. Run the focused regression tests:
    yarn --cwd apps/server jest --runTestsByPath src/modules/overlays/overlay-processor.service.spec.ts src/modules/events/events.controller.spec.ts src/modules/logging/logs.controller.spec.ts

  3. Build the workspace:
    yarn build

  4. Manual verification, if available:

    • Configure overlays with Plex.
    • Ensure at least one saved overlay state refers to an item no longer in any overlay-enabled collection.
    • Open the UI logs page, then trigger Overlays → Run Now.
    • Confirm the server does not crash if poster restore fails or the log/event stream disconnects.

Additional context

yarn build passes. The UI build may still print the existing Vite large chunk warning; this change does not introduce a new warning.

@SmolSoftBoi SmolSoftBoi requested a review from enoch85 as a code owner April 27, 2026 00:07
…antics

- Expose send() on SseStreamClient instead of the raw RxJS Subject
- Restore clearAllStates() in resetAllOverlays so reset is a hard wipe
- Drop no-op axios try/catch in plex-api uploadPoster
- Use plain warn() + debug(error) per logging convention
- RevertItemResult as discriminated string union

@enoch85 enoch85 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approve this in current state.

Please test the latest pushes, and confirm it fixes your issue. Once approved, I'll merge this.

Thanks for the PR! 🚀

@enoch85 enoch85 merged commit b655789 into Maintainerr:development Apr 27, 2026
13 checks passed
@enoch85

enoch85 commented Apr 27, 2026

Copy link
Copy Markdown
Collaborator

Decided to merge this anyway to avoid conflicts on another PR I'm working on touching the same files.

It's also easier to confirm now that the changes are in development. 👍

maintainerr-automation Bot added a commit that referenced this pull request Apr 27, 2026
* fix(plex): bound request timeout and surface watch-history errors (#2773)

The runtime PlexApi client had no socket timeout, so a wedged Plex
request could hang the rule executor forever. Add PLEX_REQUEST_TIMEOUT_MS
(30s) to bound it.

Also align plexApi.getWatchHistory with the Jellyfin adapter's contract:
errors propagate so callers can distinguish a real outage from a
confirmed empty history (matches the design documented for the rules
layer's NOT_EXISTS / missing-value handling). Removes silent
.catch(() => null) swallowers in seenBy, sw_allEpisodesSeenBy and
collection_siblings_lastViewedAt that previously turned transient
lookup failures into "viewed by no one".

Refs #2771

* fix(overlays): preset edit, cron discoverability, tab gating (#2775)

* fix(plex): bound request timeout and surface watch-history errors

The runtime PlexApi client had no socket timeout, so a wedged Plex
request could hang the rule executor forever. Add PLEX_REQUEST_TIMEOUT_MS
(30s) to bound it.

Also align plexApi.getWatchHistory with the Jellyfin adapter's contract:
errors propagate so callers can distinguish a real outage from a
confirmed empty history (matches the design documented for the rules
layer's NOT_EXISTS / missing-value handling). Removes silent
.catch(() => null) swallowers in seenBy, sw_allEpisodesSeenBy and
collection_siblings_lastViewedAt that previously turned transient
lookup failures into "viewed by no one".

Refs #2771

* fix(overlays): resolve preset edit, cron discoverability, and tab gating issues

- Fix #2768: preset edit now opens the editor with a copy-on-save flow
  (info description, "Save as copy" button, name modal) and remounts on
  id change so switching to /new from a preset URL no longer leaks state.
- Add post-save modal prompting for a cron schedule when overlays are
  enabled with no schedule set, with a direct link to Job Settings.
- Gate template tabs and Run Now / Reset on persisted overlay-enabled
  state; redirect direct nav to /overlays/templates when disabled.
- Add Docs button on Overlay Settings linking to docs.maintainerr.info/overlays.
- Add useOverlaySettings query + useUpdateOverlaySettings mutation
  mirroring useSettings/usePatchSettings, so the wrapper re-evaluates
  gating after save via cache propagation.
- Extract BrandLink shared component for inline branded body-text links;
  swap manual styling in Metadata, Jobs, AddModal, Calendar, StorageMetrics.
- Replace crontab.org with crontab.guru and dedupe per-row cron links
  into a single link in the Job Settings page description.
- Suppress success alert on copy-on-save navigation (the URL change is
  the confirmation; the alert flashed for one frame and disappeared).
- Sync workspace MCP config (.mcp.json + .vscode/mcp.json) to use
  --output-dir .playwright-mcp; document screenshot path requirement.

* fix(ui): unblock tsc + fix the type errors it was hiding (#2776)

apps/ui/tsconfig.json had ignoreDeprecations: "6.0", which is invalid
for TypeScript 5.x and caused tsc to bail before type-checking. CI
exited 0 without scanning anything, so eight pre-existing type errors
went unreported.

- Remove the invalid ignoreDeprecations option (no deprecated options
  remain in the tsconfig that need silencing).
- Drop unused imports in CollectionInfo (DocumentTextIcon,
  CollectionLogDto, isMetaActionedByRule).
- Add the missing MediaItem fields (guid, addedAt, providerIds,
  mediaSources, library) to the CollectionMediaPage test fixture.
- Update the UseQueryResult / QueryClient mock casts in two specs to
  use the project's existing `as unknown as ...` double-cast pattern
  (matches 12 other specs).

* ci(fider): tighten triage script and surface pre-existing matches in re-eval

Re-evaluate workflow can now catch "feature already exists, user didn't know"
cases that the daily triage's PR-prefix gate misses (e.g. foundational
date-rule support that predates the feat:/fix: convention).

- Lift sleep + the entire model-call layer (throttle, per-run budget,
  retry-with-backoff, retry-after handling, header logging) from
  fider-triage.mjs into a createModelCaller factory in fider-shared.mjs.
- Replace the wrong "Copilot Free 15 RPM / 150 RPD" comment with the
  observed runner-token budget (1000 RPM / 1M TPM, no exposed daily cap)
  from a real re-eval run; pace 1 call/sec with an 800-call sanity ceiling.
- BudgetExhaustedError now propagates through triagePost catches so main()
  aborts cleanly with a deferred-posts log instead of burning iterations
  into the consecutive-failure abort.
- Drop the feat:/fix: PR-title prefix gate on the pre-existing path when
  forceReeval=true; daily run still enforces it for precision.
- Re-eval workflow now sets CHECK_PRE_EXISTING=true so the relaxed path
  actually runs.
- MAX_POSTS_PER_RUN 25 -> 100; consolidate the three judge*WithModel and
  build*Comment functions; collapse filterCandidates +
  filterPreExistingCandidates into one parametric filterPRs.

Net: triage script 844 -> 649 lines (-23%), shared +117 for the lifted
infra; behaviour preserved end-to-end (comment markers byte-identical,
tag slugs unchanged, quote-validation rejection paths preserved).

* ci(fider): @-mention OP and nudge to split bundled requests in possibly-pre-existing comment

Both pre-existing flags from the latest re-eval run were partial matches
caused by multi-ask FRs: the bot quoted a PR that delivered one bullet
of a wishlist and tagged the post pre-existing. The new footer addresses
the OP directly via @-mention (already returned on every post by
/api/v1/posts) and asks them to split bundled requests so each can be
triaged independently. The long maintainer-facing "lower confidence than
possibly-completed" preamble that the OP didn't need is gone.

Comment-marker idempotency means existing pre-existing comments will be
edited (not duplicated) to the new text on the next re-eval run.

* Fix overlay SSE EPIPE handling (#2781)

* Fix overlay SSE EPIPE handling

* refactor(overlays,sse): tighten SSE writer API and overlay revert semantics

- Expose send() on SseStreamClient instead of the raw RxJS Subject
- Restore clearAllStates() in resetAllOverlays so reset is a hard wipe
- Drop no-op axios try/catch in plex-api uploadPoster
- Use plain warn() + debug(error) per logging convention
- RevertItemResult as discriminated string union

---------

Co-authored-by: enoch85 <mailto@danielhansson.nu>

---------

Co-authored-by: enoch85 <mailto@danielhansson.nu>
Co-authored-by: Kristian Matthews-Kennington <kristian@matthews-kennington.com>
@enoch85 enoch85 added this to the 3.9.0 milestone Apr 28, 2026 — with GitHub Codespaces
@maintainerr-automation

Copy link
Copy Markdown
Contributor

🎉 This PR is included in version 3.9.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

@enoch85

enoch85 commented Apr 29, 2026

Copy link
Copy Markdown
Collaborator

Seems still to be an issue: #2807

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Overlay "Run Now" crashes server with uncaught EPIPE during poster restore/revert

2 participants