Make PageList/Search concurrency-safe for background search (Part 2 of #189) #8850

dave-fl · 2025-09-23T01:26:55Z

Prepares the terminal core for multi-threaded search by making PageList mutation safe to run alongside a background PageListSearch. Part 2 of #189.

Introduce std.Thread.RwLock-backed WriteGuard/ReadGuard on PageList; writers take the re-entrant guard, which drains pending frees and bumps the snapshot epoch exactly once per structural mutation.
Add per-node refcounts, serials, and a pending-free stack; retainNode/releaseNode keep nodes
alive across concurrent readers and defer frees to the IO thread.
Prevent stale reuse during pruning by skipping nodes whose refcount is still held and quarantining them until released.
Teach PageListSearch and SlidingWindow to retain/release nodes, hold the shared read guard only
while sampling, and rescan when the pagelist epoch changes so stale selections are dropped.
Extend coverage with PageListSearch drops stale nodes after epoch bump and PageListSearch
survives concurrent mutation to assert the new concurrency behavior under prune/rebuild pressure.

- wrap all PageList writers in a re-entrant WriteGuard that drains pending frees, tracks structural mutation, and bumps the snapshot epoch exactly once; add a writeGuardMutating shortcut for the common case - change drainPendingFree to return bool so callers can detect frees, and explicitly discard its result (and other optional returns) where we do not care about it - keep concurrent readers safe by adding nodeInList helper and retaining/releasing via the sliding window as before - teach PageListSearch.next to re-check snapshot_epoch under the read guard; on change, clear the window, resample the epoch, and rebuild so we never hand back stale selections - exercise the prune-while-retained path with the new regression test “PageListSearch drops stale nodes after epoch bump” to ensure the pruned node is gone and subsequent matches resolve in the live list

mitchellh

Oh awesome. This is definitely heading in the right direction and the code quality looks... normal at first glance (that's a good thing).

One thing that I want to be really careful about as we go down this path is testing the no-search throughput to ensure we're not regressing that. To that end, do you have benchmark numbers of the terminal stream before and after this branch? We should be able to use our existing terminal-stream benchmark.

It'd be good to test that with plain ASCII, Japanese or some other complex Unicode language, and some mix of escape sequences.

dave-fl · 2025-09-23T05:02:32Z

Here are the numbers - Note that 5a29dd3 changed the build defaults for the benchmarks, I can test on main with -Doptimize=ReleaseFast if you would like.

zig-out/bin/ghostty-gen +ascii | head -c 67108864 > /tmp/ascii_64m.stream
zig-out/bin/ghostty-gen +osc --p-valid=0.7 | head -c 67108864 > /tmp/osc_64m.stream
zig-out/bin/ghostty-gen +utf8 | head -c 67108864 > /tmp/utf8_64m.stream

git log -1 --oneline
1a6af1704 (HEAD) gtk: restore flatpak-aware resource directory support (#8816)

hyperfine -w 2 'zig-out/bin/ghostty-bench +terminal-stream --data=/tmp/ascii_64m.stream --terminal-rows=24 --terminal-cols=80'
Benchmark 1: zig-out/bin/ghostty-bench +terminal-stream --data=/tmp/ascii_64m.stream --terminal-rows=24 --terminal-cols=80
  Time (mean ± σ):     400.9 ms ±   3.4 ms    [User: 385.1 ms, System: 14.6 ms]
  Range (min … max):   396.0 ms … 407.5 ms    10 runs

hyperfine -w 2 'zig-out/bin/ghostty-bench +terminal-stream --data=/tmp/osc_64m.stream --terminal-rows=24 --terminal-cols=80'
Benchmark 1: zig-out/bin/ghostty-bench +terminal-stream --data=/tmp/osc_64m.stream --terminal-rows=24 --terminal-cols=80
  Time (mean ± σ):      1.093 s ±  0.013 s    [User: 1.073 s, System: 0.018 s]
  Range (min … max):    1.080 s …  1.124 s    10 runs

hyperfine -w 2 'zig-out/bin/ghostty-bench +terminal-stream --data=/tmp/utf8_64m.stream --terminal-rows=24 --terminal-cols=80'
Benchmark 1: zig-out/bin/ghostty-bench +terminal-stream --data=/tmp/utf8_64m.stream --terminal-rows=24 --terminal-cols=80
  Time (mean ± σ):     607.8 ms ±   2.6 ms    [User: 541.5 ms, System: 62.1 ms]
  Range (min … max):   604.1 ms … 611.6 ms    10 runs

git log -1 --oneline
bd32b1ce1 (HEAD -> feature/search-concurrency, origin/feature/search-concurrency) PageList: write-guard RAII and epoch-safe search window

hyperfine -w 2 'zig-out/bin/ghostty-bench +terminal-stream --data=/tmp/ascii_64m.stream --terminal-rows=24 --terminal-cols=80'
Benchmark 1: zig-out/bin/ghostty-bench +terminal-stream --data=/tmp/ascii_64m.stream --terminal-rows=24 --terminal-cols=80
  Time (mean ± σ):     401.2 ms ±   1.7 ms    [User: 386.1 ms, System: 14.0 ms]
  Range (min … max):   398.3 ms … 403.9 ms    10 runs

hyperfine -w 2 'zig-out/bin/ghostty-bench +terminal-stream --data=/tmp/osc_64m.stream --terminal-rows=24 --terminal-cols=80'
Benchmark 1: zig-out/bin/ghostty-bench +terminal-stream --data=/tmp/osc_64m.stream --terminal-rows=24 --terminal-cols=80
  Time (mean ± σ):      1.090 s ±  0.006 s    [User: 1.072 s, System: 0.016 s]
  Range (min … max):    1.083 s …  1.102 s    10 runs

hyperfine -w 2 'zig-out/bin/ghostty-bench +terminal-stream --data=/tmp/utf8_64m.stream --terminal-rows=24 --terminal-cols=80'
Benchmark 1: zig-out/bin/ghostty-bench +terminal-stream --data=/tmp/utf8_64m.stream --terminal-rows=24 --terminal-cols=80
  Time (mean ± σ):     615.5 ms ±   3.3 ms    [User: 547.6 ms, System: 63.4 ms]
  Range (min … max):   613.1 ms … 624.2 ms    10 runs

dave-fl · 2025-09-24T03:57:16Z

@mitchellh Let me know if there is anything I can do - I got part 3/4 cooking.

dave-fl · 2025-09-25T05:03:00Z

~~PR not ready yet — I'm investigating a possible race on tracked pins under load (clone iterating pins while search mutates).~~

…n race - add a dedicated TrackedPinsGuard/TrackedPinsGuardConst to wrap every access to tracked_pins - update clone, resize/reflow helpers, row/page erase, initial reset, and diagram to use the guard - implement erasePageLocked so callers that already hold the guard avoid relocking - move tracked pin allocations inside the guard and keep the critical sections small - add a stress test (PageList concurrent clone vs trackPin stress) with two threads - clean up random usage and switch to the thread-safe allocator for the stress test

dave-fl · 2025-09-26T01:28:49Z

Added trackedPinsGuardso any caller that mutates tracked_pins (renderer clone or app-side search) runs through the same lock. Without it we can still hit the original race: the search controller tracks/untracks pins while the renderer copies the set, and we corrupt the map. This guard eliminates that interleaving; pins are tracked long enough to keep matches alive and the renderer safe.

Numbers did not change on benchmark results shown above.

dave-fl · 2025-09-28T18:55:00Z

I split PageList into safe reader/writer paths so the search worker can scan while the IO thread mutates. Every structural change bumps a snapshot_epoch; the worker snapshots it at start, and each batch carries that epoch. If the PageList epoch changes, we treat incoming batches as stale and restart cleanly.
Matches stream as pointer‑free anchors: {node_serial, y/x, len, snapshot_epoch}. The serial is a monotonic ID per page node, so we can order reliably (forward = strictly after, backward = nearest strictly before) even across restarts. The epoch tells us if an anchor is still valid; the serial tells us where it belongs.
For highlighting, we only materialize the active match as pins (under a read lock), keeping retarget work tiny. Renderer clones and writer updates iterate pins under a small pin lock. Screen‑level guards keep the cursor in sync during scroll/grow.
Live demo: typing streams results smoothly; forward/backward restarts land where you expect; and it stays stable under constant output.

Clip.mov

- Retain 1 byte when needle.len == 1; otherwise retain needle.len − 1. - Keeps enough overlap to detect matches straddling batches while pruning efficiently. - Fixes off‑by‑one behavior for single‑byte patterns without impacting larger needles.

mitchellh · 2025-10-21T14:00:29Z

@dave-fl FYI I'm on vacation currently but plan on reviewing this in earnest perhaps next week. Sorry for the delay, appreciate your efforts here. 😄

mitchellh · 2025-11-08T20:48:36Z

Just as an update here, I've been working through the changes here in a slightly different way and experimenting. I'm actively working on this and will give any credit to @dave-fl once I get somewhere with it (or just accept this PR as-is). Thanks!

dave-fl · 2025-11-09T19:35:54Z

Hi @mitchellh welcome back. While this waited on review, I prototyped an "anchor" variant of search. Here's the breakdown:

Context

Original path: Worker returns live Selections (pins) and keeps nodes alive via retain/release. Page reuse in grow() can be skipped when a node is retained by a reader.
Anchor path: Worker returns pointerless anchors { node_serial, y, x, len }. Sliding window stores only node_serial (no *Node pointers, no retainNode()). PageList drops refcounts/pending‑free; prune always reuses the head page.

What changed

src/terminal/search.zig:

PageListSearch.next returns Message.Match (anchor) instead of Selection.
SlidingWindow.Meta stores { node_serial, cell_map } instead of { list, node, cell_map }.
No retainNode() call in append().
"Find next node" uses findNodeBySerial(...).next instead of meta.node.next.

src/terminal/PageList.zig:

Remove per‑node refcounts and pending‑free queue (~70 lines); processPendingFree() becomes no‑op.
grow() always reuses pruned head page (no "skip reuse when retained" path).

Trade-offs

Anchors:

No cross-thread node ownership, no atomic refcounting, no deferred free queue.
Predictable memory: always reuses pruned pages.
Must re-lookup nodes from serial when materializing results (epoch validation ensures correctness).

Refcounts:

Can hold *Node pointers across lock boundaries (more flexible).
Nodes stay alive while retained (prune may skip reuse or allocate).
Cost: atomic ops + deferred free queue + reuse gating logic.

Happy to stick with refcounts if we want that flexibility. Just wanted you aware the anchor approach exists and what it trades. No recommendation implied.

mitchellh · 2025-11-10T22:17:40Z

Thanks thats an interesting idea. My immediate reaction is that I think we'll want the refcounting and such on Nodes anyways because concurrent PageList will be used for more than just search. Getting this core right will be key to also enabling stuff like background archiving of scroll back to disk (or, the reverse, restoring it). I think. That's at least how I've always thought about stuff like this.

Btw are you in Discord? Would be useful to run stuff by you but if not no big deal.

dave-fl · 2025-11-11T21:35:25Z

Hey @mitchellh. Makes sense - good to understand your vision on this. I'm going on vacation this week but will ping you when I get back on Discord. Happy to discuss further then!

mitchellh · 2025-11-13T23:39:16Z

See #9585 for a different direction. If the "big lock" approach works fine for performance then we can probably yeet all this PageList complexity completely. If it doesn't work fine, then I think the components I laid out there are still going to be necessary and can be augmented to be fine-grained-locking-aware.

@dave-fl

Chugging along towards #189 This adds significantly more internal work for searching. A long time ago, I added #2885 which had a hint of what I was thinking of. This simultaneously builds on this and changes direction. The change of direction is that instead of making PageList fully concurrency safe and having a search thread access it concurrently, I'm now making an architectural shift where our search thread will grab the big lock (blocking all IO/rendering), but with the bet that we can make our critical areas small enough and time them well enough that the performance hit while actively searching will be minimal. **Results yet to be seen, but the path to implement this is much, much simpler.** ## Rearchitecting Search To that end, this PR builds on #2885 by making `src/terminal/search` and entire package (rather than a single file). ```mermaid graph TB subgraph Layer5 ["Layer 5: Thread Orchestration"] Thread["Thread ━━━━━━━━━━━━━━━━━━━━━ • MPSC queue management • libxev event loop • Message handling • Surface mailbox communication • Forward progress coordination"] end subgraph Layer4 ["Layer 4: Screen Coordination"] ScreenSearch["ScreenSearch ━━━━━━━━━━━━━━━━━━━━━ • State machine (tick + feed) • Result caching • Per-screen (alt/primary) • Composes Active + History search • Interrupt handling"] end subgraph Layer3 ["Layer 3: Domain-Specific Search"] ActiveSearch["ActiveSearch ━━━━━━━━━━━━━━━━━━━━━ • Active area only • Invalidate & re-search • Small, volatile data"] PageListSearch["PageListSearch ━━━━━━━━━━━━━━━━━━━━━ • History search (reverse order) • Separated tick/feed ops • Immutable PageList assumption • Garbage pin detection"] end subgraph Layer2 ["Layer 1: Primitive Operations"] SlidingWindow["SlidingWindow ━━━━━━━━━━━━━━━━━━━━━ • Manual linked list node management • Circular buffer maintenance • Zero-allocation search • Match yielding • Page boundary handling"] end Thread --> ScreenSearch ScreenSearch --> ActiveSearch ScreenSearch --> PageListSearch ActiveSearch --> SlidingWindow PageListSearch --> SlidingWindow classDef layer5 fill:#0a0a0a,stroke:#ff0066,stroke-width:3px,color:#ffffff classDef layer4 fill:#0f0f0f,stroke:#ff6600,stroke-width:3px,color:#ffffff classDef layer3 fill:#141414,stroke:#ffaa00,stroke-width:3px,color:#ffffff classDef layer2 fill:#1a1a1a,stroke:#00ff00,stroke-width:3px,color:#ffffff class Thread layer5 class ScreenSearch layer4 class ActiveSearch,PageListSearch layer3 class SlidingWindow layer2 style Layer5 fill:#050505,stroke:#ff0066,stroke-width:2px,color:#ffffff style Layer4 fill:#080808,stroke:#ff6600,stroke-width:2px,color:#ffffff style Layer3 fill:#0c0c0c,stroke:#ffaa00,stroke-width:2px,color:#ffffff style Layer2 fill:#101010,stroke:#00ff00,stroke-width:2px,color:#ffffff ``` Within the package, we have composable layers that let us test each point: - `SlidingWindow`: The lowest layer, the caller manually adds linked list page nodes and it maintains a sliding window we search over, yielding results without allocation (besides the circular buffers to maintain the sliding window). - `PageListSearch`: Searches a PageList structure in reverse order (assumption: more recent matches are more valuable than older), but separates out the `tick` (search, but no PageList access) and `feed` (PageList access, prep data for search but don't search) operations. This lets us `feed` in a critical area and `tick` outside. **This assumes an immutable PageList, so this is for history.** - `ActiveSearch`: Searches only the active area of a PageList. The expectation is that the active area changes much more regularly, but it is also very small (relative to scrollback). Throws away and re-searches the active area as necessary. - `ScreenSearch`: Composes the previous three components to coordinate searching an active terminal screen. You'd have one of these per screen (alt vs primary). This also caches results unlike the other components, with the expectation that the caller will revisit the results as screens change (so if you switch from neovim back to your shell and vice versa with a search active, it won't start over). - `Thread`: A dedicated search thread that will receive messages via MPSC queues while managing the forward progress of a `ScreenSearch` and sending matches back to the surface mailbox for apprt rendering. **The thread component is not functional, just boilerplate, in this PR.** ScreenSearch is a state machine that moves in an iterative `tick` + `feed` fashion. This will let us "interrupt" the search with updates on the search thread (read our mailbox via libxev loops for example) and will let us minimize critical areas with locks (only `feed`). Each component is significantly unit tested, especially around page boundary cases. Given the complexity, there is no way this is perfect, but the architecture is such that we can easily add regression tests as we find issues. ## Other Changes, Notes The only change to actually used code is that tracked pins in a `PageList` can now be flagged as "garbage." A garbage tracked pin is one that had to be moved in a non-sensical way because the previous location it tracked has been deleted. This is used by the searcher to detect that our history was pruned. **If my assumption about the big lock is wrong** and this ends up being godawful for performance, then it should still be okay because more granular locking and reference counting such as that down by @dave-fl in #8850 can be pushed into these components and reused. So this work is still valuable on its own. ## Future This PR is still just a bunch of internals, split out into its own PR so I don't make one huge 10K diff PR. There are a number of future tasks: - Flesh out `ScreenSearch` and hook it up to `Thread` - Pull search thread management into `Surface` (or possibly the render thread or shared render state since active area changes can be synchronized with renderer frame rebuilds. Not sure yet.) - Send updates back to the surface thread so that apprts can update UI. - Apprt actions, input bindings, etc. to hook this all up (the easy part, really). The next step is to continue to flesh out the `ScreenSearch` as required and hook it up to `Thread`. **AI disclosure:** AI reviewed the code and assisted with some tests, but didn't write any of the logic or design. This is beyond its ability (or my ability to spec it out clearly enough for AI to succeed).

mitchellh · 2025-11-27T21:24:49Z

Going to close this. We've shipped search [on macOS] and the internals are shared. We avoided the concurrency. For now. If this ends up being useful I may pluck certain things from it. Thank you so much @dave-fl for looking into this, sorry we didn't use it.

dave-fl requested a review from a team as a code owner September 23, 2025 01:26

mitchellh reviewed Sep 23, 2025

View reviewed changes

00-kat mentioned this pull request Sep 28, 2025

macOS: add native find and all its friends #1414

Closed

This comment was marked as off-topic.

Sign in to view

avarayr mentioned this pull request Oct 28, 2025

Search feature for Ghostty (Mac only) #9382

Closed

mitchellh mentioned this pull request Nov 13, 2025

Considerably more search internals #9585

Merged

mitchellh closed this Nov 27, 2025

Make PageList/Search concurrency-safe for background search (Part 2 of #189) #8850

Make PageList/Search concurrency-safe for background search (Part 2 of #189) #8850

Uh oh!

Conversation

dave-fl commented Sep 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mitchellh left a comment

Choose a reason for hiding this comment

Uh oh!

dave-fl commented Sep 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dave-fl commented Sep 24, 2025

Uh oh!

dave-fl commented Sep 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dave-fl commented Sep 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dave-fl commented Sep 28, 2025

Uh oh!

mitchellh commented Oct 21, 2025

Uh oh!

This comment was marked as off-topic.

mitchellh commented Nov 8, 2025

Uh oh!

dave-fl commented Nov 9, 2025

Context

What changed

Trade-offs

Uh oh!

mitchellh commented Nov 10, 2025

Uh oh!

dave-fl commented Nov 11, 2025

Uh oh!

mitchellh commented Nov 13, 2025

Uh oh!

mitchellh commented Nov 27, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Make PageList/Search concurrency-safe for background search (Part 2 of #189) #8850

Make PageList/Search concurrency-safe for background search (Part 2 of #189) #8850

dave-fl commented Sep 23, 2025 •

edited

Loading

dave-fl commented Sep 23, 2025 •

edited

Loading

dave-fl commented Sep 25, 2025 •

edited

Loading

dave-fl commented Sep 26, 2025 •

edited

Loading