Naive search internals (core only) #2885

mitchellh · 2024-12-04T20:47:21Z

Towards #189

This PR implements a simple, single-threaded, single-direction Boyer-Moore-Horspool search on top of PageList.
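For readers unfamiliar with the algorithm, here is a minimal Python sketch of Boyer-Moore-Horspool over a flat byte string. It is illustrative only: the PR's code operates on PageList pages rather than a flat buffer, and the function name `horspool_find_all` is hypothetical.

```python
def horspool_find_all(haystack: bytes, needle: bytes) -> list[int]:
    """Return the start offset of every occurrence of needle in haystack."""
    n, m = len(haystack), len(needle)
    if m == 0 or m > n:
        return []

    # Bad-character table: for every byte of the needle except the last,
    # the distance from its last occurrence to the end of the needle.
    # Bytes not in the table allow a full shift of m.
    shift = {b: m - 1 - i for i, b in enumerate(needle[:-1])}

    matches = []
    pos = 0
    while pos <= n - m:
        if haystack[pos:pos + m] == needle:
            matches.append(pos)
        # Shift according to the haystack byte under the window's last slot.
        pos += shift.get(haystack[pos + m - 1], m)
    return matches


print(horspool_find_all(b"xxabcabc", b"abc"))  # [2, 5]
```

The table is built once per search term, so the per-window cost is one comparison plus one table lookup, and long needles with rare trailing bytes skip most of the haystack.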

The code in this PR is not used in builds yet. It is only exercised by unit tests. I want to open PRs incrementally so we don't have one mega PR for search. I want to be very clear that this does not finish #189; there are many more steps before that.

High-level technical overview: we maintain a circular buffer of UTF-8 encoded terminal page contents that is just large enough to fit the larger of a single page or the search term. As we search, the circular buffer is pruned and additional terminal pages are added to search. This keeps memory usage to the minimum necessary for this approach. In practice, the overhead is very small for realistic search terms!
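The prune-and-append idea can be sketched in a few lines of Python. This is a simplification under stated assumptions: a `bytearray` stands in for the circular buffer, `bytearray.find` stands in for the Horspool search, pages are plain byte strings, and `search_pages` is a hypothetical name. It reports byte offsets only and does not map matches back to terminal cells.

```python
from typing import Iterable, Iterator


def search_pages(pages: Iterable[bytes], needle: bytes) -> Iterator[int]:
    """Yield absolute byte offsets of needle across a stream of pages."""
    if not needle:
        return
    window = bytearray()  # stand-in for the circular buffer
    base = 0              # absolute offset of window[0]
    for page in pages:
        window += page
        start = 0
        while (i := window.find(needle, start)) != -1:
            yield base + i
            start = i + 1
        # Prune: only the last len(needle) - 1 bytes can still be the
        # beginning of a match that completes in a later page.
        keep = len(needle) - 1
        drop = max(0, len(window) - keep)
        del window[:drop]
        base += drop


# "world" straddles the page boundaries but is still found.
print(list(search_pages([b"hello wo", b"rld, wor", b"ld"], b"world")))  # [6, 13]
```

Because everything before the retained tail is discarded after each page, the window never holds more than one page plus `len(needle) - 1` bytes, which is the memory bound the overview describes.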

Some limitations that need to be resolved in future work:

- Single-direction: We only search oldest-to-newest. We'll want to search newest-to-oldest by default. It shouldn't be hard to implement directionality but I wanted to wait until I needed it.
- Single-threaded: Search only works as long as the PageList is not being actively modified. That's not going to work for a real final user-facing solution.

mitchellh · 2024-12-04T21:48:08Z

Going to merge this. This isn't in use at all yet, so there's no potential harm in merging. I'll still be notified and will read any feedback if anyone has any. Then I'll plan follow-ups to keep working on this...


Chugging along towards #189.

This adds significantly more internal work for searching. A long time ago, I added #2885, which had a hint of what I was thinking of. This simultaneously builds on that and changes direction.

The change of direction is that instead of making PageList fully concurrency safe and having a search thread access it concurrently, I'm now making an architectural shift where our search thread will grab the big lock (blocking all IO/rendering), but with the bet that we can make our critical areas small enough and time them well enough that the performance hit while actively searching will be minimal.

**Results yet to be seen, but the path to implement this is much, much simpler.**

## Rearchitecting Search

To that end, this PR builds on #2885 by making `src/terminal/search` an entire package (rather than a single file).

```mermaid
graph TB
    subgraph Layer5 ["Layer 5: Thread Orchestration"]
        Thread["Thread<br/>━━━━━━━━━━━━━━━━━━━━━<br/>• MPSC queue management<br/>• libxev event loop<br/>• Message handling<br/>• Surface mailbox communication<br/>• Forward progress coordination"]
    end

    subgraph Layer4 ["Layer 4: Screen Coordination"]
        ScreenSearch["ScreenSearch<br/>━━━━━━━━━━━━━━━━━━━━━<br/>• State machine (tick + feed)<br/>• Result caching<br/>• Per-screen (alt/primary)<br/>• Composes Active + History search<br/>• Interrupt handling"]
    end

    subgraph Layer3 ["Layer 3: Domain-Specific Search"]
        ActiveSearch["ActiveSearch<br/>━━━━━━━━━━━━━━━━━━━━━<br/>• Active area only<br/>• Invalidate & re-search<br/>• Small, volatile data"]
        PageListSearch["PageListSearch<br/>━━━━━━━━━━━━━━━━━━━━━<br/>• History search (reverse order)<br/>• Separated tick/feed ops<br/>• Immutable PageList assumption<br/>• Garbage pin detection"]
    end

    subgraph Layer2 ["Layer 2: Primitive Operations"]
        SlidingWindow["SlidingWindow<br/>━━━━━━━━━━━━━━━━━━━━━<br/>• Manual linked list node management<br/>• Circular buffer maintenance<br/>• Zero-allocation search<br/>• Match yielding<br/>• Page boundary handling"]
    end

    Thread --> ScreenSearch
    ScreenSearch --> ActiveSearch
    ScreenSearch --> PageListSearch
    ActiveSearch --> SlidingWindow
    PageListSearch --> SlidingWindow

    classDef layer5 fill:#0a0a0a,stroke:#ff0066,stroke-width:3px,color:#ffffff
    classDef layer4 fill:#0f0f0f,stroke:#ff6600,stroke-width:3px,color:#ffffff
    classDef layer3 fill:#141414,stroke:#ffaa00,stroke-width:3px,color:#ffffff
    classDef layer2 fill:#1a1a1a,stroke:#00ff00,stroke-width:3px,color:#ffffff

    class Thread layer5
    class ScreenSearch layer4
    class ActiveSearch,PageListSearch layer3
    class SlidingWindow layer2

    style Layer5 fill:#050505,stroke:#ff0066,stroke-width:2px,color:#ffffff
    style Layer4 fill:#080808,stroke:#ff6600,stroke-width:2px,color:#ffffff
    style Layer3 fill:#0c0c0c,stroke:#ffaa00,stroke-width:2px,color:#ffffff
    style Layer2 fill:#101010,stroke:#00ff00,stroke-width:2px,color:#ffffff
```

Within the package, we have composable layers that let us test each point:

- `SlidingWindow`: The lowest layer. The caller manually adds linked list page nodes and it maintains a sliding window we search over, yielding results without allocation (besides the circular buffers to maintain the sliding window).
- `PageListSearch`: Searches a PageList structure in reverse order (assumption: more recent matches are more valuable than older), but separates out the `tick` (search, but no PageList access) and `feed` (PageList access, prep data for search but don't search) operations. This lets us `feed` in a critical area and `tick` outside. **This assumes an immutable PageList, so this is for history.**
- `ActiveSearch`: Searches only the active area of a PageList. The expectation is that the active area changes much more regularly, but it is also very small (relative to scrollback). Throws away and re-searches the active area as necessary.
- `ScreenSearch`: Composes the previous three components to coordinate searching an active terminal screen. You'd have one of these per screen (alt vs primary). This also caches results unlike the other components, with the expectation that the caller will revisit the results as screens change (so if you switch from neovim back to your shell and vice versa with a search active, it won't start over).
- `Thread`: A dedicated search thread that will receive messages via MPSC queues while managing the forward progress of a `ScreenSearch` and sending matches back to the surface mailbox for apprt rendering. **The thread component is not functional, just boilerplate, in this PR.**

ScreenSearch is a state machine that moves in an iterative `tick` + `feed` fashion. This will let us "interrupt" the search with updates on the search thread (read our mailbox via libxev loops, for example) and will let us minimize critical areas with locks (only `feed`).

Each component is significantly unit tested, especially around page boundary cases. Given the complexity, there is no way this is perfect, but the architecture is such that we can easily add regression tests as we find issues.

## Other Changes, Notes

The only change to actually used code is that tracked pins in a `PageList` can now be flagged as "garbage." A garbage tracked pin is one that had to be moved in a nonsensical way because the previous location it tracked has been deleted. This is used by the searcher to detect that our history was pruned.

**If my assumption about the big lock is wrong** and this ends up being godawful for performance, then it should still be okay because more granular locking and reference counting such as that done by @dave-fl in #8850 can be pushed into these components and reused. So this work is still valuable on its own.

## Future

This PR is still just a bunch of internals, split out into its own PR so I don't make one huge 10K diff PR. There are a number of future tasks:

- Flesh out `ScreenSearch` and hook it up to `Thread`
- Pull search thread management into `Surface` (or possibly the render thread or shared render state, since active area changes can be synchronized with renderer frame rebuilds. Not sure yet.)
- Send updates back to the surface thread so that apprts can update UI.
- Apprt actions, input bindings, etc. to hook this all up (the easy part, really).

The next step is to continue to flesh out the `ScreenSearch` as required and hook it up to `Thread`.

**AI disclosure:** AI reviewed the code and assisted with some tests, but didn't write any of the logic or design. This is beyond its ability (or my ability to spec it out clearly enough for AI to succeed).
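To make the `tick` + `feed` split described above concrete, here is a toy Python sketch of the pattern: `feed` copies a small batch of data while the lock is held, and `tick` searches only the copied data with the lock released. Everything here is an assumption for illustration (the class `ToyHistorySearch`, the `run_search` driver, the list of strings standing in for a PageList, the batch size); it is not the API of the components in `src/terminal/search`.

```python
import threading


class ToyHistorySearch:
    """feed() touches shared data (lock required); tick() never does."""

    def __init__(self, needle: str) -> None:
        self.needle = needle
        self.cursor = None   # index of the next line to copy; None = not started
        self.pending = []    # (index, text) pairs copied out but not yet searched
        self.matches = []    # indices of matching lines, newest first

    def feed(self, lines: list[str], batch: int = 2) -> bool:
        """Copy up to `batch` lines, newest first. Caller holds the lock.

        Returns False once there is nothing left to copy.
        """
        if self.cursor is None:
            self.cursor = len(lines) - 1
        if self.cursor < 0:
            return False
        stop = max(self.cursor - batch, -1)
        for i in range(self.cursor, stop, -1):
            self.pending.append((i, lines[i]))
        self.cursor = stop
        return True

    def tick(self) -> None:
        """Search the copied data only; safe to run without the lock."""
        for i, text in self.pending:
            if self.needle in text:
                self.matches.append(i)
        self.pending.clear()


def run_search(lines: list[str], lock: threading.Lock, needle: str) -> list[int]:
    search = ToyHistorySearch(needle)
    while True:
        with lock:            # critical area: only the copy happens here
            more = search.feed(lines)
        if not more:
            break
        search.tick()         # the expensive part runs unlocked
    return search.matches


history = ["foo", "bar needle", "baz", "needle again", "end"]
print(run_search(history, threading.Lock(), "needle"))  # [3, 1]
```

Because the loop returns to the caller between every `feed` and `tick`, a real driver could check a mailbox or abandon the search at each iteration, which is the "interrupt" property the description mentions. The sketch assumes the shared data does not change between `feed` calls, matching the immutable-history assumption stated for `PageListSearch`.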

mitchellh added 18 commits December 3, 2024 15:53

terminal: move UTF8 encoding to Page and wrap around it

bcefbfd

terminal: support cell_map for encodeUtf8

204e4f8

terminal: single pagelist node search

61c5fb8

datastruct: CircBuf can be initialized empty

eaddb69

CircBuf: add ensureUnusedCapacity, appendSlice

8abbd80

terminal: working on a pagelist sliding window for search

2a13c6b

terminal: sliding window search starts working

6ed298c

terminal: sliding window search can move the cursor

d307b02

terminal: search across two pages and pruning appears to be working

b487aa8

terminal: remove unused pagesearch

09e4ccc

terminal: test no match pruning

79026a1

terminal: search match across page boundary

af1ee4d

terminal: test for match in second slice of circ buf

852e04f

terminal: search match on overlap case

34fb840

terminal: update comments/docs on sliding window search

6361bf4

terminal: PageListSearch works!

b9dda6a

comments

50b36c5

remove unused file

7dd8e7c

mitchellh merged commit 711f947 into main Dec 4, 2024
34 checks passed

mitchellh deleted the search branch December 4, 2024 21:48

mitchellh mentioned this pull request Dec 4, 2024

GTK: Search scrollback #189

Closed

8 tasks

mikenomitch mentioned this pull request May 22, 2025

Search scrollback mikenomitch/ghostty#1

Open

mitchellh mentioned this pull request Nov 13, 2025

Considerably more search internals #9585

Merged
