perf(api): avoid intermediate SSE batch allocations #320

Merged: shikhar merged 9 commits into main from codex/perf-sse-batch-serialization on Mar 13, 2026


Conversation

@shikhar (Member) commented Mar 13, 2026

Summary

  • serialize read batches directly from the common backend batch into JSON without materializing the intermediate API ReadBatch/SequencedRecord string tree
  • reuse that borrowed serializer for both SSE batch events and unary JSON read responses
  • keep the existing encode helpers alongside the new hot-path serializer
  • keep the existing wire JSON shape and verify it with a parity test

Why This Helps

Before this change, the JSON read path rebuilt the backend batch into the owned API JSON shape, allocating a new ReadBatch, a new Vec<SequencedRecord>, and fresh Strings for header names, header values, and bodies before serializing that second object graph.

The new path serializes directly from the existing common read batch. That removes the intermediate JSON-shaped allocation layer while preserving the same wire JSON. The optimization now applies to both:

  • SSE batch events
  • unary JSON read responses

The owned API structs and their encode helpers remain available; the hot read paths just no longer depend on them.
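The shape of the change can be sketched with a minimal, std-only Rust example (names like `BackendRecord` and `serialize_records` are illustrative, not the crate's actual API): instead of cloning the backend batch into owned strings and serializing that copy, the JSON is written straight from the borrowed data.

```rust
// Illustrative sketch only: serialize JSON directly from borrowed
// backend data, skipping the owned intermediate struct tree.
struct BackendRecord<'a> {
    seq_num: u64,
    body: &'a [u8], // borrowed from the backend batch
}

fn serialize_records(records: &[BackendRecord]) -> String {
    let mut out = String::from("{\"records\":[");
    for (i, r) in records.iter().enumerate() {
        if i > 0 {
            out.push(',');
        }
        out.push_str("{\"seq_num\":");
        out.push_str(&r.seq_num.to_string());
        out.push_str(",\"body\":\"");
        // Assumes UTF-8 bodies needing no escaping; the real serializer
        // handles escaping and base64 via serde.
        out.push_str(std::str::from_utf8(r.body).unwrap_or(""));
        out.push_str("\"}");
    }
    out.push_str("]}");
    out
}
```

The old path would have built an owned `Vec` of records with fresh `String` bodies first; here the only allocation is the output buffer itself.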

Measured Impact

Fresh local release-mode microbenchmarks, measuring SSE batch event construction only.

Representative points:

  • 100 records x 1 KiB body
    • raw: 104.9us -> 93.3us per batch, 1.12x faster, 11.0% less serializer time
    • base64: 112.4us -> 108.4us per batch, 1.04x faster, 3.6% less serializer time
  • 100 records x 32-byte body
    • raw: 36.7us -> 28.8us per batch, 1.28x faster, 21.7% less serializer time
    • base64: 39.1us -> 33.1us per batch, 1.18x faster, 15.4% less serializer time

Broader sweep over records in batch = {1, 10, 100, 500} and body bytes = {0, 32, 128, 512, 1024, 4096}:

  • raw improved in every measured cell
  • base64 improved in 23/24 cells; the only regression was 1 record x 4096 bytes (4.245us -> 4.416us, 4.0% slower)
  • average improvement by batch size
    • raw: 11.6% at 1 record up to 18.0% at 500 records
    • base64: 9.4% at 1 record up to 12.2% at 500 records
  • average improvement by body size
    • raw: 23.4% at 0 bytes, tapering to 7.0% at 4096 bytes
    • base64: 18.5% at 0 bytes, tapering to 0.5% at 4096 bytes

Interpretation:

  • raw UTF-8-heavy read batches see a meaningful reduction in serialization work
  • base64 batches still benefit, but less, because base64 encoding remains a larger share of the cost
  • the relative win is larger for smaller bodies and larger batches, where the removed per-record/per-field allocation work is a bigger share of total cost
  • these numbers cover batch-event serialization, not end-to-end request latency
[Image attachment: read_batch_sweep_heatmap]

Testing

  • just fmt
  • cargo test -p s2-api --features axum serialized_batch_matches_existing_json_shape
  • cargo test -p s2-lite read
  • just test (still known to fail in an unrelated pre-existing CLI test on this machine: s2-cli::cli missing_access_token, because s2 list-basins succeeds without S2_ACCESS_TOKEN in the current environment)

@shikhar shikhar marked this pull request as draft March 13, 2026 01:51
greptile-apps bot (Contributor) commented Mar 13, 2026

Greptile Summary

This PR eliminates the intermediate JSON-shaped allocation layer on the hot read path by introducing a new json.rs serializer (serialize_read_batch) that serializes directly from the common types::stream::ReadBatch rather than first converting it into the owned API ReadBatch/SequencedRecord tree. The optimization is applied consistently to both SSE batch events and unary JSON read responses. The existing owned structs and their encode helpers are preserved. Correctness is verified by a parity test that compares new and old serialization output for all record types (envelope, empty fence, non-empty fence, trim) across both Raw and Base64 formats.

  • New api/src/v1/stream/json.rs: Implements zero-copy Serialize wrappers (ReadBatchJson, RecordJson, HeaderJson, etc.) plus a stack-allocated chunked Base64Display encoder that avoids per-field heap allocations during serialization.
  • api/src/v1/stream/sse.rs: Replaces the impl ReadEvent constructor block and TryFrom<ReadEvent> conversion with four standalone #[cfg(feature = "axum")]-gated functions. This is a breaking change for consumers of the published s2-api crate (v0.27.9) who used the old unconditionally-public constructors without the axum feature — no version bump accompanies the change.
  • lite/src/handlers/v1/records.rs: Updated to call the new standalone SSE helpers and serialize_read_batch directly, removing the intermediate ReadBatch::encode calls.
  • The serialize_struct("SequencedRecord", 4) hint in json.rs is always the maximum but the actual field count ranges from 2–4; this is harmless for serde_json but could cause issues if the serializer is reused with a stricter backend.
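The chunked Base64 idea mentioned above can be illustrated with a std-only sketch (the PR itself uses `Base64Display` from the `base64` crate; this hand-rolled encoder is only for illustration): each 3-byte group is encoded into a 4-byte stack buffer and appended to the output, so no intermediate heap-allocated base64 `String` is created per field.

```rust
const TABLE: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Append the base64 encoding of `input` to `out`, one 3-byte group at a
/// time, using only a 4-byte stack buffer per group.
fn base64_append(input: &[u8], out: &mut String) {
    for chunk in input.chunks(3) {
        let mut buf = [b'='; 4]; // '=' padding by default
        let b0 = chunk[0];
        let b1 = chunk.get(1).copied().unwrap_or(0);
        let b2 = chunk.get(2).copied().unwrap_or(0);
        buf[0] = TABLE[(b0 >> 2) as usize];
        buf[1] = TABLE[(((b0 & 0x03) << 4) | (b1 >> 4)) as usize];
        if chunk.len() > 1 {
            buf[2] = TABLE[(((b1 & 0x0f) << 2) | (b2 >> 6)) as usize];
        }
        if chunk.len() > 2 {
            buf[3] = TABLE[(b2 & 0x3f) as usize];
        }
        out.push_str(std::str::from_utf8(&buf).unwrap());
    }
}
```

In the serializer this pattern pairs naturally with serde's `collect_str`, which streams a `Display` value into the output without materializing it.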

Confidence Score: 3/5

  • Safe to merge functionally — the optimization is correct and parity-tested — but the breaking API change without a version bump warrants attention before release.
  • The serialization logic is sound, mathematically verified (Base64 buffer sizing), and validated by a parity test covering all record variants. The performance improvement is well-benchmarked. The main concern dragging the score down is a breaking public API change (removal of unconditionally-public ReadEvent constructor methods, now replaced only under #[cfg(feature = "axum")]) in a published crate with no accompanying version bump in api/Cargo.toml.
  • api/src/v1/stream/sse.rs — verify the intent around the API-breaking removal of ReadEvent constructors and ensure api/Cargo.toml version is bumped accordingly.

Important Files Changed

  • api/src/v1/stream/json.rs: New zero-copy JSON serializer. Logic is correct and parity-tested against the old path. Minor: serialize_struct hints 4 fields even when only 2–3 are emitted, which is safe for serde_json but misleading for other serializers.
  • api/src/v1/stream/sse.rs: Replaces the ReadEvent constructor methods (unconditionally public) with axum-gated standalone functions, creating a breaking API change for non-axum consumers of the published s2-api crate with no accompanying version bump.
  • api/src/v1/stream/mod.rs: Trivial one-line addition to expose the new json module publicly.
  • lite/src/handlers/v1/records.rs: Call sites updated to use the new standalone SSE event constructors and the borrowed JSON serializer; semantics are preserved, and the two streaming paths (unary JSON and SSE) are now consistent.

Sequence Diagram

sequenceDiagram
    participant Client
    participant Handler as lite/handlers/v1/records.rs
    participant Backend
    participant JSON as api/v1/stream/json.rs

    Note over Client,JSON: Unary JSON Read (new path)
    Client->>Handler: GET /records (Accept: application/json)
    Handler->>Backend: backend.read(basin, stream, start, end)
    Backend-->>Handler: ReadBatch (types::stream::ReadBatch)
    Handler->>JSON: serialize_read_batch(format, &batch)
    Note over JSON: Serialize directly from common ReadBatch<br/>No intermediate ReadBatch/SequencedRecord allocation
    JSON-->>Handler: impl Serialize (ReadBatchJson)
    Handler-->>Client: JSON response

    Note over Client,JSON: SSE Streaming Read (new path)
    Client->>Handler: GET /records (Accept: text/event-stream)
    Handler->>Backend: backend.read(basin, stream, start, end)
    loop Per batch
        Backend-->>Handler: ReadSessionOutput::Batch(batch)
        Handler->>JSON: read_batch_event(format, &batch, id)
        Note over JSON: json_data(serialize_read_batch(format, &batch))<br/>Borrows existing batch directly
        JSON-->>Handler: SSE Event (batch)
        Handler-->>Client: data: {records:[...], tail:{...}}
    end
    Backend-->>Handler: ReadSessionOutput::Heartbeat
    Handler-->>Client: SSE ping event
    Handler-->>Client: data: [DONE]
This is a comment left during a code review.
Path: api/src/v1/stream/sse.rs
Line: 135-165

Comment:
**Breaking API change drops non-axum-gated constructors**

The removed `impl ReadEvent` block (with `batch`, `error`, `ping`, `done`) carried **no** `#[cfg(feature = "axum")]` gate, so those constructors were available to every consumer of the published `s2-api` crate. All four replacement functions introduced in this PR (`read_batch_event`, `error_event`, `ping_event`, `done_event`) are `#[cfg(feature = "axum")]`-gated. Non-axum users of the crate now have no equivalent replacement — they must switch to direct enum-variant construction.

No version bump appears in `api/Cargo.toml` (still `0.27.9`) to communicate this break. Under semver 0.x the minimum expectation is a minor bump; the missing bump risks silent build failures for downstream consumers who update the dependency.


---

This is a comment left during a code review.
Path: api/src/v1/stream/json.rs
Line: 74

Comment:
**`serialize_struct` length hint is always over-counted**

`serde_json` ignores the `len` argument completely, so this has no impact on JSON output today. However, some binary serializers treat the hint as authoritative and may produce incorrect output or return an error when the actual number of serialized fields is fewer than the hint declares.

The struct serializes between 2 and 4 fields depending on the record variant — for example, an empty `Envelope` with no headers and no body emits only `seq_num` and `timestamp`, while a command record with a body emits all four. Passing the fixed value `4` over-counts in the majority of cases.

Since computing the exact count up-front would require a pre-scan and partially defeat the zero-allocation goal, the pragmatic fix is to document the intentional over-count and restrict use of this serializer to JSON contexts, preventing silent bugs if it is ever reused with a stricter serializer.
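For illustration (the types here are hypothetical, not the crate's actual ones): the serialized field count depends on which optional parts are present, so an exact `len` hint would require inspecting the record before serializing it.

```rust
// Hypothetical record shape: 2 mandatory fields plus up to 2 optional ones.
struct Record<'a> {
    seq_num: u64,
    timestamp: u64,
    headers: &'a [(&'a str, &'a str)],
    body: Option<&'a str>,
}

// Computing the exact count is the pre-scan the comment refers to;
// passing a fixed 4 avoids it at the cost of over-counting.
fn field_count(r: &Record) -> usize {
    2 + usize::from(!r.headers.is_empty()) + usize::from(r.body.is_some())
}
```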


Last reviewed commit: 08f8f53

@shikhar shikhar marked this pull request as ready for review March 13, 2026 15:12
@shikhar shikhar merged commit 121aefb into main Mar 13, 2026
16 checks passed
@shikhar shikhar deleted the codex/perf-sse-batch-serialization branch March 13, 2026 15:55
@s2-release-plz s2-release-plz bot mentioned this pull request Mar 13, 2026
shikhar pushed a commit that referenced this pull request Mar 13, 2026
## 🤖 New release

* `s2-api`: 0.27.9 -> 0.27.10 (✓ API compatible changes)
* `s2-lite`: 0.29.21 -> 0.29.22 (✓ API compatible changes)
* `s2-cli`: 0.29.21 -> 0.29.22

<details><summary><i><b>Changelog</b></i></summary><p>

## `s2-api`

<blockquote>

## [0.27.10] - 2026-03-13

### Performance

- Avoid intermediate SSE batch allocations
([#320](#320))

<!-- generated by git-cliff -->
</blockquote>

## `s2-lite`

<blockquote>

## [0.29.22] - 2026-03-13

### Bug Fixes

- Make delete-on-empty armings prospective
([#314](#314))

### Performance

- Avoid intermediate SSE batch allocations
([#320](#320))

<!-- generated by git-cliff -->
</blockquote>

## `s2-cli`

<blockquote>

## [0.29.22] - 2026-03-13

<!-- generated by git-cliff -->
</blockquote>


</p></details>

---
This PR was generated with
[release-plz](https://github.com/release-plz/release-plz/).

Co-authored-by: s2-release-plz[bot] <262023388+s2-release-plz[bot]@users.noreply.github.com>