perf(api): avoid intermediate SSE batch allocations #320

Merged: shikhar merged 9 commits into main from codex/perf-sse-batch-serialization on Mar 13, 2026


Conversation

@shikhar (Member) commented Mar 13, 2026

Summary

  • serialize read batches directly from the common backend batch into JSON without materializing the intermediate API ReadBatch/SequencedRecord string tree
  • reuse that borrowed serializer for both SSE batch events and unary JSON read responses
  • keep the existing encode helpers alongside the new hot-path serializer
  • keep the existing wire JSON shape and verify it with a parity test

Why This Helps

Before this change, the JSON read path rebuilt the backend batch into the owned API JSON shape, allocating a new ReadBatch, a new Vec<SequencedRecord>, and fresh Strings for header names, header values, and bodies before serializing that second object graph.

The new path serializes directly from the existing common read batch. That removes the intermediate JSON-shaped allocation layer while preserving the same wire JSON. The optimization now applies to both:

  • SSE batch events
  • unary JSON read responses

The owned API structs and their encode helpers remain available; the hot read paths just no longer depend on them.
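The shape of the change can be sketched with a minimal, std-only Rust example (names like `BackendRecord` and `serialize_records` are illustrative, not the crate's actual API): instead of cloning the backend batch into owned strings and serializing that copy, the JSON is written straight from the borrowed data.

```rust
// Illustrative sketch only: serialize JSON directly from borrowed
// backend data, skipping the owned intermediate struct tree.
struct BackendRecord<'a> {
    seq_num: u64,
    body: &'a [u8], // borrowed from the backend batch
}

fn serialize_records(records: &[BackendRecord]) -> String {
    let mut out = String::from("{\"records\":[");
    for (i, r) in records.iter().enumerate() {
        if i > 0 {
            out.push(',');
        }
        out.push_str("{\"seq_num\":");
        out.push_str(&r.seq_num.to_string());
        out.push_str(",\"body\":\"");
        // Assumes UTF-8 bodies needing no escaping; the real serializer
        // handles escaping and base64 via serde.
        out.push_str(std::str::from_utf8(r.body).unwrap_or(""));
        out.push_str("\"}");
    }
    out.push_str("]}");
    out
}
```

The old path would have built an owned `Vec` of records with fresh `String` bodies first; here the only allocation is the output buffer itself.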

Measured Impact

Fresh local release-mode microbenchmarks, measuring SSE batch event construction only.

Representative points:

  • 100 records x 1 KiB body
    • raw: 104.9us -> 93.3us per batch, 1.12x faster, 11.0% less serializer time
    • base64: 112.4us -> 108.4us per batch, 1.04x faster, 3.6% less serializer time
  • 100 records x 32-byte body
    • raw: 36.7us -> 28.8us per batch, 1.28x faster, 21.7% less serializer time
    • base64: 39.1us -> 33.1us per batch, 1.18x faster, 15.4% less serializer time

Broader sweep over records in batch = {1, 10, 100, 500} and body bytes = {0, 32, 128, 512, 1024, 4096}:

  • raw improved in every measured cell
  • base64 improved in 23/24 cells; the only regression was 1 record x 4096 bytes (4.245us -> 4.416us, 4.0% slower)
  • average improvement by batch size
    • raw: 11.6% at 1 record up to 18.0% at 500 records
    • base64: 9.4% at 1 record up to 12.2% at 500 records
  • average improvement by body size
    • raw: 23.4% at 0 bytes, tapering to 7.0% at 4096 bytes
    • base64: 18.5% at 0 bytes, tapering to 0.5% at 4096 bytes

Interpretation:

  • raw UTF-8-heavy read batches see a meaningful reduction in serialization work
  • base64 batches still benefit, but less, because base64 encoding remains a larger share of the cost
  • the relative win is larger for smaller bodies and larger batches, where the removed per-record/per-field allocation work is a bigger share of total cost
  • these numbers cover batch-event serialization, not end-to-end request latency
[Image attachment: read_batch_sweep_heatmap]

Testing

  • just fmt
  • cargo test -p s2-api --features axum serialized_batch_matches_existing_json_shape
  • cargo test -p s2-lite read
  • just test (still known to fail in an unrelated pre-existing CLI test on this machine: s2-cli::cli missing_access_token, because s2 list-basins succeeds without S2_ACCESS_TOKEN in the current environment)

@shikhar shikhar marked this pull request as draft March 13, 2026 01:51
greptile-apps bot (Contributor) commented Mar 13, 2026

Greptile Summary

This PR eliminates the intermediate JSON-shaped allocation layer on the hot read path by introducing a new json.rs serializer (serialize_read_batch) that serializes directly from the common types::stream::ReadBatch rather than first converting it into the owned API ReadBatch/SequencedRecord tree. The optimization is applied consistently to both SSE batch events and unary JSON read responses. The existing owned structs and their encode helpers are preserved. Correctness is verified by a parity test that compares new and old serialization output for all record types (envelope, empty fence, non-empty fence, trim) across both Raw and Base64 formats.

  • New api/src/v1/stream/json.rs: Implements zero-copy Serialize wrappers (ReadBatchJson, RecordJson, HeaderJson, etc.) plus a stack-allocated chunked Base64Display encoder that avoids per-field heap allocations during serialization.
  • api/src/v1/stream/sse.rs: Replaces the impl ReadEvent constructor block and TryFrom<ReadEvent> conversion with four standalone #[cfg(feature = "axum")]-gated functions. This is a breaking change for consumers of the published s2-api crate (v0.27.9) who used the old unconditionally-public constructors without the axum feature — no version bump accompanies the change.
  • lite/src/handlers/v1/records.rs: Updated to call the new standalone SSE helpers and serialize_read_batch directly, removing the intermediate ReadBatch::encode calls.
  • The serialize_struct("SequencedRecord", 4) hint in json.rs is always the maximum but the actual field count ranges from 2–4; this is harmless for serde_json but could cause issues if the serializer is reused with a stricter backend.
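The chunked Base64 idea mentioned above can be illustrated with a std-only sketch (the PR itself uses `Base64Display` from the `base64` crate; this hand-rolled encoder is only for illustration): each 3-byte group is encoded into a 4-byte stack buffer and appended to the output, so no intermediate heap-allocated base64 `String` is created per field.

```rust
const TABLE: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Append the base64 encoding of `input` to `out`, one 3-byte group at a
/// time, using only a 4-byte stack buffer per group.
fn base64_append(input: &[u8], out: &mut String) {
    for chunk in input.chunks(3) {
        let mut buf = [b'='; 4]; // '=' padding by default
        let b0 = chunk[0];
        let b1 = chunk.get(1).copied().unwrap_or(0);
        let b2 = chunk.get(2).copied().unwrap_or(0);
        buf[0] = TABLE[(b0 >> 2) as usize];
        buf[1] = TABLE[(((b0 & 0x03) << 4) | (b1 >> 4)) as usize];
        if chunk.len() > 1 {
            buf[2] = TABLE[(((b1 & 0x0f) << 2) | (b2 >> 6)) as usize];
        }
        if chunk.len() > 2 {
            buf[3] = TABLE[(b2 & 0x3f) as usize];
        }
        out.push_str(std::str::from_utf8(&buf).unwrap());
    }
}
```

In the serializer this pattern pairs naturally with serde's `collect_str`, which streams a `Display` value into the output without materializing it.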

Confidence Score: 3/5

  • Safe to merge functionally — the optimization is correct and parity-tested — but the breaking API change without a version bump warrants attention before release.
  • The serialization logic is sound, mathematically verified (Base64 buffer sizing), and validated by a parity test covering all record variants. The performance improvement is well-benchmarked. The main concern dragging the score down is a breaking public API change (removal of unconditionally-public ReadEvent constructor methods, now replaced only under #[cfg(feature = "axum")]) in a published crate with no accompanying version bump in api/Cargo.toml.
  • api/src/v1/stream/sse.rs — verify the intent around the API-breaking removal of ReadEvent constructors and ensure api/Cargo.toml version is bumped accordingly.

Important Files Changed

  • api/src/v1/stream/json.rs: New zero-copy JSON serializer. Logic is correct and parity-tested against the old path. Minor: serialize_struct hints 4 fields even when only 2–3 are emitted, which is safe for serde_json but misleading for other serializers.
  • api/src/v1/stream/sse.rs: Replaces the ReadEvent constructor methods (unconditionally public) with axum-gated standalone functions, creating a breaking API change for non-axum consumers of the published s2-api crate with no accompanying version bump.
  • api/src/v1/stream/mod.rs: Trivial one-line addition to expose the new json module publicly.
  • lite/src/handlers/v1/records.rs: Call sites updated to use the new standalone SSE event constructors and the borrowed JSON serializer; semantics are preserved, and the two streaming paths (unary JSON and SSE) are now consistent.

Sequence Diagram

sequenceDiagram
    participant Client
    participant Handler as lite/handlers/v1/records.rs
    participant Backend
    participant JSON as api/v1/stream/json.rs

    Note over Client,JSON: Unary JSON Read (new path)
    Client->>Handler: GET /records (Accept: application/json)
    Handler->>Backend: backend.read(basin, stream, start, end)
    Backend-->>Handler: ReadBatch (types::stream::ReadBatch)
    Handler->>JSON: serialize_read_batch(format, &batch)
    Note over JSON: Serialize directly from common ReadBatch<br/>No intermediate ReadBatch/SequencedRecord allocation
    JSON-->>Handler: impl Serialize (ReadBatchJson)
    Handler-->>Client: JSON response

    Note over Client,JSON: SSE Streaming Read (new path)
    Client->>Handler: GET /records (Accept: text/event-stream)
    Handler->>Backend: backend.read(basin, stream, start, end)
    loop Per batch
        Backend-->>Handler: ReadSessionOutput::Batch(batch)
        Handler->>JSON: read_batch_event(format, &batch, id)
        Note over JSON: json_data(serialize_read_batch(format, &batch))<br/>Borrows existing batch directly
        JSON-->>Handler: SSE Event (batch)
        Handler-->>Client: data: {records:[...], tail:{...}}
    end
    Backend-->>Handler: ReadSessionOutput::Heartbeat
    Handler-->>Client: SSE ping event
    Handler-->>Client: data: [DONE]
This is a comment left during a code review.
Path: api/src/v1/stream/sse.rs
Line: 135-165

Comment:
**Breaking API change drops non-axum-gated constructors**

The removed `impl ReadEvent` block (with `batch`, `error`, `ping`, `done`) carried **no** `#[cfg(feature = "axum")]` gate, so those constructors were available to every consumer of the published `s2-api` crate. All four replacement functions introduced in this PR (`read_batch_event`, `error_event`, `ping_event`, `done_event`) are `#[cfg(feature = "axum")]`-gated. Non-axum users of the crate now have no equivalent replacement — they must switch to direct enum-variant construction.

No version bump appears in `api/Cargo.toml` (still `0.27.9`) to communicate this break. Under semver 0.x the minimum expectation is a minor bump; the missing bump risks silent build failures for downstream consumers who update the dependency.


---

This is a comment left during a code review.
Path: api/src/v1/stream/json.rs
Line: 74

Comment:
**`serialize_struct` length hint is always over-counted**

`serde_json` ignores the `len` argument completely, so this has no impact on JSON output today. However, some binary serializers treat the hint as authoritative and may produce incorrect output or return an error when the actual number of serialized fields is fewer than the hint declares.

The struct serializes between 2 and 4 fields depending on the record variant — for example, an empty `Envelope` with no headers and no body emits only `seq_num` and `timestamp`, while a command record with a body emits all four. Passing the fixed value `4` over-counts in the majority of cases.

Since computing the exact count up-front would require a pre-scan and partially defeat the zero-allocation goal, the pragmatic fix is to document the intentional over-count and restrict use of this serializer to JSON contexts, preventing silent bugs if it is ever reused with a stricter serializer.
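For illustration (the types here are hypothetical, not the crate's actual ones): the serialized field count depends on which optional parts are present, so an exact `len` hint would require inspecting the record before serializing it.

```rust
// Hypothetical record shape: 2 mandatory fields plus up to 2 optional ones.
struct Record<'a> {
    seq_num: u64,
    timestamp: u64,
    headers: &'a [(&'a str, &'a str)],
    body: Option<&'a str>,
}

// Computing the exact count is the pre-scan the comment refers to;
// passing a fixed 4 avoids it at the cost of over-counting.
fn field_count(r: &Record) -> usize {
    2 + usize::from(!r.headers.is_empty()) + usize::from(r.body.is_some())
}
```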


Last reviewed commit: 08f8f53

@shikhar shikhar marked this pull request as ready for review March 13, 2026 15:12
@shikhar shikhar merged commit 121aefb into main Mar 13, 2026
16 checks passed
@shikhar shikhar deleted the codex/perf-sse-batch-serialization branch March 13, 2026 15:55
@s2-release-plz s2-release-plz bot mentioned this pull request Mar 13, 2026
shikhar pushed a commit that referenced this pull request Mar 13, 2026
## 🤖 New release

* `s2-api`: 0.27.9 -> 0.27.10 (✓ API compatible changes)
* `s2-lite`: 0.29.21 -> 0.29.22 (✓ API compatible changes)
* `s2-cli`: 0.29.21 -> 0.29.22

<details><summary><i><b>Changelog</b></i></summary><p>

## `s2-api`

<blockquote>

## [0.27.10] - 2026-03-13

### Performance

- Avoid intermediate SSE batch allocations
([#320](#320))

<!-- generated by git-cliff -->
</blockquote>

## `s2-lite`

<blockquote>

## [0.29.22] - 2026-03-13

### Bug Fixes

- Make delete-on-empty armings prospective
([#314](#314))

### Performance

- Avoid intermediate SSE batch allocations
([#320](#320))

<!-- generated by git-cliff -->
</blockquote>

## `s2-cli`

<blockquote>

## [0.29.22] - 2026-03-13

<!-- generated by git-cliff -->
</blockquote>


</p></details>

---
This PR was generated with
[release-plz](https://github.com/release-plz/release-plz/).

Co-authored-by: s2-release-plz[bot] <262023388+s2-release-plz[bot]@users.noreply.github.com>