Skip to content

feat(native): Add customSerializedValue routing for binary connector deserialization#27652

Merged
20001020ycx merged 11 commits into
prestodb:masterfrom
y-scope:yscope/feat/custom-serialized-value-routing
Jun 9, 2026
Merged

feat(native): Add customSerializedValue routing for binary connector deserialization#27652
20001020ycx merged 11 commits into
prestodb:masterfrom
y-scope:yscope/feat/custom-serialized-value-routing

Conversation

@20001020ycx

@20001020ycx 20001020ycx commented Apr 23, 2026

Copy link
Copy Markdown
Contributor

Description

Add customSerializedValue routing in from_json() for all 10 connector handle types. When customSerializedValue is present in the incoming JSON, the routing Base64-decodes it and dispatches to ConnectorProtocol::deserialize() instead of the default JSON-to-struct parsing.

Motivation and Context

Enables C++ plugin connectors to use binary serialization codecs instead of JSON, as proposed in prestodb/rfcs#49.

This is the C++ worker-side counterpart to the Java ConnectorCodec / CodecSerializer pipeline (#26257).

Test Plan

  • Build presto_server with the changes
  • End-to-end tested with a customized connector using binary serialization codecs on the Java side, loaded as a native plugin via plugin-dir

Release Notes

== NO RELEASE NOTE ==

Summary by Sourcery

Add support for binary-serialized connector handles via a new customSerializedValue pathway in native protocol deserialization.

New Features:

  • Allow connector handle JSON payloads to specify a customSerializedValue that is Base64-decoded and deserialized by the connector protocol instead of using standard JSON mapping.
  • Introduce binary serialization hooks for ColumnHandle, ConnectorPartitioningHandle, and ConnectorIndexHandle through new serialize/deserialize APIs on ConnectorProtocol and corresponding handle types.

Enhancements:

  • Refine native protocol from_json routing for all connector handle types to guard against use of customSerializedValue with internal handle types.

Routes customSerializedValue in from_json() to ConnectorProtocol::deserialize()
for the 5 handle types used by the CLP connector: ColumnHandle,
ConnectorTableHandle, ConnectorTableLayoutHandle, ConnectorSplit,
ConnectorTransactionHandle.

Also adds serialize/deserialize virtuals to ConnectorProtocol.h for
ColumnHandle, and VELOX_NYI stubs to presto_protocol_core.h.
@linux-foundation-easycla

linux-foundation-easycla Bot commented Apr 23, 2026

Copy link
Copy Markdown

CLA Signed

The committers listed above are authorized under a signed CLA.

@sourcery-ai

sourcery-ai Bot commented Apr 23, 2026

Copy link
Copy Markdown
Contributor

Reviewer's Guide

Adds support for binary-serialized connector handles by routing from_json() through a new customSerializedValue field that base64-decodes payloads and delegates to ConnectorProtocol::deserialize, and introduces serialization hooks and header refactors for ColumnHandle, ConnectorPartitioningHandle, and ConnectorIndexHandle.

Sequence diagram for from_json routing with customSerializedValue

sequenceDiagram
  actor JavaCoordinator
  participant NativeWorker
  participant from_json_Function
  participant ConnectorProtocolRegistry
  participant ConnectorProtocol
  participant ConnectorHandle

  JavaCoordinator->>NativeWorker: Send handle_json(@type, customSerializedValue)
  NativeWorker->>from_json_Function: from_json(handle_json, handle_ptr)

  from_json_Function->>from_json_Function: Extract type from @type
  alt customSerializedValue present
    from_json_Function->>from_json_Function: Base64 decode customSerializedValue to binaryData
    from_json_Function->>ConnectorProtocolRegistry: getConnectorProtocol(type)
    ConnectorProtocolRegistry-->>from_json_Function: protocol
    from_json_Function->>ConnectorProtocol: deserialize(binaryData, handle_ptr)
    ConnectorProtocol-->>from_json_Function: handle_ptr populated
  else customSerializedValue absent
    from_json_Function->>ConnectorProtocolRegistry: getConnectorProtocol(type)
    ConnectorProtocolRegistry-->>from_json_Function: protocol
    from_json_Function->>ConnectorProtocol: from_json(handle_json, handle_ptr)
    ConnectorProtocol-->>from_json_Function: handle_ptr populated
  end

  from_json_Function-->>NativeWorker: deserialized ConnectorHandle
  NativeWorker-->>JavaCoordinator: Continue query execution
Loading

Updated class diagram for ConnectorProtocol and binary-serialization handle types

classDiagram

  class JsonEncodedSubclass

  class ColumnHandle {
    <<abstract>>
    +operator_less(o ColumnHandle) bool
    +static serialize(handle ColumnHandle) string
    +static deserialize(data string, handle std_shared_ptr_ColumnHandle)
  }

  class ConnectorPartitioningHandle {
    +static serialize(handle ConnectorPartitioningHandle) string
    +static deserialize(data string, handle std_shared_ptr_ConnectorPartitioningHandle)
  }

  class ConnectorIndexHandle {
    +static serialize(handle ConnectorIndexHandle) string
    +static deserialize(data string, handle std_shared_ptr_ConnectorIndexHandle)
  }

  class ConnectorProtocol {
    <<interface>>
    +to_json(j json, p std_shared_ptr_ColumnHandle) void
    +from_json(j json, p std_shared_ptr_ColumnHandle) void
    +serialize(proto std_shared_ptr_ColumnHandle, thrift string) void
    +deserialize(thrift string, proto std_shared_ptr_ColumnHandle) void

    +to_json(j json, p std_shared_ptr_ConnectorPartitioningHandle) void
    +from_json(j json, p std_shared_ptr_ConnectorPartitioningHandle) void
    +serialize(proto std_shared_ptr_ConnectorPartitioningHandle, thrift string) void
    +deserialize(thrift string, proto std_shared_ptr_ConnectorPartitioningHandle) void

    +to_json(j json, p std_shared_ptr_ConnectorIndexHandle) void
    +from_json(j json, p std_shared_ptr_ConnectorIndexHandle) void
    +serialize(proto std_shared_ptr_ConnectorIndexHandle, thrift string) void
    +deserialize(thrift string, proto std_shared_ptr_ConnectorIndexHandle) void
  }

  class ConnectorProtocolTemplate {
    +to_json(j json, p std_shared_ptr_ColumnHandle) void
    +from_json(j json, p std_shared_ptr_ColumnHandle) void
    +serialize(proto std_shared_ptr_ColumnHandle, thrift string) void
    +deserialize(thrift string, proto std_shared_ptr_ColumnHandle) void

    +to_json(j json, p std_shared_ptr_ConnectorPartitioningHandle) void
    +from_json(j json, p std_shared_ptr_ConnectorPartitioningHandle) void
    +serialize(proto std_shared_ptr_ConnectorPartitioningHandle, thrift string) void
    +deserialize(thrift string, proto std_shared_ptr_ConnectorPartitioningHandle) void

    +to_json(j json, p std_shared_ptr_ConnectorIndexHandle) void
    +from_json(j json, p std_shared_ptr_ConnectorIndexHandle) void
    +serialize(proto std_shared_ptr_ConnectorIndexHandle, thrift string) void
    +deserialize(thrift string, proto std_shared_ptr_ConnectorIndexHandle) void
  }

  JsonEncodedSubclass <|-- ColumnHandle
  JsonEncodedSubclass <|-- ConnectorPartitioningHandle
  JsonEncodedSubclass <|-- ConnectorIndexHandle

  ConnectorProtocolTemplate ..|> ConnectorProtocol
Loading

File-Level Changes

Change Details Files
Route handle deserialization through customSerializedValue when present, falling back to existing JSON-based handling otherwise.
  • Update from_json overloads for all connector handle types (table, transaction, split, partitioning, output, delete, index, column, table layout, insert) to check for a customSerializedValue field.
  • When customSerializedValue exists, ensure handle type is non-internal, Base64-decode the payload, and call getConnectorProtocol(type).deserialize(...) for the appropriate shared_ptr handle.
  • Preserve existing special cases for internal types like $remote and standard JSON-based from_json parsing when customSerializedValue is absent.
presto-native-execution/presto_cpp/presto_protocol/core/presto_protocol_core.cpp
presto-native-execution/presto_cpp/presto_protocol/core/special/*.cpp.inc
Extend ConnectorProtocol to support binary (non-JSON) serialization for selected handle types and wire this through the template implementation.
  • Add pure virtual serialize/deserialize methods for ColumnHandle, ConnectorPartitioningHandle, and ConnectorIndexHandle in ConnectorProtocol.
  • Implement these methods in ConnectorProtocolTemplate to delegate to serializeTemplate for serialization and static ::deserialize methods on the handle structs for deserialization.
  • Ensure the new methods operate on std::string payloads suitable for use with Base64-encoded customSerializedValue fields.
presto-native-execution/presto_cpp/presto_protocol/core/ConnectorProtocol.h
Introduce serialization-capable handle base structs and reorganize their declarations into dedicated headers.
  • Move ColumnHandle, ConnectorPartitioningHandle, and ConnectorIndexHandle struct declarations out of presto_protocol_core.h into new special/*.hpp.inc headers.
  • Augment these structs with static serialize and deserialize stubs that currently VELOX_NYI, to be implemented by concrete connectors.
  • Keep existing behavior such as ColumnHandle::operator< while preparing these types for binary serialization use across the protocol.
presto-native-execution/presto_cpp/presto_protocol/core/presto_protocol_core.h
presto-native-execution/presto_cpp/presto_protocol/core/special/ColumnHandle.hpp.inc
presto-native-execution/presto_cpp/presto_protocol/core/special/ConnectorPartitioningHandle.hpp.inc
presto-native-execution/presto_cpp/presto_protocol/core/special/ConnectorIndexHandle.hpp.inc

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@20001020ycx 20001020ycx marked this pull request as ready for review April 23, 2026 17:31
@20001020ycx 20001020ycx requested review from a team as code owners April 23, 2026 17:31

@sourcery-ai sourcery-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • The new customSerializedValue handling for various handles appears duplicated between presto_protocol_core.cpp and the corresponding special/*.cpp.inc files (e.g., ConnectorTableHandle, ConnectorPartitioningHandle, ConnectorIndexHandle, ColumnHandle), which increases maintenance cost; consider centralizing this logic in one place per handle type.
  • For ColumnHandle::from_json, the customSerializedValue branch uses j["@type"] without the internal handle guard present elsewhere (!type.empty() && type[0] != '$'), leading to slightly different semantics from the other handle types; consider aligning the checks and error behavior for consistency.
  • In ConnectorProtocol and its template implementation, the binary payload parameters are named thrift, which is a bit misleading now that this path is generic binary serialization; consider renaming these parameters (and related helper names) to reflect their general purpose.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The new `customSerializedValue` handling for various handles appears duplicated between `presto_protocol_core.cpp` and the corresponding `special/*.cpp.inc` files (e.g., `ConnectorTableHandle`, `ConnectorPartitioningHandle`, `ConnectorIndexHandle`, `ColumnHandle`), which increases maintenance cost; consider centralizing this logic in one place per handle type.
- For `ColumnHandle::from_json`, the `customSerializedValue` branch uses `j["@type"]` without the internal handle guard present elsewhere (`!type.empty() && type[0] != '$'`), leading to slightly different semantics from the other handle types; consider aligning the checks and error behavior for consistency.
- In `ConnectorProtocol` and its template implementation, the binary payload parameters are named `thrift`, which is a bit misleading now that this path is generic binary serialization; consider renaming these parameters (and related helper names) to reflect their general purpose.

## Individual Comments

### Comment 1
<location path="presto-native-execution/presto_cpp/presto_protocol/core/ConnectorProtocol.h" line_range="234-241" />
<code_context>
+      std::string& thrift) const final {
+    serializeTemplate<ColumnHandleType>(proto, thrift);
+  }
+  void deserialize(
+      const std::string& thrift,
+      std::shared_ptr<ColumnHandle>& proto) const final {
</code_context>
<issue_to_address>
**issue (bug_risk):** ConnectorPartitioningHandle custom deserialization is also wired to the base type, limiting connector-specific behavior.

`ConnectorProtocolTemplate` currently calls the base `ConnectorPartitioningHandle::deserialize(thrift, proto)`, which is NYI and prevents connector-specific handle types from implementing their own binary deserialization for `customSerializedValue`. To support connector-specific payloads, this should dispatch via the concrete `ConnectorPartitioningHandleType` (or a template helper) instead of the base class static.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@steveburnett

Copy link
Copy Markdown
Contributor

Please sign the Presto CLA as mentioned in this comment.

@20001020ycx 20001020ycx force-pushed the yscope/feat/custom-serialized-value-routing branch from becf9fc to 6f307dc Compare April 24, 2026 15:06
20001020ycx added a commit to y-scope/presto that referenced this pull request Apr 29, 2026
Cherry-picked from upstream PR prestodb#27652 (Tim's approach for Tpch connector).

Adds serialize/deserialize pure virtuals to ConnectorProtocol for all
handle types and customSerializedValue routing in .cpp.inc files.
@majetideepak

Copy link
Copy Markdown
Collaborator

@20001020ycx

Copy link
Copy Markdown
Contributor Author

@20001020ycx is it possible to add some tests here https://github.com/prestodb/presto/tree/master/presto-native-execution/presto_cpp/presto_protocol/tests for this support?

Sure, I will look into it now and post the related change soon.

Add tests verifying that from_json() correctly routes Base64-encoded
customSerializedValue payloads to ConnectorProtocol::deserialize() for
all 10 connector handle types.
…butedProcedureHandle

Add customSerializedValue routing in ConnectorDistributedProcedureHandle.cpp.inc
and add serialize/deserialize virtuals to ConnectorProtocol and
ConnectorProtocolTemplate. Add test coverage for this handle type.
@20001020ycx

20001020ycx commented May 14, 2026

Copy link
Copy Markdown
Contributor Author

Hi @majetideepak, I've added an unit test, CustomSerializedValueRoutingTest.cpp, which instantiates a test connector implementing all handle types supported by ConnectorProtocolTemplate. In this unit test, we verify all types can route customSerializedValue with a simple Base64 string. This simulates the end-to-end binary serialization/deserialization used by our CLP connector and TIm's example on TPCH connector in #26650.

Thank you for your time reviewing this PR — please let us know if you have any further concerns.

@tdcmeehan tdcmeehan self-assigned this May 14, 2026
…edProcedureHandle

Add VELOX_NYI serialize/deserialize stubs to ConnectorDistributedProcedureHandle
so that the customSerializedValue routing added in the previous commit compiles.
Without these stubs, IcebergDistributedProcedureHandle fails to compile when
inheriting the ConnectorProtocol template methods.

@majetideepak majetideepak left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calude has some more points

  - 11x copy-pasted routing block — extract a tryRouteCustomSerializedValue() helper.
  - ConnectorDistributedProcedureHandle::serialize takes non-const ref; the other new stubs take const&. Make consistent.
  - Test uses <arpa/inet.h> for htons/ntohs just for a 2-byte length prefix — non-portable.   use folly::Endian::big?.

…tent with other handle types

Move the customSerializedValue check after getSubclassKey() and add the
VELOX_CHECK guard against $-prefixed internal types, matching the pattern
used by all other handle types.
…ke const&

Align with the other new serialize stubs (ColumnHandle,
ConnectorPartitioningHandle, ConnectorIndexHandle) which all take
const&. The non-const ref was inconsistent and semantically wrong
for a serialize operation.
@20001020ycx

Copy link
Copy Markdown
Contributor Author

Calude has some more points

  - 11x copy-pasted routing block — extract a tryRouteCustomSerializedValue() helper.
  - ConnectorDistributedProcedureHandle::serialize takes non-const ref; the other new stubs take const&. Make consistent.
  - Test uses <arpa/inet.h> for htons/ntohs just for a 2-byte length prefix — non-portable.   use folly::Endian::big?.

Good points! I've applied the 2nd and the 3rd point in the latest push

However, on the first point, I am a bit hesitant on the first point as I am not sure if *.cpp.inc support a common utility cpp.inc for such helper template - tryRouteCustomSerializedValue. Since the original routing was already duplicated, I'd prefer to stay consistent with the existing pattern.

@20001020ycx 20001020ycx force-pushed the yscope/feat/custom-serialized-value-routing branch from 097a555 to 0a7c63a Compare June 5, 2026 02:58
Use folly::Endian::big instead of htons/ntohs for byte-swapping,
which is already a linked dependency and makes the big-endian intent
explicit rather than relying on the network-order abstraction.
@20001020ycx 20001020ycx force-pushed the yscope/feat/custom-serialized-value-routing branch from 0a7c63a to 1843179 Compare June 5, 2026 03:02
…test CMake

FOLLY_INCLUDE_DIRS was empty on CI, making target_include_directories
a no-op. Folly::folly as a CMake imported target automatically provides
both include directories and link libraries, matching how other targets
in the project reference folly.
majetideepak
majetideepak previously approved these changes Jun 5, 2026
…dian.h

folly/Endian.h does not exist in this version of folly — the Endian
class is defined in folly/lang/Bits.h. Also revert Folly::folly back
to ${FOLLY_LIBRARIES} as the include path issue was a red herring.
@20001020ycx 20001020ycx merged commit 6e1942b into prestodb:master Jun 9, 2026
130 of 132 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants