feat(native): Add customSerializedValue routing for binary connector deserialization#27652
Conversation
Routes customSerializedValue in from_json() to ConnectorProtocol::deserialize() for the 5 handle types used by the CLP connector: ColumnHandle, ConnectorTableHandle, ConnectorTableLayoutHandle, ConnectorSplit, ConnectorTransactionHandle. Also adds serialize/deserialize virtuals to ConnectorProtocol.h for ColumnHandle, and VELOX_NYI stubs to presto_protocol_core.h.
Reviewer's GuideAdds support for binary-serialized connector handles by routing from_json() through a new customSerializedValue field that base64-decodes payloads and delegates to ConnectorProtocol::deserialize, and introduces serialization hooks and header refactors for ColumnHandle, ConnectorPartitioningHandle, and ConnectorIndexHandle. Sequence diagram for from_json routing with customSerializedValuesequenceDiagram
actor JavaCoordinator
participant NativeWorker
participant from_json_Function
participant ConnectorProtocolRegistry
participant ConnectorProtocol
participant ConnectorHandle
JavaCoordinator->>NativeWorker: Send handle_json(@type, customSerializedValue)
NativeWorker->>from_json_Function: from_json(handle_json, handle_ptr)
from_json_Function->>from_json_Function: Extract type from @type
alt customSerializedValue present
from_json_Function->>from_json_Function: Base64 decode customSerializedValue to binaryData
from_json_Function->>ConnectorProtocolRegistry: getConnectorProtocol(type)
ConnectorProtocolRegistry-->>from_json_Function: protocol
from_json_Function->>ConnectorProtocol: deserialize(binaryData, handle_ptr)
ConnectorProtocol-->>from_json_Function: handle_ptr populated
else customSerializedValue absent
from_json_Function->>ConnectorProtocolRegistry: getConnectorProtocol(type)
ConnectorProtocolRegistry-->>from_json_Function: protocol
from_json_Function->>ConnectorProtocol: from_json(handle_json, handle_ptr)
ConnectorProtocol-->>from_json_Function: handle_ptr populated
end
from_json_Function-->>NativeWorker: deserialized ConnectorHandle
NativeWorker-->>JavaCoordinator: Continue query execution
Updated class diagram for ConnectorProtocol and binary-serialization handle typesclassDiagram
class JsonEncodedSubclass
class ColumnHandle {
<<abstract>>
+operator_less(o ColumnHandle) bool
+static serialize(handle ColumnHandle) string
+static deserialize(data string, handle std_shared_ptr_ColumnHandle)
}
class ConnectorPartitioningHandle {
+static serialize(handle ConnectorPartitioningHandle) string
+static deserialize(data string, handle std_shared_ptr_ConnectorPartitioningHandle)
}
class ConnectorIndexHandle {
+static serialize(handle ConnectorIndexHandle) string
+static deserialize(data string, handle std_shared_ptr_ConnectorIndexHandle)
}
class ConnectorProtocol {
<<interface>>
+to_json(j json, p std_shared_ptr_ColumnHandle) void
+from_json(j json, p std_shared_ptr_ColumnHandle) void
+serialize(proto std_shared_ptr_ColumnHandle, thrift string) void
+deserialize(thrift string, proto std_shared_ptr_ColumnHandle) void
+to_json(j json, p std_shared_ptr_ConnectorPartitioningHandle) void
+from_json(j json, p std_shared_ptr_ConnectorPartitioningHandle) void
+serialize(proto std_shared_ptr_ConnectorPartitioningHandle, thrift string) void
+deserialize(thrift string, proto std_shared_ptr_ConnectorPartitioningHandle) void
+to_json(j json, p std_shared_ptr_ConnectorIndexHandle) void
+from_json(j json, p std_shared_ptr_ConnectorIndexHandle) void
+serialize(proto std_shared_ptr_ConnectorIndexHandle, thrift string) void
+deserialize(thrift string, proto std_shared_ptr_ConnectorIndexHandle) void
}
class ConnectorProtocolTemplate {
+to_json(j json, p std_shared_ptr_ColumnHandle) void
+from_json(j json, p std_shared_ptr_ColumnHandle) void
+serialize(proto std_shared_ptr_ColumnHandle, thrift string) void
+deserialize(thrift string, proto std_shared_ptr_ColumnHandle) void
+to_json(j json, p std_shared_ptr_ConnectorPartitioningHandle) void
+from_json(j json, p std_shared_ptr_ConnectorPartitioningHandle) void
+serialize(proto std_shared_ptr_ConnectorPartitioningHandle, thrift string) void
+deserialize(thrift string, proto std_shared_ptr_ConnectorPartitioningHandle) void
+to_json(j json, p std_shared_ptr_ConnectorIndexHandle) void
+from_json(j json, p std_shared_ptr_ConnectorIndexHandle) void
+serialize(proto std_shared_ptr_ConnectorIndexHandle, thrift string) void
+deserialize(thrift string, proto std_shared_ptr_ConnectorIndexHandle) void
}
JsonEncodedSubclass <|-- ColumnHandle
JsonEncodedSubclass <|-- ConnectorPartitioningHandle
JsonEncodedSubclass <|-- ConnectorIndexHandle
ConnectorProtocolTemplate ..|> ConnectorProtocol
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- The new
customSerializedValuehandling for various handles appears duplicated betweenpresto_protocol_core.cppand the correspondingspecial/*.cpp.incfiles (e.g.,ConnectorTableHandle,ConnectorPartitioningHandle,ConnectorIndexHandle,ColumnHandle), which increases maintenance cost; consider centralizing this logic in one place per handle type. - For
ColumnHandle::from_json, thecustomSerializedValuebranch usesj["@type"]without the internal handle guard present elsewhere (!type.empty() && type[0] != '$'), leading to slightly different semantics from the other handle types; consider aligning the checks and error behavior for consistency. - In
ConnectorProtocoland its template implementation, the binary payload parameters are namedthrift, which is a bit misleading now that this path is generic binary serialization; consider renaming these parameters (and related helper names) to reflect their general purpose.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The new `customSerializedValue` handling for various handles appears duplicated between `presto_protocol_core.cpp` and the corresponding `special/*.cpp.inc` files (e.g., `ConnectorTableHandle`, `ConnectorPartitioningHandle`, `ConnectorIndexHandle`, `ColumnHandle`), which increases maintenance cost; consider centralizing this logic in one place per handle type.
- For `ColumnHandle::from_json`, the `customSerializedValue` branch uses `j["@type"]` without the internal handle guard present elsewhere (`!type.empty() && type[0] != '$'`), leading to slightly different semantics from the other handle types; consider aligning the checks and error behavior for consistency.
- In `ConnectorProtocol` and its template implementation, the binary payload parameters are named `thrift`, which is a bit misleading now that this path is generic binary serialization; consider renaming these parameters (and related helper names) to reflect their general purpose.
## Individual Comments
### Comment 1
<location path="presto-native-execution/presto_cpp/presto_protocol/core/ConnectorProtocol.h" line_range="234-241" />
<code_context>
+ std::string& thrift) const final {
+ serializeTemplate<ColumnHandleType>(proto, thrift);
+ }
+ void deserialize(
+ const std::string& thrift,
+ std::shared_ptr<ColumnHandle>& proto) const final {
</code_context>
<issue_to_address>
**issue (bug_risk):** ConnectorPartitioningHandle custom deserialization is also wired to the base type, limiting connector-specific behavior.
`ConnectorProtocolTemplate` currently calls the base `ConnectorPartitioningHandle::deserialize(thrift, proto)`, which is NYI and prevents connector-specific handle types from implementing their own binary deserialization for `customSerializedValue`. To support connector-specific payloads, this should dispatch via the concrete `ConnectorPartitioningHandleType` (or a template helper) instead of the base class static.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
|
Please sign the Presto CLA as mentioned in this comment. |
becf9fc to
6f307dc
Compare
Cherry-picked from upstream PR prestodb#27652 (Tim's approach for Tpch connector). Adds serialize/deserialize pure virtuals to ConnectorProtocol for all handle types and customSerializedValue routing in .cpp.inc files.
|
@20001020ycx is it possible to add some tests here https://github.com/prestodb/presto/tree/master/presto-native-execution/presto_cpp/presto_protocol/tests for this support? |
Sure, I will look into it now and post the related change soon. |
Add tests verifying that from_json() correctly routes Base64-encoded customSerializedValue payloads to ConnectorProtocol::deserialize() for all 10 connector handle types.
…butedProcedureHandle Add customSerializedValue routing in ConnectorDistributedProcedureHandle.cpp.inc and add serialize/deserialize virtuals to ConnectorProtocol and ConnectorProtocolTemplate. Add test coverage for this handle type.
|
Hi @majetideepak, I've added an unit test, Thank you for your time reviewing this PR — please let us know if you have any further concerns. |
…edProcedureHandle Add VELOX_NYI serialize/deserialize stubs to ConnectorDistributedProcedureHandle so that the customSerializedValue routing added in the previous commit compiles. Without these stubs, IcebergDistributedProcedureHandle fails to compile when inheriting the ConnectorProtocol template methods.
majetideepak
left a comment
There was a problem hiding this comment.
Calude has some more points
- 11x copy-pasted routing block — extract a tryRouteCustomSerializedValue() helper.
- ConnectorDistributedProcedureHandle::serialize takes non-const ref; the other new stubs take const&. Make consistent.
- Test uses <arpa/inet.h> for htons/ntohs just for a 2-byte length prefix — non-portable. use folly::Endian::big?.
…tent with other handle types Move the customSerializedValue check after getSubclassKey() and add the VELOX_CHECK guard against $-prefixed internal types, matching the pattern used by all other handle types.
…ke const& Align with the other new serialize stubs (ColumnHandle, ConnectorPartitioningHandle, ConnectorIndexHandle) which all take const&. The non-const ref was inconsistent and semantically wrong for a serialize operation.
Good points! I've applied the 2nd and the 3rd point in the latest push However, on the first point, I am a bit hesitant on the first point as I am not sure if *.cpp.inc support a common utility cpp.inc for such helper template - tryRouteCustomSerializedValue. Since the original routing was already duplicated, I'd prefer to stay consistent with the existing pattern. |
097a555 to
0a7c63a
Compare
Use folly::Endian::big instead of htons/ntohs for byte-swapping, which is already a linked dependency and makes the big-endian intent explicit rather than relying on the network-order abstraction.
0a7c63a to
1843179
Compare
…test CMake FOLLY_INCLUDE_DIRS was empty on CI, making target_include_directories a no-op. Folly::folly as a CMake imported target automatically provides both include directories and link libraries, matching how other targets in the project reference folly.
…dian.h
folly/Endian.h does not exist in this version of folly — the Endian
class is defined in folly/lang/Bits.h. Also revert Folly::folly back
to ${FOLLY_LIBRARIES} as the include path issue was a red herring.
…pe/feat/custom-serialized-value-routing
Description
Add
customSerializedValuerouting infrom_json()for all 10 connector handle types. WhencustomSerializedValueis present in the incoming JSON, the routing Base64-decodes it and dispatches toConnectorProtocol::deserialize()instead of the default JSON-to-struct parsing.Motivation and Context
Enables C++ plugin connectors to use binary serialization codecs instead of JSON, as proposed in prestodb/rfcs#49.
This is the C++ worker-side counterpart to the Java
ConnectorCodec/CodecSerializerpipeline (#26257).Test Plan
presto_serverwith the changesplugin-dirRelease Notes
Summary by Sourcery
Add support for binary-serialized connector handles via a new customSerializedValue pathway in native protocol deserialization.
New Features:
Enhancements: