Skip to content

feat: Add binary serialization codecs for CLP connector.#158

Merged
20001020ycx merged 13 commits into
y-scope:release-0.297-edge-10-clp-connectorfrom
20001020ycx:feat/2026-04-27-binary-codecs
May 1, 2026
Merged

feat: Add binary serialization codecs for CLP connector.#158
20001020ycx merged 13 commits into
y-scope:release-0.297-edge-10-clp-connectorfrom
20001020ycx:feat/2026-04-27-binary-codecs

Conversation

@20001020ycx

@20001020ycx 20001020ycx commented Apr 27, 2026

Copy link
Copy Markdown

Summary

Add binary serialization codecs for all handle types required by CLP Connector (ColumnHandle, Split, TableHandle, TableLayoutHandle, TransactionHandle) using the ConnectorCodecProvider SPI introduced in prestodb#26257.

Follows the approach proposed by Tim in prestodb#26650 for the Tpch connector.

Background & Motivation

When the Presto coordinator plans a query, it creates connector-specific handles and sends them to workers. Currently, each handle is serialized to JSON via Jackson. On the native Velox worker side, these JSON handles are deserialized into C++ structs through auto-generated code —Presto provides a build-time component called Presto Native Worker Protocol Code Generation, which generates C++ struct definitions and their from_json()/to_json() functions from YAML schema files (presto_protocol_*.yml) using a chevron template engine. These YAML schemas live in the Presto presto_protocol directory, reference the Java handle classes in the Presto source root, and are tightly coupled with Presto's protocol code infrastructure.

This works fine as long as the connector lives inside the Presto repo, since any change to the handles is as simple as regenerating the protocol files using the chevron engine. But we are migrating the CLP connector into a standalone Velox .so plugin repo, and this codegen pipeline becomes a maintenance burden.

For example, say we want to add a field to ClpColumnHandle.java. With the current JSON approach, we must:

  1. Fork Presto, then copy the Java plugin into the fork so the YAML schema can reference it
  2. Edit the YAML protocol schema to add the new field
  3. Run the chevron code generator to regenerate .h and .cpp files
  4. Copy the generated results into the Velox plugin

The alternative is to copy over the entire Presto Presto Native Worker Protocol Code Generation pipeline within our Velox plugin. This lets us run the codegen in our own repo without forking Presto, but then any upstream change to the protocol must be ported manually.

Either approach costs us greatly in maintenance.

Solution: Binary Serialization

Binary serialization eliminates the dependency on Presto Native Worker Protocol Code Generation entirely. Instead of relying on the generated from_json()/to_json(), a field called customSerializedValue routes the deserialization to our custom end point whose workflow is shown belox:

Now, adding a field to ClpColumnHandle.java is simply:

  1. Add the field to the Java handle class
  2. Update the Java codec (ClpColumnHandleCodec) to serialize/deserialize the new field
  3. Update the C++ ClpConnectorProtocol::deserialize() to read the new field

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • All unit test passed
  • End to end test
    • Query: SELECT CLP_GET_JSON_STRING() from clp.default.default limit 100
    • Set up: official clp package v0.12.0, Presto with the changes in this PR, Velox with deserialization of customSerializedValue , and minio as the storage.
    • The result is returned as expected with the binary serialization, here is the JSON envelope sent from coordinator to worker, when use-connector-provided-serialization-codecs=true:
"table" : {
  "connectorId" : "clp",
  "connectorHandle" : {
    "@type" : "clp",
    "customSerializedValue" : "AAdkZWZhdWx0AAdkZWZhdWx0ABBhcmNoaXZlcy9kZWZhdWx0"
  },
  "transaction" : {
    "@type" : "clp",
    "customSerializedValue" : ""
  },
  "connectorTableLayout" : {
    "@type" : "clp",
    "customSerializedValue" : "AAdkZWZhdWx0AAdkZWZhdWx0ABBhcmNoaXZlcy9kZWZhdWx0AAA="
  }
},
"outputVariables" : [ {
  "@type" : "variable",
  "name" : "__json_string",
  "type" : "varchar"
} ],
"assignments" : {
  "__json_string\u003Cvarchar\u003E" : {
    "@type" : "clp",
    "customSerializedValue" : "AA1fX2pzb25fc3RyaW5nAA1fX2pzb25fc3RyaW5nAAd2YXJjaGFy"
  }
}

Summary by CodeRabbit

  • New Features

    • Connector exposes a codec provider and accepts type information to enable correct, type-aware encoding/decoding of CLP handles.
    • Deterministic serialization/deserialization added for table, column, split, table-layout and transaction handles with strict wire-format validation.
  • Tests

    • Added test suite validating codec round-trips and key-field preservation for all CLP connector handle types.

Implement ConnectorCodecProvider with DataOutputStream-based binary
codecs for all 5 CLP handle types (ColumnHandle, Split, TableHandle,
TableLayoutHandle, TransactionHandle). This enables the C++ Velox
worker to deserialize handles via customSerializedValue without
depending on the Presto JSON protocol type system.
@coderabbitai

coderabbitai Bot commented Apr 27, 2026

Copy link
Copy Markdown

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds connector codecs and a ConnectorCodecProvider to CLP; ClpConnector now accepts a TypeManager and exposes the provider. New deterministic binary codecs handle serialization/deserialization for table, layout, column, split, and transaction handles; a utility for UTF‑8 length‑prefixed strings and tests were added.

Changes

Cohort / File(s) Summary
Connector
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java
Constructor now accepts TypeManager; added getConnectorCodecProvider() override returning ClpConnectorCodecProvider instantiated with the injected TypeManager.
Codec Provider
presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpConnectorCodecProvider.java
New ConnectorCodecProvider that stores a TypeManager and supplies codecs for table/table-layout/column/split/transaction handles.
Column Codec
presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpColumnHandleCodec.java
New ConnectorCodec<ColumnHandle>: serializes column name, original name, and type signature; deserializes by parsing the TypeSignature and resolving a Type via the injected TypeManager.
Table / Layout / Split / Transaction Codecs
presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpTableHandleCodec.java, .../ClpTableLayoutHandleCodec.java, .../ClpSplitCodec.java, .../ClpTransactionHandleCodec.java
Four new codecs implementing deterministic binary wire formats: table handle (schema, table, path), table layout (table handle + optional kql/metadataSql), split (path, split type ordinal, optional kql), transaction (empty payload / singleton). IOExceptions wrapped as UncheckedIOException.
Codec Utilities
presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/CodecUtils.java
New internal utility for writing/reading UTF‑8 strings with a 2‑byte big‑endian length prefix.
Tests
presto-clp/src/test/java/com/facebook/presto/plugin/clp/codec/TestClpConnectorCodecProvider.java
New TestNG suite validating serialize/deserialize roundtrips for column, split, table, table layout, and transaction codecs using a stub TypeManager.

Sequence Diagram

sequenceDiagram
    participant Client
    participant ClpConnector as ClpConnector
    participant Provider as ClpConnectorCodecProvider
    participant TypeManager as TypeManager
    participant Codec as ConnectorCodec

    Client->>ClpConnector: getConnectorCodecProvider()
    ClpConnector-->>Client: Provider (holds TypeManager)

    Client->>Provider: getColumnHandleCodec()
    Provider->>TypeManager: (injected) used to construct ClpColumnHandleCodec
    Provider-->>Client: Optional<ConnectorCodec>

    Client->>Codec: serialize(handle)
    Codec-->>Client: byte[]

    Client->>Codec: deserialize(byte[])
    Codec->>TypeManager: resolve Type (column codec)
    Codec-->>Client: handle
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 5.41% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description check ✅ Passed The PR description covers the required template sections: a comprehensive summary, detailed background/motivation, the solution approach, validation performed with end-to-end testing, and a contributor checklist with items marked complete.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Title check ✅ Passed The title 'feat: Add binary serialization codecs for CLP connector' accurately and concisely describes the main change: implementing binary serialization codecs for the CLP connector's handle types.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpColumnHandleCodec.java`:
- Around line 65-67: ClpColumnHandleCodec currently parses the TypeSignature and
calls typeManager.getType(...) but does not handle a null return, producing an
invalid ClpColumnHandle; add a null check after Type type =
typeManager.getType(typeSignature) in ClpColumnHandleCodec and if type is null
throw an explicit exception (e.g., IllegalArgumentException or a
PrestoException) with a clear message including the typeSignatureString and
columnName so callers fail fast instead of creating a bad ClpColumnHandle.

In
`@presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpSplitCodec.java`:
- Around line 56-60: The deserialization reads an ordinal with
ClpSplit.SplitType.values()[in.readInt()] which can throw
ArrayIndexOutOfBoundsException for invalid/future ordinals; change it in
ClpSplitCodec to read the int into a local variable (e.g., int ordinal =
in.readInt()), check that ordinal is within 0 and
ClpSplit.SplitType.values().length - 1, and if out of range throw a clear
IOException/IllegalArgumentException describing the bad ordinal and payload
source; only after validation map ordinal to ClpSplit.SplitType and proceed to
construct the new ClpSplit(path, type, kqlQuery).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 34620496-6316-4af9-8844-12a0f454ef9c

📥 Commits

Reviewing files that changed from the base of the PR and between d10e6a4 and 9be37e1.

📒 Files selected for processing (9)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpCodecProvider.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpColumnHandleCodec.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpSplitCodec.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpTableHandleCodec.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpTableLayoutHandleCodec.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpTransactionHandleCodec.java
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/codec/TestClpCodecByteFixtures.java
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/codec/TestClpCodecRoundtrip.java

…ch style

Rename ClpCodecProvider → ClpConnectorCodecProvider to follow Tim's
TpchConnectorCodecProvider naming convention. Consolidate all 5 codec
classes into inner static classes within a single file. Adopt Tim's
variable naming style (cast to type-specific variable) and error
handling pattern (UncheckedIOException with plain try blocks).

Follows the approach proposed in prestodb#26650.
Move each codec class out of ClpConnectorCodecProvider into its own
file. ClpConnectorCodecProvider now only provides the connector-level
factory methods. Each codec still follows Tim's style from prestodb#26650
(cast to type-specific variable, UncheckedIOException, plain try).
…remove byte fixtures

- Delete TestClpCodecByteFixtures.java
- Rename TestClpCodecRoundtrip → TestClpConnectorCodecProvider
- Change kql in testSplitRoundtripArchiveWithKql to LEVEL:ERROR
- Remove testTableLayoutHandleMinimalRoundtrip
- Remove testCodecProviderReturnsAllCodecs
- Remove testEmptyStringRoundtrip
- Remove testEmptyOptionalRoundtrip

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpSplitCodec.java (1)

52-63: ⚠️ Potential issue | 🟠 Major

Bounds-check the split ordinal before indexing the enum.

SplitType.values()[...] will throw on bad or future ordinals, which makes corrupted payloads fail with an opaque exception.

Proposed fix
-            ClpSplit.SplitType type = ClpSplit.SplitType.values()[in.readInt()];
+            int ordinal = in.readInt();
+            ClpSplit.SplitType[] values = ClpSplit.SplitType.values();
+            if (ordinal < 0 || ordinal >= values.length) {
+                throw new IllegalArgumentException("Unknown ClpSplit.SplitType ordinal: " + ordinal);
+            }
+            ClpSplit.SplitType type = values[ordinal];
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpSplitCodec.java`
around lines 52 - 63, In ClpSplitCodec.deserialize, guard against invalid
ordinals before indexing ClpSplit.SplitType.values(): read the int ordinal into
a local variable, check it is within 0 (inclusive) and
ClpSplit.SplitType.values().length (exclusive), and if not throw a descriptive
IOException (or return a clear failure) instead of directly indexing; then map
the validated ordinal to ClpSplit.SplitType and continue building the ClpSplit
(refer to method deserialize and enum ClpSplit.SplitType and the ClpSplit
constructor).
presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpColumnHandleCodec.java (1)

59-69: ⚠️ Potential issue | 🟠 Major

Guard the resolved type before constructing the handle.

TypeManager#getType(...) can return null; passing that through here would create an invalid ClpColumnHandle and shift the failure to a later call site.

Proposed fix
-            Type type = typeManager.getType(TypeSignature.parseTypeSignature(typeSignature));
-            return new ClpColumnHandle(columnName, originalColumnName, type);
+            Type type = typeManager.getType(TypeSignature.parseTypeSignature(typeSignature));
+            if (type == null) {
+                throw new IllegalArgumentException("Unknown CLP column type: " + typeSignature);
+            }
+            return new ClpColumnHandle(columnName, originalColumnName, type);
Read-only verification
#!/bin/bash
set -euo pipefail
sed -n '1,220p' presto-common/src/main/java/com/facebook/presto/common/type/TypeManager.java
rg -n -C 2 '\bgetType\s*\(' \
  presto-common/src/main/java/com/facebook/presto/common/type/TypeManager.java \
  presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpColumnHandleCodec.java
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpColumnHandleCodec.java`
around lines 59 - 69, In ClpColumnHandleCodec.deserialize, guard against
TypeManager#getType returning null before constructing a ClpColumnHandle: after
resolving Type type = typeManager.getType(...), check if type == null and if so
throw a clear IOException (or other appropriate checked exception declared by
deserialize) with a message that includes the parsed typeSignature; otherwise
proceed to return new ClpColumnHandle(...). This ensures deserialize validates
the resolved type and fails fast instead of creating an invalid ClpColumnHandle.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In
`@presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpColumnHandleCodec.java`:
- Around line 59-69: In ClpColumnHandleCodec.deserialize, guard against
TypeManager#getType returning null before constructing a ClpColumnHandle: after
resolving Type type = typeManager.getType(...), check if type == null and if so
throw a clear IOException (or other appropriate checked exception declared by
deserialize) with a message that includes the parsed typeSignature; otherwise
proceed to return new ClpColumnHandle(...). This ensures deserialize validates
the resolved type and fails fast instead of creating an invalid ClpColumnHandle.

In
`@presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpSplitCodec.java`:
- Around line 52-63: In ClpSplitCodec.deserialize, guard against invalid
ordinals before indexing ClpSplit.SplitType.values(): read the int ordinal into
a local variable, check it is within 0 (inclusive) and
ClpSplit.SplitType.values().length (exclusive), and if not throw a descriptive
IOException (or return a clear failure) instead of directly indexing; then map
the validated ordinal to ClpSplit.SplitType and continue building the ClpSplit
(refer to method deserialize and enum ClpSplit.SplitType and the ClpSplit
constructor).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: c3bdf0b5-fc82-4e80-973d-5c6b3a5d88f6

📥 Commits

Reviewing files that changed from the base of the PR and between 9be37e1 and b733bd3.

📒 Files selected for processing (8)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpColumnHandleCodec.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpConnectorCodecProvider.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpSplitCodec.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpTableHandleCodec.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpTableLayoutHandleCodec.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpTransactionHandleCodec.java
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/codec/TestClpConnectorCodecProvider.java

…HandleCodec

Add wire format comments to each codec class cross-referencing the C++
ClpConnectorProtocol::deserialize() counterpart. Remove redundant
comments from ClpTransactionHandleCodec.
@20001020ycx 20001020ycx requested a review from gibber9809 April 28, 2026 21:34
20001020ycx added a commit that referenced this pull request Apr 29, 2026
Implement binary deserialization for all 5 CLP connector handle types
(ClpColumnHandle, ClpSplit, ClpTableHandle, ClpTableLayoutHandle,
ClpTransactionHandle) by inheriting ConnectorProtocol and overriding
deserialize().

Reads binary bytes (Java DataOutputStream big-endian wire format from
#158) directly into existing protocol structs using std::istringstream
and folly::Endian::big.

Follows Tim's TpchConnectorProtocol pattern from prestodb#26650.

@gibber9809 gibber9809 left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! Besides the other comments I left, I'm also wondering if you think there's any benefit to making the deserialization code stricter? In particular, right now it seems like all of the deserialize methods will allow and silently drop extra bytes that follow valid encoded data, e.g. <bytes for valid encoding><extra unexpected bytes>.

ByteArrayOutputStream byteOut = new ByteArrayOutputStream();
DataOutputStream out = new DataOutputStream(byteOut);
out.writeUTF(columnHandle.getColumnName());
out.writeUTF(columnHandle.getOriginalColumnName());

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per java's docs the string content from writeUTF isn't actually UTF-8, and in particular uses a non-standard encoding for \u0000 and only supports encoding 4-byte characters as surrogate pairs.

To make it easier to work with this encoded data in clp-s it might be worth considering just writing our own function to read/write strings as real UTF-8.

@20001020ycx 20001020ycx Apr 30, 2026

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

emmmm, I am a bit hesitant on this probably because I am not clear with the context. I am sure you are correct on Java's writeUTF not being UTF-8, but what would the consequence be? Everything we serialize here will be eventually deserialized at Velox worker, which is way before we perform data extraction with clp-s.

Using kql, the field in ClpTableLayoutHandle as an example, it will be a native C++ string when passing it to clp-s, rather than a byte buffer.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I understand the confusion. Unlike other languages, c++ doesn't really have a well-defined encoding for std::string. Typically UTF-8 or ASCII will be used, but internally it's just treated as an array of bytes, so it's up to the programmer to guarantee a specific encoding.

Since clp-s works under the assumption that (most) strings are valid UTF-8, some string operations, such as string comparisons, won't work as expected when that's not the case.

In this instance, that would come up as some column names that contain characters writeUTF encodes in a non-standard way not being searchable.

Of course we could follow the alternative approach of converting the non-standard UTF-8 into regular UTF-8 on the velox side, but that's more complicated, bug-prone, and probably slower than just using UTF-8 in the first place.

@20001020ycx 20001020ycx Apr 30, 2026

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean now, thanks for the clear explanation.

Introduced a new class CodecUtils which writes UTF encoding of Java string as
byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
According to spec, it is the standard UTF 8 format. Also explained the problem in the comment of CodecUtils class definition now.

Extract writeTableHandle/readTableHandle so ClpTableLayoutHandleCodec
can share the same DataOutputStream/DataInputStream instead of
duplicating the table handle serialization fields.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpColumnHandleCodec.java (1)

73-74: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast when the type signature cannot be resolved.

At Line 73, typeManager.getType(...) can return null; Line 74 then builds a ClpColumnHandle with an invalid type. Please validate and throw an explicit exception with context (columnName, typeSignature) before constructing the handle.

Proposed fix
             String columnName = in.readUTF();
             String originalColumnName = in.readUTF();
             String typeSignature = in.readUTF();
-            Type type = typeManager.getType(TypeSignature.parseTypeSignature(typeSignature));
+            Type type = typeManager.getType(TypeSignature.parseTypeSignature(typeSignature));
+            if (type == null) {
+                throw new IllegalArgumentException("Unknown CLP column type: " + typeSignature + " for column: " + columnName);
+            }
             return new ClpColumnHandle(columnName, originalColumnName, type);
#!/bin/bash
set -euo pipefail

# Verify TypeManager#getType contract and ClpColumnHandle nullability expectations.
fd -i 'TypeManager.java' --exec sed -n '1,220p' {}
fd -i 'ClpColumnHandle.java' --exec sed -n '1,220p' {}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpColumnHandleCodec.java`
around lines 73 - 74, Type resolution may return null so before constructing the
ClpColumnHandle you must validate the resolved Type and throw a clear exception
including columnName and typeSignature; update the code around Type type =
typeManager.getType(TypeSignature.parseTypeSignature(typeSignature)) to check if
type == null and throw an IllegalArgumentException (or PrestoException) with a
message like "Unable to resolve type for column <columnName>: <typeSignature>"
instead of returning a ClpColumnHandle with a null type; ensure you reference
TypeManager.getType, TypeSignature.parseTypeSignature, ClpColumnHandle,
columnName and typeSignature in the message for debugging.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpSplitCodec.java`:
- Line 41: In ClpSplitCodec, avoid the direct cast "ClpSplit split = (ClpSplit)
handle;" — first check that the ConnectorSplit (handle) is an instance of
ClpSplit (e.g., using instanceof) and if not throw an IllegalArgumentException
with a clear message that includes the actual handle.getClass().getName() and
expected ClpSplit; then safely cast to ClpSplit and proceed. This ensures
ClpSplitCodec fails fast with a descriptive error instead of a
ClassCastException.

In
`@presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpTableHandleCodec.java`:
- Around line 51-56: The deserialize method in ClpTableHandleCodec returns
immediately after calling readTableHandle(in) without verifying the input stream
was fully consumed, allowing trailing bytes to be ignored; update
ConnectorTableHandle deserialize(byte[] bytes) to wrap the DataInputStream, call
readTableHandle(in), then check in.read() (or in.available()) to detect any
remaining bytes and throw a descriptive IOException/IllegalArgumentException if
there are trailing bytes, ensuring protocol drift is caught; apply the same
trailing-byte validation pattern to ClpColumnHandleCodec, ClpSplitCodec, and
ClpTableLayoutHandleCodec deserialize methods.

In
`@presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpTableLayoutHandleCodec.java`:
- Around line 70-84: The deserialize method in ClpTableLayoutHandleCodec
currently stops after reading fields (readTableHandle, kqlQuery, metadataSql)
but doesn't validate there are no remaining bytes, allowing truncated/extra-data
payloads to pass; update ClpTableLayoutHandleCodec.deserialize to check that the
DataInputStream has been fully consumed (e.g. verify in.read() returns -1 or
available() == 0) after constructing the ClpTableLayoutHandle and throw an
IOException/IllegalArgumentException if trailing bytes remain; apply the same
pattern to ClpTableHandleCodec.deserialize and add unit tests that pass
malformed payloads with extra or dangling bytes to ensure the validation
triggers.

In
`@presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpTransactionHandleCodec.java`:
- Around line 27-35: The serialize and deserialize methods in
ClpTransactionHandleCodec must enforce invariants: update
serialize(ConnectorTransactionHandle handle) to null-check handle and verify
handle == ClpTransactionHandle.INSTANCE, throwing an IllegalArgumentException
(or similar) if not, and update deserialize(byte[] bytes) to null-check bytes
and verify bytes.length == 0, throwing on mismatch; reference the methods
serialize and deserialize and the ClpTransactionHandle.INSTANCE constant when
making these checks so cross-connector or payload-mismatch errors fail fast.

---

Duplicate comments:
In
`@presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpColumnHandleCodec.java`:
- Around line 73-74: Type resolution may return null so before constructing the
ClpColumnHandle you must validate the resolved Type and throw a clear exception
including columnName and typeSignature; update the code around Type type =
typeManager.getType(TypeSignature.parseTypeSignature(typeSignature)) to check if
type == null and throw an IllegalArgumentException (or PrestoException) with a
message like "Unable to resolve type for column <columnName>: <typeSignature>"
instead of returning a ClpColumnHandle with a null type; ensure you reference
TypeManager.getType, TypeSignature.parseTypeSignature, ClpColumnHandle,
columnName and typeSignature in the message for debugging.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: b44c27e4-2578-4187-b8bf-461a2679c776

📥 Commits

Reviewing files that changed from the base of the PR and between b733bd3 and a77633f.

📒 Files selected for processing (5)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpColumnHandleCodec.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpSplitCodec.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpTableHandleCodec.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpTableLayoutHandleCodec.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpTransactionHandleCodec.java

@20001020ycx

20001020ycx commented Apr 30, 2026

Copy link
Copy Markdown
Author

Nice work! Besides the other comments I left, I'm also wondering if you think there's any benefit to making the deserialization code stricter? In particular, right now it seems like all of the deserialize methods will allow and silently drop extra bytes that follow valid encoded data, e.g. <bytes for valid encoding><extra unexpected bytes>.

Java codec is serializing the data, deserialize methods are implemented for completeness and testing. But this is a good reminder for error handling, I will ensure my C++ side did check on this. Also adding some simple trialing bytes check at the deserialize method

@20001020ycx 20001020ycx requested a review from gibber9809 April 30, 2026 15:28
Add instanceof checks before casting in serialize() for clearer error
messages on type mismatch. Add trailing-byte checks in deserialize()
to reject unexpected extra data after valid encoded fields.
Java's writeUTF uses Modified UTF-8 which encodes \0 and supplementary
characters differently from standard UTF-8. Since clp-s assumes real
UTF-8 for string comparisons, using writeUTF would break searchability
for strings containing non-ASCII content.

Add CodecUtils with writeUtf8String/readUtf8String helpers that use
StandardCharsets.UTF_8, keeping the same wire format (2-byte BE length
prefix + encoded bytes).

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpColumnHandleCodec.java (1)

81-83: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast when TypeManager cannot resolve the type signature.

On Line 81, if typeManager.getType(...) returns null, Line 82 creates an invalid ClpColumnHandle that fails later and less clearly.

Proposed fix
-            Type type = typeManager.getType(TypeSignature.parseTypeSignature(typeSignature));
+            Type type = typeManager.getType(TypeSignature.parseTypeSignature(typeSignature));
+            if (type == null) {
+                throw new IOException("Unknown CLP column type signature: " + typeSignature);
+            }
             return new ClpColumnHandle(columnName, originalColumnName, type);
#!/bin/bash
set -euo pipefail

# Verify TypeManager contract and current codec behaviour from source.
fd -i "TypeManager.java" --exec sed -n '1,220p' {}
fd -i "ClpColumnHandleCodec.java" --exec sed -n '1,220p' {}
fd -i "ClpColumnHandle.java" --exec sed -n '1,140p' {}

# Find other call sites guarding/null-checking getType results for precedent.
rg -n -C2 'getType\(TypeSignature\.parseTypeSignature' --iglob '*.java'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpColumnHandleCodec.java`
around lines 81 - 83, The deserialization code in ClpColumnHandleCodec calls
TypeManager.getType(TypeSignature.parseTypeSignature(typeSignature)) and can
return null, producing an invalid ClpColumnHandle; update the code in
ClpColumnHandleCodec (the block that calls TypeManager.getType /
TypeSignature.parseTypeSignature and constructs new ClpColumnHandle) to check
the returned Type for null and immediately throw an IllegalArgumentException (or
PrestoException if preferred) with a clear message containing the typeSignature
and columnName; this makes the failure fail-fast and avoids constructing an
invalid ClpColumnHandle.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpSplitCodec.java (1)

71-71: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate split-type ordinal before indexing enum values.

On Line 71, direct enum indexing can throw ArrayIndexOutOfBoundsException for malformed or drifted payloads; validate bounds and throw a clear deserialization error.

Proposed fix
-            ClpSplit.SplitType type = ClpSplit.SplitType.values()[in.readInt()];
+            int ordinal = in.readInt();
+            ClpSplit.SplitType[] values = ClpSplit.SplitType.values();
+            if (ordinal < 0 || ordinal >= values.length) {
+                throw new IOException("Unknown ClpSplit.SplitType ordinal: " + ordinal);
+            }
+            ClpSplit.SplitType type = values[ordinal];
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpSplitCodec.java`
at line 71, Validate the ordinal read from the stream before indexing into the
enum: read the int into a local (e.g., "int ordinal = in.readInt()"), check that
ordinal >= 0 && ordinal < ClpSplit.SplitType.values().length, and if out of
bounds throw a clear deserialization exception (e.g., IOException or a
Presto-friendly exception) that includes the invalid ordinal; only then assign
ClpSplit.SplitType type = ClpSplit.SplitType.values()[ordinal]. This protects
ClpSplit.SplitType lookup from ArrayIndexOutOfBoundsException for malformed or
drifted payloads.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpTableHandleCodec.java`:
- Around line 43-45: The type-check exception message can throw NPE when handle
is null; update the IllegalArgumentException construction in ClpTableHandleCodec
to use a null-safe class name (e.g., use handle == null ? "null" :
handle.getClass().getName()) instead of directly calling
handle.getClass().getName(); apply the same change to the analogous checks in
ClpSplitCodec, ClpTableLayoutHandleCodec, and ClpColumnHandleCodec so every
type-mismatch log uses the null-safe pattern referencing the local variable
handle.

In
`@presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/CodecUtils.java`:
- Around line 37-42: The writeUtf8String method currently writes a 2-byte length
with out.writeShort(bytes.length) which will wrap for lengths >65535; add an
explicit bounds check in writeUtf8String that computes byte[] bytes =
value.getBytes(StandardCharsets.UTF_8) and if bytes.length > 0xFFFF throw an
IOException (or IllegalArgumentException) with a clear message rejecting strings
longer than 65535 bytes before calling out.writeShort and out.write; reference
the writeUtf8String method, the bytes variable, and the out.writeShort call when
making the change.

---

Duplicate comments:
In
`@presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpColumnHandleCodec.java`:
- Around line 81-83: The deserialization code in ClpColumnHandleCodec calls
TypeManager.getType(TypeSignature.parseTypeSignature(typeSignature)) and can
return null, producing an invalid ClpColumnHandle; update the code in
ClpColumnHandleCodec (the block that calls TypeManager.getType /
TypeSignature.parseTypeSignature and constructs new ClpColumnHandle) to check
the returned Type for null and immediately throw an IllegalArgumentException (or
PrestoException if preferred) with a clear message containing the typeSignature
and columnName; this makes the failure fail-fast and avoids constructing an
invalid ClpColumnHandle.

In
`@presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpSplitCodec.java`:
- Line 71: Validate the ordinal read from the stream before indexing into the
enum: read the int into a local (e.g., "int ordinal = in.readInt()"), check that
ordinal >= 0 && ordinal < ClpSplit.SplitType.values().length, and if out of
bounds throw a clear deserialization exception (e.g., IOException or a
Presto-friendly exception) that includes the invalid ordinal; only then assign
ClpSplit.SplitType type = ClpSplit.SplitType.values()[ordinal]. This protects
ClpSplit.SplitType lookup from ArrayIndexOutOfBoundsException for malformed or
drifted payloads.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: b6ea899a-0ef4-46c9-9ba0-a91db53f56cf

📥 Commits

Reviewing files that changed from the base of the PR and between a77633f and 5168280.

📒 Files selected for processing (6)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpColumnHandleCodec.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpSplitCodec.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpTableHandleCodec.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpTableLayoutHandleCodec.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpTransactionHandleCodec.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/CodecUtils.java

20001020ycx added a commit that referenced this pull request Apr 30, 2026
Implement binary deserialization for all 5 CLP connector handle types
(ClpColumnHandle, ClpSplit, ClpTableHandle, ClpTableLayoutHandle,
ClpTransactionHandle) by inheriting ConnectorProtocol and overriding
deserialize().

Reads binary bytes (Java DataOutputStream big-endian wire format from
#158) directly into existing protocol structs using std::istringstream
and folly::Endian::big.

Follows Tim's TpchConnectorProtocol pattern from prestodb#26650.
20001020ycx added a commit that referenced this pull request Apr 30, 2026
Implement binary deserialization for all 5 CLP connector handle types
(ClpColumnHandle, ClpSplit, ClpTableHandle, ClpTableLayoutHandle,
ClpTransactionHandle) by inheriting ConnectorProtocol and overriding
deserialize().

Reads binary bytes (Java DataOutputStream big-endian wire format from
#158) directly into existing protocol structs using std::istringstream
and folly::Endian::big.

Follows Tim's TpchConnectorProtocol pattern from prestodb#26650.
20001020ycx added a commit that referenced this pull request Apr 30, 2026
Implement binary deserialization for all 5 CLP connector handle types
(ClpColumnHandle, ClpSplit, ClpTableHandle, ClpTableLayoutHandle,
ClpTransactionHandle) by inheriting ConnectorProtocol and overriding
deserialize().

Reads binary bytes (Java DataOutputStream big-endian wire format from
#158) directly into existing protocol structs using std::istringstream
and folly::Endian::big.

Follows Tim's TpchConnectorProtocol pattern from prestodb#26650.
20001020ycx added a commit that referenced this pull request Apr 30, 2026
Implement binary deserialization for all 5 CLP connector handle types
(ClpColumnHandle, ClpSplit, ClpTableHandle, ClpTableLayoutHandle,
ClpTransactionHandle) by inheriting ConnectorProtocol and overriding
deserialize().

Reads binary bytes (Java DataOutputStream big-endian wire format from
#158) directly into existing protocol structs using std::istringstream
and folly::Endian::big.

Follows Tim's TpchConnectorProtocol pattern from prestodb#26650.
20001020ycx added a commit that referenced this pull request Apr 30, 2026
Implement binary deserialization for all 5 CLP connector handle types
(ClpColumnHandle, ClpSplit, ClpTableHandle, ClpTableLayoutHandle,
ClpTransactionHandle) by inheriting ConnectorProtocol and overriding
deserialize().

Reads binary bytes (Java DataOutputStream big-endian wire format from
#158) directly into existing protocol structs using std::istringstream
and folly::Endian::big.

Follows Tim's TpchConnectorProtocol pattern from prestodb#26650.

@gibber9809 gibber9809 left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit for docstring, but otherwise LGTM!

Comment thread presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/CodecUtils.java Outdated
…CodecUtils.java

Co-authored-by: Devin Gibson <gibber9809@users.noreply.github.com>
@20001020ycx 20001020ycx requested a review from gibber9809 May 1, 2026 15:17

@gibber9809 gibber9809 left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! For PR title maybe just add a period at the end to match our convention, but otherwise looks good.

@20001020ycx 20001020ycx changed the title feat: Add binary serialization codecs for CLP connector feat: Add binary serialization codecs for CLP connector. May 1, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpColumnHandleCodec.java (1)

82-83: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard unknown type signatures before constructing ClpColumnHandle.

typeManager.getType(...) can return null; on Line 83 this creates an invalid handle and defers failure to later paths. Fail fast with an explicit exception when type resolution fails.

Proposed fix
-            Type type = typeManager.getType(TypeSignature.parseTypeSignature(typeSignature));
-            return new ClpColumnHandle(columnName, originalColumnName, type);
+            TypeSignature parsedTypeSignature = TypeSignature.parseTypeSignature(typeSignature);
+            Type type = typeManager.getType(parsedTypeSignature);
+            if (type == null) {
+                throw new IllegalArgumentException("Unknown CLP column type signature: " + typeSignature +
+                        " for column: " + columnName);
+            }
+            return new ClpColumnHandle(columnName, originalColumnName, type);
#!/bin/bash
set -euo pipefail

# Verify TypeManager contract (null possible) and current codec behaviour.
sed -n '1,60p' presto-common/src/main/java/com/facebook/presto/common/type/TypeManager.java
sed -n '70,95p' presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpColumnHandleCodec.java

# Check whether tests already cover unknown type-signature rejection.
rg -n --iglob '*TestClpConnectorCodecProvider.java' 'unknown|invalid|type signature|null.*type|deserialize' presto-clp/src/test/java
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpColumnHandleCodec.java`
around lines 82 - 83, The code currently calls
typeManager.getType(TypeSignature.parseTypeSignature(typeSignature)) and
immediately constructs a ClpColumnHandle, but getType can return null; modify
the ClpColumnHandleCodec decoding path so after resolving Type type =
typeManager.getType(...) you explicitly check for null and throw a clear
IllegalArgumentException (or PrestoException if preferred) including the
unresolved typeSignature and columnName, instead of returning a handle with a
null type; ensure the check is placed before the ClpColumnHandle constructor so
callers fail fast.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpSplitCodec.java`:
- Line 52: The codec currently writes and reads ClpSplit.SplitType using
enum.ordinal() (in ClpSplitCodec at the write/read sites), which couples the
wire format to enum order; add explicit encoder/decoder methods (e.g.,
encodeSplitType(ClpSplit.SplitType) and decodeSplitType(int)) that map each
SplitType to a stable integer ID (e.g., ARCHIVE -> 0, IR -> 1) and throw on
unknown values, then replace the direct uses of split.getType().ordinal() and
the inverse ordinal-based read with calls to these new methods so the wire
protocol is stable across enum reordering.

---

Duplicate comments:
In
`@presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpColumnHandleCodec.java`:
- Around line 82-83: The code currently calls
typeManager.getType(TypeSignature.parseTypeSignature(typeSignature)) and
immediately constructs a ClpColumnHandle, but getType can return null; modify
the ClpColumnHandleCodec decoding path so after resolving Type type =
typeManager.getType(...) you explicitly check for null and throw a clear
IllegalArgumentException (or PrestoException if preferred) including the
unresolved typeSignature and columnName, instead of returning a handle with a
null type; ensure the check is placed before the ClpColumnHandle constructor so
callers fail fast.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: d309cbf5-8c90-4ac2-a83c-69824bcc2d90

📥 Commits

Reviewing files that changed from the base of the PR and between 5168280 and b6a1657.

📒 Files selected for processing (5)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpColumnHandleCodec.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpSplitCodec.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpTableHandleCodec.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpTableLayoutHandleCodec.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/CodecUtils.java

ByteArrayOutputStream byteOut = new ByteArrayOutputStream();
DataOutputStream out = new DataOutputStream(byteOut);
writeUtf8String(split.getPath(), out);
out.writeInt(split.getType().ordinal());

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use explicit split-type wire IDs instead of enum ordinals.

Line 52 and Line 72 currently bind the wire protocol to enum declaration order. Reordering/inserting enum values can silently break cross-version decode semantics.

Proposed fix
-            out.writeInt(split.getType().ordinal());
+            out.writeInt(encodeSplitType(split.getType()));
...
-            ClpSplit.SplitType type = ClpSplit.SplitType.values()[in.readInt()];
+            ClpSplit.SplitType type = decodeSplitType(in.readInt());
private static int encodeSplitType(ClpSplit.SplitType type)
{
    switch (type) {
        case ARCHIVE:
            return 0;
        case IR:
            return 1;
        default:
            throw new IllegalArgumentException("Unsupported ClpSplit.SplitType: " + type);
    }
}

private static ClpSplit.SplitType decodeSplitType(int id)
{
    switch (id) {
        case 0:
            return ClpSplit.SplitType.ARCHIVE;
        case 1:
            return ClpSplit.SplitType.IR;
        default:
            throw new IllegalArgumentException("Unknown ClpSplit.SplitType id: " + id);
    }
}

Also applies to: 72-72

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@presto-clp/src/main/java/com/facebook/presto/plugin/clp/codec/ClpSplitCodec.java`
at line 52, The codec currently writes and reads ClpSplit.SplitType using
enum.ordinal() (in ClpSplitCodec at the write/read sites), which couples the
wire format to enum order; add explicit encoder/decoder methods (e.g.,
encodeSplitType(ClpSplit.SplitType) and decodeSplitType(int)) that map each
SplitType to a stable integer ID (e.g., ARCHIVE -> 0, IR -> 1) and throw on
unknown values, then replace the direct uses of split.getType().ordinal() and
the inverse ordinal-based read with calls to these new methods so the wire
protocol is stable across enum reordering.

@20001020ycx 20001020ycx merged commit b0b318f into y-scope:release-0.297-edge-10-clp-connector May 1, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants