feat(native): Add Java planner integration, protocol, and dynamic RPC function detection [8/8] (OSS) (#27358)

Open: zhichenxu-meta wants to merge 1 commit into prestodb:master from zhichenxu-meta:export-D94996326

zhichenxu-meta (Contributor) commented Mar 17, 2026

Summary:

X-link: https://github.com/facebookexternal/presto-facebook/pull/3595

End-to-end integration connecting the Java planner to the C++ RPCOperator.

Java Planner (presto-main-base)

  • RPCNode.java: Plan node for async RPC operations.
  • RpcFunctionOptimizer.java: Rewrites ProjectNode(rpc_function(...))
    into Source -> RPCNode -> ProjectNode. Uses Supplier<Set<String>> for
    lazy RPC function name resolution.
  • Visitor integration across plan visitors.
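
To make the rewrite concrete, here is a minimal Java sketch of turning ProjectNode(rpc_function(...)) into Source -> RPCNode -> ProjectNode on a toy plan model. The records, the `identity` pass-through marker, and the function name `llm_summarize` are hypothetical stand-ins invented for illustration. The real RpcFunctionOptimizer works on Presto's own plan and expression types and covers more cases, so treat this as a sketch of the idea rather than the PR's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model only: these records are hypothetical, not Presto's plan classes.
final class RpcRewriteSketch {
    interface Plan {}

    record Source(String table) implements Plan {}

    // One projection assignment: output := function(args...)
    record Call(String output, String function, List<String> args) {}

    record Project(Plan source, List<Call> assignments) implements Plan {}

    record Rpc(Plan source, String function, List<String> args, String output) implements Plan {}

    // Moves every RPC call out of the projection and into its own Rpc node
    // placed between the original source and the projection.
    static Plan rewrite(Project project, Set<String> rpcFunctionNames) {
        Plan source = project.source();
        List<Call> remaining = new ArrayList<>();
        for (Call call : project.assignments()) {
            if (rpcFunctionNames.contains(call.function().toLowerCase(Locale.ROOT))) {
                // The Rpc node appends the call's result as a column named call.output().
                source = new Rpc(source, call.function(), call.args(), call.output());
                // The projection now only passes that result column through.
                remaining.add(new Call(call.output(), "identity", List.of(call.output())));
            }
            else {
                remaining.add(call);
            }
        }
        return new Project(source, remaining);
    }

    public static void main(String[] args) {
        Project before = new Project(
                new Source("reviews"),
                List.of(new Call("summary", "llm_summarize", List.of("body"))));
        System.out.println(rewrite(before, Set.of("llm_summarize")));
    }
}
```

Assignments whose function name is not in the supplied set are left in the projection unchanged, so a plan without RPC calls keeps its original shape apart from a new Project wrapper.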

Dynamic RPC Function Detection

The coordinator discovers RPC functions from the sidecar without
hardcoded function name lists:

  1. C++ FunctionMetadata checks isRegistered(), sets isRpcFunction=true
  2. presto_protocol adds isRpcFunction field to serialization
  3. Java NativeSidecarFunctionRegistryTool filters for isRpcFunction=true
  4. Guice Supplier<Set<String>> defers sidecar call to first query planning
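
Steps 3 and 4 can be sketched in plain Java as follows. This is an illustration under stated assumptions: the `FunctionMetadata` record, the `memoize` helper, and the sample function names are hypothetical, and the map literal stands in for the sidecar response. Guava's `Suppliers.memoize` provides the same compute-once behavior; whether the PR uses it is not stated here.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;
import java.util.stream.Collectors;

final class LazyRpcNamesSketch {
    // Hypothetical stand-in for the per-function metadata returned by the sidecar.
    record FunctionMetadata(boolean isRpcFunction) {}

    // Keeps only functions flagged isRpcFunction=true and lowercases their names.
    static Set<String> rpcFunctionNames(Map<String, FunctionMetadata> functions) {
        return functions.entrySet().stream()
                .filter(entry -> entry.getValue().isRpcFunction())
                .map(entry -> entry.getKey().toLowerCase(Locale.ROOT))
                .collect(Collectors.toUnmodifiableSet());
    }

    // Wraps a supplier so the delegate runs at most once, on the first get().
    static <T> Supplier<T> memoize(Supplier<T> delegate) {
        return new Supplier<>() {
            private T value;
            private boolean loaded;

            @Override
            public synchronized T get() {
                if (!loaded) {
                    value = delegate.get();
                    loaded = true;
                }
                return value;
            }
        };
    }

    public static void main(String[] args) {
        AtomicInteger sidecarCalls = new AtomicInteger();
        Supplier<Set<String>> names = memoize(() -> {
            sidecarCalls.incrementAndGet(); // stands in for the sidecar request
            return rpcFunctionNames(Map.of(
                    "LLM_Summarize", new FunctionMetadata(true),
                    "abs", new FunctionMetadata(false)));
        });
        System.out.println("calls before first use: " + sidecarCalls.get());
        System.out.println("names: " + names.get());
        names.get(); // second lookup reuses the cached set
        System.out.println("calls after two lookups: " + sidecarCalls.get());
    }
}
```

Deferring the call this way means that constructing the supplier never contacts the sidecar; only the first lookup does.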

C++ Plan Converter

  • Converts protocol RPCNode -> core::RPCNode (name-based)
  • Validates function exists via isRegistered() (no instantiation)
  • Parses result type via typeParser_.parse()
  • Function instantiation deferred to RPCOperator::initialize()
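
The split between name-based validation at conversion time and instantiation at operator initialization can be illustrated with a small Java sketch. The actual converter is C++; everything below (`Registry`, `RpcPlanNode`, `convert`, `initialize`, and the `llm_summarize` name) is a hypothetical analogue written only to show the pattern, not a translation of the PR's code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

final class DeferredInstantiationSketch {
    interface AsyncFunction {
        String call(String input);
    }

    // Hypothetical registry: stores factories, so existence can be checked
    // without creating a function instance.
    static final class Registry {
        private final Map<String, Supplier<AsyncFunction>> factories = new HashMap<>();
        int instantiations;

        void register(String name, Supplier<AsyncFunction> factory) {
            factories.put(name, factory);
        }

        boolean isRegistered(String name) {
            return factories.containsKey(name);
        }

        AsyncFunction create(String name) {
            instantiations++;
            return factories.get(name).get();
        }
    }

    // The converted plan node carries only the function name.
    record RpcPlanNode(String functionName) {}

    // Plan conversion: fail fast on unknown names, but create nothing.
    static RpcPlanNode convert(String functionName, Registry registry) {
        if (!registry.isRegistered(functionName)) {
            throw new IllegalArgumentException("RPC function not registered: " + functionName);
        }
        return new RpcPlanNode(functionName);
    }

    // Operator initialization: the only place the function is instantiated.
    static AsyncFunction initialize(RpcPlanNode node, Registry registry) {
        return registry.create(node.functionName());
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        registry.register("llm_summarize", () -> input -> "summary of " + input);
        RpcPlanNode node = convert("llm_summarize", registry);
        System.out.println("instantiations after convert: " + registry.instantiations);
        AsyncFunction function = initialize(node, registry);
        System.out.println(function.call("text") + ", instantiations: " + registry.instantiations);
    }
}
```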

Tests

  • RPCPlanConverterTest: Protocol deserialization, plan conversion with
    name-based validation, error paths for missing functions.

Reading Guide

  1. RpcFunctionOptimizer.java — Start here. The core optimizer that
    rewrites ProjectNode(rpc_function(...)) into Source -> RPCNode ->
    ProjectNode. Key methods: optimize() walks the plan,
    rewriteProjectWithRpcFunction() extracts the RPC call, builds RPCNode
    with arguments and result type.

  2. RPCNode.java — Java plan node. Constructor, getters, serialization
    fields. Maps to the C++ core::RPCNode from diff 2.

  3. PrestoToVeloxQueryPlan.cpp — C++ plan converter. Converts
    protocol RPCNode -> Velox RPCNode. Validates function via
    isRegistered().

  4. presto_protocol_core.h/cpp/yml — Protocol serde for RPCNode.
    Generated-style code; review the .yml for the schema, skim the
    .h/.cpp for correctness.

  5. RPCPlanConverterTest.cpp — Unit tests for the C++ plan converter.
    Good for understanding the expected protocol format.

  6. FunctionMetadata.cpp — Adds isRpcFunction field based on
    AsyncRPCFunctionRegistry::isRegistered().

  7. NativeSidecarFunctionRegistryTool.java — Filters sidecar function
    list by isRpcFunction flag to build the RPC function name set.

  8. PlanOptimizers.java + ServerMainModule.java — Wiring: registers
    RpcFunctionOptimizer with Guice Supplier for lazy RPC function
    discovery.

  9. Visitor touchpoints (~15 Java files, 5-20 lines each) —
    Mechanical: adds RPCNode cases to existing plan visitors
    (AddExchanges, LimitPushDown, PruneUnreferencedOutputs, etc.).
    Safe to skim.

Reviewed By: gggrace14

Differential Revision: D94996326

sourcery-ai Bot (Contributor) commented Mar 17, 2026

Reviewer's Guide

Integrates Java planner RPCNode support with Velox C++ RPC plan conversion and dynamic RPC function discovery, enabling async RPC functions (e.g., LLM inference) to run via a dedicated RPC plan node wired through the planning, protocol, sidecar, and execution stacks.

Sequence diagram for RPC function planning and async execution via RPCNode

sequenceDiagram
    actor User
    participant Coordinator as CoordinatorPlanner
    participant Sidecar as NativeSidecarFunctionRegistryTool
    participant RpcOpt as RpcFunctionOptimizer
    participant JavaRPCNode as RPCNode
    participant Proto as ProtocolSerde
    participant Worker as PrestoServerVelox
    participant Conv as VeloxQueryPlanConverterBase
    participant Registry as AsyncRPCFunctionRegistry
    participant RPCOp as RPCOperator

    User->>Coordinator: Submit query with rpc_function(...)

    Coordinator->>Sidecar: getRpcFunctionNames()
    Sidecar->>Sidecar: Fetch /v1/functions
    Sidecar->>Sidecar: Filter JsonBasedUdfFunctionMetadata where isRpcFunction=true
    Sidecar-->>Coordinator: Set<String> rpcFunctionNames

    Coordinator->>RpcOpt: optimize(plan, rpcFunctionNames)
    RpcOpt->>RpcOpt: Scan ProjectNode assignments
    RpcOpt->>RpcOpt: Detect CallExpression where name in rpcFunctionNames
    RpcOpt->>RpcOpt: Parse options JSON arg[3]
    RpcOpt->>RpcOpt: Determine streamingMode, dispatchBatchSize
    RpcOpt->>JavaRPCNode: Create RPCNode(source, functionName,
    RpcOpt->>JavaRPCNode: arguments, argumentColumns,
    RpcOpt->>JavaRPCNode: outputVariable, streamingMode, dispatchBatchSize
    RpcOpt->>Coordinator: Return rewritten plan

    Coordinator->>Proto: Serialize plan
    Proto->>Proto: Write protocol.RPCNode JSON
    Proto-->>Worker: Send plan fragment with RPCNode

    Worker->>Conv: toVeloxQueryPlan(protocol.RPCNode)
    Conv->>Registry: isRegistered(functionName)
    Registry-->>Conv: bool isRegistered
    Conv->>Conv: VELOX_CHECK isRegistered
    Conv->>Conv: Parse resultType from outputVariable.type
    Conv->>Conv: Convert arguments to Velox expressions
    Conv->>Conv: Extract constantInputs and argumentTypes
    Conv->>Conv: Parse streamingMode, dispatchBatchSize
    Conv->>Worker: Build core.RPCNode

    Worker->>RPCOp: Initialize from core.RPCNode
    RPCOp->>Registry: Instantiate AsyncRPCFunction by name
    RPCOp->>RPCOp: Dispatch async RPC calls
    RPCOp-->>Worker: Produce result column
    Worker-->>User: Return query results with RPC function output

File-Level Changes

Change Details Files
Add RPCNode as a first-class Java plan node and thread it through planner visitors and execution infrastructure.
  • Introduce com.facebook.presto.sql.planner.plan.RPCNode with streaming mode, argument metadata, and output schema composition from its source plus result column.
  • Extend InternalPlanVisitor and multiple optimizers/visitors (AddExchanges, AddLocalExchanges, LimitPushDown, PruneUnreferencedOutputs, PushdownSubfields, Property/StreamPropertyDerivations, PlanPrinter, BasePlanFragmenter, SplitSourceFactory, PhasedExecutionSchedule, LocalExecutionPlanner) with visitRPC handling, typically as a single-source passthrough.
  • Enforce that LocalExecutionPlanner rejects RPCNode (Velox-only execution) and that LIMIT pushdown and dependency validation treat RPCNode as 1:1 with its source.
presto-main-base/src/main/java/com/facebook/presto/sql/planner/plan/RPCNode.java
presto-main-base/src/main/java/com/facebook/presto/sql/planner/plan/InternalPlanVisitor.java
presto-main-base/src/main/java/com/facebook/presto/sql/planner/optimizations/AddExchanges.java
presto-main-base/src/main/java/com/facebook/presto/sql/planner/optimizations/AddLocalExchanges.java
presto-main-base/src/main/java/com/facebook/presto/sql/planner/optimizations/LimitPushDown.java
presto-main-base/src/main/java/com/facebook/presto/sql/planner/optimizations/PruneUnreferencedOutputs.java
presto-main-base/src/main/java/com/facebook/presto/sql/planner/optimizations/PropertyDerivations.java
presto-main-base/src/main/java/com/facebook/presto/sql/planner/optimizations/PushdownSubfields.java
presto-main-base/src/main/java/com/facebook/presto/sql/planner/optimizations/StreamPropertyDerivations.java
presto-main-base/src/main/java/com/facebook/presto/sql/planner/planPrinter/PlanPrinter.java
presto-main-base/src/main/java/com/facebook/presto/sql/planner/BasePlanFragmenter.java
presto-main-base/src/main/java/com/facebook/presto/sql/planner/LocalExecutionPlanner.java
presto-main-base/src/main/java/com/facebook/presto/sql/planner/SplitSourceFactory.java
presto-main-base/src/main/java/com/facebook/presto/execution/scheduler/PhasedExecutionSchedule.java
presto-main-base/src/main/java/com/facebook/presto/sql/planner/sanity/ValidateDependenciesChecker.java
Introduce RpcFunctionOptimizer in the Java planner to rewrite RPC function calls into RPCNode-based plans using dynamically discovered RPC function names.
  • Create RpcFunctionOptimizer that scans ProjectNode assignments, finds CallExpressions whose function name is in a supplied Set and rewrites them into a chain of RPCNode plus Projects, extracting arguments into columns and replacing calls with result symbols.
  • Handle nested RPC calls and wrapper expressions via RowExpressionTreeRewriter; keep original argument expressions for C++ type/constant extraction while building argumentColumns from pre-projected variables.
  • Parse RPC options JSON (arg[3]) in the optimizer to derive streamingMode and dispatchBatchSize defaults and wire RpcFunctionOptimizer into PlanOptimizers with a Guice-supplied rpcFunctionNames Supplier.
presto-main-base/src/main/java/com/facebook/presto/sql/planner/optimizations/RpcFunctionOptimizer.java
presto-main-base/src/main/java/com/facebook/presto/sql/planner/PlanOptimizers.java
Extend Presto protocol and Velox C++ plan conversion to support RPCNode, including streaming mode, argument metadata, and runtime option parsing.
  • Add protocol RPCNode struct and RPCNodeStreamingMode enum with JSON (de)serialization and register RPCNode as a PlanNode subtype in presto_protocol_core.{yml,h,cpp} and presto_protocol.yml.
  • Extend PrestoToVeloxQueryPlan with a toVeloxQueryPlan overload that converts protocol::RPCNode to core::RPCNode, validates the RPC function via AsyncRPCFunctionRegistry::isRegistered, parses the result type, builds argumentTypes/constantInputs, and constructs output row type including the RPC result column.
  • Implement parseRpcOptions helpers in C++ to decode Base64-encoded VARCHAR constant options blocks (handling ConstantExpression and CallExpression wrappers) to derive streamingMode and dispatchBatchSize as a secondary source of truth.
  • Wire the generic PlanNode converter to recognize protocol::RPCNode and delegate to the RPC-specific overload.
presto-native-execution/presto_cpp/main/types/PrestoToVeloxQueryPlan.cpp
presto-native-execution/presto_cpp/main/types/PrestoToVeloxQueryPlan.h
presto-native-execution/presto_cpp/presto_protocol/core/presto_protocol_core.yml
presto-native-execution/presto_cpp/presto_protocol/presto_protocol.yml
presto-native-execution/presto_cpp/presto_protocol/core/presto_protocol_core.h
presto-native-execution/presto_cpp/presto_protocol/core/presto_protocol_core.cpp
Enable dynamic discovery and propagation of RPC function metadata from Velox to the Java coordinator via sidecar function metadata.
  • Extend JsonBasedUdfFunctionMetadata in protocol and Java to carry an optional isRpcFunction boolean flag and update FunctionResource and unit tests to populate the new parameter.
  • Update C++ FunctionMetadata building code to compute isRpcFunction by checking AsyncRPCFunctionRegistry::isRegistered for scalar functions and embed this into the JSON returned by /v1/functions.
  • Modify NativeSidecarFunctionRegistryTool and WorkerFunctionRegistryTool to expose getRpcFunctionNames(), caching the fetched signature map and filtering for isRpcFunction=true (lowercasing names).
  • Provide a Guice binding in ServerMainModule that supplies a lazy Supplier<Set<String>> rpcFunctionNames (backed by the sidecar when built-in sidecar functions are enabled) for injection into PlanOptimizers/RpcFunctionOptimizer.
presto-native-execution/presto_cpp/presto_protocol/core/presto_protocol_core.h
presto-native-execution/presto_cpp/presto_protocol/core/presto_protocol_core.cpp
presto-function-namespace-managers-common/src/main/java/com/facebook/presto/functionNamespace/JsonBasedUdfFunctionMetadata.java
presto-function-server/src/main/java/com/facebook/presto/server/FunctionResource.java
presto-function-namespace-managers/src/test/java/com/facebook/presto/functionNamespace/TestRestBasedFunctionNamespaceManager.java
presto-native-execution/src/test/java/com/facebook/presto/nativeworker/TestPrestoNativeBuiltInFunctions.java
presto-native-execution/presto_cpp/main/functions/FunctionMetadata.cpp
presto-built-in-worker-function-tools/src/main/java/com/facebook/presto/builtin/tools/WorkerFunctionRegistryTool.java
presto-built-in-worker-function-tools/src/main/java/com/facebook/presto/builtin/tools/NativeSidecarFunctionRegistryTool.java
presto-main/src/main/java/com/facebook/presto/server/ServerMainModule.java
Register RPC execution support in the Velox-based Presto server and guard behavior via tests.
  • Register the RPC plan node translator and RPC function stubs in PrestoServer, so RPCOperator can be instantiated from RPCNode and the worker exposes RPC functions via AsyncRPCFunctionRegistry::registeredFunctions().
  • Add RPCPlanConverterTest exercising RPCNode plan conversion error and success paths, including function registration, argumentColumns/argumentTypes/constantInputs wiring, and error messages when the RPC function is missing.
  • Minor touch-ups in TaskResource to use the updated converter and support the new plan type.
presto-native-execution/presto_cpp/main/PrestoServer.cpp
presto-native-execution/presto_cpp/main/types/tests/RPCPlanConverterTest.cpp
presto-native-execution/presto_cpp/main/TaskResource.cpp
Ensure planner utilities handle RPCNode correctly in symbol canonicalization and dependency checks.
  • Update UnaliasSymbolReferences to rewrite RPCNode, canonicalizing RowExpression arguments and argumentColumns based on symbol mapping.
  • Extend ValidateDependenciesChecker to validate that RPCNode argument expressions only reference symbols produced by its source.
  • Adjust PushdownSubfields to track RPCNode’s source outputs and RPC result variable for subfield pushdown purposes.
presto-main-base/src/main/java/com/facebook/presto/sql/planner/optimizations/UnaliasSymbolReferences.java
presto-main-base/src/main/java/com/facebook/presto/sql/planner/sanity/ValidateDependenciesChecker.java
presto-main-base/src/main/java/com/facebook/presto/sql/planner/optimizations/PushdownSubfields.java
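
The file-level notes above say that LIMIT pushdown treats RPCNode as 1:1 with its source. One way to read that property is sketched below in Java on a toy plan model: if the RPC node emits exactly one output row per input row, a limit above it can be applied to its input instead. The records and the `pushDown` method are hypothetical, and whether LimitPushDown in this PR moves the limit in exactly this way is an assumption.

```java
final class LimitThroughRpcSketch {
    interface Node {}

    record Scan(String table) implements Node {}

    record Rpc(Node source, String function) implements Node {}

    record Limit(Node source, long count) implements Node {}

    // Moves a Limit below an Rpc node; all other shapes are rebuilt unchanged.
    static Node pushDown(Node node) {
        if (node instanceof Limit) {
            Limit limit = (Limit) node;
            if (limit.source() instanceof Rpc) {
                Rpc rpc = (Rpc) limit.source();
                // Assumes Rpc is 1:1: limiting its input rows limits its output rows equally.
                return new Rpc(pushDown(new Limit(rpc.source(), limit.count())), rpc.function());
            }
            return new Limit(pushDown(limit.source()), limit.count());
        }
        if (node instanceof Rpc) {
            Rpc rpc = (Rpc) node;
            return new Rpc(pushDown(rpc.source()), rpc.function());
        }
        return node;
    }

    public static void main(String[] args) {
        Node plan = new Limit(new Rpc(new Scan("reviews"), "llm_summarize"), 10);
        System.out.println(pushDown(plan));
    }
}
```

Under that assumption the limit reduces the number of RPC calls issued, which is the practical benefit of the 1:1 guarantee.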

sourcery-ai Bot (Contributor) left a comment

Hey - I've found 1 issue and left some high-level feedback:

  • In RpcFunctionOptimizer.parseOptionsJson, using optionsArg.getValue().toString() is fragile for VARCHAR constants (e.g., Slice.toString() won’t yield the JSON payload); consider decoding the constant according to its type (e.g., via VarcharType or Slice utilities) so streaming_mode and dispatch_batch_size are parsed reliably.
  • RPCNode’s Java constructor doesn’t enforce that arguments.size() matches argumentColumns.size(), which can easily drift during rewrites; adding a precondition check here would surface planner bugs earlier instead of producing invalid protocol plans.

## Individual Comments

### Comment 1
<location path="presto-main-base/src/main/java/com/facebook/presto/sql/planner/plan/RPCNode.java" line_range="88-89" />
<code_context>
+        super(sourceLocation, id, Optional.empty());
+        this.source = requireNonNull(source, "source is null");
+        this.functionName = requireNonNull(functionName, "functionName is null");
+        this.arguments = ImmutableList.copyOf(requireNonNull(arguments, "arguments is null"));
+        this.argumentColumns = ImmutableList.copyOf(requireNonNull(argumentColumns, "argumentColumns is null"));
+        this.outputVariable = requireNonNull(outputVariable, "outputVariable is null");
+        this.streamingMode = streamingMode != null ? streamingMode : StreamingMode.PER_ROW;
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Consider validating that arguments and argumentColumns have matching cardinality.

RPCNode relies on arguments[i] matching argumentColumns[i], but the constructor doesn’t currently guarantee equal list sizes. A mismatch would manifest as a runtime error on the native side. Please add a size check (e.g., checkArgument(arguments.size() == argumentColumns.size(), ...)) so inconsistencies fail fast at construction time.

Suggested implementation:

```java
        this.source = requireNonNull(source, "source is null");
        this.functionName = requireNonNull(functionName, "functionName is null");
        this.arguments = ImmutableList.copyOf(requireNonNull(arguments, "arguments is null"));
        this.argumentColumns = ImmutableList.copyOf(requireNonNull(argumentColumns, "argumentColumns is null"));
        checkArgument(
                this.arguments.size() == this.argumentColumns.size(),
                "arguments and argumentColumns must have the same size: arguments=%s, argumentColumns=%s",
                this.arguments.size(),
                this.argumentColumns.size());
        this.outputVariable = requireNonNull(outputVariable, "outputVariable is null");
        this.streamingMode = streamingMode != null ? streamingMode : StreamingMode.PER_ROW;

```

If `checkArgument` is not already imported in `RPCNode.java`, add:

- A static import at the top of the file (among other imports):
  `import static com.google.common.base.Preconditions.checkArgument;`

Adjust the import placement to match the existing import ordering conventions in the file.
</issue_to_address>

meta-codesync Bot changed the title from "feat(rpc): Add Java planner integration, protocol, and dynamic RPC function detection [8/8] (OSS)" to "feat(rpc): Add Java planner integration, protocol, and dynamic RPC function detection [8/8] (OSS) (#27358)" on Mar 17, 2026
zhichenxu-meta added a commit to zhichenxu-meta/presto that referenced this pull request Mar 17, 2026
…nction detection [8/8] (OSS) (prestodb#27358)

Summary:
Pull Request resolved: prestodb#27358

X-link: https://github.com/facebookexternal/presto-facebook/pull/3595

End-to-end integration connecting the Java planner to the C++ RPCOperator.

## Java Planner (presto-main-base)

- **RPCNode.java**: Plan node for async RPC operations.
- **RpcFunctionOptimizer.java**: Rewrites ProjectNode(rpc_function(...))
  into Source -> RPCNode -> ProjectNode. Uses Supplier<Set<String>> for
  lazy RPC function name resolution.
- Visitor integration across plan visitors.

## Dynamic RPC Function Detection

The coordinator discovers RPC functions from the sidecar without
hardcoded function name lists:
1. C++ FunctionMetadata checks isRegistered(), sets isRpcFunction=true
2. presto_protocol adds isRpcFunction field to serialization
3. Java NativeSidecarFunctionRegistryTool filters for isRpcFunction=true
4. Guice Supplier<Set<String>> defers sidecar call to first query planning

## C++ Plan Converter

- Converts protocol RPCNode -> core::RPCNode (name-based)
- Validates function exists via isRegistered() (no instantiation)
- Parses result type via typeParser_.parse()
- Function instantiation deferred to RPCOperator::initialize()

## Tests

- RPCPlanConverterTest: Protocol deserialization, plan conversion with
  name-based validation, error paths for missing functions.

## Reading Guide

1. **RpcFunctionOptimizer.java** — Start here. The core optimizer that
   rewrites ProjectNode(rpc_function(...)) into Source -> RPCNode ->
   ProjectNode. Key methods: optimize() walks the plan,
   rewriteProjectWithRpcFunction() extracts the RPC call, builds RPCNode
   with arguments and result type.

2. **RPCNode.java** — Java plan node. Constructor, getters, serialization
   fields. Maps to the C++ core::RPCNode from diff 2.

3. **PrestoToVeloxQueryPlan.cpp** — C++ plan converter. Converts
   protocol RPCNode -> Velox RPCNode. Validates function via
   isRegistered().

4. **presto_protocol_core.h/cpp/yml** — Protocol serde for RPCNode.
   Generated-style code; review the .yml for the schema, skim the
   .h/.cpp for correctness.

5. **RPCPlanConverterTest.cpp** — Unit tests for the C++ plan converter.
   Good for understanding the expected protocol format.

6. **FunctionMetadata.cpp** — Adds isRpcFunction field based on
   AsyncRPCFunctionRegistry::isRegistered().

7. **NativeSidecarFunctionRegistryTool.java** — Filters sidecar function
   list by isRpcFunction flag to build the RPC function name set.

8. **PlanOptimizers.java + ServerMainModule.java** — Wiring: registers
   RpcFunctionOptimizer with Guice Supplier for lazy RPC function
   discovery.

9. **Visitor touchpoints** (~15 Java files, 5-20 lines each) —
   Mechanical: adds RPCNode cases to existing plan visitors
   (AddExchanges, LimitPushDown, PruneUnreferencedOutputs, etc.).
   Safe to skim.

Differential Revision: D94996326

aditi-pandit (Contributor) commented

Thanks @zhichenxu-meta for this work. Do you have a small design doc or RFC for this feature?

zhichenxu-meta (Contributor, Author) commented

> Thanks @zhichenxu-meta for this work. Do you have a small design doc or RFC for this feature?

@aditi-pandit Thanks, I have a Meta-internal design doc and will create a public one.

zhichenxu-meta added a commit to zhichenxu-meta/presto that referenced this pull request Mar 17, 2026
…nction detection [8/8] (OSS) (prestodb#27358)

zhichenxu-meta added a commit to zhichenxu-meta/presto that referenced this pull request Mar 17, 2026
…nction detection [8/8] (OSS) (prestodb#27358)

zhichenxu-meta added a commit to zhichenxu-meta/presto that referenced this pull request Mar 17, 2026
…nction detection [8/8] (OSS) (prestodb#27358)

@zhichenxu-meta zhichenxu-meta force-pushed the export-D94996326 branch 2 times, most recently from e3984a9 to fcfdef2 Compare March 17, 2026 21:12
@meta-codesync meta-codesync Bot changed the title feat(rpc): Add Java planner integration, protocol, and dynamic RPC function detection [8/8] (OSS) (#27358) feat(native): Add Java planner integration, protocol, and dynamic RPC function detection [8/8] (OSS) Mar 17, 2026
zhichenxu-meta added a commit to zhichenxu-meta/presto that referenced this pull request Mar 18, 2026
… function detection [8/8] (OSS) (prestodb#27358)

@meta-codesync meta-codesync Bot changed the title feat(native): Add Java planner integration, protocol, and dynamic RPC function detection [8/8] (OSS) feat(native): Add Java planner integration, protocol, and dynamic RPC function detection [8/8] (OSS) (#27358) Mar 18, 2026
private final NodeManager nodeManager;
private final HttpClient httpClient;
private static final String FUNCTION_SIGNATURES_ENDPOINT = "/v1/functions";
private volatile UdfFunctionSignatureMap cachedSignatureMap;

Contributor

Why do we need a cachedSignatureMap for the RPC function? Why is it special compared with other functions? cc @kevintang2022, who knows this code better.

Contributor

It looks like it maintains existing behavior.

It's simple caching so that getNativeFunctionSignatureMap() is not called more than once. Otherwise it would be called twice: once in getWorkerFunctions and once in getRpcFunctionNames.
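
The caching described here can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `CachedFunctionCatalog` and its `fetcher` are hypothetical stand-ins for the registry tool and the sidecar HTTP call, the map is simplified to name -> isRpcFunction, and the double-checked locking is just one common way to populate a volatile field once.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Supplier;

// Hypothetical sketch; the fetcher stands in for the sidecar HTTP call.
final class CachedFunctionCatalog {
    private final Supplier<Map<String, Boolean>> fetcher; // function name -> isRpcFunction
    private volatile Map<String, Boolean> cachedSignatureMap;

    CachedFunctionCatalog(Supplier<Map<String, Boolean>> fetcher) {
        this.fetcher = fetcher;
    }

    // Fetches at most once; later callers reuse the cached map.
    private Map<String, Boolean> signatureMap() {
        Map<String, Boolean> result = cachedSignatureMap;
        if (result == null) {
            synchronized (this) {
                result = cachedSignatureMap;
                if (result == null) {
                    result = Collections.unmodifiableMap(fetcher.get());
                    cachedSignatureMap = result;
                }
            }
        }
        return result;
    }

    // First consumer of the shared map: every function name.
    Set<String> workerFunctionNames() {
        return new TreeSet<>(signatureMap().keySet());
    }

    // Second consumer of the shared map: only the names flagged as RPC.
    Set<String> rpcFunctionNames() {
        Set<String> names = new TreeSet<>();
        for (Map.Entry<String, Boolean> entry : signatureMap().entrySet()) {
            if (entry.getValue()) {
                names.add(entry.getKey());
            }
        }
        return names;
    }
}
```

Both views read the same cached map, so calling them in either order costs a single fetch.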

* Whether this function is an RPC function (dispatched via the async RPC framework).
* Set by the sidecar based on AsyncRPCFunctionRegistry.
*/
private final Optional<Boolean> isRpcFunction;

Contributor

Why add this only to the JsonBasedUdf metadata? Are we expecting that it will only be registered in the JSON function namespace manager?

Contributor

This JsonBasedUdf metadata stores info about the functions that are in the worker; each worker function is represented by one of these instances. It looks like this just adds an extra field to how a worker function is represented.

Contributor Author

Thanks Feilong for the question and Kevin for the answer/comment!

Contributor

> This JsonBasedUdf metadata is used to store info about the functions that are in worker,

So worker functions use JsonBasedUdf for registration in the coordinator? I didn't know this.
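
How an optional isRpcFunction flag on per-function metadata turns into a set of RPC function names can be sketched like this. `ToyFunctionMetadata` and `RpcFunctionNameFilter` are hypothetical stand-ins, and treating an absent flag as "not an RPC function" is an assumption of this sketch rather than something the thread states.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for one worker function's metadata entry.
final class ToyFunctionMetadata {
    final String name;
    final Optional<Boolean> isRpcFunction; // empty when the sidecar did not report the flag

    ToyFunctionMetadata(String name, Optional<Boolean> isRpcFunction) {
        this.name = name;
        this.isRpcFunction = isRpcFunction;
    }
}

final class RpcFunctionNameFilter {
    // Keeps only functions explicitly flagged as RPC.
    // Assumption: a missing flag is treated the same as false.
    static Set<String> rpcFunctionNames(List<ToyFunctionMetadata> functions) {
        Set<String> names = new TreeSet<>();
        for (ToyFunctionMetadata function : functions) {
            if (function.isRpcFunction.orElse(false)) {
                names.add(function.name);
            }
        }
        return names;
    }
}
```

Using Optional keeps metadata from sources that never set the flag readable without a schema change on their side.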

node.getStreamingMode(),
node.getDispatchBatchSize());

return new PlanWithProperties(rewrittenNode, source.getProperties());

Contributor

I'm not sure we want to pass source.getProperties() through untouched. At the least, the ordering and grouping properties should not be kept (they may not be used in AddExchanges, though; this needs a double check).

Contributor Author

@feilong-liu Thanks for the comment! Fixed. Changed to rebaseAndDeriveProperties(node, source) — same pattern as RowNumberNode. This properly derives output properties from the rewritten node.

@Override
public PlanWithProperties visitRPC(RPCNode node, StreamPreferredProperties parentPreferences)
{
return planAndEnforceChildren(node, parentPreferences.withDefaultParallelism(session), parentPreferences.withDefaultParallelism(session));

Contributor

Not sure if we want to enforce parentPreferences for this node

Contributor Author

@feilong-liu Added comment explaining the rationale: RPCNode benefits from multiple drivers for concurrent RPC dispatch. For constant-only queries, the C++ RPCPlanNodeTranslator::maxDrivers() forces single-driver to avoid ROUND_ROBIN distribution issues.

Contributor

I guess RPCNode does not need to satisfy the parent preferences; since it is only a preference, it is a nice-to-have property. Usually it is the parent node with a special requirement that decides to add an additional node to enforce a specific input distribution.
Since this is only performance related and does not affect correctness, I guess we can land the current code and improve it later if we see inefficient plans.

// Guard: if a new streaming mode is added that changes cardinality,
// this must be revisited.
RPCNode.StreamingMode mode = node.getStreamingMode();
checkState(

Contributor

Looks redundant to me; it has only two modes, after all.

Contributor Author

@feilong-liu Good catch! Fixed. Removed the checkState guard. Kept the rewrite logic with a comment that RPCNode preserves cardinality (1:1) in both modes.

* 2. The function name will be automatically discovered via the sidecar
*/
public class RpcFunctionOptimizer
implements PlanOptimizer

Contributor

Wonder if IterativeRule is better than PlanOptimizer here

Contributor Author

Good question. PlanOptimizer works well here because we need bottom-up RowExpressionTreeRewriter for nested RPC extraction (e.g., rpc(rpc(col))), which doesn't map cleanly to IterativeRule's top-down Pattern matching. Happy to migrate in a follow-up if you feel strongly.
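
The bottom-up extraction of nested RPC calls mentioned here can be sketched with a toy expression tree. This is an illustration only: `ToyExpr`, `NestedRpcExtractor`, and the `rpc_N` variable names are hypothetical and do not reflect RowExpressionTreeRewriter's actual API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical toy expression: a column reference (args == null) or a call.
final class ToyExpr {
    final String name;
    final List<ToyExpr> args;

    ToyExpr(String name, List<ToyExpr> args) {
        this.name = name;
        this.args = args;
    }

    static ToyExpr column(String name) {
        return new ToyExpr(name, null);
    }

    static ToyExpr call(String name, ToyExpr... args) {
        return new ToyExpr(name, Arrays.asList(args));
    }

    @Override
    public String toString() {
        if (args == null) {
            return name;
        }
        StringBuilder text = new StringBuilder(name).append('(');
        for (int i = 0; i < args.size(); i++) {
            if (i > 0) {
                text.append(", ");
            }
            text.append(args.get(i));
        }
        return text.append(')').toString();
    }
}

final class NestedRpcExtractor {
    private final Set<String> rpcFunctionNames;
    final List<String> steps = new ArrayList<>(); // extracted RPC calls, innermost first

    NestedRpcExtractor(Set<String> rpcFunctionNames) {
        this.rpcFunctionNames = rpcFunctionNames;
    }

    // Rewrites children before the parent, so an inner RPC call is already a
    // variable reference by the time the outer call is extracted.
    ToyExpr rewrite(ToyExpr expr) {
        if (expr.args == null) {
            return expr;
        }
        List<ToyExpr> rewrittenArgs = new ArrayList<>();
        for (ToyExpr arg : expr.args) {
            rewrittenArgs.add(rewrite(arg));
        }
        ToyExpr rewritten = new ToyExpr(expr.name, rewrittenArgs);
        if (!rpcFunctionNames.contains(expr.name)) {
            return rewritten;
        }
        String variable = "rpc_" + steps.size();
        steps.add(variable + " := " + rewritten);
        return ToyExpr.column(variable);
    }
}
```

For rpc(rpc(col)) the extractor records the inner call first and the outer call second, with the outer call reading the inner call's result variable. That ordering is what stacking one RPC step on top of another requires.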

* 1. Register the AsyncRPCFunction with signatures in the C++ worker
* 2. The function name will be automatically discovered via the sidecar
*/
public class RpcFunctionOptimizer

Contributor

Can you add a plan test for this optimizer? And include cases where there are nested rpc function calls (if supported)

Contributor Author

Great suggestion! Will add in a follow-up — need to set up a mock RPC function in the test query runner. The optimizer is already E2E tested with 23 SQL queries (including nested RPCs) on the verifier cluster.

* This node contains all the RPC metadata needed to create the operator:
* - functionName: The RPC function to call (e.g., "llm_inference")
* - arguments: The original argument expressions (used by C++ plan converter
* to extract types and constant values for AsyncRPCFunction::initialize())

Contributor

> to extract types

I assume the types can also be retrieved from argumentColumns?

Contributor Author

Updated comment. argumentColumns are just column name strings — they don't carry type information. The expression tree in arguments is needed for the C++ plan converter to extract types and constant values.
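
The difference between a list of column names and the full argument expressions can be shown with a toy model. This is a sketch under assumptions: `ToyArgument` and `ArgumentInspector` are hypothetical, and types are plain strings rather than Presto types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy argument expression; not Presto's RowExpression.
final class ToyArgument {
    final String type;      // e.g. "varchar", "double"
    final String column;    // non-null for a column reference
    final Object constant;  // non-null for a literal

    private ToyArgument(String type, String column, Object constant) {
        this.type = type;
        this.column = column;
        this.constant = constant;
    }

    static ToyArgument columnRef(String column, String type) {
        return new ToyArgument(type, column, null);
    }

    static ToyArgument literal(Object value, String type) {
        return new ToyArgument(type, null, value);
    }
}

final class ArgumentInspector {
    // What a list of column-name strings can offer: names only.
    static List<String> argumentColumns(List<ToyArgument> arguments) {
        List<String> names = new ArrayList<>();
        for (ToyArgument argument : arguments) {
            if (argument.column != null) {
                names.add(argument.column);
            }
        }
        return names;
    }

    // One type per argument, available only from the expressions.
    static List<String> types(List<ToyArgument> arguments) {
        List<String> types = new ArrayList<>();
        for (ToyArgument argument : arguments) {
            types.add(argument.type);
        }
        return types;
    }

    // Literal values, which have no column name at all.
    static List<Object> constants(List<ToyArgument> arguments) {
        List<Object> values = new ArrayList<>();
        for (ToyArgument argument : arguments) {
            if (argument.constant != null) {
                values.add(argument.constant);
            }
        }
        return values;
    }
}
```

A call with one column argument and one literal argument yields a single column name but two types and one constant, so the names alone would drop both the literal and every type.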

@Override
public PlanNode assignStatsEquivalentPlanNode(Optional<PlanNode> statsEquivalentPlanNode)
{
return this;

Contributor

This will lose the statsEquivalentPlanNode. Perhaps refer to how ProjectNode handles this function.

Contributor Author

@feilong-liu Good catch! Fixed. Added two-constructor pattern following ProjectNode: @JsonCreator delegates to full constructor with Optional.empty(), assignStatsEquivalentPlanNode creates new RPCNode with the stats node, and replaceChildren now preserves getStatsEquivalentPlanNode().
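
The two-constructor pattern described in this reply can be sketched with a self-contained toy node. `StatsAwareNode` is hypothetical and carries no Jackson annotations; the short constructor plays the role that the @JsonCreator constructor plays in the PR.

```java
import java.util.Optional;

// Hypothetical immutable plan node that carries an optional stats-equivalent node.
final class StatsAwareNode {
    private final String id;
    private final Optional<StatsAwareNode> source;
    private final Optional<StatsAwareNode> statsEquivalentPlanNode;

    // Deserialization-style constructor: no stats-equivalent node is known yet.
    StatsAwareNode(String id, Optional<StatsAwareNode> source) {
        this(id, source, Optional.empty());
    }

    // Full constructor used by every copy-producing method below.
    StatsAwareNode(String id, Optional<StatsAwareNode> source, Optional<StatsAwareNode> statsEquivalentPlanNode) {
        this.id = id;
        this.source = source;
        this.statsEquivalentPlanNode = statsEquivalentPlanNode;
    }

    Optional<StatsAwareNode> getSource() {
        return source;
    }

    Optional<StatsAwareNode> getStatsEquivalentPlanNode() {
        return statsEquivalentPlanNode;
    }

    // Returns a copy carrying the stats node, instead of returning `this` and dropping it.
    StatsAwareNode assignStatsEquivalentPlanNode(Optional<StatsAwareNode> statsNode) {
        return new StatsAwareNode(id, source, statsNode);
    }

    // Swaps the child while preserving the stats-equivalent node.
    StatsAwareNode replaceChildren(StatsAwareNode newSource) {
        return new StatsAwareNode(id, Optional.of(newSource), statsEquivalentPlanNode);
    }
}
```

Because every copy goes through the full constructor, the stats-equivalent node survives later child replacements rather than being reset to empty.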

schema,
protocol::FunctionKind::SCALAR,
*signature,
isRpcFunction)) {

Contributor

If there is a new isRpcFunction field in the function metadata, I wonder if we can tell directly from the metadata whether a function is an RPC function, instead of relying on a function name check.

Contributor Author

Both are needed for different purposes. The metadata field (isRpcFunction) is for Java-side sidecar filtering — the coordinator uses it to build the set of RPC function names for the optimizer. The C++ registry check (isRegistered()) validates the function is actually linked and registered in the binary. The metadata field alone can't guarantee the function exists in the C++ worker.

kevintang2022 previously approved these changes Mar 19, 2026

@kevintang2022 left a comment

Contributor

Approving the changes on the registry tools. Feilong will review the visitor changes.

@zhichenxu-meta

Contributor Author

> Approving the changes on the registry tools. Feilong will review the visitor changes.

Thanks!

feilong-liu previously approved these changes Mar 20, 2026

@feilong-liu left a comment

Contributor

LGTM except some nits.
I think we can improve it as we use it.

@zhichenxu-meta

Contributor Author

The CI failure on FunctionMetadata.cpp is expected — it includes velox/expression/rpc/AsyncRPCFunctionRegistry.h which was recently merged into Velox main (via PRs #16645, #16727, #16787, #16792, #16793). The Presto Velox submodule needs to be advanced to pick up these headers.

Could someone with write access bump the Velox submodule to latest main?

cd presto-native-execution/velox
git fetch origin
git checkout origin/main
cd ../..
git add presto-native-execution/velox

@steveburnett

Contributor

Thanks for the PR! Please add a release note - or NO RELEASE NOTE - following the Release Notes Guidelines.

gggrace14 previously approved these changes Mar 25, 2026
feilong-liu previously approved these changes Mar 25, 2026
@zhichenxu-meta

Contributor Author

@prestodb/release-notes

Release Notes

General Changes

  • Add RPCNode plan node support for async RPC execution in Prestissimo. RPCNode enables single-operator pipelined RPC calls with per-row and batch streaming modes, rate limiting, and backpressure.

… function detection [8/8] (OSS) (prestodb#27358)

Summary:
Pull Request resolved: prestodb#27358

X-link: https://github.com/facebookexternal/presto-facebook/pull/3595

End-to-end integration connecting the Java planner to the C++ RPCOperator.

## Java Planner (presto-main-base)

- **RPCNode.java**: Plan node for async RPC operations.
- **RpcFunctionOptimizer.java**: Rewrites ProjectNode(rpc_function(...))
  into Source -> RPCNode -> ProjectNode. Uses Supplier<Set<String>> for
  lazy RPC function name resolution.
- Visitor integration across plan visitors.

## Dynamic RPC Function Detection

The coordinator discovers RPC functions from the sidecar without
hardcoded function name lists:
1. C++ FunctionMetadata checks isRegistered(), sets isRpcFunction=true
2. presto_protocol adds isRpcFunction field to serialization
3. Java NativeSidecarFunctionRegistryTool filters for isRpcFunction=true
4. Guice Supplier<Set<String>> defers sidecar call to first query planning
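
Step 4 can be sketched as a memoizing supplier handed to the optimizer, so that nothing contacts the sidecar until planning first asks for the names. This is a minimal sketch: `LazyRpcNames` and `ToyRpcOptimizer` are hypothetical, the Guice wiring is omitted, and Guava's `Suppliers.memoize` would be a ready-made alternative to the hand-written class.

```java
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical memoizing wrapper: the delegate runs on first use only.
final class LazyRpcNames implements Supplier<Set<String>> {
    private final Supplier<Set<String>> delegate; // stand-in for the sidecar call
    private Set<String> value;                    // guarded by this

    LazyRpcNames(Supplier<Set<String>> delegate) {
        this.delegate = delegate;
    }

    @Override
    public synchronized Set<String> get() {
        if (value == null) {
            value = delegate.get();
        }
        return value;
    }
}

// Hypothetical optimizer shell: constructing it does not trigger discovery.
final class ToyRpcOptimizer {
    private final Supplier<Set<String>> rpcFunctionNames;

    ToyRpcOptimizer(Supplier<Set<String>> rpcFunctionNames) {
        this.rpcFunctionNames = rpcFunctionNames;
    }

    // First call resolves the name set; later calls reuse it.
    boolean isRpcCall(String functionName) {
        return rpcFunctionNames.get().contains(functionName);
    }
}
```

Deferring the lookup lets the coordinator build its optimizer list at startup even if the sidecar is not yet reachable at that moment.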

## C++ Plan Converter

- Converts protocol RPCNode -> core::RPCNode (name-based)
- Validates function exists via isRegistered() (no instantiation)
- Parses result type via typeParser_.parse()
- Function instantiation deferred to RPCOperator::initialize()

Reviewed By: gggrace14

Differential Revision: D94996326