feat(clp-s): Add support for duplicate columns in projections.#1217
Conversation
WalkthroughAdds a boolean flag to Projection to control whether duplicate columns are permitted; constructor updated to accept Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Caller
participant Projection
Caller->>Projection: add_column(column)
Projection->>Projection: find_if(existing column)
alt duplicate found
alt m_allow_duplicate_columns == false
Projection-->>Caller: throw BadParam (duplicate)
else m_allow_duplicate_columns == true
rect rgb(220,245,220)
Projection->>Projection: push column (allow duplicate)
end
Projection-->>Caller: return
end
else no duplicate
Projection->>Projection: push column
Projection-->>Caller: return
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. 📜 Recent review detailsConfiguration used: CodeRabbit UI 💡 Knowledge Base configuration:
You can enable these sources in your CodeRabbit configuration. 📒 Files selected for processing (1)
🧰 Additional context used📓 Path-based instructions (1)**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}⚙️ CodeRabbit Configuration File
Files:
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
🔇 Additional comments (2)
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 3
🔭 Outside diff range comments (1)
components/core/src/clp_s/search/Projection.hpp (1)
41-43: Update throws contract to reflect the new duplicate policyThe doc still states duplicates always throw. It should mention that throwing only happens when duplicates are disallowed.
Suggested tweak:
- * @throws OperationFailed if `column` is identical to a previously added column + * @throws OperationFailed if `column` is identical to a previously added column and duplicate columns are disallowed
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
components/core/src/clp_s/search/Projection.cpp(1 hunks)components/core/src/clp_s/search/Projection.hpp(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit Configuration File
- Prefer
false == <expression>rather than!<expression>.
Files:
components/core/src/clp_s/search/Projection.cppcomponents/core/src/clp_s/search/Projection.hpp
🧠 Learnings (1)
📚 Learning: 2025-05-02T22:27:59.347Z
Learnt from: LinZhihao-723
PR: y-scope/clp#873
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:148-157
Timestamp: 2025-05-02T22:27:59.347Z
Learning: In the `QueryHandlerImpl.cpp` file, the `unique_projected_columns` set (using `std::string_view`) is intentionally designed to only check for duplications within the local scope of the `create_projected_columns_and_projection_map` function. The team decided this is an acceptable use of `std::string_view` in a container since the referenced strings remain valid throughout the function's execution.
Applied to files:
components/core/src/clp_s/search/Projection.cpp
🧬 Code Graph Analysis (1)
components/core/src/clp_s/search/Projection.cpp (2)
components/core/src/clp_s/search/Projection.hpp (1)
column(44-44)components/core/src/clp_s/search/ast/ColumnDescriptor.cpp (2)
rhs(105-110)rhs(105-105)
🔇 Additional comments (2)
components/core/src/clp_s/search/Projection.cpp (2)
16-22: Duplicate-column gate is correct; behaviour preserved by defaultThe conditional correctly short-circuits the O(n) duplicate scan when duplicates are allowed, preserving existing behaviour when the flag is false. Nice adherence to the codebase style (false == expr).
16-22: Confirm uniqueness assumptions with allow_duplicate_columns=trueAllowing duplicates changes the contract: callers that previously relied on add_column() throwing on duplicates will now receive repeated entries. Please review all Projection usages to ensure downstream logic handles duplicates correctly.
Key areas to inspect:
- Constructors of Projection – are any callsites assuming a thrown error for duplicate columns?
- Calls to get_ordered_matching_nodes() – does the consumer expect only unique results?
- Iterations over m_selected_columns in consuming code – is any aggregation or lookup logic broken by repeats?
If any callers depend on the old “throw on duplicate” behaviour, either restore that guard where needed or update those callers to explicitly handle duplicate columns.
| explicit Projection(ProjectionMode mode, bool allow_duplicate_columns = false) | ||
| : m_projection_mode{mode}, | ||
| m_allow_duplicate_columns{allow_duplicate_columns} {} | ||
|
|
There was a problem hiding this comment.
🧹 Nitpick (assertive)
Consider documenting the new parameter in-class
A brief comment for allow_duplicate_columns will help downstream integrators (e.g., Velox–CLP) discover this capability without digging through PRs.
Happy to draft a short doxygen note for the constructor and class comment if you want it in this PR.
🤖 Prompt for AI Agents
In components/core/src/clp_s/search/Projection.hpp around lines 33 to 36, the
new constructor parameter allow_duplicate_columns is undocumented; add a concise
in-class Doxygen comment for the class or directly above the constructor that
describes the parameter purpose (what allowing duplicate columns means), its
default value (false), and any behavioral/compatibility notes so downstream
integrators can discover and use it without reading PR history.
🛠️ Refactor suggestion
Preserve ABI/API: re-introduce the legacy constructor overload and default-initialize the flag
Changing the constructor signature can break ABI for consumers linking against a shared library. Keep the old overload and delegate to the new one; also default-initialize the new member to be robust against future constructors.
Proposed diff:
- explicit Projection(ProjectionMode mode, bool allow_duplicate_columns = false)
- : m_projection_mode{mode},
- m_allow_duplicate_columns{allow_duplicate_columns} {}
+ // Preserve legacy API/ABI
+ explicit Projection(ProjectionMode mode)
+ : m_projection_mode{mode},
+ m_allow_duplicate_columns{false} {}
+
+ explicit Projection(ProjectionMode mode, bool allow_duplicate_columns)
+ : m_projection_mode{mode},
+ m_allow_duplicate_columns{allow_duplicate_columns} {}
@@
- bool m_allow_duplicate_columns;
+ bool m_allow_duplicate_columns{false};Also applies to: 91-91
🤖 Prompt for AI Agents
In components/core/src/clp_s/search/Projection.hpp around lines 33-36 (and also
apply the same change at line 91), the public API/ABI was changed by removing
the legacy constructor overload and by not default-initializing the new flag
member; reintroduce the old constructor overload that accepts only
ProjectionMode and delegate it to the new two-argument constructor, and
default-initialize the m_allow_duplicate_columns member in its declaration so
existing consumers and future constructors remain ABI-stable and robust.
gibber9809
left a comment
There was a problem hiding this comment.
Changes look good -- just a few nits.
| absl::flat_hash_set<int32_t> m_matching_nodes; | ||
| std::vector<std::vector<int32_t>> m_ordered_matching_nodes; | ||
| ProjectionMode m_projection_mode{ProjectionMode::ReturnAllColumns}; | ||
| bool m_allow_duplicate_columns; |
There was a problem hiding this comment.
| bool m_allow_duplicate_columns; | |
| bool m_allow_duplicate_columns{false}; |
| }; | ||
|
|
||
| explicit Projection(ProjectionMode mode) : m_projection_mode{mode} {} | ||
| explicit Projection(ProjectionMode mode, bool allow_duplicate_columns = false) |
There was a problem hiding this comment.
Probably need a brief docstring for this constructor.
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (2)
components/core/src/clp_s/search/Projection.hpp (2)
40-42: Potential ABI break: re-introduce the legacy 1-arg ctor and drop the default arg on the 2-arg ctorChanging the constructor’s signature (even with a default arg) alters the mangled symbol and can break consumers linking against a shared library compiled against the old ABI. Keep the legacy overload and a dedicated 2-arg overload.
Suggested change:
- explicit Projection(ProjectionMode mode, bool allow_duplicate_columns = false) - : m_projection_mode{mode}, - m_allow_duplicate_columns{allow_duplicate_columns} {} + // Preserve legacy API/ABI + explicit Projection(ProjectionMode mode) + : m_projection_mode{mode}, + m_allow_duplicate_columns{false} {} + + explicit Projection(ProjectionMode mode, bool allow_duplicate_columns) + : m_projection_mode{mode}, + m_allow_duplicate_columns{allow_duplicate_columns} {}If ABI stability isn’t a requirement (e.g., static-only or all downstreams recompile), feel free to disregard. Otherwise, this avoids an unnecessary breaking change.
To verify whether we build shared libraries in-tree:
#!/usr/bin/env bash # Scan CMake files for shared library targets which would be sensitive to ABI changes rg -nP -C2 '(add_library\s*\([^)]*SHARED)|(\bSHARED\b)' --type cmake
98-98: Good: default-initialize m_allow_duplicate_columnsThis aligns with robust member initialization and addresses earlier feedback.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
components/core/src/clp_s/search/Projection.cpp(1 hunks)components/core/src/clp_s/search/Projection.hpp(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit Configuration File
- Prefer
false == <expression>rather than!<expression>.
Files:
components/core/src/clp_s/search/Projection.cppcomponents/core/src/clp_s/search/Projection.hpp
🧠 Learnings (2)
📚 Learning: 2025-05-02T22:27:59.347Z
Learnt from: LinZhihao-723
PR: y-scope/clp#873
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:148-157
Timestamp: 2025-05-02T22:27:59.347Z
Learning: In the `QueryHandlerImpl.cpp` file, the `unique_projected_columns` set (using `std::string_view`) is intentionally designed to only check for duplications within the local scope of the `create_projected_columns_and_projection_map` function. The team decided this is an acceptable use of `std::string_view` in a container since the referenced strings remain valid throughout the function's execution.
Applied to files:
components/core/src/clp_s/search/Projection.cpp
📚 Learning: 2024-10-14T03:45:21.323Z
Learnt from: LinZhihao-723
PR: y-scope/clp#558
File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:331-335
Timestamp: 2024-10-14T03:45:21.323Z
Learning: When reviewing code, ensure that suggestions about unnecessary use of `std::move` with `std::shared_ptr` are only made if `std::move` is actually used in the code.
Applied to files:
components/core/src/clp_s/search/Projection.cpp
🧬 Code Graph Analysis (1)
components/core/src/clp_s/search/Projection.cpp (3)
components/core/src/clp_s/search/Projection.hpp (1)
column(51-51)components/core/src/clp_s/search/ast/ColumnDescriptor.cpp (2)
rhs(105-110)rhs(105-105)components/core/src/clp_s/search/ast/ColumnDescriptor.hpp (3)
rhs(63-65)rhs(63-63)rhs(277-277)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: build-macos (macos-15, false)
- GitHub Check: build-macos (macos-15, true)
- GitHub Check: lint-check (macos-15)
- GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (1)
components/core/src/clp_s/search/Projection.cpp (1)
16-22: Duplicate check gating + lambda capture: LGTM
- Correctly gates the duplicate check behind the flag (default false).
- Using [&column] avoids shared_ptr refcount churn and addresses the prior nit.
- Style aligns with our guideline (false == ...).
| if (false == m_allow_duplicate_columns | ||
| && m_selected_columns.end() | ||
| != std::find_if( | ||
| m_selected_columns.begin(), | ||
| m_selected_columns.end(), | ||
| [&column](auto const& rhs) -> bool { return *column == *rhs; } | ||
| )) |
There was a problem hiding this comment.
🧹 Nitpick (assertive)
Nit: prefer std::any_of for readability over find_if != end
This reads a bit clearer and avoids the end-iterator comparison.
- if (false == m_allow_duplicate_columns
- && m_selected_columns.end()
- != std::find_if(
- m_selected_columns.begin(),
- m_selected_columns.end(),
- [&column](auto const& rhs) -> bool { return *column == *rhs; }
- ))
+ if (false == m_allow_duplicate_columns
+ && std::any_of(
+ m_selected_columns.begin(),
+ m_selected_columns.end(),
+ [&column](auto const& rhs) -> bool { return *column == *rhs; }
+ ))📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (false == m_allow_duplicate_columns | |
| && m_selected_columns.end() | |
| != std::find_if( | |
| m_selected_columns.begin(), | |
| m_selected_columns.end(), | |
| [&column](auto const& rhs) -> bool { return *column == *rhs; } | |
| )) | |
| if (false == m_allow_duplicate_columns | |
| && std::any_of( | |
| m_selected_columns.begin(), | |
| m_selected_columns.end(), | |
| [&column](auto const& rhs) -> bool { return *column == *rhs; } | |
| )) |
🤖 Prompt for AI Agents
In components/core/src/clp_s/search/Projection.cpp around lines 16 to 22,
replace the current check that uses std::find_if and an end-iterator comparison
with std::any_of for clarity; keep the same operands and lambda predicate
(comparing *column == *rhs) and the surrounding logic (the false ==
m_allow_duplicate_columns && ... condition) so behavior is unchanged, i.e., call
std::any_of(m_selected_columns.begin(), m_selected_columns.end(), [&column](auto
const& rhs){ return *column == *rhs; }) in place of the find_if != end
comparison.
| /** | ||
| * Constructs a Projection object with the specified mode and column handling behavior. | ||
| * | ||
| * @param mode Projection mode that determines how expressions are projected. | ||
| * @param allow_duplicate_columns Whether duplicate column descriptiors are permitted in the | ||
| * projection. | ||
| */ | ||
| explicit Projection(ProjectionMode mode, bool allow_duplicate_columns = false) | ||
| : m_projection_mode{mode}, | ||
| m_allow_duplicate_columns{allow_duplicate_columns} {} | ||
|
|
There was a problem hiding this comment.
Fix Doxygen typo and clarify default; align add_column docs with new behaviour
- Typo: “descriptiors” → “descriptors”.
- Clarify that the default is false.
- Update add_column’s @throws to reflect that duplicates only error when allow_duplicate_columns is false.
Apply:
- * @param allow_duplicate_columns Whether duplicate column descriptiors are permitted in the
- * projection.
+ * @param allow_duplicate_columns Whether duplicate column descriptors are permitted in the
+ * projection (default: false).Additionally, please update the add_column docs to match the new behaviour (outside the selected range):
- * @throws OperationFailed if `column` is identical to a previously added column
+ * @throws OperationFailed if allow_duplicate_columns is false and `column` is identical to a
+ * previously added column📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| /** | |
| * Constructs a Projection object with the specified mode and column handling behavior. | |
| * | |
| * @param mode Projection mode that determines how expressions are projected. | |
| * @param allow_duplicate_columns Whether duplicate column descriptiors are permitted in the | |
| * projection. | |
| */ | |
| explicit Projection(ProjectionMode mode, bool allow_duplicate_columns = false) | |
| : m_projection_mode{mode}, | |
| m_allow_duplicate_columns{allow_duplicate_columns} {} | |
| /** | |
| * Constructs a Projection object with the specified mode and column handling behavior. | |
| * | |
| * @param mode Projection mode that determines how expressions are projected. | |
| * @param allow_duplicate_columns Whether duplicate column descriptors are permitted in the | |
| * projection (default: false). | |
| */ | |
| explicit Projection(ProjectionMode mode, bool allow_duplicate_columns = false) | |
| : m_projection_mode{mode}, | |
| m_allow_duplicate_columns{allow_duplicate_columns} {} |
🤖 Prompt for AI Agents
In components/core/src/clp_s/search/Projection.hpp around lines 33 to 43, fix
the Doxygen typo and clarify the default for allow_duplicate_columns: change
“descriptiors” to “descriptors” and state that the default is false
(allow_duplicate_columns = false). Also update the add_column method's Doxygen
(elsewhere in the file) so its @throws states that duplicate-column errors are
raised only when allow_duplicate_columns is false, and otherwise duplicates are
permitted; make the wording consistent with the constructor documentation.
| * Constructs a Projection object with the specified mode and column handling behavior. | ||
| * | ||
| * @param mode Projection mode that determines how expressions are projected. | ||
| * @param allow_duplicate_columns Whether duplicate column descriptiors are permitted in the |
There was a problem hiding this comment.
| * @param allow_duplicate_columns Whether duplicate column descriptiors are permitted in the | |
| * @param allow_duplicate_columns Whether duplicate column descriptors are permitted in the |
gibber9809
left a comment
There was a problem hiding this comment.
LGTM. For the PR title how about:
feat(clp-s): Add support for duplicate columns in projections.
Description
This change adds support for duplicate columns in Projection.
When using CLP as a library, results may not always be serialized into a JSON string. In these cases, enforcing uniqueness is unnecessary and, in some scenarios, duplicate columns are expected.
For example, in the Velox–CLP connector, accessing the same nested field through both a UDF and a row type may produce two projection fields. Even if one of them is unused, we still need to allow duplicates.
Since the default value remains false, this change does not affect existing CLP behavior.
Checklist
breaking change.
Validation performed
Summary by CodeRabbit
New Features
Bug Fixes
Notes