Skip to content

refactor(clp-s): Divide clp-s search AST code so that it can easily be turned into a library.#754

Merged
gibber9809 merged 12 commits into
y-scope:mainfrom
gibber9809:ast-lib
Mar 26, 2025
Merged

refactor(clp-s): Divide clp-s search AST code so that it can easily be turned into a library.#754
gibber9809 merged 12 commits into
y-scope:mainfrom
gibber9809:ast-lib

Conversation

@gibber9809

@gibber9809 gibber9809 commented Mar 10, 2025

Copy link
Copy Markdown
Contributor

Description

This PR modifies the clp-s codebase so that the build can be split into a "clp_s::TimestampPattern" library, a "clp_s::search::ast" library, and the core clp binary.

To clean up this split we move around some code to eliminate circular dependencies and delete some duplicated/unnecessary code in the process.

These changes include:

  • Moving tokenize_column_descriptor, unescape_kql_value, and related utilities from StringUtils:: to SearchUtils (which is part of the core ast build)
  • Deleting the wildcard_match implementation in SearchUtils -- it is a duplicate of wildcard_match_unsafe from StringUtils that doesn't handle case sensitivity (remaining uses of wildcard_match have been replaced with the StringUtils implementation)
  • Moving node_to_literal_type from SearchUtils.hpp to SchemaTree.hpp

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • Verified that unit tests continue to pass
  • Manually verified that refactors allow case insensitive wildcard search inside of unstructured arrays to work correctly

Summary by CodeRabbit

Summary by CodeRabbit

  • New Features

    • Introduced enhanced modules for search and timestamp pattern processing that improve query string tokenization and handling.
    • Added functions for converting four-byte hexadecimal sequences to UTF-8 and unescaping KQL values.
    • Implemented a new function to map node types to literal types.
  • Refactor

    • Consolidated and restructured search utilities and string processing functions to streamline query operations.
    • Updated function calls to utilize a new namespace for improved organization.
  • Chores

    • Updated build configurations and dependency management by removing deprecated components and adding new library linkages.
    • Expanded the set of source files compiled for the indexer executable.

@coderabbitai

coderabbitai Bot commented Mar 10, 2025

Copy link
Copy Markdown
Contributor

Walkthrough

This pull request reorganizes and refactors several utility functions and header dependencies across multiple components. The primary changes include shifting the tokenization functionality from the StringUtils class to the clp_s::search namespace, adding a new function to convert node types to literal types, and updating TimestampPattern initialization. Several legacy functions related to KQL unescaping, tokenization, and wildcard matching have been removed, while new implementations have been provided in the SearchUtils module. Build configuration files have also been updated to reflect new library dependencies.

Changes

Files Group Change Summary
components/core/src/clp_s/JsonParser.cpp, components/core/src/clp_s/TimestampDictionaryReader.cpp, components/core/src/clp_s/clp-s.cpp, components/core/src/clp_s/search/kql/kql.cpp Replaced calls from StringUtils::tokenize_column_descriptor to clp_s::search::tokenize_column_descriptor and updated include directives to reference the new SearchUtils header.
components/core/src/clp_s/SchemaTree.cpp, components/core/src/clp_s/SchemaTree.hpp Added a new function node_to_literal_type (with declaration and definition) that converts a NodeType to a LiteralType, along with the inclusion of the appropriate literal header.
components/core/src/clp_s/TimestampPattern.cpp, components/core/src/clp_s/TimestampPattern.hpp Added a guard in TimestampPattern::init() to prevent reinitialization of the static member and updated documentation/comments; removed an unnecessary include (FileWriter.hpp).
components/core/src/clp_s/Utils.cpp, components/core/src/clp_s/Utils.hpp Removed several functions from the StringUtils class: tokenization, unescaping of KQL values, and hexadecimal conversion functions, effectively deprecating the legacy KQL string processing logic.
components/core/src/clp_s/search/Output.cpp, components/core/src/clp_s/search/Projection.cpp, components/core/src/clp_s/search/SchemaMatch.cpp Updated include directives by removing SearchUtils.hpp and adding SchemaTree.hpp; modified wildcard matching call in Output.cpp to use StringUtils::wildcard_match_unsafe with an added case sensitivity parameter.
components/core/src/clp_s/search/SearchUtils.cpp, components/core/src/clp_s/search/SearchUtils.hpp Introduced new functions: tokenize_column_descriptor, unescape_kql_value, unescape_kql_internal, and convert_four_byte_hex_to_utf8 in an unnamed namespace; removed the old node_to_literal_type and wildcard_match functions, thus centralizing KQL processing under the clp_s::search namespace.
components/core/src/clp_s/search/kql/CMakeLists.txt Updated linked libraries: removed dependency on Boost::filesystem and included spdlog::spdlog alongside antlr4_static to align with the new dependency structure.
components/core/src/clp_s/search/sql/CMakeLists.txt Modified the target linkage for the SQL library by removing Boost::filesystem and linking with antlr4_static and spdlog::spdlog, reflecting a realignment of dependencies.
components/core/src/clp_s/indexer/CMakeLists.txt Added two new source files: ../search/SearchUtils.cpp and ../search/SearchUtils.hpp to the INDEXER_SOURCES list.

Sequence Diagram(s)

sequenceDiagram
    participant Caller as Caller (JsonParser/TimestampDictionaryReader/clp-s)
    participant SU as clp_s::search::SearchUtils
    Note over Caller, SU: Tokenization Flow
    Caller->>SU: tokenize_column_descriptor(descriptor, tokens, ns)
    SU-->>Caller: returns tokens (after unescaping via unescape_kql_internal)
Loading
sequenceDiagram
    participant CP as Caller (TimestampPattern::init)
    participant TP as TimestampPattern
    Note over TP: Prevent Reinitialization
    CP->>TP: init()
    alt m_known_ts_patterns is not nullptr
        TP-->>CP: Return early
    else
        TP->>TP: Initialize m_known_ts_patterns
        TP-->>CP: Return after initialization
    end
Loading

Possibly related PRs


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ec1e448 and c1afc97.

📒 Files selected for processing (1)
  • components/core/src/clp_s/TimestampPattern.hpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • components/core/src/clp_s/TimestampPattern.hpp
⏰ Context from checks skipped due to timeout of 90000ms (7)
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: lint-check (macos-latest)
  • GitHub Check: lint-check (ubuntu-latest)

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@gibber9809 gibber9809 changed the title chore(clp-s): Split clp-s search AST into its own library build for easier re-use. refactor(clp-s): Split clp-s search AST into its own library build for easier re-use. Mar 10, 2025
@gibber9809 gibber9809 marked this pull request as ready for review March 10, 2025 21:27
@gibber9809 gibber9809 requested review from a team and wraymo as code owners March 10, 2025 21:27

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
components/core/src/clp_s/TimestampPattern.hpp (1)

73-76: Improved documentation with thread-safety warning.

The updated documentation more clearly describes what the init() function does and importantly adds a thread-safety warning.

Consider adding thread-safety to this initialization method in a future update if this is expected to be used in multi-threaded contexts.

components/core/src/clp_s/search/SearchUtils.cpp (2)

63-130: Consider providing clearer failure diagnostics within tokenize_column_descriptor.
While returning false for error conditions is straightforward, you might improve maintainability by logging or returning a specific error code to indicate the exact issue (i.e., empty token, incomplete escape, etc.).


165-266: Refactor large switch statement in unescape_kql_internal.
Though the logic is correct, the multiple case statements can be simplified or stored in a lookup table to improve readability and reduce repetition.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 110b5a0 and 8c103a8.

📒 Files selected for processing (20)
  • components/core/CMakeLists.txt (1 hunks)
  • components/core/src/clp_s/CMakeLists.txt (5 hunks)
  • components/core/src/clp_s/JsonParser.cpp (2 hunks)
  • components/core/src/clp_s/SchemaTree.cpp (1 hunks)
  • components/core/src/clp_s/SchemaTree.hpp (2 hunks)
  • components/core/src/clp_s/TimestampDictionaryReader.cpp (2 hunks)
  • components/core/src/clp_s/TimestampPattern.cpp (1 hunks)
  • components/core/src/clp_s/TimestampPattern.hpp (1 hunks)
  • components/core/src/clp_s/Utils.cpp (0 hunks)
  • components/core/src/clp_s/Utils.hpp (0 hunks)
  • components/core/src/clp_s/clp-s.cpp (2 hunks)
  • components/core/src/clp_s/indexer/CMakeLists.txt (1 hunks)
  • components/core/src/clp_s/search/Output.cpp (4 hunks)
  • components/core/src/clp_s/search/Projection.cpp (1 hunks)
  • components/core/src/clp_s/search/SchemaMatch.cpp (1 hunks)
  • components/core/src/clp_s/search/SearchUtils.cpp (2 hunks)
  • components/core/src/clp_s/search/SearchUtils.hpp (2 hunks)
  • components/core/src/clp_s/search/kql/CMakeLists.txt (1 hunks)
  • components/core/src/clp_s/search/kql/kql.cpp (5 hunks)
  • components/core/src/clp_s/search/sql/CMakeLists.txt (1 hunks)
💤 Files with no reviewable changes (2)
  • components/core/src/clp_s/Utils.cpp
  • components/core/src/clp_s/Utils.hpp
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ...

**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

  • components/core/src/clp_s/search/Projection.cpp
  • components/core/src/clp_s/SchemaTree.cpp
  • components/core/src/clp_s/TimestampDictionaryReader.cpp
  • components/core/src/clp_s/TimestampPattern.cpp
  • components/core/src/clp_s/search/SchemaMatch.cpp
  • components/core/src/clp_s/SchemaTree.hpp
  • components/core/src/clp_s/clp-s.cpp
  • components/core/src/clp_s/TimestampPattern.hpp
  • components/core/src/clp_s/search/Output.cpp
  • components/core/src/clp_s/JsonParser.cpp
  • components/core/src/clp_s/search/kql/kql.cpp
  • components/core/src/clp_s/search/SearchUtils.cpp
  • components/core/src/clp_s/search/SearchUtils.hpp
🔇 Additional comments (41)
components/core/src/clp_s/TimestampPattern.cpp (1)

211-214: Properly guards against re-initialization

Good addition of a check to prevent unnecessary re-initialization of static data. This avoids redundant work if the function is called multiple times, which is especially important given the amount of pattern data being initialized.

components/core/src/clp_s/indexer/CMakeLists.txt (1)

88-89: Appropriate library dependencies added

The addition of the new clp_s::search::ast and clp_s::TimestampPattern libraries properly reflects the structural changes in the PR. This ensures the indexer correctly links against the newly separated components.

components/core/src/clp_s/search/Projection.cpp (1)

5-5: Updated include for relocated functionality

The include path change from "SearchUtils.hpp" to "../SchemaTree.hpp" correctly reflects the relocation of the node_to_literal_type function that's used on line 78. This change maintains the proper dependencies after refactoring.

components/core/src/clp_s/search/SchemaMatch.cpp (1)

8-8: Updated include for relocated functionality

The include path change from "SearchUtils.hpp" to "../SchemaTree.hpp" correctly reflects the relocation of the node_to_literal_type function that's used in multiple places throughout this file (lines 77, 315, 428, and 528). This change aligns with the PR objective of eliminating circular dependencies.

components/core/src/clp_s/SchemaTree.cpp (1)

8-33: Good implementation of node type to literal type conversion.

The function properly maps each NodeType to the corresponding clp_s::search::LiteralType with a comprehensive switch statement, and includes a catchall with UnknownT for unhandled cases.

I noticed there's a TODO comment about supporting NodeType::Object in the future. This is good documentation for future work.

components/core/src/clp_s/SchemaTree.hpp (2)

15-15: LGTM! Appropriate inclusion of required header.

The added include for "search/Literal.hpp" properly brings in the necessary LiteralType definition.


46-51: Well-documented function declaration.

The function declaration is clear and the documentation properly explains its purpose and return value.

components/core/src/clp_s/clp-s.cpp (2)

35-35: LGTM! Appropriate inclusion of required header.

The added include for "search/SearchUtils.hpp" properly supports the namespace change for tokenization functions.


207-209: Namespace update correctly implemented.

The change from StringUtils::tokenize_column_descriptor to clp_s::search::tokenize_column_descriptor aligns with the PR objective of relocating string processing utilities from StringUtils to the specialized SearchUtils in the clp_s::search namespace.

components/core/src/clp_s/TimestampDictionaryReader.cpp (2)

5-5: Include change matches the refactoring objective.

The change of the include from Utils.hpp to search/SearchUtils.hpp aligns with the PR's goal of reorganizing utility functions into more specific namespaces.


24-26: Function call updated to use new namespace.

The call has been properly updated from StringUtils::tokenize_column_descriptor to clp_s::search::tokenize_column_descriptor, maintaining the same functionality while using the newly organized code structure.

components/core/src/clp_s/JsonParser.cpp (2)

32-32: Added include for reorganized utility functions.

This include addition properly supports the refactoring goal of moving search-related utility functions into their own namespace.


94-96: Function call updated to use new namespace.

The call to tokenize_column_descriptor has been correctly updated to use the function from the clp_s::search namespace instead of StringUtils, consistent with the restructuring objectives.

components/core/src/clp_s/search/kql/CMakeLists.txt (1)

17-24: CMake configuration updated to reflect new library dependencies.

The target_link_libraries command has been properly updated to include the new dependencies (clp::string_utils, clp_s::search::ast, and spdlog::spdlog) while removing unnecessary dependencies. This change aligns with the PR's objective to split the codebase into more modular components.

components/core/src/clp_s/search/Output.cpp (4)

7-7: Include changed to reflect dependency restructuring.

The addition of SchemaTree.hpp and removal of SearchUtils.hpp reflects the refactoring of code to eliminate circular dependencies. This change supports the PR's objective of better code organization.


595-599: Wildcard matching function updated with case sensitivity parameter.

The call to wildcard_match has been replaced with StringUtils::wildcard_match_unsafe, which includes an explicit case sensitivity parameter. This preserves the original functionality while using the implementation from the correct namespace.


735-739: Wildcard matching function updated with case sensitivity parameter.

Similar to the previous instance, this call has been updated to use StringUtils::wildcard_match_unsafe with the appropriate case sensitivity parameter, ensuring consistent behavior with the original implementation.


809-814: Wildcard matching function updated with case sensitivity parameter.

This is the third instance where wildcard_match has been replaced with StringUtils::wildcard_match_unsafe, properly maintaining case sensitivity behavior. The change is consistent with the other instances and the PR's objectives.

components/core/CMakeLists.txt (1)

653-654: Appropriate library additions for the refactored architecture

The addition of clp_s::search::ast and clp_s::TimestampPattern libraries to unitTest target links correctly implements the architectural restructuring described in the PR objectives. This helps eliminate circular dependencies while enabling the code to be more modular and easier to reuse.

components/core/src/clp_s/search/sql/CMakeLists.txt (1)

17-23: Updated dependencies align with the new architecture

The changes to the sql library dependencies properly reflect the restructured code base, now linking to clp_s::search::ast rather than directly to the individual expression classes. This modification correctly implements the modular approach described in the PR objectives.

components/core/src/clp_s/search/kql/kql.cpp (6)

8-8: Added appropriate include for string utilities

Including the string_utils header correctly prepares for the usage of clp::string_utils::clean_up_wildcard_search_string later in the file.


21-21: SearchUtils inclusion supports refactored function calls

This include correctly brings in the declarations for clp_s::search::tokenize_column_descriptor and clp_s::search::unescape_kql_value that were moved from StringUtils.


77-77: Updated function call using refactored namespace

Correctly replaced StringUtils::unescape_kql_value with clp_s::search::unescape_kql_value, aligning with the PR's objective to move these utilities to a more appropriate namespace.


89-89: Updated wildcard search string cleaning function call

The change to use clp::string_utils::clean_up_wildcard_search_string instead of StringUtils version is consistent with the refactoring goals and maintains proper namespace organization.


95-95: Consistent namespace update for unescaping function

This change to use clp_s::search::unescape_kql_value maintains consistency with the earlier change at line 77 and follows the PR objective to move these utility functions.


114-119: Updated tokenize function call using appropriate namespace

The replacement of StringUtils::tokenize_column_descriptor with clp_s::search::tokenize_column_descriptor properly implements the function relocations mentioned in the PR objectives.

components/core/src/clp_s/search/SearchUtils.hpp (3)

4-6: Added standard library includes

The addition of standard includes for string and vector is appropriate as these types are now used directly in the function signatures rather than being included through SchemaTree.hpp.


35-46: Well-documented tokenize_column_descriptor function declaration

This function declaration properly includes the [[nodiscard]] attribute and thorough documentation explaining the purpose and behavior of the function that was relocated from StringUtils.


48-60: Well-documented unescape_kql_value function declaration

The newly added function declaration includes clear documentation about the KQL value unescaping process and properly uses the [[nodiscard]] attribute to ensure return values are checked.

components/core/src/clp_s/search/SearchUtils.cpp (4)

4-5: Include directives are appropriate.
These new headers are suitable for string and vector operations in this file.


7-7: Confirm simdjson linkage.
The addition of <simdjson.h> is consistent with usage in this file. Ensure that the CMake configuration correctly references this external dependency.


132-134: Simple wrapper function looks good.
unescape_kql_value is straightforward and provides a helpful abstraction for unescaping KQL values.


137-163: Verify surrogate pair handling in convert_four_byte_hex_to_utf8.
This approach successfully converts four-byte hex sequences within the Basic Multilingual Plane. However, it may not handle characters beyond 0xFFFF (requiring surrogate pairs). If such codepoints must be supported, consider expanding the logic to handle combined surrogate pairs.

components/core/src/clp_s/CMakeLists.txt (8)

86-92: New timestamp pattern sources grouping is clear.
Defining CLP_S_TIMESTAMP_PATTERN_SOURCES clarifies which files belong to the timestamp pattern library and avoids mixing them with other sources.


94-95: Separate definition of CLP_S_SOURCES.
Pulling out timestamp pattern files from CLP_S_SOURCES helps reduce clutter and enables more granular builds.


167-168: Introducing CLP_S_CORE_AST_SOURCES.
This set appears to isolate core AST functionalities. Confirm that other AST-related files not explicitly listed here do not become orphaned.


199-205: Consolidated search utility files.
Including SearchUtils.cpp and SearchUtils.hpp under the core AST sources is consistent with the refactoring. Be sure to verify that no circular dependencies remain.


207-217: Creating CLP_S_SEARCH_SOURCES fosters separation of concerns.
Grouping these search logic files distinctly makes the codebase structure more intuitive and easier to maintain.


252-265: New library clp_s_TimestampPattern introduced.
Splitting out TimestampPattern into its own target helps ensure modularity. This change is aligned with the goal of avoiding circular dependencies.


266-279: clp_s_search_ast library definition.
Creating a dedicated library for AST logic clarifies dependencies and reduces coupling. Ensure any future AST-related additions are also placed here.


296-297: Linking the new libraries to clp-s.
Adding clp_s::search::ast and clp_s::TimestampPattern improves clarity. This sets the stage for simpler reuse in other targets or projects.

@Bill-hbrhbr

Copy link
Copy Markdown
Contributor

At a first glance this PR looks a bit big, could at least separate library encapsulation and implementation detail changes into 2 separate PRs.

Also I'm wondering if you think removing StringUtils in clp-s and add the missing components to clp::string_utils is a good idea.

This reverts commit edab948.
…p-s ast code into their own libraries"

This reverts commit 5c61eca.
@gibber9809

Copy link
Copy Markdown
Contributor Author

At a first glance this PR looks a bit big, could at least separate library encapsulation and implementation detail changes into 2 separate PRs.

Also I'm wondering if you think removing StringUtils in clp-s and add the missing components to clp::string_utils is a good idea.

OK -- will revert CMakeLists changes from this PR and leave just the code changes in this PR.

And yeah, removing clp_s::StringUtils and switching over to clp::string_utils for most stuff is the plan eventually.

@gibber9809 gibber9809 changed the title refactor(clp-s): Split clp-s search AST into its own library build for easier re-use. refactor(clp-s): Divide clp-s search AST code so that it can easily be turned into a library. Mar 15, 2025

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
components/core/src/clp_s/indexer/CMakeLists.txt (1)

58-59: Verify New Source File Entries for SearchUtils Library

The additions of ../search/SearchUtils.cpp and ../search/SearchUtils.hpp correctly incorporate the new SearchUtils module aimed at improving code re-use. However, note that previous discussions indicated that CMakeLists changes were to be reverted in this PR, with only code changes retained. Kindly confirm whether these new entries should remain or be removed for consistency with the plan.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 50ed98a and 537ad0e.

📒 Files selected for processing (1)
  • components/core/src/clp_s/indexer/CMakeLists.txt (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: lint-check (ubuntu-latest)

@wraymo wraymo left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the refactoring! The code is more modular now.

Comment thread components/core/src/clp_s/TimestampPattern.hpp Outdated
return ret;
} else {
return StringLiteral::create(StringUtils::clean_up_wildcard_search_string(token));
return StringLiteral::create(clp::string_utils::clean_up_wildcard_search_string(token));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we going to remove duplicated code in StringUtils and use clp::string_utils instead?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general yes, but I think that that should probably happen as its own PR. The only reason I make this change here is because I want kql to be independent of clp_s/Utils.

* @param s the string to match
* @param p the pattern to match against
* @return true if s matches p, false otherwise
* Converts a KQL string column descriptor delimited by '.' into a list of tokens. The

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it's better to put those functions in a class (maybe SearchUtils) like StringUtils?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could do it that way if you prefer, I don't have a strong preference.

Could also put it into its own namespace clp_s::search::search_utils or put all of the AST stuff into clp_s::search::ast or something.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with either approach. I personally prefer the namespace option, but it looks like the CLP codebase derives namespaces from directory names. I'm not sure if adding a custom namespace aligns with the existing convention.

Co-authored-by: wraymo <37269683+wraymo@users.noreply.github.com>
@gibber9809 gibber9809 requested a review from wraymo March 24, 2025 14:09
junhaoliao pushed a commit to junhaoliao/clp that referenced this pull request May 17, 2026
…e turned into a library. (y-scope#754)

Co-authored-by: wraymo <37269683+wraymo@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants