feat(clp-s::timestamp_parser): Add support for parsing CAT sequences; Add utility functions to provide a default set of timestamp patterns.#1626
Conversation
WalkthroughAdds fmt as a private build dependency; removes the FormatSpecifierNotImplemented enum and its message; extends timestamp parsing with Z/O/?/P specifiers, fractional and timezone handling, precision estimation, CatSequenceReplacement dynamic patterns; adds APIs to retrieve/search default timestamp patterns; updates tests accordingly. Changes
Sequence Diagram(s)sequenceDiagram
participant Test
participant Defaults as PatternDefaults
participant Parser as TimestampParser
participant Search as PatternSearch
Test->>Defaults: get_all_default_timestamp_patterns()
Defaults-->>Test: patterns[]
Test->>Parser: parse_timestamp(timestamp, pattern)
Note over Parser: handle specifiers Z/O/?/P<br/>accumulate CatSequenceReplacements<br/>estimate precision
Parser-->>Test: ParseResult { epoch, returned_pattern }
Test->>Search: search_known_timestamp_patterns(timestamp, patterns[], generated_pattern)
Note over Search: iterate patterns, attempt parse, apply replacements to produce generated_pattern
Search-->>Test: Result { epoch, matched_pattern }
Test->>Test: compare epochs & patterns (assert parity)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
components/core/src/clp_s/timestamp_parser/TimestampParser.hpp (1)
139-146: Update parse_timestamp error-code documentation to match implementation.The comment still lists
ErrorCodeEnum::FormatSpecifierNotImplemented, but the implementation no longer returns this code and the enum entry has been removed elsewhere in the PR. Please drop it from the documented error list so callers are not misled about possible failures.components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (1)
582-605: CAT replacement and generated_pattern handling look correct; drop stale [[maybe_unused]].The
CatSequenceReplacementtracking plus the reconstruction loop forgenerated_patterncorrectly handle ordered, non-overlapping substitutions for\Z,\?,\P, and\O, and the returnedstd::string_viewcleanly switches between the original pattern and the generated one based on whether replacements occurred.Now that
generated_patternis actively used whenever CAT sequences are present, the[[maybe_unused]]attribute on the parameter (Line 586) is outdated and should be removed to avoid confusion.Also applies to: 1021-1135, 1153-1167
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (7)
components/core/cmake/Options/options.cmake(1 hunks)components/core/src/clp_s/timestamp_parser/CMakeLists.txt(1 hunks)components/core/src/clp_s/timestamp_parser/ErrorCode.cpp(0 hunks)components/core/src/clp_s/timestamp_parser/ErrorCode.hpp(0 hunks)components/core/src/clp_s/timestamp_parser/TimestampParser.cpp(12 hunks)components/core/src/clp_s/timestamp_parser/TimestampParser.hpp(3 hunks)components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp(3 hunks)
💤 Files with no reviewable changes (2)
- components/core/src/clp_s/timestamp_parser/ErrorCode.hpp
- components/core/src/clp_s/timestamp_parser/ErrorCode.cpp
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>rather than!<expression>.
Files:
components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cppcomponents/core/src/clp_s/timestamp_parser/TimestampParser.hppcomponents/core/src/clp_s/timestamp_parser/TimestampParser.cpp
🧠 Learnings (9)
📚 Learning: 2025-08-09T04:07:27.083Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1156
File: components/core/CMakeLists.txt:772-772
Timestamp: 2025-08-09T04:07:27.083Z
Learning: In the CLP project's CMakeLists.txt, when reviewing changes related to the ${zstd_TARGET} variable usage in test linking, the team is planning a refactoring PR to improve this mechanism. Guards for undefined target variables should be deferred to that separate PR rather than being added in focused dependency migration PRs.
Applied to files:
components/core/cmake/Options/options.cmakecomponents/core/src/clp_s/timestamp_parser/CMakeLists.txt
📚 Learning: 2025-07-23T09:54:45.185Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Applied to files:
components/core/cmake/Options/options.cmakecomponents/core/src/clp_s/timestamp_parser/CMakeLists.txt
📚 Learning: 2025-06-27T01:49:15.724Z
Learnt from: sitaowang1998
Repo: y-scope/clp PR: 1044
File: .github/workflows/clp-core-build-macos.yaml:0-0
Timestamp: 2025-06-27T01:49:15.724Z
Learning: In the clp project, manually setting LDFLAGS for library paths can cause runtime errors because it interferes with CMake's built-in library discovery and linking mechanisms. CMake can identify and correctly link to libraries on its own without requiring manual LDFLAGS configuration.
Applied to files:
components/core/cmake/Options/options.cmake
📚 Learning: 2025-08-03T18:56:07.366Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1124
File: components/core/cmake/Modules/FindBZip2.cmake:49-65
Timestamp: 2025-08-03T18:56:07.366Z
Learning: In CLP's custom CMake Find modules, `FindStaticLibraryDependencies` populates the `*_DYNAMIC_LIBS` variable even when building with static libraries. When the main library is static, all dependencies must also use static libraries - there's no fallback to shared libraries for dependencies.
Applied to files:
components/core/cmake/Options/options.cmakecomponents/core/src/clp_s/timestamp_parser/CMakeLists.txt
📚 Learning: 2025-06-02T18:22:24.060Z
Learnt from: gibber9809
Repo: y-scope/clp PR: 955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
Applied to files:
components/core/cmake/Options/options.cmakecomponents/core/src/clp_s/timestamp_parser/CMakeLists.txt
📚 Learning: 2025-07-24T21:56:05.806Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1124
File: components/core/cmake/Modules/FindBZip2.cmake:60-60
Timestamp: 2025-07-24T21:56:05.806Z
Learning: In CLP's custom CMake Find modules (like FindBZip2.cmake), when building with dynamic libraries, the code intentionally uses `bzip2_PKGCONF_STATIC_LIBRARIES` as input to `FindDynamicLibraryDependencies`. This is a deliberate fallback mechanism to ensure all library dependencies are resolved even in dynamic builds, using the static library list from pkg-config as a source of dependency information.
Applied to files:
components/core/cmake/Options/options.cmake
📚 Learning: 2024-11-26T19:12:43.982Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 486
File: components/core/tests/test-error_handling.cpp:37-38
Timestamp: 2024-11-26T19:12:43.982Z
Learning: In the CLP project, C++20 is used, allowing the utilization of C++20 features such as class template argument deduction (CTAD) with `std::array`.
Applied to files:
components/core/src/clp_s/timestamp_parser/CMakeLists.txt
📚 Learning: 2025-01-30T19:26:33.869Z
Learnt from: davemarco
Repo: y-scope/clp PR: 700
File: components/core/src/clp/streaming_archive/ArchiveMetadata.hpp:153-155
Timestamp: 2025-01-30T19:26:33.869Z
Learning: When working with constexpr strings (string literals with static storage duration), std::string_view is the preferred choice for member variables as it's more efficient and safe, avoiding unnecessary memory allocations.
Applied to files:
components/core/src/clp_s/timestamp_parser/TimestampParser.hpp
📚 Learning: 2025-05-02T22:27:59.347Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 873
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:148-157
Timestamp: 2025-05-02T22:27:59.347Z
Learning: In the `QueryHandlerImpl.cpp` file, the `unique_projected_columns` set (using `std::string_view`) is intentionally designed to only check for duplications within the local scope of the `create_projected_columns_and_projection_map` function. The team decided this is an acceptable use of `std::string_view` in a container since the referenced strings remain valid throughout the function's execution.
Applied to files:
components/core/src/clp_s/timestamp_parser/TimestampParser.hpp
🧬 Code graph analysis (3)
components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp (2)
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (9)
get_default_date_time_timestamp_patterns(1229-1237)get_default_date_time_timestamp_patterns(1229-1230)get_default_numeric_timestamp_patterns(1239-1247)get_default_numeric_timestamp_patterns(1239-1240)get_all_default_timestamp_patterns(1249-1261)get_all_default_timestamp_patterns(1249-1250)get_all_default_quoted_timestamp_patterns(1263-1281)get_all_default_quoted_timestamp_patterns(1263-1264)search_known_timestamp_patterns(1215-1219)components/core/src/clp_s/timestamp_parser/TimestampParser.hpp (5)
get_default_date_time_timestamp_patterns(165-166)get_default_numeric_timestamp_patterns(173-174)get_all_default_timestamp_patterns(181-182)get_all_default_quoted_timestamp_patterns(189-190)search_known_timestamp_patterns(154-158)
components/core/src/clp_s/timestamp_parser/TimestampParser.hpp (1)
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (1)
nodiscard(1215-1227)
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (1)
components/core/src/clp_s/timestamp_parser/TimestampParser.hpp (6)
nodiscard(42-42)nodiscard(44-47)nodiscard(49-51)nodiscard(53-53)pattern(29-30)parse_timestamp(148-152)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: package-image
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: musllinux_1_2-x86_64-static-linked-bins
- GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: build-macos (macos-15, false)
- GitHub Check: build-macos (macos-15, true)
- GitHub Check: lint-check (macos-15)
- GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (4)
components/core/cmake/Options/options.cmake (1)
379-379: LGTM! Dependency declaration aligns with library usage.The addition of
CLP_NEED_FMTproperly declares the fmt library dependency for the timestamp parser target, synchronizing with the correspondingfmt::fmtlinkage added in the CMakeLists.txt.components/core/src/clp_s/timestamp_parser/CMakeLists.txt (1)
23-23: LGTM! Library linkage properly configured.The
fmt::fmtlibrary is correctly linked as a PRIVATE dependency, which is appropriate since formatting capabilities are internal implementation details of the timestamp parser.components/core/src/clp_s/timestamp_parser/TimestampParser.hpp (1)
154-190: New search/default-pattern APIs look coherent and useful.The
search_known_timestamp_patternsandget_*_timestamp_patternsdeclarations give a clear, composable surface for default pattern discovery and search, and they align with theResult<std::vector<TimestampPattern>>patterns used elsewhere. Signatures and[[nodiscard]]usage look appropriate.components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp (1)
321-334: Nice sanity checks for default pattern provider APIs.The new section that exercises
get_default_date_time_timestamp_patterns,get_default_numeric_timestamp_patterns,get_all_default_timestamp_patterns, andget_all_default_quoted_timestamp_patternsensures construction of default sets cannot silently start failing. This is a good lightweight guard for future pattern-table edits.
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
components/core/src/clp_s/timestamp_parser/TimestampParser.hpp(3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>rather than!<expression>.
Files:
components/core/src/clp_s/timestamp_parser/TimestampParser.hpp
🧠 Learnings (2)
📚 Learning: 2025-01-30T19:26:33.869Z
Learnt from: davemarco
Repo: y-scope/clp PR: 700
File: components/core/src/clp/streaming_archive/ArchiveMetadata.hpp:153-155
Timestamp: 2025-01-30T19:26:33.869Z
Learning: When working with constexpr strings (string literals with static storage duration), std::string_view is the preferred choice for member variables as it's more efficient and safe, avoiding unnecessary memory allocations.
Applied to files:
components/core/src/clp_s/timestamp_parser/TimestampParser.hpp
📚 Learning: 2025-05-02T22:27:59.347Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 873
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:148-157
Timestamp: 2025-05-02T22:27:59.347Z
Learning: In the `QueryHandlerImpl.cpp` file, the `unique_projected_columns` set (using `std::string_view`) is intentionally designed to only check for duplications within the local scope of the `create_projected_columns_and_projection_map` function. The team decided this is an acceptable use of `std::string_view` in a container since the referenced strings remain valid throughout the function's execution.
Applied to files:
components/core/src/clp_s/timestamp_parser/TimestampParser.hpp
🧬 Code graph analysis (1)
components/core/src/clp_s/timestamp_parser/TimestampParser.hpp (1)
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (1)
nodiscard(1215-1227)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: musllinux_1_2-x86_64-static-linked-bins
- GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
- GitHub Check: package-image
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
- GitHub Check: build-macos (macos-15, false)
- GitHub Check: build-macos (macos-15, true)
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: lint-check (macos-15)
🔇 Additional comments (3)
components/core/src/clp_s/timestamp_parser/TimestampParser.hpp (3)
9-9: Include of<vector>is appropriate and scoped correctlyThe added
<vector>include matches the new use ofstd::vector<TimestampPattern>in the declarations below and keeps dependencies explicit and minimal. No changes needed.
121-127: CAT sequence documentation for\O{...}is clear and consistentThe added bullet for
\O{...}fits well with the existing CAT sequence descriptions and clearly states that it represents “one of several literal characters” drawn from the braces. This should be sufficient for users of the pattern language, assuming the behaviour is mirrored in any external docs.
172-202: Default pattern getter APIs are coherent and well-documentedThe four getters (
get_default_date_time_timestamp_patterns,get_default_numeric_timestamp_patterns,get_all_default_timestamp_patterns,get_all_default_quoted_timestamp_patterns) form a clear and orthogonal API surface, and their return types (Result<std::vector<TimestampPattern>>) are consistent with the rest of the parser interface. The doc comments correctly state that they forwardTimestampPattern::createfailures, which will help debugging invalid built-in patterns if that ever arises.No functional or design issues from the header’s perspective.
LinZhihao-723
left a comment
There was a problem hiding this comment.
overall lgtm. Will do another quick pass once these comments are resolved.
| * - An `std::string_view` of the timestamp pattern that corresponds to the timestamp. | ||
| * - The lifetime of the `std::string_view` is the least of `patterns` and `generated_pattern`. | ||
| * - The possible error codes: | ||
| * - ErrorCodeEnum::IncompatibleTimestampPattern if no pattern can be used to parse the timestamp. |
There was a problem hiding this comment.
Conceptually it might make more sense to return an optional value, using std::nullopt to indicate nothing's found in the list.
Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
LinZhihao-723
left a comment
There was a problem hiding this comment.
Directly modified the PR title.
… Add utility functions to provide a default set of timestamp patterns. (y-scope#1626) Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
Description
This PR follows up on #1599, #1564, and #1385 by implementing support for parsing CAT sequences.
In particular this includes:
\Z- parse an arbitrary ISO8601 timezone and convert it into static text and a\z{...}format specifier.\P- parse an unknown precision epoch timestamp and convert it into\E,\L,\C, or\Nbased on a heuristic.\?- parse the sub-second fractional part of a timestamp and convert it into\3,\6,\9, or\Tdepending on the number of digits.\O{...}- match one of the characters between{}and convert it into the matched literal character.We use these CAT sequences, in combination with the previously implemented format specifiers, to create several utility functions that provide a default set of timestamp patterns. This default set of patterns is a superset of the default set of patterns specified in the old implementation.
Checklist
breaking change.
Validation performed
Summary by CodeRabbit
New Features
Bug Fixes
Chores
Tests
✏️ Tip: You can customize this high-level summary in your review settings.