feat(clp-s::timestamp_parser): Add support for parsing date-time timestamps with timezones; Add TimestampPattern class to pre-validate timestamp patterns.#1599
Conversation
…mestamps with timezones.
WalkthroughAdds TimestampPattern with a factory that extracts optional timezone metadata and twelve-hour flags, introduces bracket/timezone parsing helpers, adds ErrorCodeEnum::InvalidTimezoneOffset and adjusts error message literals, refactors parse_timestamp to accept TimestampPattern and apply timezone offsets, and updates tests to use the pattern factory. Changes
Sequence Diagram(s)sequenceDiagram
participant Test
participant Factory as TimestampPattern::create()
participant BracketExt as extract_bracket_pattern()
participant TZParser as extract_timezone_offset_in_minutes()
participant Parser as parse_timestamp()
Test->>Factory: create(pattern_string)
activate Factory
Factory->>BracketExt: extract {bracket} (if present)
BracketExt-->>Factory: bracket_content or Error
alt bracket extracted
Factory->>TZParser: parse timezone spec from bracket_content
TZParser-->>Factory: (consumed_prefix, offset_minutes) or Error
alt tz parsed
Factory-->>Test: Ok(TimestampPattern with optional timezone)
else tz parse error
Factory-->>Test: Err(InvalidTimezoneOffset)
end
else no bracket
Factory-->>Test: Ok(TimestampPattern without timezone)
end
deactivate Factory
alt pattern creation succeeded
Test->>Parser: parse_timestamp(timestamp, TimestampPattern)
activate Parser
Parser->>Parser: classify & extract components using pattern metadata
Parser->>Parser: compute epochtime
alt timezone present
Parser->>Parser: apply offset (subtract offset_minutes)
end
Parser-->>Test: Ok((epochtime, generated_pattern))
deactivate Parser
else pattern creation failed
Test-->>Test: handle validation error
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Tip 📝 Customizable high-level summaries are now available in beta!You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.
Example instruction:
Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (5)
components/core/src/clp_s/timestamp_parser/ErrorCode.cpp(1 hunks)components/core/src/clp_s/timestamp_parser/ErrorCode.hpp(1 hunks)components/core/src/clp_s/timestamp_parser/TimestampParser.cpp(10 hunks)components/core/src/clp_s/timestamp_parser/TimestampParser.hpp(3 hunks)components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp(5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>rather than!<expression>.
Files:
components/core/src/clp_s/timestamp_parser/ErrorCode.hppcomponents/core/src/clp_s/timestamp_parser/ErrorCode.cppcomponents/core/src/clp_s/timestamp_parser/TimestampParser.cppcomponents/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cppcomponents/core/src/clp_s/timestamp_parser/TimestampParser.hpp
🧠 Learnings (4)
📚 Learning: 2024-11-26T19:12:09.102Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 486
File: components/core/src/clp/error_handling/ErrorCode.hpp:131-134
Timestamp: 2024-11-26T19:12:09.102Z
Learning: In the error handling system (`components/core/src/clp/error_handling/ErrorCode.hpp`), the `ErrorCode` class is intentionally designed to wrap the `ErrorCodeEnum`, not to use the enum directly. This wrapper provides a general interface for adding customized error enums.
Applied to files:
components/core/src/clp_s/timestamp_parser/ErrorCode.hppcomponents/core/src/clp_s/timestamp_parser/ErrorCode.cpp
📚 Learning: 2024-11-26T19:10:05.051Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 486
File: components/core/src/clp/error_handling/ErrorCode.hpp:34-34
Timestamp: 2024-11-26T19:10:05.051Z
Learning: In the `ErrorCategory` template class in `ErrorCode.hpp`, the `name()` and `message(ErrorCodeEnum)` methods should not be declared as pure virtual functions because the class relies on template specializations, and making them pure virtual would make the class abstract, which is not intended.
Applied to files:
components/core/src/clp_s/timestamp_parser/ErrorCode.cpp
📚 Learning: 2024-11-26T19:16:19.933Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 486
File: components/core/tests/test-error_handling.cpp:44-91
Timestamp: 2024-11-26T19:16:19.933Z
Learning: In `components/core/tests/test-error_handling.cpp` (C++), when specializing template methods of `ErrorCategory` via type aliases (e.g., `AlwaysSuccessErrorCategory`), it's acceptable to define these specializations in the current namespace, even though the primary template is in the `clp::error_handling` namespace, due to the use of type aliases and template instantiation.
Applied to files:
components/core/src/clp_s/timestamp_parser/ErrorCode.cpp
📚 Learning: 2025-01-30T19:26:33.869Z
Learnt from: davemarco
Repo: y-scope/clp PR: 700
File: components/core/src/clp/streaming_archive/ArchiveMetadata.hpp:153-155
Timestamp: 2025-01-30T19:26:33.869Z
Learning: When working with constexpr strings (string literals with static storage duration), std::string_view is the preferred choice for member variables as it's more efficient and safe, avoiding unnecessary memory allocations.
Applied to files:
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
🧬 Code graph analysis (2)
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (1)
components/core/src/clp_s/timestamp_parser/TimestampParser.hpp (5)
nodiscard(43-43)nodiscard(45-48)nodiscard(50-52)nodiscard(54-54)pattern(30-31)
components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp (2)
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (4)
create(360-481)create(360-361)parse_timestamp(484-983)parse_timestamp(484-488)components/core/src/clp_s/timestamp_parser/TimestampParser.hpp (2)
pattern(30-31)parse_timestamp(148-152)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: package-image
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: musllinux_1_2-x86_64-static-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: build-macos (macos-15, false)
- GitHub Check: build-macos (macos-14, false)
- GitHub Check: build-macos (macos-15, true)
- GitHub Check: build-macos (macos-14, true)
- GitHub Check: lint-check (macos-15)
- GitHub Check: lint-check (ubuntu-24.04)
| * failure: | ||
| * - ErrorCodeEnum::InvalidTimestampPattern if `pattern` is not a valid timestamp pattern. | ||
| * - ErrorCodeEnum::InvalidTimezone if `pattern` contains a \z{} format specifier with an | ||
| * invalid timezone. | ||
| */ |
There was a problem hiding this comment.
Fix incorrect error-code name in docstring.
The comment references ErrorCodeEnum::InvalidTimezone, but the enum only exposes InvalidTimezoneOffset. Please bring the documentation in line with the actual API to avoid confusing readers.
- * - ErrorCodeEnum::InvalidTimezone if `pattern` contains a \z{} format specifier with an
- * invalid timezone.
+ * - ErrorCodeEnum::InvalidTimezoneOffset if `pattern` contains a \z{} format specifier with an
+ * invalid timezone offset.🤖 Prompt for AI Agents
In components/core/src/clp_s/timestamp_parser/TimestampParser.hpp around lines
25 to 29, the docstring references ErrorCodeEnum::InvalidTimezone which does not
exist; update the documentation to reference the correct enum name
ErrorCodeEnum::InvalidTimezoneOffset (replace InvalidTimezone with
InvalidTimezoneOffset) so the documented failure cases match the actual API.
LinZhihao-723
left a comment
There was a problem hiding this comment.
The implementation lgtm. Had some nit comments.
Will go through the test cases soon.
| // Variables | ||
| std::string m_pattern; | ||
| std::optional<std::pair<size_t, int>> m_optional_timezone_size_and_offset{std::nullopt}; | ||
| bool m_date_type_representation{false}; |
There was a problem hiding this comment.
How about m_uses_date_type_representation?
|
|
||
| auto TimestampPattern::create(std::string pattern) | ||
| -> ystdlib::error_handling::Result<TimestampPattern> { | ||
| std::unordered_set<char> format_specifiers; |
There was a problem hiding this comment.
Nit: How about using a vector<bool> of size 256?
| = {1, 10, 100, 1000, 10'000, 100'000, 1'000'000, 10'000'000, 100'000'000, 1'000'000'000}; | ||
|
|
||
| constexpr std::array cPlusMinus | ||
| = {std::string_view{"+"}, std::string_view{"-"}, std::string_view{"\u2212"}}; |
There was a problem hiding this comment.
Just curious why \u2212 is legal here? lol
There was a problem hiding this comment.
\u2212 is − which is the unicode minus sign.
Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
components/core/src/clp_s/timestamp_parser/TimestampParser.hpp (1)
21-29: Fix error-code name in TimestampPattern::create documentation.The
createdocstring still referencesErrorCodeEnum::InvalidTimezone, but the implementation usesErrorCodeEnum::InvalidTimezoneOffsetwhen the\z{...}content is not a valid offset. This mismatch can confuse users trying to handle specific error cases.Recommend updating the comment to match the actual enum name:
- * - ErrorCodeEnum::InvalidTimezone if `pattern` contains a \z{} format specifier with an - * invalid timezone. + * - ErrorCodeEnum::InvalidTimezoneOffset if `pattern` contains a \z{} format specifier with an + * invalid timezone offset.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp(10 hunks)components/core/src/clp_s/timestamp_parser/TimestampParser.hpp(3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>rather than!<expression>.
Files:
components/core/src/clp_s/timestamp_parser/TimestampParser.hppcomponents/core/src/clp_s/timestamp_parser/TimestampParser.cpp
🧠 Learnings (3)
📚 Learning: 2025-06-19T16:15:50.601Z
Learnt from: gibber9809
Repo: y-scope/clp PR: 1021
File: components/core/src/clp_s/ColumnWriter.cpp:14-24
Timestamp: 2025-06-19T16:15:50.601Z
Learning: In C++ classes, member variables can be initialized using brace initialization in the class definition (e.g., `int64_t m_cur{};`), which initializes the variable to its default value (0 for integral types). This is equivalent to explicit initialization and eliminates the need for constructor initialization.
Applied to files:
components/core/src/clp_s/timestamp_parser/TimestampParser.hpp
📚 Learning: 2024-10-14T03:45:21.323Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 558
File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:331-335
Timestamp: 2024-10-14T03:45:21.323Z
Learning: When reviewing code, ensure that suggestions about unnecessary use of `std::move` with `std::shared_ptr` are only made if `std::move` is actually used in the code.
Applied to files:
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
📚 Learning: 2025-01-30T19:26:33.869Z
Learnt from: davemarco
Repo: y-scope/clp PR: 700
File: components/core/src/clp/streaming_archive/ArchiveMetadata.hpp:153-155
Timestamp: 2025-01-30T19:26:33.869Z
Learning: When working with constexpr strings (string literals with static storage duration), std::string_view is the preferred choice for member variables as it's more efficient and safe, avoiding unnecessary memory allocations.
Applied to files:
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
🧬 Code graph analysis (1)
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (1)
components/core/src/clp_s/timestamp_parser/TimestampParser.hpp (5)
nodiscard(41-41)nodiscard(43-46)nodiscard(48-50)nodiscard(52-52)pattern(28-29)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: package-image
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
- GitHub Check: musllinux_1_2-x86_64-static-linked-bins
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
- GitHub Check: build-macos (macos-15, true)
- GitHub Check: build-macos (macos-15, false)
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: lint-check (macos-15)
🔇 Additional comments (6)
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (4)
154-171: Bracket-pattern extraction looks correct and defensive.
extract_bracket_patterncleanly enforces{...}structure, rejects embedded\, and handles empty or unterminated brackets, matching the documented failure modes. The early-return style and use of explicitErrorCodeEnum::InvalidTimestampPatternmake the behaviour easy to reason about. No changes needed.Also applies to: 286-305
363-487: TimestampPattern::create invariants are reasonable; duplicate-specifier rule is strict.The factory cleanly:
- Tracks which format specifiers have been seen via a 256‑entry
vector<bool>.- Differentiates date vs number representations and forbids mixing them.
- Enforces that 12‑hour clocks always pair with
\p(part-of-day) and vice versa.- Pre-extracts timezone metadata (
size_t+ minutes offset) using the bracket and timezone helpers, and ensures the{...}content is fully consumed.One behavioural nuance: any repeated format specifier, including
\\and potentially innocuous ones like\3, will causeInvalidTimestampPatterndue to theformat_specifierscheck. If existing patterns ever relied on multiple instances of the same specifier (e.g., multiple literal backslashes via\\or repeated fractional specifiers), they'll now be rejected. Please confirm this strict “at most once per specifier” rule is intentional for all specifiers.
490-507: Numeric vs date-type patterns: behaviour preserved and clearly gated.Switching
parse_timestampto takeTimestampPatternand branching onuses_date_type_representation()keeps the old behaviour for pure epoch patterns: the function uses the pre-parsed epoch value and simply adds/subtracts the fractional nanoseconds, ignoring timezone metadata (which is disallowed for numeric-only patterns by the factory).The new
optional_timezone_offset_in_minutesis scoped per parse and only used in the date-type branch, with a safe default of zero. This separation is clean and matches the documented semantics.Also applies to: 512-513, 947-955
957-960: 12-hour conversion and timezone offset application look correct.
- The 12‑hour clock conversion
(parsed_hour % 12) + optional_part_of_day_idx.value_or(0) * 12gives the expected mapping (12AM→0, 1–11AM unchanged, 12PM→12, 1–11PM→13–23) and is guarded by the factory‑enforced invariant that\I/\lappears with\p.- The final
time_pointsubtractsoptional_timezone_offset_in_minutes(in minutes), so local wall time at offset +HH:MM is correctly converted to UTC by moving backwards, and offset −HH:MM moves forwards.This matches ISO‑8601 offset semantics and integrates neatly with the existing
date::sys_daysflow.Also applies to: 967-972
components/core/src/clp_s/timestamp_parser/TimestampParser.hpp (2)
15-29: TimestampPattern API and layout look good.The new
TimestampPatternclass cleanly encapsulates:
- Ownership of the pattern string.
- Optional timezone metadata (
size_t+intminutes offset).- Flags for date-type vs 12‑hour usage.
The private constructor plus static
createfactory make the invariants explicit, and the accessors (get_pattern,get_optional_timezone_size_and_offset,uses_date_type_representation,uses_twelve_hour_clock) provide just enough surface forparse_timestampto stay simple. Member defaults and move/copy behaviour are also reasonable.Also applies to: 68-72
146-150: parse_timestamp declaration correctly reflects the new pattern type.The header now exposes
parse_timestampwithTimestampPattern const& pattern, matching the implementation and keeping the existing return type. The surrounding documentation describingpatternas a timestamp pattern made up of literals, format specifiers, and CAT sequences remains accurate under the new abstraction.
TimestampPattern class to pre-validate timestamp patterns.
LinZhihao-723
left a comment
There was a problem hiding this comment.
Directly modified the PR title.
…stamps with timezones; Add `TimestampPattern` class to pre-validate timestamp patterns. (y-scope#1599) Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
Description
This PR follows up on #1564 and #1385 by adding support for parsing date-time timestamps with timezones. To simplify the code in the
parse_timestampfunction and avoid some kinds of duplicated parsing work each time we parse a timestamp with a given pattern, this PR also introduces aTimestampPatternclass which pre-validates timestamp patterns, so thatparse_timestampcan skip many kinds of validation.Note: the
\z{}format specifier is only designed to capture ISO 8601 timezone offsets. The intention is that ISO 8601 timezone information that isn't this offset should be captured as static text (e.g.,UTC\z{...}). The\ZCAT sequence will be responsible for trying to find such static text once it is implemented.Checklist
breaking change.
Validation performed
Summary by CodeRabbit
New Features
Bug Fixes
Tests