feat(clp-s::timestamp_parser): Add support for parsing date-time and epoch timestamps without timezones. by gibber9809 · Pull Request #1564 · y-scope/clp

gibber9809 · 2025-11-06T17:23:27Z

Description

This PR follows up on #1385 by implementing the remainder of the initial set of format specifier excepting timezones, and CAT sequences.

Since the set of supported format specifiers is nearly complete, we also add tests for parsing all of the unique timestamps from tests/test-TimestampPattern.cpp, with some additional testcases focused on numeric timestamps.

Checklist

The PR satisfies the contribution guidelines.
This is a breaking change and that has been indicated in the PR title, OR this isn't a
breaking change.
Necessary docs have been updated, OR no docs need to be updated.

Validation performed

Added tests to ensure every newly implemented format specifier accepts the expected kind of content.
Added tests to ensure that timestamps are parsed accurately for known cases.

Summary by CodeRabbit

New Features
- Enhanced timestamp parsing: 12/24‑hour support with AM/PM, minutes, seconds, and variable‑length subseconds (up to nanoseconds); accepts direct epoch inputs.
Bug Fixes
- Stricter validation and error handling: range checks, consistent 12‑hour/AM‑PM rules, rejection of mixed or trailing formats, and correct epoch/subsecond composition.
Tests
- Expanded test coverage for padding variants, AM/PM and 12‑hour scenarios, and wide‑range epoch representations.

…uences.

… expected content.

…crosecond parsing.

…on numeric patterns.

coderabbitai · 2025-11-06T17:23:37Z

Walkthrough

Adds bounded and variable-length numeric-prefix parsers, new time-component constants and parse state, AM/PM and 12/24‑hour parsing, subsecond and epoch token handling with scaling and validation, plus expanded tests covering many parsing patterns and expected epoch outputs.

Changes

Cohort / File(s)	Summary
Timestamp parser implementation `components/core/src/clp_s/timestamp_parser/TimestampParser.cpp`	Adds numeric-prefix converters (`convert_positive_bounded_variable_length_string_prefix_to_number`, `convert_variable_length_string_prefix_to_number`), constants for hour/minute/second/subsecond bounds and power-of-ten arrays, parts-of-day primitives, extended parse state (`parsed_hour`, `parsed_minute`, `parsed_second`, `parsed_subsecond_nanoseconds`, `optional_part_of_day_idx`, `parsed_epoch_nanoseconds`, `uses_12_hour_clock`), and extensive parsing logic for tokens: AM/PM (`p`), 24‑hour (`H`,`k`), 12‑hour (`I`,`l`), minutes (`M`), seconds (`S`), subseconds (`3`,`6`,`9`,`T`) and epoch tokens (`E`,`L`,`C`,`N`). Implements scaling to nanoseconds, 12‑hour offset application, epoch assembly, range checks, mutual‑exclusion between date/time and epoch modes, and trailing-content validation.
Tests — parser coverage & helpers `components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp`	Adds `ExpectedParsingResult` test struct, helpers `generate_number_triangles()` and `generate_padded_number_subset()`, switches to raw string literal patterns, introduces a large `expected_parsing_results` dataset, and extends assertions to verify both parsed epoch nanoseconds and returned pattern across many cases (padding, AM/PM, 12‑hour, subseconds, multiple epoch widths).

Sequence Diagram(s)

sequenceDiagram
    participant Test as Test Suite
    participant Parser as TimestampParser
    participant Utils as NumericUtils
    participant State as ParseState
    participant Epoch as EpochAssembler

    Test->>Parser: parse(input, pattern)
    Parser->>State: initialize parse state (parsed_*, flags)
    alt pattern contains date/time tokens
        Parser->>Utils: parse numeric prefix (bounded/unbounded)
        Utils-->>Parser: value + consumed_length or error
        Parser->>State: assign parsed_hour/minute/second/subsecond
        alt AM/PM token present
            Parser->>State: set optional_part_of_day_idx and uses_12_hour_clock
        end
    end
    alt pattern contains epoch token
        Parser->>Utils: parse epoch numeric token
        Utils-->>Parser: epoch_value
        Parser->>Epoch: scale to nanoseconds -> parsed_epoch_nanoseconds
    end
    Parser->>State: validate ranges and mode consistency
    Parser->>Epoch: combine date/time + subsecond -> final_epoch_ns
    Epoch-->>Parser: final_epoch_ns or error
    Parser-->>Test: return (pattern, final_epoch_ns) or error

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Areas to focus on:
- Numeric-conversion helpers: bounds, consumed-digit reporting, error paths.
- Token parsing branches for H/k/I/l/p/M/S/3/6/9/T/E/L/C/N and their scaling to nanoseconds.
- 12‑hour vs AM/PM interaction, offset application, and consistency checks.
- Epoch assembly logic (combining date/time with subseconds, negative epoch handling).
- Tests: correctness of expected epoch values, pattern round-trip assertions, and wide dataset coverage.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 10.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main change: adding support for parsing date-time and epoch timestamps without timezones, which aligns with the extensive enhancements shown in the raw summary.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1a0c362 and b2d91c8.

📒 Files selected for processing (2)

components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (8 hunks)
components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp (6 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

Prefer false == <expression> rather than !<expression>.

Files:

components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp

🧠 Learnings (4)

📚 Learning: 2025-01-30T19:26:33.869Z

Learnt from: davemarco
Repo: y-scope/clp PR: 700
File: components/core/src/clp/streaming_archive/ArchiveMetadata.hpp:153-155
Timestamp: 2025-01-30T19:26:33.869Z
Learning: When working with constexpr strings (string literals with static storage duration), std::string_view is the preferred choice for member variables as it's more efficient and safe, avoiding unnecessary memory allocations.

Applied to files:

components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp

📚 Learning: 2024-11-26T19:12:43.982Z

Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 486
File: components/core/tests/test-error_handling.cpp:37-38
Timestamp: 2024-11-26T19:12:43.982Z
Learning: In the CLP project, C++20 is used, allowing the utilization of C++20 features such as class template argument deduction (CTAD) with `std::array`.

Applied to files:

components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp

📚 Learning: 2024-11-01T03:26:26.386Z

Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 570
File: components/core/tests/test-ir_encoding_methods.cpp:376-399
Timestamp: 2024-11-01T03:26:26.386Z
Learning: In the test code (`components/core/tests/test-ir_encoding_methods.cpp`), exception handling for `msgpack::unpack` can be omitted because the Catch2 testing framework captures exceptions if they occur.

Applied to files:

components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp

📚 Learning: 2025-02-10T22:36:08.496Z

Learnt from: davemarco
Repo: y-scope/clp PR: 646
File: components/core/src/clp/streaming_archive/single_file_archive/writer.cpp:161-171
Timestamp: 2025-02-10T22:36:08.496Z
Learning: When using std::array, size checking between arrays is enforced at compile-time through the type system, making additional static_assert size checks redundant.

Applied to files:

components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp

🧬 Code graph analysis (1)

components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp (2)

components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (2)

parse_timestamp (248-756)

parse_timestamp (248-252)

components/core/src/clp_s/timestamp_parser/TimestampParser.hpp (1)

parse_timestamp (85-89)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)

GitHub Check: ubuntu-jammy-static-linked-bins
GitHub Check: package-image
GitHub Check: ubuntu-jammy-lint
GitHub Check: ubuntu-jammy-dynamic-linked-bins
GitHub Check: centos-stream-9-dynamic-linked-bins
GitHub Check: centos-stream-9-static-linked-bins
GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
GitHub Check: manylinux_2_28-x86_64-static-linked-bins
GitHub Check: musllinux_1_2-x86_64-static-linked-bins
GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
GitHub Check: build-macos (macos-15, false)
GitHub Check: build-macos (macos-14, false)
GitHub Check: lint-check (ubuntu-24.04)
GitHub Check: lint-check (macos-15)

LinZhihao-723

The implementation lgtm in general. Had a few style-wise comments.
Will review test cases shortly.

LinZhihao-723 · 2025-11-07T16:39:34Z

-            case 'L':
-            case 'C':
-            case 'N':
+            case 'p': {


Can we add an inline comment to explain which case it is like we did in the previous PR?

LinZhihao-723 · 2025-11-07T17:58:33Z

    // Do not allow trailing unmatched content.
    if (pattern_idx != pattern.size() || timestamp_idx != timestamp.size()) {
        return ErrorCode{ErrorCodeEnum::IncompatibleTimestampPattern};
    }


Shall we move this before any format changes?

LinZhihao-723 · 2025-11-07T18:24:30Z

+                number_type_representation = true;
+                break;
+            }
+            case 'L': {


Just to confirm: For L, C and N, we will allow negative numbers to indicate timestamps before Unix epoch, right?

Yes, that's correct.

LinZhihao-723 · 2025-11-07T18:26:12Z

+        std::string_view str,
+        size_t max_num_digits
+) -> ystdlib::error_handling::Result<std::pair<int, size_t>> {
+    constexpr int cTen{10};


Can we return int64_t since it will be converted to nanosec through multiplication eventually? Making two new helpers return the same type would also lead to less confusion...

LinZhihao-723 · 2025-11-07T18:27:06Z

+constexpr std::array cPowersOfTen
+        = {1, 10, 100, 1000, 10'000, 100'000, 1'000'000, 10'000'000, 100'000'000, 1'000'000'000};


Shall we explicitly mark the type as int64_t?

LinZhihao-723 · 2025-11-07T18:31:48Z

+                parsed_epoch_nanoseconds = number
+                                           * cPowersOfTen.at(
+                                                   cNumNanosecondPrecisionSubsecondDigits
+                                                   - cNumMicrosecondPrecisionSubsecondDigits
+                                           );


btw this factor can be made as a constexpr:

Suggested change

parsed_epoch_nanoseconds = number

* cPowersOfTen.at(

cNumNanosecondPrecisionSubsecondDigits

- cNumMicrosecondPrecisionSubsecondDigits

);

constexpr auto cFactor{cPowersOfTen

[cNumNanosecondPrecisionSubsecondDigits

- cNumMicrosecondPrecisionSubsecondDigits]};

parsed_epoch_nanoseconds = number * cFactor;

LinZhihao-723

The overall test cases also lgtm.
Will approve once all comments are resolved.

LinZhihao-723 · 2025-11-07T20:47:32Z

+        assert_specifier_accepts_valid_content('N', negative_epoch_timestamps);
+    }
+
+    std::vector<ExpectedParsingResult> const expected_parsing_results{


Can we move this vector inside the section? Or do u plan to reuse it in other test sections?

I was undecided -- depends on whether we also want to test round-tripping and accuracy for deducing timestamp patterns based on a list of predefined patterns in the section below. I'm now leading towards testing those all at the same time though, because then we can also test that parsing using the list of predefined patterns (which may contain CAT sequences) yields the same result as parsing using the specified pattern (which does not contain CAT sequences).

I'll move it inside.

kk sg. We can move it out again in later PRs if needed.

LinZhihao-723 · 2025-11-07T20:48:44Z

            auto const timestamp{fmt::format("{}a", day_in_week_timestamp)};
-            auto const result{parse_timestamp(timestamp, "\\d \\aa", generated_pattern)};
+            auto const result{parse_timestamp(timestamp, cPattern, generated_pattern)};
+            REQUIRE(false == result.has_error());


Suggested change

REQUIRE(false == result.has_error());

REQUIRE_FALSE(result.has_error());

LinZhihao-723 · 2025-11-07T20:52:13Z

+
+/**
+ * @param num_digits
+ * @return All of the padded numbers with `num_digits` digits having a single unique digit.


Suggested change

* @return All of the padded numbers with `num_digits` digits having a single unique digit.

* @return A vector containing all digit numbers (1 to 9) padded to the specified `num_digits`

* length with the digit.

Nit: u can apply it if u think this description is better

Probably won't use this exact phrasing, but I'll incorporate the "A vector of" part.

Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>

…s them.

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (3)

components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (2)

171-200: CRITICAL: Guard against overflow in convert_positive_bounded_variable_length_string_prefix_to_number.

This function performs converted_value *= cTen and converted_value += ... without checking for overflow. Large inputs can overflow int64_t before validation, causing undefined behaviour. This issue was flagged in a previous review with detailed fix suggestions.

Please accumulate the magnitude in an unsigned type with explicit std::numeric_limits guards and convert back to int64_t only after verifying the value is representable.

202-244: CRITICAL: Guard against overflow in convert_variable_length_string_prefix_to_number.

This function performs converted_value *= cTen and converted_value += ... without checking for overflow. Inputs such as "99999999999999999999" or "-9223372036854775808" can overflow before rejection, yielding undefined behaviour. This issue was flagged in a previous review with detailed fix suggestions.

Please use unsigned arithmetic with explicit magnitude checks for both positive and negative values, handling the special case of INT64_MIN separately.

components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp (1)

182-184: Consider using REQUIRE_FALSE for consistency.

Similar to earlier test assertions, consider using REQUIRE_FALSE(result.has_error()) instead of the double-negative for better readability.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 64ed11b and 9558d74.

📒 Files selected for processing (2)

components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (8 hunks)
components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp (6 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

Prefer false == <expression> rather than !<expression>.

Files:

components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp

🧠 Learnings (7)

📚 Learning: 2024-10-13T09:29:56.553Z

Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 557
File: components/core/tests/test-ir_encoding_methods.cpp:1283-1286
Timestamp: 2024-10-13T09:29:56.553Z
Learning: `clp::ffi::SchemaTree::Node::id_t` is of type `uint32_t`, so adding `1` to `INT32_MAX` is safe and does not cause integer overflow.

Applied to files:

components/core/src/clp_s/timestamp_parser/TimestampParser.cpp

📚 Learning: 2025-09-11T13:15:39.930Z

Learnt from: anlowee
Repo: y-scope/clp PR: 1176
File: components/core/src/clp_s/ColumnWriter.cpp:67-72
Timestamp: 2025-09-11T13:15:39.930Z
Learning: In components/core/src/clp_s/ColumnWriter.cpp, when using sizeof with types like variable_dictionary_id_t, always use the fully qualified namespace (clp::variable_dictionary_id_t) to match the variable declaration and maintain consistency with other column writers in the codebase.

Applied to files:

components/core/src/clp_s/timestamp_parser/TimestampParser.cpp

📚 Learning: 2024-11-26T19:12:43.982Z

Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 486
File: components/core/tests/test-error_handling.cpp:37-38
Timestamp: 2024-11-26T19:12:43.982Z
Learning: In the CLP project, C++20 is used, allowing the utilization of C++20 features such as class template argument deduction (CTAD) with `std::array`.

Applied to files:

components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp

📚 Learning: 2025-01-30T19:26:33.869Z

Learnt from: davemarco
Repo: y-scope/clp PR: 700
File: components/core/src/clp/streaming_archive/ArchiveMetadata.hpp:153-155
Timestamp: 2025-01-30T19:26:33.869Z
Learning: When working with constexpr strings (string literals with static storage duration), std::string_view is the preferred choice for member variables as it's more efficient and safe, avoiding unnecessary memory allocations.

Applied to files:

components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp

📚 Learning: 2025-02-10T22:36:08.496Z

Learnt from: davemarco
Repo: y-scope/clp PR: 646
File: components/core/src/clp/streaming_archive/single_file_archive/writer.cpp:161-171
Timestamp: 2025-02-10T22:36:08.496Z
Learning: When using std::array, size checking between arrays is enforced at compile-time through the type system, making additional static_assert size checks redundant.

Applied to files:

components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp

📚 Learning: 2024-10-14T03:45:21.323Z

Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 558
File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:331-335
Timestamp: 2024-10-14T03:45:21.323Z
Learning: When reviewing code, ensure that suggestions about unnecessary use of `std::move` with `std::shared_ptr` are only made if `std::move` is actually used in the code.

Applied to files:

components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp

📚 Learning: 2024-11-01T03:26:26.386Z

Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 570
File: components/core/tests/test-ir_encoding_methods.cpp:376-399
Timestamp: 2024-11-01T03:26:26.386Z
Learning: In the test code (`components/core/tests/test-ir_encoding_methods.cpp`), exception handling for `msgpack::unpack` can be omitted because the Catch2 testing framework captures exceptions if they occur.

Applied to files:

components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp

🧬 Code graph analysis (1)

components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp (2)

components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (2)

parse_timestamp (248-753)

parse_timestamp (248-252)

components/core/src/clp_s/timestamp_parser/TimestampParser.hpp (1)

parse_timestamp (85-89)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)

GitHub Check: musllinux_1_2-x86_64-static-linked-bins
GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
GitHub Check: ubuntu-jammy-lint
GitHub Check: ubuntu-jammy-dynamic-linked-bins
GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
GitHub Check: ubuntu-jammy-static-linked-bins
GitHub Check: centos-stream-9-static-linked-bins
GitHub Check: centos-stream-9-dynamic-linked-bins
GitHub Check: manylinux_2_28-x86_64-static-linked-bins
GitHub Check: build (macos-15)
GitHub Check: lint-check (ubuntu-24.04)
GitHub Check: build-macos (macos-14, true)
GitHub Check: build-macos (macos-15, false)
GitHub Check: build-macos (macos-14, false)
GitHub Check: build-macos (macos-15, true)

🔇 Additional comments (17)

components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (12)

33-46: LGTM - Time component constants are well-defined.

The constants properly define bounds for hour, minute, second, and subsecond components. The inclusion of 60 as the maximum second value correctly accommodates leap seconds.

89-92: LGTM - Constants for AM/PM and powers of ten are properly defined.

The arrays are well-structured and use appropriate types (string_view for AM/PM, int64_t for powers of ten).

259-271: LGTM - Parsing state variables are appropriately declared and initialized.

The new variables properly track time components, AM/PM indicator, epoch representation, and clock format, supporting the extended parsing capabilities.

415-423: LGTM - AM/PM parsing is correctly implemented.

The case properly matches and stores the AM/PM indicator for use with 12-hour clock formats.

424-509: LGTM - Hour parsing for 24-hour and 12-hour formats is correctly implemented.

All four hour format specifiers (H, k, I, l) properly validate field length, parse with appropriate padding, enforce correct ranges, and set the necessary flags for clock format tracking.

510-547: LGTM - Minute and second parsing is correctly implemented.

Both format specifiers properly validate field length and ranges, with the second parser correctly accommodating leap seconds (maximum value 60).

548-620: LGTM - Subsecond parsing for all precision levels is correctly implemented.

The fixed-length cases (3, 6, 9-digit) properly scale values to nanoseconds using the powers-of-ten array. The variable-length case 'T' correctly rejects trailing zeros and applies appropriate scaling based on the number of digits consumed.

621-673: LGTM - Epoch format parsing is correctly implemented.

All four epoch format specifiers (E, L, C, N) properly parse variable-length signed integers, apply correct scaling factors to convert to nanoseconds, and support negative values for pre-epoch timestamps.

706-710: LGTM - Validation ensures 12-hour clock and AM/PM consistency.

This check correctly enforces that 12-hour clock formats must have an AM/PM indicator and 24-hour formats must not, preventing malformed timestamp patterns.

712-720: LGTM - Epoch timestamp handling with subseconds is correctly implemented.

The logic properly combines epoch values with subsecond components, correctly subtracting subseconds for negative epochs and adding them for positive epochs.

722-725: LGTM - 12-hour to 24-hour clock conversion is correct.

The formula properly converts 12-hour format with AM/PM to 24-hour format, correctly handling edge cases like 12 AM (midnight) and 12 PM (noon).

732-735: LGTM - Time point construction properly combines date and time components.

The implementation correctly constructs the final time point by adding all parsed time components (hours, minutes, seconds, subsecond nanoseconds) to the date, using appropriate chrono duration types.

components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp (5)

18-31: LGTM - Test data structure is appropriately defined.

The ExpectedParsingResult struct provides a clear structure for organizing test cases with timestamp, pattern, and expected epoch value.

53-70: LGTM - Test helper function declarations are clear.

The function declarations properly use [[nodiscard]] and have clear documentation explaining their purpose and return values.

98-114: LGTM - Test helper implementations correctly generate test data.

Both functions efficiently generate the expected test data patterns: triangles of repeated digits for variable-length testing and padded single-digit sequences for fixed-length testing.

186-250: LGTM - Comprehensive test coverage for new format specifiers.

The tests thoroughly exercise all new format specifiers including:

24-hour and 12-hour clock formats with various padding

AM/PM indicators with both zero-padded and space-padded hours

Minutes and seconds (including leap seconds)

All subsecond precisions and variable-length fractional seconds

All epoch formats with both positive and negative values

Consider using REQUIRE_FALSE at lines 207-208 for consistency with test idioms, as mentioned in past review comments.

252-311: LGTM - Excellent accuracy validation with comprehensive real-world test cases.

The test section provides thorough validation of parsing accuracy across diverse timestamp formats including:

Multiple date and time separators

Various subsecond precisions (milliseconds, microseconds, nanoseconds)

Month name formats

Day of week validation

Epoch formats with subseconds

Variable-length fractional seconds

Each test case properly validates both the parsed epoch value and the returned pattern.

coderabbitai

Actionable comments posted: 4

♻️ Duplicate comments (1)

components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (1)

202-244: Address the overflow concern from the previous review.

This function has the same overflow vulnerability as noted in the past review comment. Please apply the detailed mitigation suggested there, using unsigned accumulation with std::numeric_limits guards for both positive and negative values.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7f41e7d and eaf6825.

📒 Files selected for processing (1)

components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (8 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

Prefer false == <expression> rather than !<expression>.

Files:

components/core/src/clp_s/timestamp_parser/TimestampParser.cpp

🧠 Learnings (5)

📚 Learning: 2024-10-13T09:29:56.553Z

Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 557
File: components/core/tests/test-ir_encoding_methods.cpp:1283-1286
Timestamp: 2024-10-13T09:29:56.553Z
Learning: `clp::ffi::SchemaTree::Node::id_t` is of type `uint32_t`, so adding `1` to `INT32_MAX` is safe and does not cause integer overflow.

Applied to files:

components/core/src/clp_s/timestamp_parser/TimestampParser.cpp

📚 Learning: 2025-09-11T13:15:39.930Z

Learnt from: anlowee
Repo: y-scope/clp PR: 1176
File: components/core/src/clp_s/ColumnWriter.cpp:67-72
Timestamp: 2025-09-11T13:15:39.930Z
Learning: In components/core/src/clp_s/ColumnWriter.cpp, when using sizeof with types like variable_dictionary_id_t, always use the fully qualified namespace (clp::variable_dictionary_id_t) to match the variable declaration and maintain consistency with other column writers in the codebase.

Applied to files:

components/core/src/clp_s/timestamp_parser/TimestampParser.cpp

📚 Learning: 2024-11-26T19:12:43.982Z

Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 486
File: components/core/tests/test-error_handling.cpp:37-38
Timestamp: 2024-11-26T19:12:43.982Z
Learning: In the CLP project, C++20 is used, allowing the utilization of C++20 features such as class template argument deduction (CTAD) with `std::array`.

Applied to files:

components/core/src/clp_s/timestamp_parser/TimestampParser.cpp

📚 Learning: 2025-01-30T19:26:33.869Z

Learnt from: davemarco
Repo: y-scope/clp PR: 700
File: components/core/src/clp/streaming_archive/ArchiveMetadata.hpp:153-155
Timestamp: 2025-01-30T19:26:33.869Z
Learning: When working with constexpr strings (string literals with static storage duration), std::string_view is the preferred choice for member variables as it's more efficient and safe, avoiding unnecessary memory allocations.

Applied to files:

components/core/src/clp_s/timestamp_parser/TimestampParser.cpp

📚 Learning: 2025-02-10T22:36:08.496Z

Learnt from: davemarco
Repo: y-scope/clp PR: 646
File: components/core/src/clp/streaming_archive/single_file_archive/writer.cpp:161-171
Timestamp: 2025-02-10T22:36:08.496Z
Learning: When using std::array, size checking between arrays is enforced at compile-time through the type system, making additional static_assert size checks redundant.

Applied to files:

components/core/src/clp_s/timestamp_parser/TimestampParser.cpp

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)

GitHub Check: ubuntu-jammy-lint
GitHub Check: ubuntu-jammy-static-linked-bins
GitHub Check: ubuntu-jammy-dynamic-linked-bins
GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
GitHub Check: musllinux_1_2-x86_64-static-linked-bins
GitHub Check: manylinux_2_28-x86_64-static-linked-bins
GitHub Check: centos-stream-9-static-linked-bins
GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
GitHub Check: centos-stream-9-dynamic-linked-bins
GitHub Check: package-image
GitHub Check: build-macos (macos-14, false)
GitHub Check: antlr-code-committed (macos-15)
GitHub Check: lint-check (macos-15)
GitHub Check: lint-check (ubuntu-24.04)

🔇 Additional comments (6)

components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (6)

33-92: LGTM!

The time component bounds and subsecond precision constants are well-defined. The inclusion of 60 as the maximum second value correctly accommodates leap seconds, and the powers of ten array provides the necessary scaling factors for subsecond conversions.

117-141: LGTM!

The function declarations for variable-length numeric prefix parsing are well-documented with clear parameter descriptions and error code specifications.

415-547: LGTM!

The time component parsing (AM/PM, hours, minutes, seconds) follows a consistent and robust pattern: bounds checking before substring extraction, range validation after parsing, and proper index advancement. The 12-hour clock flag is correctly set for cases 'I' and 'l'.

604-620: LGTM with a note on safety.

The variable-length fractional seconds parsing correctly uses the bounded helper function and validates that the last digit is not '0' to enforce the no-trailing-zeroes requirement. The cast to int at line 618 should be safe given the maximum value constraints, but as noted in past comments, using int64_t for parsed_subsecond_nanoseconds throughout would eliminate the cast and improve consistency.

Based on past review comments.

706-725: LGTM!

The validation logic correctly enforces that 12-hour clock patterns must include AM/PM and vice versa. The epoch time calculation properly handles negative timestamps by subtracting the subsecond component, and the 12-hour to 24-hour conversion using modulo arithmetic is accurate.

727-752: LGTM!

The final time-point construction correctly combines the parsed date and time components, validates the day of week when present, and converts to epoch nanoseconds. The use of the date library's type-safe operations ensures proper calendar arithmetic.

LinZhihao-723

For the PR title, how about the following to be more explicit about what's been implemented?

feat(clp-s::timestamp_parser): Add implementation for day time parsing.

gibber9809 · 2025-11-10T19:36:14Z

For the PR title, how about the following to be more explicit about what's been implemented?
feat(clp-s::timestamp_parser): Add implementation for day time parsing.

How about

feat(clp-s::timestamp_parser): Add support for parsing date-time and epoch timestamps without timezones.

LinZhihao-723 · 2025-11-10T19:37:30Z

For the PR title, how about the following to be more explicit about what's been implemented?
feat(clp-s::timestamp_parser): Add implementation for day time parsing.
How about
feat(clp-s::timestamp_parser): Add support for parsing date-time and epoch timestamps without timezones.

sgtm

gibber9809 · 2025-11-10T21:37:51Z

Accidentally hit close while checking on the CI status in the mobile client.

…epoch timestamps without timezones. (y-scope#1564) Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>

gibber9809 added 8 commits November 4, 2025 17:36

Implement remaining format specifiers except for timezone and CAT seq…

3e948c5

…uences.

Cover some missed edge cases in TimestampParser

09ff6fb

Add tests that ensure that newly implemented format specifiers accept…

7a8d504

… expected content.

Add tests for basic timestamp patterns; fix bug in millisecond and mi…

0d28cf6

…crosecond parsing.

Add tests for all timestamps from test-TimestampPattern.cpp plus comm…

f8a288f

…on numeric patterns.

Add missing const.

4fd8fe6

Fix clang-tidy errors in test_TimestampParser.cpp

9b58db2

Merge remote-tracking branch 'upstream/main' into timestamp-parser-2

b2d91c8

gibber9809 requested review from a team and wraymo as code owners November 6, 2025 17:23

gibber9809 requested a review from LinZhihao-723 November 6, 2025 17:24

coderabbitai Bot reviewed Nov 6, 2025

View reviewed changes

Comment thread components/core/src/clp_s/timestamp_parser/TimestampParser.cpp

LinZhihao-723 requested changes Nov 7, 2025

View reviewed changes

gibber9809 and others added 6 commits November 10, 2025 10:56

Apply suggestions from code review

64ed11b

Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>

Address most review comments.

d533f35

Address more review comments.

db68ff6

Add inline comments about each format specifier.

d8a68b0

Move expected parsing cases into only test section that currently use…

48a1b4c

…s them.

Merge remote-tracking branch 'upstream/main' into timestamp-parser-2

9558d74

gibber9809 requested a review from LinZhihao-723 November 10, 2025 16:48

coderabbitai Bot reviewed Nov 10, 2025

View reviewed changes

Comment thread components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp

gibber9809 added 3 commits November 10, 2025 16:59

Address rabbit comment.

7f41e7d

Use constexpr power of 10 in a few more instances.

1875c75

Fix clang-tidy issue in TimestampParser

eaf6825

coderabbitai Bot reviewed Nov 10, 2025

View reviewed changes

gibber9809 added 2 commits November 10, 2025 19:27

Address edge case caught by rabbit.

408c0fd

Merge remote-tracking branch 'upstream/main' into timestamp-parser-2

29fc82e

LinZhihao-723 approved these changes Nov 10, 2025

View reviewed changes

gibber9809 changed the title ~~feat(clp-s::timestamp_parser): Continue partial implementation of timestamp parsing utility.~~ feat(clp-s::timestamp_parser): Add support for parsing date-time and epoch timestamps without timezones. Nov 10, 2025

gibber9809 closed this Nov 10, 2025

gibber9809 reopened this Nov 10, 2025

Merge branch 'main' into timestamp-parser-2

5c9140a

gibber9809 merged commit 0274e6c into y-scope:main Nov 10, 2025
27 checks passed

This was referenced Nov 20, 2025

feat(clp-s::timestamp_parser): Add support for marshalling timestamps. #1646

Merged

feat(clp-s::timestamp_parser): Add support for preserving leap seconds in date-time timestamps. #1673

Merged

gibber9809 mentioned this pull request Dec 3, 2025

feat(clp-s::timestamp_parser): Make control characters and lone double quotes illegal in timestamp patterns. #1723

Merged

3 tasks

		constexpr std::array cPowersOfTen
		= {1, 10, 100, 1000, 10'000, 100'000, 1'000'000, 10'000'000, 100'000'000, 1'000'000'000};

	REQUIRE(false == result.has_error());
	REQUIRE_FALSE(result.has_error());

	* @return All of the padded numbers with `num_digits` digits having a single unique digit.
	* @return A vector containing all digit numbers (1 to 9) padded to the specified `num_digits`
	* length with the digit.

Conversation

gibber9809 commented Nov 6, 2025 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Checklist

Validation performed

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Nov 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Pre-merge checks and finishing touches

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

LinZhihao-723 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

LinZhihao-723 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

LinZhihao-723 left a comment

Choose a reason for hiding this comment

Uh oh!

gibber9809 commented Nov 10, 2025

Uh oh!

LinZhihao-723 commented Nov 10, 2025

Uh oh!

gibber9809 commented Nov 10, 2025

Uh oh!

gibber9809 commented Nov 6, 2025 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Nov 6, 2025 •

edited

Loading