Skip to content

feat(clp-s::timestamp_parser): Add support for parsing date-time and epoch timestamps without timezones.#1564

Merged
gibber9809 merged 20 commits into
y-scope:mainfrom
gibber9809:timestamp-parser-2
Nov 10, 2025
Merged

feat(clp-s::timestamp_parser): Add support for parsing date-time and epoch timestamps without timezones.#1564
gibber9809 merged 20 commits into
y-scope:mainfrom
gibber9809:timestamp-parser-2

Conversation

@gibber9809

@gibber9809 gibber9809 commented Nov 6, 2025

Copy link
Copy Markdown
Contributor

Description

This PR follows up on #1385 by implementing the remainder of the initial set of format specifier excepting timezones, and CAT sequences.

Since the set of supported format specifiers is nearly complete, we also add tests for parsing all of the unique timestamps from tests/test-TimestampPattern.cpp, with some additional testcases focused on numeric timestamps.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • Added tests to ensure every newly implemented format specifier accepts the expected kind of content.
  • Added tests to ensure that timestamps are parsed accurately for known cases.

Summary by CodeRabbit

  • New Features

    • Enhanced timestamp parsing: 12/24‑hour support with AM/PM, minutes, seconds, and variable‑length subseconds (up to nanoseconds); accepts direct epoch inputs.
  • Bug Fixes

    • Stricter validation and error handling: range checks, consistent 12‑hour/AM‑PM rules, rejection of mixed or trailing formats, and correct epoch/subsecond composition.
  • Tests

    • Expanded test coverage for padding variants, AM/PM and 12‑hour scenarios, and wide‑range epoch representations.

@gibber9809 gibber9809 requested review from a team and wraymo as code owners November 6, 2025 17:23
@coderabbitai

coderabbitai Bot commented Nov 6, 2025

Copy link
Copy Markdown
Contributor

Walkthrough

Adds bounded and variable-length numeric-prefix parsers, new time-component constants and parse state, AM/PM and 12/24‑hour parsing, subsecond and epoch token handling with scaling and validation, plus expanded tests covering many parsing patterns and expected epoch outputs.

Changes

Cohort / File(s) Summary
Timestamp parser implementation
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
Adds numeric-prefix converters (convert_positive_bounded_variable_length_string_prefix_to_number, convert_variable_length_string_prefix_to_number), constants for hour/minute/second/subsecond bounds and power-of-ten arrays, parts-of-day primitives, extended parse state (parsed_hour, parsed_minute, parsed_second, parsed_subsecond_nanoseconds, optional_part_of_day_idx, parsed_epoch_nanoseconds, uses_12_hour_clock), and extensive parsing logic for tokens: AM/PM (p), 24‑hour (H,k), 12‑hour (I,l), minutes (M), seconds (S), subseconds (3,6,9,T) and epoch tokens (E,L,C,N). Implements scaling to nanoseconds, 12‑hour offset application, epoch assembly, range checks, mutual‑exclusion between date/time and epoch modes, and trailing-content validation.
Tests — parser coverage & helpers
components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp
Adds ExpectedParsingResult test struct, helpers generate_number_triangles() and generate_padded_number_subset(), switches to raw string literal patterns, introduces a large expected_parsing_results dataset, and extends assertions to verify both parsed epoch nanoseconds and returned pattern across many cases (padding, AM/PM, 12‑hour, subseconds, multiple epoch widths).

Sequence Diagram(s)

sequenceDiagram
    participant Test as Test Suite
    participant Parser as TimestampParser
    participant Utils as NumericUtils
    participant State as ParseState
    participant Epoch as EpochAssembler

    Test->>Parser: parse(input, pattern)
    Parser->>State: initialize parse state (parsed_*, flags)
    alt pattern contains date/time tokens
        Parser->>Utils: parse numeric prefix (bounded/unbounded)
        Utils-->>Parser: value + consumed_length or error
        Parser->>State: assign parsed_hour/minute/second/subsecond
        alt AM/PM token present
            Parser->>State: set optional_part_of_day_idx and uses_12_hour_clock
        end
    end
    alt pattern contains epoch token
        Parser->>Utils: parse epoch numeric token
        Utils-->>Parser: epoch_value
        Parser->>Epoch: scale to nanoseconds -> parsed_epoch_nanoseconds
    end
    Parser->>State: validate ranges and mode consistency
    Parser->>Epoch: combine date/time + subsecond -> final_epoch_ns
    Epoch-->>Parser: final_epoch_ns or error
    Parser-->>Test: return (pattern, final_epoch_ns) or error
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Areas to focus on:
    • Numeric-conversion helpers: bounds, consumed-digit reporting, error paths.
    • Token parsing branches for H/k/I/l/p/M/S/3/6/9/T/E/L/C/N and their scaling to nanoseconds.
    • 12‑hour vs AM/PM interaction, offset application, and consistency checks.
    • Epoch assembly logic (combining date/time with subseconds, negative epoch handling).
    • Tests: correctness of expected epoch values, pattern round-trip assertions, and wide dataset coverage.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 10.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: adding support for parsing date-time and epoch timestamps without timezones, which aligns with the extensive enhancements shown in the raw summary.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1a0c362 and b2d91c8.

📒 Files selected for processing (2)
  • components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (8 hunks)
  • components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp (6 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp
  • components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
🧠 Learnings (4)
📚 Learning: 2025-01-30T19:26:33.869Z
Learnt from: davemarco
Repo: y-scope/clp PR: 700
File: components/core/src/clp/streaming_archive/ArchiveMetadata.hpp:153-155
Timestamp: 2025-01-30T19:26:33.869Z
Learning: When working with constexpr strings (string literals with static storage duration), std::string_view is the preferred choice for member variables as it's more efficient and safe, avoiding unnecessary memory allocations.

Applied to files:

  • components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp
  • components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
📚 Learning: 2024-11-26T19:12:43.982Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 486
File: components/core/tests/test-error_handling.cpp:37-38
Timestamp: 2024-11-26T19:12:43.982Z
Learning: In the CLP project, C++20 is used, allowing the utilization of C++20 features such as class template argument deduction (CTAD) with `std::array`.

Applied to files:

  • components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp
  • components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
📚 Learning: 2024-11-01T03:26:26.386Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 570
File: components/core/tests/test-ir_encoding_methods.cpp:376-399
Timestamp: 2024-11-01T03:26:26.386Z
Learning: In the test code (`components/core/tests/test-ir_encoding_methods.cpp`), exception handling for `msgpack::unpack` can be omitted because the Catch2 testing framework captures exceptions if they occur.

Applied to files:

  • components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp
📚 Learning: 2025-02-10T22:36:08.496Z
Learnt from: davemarco
Repo: y-scope/clp PR: 646
File: components/core/src/clp/streaming_archive/single_file_archive/writer.cpp:161-171
Timestamp: 2025-02-10T22:36:08.496Z
Learning: When using std::array, size checking between arrays is enforced at compile-time through the type system, making additional static_assert size checks redundant.

Applied to files:

  • components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp
  • components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
🧬 Code graph analysis (1)
components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp (2)
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (2)
  • parse_timestamp (248-756)
  • parse_timestamp (248-252)
components/core/src/clp_s/timestamp_parser/TimestampParser.hpp (1)
  • parse_timestamp (85-89)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: package-image
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: build-macos (macos-15, false)
  • GitHub Check: build-macos (macos-14, false)
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: lint-check (macos-15)

Comment thread components/core/src/clp_s/timestamp_parser/TimestampParser.cpp

@LinZhihao-723 LinZhihao-723 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation lgtm in general. Had a few style-wise comments.
Will review test cases shortly.

case 'L':
case 'C':
case 'N':
case 'p': {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add an inline comment to explain which case it is like we did in the previous PR?

Comment on lines 710 to 713
// Do not allow trailing unmatched content.
if (pattern_idx != pattern.size() || timestamp_idx != timestamp.size()) {
return ErrorCode{ErrorCodeEnum::IncompatibleTimestampPattern};
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we move this before any format changes?

Comment thread components/core/src/clp_s/timestamp_parser/TimestampParser.cpp Outdated
number_type_representation = true;
break;
}
case 'L': {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to confirm: For L, C and N, we will allow negative numbers to indicate timestamps before Unix epoch, right?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's correct.

std::string_view str,
size_t max_num_digits
) -> ystdlib::error_handling::Result<std::pair<int, size_t>> {
constexpr int cTen{10};

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we return int64_t since it will be converted to nanosec through multiplication eventually? Making two new helpers return the same type would also lead to less confusion...

Comment on lines +91 to +92
constexpr std::array cPowersOfTen
= {1, 10, 100, 1000, 10'000, 100'000, 1'000'000, 10'000'000, 100'000'000, 1'000'000'000};

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we explicitly mark the type as int64_t?

Comment on lines +658 to +662
parsed_epoch_nanoseconds = number
* cPowersOfTen.at(
cNumNanosecondPrecisionSubsecondDigits
- cNumMicrosecondPrecisionSubsecondDigits
);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw this factor can be made as a constexpr:

Suggested change
parsed_epoch_nanoseconds = number
* cPowersOfTen.at(
cNumNanosecondPrecisionSubsecondDigits
- cNumMicrosecondPrecisionSubsecondDigits
);
constexpr auto cFactor{cPowersOfTen
[cNumNanosecondPrecisionSubsecondDigits
- cNumMicrosecondPrecisionSubsecondDigits]};
parsed_epoch_nanoseconds = number * cFactor;

@LinZhihao-723 LinZhihao-723 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overall test cases also lgtm.
Will approve once all comments are resolved.

Comment thread components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp Outdated
Comment thread components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp Outdated
Comment thread components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp Outdated
Comment thread components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp Outdated
assert_specifier_accepts_valid_content('N', negative_epoch_timestamps);
}

std::vector<ExpectedParsingResult> const expected_parsing_results{

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this vector inside the section? Or do u plan to reuse it in other test sections?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was undecided -- depends on whether we also want to test round-tripping and accuracy for deducing timestamp patterns based on a list of predefined patterns in the section below. I'm now leading towards testing those all at the same time though, because then we can also test that parsing using the list of predefined patterns (which may contain CAT sequences) yields the same result as parsing using the specified pattern (which does not contain CAT sequences).

I'll move it inside.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kk sg. We can move it out again in later PRs if needed.

Comment thread components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp Outdated
auto const timestamp{fmt::format("{}a", day_in_week_timestamp)};
auto const result{parse_timestamp(timestamp, "\\d \\aa", generated_pattern)};
auto const result{parse_timestamp(timestamp, cPattern, generated_pattern)};
REQUIRE(false == result.has_error());

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
REQUIRE(false == result.has_error());
REQUIRE_FALSE(result.has_error());

Comment thread components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp Outdated
Comment thread components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp Outdated

/**
* @param num_digits
* @return All of the padded numbers with `num_digits` digits having a single unique digit.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @return All of the padded numbers with `num_digits` digits having a single unique digit.
* @return A vector containing all digit numbers (1 to 9) padded to the specified `num_digits`
* length with the digit.

Nit: u can apply it if u think this description is better

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably won't use this exact phrasing, but I'll incorporate the "A vector of" part.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (2)

171-200: CRITICAL: Guard against overflow in convert_positive_bounded_variable_length_string_prefix_to_number.

This function performs converted_value *= cTen and converted_value += ... without checking for overflow. Large inputs can overflow int64_t before validation, causing undefined behaviour. This issue was flagged in a previous review with detailed fix suggestions.

Please accumulate the magnitude in an unsigned type with explicit std::numeric_limits guards and convert back to int64_t only after verifying the value is representable.


202-244: CRITICAL: Guard against overflow in convert_variable_length_string_prefix_to_number.

This function performs converted_value *= cTen and converted_value += ... without checking for overflow. Inputs such as "99999999999999999999" or "-9223372036854775808" can overflow before rejection, yielding undefined behaviour. This issue was flagged in a previous review with detailed fix suggestions.

Please use unsigned arithmetic with explicit magnitude checks for both positive and negative values, handling the special case of INT64_MIN separately.

components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp (1)

182-184: Consider using REQUIRE_FALSE for consistency.

Similar to earlier test assertions, consider using REQUIRE_FALSE(result.has_error()) instead of the double-negative for better readability.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 64ed11b and 9558d74.

📒 Files selected for processing (2)
  • components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (8 hunks)
  • components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp (6 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
  • components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp
🧠 Learnings (7)
📚 Learning: 2024-10-13T09:29:56.553Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 557
File: components/core/tests/test-ir_encoding_methods.cpp:1283-1286
Timestamp: 2024-10-13T09:29:56.553Z
Learning: `clp::ffi::SchemaTree::Node::id_t` is of type `uint32_t`, so adding `1` to `INT32_MAX` is safe and does not cause integer overflow.

Applied to files:

  • components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
📚 Learning: 2025-09-11T13:15:39.930Z
Learnt from: anlowee
Repo: y-scope/clp PR: 1176
File: components/core/src/clp_s/ColumnWriter.cpp:67-72
Timestamp: 2025-09-11T13:15:39.930Z
Learning: In components/core/src/clp_s/ColumnWriter.cpp, when using sizeof with types like variable_dictionary_id_t, always use the fully qualified namespace (clp::variable_dictionary_id_t) to match the variable declaration and maintain consistency with other column writers in the codebase.

Applied to files:

  • components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
📚 Learning: 2024-11-26T19:12:43.982Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 486
File: components/core/tests/test-error_handling.cpp:37-38
Timestamp: 2024-11-26T19:12:43.982Z
Learning: In the CLP project, C++20 is used, allowing the utilization of C++20 features such as class template argument deduction (CTAD) with `std::array`.

Applied to files:

  • components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
  • components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp
📚 Learning: 2025-01-30T19:26:33.869Z
Learnt from: davemarco
Repo: y-scope/clp PR: 700
File: components/core/src/clp/streaming_archive/ArchiveMetadata.hpp:153-155
Timestamp: 2025-01-30T19:26:33.869Z
Learning: When working with constexpr strings (string literals with static storage duration), std::string_view is the preferred choice for member variables as it's more efficient and safe, avoiding unnecessary memory allocations.

Applied to files:

  • components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
  • components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp
📚 Learning: 2025-02-10T22:36:08.496Z
Learnt from: davemarco
Repo: y-scope/clp PR: 646
File: components/core/src/clp/streaming_archive/single_file_archive/writer.cpp:161-171
Timestamp: 2025-02-10T22:36:08.496Z
Learning: When using std::array, size checking between arrays is enforced at compile-time through the type system, making additional static_assert size checks redundant.

Applied to files:

  • components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
  • components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp
📚 Learning: 2024-10-14T03:45:21.323Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 558
File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:331-335
Timestamp: 2024-10-14T03:45:21.323Z
Learning: When reviewing code, ensure that suggestions about unnecessary use of `std::move` with `std::shared_ptr` are only made if `std::move` is actually used in the code.

Applied to files:

  • components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp
📚 Learning: 2024-11-01T03:26:26.386Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 570
File: components/core/tests/test-ir_encoding_methods.cpp:376-399
Timestamp: 2024-11-01T03:26:26.386Z
Learning: In the test code (`components/core/tests/test-ir_encoding_methods.cpp`), exception handling for `msgpack::unpack` can be omitted because the Catch2 testing framework captures exceptions if they occur.

Applied to files:

  • components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp
🧬 Code graph analysis (1)
components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp (2)
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (2)
  • parse_timestamp (248-753)
  • parse_timestamp (248-252)
components/core/src/clp_s/timestamp_parser/TimestampParser.hpp (1)
  • parse_timestamp (85-89)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: build (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: build-macos (macos-14, true)
  • GitHub Check: build-macos (macos-15, false)
  • GitHub Check: build-macos (macos-14, false)
  • GitHub Check: build-macos (macos-15, true)
🔇 Additional comments (17)
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (12)

33-46: LGTM - Time component constants are well-defined.

The constants properly define bounds for hour, minute, second, and subsecond components. The inclusion of 60 as the maximum second value correctly accommodates leap seconds.


89-92: LGTM - Constants for AM/PM and powers of ten are properly defined.

The arrays are well-structured and use appropriate types (string_view for AM/PM, int64_t for powers of ten).


259-271: LGTM - Parsing state variables are appropriately declared and initialized.

The new variables properly track time components, AM/PM indicator, epoch representation, and clock format, supporting the extended parsing capabilities.


415-423: LGTM - AM/PM parsing is correctly implemented.

The case properly matches and stores the AM/PM indicator for use with 12-hour clock formats.


424-509: LGTM - Hour parsing for 24-hour and 12-hour formats is correctly implemented.

All four hour format specifiers (H, k, I, l) properly validate field length, parse with appropriate padding, enforce correct ranges, and set the necessary flags for clock format tracking.


510-547: LGTM - Minute and second parsing is correctly implemented.

Both format specifiers properly validate field length and ranges, with the second parser correctly accommodating leap seconds (maximum value 60).


548-620: LGTM - Subsecond parsing for all precision levels is correctly implemented.

The fixed-length cases (3, 6, 9-digit) properly scale values to nanoseconds using the powers-of-ten array. The variable-length case 'T' correctly rejects trailing zeros and applies appropriate scaling based on the number of digits consumed.


621-673: LGTM - Epoch format parsing is correctly implemented.

All four epoch format specifiers (E, L, C, N) properly parse variable-length signed integers, apply correct scaling factors to convert to nanoseconds, and support negative values for pre-epoch timestamps.


706-710: LGTM - Validation ensures 12-hour clock and AM/PM consistency.

This check correctly enforces that 12-hour clock formats must have an AM/PM indicator and 24-hour formats must not, preventing malformed timestamp patterns.


712-720: LGTM - Epoch timestamp handling with subseconds is correctly implemented.

The logic properly combines epoch values with subsecond components, correctly subtracting subseconds for negative epochs and adding them for positive epochs.


722-725: LGTM - 12-hour to 24-hour clock conversion is correct.

The formula properly converts 12-hour format with AM/PM to 24-hour format, correctly handling edge cases like 12 AM (midnight) and 12 PM (noon).


732-735: LGTM - Time point construction properly combines date and time components.

The implementation correctly constructs the final time point by adding all parsed time components (hours, minutes, seconds, subsecond nanoseconds) to the date, using appropriate chrono duration types.

components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp (5)

18-31: LGTM - Test data structure is appropriately defined.

The ExpectedParsingResult struct provides a clear structure for organizing test cases with timestamp, pattern, and expected epoch value.


53-70: LGTM - Test helper function declarations are clear.

The function declarations properly use [[nodiscard]] and have clear documentation explaining their purpose and return values.


98-114: LGTM - Test helper implementations correctly generate test data.

Both functions efficiently generate the expected test data patterns: triangles of repeated digits for variable-length testing and padded single-digit sequences for fixed-length testing.


186-250: LGTM - Comprehensive test coverage for new format specifiers.

The tests thoroughly exercise all new format specifiers including:

  • 24-hour and 12-hour clock formats with various padding
  • AM/PM indicators with both zero-padded and space-padded hours
  • Minutes and seconds (including leap seconds)
  • All subsecond precisions and variable-length fractional seconds
  • All epoch formats with both positive and negative values

Consider using REQUIRE_FALSE at lines 207-208 for consistency with test idioms, as mentioned in past review comments.


252-311: LGTM - Excellent accuracy validation with comprehensive real-world test cases.

The test section provides thorough validation of parsing accuracy across diverse timestamp formats including:

  • Multiple date and time separators
  • Various subsecond precisions (milliseconds, microseconds, nanoseconds)
  • Month name formats
  • Day of week validation
  • Epoch formats with subseconds
  • Variable-length fractional seconds

Each test case properly validates both the parsed epoch value and the returned pattern.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (1)
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (1)

202-244: Address the overflow concern from the previous review.

This function has the same overflow vulnerability as noted in the past review comment. Please apply the detailed mitigation suggested there, using unsigned accumulation with std::numeric_limits guards for both positive and negative values.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7f41e7d and eaf6825.

📒 Files selected for processing (1)
  • components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (8 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
🧠 Learnings (5)
📚 Learning: 2024-10-13T09:29:56.553Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 557
File: components/core/tests/test-ir_encoding_methods.cpp:1283-1286
Timestamp: 2024-10-13T09:29:56.553Z
Learning: `clp::ffi::SchemaTree::Node::id_t` is of type `uint32_t`, so adding `1` to `INT32_MAX` is safe and does not cause integer overflow.

Applied to files:

  • components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
📚 Learning: 2025-09-11T13:15:39.930Z
Learnt from: anlowee
Repo: y-scope/clp PR: 1176
File: components/core/src/clp_s/ColumnWriter.cpp:67-72
Timestamp: 2025-09-11T13:15:39.930Z
Learning: In components/core/src/clp_s/ColumnWriter.cpp, when using sizeof with types like variable_dictionary_id_t, always use the fully qualified namespace (clp::variable_dictionary_id_t) to match the variable declaration and maintain consistency with other column writers in the codebase.

Applied to files:

  • components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
📚 Learning: 2024-11-26T19:12:43.982Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 486
File: components/core/tests/test-error_handling.cpp:37-38
Timestamp: 2024-11-26T19:12:43.982Z
Learning: In the CLP project, C++20 is used, allowing the utilization of C++20 features such as class template argument deduction (CTAD) with `std::array`.

Applied to files:

  • components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
📚 Learning: 2025-01-30T19:26:33.869Z
Learnt from: davemarco
Repo: y-scope/clp PR: 700
File: components/core/src/clp/streaming_archive/ArchiveMetadata.hpp:153-155
Timestamp: 2025-01-30T19:26:33.869Z
Learning: When working with constexpr strings (string literals with static storage duration), std::string_view is the preferred choice for member variables as it's more efficient and safe, avoiding unnecessary memory allocations.

Applied to files:

  • components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
📚 Learning: 2025-02-10T22:36:08.496Z
Learnt from: davemarco
Repo: y-scope/clp PR: 646
File: components/core/src/clp/streaming_archive/single_file_archive/writer.cpp:161-171
Timestamp: 2025-02-10T22:36:08.496Z
Learning: When using std::array, size checking between arrays is enforced at compile-time through the type system, making additional static_assert size checks redundant.

Applied to files:

  • components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: package-image
  • GitHub Check: build-macos (macos-14, false)
  • GitHub Check: antlr-code-committed (macos-15)
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (6)
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (6)

33-92: LGTM!

The time component bounds and subsecond precision constants are well-defined. The inclusion of 60 as the maximum second value correctly accommodates leap seconds, and the powers of ten array provides the necessary scaling factors for subsecond conversions.


117-141: LGTM!

The function declarations for variable-length numeric prefix parsing are well-documented with clear parameter descriptions and error code specifications.


415-547: LGTM!

The time component parsing (AM/PM, hours, minutes, seconds) follows a consistent and robust pattern: bounds checking before substring extraction, range validation after parsing, and proper index advancement. The 12-hour clock flag is correctly set for cases 'I' and 'l'.


604-620: LGTM with a note on safety.

The variable-length fractional seconds parsing correctly uses the bounded helper function and validates that the last digit is not '0' to enforce the no-trailing-zeroes requirement. The cast to int at line 618 should be safe given the maximum value constraints, but as noted in past comments, using int64_t for parsed_subsecond_nanoseconds throughout would eliminate the cast and improve consistency.

Based on past review comments.


706-725: LGTM!

The validation logic correctly enforces that 12-hour clock patterns must include AM/PM and vice versa. The epoch time calculation properly handles negative timestamps by subtracting the subsecond component, and the 12-hour to 24-hour conversion using modulo arithmetic is accurate.


727-752: LGTM!

The final time-point construction correctly combines the parsed date and time components, validates the day of week when present, and converts to epoch nanoseconds. The use of the date library's type-safe operations ensures proper calendar arithmetic.

Comment thread components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
Comment thread components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
Comment thread components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
Comment thread components/core/src/clp_s/timestamp_parser/TimestampParser.cpp

@LinZhihao-723 LinZhihao-723 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the PR title, how about the following to be more explicit about what's been implemented?

feat(clp-s::timestamp_parser): Add implementation for day time parsing.

@gibber9809

Copy link
Copy Markdown
Contributor Author

For the PR title, how about the following to be more explicit about what's been implemented?

feat(clp-s::timestamp_parser): Add implementation for day time parsing.

How about

feat(clp-s::timestamp_parser): Add support for parsing date-time and epoch timestamps without timezones.

@LinZhihao-723

Copy link
Copy Markdown
Member

For the PR title, how about the following to be more explicit about what's been implemented?

feat(clp-s::timestamp_parser): Add implementation for day time parsing.

How about

feat(clp-s::timestamp_parser): Add support for parsing date-time and epoch timestamps without timezones.

sgtm

@gibber9809 gibber9809 changed the title feat(clp-s::timestamp_parser): Continue partial implementation of timestamp parsing utility. feat(clp-s::timestamp_parser): Add support for parsing date-time and epoch timestamps without timezones. Nov 10, 2025
@gibber9809 gibber9809 closed this Nov 10, 2025
@gibber9809 gibber9809 reopened this Nov 10, 2025
@gibber9809

Copy link
Copy Markdown
Contributor Author

Accidentally hit close while checking on the CI status in the mobile client.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants