Skip to content

feat(clp-s::timestamp_parser): Make control characters and lone double quotes illegal in timestamp patterns.#1723

Merged
gibber9809 merged 11 commits into
y-scope:mainfrom
gibber9809:timestamp-parser-ban-control-chars-and-escape-sequences
Dec 4, 2025
Merged

feat(clp-s::timestamp_parser): Make control characters and lone double quotes illegal in timestamp patterns.#1723
gibber9809 merged 11 commits into
y-scope:mainfrom
gibber9809:timestamp-parser-ban-control-chars-and-escape-sequences

Conversation

@gibber9809

@gibber9809 gibber9809 commented Dec 3, 2025

Copy link
Copy Markdown
Contributor

Description

This PR is the last in the series of PRs readying clp_s::timestamp_parser for use in clp-s (#1708, #1673, #1646, #1626, #1599, #1564, and #1385).

This PR makes several characters and escape sequences illegal in timestamp patterns, as well as fixes parsing and marshalling behaviour for the \\ escape sequence, in order to put timestamp patterns in a state where:

  1. We can use timestamp patterns to parse and marshal JSON literals and raw UTF-8, using a flag to indicate the content type
  2. We can add support for JSON escape sequences in a backwards compatible way over time

The banned characters and escape sequences are:

  1. Control characters (\x00 - \x1F)
  2. Double quote " (unless it is both the first and last character in a timestamp pattern, then the start and end quote are allowed)
  3. All JSON escape sequences except \\ (\uXXXX, \b, \f, \n, \r, \t)

Section 2.4 of the timestamp format doc has been updated to reflect these changes.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • Added test case to validate that illegal characters and escape sequences are not allowed in timestamp patterns
  • Added test case to validate that timestamp patterns containing JSON escape sequences can be used to parse and marshal JSON literals
  • Added test case to validate that JSON escape sequences work as expected for parsing both JSON literals and raw UTF-8

Summary by CodeRabbit

  • New Features

    • JSON-style escaped backslashes supported when parsing and marshaling timestamps.
    • Dynamic generation of timestamp patterns for variable components.
    • New parsing-mode flag to toggle JSON-literal vs raw handling and small public helpers to validate escapes/characters.
  • Bug Fixes

    • Stronger validation rejecting unsupported escape sequences and illegal characters with distinct error classifications.
    • Safer string handling in pattern processing.
  • Documentation

    • Clarifies JSON-literal behaviour and new error cases.
  • Tests

    • New and expanded tests for JSON escapes and invalid-pattern cases.

✏️ Tip: You can customize this high-level summary in your review settings.

@gibber9809 gibber9809 requested a review from a team as a code owner December 3, 2025 18:25
@coderabbitai

coderabbitai Bot commented Dec 3, 2025

Copy link
Copy Markdown
Contributor

Walkthrough

Adds JSON-style escape handling and stricter pattern validation to the timestamp parser, introduces three exported predicate helpers and dynamic pattern assembly, threads an is_json_literal flag through parse/search/marshal APIs, extends error codes/messages, and updates tests to exercise JSON-escaped timestamps and invalid patterns.

Changes

Cohort / File(s) Change Summary
Core parser implementation
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
Adds cJsonEscapedBackslash; implements three helper predicates (is_unsupported_json_escape_sequence, is_unsupported_character, is_repeatable_escape_sequence); threads is_json_literal through parse/marshal paths; marshals JSON-escaped backslashes when appropriate; replaces some substr with std::string_view; adds CatSequenceReplacement mechanism and dynamic pattern assembly; updates error signaling to use InvalidEscapeSequence / InvalidCharacter.
Public API & headers
components/core/src/clp_s/timestamp_parser/TimestampParser.hpp
Updates signatures to add bool is_json_literal to parse_timestamp and search_known_timestamp_patterns; documents JSON-literal behaviour and new escape/character validity; declares the three new exported helper predicate functions.
Error codes (declaration)
components/core/src/clp_s/timestamp_parser/ErrorCode.hpp
Adds two enum values to ErrorCodeEnum: InvalidEscapeSequence and InvalidCharacter.
Error messages (mapping)
components/core/src/clp_s/timestamp_parser/ErrorCode.cpp
Maps the new enum values to human-readable error messages in the error-category message switch.
Tests
components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp
Updates all parse_timestamp call sites to pass the new is_json_literal boolean (mostly false); adds tests for illegal pattern templates and escape-sequence handling; adds end-to-end JSON-escape parsing/marshalling tests; includes error-code header.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Inspect correctness of the three new predicate helpers against expected JSON escape and pattern rules.
  • Verify is_json_literal is propagated correctly through search_known_timestamp_patternsparse_timestamp → marshal paths and that JSON-escaped backslashes are handled consistently.
  • Review dynamic pattern generation (CatSequenceReplacement / cat_sequence_replacements) for correctness and safe std::string_view lifetimes.
  • Confirm new error codes (InvalidEscapeSequence, InvalidCharacter) are used consistently and that tests cover error paths.

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Title check ⚠️ Warning The title focuses on control characters and lone double quotes, but the PR primarily addresses JSON escape sequence support, pattern validation, and dynamic pattern generation across the codebase. Consider updating the title to reflect the main scope: 'feat(clp-s::timestamp_parser): Add JSON escape sequence support and strengthen pattern validation' or use the suggested alternative that covers both control characters and escape sequences.
Docstring Coverage ⚠️ Warning Docstring coverage is 31.25% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 69c88c3 and 7931dc7.

📒 Files selected for processing (1)
  • components/core/src/clp_s/timestamp_parser/TimestampParser.hpp (5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp_s/timestamp_parser/TimestampParser.hpp
🧠 Learnings (4)
📚 Learning: 2024-10-07T21:16:41.660Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:769-779
Timestamp: 2024-10-07T21:16:41.660Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, when handling errors in `parse_from_ir`, prefer to maintain the current mix of try-catch and if-statements because specific messages are returned back up in some cases.

Applied to files:

  • components/core/src/clp_s/timestamp_parser/TimestampParser.hpp
📚 Learning: 2024-10-07T21:35:04.362Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:735-794
Timestamp: 2024-10-07T21:35:04.362Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir` method, encountering errors from `kv_log_event_result.error()` aside from `std::errc::no_message_available` and `std::errc::result_out_of_range` is anticipated behavior and does not require additional error handling or logging.

Applied to files:

  • components/core/src/clp_s/timestamp_parser/TimestampParser.hpp
📚 Learning: 2025-07-17T16:08:23.185Z
Learnt from: SharafMohamed
Repo: y-scope/clp PR: 1033
File: components/core/config/schemas.txt:42-43
Timestamp: 2025-07-17T16:08:23.185Z
Learning: In log-surgeon, regex patterns like `(?:\d{1,2}| \d)` are not supported. Log-surgeon has its own regex implementation with different capabilities compared to standard regex engines. Patterns like `[ 0-9]{2}` that allow character classes with spaces are valid in log-surgeon schemas.

Applied to files:

  • components/core/src/clp_s/timestamp_parser/TimestampParser.hpp
📚 Learning: 2025-07-17T16:02:57.688Z
Learnt from: SharafMohamed
Repo: y-scope/clp PR: 1033
File: components/core/config/schemas.txt:96-97
Timestamp: 2025-07-17T16:02:57.688Z
Learning: In log-surgeon, the `.` character is treated as a literal non-delimiter character rather than a regex wildcard that matches any character. This means patterns like `.*` in log-surgeon schemas match sequences of non-delimiter characters, not arbitrary characters like in standard regex.

Applied to files:

  • components/core/src/clp_s/timestamp_parser/TimestampParser.hpp
🧬 Code graph analysis (1)
components/core/src/clp_s/timestamp_parser/TimestampParser.hpp (1)
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (1)
  • nodiscard (1838-1851)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: package-image
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: build-macos (macos-15, true)
  • GitHub Check: build (ubuntu-24.04)
  • GitHub Check: build (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: lint-check (macos-15)
🔇 Additional comments (4)
components/core/src/clp_s/timestamp_parser/TimestampParser.hpp (4)

24-31: Error-code docs for TimestampPattern::create look consistent.

The added bullets for InvalidEscapeSequence and InvalidCharacter clearly describe the new validation failure modes for patterns and align with the later description of illegal characters/escapes; no changes needed here.


202-211: Pattern-level constraints on escapes and characters are clearly documented.

This block now spells out that only \\ is a valid escape and that patterns must not contain ASCII control characters or lone double quotes (outside JSON string-literal quoting), which matches the PR’s intent and should make the error semantics much clearer to callers.


239-247: marshal_timestamp documentation aligns with JSON-literal focus.

Updating the description to “Marshals a timestamp as a JSON literal according to a timestamp pattern” accurately reflects the intended behaviour and complements the new is_json_literal parsing semantics.


253-270: search_known_timestamp_patterns parameter threading and docs are consistent.

Threading is_json_literal through this helper and documenting it identically to parse_timestamp keeps the API surface coherent, and placing generated_pattern last makes it clear it is an output buffer.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@LinZhihao-723 LinZhihao-723 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall lgtm.

for (size_t pattern_idx{start_idx}; pattern_idx < end_idx; ++pattern_idx) {
auto const cur_format_specifier{pattern.at(pattern_idx)};
if (is_unsupported_character(cur_format_specifier)) {
return ErrorCode{ErrorCodeEnum::InvalidTimestampPattern};

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we create two special error code for unsupported control characters and unsupported escape sequence?

Comment on lines 1679 to 1762
if ('\\' == timestamp[timestamp_idx]) {
size_t const field_length{is_json_literal ? 2ULL : 1ULL};
if (timestamp_idx + field_length > timestamp.size()) {
return ErrorCode{ErrorCodeEnum::IncompatibleTimestampPattern};
}

if (false == is_json_literal && '\\' == timestamp.at(timestamp_idx)) {
++timestamp_idx;
continue;
}
if (timestamp.substr(timestamp_idx).starts_with(cJsonEscapedBackslash)) {
timestamp_idx += cJsonEscapedBackslash.size();
continue;
}
return ErrorCode{ErrorCodeEnum::IncompatibleTimestampPattern};

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about rewrite in this way to make it cleaner to understand:

                auto const expected_backslash{is_json_literal ? cJsonEscapedBackslash : "\\"};
                if (false == timestamp.substr(timestamp_idx).starts_with(expected_backslash)) {
                    return ErrorCode{ErrorCodeEnum::IncompatibleTimestampPattern};
                }
                timestamp_idx += expected_backslash.size();
                break;

start_with should handle the case where the string view is shorter than expected_backslash.

Comment on lines +222 to +223
std::string& generated_pattern,
bool is_json_literal

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should swap them since generated_pattern is an output param

};
for (auto const& illegal_timestamp_pattern_template : illegal_timestamp_pattern_templates) {
auto const result{TimestampPattern::create(illegal_timestamp_pattern_template)};
REQUIRE(result.has_error());

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should assert the returned error code if we plan to distinguish invalid/unsupported chars from a general error.

Comment thread components/core/src/clp_s/timestamp_parser/TimestampParser.cpp Outdated
* @param c
* @return Whether `c` is an unsupported character.
*/
[[nodiscard]] auto is_unsupported_character(char c) -> bool;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Shall we be explicit to jsut call it "control characters"?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing is that it is control characters + ", so I wanted a more generic name.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure sure, make sense

* - \P Unknown-precision epoch time -- resolves to \E, \L, \C, or \N based on a heuristic.
* - \O{...} One of several literal characters, described by content between {}.
* - \s Generic zero-padded second (00-60) -- resolves to \S or \J.
*

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we document our limitations on those unsupported chars in this docstring?

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7aa82b7 and 9d45fe1.

📒 Files selected for processing (5)
  • components/core/src/clp_s/timestamp_parser/ErrorCode.cpp (1 hunks)
  • components/core/src/clp_s/timestamp_parser/ErrorCode.hpp (1 hunks)
  • components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (14 hunks)
  • components/core/src/clp_s/timestamp_parser/TimestampParser.hpp (5 hunks)
  • components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp (11 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp_s/timestamp_parser/ErrorCode.cpp
  • components/core/src/clp_s/timestamp_parser/ErrorCode.hpp
  • components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp
  • components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
  • components/core/src/clp_s/timestamp_parser/TimestampParser.hpp
🧠 Learnings (6)
📚 Learning: 2024-11-26T19:12:09.102Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 486
File: components/core/src/clp/error_handling/ErrorCode.hpp:131-134
Timestamp: 2024-11-26T19:12:09.102Z
Learning: In the error handling system (`components/core/src/clp/error_handling/ErrorCode.hpp`), the `ErrorCode` class is intentionally designed to wrap the `ErrorCodeEnum`, not to use the enum directly. This wrapper provides a general interface for adding customized error enums.

Applied to files:

  • components/core/src/clp_s/timestamp_parser/ErrorCode.cpp
  • components/core/src/clp_s/timestamp_parser/ErrorCode.hpp
📚 Learning: 2024-11-26T19:10:05.051Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 486
File: components/core/src/clp/error_handling/ErrorCode.hpp:34-34
Timestamp: 2024-11-26T19:10:05.051Z
Learning: In the `ErrorCategory` template class in `ErrorCode.hpp`, the `name()` and `message(ErrorCodeEnum)` methods should not be declared as pure virtual functions because the class relies on template specializations, and making them pure virtual would make the class abstract, which is not intended.

Applied to files:

  • components/core/src/clp_s/timestamp_parser/ErrorCode.cpp
📚 Learning: 2024-10-07T21:16:41.660Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:769-779
Timestamp: 2024-10-07T21:16:41.660Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, when handling errors in `parse_from_ir`, prefer to maintain the current mix of try-catch and if-statements because specific messages are returned back up in some cases.

Applied to files:

  • components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp
  • components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
  • components/core/src/clp_s/timestamp_parser/TimestampParser.hpp
📚 Learning: 2025-01-29T22:16:52.665Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 700
File: components/core/src/clp/streaming_archive/ArchiveMetadata.cpp:29-46
Timestamp: 2025-01-29T22:16:52.665Z
Learning: In C++ code, when throwing exceptions, prefer throwing the original return code (RC) from lower-level operations instead of a generic error code to preserve error context and aid in debugging.

Applied to files:

  • components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp
📚 Learning: 2024-10-07T21:35:04.362Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:735-794
Timestamp: 2024-10-07T21:35:04.362Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir` method, encountering errors from `kv_log_event_result.error()` aside from `std::errc::no_message_available` and `std::errc::result_out_of_range` is anticipated behavior and does not require additional error handling or logging.

Applied to files:

  • components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp
  • components/core/src/clp_s/timestamp_parser/TimestampParser.hpp
📚 Learning: 2024-11-15T03:15:45.919Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 593
File: components/core/tests/test-NetworkReader.cpp:216-219
Timestamp: 2024-11-15T03:15:45.919Z
Learning: In the `network_reader_with_valid_http_header_kv_pairs` test case in `components/core/tests/test-NetworkReader.cpp`, additional error handling for JSON parsing failures is not necessary, as the current error message is considered sufficient.

Applied to files:

  • components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: package-image
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: build (macos-15)
  • GitHub Check: build-macos (macos-15, false)
  • GitHub Check: build (ubuntu-24.04)
  • GitHub Check: build-macos (macos-15, true)
🔇 Additional comments (14)
components/core/src/clp_s/timestamp_parser/ErrorCode.cpp (1)

28-31: LGTM!

The new error code messages are clear and appropriately describe the error conditions. The integration with the existing switch statement is correct.

components/core/src/clp_s/timestamp_parser/ErrorCode.hpp (1)

13-15: LGTM!

The new error codes InvalidEscapeSequence and InvalidCharacter are appropriately added to the enum, supporting the new validation logic for timestamp patterns.

components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp (3)

146-179: LGTM! Comprehensive test coverage for illegal pattern validation.

The test cases thoroughly cover:

  • Control characters (\x00, \x01, \x1f)
  • Double quotes in invalid positions
  • All unsupported JSON escape sequences (\b, \f, \n, \r, \t, \u, \uXXXX)

The error code assertions are correctly implemented as requested in the past review.


181-198: LGTM! Well-structured tests for JSON-literal vs raw UTF-8 backslash handling.

The test cases correctly verify:

  • is_json_literal=false: pattern \\ matches input \ (single backslash)
  • is_json_literal=true: pattern \\ matches input \\ (escaped backslash)

656-686: LGTM! Good integration test for JSON escape sequence round-trip.

The test verifies that timestamps with JSON-escaped backslashes can be parsed with is_json_literal=true and that marshalling produces the same output, ensuring bidirectional consistency.

components/core/src/clp_s/timestamp_parser/TimestampParser.hpp (2)

202-211: LGTM!

The documentation clearly specifies the allowed escape sequence (\\) and illegal characters (control characters \x00-\x1F and double quotes except as pattern delimiters).


232-237: LGTM!

The new is_json_literal parameter is consistently added to both parse_timestamp and search_known_timestamp_patterns, enabling proper handling of JSON-escaped content during parsing.

Also applies to: 265-270

components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (7)

121-122: LGTM!

The cJsonEscapedBackslash constant is appropriately defined for representing a JSON-escaped backslash.


884-886: Verify signed char behavior for control character detection.

The condition '\x00' <= c && c <= '\x1f' may behave unexpectedly on platforms with signed char. If c has a value >= 0x80 (extended ASCII), c would be negative as a signed char, and '\x00' <= c would be false, which is actually the desired behavior. However, for clarity and portability, consider casting to unsigned char.

 auto is_unsupported_character(char c) -> bool {
-    return '"' == c || ('\x00' <= c && c <= '\x1f');
+    auto const uc = static_cast<unsigned char>(c);
+    return '"' == c || uc <= '\x1f';
 }

913-941: LGTM!

The pattern validation logic is well-structured:

  1. Correctly handles quoted patterns (allowing " only as first/last character)
  2. Validates characters before checking escape sequences
  3. Properly rejects unsupported JSON escape sequences while allowing \\
  4. Allows repeated use of \O{} and \\ escape sequences

759-764: LGTM!

The marshal functions correctly:

  1. Output JSON-escaped backslashes (\\) for the \\ escape sequence
  2. Return InvalidEscapeSequence for unrecognized escape sequences, providing more specific error information

Also applies to: 858-863


1748-1757: LGTM!

The backslash handling correctly differentiates between JSON literals and raw UTF-8:

  • For JSON literals (is_json_literal=true): expects \\ in the timestamp
  • For raw UTF-8 (is_json_literal=false): expects \ in the timestamp

The use of starts_with properly handles edge cases. The coding guideline (false == <expression>) is correctly followed.


1102-1105: LGTM!

Using std::string_view{m_pattern}.substr() instead of m_pattern.substr() avoids potential temporary allocations and is more efficient for these read-only operations.

Also applies to: 1117-1120, 1134-1137, 1149-1152


1838-1851: LGTM!

The is_json_literal parameter is correctly propagated to parse_timestamp, ensuring consistent JSON-literal handling throughout the timestamp pattern search.

Comment thread components/core/src/clp_s/timestamp_parser/TimestampParser.hpp

@LinZhihao-723 LinZhihao-723 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slightly polished the docstring otherwise we should be ready to go.

Comment thread components/core/src/clp_s/timestamp_parser/TimestampParser.hpp Outdated
Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
LinZhihao-723
LinZhihao-723 previously approved these changes Dec 4, 2025

@LinZhihao-723 LinZhihao-723 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the PR title, how about:

feat(clp-s::timestamp_parser): Invalidate unsupported characters and escape sequences in timestamp patterns.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
components/core/src/clp_s/timestamp_parser/TimestampParser.hpp (1)

29-30: Fix the missing closing backtick in the documentation.

Line 30 is missing a closing backtick after pattern.

Apply this diff to fix the typo:

- * - ErrorCodeEnum::InvalidCharacter if `pattern contains an unsupported character.
+ * - ErrorCodeEnum::InvalidCharacter if `pattern` contains an unsupported character.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d81db82 and 69c88c3.

📒 Files selected for processing (1)
  • components/core/src/clp_s/timestamp_parser/TimestampParser.hpp (5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp_s/timestamp_parser/TimestampParser.hpp
🧠 Learnings (3)
📚 Learning: 2024-10-07T21:16:41.660Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:769-779
Timestamp: 2024-10-07T21:16:41.660Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, when handling errors in `parse_from_ir`, prefer to maintain the current mix of try-catch and if-statements because specific messages are returned back up in some cases.

Applied to files:

  • components/core/src/clp_s/timestamp_parser/TimestampParser.hpp
📚 Learning: 2025-07-17T16:08:23.185Z
Learnt from: SharafMohamed
Repo: y-scope/clp PR: 1033
File: components/core/config/schemas.txt:42-43
Timestamp: 2025-07-17T16:08:23.185Z
Learning: In log-surgeon, regex patterns like `(?:\d{1,2}| \d)` are not supported. Log-surgeon has its own regex implementation with different capabilities compared to standard regex engines. Patterns like `[ 0-9]{2}` that allow character classes with spaces are valid in log-surgeon schemas.

Applied to files:

  • components/core/src/clp_s/timestamp_parser/TimestampParser.hpp
📚 Learning: 2025-07-17T16:02:57.688Z
Learnt from: SharafMohamed
Repo: y-scope/clp PR: 1033
File: components/core/config/schemas.txt:96-97
Timestamp: 2025-07-17T16:02:57.688Z
Learning: In log-surgeon, the `.` character is treated as a literal non-delimiter character rather than a regex wildcard that matches any character. This means patterns like `.*` in log-surgeon schemas match sequences of non-delimiter characters, not arbitrary characters like in standard regex.

Applied to files:

  • components/core/src/clp_s/timestamp_parser/TimestampParser.hpp
🧬 Code graph analysis (1)
components/core/src/clp_s/timestamp_parser/TimestampParser.hpp (1)
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (1)
  • nodiscard (1838-1851)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
  • GitHub Check: build-macos (macos-15, false)
  • GitHub Check: build-macos (macos-15, true)
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: package-image
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: build (macos-15)
  • GitHub Check: build (ubuntu-24.04)
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: lint-check (macos-15)
🔇 Additional comments (4)
components/core/src/clp_s/timestamp_parser/TimestampParser.hpp (4)

232-237: LGTM! The new parameter is well-documented and appropriately positioned.

The addition of the is_json_literal parameter aligns with the PR objectives to support parsing both JSON literals and raw UTF-8 strings. The parameter is correctly documented and positioned before the output parameter.


230-230: LGTM! Error code documentation is complete.

The addition of ErrorCodeEnum::InvalidEscapeSequence to the documented error codes properly describes the new validation behaviour.


240-240: LGTM! Documentation accurately describes JSON literal marshalling.

The updated documentation clearly indicates that timestamps are marshalled as JSON literals, which aligns with the PR's goal of supporting JSON-literal handling.


265-270: LGTM! Signature update is consistent with the parsing function.

The addition of the is_json_literal parameter to search_known_timestamp_patterns is consistent with the corresponding change to parse_timestamp, and the implementation correctly propagates this flag through to the parsing logic.

Comment thread components/core/src/clp_s/timestamp_parser/TimestampParser.hpp Outdated
@gibber9809

Copy link
Copy Markdown
Contributor Author

For the PR title, how about:

feat(clp-s::timestamp_parser): Invalidate unsupported characters and escape sequences in timestamp patterns.

How about:

feat(clp-s::timestamp_parser): Make control characters and lone double quotes illegal in timestamp patterns.

Trying to be a bit more precise since technically the unsupported escape sequences were already illegal, we're just being a bit more defensive about them.

@gibber9809 gibber9809 changed the title feat(clp-s::timestamp_parser): Make unsupported characters and escape sequences illegal in timestamp patterns. feat(clp-s::timestamp_parser): Make control characters and lone double quotes illegal in timestamp patterns. Dec 4, 2025
@gibber9809 gibber9809 merged commit d6c2122 into y-scope:main Dec 4, 2025
27 checks passed
junhaoliao pushed a commit to junhaoliao/clp that referenced this pull request May 17, 2026
…e quotes illegal in timestamp patterns. (y-scope#1723)

Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants