Skip to content

fix(clp-s::log_converter): Make timezone offset optional in timezone regex (fixes #1952).#2134

Merged
gibber9809 merged 1 commit into
y-scope:mainfrom
gibber9809:add-logconverter-timestamp-format
Mar 26, 2026
Merged

fix(clp-s::log_converter): Make timezone offset optional in timezone regex (fixes #1952).#2134
gibber9809 merged 1 commit into
y-scope:mainfrom
gibber9809:add-logconverter-timestamp-format

Conversation

@gibber9809

@gibber9809 gibber9809 commented Mar 26, 2026

Copy link
Copy Markdown
Contributor

Description

This PR fixes an issue in log-converter where the timezone Z wasn't accepted as a valid timezone on its own since the timezone regex required a timezone offset in matching timezones.

The fix is simply to update the regex to make the offset part of the timezone optional.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  1. Processed the following dataset into log-converter:
2026-03-26T07:03:14,788Z INFO [main] One test line.
2026-03-26T07:03:17,222Z INFO [main] Another line.
  1. Ingested the resulting converted log into clp-s using the --timestamp-key timestamp option.
  2. Observed that the search query timestamp < timestamp("2026-03-26T07:03:17,222Z") returned the following result, as expected:
{"message":" INFO [main] One test line.\n","timestamp":"2026-03-26T07:03:14,788Z"}

Summary by CodeRabbit

Release Notes

  • Bug Fixes
    • Improved timestamp parsing flexibility in log conversion to better support varying fractional-seconds formats.

@gibber9809 gibber9809 requested a review from a team as a code owner March 26, 2026 14:37
@gibber9809 gibber9809 linked an issue Mar 26, 2026 that may be closed by this pull request
@coderabbitai

coderabbitai Bot commented Mar 26, 2026

Copy link
Copy Markdown
Contributor

Walkthrough

Modified the timestamp schema regex pattern in the log converter to make fractional seconds an optional component directly following the seconds field, rather than being conditionally tied to timezone or subsequent segments. This alters which timestamp formats are accepted during log parsing.

Changes

Cohort / File(s) Summary
Timestamp Schema Pattern
components/core/src/clp_s/log_converter/LogConverter.cpp
Updated cTimestampSchema regex to reposition the optional fractional-seconds component ([,\.:]\d{1,9}){0,1} from being dependent on timezone/follow-on alternatives to being a direct optional modifier after the seconds field in the non-timezone timestamp pattern.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: making timezone offset optional in the timezone regex to fix issue #1952, which aligns with the PR's core objective of accepting standalone 'Z' timezone designators.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/core/src/clp_s/log_converter/LogConverter.cpp`:
- Around line 32-33: Add a unit test that verifies the regex in LogConverter
(the multi-line raw string literal that matches timestamps, including the Z
timezone branch) correctly accepts and parses timestamps ending with a bare "Z"
(e.g. "2023-12-01T12:34:56Z"); call the same public method used in production
(e.g., LogConverter::convertLine or LogConverter::parseTimestamp) with a sample
log containing a Z-only timezone and assert the timestamp is
recognized/converted to UTC and no error is produced; place the test alongside
other LogConverter tests so future changes to the regex literal will be caught
by CI.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: dd60f952-fc94-48d8-a9c4-24efc06c7edb

📥 Commits

Reviewing files that changed from the base of the PR and between dd0630e and 953d99f.

📒 Files selected for processing (1)
  • components/core/src/clp_s/log_converter/LogConverter.cpp

Comment on lines +32 to +33
R"((Dec(ember){0,1}))[ /\-]\d{2,4}))[ T:][ 0-9]{2}:[ 0-9]{2}:[ 0-9]{2}([,\.:]\d{1,9}){0,1})"
R"(([ ]{0,1}(UTC){0,1}([\+\-]\d{2}(:{0,1}\d{2}){0,1}){0,1}Z{0,1}){0,1}))"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Add a regression test for Z-only timezone timestamps.

The case is manually validated in the PR, but it should be covered by automated tests to avoid future regex regressions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/core/src/clp_s/log_converter/LogConverter.cpp` around lines 32 -
33, Add a unit test that verifies the regex in LogConverter (the multi-line raw
string literal that matches timestamps, including the Z timezone branch)
correctly accepts and parses timestamps ending with a bare "Z" (e.g.
"2023-12-01T12:34:56Z"); call the same public method used in production (e.g.,
LogConverter::convertLine or LogConverter::parseTimestamp) with a sample log
containing a Z-only timezone and assert the timestamp is recognized/converted to
UTC and no error is produced; place the test alongside other LogConverter tests
so future changes to the regex literal will be caught by CI.

Comment on lines +32 to +33
R"((Dec(ember){0,1}))[ /\-]\d{2,4}))[ T:][ 0-9]{2}:[ 0-9]{2}:[ 0-9]{2}([,\.:]\d{1,9}){0,1})"
R"(([ ]{0,1}(UTC){0,1}([\+\-]\d{2}(:{0,1}\d{2}){0,1}){0,1}Z{0,1}){0,1}))"

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iiuc, Z is a short term for UTC and it should be mutually exclusive to the UTC offset. Does it make more sense to write in this way (or similar)?

Suggested change
R"((Dec(ember){0,1}))[ /\-]\d{2,4}))[ T:][ 0-9]{2}:[ 0-9]{2}:[ 0-9]{2}([,\.:]\d{1,9}){0,1})"
R"(([ ]{0,1}(UTC){0,1}([\+\-]\d{2}(:{0,1}\d{2}){0,1}){0,1}Z{0,1}){0,1}))"
R"((Dec(ember){0,1}))[ /\-]\d{2,4}))[ T:][ 0-9]{2}:[ 0-9]{2}:[ 0-9]{2})"
R"(([,\.:]\d{1,9}){0,1}(([ ]{0,1}(UTC){0,1}[\+\-]\d{2}(:{0,1}\d{2}))|(Z)){0,1}))"

@gibber9809 gibber9809 Mar 26, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In practice it isn't mutually exclusive. We see timezones like -00Z in the wild. (Edited to a more common case).

@gibber9809 gibber9809 merged commit 7eba9fe into y-scope:main Mar 26, 2026
30 of 32 checks passed
@junhaoliao junhaoliao added this to the March 2026 milestone Mar 26, 2026
junhaoliao pushed a commit to junhaoliao/clp that referenced this pull request May 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

log-converter: Some timezones not extracted correctly.

3 participants