Skip to content

feat(clp-s): Change behaviour for kv-ir ingestion to warn and continue on error#733

Merged
gibber9809 merged 1 commit into
y-scope:mainfrom
gibber9809:kvir-warn-on-error
Mar 11, 2025
Merged

feat(clp-s): Change behaviour for kv-ir ingestion to warn and continue on error#733
gibber9809 merged 1 commit into
y-scope:mainfrom
gibber9809:kvir-warn-on-error

Conversation

@gibber9809

@gibber9809 gibber9809 commented Feb 26, 2025

Copy link
Copy Markdown
Contributor

Description

Properly handling kv-ir requires us to support ingesting truncated kv-ir streams, which can come about when workers producing those streams crash.

This PR modifies ingestion behaviour to handle most kv-ir deserialization errors by logging a warning and continuing on to the next stream. The only exceptions are that

  1. If we can not open a stream at all then we treat that as an error and fail ingestion
  2. If we encounter a deserialization error but we are ingesting data from a network source whose reader has encountered a network error then the network error takes precedence and ingestion fails

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • Validated that ingestion succeeds with a warning when removing the zero byte at the end of an IR stream
  • Validated that ingestion succeeds with a warning when ingesting an IR stream that has been truncated in a random location (and that records that were not part of the truncated data appear in the archive)

Summary by CodeRabbit

  • Bug Fixes
    • Enhanced error handling during log processing, ensuring that less critical issues generate warnings instead of errors.
    • Improved response to incomplete data, allowing for more graceful adjustment during unexpected stream interruptions.

@coderabbitai

coderabbitai Bot commented Feb 26, 2025

Copy link
Copy Markdown
Contributor

Walkthrough

This change updates the error handling mechanism within the JsonParser::parse_from_ir method. The logging level is changed from SPDLOG_ERROR to SPDLOG_WARN and the error messages are augmented to include the stream path. The control flow has been modified to conditionally close the archive writer and decompressor based on the outcome of the check_and_log_curl_error function, differentiating between a fatal error and a truncated stream scenario.

Changes

File Path Change Summary
components/core/src/clp_s/JsonParser.cpp Modified error handling in parse_from_ir: logging level updated to SPDLOG_WARN, error message format now includes stream path, and control flow adjusted based on the result of check_and_log_curl_error (closes writer/decompressor and returns false vs. breaking the loop).

Sequence Diagram(s)

sequenceDiagram
    participant Parser as JsonParser::parse_from_ir
    participant CurlFunc as check_and_log_curl_error
    participant Writer as ArchiveWriter/Decompressor

    Parser->>Parser: Attempt deserialization of kv-ir log events
    alt Error Encountered
        Parser->>Parser: Log error with SPDLOG_WARN (includes stream path)
        Parser->>CurlFunc: Call check_and_log_curl_error
        alt Return value is true
            Parser->>Writer: Close writer and decompressor
            Parser->>Parser: Return false
        else Return value is false
            Parser->>Parser: Treat error as end-of truncated stream
            Parser->>Parser: Break from loop
        end
    end
Loading

Possibly related PRs

Suggested reviewers

  • wraymo
  • kirkrodrigues

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9fdc765 and d5bcac3.

📒 Files selected for processing (1)
  • components/core/src/clp_s/JsonParser.cpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ...

**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

  • components/core/src/clp_s/JsonParser.cpp
🧠 Learnings (1)
components/core/src/clp_s/JsonParser.cpp (5)
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:735-794
Timestamp: 2024-11-10T16:46:53.300Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir` method, encountering errors from `kv_log_event_result.error()` aside from `std::errc::no_message_available` and `std::errc::result_out_of_range` is anticipated behavior and does not require additional error handling or logging.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:717-726
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In the `JsonParser::parse_from_IR` function, when deserializing, it's intentional to break out of the loop without additional error handling when `kv_log_event_result.has_error()` returns true for errors other than `no_message_available` and `result_out_of_range`, as this signifies the end of the IR file.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:756-765
Timestamp: 2024-11-10T16:46:53.300Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir()` function, reaching the end of log events in a given IR is not considered an error case. The errors `std::errc::no_message_available` and `std::errc::result_out_of_range` are expected signals to break the deserialization loop and proceed accordingly.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:629-733
Timestamp: 2024-11-10T16:46:53.300Z
Learning: In the `JsonParser::parse_kv_log_event` method in `components/core/src/clp_s/JsonParser.cpp`, prefer passing exceptions to higher levels for more graceful handling, rather than handling them at that level.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:769-779
Timestamp: 2024-11-10T16:46:53.300Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, when handling errors in `parse_from_ir`, prefer to maintain the current mix of try-catch and if-statements because specific messages are returned back up in some cases.
⏰ Context from checks skipped due to timeout of 90000ms (5)
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: lint-check (macos-latest)
🔇 Additional comments (2)
components/core/src/clp_s/JsonParser.cpp (2)

933-937: Improved error logging with warning level instead of error

The change from SPDLOG_ERROR to SPDLOG_WARN and the inclusion of the stream path in the error message are appropriate adjustments that align with the PR objective to warn and continue on kv-ir deserialization errors rather than treating them as fatal issues.


940-947: Enhanced error handling flow for deserialization errors

This is a well-implemented change to the control flow that distinguishes between network errors and truncated streams. The code now:

  1. Checks for network errors using check_and_log_curl_error
  2. Fails only if a network error is detected
  3. Treats other deserialization errors as truncated streams and continues processing

This change directly implements the PR objective to "change behaviour for kv-ir ingestion to warn and continue on error" while maintaining the exception for network-related errors.

✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@gibber9809 gibber9809 force-pushed the kvir-warn-on-error branch from a5b3404 to d5bcac3 Compare March 7, 2025 16:28
@gibber9809 gibber9809 marked this pull request as ready for review March 7, 2025 16:28
@gibber9809 gibber9809 requested a review from wraymo as a code owner March 7, 2025 16:28

@wraymo wraymo left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

@gibber9809 gibber9809 merged commit ad95c99 into y-scope:main Mar 11, 2025
junhaoliao pushed a commit to junhaoliao/clp that referenced this pull request May 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants