Skip to content

chore(package)!: Set results_cache.retention_period to 60min so that old search results aren't retained indefinitely by default.#1231

Merged
haiqi96 merged 2 commits into
y-scope:mainfrom
haiqi96:retention_period_update
Aug 19, 2025
Merged

chore(package)!: Set results_cache.retention_period to 60min so that old search results aren't retained indefinitely by default.#1231
haiqi96 merged 2 commits into
y-scope:mainfrom
haiqi96:retention_period_update

Conversation

@haiqi96

@haiqi96 haiqi96 commented Aug 19, 2025

Copy link
Copy Markdown
Contributor

Description

As discussed offline with @kirkrodrigues, it would make more sense to enable retention control for search results by default, since user only have the access to the search results generated by the latest query job.

We choose 60 minutes to be the retention period since

  1. It's relatively small so stale search results won't be retained for too long.
  2. 60 is 2x of the sweep interval (30). Note that it would not make much sense to have retention_period < sweep_interval. 2x is a reasonable factor for the use case.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

Manually verified that garbage collector starts with the package and the following log is printed:

[INFO] Creating search-result-garbage-collector with retention period = 60 minutes

Summary by CodeRabbit

  • New Features

    • Results cache now defaults to a 60-minute retention period when not configured, automatically expiring older entries to manage storage and performance.
  • Documentation

    • Updated sample configuration comments to reflect the new 60-minute default for results cache retention.

@coderabbitai

coderabbitai Bot commented Aug 19, 2025

Copy link
Copy Markdown
Contributor

Walkthrough

Default for ResultsCache.retention_period changed from None to 60 in Python config. Corresponding commented default in the package template YAML updated from null to 60. Validation logic and other control flow remain unchanged.

Changes

Cohort / File(s) Summary
Config model default update
components/clp-py-utils/clp_py_utils/clp_config.py
Changed ResultsCache.retention_period default from None to 60; type remains Optional[int]; existing validator unchanged.
Template config comment update
components/package-template/src/etc/clp-config.yml
Updated commented example default for results_cache.retention_period from null to 60; no active config change.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@haiqi96 haiqi96 marked this pull request as ready for review August 19, 2025 14:07
@haiqi96 haiqi96 requested a review from a team as a code owner August 19, 2025 14:07

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between c04e093 and 94a9668.

📒 Files selected for processing (2)
  • components/clp-py-utils/clp_py_utils/clp_config.py (1 hunks)
  • components/package-template/src/etc/clp-config.yml (1 hunks)
🔇 Additional comments (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)

365-365: LGTM: sensible default and validator are consistent

Defaulting ResultsCache.retention_period to 60 minutes while allowing None is reasonable, and the validator (> 0 when not None) is correct.

db_name: str = "clp-query-results"
stream_collection_name: str = "stream-files"
retention_period: Optional[int] = None
retention_period: Optional[int] = 60

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

❓ Verification inconclusive

Behaviour change: enable deletion by default; document + add config guard and avoid magic number

Changing the default from None to 60 means deployments that previously omitted this setting will now have search results deleted after 60 minutes. Recommend:

  • Call out this breaking default change in release notes and package docs.
  • Add a cross-field validation ensuring retention_period (if set) is not less than garbage_collector.sweep_interval.search_result.
  • Replace the magic number 60 with a named constant to reduce drift with the template YAML.

Diff within this hunk to use a constant:

-    retention_period: Optional[int] = 60
+    retention_period: Optional[int] = DEFAULT_SEARCH_RESULT_RETENTION_MINUTES

Add this constant near the other “Constants” in this module:

# Default retention settings
DEFAULT_SEARCH_RESULT_RETENTION_MINUTES = 60

Optionally, enforce the relationship at the aggregate config level:

# In class CLPConfig
from pydantic import root_validator

class CLPConfig(BaseModel):
    # ... existing fields ...

    @root_validator
    def validate_retention_vs_sweep(cls, values):
        rc = values.get("results_cache")
        gc = values.get("garbage_collector")
        if rc and gc:
            rp = rc.retention_period
            sweep = gc.sweep_interval.search_result
            if rp is not None and rp < sweep:
                raise ValueError(
                    f"results_cache.retention_period ({rp} min) must be >= "
                    f"garbage_collector.sweep_interval.search_result ({sweep} min)."
                )
        return values

Verification suggestion:

  • Ensure release notes explicitly mention the default enablement of result retention and the 60-minute default.
  • If you want, I can add a unit test asserting the default retention and the cross-field validation behaviour.

Use a named constant for retention_period and add cross-field validation

• components/clp-py-utils/clp_py_utils/clp_config.py:365
Change
retention_period: Optional[int] = 60
to
retention_period: Optional[int] = DEFAULT_SEARCH_RESULT_RETENTION_MINUTES

• In the same module (near the other constants), add:

# Default retention settings
DEFAULT_SEARCH_RESULT_RETENTION_MINUTES = 60

• In the CLPConfig model, enforce that results_cache.retention_period (if set) is ≥ garbage_collector.sweep_interval.search_result:

from pydantic import root_validator

class CLPConfig(BaseModel):
    # … existing fields …

    @root_validator
    def validate_retention_vs_sweep(cls, values):
        rc = values.get("results_cache")
        gc = values.get("garbage_collector")
        if rc and gc:
            rp = rc.retention_period
            sweep = gc.sweep_interval.search_result
            if rp is not None and rp < sweep:
                raise ValueError(
                    f"results_cache.retention_period ({rp} min) must be ≥ "
                    f"garbage_collector.sweep_interval.search_result ({sweep} min)"
                )
        return values

Document the breaking change in your release notes and package docs: deployments that omit this setting will now delete search results after 60 minutes by default.

(Optional) Add unit tests to assert the default value and validation logic.

🤖 Prompt for AI Agents
In components/clp-py-utils/clp_py_utils/clp_config.py around line 365, replace
the literal default retention_period value with a named constant and add
cross-field validation: add a new constant
DEFAULT_SEARCH_RESULT_RETENTION_MINUTES = 60 near the other module constants,
change retention_period: Optional[int] = 60 to retention_period: Optional[int] =
DEFAULT_SEARCH_RESULT_RETENTION_MINUTES, and add a pydantic root_validator on
CLPConfig that checks if results_cache.retention_period (when not None) is >=
garbage_collector.sweep_interval.search_result and raises a ValueError with a
clear message if not; also note this breaking change in release notes/docs
(default retention now 60 minutes) and consider adding unit tests for the
default and validation logic.

#
# # Retention period for search results, in minutes. Set to null to disable automatic deletion.
# retention_period: null
# retention_period: 60

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Commented default moved to 60 — consider adding an inline hint about sweep interval

Runtime behaviour isn’t affected since this is commented, but adding a short hint helps users keep it consistent with the GC sweep interval.

Apply this diff to make the guidance explicit:

-#  retention_period: 60
+#  retention_period: 60  # minutes; should be >= garbage_collector.sweep_interval.search_result
🤖 Prompt for AI Agents
In components/package-template/src/etc/clp-config.yml around line 57, the
commented default "retention_period: 60" lacks an inline hint about matching the
GC sweep interval; update the commented line to include a short inline note
(e.g., "# retention_period: 60  # should match GC sweep interval; adjust if your
sweeper runs at different interval") so users know to keep retention_period
consistent with the garbage-collection sweep interval.

@kirkrodrigues kirkrodrigues left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the PR title, how about:

chore(package)!: Set `results_cache.retention_period` to 60min so that old search results aren't retained indefinitely by default.

I marked this as breaking since it is a change to the existing behaviour since the last release.

@haiqi96 haiqi96 changed the title docs(package): Update search result's default retention. chore(package): Set results_cache.retention_period to 60min so that old search results aren't retained indefinitely by default. Aug 19, 2025
@haiqi96 haiqi96 merged commit 51eaeeb into y-scope:main Aug 19, 2025
10 of 11 checks passed
@kirkrodrigues kirkrodrigues changed the title chore(package): Set results_cache.retention_period to 60min so that old search results aren't retained indefinitely by default. chore(package)!: Set results_cache.retention_period to 60min so that old search results aren't retained indefinitely by default. Aug 19, 2025
junhaoliao pushed a commit to junhaoliao/clp that referenced this pull request May 17, 2026
… old search results aren't retained indefinitely by default. (y-scope#1231)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants