Skip to content

fix(clp-mcp-server): Refine system prompt to make LLMs' KQL query generation more CLP friendly.#1494

Merged
20001020ycx merged 33 commits into
y-scope:mainfrom
20001020ycx:10-14-system-prompt
Nov 3, 2025
Merged

fix(clp-mcp-server): Refine system prompt to make LLMs' KQL query generation more CLP friendly.#1494
20001020ycx merged 33 commits into
y-scope:mainfrom
20001020ycx:10-14-system-prompt

Conversation

@20001020ycx

@20001020ycx 20001020ycx commented Oct 24, 2025

Copy link
Copy Markdown
Contributor

Description

With our current system prompt, LLM would generate kql such as kql_query: register OR registration OR registered.
This would result {"Error":"No log events found matching the KQL query."} or {"Error":"Query job with ID 15 ended in status FAILED."} pending the exact query submitted.

The reason why this happens is that the kql generated by LLM is not CLP-compatible; CLP expects a kql query that looks like this, following the convention specified in here.
kql_query: *register* OR *registration* OR *registered*.

This PR refines the system prompt to ensure LLM understands the rule of CLP's variant of KQL, thereby improving the efficiency of the diagnosis.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

We have carefully designed the system prompt and evaluated on two real-world cases: HDFS-4058 and Hadoop-10508. We have run each case three times, and the AI-generated kql reaches 100% accuracy (that is each query submitted wound't be killed or fail). HDFS-4058 makes 5 kql queries per time, and Hadoop-10508 makes 3 kql queires per time.

Summary by CodeRabbit

Release Notes

  • Improvements
    • Updated query guidance system to provide enhanced assistance and better support for CKQL queries with improved terminology and rule clarity.

@coderabbitai

coderabbitai Bot commented Oct 24, 2025

Copy link
Copy Markdown
Contributor

Walkthrough

The SYSTEM_PROMPT constant in the MCP server configuration has been updated to replace Kibana/KQL-oriented guidance with CKQL-specific instructions and rules for the CLP-KQL query system. No other constants or functionality are affected.

Changes

Cohort / File(s) Summary
System Prompt Configuration
components/clp-mcp-server/clp_mcp_server/server/constants.py
Updated SYSTEM_PROMPT constant from KQL/Kibana-focused guidance to CKQL-specific prompting instructions with CKQL terminology and rules

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

  • The change involves a straightforward constant value replacement with no logic modifications or functional implications.

Possibly related issues

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The pull request title "fix(clp-mcp-server): Refine system prompt to make LLMs' KQL query generation more CLP friendly" directly and accurately describes the main change in the changeset. The title identifies the specific component (clp-mcp-server), the action taken (refining the system prompt), and the primary objective (improving CLP compatibility of LLM-generated KQL queries). This aligns precisely with the actual modification to the SYSTEM_PROMPT constant, and the language is clear and specific enough that a developer reviewing the git history would immediately understand the change's purpose and scope.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@20001020ycx 20001020ycx marked this pull request as ready for review October 24, 2025 18:48
@20001020ycx 20001020ycx requested a review from a team as a code owner October 24, 2025 18:48

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eb7f0d5 and f5e8e30.

📒 Files selected for processing (1)
  • components/clp-mcp-server/clp_mcp_server/server/constants.py (1 hunks)
🧰 Additional context used
🪛 GitHub Actions: clp-lint
components/clp-mcp-server/clp_mcp_server/server/constants.py

[error] 26-26: Q003 [*] Change outer quotes to avoid escaping inner quotes. Found 1 error. 1 fixable with the --fix option.

🪛 Ruff (0.14.1)
components/clp-mcp-server/clp_mcp_server/server/constants.py

26-26: Change outer quotes to avoid escaping inner quotes

Change outer quotes to avoid escaping inner quotes

(Q003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: package-image
🔇 Additional comments (3)
components/clp-mcp-server/clp_mcp_server/server/constants.py (3)

15-16: LGTM: Improved clarity by specifying the database name.

The addition of "called CLP" helps clarify the context for the LLM.


33-36: Excellent additions for improving query quality.

The guidance to ask clarifying questions when user queries are ambiguous and to ensure syntactically correct queries will help prevent the types of issues described in the PR objective.


31-32: Timestamp format documentation is inaccurate—the implementation accepts variable fractional-second precision, not exactly three digits.

The code demonstrates that timestamps with zero, one, or any number of fractional digits are accepted (e.g., 2024-10-18T16:00:00Z and 2024-10-18T16:00:00.123Z both parse successfully). The parsing relies on Python's datetime.fromisoformat(), which is flexible with fractional-second precision. However, the documentation in lines 31–32 specifies the format as YYYY-MM-DDTHH:mm:ss.fffZ (implying exactly three fractional digits), which does not reflect the actual implementation.

Update the format specification to clarify that fractional seconds are optional and any ISO 8601 precision is accepted, or add strict validation to enforce exactly three fractional digits if that is the intended requirement.

Comment thread components/clp-mcp-server/clp_mcp_server/server/constants.py Outdated
Comment thread components/clp-mcp-server/clp_mcp_server/server/constants.py Outdated
Comment thread components/clp-mcp-server/clp_mcp_server/server/constants.py Outdated
@20001020ycx 20001020ycx changed the title fix<clp-mcp-server>: Refine system prompt to improve LLM's KQL query generation to be CLP friendly fix(clp-mcp-server): Refine system prompt to improve LLM's KQL query generation to be CLP friendly Oct 24, 2025

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
components/clp-mcp-server/clp_mcp_server/server/constants.py (2)

23-24: Wildcard guidance addresses the core PR objective.

This new instruction directly solves the issue where queries like register OR registration OR registered were failing. The guidance correctly tells the LLM to use *register* OR *registration* OR *registered* for substring matching.

However, as noted in a previous review, the wording could be clearer and more concise.


25-26: Fix the linting error and indentation inconsistency.

This guidance about quoting multi-word keys/values is valuable for CLP's KQL variant. However, as flagged in previous reviews, there are two formatting issues:

  1. Critical: The quote style causes a linting error that must be fixed.
  2. Minor: The bullet indentation is inconsistent with line 23.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f5e8e30 and df3e46e.

📒 Files selected for processing (1)
  • components/clp-mcp-server/clp_mcp_server/server/constants.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: package-image
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: lint-check (macos-15)
🔇 Additional comments (6)
components/clp-mcp-server/clp_mcp_server/server/constants.py (6)

15-16: LGTM! Clear identification of the database and acronym expansion.

Adding "called CLP" and expanding "KQL" to "Kibana Query Language" improves clarity for users who may not be familiar with the acronym.


20-20: LGTM! Improved readability.

The blank line appropriately separates the general description from the CLP-specific guidelines section.


21-22: LGTM! Essential context about KQL variant.

Explicitly stating that CLP supports a KQL variant provides crucial context for why the following guidelines differ from standard KQL usage patterns.


33-34: LGTM! Promotes better user interaction.

Instructing the LLM to ask clarifying questions when the user's intent is unclear is a best practice that will improve the accuracy of generated queries and overall user experience.


35-36: LGTM! Good defensive requirement.

While this requirement may seem implicit, explicitly instructing the LLM to ensure syntactic correctness is a good practice that reinforces the importance of generating valid, executable queries.


31-32: Timestamp format specification is accurate and verified.

The format YYYY-MM-DDTHH:mm:ss.fffZ with three-digit millisecond precision is correctly specified. The codebase validation confirms:

  • The trailing Z is mandatory (enforced in parse_timestamp_range() at utils.py:45)
  • Millisecond precision (.fff) is consistently documented across constants.py, server.py, and utils.py
  • The format matches exactly what the timestamp parsing and conversion functions expect

No changes needed.

Comment thread components/clp-mcp-server/clp_mcp_server/server/constants.py Outdated
@rishikeshdevsot

Copy link
Copy Markdown
Contributor

@kirkrodrigues @gibber9809 this PR's objective is to explain to the LLM the difference between regular KQL and KQL supported by CLP using as little text/tokens as possible. So one of you should probably take a second look to ensure that the important points are covered.

Some considerations:

  1. We don't want to list all the differences because that's too much text and it will confuse the LLM.
  2. We also can't rely on AI following this link because not all agents will be given internet access and even with internet access, an agent following a link is hit or miss.

@20001020ycx 20001020ycx changed the title fix(clp-mcp-server): Refine system prompt to improve LLM's KQL query generation to be CLP friendly fix(clp-mcp-server): Refine system prompt to make LLM's KQL query generation to be CLP friendly Oct 24, 2025

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
components/clp-mcp-server/clp_mcp_server/server/constants.py (1)

18-19: Clarify the interaction between specificity and wildcards.

The guidance to "be as specific as possible" might appear contradictory with the mandatory wildcard usage (lines 23-24), since wildcards broaden matches. Consider clarifying that "specific" refers to using specific field names and narrowing by time range or field values, while wildcards should still be used for keyword matching as required by CLP.

Example clarification:

-" query should be as specific as possible to minimize the number of log messages returned. When"
+" query should use specific field names, values, and time ranges to minimize the number of log"
+" messages returned. When"
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f5e8e30 and 5c88582.

📒 Files selected for processing (1)
  • components/clp-mcp-server/clp_mcp_server/server/constants.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: package-image
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (2)
components/clp-mcp-server/clp_mcp_server/server/constants.py (2)

15-16: Good clarification of the target database.

The change from "log database using KQL" to "log database called CLP using KQL" makes the prompt more specific and sets the context for the CLP-specific guidance that follows.


21-22: Effective introduction to CLP-specific KQL.

This introduction appropriately sets the context that CLP uses a variant of KQL, preparing the LLM for the specific guidelines that follow. This aligns well with the PR objective to teach the LLM about CLP's KQL differences concisely.

Comment thread components/clp-mcp-server/clp_mcp_server/server/constants.py Outdated
Comment thread components/clp-mcp-server/clp_mcp_server/server/constants.py Outdated
# fmt: off
SYSTEM_PROMPT = (
"You are an AI assistant that helps users query a log database using KQL (Kibana Query Language)."
"You are an AI assistant that helps users query a log database called CLP using KQL"

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's what I came up with:

You are an AI assistant that helps users query a log database, called CLP, by using a variant of KQL
(Kibana Query Language) that we'll call CLP-KQL.

You should generate a CLP-KQL query that accurately expresses the user's intent. The generated query
should be as specific as possible to minimize the number of log messages returned. When displaying
log messages, wrap them in hyperlinks with the link field from the search result.

Consider the following guidelines when generating CLP-KQL queries:

  • To search for a substring (a partial match), use the wildcards * and ?.
  • To query a key or a value that contains multiple words separated by spaces, wrap the words in
    double quotation marks.
  • In keys, to search for any of the literal characters \, ", ., *, @, $, !, and #, escape
    them with a backslash.
  • In values, to search for any of the literal characters \, ", ?, and *, escape them with a
    backslash.
  • Consider using search_by_kql_with_timestamp_range to narrow the query to a specific time range.
  • CLP doesn't support fuzzy matches.
  • CLP doesn't support less/greater than comparisons on strings, IP addresses, or timestamps.
  • If the user's prompt is vague, ask clarifying questions to better understand their intent before
    generating the CLP-KQL query.

@20001020ycx 20001020ycx Oct 25, 2025

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have problems with the rest of the points except the first two because we have specifically refined those two in order to make AI generate correct kql.
We have iterated many versions and only the current version ensures AI could understand and follow the rule. The takeaway from the experiment was that in order to mark a critical rule effective, we need to be:

  1. marking important rule with CRITICAL,
  2. explaining the rule with a concrete example

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Can you write a revision?
  • Can we add a comment above the prompt explaining what characteristics need to be maintained based on your experiments? Otherwise the next person to edit this likely won't know either, lol.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the second point and will add the rule as the comment right above the constant instantiation. But let's first make an agreement on the actual system prompt string.

Here is my revision:

You are an AI assistant that helps users query a log database, called CLP, by using a variant of KQL
(Kibana Query Language) that we'll call CLP-KQL.

You should generate a CLP-KQL query that accurately expresses the user's intent. The generated query
should be as specific as possible to minimize the number of log messages returned. When displaying
log messages, wrap them in hyperlinks with the link field from the search result.

Consider the following guidelines when generating CLP-KQL queries, and pay special attention to rules marked as CRITICAL - violation will cause query failure:

  • CRITICAL: To search for a substring keyword (a partial match), you MUST use wildcards (* or ?) at the beginning and end of your keyword.
  • CRITICAL: Use double quotation(") to wrap multiple keywords search that are separated by spaces. For example, to search for all logs containing: hello world, your query parameter should literally be: "hello world" (with the quotes as part of the search string wrapping around both wildcards and multiple words).
  • In keys, to search for any of the literal characters , ", ., *, @, $, !, and #, escape them with a backslash.
  • In values, to search for any of the literal characters , ", ?, and *, escape them with a backslash.
  • Consider using search_by_kql_with_timestamp_range to narrow the query to a specific time range. Timestamps must follow the ISO 8601 UTC format (YYYY-MM-DDTHH:mm:ss.fffZ), where the trailing Z indicates UTC.
  • CLP doesn't support fuzzy matches.
  • CLP doesn't support less/greater than comparisons on strings, IP addresses, or timestamps.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are my rationales behind my edits, a key point that I would like to emphasize is that we would like this system prompt to be AI friendly, even if this might not be stricitly following a documentation style for human readers

  1. CRITICAL Rule Marking: LLMs do not guarantee enforcement of all system prompt rules—this was confirmed through empirical testing. Rules 1 and 2 are essential for generating correct CLP-KQL queries; without them, LLMs produce invalid syntax like message: hello world (missing wildcards and quotes). Marking these rules as "CRITICAL" significantly improves KQL success rate, and ensuring LLMs prioritize them even when other rules may be ignored.
  2. Action item first and example showing: What I change in the second point was I move the action item for AI to the beginning of the sentence. I found that effective in the experiement. Same argument goes with example.
  3. Definition of exact format of timestamp: query with timestamp that does not follows ISO 8601 in UTC format will be rejected, so we want to make sure AI generates the exact format
  4. Removing your last point (also this is what I have in my original commit): i feel this is something good to have. However, I found that system prompt's length directly impacts LLM attention, as we don't want LLM loses its focus on CRITICAL points by some minor rules or reminder.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 2nd point mentioned by @kirkrodrigues is updated in the pr

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Responded in the design doc.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a4b8753 and 2f792c9.

📒 Files selected for processing (1)
  • components/clp-mcp-server/clp_mcp_server/server/constants.py (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: junhaoliao
PR: y-scope/clp#1405
File: components/clp-package-utils/pyproject.toml:5-15
Timestamp: 2025-10-13T03:24:35.074Z
Learning: In the y-scope/clp repository, the Python 3.9 to 3.10 version requirement change was intentionally deferred to a separate PR (after PR #1405) to reduce review effort, as decided in an offline discussion between junhaoliao and kirkrodrigues.
📚 Learning: 2025-05-07T16:56:35.687Z
Learnt from: LinZhihao-723
PR: y-scope/clp#882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:378-402
Timestamp: 2025-05-07T16:56:35.687Z
Learning: In the CLP search component, the `evaluate_wildcard_filter` function should return `AstEvaluationResult::Pruned` when `node_id_value_pairs` is empty, not `AstEvaluationResult::False`. Empty node sets should be treated as "undetermined" rather than definitive non-matches.

Applied to files:

  • components/clp-mcp-server/clp_mcp_server/server/constants.py
📚 Learning: 2025-08-15T21:48:40.228Z
Learnt from: quinntaylormitchell
PR: y-scope/clp#1181
File: docs/src/user-guide/guides-retention.md:68-73
Timestamp: 2025-08-15T21:48:40.228Z
Learning: In documentation for the CLP project, when suggesting formatting improvements for variables in explanatory text, the user quinntaylormitchell prefers to maintain existing sentence structures (like keeping "that" in "i.e., that the difference...") while applying monospace formatting to technical terms and variables for consistency.

Applied to files:

  • components/clp-mcp-server/clp_mcp_server/server/constants.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: package-image
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (4)
components/clp-mcp-server/clp_mcp_server/server/constants.py (4)

14-20: Excellent meta-documentation for prompt design.

This comment directly addresses the maintainability concern raised in the review discussion and documents the empirically validated practices. Future editors will understand why critical rules are marked and structured as they are.


28-30: Core wildcard rule addresses the PR's primary objective.

This CRITICAL rule with concrete example directly solves the issue described in the PR objectives where register failed but *register* succeeded. The format follows the documented best practices and has been empirically validated.


42-45: Escaping rules are clear and well-differentiated.

The distinction between key and value escaping rules is clearly documented, helping LLMs generate syntactically correct queries.


49-50: Unsupported features clearly documented.

Explicitly listing unsupported features helps prevent LLMs from generating invalid queries.

Comment thread components/clp-mcp-server/clp_mcp_server/server/constants.py Outdated
Comment on lines +36 to +39
- CRITICAL -- Multi-word text must be quoted: wrap multi-word searches in double quotes.
Example:
request: "*GET wp-admin*"
(quotes and wildcards are required).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Multi-word rule is clear and empirically validated.

The example correctly demonstrates the combination of quotes and wildcards for multi-word substring searches. This format has been validated through testing.

Optional: Consider changing line 39 from "(quotes and wildcards are required)" to "(both quotes and wildcards are required)" to more explicitly emphasize the combination.

🤖 Prompt for AI Agents
In components/clp-mcp-server/clp_mcp_server/server/constants.py around lines 36
to 39, the comment example is fine but line 39's parenthetical phrasing "(quotes
and wildcards are required)" should be changed to "(both quotes and wildcards
are required)" for clarity; update that exact text to the new wording while
preserving the surrounding example and punctuation.

Comment on lines +47 to +48
- Time range: use search_by_kql_with_timestamp_range to constrain by time.\n

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Missing critical timestamp format specification.

In the review discussion, you explicitly stated (point #3 in rationale) that timestamp format is critical: "query with timestamp that does not follow ISO 8601 in UTC format will be rejected." You proposed including: "Timestamps must follow the ISO 8601 UTC format (`YYYY-MM-DDTHH:mm:ss.fffZ`), where the trailing `Z` indicates UTC."

This specification is missing from the current version. Without it, LLMs may generate timestamps in incorrect formats, causing query failures.

Consider adding the timestamp format specification:

-- Time range: use search_by_kql_with_timestamp_range to constrain by time.\n
+- Time range: use search_by_kql_with_timestamp_range to constrain by time.
+  Timestamps must follow ISO 8601 UTC format: YYYY-MM-DDTHH:mm:ss.fffZ (the trailing Z indicates UTC).
+  Example: 2024-10-15T08:30:00.000Z\n
🤖 Prompt for AI Agents
In components/clp-mcp-server/clp_mcp_server/server/constants.py around lines
47-48, the timestamp format requirement is missing; add a clear constant or
comment stating that all timestamps must follow the ISO 8601 UTC format
"YYYY-MM-DDTHH:mm:ss.fffZ" (with fractional seconds and trailing Z indicating
UTC) and update any related docstring or constant used by
search_by_kql_with_timestamp_range to reference this exact format so generated
queries and validations enforce/rely on that format.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e57098e and a2c493b.

📒 Files selected for processing (1)
  • components/clp-mcp-server/clp_mcp_server/server/constants.py (1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: junhaoliao
PR: y-scope/clp#1405
File: components/clp-package-utils/pyproject.toml:5-15
Timestamp: 2025-10-13T03:24:35.074Z
Learning: In the y-scope/clp repository, the Python 3.9 to 3.10 version requirement change was intentionally deferred to a separate PR (after PR #1405) to reduce review effort, as decided in an offline discussion between junhaoliao and kirkrodrigues.
📚 Learning: 2025-05-07T16:56:35.687Z
Learnt from: LinZhihao-723
PR: y-scope/clp#882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:378-402
Timestamp: 2025-05-07T16:56:35.687Z
Learning: In the CLP search component, the `evaluate_wildcard_filter` function should return `AstEvaluationResult::Pruned` when `node_id_value_pairs` is empty, not `AstEvaluationResult::False`. Empty node sets should be treated as "undetermined" rather than definitive non-matches.

Applied to files:

  • components/clp-mcp-server/clp_mcp_server/server/constants.py
📚 Learning: 2025-08-15T21:48:40.228Z
Learnt from: quinntaylormitchell
PR: y-scope/clp#1181
File: docs/src/user-guide/guides-retention.md:68-73
Timestamp: 2025-08-15T21:48:40.228Z
Learning: In documentation for the CLP project, when suggesting formatting improvements for variables in explanatory text, the user quinntaylormitchell prefers to maintain existing sentence structures (like keeping "that" in "i.e., that the difference...") while applying monospace formatting to technical terms and variables for consistency.

Applied to files:

  • components/clp-mcp-server/clp_mcp_server/server/constants.py
📚 Learning: 2025-09-04T13:51:02.311Z
Learnt from: quinntaylormitchell
PR: y-scope/clp#1283
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:206-210
Timestamp: 2025-09-04T13:51:02.311Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/compress.py, an empty string ("") is a valid value for the --timestamp-key parameter, so validation should not treat empty strings as missing values.

Applied to files:

  • components/clp-mcp-server/clp_mcp_server/server/constants.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: package-image
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (2)
components/clp-mcp-server/clp_mcp_server/server/constants.py (2)

37-41: CRITICAL wildcard rule effectively addresses the PR objective.

The wildcard guidance is clearly marked as CRITICAL and includes a concrete example. This directly solves the problem described in the PR where queries like register failed but *register* would succeed. The action-first structure and terse language align with the empirical guidelines.


47-52: Multi-word search rule is clear and well-validated.

The CRITICAL marking, concrete example, and explicit note that "quotes and wildcards are required" effectively communicate the CLP-specific syntax. This aligns with the empirically validated format from your testing.

Comment thread components/clp-mcp-server/clp_mcp_server/server/constants.py Outdated
Comment on lines +61 to +62
- Time range: use search_by_kql_with_timestamp_range
to constrain by time.\n

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add missing timestamp format specification.

In the review discussion, you explicitly stated that timestamp format is critical: "query with timestamp that does not follow ISO 8601 in UTC format will be rejected." You proposed including the exact format specification, and this was flagged as a major issue in past reviews. Without this specification, LLMs may generate incorrectly formatted timestamps, causing query failures.

Apply this diff to add the timestamp format specification:

-- Time range: use search_by_kql_with_timestamp_range
- to constrain by time.\n
+- Time range: use search_by_kql_with_timestamp_range to constrain by time.
+  Timestamps must follow ISO 8601 UTC format: YYYY-MM-DDTHH:mm:ss.fffZ
+  (the trailing Z indicates UTC).
+  Example: 2024-10-15T08:30:00.000Z\n
🤖 Prompt for AI Agents
In components/clp-mcp-server/clp_mcp_server/server/constants.py around lines
61-62, the comment about time range lacks the required timestamp format; update
the comment to explicitly state that timestamps must be ISO 8601 in UTC and give
the exact expected format and example (e.g. YYYY-MM-DDTHH:MM:SS.sssZ) so callers
and LLMs will produce valid timestamps; modify the string/documentation there to
append this specification and an example, and run a quick search to ensure any
related helper names (search_by_kql_with_timestamp_range) and docs are
consistent with the added format text.

Comment thread components/clp-mcp-server/clp_mcp_server/server/constants.py Outdated
Comment thread components/clp-mcp-server/clp_mcp_server/server/constants.py Outdated
Comment thread components/clp-mcp-server/clp_mcp_server/server/constants.py Outdated
Comment thread components/clp-mcp-server/clp_mcp_server/server/constants.py Outdated
Comment on lines +30 to +35
Your job is to generate CKQL that faithfully expresses the user's intent and
show key logs to the user
- Start broad to learn the schema/fields using wildcard searches like *, then
narrow the query to return a manageable result set.
- When showing log messages or when the user wants to see log messages, provide
the hyperlink from the result's link field.\n\n

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Your job is to generate CKQL that faithfully expresses the user's intent and
show key logs to the user
- Start broad to learn the schema/fields using wildcard searches like *, then
narrow the query to return a manageable result set.
- When showing log messages or when the user wants to see log messages, provide
the hyperlink from the result's link field.\n\n
Your job is to generate CKQL that faithfully expresses the user's intent and show key logs to the
user:
- Start broad to learn the schema/fields using wildcard searches like *, then narrow the query to
return a manageable result set.
- When showing log messages or when the user wants to see log messages, provide the hyperlink from
the result's link field.

@20001020ycx 20001020ycx Nov 1, 2025

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a concern about using triple quotation marks for the system prompt. Triple quotes create a raw string that preserves all newlines, including those added solely for code formatting (to maintain 100 characters per line). This causes prompt sentences to be split across multiple lines unintentionally.

For example, for a sentence defined like this:

When showing log messages or when the user wants to see log messages, provide the hyperlink from
the result's link field.

will be interpreted by AI to be

When showing log messages or when the user wants to see log messages, provide the hyperlink from \n the result's link field.

which would alter the semantics of the system prompt due to this actual \n.

Therefore, should we revert back to double quotation and add \n explicitly?

Comment thread components/clp-mcp-server/clp_mcp_server/server/constants.py Outdated
Comment thread components/clp-mcp-server/clp_mcp_server/server/constants.py Outdated
Comment thread components/clp-mcp-server/clp_mcp_server/server/constants.py Outdated
Comment thread components/clp-mcp-server/clp_mcp_server/server/constants.py Outdated
Comment thread components/clp-mcp-server/clp_mcp_server/server/constants.py Outdated
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
20001020ycx and others added 13 commits November 1, 2025 10:51
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
components/clp-mcp-server/clp_mcp_server/server/constants.py (1)

55-55: Missing critical timestamp format specification.

In the review discussion, you explicitly stated that "query with timestamp that does not follow ISO 8601 in UTC format will be rejected" and this was identified as a critical requirement (point #3 in your rationale). Without the exact format specification, LLMs may generate timestamps in incorrect formats, causing query failures and contradicting the PR's goal of reducing failed queries.

Add the timestamp format specification as you proposed:

-- Time range: use search_by_kql_with_timestamp_range to constrain by time.
+- Time range: use search_by_kql_with_timestamp_range to constrain by time. Timestamps must follow ISO 8601 UTC format: YYYY-MM-DDTHH:mm:ss.fffZ (the trailing Z indicates UTC). Example: 2024-10-15T08:30:00.000Z.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 384ffb7 and 60d28c0.

📒 Files selected for processing (1)
  • components/clp-mcp-server/clp_mcp_server/server/constants.py (1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1405
File: components/clp-package-utils/pyproject.toml:5-15
Timestamp: 2025-10-13T03:24:35.074Z
Learning: In the y-scope/clp repository, the Python 3.9 to 3.10 version requirement change was intentionally deferred to a separate PR (after PR #1405) to reduce review effort, as decided in an offline discussion between junhaoliao and kirkrodrigues.
📚 Learning: 2025-05-07T16:56:35.687Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:378-402
Timestamp: 2025-05-07T16:56:35.687Z
Learning: In the CLP search component, the `evaluate_wildcard_filter` function should return `AstEvaluationResult::Pruned` when `node_id_value_pairs` is empty, not `AstEvaluationResult::False`. Empty node sets should be treated as "undetermined" rather than definitive non-matches.

Applied to files:

  • components/clp-mcp-server/clp_mcp_server/server/constants.py
📚 Learning: 2025-08-15T21:48:40.228Z
Learnt from: quinntaylormitchell
Repo: y-scope/clp PR: 1181
File: docs/src/user-guide/guides-retention.md:68-73
Timestamp: 2025-08-15T21:48:40.228Z
Learning: In documentation for the CLP project, when suggesting formatting improvements for variables in explanatory text, the user quinntaylormitchell prefers to maintain existing sentence structures (like keeping "that" in "i.e., that the difference...") while applying monospace formatting to technical terms and variables for consistency.

Applied to files:

  • components/clp-mcp-server/clp_mcp_server/server/constants.py
📚 Learning: 2025-09-04T13:51:02.311Z
Learnt from: quinntaylormitchell
Repo: y-scope/clp PR: 1283
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:206-210
Timestamp: 2025-09-04T13:51:02.311Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/compress.py, an empty string ("") is a valid value for the --timestamp-key parameter, so validation should not treat empty strings as missing values.

Applied to files:

  • components/clp-mcp-server/clp_mcp_server/server/constants.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: package-image
🔇 Additional comments (2)
components/clp-mcp-server/clp_mcp_server/server/constants.py (2)

14-25: Excellent documentation of LLM-friendly prompt design principles.

The guidelines provide valuable context for future maintainers, explaining the empirical findings about effective LLM prompt design. This addresses the reviewer's request to document why the prompt has specific characteristics.


36-48: CRITICAL rules effectively address the PR objective.

The wildcard and multi-word search rules are well-designed and directly solve the problem described in the PR (where queries like register failed but *register* succeeds). The rules follow the documented LLM-friendly practices:

  • Marked as CRITICAL
  • Include concrete examples
  • Use action-first structure
  • Validated through testing (100% accuracy on test cases)

Comment thread components/clp-mcp-server/clp_mcp_server/server/constants.py Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 60d28c0 and b902444.

⛔ Files ignored due to path filters (1)
  • components/clp-mcp-server/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (2)
  • components/clp-mcp-server/clp_mcp_server/server/constants.py (1 hunks)
  • components/clp-mcp-server/pyproject.toml (1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1405
File: components/clp-package-utils/pyproject.toml:5-15
Timestamp: 2025-10-13T03:24:35.074Z
Learning: In the y-scope/clp repository, the Python 3.9 to 3.10 version requirement change was intentionally deferred to a separate PR (after PR #1405) to reduce review effort, as decided in an offline discussion between junhaoliao and kirkrodrigues.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1501
File: tools/deployment/presto-clp/scripts/init.py:10-13
Timestamp: 2025-10-27T07:07:37.901Z
Learning: In `tools/deployment/presto-clp/scripts/init.py`, the `DATABASE_COMPONENT_NAME` and `DATABASE_DEFAULT_PORT` constants are intentionally duplicated from `clp_py_utils.clp_config` because `clp_py_utils` is not installed in the Presto init script's runtime environment. The two flows are separate and this duplication is documented. There are plans to merge these flows after a future release.
📚 Learning: 2025-05-07T16:56:35.687Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:378-402
Timestamp: 2025-05-07T16:56:35.687Z
Learning: In the CLP search component, the `evaluate_wildcard_filter` function should return `AstEvaluationResult::Pruned` when `node_id_value_pairs` is empty, not `AstEvaluationResult::False`. Empty node sets should be treated as "undetermined" rather than definitive non-matches.

Applied to files:

  • components/clp-mcp-server/clp_mcp_server/server/constants.py
📚 Learning: 2025-08-15T21:48:40.228Z
Learnt from: quinntaylormitchell
Repo: y-scope/clp PR: 1181
File: docs/src/user-guide/guides-retention.md:68-73
Timestamp: 2025-08-15T21:48:40.228Z
Learning: In documentation for the CLP project, when suggesting formatting improvements for variables in explanatory text, the user quinntaylormitchell prefers to maintain existing sentence structures (like keeping "that" in "i.e., that the difference...") while applying monospace formatting to technical terms and variables for consistency.

Applied to files:

  • components/clp-mcp-server/clp_mcp_server/server/constants.py
📚 Learning: 2025-09-04T13:51:02.311Z
Learnt from: quinntaylormitchell
Repo: y-scope/clp PR: 1283
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:206-210
Timestamp: 2025-09-04T13:51:02.311Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/compress.py, an empty string ("") is a valid value for the --timestamp-key parameter, so validation should not treat empty strings as missing values.

Applied to files:

  • components/clp-mcp-server/clp_mcp_server/server/constants.py
📚 Learning: 2025-10-27T04:05:01.560Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1507
File: taskfiles/deps/lock.yaml:66-76
Timestamp: 2025-10-27T04:05:01.560Z
Learning: In the CLP project, the `uv` CLI tool is documented as a prerequisite in docs/src/dev-docs/building-package.md and is not installed via a task dependency like the rust toolchain. UV is expected to be available in the environment rather than being provisioned by the taskfile system.

Applied to files:

  • components/clp-mcp-server/pyproject.toml
🔇 Additional comments (3)
components/clp-mcp-server/pyproject.toml (1)

15-15: LGTM! Local editable dependency correctly configured.

The addition of clp-py-utils as a local editable dependency is properly configured for development workflows. The [tool.uv.sources] section correctly points to the sibling directory, and the existing allow-direct-references = true setting in [tool.hatch.metadata] (line 39) supports this configuration.

Also applies to: 18-19

components/clp-mcp-server/clp_mcp_server/server/constants.py (2)

14-25: Excellent documentation of empirically validated best practices.

These guidelines provide valuable context for future maintainers, directly addressing the need to document what characteristics should be maintained when editing the system prompt. The practices are clearly numbered, actionable, and grounded in empirical testing (validated with 100% accuracy on HDFS-4058 and Hadoop-10508 test cases as noted in the PR description).


27-47: CRITICAL rules are well-structured and address the PR objective.

The two CRITICAL rules directly solve the problem described in the PR:

  1. Wildcards for substring matches (line 35-38): ensures queries use *register* instead of register
  2. Multi-word text quoting (line 44-47): correctly demonstrates combining wildcards and quotes

Both rules follow the documented best practices with concrete examples, action-first phrasing, and prominent placement. The example on line 46 ("*GET wp-admin*") effectively demonstrates the combined requirement.

Comment thread components/clp-mcp-server/clp_mcp_server/server/constants.py Outdated

@kirkrodrigues kirkrodrigues left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the PR title, how about:

fix(clp-mcp-server): Refine system prompt to make LLMs' KQL query generation more CLP friendly.

Comment thread components/clp-mcp-server/clp_mcp_server/server/constants.py Outdated
Comment thread components/clp-mcp-server/clp_mcp_server/server/constants.py Outdated
Comment thread components/clp-mcp-server/clp_mcp_server/server/constants.py
Comment on lines 13 to 26
# fmt: off
# System prompts should be LLM-friendly; while LLMs may not strictly enforce all rules, we
# empirically found the following practices effective for LLM to understand the listed rules:
#
# 1. Provide concrete examples to explain the rule.
# 2. Place critical rules at the beginning and mark them as "CRITICAL".
# 3. Use action-first sentence structure (e.g., "Use X to do Y" instead of "To do Y, use X").
# 4. Specify any behaviour that the agent needs to perform (like formatting hyperlinks) very early
# in the prompt.
# 5. Use terse language and bullet points; avoid complex sentence structures. LLMs can fill in the
# gaps.
# 6. Let some instructions and details be implicit to avoid overwhelming the LLM.
# 7. Use the same example across different rules to maintain consistency.
SYSTEM_PROMPT = (

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# fmt: off
# System prompts should be LLM-friendly; while LLMs may not strictly enforce all rules, we
# empirically found the following practices effective for LLM to understand the listed rules:
#
# 1. Provide concrete examples to explain the rule.
# 2. Place critical rules at the beginning and mark them as "CRITICAL".
# 3. Use action-first sentence structure (e.g., "Use X to do Y" instead of "To do Y, use X").
# 4. Specify any behaviour that the agent needs to perform (like formatting hyperlinks) very early
# in the prompt.
# 5. Use terse language and bullet points; avoid complex sentence structures. LLMs can fill in the
# gaps.
# 6. Let some instructions and details be implicit to avoid overwhelming the LLM.
# 7. Use the same example across different rules to maintain consistency.
SYSTEM_PROMPT = (
# System prompts should be LLM-friendly; while LLMs may not strictly enforce all rules, we
# empirically found the following practices effective for LLM to understand the listed rules:
#
# 1. Provide concrete examples to explain the rule.
# 2. Place critical rules at the beginning and mark them as "CRITICAL".
# 3. Use action-first sentence structure (e.g., "Use X to do Y" instead of "To do Y, use X").
# 4. Specify any behaviour that the agent needs to perform (like formatting hyperlinks) very early
# in the prompt.
# 5. Use terse language and bullet points; avoid complex sentence structures. LLMs can fill in the
# gaps.
# 6. Let some instructions and details be implicit to avoid overwhelming the LLM.
# 7. Use the same example across different rules to maintain consistency.
# fmt: off
SYSTEM_PROMPT = (

@kirkrodrigues kirkrodrigues changed the title fix(clp-mcp-server): Refine system prompt to make LLM's KQL query generation to be CLP friendly fix(clp-mcp-server): Refine system prompt to make LLMs' KQL query generation more CLP friendly. Nov 2, 2025
Comment thread components/clp-mcp-server/pyproject.toml

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b902444 and a008104.

📒 Files selected for processing (1)
  • components/clp-mcp-server/clp_mcp_server/server/constants.py (1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1405
File: components/clp-package-utils/pyproject.toml:5-15
Timestamp: 2025-10-13T03:24:35.074Z
Learning: In the y-scope/clp repository, the Python 3.9 to 3.10 version requirement change was intentionally deferred to a separate PR (after PR #1405) to reduce review effort, as decided in an offline discussion between junhaoliao and kirkrodrigues.
📚 Learning: 2025-05-07T16:56:35.687Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:378-402
Timestamp: 2025-05-07T16:56:35.687Z
Learning: In the CLP search component, the `evaluate_wildcard_filter` function should return `AstEvaluationResult::Pruned` when `node_id_value_pairs` is empty, not `AstEvaluationResult::False`. Empty node sets should be treated as "undetermined" rather than definitive non-matches.

Applied to files:

  • components/clp-mcp-server/clp_mcp_server/server/constants.py
📚 Learning: 2025-08-15T21:48:40.228Z
Learnt from: quinntaylormitchell
Repo: y-scope/clp PR: 1181
File: docs/src/user-guide/guides-retention.md:68-73
Timestamp: 2025-08-15T21:48:40.228Z
Learning: In documentation for the CLP project, when suggesting formatting improvements for variables in explanatory text, the user quinntaylormitchell prefers to maintain existing sentence structures (like keeping "that" in "i.e., that the difference...") while applying monospace formatting to technical terms and variables for consistency.

Applied to files:

  • components/clp-mcp-server/clp_mcp_server/server/constants.py
📚 Learning: 2025-09-04T13:51:02.311Z
Learnt from: quinntaylormitchell
Repo: y-scope/clp PR: 1283
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:206-210
Timestamp: 2025-09-04T13:51:02.311Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/compress.py, an empty string ("") is a valid value for the --timestamp-key parameter, so validation should not treat empty strings as missing values.

Applied to files:

  • components/clp-mcp-server/clp_mcp_server/server/constants.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: package-image
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: lint-check (macos-15)
🔇 Additional comments (4)
components/clp-mcp-server/clp_mcp_server/server/constants.py (4)

13-25: Excellent documentation of LLM-friendly practices.

The guidelines clearly document the empirically validated practices for writing effective LLM prompts. This meta-documentation will help future maintainers understand why the prompt is structured as it is.


28-33: Clear introduction and behaviour specification.

The prompt introduction correctly follows guideline #4 by specifying the agent's behaviour (formatting hyperlinks) early in the prompt. The instruction to "start broad then narrow" is also helpful guidance for query refinement.


35-39: CRITICAL wildcard rule addresses the core PR objective.

This rule directly addresses the issue described in the PR where queries like register OR registration OR registered failed. The example request: *GET* clearly demonstrates the required wildcard usage. The marking as CRITICAL and the concrete example follow guidelines #2 and #1.


45-48: CRITICAL multi-word rule is clear and well-illustrated.

The example request: "*GET wp-admin*" effectively demonstrates the combination of quotes and wildcards for multi-word substring searches. The parenthetical reminder reinforces the requirement.

" @, $, !, #.\n"
' - In values, use backslash to escape searching for any of the literal characters: \\, ", ?,'
" *.\n"
"- Time range: use search_by_kql_with_timestamp_range to constrain by time.\n"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Missing critical timestamp format specification.

According to the PR discussion, you explicitly stated that timestamp format is critical: "query with timestamp that does not follows ISO 8601 in UTC format will be rejected, so we want to make sure AI generates the exact format." You proposed including: "Timestamps must follow the ISO 8601 UTC format (YYYY-MM-DDTHH:mm:ss.fffZ), where the trailing Z indicates UTC."

This specification has been flagged as a major issue in multiple past reviews and was marked as addressed in previous commits, but it is missing from the current code. Without this specification, LLMs may generate timestamps in incorrect formats (e.g., Unix timestamps, locale-specific formats), causing query failures and undermining the PR's goal of achieving 100% KQL accuracy.

Apply this diff to add the timestamp format specification:

-"- Time range: use search_by_kql_with_timestamp_range to constrain by time.\n"
+"- Time range: use search_by_kql_with_timestamp_range to constrain by time. Timestamps must"
+" follow ISO 8601 UTC format: YYYY-MM-DDTHH:mm:ss.fffZ (trailing Z indicates UTC). Example:"
+" 2024-10-15T08:30:00.000Z\n"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"- Time range: use search_by_kql_with_timestamp_range to constrain by time.\n"
"- Time range: use search_by_kql_with_timestamp_range to constrain by time. Timestamps must"
" follow ISO 8601 UTC format: YYYY-MM-DDTHH:mm:ss.fffZ (trailing Z indicates UTC). Example:"
" 2024-10-15T08:30:00.000Z\n"
🤖 Prompt for AI Agents
In components/clp-mcp-server/clp_mcp_server/server/constants.py at line 55, the
comment string is missing the required timestamp format specification; update
the message to explicitly state that timestamps must follow ISO 8601 UTC format
using the pattern YYYY-MM-DDTHH:mm:ss.fffZ (the trailing 'Z' indicates UTC) so
LLMs produce exact, KQL-acceptable timestamps and avoid Unix or locale-specific
formats.

@20001020ycx 20001020ycx merged commit 5cc777c into y-scope:main Nov 3, 2025
19 checks passed
junhaoliao pushed a commit to junhaoliao/clp that referenced this pull request May 17, 2026
…eration more CLP friendly. (y-scope#1494)

Co-authored-by: Rishikesh Devsot <rishikeshdevsot@gmail.com>
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants