Commit efe536a

Merge master into fix-filter-pushdown-join-use-nulls-legacy
Parents: 26ba23c + c004e83

37 files changed

Lines changed: 553 additions & 236 deletions

.claude/skills/build/SKILL.md

Lines changed: 8 additions & 11 deletions
@@ -3,7 +3,7 @@ name: build
 description: Build ClickHouse with various configurations (Release, Debug, ASAN, TSAN, etc.). Use when the user wants to compile ClickHouse.
 argument-hint: "[build-type] [target] [options]"
 disable-model-invocation: false
-allowed-tools: Task, Bash(ninja:*), Bash(cd:*), Bash(ls:*), Bash(pgrep:*), Bash(ps:*), Bash(pkill:*), Bash(mktemp:*), Bash(sleep:*)
+allowed-tools: Task, Bash(ninja:*), Bash(cd:*), Bash(ls:*), Bash(pgrep:*), Bash(ps:*), Bash(pkill:*), Bash(sleep:*)
 ---

 # ClickHouse Build Skill
@@ -77,22 +77,19 @@ Build ClickHouse in `build` or `build_debug`, `build_asan`, `build_tsan`, `build

 2. **Create log file and start the build:**

-   **Step 2a: Create temporary log file first:**
-   ```bash
-   mktemp /tmp/build_clickhouse_XXXXXX.log
-   ```
-   - This will print the log file path
+   **Step 2a: Determine log file path:**
+   - Use `[build_directory]/build_output.log` as the log file path
    - **IMMEDIATELY report to the user:**
-     - "Build logs will be written to: [log file path]"
+     - "Build logs will be written to: `[build_directory]/build_output.log`"
    - Then display in a copyable code block:
    ```bash
-   tail -f [log file path]
+   tail -f [build_directory]/build_output.log
    ```
    - Example: "You can monitor the build in real-time with:" followed by the tail command in a code block

   **Step 2b: Start the ninja build:**
   ```bash
-  cd [build_directory] && ninja [target] > [log file path] 2>&1
+  cd [build_directory] && ninja [target] > build_output.log 2>&1
   ```
   Where `[build_directory]` is the path found in step 1a.

@@ -116,7 +113,7 @@ Build ClickHouse in `build` or `build_debug`, `build_asan`, `build_tsan`, `build
 **ALWAYS use Task tool to analyze results** (both success and failure):
 - Use Task tool with `subagent_type=general-purpose` to analyze the build output
 - **Pass the log file path from step 2a** to the Task agent - let it read the file directly
-- Example Task prompt: "Read and analyze the build output from: /tmp/build_clickhouse_abc123.log"
+- Example Task prompt: "Read and analyze the build output from: [build_directory]/build_output.log"
 - The Task agent should read the file and provide:

 **If build succeeds:**
@@ -246,5 +243,5 @@ Build ClickHouse in `build` or `build_debug`, `build_asan`, `build_tsan`, `build
 - **MANDATORY:** After successful builds, this skill MUST check for running ClickHouse servers and ask the user if they want to stop them to use the new build
 - **MANDATORY:** ALL build output (success or failure) MUST be analyzed by a Task agent with `subagent_type=general-purpose`
 - **MANDATORY:** ALWAYS provide a final summary to the user at the end of the skill execution (step 6)
-- **CRITICAL:** Build output is redirected to a unique log file created with `mktemp`. The log file path is reported to the user in a copyable format BEFORE starting the build, allowing real-time monitoring with `tail -f`. The log file path is saved from step 2a and passed to the Task agent for analysis. This keeps large build logs out of the main context.
+- **CRITICAL:** Build output is redirected to `build_output.log` inside the build directory. The log file path is reported to the user in a copyable format BEFORE starting the build, allowing real-time monitoring with `tail -f`. The log file path is saved from step 2a and passed to the Task agent for analysis. This keeps large build logs out of the main context.
 - **Subagents available:** Task tool is used to analyze all build output (by reading from output file) and provide concise summaries. Additional agents (Explore or general-purpose) can be used for deeper investigation of complex build errors

.claude/skills/test/SKILL.md

Lines changed: 14 additions & 20 deletions
@@ -3,7 +3,7 @@ name: test
 description: Run ClickHouse stateless or integration tests. Use when the user wants to run or execute tests.
 argument-hint: "[test-name] [--flags]"
 disable-model-invocation: false
-allowed-tools: Task, Bash(./tests/clickhouse-test:*), Bash(pgrep:*), Bash(./build/*/programs/clickhouse:*), Bash(./build*/programs/clickhouse:*), Bash(python:*), Bash(python3:*), Bash(mktemp:*), Bash(export:*), Bash(ls:*), Bash(test:*)
+allowed-tools: Task, Bash(./tests/clickhouse-test:*), Bash(pgrep:*), Bash(./build/*/programs/clickhouse:*), Bash(./build*/programs/clickhouse:*), Bash(python:*), Bash(python3:*), Bash(export:*), Bash(ls:*), Bash(test:*)
 ---

 # ClickHouse Test Runner Skill
@@ -125,23 +125,20 @@ The **build directory** is the path up to and including the parent of `programs/

 3. **Create log file and run the stateless test:**

-   **Step 3a: Create temporary log file first:**
-   ```bash
-   mktemp /tmp/test_clickhouse_XXXXXX.log
-   ```
-   - This will print the log file path
+   **Step 3a: Determine log file path:**
+   - Use `[build_directory]/test_output.log` as the log file path
    - **IMMEDIATELY report to the user:**
-     - "Test logs will be written to: [log file path]"
+     - "Test logs will be written to: `[build_directory]/test_output.log`"
    - Then display in a copyable code block:
    ```bash
-   tail -f [log file path]
+   tail -f [build_directory]/test_output.log
    ```
    - Example: "You can monitor the test progress in real-time with:" followed by the tail command in a code block

   **Step 3b: Start the stateless test:**
   ```bash
   # Add clickhouse binary to PATH using auto-detected build directory
-  export PATH="./[build_directory]/programs:$PATH" && ./tests/clickhouse-test <test_name> [flags] > [log file path] 2>&1
+  export PATH="./[build_directory]/programs:$PATH" && ./tests/clickhouse-test <test_name> [flags] > [build_directory]/test_output.log 2>&1
   ```
   Where `[build_directory]` is the path found during auto-detection.

@@ -173,22 +170,19 @@ The **build directory** is the path up to and including the parent of `programs/

 2. **Create log file and run the integration test:**

-   **Step 2a: Create temporary log file first:**
-   ```bash
-   mktemp /tmp/test_clickhouse_XXXXXX.log
-   ```
-   - This will print the log file path
+   **Step 2a: Determine log file path:**
+   - Use `[build_directory]/test_output.log` as the log file path
    - **IMMEDIATELY report to the user:**
-     - "Test logs will be written to: [log file path]"
+     - "Test logs will be written to: `[build_directory]/test_output.log`"
    - Then display in a copyable code block:
    ```bash
-   tail -f [log file path]
+   tail -f [build_directory]/test_output.log
    ```
    - Example: "You can monitor the test progress in real-time with:" followed by the tail command in a code block

   **Step 2b: Start the integration test with praktika:**
   ```bash
-  python -u -m ci.praktika run "integration" --test <test_name> [--path <absolute_binary_path>] > [log file path] 2>&1
+  python -u -m ci.praktika run "integration" --test <test_name> [--path <absolute_binary_path>] > [build_directory]/test_output.log 2>&1
   ```

 **Important:**
@@ -220,7 +214,7 @@ The **build directory** is the path up to and including the parent of `programs/
 **ALWAYS use Task tool to analyze results** (both pass and fail):
 - Use Task tool with `subagent_type=general-purpose` to analyze the test output
 - **Pass the log file path from step 3a** to the Task agent - let it read the file directly
-- Example Task prompt: "Read and analyze the test output from: /tmp/test_clickhouse_abc123.log"
+- Example Task prompt: "Read and analyze the test output from: [build_directory]/test_output.log"
 - The Task agent should read the file and provide:

 **If tests passed:**
@@ -273,7 +267,7 @@ The **build directory** is the path up to and including the parent of `programs/
 **ALWAYS use Task tool to analyze results** (both pass and fail):
 - Use Task tool with `subagent_type=general-purpose` to analyze the test output
 - **Pass the log file path from step 2a** to the Task agent - let it read the file directly
-- Example Task prompt: "Read and analyze the test output from: /tmp/test_clickhouse_abc123.log"
+- Example Task prompt: "Read and analyze the test output from: [build_directory]/test_output.log"
 - The Task agent should read the file and provide:

 **If tests passed:**
@@ -360,7 +354,7 @@ The test runner automatically detects and sets the necessary environment variabl
 - Test type is automatically detected based on name pattern or file location
 - **MANDATORY:** ALL test output (success or failure) MUST be analyzed by a Task agent with `subagent_type=general-purpose`
 - **MANDATORY:** For test failures, MUST prompt user if they want deeper analysis and use Task subagent if requested
-- **CRITICAL:** Test output is redirected to a unique log file created with `mktemp`. The log file path is reported to the user in a copyable format BEFORE starting the test, allowing real-time monitoring with `tail -f`. The log file path is saved and passed to the Task agent for analysis. This keeps large test logs out of the main context.
+- **CRITICAL:** Test output is redirected to `test_output.log` inside the build directory. The log file path is reported to the user in a copyable format BEFORE starting the test, allowing real-time monitoring with `tail -f`. The log file path is saved and passed to the Task agent for analysis. This keeps large test logs out of the main context.
 - **Subagents available:** Task tool is used to analyze all test output (by reading from log file) and provide concise summaries. Additional agents (Explore or general-purpose) are used for deeper investigation of test failures when user requests it

 ### Stateless Tests

.claude/tools/fetch_ci_report.js

Lines changed: 51 additions & 5 deletions
@@ -176,6 +176,13 @@ function constructJsonUrl(baseUrl, suffix, sha, taskName) {
   return `${baseUrl}/${suffix}/${encodeURIComponent(sha)}/result_${normalizedTask}.json`;
 }

+/**
+ * Check if a status represents a failure
+ */
+function isFailureStatus(status) {
+  return status === 'failed' || status === 'FAIL' || status === 'failure';
+}
+
 /**
  * Parse test results from the JSON data
  */
@@ -192,13 +199,23 @@ function parseTestResults(jsonData) {
       // Nested results
       extractTests(result.results, prefix ? `${prefix}/${result.name}` : result.name);
     } else {
-      // Leaf result - this is a test
+      // Leaf result - this is a test or build step
       const test = {
         name: prefix ? `${prefix}/${result.name}` : result.name,
         status: result.status || 'UNKNOWN',
         duration: result.duration || 0
       };

+      // Include info field (contains build log tail for build failures)
+      if (result.info) {
+        test.info = result.info;
+      }
+
+      // Include links from this result
+      if (result.links && result.links.length > 0) {
+        test.links = result.links;
+      }
+
       // Extract CIDB links from ext.hlabels
       if (result.ext && result.ext.hlabels) {
         const cidbLinks = [];
@@ -391,7 +408,7 @@ async function fetchReport(inputUrl, options = {}) {
   }

   const { testResults = [] } = result;
-  const failed = testResults.filter(t => t.status === 'failed' || t.status === 'FAIL');
+  const failed = testResults.filter(t => isFailureStatus(t.status));
   const passed = testResults.filter(t => t.status === 'success' || t.status === 'OK');
   const skipped = testResults.filter(t => t.status === 'skipped' || t.status === 'SKIPPED');

@@ -418,6 +435,20 @@ async function fetchReport(inputUrl, options = {}) {
           console.log(`  📊 CIDB: ${cidbLink}`);
         }
       }
+      if (test.links && test.links.length > 0) {
+        for (const link of test.links) {
+          console.log(`  🔗 ${link}`);
+        }
+      }
+      if (test.info) {
+        const lines = test.info.split('\n').filter(l => l.trim());
+        const tail = lines.slice(-30);
+        console.log('  --- log tail ---');
+        for (const line of tail) {
+          console.log(`  ${line}`);
+        }
+        console.log('  --- end ---');
+      }
     }
   }
   console.log();
@@ -516,7 +547,7 @@ async function fetchReport(inputUrl, options = {}) {
   // For multi-report mode, don't filter by failed here - we'll show all in summary
   if (options.failedOnly && !options.isSingleReport) {
     filteredResults = filteredResults.filter(t =>
-      t.status === 'failed' || t.status === 'FAIL'
+      isFailureStatus(t.status)
     );
   }

@@ -528,21 +559,36 @@ async function fetchReport(inputUrl, options = {}) {
   // Print results for standalone report
   console.log('=== Test Results ===\n');

-  const failed = filteredResults.filter(t => t.status === 'failed' || t.status === 'FAIL');
+  const failed = filteredResults.filter(t => isFailureStatus(t.status));
   const passed = filteredResults.filter(t => t.status === 'success' || t.status === 'OK');
   const skipped = filteredResults.filter(t => t.status === 'skipped' || t.status === 'SKIPPED');

   console.log(`Total: ${filteredResults.length} | ✅ Passed: ${passed.length} | ❌ Failed: ${failed.length} | ⏭️ Skipped: ${skipped.length}\n`);

   if (failed.length > 0) {
-    console.log('--- Failed Tests ---');
+    console.log('--- Failures ---');
     for (const test of failed) {
       console.log(`❌ FAIL ${test.name} (${test.duration}s)`);
       if (options.showCidb && test.cidbLinks && test.cidbLinks.length > 0) {
         for (const cidbLink of test.cidbLinks) {
           console.log(`  📊 CIDB: ${cidbLink}`);
         }
       }
+      if (test.links && test.links.length > 0) {
+        for (const link of test.links) {
+          console.log(`  🔗 ${link}`);
+        }
+      }
+      if (test.info) {
+        // Show last 30 non-empty lines of info (build log tail with actual errors)
+        const lines = test.info.split('\n').filter(l => l.trim());
+        const tail = lines.slice(-30);
+        console.log('  --- log tail ---');
+        for (const line of tail) {
+          console.log(`  ${line}`);
+        }
+        console.log('  --- end ---');
+      }
     }
     console.log('');
   }
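For reference, the two small pieces of logic this file change introduces (the broadened failure-status check and the "last 30 non-empty lines" log tail) can be sketched in Python, purely for illustration; the function names below mirror the JavaScript but are not part of the tool itself:

```python
def is_failure_status(status):
    # Mirrors isFailureStatus(): praktika reports spell failures several ways.
    return status in ("failed", "FAIL", "failure")

def log_tail(info, limit=30):
    # Mirrors the test.info printing: drop blank lines, keep the last `limit`.
    lines = [l for l in info.split("\n") if l.strip()]
    return lines[-limit:]
```

The point of both helpers is deduplication: the status comparison previously appeared in three separate `filter` calls, and the tail logic is shared between the per-report and standalone printing paths.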

.github/PULL_REQUEST_TEMPLATE.md

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@
 - Backward Incompatible Change
 - Build/Testing/Packaging Improvement
 - Documentation (changelog entry is not required)
-- Critical Bug Fix (crash, data loss, RBAC) or LOGICAL_ERROR
+- Critical Bug Fix (crash, data loss, RBAC)
 - Bug Fix (user-visible misbehavior in an official stable release)
 - CI Fix or Improvement (changelog entry is not required)
 - Not for changelog (changelog entry is not required)

.github/workflows/retry_infra_failures.yml

Lines changed: 56 additions & 19 deletions
@@ -49,6 +49,11 @@ jobs:
 run_url="https://github.com/$GH_REPO/actions/runs/$run_id"
 echo "Checking run $run_url ..."

+should_rerun=false
+
+# Fetch all job data once (reused for multiple checks below)
+jobs_raw=$(gh api "repos/$GH_REPO/actions/runs/$run_id/jobs?per_page=100" --paginate)
+
 # Collect per-job verdicts across all pages (runs can have >100 jobs).
 # Each failed job emits "true" (infrastructure) or "false" (real failure).
 # A job is considered an infrastructure failure if:
@@ -60,33 +65,65 @@ jobs:
 # - the "Run" step failed almost immediately (under 2 minutes), indicating
 #   a setup/download issue (e.g. missing S3 credentials) rather than a real
 #   test failure
-verdicts=$(gh api "repos/$GH_REPO/actions/runs/$run_id/jobs?per_page=100" \
-  --paginate --jq '
-  .jobs[] | select(.conclusion == "failure") |
-  if .name == "Config Workflow" or .name == "Finish Workflow" then empty
-  else
-    [.steps[] | select(.name == "Run")] |
-    if length == 0 then true
-    elif .[0].conclusion == "skipped" then true
-    elif .[0].conclusion == "failure" then
-      ((.[0].completed_at | fromdateiso8601) -
-       (.[0].started_at | fromdateiso8601)) < 120
-    else false
-    end
+verdicts=$(echo "$jobs_raw" | jq -r '
+  .jobs[] | select(.conclusion == "failure") |
+  if .name == "Config Workflow" or .name == "Finish Workflow" then empty
+  else
+    [.steps[] | select(.name == "Run")] |
+    if length == 0 then true
+    elif .[0].conclusion == "skipped" then true
+    elif .[0].conclusion == "failure" then
+      ((.[0].completed_at | fromdateiso8601) -
+       (.[0].started_at | fromdateiso8601)) < 120
+    else false
   end
-')
+  end
+')

 # Infrastructure failure = at least one failed job, and all of them are infra
 if [ -z "$verdicts" ]; then
-  is_infra=false
+  :
 elif echo "$verdicts" | grep -q "false"; then
-  is_infra=false
+  :
 else
-  is_infra=true
+  should_rerun=true
+  echo "  Infrastructure failure detected."
+fi
+
+# Check if "Config Workflow" failed in its "Run" step (e.g. due to
+# pr_labels_and_category.py rejecting the changelog category). If the PR
+# description was edited after the failure (same HEAD commit, so no new
+# workflow run was triggered), re-run to pick up the fix.
+if [ "$should_rerun" = "false" ]; then
+  config_failed_at=$(echo "$jobs_raw" | jq -r '
+    [.jobs[]
+     | select(.name == "Config Workflow" and .conclusion == "failure")
+     | .steps[] | select(.name == "Run" and .conclusion == "failure")
+     | .completed_at
+    ] | first // empty
+  ')
+
+  if [ -n "$config_failed_at" ]; then
+    run_data=$(gh api "repos/$GH_REPO/actions/runs/$run_id" \
+      --jq '{pr: .pull_requests[0].number, sha: .head_sha}')
+    pr_number=$(echo "$run_data" | jq -r '.pr // empty')
+    run_sha=$(echo "$run_data" | jq -r '.sha')
+
+    if [ -n "$pr_number" ]; then
+      pr_data=$(gh api "repos/$GH_REPO/pulls/$pr_number" \
+        --jq '{sha: .head.sha, updated: .updated_at}')
+      pr_sha=$(echo "$pr_data" | jq -r '.sha')
+      pr_updated=$(echo "$pr_data" | jq -r '.updated')
+
+      if [ "$run_sha" = "$pr_sha" ] && [[ "$pr_updated" > "$config_failed_at" ]]; then
+        should_rerun=true
+        echo "  Config Workflow failed but PR #$pr_number was updated after — rerunning."
+      fi
+    fi
+  fi
 fi

-if [ "$is_infra" = "true" ]; then
-  echo "  Infrastructure failure detected, rerunning..."
+if [ "$should_rerun" = "true" ]; then
   if gh run rerun "$run_id" --repo "$GH_REPO"; then
     rerun_count=$((rerun_count + 1))
     echo "  Triggered rerun: $run_url/attempts/2"
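The per-job verdict and aggregation rules in this workflow can be sketched in Python, as an illustration only (the real logic lives in the jq filter and shell above; `run_step` here stands for the job's "Run" step object from the GitHub jobs API):

```python
from datetime import datetime

def is_infra_failure(run_step):
    # None = the failed job had no "Run" step at all: infra failure.
    if run_step is None:
        return True
    # "Run" skipped = the job failed during setup: infra failure.
    if run_step["conclusion"] == "skipped":
        return True
    # "Run" failed in under 2 minutes = likely a setup/download issue.
    if run_step["conclusion"] == "failure":
        started = datetime.fromisoformat(run_step["started_at"])
        completed = datetime.fromisoformat(run_step["completed_at"])
        return (completed - started).total_seconds() < 120
    return False

def should_rerun(verdicts):
    # Rerun only if at least one job failed AND every failed job is infra.
    return len(verdicts) > 0 and all(verdicts)
```

Note the deliberate asymmetry: an empty verdict list (no failed jobs, or only Config/Finish Workflow failures) does not trigger a rerun, and a single "real" failure vetoes the whole run.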

ci/jobs/integration_test_job.py

Lines changed: 8 additions & 0 deletions
@@ -652,6 +652,14 @@ def main():

     failed_test_cases = []

+    # Clear dmesg to avoid false OOM detection from previous CI jobs on the same host.
+    # Do this only in CI (non-local runs) and via a non-interactive privileged helper.
+    if not info.is_local_run:
+        try:
+            Utils.clear_dmesg()
+        except Exception as ex:
+            print(f"Failed to clear dmesg before integration tests: {ex}")
+
     if parallel_test_modules:
         for attempt in range(module_repeat_cnt):
             log_file = f"{temp_path}/pytest_parallel.log"
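To illustrate the false positive this change avoids: CI inspects the kernel log for OOM-killer messages after a run, so kills left over from a previous job on the same host would otherwise be blamed on the current tests. The sketch below is hypothetical — `looks_like_oom` and the marker strings are illustrative, not the actual CI helper:

```python
# Hypothetical sketch of dmesg-based OOM detection. Without clearing dmesg
# first, any of these markers left by an earlier job on the shared host
# would make the current run look like it was OOM-killed.
OOM_MARKERS = ("Out of memory:", "oom-kill", "Killed process")

def looks_like_oom(dmesg_lines):
    return any(any(marker in line for marker in OOM_MARKERS)
               for line in dmesg_lines)
```

Clearing the buffer before the tests start means any marker found afterwards was produced by this job.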
