Skip to content

Revert "Revert "Feat: A SQL function obfuscateQuery""#101027

Open
alexey-milovidov wants to merge 11 commits intomasterfrom
revert-100711-revert-98305-master
Open

Revert "Revert "Feat: A SQL function obfuscateQuery""#101027
alexey-milovidov wants to merge 11 commits intomasterfrom
revert-100711-revert-98305-master

Conversation

@alexey-milovidov
Copy link
Copy Markdown
Member

Reverts #100711

@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Mar 28, 2026

Workflow [PR], commit [ea704bb]

Summary:

job_name test_name status info comment
Integration tests (amd_asan_ubsan, db disk, old analyzer, 2/6) failure
test_backup_restore_s3/test.py::test_backup_to_s3_different_credentials[data_file_name_from_first_file_name-native_multipart] FAIL cidb
Integration tests (amd_tsan, 2/6) failure
test_storage_kafka/test_batch_fast.py::test_kafka_commit_on_block_write[generate_old_create_table_query] FAIL cidb

AI Review

Summary

This PR re-introduces obfuscateQuery / obfuscateQueryWithSeed and includes two follow-up fixes (uppercase-identifier boundary check and numeric-literal-overflow handling) with corresponding stateless tests. I did not find additional correctness/performance/safety defects in the current diff beyond already-raised metadata concerns. Verdict is to request changes only for PR template/changelog metadata completeness.

PR Metadata

⚠️ Changelog category and Changelog entry are missing from the PR body and do not follow .github/PULL_REQUEST_TEMPLATE.md. Because this is user-facing SQL functionality, metadata should be completed with a concrete user-facing changelog item.

Suggested replacement text in PR description:

  • Changelog category (leave one): New Feature
  • Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md): Added SQL functions \obfuscateQuery` and `obfuscateQueryWithSeed` for query text obfuscation, including deterministic obfuscation with a provided seed.`
ClickHouse Rules
Item Status Notes
Deletion logging
Serialization versioning
Core-area scrutiny
No test removal
Experimental gate
No magic constants
Backward compatibility
SettingsChangesHistory.cpp
PR metadata quality ⚠️ Missing required PR template metadata (Changelog category, Changelog entry).
Safe rollout
Compilation time
Final Verdict
  • Status: ⚠️ Request changes
  • Minimum required actions:
    • Fill PR body using .github/PULL_REQUEST_TEMPLATE.md with a valid Changelog category and a specific user-facing Changelog entry.

@clickhouse-gh clickhouse-gh bot added the pr-not-for-changelog This PR should not be mentioned in the changelog label Mar 28, 2026

}

REGISTER_FUNCTION(obfuscateQuery)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR body does not follow .github/PULL_REQUEST_TEMPLATE.md: Changelog category and Changelog entry are missing. Since this PR re-introduces user-facing SQL functions (obfuscateQuery / obfuscateQueryWithSeed), please fill template metadata with a concrete user-readable changelog entry.

The ASan crash (stack-buffer-overflow) was caused by accessing
`src_pos[-1]` on the first character of the identifier in the CamelCase
split logic. For input like `'Ab'`, the first character `'A'` is
uppercase and alphanumeric, so `word_has_alphanumerics` is true, and the
code tried to check whether the previous character is lowercase — but
there is no previous character, reading one byte before the buffer.

Fix: add a bounds check `src_pos > src.data()` before the `src_pos[-1]`
access.

https://s3.amazonaws.com/clickhouse-test-reports/PRs/100683/14d36b8a2b692110d6a5f5eb898d4a37a3a612ca/unit_tests_asan_ubsan/job.log

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@alexey-milovidov alexey-milovidov marked this pull request as ready for review March 28, 2026 19:08
alexey-milovidov and others added 5 commits March 28, 2026 23:47
…flows to zero

`readIntText` can overflow a `uint64_t` for very large numeric literals,
resulting in `num == 0`. This then calls `bitScanReverse(0)` which triggers
`assert(x != 0)` in debug builds.

Handle the overflow case by using the obfuscated hash directly.

Found by the AST fuzzer in CI:
https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=101027&sha=f9d39a5d7a9cd9d194db7593f4a84a0ebfd15e05&name_0=PR&name_1=AST%20fuzzer%20%28amd_debug%2C%20targeted%29

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

obfuscated = (1ULL << log2) + obfuscated % (1ULL << log2);
writeIntText(obfuscated, result);
if (num == 0)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ This change adds a dedicated overflow path when readIntText overflows to num == 0, but there is no regression test covering it.

Please add a stateless test that exercises a very large integer literal (larger than UInt64) through obfuscateQueryWithSeed to lock this behavior and prevent regressions in this branch.

alexey-milovidov and others added 2 commits March 30, 2026 03:46
Exercise the code path where `readIntText` overflows `uint64_t` to zero
for very large numeric literals, as requested in review feedback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Mar 31, 2026

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.10% 84.10% +0.00%
Functions 90.90% 90.90% +0.00%
Branches 76.70% 76.70% +0.00%

Changed lines: 94.72% (251/265) · Uncovered code

Full report · Diff report

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-not-for-changelog This PR should not be mentioned in the changelog

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant