Failure: RegressionTestsRelease /aggregate functions/part 1/uniqTheta - snapshot outdated #96

@CarlosFelipeOR

Description

The uniqTheta aggregate function regression test (/aggregate functions/part 1/uniqTheta) fails on ClickHouse versions >= 25.8.15. The test expects uniqTheta(number) to return "4", but it now returns "5".
The failure affects all PRs targeting the 25.8.15 stable branch as well as official ClickHouse builds (25.8.15.35-alpine, 25.12.4.35-alpine).
Affected test: /aggregate functions/part 1/uniqTheta
Affected check: with group by
Affected files:

  • aggregate_functions/tests/snapshots/steps.py.uniqtheta>=24.8_with_analyzer.x86_64.snapshot
  • aggregate_functions/tests/snapshots/steps.py.uniqtheta>=24.8_with_analyzer.aarch64.snapshot
  • aggregate_functions/tests/snapshots/steps.py.uniqtheta>=24.8.x86_64.snapshot
  • aggregate_functions/tests/snapshots/steps.py.uniqtheta>=24.8.aarch64.snapshot

Analysis

Test History Investigation

Querying the test history database revealed a clear pattern:

Version                          | Result
25.8.14.17-alpine (official)     | ✅ OK
25.8.14.20001.altinityantalya    | ✅ OK
25.8.15.35-alpine (official)     | ❌ Fail
25.8.15.10001.altinitytest       | ❌ Fail
25.12.3.21-alpine                | ✅ OK
25.12.4.35-alpine (official)     | ❌ Fail

The failure started exactly at version 25.8.15 on the 25.8 branch, and at 25.12.4 on the 25.12 branch.
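
The same boundary can be read off the history programmatically. The following is a minimal sketch, not the query that was actually run against the test history database: the `HISTORY` list is copied by hand from the results above, and `parse_version` and `first_failing_patch` are hypothetical helper names.

```python
import re

# Results copied by hand from the test history above: (version string, passed?)
HISTORY = [
    ("25.8.14.17-alpine", True),
    ("25.8.14.20001.altinityantalya", True),
    ("25.8.15.35-alpine", False),
    ("25.8.15.10001.altinitytest", False),
    ("25.12.3.21-alpine", True),
    ("25.12.4.35-alpine", False),
]


def parse_version(version):
    """Return the four leading numeric components of a version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return tuple(int(part) for part in match.groups())


def first_failing_patch(history):
    """Map each (major, minor) branch to its lowest failing patch number."""
    first_fail = {}
    for version, passed in history:
        major, minor, patch, _build = parse_version(version)
        if not passed:
            branch = (major, minor)
            first_fail[branch] = min(patch, first_fail.get(branch, patch))
    return first_fail


print(first_failing_patch(HISTORY))
```

For the data above this reports patch 15 for the 25.8 branch and patch 4 for the 25.12 branch, which matches the releases that include the backported fix.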

Upstream ClickHouse Investigation

Investigation of the ClickHouse/ClickHouse repository revealed:
PR #94095: Fix accuracy of uniqTheta when using UInt8 aggregation keys in parallel

  • Merged: 2026-01-14
  • Fixes: Issue #45292 ("uniqTheta produces random results with multithreaded streams")
  • Backported to: 25.8, 25.10, 25.11, 25.12
  • Included in: v25.8.15.35-lts

Root Cause

This is NOT a regression in ClickHouse - it's a bug fix.
The uniqTheta function had a long-standing bug where it produced incorrect (undercounted) results when:

  1. Using parallel aggregation (max_threads > 1, the default)
  2. Using small aggregation key types (like UInt8)

The failing test executes:

  SELECT number % 2 AS even, uniqTheta(number), any(toTypeName(number))
  FROM numbers(10)
  GROUP BY even

This groups numbers 0-9 by even/odd:

  • Group 0: {0, 2, 4, 6, 8} = 5 unique values
  • Group 1: {1, 3, 5, 7, 9} = 5 unique values

Behavior             | Result | Correct?
Before fix (25.8.14) | 4      | ❌ No (data lost during parallel merge)
After fix (25.8.15)  | 5      | ✅ Yes
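
The expected value of 5 per group can be checked independently of ClickHouse. The sketch below models each partial aggregation state as an exact Python set rather than a Theta sketch, so it shows only what a lossless merge should produce, not how ThetaSketchData.h works. The function names and the chunking scheme are assumptions made for illustration.

```python
from collections import defaultdict


def partial_states(numbers):
    """Aggregate one chunk: map each parity key to the set of values seen."""
    state = defaultdict(set)
    for number in numbers:
        state[number % 2].add(number)
    return state


def merge_states(states):
    """Merge partial states with set union, so no value is dropped."""
    merged = defaultdict(set)
    for state in states:
        for key, values in state.items():
            merged[key] |= values
    return merged


def uniq_by_parity(n, num_chunks):
    """Count unique values per parity group after a chunked aggregation."""
    numbers = list(range(n))
    # Round-robin split stands in for rows being spread across threads.
    chunks = [numbers[i::num_chunks] for i in range(num_chunks)]
    merged = merge_states(partial_states(chunk) for chunk in chunks)
    return {key: len(merged[key]) for key in sorted(merged)}


print(uniq_by_parity(10, 1))  # single chunk, no merge needed
print(uniq_by_parity(10, 4))  # four chunks merged together
```

Both calls give 5 for each group, however the rows are split. A merge that returns 4 has therefore lost a value, which is consistent with the undercount seen before the fix.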

The regression test snapshots were created against the buggy behavior and expected "4". The fix in PR #94095 corrected the Theta sketch merging logic in ThetaSketchData.h, and the function now correctly returns "5".

Solution

Created new version-specific snapshots for ClickHouse >= 25.8 that expect the corrected value of "5".
Changes:

  1. Added a version check for >=25.8 in uniqTheta.py
  2. Created new snapshot files for >=25.8 with the corrected expected values
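
A version check of this kind might look roughly like the sketch below. It is an illustration only and not the code in uniqTheta.py: `version_tuple`, `snapshot_label`, and `SNAPSHOT_THRESHOLDS` are hypothetical names, and the labels simply mirror the `>=24.8` prefix of the existing snapshot files plus the new `>=25.8` one.

```python
import re


def version_tuple(version):
    """Parse the leading dotted integers of a version string into a tuple."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


# Hypothetical thresholds, newest first, so the first match wins.
SNAPSHOT_THRESHOLDS = [
    ((25, 8), ">=25.8"),
    ((24, 8), ">=24.8"),
]


def snapshot_label(version):
    """Return the snapshot label for a ClickHouse version, or '' if none applies."""
    current = version_tuple(version)
    for minimum, label in SNAPSHOT_THRESHOLDS:
        if current >= minimum:
            return label
    return ""


print(snapshot_label("25.8.15.35-alpine"))
print(snapshot_label("24.8.1.1"))
```

Under this sketch, a `>=25.8` threshold also matches 25.8.14 builds, which the test history shows still returned "4". The issue does not say whether the actual check is that coarse.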

Commit: a6e7230
