Skip to content

Fix Array join: Function writeSlice expects same column types#91784

Open
Ergus wants to merge 5 commits intoClickHouse:masterfrom
Ergus:fix_57243
Open

Fix Array join: Function writeSlice expects same column types#91784
Ergus wants to merge 5 commits intoClickHouse:masterfrom
Ergus:fix_57243

Conversation

@Ergus
Copy link
Copy Markdown
Member

@Ergus Ergus commented Dec 9, 2025

New approach for #80376 as recommended by @alexey-milovidov
Fixes #57243 #93343

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Fix an exception with ARRAY JOINs: Function writeSlice expects same column types for GenericArraySlice and GenericArraySink which happens when lowCardinality numeric types are used (issue #57243).

@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Dec 9, 2025

Workflow [PR], commit [2ee20cf]

Summary:

job_name test_name status info comment
Integration tests (amd_binary, 4/5) failure
test_userspace_page_cache/test.py::test_size_adjustment FAIL cidb, issue
Integration tests (arm_binary, distributed plan, 2/4) failure
test_userspace_page_cache/test.py::test_size_adjustment FAIL cidb, issue

@Ergus Ergus requested a review from rschu1ze December 9, 2025 10:48
@clickhouse-gh clickhouse-gh bot added the pr-bugfix Pull request with bugfix, not backported by default label Dec 9, 2025
@Ergus Ergus changed the title Fix 57243 Fix Array join: Function writeSlice expects same column types Dec 9, 2025
// insertRangeFrom casts numeric types if they are different, so it can insert them.
|| (sink.elements.lowCardinality()
&& source_elements.lowCardinality()
&& sink.elements.isNumeric()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it works for different data types, such as Float and Int.
insertRangeFrom only works for identical columns.

Copy link
Copy Markdown
Member Author

@Ergus Ergus Dec 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexey-milovidov

I agree that the condition needs to be improved (maybe restricted to only intXX->intYY numeric types or tuned with some more elaborated check). I added more detailed comments in the original issue thread about the case by case condition relaxation.

In low cardinality columns insertRangeFrom relies on insertPositionsRange.

Low cardinality use a dictionary that automatically resizes when cardinality increases and insertion already checks sizes of types and indices and perform some casting and conversions:

void ColumnLowCardinality::Index::insertPositionsRange(const IColumn & column, UInt64 offset, UInt64 limit)

But the initial condition source_elements.structureEquals(sink.elements):

  1. Will fail for two "compatible" lox cardinality numeric columns (i.e: if one of them has cardinality <256 and the other >256).
  2. Inserting char -> int may be possible with no problem, but the condition doesn't allow that.
  3. The opposite int -> char should also work if all the input values are < CHAR_MAX. Otherwise an error may be triggered BUT in the inner casting code in order to trigger the most precise information about the problematic value.

const NullMap * size_null_map = size_nullable ? &size_nullable->getNullMapData() : nullptr;
const IColumn * size_nested_column = size_nullable ? &size_nullable->getNestedColumn() : &size_column;

checkAreLowCardinalityInsertable(array_source, sink);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "generic" implementation should already check that the types are identical, and the calling code should convert to the identical types before going here.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this is the generic function: A template that relies on all the writeSlice specializations. That's why the initial fix code was implemented in the specialized writeSlice function.

Copy link
Copy Markdown
Member

@alexey-milovidov alexey-milovidov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.

@alexey-milovidov alexey-milovidov self-assigned this Dec 9, 2025
Ergus added 4 commits January 5, 2026 12:59
* src/Functions/GatherUtils/Algorithms.h (writeSlice) : Extend the condition to allow use insertRangeFrom when lowCardinality numeric
types. This is a preliminar fix that works, but maybe this extension is not general enough.
* tests/queries/0_stateless/02447_join_numeric_low_cardinality.{sql,reference}: New tests with previously failing queries.

Fixes: ClickHouse#57243
NOTE: This is intended to be better for performance, BUT decouples the problem source from the check to prevent it.
So, I keep the assertion in case the issue triggers in the future again.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-bugfix Pull request with bugfix, not backported by default

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Array join: Function writeSlice expects same column types for GenericArraySlice and GenericArraySink

2 participants