Skip to content

Add a natural sort key function#90322

Merged
vitlibar merged 7 commits intoClickHouse:masterfrom
nazarii-piontko:features/natural-sort
Mar 12, 2026
Merged

Add a natural sort key function#90322
vitlibar merged 7 commits intoClickHouse:masterfrom
nazarii-piontko:features/natural-sort

Conversation

@nazarii-piontko
Copy link
Copy Markdown
Contributor

@nazarii-piontko nazarii-piontko commented Nov 18, 2025

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

You can now use the natural sort key function as naturalSortKey(s).

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Details

  • Resolves Natural / version sort #14003
  • Uses prefix encoding and ignores leading zeros to guarantee natural ordering.
  • Implementation supports only positive numbers.
  • Implementation trims leading zeros; optionally, it can accept parameters to customize such behavior.
  • Implementation relies on a capacity function that allocates a power of 2 bytes as an additional buffer. Alternatively, one additional run over the data can provide an exact capacity estimation.
  • Implementation uses a printable ASCII range to encode the size of the number in one byte only.

@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Nov 18, 2025

CLA assistant check
All committers have signed the CLA.

@nazarii-piontko nazarii-piontko changed the title Add a natural sort key function WIP: Add a natural sort key function Nov 18, 2025
@nikitamikhaylov nikitamikhaylov added the can be tested Allows running workflows for external contributors label Nov 19, 2025
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Nov 19, 2025

Workflow [PR], commit [21a7727]

Summary:

job_name test_name status info comment
Upgrade check (amd_release) failure
Changed settings are not reflected in the settings changes history (see changed_settings.txt) FAIL cidb
BuzzHouse (amd_ubsan) failure
Segmentation fault (STID: 2355-2e2b) FAIL cidb

@clickhouse-gh clickhouse-gh bot added the pr-feature Pull request with new product feature label Nov 19, 2025
@nazarii-piontko
Copy link
Copy Markdown
Contributor Author

It is the first version of implementation. It has some limitations and considerations described in the description part. I'm happy to explain more about that and go ahead with the improvement suggestion.

@nazarii-piontko nazarii-piontko marked this pull request as ready for review November 19, 2025 18:58
@nazarii-piontko nazarii-piontko changed the title WIP: Add a natural sort key function Add a natural sort key function Nov 19, 2025
@vitlibar vitlibar self-assigned this Feb 4, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a new string function naturalSortKey(s) that transforms digit runs inside strings into an order-preserving “natural sort key” so that lexicographic ordering matches human-friendly numeric ordering (e.g., file2 < file10). It also adds a stateless test covering a variety of real-world string patterns and edge cases.

Changes:

  • Add naturalSortKey function implementation in src/Functions/.
  • Add stateless SQL test cases exercising numeric segments, leading zeros, multiple numeric runs, unicode, NULL/empty, and long strings.
  • Add expected output reference for the new stateless test.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File Description
src/Functions/naturalSortKey.cpp Implements naturalSortKey(s) by encoding digit-run lengths and digits into a comparable key; registers the function and its documentation.
tests/queries/0_stateless/03721_check_natural_sort_key.sql Adds coverage-oriented stateless queries that order by naturalSortKey(...) across many representative inputs.
tests/queries/0_stateless/03721_check_natural_sort_key.reference Adds the expected output for the new stateless test.

@vitlibar vitlibar mentioned this pull request Feb 28, 2026
1 task
@vitlibar
Copy link
Copy Markdown
Member

vitlibar commented Mar 7, 2026

@vitlibar vitlibar enabled auto-merge March 12, 2026 20:47
@vitlibar vitlibar added this pull request to the merge queue Mar 12, 2026
Merged via the queue into ClickHouse:master with commit afa3066 Mar 12, 2026
146 of 149 checks passed
@robot-ch-test-poll robot-ch-test-poll added the pr-synced-to-cloud The PR is synced to the cloud repo label Mar 12, 2026
@PedroTadim
Copy link
Copy Markdown
Member

PedroTadim commented Mar 16, 2026

A few comments:

  • Could this be implemented as a collation instead? That sounds more like the proper implementation for this?
  • AI comment: The implementation of encodeString fails to append a null terminator (\0) to the encoded strings. Could you verify if this is needed?
  • AI comment: The executeImpl method only handles ColumnString and throws an ILLEGAL_COLUMN exception for FixedString or other string-like columns, even though isString in getReturnTypeImpl allows them. Can you test with FixedString, LowCardinality, Nullable String types?

cc @vitlibar

@vitlibar
Copy link
Copy Markdown
Member

vitlibar commented Mar 16, 2026

Could this be implemented as a collation instead?

Comparing with collation, function naturalSortKey() can be used with arraySort, and also we can keep the results of this function in a separate column for some optimization scenarios.

The implementation of encodeString fails to append a null terminator (\0) to the encoded strings. Could you verify if this is needed?

We stopped appending a null terminator a while ago (#85063)

FixedString, LowCardinality, Nullable String types

For Nullable and LowCardinality it seems it should be OK. For FixedString it'll throw an exception ILLEGAL_TYPE_OF_ARGUMENT which is kind of OK because the documentation says that only String type is supported. Though we can add support for FixedString, that isn't difficult.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-feature Pull request with new product feature pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Natural / version sort

7 participants