Fix IPv4 parsing returning garbage for some invalid inputs #90545

Merged
al13n321 merged 2 commits into master from lemire
Dec 3, 2025

Conversation

@al13n321
Member

@al13n321 al13n321 commented Nov 21, 2025

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fixed IPv4 parsing functions (e.g. IPv4StringToNumOrDefault) returning garbage for some invalid inputs. Resolves #90544. Resolves #87583

Details

Closes #90544

@clickhouse-gh
Contributor

clickhouse-gh bot commented Nov 21, 2025

Workflow [PR], commit [201870a]

Summary:

| job_name | test_name | status | info | comment |
|---|---|---|---|---|
| Stateless tests (arm_asan, targeted) | | failure | | |
| | 02030_capnp_format | FAIL | cidb | |
| | 02030_capnp_format | FAIL | cidb | |
| | 02030_capnp_format | FAIL | cidb | |
| Bugfix validation (functional tests) | | failure | | |
| | 03727_ipv4_parsing_bug | FAIL | cidb | |
| Stateless tests (amd_debug, parallel) | | failure | | |
| | 03746_system_background_schedule_pool_log | FAIL | cidb | |
| BuzzHouse (amd_debug) | | failure | | |
| | Logical error: 'Inconsistent AST formatting: the query: | FAIL | cidb | |
| BuzzHouse (amd_ubsan) | | failure | | |
| | UndefinedBehaviorSanitizer: undefined behavior (STID: 4443-517f) | FAIL | cidb | |

@clickhouse-gh clickhouse-gh bot added the pr-bugfix label (Pull request with bugfix, not backported by default) Nov 21, 2025
@al13n321
Member Author

(Also tested it exhaustively with a throwaway program: https://pastila.nl/?00341389/481302c8be36778056a08a913fe30b35#3WESTWRfajr0rKdSkl8dsA== . It doesn't seem worthwhile to polish it into a proper gtest.)

@Avogar Avogar self-assigned this Nov 21, 2025
@abashkeev abashkeev requested a review from Copilot November 25, 2025 18:51
Contributor

Copilot AI left a comment

Pull request overview

This PR fixes a critical bug in the IPv4 parsing implementation where invalid inputs could return garbage values due to hash collisions in the SSE-optimized parser. The fix adds validation to ensure the detected pattern matches the actual input structure before proceeding with parsing.

Key changes:

  • Added dotmask validation to prevent hash collision false positives
  • Extended pattern table to include expected dotmask for collision detection
  • Added test cases covering the problematic invalid inputs
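
To make the collision check concrete, here is a minimal scalar sketch of the idea, not the SSE code from formatIPv6.cpp. All names (`computeDotMask`, `hashDotMask`, `PatternEntry`, `lookupChecked`) and the deliberately weak hash are assumptions chosen for illustration. The point is only that storing the expected dotmask in each table entry lets the lookup reject an input whose dotmask merely hashes to the same slot.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Bit i of the result is set when s[i] == '.'; only the first 16 bytes are considered.
inline uint16_t computeDotMask(std::string_view s)
{
    uint16_t mask = 0;
    for (size_t i = 0; i < s.size() && i < 16; ++i)
        if (s[i] == '.')
            mask |= static_cast<uint16_t>(1u << i);
    return mask;
}

// Hypothetical, deliberately weak hash so that a collision is easy to produce.
inline size_t hashDotMask(uint16_t mask)
{
    return mask % 4;
}

// Hypothetical table entry: the real entries also carry shuffle data, omitted here.
struct PatternEntry
{
    uint16_t expected_dotmask;
    bool occupied;
};

using PatternTable = std::array<PatternEntry, 4>;

// Toy table with a single registered layout: dots at positions 1, 3 and 5 ("d.d.d.d").
inline PatternTable makeToyTable()
{
    PatternTable table{};
    const uint16_t mask = computeDotMask("1.2.3.4");
    table[hashDotMask(mask)] = PatternEntry{mask, true};
    return table;
}

// Lookup without validation: trusts whatever entry the hash lands on.
inline bool lookupUnchecked(const PatternTable & table, std::string_view s)
{
    return table[hashDotMask(computeDotMask(s))].occupied;
}

// Lookup with validation: the entry must have been built for exactly this dotmask.
inline bool lookupChecked(const PatternTable & table, std::string_view s)
{
    const uint16_t mask = computeDotMask(s);
    const PatternEntry & entry = table[hashDotMask(mask)];
    return entry.occupied && entry.expected_dotmask == mask;
}
```

With this toy table, the invalid string `"1.234"` has a single dot at position 1 and hashes to the same slot as `"1.2.3.4"`. `lookupUnchecked` accepts it, while `lookupChecked` rejects it because the stored dotmask differs.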

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| src/Common/formatIPv6.cpp | Core fix: Expanded pattern table from 16 to 18 bytes per entry to include dotmask validation data, and added collision detection check before pattern matching |
| tests/queries/0_stateless/03727_ipv4_parsing_bug.sql | Test queries for invalid IPv4 strings that previously returned garbage |
| tests/queries/0_stateless/03727_ipv4_parsing_bug.reference | Expected output showing all invalid inputs now correctly return 0.0.0.0 |
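
For reference, the "invalid input returns 0.0.0.0" behavior can be illustrated with a strict scalar parser. This is a sketch under assumptions: `parseIPv4Strict` and `parseIPv4OrDefault` are hypothetical names, and the acceptance rules here (exactly four decimal fields of one to three digits, each at most 255) are not claimed to match ClickHouse's rules in every corner case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Strict dotted-quad parser (hypothetical reference, not the ClickHouse implementation).
// Returns the address as a 32-bit number with the first field in the most significant byte,
// or std::nullopt when the input is not exactly four fields in the range 0..255.
inline std::optional<uint32_t> parseIPv4Strict(std::string_view s)
{
    uint32_t result = 0;
    size_t pos = 0;
    for (int field = 0; field < 4; ++field)
    {
        uint32_t value = 0;
        size_t digits = 0;
        // Consume at most three decimal digits for this field.
        while (pos < s.size() && s[pos] >= '0' && s[pos] <= '9' && digits < 3)
        {
            value = value * 10 + static_cast<uint32_t>(s[pos] - '0');
            ++pos;
            ++digits;
        }
        if (digits == 0 || value > 255)
            return std::nullopt;
        result = (result << 8) | value;
        // Every field except the last must be followed by a single dot.
        if (field < 3)
        {
            if (pos >= s.size() || s[pos] != '.')
                return std::nullopt;
            ++pos;
        }
    }
    // Trailing characters make the input invalid.
    if (pos != s.size())
        return std::nullopt;
    return result;
}

// Mirrors the "OrDefault" idea: invalid input yields the fallback (0, i.e. 0.0.0.0).
inline uint32_t parseIPv4OrDefault(std::string_view s, uint32_t fallback = 0)
{
    return parseIPv4Strict(s).value_or(fallback);
}
```

A parser of this shape could serve as the slow side of a cross-check: run both it and the fast path on the same input and compare the results.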

@al13n321 al13n321 enabled auto-merge December 3, 2025 21:34
@al13n321 al13n321 added this pull request to the merge queue Dec 3, 2025
Merged via the queue into master with commit 90326b2 Dec 3, 2025
124 of 130 checks passed
@al13n321 al13n321 deleted the lemire branch December 3, 2025 22:05
@robot-ch-test-poll1 robot-ch-test-poll1 added the pr-synced-to-cloud label (The PR is synced to the cloud repo) Dec 3, 2025
Labels

pr-bugfix: Pull request with bugfix, not backported by default
pr-synced-to-cloud: The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

IPv4 parsing sometimes succeeds on invalid inputs
Test 03212_variant_dynamic_cast_or_default is flaky

4 participants