Skip to content

Add output_format_float_precision setting to limit decimal digits in float output#99721

Open
phulv94 wants to merge 17 commits intoClickHouse:masterfrom
phulv94:add-output-format-float-precision-setting
Open

Add output_format_float_precision setting to limit decimal digits in float output#99721
phulv94 wants to merge 17 commits intoClickHouse:masterfrom
phulv94:add-output-format-float-precision-setting

Conversation

@phulv94
Copy link
Copy Markdown
Contributor

@phulv94 phulv94 commented Mar 17, 2026

When set to 0 (the default), the existing shortest round-trip representation (dragonbox) is used. When set to N, values are rounded to N decimal places using the double-conversion library's ToFixed. Closes #99199

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Add output_format_float_precision setting to control the number of decimal digits in floating-point text output.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

@phulv94
Copy link
Copy Markdown
Contributor Author

phulv94 commented Mar 17, 2026

@rienath Can you take a look when you have freetime?

@rienath rienath self-assigned this Mar 17, 2026
@rienath rienath added the can be tested Allows running workflows for external contributors label Mar 17, 2026
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Mar 17, 2026

Workflow [PR], commit [8b4cba8]

Summary:

job_name test_name status info comment
Stress test (arm_asan_ubsan, s3) failure
Logical error: 'std::exception. Code: 1001, type: std::invalid_argument, e.what() = stoi: no conversion (version 26.4.1.408), Stack trace: (STID: 2508-3132) FAIL cidb
Stress test (arm_tsan) failure
Logical error: Unexpected number of columns in result sample block: A expected B ([C] = [D] + [E] + [F]) (STID: 2980-385f) FAIL cidb, issue
Stress test (arm_msan) failure
MemorySanitizer: use-of-uninitialized-value (STID: 1003-358c) FAIL cidb
Stress test (arm_ubsan) failure
Hung check failed, possible deadlock found FAIL cidb
Finish Workflow failure
python3 ./ci/jobs/scripts/workflow_hooks/feature_docs.py failure

AI Review

Summary

This PR introduces output_format_float_precision and wires it through text serialization paths (SerializationNumber, writeJSONNumber) with fallback behavior for non-finite and large-magnitude values, plus coverage in a new stateless test. On the current PR head, I did not find additional correctness, safety, or performance issues beyond already-discussed inline comments.

Missing context
  • ⚠️ No CI run results/logs were provided in this review request, so runtime validation was limited to static analysis of the PR head diff and tests.
ClickHouse Rules
Item Status Notes
Deletion logging
Serialization versioning
Core-area scrutiny
No test removal
Experimental gate
No magic constants
Backward compatibility
SettingsChangesHistory.cpp
PR metadata quality
Safe rollout
Compilation time
Final Verdict
  • Status: ✅ Approve

@clickhouse-gh clickhouse-gh bot added the pr-improvement Pull request with some product improvements label Mar 17, 2026
@phulv94
Copy link
Copy Markdown
Contributor Author

phulv94 commented Mar 19, 2026

@rienath seem like the CI look good, can you take a look?

@rienath
Copy link
Copy Markdown
Member

rienath commented Mar 19, 2026

@phulv94 please resolve conflicts

@phulv94
Copy link
Copy Markdown
Contributor Author

phulv94 commented Mar 20, 2026

@phulv94 please resolve conflicts

I've resolved conflicts.

@rienath rienath changed the title Add output format float precision setting Add output_format_float_precision setting to limit decimal digits in float output Mar 27, 2026
Copy link
Copy Markdown
Member

@rienath rienath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! The overall approach looks good. There are a couple of issues to address before merge.

With the current implementation, ToFixed always pads with trailing zeroes up to the requested precision, even when they carry no information. For example:

:) SELECT (1.1::BFloat16) settings output_format_float_precision=0;
-- 1.09375

:) SELECT (1.1::BFloat16) settings output_format_float_precision=30;
-- 1.093750000000000000000000000000

The exact representation has only 6 meaningful digits, but we get 24 extra zeroes. Similarly for negative zero:

:) SELECT (-0.0::BFloat16) settings output_format_float_precision=1;
-- -0.0

:) SELECT (-0.0::BFloat16) settings output_format_float_precision=0;
-- -0

The setting should mean "round to at most N decimal places", not "always pad to N decimal places". It would be great to strip trailing zeroes (and the trailing decimal point if it becomes unnecessary), so that output_format_float_precision = 30 produces 1.09375, not 1.093750000000000000000000000000.

Another important problem is numbers with magnitude >= 10^60 throw because ToFixed can't represent them. A format setting should never cause previously-working queries to fail on valid data.

:) SELECT 1e61 settings output_format_float_precision=0;
-- 1e61

:) SELECT 1e61 settings output_format_float_precision=1;
-- Code: 28. DB::Exception: Cannot print floating point number. (CANNOT_PRINT_FLOAT_OR_DOUBLE_NUMBER)

When you fixe these, please add regression tests. You could use the examples above

{"functions_h3_default_if_invalid", true, false, "A new setting for legacy behaviour to allow invalid inputs to h3 functions"},
{"max_skip_unavailable_shards_num", 0, 0, "New setting to limit the number of shards that can be silently skipped when skip_unavailable_shards is enabled."},
{"max_skip_unavailable_shards_ratio", 0, 0, "New setting to limit the ratio of shards that can be silently skipped when skip_unavailable_shards is enabled."},
{"output_format_float_precision", 0, 0, "A new setting to control decimal digits in float output"},
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to merge master and move this to 26.4 section

bool null_as_default = true;
bool force_null_for_omitted_fields = false;
bool decimal_trailing_zeros = false;
UInt64 float_precision = 0;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UInt64 is an overkill for a value that is capped at 60. Let's try UInt8 and add a regression test to see what happens when user chooses a value that exceeds it

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we also need to change the type in src/Core/FormatFactorySettings.h in DELACRE(...

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seem like DELACRE( not support UInt8.

Comment on lines +18 to +23
-- Test both Float32 and Float64
SET output_format_float_precision = 4;
SELECT toFloat32(1.0/3);
SELECT toFloat64(1.0/3);
SELECT toFloat32(3.141592653589793);
SELECT toFloat64(3.141592653589793);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's test BFloat16 too

Comment on lines +1238 to +1239
Number of decimal digits after the decimal point for floating-point output (`Float32`, `Float64`, `BFloat16`).
If set to 0 (the default), uses the shortest round-trip representation.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Number of decimal digits after the decimal point for floating-point output (`Float32`, `Float64`, `BFloat16`).
If set to 0 (the default), uses the shortest round-trip representation.
When non-zero, round floating-point output (`Float32`, `Float64`, `BFloat16`) to this many digits after the decimal point. When 0 (default), use the default representation.

Comment on lines +193 to +195
if (settings.float_precision > 60)
throw Exception(ErrorCodes::CANNOT_PRINT_FLOAT_OR_DOUBLE_NUMBER,
"Too high precision requested for Float, must not be more than 60, got {}", settings.float_precision);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's currently hard-coded at 60. I think the library permits a higher value. Let's use some constant out of the library instead of hardcoding it if that is possible. Perhaps it will change it future. We want our code to pick up the change gracefully

Comment on lines +201 to +203
if (!result)
throw Exception(ErrorCodes::CANNOT_PRINT_FLOAT_OR_DOUBLE_NUMBER,
"Cannot print floating point number");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code path should be unreachable. So let's change the error type to LOGICAL_ERROR. I understand that we can currently have a large magnitude value that will throw, but it's best to fix this case

Comment on lines +61 to +66
SELECT toFloat64('inf');
SELECT toFloat64('-inf');
SELECT toFloat64('nan');
SELECT toFloat32('inf');
SELECT toFloat32('-inf');
SELECT toFloat32('nan');
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add BFloat16 case too

@clickhouse-gh clickhouse-gh bot added pr-feature Pull request with new product feature and removed pr-improvement Pull request with some product improvements labels Mar 29, 2026
-- Test CSV
SELECT 1.0/3 FORMAT CSV;

-- Test out-of-range precision (> 100) raises a BAD_ARGUMENTS exception
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says > 100, but the implementation validates against DoubleToStringConverter::kMaxFixedDigitsAfterPoint (currently 60) and this test uses 101 as just one out-of-range example.

Please update the comment to avoid confusion, e.g. -- Test out-of-range precision (> 60) raises a BAD_ARGUMENTS exception.

@clickhouse-gh clickhouse-gh bot added pr-improvement Pull request with some product improvements and removed pr-feature Pull request with new product feature labels Mar 30, 2026
@clickhouse-gh clickhouse-gh bot added pr-feature Pull request with new product feature and removed pr-improvement Pull request with some product improvements labels Mar 30, 2026
/// ToFixed returns false for values with magnitude >= 10^kMaxFixedDigitsBeforePoint.
/// Fall back to the default shortest round-trip representation for such values.
const double x_double = static_cast<double>(x);
if (std::abs(x_double) >= 1e60)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

writeFloatText uses a hardcoded fallback threshold 1e60 while the guard comment references DoubleToStringConverter::kMaxFixedDigitsBeforePoint.

This is brittle: if upstream double-conversion changes the max fixed digits, behavior silently diverges from the actual converter limit. Please derive the threshold from the library constant (for example, compute pow(10, kMaxFixedDigitsBeforePoint) once) instead of embedding 1e60.

@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Mar 30, 2026

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.20% 84.10% -0.10%
Functions 90.90% 90.90% +0.00%
Branches 76.80% 76.70% -0.10%

Changed lines: 98.88% (88/89) · Uncovered code

Full report · Diff report

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-feature Pull request with new product feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add output_format_float_precision setting to control decimal digits in float output

2 participants