Add output_format_float_precision setting to limit decimal digits in float output#99721
Add output_format_float_precision setting to limit decimal digits in float output#99721phulv94 wants to merge 17 commits intoClickHouse:masterfrom
output_format_float_precision setting to limit decimal digits in float output#99721Conversation
|
@rienath Can you take a look when you have freetime? |
|
Workflow [PR], commit [8b4cba8] Summary: ❌
AI ReviewSummaryThis PR introduces Missing context
ClickHouse Rules
Final Verdict
|
tests/queries/0_stateless/03400_output_format_float_precision.sql
Outdated
Show resolved
Hide resolved
|
@rienath seem like the CI look good, can you take a look? |
|
@phulv94 please resolve conflicts |
I've resolved conflicts. |
output_format_float_precision setting to limit decimal digits in float output
rienath
left a comment
There was a problem hiding this comment.
Thanks for working on this! The overall approach looks good. There are a couple of issues to address before merge.
With the current implementation, ToFixed always pads with trailing zeroes up to the requested precision, even when they carry no information. For example:
:) SELECT (1.1::BFloat16) settings output_format_float_precision=0;
-- 1.09375
:) SELECT (1.1::BFloat16) settings output_format_float_precision=30;
-- 1.093750000000000000000000000000The exact representation has only 6 meaningful digits, but we get 24 extra zeroes. Similarly for negative zero:
:) SELECT (-0.0::BFloat16) settings output_format_float_precision=1;
-- -0.0
:) SELECT (-0.0::BFloat16) settings output_format_float_precision=0;
-- -0The setting should mean "round to at most N decimal places", not "always pad to N decimal places". It would be great to strip trailing zeroes (and the trailing decimal point if it becomes unnecessary), so that output_format_float_precision = 30 produces 1.09375, not 1.093750000000000000000000000000.
Another important problem is numbers with magnitude >= 10^60 throw because ToFixed can't represent them. A format setting should never cause previously-working queries to fail on valid data.
:) SELECT 1e61 settings output_format_float_precision=0;
-- 1e61
:) SELECT 1e61 settings output_format_float_precision=1;
-- Code: 28. DB::Exception: Cannot print floating point number. (CANNOT_PRINT_FLOAT_OR_DOUBLE_NUMBER)When you fixe these, please add regression tests. You could use the examples above
src/Core/SettingsChangesHistory.cpp
Outdated
| {"functions_h3_default_if_invalid", true, false, "A new setting for legacy behaviour to allow invalid inputs to h3 functions"}, | ||
| {"max_skip_unavailable_shards_num", 0, 0, "New setting to limit the number of shards that can be silently skipped when skip_unavailable_shards is enabled."}, | ||
| {"max_skip_unavailable_shards_ratio", 0, 0, "New setting to limit the ratio of shards that can be silently skipped when skip_unavailable_shards is enabled."}, | ||
| {"output_format_float_precision", 0, 0, "A new setting to control decimal digits in float output"}, |
There was a problem hiding this comment.
We need to merge master and move this to 26.4 section
src/Formats/FormatSettings.h
Outdated
| bool null_as_default = true; | ||
| bool force_null_for_omitted_fields = false; | ||
| bool decimal_trailing_zeros = false; | ||
| UInt64 float_precision = 0; |
There was a problem hiding this comment.
UInt64 is an overkill for a value that is capped at 60. Let's try UInt8 and add a regression test to see what happens when user chooses a value that exceeds it
There was a problem hiding this comment.
Then we also need to change the type in src/Core/FormatFactorySettings.h in DELACRE(...
There was a problem hiding this comment.
Seem like DELACRE( not support UInt8.
| -- Test both Float32 and Float64 | ||
| SET output_format_float_precision = 4; | ||
| SELECT toFloat32(1.0/3); | ||
| SELECT toFloat64(1.0/3); | ||
| SELECT toFloat32(3.141592653589793); | ||
| SELECT toFloat64(3.141592653589793); |
src/Core/FormatFactorySettings.h
Outdated
| Number of decimal digits after the decimal point for floating-point output (`Float32`, `Float64`, `BFloat16`). | ||
| If set to 0 (the default), uses the shortest round-trip representation. |
There was a problem hiding this comment.
| Number of decimal digits after the decimal point for floating-point output (`Float32`, `Float64`, `BFloat16`). | |
| If set to 0 (the default), uses the shortest round-trip representation. | |
| When non-zero, round floating-point output (`Float32`, `Float64`, `BFloat16`) to this many digits after the decimal point. When 0 (default), use the default representation. |
src/IO/WriteHelpers.cpp
Outdated
| if (settings.float_precision > 60) | ||
| throw Exception(ErrorCodes::CANNOT_PRINT_FLOAT_OR_DOUBLE_NUMBER, | ||
| "Too high precision requested for Float, must not be more than 60, got {}", settings.float_precision); |
There was a problem hiding this comment.
It's currently hard-coded at 60. I think the library permits a higher value. Let's use some constant out of the library instead of hardcoding it if that is possible. Perhaps it will change it future. We want our code to pick up the change gracefully
src/IO/WriteHelpers.cpp
Outdated
| if (!result) | ||
| throw Exception(ErrorCodes::CANNOT_PRINT_FLOAT_OR_DOUBLE_NUMBER, | ||
| "Cannot print floating point number"); |
There was a problem hiding this comment.
This code path should be unreachable. So let's change the error type to LOGICAL_ERROR. I understand that we can currently have a large magnitude value that will throw, but it's best to fix this case
| SELECT toFloat64('inf'); | ||
| SELECT toFloat64('-inf'); | ||
| SELECT toFloat64('nan'); | ||
| SELECT toFloat32('inf'); | ||
| SELECT toFloat32('-inf'); | ||
| SELECT toFloat32('nan'); |
…loat-precision-setting
| -- Test CSV | ||
| SELECT 1.0/3 FORMAT CSV; | ||
|
|
||
| -- Test out-of-range precision (> 100) raises a BAD_ARGUMENTS exception |
There was a problem hiding this comment.
The comment says > 100, but the implementation validates against DoubleToStringConverter::kMaxFixedDigitsAfterPoint (currently 60) and this test uses 101 as just one out-of-range example.
Please update the comment to avoid confusion, e.g. -- Test out-of-range precision (> 60) raises a BAD_ARGUMENTS exception.
src/IO/WriteHelpers.cpp
Outdated
| /// ToFixed returns false for values with magnitude >= 10^kMaxFixedDigitsBeforePoint. | ||
| /// Fall back to the default shortest round-trip representation for such values. | ||
| const double x_double = static_cast<double>(x); | ||
| if (std::abs(x_double) >= 1e60) |
There was a problem hiding this comment.
writeFloatText uses a hardcoded fallback threshold 1e60 while the guard comment references DoubleToStringConverter::kMaxFixedDigitsBeforePoint.
This is brittle: if upstream double-conversion changes the max fixed digits, behavior silently diverges from the actual converter limit. Please derive the threshold from the library constant (for example, compute pow(10, kMaxFixedDigitsBeforePoint) once) instead of embedding 1e60.
LLVM Coverage Report
Changed lines: 98.88% (88/89) · Uncovered code |
When set to 0 (the default), the existing shortest round-trip representation (dragonbox) is used. When set to N, values are rounded to N decimal places using the double-conversion library's ToFixed. Closes #99199
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Add output_format_float_precision setting to control the number of decimal digits in floating-point text output.
Documentation entry for user-facing changes