perf(clp-s::timestamp_parser): Improve marshalling speed for padded integer fields (resolves #1968).#2013
Conversation
WalkthroughAdds Changes
Sequence Diagram(s)(Skipped — changes are internal to a single component and do not introduce multi-component sequential interactions.) Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related issues
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (1)
751-773: 🧹 Nitpick | 🔵 TrivialNear-duplicate logic in the
'T'case could be extracted into a shared helper.Both the date-time marshal (lines 751–773) and numeric marshal (lines 872–894) create a temporary string, pad it, strip trailing zeros, check for all-zeros, and append to the buffer. This is an existing pattern, but the PR touches both blocks. Consider extracting this into a small helper to reduce duplication.
Also applies to: 872-894
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/core/src/clp_s/timestamp_parser/TimestampParser.cpp` around lines 751 - 773, Extract the repeated logic that builds a 9-digit left-padded nanoseconds string, trims trailing zeros, validates not-all-zero, and appends to the output into a small helper (e.g., formatOrAppendFractionalNanoseconds or appendTrimmedSubseconds) and call it from the 'T' branch in TimestampParser::... (the date-time marshal case) and from the numeric marshal case; the helper should accept the nanoseconds value (from time_of_day.subseconds().count() or equivalent) plus a reference to the buffer (or return a string and an ErrorCode) and must reproduce the current behavior: pad to 9 digits, strip trailing '0's, return ErrorCode::IncompatibleTimestampPattern when all digits are zero, and otherwise append the trimmed substring to the buffer so both code paths use the same implementation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@components/core/src/clp_s/timestamp_parser/TimestampParser.cpp`:
- Around line 946-951: The current code in TimestampParser.cpp creates a
temporary std::string via std::to_string(value) before appending padding and the
value to buffer; replace that allocation by formatting the integer directly into
a small stack char array using std::to_chars (include <charconv>), compute
digits written, compute num_padding_characters as before, then call
buffer.append(num_padding_characters, padding_character) and buffer.append(ptr,
ptr + digits_written) instead of using value_str—target the block that declares
value, length, padding_character and the buffer appends to eliminate the heap
allocation.
- Around line 936-953: The function append_positive_left_padded_integer
currently takes an int which causes implicit narrowing at call sites; change the
parameter type of value from int to int64_t in both the function declaration and
the function definition (append_positive_left_padded_integer) so it accepts
int64_t (matching extract_absolute_subsecond_nanoseconds()/epochtime_t and
chrono .count() results); ensure any includes for <cstdint> are present and
update any forward declarations or prototypes to the new signature so callers no
longer perform narrowing conversions.
---
Outside diff comments:
In `@components/core/src/clp_s/timestamp_parser/TimestampParser.cpp`:
- Around line 751-773: Extract the repeated logic that builds a 9-digit
left-padded nanoseconds string, trims trailing zeros, validates not-all-zero,
and appends to the output into a small helper (e.g.,
formatOrAppendFractionalNanoseconds or appendTrimmedSubseconds) and call it from
the 'T' branch in TimestampParser::... (the date-time marshal case) and from the
numeric marshal case; the helper should accept the nanoseconds value (from
time_of_day.subseconds().count() or equivalent) plus a reference to the buffer
(or return a string and an ErrorCode) and must reproduce the current behavior:
pad to 9 digits, strip trailing '0's, return
ErrorCode::IncompatibleTimestampPattern when all digits are zero, and otherwise
append the trimmed substring to the buffer so both code paths use the same
implementation.
| auto append_positive_left_padded_integer( | ||
| std::string& buffer, | ||
| size_t length, | ||
| int value, | ||
| char padding_character | ||
| ) -> ystdlib::error_handling::Result<void> { | ||
| if (value < 0) { | ||
| return ErrorCode{ErrorCodeEnum::InvalidDate}; | ||
| } | ||
|
|
||
| auto value_str{std::to_string(value)}; | ||
| size_t const num_padding_characters{ | ||
| value_str.length() >= length ? size_t{0} : length - value_str.length() | ||
| }; | ||
| buffer.append(num_padding_characters, padding_character); | ||
| buffer.append(value_str); | ||
| return ystdlib::error_handling::success(); | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find epochtime_t definition
echo "=== epochtime_t definition ==="
rg -n 'using epochtime_t|typedef.*epochtime_t' --type=cpp --type=hppRepository: y-scope/clp
Length of output: 130
🏁 Script executed:
#!/bin/bash
# Find the file structure and locate the function declaration
echo "=== Function declaration (should be around line 312) ==="
sed -n '310,315p' components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
echo ""
echo "=== Function definition (lines 936-953) ==="
sed -n '936,953p' components/core/src/clp_s/timestamp_parser/TimestampParser.cppRepository: y-scope/clp
Length of output: 889
🏁 Script executed:
#!/bin/bash
# Find ALL call sites of append_positive_left_padded_integer with context
echo "=== All call sites of append_positive_left_padded_integer ==="
rg -n 'append_positive_left_padded_integer' components/core/src/clp_s/timestamp_parser/TimestampParser.cpp -B2 -A2Repository: y-scope/clp
Length of output: 6859
🏁 Script executed:
#!/bin/bash
# Search more broadly for epochtime_t definition
echo "=== Searching for epochtime_t in all files ==="
find . -type f \( -name "*.h" -o -name "*.hpp" -o -name "*.cpp" \) -exec grep -l "epochtime_t" {} \; | head -10
echo ""
echo "=== First file containing epochtime_t ==="
find . -type f \( -name "*.h" -o -name "*.hpp" -o -name "*.cpp" \) -exec grep -l "epochtime_t" {} \; | head -1 | xargs grep -n "epochtime_t" | head -5Repository: y-scope/clp
Length of output: 8995
🏁 Script executed:
#!/bin/bash
# Find chrono types for hours, minutes, seconds, subseconds
echo "=== Check time_of_day type definition ==="
grep -rn "time_of_day\|make_time" components/core/src/clp_s/timestamp_parser/TimestampParser.cpp --include="*.cpp" | head -10
echo ""
echo "=== Search for date header and its types ==="
grep -n "#include.*date" components/core/src/clp_s/timestamp_parser/TimestampParser.cpp | head -5Repository: y-scope/clp
Length of output: 927
🏁 Script executed:
#!/bin/bash
# Find extract_absolute_subsecond_nanoseconds return type
echo "=== extract_absolute_subsecond_nanoseconds definition/declaration ==="
rg -n "extract_absolute_subsecond_nanoseconds" components/core/src/clp_s/timestamp_parser/TimestampParser.cpp -B3 -A8 | head -40
echo ""
echo "=== cLeapSecond definition ==="
grep -n "cLeapSecond" components/core/src/clp_s/timestamp_parser/TimestampParser.cpp | head -5Repository: y-scope/clp
Length of output: 2511
Parameter type int causes implicit narrowing conversions at call sites—use int64_t instead.
The value parameter receives values from extract_absolute_subsecond_nanoseconds() (which returns epochtime_t/int64_t) and chrono .count() methods (which return long long). Passing these wider types to an int parameter at lines 729, 740, 747, 754, 850, 861, 868, 875 triggers implicit narrowing conversions. While values fit in int range at runtime, this is a type safety concern that could cause -Wconversion warnings and is fragile if ranges ever change.
Changing the parameter to int64_t eliminates all narrowing with zero performance cost. Update both the declaration (line 309) and definition (line 936).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@components/core/src/clp_s/timestamp_parser/TimestampParser.cpp` around lines
936 - 953, The function append_positive_left_padded_integer currently takes an
int which causes implicit narrowing at call sites; change the parameter type of
value from int to int64_t in both the function declaration and the function
definition (append_positive_left_padded_integer) so it accepts int64_t (matching
extract_absolute_subsecond_nanoseconds()/epochtime_t and chrono .count()
results); ensure any includes for <cstdint> are present and update any forward
declarations or prototypes to the new signature so callers no longer perform
narrowing conversions.
| std::string& buffer, | ||
| size_t length, | ||
| int value, | ||
| char padding_character |
There was a problem hiding this comment.
How about:
int value,
size_t min_padded_length,
char padding_character,
std::string& bufferlength->min_padded_lengthto be more explicit about what this length is.- Put
bufferto the end since it's an output param.
Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
There was a problem hiding this comment.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@components/core/src/clp_s/timestamp_parser/TimestampParser.cpp`:
- Around line 309-314: The declaration of append_positive_left_padded_integer
currently takes an int which causes implicit narrowing at call sites that pass
long long/int64_t; change the parameter type from int to int64_t in the function
declaration and the corresponding definition
(append_positive_left_padded_integer) so callers (chrono .count(),
extract_absolute_subsecond_nanoseconds, epochtime_t, etc.) no longer narrow;
keep the same validation inside the function that checks value >= 0 and no other
behavior changes are required—update any matching forward declarations or
overloads to use int64_t as well.
- Around line 936-953: The function append_positive_left_padded_integer
currently uses std::to_string which heap-allocates; change it to use
std::to_chars writing into a stack-local std::array<char,20> (sized for int64_t)
and compute the digit count from the returned ptr to avoid allocations. Keep the
negative check, optionally change the parameter from int to int64_t (and adjust
checks) so the buffer size is safe, compute num_padding_characters from the
digit count, then append the padding and the digit range to buffer via
buffer.append(pointer, count). Ensure error handling remains the same and return
success().
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@components/core/src/clp_s/timestamp_parser/TimestampParser.cpp`:
- Around line 879-885: Remove the stray blank line in the call to
append_positive_left_padded_integer wrapped by YSTDLIB_ERROR_HANDLING_TRYV so
the argument list is contiguous (match the style used in
marshal_date_time_timestamp); specifically, edit the call that passes
subsecond_nanoseconds, 9, '0', subsecond_nanoseconds_str to remove the empty
line between subsecond_nanoseconds and 9 so the macro invocation and its
arguments are formatted consistently.
- Around line 879-885: Remove the stray empty line inside the argument list of
the append_positive_left_padded_integer call in the 'T' case of
marshal_numeric_timestamp: locate the call to
append_positive_left_padded_integer with symbols subsecond_nanoseconds, 9, '0',
and subsecond_nanoseconds_str and ensure there is no blank line between the
first argument (subsecond_nanoseconds) and the next argument (9) so the argument
list matches the formatting used in marshal_date_time_timestamp.
- Around line 957-959: The current early return uses a std::error_code named err
(created from ec) which yields std::generic_category; instead return the project
ErrorCode type using the appropriate ErrorCodeEnum value used elsewhere in this
file (replace "return err;" with "return
ErrorCode{ErrorCodeEnum::<appropriate_enum>}" so the error belongs to the
project's category), and remove the now-unnecessary `#include` <system_error>;
locate the occurrence in TimestampParser.cpp around the to_chars handling where
err is declared and mirror the enum choice used by other returns in this file.
- Around line 957-959: The code currently returns a std::error_code (err) which
uses std::generic_category; replace that with the project-specific ErrorCode so
the returned error belongs to the project's category. In the block with
variables ec/err (the to_chars failure path), map the std::error_code to the
appropriate ErrorCode by returning ErrorCode{ErrorCodeEnum::<appropriate_enum>}
(pick the closest existing enum such as
ParseError/InvalidTimestamp/UnexpectedFailure used elsewhere in this file)
instead of returning err, so all error returns in TimestampParser.cpp use the
project ErrorCode type and category.
---
Duplicate comments:
In `@components/core/src/clp_s/timestamp_parser/TimestampParser.cpp`:
- Around line 311-312: The function append_positive_left_padded_integer
currently takes an int (append_positive_left_padded_integer(int value,...))
which causes implicit narrowing when callers pass long long/int64_t
(chrono.count(), epochtime_t, extract_absolute_subsecond_nanoseconds); change
the parameter type to a wider integer (e.g., long long or int64_t) in both the
declaration and definition of append_positive_left_padded_integer and adjust any
internal arithmetic/formatting to use that wider type so callers no longer
perform narrowing conversions (no other semantic changes required).
- Around line 311-312: The function append_positive_left_padded_integer
currently takes an int but is called with int64_t/long long values (e.g.,
time_of_day.hours/minutes/seconds/subseconds().count() and
extract_absolute_subsecond_nanoseconds results), causing implicit narrowing;
change the parameter type of append_positive_left_padded_integer to a 64-bit
signed type (e.g., int64_t or long long) and update any related local
temporaries accordingly, then add explicit range checks/assertions to ensure the
value fits the expected digit width before doing any narrowing/static_cast to
int for digit extraction or buffer indexing; locate the function by name
append_positive_left_padded_integer and adjust callers only if they rely on
overload ambiguity.
- Line 951: The use of the GCC-only typeof in the declaration of
value_str_buffer should be replaced with standard C++ decltype: update the array
type expression in the line that defines std::array<char,
std::numeric_limits<typeof(value)>::digits10 + 1> value_str_buffer{} to use
std::numeric_limits<decltype(value)>::digits10 + 1 instead; ensure the change
targets the value_str_buffer declaration in TimestampParser.cpp so the code
remains portable and compiles with non-GCC compilers.
- Line 951: The declaration of value_str_buffer uses the non-standard GCC
extension typeof; replace typeof(value) with the standard C++ decltype by
changing the expression to use std::numeric_limits<decltype(value)>::digits10 +
1 so value_str_buffer is declared as std::array<char,
std::numeric_limits<decltype(value)>::digits10 + 1> value_str_buffer{}; (update
the line in TimestampParser.cpp where value_str_buffer is defined).
| YSTDLIB_ERROR_HANDLING_TRYV(append_positive_left_padded_integer( | ||
| subsecond_nanoseconds, | ||
|
|
||
| 9, | ||
| '0', | ||
| subsecond_nanoseconds_str | ||
| )); |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Stray blank line inside argument list.
Line 881 is an empty line between the first and second arguments of append_positive_left_padded_integer. The equivalent call in marshal_date_time_timestamp (lines 757–762) has no such blank line.
♻️ Proposed fix
YSTDLIB_ERROR_HANDLING_TRYV(append_positive_left_padded_integer(
subsecond_nanoseconds,
-
9,
'0',
subsecond_nanoseconds_str
));📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| YSTDLIB_ERROR_HANDLING_TRYV(append_positive_left_padded_integer( | |
| subsecond_nanoseconds, | |
| 9, | |
| '0', | |
| subsecond_nanoseconds_str | |
| )); | |
| YSTDLIB_ERROR_HANDLING_TRYV(append_positive_left_padded_integer( | |
| subsecond_nanoseconds, | |
| 9, | |
| '0', | |
| subsecond_nanoseconds_str | |
| )); |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@components/core/src/clp_s/timestamp_parser/TimestampParser.cpp` around lines
879 - 885, Remove the stray blank line in the call to
append_positive_left_padded_integer wrapped by YSTDLIB_ERROR_HANDLING_TRYV so
the argument list is contiguous (match the style used in
marshal_date_time_timestamp); specifically, edit the call that passes
subsecond_nanoseconds, 9, '0', subsecond_nanoseconds_str to remove the empty
line between subsecond_nanoseconds and 9 so the macro invocation and its
arguments are formatted consistently.
🧹 Nitpick | 🔵 Trivial
Stray blank line inside the argument list.
Line 881 is an empty line between subsecond_nanoseconds, and 9, inside the append_positive_left_padded_integer call in the 'T' case of marshal_numeric_timestamp. The identical call in marshal_date_time_timestamp (lines 757–762) has no such blank line.
♻️ Proposed fix
YSTDLIB_ERROR_HANDLING_TRYV(append_positive_left_padded_integer(
subsecond_nanoseconds,
-
9,
'0',
subsecond_nanoseconds_str
));🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@components/core/src/clp_s/timestamp_parser/TimestampParser.cpp` around lines
879 - 885, Remove the stray empty line inside the argument list of the
append_positive_left_padded_integer call in the 'T' case of
marshal_numeric_timestamp: locate the call to
append_positive_left_padded_integer with symbols subsecond_nanoseconds, 9, '0',
and subsecond_nanoseconds_str and ensure there is no blank line between the
first argument (subsecond_nanoseconds) and the next argument (9) so the argument
list matches the formatting used in marshal_date_time_timestamp.
| if (auto err{std::make_error_code(ec)}; err) { | ||
| return err; | ||
| } |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
return err propagates a std::generic_category error, inconsistent with the project's ErrorCode category.
All other error returns in this file use return ErrorCode{ErrorCodeEnum::...}. The err object here is a std::error_code with std::generic_category (from std::errc). While this code path is unreachable in practice (a correctly-sized buffer never causes integer to_chars to fail), if it ever is reached the propagated error will not belong to the project's error category.
♻️ Proposed fix
- if (auto err{std::make_error_code(ec)}; err) {
- return err;
- }
+ if (std::errc{} != ec) {
+ return ErrorCode{ErrorCodeEnum::InvalidDate};
+ }This also removes the need for #include <system_error>.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@components/core/src/clp_s/timestamp_parser/TimestampParser.cpp` around lines
957 - 959, The current early return uses a std::error_code named err (created
from ec) which yields std::generic_category; instead return the project
ErrorCode type using the appropriate ErrorCodeEnum value used elsewhere in this
file (replace "return err;" with "return
ErrorCode{ErrorCodeEnum::<appropriate_enum>}" so the error belongs to the
project's category), and remove the now-unnecessary `#include` <system_error>;
locate the occurrence in TimestampParser.cpp around the to_chars handling where
err is declared and mirror the enum choice used by other returns in this file.
🧹 Nitpick | 🔵 Trivial
return err propagates a std::generic_category error code, inconsistent with the project's ErrorCode category.
All other error returns in this file use return ErrorCode{ErrorCodeEnum::...}. Here, err is a std::error_code with std::generic_category (from std::errc). While this path is unreachable in practice (a correctly-sized buffer never causes to_chars to fail for integers), if ever reached the propagated error will not belong to the project's error category, which could confuse callers inspecting the error category.
Consider mapping to a project-specific code instead:
♻️ Proposed fix
- if (auto err{std::make_error_code(ec)}; err) {
- return err;
- }
+ if (std::errc{} != ec) {
+ return ErrorCode{ErrorCodeEnum::InvalidDate};
+ }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@components/core/src/clp_s/timestamp_parser/TimestampParser.cpp` around lines
957 - 959, The code currently returns a std::error_code (err) which uses
std::generic_category; replace that with the project-specific ErrorCode so the
returned error belongs to the project's category. In the block with variables
ec/err (the to_chars failure path), map the std::error_code to the appropriate
ErrorCode by returning ErrorCode{ErrorCodeEnum::<appropriate_enum>} (pick the
closest existing enum such as ParseError/InvalidTimestamp/UnexpectedFailure used
elsewhere in this file) instead of returning err, so all error returns in
TimestampParser.cpp use the project ErrorCode type and category.
LinZhihao-723
left a comment
There was a problem hiding this comment.
Two more nitpicks (one of them was missed by my previous suggested code).
Otherwise we're good to go.
Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
There was a problem hiding this comment.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@components/core/src/clp_s/timestamp_parser/TimestampParser.cpp`:
- Around line 942-947: Change the append_positive_left_padded_integer function
signature to accept a 64-bit integer type (e.g., int64_t) instead of int to
avoid implicit narrowing at its call sites (calls that pass
time_of_day.hours().count(), .minutes().count(), .seconds().count(),
.subseconds().count(), and extract_absolute_subsecond_nanoseconds which return
long long/int64_t). Update the parameter type in the function
declaration/definition (append_positive_left_padded_integer) and any internal
uses to the new int64_t type so callers no longer trigger narrowing or
-Wconversion warnings.
- Around line 880-886: Remove the stray blank line inside the
YSTDLIB_ERROR_HANDLING_TRYV call to append_positive_left_padded_integer: ensure
the arguments (subsecond_nanoseconds, 9, '0', subsecond_nanoseconds_str) are
adjacent as in the marshal_date_time_timestamp example; update the call using
the same argument formatting so the macro invocation and parameters
(append_positive_left_padded_integer, subsecond_nanoseconds, 9, '0',
subsecond_nanoseconds_str) have no extra blank line between them.
- Around line 958-960: The current code returns a std::error_code created from
ec (std::generic_category) which is inconsistent with the project's ErrorCode;
replace the `if (auto const err{std::make_error_code(ec)}; err) { return err; }`
branch to return the project's ErrorCode type instead (e.g., `return
ErrorCode{ErrorCodeEnum::...}`) using the same ErrorCodeEnum value used for
other to_chars/parse failures in this file (look for existing `return
ErrorCode{ErrorCodeEnum::...}` usages in TimestampParser.cpp to pick the correct
enum variant), and remove the now-unnecessary `#include <system_error>`.
- Line 952: Replace the non-standard GCC extension typeof with the standard C++
decltype for the numeric_limits instantiation: change the array declaration that
references std::numeric_limits<typeof(value)>::digits10 to use
std::numeric_limits<decltype(value)>::digits10 (i.e., update the expression in
the std::array<> template parameter for value_str_buffer in
TimestampParser.cpp).
…nteger fields (resolves y-scope#1968). (y-scope#2013) Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
Description
This PR significantly improves decompression speed for padded integer fields in timestamps, leading to much higher decompression speeds overall. As identified in #1968 most of the overhead for timestamp marshalling was coming from
fmt::format()calls, which appeared to be dynamically parsing the format strings on each call.Decompression speedup for open source datasets compared to decompression speed without this optimization:
Checklist
breaking change.
Validation performed
Summary by CodeRabbit
Refactor
Bug Fixes