byteSwap accepts an integer `operand` and returns the integer which is obtained by swapping the **endianness** of `operand` i.e. reversing the bytes of the `operand`. Issue: ClickHouse#54734
UInt[8|16|32|64]
TODOs:
- Improve NOT_IMPLEMENTED error message.
- Add implementation for FixedStrings (reverse the bytes).
- See whether this needs to be implemented for UInt[128|256] and signed integers as well.
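As a rough illustration of what "reversing the bytes" means for the unsigned widths listed above, here is a minimal, portable sketch. The name `swapBytes` is hypothetical and this is not the code from the PR, which may rely on compiler builtins or `std::byteswap` instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical helper (not the PR's implementation): reverse the bytes of an
// unsigned integer by peeling off the lowest byte and appending it to the result.
template <typename T>
constexpr T swapBytes(T value)
{
    static_assert(std::is_unsigned_v<T>, "this sketch covers unsigned integers only");
    T result = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i)
    {
        // Shift the bytes collected so far up by one and add the current lowest byte.
        result = static_cast<T>((result << 8) | (value & 0xFF));
        value = static_cast<T>(value >> 8);
    }
    return result;
}

// Small usage check for two of the supported widths.
inline void checkExamples()
{
    assert(swapBytes<std::uint32_t>(0x12345678u) == 0x78563412u);
    assert(swapBytes<std::uint64_t>(0x0102030405060708ULL) == 0x0807060504030201ULL);
}
```

A single-byte value is returned unchanged, since there is nothing to reorder.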
Hello, I've added an implementation for the I have the following queries with regards to this feature:
template <typename T>
inline T byteSwap(T)
{
    throw Exception(ErrorCodes::NOT_IMPLEMENTED, "byteSwap() is not implemented for {} datatype", demangle(typeid(T).name()));
}
Right now for anything greater than or equal to 2^64, this prints the following error:
SELECT byteSwap(18446744073709552000.)
Query id: 7e644036-8f5a-42ac-be5e-ee1f2286c46e
0 rows in set. Elapsed: 0.005 sec.
Received exception from server (version 23.9.1):
Code: 48. DB::Exception: Received from localhost:9000. DB::Exception: byteSwap() is not implemented for double datatype: While processing byteSwap(18446744073709552000.). (NOT_IMPLEMENTED)
I'd love to know if there's a way to get better type names.
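One possible direction for friendlier type names, sketched under assumptions: map the C++ type to a user-facing name at compile time instead of demangling `typeid`. The helpers `readableTypeName` and `notImplementedMessage` are hypothetical names introduced for illustration only; ClickHouse may already provide its own facility for this.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>
#include <type_traits>

// Hypothetical helper: translate a C++ type into the name a SQL user would expect.
template <typename T>
constexpr std::string_view readableTypeName()
{
    if constexpr (std::is_same_v<T, double>)
        return "Float64";
    else if constexpr (std::is_same_v<T, float>)
        return "Float32";
    else if constexpr (std::is_same_v<T, std::uint64_t>)
        return "UInt64";
    else if constexpr (std::is_same_v<T, std::int64_t>)
        return "Int64";
    else
        return "unknown";
}

// Hypothetical helper: build the error text with the readable name.
template <typename T>
std::string notImplementedMessage()
{
    return "byteSwap() is not implemented for " + std::string(readableTypeName<T>()) + " datatype";
}

// Usage check: the message names Float64 rather than double.
inline void checkExamples()
{
    assert(notImplementedMessage<double>() == "byteSwap() is not implemented for Float64 datatype");
}
```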
Here's my attempt at an implementation for UInt128:
template <typename T>
requires std::is_same_v<T, UInt128>
inline T byteSwap(T x)
{
    UInt64 lower_half = x & 0xFFFFFFFFFFFFFFFF;
    UInt64 upper_half = (x >> 64) & 0xFFFFFFFFFFFFFFFF;
    UInt64 swapped_lower_half = __builtin_bswap64(lower_half);
    UInt64 swapped_upper_half = __builtin_bswap64(upper_half);
    UInt128 new_upper_half = static_cast<UInt128>(swapped_lower_half) << 64;
    UInt128 new_lower_half = static_cast<UInt128>(swapped_upper_half);
    return new_upper_half | new_lower_half;
}
But I'm not able to test the same. As soon as I go over (2^64-1), for some reason, it doesn't reach the implementation. Also notice that the last four digits of the input change (1616 -> 2000.). Any reasons why this could be happening?
clickhouse-400817.internal :) SELECT byteSwap(18446744073709551616);
SELECT byteSwap(18446744073709552000.)
Query id: 0743873e-f48b-423f-a160-1b160f5b25c3
0 rows in set. Elapsed: 0.125 sec.
Received exception from server (version 23.9.1):
Code: 48. DB::Exception: Received from localhost:9000. DB::Exception: byteSwap() is not implemented for double datatype: While processing byteSwap(18446744073709552000.). (NOT_IMPLEMENTED)
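The half-swapping approach for UInt128 can be checked outside the server. The sketch below assumes a GCC/Clang-style compiler and uses the `unsigned __int128` extension as a stand-in for ClickHouse's `UInt128` type; `byteSwap128` and `reverseBytewise` are hypothetical names. It compares the two-halves method against a plain byte-by-byte reversal.

```cpp
#include <cassert>
#include <cstdint>

// Compiler extension used as a stand-in for ClickHouse's UInt128 (assumption).
using UInt128 = unsigned __int128;

// Swap each 64-bit half with the builtin and exchange the halves.
inline UInt128 byteSwap128(UInt128 x)
{
    const std::uint64_t lower_half = static_cast<std::uint64_t>(x);
    const std::uint64_t upper_half = static_cast<std::uint64_t>(x >> 64);
    return (static_cast<UInt128>(__builtin_bswap64(lower_half)) << 64)
        | static_cast<UInt128>(__builtin_bswap64(upper_half));
}

// Reference implementation: reverse all 16 bytes one at a time.
inline UInt128 reverseBytewise(UInt128 x)
{
    UInt128 result = 0;
    for (int i = 0; i < 16; ++i)
    {
        result = (result << 8) | (x & 0xFF);
        x >>= 8;
    }
    return result;
}

// Usage check: both methods agree on a value whose bytes are 00 01 ... 0F.
inline void checkExamples()
{
    const UInt128 x = (static_cast<UInt128>(0x0001020304050607ULL) << 64) | 0x08090A0B0C0D0E0FULL;
    assert(byteSwap128(x) == reverseBytewise(x));
}
```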
> But I'm not able to test the same. As soon as I go over (2^64-1), for some reason, it doesn't reach the implementation. Also notice that the last four digits of the input change (1616 -> 2000.). Any reasons why this could be happening?
Because the example is interpreted as float:
SELECT toTypeName(18446744073709551616);
┌─toTypeName(18446744073709552000.)─┐
│ Float64 │
└───────────────────────────────────┘
Try: SELECT byteSwap(18446744073709551616::UInt128);
I wonder why byteSwap on UInt128 and UInt256 operates on 2x8 and 4x8 bytes, respectively, instead of 1x16 / 1x32 bytes? To swap all bytes uniformly, you could use reverseMemcpy() (base/base/unaligned.h).
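To make the "swap all bytes uniformly" idea concrete, here is a sketch of a reversed copy applied to a 32-byte value in one pass. `reverseCopy` is a hypothetical stand-in written for this example; its signature is an assumption and may not match `reverseMemcpy()` in `base/base/unaligned.h`. `Wide256` is likewise a made-up placeholder for a 256-bit integer type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a reversed memcpy: byte i of src lands at byte
// (size - 1 - i) of dst. The buffers must not overlap.
inline void reverseCopy(void * dst, const void * src, std::size_t size)
{
    auto * out = static_cast<unsigned char *>(dst) + size;
    const auto * in = static_cast<const unsigned char *>(src);
    while (size--)
        *--out = *in++;
}

// Made-up placeholder for a 256-bit integer stored as four 64-bit limbs.
struct Wide256
{
    std::uint64_t limbs[4];
};

// Reverse all 32 bytes in one pass instead of swapping four 8-byte pieces.
inline Wide256 byteSwapWide(const Wide256 & value)
{
    Wide256 result;
    reverseCopy(&result, &value, sizeof(value));
    return result;
}

// Usage check: reversing twice restores the original bytes.
inline void checkExamples()
{
    const Wide256 v{{0x0102030405060708ULL, 0x1112131415161718ULL, 0x2122232425262728ULL, 0x3132333435363738ULL}};
    const Wide256 back = byteSwapWide(byteSwapWide(v));
    assert(std::memcmp(&back, &v, sizeof(v)) == 0);
}
```

The same helper works for any width, which is the appeal of the uniform approach: no per-type splitting into halves or quarters.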
tests/queries/0_stateless/02415_all_new_functions_must_be_documented.reference
Let's try.
It's OK to treat this function as injective. The injectiveness property is used to eliminate function application, e.g. from GROUP BY. That means only the domain of a single data type is relevant.
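The injectivity claim can be illustrated for a single data type: byte swapping is its own inverse, so two distinct inputs of the same width can never map to the same output. The sketch below checks this exhaustively for 16-bit values; `swap16` and `isInvolution` are hypothetical names used only for this example.

```cpp
#include <cassert>
#include <cstdint>

// Swap the two bytes of a 16-bit value.
constexpr std::uint16_t swap16(std::uint16_t x)
{
    return static_cast<std::uint16_t>((x << 8) | (x >> 8));
}

// Returns true if applying swap16 twice yields the input for every 16-bit value.
// A function that is its own inverse is necessarily injective on that domain.
inline bool isInvolution()
{
    for (std::uint32_t v = 0; v <= 0xFFFF; ++v)
        if (swap16(swap16(static_cast<std::uint16_t>(v))) != v)
            return false;
    return true;
}

// Usage check.
inline void checkExamples()
{
    assert(isInvolution());
}
```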
- Consider byteswap injective.
- Make function case-insensitive.
- Add in-code documentation and copy-paste it to the markdown docs.

- Also allow signed ints now because std::byteswap accepts them.
- Fix for style check.
Also:
- Add comments in tests.
- Add an example in docs where an IPv4 is cast to an int, byteswapped and then cast back to an IPv4.
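The IPv4 round trip mentioned for the docs can be sketched in C++ as follows. This is an illustration only: it assumes the address a.b.c.d is packed with `a` as the most significant byte, which may or may not match how ClickHouse stores IPv4 internally, and all names (`Octets`, `toUInt32`, `toOctets`, `swap32`) are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Octets = std::array<std::uint8_t, 4>;

// Pack a.b.c.d into an integer with `a` as the most significant byte (assumption).
constexpr std::uint32_t toUInt32(const Octets & ip)
{
    return (std::uint32_t{ip[0]} << 24) | (std::uint32_t{ip[1]} << 16)
        | (std::uint32_t{ip[2]} << 8) | std::uint32_t{ip[3]};
}

// Unpack the integer back into four octets, most significant byte first.
constexpr Octets toOctets(std::uint32_t v)
{
    return {static_cast<std::uint8_t>(v >> 24), static_cast<std::uint8_t>(v >> 16),
            static_cast<std::uint8_t>(v >> 8), static_cast<std::uint8_t>(v)};
}

// Reverse the four bytes of a 32-bit value.
constexpr std::uint32_t swap32(std::uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) | ((x << 8) & 0x00FF0000u) | (x << 24);
}

// Usage check: cast to int, byte swap, cast back; the octet order is reversed.
inline void checkExamples()
{
    const Octets ip{205, 148, 150, 59};
    const Octets swapped = toOctets(swap32(toUInt32(ip)));
    assert((swapped == Octets{59, 150, 148, 205}));
}
```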
Co-authored-by: Priyansh Agrawal <agrawal.priyansh@yahoo.in>
@rschu1ze @alexey-milovidov, thanks for the merge! Hoping to make many more contributions in the future.
Fixes #54734.
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
`byteSwap` which reverses the bytes of unsigned integers. This is particularly useful for reversing values of types which are represented as unsigned integers internally, such as IPv4.
Documentation entry for user-facing changes