GH-39402: [C++] bit_util TrailingBits faster #39403
Hattonuri wants to merge 2 commits into apache:main
Conversation
Thanks @Hattonuri, and I wish you a happy new year!

From your godbolt output, it seems the current implementation might enable bzhi, while the second one does not.
The current:

```asm
mov rax, rdi
bzhi rax, rax, rsi
ret
```

Your impl:

```asm
shlx rax, rax, rsi
not rax
and rax, rdi
ret
```
I'm not an expert on this. Would it be faster in some cases? And maybe we should also test this on ARM. @cyb70289
```cpp
// Returns the 'num_bits' least-significant bits of 'v'.
static inline uint64_t TrailingBits(uint64_t v, int num_bits) {
  if (ARROW_PREDICT_FALSE(num_bits == 0)) return 0;
```
So generally, the num_bits == 0 check is not strongly related to the optimization?
I think it is related, because ((v >> 0) << 0) ^ v == v ^ v == 0, and I think that is the main reason for the performance increase.
> and i think that is the main reason for performance increase

As I benchmarked, without -march=skylake, removing the ARROW_PREDICT_FALSE(num_bits == 0) check makes it neither faster nor slower (quick-bench seems to run faster because shr is slow?). And as mentioned in #39403 (comment), it might affect the code generation later. Would you mind testing that?
I found similar logic in snappy: https://github.com/google/snappy/blob/main/snappy.cc#L1008. Should we do the same thing? (Also cc @pitrou, because this might be bmi2-related?)
Please, can you post actual benchmarks of reading Parquet files? Micro-benchmarks of a tiny helper function are not that interesting.
@ursabot please benchmark
Benchmark runs are scheduled for commit 7f736fb. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.
I found this with most compilers… With current: … After: … Would you mind testing which is faster on a CPU with avx2 and bmi2 enabled? @Hattonuri After I tried them, they generate the same code with clang 17.0.1 and the argument …
Oh, I think I've found the reason...

```cpp
uint64_t TrailingBits2(uint64_t v, int num_bits) {
  if (__builtin_expect(num_bits == 0, 0)) return 0;
  if (__builtin_expect(num_bits >= 64, 0)) return v;
  return ((v >> num_bits) << num_bits) ^ v;
}
```

also generates …
About benchmarks: I tested my program, in which the parquet library takes ~60% of the time.

```
dstasenko@bench-prod15:~$ for i in …
real 0m46.382s
real 0m46.416s
real 0m46.711s
real 0m46.406s
real 0m46.369s
real 0m47.161s
real 0m47.344s
real 0m47.233s
real 0m47.300s
real 0m48.072s
```
I tested the variant with the 0xffffffffffffffff mask and saw no difference from mine.
What do your compiler flags and the generated instructions look like? Maybe I can do some testing tomorrow. I'm mentioning this because I'm afraid this change might make some vendored builds even slower...
I also tried removing the inline keyword, but nothing changed.
Your options are the same as here: #39403 (comment). This is proven to optimize. I'll try using AVX2 tomorrow.

Emmm, you can force non-inline if you like...

Would you mind adding the …
Thanks for your patience. Conbench analyzed the 6 benchmarking runs that have been run so far on PR commit 7f736fb. There were 2 benchmark results indicating a performance regression. The full Conbench report has more details.
Aha, on some machines the performance even got worse...
```cpp
static inline uint64_t TrailingBits(uint64_t v, int num_bits) {
  if (ARROW_PREDICT_FALSE(num_bits == 0)) return 0;
  if (ARROW_PREDICT_FALSE(num_bits >= 64)) return v;
  return ((v >> num_bits) << num_bits) ^ v;
}
```
```diff
   if (ARROW_PREDICT_FALSE(num_bits >= 64)) return v;
-  int n = 64 - num_bits;
-  return (v << n) >> n;
+  return ((v >> num_bits) << num_bits) ^ v;
```
What about `return v & ~(-1ULL << num_bits);`? It enables gcc to optimize the code with bmi2 bzhi: https://godbolt.org/z/oq9zx4nhf

And it looks slightly faster even without bmi2: https://quick-bench.com/q/rgBQzUFls9IP48xm3JiRPtJoI_M
Aha, I've tried on clang 17.0.1 and it doesn't generate bzhi; compilers are so tricky...

I will try on Arm when I have time. This change looks reasonable to me.
What do you think about changing the check against 64 bits to an assertion?
This changes the code behaviour. I don't think we can do it.
You remind me that this might be because of port scheduling. The check might use the same ports as BMI... so in clang they might generate different instructions...
We can run the llvm-mca tool on godbolt. It looks like the non-bzhi code might be better (higher IPC, etc.).
By the way, with gcc under llvm-mca (https://godbolt.org/z/MKrEsdPxd) my variant shows the best "total cycles" and the best "IPC" score 🤔
As I understand it, IPC is higher, but we need to compare the "instructions" field: instructions = IPC * total cycles, and for the two compiled representations of snappy that product is the same, so the difference is only in instruction intensiveness. But why my implementation has fewer total cycles even in the non-optimized build remains a mystery...
https://www.agner.org/optimize/instruction_tables.pdf
Coming back to this PR: does any Arrow benchmark improve after this change? Are the two regressions related?
(Perhaps not; and per the post https://abseil.io/fast/39, maybe we should benchmark the callers of this function.)
I compared total cycles, not total instructions.

Perhaps for each …
Can we merge this? :)
I'm very busy these few days; maybe you can draft a micro-benchmark. Also cc @pitrou
I think your ReadLevels benchmark should work fine, because ReadLevels is the function that uses TrailingBits the most in the flamegraph in this PR.
@Hattonuri Could you rebase on git main?
I've merged a PR about ReadLevels. This patch didn't change the benchmark result on my M1 Mac. I'll try it on x86 tomorrow. Would you mind rebasing it?
Sorry, I forgot about the first ping :(
This PR decreases performance here (AMD Ryzen 9 3900X, gcc 12.3.0): …
However, it seems that another micro-benchmark becomes slightly faster: …
In both cases, the difference is rather minor (up to 10% on micro-benchmarks).
@ursabot please benchmark |
|
Benchmark runs are scheduled for commit 4722067. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.
```cpp
void DefLevelsToBitmap(const int16_t* def_levels, int64_t num_def_levels,
                       LevelInfo level_info, ValidityBitmapInputOutput* output) {
  // It is simpler to rely on rep_level here until PARQUET-1899 is done and the code
  // is deleted in a follow-up release.
  if (level_info.rep_level > 0) {
#if defined(ARROW_HAVE_RUNTIME_BMI2)
    if (CpuInfo::GetInstance()->HasEfficientBmi2()) {
      return DefLevelsToBitmapBmi2WithRepeatedParent(def_levels, num_def_levels,
                                                     level_info, output);
    }
#endif
    standard::DefLevelsToBitmapSimd</*has_repeated_parent=*/true>(
        def_levels, num_def_levels, level_info, output);
  } else {
    standard::DefLevelsToBitmapSimd</*has_repeated_parent=*/false>(
        def_levels, num_def_levels, level_info, output);
  }
}
```

@pitrou, is bmi2 enabled in your …
On my CPU, it shouldn't be, no.
Thanks for your patience. Conbench analyzed the 5 benchmarking runs that have been run so far on PR commit 4722067. There were no benchmark performance regressions. 🎉 The full Conbench report has more details.
Thank you for your contribution. Unfortunately, this pull request has been marked as stale because it has had no activity in the past 365 days. Please remove the stale label or comment below, or this PR will be closed in 14 days. Feel free to re-open this if it has been closed in error. If you do not have repository permissions to reopen the PR, please tag a maintainer.

Rationale for this change

The TrailingBits operation is called on every read operation for Parquet files, and it takes a significant amount of the time spent reading levels.

My change implements the same functionality but faster
What changes are included in this PR?
Are these changes tested?
https://quick-bench.com/q/3IbTOnH4rShshgE7pwcX6dCbJuY
https://godbolt.org/z/K6YToW7x3
I also tested the same for-loop (but with a higher upper limit), checking for equality.
Are there any user-facing changes?