Skip to content

Conversation

@teh-cmc
Copy link
Member

@teh-cmc teh-cmc commented Oct 14, 2025

There are 3 problems involved:

  1. ArrowMsg::uncompressed_size is defined as an int32. That issue speaks for itself.
  2. The encoding path tracks payload sizes using usize, and then silently cast them to int32 when encoding the Protobuf data, leading to gibberish and crashes during decoding.
  3. lz4_flex has similar casting bugs: it can encode 64bit values but cannot decode them. Once again the as keyword is involved.

This PR fixes 1 and 2. All the changes are backward and forward compatible because:

  • In protobuf, uint64 is a strict superset of int32: both are unsigned varints on the wire.
  • There exist no valid RRD files out there with negative values for uncompressed_size.

We can already log much bigger chunks than before, but not too big, otherwise lz4 itself will during decoding:

thread 'decompress' panicked at 'attempt to add with overflow'
lz4_flex-0.11.5/src/block/decompress_safe.rs:39
stack backtrace:
   6: core::panicking::panic_fmt
             at core/src/panicking.rs:75:14
   7: core::panicking::panic_const::panic_const_add_overflow
             at core/src/panicking.rs:175:17
   8: std::sys::backtrace::__rust_begin_short_backtrace
   9: core::ops::function::FnOnce::call_once{{vtable.shim}}
  10: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at alloc/src/boxed.rs:1966:9
      <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at alloc/src/boxed.rs:1966:9
      std::sys::pal::unix::thread::Thread::new::thread_start
             at std/src/sys/pal/unix/thread.rs:97:17
  11: <unknown>
  12: <unknown>

I'll have a follow-up PR for the lz4_flex issue.



Checks:

  • rerun with --nb_scans 2000
  • rerun rrd stats with --nb_scans 2000
  • rerun with --nb_scans 3000 -> bug in lz4
  • rerun rrd stats with --nb_scans 3000 -> bug in lz4
  • All dataplatform test suites pass
  • Can load old RRDs no problem

@github-actions
Copy link

github-actions bot commented Oct 14, 2025

Web viewer built successfully.

Result Commit Link Manifest
d84b0f2 https://rerun.io/viewer/pr/11525 +nightly +main

Note: This comment is updated whenever you push a commit.

@teh-cmc teh-cmc added 🪳 bug Something isn't working include in changelog 🪵 Log & send APIs Affects the user-facing API for all languages labels Oct 14, 2025
}
crate::codec::Compression::LZ4 => re_protos::log_msg::v1alpha1::Compression::Lz4 as i32,
},
uncompressed_size: payload.uncompressed_size as i32,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original bug: Payload::uncompressed_size was a usize that could definitely grow beyond i32::MAX, so that silenced cast leads to disaster.

@teh-cmc teh-cmc force-pushed the cmc/arrowmsg_32bit_overflow branch from 9393571 to d84b0f2 Compare October 14, 2025 10:42
@rerun-io rerun-io deleted a comment from github-actions bot Oct 14, 2025
@teh-cmc
Copy link
Member Author

teh-cmc commented Oct 14, 2025

@rerun-bot full-check

@teh-cmc teh-cmc marked this pull request as ready for review October 14, 2025 10:42
@github-actions
Copy link

Started a full build: https://github.com/rerun-io/rerun/actions/runs/18493924895

@emilk
Copy link
Member

emilk commented Oct 14, 2025

Compression compression = 2;

int32 uncompressed_size = 3;
uint64 uncompressed_size = 3;
Copy link
Member

@emilk emilk Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this a breaking change? If so, can we make it a non-breaking change by adding a new field instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, see PR description


// Being able to log fast isn't particularly useful if the data happens to be corrupt at the
// other end, so make sure we can encode/decode everything that was logged.
if check && let Some(storage) = storage {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this run on CI?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I don't believe any of this stuff runs on CI. Maybe I'll have a look in a follow-up.

@teh-cmc
Copy link
Member Author

teh-cmc commented Oct 14, 2025

merge & pray

@teh-cmc teh-cmc merged commit 19b350c into main Oct 14, 2025
97 of 99 checks passed
@teh-cmc teh-cmc deleted the cmc/arrowmsg_32bit_overflow branch October 14, 2025 12:07
oxkitsune pushed a commit that referenced this pull request Oct 14, 2025
There are 3 problems involved:
1. `ArrowMsg::uncompressed_size` is defined as an `int32`. That issue
speaks for itself.
2. The encoding path tracks payload sizes using `usize`, and then
silently cast them to `int32` when encoding the Protobuf data, leading
to gibberish and crashes during decoding.
3. `lz4_flex` has similar casting bugs: it can encode 64bit values but
cannot decode them. Once again the `as` keyword is involved.

This PR fixes 1 and 2. All the changes are backward and forward
compatible because:
* In protobuf, `uint64` is a strict superset of `int32`: both are
unsigned varints on the wire.
* There exist no valid RRD files out there with negative values for
`uncompressed_size`.

We can already log much bigger chunks than before, but not too big,
otherwise `lz4` itself will during decoding:
```
thread 'decompress' panicked at 'attempt to add with overflow'
lz4_flex-0.11.5/src/block/decompress_safe.rs:39
stack backtrace:
   6: core::panicking::panic_fmt
             at core/src/panicking.rs:75:14
   7: core::panicking::panic_const::panic_const_add_overflow
             at core/src/panicking.rs:175:17
   8: std::sys::backtrace::__rust_begin_short_backtrace
   9: core::ops::function::FnOnce::call_once{{vtable.shim}}
  10: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at alloc/src/boxed.rs:1966:9
      <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at alloc/src/boxed.rs:1966:9
      std::sys::pal::unix::thread::Thread::new::thread_start
             at std/src/sys/pal/unix/thread.rs:97:17
  11: <unknown>
  12: <unknown>
```

I'll have a follow-up PR for the `lz4_flex` issue.

---

* Half of fixing #11516
  * The other half requires patching `lz4_flex` (on it)

---

Checks:
* [x] `rerun` with `--nb_scans 2000`
* [x] `rerun rrd stats` with `--nb_scans 2000`
* [x] ⛔ `rerun` with `--nb_scans 3000` -> bug in lz4
* [x] ⛔ `rerun rrd stats` with `--nb_scans 3000` -> bug in lz4
* [x] All dataplatform test suites pass
* [x] Can load old RRDs no problem
teh-cmc added a commit that referenced this pull request Nov 12, 2025
This is a follow up to #11525. It
fixes the remaining half of
#11516.

We don't want to merge it as is yet. Let's see if we can get the fix
merged upstream first, so we don't have to depend on yet another fork:
* PSeitz/lz4_flex#192

---

* Fixes #11516

---

Checks:
* [x] `rerun` with `--nb_scans 3000` now works
* [x] `rerun rrd stats` with `--nb_scans 3000` now works
IsseW pushed a commit that referenced this pull request Nov 12, 2025
This is a follow up to #11525. It
fixes the remaining half of
#11516.

We don't want to merge it as is yet. Let's see if we can get the fix
merged upstream first, so we don't have to depend on yet another fork:
* PSeitz/lz4_flex#192

---

* Fixes #11516

---

Checks:
* [x] `rerun` with `--nb_scans 3000` now works
* [x] `rerun rrd stats` with `--nb_scans 3000` now works
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

🪳 bug Something isn't working include in changelog 🪵 Log & send APIs Affects the user-facing API for all languages

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants