a10y (Contributor) commented Jan 14, 2026

A user reported the following stack trace:

                 at /usr/local/cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-array-56.2.0/src/builder/generic_bytes_view_builder.rs:176:9
      12: <arrow_array::array::byte_view_array::GenericByteViewArray<V> as core::convert::From<&arrow_array::array::byte_array::GenericByteArray<FROM>>>::from
                 at /usr/local/cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-array-56.2.0/src/array/byte_view_array.rs:913:39
      13: vortex_array::arrays::varbin::vtable::canonical::<impl vortex_array::vtable::canonical::CanonicalVTable<vortex_array::arrays::varbin::vtable::VarBinVTable> for vortex_array::arrays::varbin::vtable::VarBinVTable>::canonicalize
                 at /usr/local/cargo/git/checkouts/vortex-e8ac85adeb77362c/f1b2ae9/vortex-array/src/arrays/varbin/vtable/canonical.rs:38:26
      14: <vortex_array::array::ArrayAdapter<V> as vortex_array::array::Array>::to_canonical
                 at /usr/local/cargo/git/checkouts/vortex-e8ac85adeb77362c/f1b2ae9/vortex-array/src/array/mod.rs:609:25
      15: <A as vortex_array::canonical::ToCanonical>::to_varbinview
                 at /usr/local/cargo/git/checkouts/vortex-e8ac85adeb77362c/f1b2ae9/vortex-array/src/canonical.rs:374:14
      16: <vortex_array::builders::varbinview::VarBinViewBuilder as vortex_array::builders::ArrayBuilder>::extend_from_array_unchecked
                 at /usr/local/cargo/git/checkouts/vortex-e8ac85adeb77362c/f1b2ae9/vortex-array/src/builders/varbinview.rs:278:27
      17: <vortex_array::builders::fixed_size_list::FixedSizeListBuilder as vortex_array::builders::ArrayBuilder>::extend_from_array_unchecked
                 at /usr/local/cargo/git/checkouts/vortex-e8ac85adeb77362c/f1b2ae9/vortex-array/src/builders/fixed_size_list.rs:243:31
      18: vortex_array::stats::array::StatsSetRef::compute_stat
                 at /usr/local/cargo/git/checkouts/vortex-e8ac85adeb77362c/f1b2ae9/vortex-array/src/stats/array.rs:177:29
      19: vortex_array::stats::array::StatsSetRef::compute_all
                 at /usr/local/cargo/git/checkouts/vortex-e8ac85adeb77362c/f1b2ae9/vortex-array/src/stats/array.rs:200:35
      20: <vortex_layout::layouts::compressed::CompressingStrategy as vortex_layout::strategy::LayoutStrategy>::write_stream::{{closure}}::{{closure}}::{{closure}}
                 at /usr/local/cargo/git/checkouts/vortex-e8ac85adeb77362c/f1b2ae9/vortex-layout/src/layouts/compressed.rs:144:26

The failure is inside arrow-rs, in its conversion from LargeStringArray to StringViewArray.

Arrow checks that the final offset is < i32::MAX and then pushes the entire values buffer as a single block. This breaks for a large VarBinArray that has been sliced: the final offset of the slice can be far smaller than the size of the underlying buffer, so the check passes even though the buffer itself is too large.
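The mismatch can be illustrated with a small std-only model. This is a hypothetical sketch, not arrow-rs code: `check_misses_oversized_buffer` and `sliced_offsets` are illustrative names, and `limit` stands in for the 32-bit bound described above.

```rust
/// Hypothetical model of the reuse check (not arrow-rs code).
/// `offsets` are the offsets visible through the (possibly sliced) array,
/// `buffer_len` is the length of the shared values buffer, and `limit`
/// stands in for the 32-bit bound.
///
/// Returns true when the last-offset check passes, so the whole buffer would
/// be pushed as one block, even though the buffer itself is over the limit.
fn check_misses_oversized_buffer(offsets: &[u64], buffer_len: u64, limit: u64) -> bool {
    // The check only inspects the final offset of the (sliced) array.
    let last_offset_ok = offsets.last().map_or(true, |&last| last < limit);
    // What actually has to fit is the whole buffer that gets pushed.
    let buffer_ok = buffer_len < limit;
    last_offset_ok && !buffer_ok
}

/// Offsets seen through `slice(0..n)` of an array whose elements are all
/// `elem_len` bytes long.
fn sliced_offsets(n: u64, elem_len: u64) -> Vec<u64> {
    (0..=n).map(|i| i * elem_len).collect()
}
```

With `limit = i32::MAX`, a 5-element slice of a 6,500,000,000-byte buffer of 13-byte strings has offsets `[0, 13, 26, 39, 52, 65]`: the last offset passes the check while the buffer does not. Without slicing, the last offset equals the buffer length and the check correctly rejects it.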

Upstream arrow-rs should get a fix as well. In the meantime, this should fix compression of large chunks by re-zeroing the offsets before we canonicalize a VarBin array.
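Re-zeroing can be sketched on a plain `(offsets, bytes)` pair. This is a minimal illustration under assumptions, not the code in this PR: `rezero_offsets` is a hypothetical helper, and it copies the referenced bytes for simplicity, whereas a real implementation would more likely slice the buffer without copying.

```rust
/// Hypothetical helper (not the actual Vortex implementation): rebase a
/// sliced (offsets, bytes) pair so that offsets start at zero and the bytes
/// cover only the referenced range.
fn rezero_offsets(offsets: &[u64], bytes: &[u8]) -> (Vec<u64>, Vec<u8>) {
    // A well-formed VarBin array has len + 1 offsets; treat empty input as empty.
    let (Some(&first), Some(&last)) = (offsets.first(), offsets.last()) else {
        return (Vec::new(), Vec::new());
    };
    // Subtract the first offset so the new offsets begin at 0.
    let new_offsets: Vec<u64> = offsets.iter().map(|&o| o - first).collect();
    // Keep only the bytes the slice actually references (copied here).
    let new_bytes = bytes[first as usize..last as usize].to_vec();
    (new_offsets, new_bytes)
}
```

For the bytes `aaabbbccccdd` sliced to offsets `[3, 6, 10]`, this yields offsets `[0, 3, 7]` over the bytes `bbbcccc`. After rebasing, the last offset equals the buffer length, so a check on the final offset is also a check on the buffer size.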

Comment on lines 101 to 112
fn test_massive() {
    // Attempt to convert a really large dataset to Arrow.
    let strings = VarBinArray::from_iter_nonnull(
        ["1234567890123"].iter().cycle().take(500_000_000),
        DType::Utf8(Nullability::NonNullable),
    );

    let sliced = strings.slice(0..5);

    let vbv = sliced.to_varbinview();
    assert_eq!(vbv.len(), 5);
}
a10y (Contributor, PR author) replied:

I can't check this in because it's too slow, but it was failing before this change and passes now. (To make it pass, you also need to update VarBinArray::from_iter_nonnull to use u64 offsets.)
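The need for u64 offsets follows from the test's size: 500,000,000 copies of the 13-byte string "1234567890123" total 6,500,000,000 bytes, which exceeds both i32::MAX (2,147,483,647) and u32::MAX (4,294,967,295). The std-only sketch below checks that arithmetic and shows a hypothetical `offsets_i32` builder failing once a cumulative offset passes i32::MAX; it is an illustration, not Vortex's `from_iter_nonnull`.

```rust
/// Total bytes needed for `count` copies of `s`, computed in u64 so the
/// product itself cannot overflow for the sizes in the test.
fn total_bytes(s: &str, count: u64) -> u64 {
    s.len() as u64 * count
}

/// Hypothetical builder: cumulative i32 offsets for the given element
/// lengths, returning None as soon as an offset would exceed i32::MAX.
fn offsets_i32(lengths: &[usize]) -> Option<Vec<i32>> {
    let mut offsets = Vec::with_capacity(lengths.len() + 1);
    let mut acc: i32 = 0;
    offsets.push(acc);
    for &len in lengths {
        // A single element longer than i32::MAX cannot be represented either.
        let len = i32::try_from(len).ok()?;
        // checked_add returns None on overflow instead of wrapping.
        acc = acc.checked_add(len)?;
        offsets.push(acc);
    }
    Some(offsets)
}
```

Here `total_bytes("1234567890123", 500_000_000)` is 6,500,000,000, which does not fit in `i32` or `u32` but does fit in `i64`/`u64`.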

A contributor replied:

Can we not put it in the slow run on post-commit?

a10y added the fix label Jan 14, 2026
a10y force-pushed the varbin-arrow-fix branch from 9ab1f44 to c5d63c5 January 14, 2026 23:17
codspeed-hq bot commented Jan 14, 2026

CodSpeed Performance Report

Merging this PR will not alter performance

Comparing varbin-arrow-fix (85b569c) with develop (4bbafe7)

Summary

✅ 1254 untouched benchmarks
⏩ 1254 skipped benchmarks¹

Footnotes

  1. 1254 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.

a10y force-pushed the varbin-arrow-fix branch from c5d63c5 to ca83938 January 15, 2026 14:36
a10y force-pushed the varbin-arrow-fix branch 2 times, most recently from fd32064 to f42a231 January 15, 2026 22:28
a10y enabled auto-merge (squash) January 15, 2026 22:28
codecov bot commented Jan 15, 2026

Codecov Report

❌ Patch coverage is 85.00000% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.88%. Comparing base (4bbafe7) to head (85b569c).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| vortex-array/src/arrays/varbin/array.rs | 83.33% | 3 Missing ⚠️ |


Signed-off-by: Andrew Duffy <andrew@a10y.dev>
a10y force-pushed the varbin-arrow-fix branch from f42a231 to 85b569c January 15, 2026 23:00
3 participants