arrow2's estimated_size_bytes performance issues #1738

@teh-cmc

The problem is so severe that it is pretty much impossible to call `estimated_size_bytes` anywhere in the store without everything slowing down to a crawl, which makes implementing statistics really, really painful.

Even for incremental measurements this is way too slow: orders of magnitude slower than everything else on the write path.

See #1743 for detailed benchmarks.


If we cannot optimize it any further, my proposal is to move the problem upstream and compute byte sizes within the batching system, thereby:

  1. distributing the load to the clients
  2. making its cost irrelevant, since batching happens on a separate thread
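
To make the proposal concrete, here is a rough sketch of what "compute byte sizes within the batching system" could look like. Everything in it is hypothetical and std-only: `Row`, `SizedBatch`, `spawn_batcher` and `StoreStats` are illustrative names rather than existing rerun or arrow2 types, and `Row::estimated_size_bytes` is a cheap stand-in for the real, expensive estimator. The point is only the shape: the batching thread pays for the size computation, and the store just adds up a precomputed integer.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for a row of data; not a real rerun/arrow2 type.
pub struct Row {
    pub values: Vec<u64>,
    pub label: String,
}

impl Row {
    /// Stand-in for the expensive estimator: heap bytes owned by this row.
    pub fn estimated_size_bytes(&self) -> u64 {
        (self.values.len() * std::mem::size_of::<u64>() + self.label.len()) as u64
    }
}

/// A batch that carries its own byte size, computed once by the batcher.
pub struct SizedBatch {
    pub rows: Vec<Row>,
    pub size_bytes: u64,
}

/// Spawns the (client-side) batching thread.
/// Sizes are computed here, off the store's write path.
pub fn spawn_batcher(
    rows_rx: mpsc::Receiver<Row>,
    batches_tx: mpsc::Sender<SizedBatch>,
    max_rows: usize,
) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        let mut rows = Vec::new();
        let mut size_bytes = 0_u64;
        for row in rows_rx {
            // The expensive call happens on the batching thread.
            size_bytes += row.estimated_size_bytes();
            rows.push(row);
            if rows.len() >= max_rows {
                let batch = SizedBatch {
                    rows: std::mem::take(&mut rows),
                    size_bytes,
                };
                size_bytes = 0;
                if batches_tx.send(batch).is_err() {
                    return; // receiver is gone, nothing left to do
                }
            }
        }
        // Input channel closed: flush whatever is left.
        if !rows.is_empty() {
            let _ = batches_tx.send(SizedBatch { rows, size_bytes });
        }
    })
}

/// Store-side statistics: only cheap integer additions on the write path.
#[derive(Default)]
pub struct StoreStats {
    pub total_rows: u64,
    pub total_size_bytes: u64,
}

impl StoreStats {
    pub fn on_insert(&mut self, batch: &SizedBatch) {
        self.total_rows += batch.rows.len() as u64;
        self.total_size_bytes += batch.size_bytes;
    }
}
```

With something along these lines, the store would call `StoreStats::on_insert` for each incoming batch and never invoke the estimator itself. Whether the precomputed size should travel with the batch as a field or as metadata is left open here.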

Labels: 🏹 arrow (Apache Arrow), 📉 performance (Optimization, memory use, etc), 😤 annoying (Something in the UI / SDK is annoying to use)
