arrow2 erased refcounted clones performance issues #1746

@teh-cmc

This one is very bad: it is obviously something we do all over the place.

Background:

  • Basically all arrow Arrays are Buffers when you reach the actual physical type (which can be quite deep because all our components are compound types etc)
  • Buffer is literally an Arc over some bytes, so cheap to clone
  • But then we carry around erased arrow arrays, i.e. trait objects: Box<dyn Array>
  • Trait objects cannot implement Clone directly because their size/layout is not known at compile time
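  • But there's a hack for that: the clone goes through a virtual call, and the concrete array behind the trait object puts a copy of itself into a fresh Box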
  • All of the above is what I mean when I say "erased refcounted clone"
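
To make the pattern concrete, here is a minimal std-only sketch of an "erased refcounted clone". None of these types are arrow2's: `Buffer`, `Array`, `PrimitiveArray`, `NestedArray` and `clone_boxed` are hypothetical stand-ins written for illustration, and the real arrow2 types carry much more (datatypes, validity bitmaps, offsets). The idea it shows is that no bytes are ever copied, yet each clone of an erased array still pays a virtual call plus a heap allocation at every level of nesting.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a buffer: refcounted bytes, cheap to clone.
#[derive(Clone)]
struct Buffer(Arc<[u8]>);

/// Hypothetical stand-in for an erased array trait.
trait Array {
    fn len(&self) -> usize;
    /// The "hack": the concrete type clones itself into a fresh box.
    fn clone_boxed(&self) -> Box<dyn Array>;
}

/// Lets `Box<dyn Array>` be cloned by dispatching to the concrete type.
impl Clone for Box<dyn Array> {
    fn clone(&self) -> Self {
        (**self).clone_boxed()
    }
}

/// Leaf array: the physical type, backed directly by a buffer.
#[derive(Clone)]
struct PrimitiveArray {
    values: Buffer,
}

impl Array for PrimitiveArray {
    fn len(&self) -> usize {
        self.values.0.len()
    }
    fn clone_boxed(&self) -> Box<dyn Array> {
        // Refcount bump on the buffer, plus one heap allocation for the box.
        Box::new(self.clone())
    }
}

/// Compound array: wraps another erased array, as compound types do.
#[derive(Clone)]
struct NestedArray {
    inner: Box<dyn Array>,
}

impl Array for NestedArray {
    fn len(&self) -> usize {
        self.inner.len()
    }
    fn clone_boxed(&self) -> Box<dyn Array> {
        // Recursively re-boxes `inner`, then allocates a box for this level.
        Box::new(self.clone())
    }
}
```

Cloning a `Box<dyn Array>` holding a `NestedArray` around a `PrimitiveArray` only bumps the refcount of the underlying bytes, but it allocates two new boxes and goes through two virtual calls. With deeply nested compound types that per-level overhead is paid on every clone.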

See #1745 for detailed benchmarks.

This is yet another example where fixing the issue once in DataCell and then using DataCell everywhere would be nice...
