Skip to content

Upgrade arrow2/convert and use native buffers for the tensor u8 types#1375

Merged
jleibs merged 2 commits intomainfrom
jleibs/buffer_for_u8_tensor
May 9, 2023
Merged

Upgrade arrow2/convert and use native buffers for the tensor u8 types#1375
jleibs merged 2 commits intomainfrom
jleibs/buffer_for_u8_tensor

Conversation

@jleibs
Copy link
Copy Markdown
Contributor

@jleibs jleibs commented Feb 22, 2023

Depends on upstream: DataEngineeringLabs/arrow2-convert#103

Using Buffer types here should keep us from making a copy on deserialize.

TODO:

  • Show this works based on memory profiling.

Checklist

  • I have read and agree to Contributor Guide and the Code of Conduct
  • I've included a screenshot or gif (if applicable)
  • I've added a line to CHANGELOG.md (if this is a big enough change to warrant it)

@jleibs jleibs marked this pull request as ready for review February 22, 2023 01:34
@emilk emilk marked this pull request as draft February 22, 2023 13:20
@emilk
Copy link
Copy Markdown
Member

emilk commented Feb 22, 2023

Converted to draft until we have an arrow2_convert release

@jleibs jleibs force-pushed the jleibs/buffer_for_u8_tensor branch from 902bb08 to 92b759c Compare February 22, 2023 16:28
@jleibs jleibs added 🏹 arrow Apache Arrow 📉 performance Optimization, memory use, etc labels Feb 22, 2023
@emilk
Copy link
Copy Markdown
Member

emilk commented Feb 24, 2023

Sadly, this does not help with the memory duplication of tensors.
This can be seen with RERUN_TRACK_ALLOCATIONS=1 cargo r -p test_image_memory:

image

We see that the archive step still creates a complete copy of the memory.

The only real solution to this is:

@Wumpf
Copy link
Copy Markdown
Member

Wumpf commented Apr 19, 2023

@jleibs do we have the necessary arrow2convert version by now?

@emilk
Copy link
Copy Markdown
Member

emilk commented Apr 19, 2023

No; there hasn't been a new arrow2-convert for over two months. It is time we fork arrow2-convert and make our own version of it.

@jleibs jleibs force-pushed the jleibs/buffer_for_u8_tensor branch from 755b81a to 90f51b6 Compare May 9, 2023 12:36
@jleibs jleibs removed the blocked can't make progress right now label May 9, 2023
@jleibs jleibs marked this pull request as ready for review May 9, 2023 12:36
@jleibs jleibs changed the title Use buffers for the tensor u8 types Upgrade arrow2/convert and use native buffers for the tensor u8 types May 9, 2023
@jleibs jleibs merged commit ba5e177 into main May 9, 2023
@jleibs jleibs deleted the jleibs/buffer_for_u8_tensor branch May 9, 2023 13:43
jprochazk pushed a commit that referenced this pull request May 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

🏹 arrow Apache Arrow 📉 performance Optimization, memory use, etc

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants