feat: add to_/from_safetensors by pfackeldey · Pull Request #3685 · scikit-hep/awkward

pfackeldey · 2025-10-17T12:06:53Z

This PR adds conversions to and from safetensors. They're extremely fast at the cost of file size because they do not include any compression. The idea is that all buffers are saved as one long sequence of uncompressed bytes, along with metadata that records where each buffer starts and stops (similar to an awkward array). Loading mmaps the file, and accessing an individual buffer loads only the corresponding slice into memory. This is basically what zarr does with compression turned off, but with a dynamic chunk size instead of a static one (which is good for us, because we don't have rectangular arrays).
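The layout described above can be sketched with the standard library alone. This is a simplified illustration, not the code in this PR: `write_buffers` and `read_buffer` are hypothetical helpers, and the index here stores only byte offsets (the real safetensors header also records each tensor's dtype and shape).

```python
import json
import mmap
import os
import struct
import tempfile


def write_buffers(path, buffers):
    """Write named byte buffers back to back, preceded by a JSON index."""
    index, offset = {}, 0
    for name, data in buffers.items():
        # Remember where each buffer starts and stops inside the data region.
        index[name] = {"data_offsets": [offset, offset + len(data)]}
        offset += len(data)
    header = json.dumps(index).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header)))  # 8-byte little-endian header size
        f.write(header)
        for data in buffers.values():
            f.write(data)  # raw bytes, no compression


def read_buffer(path, name):
    """Memory-map the file and copy out only the requested buffer's bytes."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        (header_size,) = struct.unpack("<Q", mm[:8])
        index = json.loads(mm[8 : 8 + header_size].decode("utf-8"))
        start, stop = index[name]["data_offsets"]
        base = 8 + header_size  # the data region begins right after the header
        return bytes(mm[base + start : base + stop])


# Hypothetical demo data: the two buffers of a small list-of-floats array.
path = os.path.join(tempfile.mkdtemp(), "demo.bin")
write_buffers(
    path,
    {
        "offsets": struct.pack("<3q", 0, 2, 5),
        "content": struct.pack("<5d", 1.1, 2.2, 3.3, 4.4, 5.5),
    },
)
offsets = struct.unpack("<3q", read_buffer(path, "offsets"))
```

Reading `offsets` touches only the header and that buffer's byte range, which is why variable-sized buffers cost nothing extra in this kind of layout.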

codecov · 2025-10-17T12:11:53Z

Codecov Report

❌ Patch coverage is 87.27273% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.70%. Comparing base (b749e49) to head (03f2e73).
⚠️ Report is 447 commits behind head on main.

Files with missing lines	Patch %	Lines
src/awkward/operations/ak_to_safetensors.py	84.61%	4 Missing ⚠️
src/awkward/operations/ak_from_safetensors.py	88.88%	3 Missing ⚠️

Additional details and impacted files

Files with missing lines	Coverage Δ
src/awkward/operations/__init__.py	`100.00% <100.00%> (ø)`
src/awkward/operations/ak_from_safetensors.py	`88.88% <88.88%> (ø)`
src/awkward/operations/ak_to_safetensors.py	`84.61% <84.61%> (ø)`

... and 197 files with indirect coverage changes

github-actions · 2025-10-17T12:35:54Z

The documentation preview is ready to be viewed at http://preview.awkward-array.org.s3-website.us-east-1.amazonaws.com/PR3685

pfackeldey · 2025-10-17T13:53:00Z

Something is looking weird with the API docs of these two functions, but I don't see what I did wrong... Any ideas?

ianna

@pfackeldey - excellent work! A few minor comments; please check. Also, the docstring correctly lists str, pathlib.Path, or file-like objects for destination, but the implementation does not explicitly normalize Path objects. While safetensors.numpy.save_file accepts paths, an explicit cast like:

import os
from pathlib import Path

if isinstance(destination, Path):
    destination = os.fspath(destination)

can make behavior more predictable across platforms.
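Wrapped as a small helper, the suggestion might look like the sketch below. `normalize_destination` is a hypothetical name, and checking `os.PathLike` rather than `Path` is an assumption that widens the cast to any path-like object; strings and file-like objects pass through unchanged.

```python
import io
import os
from pathlib import Path


def normalize_destination(destination):
    """Return path-like inputs as plain strings; pass everything else through."""
    if isinstance(destination, os.PathLike):
        return os.fspath(destination)
    return destination


as_str = normalize_destination(Path("out.safetensors"))
buffer = io.BytesIO()
same_buffer = normalize_destination(buffer)
```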

ianna · 2025-10-17T14:54:16Z

> Something is looking weird with the API docs of these two functions, but I don't see what I did wrong... Any ideas?

Ah, this should come first:

    """
    Args:
...

and then the function description, I think.
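A minimal sketch of that ordering, using a hypothetical function (`to_example` and its parameters are made up for illustration and are not part of this PR):

```python
import inspect


def to_example(array, destination):
    """
    Args:
        array: Data to write.
        destination: Where to write it.

    Hypothetical function used only to illustrate docstring ordering:
    the `Args:` section comes first, the description follows.
    """
    return None


# inspect.getdoc strips the indentation and the leading blank line.
first_line = inspect.getdoc(to_example).splitlines()[0]
```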

ikrommyd · 2025-10-17T16:11:00Z

@pfackeldey do you wanna add tests for every single layout type? You can just copy the layouts from tests/test_3608_to_packed_for_typetracer_backed_arrays.py. I remember adding all the layouts there recently at least. Or tell an LLM to do it actually :)

pfackeldey · 2025-10-20T08:41:49Z

> @pfackeldey do you wanna add tests for every single layout type? You can just copy the layouts from tests/test_3608_to_packed_for_typetracer_backed_arrays.py. I remember adding all the layouts there recently at least. Or tell an LLM to do it actually :)

No, this uses to/from_buffers under the hood, which is already well tested. I don't think it makes sense to add redundant test cases. This conversion works as long as ak.to/from_buffers works.

ikrommyd · 2025-10-20T15:50:47Z

@pfackeldey maybe I missed something in the code, but shouldn't you materialize before writing to safetensors? to_buffers doesn't by itself. It spits out VirtualNDArray instances. Maybe to_packed is worth it too?

pfackeldey · 2025-10-21T08:32:41Z

> @pfackeldey maybe I missed something in the code, but shouldn't you materialize before writing to safetensors? to_buffers doesn't by itself. It spits out VirtualNDArray instances. Maybe to_packed is worth it too?

good point! I'll add that 👍
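Why packing matters before serialization can be shown with plain Python lists. This is a conceptual sketch only, not awkward's implementation: `pack_list_array` is a hypothetical helper that handles a single level of list offsets.

```python
def pack_list_array(offsets, content):
    """Rebase offsets to start at zero and drop content that no list refers to."""
    start, stop = offsets[0], offsets[-1]
    packed_offsets = [o - start for o in offsets]
    packed_content = content[start:stop]
    return packed_offsets, packed_content


# A sliced view: three lists living in the middle of a larger content buffer.
offsets = [2, 4, 4, 7]
content = [0.0, 1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9]

packed_offsets, packed_content = pack_list_array(offsets, content)

# Rebuild the lists from the packed buffers to show the values are unchanged.
lists = [
    packed_content[a:b] for a, b in zip(packed_offsets[:-1], packed_offsets[1:])
]
```

Without packing, all ten content values would be written to disk even though only five are reachable from the offsets.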

ikrommyd · 2025-10-21T08:48:16Z

And I had one more thing that I just thought of. Maybe there should be a check that the array is not typetracer-backed when writing? I'm not sure what other IO functions do; I didn't check before writing this. I am saying this because to_buffers and to_packed will work, but then you'd try to convert a typetracer to bytes, which will probably not give a super clean error.

pfackeldey · 2025-10-21T09:04:04Z

> And I had one more thing that I just thought of. Maybe there should be a check that the array is not typetracer-backed when writing? I'm not sure what other IO functions do; I didn't check before writing this. I am saying this because to_buffers and to_packed will work, but then you'd try to convert a typetracer to bytes, which will probably not give a super clean error.

it fails with a correct and good error already:

... 
TypeError: cannot call 'to_buffers' on an array without concrete data

ikrommyd · 2025-10-21T14:02:09Z

Ah good. I was under the impression from buffers would be fine. I should have tried it before speaking I guess. Thanks for checking.

pfackeldey · 2025-10-22T08:41:54Z

Hi @ianna, I've addressed all comments and switched the file handling to fsspec to normalize file paths and support remote reads and writes. It follows the implementation of ak.to_/from_parquet.

ianna

@pfackeldey - Great! Thanks for addressing the comments! Please, go ahead and merge it if you are done with it. Thanks!

pfackeldey and others added 3 commits October 17, 2025 14:02

feat: add to_/from_safetensors

d5180f6

Merge branch 'main' into to_from_safetensors.py

257006f

style: pre-commit fixes

c4345c5

pfackeldey added 2 commits October 17, 2025 14:12

satisfy pre-commit

1c9e370

add test

ce4a86b

pfackeldey marked this pull request as ready for review October 17, 2025 12:20

satisfy pylint too

98fb3cc

pfackeldey requested a review from ianna October 17, 2025 13:47

ianna requested changes Oct 17, 2025

ianna added the pr-next-release Required for the next release label Oct 17, 2025

pfackeldey and others added 6 commits October 20, 2025 10:37

Update src/awkward/operations/ak_from_safetensors.py

aefccf9

Co-authored-by: Ianna Osborne <ianna.osborne@cern.ch>

Update src/awkward/operations/ak_from_safetensors.py

0edea75

Co-authored-by: Ianna Osborne <ianna.osborne@cern.ch>

Update src/awkward/operations/ak_from_safetensors.py

19871f4

Co-authored-by: Ianna Osborne <ianna.osborne@cern.ch>

Update src/awkward/operations/ak_to_safetensors.py

15338b2

Co-authored-by: Ianna Osborne <ianna.osborne@cern.ch>

Update src/awkward/operations/ak_from_safetensors.py

5737374

Co-authored-by: Ianna Osborne <ianna.osborne@cern.ch>

Update src/awkward/operations/ak_to_safetensors.py

818ddda

Co-authored-by: Ianna Osborne <ianna.osborne@cern.ch>

pfackeldey and others added 8 commits October 20, 2025 10:42

Update src/awkward/operations/ak_from_safetensors.py

c4a8aa5

Co-authored-by: Ianna Osborne <ianna.osborne@cern.ch>

Update src/awkward/operations/ak_to_safetensors.py

1c68716

Co-authored-by: Ianna Osborne <ianna.osborne@cern.ch>

Update src/awkward/operations/ak_from_safetensors.py

fecc00e

Co-authored-by: Ianna Osborne <ianna.osborne@cern.ch>

Update src/awkward/operations/ak_from_safetensors.py

1de11b9

Co-authored-by: Ianna Osborne <ianna.osborne@cern.ch>

Update src/awkward/operations/ak_from_safetensors.py

65829b4

Co-authored-by: Ianna Osborne <ianna.osborne@cern.ch>

Update src/awkward/operations/ak_from_safetensors.py

3f23cd5

Co-authored-by: Ianna Osborne <ianna.osborne@cern.ch>

Update src/awkward/operations/ak_to_safetensors.py

a3339b6

Co-authored-by: Ianna Osborne <ianna.osborne@cern.ch>

Update src/awkward/operations/ak_from_safetensors.py

a6fb568

Co-authored-by: Ianna Osborne <ianna.osborne@cern.ch>

pfackeldey and others added 6 commits October 20, 2025 10:46

Update src/awkward/operations/ak_to_safetensors.py

895888b

Co-authored-by: Ianna Osborne <ianna.osborne@cern.ch>

Update src/awkward/operations/ak_to_safetensors.py

b63b72b

Co-authored-by: Ianna Osborne <ianna.osborne@cern.ch>

Update src/awkward/operations/ak_to_safetensors.py

c7724d3

Co-authored-by: Ianna Osborne <ianna.osborne@cern.ch>

Update src/awkward/operations/ak_to_safetensors.py

72191b4

Co-authored-by: Ianna Osborne <ianna.osborne@cern.ch>

address remaining comments

76246fa

Merge branch 'main' into to_from_safetensors.py

42d8920

ikrommyd reviewed Oct 20, 2025

make sure arrays are packed before serializing to safetensors

960c99c

use fsspec to allow remote writing and reading

03f2e73

ianna approved these changes Oct 22, 2025

pfackeldey merged commit 4f34af8 into scikit-hep:main Oct 22, 2025
41 checks passed

pfackeldey deleted the to_from_safetensors.py branch October 22, 2025 11:44

pfackeldey mentioned this pull request Nov 6, 2025

Related issues iris-hep/integration-challenge#4

Open

26 tasks


3 participants
