GH-47602: [Python] Make Schema hashable even when it has metadata#47601
GH-47602: [Python] Make Schema hashable even when it has metadata#47601raulcd merged 2 commits intoapache:mainfrom
Conversation
|
Thanks for opening a pull request! If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project. Then could you also rename the pull request title in the following format? or See also: |
|
|
|
@github-actions crossbow submit -g python |
Co-authored-by: Alenka Frim <AlenkaF@users.noreply.github.com>
|
Revision: ad52c12 Submitted crossbow builds: ursacomputing/crossbow @ actions-6cf16e3659 |
|
The crossbow stuff for some reason coredumped for ubuntu-22.04 (https://github.com/ursacomputing/crossbow/actions/runs/17939291515/job/51011788800, https://github.com/ursacomputing/crossbow/actions/runs/17939291336/job/51011788225). But I wouldn't see how that is connected to this change? 🤔 |
|
These three failures are not connected. |
raulcd
left a comment
There was a problem hiding this comment.
LGTM
Sorry for taking a little to review, I've been busy with other issues.
Thanks for the ping
|
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit ef718a7. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 2 possible false positives for unstable benchmarks that are known to sometimes produce them. |
…ta (apache#47601) ### Rationale for this change In Python, `pyarrow.Schema` before was not hashable when it has `metadata` set. ``` >>> import pyarrow >>> schema = pyarrow.schema([], metadata={b"1": b"1"}) >>> hash(schema) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "pyarrow/types.pxi", line 2921, in pyarrow.lib.Schema.__hash__ TypeError: unhashable type: 'dict' ``` This is because the metadata (which is a dict) was tried to be hashed as-is, which doesn't work. ### What changes are included in this PR? Slightly change how hashes are computed for Schema, by converting the `dict[str, str]` to the frozenset of key- and value tuples. For reference, this is faster than computing the hash of a sorted tuple of key- and value tuples (https://stackoverflow.com/a/6014481/10070873). ### Are these changes tested? Yes. ### Are there any user-facing changes? Besides that `Schema` now correctly is hashable, no. * GitHub Issue: apache#47602 Lead-authored-by: Jonas Dedden <university@jonas-dedden.de> Co-authored-by: Alenka Frim <AlenkaF@users.noreply.github.com> Signed-off-by: Raúl Cumplido <raulcumplido@gmail.com>
Rationale for this change
In Python,
pyarrow.Schemabefore was not hashable when it hasmetadataset.This is because the metadata (which is a dict) was tried to be hashed as-is, which doesn't work.
What changes are included in this PR?
Slightly change how hashes are computed for Schema, by converting the
dict[str, str]to the frozenset of key- and value tuples.For reference, this is faster than computing the hash of a sorted tuple of key- and value tuples (https://stackoverflow.com/a/6014481/10070873).
Are these changes tested?
Yes.
Are there any user-facing changes?
Besides that
Schemanow correctly is hashable, no.