Add normalize_token for pyarrow.parquet.ParquetSchema/FileMetaData#872

Merged
phofl merged 1 commit into dask:main from milesgranger:milesgranger/703-filemetadata-hashing
Feb 14, 2024

Conversation

@milesgranger
Collaborator

Will close #703

@milesgranger milesgranger force-pushed the milesgranger/703-filemetadata-hashing branch 2 times, most recently from ea2cc34 to 711386f Compare February 13, 2024 13:14
@phofl
Collaborator

phofl commented Feb 13, 2024

Can you add tests?

We generally check that tokenising the same thing twice results in an equal token.
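
For illustration, the kind of check described above could look roughly like the sketch below. It uses only the standard library and does not touch dask or pyarrow: `FakeFileMetaData`, `normalize`, and `tokenize` are hypothetical stand-ins invented for this example, not the names or the implementation used in this PR. The idea is that a type-specific normalizer reduces the object to plain values, so the resulting token depends on content rather than object identity.

```python
import hashlib
from dataclasses import dataclass
from functools import singledispatch


@dataclass
class FakeFileMetaData:
    """Hypothetical stand-in for a metadata object (not a pyarrow class)."""
    num_rows: int
    num_columns: int
    created_by: str


@singledispatch
def normalize(obj):
    # Fallback for unregistered types: identity-based, so two equal but
    # distinct objects would normalize differently.
    return ("object", id(obj))


@normalize.register(FakeFileMetaData)
def _normalize_metadata(obj):
    # Reduce the object to plain, deterministic values.
    return ("FakeFileMetaData", obj.num_rows, obj.num_columns, obj.created_by)


def tokenize(obj) -> str:
    # Hash the normalized form to obtain a stable token string.
    return hashlib.sha256(repr(normalize(obj)).encode()).hexdigest()


meta_a = FakeFileMetaData(100, 3, "writer 1.0")
meta_b = FakeFileMetaData(100, 3, "writer 1.0")  # same content, new object
meta_c = FakeFileMetaData(200, 3, "writer 1.0")  # different content

# Tokenising the same thing twice gives an equal token.
assert tokenize(meta_a) == tokenize(meta_a)
# Equal content in a separate object also gives an equal token.
assert tokenize(meta_a) == tokenize(meta_b)
# Different content gives a different token.
assert tokenize(meta_a) != tokenize(meta_c)
```

A real test for this PR would presumably apply the same assertions to dask's own tokenization of an actual `FileMetaData` object read from a Parquet file.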

@milesgranger milesgranger force-pushed the milesgranger/703-filemetadata-hashing branch from 711386f to 37fc0f3 Compare February 14, 2024 06:39
@milesgranger milesgranger force-pushed the milesgranger/703-filemetadata-hashing branch from 37fc0f3 to ea2829b Compare February 14, 2024 06:41
@phofl phofl merged commit 9098aa8 into dask:main Feb 14, 2024
@phofl
Collaborator

phofl commented Feb 14, 2024

thx

There are tests in dask/dask that failed before. Could you take a brief look to see whether you can find them and, if so, enable them? They are somewhere in the io folder in dataframe.

@milesgranger milesgranger deleted the milesgranger/703-filemetadata-hashing branch February 14, 2024 11:21

Development

Successfully merging this pull request may close these issues.

Implement hashing for FileMetaData object from Arrow

2 participants