Add BYTE_STREAM_SPLIT data file #45

Merged: pitrou merged 6 commits into apache:master from mwlon:byte-stream-split on Jan 9, 2024

mwlon (Contributor) commented Jan 8, 2024

Allow better testing of BYTE_STREAM_SPLIT encoding implementations.
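For readers unfamiliar with the encoding under test, here is a minimal sketch of BYTE_STREAM_SPLIT as described in the Parquet format documentation: byte k of every value is gathered into stream k, and the streams are concatenated. The function names and the use of `struct` are illustrative assumptions, not code from this PR or from any Parquet implementation.

```python
import struct


def byte_stream_split_encode(values, fmt="<f"):
    """Encode floats by scattering byte k of each value into stream k.

    fmt is a struct format for one little-endian value
    ("<f" for FLOAT, "<d" for DOUBLE). Hypothetical helper.
    """
    width = struct.calcsize(fmt)
    raw = b"".join(struct.pack(fmt, v) for v in values)
    # raw[k::width] picks byte k of every value, i.e. stream k.
    return b"".join(raw[k::width] for k in range(width))


def byte_stream_split_decode(data, fmt="<f"):
    """Inverse of byte_stream_split_encode. Hypothetical helper."""
    width = struct.calcsize(fmt)
    n = len(data) // width
    streams = [data[k * n:(k + 1) * n] for k in range(width)]
    # Re-interleave: take one byte from each stream per value.
    raw = bytes(b for group in zip(*streams) for b in group)
    return [struct.unpack_from(fmt, raw, i * width)[0] for i in range(n)]


encoded = byte_stream_split_encode([1.0, 2.5, -3.25])
print(encoded.hex())  # 0000000000008020503f40c0
print(byte_stream_split_decode(encoded))  # [1.0, -> 2.5, -3.25 round-trip exactly]
```

The encoding itself does not shrink the data; it groups similar bytes (such as exponent bytes) together so that a general-purpose compressor such as zstd tends to do better on the result.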

mwlon (Contributor Author) commented Jan 8, 2024

@tustvold This should help with the byte stream split PR I'll make soon

mapleFU (Member) commented Jan 8, 2024

Should we link byte_stream_split.zstd.parquet in readme.md?

mwlon (Contributor Author) commented Jan 8, 2024

Yep, added it. I hadn't noticed the readme.

@@ -0,0 +1,27 @@
`byte_stream_split.zstd.parquet` is generated by pyarrow 14.0.2 using the following code:
Review comment (Member):

Perhaps inline this in the README as for other files?

'f64': np.random.normal(size=300).astype(np.float64),
})

f = '/Users/martin/Downloads/byte_stream_split.parquet'
Review comment (Member):

Just writing into the current directory would make this script more readily usable.

Reply (Contributor Author):

Oops - fixed

pitrou changed the title from "add byte_stream_split.zstd.parquet" to "Add BYTE_STREAM_SPLIT data file" on Jan 9, 2024
pitrou merged commit 4cb3cff into apache:master on Jan 9, 2024
mwlon deleted the byte-stream-split branch on January 10, 2024 00:06
pitrou added a commit to apache/arrow that referenced this pull request on Jan 11, 2024
…39570)

### Rationale for this change

In apache/parquet-testing#45 , an integration file for BYTE_STREAM_SPLIT was added to the parquet-testing repo.

### What changes are included in this PR?

Add a test reading that file and ensuring the decoded values are as expected.

### Are these changes tested?

By definition.

### Are there any user-facing changes?

No.
* Closes: #39560

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
dgreiss pushed a commit to dgreiss/arrow that referenced this pull request on Feb 19, 2024
…PLIT (apache#39570)
