Skip to content

GH-24834: [C#] Support writing compressed IPC data#39871

Merged
CurtHagenlocher merged 11 commits intoapache:mainfrom
adamreeve:dotnet-compression
Feb 7, 2024
Merged

GH-24834: [C#] Support writing compressed IPC data#39871
CurtHagenlocher merged 11 commits intoapache:mainfrom
adamreeve:dotnet-compression

Conversation

@adamreeve
Copy link
Copy Markdown
Contributor

@adamreeve adamreeve commented Feb 1, 2024

Rationale for this change

This allows using compression when writing IPC streams and files with the Arrow .NET library

What changes are included in this PR?

  • Adds a compress method to the ICompressionCodec interface and implements this for Zstd and LZ4Frame in the Apache.Arrow.Compression package
  • Adds new compression related options to IpcOptions
  • Implements buffer compression in ArrowStreamWriter

Are these changes tested?

Yes, new unit tests have been added

Are there any user-facing changes?

Yes, this is a new user-facing feature and the status.rst and csharp/README files have been updated

skip_testers.add("JS")
skip_testers.add("Rust")
if prefix == '2.0.0-compression':
skip_testers.add("C#")
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've enabled the compression integration tests here, but it looks like those only test reading compressed data, so could have been enabled previously. As far as I can see the integration tests aren't currently set up to allow reading compressed IPC data written by C# from another implementation.

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Feb 1, 2024
/// The compression codec factory used to create compression codecs.
/// Must be provided if a CompressionCodec is specified.
/// </summary>
public ICompressionCodecFactory CompressionCodecFactory { get; set; }
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure whether the factory should be included in the options or whether it makes more sense to be passed to the writer constructor. It's nice to have here to avoid complicating the constructor too much, and because it goes together with the codec option, but I don't know if it should really be considered an "option".

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another approach might be to take an ICompressionCodec instead; that seems a bit more options-like than taking both the factory and the codec identity. This would probably require ICompressionCodec to expose a CompressionCodecType property, though.

I don't know that I feel strongly either way.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's true, although if you don't feel too strongly about it then I'm happy to leave it as is.

using System.Threading.Tasks;
using Apache.Arrow.Tests;
using K4os.Compression.LZ4;
using Xunit;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: sort

using System.Threading.Tasks;
using Apache.Arrow.Tests;
using K4os.Compression.LZ4;
using Xunit;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: sort

/// The compression codec factory used to create compression codecs.
/// Must be provided if a CompressionCodec is specified.
/// </summary>
public ICompressionCodecFactory CompressionCodecFactory { get; set; }
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another approach might be to take an ICompressionCodec instead; that seems a bit more options-like than taking both the factory and the codec identity. This would probably require ICompressionCodec to expose a CompressionCodecType property, though.

I don't know that I feel strongly either way.

@CurtHagenlocher CurtHagenlocher merged commit c9f6e04 into apache:main Feb 7, 2024
@CurtHagenlocher CurtHagenlocher removed the awaiting committer review Awaiting committer review label Feb 7, 2024
@github-actions github-actions bot added the awaiting committer review Awaiting committer review label Feb 7, 2024
@conbench-apache-arrow
Copy link
Copy Markdown

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit c9f6e04.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

@adamreeve adamreeve deleted the dotnet-compression branch February 7, 2024 20:03
dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
### Rationale for this change

This allows using compression when writing IPC streams and files with the Arrow .NET library

### What changes are included in this PR?

* Adds a compress method to the `ICompressionCodec` interface and implements this for Zstd and LZ4Frame in the `Apache.Arrow.Compression` package
* Adds new compression related options to `IpcOptions`
* Implements buffer compression in `ArrowStreamWriter`

### Are these changes tested?

Yes, new unit tests have been added

### Are there any user-facing changes?

Yes, this is a new user-facing feature and the `status.rst` and `csharp/README` files have been updated

* Closes: apache#24834

Authored-by: Adam Reeve <adreeve@gmail.com>
Signed-off-by: Curt Hagenlocher <curt@hagenlocher.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[C#] Create implementation of ARROW-300 / IPC record batch body buffer compression

2 participants