GH-24834: [C#] Support writing compressed IPC data#39871
GH-24834: [C#] Support writing compressed IPC data#39871CurtHagenlocher merged 11 commits intoapache:mainfrom
Conversation
| skip_testers.add("JS") | ||
| skip_testers.add("Rust") | ||
| if prefix == '2.0.0-compression': | ||
| skip_testers.add("C#") |
There was a problem hiding this comment.
I've enabled the compression integration tests here, but it looks like those only test reading compressed data, so could have been enabled previously. As far as I can see the integration tests aren't currently set up to allow reading compressed IPC data written by C# from another implementation.
| /// The compression codec factory used to create compression codecs. | ||
| /// Must be provided if a CompressionCodec is specified. | ||
| /// </summary> | ||
| public ICompressionCodecFactory CompressionCodecFactory { get; set; } |
There was a problem hiding this comment.
I'm not sure whether the factory should be included in the options or whether it makes more sense to be passed to the writer constructor. It's nice to have here to avoid complicating the constructor too much, and because it goes together with the codec option, but I don't know if it should really be considered an "option".
There was a problem hiding this comment.
Another approach might be to take an ICompressionCodec instead; that seems a bit more options-like than taking both the factory and the codec identity. This would probably require ICompressionCodec to expose a CompressionCodecType property, though.
I don't know that I feel strongly either way.
There was a problem hiding this comment.
Yeah that's true, although if you don't feel too strongly about it then I'm happy to leave it as is.
| using System.Threading.Tasks; | ||
| using Apache.Arrow.Tests; | ||
| using K4os.Compression.LZ4; | ||
| using Xunit; |
| using System.Threading.Tasks; | ||
| using Apache.Arrow.Tests; | ||
| using K4os.Compression.LZ4; | ||
| using Xunit; |
| /// The compression codec factory used to create compression codecs. | ||
| /// Must be provided if a CompressionCodec is specified. | ||
| /// </summary> | ||
| public ICompressionCodecFactory CompressionCodecFactory { get; set; } |
There was a problem hiding this comment.
Another approach might be to take an ICompressionCodec instead; that seems a bit more options-like than taking both the factory and the codec identity. This would probably require ICompressionCodec to expose a CompressionCodecType property, though.
I don't know that I feel strongly either way.
|
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit c9f6e04. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. |
### Rationale for this change This allows using compression when writing IPC streams and files with the Arrow .NET library ### What changes are included in this PR? * Adds a compress method to the `ICompressionCodec` interface and implements this for Zstd and LZ4Frame in the `Apache.Arrow.Compression` package * Adds new compression related options to `IpcOptions` * Implements buffer compression in `ArrowStreamWriter` ### Are these changes tested? Yes, new unit tests have been added ### Are there any user-facing changes? Yes, this is a new user-facing feature and the `status.rst` and `csharp/README` files have been updated * Closes: apache#24834 Authored-by: Adam Reeve <adreeve@gmail.com> Signed-off-by: Curt Hagenlocher <curt@hagenlocher.org>
Rationale for this change
This allows using compression when writing IPC streams and files with the Arrow .NET library
What changes are included in this PR?
ICompressionCodecinterface and implements this for Zstd and LZ4Frame in theApache.Arrow.CompressionpackageIpcOptionsArrowStreamWriterAre these changes tested?
Yes, new unit tests have been added
Are there any user-facing changes?
Yes, this is a new user-facing feature and the
status.rstandcsharp/READMEfiles have been updated