Skip to content

GH-39519: [Swift] Fix null count when using reader#39520

Merged
kou merged 1 commit intoapache:mainfrom
abandy:GH-39519
Jan 13, 2024
Merged

GH-39519: [Swift] Fix null count when using reader#39520
kou merged 1 commit intoapache:mainfrom
abandy:GH-39519

Conversation

@abandy
Copy link
Copy Markdown
Contributor

@abandy abandy commented Jan 8, 2024

Currently the reader is not properly setting the null count when building an array from a stream. This PR adds a fix for this.

@abandy abandy requested a review from kou as a code owner January 8, 2024 23:47
@abandy abandy changed the title GH-37938:[Swift] fix null count when using reader GH-39519:[Swift] fix null count when using reader Jan 8, 2024
@github-actions
Copy link
Copy Markdown

github-actions bot commented Jan 8, 2024

⚠️ GitHub issue #39519 has been automatically assigned in GitHub to PR creator.

@kou kou changed the title GH-39519:[Swift] fix null count when using reader GH-39519: [Swift] Fix null count when using reader Jan 9, 2024
@kou
Copy link
Copy Markdown
Member

kou commented Jan 9, 2024

This approach passes nullCount from Apache Arrow data (.arrow/.arrows data) to ArrowArray via ArrowNullBuffer, right?
The C++ implementation passes it to arrow::Array (arrow::ArrayData) directly:

out->null_count = node->null_count();

If we use this approach in Swift too, we will use API something like the following:

diff --git a/swift/Arrow/Sources/Arrow/ArrowReader.swift b/swift/Arrow/Sources/Arrow/ArrowReader.swift
index d9dc1bdb47..c985b37ca6 100644
--- a/swift/Arrow/Sources/Arrow/ArrowReader.swift
+++ b/swift/Arrow/Sources/Arrow/ArrowReader.swift
@@ -88,7 +88,7 @@ public class ArrowReader {
             let valueBuffer = loadInfo.recordBatch.buffers(at: loadInfo.bufferIndex + 2)!
             let arrowValueBuffer = makeBuffer(valueBuffer, fileData: loadInfo.fileData,
                                               length: UInt(node.length), messageOffset: loadInfo.messageOffset)
-            return makeArrayHolder(loadInfo.field, buffers: [arrowNullBuffer, arrowOffsetBuffer, arrowValueBuffer])
+            return makeArrayHolder(loadInfo.field, buffers: [arrowNullBuffer, arrowOffsetBuffer, arrowValueBuffer], UInt(node.nullCount))
         } catch let error as ArrowError {
             return .failure(error)
         } catch {
diff --git a/swift/Arrow/Sources/Arrow/ArrowReaderHelper.swift b/swift/Arrow/Sources/Arrow/ArrowReaderHelper.swift
index fa52160478..13d6209dee 100644
--- a/swift/Arrow/Sources/Arrow/ArrowReaderHelper.swift
+++ b/swift/Arrow/Sources/Arrow/ArrowReaderHelper.swift
@@ -126,7 +126,8 @@ private func makeFixedHolder<T>(
 
 func makeArrayHolder( // swiftlint:disable:this cyclomatic_complexity
     _ field: org_apache_arrow_flatbuf_Field,
-    buffers: [ArrowBuffer]
+    buffers: [ArrowBuffer],
+    nullCount: UInt
 ) -> Result<ArrowArrayHolder, ArrowError> {
     let type = field.typeType
     switch type {

What do you think this approach?

@abandy
Copy link
Copy Markdown
Contributor Author

abandy commented Jan 13, 2024

Hi @kou, I have made the requested updates. Please review when you get a chance. Thank you!

Copy link
Copy Markdown
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@kou kou merged commit e632364 into apache:main Jan 13, 2024
@kou kou removed the awaiting review Awaiting review label Jan 13, 2024
@github-actions github-actions bot added the awaiting merge Awaiting merge label Jan 13, 2024
@conbench-apache-arrow
Copy link
Copy Markdown

After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit e632364.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 2 possible false positives for unstable benchmarks that are known to sometimes produce them.

dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
Currently the reader is not properly setting the null count when building an array from a stream.  This PR adds a fix for this.

* Closes: apache#39519

Authored-by: Alva Bandy <abandy@live.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
@abandy abandy deleted the GH-39519 branch April 2, 2024 11:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Swift] Null count is not being set when reading from stream

2 participants