Skip to content

GH-35718: [Go][Parquet] Fix for null-only encoding panic#39497

Merged
zeroshade merged 2 commits intoapache:mainfrom
MagicBoost:main
Jan 9, 2024
Merged

GH-35718: [Go][Parquet] Fix for null-only encoding panic#39497
zeroshade merged 2 commits intoapache:mainfrom
MagicBoost:main

Conversation

@MagicBoost
Copy link
Copy Markdown
Contributor

@MagicBoost MagicBoost commented Jan 8, 2024

Rationale for this change

closes: #35718

What changes are included in this PR?

Fix painc writing with DeltaBinaryPacked or DeltaByteArray when column only has nulls

Are these changes tested?

Yes

  • add a test writing nulls to columns with DeltaBinaryPacked / DeltaByteArray / DeltaLengthByteArray encodings

Are there any user-facing changes?

No

@MagicBoost MagicBoost requested a review from zeroshade as a code owner January 8, 2024 03:19
@github-actions
Copy link
Copy Markdown

github-actions bot commented Jan 8, 2024

⚠️ GitHub issue #35718 has been automatically assigned in GitHub to PR creator.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm why List is tested here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strictly speaking this is not necessary,I just copy the previous test and want to add some simple tests on nested formats.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's good to ensure we're testing nulls for multiple types

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind testing all these encoders?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

( Maybe go/parquet/internal/encoding/encoding_test.go is also a good place for testing?)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get it, I have already used these encoders testing nulls encoding.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no panic or error happens in the encoding process, panic happens when writing data into a file (call written function on a nil bitWriter not inited).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this patch we check DeltaBinaryPack and DeltaByteArray. DeltaLengthByteArray uses DeltaBinaryPack, and DeltaByteArray uses DeltaLengthByteArray. So I think it's ok to testing them

Besides, there're some other encodings: RleBoolean, Plain ..etc. Should we also testing them?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I added more encording tests.

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Jan 8, 2024
@mapleFU
Copy link
Copy Markdown
Member

mapleFU commented Jan 8, 2024

(I'm afraid https://github.com/apache/arrow/pull/39347/files does the similiar things but less than your contributions in this patch, but it doesn't get merged, I will wait @zeroshade for a while...)

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Jan 8, 2024
@zeroshade
Copy link
Copy Markdown
Member

I've rebased this and fixed the conflict after merging my change. Shrinks this down to a fairly small change plus all the new tests. This all seems good to me assuming that all the CI passes. Thanks much @MagicBoost !

@zeroshade zeroshade merged commit f0879ed into apache:main Jan 9, 2024
@zeroshade zeroshade removed the awaiting changes Awaiting changes label Jan 9, 2024
@mapleFU
Copy link
Copy Markdown
Member

mapleFU commented Jan 10, 2024

// EstimatedDataEncodedSize returns the current number of bytes that have
// been buffered so far
func (enc *PlainBooleanEncoder) EstimatedDataEncodedSize() int64 {
	return int64(enc.sink.Len() + int(bitutil.BytesForBits(int64(enc.wr.Pos()))))
}

@zeroshade This might suffer from problem when enc.wr == nil. However, if all input is null, the Put would also be called here...Would we leave it here or also fixing this?

@conbench-apache-arrow
Copy link
Copy Markdown

After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit f0879ed.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
…e#39497)

### Rationale for this change

closes: apache#35718 

### What changes are included in this PR?

Fix painc writing with DeltaBinaryPacked or DeltaByteArray when column only has nulls

### Are these changes tested?

Yes

- add a test writing nulls to columns with DeltaBinaryPacked / DeltaByteArray / DeltaLengthByteArray encodings

### Are there any user-facing changes?

No

* Closes: apache#35718

Lead-authored-by: yufanmo <yufan.mo@transwarp.io>
Co-authored-by: Matt Topol <zotthewizard@gmail.com>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Go] [Parquet] panic writing with DeltaBinaryPacked when column only has nulls

3 participants