Skip to content

Conversation

@alamb
Copy link
Contributor

@alamb alamb commented Aug 25, 2024

Which issue does this PR close?

Follow on to #6231 from @xinlifoobar

Rationale for this change

These new APIs were added in #6231 and I wanted to clarify their behavior (so we can potentailly use them in DataFusion)

What changes are included in this PR?

Update doc comments

Are there any user-facing changes?

Documentation, no functional change

@alamb alamb added the documentation Improvements or additions to documentation label Aug 25, 2024
@github-actions github-actions bot added the arrow Changes to the arrow crate label Aug 25, 2024
/// Returns an iterator over the first `prefix_len` bytes of each array
/// element, including null values.
///
/// If `prefix_len` is larger than the element's length, the iterator will
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels rather surprising to me?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that a statement or a question?

If you have a suggestion of how to make it less surprising perhaps @xinlifoobar would be willing to give it a try

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I would have thought returning Option<&[u8]> would be perhaps more intuitive and likely perform similarly? That being said I am surprised that skipping the null mask yields performance returns, as this hasn't been my experience with regular StringArray, so my intuition about performance may be off

Copy link
Contributor

@xinlifoobar xinlifoobar Aug 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestions! @tustvold.

I created a new PR #6312 to put this discussion. I did this in the early version but didn't see performance gains like this time - it did well for inline values. We might do a combination based on the bench result, e.g., null masks for prefix_iter and slice for suffix_iter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The results on #6312 seem to show that using null masks does indeed reduce performance

So is the conclusion that these APIs are good to go?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments, I think this API would be better if it returned None for a prefix too long, even if it continues to not inspect null masks

Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docs look good, we can always update should we alter the signature

@alamb
Copy link
Contributor Author

alamb commented Aug 28, 2024

Ok, merging in these docs so they match the current code. As @tustvold says if the code needs to be updated we can also update the docs

@alamb alamb merged commit dc8427f into apache:master Aug 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants