Skip to content

feat: Add Spark to_pretty_string function#12780

Closed
ArnavBalyan wants to merge 2 commits intofacebookincubator:mainfrom
ArnavBalyan:arnavb/pretty-support-v
Closed

feat: Add Spark to_pretty_string function#12780
ArnavBalyan wants to merge 2 commits intofacebookincubator:mainfrom
ArnavBalyan:arnavb/pretty-support-v

Conversation

@ArnavBalyan
Copy link
Copy Markdown
Contributor

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 24, 2025
@ArnavBalyan
Copy link
Copy Markdown
Contributor Author

cc @rui-mo @jinchengchenghh can you PTAL thanks!

@netlify
Copy link
Copy Markdown

netlify Bot commented Mar 24, 2025

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit b093478
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/6807748140394c0008681ce7

Copy link
Copy Markdown
Contributor

@rui-mo rui-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

Comment thread velox/docs/functions/spark/string.rst Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The document doesn't appear to render properly. Could you take a look?

Screenshot 2025-03-25 at 15 24 03

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears the date-to-string cast throws for runtime error, but here we return "NULL" for both user error and runtime error. Which is aligned with Spark?

if (!ue.isUserError()) {
throw;
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CastExpr::applyToSelectedNoThrowLocal adopts the below logic for NULL.

try {
func(row);
} catch (const VeloxException& e) {
if (!e.isUserError()) {
throw;
}
result->setNull(row, true);
} catch (const std::exception&) {
result->setNull(row, true);

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean match the VeloxException, then catch std::exception? @rui-mo This is not addressed. @ArnavBalyan

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current implementation throws a user error when invalid timestamp is encountered, is this not what was intended? thanks!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current implementation catches all the std::exception and converts them as user error, while in the cast, exception is thrown if not a user error, and set null result if it is. If the only differences with cast are the listed ones, we might align with cast here.

@rui-mo rui-mo changed the title feat: Add support for Spark ToPrettyString feat: Add Spark to_pretty_string function Mar 25, 2025
@ArnavBalyan
Copy link
Copy Markdown
Contributor Author

cc @rui-mo if you could please take a look

@rui-mo
Copy link
Copy Markdown
Contributor

rui-mo commented Mar 28, 2025

Would you run the Spark expression fuzzer test for 2 minutes to make sure it works? You could specify --only to_pretty_string to test this function only.

The command for reference: https://github.com/facebookincubator/velox/blob/main/.github/workflows/scheduled.yml#L587-L596

@ArnavBalyan ArnavBalyan force-pushed the arnavb/pretty-support-v branch from 0429c3e to bd0f369 Compare April 13, 2025 17:31
@ArnavBalyan
Copy link
Copy Markdown
Contributor Author

Would you run the Spark expression fuzzer test for 2 minutes to make sure it works? You could specify --only to_pretty_string to test this function only.

The command for reference: https://github.com/facebookincubator/velox/blob/main/.github/workflows/scheduled.yml#L587-L596

Done thanks:

E0413 17:29:58.007323 315291 ExpressionFuzzerVerifier.cpp:218] ==============================> Top 1 by number of rows processed
E0413 17:29:58.008945 315291 ExpressionFuzzerVerifier.cpp:220] Format: functionName numTimesSelected proportionOfTimesSelected numProcessedRows
E0413 17:29:58.009104 315291 ExpressionFuzzerVerifier.cpp:224] to_pretty_string 596890 100.00% 21313401
E0413 17:29:58.009243 315291 ExpressionFuzzerVerifier.cpp:231] ==============================> Bottom 1 by number of rows processed
E0413 17:29:58.009339 315291 ExpressionFuzzerVerifier.cpp:233] Format: functionName numTimesSelected proportionOfTimesSelected numProcessedRows
E0413 17:29:58.009405 315291 ExpressionFuzzerVerifier.cpp:238] to_pretty_string 596890 100.00% 21313401
E0413 17:29:58.009486 315291 ExpressionFuzzerVerifier.cpp:251] ==============================> All stats sorted by number of times the function was chosen
E0413 17:29:58.009547 315291 ExpressionFuzzerVerifier.cpp:253] Format: functionName numTimesSelected proportionOfTimesSelected numProcessedRows
E0413 17:29:58.009609 315291 ExpressionFuzzerVerifier.cpp:257] to_pretty_string 596890 100.00% 21313401
E0413 17:29:58.009671 315291 ExpressionFuzzerVerifier.cpp:486] Total test cases: 864029
E0413 17:29:58.009732 315291 ExpressionFuzzerVerifier.cpp:487] Total test cases verified in the reference DB: 864029
E0413 17:29:58.009791 315291 ExpressionFuzzerVerifier.cpp:489] Total test cases failed in both Velox and the reference DB: 0

@rui-mo
Copy link
Copy Markdown
Contributor

rui-mo commented Apr 17, 2025

@jinchengchenghh Do you have further comment? Thanks.

@jinchengchenghh
Copy link
Copy Markdown
Contributor

LGTM @rui-mo
Please fix the code format @ArnavBalyan

Copy link
Copy Markdown
Collaborator

@assignUser assignUser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make sure CI passes, currently a test is failing. I'd recommend rebasing/merging main too.

Comment thread velox/functions/sparksql/ToPrettyString.h Outdated
@ArnavBalyan ArnavBalyan force-pushed the arnavb/pretty-support-v branch from bd0f369 to 07c3af1 Compare April 17, 2025 13:02
@ArnavBalyan
Copy link
Copy Markdown
Contributor Author

LGTM @rui-mo Please fix the code format @ArnavBalyan

Updated thanks!

@ArnavBalyan ArnavBalyan force-pushed the arnavb/pretty-support-v branch from 07c3af1 to 6f5963a Compare April 18, 2025 08:31
@ArnavBalyan ArnavBalyan force-pushed the arnavb/pretty-support-v branch from 6f5963a to 5568329 Compare April 18, 2025 08:54
@ArnavBalyan
Copy link
Copy Markdown
Contributor Author

cc @assignUser can you pls rerun the tests

@rui-mo
Copy link
Copy Markdown
Contributor

rui-mo commented Apr 22, 2025

@ArnavBalyan
Copy link
Copy Markdown
Contributor Author

@ArnavBalyan You could apply the below diff to fix the format issue.

https://github.com/facebookincubator/velox/actions/runs/14532602668/attempts/2#summary-40925598950

Thanks, updated

@ArnavBalyan ArnavBalyan requested a review from assignUser April 23, 2025 05:00
@ArnavBalyan
Copy link
Copy Markdown
Contributor Author

cc @assignUser could you pls take a look thanks!

Copy link
Copy Markdown
Collaborator

@assignUser assignUser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@rui-mo
Copy link
Copy Markdown
Contributor

rui-mo commented Apr 29, 2025

@ArnavBalyan It seems your updates are not pushed in, including #12780 (comment), could you confirm?

@stale
Copy link
Copy Markdown

stale Bot commented Jul 28, 2025

This pull request has been automatically marked as stale because it has not had recent activity. If you'd still like this PR merged, please comment on the PR, make sure you've addressed reviewer comments, and rebase on the latest main. Thank you for your contributions!

@stale stale Bot added the stale label Jul 28, 2025
@stale stale Bot closed this Aug 11, 2025
meta-codesync Bot pushed a commit that referenced this pull request Feb 7, 2026
Summary:
- Add support for to_pretty_string used by Spark, this operator is used for `df.show()` to show the result to terminal.
 - Builds upon jinchengchenghh's contribution here #10359 and ArnavBalyan's contribution here #12780

Pull Request resolved: #16245

Reviewed By: tanjialiang

Differential Revision: D92490039

Pulled By: xiaoxmeng

fbshipit-source-id: fb70a1c8a8c9f7f7402f8626a29c810ad6a01048
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. stale

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants