Closed
Labels
bug (Something isn't working)
Description
Describe the bug
DataFusion's initcap behaves differently from Spark's. Both upper-case the first letter of each word and lower-case the rest, but they disagree on what counts as a word: Spark splits words only on whitespace (' '), while DataFusion treats every character that is not ASCII alphanumeric as a separator. (DataFusion's code would also fail to upper-case or lower-case non-ASCII characters, but that does not surface as a separate issue because it already treats them as separators.)
#1051 shows the problem by adding two cases to the test, one using a dash and one using non-ASCII letters (from Finnish). In the output below, rows prefixed with ! are the ones where the two answers differ.
== Results ==
!== Correct Answer - 7 ==       == Spark Answer - 7 ==
 struct<initcap(name):string>   struct<initcap(name):string>
 [James Smith]                  [James Smith]
 [James Smith]                  [James Smith]
![James Ähtäri]                 [James äHtäRi]
 [Michael Rose]                 [Michael Rose]
 [Rames Rose]                   [Rames Rose]
![Robert Rose-smith]            [Robert Rose-Smith]
 [Robert Williams]              [Robert Williams]
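To make the difference concrete, here is a minimal Rust sketch of the two word-splitting rules described above. It is not the actual code from either project: the function names `initcap_space_delimited` and `initcap_ascii_alnum_delimited` are hypothetical, the Spark-like variant assumes a single space character is the only separator, and it uses plain upper-casing where a real implementation might use title-casing.

```rust
/// Spark-like rule (assumption: only ' ' separates words).
/// Uses Unicode-aware case conversion, so 'ä' becomes 'Ä' at a word start.
fn initcap_space_delimited(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut start_of_word = true;
    for c in input.chars() {
        if start_of_word {
            out.extend(c.to_uppercase());
        } else {
            out.extend(c.to_lowercase());
        }
        // The next character starts a word only if this one is a space.
        start_of_word = c == ' ';
    }
    out
}

/// DataFusion-like rule as described in this issue (hypothetical sketch):
/// every character that is not ASCII alphanumeric is a separator and is
/// copied through unchanged; case conversion is ASCII-only.
fn initcap_ascii_alnum_delimited(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut start_of_word = true;
    for c in input.chars() {
        if c.is_ascii_alphanumeric() {
            if start_of_word {
                out.push(c.to_ascii_uppercase());
            } else {
                out.push(c.to_ascii_lowercase());
            }
            start_of_word = false;
        } else {
            // Separator: '-', 'ä', ' ' and so on all end the current word.
            out.push(c);
            start_of_word = true;
        }
    }
    out
}
```

Under these assumptions, the inputs "james ähtäri" and "robert rose-smith" reproduce the two mismatched rows: the space-delimited rule yields "James Ähtäri" and "Robert Rose-smith", while the ASCII-alphanumeric rule yields "James äHtäRi" and "Robert Rose-Smith".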
Steps to reproduce
Call initcap on an input containing characters that are neither ASCII alphanumeric nor whitespace, such as a dash ("rose-smith") or a non-ASCII letter ("ähtäri").
Expected behavior
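The output of initcap should match Spark's: only whitespace separates words, and non-ASCII letters are upper-cased and lower-cased like any other letter.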
Additional context
No response
andygrove, viirya, mbutrovich and comphead