Fix for Case Correction of Acronyms causing doubling by theunrepentantgeek · Pull Request #65 · gobuffalo/flect

theunrepentantgeek · 2023-02-12T22:59:44Z

What is being done in this PR?

I discovered an edge case in the interaction of two features of flect.

If your string matches a known acronym, it's case-corrected to all upper-case.

If the end of your string also matches one of the rules for pluralization, flect attempts to replace the end of the string.

However, the suffix-to-remove comes from the original string, not the case corrected part - so the call to strings.TrimSuffix() doesn't remove the old suffix.

The results are a little odd. The two examples I found, when calling flect.Pluralize() are:

Input	Expected	Actual
ssh	SSHes	sshSSH
eia	EIAs	Eium

What are the main choices made to get to this solution?

~~The first commit adds two test cases to show the bug in action. I'm not 100% convinced that the plural's I've used are correct, but a unit-test at least makes the bug easy to verify.~~

~~The second commit makes the suffix-removing behaviour of Ident.ReplaceSuffix() case insensitive; this seems to fix the bug without breaking anything else.~~

After discussion, I've implemented preferred rules; this required a little bit of work to avoid inadvertently using the non-acronym rules at the wrong time. E.g. the Singular form of EIA is not EIUM.

What was discovered while working on it? (Optional)

I discovered the design choice that acronyms are always converted to upper case, which surprised me - I expected flect to preserve letter case (e.g. I expected flect.Pluralize("html") to return either `"htmls"' or '"html"' but not '"HTML"').

List the manual test cases you've covered before sending this PR:

I've reproduced my manual test cases as unit tests in flect_test.go

sio4 · 2023-02-13T16:56:27Z

Hi @theunrepentantgeek,
Thank you for filing the PR.

Indeed, the current behavior is not correct and it should be fixed!

However, I have a few different points of view.

First of all, Flect cannot be a general inflection module from the natural language perspective. It should be limited to specific use cases and we need to understand the limitation.

You fixed the wrong behavior by fixing Ident.ReplaceSuffix(), but in my opinion, the root cause of the incorrect behavior is the following finding you already mentioned:

If your string matches a known acronym, it's case-corrected to all upper-case.

I don't think this is good behavior. The issue started here, and I think it is the user's choice if they use SSH or just lowercase ssh. We should not ignore the user's intention.

I discovered the design choice that acronyms are always converted to upper case, which surprised me - I expected flect to preserve letter case (e.g. I expected flect.Pluralize("html") to return either `"htmls"' or '"html"' but not '"HTML"').

Exactly!

So I would like to fix it as:

we do not convert anything to upper case or another, just preserve the user's input as is.
Just treat abbreviations as a special case when we run pluralize or singularize, and the rule could be just adding a single 's' to simplify the rule. (I'd like to follow the rule in https://editorsmanual.com/articles/plurals-of-abbreviations/)

Actually, there was heavy work on #62 a few weeks ago, and after that, my plan for this module became to make the purpose of this module clearer and simpler, then fix things aligned with the goal.

Even though PR should be a space for reviewing not for discussion, I would love to hear your opinion or updated PR, also I would love to discuss it in a separate issue if you want to continue to discuss the topic further.

theunrepentantgeek · 2023-02-13T23:55:33Z

Thanks for the quick response @sio4.

I started writing an issue and realized the easiest way to demonstrate the issue was with a snippet of code, hence the PR.

The behavour you've described makes sense to me; I'll have a go at doing that and will update this PR.

theunrepentantgeek · 2023-02-16T02:01:19Z

@sio4 I've updated the PR - it includes tests, and I've linked to that doc you referenced so future developers know why things are done this way.

sio4 · 2023-02-16T04:07:26Z

Thanks! Will take a look as soon as possible!

sio4 · 2023-02-16T16:43:39Z

Hi, I took a look at the PR and it seems like there could be a compatibility issue. (not to uppercase them all)

This is my fault, I thought/think we should preserve the user's input as-is but I need to check if this change will break its users. Flect is mainly used for runtime automation for keywords (mostly DBMS column names, model names, etc) and this behavior change will cause compatibility issues with existing users.

For the behavior change, we need to consider it deeper later and need to decide whether to fix it or keep it, but now we need to keep it the same. Actually, we completed a milestone and I am about to develop the next plan with breaking changes, so this issue could be added to the TODO items.

Sorry for the time to fix the code.

By the way, what is the original issue that you wanted to fix? I would like to help you with the particular issue.

theunrepentantgeek · 2023-02-16T18:27:48Z

By the way, what is the original issue that you wanted to fix?

flect.Singularize("ssh") returns sshSSH - doubling the text, which is quite incorrect.
This one tripped me up when testing my project.

Flect is mainly used for runtime automation for keywords (mostly DBMS column names, model names, etc)

FWIW, this is similar to my use case - I'm working on a code generator which consumes OpenAPI definitions, generating Go code.

I totally understand your reservations about breaking-changes. 😃

theunrepentantgeek · 2023-02-16T18:53:55Z

What about providing an option to retain the users letter case for acronyms, but with a default that preserves the current behaviour for existing users?

One way to achieve this would be to extend flect.New() with a varidic Option parameter, allowing customization.

This would mean flect.New("foo") would retain the existing behaviour, but flect.New("foo", flect.PreserveCase) would have the new behaviour.

Not actually sure if this is a good approach, but figured the idea was worth sharing. What do you think?

sio4 · 2023-02-17T17:46:13Z

Thanks for your kind response and suggestions! I really love to hear that!

By the way, what is the original issue that you wanted to fix?

flect.Singularize("ssh") returns sshSSH - doubling the text, which is quite incorrect. This one tripped me up when testing my project.

Indeed, that should be fixed regardless of the design issue.

I totally understand your reservations about breaking-changes. smiley

Thanks for your kind understanding. The design issue should be fixed in the next major version update. (with breaking changes)

What about providing an option to retain the users letter case for acronyms, but with a default that preserves the current behaviour for existing users?

One way to achieve this would be to extend flect.New() with a varidic Option parameter, allowing customization.

This would mean flect.New("foo") would retain the existing behaviour, but flect.New("foo", flect.PreserveCase) would have the new behaviour.

This could be nice, but since the users may use Pluralize and Singularize directly, we need to option support to those functions too in case of if we take this approach. It could be a bigger task.

How about the following fix? It will cover your issue:

if strings.HasSuffix(s, r.suffix) (using s instead of lowered ls) is the key to your issue
additional block will cover IDs and IDSes

As a side note, we used the term acronyms for both acronyms and abbreviation/initial and it is also problematic. It should be addressed in the next version too. (if we keep the concept of the module wider)

diff --git a/flect_test.go b/flect_test.go
index 848d77a..4ab6251 100644
--- a/flect_test.go
+++ b/flect_test.go
@@ -322,6 +322,28 @@ var singlePluralAssertions = []dict{
 	{"waste", "wastes", true, true},
 	{"psi", "psis", true, true},
 	{"pepsi", "pepsis", true, true},
+
+	// Acronyms
+	{"widget_uuid", "widget_uuids", true, true},
+	{"WidgetUUID", "WidgetUUIDs", true, true},
+	{"widgetUUID", "widgetUUIDs", true, true},
+	{"widgetUuid", "widgetUuids", true, true},
+	{"widget_UUID", "widget_UUIDs", true, true},
+
+	{"ID", "IDs", true, true},
+	{"IDS", "IDSes", true, true},
+	// id to ids (ID), ids to idses (IDS) is not supported
+	{"api", "apis", true, true},
+	{"API", "APIs", true, true},
+	{"html", "htmls", true, true},
+	{"HTML", "HTMLs", true, true},
+	{"FYI", "FYIs", true, true},
+	{"LAN", "LANs", true, true},
+	{"ssh", "sshs", true, true}, // sh
+	{"SSH", "SSHs", true, true},
+	{"eia", "eias", true, true}, // ia
+	{"EIA", "EIAs", true, true},
+	{"DNS", "DNSes", true, true},
 }
 
 func init() {
diff --git a/plural_rules.go b/plural_rules.go
index 53c6081..8f3df40 100644
--- a/plural_rules.go
+++ b/plural_rules.go
@@ -202,6 +202,8 @@ var dictionary = []word{
 	{singular: "shoe", plural: "shoes"},
 	{singular: "toe", plural: "toes", exact: true},
 	{singular: "graffiti", plural: "graffiti"},
+
+	{singular: "ID", plural: "IDs", exact: true},
 }
 
 // singleToPlural is the highest priority map for Pluralize().
@@ -366,6 +368,8 @@ var singularToPluralSuffixList = []singularToPluralSuffix{
 	{"o", "oes"},
 	{"x", "xes"},
 
+	{"S", "Ses"},
+
 	// excluded rules: seems rare
 	// Words from Hebrew that add -im or -ot (eg, cherub becomes cherubim)
 	// - cherub (cherubs or cherubim), seraph (seraphs or seraphim)
diff --git a/pluralize.go b/pluralize.go
index a983f72..ad640e5 100644
--- a/pluralize.go
+++ b/pluralize.go
@@ -38,6 +38,13 @@ func (i Ident) Pluralize() Ident {
 	pluralMoot.RLock()
 	defer pluralMoot.RUnlock()
 
+	if p, ok := singleToPlural[i.Original]; ok {
+		return i.ReplaceSuffix(i.Original, p)
+	}
+	if _, ok := pluralToSingle[i.Original]; ok {
+		return i
+	}
+
 	ls := strings.ToLower(s)
 	if _, ok := pluralToSingle[ls]; ok {
 		return i
@@ -51,7 +58,7 @@ func (i Ident) Pluralize() Ident {
 	}
 
 	for _, r := range pluralRules {
-		if strings.HasSuffix(ls, r.suffix) {
+		if strings.HasSuffix(s, r.suffix) {
 			return i.ReplaceSuffix(s, r.fn(s))
 		}
 	}
diff --git a/singularize.go b/singularize.go
index 9993d65..9e592dd 100644
--- a/singularize.go
+++ b/singularize.go
@@ -35,6 +35,13 @@ func (i Ident) Singularize() Ident {
 	singularMoot.RLock()
 	defer singularMoot.RUnlock()
 
+	if p, ok := pluralToSingle[i.Original]; ok {
+		return i.ReplaceSuffix(i.Original, p)
+	}
+	if _, ok := singleToPlural[i.Original]; ok {
+		return i
+	}
+
 	ls := strings.ToLower(s)
 	if p, ok := pluralToSingle[ls]; ok {
 		if s == Capitalize(s) {
@@ -48,7 +55,7 @@ func (i Ident) Singularize() Ident {
 	}
 
 	for _, r := range singularRules {
-		if strings.HasSuffix(ls, r.suffix) {
+		if strings.HasSuffix(s, r.suffix) {
 			return i.ReplaceSuffix(s, r.fn(s))
 		}
 	}

theunrepentantgeek · 2023-02-20T19:36:36Z

That's a much simpler fix. Looks good to me!

Feel free to close this PR.

sio4 · 2023-02-23T10:26:04Z

Thanks for confirmation @theunrepentantgeek. Filed #66 and will close this.

theunrepentantgeek added 2 commits February 13, 2023 11:53

Add edge cases

b5e706c

Potential fix

92b9c6f

theunrepentantgeek mentioned this pull request Feb 13, 2023

Fix naming issues Azure/azure-service-operator#2722

Merged

1 task

sio4 self-assigned this Feb 13, 2023

theunrepentantgeek added 5 commits February 16, 2023 14:58

Reverse previous changes to ReplaceSuffix()

9a07dc1

Preserve user case for acronyms

fa63d1c

Update tests for preservation of acronym case

6a9193c

Add tests for expected behaviour of acronyms

8462d4c

Implement rules for acronyms

5c9f479

sio4 added bug Something isn't working s: accepted was accepted or confirmed breaking change This feature/fix introduces breaking changes labels Feb 17, 2023

sio4 mentioned this pull request Feb 23, 2023

fixed incorrect singularize/pluralize for abbreviations #66

Merged

sio4 closed this Feb 23, 2023

theunrepentantgeek deleted the fix/acronym-doubling branch February 25, 2023 01:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix for Case Correction of Acronyms causing doubling#65

Fix for Case Correction of Acronyms causing doubling#65
theunrepentantgeek wants to merge 7 commits into
gobuffalo:mainfrom
theunrepentantgeek:fix/acronym-doubling

theunrepentantgeek commented Feb 12, 2023 •

edited

Loading

Uh oh!

sio4 commented Feb 13, 2023

Uh oh!

theunrepentantgeek commented Feb 13, 2023

Uh oh!

theunrepentantgeek commented Feb 16, 2023

Uh oh!

sio4 commented Feb 16, 2023

Uh oh!

sio4 commented Feb 16, 2023

Uh oh!

theunrepentantgeek commented Feb 16, 2023

Uh oh!

theunrepentantgeek commented Feb 16, 2023

Uh oh!

sio4 commented Feb 17, 2023

Uh oh!

theunrepentantgeek commented Feb 20, 2023

Uh oh!

sio4 commented Feb 23, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

theunrepentantgeek commented Feb 12, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What is being done in this PR?

What are the main choices made to get to this solution?

What was discovered while working on it? (Optional)

List the manual test cases you've covered before sending this PR:

Uh oh!

sio4 commented Feb 13, 2023

Uh oh!

theunrepentantgeek commented Feb 13, 2023

Uh oh!

theunrepentantgeek commented Feb 16, 2023

Uh oh!

sio4 commented Feb 16, 2023

Uh oh!

sio4 commented Feb 16, 2023

Uh oh!

theunrepentantgeek commented Feb 16, 2023

Uh oh!

theunrepentantgeek commented Feb 16, 2023

Uh oh!

sio4 commented Feb 17, 2023

Uh oh!

theunrepentantgeek commented Feb 20, 2023

Uh oh!

sio4 commented Feb 23, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

theunrepentantgeek commented Feb 12, 2023 •

edited

Loading