Skip to content

Fix for Case Correction of Acronyms causing doubling#65

Closed
theunrepentantgeek wants to merge 7 commits into
gobuffalo:mainfrom
theunrepentantgeek:fix/acronym-doubling
Closed

Fix for Case Correction of Acronyms causing doubling#65
theunrepentantgeek wants to merge 7 commits into
gobuffalo:mainfrom
theunrepentantgeek:fix/acronym-doubling

Conversation

@theunrepentantgeek

@theunrepentantgeek theunrepentantgeek commented Feb 12, 2023

Copy link
Copy Markdown

What is being done in this PR?

I discovered an edge case in the interaction of two features of flect.

If your string matches a known acronym, it's case-corrected to all upper-case.

If the end of your string also matches one of the rules for pluralization, flect attempts to replace the end of the string.

However, the suffix-to-remove comes from the original string, not the case corrected part - so the call to strings.TrimSuffix() doesn't remove the old suffix.

The results are a little odd. The two examples I found, when calling flect.Pluralize() are:

Input Expected Actual
ssh SSHes sshSSH
eia EIAs Eium

What are the main choices made to get to this solution?

The first commit adds two test cases to show the bug in action. I'm not 100% convinced that the plural's I've used are correct, but a unit-test at least makes the bug easy to verify.

The second commit makes the suffix-removing behaviour of Ident.ReplaceSuffix() case insensitive; this seems to fix the bug without breaking anything else.

After discussion, I've implemented preferred rules; this required a little bit of work to avoid inadvertently using the non-acronym rules at the wrong time. E.g. the Singular form of EIA is not EIUM.

What was discovered while working on it? (Optional)

I discovered the design choice that acronyms are always converted to upper case, which surprised me - I expected flect to preserve letter case (e.g. I expected flect.Pluralize("html") to return either `"htmls"' or '"html"' but not '"HTML"').

List the manual test cases you've covered before sending this PR:

I've reproduced my manual test cases as unit tests in flect_test.go

@sio4

sio4 commented Feb 13, 2023

Copy link
Copy Markdown
Member

Hi @theunrepentantgeek,
Thank you for filing the PR.

Indeed, the current behavior is not correct and it should be fixed!

However, I have a few different points of view.

First of all, Flect cannot be a general inflection module from the natural language perspective. It should be limited to specific use cases and we need to understand the limitation.

You fixed the wrong behavior by fixing Ident.ReplaceSuffix(), but in my opinion, the root cause of the incorrect behavior is the following finding you already mentioned:

If your string matches a known acronym, it's case-corrected to all upper-case.

I don't think this is good behavior. The issue started here, and I think it is the user's choice if they use SSH or just lowercase ssh. We should not ignore the user's intention.

I discovered the design choice that acronyms are always converted to upper case, which surprised me - I expected flect to preserve letter case (e.g. I expected flect.Pluralize("html") to return either `"htmls"' or '"html"' but not '"HTML"').

Exactly!

So I would like to fix it as:

  • we do not convert anything to upper case or another, just preserve the user's input as is.
  • Just treat abbreviations as a special case when we run pluralize or singularize, and the rule could be just adding a single 's' to simplify the rule. (I'd like to follow the rule in https://editorsmanual.com/articles/plurals-of-abbreviations/)

Actually, there was heavy work on #62 a few weeks ago, and after that, my plan for this module became to make the purpose of this module clearer and simpler, then fix things aligned with the goal.

Even though PR should be a space for reviewing not for discussion, I would love to hear your opinion or updated PR, also I would love to discuss it in a separate issue if you want to continue to discuss the topic further.

@theunrepentantgeek

Copy link
Copy Markdown
Author

Thanks for the quick response @sio4.

I started writing an issue and realized the easiest way to demonstrate the issue was with a snippet of code, hence the PR.

The behavour you've described makes sense to me; I'll have a go at doing that and will update this PR.

@theunrepentantgeek

Copy link
Copy Markdown
Author

@sio4 I've updated the PR - it includes tests, and I've linked to that doc you referenced so future developers know why things are done this way.

@sio4

sio4 commented Feb 16, 2023

Copy link
Copy Markdown
Member

Thanks! Will take a look as soon as possible!

@sio4

sio4 commented Feb 16, 2023

Copy link
Copy Markdown
Member

Hi, I took a look at the PR and it seems like there could be a compatibility issue. (not to uppercase them all)

This is my fault, I thought/think we should preserve the user's input as-is but I need to check if this change will break its users. Flect is mainly used for runtime automation for keywords (mostly DBMS column names, model names, etc) and this behavior change will cause compatibility issues with existing users.

For the behavior change, we need to consider it deeper later and need to decide whether to fix it or keep it, but now we need to keep it the same. Actually, we completed a milestone and I am about to develop the next plan with breaking changes, so this issue could be added to the TODO items.

Sorry for the time to fix the code.

By the way, what is the original issue that you wanted to fix? I would like to help you with the particular issue.

@theunrepentantgeek

Copy link
Copy Markdown
Author

By the way, what is the original issue that you wanted to fix?

flect.Singularize("ssh") returns sshSSH - doubling the text, which is quite incorrect.
This one tripped me up when testing my project.

Flect is mainly used for runtime automation for keywords (mostly DBMS column names, model names, etc)

FWIW, this is similar to my use case - I'm working on a code generator which consumes OpenAPI definitions, generating Go code.

I totally understand your reservations about breaking-changes. 😃

@theunrepentantgeek

Copy link
Copy Markdown
Author

What about providing an option to retain the users letter case for acronyms, but with a default that preserves the current behaviour for existing users?

One way to achieve this would be to extend flect.New() with a varidic Option parameter, allowing customization.

This would mean flect.New("foo") would retain the existing behaviour, but flect.New("foo", flect.PreserveCase) would have the new behaviour.

Not actually sure if this is a good approach, but figured the idea was worth sharing. What do you think?

@sio4

sio4 commented Feb 17, 2023

Copy link
Copy Markdown
Member

Thanks for your kind response and suggestions! I really love to hear that!

By the way, what is the original issue that you wanted to fix?

flect.Singularize("ssh") returns sshSSH - doubling the text, which is quite incorrect. This one tripped me up when testing my project.

Indeed, that should be fixed regardless of the design issue.

I totally understand your reservations about breaking-changes. smiley

Thanks for your kind understanding. The design issue should be fixed in the next major version update. (with breaking changes)

What about providing an option to retain the users letter case for acronyms, but with a default that preserves the current behaviour for existing users?

One way to achieve this would be to extend flect.New() with a varidic Option parameter, allowing customization.

This would mean flect.New("foo") would retain the existing behaviour, but flect.New("foo", flect.PreserveCase) would have the new behaviour.

This could be nice, but since the users may use Pluralize and Singularize directly, we need to option support to those functions too in case of if we take this approach. It could be a bigger task.

How about the following fix? It will cover your issue:

  • if strings.HasSuffix(s, r.suffix) (using s instead of lowered ls) is the key to your issue
  • additional block will cover IDs and IDSes

As a side note, we used the term acronyms for both acronyms and abbreviation/initial and it is also problematic. It should be addressed in the next version too. (if we keep the concept of the module wider)

diff --git a/flect_test.go b/flect_test.go
index 848d77a..4ab6251 100644
--- a/flect_test.go
+++ b/flect_test.go
@@ -322,6 +322,28 @@ var singlePluralAssertions = []dict{
 	{"waste", "wastes", true, true},
 	{"psi", "psis", true, true},
 	{"pepsi", "pepsis", true, true},
+
+	// Acronyms
+	{"widget_uuid", "widget_uuids", true, true},
+	{"WidgetUUID", "WidgetUUIDs", true, true},
+	{"widgetUUID", "widgetUUIDs", true, true},
+	{"widgetUuid", "widgetUuids", true, true},
+	{"widget_UUID", "widget_UUIDs", true, true},
+
+	{"ID", "IDs", true, true},
+	{"IDS", "IDSes", true, true},
+	// id to ids (ID), ids to idses (IDS) is not supported
+	{"api", "apis", true, true},
+	{"API", "APIs", true, true},
+	{"html", "htmls", true, true},
+	{"HTML", "HTMLs", true, true},
+	{"FYI", "FYIs", true, true},
+	{"LAN", "LANs", true, true},
+	{"ssh", "sshs", true, true}, // sh
+	{"SSH", "SSHs", true, true},
+	{"eia", "eias", true, true}, // ia
+	{"EIA", "EIAs", true, true},
+	{"DNS", "DNSes", true, true},
 }
 
 func init() {
diff --git a/plural_rules.go b/plural_rules.go
index 53c6081..8f3df40 100644
--- a/plural_rules.go
+++ b/plural_rules.go
@@ -202,6 +202,8 @@ var dictionary = []word{
 	{singular: "shoe", plural: "shoes"},
 	{singular: "toe", plural: "toes", exact: true},
 	{singular: "graffiti", plural: "graffiti"},
+
+	{singular: "ID", plural: "IDs", exact: true},
 }
 
 // singleToPlural is the highest priority map for Pluralize().
@@ -366,6 +368,8 @@ var singularToPluralSuffixList = []singularToPluralSuffix{
 	{"o", "oes"},
 	{"x", "xes"},
 
+	{"S", "Ses"},
+
 	// excluded rules: seems rare
 	// Words from Hebrew that add -im or -ot (eg, cherub becomes cherubim)
 	// - cherub (cherubs or cherubim), seraph (seraphs or seraphim)
diff --git a/pluralize.go b/pluralize.go
index a983f72..ad640e5 100644
--- a/pluralize.go
+++ b/pluralize.go
@@ -38,6 +38,13 @@ func (i Ident) Pluralize() Ident {
 	pluralMoot.RLock()
 	defer pluralMoot.RUnlock()
 
+	if p, ok := singleToPlural[i.Original]; ok {
+		return i.ReplaceSuffix(i.Original, p)
+	}
+	if _, ok := pluralToSingle[i.Original]; ok {
+		return i
+	}
+
 	ls := strings.ToLower(s)
 	if _, ok := pluralToSingle[ls]; ok {
 		return i
@@ -51,7 +58,7 @@ func (i Ident) Pluralize() Ident {
 	}
 
 	for _, r := range pluralRules {
-		if strings.HasSuffix(ls, r.suffix) {
+		if strings.HasSuffix(s, r.suffix) {
 			return i.ReplaceSuffix(s, r.fn(s))
 		}
 	}
diff --git a/singularize.go b/singularize.go
index 9993d65..9e592dd 100644
--- a/singularize.go
+++ b/singularize.go
@@ -35,6 +35,13 @@ func (i Ident) Singularize() Ident {
 	singularMoot.RLock()
 	defer singularMoot.RUnlock()
 
+	if p, ok := pluralToSingle[i.Original]; ok {
+		return i.ReplaceSuffix(i.Original, p)
+	}
+	if _, ok := singleToPlural[i.Original]; ok {
+		return i
+	}
+
 	ls := strings.ToLower(s)
 	if p, ok := pluralToSingle[ls]; ok {
 		if s == Capitalize(s) {
@@ -48,7 +55,7 @@ func (i Ident) Singularize() Ident {
 	}
 
 	for _, r := range singularRules {
-		if strings.HasSuffix(ls, r.suffix) {
+		if strings.HasSuffix(s, r.suffix) {
 			return i.ReplaceSuffix(s, r.fn(s))
 		}
 	}

@sio4 sio4 added bug Something isn't working s: accepted was accepted or confirmed breaking change This feature/fix introduces breaking changes labels Feb 17, 2023
@theunrepentantgeek

Copy link
Copy Markdown
Author

That's a much simpler fix. Looks good to me!

Feel free to close this PR.

@sio4

sio4 commented Feb 23, 2023

Copy link
Copy Markdown
Member

Thanks for confirmation @theunrepentantgeek. Filed #66 and will close this.

@sio4 sio4 closed this Feb 23, 2023
@theunrepentantgeek theunrepentantgeek deleted the fix/acronym-doubling branch February 25, 2023 01:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

breaking change This feature/fix introduces breaking changes bug Something isn't working s: accepted was accepted or confirmed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants