Fix for Case Correction of Acronyms causing doubling#65
Fix for Case Correction of Acronyms causing doubling#65theunrepentantgeek wants to merge 7 commits into
Conversation
|
Hi @theunrepentantgeek, Indeed, the current behavior is not correct and it should be fixed! However, I have a few different points of view. First of all, Flect cannot be a general inflection module from the natural language perspective. It should be limited to specific use cases and we need to understand the limitation. You fixed the wrong behavior by fixing
I don't think this is good behavior. The issue started here, and I think it is the user's choice if they use
Exactly! So I would like to fix it as:
Actually, there was heavy work on #62 a few weeks ago, and after that, my plan for this module became to make the purpose of this module clearer and simpler, then fix things aligned with the goal. Even though PR should be a space for reviewing not for discussion, I would love to hear your opinion or updated PR, also I would love to discuss it in a separate issue if you want to continue to discuss the topic further. |
|
Thanks for the quick response @sio4. I started writing an issue and realized the easiest way to demonstrate the issue was with a snippet of code, hence the PR. The behavour you've described makes sense to me; I'll have a go at doing that and will update this PR. |
|
@sio4 I've updated the PR - it includes tests, and I've linked to that doc you referenced so future developers know why things are done this way. |
|
Thanks! Will take a look as soon as possible! |
|
Hi, I took a look at the PR and it seems like there could be a compatibility issue. (not to uppercase them all) This is my fault, I thought/think we should preserve the user's input as-is but I need to check if this change will break its users. Flect is mainly used for runtime automation for keywords (mostly DBMS column names, model names, etc) and this behavior change will cause compatibility issues with existing users. For the behavior change, we need to consider it deeper later and need to decide whether to fix it or keep it, but now we need to keep it the same. Actually, we completed a milestone and I am about to develop the next plan with breaking changes, so this issue could be added to the TODO items. Sorry for the time to fix the code. By the way, what is the original issue that you wanted to fix? I would like to help you with the particular issue. |
FWIW, this is similar to my use case - I'm working on a code generator which consumes OpenAPI definitions, generating Go code. I totally understand your reservations about breaking-changes. 😃 |
|
What about providing an option to retain the users letter case for acronyms, but with a default that preserves the current behaviour for existing users? One way to achieve this would be to extend This would mean Not actually sure if this is a good approach, but figured the idea was worth sharing. What do you think? |
|
Thanks for your kind response and suggestions! I really love to hear that!
Indeed, that should be fixed regardless of the design issue.
Thanks for your kind understanding. The design issue should be fixed in the next major version update. (with breaking changes)
This could be nice, but since the users may use How about the following fix? It will cover your issue:
As a side note, we used the term acronyms for both acronyms and abbreviation/initial and it is also problematic. It should be addressed in the next version too. (if we keep the concept of the module wider) diff --git a/flect_test.go b/flect_test.go
index 848d77a..4ab6251 100644
--- a/flect_test.go
+++ b/flect_test.go
@@ -322,6 +322,28 @@ var singlePluralAssertions = []dict{
{"waste", "wastes", true, true},
{"psi", "psis", true, true},
{"pepsi", "pepsis", true, true},
+
+ // Acronyms
+ {"widget_uuid", "widget_uuids", true, true},
+ {"WidgetUUID", "WidgetUUIDs", true, true},
+ {"widgetUUID", "widgetUUIDs", true, true},
+ {"widgetUuid", "widgetUuids", true, true},
+ {"widget_UUID", "widget_UUIDs", true, true},
+
+ {"ID", "IDs", true, true},
+ {"IDS", "IDSes", true, true},
+ // id to ids (ID), ids to idses (IDS) is not supported
+ {"api", "apis", true, true},
+ {"API", "APIs", true, true},
+ {"html", "htmls", true, true},
+ {"HTML", "HTMLs", true, true},
+ {"FYI", "FYIs", true, true},
+ {"LAN", "LANs", true, true},
+ {"ssh", "sshs", true, true}, // sh
+ {"SSH", "SSHs", true, true},
+ {"eia", "eias", true, true}, // ia
+ {"EIA", "EIAs", true, true},
+ {"DNS", "DNSes", true, true},
}
func init() {
diff --git a/plural_rules.go b/plural_rules.go
index 53c6081..8f3df40 100644
--- a/plural_rules.go
+++ b/plural_rules.go
@@ -202,6 +202,8 @@ var dictionary = []word{
{singular: "shoe", plural: "shoes"},
{singular: "toe", plural: "toes", exact: true},
{singular: "graffiti", plural: "graffiti"},
+
+ {singular: "ID", plural: "IDs", exact: true},
}
// singleToPlural is the highest priority map for Pluralize().
@@ -366,6 +368,8 @@ var singularToPluralSuffixList = []singularToPluralSuffix{
{"o", "oes"},
{"x", "xes"},
+ {"S", "Ses"},
+
// excluded rules: seems rare
// Words from Hebrew that add -im or -ot (eg, cherub becomes cherubim)
// - cherub (cherubs or cherubim), seraph (seraphs or seraphim)
diff --git a/pluralize.go b/pluralize.go
index a983f72..ad640e5 100644
--- a/pluralize.go
+++ b/pluralize.go
@@ -38,6 +38,13 @@ func (i Ident) Pluralize() Ident {
pluralMoot.RLock()
defer pluralMoot.RUnlock()
+ if p, ok := singleToPlural[i.Original]; ok {
+ return i.ReplaceSuffix(i.Original, p)
+ }
+ if _, ok := pluralToSingle[i.Original]; ok {
+ return i
+ }
+
ls := strings.ToLower(s)
if _, ok := pluralToSingle[ls]; ok {
return i
@@ -51,7 +58,7 @@ func (i Ident) Pluralize() Ident {
}
for _, r := range pluralRules {
- if strings.HasSuffix(ls, r.suffix) {
+ if strings.HasSuffix(s, r.suffix) {
return i.ReplaceSuffix(s, r.fn(s))
}
}
diff --git a/singularize.go b/singularize.go
index 9993d65..9e592dd 100644
--- a/singularize.go
+++ b/singularize.go
@@ -35,6 +35,13 @@ func (i Ident) Singularize() Ident {
singularMoot.RLock()
defer singularMoot.RUnlock()
+ if p, ok := pluralToSingle[i.Original]; ok {
+ return i.ReplaceSuffix(i.Original, p)
+ }
+ if _, ok := singleToPlural[i.Original]; ok {
+ return i
+ }
+
ls := strings.ToLower(s)
if p, ok := pluralToSingle[ls]; ok {
if s == Capitalize(s) {
@@ -48,7 +55,7 @@ func (i Ident) Singularize() Ident {
}
for _, r := range singularRules {
- if strings.HasSuffix(ls, r.suffix) {
+ if strings.HasSuffix(s, r.suffix) {
return i.ReplaceSuffix(s, r.fn(s))
}
} |
|
That's a much simpler fix. Looks good to me! Feel free to close this PR. |
|
Thanks for confirmation @theunrepentantgeek. Filed #66 and will close this. |
What is being done in this PR?
I discovered an edge case in the interaction of two features of
flect.If your string matches a known acronym, it's case-corrected to all upper-case.
If the end of your string also matches one of the rules for pluralization,
flectattempts to replace the end of the string.However, the suffix-to-remove comes from the original string, not the case corrected part - so the call to
strings.TrimSuffix()doesn't remove the old suffix.The results are a little odd. The two examples I found, when calling
flect.Pluralize()are:What are the main choices made to get to this solution?
The first commit adds two test cases to show the bug in action. I'm not 100% convinced that the plural's I've used are correct, but a unit-test at least makes the bug easy to verify.The second commit makes the suffix-removing behaviour ofIdent.ReplaceSuffix()case insensitive; this seems to fix the bug without breaking anything else.After discussion, I've implemented preferred rules; this required a little bit of work to avoid inadvertently using the non-acronym rules at the wrong time. E.g. the Singular form of
EIAis notEIUM.What was discovered while working on it? (Optional)
I discovered the design choice that acronyms are always converted to upper case, which surprised me - I expectedflectto preserve letter case (e.g. I expectedflect.Pluralize("html")to return either `"htmls"' or '"html"' but not '"HTML"').List the manual test cases you've covered before sending this PR:
I've reproduced my manual test cases as unit tests in
flect_test.go