Add latin mode to slugify by alextsui05 · Pull Request #6509 · jekyll/jekyll

alextsui05 · 2017-11-01T05:44:07Z

This is in response to #4623

To summarize the discussion in the above issue, trying to slugify strings with Swedish letters in them (accented characters in general) produced unexpected output. ascii mode does the job but drops accented characters, which is probably undesirable to some users. So we add latin mode, which will transliterate accented characters to the ASCII equivalent.

Example:

{{ 'kvalité, då, äta, öl' | slugify: 'ascii' }} # -> kvalit-d-ta-l
{{ 'kvalité, då, äta, öl' | slugify: 'latin' }} # -> kvalite-da-ata-ol

Changes

Add a new latin mode to be used as an option for #slugify
Add runtime dependency on i18n gem to perform the transliteration
Update the documentation to demonstrate latin mode, and also ascii mode, which existed but was not documented

Documentation screenshots

ashmaroli · 2017-11-01T09:13:19Z

Hello @alextsui05 from a first-pass, it looks like you're calling I18n::transliterate instead of ActiveSupport::Inflector::transliterate. Do correct me if I'm wrong, but I like using I18n directly so that we need not pull in the entire ActiveSupport library but can instead depend on gem "i18n" directly.

That said, from the failing CI logs, it looks like the intended results are being delivered, but for the maintainers to accept this, they'd like to have the tests passing and have this change documented because suddenly users of the :ascii mode are going to find their slugs different.. ergo, a broken-link and the 404 fiasco..

pathawks · 2017-11-01T14:03:07Z

because suddenly users of the :ascii mode are going to find their slugs different.. ergo, a broken-link and the 404 fiasco..

Which makes this a major change rather than an enhancement, correct?

ashmaroli · 2017-11-01T14:07:27Z

Which makes this a major change rather than an enhancement, correct?

As of now, yes, but if the author proposes a new mode altogether, then it can still be an enhancement..

ashmaroli · 2017-11-01T14:40:22Z

@alextsui05 You'll need to make a couple of changes:

Jekyll uses double-quotes everywhere
You may want to consider introducing a new slugify mode so that existing :ascii mode slugs do not break. (I'm thinking maybe, :ascii_plus, is nice, but its open to your creativity..)

DirtyF · 2017-11-01T14:41:46Z

Gemfile


 # Dependency of jekyll-mentions. RubyGems in Ruby 2.1 doesn't shield us from this.
-gem "activesupport", "~> 4.2", :groups => [:test_legacy, :site] if RUBY_VERSION < "2.2.2"
+gem 'activesupport', '~> 4.2', :groups => [:test_legacy, :site] if RUBY_VERSION < "2.2.2"


Our Rubocop config is set to double quotes

alextsui05 · 2017-11-01T14:43:31Z

I realized active_support is a heavy requirement, so I took @ashmaroli 's suggestion and updated the code to use i18n (+ updated the test).

The more I think about it, maybe it would be better for compatibility + clearer to define a new mode that is more explicit about what it does -- maybe slugify: 'latin' gives a good hint to the user.

I18n.transliterate('á, à, ç, é, è, í, ï, ó, ò, ú, ü, n·h, s·h.')
 => "a, a, c, e, e, i, i, o, o, u, u, n?h, s?h."

I'll make another pass at the code at the end of the day to address the style issues too. Thanks for the help.

alextsui05 · 2017-11-02T05:07:07Z

I updated the PR to make it clear that we are adding a new latin mode rather than modifying the ascii mode. The CI tests are still winding down but I believe style issues are all ironed out.

ashmaroli · 2017-11-02T10:18:43Z

lib/jekyll/utils.rb

+    # See Utils#slugify for a description of the character sequence specified
+    # by each mode.
+    private
+    def replace_character_sequence_with_hyphen(string, mode:)


not a big fan of the long name :replace_character_sequence_with_hyphen but its a private method and the name expresses the method's intention.. so its okay I guess..

But, how about making the :default mode the default parameter?

private def replace_character_sequence_with_hyphen(string, mode: "default") replaceable_char = case mode when "raw" SLUGIFY_RAW_REGEXP when "pretty" # "._~!$&'()+,;=@" is human readable (not URI-escaped) in URL # and is allowed in both extN and NTFS. SLUGIFY_PRETTY_REGEXP when "ascii" # For web servers not being able to handle Unicode, the safe # method is to ditch anything else but latin letters and numeric # digits. SLUGIFY_ASCII_REGEXP else SLUGIFY_DEFAULT_REGEXP end string.gsub(replaceable_char, "-") end

(and effectively reduce the total no.of case branches`)

ghost

LGTM but i like @ashmaroli's suggestion

DirtyF

La France 🇫🇷 🥖 🧀 thanks you for this new latin mode @alextsui05 ❤️❤️ ❤️

alextsui05 · 2017-11-02T13:24:52Z

I took @ashmaroli 's advice and simplified the switch statement.

Thanks all for the reviews and help!

ashmaroli · 2017-11-02T13:42:22Z

@alextsui05 Thank you very much! This is already an awesome First-Contribution to Jekyll! 🎉
Congratulations in advance! 😃

Wanna take it to a higher level?
If yes, you could document this new mode by editing this section and table right above

On a side-note, if you've the time, you can open a new PR documenting the ascii mode

mattr-

This is looking great! Thanks for your work on this!

I do have two small changes that I request you make before I feel this is ready to merge.

mattr- · 2017-11-02T13:52:25Z

lib/jekyll/utils.rb

    #   slugify("The _config.yml file", "ascii")
    #   # => "the-config.yml-file"
    #
+    #   slugify("The _config.yml file", "latin")


I think it would be better to see the example here demonstrate the difference between the ascii mode and the latin mode. This will make it clearer as to why the latin mode exists.

@mattr- Yeah, I thought so, too. To really demonstrate the difference, the input string would need an accented character, but when I tried that, one of the builds failed with a note about non-ASCII characters disallowed in comments. I would point out the failed build but I'm not sure how to go through the build history for this PR in TravisCI.

mattr- · 2017-11-02T13:57:04Z

lib/jekyll/utils.rb


-      # Strip according to the mode
-      slug = string.gsub(re, "-")
+      slug = replace_character_sequence_with_hyphen(string, :mode => mode)


Using the symbol syntax would be better here, so that you're consistent with the rest of the codebase.

mode: mode

@mattr- Another style error in the build came up here as well, something about using hash rocket syntax, so I wrote it this way.

mattr- · 2017-11-02T14:05:32Z

Related to my above review comment, I would prefer that the documentation work for the latin filter be done as part of this PR as well. The documentation for the ascii mode isn't required since it's an existing part of the codebase.

alextsui05 · 2017-11-02T14:15:47Z

@mattr- Thanks, I'll update the documentation in the docs/ subdirectory after the work day is over today. Also, I responded to the review comments about what I ran into, so if you could advise me on what to do there, I'll also take care of those changes.

mattr- · 2017-11-02T14:26:41Z

@alextsui05 If making those changes would cause the build to fail, then let's not do those right now. I think that's on us as maintainers to fix those. 😃

Looking forward to seeing the new documentation and getting this merged soon! Thanks!

ashmaroli · 2017-11-03T02:11:16Z

I think its best that you make only changes related to the latin mode in this PR.. everything else in a separate PR..

alextsui05 · 2017-11-03T02:14:09Z

@mattr- I just pushed a documentation update that includes a description of the new latin mode. While I was at it, I also added a description for ascii mode that @ashmaroli mentioned -- it was existing but was not actually in the documentation.

I posted screenshots on the main body of the PR for the sections that changed. Once that's been finalized, I'll also update the table in the link @ashmaroli posted in an earlier comment.

ashmaroli · 2017-11-03T02:27:27Z

I'll also update the table in the link

@alextsui05 Don't worry about it. The "table" has already been updated with your latest commit..

ashmaroli · 2017-11-03T02:44:59Z

it was existing but was not actually in the documentation.

again, that should be handled in a separate PR..

alextsui05 · 2017-11-03T02:45:49Z

Okay, let me split out the ascii part of the documentation into a separate commit and put it in a new PR.

mattr- · 2017-11-03T02:46:50Z

Okay, let me split out the ascii part of the documentation into a separate commit and put it in a new PR.

Please don't. That's just churn for no good reason.

pathawks

Thanks for all the work you put into this! Really appreciate all the polish 👍

ashmaroli · 2017-11-03T02:58:19Z

That's just churn for no good reason.

¯\_(ツ)_/¯

😆

alextsui05 · 2017-11-03T03:03:12Z

On a side-note, if you've the time, you can open a new PR documenting the ascii mode

@ashmaroli Sorry, I remembered the part about documenting ascii mode but forgot the part about in a new PR (this gets me at work, too). I'll keep it in mind for the next PR!

mattr- · 2017-11-03T03:07:25Z

I see no reason for the appveyor build to fail since all I did was merge in master so that we can merge this, so away we go!

@jekyllbot: merge +minor

@alextsui05 Thank you again for making the changes we've requested and congratulations on your first contribution to Jekyll!

Add transliterate step on slugify with ascii mode

73a0271

alextsui05 mentioned this pull request Nov 1, 2017

Slugify a string doesn't seem to work on Unicode/Swedish letters #4623

Closed

Merge branch 'master' into transliterate-on-slugify-ascii

498f1fa

DirtyF requested a review from a team November 1, 2017 09:00

DirtyF added the enhancement label Nov 1, 2017

Use i18n for transliterate + update test

25dd4cd

DirtyF reviewed Nov 1, 2017

View reviewed changes

Add latin mode to slugify

a5cfbb1

alextsui05 changed the title ~~Add transliterate step on slugify with ascii mode~~ Add latin mode to slugify Nov 2, 2017

Fix style issues

d122ec6

ashmaroli reviewed Nov 2, 2017

View reviewed changes

ghost approved these changes Nov 2, 2017

View reviewed changes

DirtyF approved these changes Nov 2, 2017

View reviewed changes

Simplify switch statement

7d2b95e

mattr- requested changes Nov 2, 2017

View reviewed changes

ashmaroli added this to the v3.7.0 milestone Nov 2, 2017

alextsui05 added 2 commits November 2, 2017 19:05

Correct example in slugify comment

02d653b

Update slugify documentation

f057c14

mattr- approved these changes Nov 3, 2017

View reviewed changes

Merge branch 'master' into transliterate-on-slugify-ascii

a94858b

pathawks approved these changes Nov 3, 2017

View reviewed changes

jekyllbot merged commit 93e3eb0 into jekyll:master Nov 3, 2017

jekyllbot added the enhancement label Nov 3, 2017

jekyllbot added a commit that referenced this pull request Nov 3, 2017

Update history to reflect merge of #6509 [ci skip]

fc272d4

DirtyF mentioned this pull request Nov 12, 2017

slugify option 'latin' unreleased #6549

Closed

2 tasks

pathawks mentioned this pull request Feb 20, 2018

Do not hardcode locale unless certainly necessary #6791

Merged

jekyll locked and limited conversation to collaborators Jul 12, 2019

jekyllbot added the frozen-due-to-age label Jul 12, 2019

Uh oh!

Conversation

alextsui05 commented Nov 1, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Example:

Changes

Documentation screenshots

Uh oh!

ashmaroli commented Nov 1, 2017

Uh oh!

pathawks commented Nov 1, 2017

Uh oh!

ashmaroli commented Nov 1, 2017

Uh oh!

ashmaroli commented Nov 1, 2017

Uh oh!

DirtyF Nov 1, 2017

Choose a reason for hiding this comment

Uh oh!

alextsui05 commented Nov 1, 2017

Uh oh!

alextsui05 commented Nov 2, 2017

Uh oh!

ashmaroli Nov 2, 2017

Choose a reason for hiding this comment

Uh oh!

ghost left a comment

Choose a reason for hiding this comment

Uh oh!

DirtyF left a comment

Choose a reason for hiding this comment

Uh oh!

alextsui05 commented Nov 2, 2017

Uh oh!

ashmaroli commented Nov 2, 2017

Uh oh!

mattr- left a comment

Choose a reason for hiding this comment

Uh oh!

mattr- Nov 2, 2017

Choose a reason for hiding this comment

Uh oh!

alextsui05 Nov 2, 2017

Choose a reason for hiding this comment

Uh oh!

mattr- Nov 2, 2017

Choose a reason for hiding this comment

Uh oh!

alextsui05 Nov 2, 2017

Choose a reason for hiding this comment

Uh oh!

mattr- commented Nov 2, 2017

Uh oh!

alextsui05 commented Nov 2, 2017

Uh oh!

mattr- commented Nov 2, 2017

Uh oh!

ashmaroli commented Nov 3, 2017

Uh oh!

alextsui05 commented Nov 3, 2017

Uh oh!

ashmaroli commented Nov 3, 2017

Uh oh!

ashmaroli commented Nov 3, 2017

Uh oh!

alextsui05 commented Nov 3, 2017

Uh oh!

mattr- commented Nov 3, 2017

Uh oh!

pathawks left a comment

Choose a reason for hiding this comment

Uh oh!

ashmaroli commented Nov 3, 2017

Uh oh!

alextsui05 commented Nov 3, 2017

Uh oh!

mattr- commented Nov 3, 2017

Uh oh!

Reviewers

Assignees

alextsui05 commented Nov 1, 2017 •

edited

Loading