Skip to content

Add latin mode to slugify#6509

Merged
jekyllbot merged 9 commits intojekyll:masterfrom
alextsui05:transliterate-on-slugify-ascii
Nov 3, 2017
Merged

Add latin mode to slugify#6509
jekyllbot merged 9 commits intojekyll:masterfrom
alextsui05:transliterate-on-slugify-ascii

Conversation

@alextsui05
Copy link
Copy Markdown
Contributor

@alextsui05 alextsui05 commented Nov 1, 2017

This is in response to #4623

To summarize the discussion in the above issue, trying to slugify strings with Swedish letters in them (accented characters in general) produced unexpected output. ascii mode does the job but drops accented characters, which is probably undesirable to some users. So we add latin mode, which will transliterate accented characters to the ASCII equivalent.

Example:

{{ 'kvalité, då, äta, öl' | slugify: 'ascii' }} # -> kvalit-d-ta-l
{{ 'kvalité, då, äta, öl' | slugify: 'latin' }} # -> kvalite-da-ata-ol

Changes

  • Add a new latin mode to be used as an option for #slugify
  • Add runtime dependency on i18n gem to perform the transliteration
  • Update the documentation to demonstrate latin mode, and also ascii mode, which existed but was not documented

Documentation screenshots

vmplayer_2017-11-02_18-04-38

vmplayer_2017-11-02_18-05-00

@DirtyF DirtyF requested a review from a team November 1, 2017 09:00
@ashmaroli
Copy link
Copy Markdown
Member

Hello @alextsui05 from a first-pass, it looks like you're calling I18n::transliterate instead of ActiveSupport::Inflector::transliterate. Do correct me if I'm wrong, but I like using I18n directly so that we need not pull in the entire ActiveSupport library but can instead depend on gem "i18n" directly.

That said, from the failing CI logs, it looks like the intended results are being delivered, but for the maintainers to accept this, they'd like to have the tests passing and have this change documented because suddenly users of the :ascii mode are going to find their slugs different.. ergo, a broken-link and the 404 fiasco..

@pathawks
Copy link
Copy Markdown
Member

pathawks commented Nov 1, 2017

because suddenly users of the :ascii mode are going to find their slugs different.. ergo, a broken-link and the 404 fiasco..

Which makes this a major change rather than an enhancement, correct?

@ashmaroli
Copy link
Copy Markdown
Member

Which makes this a major change rather than an enhancement, correct?

As of now, yes, but if the author proposes a new mode altogether, then it can still be an enhancement..

@ashmaroli
Copy link
Copy Markdown
Member

@alextsui05 You'll need to make a couple of changes:

  • Jekyll uses double-quotes everywhere
  • You may want to consider introducing a new slugify mode so that existing :ascii mode slugs do not break. (I'm thinking maybe, :ascii_plus, is nice, but its open to your creativity..)

Gemfile Outdated

# Dependency of jekyll-mentions. RubyGems in Ruby 2.1 doesn't shield us from this.
gem "activesupport", "~> 4.2", :groups => [:test_legacy, :site] if RUBY_VERSION < "2.2.2"
gem 'activesupport', '~> 4.2', :groups => [:test_legacy, :site] if RUBY_VERSION < "2.2.2"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our Rubocop config is set to double quotes

@alextsui05
Copy link
Copy Markdown
Contributor Author

I realized active_support is a heavy requirement, so I took @ashmaroli 's suggestion and updated the code to use i18n (+ updated the test).

The more I think about it, maybe it would be better for compatibility + clearer to define a new mode that is more explicit about what it does -- maybe slugify: 'latin' gives a good hint to the user.

I18n.transliterate('á, à, ç, é, è, í, ï, ó, ò, ú, ü, n·h, s·h.')
 => "a, a, c, e, e, i, i, o, o, u, u, n?h, s?h."

I'll make another pass at the code at the end of the day to address the style issues too. Thanks for the help.

@alextsui05 alextsui05 changed the title Add transliterate step on slugify with ascii mode Add latin mode to slugify Nov 2, 2017
@alextsui05
Copy link
Copy Markdown
Contributor Author

I updated the PR to make it clear that we are adding a new latin mode rather than modifying the ascii mode. The CI tests are still winding down but I believe style issues are all ironed out.

# See Utils#slugify for a description of the character sequence specified
# by each mode.
private
def replace_character_sequence_with_hyphen(string, mode:)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a big fan of the long name :replace_character_sequence_with_hyphen but its a private method and the name expresses the method's intention.. so its okay I guess..

But, how about making the :default mode the default parameter?

private
def replace_character_sequence_with_hyphen(string, mode: "default")
  replaceable_char =
    case mode
    when "raw"
      SLUGIFY_RAW_REGEXP
    when "pretty"
      # "._~!$&'()+,;=@" is human readable (not URI-escaped) in URL
      # and is allowed in both extN and NTFS.
      SLUGIFY_PRETTY_REGEXP
    when "ascii"
      # For web servers not being able to handle Unicode, the safe
      # method is to ditch anything else but latin letters and numeric
      # digits.
      SLUGIFY_ASCII_REGEXP
    else
      SLUGIFY_DEFAULT_REGEXP
    end

  string.gsub(replaceable_char, "-")
end

(and effectively reduce the total no.of case branches`)

Copy link
Copy Markdown

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but i like @ashmaroli's suggestion

Copy link
Copy Markdown
Member

@DirtyF DirtyF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

La France 🇫🇷 🥖 🧀 thanks you for this new latin mode @alextsui05 ❤️❤️ ❤️

@alextsui05
Copy link
Copy Markdown
Contributor Author

I took @ashmaroli 's advice and simplified the switch statement.

Thanks all for the reviews and help!

@ashmaroli
Copy link
Copy Markdown
Member

@alextsui05 Thank you very much! This is already an awesome First-Contribution to Jekyll! 🎉
Congratulations in advance! 😃

Wanna take it to a higher level?
If yes, you could document this new mode by editing this section and table right above


On a side-note, if you've the time, you can open a new PR documenting the ascii mode

Copy link
Copy Markdown
Member

@mattr- mattr- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking great! Thanks for your work on this!

I do have two small changes that I request you make before I feel this is ready to merge.

# slugify("The _config.yml file", "ascii")
# # => "the-config.yml-file"
#
# slugify("The _config.yml file", "latin")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to see the example here demonstrate the difference between the ascii mode and the latin mode. This will make it clearer as to why the latin mode exists.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattr- Yeah, I thought so, too. To really demonstrate the difference, the input string would need an accented character, but when I tried that, one of the builds failed with a note about non-ASCII characters disallowed in comments. I would point out the failed build but I'm not sure how to go through the build history for this PR in TravisCI.


# Strip according to the mode
slug = string.gsub(re, "-")
slug = replace_character_sequence_with_hyphen(string, :mode => mode)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the symbol syntax would be better here, so that you're consistent with the rest of the codebase.

mode: mode

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattr- Another style error in the build came up here as well, something about using hash rocket syntax, so I wrote it this way.

@mattr-
Copy link
Copy Markdown
Member

mattr- commented Nov 2, 2017

Related to my above review comment, I would prefer that the documentation work for the latin filter be done as part of this PR as well. The documentation for the ascii mode isn't required since it's an existing part of the codebase.

@alextsui05
Copy link
Copy Markdown
Contributor Author

@mattr- Thanks, I'll update the documentation in the docs/ subdirectory after the work day is over today. Also, I responded to the review comments about what I ran into, so if you could advise me on what to do there, I'll also take care of those changes.

@mattr-
Copy link
Copy Markdown
Member

mattr- commented Nov 2, 2017

@alextsui05 If making those changes would cause the build to fail, then let's not do those right now. I think that's on us as maintainers to fix those. 😃

Looking forward to seeing the new documentation and getting this merged soon! Thanks!
bitmoji

@ashmaroli ashmaroli added this to the v3.7.0 milestone Nov 2, 2017
@ashmaroli
Copy link
Copy Markdown
Member

I think its best that you make only changes related to the latin mode in this PR.. everything else in a separate PR..

@alextsui05
Copy link
Copy Markdown
Contributor Author

@mattr- I just pushed a documentation update that includes a description of the new latin mode. While I was at it, I also added a description for ascii mode that @ashmaroli mentioned -- it was existing but was not actually in the documentation.

I posted screenshots on the main body of the PR for the sections that changed. Once that's been finalized, I'll also update the table in the link @ashmaroli posted in an earlier comment.

@ashmaroli
Copy link
Copy Markdown
Member

I'll also update the table in the link

@alextsui05 Don't worry about it. The "table" has already been updated with your latest commit..

@ashmaroli
Copy link
Copy Markdown
Member

it was existing but was not actually in the documentation.

again, that should be handled in a separate PR..

@alextsui05
Copy link
Copy Markdown
Contributor Author

Okay, let me split out the ascii part of the documentation into a separate commit and put it in a new PR.

@mattr-
Copy link
Copy Markdown
Member

mattr- commented Nov 3, 2017

Okay, let me split out the ascii part of the documentation into a separate commit and put it in a new PR.

Please don't. That's just churn for no good reason.

Copy link
Copy Markdown
Member

@pathawks pathawks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the work you put into this! Really appreciate all the polish 👍

@ashmaroli
Copy link
Copy Markdown
Member

That's just churn for no good reason.

¯\_(ツ)_/¯

😆

@alextsui05
Copy link
Copy Markdown
Contributor Author

On a side-note, if you've the time, you can open a new PR documenting the ascii mode

@ashmaroli Sorry, I remembered the part about documenting ascii mode but forgot the part about in a new PR (this gets me at work, too). I'll keep it in mind for the next PR!

@mattr-
Copy link
Copy Markdown
Member

mattr- commented Nov 3, 2017

I see no reason for the appveyor build to fail since all I did was merge in master so that we can merge this, so away we go!

@jekyllbot: merge +minor

@alextsui05 Thank you again for making the changes we've requested and congratulations on your first contribution to Jekyll!
bitmoji

@jekyllbot jekyllbot merged commit 93e3eb0 into jekyll:master Nov 3, 2017
jekyllbot added a commit that referenced this pull request Nov 3, 2017
@DirtyF DirtyF mentioned this pull request Nov 12, 2017
2 tasks
@jekyll jekyll locked and limited conversation to collaborators Jul 12, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants