fix(url,editor): locale-aware slug preview for German umlauts and other digraphs #76248
apermo wants to merge 2 commits into WordPress:trunk
…raphs
The slug preview in the block editor was wrong for locales that require digraph replacements rather than simple diacritic stripping. For de_DE, ä became 'a' instead of 'ae', ö became 'o' instead of 'oe', etc.
This caused a data-integrity problem: when a user clicks the slug field to edit it (standard SEO workflow), the field was pre-filled with the wrong JS-generated value, permanently baking in wrong transliterations.
Fix: add an optional locale param to cleanForSlug() in @wordpress/url, applying the same digraph rules as PHP remove_accents() before the removeAccents() call. The PostURL component reads getSite().language from the core store and passes it through.
Locales covered (mirroring PHP formatting.php lines 1957-1989):
- de_* (de_DE, de_CH, de_AT): ä->ae, ö->oe, ü->ue, ß->ss
- da_DK: æ->ae, ø->oe, å->aa
- ca: l·l->ll
- sr_RS, bs_BA: Đ->DJ
Fixes WordPress#12907
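The ordering described above (digraph replacement first, generic accent stripping second) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `LOCALE_DIGRAPHS`, `applyLocaleDigraphs`, `stripDiacritics`, and `slugPreview` are hypothetical names, `stripDiacritics` is a Unicode-normalization stand-in for the `remove-accents` package, and the table contains only the mappings listed in the commit message (uppercase variants are left out).

```javascript
// Hypothetical digraph table, limited to the mappings named in the commit message.
const LOCALE_DIGRAPHS = [
	{ test: ( locale ) => locale.startsWith( 'de' ), map: { 'ä': 'ae', 'ö': 'oe', 'ü': 'ue', 'ß': 'ss' } },
	{ test: ( locale ) => locale === 'da_DK', map: { 'æ': 'ae', 'ø': 'oe', 'å': 'aa' } },
	{ test: ( locale ) => locale === 'ca', map: { 'l·l': 'll' } },
	{ test: ( locale ) => locale === 'sr_RS' || locale === 'bs_BA', map: { 'Đ': 'DJ' } },
];

// Apply locale-specific replacements; returns the input unchanged when the
// locale is empty or not in the table.
function applyLocaleDigraphs( input, locale = '' ) {
	const entry = LOCALE_DIGRAPHS.find( ( { test } ) => test( locale ) );
	if ( ! entry ) {
		return input;
	}
	// Normalize to composed form so that "u + combining diaeresis" also matches.
	let output = input.normalize( 'NFC' );
	for ( const [ from, to ] of Object.entries( entry.map ) ) {
		output = output.split( from.normalize( 'NFC' ) ).join( to );
	}
	return output;
}

// Stand-in for removeAccents(): decompose, then drop combining marks.
function stripDiacritics( input ) {
	return input.normalize( 'NFD' ).replace( /[\u0300-\u036f]/g, '' );
}

// Simplified slug pipeline: digraphs first, then generic stripping.
function slugPreview( title, locale = '' ) {
	return stripDiacritics( applyLocaleDigraphs( title, locale ) )
		.toLowerCase()
		.replace( /[^a-z0-9]+/g, '-' )
		.replace( /^-+|-+$/g, '' );
}

console.log( slugPreview( 'Künstler überraschen Hörer', 'de_DE' ) );
console.log( slugPreview( 'Künstler überraschen Hörer' ) );
```

If the two steps ran in the opposite order, ü would already be reduced to u before the German rule could see it, which is the behaviour this PR sets out to fix.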
|
Warning: Type of PR label mismatch To merge this PR, it requires exactly 1 label indicating the type of PR. Other labels are optional and not being checked here.
Read more about Type labels in Gutenberg. Don't worry if you don't have the required permissions to add labels; the PR reviewer should be able to help with the task. |
@fabiankaegy it would be especially nice to have your review; as a German, you're destined to test it :) |
|
Note: I've opened a Trac ticket and a PR on Core to add the uppercase Eszett. Depending on how quickly that PR is accepted, we should consider adding the mapping here too. WordPress/wordpress-develop#11188 |
|
As @dmsnell already updated my PR for the |
Mirrors WordPress/wordpress-develop#11188 which adds the same mapping to PHP remove_accents(). ẞ (U+1E9E) was standardized in German orthography in 2017 (DIN 5008) and should map to SS, not fall through to URL-encoded output.
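A minimal sketch of what this mapping amounts to on the JS side, assuming a plain string replacement gated on a `de` locale prefix. `replaceCapitalEszett` is a hypothetical name used only for illustration, not a function from this PR.

```javascript
// Hypothetical helper: map capital Eszett (U+1E9E) to "SS" for German locales.
function replaceCapitalEszett( input, locale = '' ) {
	return locale.startsWith( 'de' ) ? input.replace( /\u1E9E/g, 'SS' ) : input;
}

// With the mapping, the character becomes plain ASCII.
console.log( replaceCapitalEszett( 'STRA\u1E9EE', 'de_DE' ) ); // STRASSE

// Without it, percent-encoding the raw character yields its UTF-8 bytes.
console.log( encodeURIComponent( '\u1E9E' ) ); // %E1%BA%9E
```

In JS the replacement would need to run before any lowercasing step, because `'\u1E9E'.toLowerCase()` yields the lowercase ß and the capital form would no longer be matched.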
What?
Closes #12907
Adds an optional locale parameter to cleanForSlug() in @wordpress/url so the slug preview in the block editor matches what WordPress generates server-side for locales that require digraph replacements.
Why?
The slug preview has been wrong for German (and other) locales since the block editor launched. ä becomes a instead of ae, ö becomes o instead of oe, etc. This is not just cosmetic:
Data integrity bug: When a user clicks the slug field to edit it (standard SEO workflow: shorten/optimize the auto-generated slug), the field is pre-filled with the wrong JS-generated value. Any manual edit at that point bakes the wrong transliteration in permanently. WordPress locks the slug after the first manual edit, so the post has the wrong URL for its entire lifetime.
This was the original 2019 report and is still present today.
How?
cleanForSlug() in @wordpress/url uses the remove-accents npm package, which has no locale concept. PHP's remove_accents() handles this via a locale check (str_starts_with($locale, 'de'), etc.). The fix mirrors that exact locale block in JS, running the digraph replacements before removeAccents() so that ü becomes ue rather than being stripped to u.
The PostURL component reads getSite().language from the core store and passes it to getEditedPostSlug() and to the onBlur sanitization call.
Locales covered (mirroring wp-includes/formatting.php lines 1957–1989):
- de_*
- da_DK
- ca
- sr_RS, bs_BA
The capital Eszett (ẞ, U+1E9E) mapping is included in anticipation of WordPress/wordpress-develop#11188, which adds the same mapping to PHP's remove_accents().
All other locales are unaffected: the locale step is a no-op when locale is empty or unrecognised, preserving existing behaviour exactly.
All expected values were verified against remove_accents( $input, $locale ) + sanitize_title_with_dashes() running inside wp-env. A dedicated parity test is at packages/url/src/test/compare-slug-php-parity.test.js.
Testing Instructions
Künstler überraschen Hörer
kuenstler-ueberraschen-hoerer immediately ✓
To verify the data integrity fix:
kuenstler-ueberraschen-hoerer (not kunstler-uberraschen-horer)
Testing Instructions for Keyboard
Navigate to the Permalink panel via Tab, enter the title, then Tab into the slug field and verify the pre-filled value is correct.
Research and implementation by Claude (Anthropic). Reviewed, tested, and steered by @apermo.