One of the features I found extremely useful in XSLT/XPath/XQuery was whitespace normalisation within a string: on top of trimming the string, any run of consecutive whitespace characters gets replaced by a single space character.
For example, the string hello\r\nworld!\t would simply be normalised to hello world!
This is extremely useful in non-Latin languages (read: Japanese, I live in Tokyo), where a number of different characters can be used for separating words. In Japanese, people normally type the Unicode character \u3000 (ideographic space), as that's what is entered by default when hitting "space" on the keyboard, but that's not necessarily what one might want to retain.
For example, I'd love for the string 山田　太郎 (the space used here is the normal space I get when using the Japanese keyboard; copy and paste it into hexdump -C on a UTF-8 console to check) to be turned into a more normal 山田 太郎 (with the space replaced by ASCII 0x20).
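
To make the requested behaviour concrete, here is a minimal sketch in Python (not this project's language or API; the function name normalize_space is hypothetical and only borrowed from XPath). It assumes "white space" means anything Unicode classifies as whitespace, which is broader than XPath's normalize-space(), since that function only considers space, tab, CR and LF.

```python
def normalize_space(text: str) -> str:
    """Trim the string and collapse each run of whitespace to one ASCII space.

    Hypothetical helper for illustration only. str.split() with no
    arguments splits on runs of Unicode whitespace (including U+3000)
    and discards leading and trailing whitespace.
    """
    return " ".join(text.split())


# The two examples from the description above.
print(normalize_space("hello\r\nworld!\t"))   # hello world!
print(normalize_space("山田\u3000太郎"))        # 山田 太郎 (with ASCII 0x20)
```

Whether the ideographic space should count as whitespace by default, or only behind an option, is a design choice this sketch does not try to settle.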