Unicode Property Escapes in JavaScript Regular Expressions

Unicode property escapes let a JavaScript regular expression match characters by Unicode property, such as a general category (Letter, Number), a script (Latin, Han), or a binary property (Emoji_Presentation). They are only recognized when the regular expression has the u flag (or the newer v flag).

Syntax

/\p{PropertyName}/u
/\p{PropertyName=Value}/u  // Property with a value, e.g. Script=Latin
/\P{PropertyName}/u  // Negated form

\p{...} matches characters that have the specified property, while \P{...} matches characters that do not have it.

Common Unicode Properties

Property            Description                                Example Characters
Letter              Any letter in any script                   A, B, ß, я, 中
Number              Any numeric character                      1, 2, ٣, Ⅳ
Emoji_Presentation  Characters shown as emoji by default       😀, 🎉, 🌍
Script=Latin        Characters in the Latin script             A-Z, a-z
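The names in the table also have shorter spellings. The sketch below assumes a small, arbitrary set of sample characters and a hypothetical helper named signature, and checks that several spellings of the same property match exactly the same samples:

```javascript
// Each group lists spellings that should behave identically.
const equivalentSpellings = [
  ['\\p{Letter}', '\\p{L}', '\\p{General_Category=Letter}', '\\p{gc=L}'],
  ['\\p{Script=Latin}', '\\p{sc=Latin}', '\\p{Script=Latn}'],
];

// Arbitrary sample characters, chosen only for this illustration.
const samples = ['A', 'z', 'я', '中', '7', ' ', '!'];

// Hypothetical helper: records which samples a pattern matches, e.g. "1100000".
function signature(source) {
  const re = new RegExp(`^${source}$`, 'u');
  return samples.map((ch) => (re.test(ch) ? '1' : '0')).join('');
}

const signatures = equivalentSpellings.map((group) => group.map(signature));
console.log(signatures);
```

Every entry within a group should print the same string. The long names are usually easier to read in shared code.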

Example: Extracting Emojis

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Unicode Property Escapes</title>
</head>
<body>
    <div id="text">Hello 👋 World 🌍 123!</div>
    <button onclick="extractEmojis()">Extract Emojis</button>
    <div id="result"></div>
    
    <script>
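        // Collect every default-emoji character from the #text element.
        function extractEmojis() {
            const text = document.getElementById('text').textContent;
            // g finds all matches; u enables \p{...} and matches whole code points.
            const emojiRegex = /\p{Emoji_Presentation}/gu;
            const emojis = text.match(emojiRegex); // null when nothing matches

            // The result is plain text, so textContent is sufficient.
            document.getElementById('result').textContent =
                'Found emojis: ' + (emojis ? emojis.join(' ') : 'None');
        }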
    </script>
</body>
</html>
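The choice of Emoji_Presentation over the broader Emoji property matters. As a rough sketch outside the browser (the sample string is an assumption made for this illustration), the two properties can be compared on the same text:

```javascript
const sample = 'Hello 👋 World 🌍 123!';

// Emoji_Presentation: code points that are displayed as emoji by default.
const byPresentation = sample.match(/\p{Emoji_Presentation}/gu) || [];

// Emoji: every code point that can take part in an emoji, which also
// includes the ASCII digits, "#" and "*" (they appear in keycap sequences).
const byEmoji = sample.match(/\p{Emoji}/gu) || [];

console.log(byPresentation); // [ '👋', '🌍' ]
console.log(byEmoji);        // [ '👋', '🌍', '1', '2', '3' ]
```

Because \p{Emoji} also matches plain digits, Emoji_Presentation is usually the safer choice when the goal is to pull visible emoji out of ordinary text.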

Example: Matching Letters and Numbers

const text = "Hello 世界 123 мир !@#";

// Match all letters
const letters = text.match(/\p{Letter}/gu);
console.log("Letters:", letters);

// Match all numbers
const numbers = text.match(/\p{Number}/gu);
console.log("Numbers:", numbers);

// Match non-letters (negated)
const nonLetters = text.match(/\P{Letter}/gu);
console.log("Non-letters:", nonLetters);

Output:

Letters: [ 'H', 'e', 'l', 'l', 'o', '世', '界', 'м', 'и', 'р' ]
Numbers: [ '1', '2', '3' ]
Non-letters: [ ' ', ' ', '1', '2', '3', ' ', ' ', '!', '@', '#' ]
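One caveat when matching whole words rather than single characters: \p{Letter} does not include combining marks, which scripts such as Devanagari use for vowel signs. The following sketch, with a made-up sample sentence, shows the effect and one possible fix using \p{Mark}:

```javascript
const sentence = 'Hello नमस्ते 123 мир';

// Letter alone breaks a word wherever a combining mark occurs.
const lettersOnlyRuns = sentence.match(/\p{Letter}+/gu);

// Allowing \p{Mark} as well keeps each word in one piece.
const words = sentence.match(/[\p{Letter}\p{Mark}]+/gu);

console.log(lettersOnlyRuns); // [ 'Hello', 'नमस', 'त', 'мир' ]
console.log(words);           // [ 'Hello', 'नमस्ते', 'мир' ]
```

Whether marks should count as part of a word depends on the application, so treat the character class above as a starting point rather than a general-purpose tokenizer.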

Example: Script-Specific Matching

const mixedText = "Hello नमस्ते दुनिया 你好世界";

// Match Latin script
const latin = mixedText.match(/\p{Script=Latin}/gu);
console.log("Latin:", latin.join(''));

// Match Devanagari (Hindi)
const devanagari = mixedText.match(/\p{Script=Devanagari}/gu);
console.log("Devanagari:", devanagari.join(''));

// Match Han (Chinese)
const han = mixedText.match(/\p{Script=Han}/gu);
console.log("Han:", han.join(''));

Output:

Latin: Hello
Devanagari: नमस्तेदुनिया
Han: 你好世界
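The same idea can be turned into a small script detector. This is only a sketch: CANDIDATE_SCRIPTS, SCRIPT_TESTERS, and detectScripts are hypothetical names introduced here, and the list of scripts is an arbitrary choice.

```javascript
// Hypothetical list of scripts to look for; extend it as needed.
const CANDIDATE_SCRIPTS = ['Latin', 'Cyrillic', 'Devanagari', 'Han'];

// Build one regex per script up front. The names come from the fixed list
// above, so interpolating them into the pattern is safe here.
const SCRIPT_TESTERS = CANDIDATE_SCRIPTS.map((name) => ({
  name,
  regex: new RegExp(`\\p{Script=${name}}`, 'u'),
}));

// Returns the candidate scripts that occur at least once in the text.
function detectScripts(text) {
  return SCRIPT_TESTERS
    .filter(({ regex }) => regex.test(text))
    .map(({ name }) => name);
}

console.log(detectScripts('Hello नमस्ते 你好')); // [ 'Latin', 'Devanagari', 'Han' ]
console.log(detectScripts('123 !?'));           // []
```

The regexes deliberately omit the g flag, so test() keeps no state between calls. Digits and punctuation belong to the Common script, which is why the second call finds nothing.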

Key Points

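  • The u flag (or the newer v flag) is required; without it, \p is treated as an escaped literal "p" and nothing is matched by property
  • Use \p{} for positive matching and \P{} for negated matching
  • Property names and values are case-sensitive (Letter is valid, letter is not)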
  • Supports both general categories (Letter, Number) and specific scripts (Latin, Han)
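The first and third points can be checked directly. The sketch below builds the patterns with the RegExp constructor so that the invalid one can be caught at run time:

```javascript
// Without the u flag, \p is just an escaped "p" followed by literal text.
const noFlag = new RegExp('\\p{Letter}');
console.log(noFlag.test('a'));         // false
console.log(noFlag.test('p{Letter}')); // true

// With the u flag the same source is a property escape.
const withFlag = new RegExp('\\p{Letter}', 'u');
console.log(withFlag.test('a'));       // true

// Names are case-sensitive: "letter" is not a valid property name, and in
// Unicode mode an invalid name is a SyntaxError.
let caseError = null;
try {
  new RegExp('\\p{letter}', 'u');
} catch (err) {
  caseError = err;
}
console.log(caseError instanceof SyntaxError); // true
```

The silent behavior without the flag is the more dangerous of the two mistakes, since the pattern compiles and simply never matches what was intended.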

Browser Compatibility

Unicode property escapes are supported in modern browsers (Chrome 64+, Edge 79+, Firefox 78+, Safari 11.1+) and in Node.js 10+. They are not supported in Internet Explorer.
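Where older engines still have to be handled, support can be detected at run time. This is a minimal sketch: supportsUnicodePropertyEscapes is a hypothetical helper name, and the ASCII fallback is shown for illustration only.

```javascript
// Returns true when the engine can compile a Unicode property escape.
function supportsUnicodePropertyEscapes() {
  try {
    // Built from a string so that older engines do not fail at parse time.
    new RegExp('\\p{Letter}', 'u');
    return true;
  } catch (err) {
    return false;
  }
}

// The fallback covers ASCII letters only, not all of Unicode.
const letterPattern = supportsUnicodePropertyEscapes()
  ? new RegExp('\\p{Letter}', 'gu')
  : /[A-Za-z]/g;

const found = 'мир 2024'.match(letterPattern);
console.log(found ? found.join('') : ''); // "мир" when property escapes are supported
```

A regex literal such as /\p{Letter}/u would raise a SyntaxError while the script is being parsed on an unsupported engine, which is why the check goes through the constructor.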

Conclusion

Unicode property escapes provide powerful character matching capabilities based on Unicode properties. They're essential for internationalized applications and precise text processing across different languages and scripts.

Updated on: 2026-03-15T23:18:59+05:30
