Unicode Property Escapes in JavaScript Regular Expressions

Unicode property escapes, introduced in ECMAScript 2018, let a JavaScript regular expression match characters by their Unicode properties, such as general category, script, or a binary property like Emoji_Presentation. They work only when the regular expression has the u flag (or the newer v flag).

Syntax

/\p{PropertyName}/u
/\P{PropertyName}/u  // Negated form

\p{...} matches any character that has the specified property, while \P{...} matches any character that does not have it.
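As a quick illustration of the two forms, the sketch below runs both against the same string. The sample string and variable names are made up for this example. The last two lines show what happens when the u flag is left off.

```javascript
// Hypothetical sample string: Latin letter, Greek letter, digit, punctuation.
const sample = "a\u03B21!"; // "aβ1!"

// \p{Letter}: characters that have the Letter property.
const lettersOnly = sample.match(/\p{Letter}/gu);

// \P{Letter}: characters that do not have it.
const everythingElse = sample.match(/\P{Letter}/gu);

console.log(lettersOnly);    // [ 'a', 'β' ]
console.log(everythingElse); // [ '1', '!' ]

// Without the u flag, \p is just an escaped "p", so the pattern
// matches the literal text "p{Letter}" instead of a letter.
const noFlagOnLetter = /\p{Letter}/.test("a");          // false
const noFlagOnLiteral = /\p{Letter}/.test("p{Letter}"); // true
console.log(noFlagOnLetter, noFlagOnLiteral);
```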

Common Unicode Properties

Property	Description	Example Characters
`Letter`	Any letter in any script	A, B, ñ, α, 中
`Number`	Any numeric character (digits, letter numbers, fractions)	1, 2, ½, Ⅳ
`Emoji_Presentation`	Characters displayed as emoji by default	😀, 🎉, 🚀
`Script=Latin`	Latin script	A-Z, a-z
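To see how the properties in the table overlap, the following sketch checks a single character against each of them. The describeChar helper and the tableProperties object are hypothetical names introduced only for this illustration.

```javascript
// Anchored patterns so each test looks at exactly one character.
const tableProperties = {
  "Letter": /^\p{Letter}$/u,
  "Number": /^\p{Number}$/u,
  "Emoji_Presentation": /^\p{Emoji_Presentation}$/u,
  "Script=Latin": /^\p{Script=Latin}$/u,
};

// describeChar is a hypothetical helper, not a built-in.
// It returns the names of the table properties that the character has.
function describeChar(ch) {
  return Object.keys(tableProperties)
    .filter((name) => tableProperties[name].test(ch));
}

console.log(describeChar("A"));         // [ 'Letter', 'Script=Latin' ]
console.log(describeChar("7"));         // [ 'Number' ]
console.log(describeChar("\u{1F600}")); // [ 'Emoji_Presentation' ]  (😀)
console.log(describeChar("\u4E2D"));    // [ 'Letter' ]  (中)
```

A character can carry several properties at once: "A" is both a Letter and part of the Latin script, while the digit "7" is a Number that does not belong to any single script.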

Example: Extracting Emojis

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Unicode Property Escapes</title>
</head>
<body>
    <div id="text">Hello 😀 World 🌍 123!</div>
    <button onclick="extractEmojis()">Extract Emojis</button>
    <div id="result"></div>
    
    <script>
        function extractEmojis() {
            const text = document.getElementById('text').textContent;
            const emojiRegex = /\p{Emoji_Presentation}/gu;
            const emojis = text.match(emojiRegex);
            
            document.getElementById('result').innerHTML = 
                'Found emojis: ' + (emojis ? emojis.join(' ') : 'None');
        }
    </script>
</body>
</html>
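Emoji_Presentation covers only characters that render as emoji by default, so it can miss symbols that default to text presentation. The sketch below compares it with two related properties, Emoji and Extended_Pictographic, on a made-up sample string. It assumes an engine recent enough to know Extended_Pictographic.

```javascript
// Hypothetical sample: U+2764 (heavy black heart, text presentation by default),
// U+1F600 (grinning face), and some ASCII digits.
const message = "I \u2764 JS \u{1F600} since 2015";

// Only characters shown as emoji by default.
const byPresentation = message.match(/\p{Emoji_Presentation}/gu);

// Pictographic symbols, whether or not they default to emoji style.
const byPictographic = message.match(/\p{Extended_Pictographic}/gu);

// The broad Emoji property also includes ASCII digits, "#" and "*",
// because they can start keycap emoji sequences.
const byEmoji = message.match(/\p{Emoji}/gu);

console.log(byPresentation.length); // 1 (the grinning face)
console.log(byPictographic.length); // 2 (heart and grinning face)
console.log(byEmoji.length);        // 6 (heart, face, and the digits 2, 0, 1, 5)
```

For extracting visible pictographs from user text, Extended_Pictographic is often a closer fit than Emoji, since Emoji would also pick up plain digits.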

Example: Matching Letters and Numbers

const text = "Hello 世界 123 мир !@#";

// Match all letters
const letters = text.match(/\p{Letter}/gu);
console.log("Letters:", letters);

// Match all numbers
const numbers = text.match(/\p{Number}/gu);
console.log("Numbers:", numbers);

// Match non-letters (negated)
const nonLetters = text.match(/\P{Letter}/gu);
console.log("Non-letters:", nonLetters);

Letters: [ 'H', 'e', 'l', 'l', 'o', '世', '界', 'м', 'и', 'р' ]
Numbers: [ '1', '2', '3' ]
Non-letters: [ ' ', ' ', '1', '2', '3', ' ', ' ', '!', '@', '#' ]
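A common use of \p{Letter} is extracting whole words from text that is not limited to ASCII. The sketch below contrasts it with \w, which only knows A-Z, a-z, 0-9 and the underscore. The sample phrase is an assumption chosen for illustration.

```javascript
// Hypothetical sample with precomposed accented letters: "naïve café"
const phrase = "na\u00EFve caf\u00E9";

// \w stops at every accented letter, so words are split or truncated.
const asciiWords = phrase.match(/\w+/g);

// \p{Letter}+ treats accented letters like any other letter.
const unicodeWords = phrase.match(/\p{Letter}+/gu);

console.log(asciiWords);   // [ 'na', 've', 'caf' ]
console.log(unicodeWords); // [ 'naïve', 'café' ]
```

Text that uses combining accents (a base letter followed by a separate mark) would also need \p{Mark} in the character class, for example /[\p{Letter}\p{Mark}]+/gu.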

Example: Script-Specific Matching

const mixedText = "Hello नमस्ते दुनिया 你好世界";

// Match Latin script
const latin = mixedText.match(/\p{Script=Latin}/gu);
console.log("Latin:", latin.join(''));

// Match Devanagari script (used for Hindi)
const devanagari = mixedText.match(/\p{Script=Devanagari}/gu);
console.log("Devanagari:", devanagari.join(''));

// Match Han script (used for Chinese)
const han = mixedText.match(/\p{Script=Han}/gu);
console.log("Han:", han.join(''));

Latin: Hello
Devanagari: नमस्तेदुनिया
Han: 你好世界
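Calling join directly on the result of match throws a TypeError when the text contains no character from the requested script, because match returns null in that case. One way to guard against this is sketched below. The extractScript helper is a hypothetical name, and the sample string is made up.

```javascript
// extractScript is a hypothetical helper, not part of any library.
// It returns every run of consecutive characters from the given script.
function extractScript(text, scriptName) {
  // Built with the RegExp constructor so the script name can vary.
  // An unknown script name makes the constructor throw a SyntaxError.
  const pattern = new RegExp(`\\p{Script=${scriptName}}+`, "gu");
  const runs = text.match(pattern);
  return runs === null ? [] : runs; // never return null to the caller
}

const greeting = "Hello \u041F\u0440\u0438\u0432\u0435\u0442"; // "Hello Привет"

console.log(extractScript(greeting, "Latin"));    // [ 'Hello' ]
console.log(extractScript(greeting, "Cyrillic")); // [ 'Привет' ]
console.log(extractScript(greeting, "Han"));      // []
```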

Key Points

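The u (or v) flag is required; without it, \p{...} is not treated as a property escape
Use \p{...} to match characters that have a property and \P{...} to match characters that lack it
Property names and values are case-sensitive (Letter is valid, letter is not)
General categories (Letter, Number), scripts (Script=Latin, Script=Han), and binary properties (Emoji_Presentation) are all supported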
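The case-sensitivity rule is easy to check, because an unrecognized property name or value is a syntax error when the u flag is set. The sketch below wraps that check in a hypothetical helper named isValidPropertyEscape.

```javascript
// isValidPropertyEscape is a hypothetical helper for experimenting.
// It reports whether \p{body} is accepted by the current engine.
function isValidPropertyEscape(body) {
  try {
    new RegExp(`\\p{${body}}`, "u");
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err; // anything else is unexpected
  }
}

console.log(isValidPropertyEscape("Letter"));     // true
console.log(isValidPropertyEscape("letter"));     // false (wrong case)
console.log(isValidPropertyEscape("L"));          // true  (short alias of Letter)
console.log(isValidPropertyEscape("Script=Han")); // true
console.log(isValidPropertyEscape("Script=han")); // false (values are case-sensitive too)
```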

Browser Compatibility

Unicode property escapes are supported in modern browsers (Chrome 64+, Firefox 78+, Safari 11.1+). Internet Explorer does not support them.
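If older engines still matter, support can be detected at runtime. The sketch below is one possible approach, with a hypothetical function name and a deliberately limited ASCII fallback. The pattern is built from a string because a regular expression literal containing \p{...} with the u flag would be a parse error in an unsupported engine and would stop the whole script from loading.

```javascript
// supportsPropertyEscapes is a hypothetical helper name.
function supportsPropertyEscapes() {
  try {
    // Throws in engines that lack the u flag or property escapes.
    new RegExp("\\p{Letter}", "u");
    return true;
  } catch (e) {
    return false;
  }
}

// Assumption: an ASCII-only fallback is acceptable for old engines.
const letterPattern = supportsPropertyEscapes()
  ? new RegExp("\\p{Letter}", "gu")
  : /[A-Za-z]/g;

const found = "caf\u00E9".match(letterPattern); // "café"
console.log(found.length); // 4 with property escapes, 3 with the ASCII fallback
```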

Conclusion

Unicode property escapes provide powerful character matching capabilities based on Unicode properties. They're essential for internationalized applications and precise text processing across different languages and scripts.

AmitDiwan

Updated on: 2026-03-15T23:18:59+05:30
