As web applications continue to grow in complexity, properly sanitizing user-generated content before rendering it is critical for security. Without proper sanitization, malicious user input can inject dangerous scripts and unwanted content.
In this comprehensive guide, we'll dig deep into HTML sanitization in JavaScript: why it matters and how to implement secure solutions.
The Rising Threat of XSS Attacks
Over the last decade, XSS attacks have rapidly emerged as one of the most common attack vectors exploited by hackers. XSS allows injection of malicious scripts into web applications to access user data or take over accounts.
According to statistics from Acunetix, XSS now comprises over 39% of vulnerabilities found in testing, trailing only SQL injection in prevalence. Nearly two thirds of web professionals report experiencing at least one XSS attack against their systems.
Dangers of XSS include:
- Stealing user session cookies and account takeover
- Extracting sensitive data via DOM scraping
- Phishing by simulating legitimate sites
- Browser lockups due to resource overload
Real world examples like the MySpace Samy worm in 2005 have shown just how quickly XSS can spread – compromising over 1 million users by copying itself onto profiles site-wide using injected JavaScript.
As web applications grow more ambitious, connecting across domains and integrating rich text editors for user-generated content, we need robust defenses now more than ever.
Why Sanitization Matters
To secure sites against XSS, our first critical line of defense is input sanitization – scrubbing any rich text content submitted by users, on both the client and server side, before rendering.
Without sanitization, applications often naively enable XSS by directly inserting user input:
<!-- UNSAFE -->
<div>
<p>Comment:</p>
{User-Submitted-Content}
</div>
With embedded scripts executed as-is, massive damage can ensue.
Instead by sanitizing input, we filter out dangerous elements, letting only safe markup through:
<!-- SAFE -->
<div>
<p>Comment:</p>
<span>Safe Text Content</span>
</div>
Now users can richly format and link content without granting execution privileges.
An effective sanitizer must clean dirty markup of anything potentially dangerous while preserving text formatting and links for usability. Balancing security and usability requires careful design.
Next we'll explore common sanitization techniques and libraries that handle this for us.
OWASP Recommendations for Sanitization
OWASP, the leading web application security organization, categorizes two main methods of sanitization in their XSS prevention cheat sheet:
1. HTML Encoding – Replacing special characters like < and > with HTML entities like &lt; and &gt;, preventing embedded tag parsing.
2. HTML Whitelist Stripping – Configuring an allow-list of approved tags and stripping out all others.
Encoding maintains greater document structure and formatting, but still allows some potential XSS vectors. Stripping provides the most security, but loses text styling capabilities.
OWASP recommends combining both approaches – first encode user input to handle most vectors, then strip all but approved tags/attributes as a secondary defense. This balanced approach prevents XSS while allowing reasonable formatting.
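As a minimal sketch of the encoding idea (the `encodeHTML` helper here is hypothetical, standing in for what a real library does internally):

```javascript
// Minimal illustration of HTML entity encoding.
// Hypothetical helper for explanation only; use a vetted
// library like xss or DOMPurify in production.
function encodeHTML(input) {
  const entities = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#x27;",
  };
  return String(input).replace(/[&<>"']/g, (ch) => entities[ch]);
}

console.log(encodeHTML('<script>alert(1)</script>'));
// &lt;script&gt;alert(1)&lt;/script&gt;
```

Note that `&` must be in the replacement set too, or already-encoded input could be double-decoded downstream.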
Now let's see how to implement encoding and whitelist stripping in JavaScript.
Building a Secure JavaScript Sanitizer
To handle sanitization, we'll turn to a battle-tested library instead of rolling our own regex attempts. These libraries have been hardened against vulnerabilities over years of real-world usage.
Some popular options include:
- DOMPurify – Fast and versatile HTML whitelist-based stripper
- xss – Encoder focused on balance of security and usability
- sanitize-html – Highly customizable rules for advanced use cases
I generally recommend DOMPurify for its speed and safety. But let's explore how each library handles sanitization differently.
HTML Encoding with xss
The xss package focuses on encoding special characters in untrusted input.
To install:
npm install xss
Minimal usage:
import xss from 'xss';
const encodedHTML = xss('<script>alert(1)</script>');
// Encoded output:
// &lt;script&gt;alert(1)&lt;/script&gt;
By converting <, >, and similar characters to HTML entities, scripts no longer execute, while whitelisted text formatting like bold and italics still passes through.
xss enables customization of these output encoding rules as well.
The xss approach prioritizes usability – encoding preserves much of the original document structure while preventing embedded script execution. However, some more exotic XSS vectors could still bypass this technique.
HTML Whitelist Stripping with DOMPurify
DOMPurify performs sanitization by stripping out HTML tags/attributes that don't match an approved whitelist:
import DOMPurify from 'dompurify';
const cleanHTML = DOMPurify.sanitize('<b>Hi</b><script>alert(1)</script>');
// Stripped output:
// <b>Hi</b> – the script element and its contents are removed entirely
Much more strict than encoding, DOMPurify blocks any attempt to inject JavaScript.
The default whitelist allows basic text formatting like <b>, <i>, etc. This balances utility while stopping XSS vectors.
Again the whitelist can be customized to allow extra tags if necessary. Note that ALLOWED_TAGS replaces the default list entirely; to extend the defaults instead, use ADD_TAGS:
const cleanHTML = DOMPurify.sanitize(dirtyHTML, {
  ADD_TAGS: ['iframe']
});
When tuned properly, whitelist stripping reliably defends against XSS with minimal utility impact.
Combining Encoding and Whitelist Stripping
To follow security best practices, we layer both encoding and whitelist stripping – first encoding special characters, then stripping banned tags/attributes.
This harnesses the strengths of each approach into a secure yet usable sanitizer.
Implementation with xss and DOMPurify:
// Import both libraries
import xss from 'xss';
import DOMPurify from 'dompurify';

function sanitizeHTML(dirty) {
  // First encode/filter special characters
  const encodedHTML = xss(dirty);
  // Then strip anything not on the allow-list
  return DOMPurify.sanitize(encodedHTML);
}

const cleanHTML = sanitizeHTML('<img src=x onerror="alert(\'Hacked\')">');
// Final output:
// <img src="x">
While either library alone would stop most XSS, together they form a layered defense in line with OWASP's advice.
Securing Sanitizer Configuration
Sanitizers offer flexibility customizing allowed markup. But take care when expanding these whitelists – each additional tag or attribute reintroduces potential risk.
When extra freedom is required, make sure to:
- Scrutinize Defaults – Review the base whitelist before modifying to understand coverage.
- Isolate Custom Elements – Scope custom tags to dedicated document areas instead of site-wide.
- Security Test Expanded Policies – Test the exact sanitizer configuration against XSS vectors to catch bypasses before launching.
- Apply Least Privilege – Grant the minimum markup required for needed formatting effects rather than wide open policies.
Also consult security researchers when crafting policies to strike the right balance.
And even once launched, routinely audit policies against the latest discoveries, evolving the rules over time. The threats will continue adapting, so our defenses must keep pace.
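Under these guidelines, a least-privilege policy might look like the following DOMPurify configuration sketch (the specific tag and attribute choices are illustrative assumptions, not universal recommendations):

```javascript
// Illustrative least-privilege policy: basic formatting and links only.
// The tag/attribute choices here are assumptions for the example.
const commentPolicy = {
  ALLOWED_TAGS: ["b", "i", "em", "strong", "a", "p", "br"],
  ALLOWED_ATTR: ["href"],
  // Restrict link destinations to a few known-safe schemes
  ALLOWED_URI_REGEXP: /^(?:https?|mailto):/i,
};

const clean = DOMPurify.sanitize(dirtyHTML, commentPolicy);
```

Each entry in the allow-lists should map to a concrete product need; anything without one stays out.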
The DOM Clobbering XSS Attack Vector
To understand how quickly assumptions can lead to a bypass, consider the DOM clobbering technique for slipping past sanitizers.
DOM clobbering abuses the fact that browsers expose elements with id or name attributes as properties of window and document. Injected markup containing no script at all can therefore shadow variables the page's own JavaScript relies on.
For example, suppose page code falls back to defaults when a config global is absent:
// Page script
if (window.config === undefined) {
  loadDefaults();
}
An attacker who can inject this harmless-looking, script-free markup:
<a id="config" href="https://attacker.example"></a>
now makes window.config reference the anchor element, so loadDefaults() is silently skipped – the page's behavior changes without a single <script> tag. DOMPurify's SANITIZE_DOM protection (enabled by default) targets exactly this class of attack.
While sanitizers protect against basic XSS insertion, more exotic techniques like clobbering underscore why constant reassessment against new methods matters tremendously.
Auditing and Testing Sanitization Defenses
To avoid nasty surprises like DOM clobbering, implementing well-designed policies is step one. But routinely testing defenses against actual attack patterns is essential to catch bypasses early.
Create tests simulating common XSS vectors against your exact sanitizer configuration:
const testVectors = [
  '<script>alert(1)</script>',           // injected script tag
  '"><img src=x onerror=alert(1)>',      // attribute breakout with event handler
  '<a href="javascript:alert(1)">x</a>', // javascript: URL
];

testVectors.forEach(vector => {
  const output = sanitizeHTML(vector);
  // Raw script tags, event handlers, or javascript: URLs
  // surviving in the output indicate a bypass
  if (/<script|onerror\s*=|javascript:/i.test(output)) {
    console.error('Bypass found:', vector);
  }
});
Logging any passed payloads indicates vulnerabilities requiring policy adjustments.
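As a runnable, self-contained illustration of the same idea, here is a sketch using a bare-bones entity encoder as a stand-in sanitizer (the `naiveSanitize` helper is hypothetical; swap in your real pipeline when testing production policies):

```javascript
// Stand-in sanitizer for demonstration only: encodes all
// markup-significant characters into HTML entities.
function naiveSanitize(input) {
  const entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#x27;" };
  return String(input).replace(/[&<>"']/g, (ch) => entities[ch]);
}

const testVectors = [
  "<script>alert(1)</script>",       // raw script tag
  '"><img src=x onerror=alert(1)>',  // attribute breakout
  "<svg onload=alert(1)>",           // SVG event handler
];

// Any raw tag opening (an unencoded "<" starting an element)
// surviving sanitization indicates a bypass
const bypasses = testVectors.filter(
  (vector) => /<[a-z\/!]/i.test(naiveSanitize(vector))
);

console.log(bypasses.length === 0 ? "All vectors neutralized" : bypasses);
```

The detection rule matters as much as the vectors: checking for an unencoded tag opening avoids false positives from harmless, fully encoded text that still contains strings like "onerror".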
In addition to programmatic testing, leverage web vulnerability scanners purpose-built for XSS and code injection:
- OWASP ZAP – Popular open source scanner and fuzzer
- ImmuniWeb – AI-powered and manual testing
- Checkmarx – Commercial static and interactive analysis
These tools automatically probe apps for weaknesses, complementing internal testing.
Between vigilant human and automated testing, we catch policy gaps rapidly.
Real-World Attack Case Study: MySpace Samy Worm
To see the real-world havoc possible without proper sanitization, we need look no further than the infamous MySpace Samy worm in 2005. At the time, MySpace allowed users to freely customize profile HTML and JavaScript.
Samy Kamkar exploited this to craft viral self-propagating XSS targeting MySpace users. When visitors viewed his profile, injected JavaScript automatically sent a friend request from their account to Samy while also copying his exploit code into their own profile!
This auto-spreading then continued as visitors accessed newly compromised accounts in a chain reaction. Within 24 hours Samy amassed over 1 million "friends" as the #1 account on MySpace thanks to the worm's exponential growth.
The incident dramatically demonstrated the importance of input sanitization – a single open vector enabling massive impact in virtually no time. It led MySpace and other platforms to significantly lock down their protections.
But 15 years later, some common sanitization mistakes still leave apps exposed as XSS techniques advance:
- Whitelists left overly open with unsafe tags like <script>
- Browser quirks around encoding not considered
- Policies not routinely tested against the latest attacks
- Developer education lacking on emerging vectors
Proper training coupled with robust yet usable sanitizers provides the greatest defense.
Closing Thoughts on Securing Applications via Sanitization
In closing, as both simple and sophisticated XSS attacks proliferate, leveraging sanitization libraries like DOMPurify shields our web apps and users from unintended harm.
When crafted carefully, balanced policies maintain utility for content creation while eliminating vectors for injection. Features like rich text comments don't need to introduce additional risk.
Beyond sanitizing, mandate:
- Developer security training to recognize vector patterns
- Routine internal and external vulnerability testing
- Constant reevaluation of allow-lists as new techniques emerge
- Isolation of custom markup rules to minimize danger zones
With vigilance across people, process and technology fronts, we turn the tide securing the expanding attack surface of the modern web app landscape.
Our users deserve no less as they entrust applications with ever more sensitive capabilities day by day. Be a steward worthy of that trust through robust defenses.