Output Encoding and Sanitisation to Prevent XSS

The primary defence against XSS is context-aware output encoding: converting characters that have special meaning in the output context into harmless entity or escape equivalents. In an HTML context, encode `<` as `&lt;`, `>` as `&gt;`, `"` as `&quot;`, `'` as `&#39;`, and `&` as `&amp;`. In a JavaScript string context the rules differ: use `\uXXXX` escapes. In a URL context, apply percent-encoding. Using the wrong encoding for the context leaves the application vulnerable.

Modern templating engines such as Blade, Twig, Jinja2, and React's JSX HTML-encode values written with their default output syntax, provided autoescaping is enabled (Jinja2 on its own leaves it off unless configured; frameworks such as Flask turn it on for HTML templates). Blade's `{{ $value }}` HTML-encodes; `{!! $value !!}` does not, so unescaped output should only be used for trusted, pre-sanitised HTML. React's JSX encodes string values interpolated with `{}`, with `dangerouslySetInnerHTML` as the explicit unsafe escape hatch.

When you genuinely need to allow rich user-generated HTML, for example from a WYSIWYG editor, use a well-maintained HTML sanitiser rather than writing your own. PHP's HTML Purifier, Python's Bleach (since deprecated by its maintainers), and JavaScript's DOMPurify parse the HTML and strip disallowed tags and attributes using a strict allowlist. Never attempt to sanitise HTML with a regex; HTML's grammar is too complex for reliable regex-based filtering.
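
To make the warning about regex filtering concrete, here is a minimal sketch. `naiveStripScripts` is a hypothetical helper written only for this illustration, not part of any library, and the three payloads are illustrative assumptions rather than an exhaustive list.

```php
<?php
// Hypothetical naive filter: delete <script>...</script> blocks with a regex.
// Shown only to illustrate the failure mode; do not use it.
function naiveStripScripts(string $html): string
{
    return preg_replace('#<script\b[^>]*>.*?</script>#is', '', $html) ?? '';
}

$payloads = [
    // No <script> tag at all, so the regex has nothing to match.
    '<img src=x onerror=alert(1)>',
    // Removing the inner pair reassembles a working outer tag:
    // the result is <script>alert(1)</script>
    '<scr<script></script>ipt>alert(1)</script>',
    // Script execution via a URL scheme, again with no <script> tag.
    '<a href="javascript:alert(1)">click</a>',
];

foreach ($payloads as $payload) {
    // Every payload is still dangerous after "filtering".
    echo naiveStripScripts($payload), PHP_EOL;
}
```

Encoding the same payloads with `htmlspecialchars` would render all three inert as text; an allowlist sanitiser is only needed when some markup has to survive.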
Example
<?php
// PHP: context-aware output encoding

$userInput = '<script>alert("XSS")</script> & "quotes" \'apostrophe\'';

// HTML context — htmlspecialchars with correct flags
echo htmlspecialchars($userInput, ENT_QUOTES | ENT_HTML5, 'UTF-8');
// Output: &lt;script&gt;alert(&quot;XSS&quot;)&lt;/script&gt; &amp; &quot;quotes&quot; &apos;apostrophe&apos;
// (with ENT_HTML5 the single quote becomes &apos;; with ENT_HTML401 it would be &#039;)

// JavaScript string context — json_encode produces a safely-escaped JSON string
$jsVar = json_encode($userInput, JSON_HEX_TAG | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_HEX_AMP | JSON_UNESCAPED_UNICODE);
echo "<script>var userInput = {$jsVar};</script>";

// URL context — rawurlencode
$param = rawurlencode($userInput);
echo "<a href=\"/search?q={$param}\">Search</a>";

// ----------------------------------------------------------------
// HTML Purifier — allow rich HTML but strip XSS vectors
// ----------------------------------------------------------------
// require_once 'HTMLPurifier.auto.php';
// $config   = HTMLPurifier_Config::createDefault();
// $config->set('HTML.Allowed', 'p,b,i,a[href],ul,ol,li,br');
// $purifier = new HTMLPurifier($config);
// $clean    = $purifier->purify($dirtyHtml);

// ----------------------------------------------------------------
// JavaScript: DOMPurify for client-side sanitisation
// ----------------------------------------------------------------
// import DOMPurify from 'dompurify';
// const clean = DOMPurify.sanitize(dirtyHtml, { ALLOWED_TAGS: ['b','i','a','p'] });
// element.innerHTML = clean;  // safe after purification
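
The encoders in the example above are not interchangeable. As a rough illustration, the sketch below passes one assumed attacker-controlled value through `htmlspecialchars` and through `json_encode`, then places each result in a JavaScript string context. The variable names and the sample input are assumptions made for this illustration.

```php
<?php
// Assumed attacker-controlled input: the letter x followed by a single backslash.
$input = 'x\\';

// Wrong encoder for the context: HTML encoding leaves the backslash untouched.
$htmlEncoded = htmlspecialchars($input, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// Right encoder for the context: json_encode doubles the backslash
// and supplies its own surrounding double quotes.
$jsEncoded = json_encode($input, JSON_HEX_TAG | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_HEX_AMP);

// Emits: var s = 'x\';   (the backslash escapes the closing quote, so the literal never ends)
echo "var s = '{$htmlEncoded}';", PHP_EOL;

// Emits: var s = "x\\";  (a well-formed string literal containing x and one backslash)
echo "var s = {$jsEncoded};", PHP_EOL;
```

The HTML-encoded value is perfectly safe between HTML tags, yet it corrupts the script once it lands inside a string literal. This is the sense in which encoding has to follow the output context rather than the data.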