Obfuscated HTML, or HTML string attribute values have no text restriction

Published

December 24, 2022

TIL html js

HTML is incredibly permissive and allows a huge range of text to be used within it. The HTML specification says the following:

…there is no restriction on what text can be specified in such attribute values.

This is useful when you want an id value that is not a valid JavaScript identifier. That is worth doing because browsers expose every element with an id as a named property on the window object, so an id that looks like an identifier can collide with a global variable name, e.g.:

<!-- `window.basic` references this `div` element. -->
<div id="basic"></div>
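A rough way to tell whether an id could clash with a bare variable name is to test it against JavaScript's identifier grammar. The sketch below is only an approximation: `looksLikeIdentifier` is a hypothetical helper, and it ignores reserved words and escape sequences.

```javascript
// Hypothetical helper: approximates the JavaScript identifier grammar
// using Unicode property escapes. Reserved words are not excluded.
const IDENTIFIER = /^[\p{ID_Start}$_][\p{ID_Continue}$\u200C\u200D]*$/u;

function looksLikeIdentifier(id) {
  return IDENTIFIER.test(id);
}

const candidateIds = ["basic", "'", "🧬", "#basic"];
for (const id of candidateIds) {
  console.log(JSON.stringify(id), looksLikeIdentifier(id));
}
```

Of these candidates, only "basic" passes the check, so it is the only one that can be shadowed by, or shadow, a plain global name.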

You can get pretty creative when looking for an alternative id value that is not a JavaScript identifier.

For example, all of the following HTML id attributes are valid:

<div id="basic"></div>
<div id="'"></div>
<div id='"'></div>
<div id="🧬"></div>
<div id="¯\_(ツ)_/¯"></div>
<div id="#basic"></div>

Each of these elements can be queried from JavaScript, in the same order:

document.querySelector(`#basic`);
document.querySelector(`#\\'`);
document.querySelector(`#\\"`);
document.querySelector(`#🧬`);
document.querySelector(`#¯\\\\_\\(ツ\\)_\\/¯`);
document.querySelector(`#\\#basic`);

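The escaping gets a little crazy in the more complicated cases, because every backslash has to survive two layers: the JavaScript template literal and then the CSS selector parser.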
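In a browser, `CSS.escape` produces the escaped identifier for you. To make the rule visible, here is a simplified hand-rolled sketch of the same idea. `escapeIdent` is a hypothetical name, and this version is an illustration rather than a drop-in replacement.

```javascript
// Hypothetical helper: a simplified take on what CSS.escape does in browsers.
function escapeIdent(value) {
  const chars = Array.from(String(value)); // split by code point, not UTF-16 unit
  return chars
    .map((ch, i) => {
      const cp = ch.codePointAt(0);
      const hex = `\\${cp.toString(16)} `; // hex escape, terminated by a space
      if (cp === 0) return "\uFFFD";
      if (cp <= 0x1f || cp === 0x7f) return hex; // control characters
      const isDigit = cp >= 0x30 && cp <= 0x39;
      // A digit cannot start an identifier, nor directly follow a leading "-".
      if (isDigit && (i === 0 || (i === 1 && chars[0] === "-"))) return hex;
      if (ch === "-" && chars.length === 1) return "\\-"; // a lone hyphen
      if (cp >= 0x80 || /[-_0-9A-Za-z]/.test(ch)) return ch; // safe as-is
      return `\\${ch}`; // everything else gets a backslash
    })
    .join("");
}

const exampleIds = ["basic", "'", '"', "🧬", "¯\\_(ツ)_/¯", "#basic"];
const idSelectors = exampleIds.map((id) => `#${escapeIdent(id)}`);
console.log(idSelectors.join("\n"));
```

Building the selector from the raw id this way means only the JavaScript string layer has to be escaped by hand.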

The same applies to other string attributes, such as class. For example, this class attribute looks like a CSS selector:

<div class="div#basic > p"></div>

which can be selected with:

document.querySelector(`[class="div#basic > p"]`);
// or match the ">" token alone, since class is a whitespace-separated list
document.querySelector(`.\\>`);
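The second selector works because the class value is split on whitespace into separate tokens, and `>` is one of them. A minimal sketch of that splitting follows. `classTokens` is a hypothetical helper that roughly mirrors how the browser derives `classList`, except that it does not remove duplicates.

```javascript
// Hypothetical helper: split a class attribute value into tokens on
// ASCII whitespace. Unlike classList, duplicates are kept.
function classTokens(value) {
  return value.split(/[ \t\n\f\r]+/).filter((token) => token !== "");
}

console.log(classTokens("div#basic > p"));
```

The value yields three tokens: `div#basic`, `>` and `p`. Each one can be matched with its own escaped class selector.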

I needed to know this recently while writing a custom CSS selector utility that had to be robust enough to handle many surprising cases. The same knowledge could also be used to write some truly baffling obfuscated HTML.

What if you could obfuscate the HTML tag name as well? Luckily, browsers let you define your own tags through custom elements, so you can replace readable HTML with extremely confusing obfuscated HTML.

You just have to ensure the custom obfuscated HTML tag name starts with [a-z], contains a hyphen, and has no ASCII uppercase letters. Now you can write the following HTML tag:

<aØ-·🙃></aØ-·🙃>

with your obfuscated tag defined in some JavaScript:

customElements.define("aØ-·🙃", class extends HTMLElement {
  // Render in connectedCallback: a custom element constructor must not add children.
  connectedCallback() {
    this.innerHTML = `<p>Hello, world</p>`;
  }
});
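Custom element names have a few hard requirements: they start with a lowercase ASCII letter, contain a hyphen, contain no ASCII uppercase letters, and are not one of a handful of reserved names. The sketch below checks only those necessary conditions. `mayBeCustomElementName` is a hypothetical helper, and the authoritative test is still whether `customElements.define` accepts the name.

```javascript
// Names the HTML specification reserves; they contain a hyphen but
// cannot be used for custom elements.
const RESERVED_NAMES = new Set([
  "annotation-xml", "color-profile", "font-face", "font-face-src",
  "font-face-uri", "font-face-format", "font-face-name", "missing-glyph",
]);

// Hypothetical helper: checks necessary conditions only, not the
// full grammar of allowed characters.
function mayBeCustomElementName(name) {
  return (
    /^[a-z]/.test(name) &&  // starts with a lowercase ASCII letter
    name.includes("-") &&   // contains at least one hyphen
    !/[A-Z]/.test(name) &&  // no ASCII uppercase letters
    !RESERVED_NAMES.has(name)
  );
}

console.log(mayBeCustomElementName("aØ-·🙃"));
console.log(mayBeCustomElementName("obfuscated"));
```

The first name passes these checks, while the second fails because it has no hyphen.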

Put together, you could write HTML that looks like this:

<aØ-·🙃
  id="¯\_(ツ)_/¯<-'#🧬"
  class="div#basic' > p"
></aØ-·🙃>

Cool!