HTML comment too big, use bogus comments!


June 2, 2023

TIL html

Is all the typing required to declare an HTML comment getting you down? Never want to type another <!-- --> again? Or maybe you want to send fewer bytes down the wire and get your site to customers moments earlier!?

What if I told you the 7 bytes making up <!----> could be cut to just 3 bytes: <?>

These all result in the browser inserting a comment node:

<!--Normal HTML comment-->
<?Bogus HTML comment>
</ Another bogus HTML comment>
<!Yet another bogus comment>

Don’t believe me? Try it out in your browser’s developer tools, or take a look at this sample!

What is going on here?

The bogus comment state is defined in the "Parsing HTML documents" section of the HTML specification, so we can see exactly why these comments are created.

From the Tokenization section we know that the HTML tokenizer starts in the data state.

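When a less-than sign (<) is encountered, we switch to the tag open state. In the tag open state, if the next character consumed is a question mark (?), the tokenizer creates a comment token with empty data and reconsumes the ? in the bogus comment state. Strictly speaking, the spec flags this as an unexpected-question-mark-instead-of-tag-name parse error, but parse errors in HTML have well-defined recovery, so the browser carries on. The bogus comment state then appends every character to the comment's data until it reaches a >, at which point it emits the comment token and switches back to the data state. Tree construction then inserts that token into the document as a comment node.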
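To make those transitions concrete, here is a minimal sketch in Python of just the three states involved: data, tag open and bogus comment. It is not a real HTML tokenizer: the `tokenize` function and its `("character", …)` / `("comment", …)` token tuples are hypothetical, and the branches for tag names, `<!` and `</` are deliberately left unimplemented rather than guessed at.

```python
# A minimal sketch (not a full tokenizer) of three states from the HTML
# tokenizer spec: data, tag open and bogus comment. Tag names, attributes,
# character references and the "!" and "/" branches are deliberately
# left out; they raise NotImplementedError instead of being guessed at.

Token = tuple[str, str]  # hypothetical: ("character", c) or ("comment", data)


def tokenize(html: str) -> list[Token]:
    tokens: list[Token] = []
    state = "data"
    comment = ""
    i = 0
    while i < len(html):
        c = html[i]
        if state == "data":
            if c == "<":
                state = "tag open"
            else:
                tokens.append(("character", c))
            i += 1
        elif state == "tag open":
            if c == "?":
                # Parse error (unexpected-question-mark-instead-of-tag-name),
                # but recoverable: create an empty comment token and
                # reconsume the "?" in the bogus comment state (no i += 1).
                comment = ""
                state = "bogus comment"
            elif c in "!/" or (c.isascii() and c.isalpha()):
                raise NotImplementedError(f"'<{c}' is out of scope for this sketch")
            else:
                # Anything else: emit "<" as text and reconsume in data state.
                tokens.append(("character", "<"))
                state = "data"
        elif state == "bogus comment":
            if c == ">":
                # End of the bogus comment: emit it and go back to data.
                tokens.append(("comment", comment))
                state = "data"
            elif c == "\0":
                comment += "\ufffd"  # NULL is replaced with U+FFFD
            else:
                comment += c
            i += 1
    # End of input: flush whatever the current state is holding.
    if state == "tag open":
        tokens.append(("character", "<"))
    elif state == "bogus comment":
        tokens.append(("comment", comment))
    return tokens
```

For example, `tokenize("<?>")` returns `[("comment", "?")]`. Because the ? is reconsumed rather than skipped, it ends up inside the comment's data, which is why developer tools display `<?>` as `<!--?-->`.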

And that is how we get a comment node for only 3 bytes!

For some fun homework, trace the states in the tokenizer specification to find out how </ this comment gets to bogus comment state>, or <!this comment>.
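If you want to check your answer after tracing it yourself, here is a sketch in Python that records the states a single `<...>` construct passes through. The `trace` helper is hypothetical and models only the spec branches that lead to the bogus comment state (the end tag open and markup declaration open "anything else" cases, plus the ? case); real tags, `<!--`, DOCTYPE and CDATA raise `NotImplementedError`.

```python
# A hypothetical helper that records which tokenizer states a single
# "<...>" construct passes through, following the HTML spec's tokenizer.
# Only the branches that lead to the bogus comment state are modelled;
# real tags, "<!--", DOCTYPE and CDATA raise NotImplementedError.


def trace(html: str) -> tuple[list[str], str]:
    """Return (states visited, comment data) for one construct starting with '<'."""
    state = "data"
    visited = [state]
    comment = ""
    i = 0

    def switch(new_state: str) -> None:
        nonlocal state
        state = new_state
        visited.append(new_state)

    while i < len(html):
        c = html[i]
        if state == "data":
            if c != "<":
                raise NotImplementedError("only constructs starting with '<' are traced")
            switch("tag open")
            i += 1
        elif state == "tag open":
            if c == "/":
                switch("end tag open")
                i += 1
            elif c == "!":
                switch("markup declaration open")
                i += 1
            elif c == "?":
                switch("bogus comment")  # reconsume the "?"
            else:
                raise NotImplementedError(f"'<{c}' is out of scope")
        elif state == "end tag open":
            if c == ">" or (c.isascii() and c.isalpha()):
                raise NotImplementedError("real end tags and '</>' are out of scope")
            switch("bogus comment")  # anything else: reconsume this character
        elif state == "markup declaration open":
            rest = html[i:]
            if rest.startswith("--") or rest[:7].upper() == "DOCTYPE" or rest.startswith("[CDATA["):
                raise NotImplementedError("comments, DOCTYPE and CDATA are out of scope")
            switch("bogus comment")  # nothing is consumed in this case
        elif state == "bogus comment":
            if c == ">":
                switch("data")
                return visited, comment
            comment += "\ufffd" if c == "\0" else c
            i += 1
    if state != "bogus comment":
        raise NotImplementedError("input ended before reaching the bogus comment state")
    return visited, comment  # EOF inside a bogus comment still emits it
```

Note that in the `</ ` case the space is reconsumed, so it becomes the first character of the comment's data, just as the ? does in `<?>`.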