How browsers parse HTML

By Jane Doe, April 2026 — a long-enough body so Readability accepts it.

This article explains the three big phases a browser goes through to turn HTML bytes into a DOM tree, the structure it later styles, lays out, and paints as pixels. It covers byte-stream decoding, tokenization, and DOM construction. Plenty of words here so Readability doesn't bail out on charThreshold.

Phases

  1. Decode bytes into characters
  2. Tokenize characters into tokens (start tags, end tags, text, comments)
  3. Build the DOM
    • Parents adopt children
    • Implicit elements get inserted
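
The third phase can be sketched as a stack of open elements: each start tag creates a node that the element on top of the stack adopts, and each end tag pops back to its match. What follows is a minimal sketch, not a spec-accurate parser. The `Token` and `TreeNode` shapes and the `buildTree` function are hypothetical names introduced here, and the only implicit-insertion rule shown is wrapping content in `html` when the input omits it. Real tree construction has many more insertion modes and recovery rules.

```typescript
// Hypothetical, simplified token and node shapes for illustration only.
type Token =
  | { kind: "start"; name: string }
  | { kind: "end"; name: string }
  | { kind: "text"; data: string };

interface TreeNode {
  name: string; // element name, or "#text" / "#document"
  data?: string; // text content for "#text" nodes
  children: TreeNode[];
}

function buildTree(tokens: Token[]): TreeNode {
  const root: TreeNode = { name: "#document", children: [] };
  const open: TreeNode[] = [root]; // stack of open elements

  // Parents adopt children: append to whichever element is currently open.
  const adopt = (node: TreeNode): void => {
    open[open.length - 1].children.push(node);
  };

  for (const token of tokens) {
    // Implicit element: if a start tag or text arrives while only the
    // document is open, and it is not <html> itself, insert <html> first.
    const isHtmlStart = token.kind === "start" && token.name === "html";
    if (open.length === 1 && !isHtmlStart && token.kind !== "end") {
      const html: TreeNode = { name: "html", children: [] };
      adopt(html);
      open.push(html);
    }

    if (token.kind === "start") {
      const el: TreeNode = { name: token.name, children: [] };
      adopt(el);
      open.push(el);
    } else if (token.kind === "text") {
      adopt({ name: "#text", data: token.data, children: [] });
    } else {
      // End tag: pop back to the matching open element, if there is one.
      // Index 0 is the document root and is never popped.
      for (let i = open.length - 1; i > 0; i--) {
        if (open[i].name === token.name) {
          open.length = i;
          break;
        }
      }
    }
  }
  return root;
}
```

Feeding it the tokens for `<p>hi</p>` yields a document whose only child is an implicit `html` element containing the `p`, which in turn holds the text node. An unmatched end tag is simply ignored in this sketch.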

An example

// tokenize() and toNode() are helpers assumed to be defined elsewhere.
function parse(input: string): Node[] {
  return tokenize(input).map(toNode);
}
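
The snippet above relies on `tokenize` and `toNode`, which the article does not show. One way to fill them in, purely as an illustrative sketch: the `Token` and `FlatNode` shapes and the regex-based tokenizer below are assumptions made here, and `FlatNode` is used instead of `Node` only to avoid clashing with the DOM's built-in type. A real HTML tokenizer is a state machine that also handles attributes, comments, DOCTYPE, and character references.

```typescript
// Hypothetical shapes; the article does not define Token, tokenize, or toNode.
type Token =
  | { kind: "start"; name: string }
  | { kind: "end"; name: string }
  | { kind: "text"; data: string };

interface FlatNode {
  type: "element-open" | "element-close" | "text";
  value: string;
}

// Very rough tokenizer: recognises <name ...>, </name>, and the text between.
// Attributes are skipped, and comments, DOCTYPE, and entities are not handled.
function tokenize(input: string): Token[] {
  const tokens: Token[] = [];
  const tagPattern = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>/g;
  let last = 0; // index just past the previous tag
  let match: RegExpExecArray | null;
  while ((match = tagPattern.exec(input)) !== null) {
    if (match.index > last) {
      tokens.push({ kind: "text", data: input.slice(last, match.index) });
    }
    const name = match[2].toLowerCase();
    tokens.push(match[1] === "/" ? { kind: "end", name } : { kind: "start", name });
    last = tagPattern.lastIndex;
  }
  if (last < input.length) {
    tokens.push({ kind: "text", data: input.slice(last) });
  }
  return tokens;
}

// One token maps to one flat node; no nesting is built at this stage.
function toNode(token: Token): FlatNode {
  switch (token.kind) {
    case "start":
      return { type: "element-open", value: token.name };
    case "end":
      return { type: "element-close", value: token.name };
    case "text":
      return { type: "text", value: token.data };
  }
}

function parse(input: string): FlatNode[] {
  return tokenize(input).map(toNode);
}
```

Calling `parse("<p>Hi</p>")` returns three flat entries: an open `p`, the text `Hi`, and a close `p`. Because `map` is one-to-one, the result is a list rather than a tree; nesting only appears once a tree-construction step consumes the tokens.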

"Browsers are the most complex software you'll touch." — old internet wisdom

Comparison table

Phase      Output
Decode     Code points
Tokenize   Tokens
Construct  DOM tree
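
To make the Decode phase concrete, here is a small sketch that turns UTF-8 bytes into code points using the standard `TextDecoder` API. The helper name `decodeToCodePoints` is hypothetical, and the encoding is hard-coded to UTF-8 as an assumption; a browser determines the encoding from signals such as the HTTP headers, a byte-order mark, or a `<meta charset>` declaration.

```typescript
// Hypothetical helper: decode UTF-8 bytes and list the resulting code points.
function decodeToCodePoints(bytes: Uint8Array): number[] {
  // Assumes UTF-8; a browser would detect the encoding before this step.
  const text = new TextDecoder("utf-8").decode(bytes);
  // Array.from iterates a string by code point, not by UTF-16 code unit.
  return Array.from(text, (ch) => ch.codePointAt(0) as number);
}

// "<p>é</p>" encoded as UTF-8: é (U+00E9) is the two bytes 0xC3 0xA9.
const bytes = new Uint8Array([0x3c, 0x70, 0x3e, 0xc3, 0xa9, 0x3c, 0x2f, 0x70, 0x3e]);
const codePoints = decodeToCodePoints(bytes);
// Nine bytes become eight code points: 0x3c 0x70 0x3e 0xe9 0x3c 0x2f 0x70 0x3e
```

The point of the example is that bytes and characters are not one-to-one: the tokenizer never sees `0xC3 0xA9`, only the single code point `U+00E9`.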

Read more in the follow-up post or look at the diagram. Any plain paragraph is fine; Readability just needs ~200 chars of running text to stay confident this is the article body. So here we are with extra padding to clear that bar without straining anyone's eyeballs.