llm2doc
Conversion · 7 min read · Updated April 2026

HTML → clean Markdown

The converter uses Turndown with the GFM plugin, so structural HTML round-trips faithfully — code blocks keep their language, tables keep their alignment, lists keep their depth. Inline styles and visual decoration don’t carry through, by design.
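
For orientation, a Turndown setup along these lines might look like the sketch below. It is not the converter's actual source: the option values are inferred from the output flavor this page describes, the list of removed tags is inferred from the "doesn't survive" notes, and htmlToMarkdown is a hypothetical wrapper name used only for illustration.

```typescript
import TurndownService from "turndown";
// Assumption: your project has type declarations for this plugin
// (or a local `declare module "turndown-plugin-gfm"`).
import { gfm } from "turndown-plugin-gfm";

// Options chosen to match the output flavor described on this page.
const turndown = new TurndownService({
  headingStyle: "atx",       // # ... ###### instead of underlined headings
  codeBlockStyle: "fenced",  // fenced blocks instead of indented ones
  bulletListMarker: "-",
  emDelimiter: "*",
  strongDelimiter: "**",
  hr: "---",
  linkStyle: "inlined",      // [text](url) rather than reference links
});

// GFM plugin: tables, strikethrough, task lists.
turndown.use(gfm);

// Elements with no Markdown equivalent are removed together with their content.
turndown.remove(["iframe", "video", "object", "form", "input", "button"]);

// Hypothetical wrapper, for illustration only.
export function htmlToMarkdown(html: string): string {
  return turndown.turndown(html);
}
```

Turndown keeps the children of elements it has no rule for, which is consistent with wrapper containers being unwrapped rather than deleted.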

Key facts
  • Engine: Turndown + turndown-plugin-gfm for tables, strikethrough, and task lists.
  • Output flavor: ATX headings (# ... ######), fenced code blocks (```language), - for bullets, inline link style.
  • What round-trips: Headings, paragraphs, lists (any depth), code blocks (with language hint), inline code, blockquotes, tables (alignment honored), images, links, bold, italic, strikethrough, task lists, horizontal rules.
  • What gets dropped: Inline styles, iframes, forms, data-* attributes, CSS classes. Wrapper <div>/<section> containers are unwrapped: the tags go, their children stay.
  • LaTeX detection: Custom Turndown rules detect .math-inline, .katex, and .katex-display spans, converting back to $ ... $ / $$ ... $$.
  • Privacy: Conversion runs entirely in your browser. No upload.
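
The LaTeX detection item above can be illustrated with a small sketch. It assumes the original TeX source is already available for each math span (how the converter recovers it is not stated here), and both the MathSpan shape and the mathToMarkdown name are hypothetical. The real rules work on DOM nodes inside Turndown.

```typescript
// Hypothetical description of a rendered math span.
interface MathSpan {
  classes: string[]; // class names on the span, e.g. ["katex-display"]
  tex: string;       // original TeX source, however the page exposes it
}

// Returns Markdown math, or null when the span is not math at all.
function mathToMarkdown(span: MathSpan): string | null {
  const has = (name: string): boolean => span.classes.indexOf(name) !== -1;
  const tex = span.tex.trim();

  // Display math is checked first, because a display wrapper
  // usually contains an inner .katex span as well.
  if (has("katex-display")) return "$$" + tex + "$$";
  if (has("math-inline") || has("katex")) return "$" + tex + "$";
  return null; // not math: leave it to the default rules
}
```

Whether the delimiters are padded with spaces is a formatting choice; this sketch emits them unpadded.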

The flow

  1. Switch input mode. Click HTML Input at the top of the editor (or use the keyboard shortcut). The left panel now expects HTML.
  2. Paste. Paste your HTML (full document or fragment, either works). The right panel renders the converted Markdown live.
  3. Copy or export. Use the Copy Markdown button to grab the source, or click MD in the export bar to download a .md file.

What round-trips cleanly

Headings

<h1>–<h6> → # to ######

Paragraphs

<p> → blank-line separated text

Lists

<ul>/<ol> → -/1., nested cleanly

Code blocks

<pre><code class='language-x'> → ```x fences with the language preserved
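
To make the language mapping concrete, here is a minimal sketch of how a class attribute could become a fence label. toFencedBlock is a hypothetical helper, not part of Turndown's API, and it takes the class string and code text directly rather than a DOM node.

```typescript
// Hypothetical helper: builds a fenced block from a <code> element's
// class attribute and its text content.
function toFencedBlock(className: string, code: string): string {
  // Pull "x" out of a class list such as "hljs language-x".
  const match = /(?:^|\s)language-(\S+)/.exec(className);
  const language = match ? match[1] : "";

  // Grow the fence until it no longer appears inside the code itself,
  // so code containing backtick runs cannot close the block early.
  let fence = "```";
  while (code.indexOf(fence) !== -1) fence += "`";

  // Drop one trailing newline so the closing fence sits right after the code.
  const body = code.replace(/\n$/, "");
  return fence + language + "\n" + body + "\n" + fence;
}
```

When no language- class is present, the fence is emitted without a label.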

Inline code

<code> → backticks

Links

<a href> → [text](url), or [text](url "title") when a title attribute is present

Images

<img src alt> → ![alt](src)

Tables

<table> → GFM pipe tables, alignment honored
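
In GFM, column alignment lives in the delimiter row under the header. The sketch below shows the idea; toPipeTable and the Align type are hypothetical names, and the function takes plain strings rather than table DOM nodes. It does not escape pipe characters inside cell text.

```typescript
type Align = "left" | "center" | "right" | "none";

// Delimiter cell for each alignment, using GFM colon markers.
const DELIMITER: Record<Align, string> = {
  left: ":--",
  center: ":-:",
  right: "--:",
  none: "---",
};

// Hypothetical helper: renders a header, its alignments, and rows as a pipe table.
function toPipeTable(headers: string[], aligns: Align[], rows: string[][]): string {
  const line = (cells: string[]): string => "| " + cells.join(" | ") + " |";
  // Columns without a stated alignment fall back to the plain delimiter.
  const delimiter = line(headers.map((_, i) => DELIMITER[aligns[i] ?? "none"]));
  return [line(headers), delimiter, ...rows.map(line)].join("\n");
}
```

A call such as toPipeTable(["Name", "Qty"], ["left", "right"], [["Apple", "3"]]) yields a three-line table whose second line carries the alignment markers.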

Bold / italic

<strong>/<em> → **bold** / *italic*

Blockquotes

<blockquote> → > prefix, nested where applicable

Horizontal rules

<hr> → ---

What doesn’t survive (and why that’s usually fine)

Inline styles

color, background, font-size: Markdown has no equivalent for any of them, so they're dropped silently. If the styling carries meaning, express it as text or as a semantic element (such as <strong> or <em>) before converting.

Wrapper divs / sections

Plain <div> and <section> get unwrapped — their children survive, the container doesn't. <article> and semantic landmarks behave the same way.

iframes / embeds

<iframe>, <video>, <object> have no Markdown equivalent. The converter strips them. If you need them in the output, leave the page as HTML or use a static HTML export.

Forms

<form>, <input>, <button> get dropped. Markdown isn't an interactive format.

Custom data attributes

data-* attributes don't survive. They're metadata for JS, not content.

Class-based theming

Tailwind classes, BEM, and CSS modules are all just strings to the converter — no styling carries through. Markdown styling is structural, not visual.

When this is the right tool

CMS migration

Moving from WordPress to a Markdown-based static site? Paste each post's HTML, copy the resulting Markdown, and drop it into your repo.

Web clipping

Save a clean, portable version of an article. Strip ads, sidebars, and theme noise — keep the prose.

Documentation cleanup

Inherited an HTML doc set? Convert it to Markdown for cleaner diffs in version control.

AI input prep

Many LLMs handle Markdown better than raw HTML. Pre-convert before pasting into a long-context prompt.

Pro tip: trim before you convert

The cleaner the HTML in, the cleaner the Markdown out. If you’re grabbing from a live web page, open DevTools, find the article body in the Elements panel, right-click and Copy → Copy outerHTML. That gives you just the prose, none of the navigation, ads, or theme wrappers — and the resulting Markdown will be near-publishable.

Try it on a real page

Switch to HTML Input, paste, watch the Markdown render. Re-export as DOCX or PDF if you need the document form.
