Transforming plain text to HTML for EPUB 3

Stewart Haines

6 Jan, 2026

[Image: screenshot of the SEED.html app]

This post summarizes the formats that I've experimented with in the SEED.html app to create EPUB 3 books. I'll present a minimal goal HTML fragment, then show, for each format, what the plain text source looks like and what html fragment it generates.

This is a list of the formats with links to various javascript implementations of parsers/converters that I've worked with in the SEED.html app.

Markdown (libraries include MarkdownIt, Showdown.js, CommonMark.js, Marked and Kramed, plus a custom ixml grammar via grammix)
Asciidoc (using asciidoctor.js)
Textile (see textile-lang.com)
Org mode (using org.js)
LaTeX (using latex.js)
Screenplays (using fountain.io)

The javascript libraries all run in the SEED.html app to convert plain text to html fragments. There is some further tweaking to turn these fragments into chapter XHTML content.
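
As a rough illustration of that further tweaking, here is a minimal sketch of wrapping a generated fragment in an XHTML chapter document. It is not the SEED.html code: the function names `closeVoidElements` and `wrapChapter` are hypothetical, and I'm assuming the tweaks amount to adding the XHTML namespace, a title, and self-closing void elements such as br.

```javascript
// Hypothetical helpers, not taken from SEED.html.

// Escape text for use inside XML element content.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// XHTML needs void elements to be self-closed: <br> becomes <br/>.
// Naive regex approach; it assumes attribute values contain no ">".
function closeVoidElements(html) {
  return html.replace(/<(br|hr|img|wbr)\b([^>]*?)\s*\/?>/g, "<$1$2/>");
}

// Wrap an html fragment in a minimal XHTML chapter document.
function wrapChapter(fragment, title, lang = "en") {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<!DOCTYPE html>",
    `<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" lang="${lang}" xml:lang="${lang}">`,
    "<head>",
    `  <title>${escapeXml(title)}</title>`,
    "</head>",
    "<body>",
    closeVoidElements(fragment.trim()),
    "</body>",
    "</html>",
  ].join("\n");
}

console.log(wrapChapter("<h1>Heading 1</h1>\n<p>First paragraph.</p>", "Heading 1"));
```

A string-based approach like this only suits markup that the converters generate predictably; anything messier is better handled with a real XML serializer.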

I've blogged previously about some of these setups, and there are a few more samples that you can see on the SEED.html app page. The others I haven't spent much time with, either because they're not a good fit for producing accessible EPUB materials or they don't meet my stated goal of being a simple general format for source content (Fountain and LaTeX respectively). They might be relevant for other types of EPUB content, such as MathML, that have not been my focus.

Markdown using an Invisible XML grammar
Markdown using MarkdownIt
Textile for Album Liner Notes
Fountain for screenplays
LaTeX sample because it was possible

The SEED.html app home page has additional sample EPUB projects that you can view and edit:

Introduction to Asciidoc sample using asciidoctor.js
Introduction to Org Mode sample using org.js

Format / Output Comparison

This is the goal output html fragment. It illustrates basic XHTML content for an EPUB chapter: headings, paragraphs, an unordered list, and inline semantic em and strong tags.

<h1>Heading 1</h1>
<p>First paragraph text with <em>emphasis</em>.</p>
<h2>Heading 2</h2>
<p>Second paragraph text with <strong>strong</strong>.</p>
<ul>
  <li>item one</li>
  <li>item two</li>
</ul>

The following sections show the various javascript libraries' plain text input and processed html output to give a flavour of what is possible with the SEED.html app.

Markdown

Here's the basic form of the plain text as markdown.

# Heading 1

First paragraph text with _emphasis_.

## Heading 2

Second paragraph text with **strong**.

* item one
* item two

Here's the output html fragment produced by the showdown.js parser. It generates nice and clean markup. Note that the heading elements acquire id attributes that are slugs generated from the text content.

<h1 id="heading1">Heading 1</h1>
<p>First paragraph text with <em>emphasis</em>.</p>
<h2 id="heading2">Heading 2</h2>
<p>Second paragraph text with <strong>strong</strong>.</p>
<ul>
  <li>item one</li>
  <li>item two</li>
</ul>
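
To make the id generation concrete, here is a minimal sketch that reproduces the ids shown above ("Heading 1" becomes "heading1"). It is an approximation for illustration, not showdown's actual code; `headingId` is a hypothetical name, and the numeric suffix for repeated headings is my own assumption, added because ids must be unique within a chapter.

```javascript
// Hypothetical helper: derive an id from heading text.
// `seen` tracks ids already issued so that repeats stay unique.
function headingId(text, seen = new Map()) {
  // Lowercase, then drop everything except ASCII letters and digits.
  const base = text.toLowerCase().replace(/[^a-z0-9]+/g, "") || "section";
  const count = seen.get(base) || 0;
  seen.set(base, count + 1);
  // First use keeps the bare id; later uses get a numeric suffix.
  return count === 0 ? base : `${base}-${count}`;
}

const seen = new Map();
console.log(headingId("Heading 1", seen)); // heading1
console.log(headingId("Heading 2", seen)); // heading2
console.log(headingId("Heading 1", seen)); // heading1-1
```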

I've mostly been using MarkdownIt rather than showdown.js because its plugin architecture lets me write custom handlers for non-standard formats. The custom guillemets delimiter handling that I describe in the Language Shift post was implemented as a MarkdownIt plugin.

Textile

Textile is a markup language (like Markdown) for formatting text in a blog or a content management system (CMS).

Out of the box it lets the author add id/class attributes. This example puts an id of #jump-here on the h2 element.

h1. Heading 1

First paragraph with _emphasis_.

h2(#jump-here). Heading 2

Second paragraph with *strong*.

* item one
* item two

The html fragment is equally clean, and there are a couple of additional features out of the box that I've found useful in authoring EPUBs.

Textile makes available em, i, strong and b. (Markdown only provides em and strong.) This flexibility is of value when looking in detail at EPUB accessibility, for example in read-aloud scenarios.

<h1>Heading 1</h1>
<p>First paragraph with <em>emphasis</em>.</p>
<h2 id="jump-here">Heading 2</h2>
<p>Second paragraph with <strong>strong</strong>.</p>
<ul>
  <li>item one</li>
  <li>item two</li>
</ul>
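
The block signature used in the source above (a tag name, an optional class/id group in parentheses, then a dot) is simple enough to sketch a parser for. This is a toy illustration under my own assumptions, not the Textile library's implementation: it handles only h1 to h6 and p, one line at a time, and ignores inline markup and escaping. `parseBlock` and `renderBlock` are hypothetical names.

```javascript
// Matches e.g. "h2(#jump-here). Heading 2" or "p(note#intro). Some text".
// Groups: tag, optional class, optional id, remaining text.
const BLOCK_SIGNATURE = /^(h[1-6]|p)(?:\(([^#)]*)(?:#([^)]+))?\))?\.\s+(.*)$/;

// Parse one line into its parts, or return null if it has no signature.
function parseBlock(line) {
  const match = BLOCK_SIGNATURE.exec(line);
  if (!match) return null;
  const [, tag, className, id, text] = match;
  return { tag, className: className || null, id: id || null, text };
}

// Render one line as an element; unsigned lines become plain paragraphs.
function renderBlock(line) {
  const block = parseBlock(line);
  if (!block) return `<p>${line}</p>`;
  const attrs =
    (block.id ? ` id="${block.id}"` : "") +
    (block.className ? ` class="${block.className}"` : "");
  return `<${block.tag}${attrs}>${block.text}</${block.tag}>`;
}

console.log(renderBlock("h1. Heading 1"));
console.log(renderBlock("h2(#jump-here). Heading 2"));
```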

Asciidoc

AsciiDoc is a plain text markup language for writing technical content.

Asciidoctor.js is the Ruby implementation translated into javascript using Opal. There are a lot of features, and the html fragment gets a lot of structural markup that's not necessarily a good fit for accessible EPUB.

= Heading 1

First paragraph text with _emphasis_.

== Heading 2

Second paragraph text with *strong*.

* item one
* item two

The generated HTML tends to have div wrappers where EPUB wants clean paragraphs. Some of this can be configured, but I found myself fighting unwanted structural markup.

Like showdown, it automatically adds id attributes on headings (Textile needs them spelled out in the source), which can be important for EPUB navigation and table of contents.

<h1 id="id-heading-1" class="sect0">Heading 1</h1>
<p>First paragraph text with <em>emphasis</em>.</p>
<div class="sect1">
  <h2 id="id-heading-2">Heading 2</h2>
  <div class="sectionbody">
    <p>Second paragraph text with <strong>strong</strong>.</p>
    <div class="ulist">
      <ul>
        <li>
          <p>item one</p>
        </li>
        <li>
          <p>item two</p>
        </li>
      </ul>
    </div>
  </div>
</div>
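
One way to deal with the wrappers after the fact is to post-process the fragment. Here is a minimal sketch, assuming the generated markup is as regular as the output above; `flattenWrappers` is a hypothetical name, and the regex approach is deliberately naive (a DOM-based pass would be more robust). It drops every div tag, keeping the content, and unwraps list items that hold a single paragraph.

```javascript
// Hypothetical post-processing step for predictable, generated markup.
function flattenWrappers(html) {
  return html
    // Remove opening and closing div tags, keeping what is between them.
    .replace(/<\/?div\b[^>]*>\s*/g, "")
    // Unwrap <li><p>text</p></li> when the item holds exactly one paragraph.
    .replace(/<li>\s*<p>((?:(?!<\/?p>)[\s\S])*)<\/p>\s*<\/li>/g, "<li>$1</li>");
}

const sample = [
  '<div class="ulist">',
  "  <ul>",
  "    <li>",
  "      <p>item one</p>",
  "    </li>",
  "  </ul>",
  "</div>",
].join("\n");

console.log(flattenWrappers(sample));
```

List items with more than one paragraph are left alone, since unwrapping them would change the structure.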

LaTeX

LaTeX is a document preparation system used for the communication and publication of scientific documents.

It's an odd fit for the goal of plain text source because the plain text is quite heavy with formatting instructions.

\documentclass{book}

\begin{document}

\chapter*{Heading 1}

First paragraph text with \emph{emphasis}.

\section*{Heading 2}

Second paragraph text with \bfseries{strong}.

\begin{itemize}
    \item item one
    \item item two
\end{itemize}

\end{document}

The generated markup has undesirable spans for inline emphasis/strong and the unordered list structure is a bit of a mess. The library is more interesting to me for its drawing capabilities than as a top-level html fragment generator. I will write more about this in another post.

<h1>Heading 1</h1>
<p>First paragraph text with <span class="it">emphasis</span>.</p>
<h2>Heading 2</h2>
<p>Second paragraph text with <span class="bf">strong</span><span class="bf">.</span></p>
<ul class="list">
   <li>
      <span class="itemlabel"><span class="hbox llap">•</span></span>
      <p>item one</p>
   </li>
   <li>
      <span class="itemlabel"><span class="hbox llap">•</span></span>
      <p>item two</p>
   </li>
</ul>

Org Mode

Org Mode is an authoring tool and a TODO lists manager for GNU Emacs.

org.js is a parser and converter for org-mode notation.

There's some messing around with title here to force an h1 tag in the output, but for a chapter I'd be happy with h2 as the chapter title.

#+title: Header 1

First paragraph text with /emphasis/.

** Header 2

Second paragraph text with *strong*.

- item one
- item two

The generated markup is clean but, like the LaTeX output, it marks inline text presentationally, here with i and b tags rather than semantic em and strong.

I like it and I'm leaving it on a list of formats to explore when I need more structured output than markdown or textile provide.

<h1>Header 1</h1>
<p>First paragraph text with <i>emphasis</i>.</p>
<h2 id="header-0-1"><span class="section-number">0.1</span>Header 2</h2>
<p>Second paragraph text with <b>strong</b>.</p>
<ul>
   <li>item one</li>
   <li>item two</li>
</ul>
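
If semantic tags are wanted, the i and b elements can be swapped after conversion. This is a minimal sketch of that idea, not something org.js offers as far as I know; `toSemanticInline` is a hypothetical name. A blanket swap is an editorial decision, because i and b have legitimate uses of their own.

```javascript
// Map presentational tags to their semantic counterparts.
const SEMANTIC_TAGS = { i: "em", b: "strong" };

// Rewrite <i> and <b> tags (opening and closing), keeping any attributes.
// The pattern does not touch tags such as <img> or <br>.
function toSemanticInline(html) {
  return html.replace(
    /<(\/?)(i|b)(\s[^>]*)?>/g,
    (match, slash, tag, attrs = "") => `<${slash}${SEMANTIC_TAGS[tag]}${attrs}>`
  );
}

console.log(toSemanticInline("<p>First paragraph text with <i>emphasis</i>.</p>"));
console.log(toSemanticInline("<p>Second paragraph text with <b>strong</b>.</p>"));
```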

Fountain

Fountain is a plain text markup language for screenwriting.

I find it interesting as a domain-specific plain text format that almost entirely does away with markup.

The example input here is contrived to render the goal output. This is not what the format is designed for. It's designed for screenplays and does a phenomenal job at that task.

There's a fountain sample epub on the SEED.html home page that uses it in a more idiomatic way to create an EPUB format screenplay. Check that out.

Title:
    Heading 1

First paragraph text with *emphasis*.

>Heading 2

Second paragraph text with **strong**.

<ul><li>item one </li><li>item two</li></ul>

<h1>Heading 1</h1>
<p>First paragraph text with <span class="italic">emphasis</span>.</p>
<h2 id="heading-smash-cut-to">Heading 2</h2>
<p>Second paragraph text with <span class="bold">strong</span>.</p>
<p></p>
<ul>
   <li>item one </li>
   <li>item two</li>
</ul>

Future directions

One direction is writing Invisible XML grammars and parsing/converting sources using Grammix. This seems really promising for encapsulating a grammar for custom markup within the EPUB itself.

I'm imagining a custom ixml grammar that takes textile as a starting point: block elements have an opening delimiter with a tag name and an optional id/class, plus some conventions for defining inline elements. Kind of riffing on the 'language shift' idea described in an earlier post.

Next up: other plain text formats for special purposes

In a future post I'll show the workings for my experiments around processing plain text representations for diagrams, music and custom transforms of code blocks in EPUB 3.

ABC music notation (using abcjs and abc2svg)
Syntax highlighting (using highlight.js and prism.js)
SVG Diagrams (using d3.js, mermaid.js and latex.js)
