Invisible XML to EPUB 3


Last week I read about Invisible XML for the first time. (I first saw it mentioned on Mastodon by @ndw@toot.wales.)

"Invisible XML is a language for describing the implicit structure of data, and a set of technologies for making that structure explicit as XML markup. It allows you to write a declarative description of the format of some text and then leverage that format to represent the text as structured information."

Given that the purpose of the SEED.html app is to transform plain text into XML for packaging as EPUB, I was more than curious. Curiosest?

There is a JavaScript parser called Grammix by Alain Couthures that was easily added as an extension script in a SEED.html project. The project then includes two very minimal ixml grammars: one for Markdown, the other for a subset of the Textile format.

I found the process of writing/developing a grammar worked well using John Lumley's web-based jωXML processor because the error reporting (via SaxonJS) is so informative and educational.

(Once the ixml grammar is transferred to the SEED.html app, any problems are kind of opaque, and poorly chosen rules have a tendency to hang the browser for many seconds.)
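
One cheap way to spot a slow rule before it becomes a long freeze is to time each parse on a short sample first. The sketch below is only an illustration: `timedParse` and its `warnAfterMs` parameter are hypothetical names, and `parseFn` stands in for whatever function actually runs the parse. It measures a synchronous call and cannot interrupt one; stopping a runaway parse would mean moving the work into a Web Worker.

```javascript
// Hypothetical helper (not part of SEED.html or Grammix):
// time a synchronous parse and warn when it is slower than expected.
// `parseFn` is any function that takes the text and returns a result.
function timedParse(parseFn, text, warnAfterMs = 500) {
  const start = Date.now();
  const result = parseFn(text);
  const elapsedMs = Date.now() - start;
  if (elapsedMs > warnAfterMs) {
    console.warn(`Parse took ${elapsedMs} ms for ${text.length} characters`);
  }
  return { result, elapsedMs };
}
```

Trying a new rule on a few lines of text and then on progressively longer samples should show whether the parse time grows gently or explodes.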

The core idea has obvious value: the plain text content and the grammar that transforms it to XHTML are expressed in a concise way that travels with the content. Only one parser is needed for a range of distinct grammars, hence the combined Markdown/Textile content in the sample below.

Here's the content of the transformText.js file that implements the ixml grammars in the EPUB file linked below. Provided here in case you recognize Invisible XML grammars. :)


/* textile-inspired grammar generated by Gemini 3
 */
const textile_grammar = `
body = block++sep, -sep?.
-sep = -#a+.

-block = heading ; p.

-heading: h1 ; h2 ; h3 ; h4 ; h5 ; h6.
h1 = -"h1", attributes?, -". ", inline.
h2 = -"h2", attributes?, -". ", inline.
h3 = -"h3", attributes?, -". ", inline.
h4 = -"h4", attributes?, -". ", inline.
h5 = -"h5", attributes?, -". ", inline.
h6 = -"h6", attributes?, -". ", inline.

p = p-tag ; p-plain.
-p-tag = -"p", attributes?, -". ", inline.
-p-plain = not-trigger, inline? .
-not-trigger = ~["h"; "p"; "*"; "_"; "%"; #a]
             ; ("h", ~["1"; "2"; "3"; "4"; "5"; "6"; #a])
             ; ("p", ~["("; "."; #a]).

-inline = (strong ; em ; span ; plain)+.

strong = -"*", (em ; span ; plain)+, -"*" .
em = -"_", (strong ; span ; plain)+, -"_" .
span = -"%", attributes?, (strong ; em ; plain)+, -"%" .

-plain = char.
-char = ~["*"; "%"; #a].

-attributes = -"(", (class, id? ; id), -")" .
@class = [L; N; "-"; "_"]+ .
@id = -"#", [L; N; "-"; "_"]+ .
`;

/* grammar lightly modified from 
 * https://homepages.cwi.nl/~steven/ixml/advanced/tutorial.xhtml
 * and
 * https://homepages.cwi.nl/~steven/ixml/advanced/examples/markdown.ixml
 */
const markdown_grammar = `
     body: part++(-#a+).
    -part: heading; para.
 -heading: h1; h2; h3; h4; h5; h6.
    -para: p; pre.

       h1: -"# ", htext, -"#"*, -#a.
       h2: -"## ", htext, -"#"*, -#a.
       h3: -"### ", htext, -"#"*, -#a.
       h4: -"#### ", htext, -"#"*, -#a.
       h5: -"##### ", htext, -"#"*, -#a.
       h6: -"###### ", htext, -"#"*, -#a.
   -htext: hc+.
      -hc: ~["#"; #a]; "#", ~["#"; #a].

        p: ~["# "], line++nl, -#a.
    -line: c+.
      -nl: #a.
       -c: ~[#a; "*_\`["]; em; strong; code; a.

        a: -"[", -text, -"](", @href, -")".
    -text: ~["]"]*.
    @href: ~[")"]*.

   strong: -"**", cstar+ , -"**";
           -"__", cunder+ , -"__".

       em: -"*", ~["*"], cstar+, ~["*"], -"*";
           -"_", ~["_"], cunder+, ~["_"], -"_".

     code: -"\`", ccode+, -"\`".

   -cstar: ~["*"; #a].
  -cunder: ~["_"; #a].
   -ccode: ~["\`"; #a].

      pre: (-" ", preline)++nl, -#a.
 -preline: ~[#a]*.
`;

/**
 * Convert simple text to well-formed XHTML
 * @param {string} text - plain text
 * @param {string|undefined} idref - Spine item idref for context-aware transforms
 * @returns {string} XHTML output, or the error message if parsing fails
 */
function transformText(text, idref) {
  try {
    // Spine items whose idref contains '_t' use the textile grammar;
    // everything else, including an undefined idref, uses markdown.
    const isTextile = typeof idref === 'string' && idref.includes('_t');
    const currentGrammar = grammix.fromIXml(isTextile ? textile_grammar : markdown_grammar);
    const fragment = currentGrammar.parse(text);
    return fragment.toIndentedString();
  } catch (err) {
    // Return a string rather than the Error object so the declared return type holds.
    return String(err);
  }
}
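
transformText compiles a grammar with grammix.fromIXml on every call. If that step turns out to be expensive, one option is to compile each grammar once and reuse it. The sketch below is an assumption-laden illustration, not how SEED.html or Grammix actually works: `makeGrammarCache` and `grammarNameFor` are hypothetical helpers, and the compile step is passed in as a function so nothing is assumed about the parser beyond the calls already shown above.

```javascript
// Hypothetical helper (not part of SEED.html or Grammix): compile each grammar
// once and hand back the same compiled object on later calls.
// `compile` is injected so the sketch assumes nothing about the parser itself.
function makeGrammarCache(compile) {
  const compiled = new Map();
  return function getGrammar(name, source) {
    if (!compiled.has(name)) {
      compiled.set(name, compile(source));
    }
    return compiled.get(name);
  };
}

// Hypothetical helper: the same '_t' idref convention as transformText,
// written so that an undefined idref falls back to markdown.
function grammarNameFor(idref) {
  return typeof idref === 'string' && idref.includes('_t') ? 'textile' : 'markdown';
}
```

In the app this could be wired up as `const getGrammar = makeGrammarCache((src) => grammix.fromIXml(src));`, with `getGrammar(grammarNameFor(idref), source)` standing in for the direct fromIXml calls.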

The EPUB file itself doesn't contain anything interesting. If you'd like to peek under the hood, you'll need to open it with the 'Edit in SEED.html' button. Simples!
