Export to Origami from Author

Case study: Document prepared for the Author development team


1. The .liquid Package: What You Have

The .liquid format is a macOS package (directory) containing:

Document.liquid/
└── Contents/
    ├── Content.rtfd/TXT.rtf      ← Main document text (RTF)
    ├── Content.liquidstore        ← Binary plist (NSAttributedString archive)
    ├── Author.plist               ← Document metadata (author, title, institution)
    ├── Citations.plist            ← Full citation database, keyed by UUID
    ├── DynamicView.json           ← Map view: nodes, positions, connections
    ├── LeftMarginDynamicView.json ← Left margin map data
    ├── RightMarginDynamicView.json← Right margin map data
    ├── glossary.json              ← Defined concepts with internal link IDs
    ├── Cuttings.plist             ← Clippings/excerpts (binary RTF data)
    ├── Version.plist              ← Format version, font size, citation style
    ├── endnotes.json              ← Endnotes (empty in sample)
    ├── inlineNotes.json           ← Inline notes (empty in sample)
    ├── customJSON.json            ← Custom JSON payload (null in sample)
    ├── Comments.plist             ← Comments (empty in sample)
    └── Hidden.plist               ← Hidden content (empty in sample)

Key structural observations

Content encoding: The main text is RTF with Cocoa extensions (\cocoartf2870). Headings are distinguished by font and size: title is Baskerville \fs50, section headings are Baskerville \fs38, body text is TimesNewRomanPSMT \fs34. Key/opening sentences use colour index \cf4 (brown/amber); body text uses \cf3 (dark grey).

Citations in text: Inline citations appear as parenthetical text within the RTF (e.g. (Halevi, Moed, Bar-Ilan 2015)), and the full citation records live in Citations.plist keyed by UUID. Each citation record contains: title, citationAuthors (array of first/last/middle/prefix/suffix), yearComponent, doi, webAddress, journal, volume, pageRange, bibTeXType, note, abstract, and more.
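For prototyping outside the Author codebase, the citation database can be read with Python's standard plistlib, which detects XML and binary plists automatically. This is a minimal sketch, not Author's own loader: the helper name load_citations is hypothetical, and the path simply follows the package layout shown above.

```python
import plistlib
from pathlib import Path

def load_citations(package_dir):
    """Return the citation records from Contents/Citations.plist, keyed by UUID."""
    path = Path(package_dir) / "Contents" / "Citations.plist"
    with path.open("rb") as fh:
        # plistlib.load auto-detects XML vs binary plist encoding.
        return plistlib.load(fh)
```

Each value in the returned dictionary is one citation record, so fields such as title or yearComponent can be read directly by key.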

The ID Mismatch Problem: Four separate UUID spaces exist:

System                   Example ID               Location
Map nodes                D10045AE-F24E-4A3B-...   DynamicView.json → nodes[].identifier
Glossary entries         98884DEF-5EE8-4BC4-...   glossary.json → entries{}.identifier
Glossary internal links  18A6C556-E84D-40C4-...   glossary.json → entries{}.internalLinkId
Citations                C3736192-02AE-4635-...   Citations.plist → keys

These must be unified at export time (see Stage 2 below).


2. Target: EPUB 3 Structure

The output EPUB is a ZIP file with this layout:

document.epub (ZIP)
├── mimetype                          ← MUST be first entry, uncompressed
├── META-INF/
│   └── container.xml                 ← Points to the OPF package
├── OEBPS/
│   ├── content.opf                   ← Package manifest + metadata + spine
│   ├── nav.xhtml                     ← EPUB 3 navigation document (TOC)
│   ├── content.xhtml                 ← The document body
│   └── style.css                     ← Minimal stylesheet
│   [Stage 2 additions:]
│   ├── origami-meta/
│   │   ├── citations.json            ← Full citation database as JSON
│   │   ├── glossary.json             ← Defined concepts
│   │   ├── spatial-layout.json       ← Unified XR/Map layout data
│   │   └── visual-meta.json          ← Visual-Meta block (document's own BibTeX etc.)
│   └── [any images from Content.rtfd]

STAGE 1: Basic Valid EPUB 3

Goal: produce an EPUB that passes EPUBCheck and opens correctly in Apple Books, Calibre, Thorium, and Kobo.

Step 1.1 — Parse the RTF into semantic HTML

This is the primary engineering task. The RTF parser must:

Detect headings by font/size pattern:

  • \f0\fs50 (Baskerville 25pt) → <h1> (document title)
  • \f0\fs38 (Baskerville 19pt) → <h2> (section headings)
  • \f1\fs34 (TimesNewRoman 17pt) → <p> (body paragraphs)
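The font/size rule above can be sketched as a small lookup. RTF \fs values are in half-points, so \fs50 is 25pt. This is an illustration in Python rather than Author's parser: the names BLOCK_TAG_BY_STYLE and classify_block are hypothetical, and it assumes the \fN control word is immediately followed by \fsN, as in the patterns listed. A real parser should resolve font indices through the RTF \fonttbl instead of hard-coding \f0 and \f1.

```python
import re

# (font index, size in half-points) -> XHTML block tag, per the pattern above.
# Assumption: \f0 is Baskerville and \f1 is TimesNewRomanPSMT in this document.
BLOCK_TAG_BY_STYLE = {
    ("f0", 50): "h1",
    ("f0", 38): "h2",
    ("f1", 34): "p",
}

_STYLE_RE = re.compile(r"\\(f\d+)\\fs(\d+)")

def classify_block(rtf_run):
    """Return the block tag for a paragraph's leading style run; default to <p>."""
    m = _STYLE_RE.search(rtf_run)
    if not m:
        return "p"
    return BLOCK_TAG_BY_STYLE.get((m.group(1), int(m.group(2))), "p")
```

Unknown combinations fall back to a paragraph, so an unexpected font never produces a spurious heading.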

Convert formatting:

  • \f3\b (TimesNewRomanPS-BoldMT) → <strong>
  • \f2\i (TimesNewRomanPS-ItalicMT) → <em>
  • RTF bullet lists (\li300\fi-300 with \'95) → <ul><li>
  • Tab-indented paragraphs → new <p> (not &nbsp; indentation)
  • \cf4 colour text (key sentences) → at this stage, simply merge into <p> with no special markup (Stage 2 adds <span class="key-sentence">)
  • RTF \'92 → ’ (right single quote), \'91 → ‘ (left single quote), \'93 → “ (left double quote), \'94 → ” (right double quote), \'95 → bullet, \'97 → em dash
  • RTF HYPERLINK fields → <a href="...">
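The \'xx escapes in the list above are single bytes in the document's ANSI code page, so they can be decoded generically rather than through a hand-written table. The sketch below assumes Windows-1252, which matches the mappings listed; a real parser should take the code page from the \ansicpg control word in the RTF header. The function name is hypothetical.

```python
import re

_HEX_ESCAPE = re.compile(r"\\'([0-9a-fA-F]{2})")

def decode_rtf_hex_escapes(text, codepage="cp1252"):
    """Replace each \\'xx escape with the character that byte encodes."""
    def _sub(match):
        byte = bytes([int(match.group(1), 16)])
        # Bytes undefined in the code page become U+FFFD instead of raising.
        return byte.decode(codepage, errors="replace")
    return _HEX_ESCAPE.sub(_sub, text)
```

For example, decode_rtf_hex_escapes(r"Author\'92s") yields the string "Author’s" with a typographic apostrophe.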

Produce valid XHTML:

  • XML declaration: <?xml version="1.0" encoding="UTF-8"?>
  • XHTML namespace: <html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
  • Void elements self-closed (<br/>, <img/>)
  • All attribute values quoted
  • No unclosed tags

Minimal output structure:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" xml:lang="en">
<head>
  <meta charset="UTF-8"/>
  <title>Origami Text</title>
  <link rel="stylesheet" type="text/css" href="style.css"/>
</head>
<body>
  <h1>Origami Text</h1>
  <p><em>Minimal EPUB, Rich Metadata</em></p>
  <h2>Why Current Scholarly Formats Fall Short</h2>
  <p>To solve the urgent, complex problems...</p>
  <!-- etc. -->
</body>
</html>

Step 1.2 — Generate the navigation document (nav.xhtml)

Walk the heading structure from Step 1.1 and generate:

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head><title>Table of Contents</title></head>
<body>
  <nav epub:type="toc" id="toc">
    <h1>Contents</h1>
    <ol>
      <li><a href="content.xhtml#sec-why-current">Why Current Scholarly Formats Fall Short</a></li>
      <li><a href="content.xhtml#sec-origami-approach">The Origami Approach</a></li>
      <!-- one <li> per <h2> in content.xhtml -->
    </ol>
  </nav>
</body>
</html>

Each <h2> in the content needs a corresponding id attribute (e.g. id="sec-origami-approach") for the TOC links. Generate these as slugified heading text or sequential IDs.
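A minimal sketch of the slug approach, in Python for illustration. The function names are hypothetical, and the slugs use the full heading text, so they come out longer than the hand-shortened examples above (sec-the-origami-approach rather than sec-origami-approach). Duplicate headings get a numeric suffix so that every id stays unique within content.xhtml.

```python
import re
from xml.sax.saxutils import escape

def slugify(heading, prefix="sec-"):
    """Lowercase, collapse non-alphanumerics to hyphens, and add a letter prefix."""
    slug = re.sub(r"[^a-z0-9]+", "-", heading.lower()).strip("-")
    return prefix + slug

def unique_ids(headings):
    """One id per heading, suffixing repeats with -2, -3, ..."""
    seen, ids = {}, []
    for heading in headings:
        base = slugify(heading)
        seen[base] = seen.get(base, 0) + 1
        ids.append(base if seen[base] == 1 else f"{base}-{seen[base]}")
    return ids

def nav_items(headings, target="content.xhtml"):
    """Build the <li> lines for the TOC <ol>, escaping heading text for XML."""
    return [
        f'<li><a href="{target}#{hid}">{escape(h)}</a></li>'
        for h, hid in zip(headings, unique_ids(headings))
    ]
```

The same unique_ids output must be written onto the <h2> elements in content.xhtml, otherwise the TOC links will not resolve.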

Step 1.3 — Create the minimal stylesheet (style.css)

body {
  font-family: serif;
  line-height: 1.6;
  margin: 1em;
}
h1 { font-size: 1.8em; margin-bottom: 0.3em; }
h2 { font-size: 1.3em; margin-top: 1.5em; margin-bottom: 0.5em; }
p { margin: 0.5em 0; text-indent: 0; }
ul { margin: 0.5em 0 0.5em 1.5em; }
a { color: inherit; text-decoration: underline; }
strong { font-weight: bold; }
em { font-style: italic; }

Keep this deliberately sparse — this is the Origami philosophy of minimal formatting.

Step 1.4 — Create the package document (content.opf)

Read Author.plist for metadata. Generate a UUID-based unique identifier for the EPUB. The dcterms:modified value must be a UTC timestamp of the form CCYY-MM-DDThh:mm:ssZ.

<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="uid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="uid">urn:uuid:[GENERATE-UUID]</dc:identifier>
    <dc:title>Origami Article</dc:title>
    <dc:creator>Frode Alexander Hegland</dc:creator>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">[ISO-8601-TIMESTAMP]</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="content" href="content.xhtml" media-type="application/xhtml+xml"/>
    <item id="css" href="style.css" media-type="text/css"/>
  </manifest>
  <spine>
    <itemref idref="content"/>
  </spine>
</package>
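The two placeholders in the package document can be filled as follows. This is a Python sketch with hypothetical function names. It formats the timestamp in UTC with a trailing Z and no fractional seconds, which is the form EPUB 3 requires for dcterms:modified.

```python
import uuid
from datetime import datetime, timezone

def new_package_identifier():
    """Value for <dc:identifier>: a random (version 4) UUID as a URN."""
    return f"urn:uuid:{uuid.uuid4()}"

def modified_timestamp(now=None):
    """Value for dcterms:modified, e.g. 2026-05-27T12:44:00Z."""
    now = now or datetime.now(timezone.utc)
    return now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

It may be worth storing the generated identifier in the .liquid package so that a re-export of the same document keeps the same identifier and only the modified timestamp changes; that is a suggestion, not something the sample package already does.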

Step 1.5 — Create META-INF/container.xml

This is boilerplate:

<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>

Step 1.6 — Create the mimetype file

A plain text file containing exactly: application/epub+zip — no trailing newline, no BOM.

Step 1.7 — ZIP packaging

This is a common failure point for first-time EPUB builders. The rules:

  1. The mimetype file MUST be the first entry in the ZIP archive
  2. The mimetype file MUST be stored with NO compression (store method)
  3. The mimetype file MUST NOT have an extra field in its ZIP header
  4. All other files use standard deflate compression

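In Swift, using whichever ZIP library Author adopts:

// Pseudocode: ZIPArchive and addEntry are placeholders, not a real library API.
let archive = ZIPArchive(path: outputPath)
// mimetype goes first, stored uncompressed, with no extra field
archive.addEntry("mimetype", data: mimetypeData, compression: .none)
// every other entry is deflated
archive.addEntry("META-INF/container.xml", data: containerData, compression: .deflate)
archive.addEntry("OEBPS/content.opf", data: opfData, compression: .deflate)
archive.addEntry("OEBPS/nav.xhtml", data: navData, compression: .deflate)
archive.addEntry("OEBPS/content.xhtml", data: contentData, compression: .deflate)
archive.addEntry("OEBPS/style.css", data: cssData, compression: .deflate)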
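For a runnable reference to compare the Swift output against, the packaging rules can be reproduced with Python's standard zipfile module. The function name write_epub is hypothetical. An explicit ZipInfo for the mimetype entry leaves the extra field empty and stores the entry uncompressed.

```python
import zipfile

def write_epub(output_path, files):
    """files: dict of archive path -> bytes, for everything except mimetype."""
    with zipfile.ZipFile(output_path, "w") as zf:
        # Rule 1-3: first entry, stored (no compression), no extra field.
        zf.writestr(zipfile.ZipInfo("mimetype"), b"application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        # Rule 4: everything else is deflated.
        for name, data in files.items():
            zf.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
```

In a correctly packaged file the bytes "mimetype" start at offset 30 and "application/epub+zip" at offset 38, which is a quick check to run on the Swift output as well.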

Step 1.8 — Validation

Run EPUBCheck (https://github.com/w3c/epubcheck) on every generated EPUB during development. It catches namespace errors, missing manifest entries, invalid XHTML, and mimetype packaging issues. Make this part of the test suite.

Step 1.9 — Reader testing

Test in at minimum: Apple Books (macOS + iOS), Calibre, Thorium Reader, and Kobo. Apple Books is the most important given Author’s platform, and it has specific quirks around nav document handling.

Estimated effort for Stage 1: 2–3 weeks (one developer, assuming familiarity with the Author codebase and RTF parsing).


STAGE 2: Origami Text EPUB (Addressable, Metadata-Rich)

This builds on the valid EPUB from Stage 1 and adds the three pillars: addressing, metadata, and the unified ID table.

Step 2.1 — High-resolution addressing (Purple Numbers)

Every structural element in content.xhtml gets a unique, immutable id attribute:

<h2 id="ot-001">Why Current Scholarly Formats Fall Short</h2>
<p id="ot-002">To solve the urgent, complex problems facing our world...</p>
<p id="ot-003">Neither PDF nor HTML, as currently deployed...</p>
<ul>
  <li id="ot-004"><strong>PDF</strong> provides stability...</li>
  <li id="ot-005"><strong>HTML</strong> delivers exceptional interactivity...</li>
</ul>

ID format recommendation: ot-NNN (sequential, zero-padded to three digits for documents under 1000 elements; extend as needed). Alternatively, use short stable hashes. The key requirement: IDs must be immutable once assigned — they are the citing address and must not change if the document is re-exported.

Author must persist these IDs in the .liquid format (perhaps in the liquidstore or a new addressing.json), so that re-export produces the same IDs for the same content.
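One way to meet the immutability requirement is an allocator whose state is saved with the document. The sketch below is in Python and rests on an assumption the source does not confirm: that every block has some stable internal key (called block_key here) that survives edits. The class name and the JSON state layout are hypothetical, for example as the contents of an addressing.json.

```python
import json

class AnchorAllocator:
    """Hands out ot-NNN anchors; an anchor is never renumbered or reused."""

    def __init__(self, state_json=None):
        state = json.loads(state_json) if state_json else {"next": 1, "assigned": {}}
        self.next = state["next"]
        self.assigned = dict(state["assigned"])  # block key -> anchor

    def anchor_for(self, block_key):
        """Return the existing anchor for a block, or allocate the next number."""
        if block_key not in self.assigned:
            self.assigned[block_key] = f"ot-{self.next:03d}"
            self.next += 1
        return self.assigned[block_key]

    def to_json(self):
        """Serialise the state for storage inside the .liquid package."""
        return json.dumps({"next": self.next, "assigned": self.assigned})
```

Because the counter only moves forward, deleting a paragraph retires its anchor rather than freeing the number, and a paragraph inserted mid-document gets the next unused number instead of shifting its neighbours.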

Step 2.2 — Citation markup in the body

Replace inline parenthetical citations with semantic markup:

Before (Stage 1):

<p>...researchers routinely accumulate large local collections
(Halevi, Moed, Bar-Ilan 2015), managed through...</p>

After (Stage 2):

<p id="ot-027">...researchers routinely accumulate large local collections
<a class="origami-cite" data-cite-id="C3736192-02AE-4635-AD53-8DC896A6F500"
   href="#ref-C3736192">(Halevi, Moed &amp; Bar-Ilan 2015)</a>, managed through...</p>

This requires matching the inline citation text in the RTF to the corresponding entry in Citations.plist. The matching logic should use author-year tuples from the citation records.
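The author-year matching can be sketched as below, in Python. The record keys citationAuthors and yearComponent come from the citation structure described earlier; the author sub-key "last" is an assumption based on the first/last/middle naming, and all function names are hypothetical. Matches are returned as lists so that ambiguous cases (same authors, same year) and misses (empty list) can be flagged for review rather than resolved silently.

```python
import re

def citation_key(family_names, year):
    """Normalised (authors, year) tuple used on both sides of the match."""
    return (tuple(n.strip().lower() for n in family_names), str(year))

def index_citations(records):
    """records: dict of UUID -> record with 'citationAuthors' and 'yearComponent'."""
    index = {}
    for uid, rec in records.items():
        names = [a["last"] for a in rec["citationAuthors"]]  # 'last' is an assumed key
        index.setdefault(citation_key(names, rec["yearComponent"]), []).append(uid)
    return index

_PAREN = re.compile(r"\(([^()]+?)\s+(\d{4})\)")

def match_inline(text, index):
    """Yield (matched_text, [uuid, ...]) for each parenthetical author-year citation."""
    for m in _PAREN.finditer(text):
        names = re.split(r"\s*(?:,|&|\band\b)\s*", m.group(1))
        key = citation_key([n for n in names if n], m.group(2))
        yield m.group(0), index.get(key, [])
```

This handles only the "(Name, Name, Name Year)" shape seen in the sample; forms such as "et al." or multiple citations in one parenthesis would need additional rules.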

Step 2.3 — Key sentences

The RTF uses \cf4 to mark key/opening sentences (rendered in a brown/amber colour in Author). In Stage 2, wrap these in:

<span class="key-sentence">The rise of large language models has made the
format question unexpectedly urgent.</span>

This gives LLMs and reader software a semantic handle on the document’s argumentative structure — something no current format provides.
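Assuming the RTF parser already yields each paragraph as a sequence of (text, colour index) runs, the wrapping step is small. The sketch below is in Python with hypothetical names; colour index 4 is specific to the sample document's colour table, so a real implementation should resolve it through \colortbl rather than hard-coding it.

```python
from xml.sax.saxutils import escape

KEY_SENTENCE_COLOUR = 4  # \cf4 in the sample document (assumption: not universal)

def runs_to_xhtml(runs):
    """runs: list of (text, colour_index) in document order for one paragraph."""
    parts = []
    for text, colour in runs:
        body = escape(text)  # escape &, < and > for XHTML
        if colour == KEY_SENTENCE_COLOUR:
            body = f'<span class="key-sentence">{body}</span>'
        parts.append(body)
    return "".join(parts)
```

The result is the inner content of the paragraph; the surrounding <p id="..."> element is added by the block-level step.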

Step 2.4 — Embed citation metadata as JSON

Create OEBPS/origami-meta/citations.json by serialising the entire Citations.plist into clean JSON. Strip the binary RTF authorInfodata fields and empty fields. Structure:

{
  "format": "origami-citations-v1",
  "entries": [
    {
      "id": "C3736192-02AE-4635-AD53-8DC896A6F500",
      "type": "article",
      "title": "Accessing, Reading and Interacting with Scientific Literature as a Factor of Academic Role",
      "authors": [
        {"given": "Gali", "family": "Halevi"},
        {"given": "Henk F.", "family": "Moed"},
        {"given": "Judit", "family": "Bar-Ilan"}
      ],
      "year": 2015,
      "journal": "Publishing Research Quarterly",
      "volume": "31",
      "pages": "102--121",
      "doi": "10.1007/s12109-015-9404-9",
      "url": "",
      "bibtex": "@article{Halevi2015, author = {Halevi, Gali and Moed, Henk F. and Bar-Ilan, Judit}, ...}"
    }
  ]
}

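Also generate a BibTeX entry for this document itself (the self-citation entry), using data from Author.plist. This entry is stored in visual-meta.json (see Step 2.8).

Add both files, citations.json and visual-meta.json, to the OPF manifest (see Step 2.9).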
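The plist-to-JSON serialisation for citations can be sketched as a field mapping plus a filter. This is Python with hypothetical names. The plist keys on the left of FIELD_MAP are the ones named in the citation record description; the author sub-keys "first" and "last" are assumptions. Binary values and empty values are dropped, so an empty webAddress produces no "url" key at all.

```python
import json

FIELD_MAP = {  # plist key -> JSON key
    "title": "title", "bibTeXType": "type", "yearComponent": "year",
    "journal": "journal", "volume": "volume", "pageRange": "pages",
    "doi": "doi", "webAddress": "url",
}

def clean_record(uid, rec):
    """Convert one Citations.plist record to the JSON entry shape."""
    out = {"id": uid}
    for src, dst in FIELD_MAP.items():
        value = rec.get(src)
        # Skip binary blobs and empty values.
        if isinstance(value, (bytes, bytearray)) or value in (None, "", [], {}):
            continue
        out[dst] = value
    out["authors"] = [
        {"given": a.get("first", ""), "family": a.get("last", "")}  # assumed sub-keys
        for a in rec.get("citationAuthors", [])
    ]
    return out

def citations_json(records):
    """Serialise all records, sorted by UUID for stable output across exports."""
    entries = [clean_record(uid, rec) for uid, rec in sorted(records.items())]
    return json.dumps({"format": "origami-citations-v1", "entries": entries},
                      ensure_ascii=False, indent=2)
```

Only keys listed in FIELD_MAP are copied, so binary fields never reach the JSON even if new ones are added to the plist later.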

Step 2.5 — Embed glossary/defined concepts

Serialise glossary.json into OEBPS/origami-meta/glossary.json, cleaning up the structure:

{
  "format": "origami-glossary-v1",
  "entries": [
    {
      "id": "98884DEF-5EE8-4BC4-94C3-0765448AA480",
      "phrase": "1. 21st Century Origami Text",
      "description": "21st Century Origami Text",
      "tag": "section",
      "internal_anchor": "ot-001",
      "citation_ids": []
    }
  ]
}

Note the internal_anchor field — this is where the ID unification happens (see Step 2.7).

Step 2.6 — Embed spatial/XR layout data

This is the key innovation. Create OEBPS/origami-meta/spatial-layout.json from DynamicView.json, unified with glossary and citation references:

{
  "format": "origami-spatial-v1",
  "views": [
    {
      "name": "main",
      "source": "DynamicView",
      "settings": {
        "nodeVisibility": "showAll"
      },
      "nodes": [
        {
          "id": "D10045AE-F24E-4A3B-A917-B9D29F4E8CBA",
          "type": "text",
          "name": "1. 21st Century Origami Text",
          "content_anchor": "ot-001",
          "glossary_id": "98884DEF-5EE8-4BC4-94C3-0765448AA480",
          "citation_ids": [],
          "position": {"x": 0, "y": 0, "z": 0},
          "is_hidden": false
        }
      ],
      "connections": []
    },
    {
      "name": "left-margin",
      "source": "LeftMarginDynamicView",
      "nodes": [],
      "connections": []
    },
    {
      "name": "right-margin",
      "source": "RightMarginDynamicView",
      "nodes": [],
      "connections": []
    }
  ]
}

Step 2.7 — The Unified ID Table (solving the mismatch)

This is the critical architectural piece. Create an internal lookup that maps between all four ID namespaces at export time:

{
  "format": "origami-id-map-v1",
  "mappings": [
    {
      "content_anchor": "ot-001",
      "map_node_id": "D10045AE-F24E-4A3B-A917-B9D29F4E8CBA",
      "glossary_id": "98884DEF-5EE8-4BC4-94C3-0765448AA480",
      "glossary_link_id": "18A6C556-E84D-40C4-938D-02F6286BDDB4",
      "citation_ids": [],
      "type": "section",
      "label": "1. 21st Century Origami Text"
    }
  ]
}

How to build this table:

  1. Parse the RTF and assign sequential ot-NNN anchors to every block element.
  2. For each Map node in DynamicView.json, match its name field against heading text in the document to find the corresponding ot-NNN anchor.
  3. For each glossary entry, use its internalLinkId to find the corresponding paragraph/heading in the document (this likely maps to a text range in the liquidstore), and resolve that to an ot-NNN anchor.
  4. For each citation, map it by UUID to every inline citation reference in the body.

This table can either be embedded as a separate origami-meta/id-map.json file for reader consumption, or it can remain an internal build artefact used to populate the cross-references in the other JSON files. I’d recommend embedding it — it gives the Origami reader (and any future tool) a single lookup for all relationships.
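The build steps above can be sketched as follows, in Python with hypothetical names and input shapes. One substitution is made deliberately: because the storage format behind internalLinkId is not specified here, glossary entries are matched to headings by their phrase text as a stand-in for step 3. The "phrase" key on the input is also an assumption. Anything that fails to match is returned separately so it can be reviewed.

```python
def _norm(s):
    """Case-insensitive, whitespace-collapsed form used for text matching."""
    return " ".join(s.split()).casefold()

def build_id_map(blocks, map_nodes, glossary_entries):
    """
    blocks: [{"anchor": "ot-001", "tag": "h1", "text": "..."}]
    map_nodes: [{"identifier": ..., "name": ...}]
    glossary_entries: [{"identifier": ..., "internalLinkId": ..., "phrase": ...}]
    Returns (id_map_dict, unmatched_labels).
    """
    anchor_by_heading = {
        _norm(b["text"]): b["anchor"] for b in blocks if b["tag"] in ("h1", "h2")
    }
    rows, unmatched = {}, []

    def row_for(label):
        anchor = anchor_by_heading.get(_norm(label))
        if anchor is None:
            unmatched.append(label)
            return None
        return rows.setdefault(anchor, {"content_anchor": anchor, "label": label})

    for node in map_nodes:
        row = row_for(node["name"])
        if row is not None:
            row["map_node_id"] = node["identifier"]
    for entry in glossary_entries:
        row = row_for(entry["phrase"])  # stand-in for internalLinkId resolution
        if row is not None:
            row["glossary_id"] = entry["identifier"]
            row["glossary_link_id"] = entry["internalLinkId"]
    return {"format": "origami-id-map-v1", "mappings": list(rows.values())}, unmatched
```

Citation IDs are omitted from this sketch; they would be added to each row from the inline citation matches found inside that block.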

Step 2.8 — Visual-Meta block

Create OEBPS/origami-meta/visual-meta.json:

{
  "format": "visual-meta-v1",
  "self_citation": {
    "bibtex": "@article{Hegland2026, author = {Hegland, Frode Alexander}, title = {Origami Article}, year = {2026}, institution = {University of Southampton}}",
    "type": "article",
    "title": "Origami Article",
    "authors": [{"given": "Frode Alexander", "family": "Hegland"}]
  },
  "origami_version": "1.0",
  "export_date": "2026-05-27T12:44:00Z",
  "author_app_version": "2250",
  "format_version": "8.11"
}

Step 2.9 — Update the OPF manifest

All new files must be listed in the manifest:

<item id="citations-json" href="origami-meta/citations.json" media-type="application/json"/>
<item id="glossary-json" href="origami-meta/glossary.json" media-type="application/json"/>
<item id="spatial-json" href="origami-meta/spatial-layout.json" media-type="application/json"/>
<item id="visual-meta-json" href="origami-meta/visual-meta.json" media-type="application/json"/>
<item id="id-map-json" href="origami-meta/id-map.json" media-type="application/json"/>

Step 2.10 — Graceful degradation

This is the Origami principle: a Stage 2 EPUB must still be a valid, readable EPUB in any standard reader. The JSON files in origami-meta/ will simply be ignored by readers that don’t understand them. The content.xhtml is valid XHTML with or without the Origami-specific data-cite-id attributes and class="origami-cite" markup.

Test this explicitly: open the Stage 2 EPUB in Apple Books, Calibre, and Kobo. Verify that it reads correctly, that the TOC works, and that no errors appear.

Estimated additional effort for Stage 2: 2–3 weeks (on top of Stage 1), with the bulk of the time in Step 2.7 (ID unification) and Step 2.2 (citation matching).


Summary: Total Effort Estimate

Phase     Task                                          Estimated Time
Stage 1   RTF → XHTML parser                            5–7 days
Stage 1   EPUB packaging (OPF, nav, mimetype, ZIP)      2–3 days
Stage 1   Testing & EPUBCheck fixes                     3–5 days
Stage 2   Paragraph addressing (Purple Numbers)         2–3 days
Stage 2   Citation matching & semantic markup           3–4 days
Stage 2   JSON metadata serialisation                   2–3 days
Stage 2   Unified ID table                              3–4 days
Stage 2   Integration testing & reader compatibility    2–3 days
Total                                                   ~4–6 weeks (one developer)

If Author already has an HTML export pipeline, the RTF parsing effort may be substantially less — the existing HTML output could serve as the starting point for XHTML conversion, which would reduce Stage 1 by roughly a week.


Appendix: Files from the sample .liquid package

Author.plist metadata:

  • firstName: Frode
  • middleName: Alexander
  • lastName: Hegland
  • institution: University of Southampton
  • title: Origami Article

Citations.plist: 14 citation entries with full bibliographic data including DOIs, URLs, abstracts, and structured author names.

DynamicView.json: 1 node (“1. 21st Century Origami Text”), no spatial positions populated, no connections. In a production document the nodes array and nodePositions array would be substantially richer.

glossary.json: 1 entry linking “1. 21st Century Origami Text” to internal anchor 18A6C556-E84D-40C4-938D-02F6286BDDB4.

Version.plist: format version 8.11, AuthorBundleVersion 2250, citation format “nameAndDateInBrackets”, last opened on iOS.