Case study: Document prepared for the Author development team
1. The .liquid Package: What You Have
The .liquid format is a macOS package (directory) containing:
Document.liquid/
└── Contents/
    ├── Content.rtfd/TXT.rtf ← Main document text (RTF)
    ├── Content.liquidstore ← Binary plist (NSAttributedString archive)
    ├── Author.plist ← Document metadata (author, title, institution)
    ├── Citations.plist ← Full citation database, keyed by UUID
    ├── DynamicView.json ← Map view: nodes, positions, connections
    ├── LeftMarginDynamicView.json ← Left margin map data
    ├── RightMarginDynamicView.json ← Right margin map data
    ├── glossary.json ← Defined concepts with internal link IDs
    ├── Cuttings.plist ← Clippings/excerpts (binary RTF data)
    ├── Version.plist ← Format version, font size, citation style
    ├── endnotes.json ← Endnotes (empty in sample)
    ├── inlineNotes.json ← Inline notes (empty in sample)
    ├── customJSON.json ← Custom JSON payload (null in sample)
    ├── Comments.plist ← Comments (empty in sample)
    └── Hidden.plist ← Hidden content (empty in sample)
Key structural observations
Content encoding: The main text is RTF with Cocoa extensions (\cocoartf2870). Headings are distinguished by font and size: title is Baskerville \fs50, section headings are Baskerville \fs38, body text is TimesNewRomanPSMT \fs34. Key/opening sentences use colour index \cf4 (brown/amber); body text uses \cf3 (dark grey).
Citations in text: Inline citations appear as parenthetical text within the RTF (e.g. (Halevi, Moed, Bar-Ilan 2015)), and the full citation records live in Citations.plist keyed by UUID. Each citation record contains: title, citationAuthors (array of first/last/middle/prefix/suffix), yearComponent, doi, webAddress, journal, volume, pageRange, bibTeXType, note, abstract, and more.
The ID Mismatch Problem: Four separate UUID spaces exist:
| System | Example ID | Location |
|---|---|---|
| Map nodes | D10045AE-F24E-4A3B-... | DynamicView.json → nodes[].identifier |
| Glossary entries | 98884DEF-5EE8-4BC4-... | glossary.json → entries{}.identifier |
| Glossary internal links | 18A6C556-E84D-40C4-... | glossary.json → entries{}.internalLinkId |
| Citations | C3736192-02AE-4635-... | Citations.plist → keys |
These must be unified at export time (see Stage 2 below).
2. Target: EPUB 3 Structure
The output EPUB is a ZIP file with this layout:
document.epub (ZIP)
├── mimetype ← MUST be first entry, uncompressed
├── META-INF/
│   └── container.xml ← Points to the OPF package
└── OEBPS/
    ├── content.opf ← Package manifest + metadata + spine
    ├── nav.xhtml ← EPUB 3 navigation document (TOC)
    ├── content.xhtml ← The document body
    ├── style.css ← Minimal stylesheet
    │   [Stage 2 additions:]
    ├── origami-meta/
    │   ├── citations.json ← Full citation database as JSON
    │   ├── glossary.json ← Defined concepts
    │   ├── spatial-layout.json ← Unified XR/Map layout data
    │   ├── visual-meta.json ← Visual-Meta block (document's own BibTeX etc.)
    │   └── id-map.json ← Unified ID table (Step 2.7, if embedded)
    └── [any images from Content.rtfd]
STAGE 1: Basic Valid EPUB 3
Goal: produce an EPUB that passes EPUBCheck and opens correctly in Apple Books, Calibre, Thorium, and Kobo.
Step 1.1 — Parse the RTF into semantic HTML
This is the primary engineering task. The RTF parser must:
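Detect headings by font/size pattern (\fsN sizes are in half-points):
- \f0\fs50 (Baskerville, 25pt) → <h1> (document title)
- \f0\fs38 (Baskerville, 19pt) → <h2> (section headings)
- \f1\fs34 (TimesNewRoman, 17pt) → <p> (body paragraphs)
Convert formatting:
- \f3\b (TimesNewRomanPS-BoldMT) → <strong>
- \f2\i (TimesNewRomanPS-ItalicMT) → <em>
- RTF bullet lists (\li300\fi-300 with \'95) → <ul><li>
- Tab-indented paragraphs → a new <p> (not indentation)
- \cf4 colour text (key sentences) → at this stage, simply merge into the <p> with no special markup (Stage 2 adds <span class="key-sentence">)
- RTF hex escapes (Windows-1252): \'92 → ’ (right single quote), \'91 → ‘ (left single quote), \'93 → “ (left double quote), \'94 → ” (right double quote), \'95 → • (bullet), \'97 → U+2014 (em dash)
- RTF HYPERLINK fields → <a href="...">
The \fN and \cfN indices above are those of the sample's \fonttbl and \colortbl. The parser should resolve fonts and colours through those tables rather than hard-coding the indices, since another document may number them differently.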
Produce valid XHTML:
- XML declaration: <?xml version="1.0" encoding="UTF-8"?>
- XHTML namespace: <html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
- Void elements written self-closing (<br/>, <img/>)
- All attribute values quoted
- No unclosed tags
- Text content escaped (& as &amp;, < as &lt;)
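The two most mechanical rules above, classifying paragraphs by font and size and decoding \'XX hex escapes, can be sketched as follows. This is a minimal sketch, not Author's parser: `BlockKind`, `classifyBlock` and `decodeHexEscapes` are hypothetical names, the font indices are hard-coded to the sample's font table purely for brevity, and only the six escapes listed above get explicit Windows-1252 mappings (other bytes fall back to Latin-1).

```swift
import Foundation

/// Block types the Stage 1 parser emits (hypothetical names).
enum BlockKind: Equatable {
    case title     // <h1>
    case section   // <h2>
    case body      // <p>
}

/// Classify a paragraph by RTF font index (\fN) and size (\fsN, half-points).
/// Assumes the sample's font table: \f0 = Baskerville, \f1 = TimesNewRoman.
func classifyBlock(fontIndex: Int, halfPoints: Int) -> BlockKind {
    switch (fontIndex, halfPoints) {
    case (0, 50): return .title     // Baskerville 25pt
    case (0, 38): return .section   // Baskerville 19pt
    default:      return .body      // TimesNewRoman 17pt and anything unrecognised
    }
}

/// Windows-1252 bytes used by the sample that differ from Latin-1.
private let cp1252: [UInt8: Character] = [
    0x91: "\u{2018}", 0x92: "\u{2019}",   // left / right single quote
    0x93: "\u{201C}", 0x94: "\u{201D}",   // left / right double quote
    0x95: "\u{2022}", 0x97: "\u{2014}",   // bullet, em dash
]

/// Replace RTF hex escapes (\'XX) with Unicode characters.
/// Assumes escaped backslashes (\\) were already handled by the tokenizer.
func decodeHexEscapes(_ rtf: String) -> String {
    let chars = Array(rtf)
    var out = ""
    var i = 0
    while i < chars.count {
        if chars[i] == "\\", i + 3 < chars.count, chars[i + 1] == "'",
           let byte = UInt8(String(chars[(i + 2)...(i + 3)]), radix: 16) {
            // Unlisted bytes are treated as Latin-1 (correct for 0xA0-0xFF).
            out.append(cp1252[byte] ?? Character(Unicode.Scalar(byte)))
            i += 4
        } else {
            out.append(chars[i])
            i += 1
        }
    }
    return out
}
```

For example, the RTF text `Author\'92s` decodes to `Author’s`. A production parser would do this decoding inside its tokenizer, together with resolving `\fonttbl` and `\colortbl` entries by name.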
Minimal output structure:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" xml:lang="en">
<head>
<meta charset="UTF-8"/>
<title>Origami Text</title>
<link rel="stylesheet" type="text/css" href="style.css"/>
</head>
<body>
<h1>Origami Text</h1>
<p><em>Minimal EPUB, Rich Metadata</em></p>
<h2 id="sec-why-current">Why Current Scholarly Formats Fall Short</h2>
<p>To solve the urgent, complex problems...</p>
<!-- etc. -->
</body>
</html>
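A sketch of the serialisation step, assuming the parser has already produced a flat list of typed blocks with RTF escapes decoded (`Block` and `renderContentXHTML` are hypothetical names). The detail worth showing is escaping: a literal `&` or `<` in the document text would make content.xhtml malformed XML.

```swift
import Foundation

enum BlockKind { case title, section, body }

struct Block {
    let kind: BlockKind
    let text: String        // plain text, RTF escapes already decoded
    var id: String? = nil   // e.g. "sec-why-current" for <h2> elements
}

/// Escape text for XHTML element content and double-quoted attribute values.
func xmlEscape(_ s: String) -> String {
    var out = ""
    for ch in s {
        switch ch {
        case "&": out += "&amp;"
        case "<": out += "&lt;"
        case ">": out += "&gt;"
        case "\"": out += "&quot;"
        default: out.append(ch)
        }
    }
    return out
}

/// Assemble content.xhtml from parsed blocks using the minimal structure above.
func renderContentXHTML(title: String, blocks: [Block]) -> String {
    var body = ""
    for block in blocks {
        let tag: String
        switch block.kind {
        case .title: tag = "h1"
        case .section: tag = "h2"
        case .body: tag = "p"
        }
        let idAttr = block.id.map { " id=\"\(xmlEscape($0))\"" } ?? ""
        body += "<\(tag)\(idAttr)>\(xmlEscape(block.text))</\(tag)>\n"
    }
    return """
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE html>
    <html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" xml:lang="en">
    <head>
    <meta charset="UTF-8"/>
    <title>\(xmlEscape(title))</title>
    <link rel="stylesheet" type="text/css" href="style.css"/>
    </head>
    <body>
    \(body)</body>
    </html>
    """
}
```

Inline formatting (`<strong>`, `<em>`, links) would be rendered per run rather than per block; this sketch treats each block as a single run of text to keep the escaping visible.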
Step 1.2 — Generate the navigation document (nav.xhtml)
Walk the heading structure from Step 1.1 and generate:
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head><title>Table of Contents</title></head>
<body>
<nav epub:type="toc" id="toc">
<h1>Contents</h1>
<ol>
<li><a href="content.xhtml#sec-why-current">Why Current Scholarly Formats Fall Short</a></li>
<li><a href="content.xhtml#sec-origami-approach">The Origami Approach</a></li>
<!-- one <li> per <h2> in content.xhtml -->
</ol>
</nav>
</body>
</html>
Each <h2> in the content needs a corresponding id attribute (e.g. id="sec-origami-approach") for the TOC links. Generate these as slugified heading text or sequential IDs.
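One way to derive the section ids and the matching nav.xhtml entries is sketched below. The slug rule (lowercase ASCII letters and digits kept, every other run of characters collapsed to one hyphen, a `sec-` prefix, and a numeric suffix on collision) is an assumption, as are the names `slugify` and `makeTOC`. Note that this rule yields `sec-why-current-scholarly-formats-fall-short` rather than the shortened `sec-why-current` in the example above, and that non-ASCII letters are dropped.

```swift
import Foundation

/// Turn heading text into an id such as "sec-the-origami-approach".
func slugify(_ heading: String, prefix: String = "sec-") -> String {
    var slug = ""
    var pendingHyphen = false
    for scalar in heading.lowercased().unicodeScalars {
        switch scalar.value {
        case 0x61...0x7A, 0x30...0x39:           // a-z, 0-9
            if pendingHyphen && !slug.isEmpty { slug.append("-") }
            slug.unicodeScalars.append(scalar)
            pendingHyphen = false
        default:
            pendingHyphen = true                 // anything else becomes a separator
        }
    }
    return prefix + (slug.isEmpty ? "untitled" : slug)
}

func xmlEscape(_ s: String) -> String {
    s.replacingOccurrences(of: "&", with: "&amp;")
        .replacingOccurrences(of: "<", with: "&lt;")
        .replacingOccurrences(of: ">", with: "&gt;")
}

/// Assign a unique id to each <h2> and build the <li> entries for nav.xhtml.
func makeTOC(headings: [String]) -> (ids: [String], navItems: String) {
    var used = Set<String>()
    var ids: [String] = []
    var items = ""
    for heading in headings {
        var id = slugify(heading)
        var n = 2
        while used.contains(id) {                // duplicate headings get -2, -3, ...
            id = slugify(heading) + "-\(n)"
            n += 1
        }
        used.insert(id)
        ids.append(id)
        items += "<li><a href=\"content.xhtml#\(id)\">\(xmlEscape(heading))</a></li>\n"
    }
    return (ids, items)
}
```

The same `ids` array is used to set the `id` attribute on each `<h2>` in content.xhtml, so the two files cannot drift apart.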
Step 1.3 — Create the minimal stylesheet (style.css)
body {
font-family: serif;
line-height: 1.6;
margin: 1em;
}
h1 { font-size: 1.8em; margin-bottom: 0.3em; }
h2 { font-size: 1.3em; margin-top: 1.5em; margin-bottom: 0.5em; }
p { margin: 0.5em 0; text-indent: 0; }
ul { margin: 0.5em 0 0.5em 1.5em; }
a { color: inherit; text-decoration: underline; }
strong { font-weight: bold; }
em { font-style: italic; }
Keep this deliberately sparse — this is the Origami philosophy of minimal formatting.
Step 1.4 — Create the package document (content.opf)
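Read Author.plist for metadata (dc:title from title; dc:creator from firstName, middleName and lastName). Generate a UUID-based unique identifier for the EPUB once and persist it with the document, so that re-exports keep the same identifier; the dcterms:modified timestamp, which must be UTC in the form CCYY-MM-DDThh:mm:ssZ (e.g. 2026-05-27T12:44:00Z), distinguishes one export from the next.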
<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="uid">
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:identifier id="uid">urn:uuid:[GENERATE-UUID]</dc:identifier>
<dc:title>Origami Article</dc:title>
<dc:creator>Frode Alexander Hegland</dc:creator>
<dc:language>en</dc:language>
<meta property="dcterms:modified">[ISO-8601-TIMESTAMP]</meta>
</metadata>
<manifest>
<item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
<item id="content" href="content.xhtml" media-type="application/xhtml+xml"/>
<item id="css" href="style.css" media-type="text/css"/>
</manifest>
<spine>
<itemref idref="content"/>
</spine>
</package>
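A sketch of filling this template from Author.plist values. It assumes the plist has already been read into a `[String: String]` dictionary with the keys listed in the appendix (firstName, middleName, lastName, title), and that the book UUID is passed in from wherever it is persisted; `makeOPF` is a hypothetical helper. `ISO8601DateFormatter` with its default settings produces exactly the UTC `CCYY-MM-DDThh:mm:ssZ` form that dcterms:modified requires.

```swift
import Foundation

func xmlEscape(_ s: String) -> String {
    s.replacingOccurrences(of: "&", with: "&amp;")
        .replacingOccurrences(of: "<", with: "&lt;")
        .replacingOccurrences(of: ">", with: "&gt;")
}

/// Build content.opf from Author.plist metadata.
/// `bookID` should be generated once and persisted so re-exports keep it.
func makeOPF(authorPlist: [String: String], bookID: UUID, modified: Date) -> String {
    // Join the name parts that are present, e.g. "Frode Alexander Hegland".
    let creator = ["firstName", "middleName", "lastName"]
        .compactMap { authorPlist[$0] }
        .filter { !$0.isEmpty }
        .joined(separator: " ")
    let title = authorPlist["title"] ?? "Untitled"
    // Default options: internet date-time in GMT, e.g. "2026-05-27T12:44:00Z".
    let timestamp = ISO8601DateFormatter().string(from: modified)
    return """
    <?xml version="1.0" encoding="UTF-8"?>
    <package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="uid">
    <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="uid">urn:uuid:\(bookID.uuidString.lowercased())</dc:identifier>
    <dc:title>\(xmlEscape(title))</dc:title>
    <dc:creator>\(xmlEscape(creator))</dc:creator>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">\(timestamp)</meta>
    </metadata>
    <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="content" href="content.xhtml" media-type="application/xhtml+xml"/>
    <item id="css" href="style.css" media-type="text/css"/>
    </manifest>
    <spine>
    <itemref idref="content"/>
    </spine>
    </package>
    """
}
```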
Step 1.5 — Create META-INF/container.xml
This is boilerplate:
<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
<rootfiles>
<rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
</rootfiles>
</container>
Step 1.6 — Create the mimetype file
A plain text file containing exactly: application/epub+zip — no trailing newline, no BOM.
Step 1.7 — ZIP packaging
This is where most first-time EPUB builders fail. The rules:
- The mimetype file MUST be the first entry in the ZIP archive
- The mimetype file MUST be stored with NO compression (store method)
- The mimetype file MUST NOT have an extra field in its ZIP header
- All other files use standard deflate compression (stored is also permitted; no other methods are)
In Swift, using Archive or a ZIP library:
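// Pseudocode: ZIPArchive and addEntry stand in for the chosen library's API;
// entries without a compression argument are assumed to default to deflate.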
let archive = ZIPArchive(path: outputPath)
archive.addEntry("mimetype", data: mimetypeData, compression: .none)
archive.addEntry("META-INF/container.xml", data: containerData)
archive.addEntry("OEBPS/content.opf", data: opfData)
archive.addEntry("OEBPS/nav.xhtml", data: navData)
archive.addEntry("OEBPS/content.xhtml", data: contentData)
archive.addEntry("OEBPS/style.css", data: cssData)
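Where a dependency-free fallback is useful (in tests, or to see exactly what the rules above require at the byte level), an EPUB-shaped ZIP can be written by hand. This is an illustrative sketch, not Author's implementation: `crc32` and `writeStoredZip` are hypothetical helpers; every entry uses the stored method (the container format permits stored or deflate, so this is valid but larger than necessary); no extra fields are written; timestamps are fixed at 1980-01-01 so repeated exports are byte-identical; and ZIP64 is not supported. Header layouts follow PKWARE's APPNOTE.TXT.

```swift
import Foundation

/// CRC-32 (reflected polynomial 0xEDB88320), as used by ZIP.
func crc32(_ data: [UInt8]) -> UInt32 {
    var crc: UInt32 = 0xFFFF_FFFF
    for byte in data {
        crc ^= UInt32(byte)
        for _ in 0..<8 {
            crc = (crc & 1) != 0 ? (crc >> 1) ^ 0xEDB8_8320 : crc >> 1
        }
    }
    return ~crc
}

/// Little-endian writers for ZIP header fields.
extension Array where Element == UInt8 {
    mutating func le16(_ v: UInt16) { append(UInt8(v & 0xFF)); append(UInt8(v >> 8)) }
    mutating func le32(_ v: UInt32) {
        for shift in stride(from: 0, to: 32, by: 8) { append(UInt8((v >> UInt32(shift)) & 0xFF)) }
    }
}

/// Write entries in the given order (pass "mimetype" first), all stored,
/// with no extra fields. Sizes must fit in 32 bits.
func writeStoredZip(_ entries: [(name: String, data: [UInt8])]) -> [UInt8] {
    var out: [UInt8] = []
    var central: [UInt8] = []
    for entry in entries {
        let name = Array(entry.name.utf8)
        let crc = crc32(entry.data)
        let size = UInt32(entry.data.count)
        let offset = UInt32(out.count)
        // Local file header (30 bytes + name)
        out.le32(0x0403_4B50)                 // "PK\3\4"
        out.le16(20); out.le16(0)             // version needed 2.0, flags
        out.le16(0)                           // method 0 = stored
        out.le16(0); out.le16(0x21)           // time 00:00:00, date 1980-01-01
        out.le32(crc); out.le32(size); out.le32(size)
        out.le16(UInt16(name.count)); out.le16(0)   // name length, extra length 0
        out += name
        out += entry.data
        // Central directory record (46 bytes + name)
        central.le32(0x0201_4B50)             // "PK\1\2"
        central.le16(20); central.le16(20)    // version made by / needed
        central.le16(0); central.le16(0)      // flags, method
        central.le16(0); central.le16(0x21)   // time, date
        central.le32(crc); central.le32(size); central.le32(size)
        central.le16(UInt16(name.count))
        central.le16(0); central.le16(0)      // extra length, comment length
        central.le16(0); central.le16(0)      // disk start, internal attributes
        central.le32(0)                       // external attributes
        central.le32(offset)                  // offset of local header
        central += name
    }
    let centralOffset = UInt32(out.count)
    out += central
    // End of central directory record (22 bytes)
    out.le32(0x0605_4B50)                     // "PK\5\6"
    out.le16(0); out.le16(0)                  // this disk, central directory disk
    out.le16(UInt16(entries.count)); out.le16(UInt16(entries.count))
    out.le32(UInt32(central.count)); out.le32(centralOffset)
    out.le16(0)                               // comment length
    return out
}
```

Writing the result to disk is then `try Data(archiveBytes).write(to: outputURL)`. Because the mimetype entry has a 30-byte header and an 8-byte name, its content always starts at byte 38, which is what readers sniff for.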
Step 1.8 — Validation
Run EPUBCheck (https://github.com/w3c/epubcheck) on every generated EPUB during development. It catches namespace errors, missing manifest entries, invalid XHTML, and mimetype packaging issues. Make this part of the test suite.
Step 1.9 — Reader testing
Test in at minimum: Apple Books (macOS + iOS), Calibre, Thorium Reader, and Kobo. Apple Books is the most important given Author’s platform, and it has specific quirks around nav document handling.
Estimated effort for Stage 1: 2–3 weeks (one developer, assuming familiarity with the Author codebase and RTF parsing).
STAGE 2: Origami Text EPUB (Addressable, Metadata-Rich)
This builds on the valid EPUB from Stage 1 and adds the three pillars: addressing, metadata, and the unified ID table.
Step 2.1 — High-resolution addressing (Purple Numbers)
Every structural element in content.xhtml gets a unique, immutable id attribute:
<h2 id="ot-001">Why Current Scholarly Formats Fall Short</h2>
<p id="ot-002">To solve the urgent, complex problems facing our world...</p>
<p id="ot-003">Neither PDF nor HTML, as currently deployed...</p>
<ul>
<li id="ot-004"><strong>PDF</strong> provides stability...</li>
<li id="ot-005"><strong>HTML</strong> delivers exceptional interactivity...</li>
</ul>
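ID format recommendation: ot-NNN (sequential, zero-padded to three digits for documents under 1000 elements; extend as needed). Alternatively, use short stable hashes. The key requirement: IDs must be immutable once assigned, because they are the citing address and must not change if the document is re-exported. With sequential IDs this means a newly inserted paragraph receives the next unused number rather than causing existing elements to be renumbered, so over time the numbers will no longer follow document order.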
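Author must persist these IDs in the .liquid format (perhaps in the liquidstore or a new addressing.json), so that re-export produces the same IDs for the same content. Because an element can carry only one id, these anchors replace the Stage 1 sec-* heading ids, and nav.xhtml must be regenerated to link to them (e.g. content.xhtml#ot-001).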
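The persistence requirement can be met with a small registry that remembers which anchor each block received and only ever hands out the next unused number. This is a sketch under assumptions: `AnchorRegistry` is a hypothetical type, and the "block key" stands for whatever stable identity Author can attach to a paragraph (the example uses literal strings). A hash of the paragraph text would not work as that key, since it changes whenever the paragraph is edited.

```swift
import Foundation

/// Persistent ot-NNN allocator, serialisable into the .liquid package
/// (e.g. the addressing.json suggested above; file name hypothetical).
struct AnchorRegistry: Codable {
    private(set) var anchors: [String: Int] = [:]   // block key -> number
    private(set) var nextNumber = 1

    /// Return the existing anchor for a block, or assign the next unused one.
    mutating func anchor(for blockKey: String) -> String {
        let number: Int
        if let existing = anchors[blockKey] {
            number = existing
        } else {
            number = nextNumber
            anchors[blockKey] = number
            nextNumber += 1
        }
        // Zero-pad to three digits; numbers of 1000 and above simply grow wider.
        let digits = String(number)
        return "ot-" + String(repeating: "0", count: max(0, 3 - digits.count)) + digits
    }
}
```

The trade-off is visible in use: a paragraph inserted between ot-001 and ot-002 becomes ot-003, which keeps every earlier citation valid at the cost of numbers that no longer follow reading order.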
Step 2.2 — Citation markup in the body
Replace inline parenthetical citations with semantic markup:
Before (Stage 1):
<p>...researchers routinely accumulate large local collections
(Halevi, Moed, Bar-Ilan 2015), managed through...</p>
After (Stage 2):
<p id="ot-027">...researchers routinely accumulate large local collections
<a class="origami-cite" data-cite-id="C3736192-02AE-4635-AD53-8DC896A6F500"
href="#ref-C3736192">(Halevi, Moed &amp; Bar-Ilan 2015)</a>, managed through...</p>
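This requires matching the inline citation text in the RTF to the corresponding entry in Citations.plist, using author-year tuples from the citation records. The href also assumes a reference list in content.xhtml whose entries carry matching ids (e.g. <li id="ref-C3736192">); EPUBCheck reports links to fragment identifiers that are not defined, so that list must be generated alongside the citation markup.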
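A minimal sketch of author-year matching, assuming each Citations.plist record has already been reduced to its UUID, the family names from citationAuthors, and the year from yearComponent (`CitationKey` and `matchCitation` are hypothetical names). It takes the first four-digit number in the parenthetical as the year and the comma-, `&`- or `and`-separated tokens before it as surnames, drops "et al.", and accepts a record only if the year matches, the first surname matches its first author, and every listed surname appears among its authors. It handles one citation per parenthetical; semicolon-separated groups would need splitting first. Anything ambiguous (for example two 2015 papers by the same first author) returns nil for manual review.

```swift
import Foundation

struct CitationKey {
    let id: String              // Citations.plist UUID
    let familyNames: [String]   // in author order
    let year: Int
}

/// Find the record referred to by inline text such as "(Halevi, Moed, Bar-Ilan 2015)".
/// Returns nil when no record or more than one record matches.
func matchCitation(_ inline: String, in records: [CitationKey]) -> String? {
    // Year: the first run of four digits.
    guard let yearRange = inline.range(of: #"\d{4}"#, options: .regularExpression),
          let year = Int(inline[yearRange]) else { return nil }
    // Surnames: text before the year, split on commas, "&" and " and ".
    let names = inline[..<yearRange.lowerBound]
        .replacingOccurrences(of: "(", with: "")
        .replacingOccurrences(of: " et al.", with: "")
        .replacingOccurrences(of: " and ", with: ",")
        .replacingOccurrences(of: "&", with: ",")
        .split(separator: ",")
        .map { $0.trimmingCharacters(in: .whitespaces).lowercased() }
        .filter { !$0.isEmpty }
    guard !names.isEmpty else { return nil }
    let candidates = records.filter { record in
        let families = Set(record.familyNames.map { $0.lowercased() })
        return record.year == year
            && record.familyNames.first?.lowercased() == names[0]
            && names.allSatisfy { families.contains($0) }
    }
    return candidates.count == 1 ? candidates[0].id : nil
}
```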
Step 2.3 — Key sentences
The RTF uses \cf4 to mark key/opening sentences (rendered in a brown/amber colour in Author). In Stage 2, wrap these in:
<span class="key-sentence">The rise of large language models has made the
format question unexpectedly urgent.</span>
This gives LLMs and reader software a semantic handle on the document’s argumentative structure — something no current format provides.
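Because the key-sentence flag is just a property of text runs, the same run list can drive both stages: Stage 1 ignores the flag and Stage 2 wraps flagged runs. A sketch assuming the parser yields `(text, isKey)` runs per paragraph, where `isKey` means the run used `\cf4`; `Run` and `renderParagraph` are hypothetical names.

```swift
struct Run {
    let text: String
    let isKey: Bool   // true when the RTF run used colour index \cf4
}

func xmlEscape(_ s: String) -> String {
    var out = ""
    for ch in s {
        switch ch {
        case "&": out += "&amp;"
        case "<": out += "&lt;"
        case ">": out += "&gt;"
        default: out.append(ch)
        }
    }
    return out
}

/// Render one paragraph; with markKeySentences (Stage 2), key runs are wrapped
/// in <span class="key-sentence">, otherwise (Stage 1) the flag is ignored.
func renderParagraph(id: String, runs: [Run], markKeySentences: Bool) -> String {
    var html = ""
    var index = 0
    while index < runs.count {
        let isKey = markKeySentences && runs[index].isKey
        // Merge adjacent runs with the same flag, so a key sentence split
        // across several RTF runs still gets a single span.
        var text = ""
        while index < runs.count && (markKeySentences && runs[index].isKey) == isKey {
            text += xmlEscape(runs[index].text)
            index += 1
        }
        html += isKey ? "<span class=\"key-sentence\">\(text)</span>" : text
    }
    return "<p id=\"\(id)\">\(html)</p>"
}
```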
Step 2.4 — Embed citation metadata as JSON
Create OEBPS/origami-meta/citations.json by serialising the entire Citations.plist into clean JSON. Strip the binary RTF authorInfo data fields and empty fields. Structure:
{
"format": "origami-citations-v1",
"entries": [
{
"id": "C3736192-02AE-4635-AD53-8DC896A6F500",
"type": "article",
"title": "Accessing, Reading and Interacting with Scientific Literature as a Factor of Academic Role",
"authors": [
{"given": "Gali", "family": "Halevi"},
{"given": "Henk F.", "family": "Moed"},
{"given": "Judit", "family": "Bar-Ilan"}
],
"year": 2015,
"journal": "Publishing Research Quarterly",
"volume": "31",
"pages": "102--121",
"doi": "10.1007/s12109-015-9404-9",
"url": "",
"bibtex": "@article{Halevi2015, author = {Halevi, Gali and Moed, Henk F. and Bar-Ilan, Judit}, ...}"
}
]
}
Also generate a BibTeX entry for this document itself (the self-citation entry), using data from Author.plist.
Add citations.json to the OPF manifest (see Step 2.9); the self-citation entry itself is carried in visual-meta.json (Step 2.8).
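A sketch of the plist-to-JSON conversion using Foundation's `PropertyListDecoder` and `JSONEncoder`. The record layout is an assumption based on the field list in Section 1: the top level is taken to be a dictionary keyed by UUID, author subfields are assumed to be named `first` and `last`, yearComponent is accepted as either an integer or a string, and only a handful of fields are carried over. `OrigamiCitation`, `convertCitations` and the other type names are hypothetical. Undeclared keys (abstract, note, binary author data) are simply ignored, and entries are sorted by UUID so repeated exports produce identical files.

```swift
import Foundation

// Input: assumed shape of one Citations.plist record (subset of fields).
struct PlistAuthor: Decodable {
    let first: String?   // subfield names assumed
    let last: String?
}

/// Accepts a plist value stored either as an integer or as a string.
struct LooseString: Decodable {
    let value: String
    init(from decoder: Decoder) throws {
        let container = try decoder.singleValueContainer()
        if let number = try? container.decode(Int.self) {
            value = String(number)
        } else {
            value = try container.decode(String.self)
        }
    }
}

struct PlistCitation: Decodable {
    let title: String?
    let citationAuthors: [PlistAuthor]?
    let yearComponent: LooseString?
    let doi: String?
    let bibTeXType: String?
}

// Output: a subset of the origami-citations-v1 shape shown above.
struct OrigamiAuthor: Encodable { let given: String; let family: String }
struct OrigamiCitation: Encodable {
    let id: String
    let type: String?
    let title: String?
    let authors: [OrigamiAuthor]
    let year: Int?
    let doi: String?
}
struct OrigamiCitations: Encodable {
    let format = "origami-citations-v1"
    let entries: [OrigamiCitation]
}

/// Treat empty strings as missing so they are left out of the JSON.
func nonEmpty(_ s: String?) -> String? {
    guard let s = s, !s.isEmpty else { return nil }
    return s
}

func convertCitations(plistData: Data) throws -> Data {
    let records = try PropertyListDecoder().decode([String: PlistCitation].self, from: plistData)
    let entries = records.keys.sorted().map { uuid -> OrigamiCitation in
        let record = records[uuid]!
        return OrigamiCitation(
            id: uuid,
            type: nonEmpty(record.bibTeXType),
            title: nonEmpty(record.title),
            authors: (record.citationAuthors ?? []).map {
                OrigamiAuthor(given: $0.first ?? "", family: $0.last ?? "")
            },
            year: record.yearComponent.flatMap { Int($0.value) },
            doi: nonEmpty(record.doi))
    }
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.prettyPrinted, .sortedKeys]   // nil optionals are omitted
    return try encoder.encode(OrigamiCitations(entries: entries))
}
```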
Step 2.5 — Embed glossary/defined concepts
Serialise glossary.json into OEBPS/origami-meta/glossary.json, cleaning up the structure:
{
"format": "origami-glossary-v1",
"entries": [
{
"id": "98884DEF-5EE8-4BC4-94C3-0765448AA480",
"phrase": "1. 21st Century Origami Text",
"description": "21st Century Origami Text",
"tag": "section",
"internal_anchor": "ot-001",
"citation_ids": []
}
]
}
Note the internal_anchor field — this is where the ID unification happens (see Step 2.7).
Step 2.6 — Embed spatial/XR layout data
This is the key innovation. Create OEBPS/origami-meta/spatial-layout.json from DynamicView.json, unified with glossary and citation references:
{
"format": "origami-spatial-v1",
"views": [
{
"name": "main",
"source": "DynamicView",
"settings": {
"nodeVisibility": "showAll"
},
"nodes": [
{
"id": "D10045AE-F24E-4A3B-A917-B9D29F4E8CBA",
"type": "text",
"name": "1. 21st Century Origami Text",
"content_anchor": "ot-001",
"glossary_id": "98884DEF-5EE8-4BC4-94C3-0765448AA480",
"citation_ids": [],
"position": {"x": 0, "y": 0, "z": 0},
"is_hidden": false
}
],
"connections": []
},
{
"name": "left-margin",
"source": "LeftMarginDynamicView",
"nodes": [],
"connections": []
},
{
"name": "right-margin",
"source": "RightMarginDynamicView",
"nodes": [],
"connections": []
}
]
}
Step 2.7 — The Unified ID Table (solving the mismatch)
This is the critical architectural piece. Create an internal lookup that maps between all four ID namespaces at export time:
{
"format": "origami-id-map-v1",
"mappings": [
{
"content_anchor": "ot-001",
"map_node_id": "D10045AE-F24E-4A3B-A917-B9D29F4E8CBA",
"glossary_id": "98884DEF-5EE8-4BC4-94C3-0765448AA480",
"glossary_link_id": "18A6C556-E84D-40C4-938D-02F6286BDDB4",
"citation_ids": [],
"type": "section",
"label": "1. 21st Century Origami Text"
}
]
}
How to build this table:
- Parse the RTF and assign ot-NNN anchors to every block element, reusing the persisted anchor (Step 2.1) wherever a block already has one.
- For each Map node in DynamicView.json, match its name field against heading text in the document to find the corresponding ot-NNN anchor.
- For each glossary entry, use its internalLinkId to find the corresponding paragraph/heading in the document (this likely maps to a text range in the liquidstore), and resolve that to an ot-NNN anchor.
- For each citation, map it by UUID to every inline citation reference in the body.
This table can either be embedded as a separate origami-meta/id-map.json file for reader consumption, or it can remain an internal build artefact used to populate the cross-references in the other JSON files. I’d recommend embedding it — it gives the Origami reader (and any future tool) a single lookup for all relationships.
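A sketch of the matching steps and of writing id-map.json, under stated assumptions: blocks arrive as (anchor, text, isHeading) triples, map nodes as (id, name), and each glossary internalLinkId has already been resolved to an anchor (the liquidstore lookup is not shown). Names are compared after lowercasing and collapsing whitespace, and unmatched nodes are reported rather than silently dropped. All type and function names are hypothetical; the JSON keys come out in the snake_case shown above via `JSONEncoder`'s `convertToSnakeCase` strategy.

```swift
import Foundation

struct AnchoredBlock { let anchor: String; let text: String; let isHeading: Bool }
struct MapNode { let id: String; let name: String }
struct GlossaryEntry { let id: String; let internalLinkId: String }

struct IDMapping: Encodable {
    let contentAnchor: String
    var mapNodeId: String?
    var glossaryId: String?
    var glossaryLinkId: String?
    var citationIds: [String] = []   // would be filled from Step 2.2 matches (not shown)
    let type: String
    let label: String
}

struct IDMapFile: Encodable {
    let format = "origami-id-map-v1"
    let mappings: [IDMapping]
}

/// Lowercase and collapse whitespace so "1.  21st century" matches "1. 21st Century".
func normalise(_ s: String) -> String {
    s.lowercased().split(whereSeparator: { $0.isWhitespace }).joined(separator: " ")
}

/// `linkTargets` maps glossary internalLinkIds to already-resolved anchors.
func buildIDMap(blocks: [AnchoredBlock], nodes: [MapNode], glossary: [GlossaryEntry],
                linkTargets: [String: String]) -> (mappings: [IDMapping], unmatchedNodes: [String]) {
    var byAnchor: [String: IDMapping] = [:]
    for block in blocks {
        byAnchor[block.anchor] = IDMapping(contentAnchor: block.anchor,
                                           type: block.isHeading ? "section" : "paragraph",
                                           label: block.text)
    }
    // Map nodes are matched by name against heading text only.
    let headingAnchors = Dictionary(
        blocks.filter { $0.isHeading }.map { (normalise($0.text), $0.anchor) },
        uniquingKeysWith: { first, _ in first })
    var unmatched: [String] = []
    for node in nodes {
        if let anchor = headingAnchors[normalise(node.name)] {
            byAnchor[anchor]?.mapNodeId = node.id
        } else {
            unmatched.append(node.id)   // report rather than silently drop
        }
    }
    for entry in glossary {
        guard let anchor = linkTargets[entry.internalLinkId] else { continue }
        byAnchor[anchor]?.glossaryId = entry.id
        byAnchor[anchor]?.glossaryLinkId = entry.internalLinkId
    }
    // Emit mappings in document order.
    return (blocks.compactMap { byAnchor[$0.anchor] }, unmatched)
}

func encodeIDMap(_ mappings: [IDMapping]) throws -> Data {
    let encoder = JSONEncoder()
    encoder.keyEncodingStrategy = .convertToSnakeCase   // mapNodeId -> map_node_id
    encoder.outputFormatting = [.prettyPrinted, .sortedKeys]
    return try encoder.encode(IDMapFile(mappings: mappings))
}
```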
Step 2.8 — Visual-Meta block
Create OEBPS/origami-meta/visual-meta.json:
{
"format": "visual-meta-v1",
"self_citation": {
"bibtex": "@article{Hegland2026, author = {Hegland, Frode Alexander}, title = {Origami Article}, year = {2026}, institution = {University of Southampton}}",
"type": "article",
"title": "Origami Article",
"authors": [{"given": "Frode Alexander", "family": "Hegland"}]
},
"origami_version": "1.0",
"export_date": "2026-05-27T12:44:00Z",
"author_app_version": "2250",
"format_version": "8.11"
}
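The self-citation string above can be generated from Author.plist. A sketch, assuming the plist values are available as strings; the citation-key rule (family name followed by year) mirrors the sample key Hegland2026 but is an assumption, as is the name `makeSelfCitation`. Braces or backslashes inside values would need escaping for real BibTeX, which this sketch does not do.

```swift
/// Build the document's own BibTeX entry from Author.plist fields.
func makeSelfCitation(author: [String: String], year: Int) -> String {
    let family = author["lastName"] ?? ""
    // Given names: first and middle, e.g. "Frode Alexander".
    let given = [author["firstName"], author["middleName"]]
        .compactMap { $0 }
        .filter { !$0.isEmpty }
        .joined(separator: " ")
    let title = author["title"] ?? ""
    var fields = [
        "author = {\(family), \(given)}",
        "title = {\(title)}",
        "year = {\(year)}",
    ]
    if let institution = author["institution"], !institution.isEmpty {
        fields.append("institution = {\(institution)}")
    }
    return "@article{\(family)\(year), " + fields.joined(separator: ", ") + "}"
}
```

With the sample Author.plist values and the year 2026, this reproduces the `bibtex` string shown in the JSON above.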
Step 2.9 — Update the OPF manifest
All new files must be listed in the manifest:
<item id="citations-json" href="origami-meta/citations.json" media-type="application/json"/>
<item id="glossary-json" href="origami-meta/glossary.json" media-type="application/json"/>
<item id="spatial-json" href="origami-meta/spatial-layout.json" media-type="application/json"/>
<item id="visual-meta-json" href="origami-meta/visual-meta.json" media-type="application/json"/>
<item id="id-map-json" href="origami-meta/id-map.json" media-type="application/json"/>
Step 2.10 — Graceful degradation
This is the Origami principle: a Stage 2 EPUB must still be a valid, readable EPUB in any standard reader. The JSON files in origami-meta/ will simply be ignored by readers that don’t understand them. The content.xhtml is valid XHTML with or without the Origami-specific data-cite-id attributes and class="origami-cite" markup.
Test this explicitly: open the Stage 2 EPUB in Apple Books, Calibre, and Kobo. Verify that it reads correctly, that the TOC works, and that no errors appear.
Estimated additional effort for Stage 2: roughly 2.5–3.5 weeks on top of Stage 1 (12–17 working days, summing the Stage 2 rows of the table below), with the bulk of the time in Step 2.7 (ID unification) and Step 2.2 (citation matching).
Summary: Total Effort Estimate
| Phase | Task | Estimated Time |
|---|---|---|
| Stage 1 | RTF → XHTML parser | 5–7 days |
| Stage 1 | EPUB packaging (OPF, nav, mimetype, ZIP) | 2–3 days |
| Stage 1 | Testing & EPUBCheck fixes | 3–5 days |
| Stage 2 | Paragraph addressing (Purple Numbers) | 2–3 days |
| Stage 2 | Citation matching & semantic markup | 3–4 days |
| Stage 2 | JSON metadata serialisation | 2–3 days |
| Stage 2 | Unified ID table | 3–4 days |
| Stage 2 | Integration testing & reader compatibility | 2–3 days |
| Total | | ~22–32 working days (roughly 4.5–6.5 weeks, one developer) |
If Author already has an HTML export pipeline, the RTF parsing effort may be substantially less — the existing HTML output could serve as the starting point for XHTML conversion, which would reduce Stage 1 by roughly a week.
Appendix: Files from the sample .liquid package
Author.plist metadata:
- firstName: Frode
- middleName: Alexander
- lastName: Hegland
- institution: University of Southampton
- title: Origami Article
Citations.plist: 14 citation entries with full bibliographic data including DOIs, URLs, abstracts, and structured author names.
DynamicView.json: 1 node (“1. 21st Century Origami Text”), no spatial positions populated, no connections. In a production document the nodes array and nodePositions array would be substantially richer.
glossary.json: 1 entry linking “1. 21st Century Origami Text” to internal anchor 18A6C556-E84D-40C4-938D-02F6286BDDB4.
Version.plist: format version 8.11, AuthorBundleVersion 2250, citation format “nameAndDateInBrackets”, last opened on iOS.