The critiques below have been edited for formatting. Once an AI recommendation has been implemented, it is removed from this page; further independent AI reviews are expected in the future.
By Claude.ai
“The key value propositions only materialise in readers that don’t yet exist.”
The XR reconstitution of spatial knowledge maps, AI parsing of conceptual topology, citation-aware reading — these all require specialised readers that aren’t deployed. A skeptic will say the use cases are aspirational, not demonstrated.
How to address it: The page links to an “in use” scenario — lean harder on that. Better still, ship a working demo: a simple web-based reader that renders the Visual-Meta spatial layout even partially. Seeing is believing in format arguments.
(Note: The benefit is immediate for the basic document, in that compatibility between different readers will be greater when publishing in the stripped-down EPUB format.)
“Academic publishing is controlled by journals, not authors. Format innovations that don’t address submission workflows die.”
Researchers can’t submit Origami Text to Nature or PLOS. The format addresses the reading and archiving problem but not the submission pipeline, where most friction lives.
How to address it: Explicitly scope the claim. Origami Text is a post-acceptance, post-publication layer — a richer distribution format, not a submission format. Frame it alongside, not replacing, the Word/LaTeX → journal pipeline. Authors export to Origami Text for their personal archive, preprint server, and XR readers; journals get their usual PDFs.
(Note: Peer sharing of academic documents does of course also fit within the model, but this is indeed primarily a publishing format.)
“Why not extend existing standards? JATS XML, schema.org, or even structured HTML already do much of this.”
A JATS-aware critic will point out that publishers like PLOS and eLife already have machine-readable XML with structured citations, MathML, and metadata. A Semantic Scholar engineer will note they already extract concepts and citation graphs from PDFs without author cooperation.
How to address it: Acknowledge JATS directly and distinguish the use case: JATS is a publisher-side format requiring professional markup; Origami Text is an author-side format requiring only a writing tool. The distinction is who does the structuring: the author at writing time, versus post-hoc extraction. This also means that spatial and epistemic data, which JATS never captures, is available.
(Note: The author can more easily add semantic metadata simply by using headings and other functions that support the authoring process, and by not stripping that metadata out when publishing.)
“The graceful degradation argument cuts both ways — if you can ignore all the rich layers, most readers will.”
Graceful degradation is presented as a feature. A pragmatist will say: if an EPUB reader ignores the Visual-Meta, an AI ignores the concept markers, and an XR reader doesn’t exist yet, the format degrades all the way to a slightly-cleaner EPUB — which is not a compelling enough reason for adoption.
How to address it: The counter is that adoption is gradual by design. Each incremental reader capability (even just citation import from CSL-JSON attributes in Zotero) delivers value. Consider listing the minimal adoption wins explicitly: “Even with no XR reader, you get one-click citation import and AI-ready structure.”
By Gemini.google
The Rationale Behind “Origami Text”
The core rationale of the project is that as computing moves into Extended Reality (XR), text shouldn’t just be a flat, 2D simulation floating in space. Instead, knowledge should be “spatial” and “foldable”, allowing users to collapse, expand, and manipulate documents dynamically (like origami) to manage cognitive load and focus. Furthermore, the project emphasizes building this on Open Standards so that spatial documents aren’t locked inside the proprietary walled gardens of Big Tech (e.g., Apple Vision Pro or Meta Quest ecosystems).
Technical Feasibility & Evaluation
The Future Text Lab has been debating how to technically represent these spatial documents, specifically weighing HTML vs. JSON.
1. The Data Structure Debate (HTML vs. JSON)
- HTML is highly semantic. It inherently understands headers, paragraphs, and links, making it the bedrock of document accessibility. However, HTML was not built for Z-axis coordinates, rotational values, or 3D bounding boxes.
- JSON is highly structured and natively favored by 3D rendering engines (like Unity, Unreal, or WebGL environments) to pass coordinate data and node properties.
- Feasibility: Using a hybrid approach—where Visual-Meta is appended to a standard document format (like PDF or HTML) to carry a JSON payload detailing spatial coordinates and foldability state—is highly feasible. This allows the document to degrade gracefully on 2D screens while unlocking “origami” features in XR.
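As a rough illustration of the hybrid approach, the sketch below serialises a spatial payload to JSON and reads it back, falling back to an empty layout when the payload is missing or unreadable. The field names (`spatialNodes`, `position`, `folded`) and the ID `p-0001` are assumptions made for this example, not part of any published Visual-Meta specification.

```python
import json

# Hypothetical payload shape: field names are illustrative only.
def build_spatial_payload(nodes):
    """Serialise spatial nodes to a JSON string that can travel with the document."""
    return json.dumps({"spatialNodes": nodes}, sort_keys=True)

def read_spatial_payload(text):
    """Return the node list, or [] when the payload is absent or unreadable.

    An empty list stands for the 2D fallback: the reader simply shows the
    ordinary document and ignores the spatial layer.
    """
    try:
        data = json.loads(text)
    except (TypeError, ValueError):
        return []
    if not isinstance(data, dict):
        return []
    return data.get("spatialNodes", [])

nodes = [{"id": "p-0001", "position": [0.0, 1.2, -0.5], "folded": False}]
payload = build_spatial_payload(nodes)
```

A 2D reader never calls `read_spatial_payload`; an XR reader that does call it still gets a usable (empty) result from a damaged file, which is one way to interpret "degrade gracefully".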
2. Visual-Meta Integration
Visual-Meta’s existing approach is to place machine-readable metadata in an appendix at the end of a document. Expanding this to include spatial/origami parameters is technically sound because it doesn’t break legacy PDF or HTML readers.
Potential Issues & How to Address Them
If they are pursuing this approach, they will likely run into a few specific technical hurdles:
The WebXR vs. Native Ecosystem Lock-in
- The Problem: While the rationale is to avoid Big Tech lock-in using Open Standards, the best spatial computing experiences are currently tightly coupled to proprietary OS-level APIs (like Apple’s visionOS).
- The Solution: They must prioritize WebXR. By building the “Origami Text” parser to run in WebXR-compatible browsers, the spatial documents can be hosted as standard webpages and viewed interactively across different headsets without relying on proprietary app stores. (Note: Origami Text uses simple spatial coordinates, easily used by any rendering system)
The broader architectural goal of “Origami Text”—using open metadata to create interactive, collapsible 3D text environments—is technically viable, provided the team focuses heavily on WebXR standards and SDF rendering to overcome the inherent limits of spatial text.
By ChatGPT
Bottom line
The approach is technically feasible and strategically sensible: EPUB is already a ZIP-based, open, single-file package of XHTML/CSS/resources, and EPUBCheck exists as the official conformance checker. (W3C)
But the claim should be softened from “this will work everywhere” to:
“The readable paper layer should work in normal EPUB readers; the Origami/Visual-Meta layer can travel inside the file and be used by specialised tools, provided the profile is carefully constrained and validated.”
That is a good, achievable goal.
What is strong
The rationale is sound. The page correctly identifies the gap: PDF gives portability and stable visual form; HTML gives structure and interactivity; scholarship needs portability, structure, citation addressability, and long-term readability. The proposal’s layered idea is good: ordinary readers ignore the extra data, while aware tools use it. (Visual-Meta)
Using EPUB rather than inventing a new format is the right instinct. EPUB content documents are XHTML, EPUB is packaged as a single container, and EPUB supports structured web content in a portable publication. (W3C)
The strongest practical claim is graceful degradation: if the extra Visual-Meta data is ignored, the file can still be read as an academic paper. That is exactly the right adoption posture. (Visual-Meta)
Main issues and how to deal with them
Inline JSON in <script> is risky
The page proposes or considers embedded JSON in a script block. EPUB 3 does include scripting sections, but some reading systems may treat anything script-related conservatively. The compatibility page already flags this. (Future Text Lab)
Fix: make visual-meta.json in the EPUB package the authoritative metadata source. Treat inline JSON only as optional convenience. If inline embedding causes even one major reader problem, remove it.
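A minimal sketch of treating a packaged visual-meta.json as the authoritative source might look like the following. The path `EPUB/visual-meta.json` is an assumption for illustration, and the archive built here is deliberately incomplete: a real EPUB also needs META-INF/container.xml, a package document, and the content documents, none of which are shown.

```python
import io
import json
import zipfile

META_PATH = "EPUB/visual-meta.json"  # hypothetical location inside the package

def add_visual_meta(zf, meta):
    """Write the metadata as its own file in the archive."""
    zf.writestr(META_PATH, json.dumps(meta), compress_type=zipfile.ZIP_DEFLATED)

def load_visual_meta(epub_bytes):
    """Return the packaged metadata, or None when the file is not present."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        if META_PATH not in zf.namelist():
            return None
        return json.loads(zf.read(META_PATH).decode("utf-8"))

# Build an in-memory archive. The mimetype entry goes first and uncompressed,
# as the EPUB container format requires.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
    add_visual_meta(zf, {"profileVersion": "0.1"})
epub_bytes = buf.getvalue()
```

A specialised reader would call `load_visual_meta` and fall back to ordinary reading when it returns `None`; any inline copy of the JSON would then be a convenience only.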
EPUB 3 requires XHTML discipline
EPUB XHTML must conform to XML syntax. That means well-formed markup, quoted attributes, namespace correctness, closed tags, and no sloppy browser-style HTML. (W3C)
Fix: build export around an XML serializer, not string concatenation. Every export should run EPUBCheck automatically before release. EPUBCheck is the official conformance checker. (GitHub)
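To illustrate why a serializer is safer than string concatenation, here is a small sketch using Python's standard `xml.etree.ElementTree`. The function name `build_xhtml` and the sample text are invented for the example; it covers only well-formedness and escaping, and running EPUBCheck on the finished package would be a separate step that is not shown.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", XHTML_NS)  # serialise XHTML as the default namespace

def build_xhtml(title, paragraphs):
    """Build a tiny XHTML document; the serializer handles escaping and closing tags."""
    html = ET.Element(f"{{{XHTML_NS}}}html")
    head = ET.SubElement(html, f"{{{XHTML_NS}}}head")
    ET.SubElement(head, f"{{{XHTML_NS}}}title").text = title
    body = ET.SubElement(html, f"{{{XHTML_NS}}}body")
    for text in paragraphs:
        ET.SubElement(body, f"{{{XHTML_NS}}}p").text = text
    return ET.tostring(html, encoding="unicode")

# Characters such as & and < would break naive string concatenation.
doc = build_xhtml("Fish & Chips", ["a < b", 'She said "hi"'])
```

Because the text is attached to elements rather than pasted into markup, special characters are escaped on output and the result can be parsed back as XML.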
The “single XHTML file” idea may conflict with scale and EPUB norms
For a 20–40 page paper, one XHTML file is probably fine. For monographs, image-heavy documents, or long reports, a single spine item may become slow or awkward. The compatibility page admits this. (Future Text Lab)
Fix: define two profiles:
Origami Article Profile: one main XHTML file.
Origami Book/Longform Profile: multiple chapter XHTML files, shared visual-meta.json, stable IDs across sections.
Page numbers and citation addressability need a firmer answer
The EPUB Issues page correctly says the academic blocker is stable addressability, not just comfort. It proposes paragraph-level addressability, possibly Engelbart-style purple numbers. (Future Text Lab)
This is critical. EPUB already has EPUB CFI for fragment/range references, designed to recover locations through parser variation and document revisions. (W3C GitHub)
Fix: use visible or invisible stable paragraph IDs plus optional EPUB CFI. Do not rely only on generated location numbers. Every paragraph, figure, table, citation, glossary item, and equation should have a durable ID.
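One possible way to keep IDs durable across re-exports is to assign new IDs only to elements that lack one and never touch existing ones. The sketch below assumes un-namespaced element names and an `ot-NNNN` ID pattern, both chosen purely for illustration; it does not address EPUB CFI.

```python
import xml.etree.ElementTree as ET

def assign_stable_ids(root, tags=("p", "figure", "table"), prefix="ot"):
    """Give each target element without an id a fresh one; existing ids are kept."""
    used = {el.get("id") for el in root.iter() if el.get("id")}
    counter = 0
    for el in root.iter():
        if el.tag in tags and not el.get("id"):
            counter += 1
            # Skip over any number already taken by an existing id.
            while f"{prefix}-{counter:04d}" in used:
                counter += 1
            new_id = f"{prefix}-{counter:04d}"
            el.set("id", new_id)
            used.add(new_id)
    return root

root = ET.fromstring('<body><p id="ot-0001">Kept.</p><p>New.</p><figure/></body>')
assign_stable_ids(root)
ids = [el.get("id") for el in root]
```

The important property is that an ID, once published, is never regenerated from position, so a citation to a given paragraph still resolves after paragraphs are inserted before it.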
“PDF hides structure” is practically true, but technically overstated
The critique is fair for most real-world PDFs, but PDF can contain tagged structure, XMP metadata, links, attachments, and embedded metadata. The stronger claim is not “PDF cannot,” but “PDF culture and tooling rarely make scholarly structure dependable.”
Fix: phrase the rationale as an ecosystem/practice problem, not an absolute file-format impossibility.
Visual-Meta schema must be versioned
The proposal says JSON will carry concepts, citations, glossary, spatial layout, and addressability. That is plausible. But without a strict schema, different implementations will drift.
Fix: publish a small formal schema: profileVersion, documentID, canonicalSource, paragraphIDs, citations, glossary, spatialNodes, edges, coordinateSystem, units, createdBy, exportedAt.
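As a sketch of what a first validation pass over such a schema could look like, the check below uses the field names from the list above. The expected types are assumptions for illustration only; a published schema would define them precisely, and probably in a standard schema language rather than in code.

```python
# Field names come from the list above; the types are assumed for this sketch.
REQUIRED_FIELDS = {
    "profileVersion": str,
    "documentID": str,
    "canonicalSource": str,
    "paragraphIDs": list,
    "citations": list,
    "glossary": list,
    "spatialNodes": list,
    "edges": list,
    "coordinateSystem": str,
    "units": str,
    "createdBy": str,
    "exportedAt": str,
}

def validate_visual_meta(meta):
    """Return a list of problems; an empty list means the basic shape is acceptable."""
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in meta:
            errors.append(f"missing field: {name}")
        elif not isinstance(meta[name], expected):
            errors.append(f"{name}: expected {expected.__name__}")
    return errors
```

An exporter could refuse to write a file when `validate_visual_meta` returns any errors, and a reader could use `profileVersion` to decide which rules to apply.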
Spatial layout needs semantic anchoring, not just coordinates
Carrying z-depth and node positions is easy. Making them meaningful later is harder. A future XR reader needs to know what each node refers to, what coordinate system was used, whether layout is authorial, generated, or user-edited, and what happens when the text changes.
Fix: each spatial node should point to a stable document object ID, not just a text string. Store layout provenance: author-created, generated, imported, revised.
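The anchoring rule can be sketched as a simple consistency check: every spatial node must name an existing document object ID and carry a known provenance value. The class name, field names, and example IDs below are hypothetical; the four provenance values are the ones listed above.

```python
from dataclasses import dataclass

PROVENANCE = {"author-created", "generated", "imported", "revised"}

@dataclass(frozen=True)
class SpatialNode:
    target_id: str    # stable ID of the paragraph, figure, etc. this node represents
    position: tuple   # (x, y, z) in whatever coordinate system the schema declares
    provenance: str   # how this layout came about

def find_problems(nodes, document_ids):
    """Report nodes that point at nothing or carry an unknown provenance value."""
    problems = []
    for node in nodes:
        if node.target_id not in document_ids:
            problems.append(f"{node.target_id}: no such document object")
        if node.provenance not in PROVENANCE:
            problems.append(f"{node.target_id}: unknown provenance {node.provenance!r}")
    return problems

document_ids = {"ot-0001", "ot-0002"}
nodes = [
    SpatialNode("ot-0001", (0.0, 1.0, -0.5), "author-created"),
    SpatialNode("ot-0099", (0.3, 1.0, -0.5), "guessed"),
]
problems = find_problems(nodes, document_ids)
```

A check like this also gives a reader something concrete to do when the text changes: nodes whose target has disappeared can be flagged rather than silently misplaced.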
Cross-document citation opening is harder than described
The “if they own the cited document, click to open it to the cited section” scenario is attractive, but it needs reliable document identity, local discovery, permissions, and version matching. (Visual-Meta)
Fix: include DOI, ISBN, URL, document hash, Visual-Meta ID, and target fragment/paragraph ID. If the exact document is missing, degrade to search by DOI/title/author/year.
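The fallback behaviour might be sketched as follows, assuming a local library index that maps identifier strings to file paths. The key names (`doi`, `isbn`, `url`, `hash`, `visualMetaID`, `fragment`), the placeholder DOIs, and the return shape are all invented for this example; version matching and permissions are not handled.

```python
IDENTIFIER_KEYS = ("doi", "isbn", "url", "hash", "visualMetaID")  # hypothetical key names

def resolve_citation(citation, library):
    """Open the cited document if it is held locally, otherwise build a search query.

    `library` maps an identifier string to a local file path.
    """
    for key in IDENTIFIER_KEYS:
        ident = citation.get(key)
        if ident and ident in library:
            return {
                "action": "open",
                "path": library[ident],
                "fragment": citation.get("fragment"),
            }
    # Exact document not found: degrade to a search by DOI/title/author/year.
    parts = [str(citation[k]) for k in ("doi", "title", "author", "year") if citation.get(k)]
    return {"action": "search", "query": " ".join(parts)}

library = {"10.1000/example": "/papers/example.epub"}  # placeholder DOI and path
owned = resolve_citation({"doi": "10.1000/example", "fragment": "ot-0042"}, library)
missing = resolve_citation({"doi": "10.1000/other", "title": "Other Paper", "year": 2020}, library)
```

Trying the strongest identifiers first and only then falling back to a search keeps the "click to open" behaviour predictable when the reader's library is incomplete.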
Feasibility verdict
Core EPUB export: highly feasible.
Embedded citation/glossary metadata: feasible.
AI-readable structured paper: feasible and useful.
XR reconstruction of spatial knowledge maps: feasible as a specialised reader feature, but needs a stricter schema.
Universal compatibility: feasible only for the ordinary reading layer, not the enhanced layer.
Academic adoption: possible, but only if stable citation/addressability and validation are treated as first-class requirements.
The project is strongest when presented not as “EPUB finally fixed,” but as:
A disciplined EPUB profile for scholarly documents, with embedded, versioned Visual-Meta that preserves intellectual structure while remaining readable everywhere.