Note provided by Google Gemini:
Under the premise of a direct JATS XML pilot with ACM, the standard front-end TAPS barriers (the requirement to upload Word or LaTeX sources) disappear. With a direct-to-JATS pipeline on the table, the technical hurdles shift from system rejection to schema alignment, rendering pipeline contracts, and data integrity.
Here is the revised critique of the design document tailored to a direct ACM JATS ingestion pilot:
1. The Rendering Pipeline Contract (The “Key Sentence” Problem)
- The Issue: The document maps the core Visual-Meta feature, “Key Sentences”, to `<named-content content-type="key-sentence">`. While this is perfectly valid JATS 1.4 syntax, JATS is merely a semantic storage format; it does not dictate visual rendering.
- The Pilot Risk: When ACM ingests this JATS file, their automated downstream engines (which generate the final reader-facing HTML5 and PDFs) will use standard stylesheets. Because `"key-sentence"` is a custom string unique to your app, ACM’s default rendering pipeline will likely ignore the attribute and display the sentence as plain text, stripping the visual highlighting entirely.
- The Action Item: As part of your dialogue, you need to establish a rendering agreement. ACM will either need to map your custom `content-type` to a specific class in their CSS/XSLT stylesheets, or you must negotiate a standard fallback tag that they already style natively (though few standard JATS tags represent semantic “skimming highlights”).
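To make the rendering agreement concrete, the sketch below shows the kind of mapping a downstream renderer would need. It is an illustration only: the `CONTENT_TYPE_TO_CLASS` table, the `render_paragraph` function, and the CSS class name are assumptions, not part of ACM's actual pipeline (which is more likely XSLT than Python).

```python
import xml.etree.ElementTree as ET

# Hypothetical table a publisher would have to maintain: JATS content-type
# values on the left, CSS class names in the rendered HTML on the right.
CONTENT_TYPE_TO_CLASS = {"key-sentence": "key-sentence"}


def render_paragraph(jats_p: str) -> str:
    """Turn one JATS <p> into HTML, keeping known named-content as styled spans."""
    p = ET.fromstring(jats_p)
    for el in list(p.iter("named-content")):
        css_class = CONTENT_TYPE_TO_CLASS.get(el.get("content-type", ""))
        el.tag = "span"
        el.attrib.clear()
        if css_class:
            # Known content-type: the highlight survives as a CSS hook.
            el.set("class", css_class)
        # Unknown content-type: a bare <span>, i.e. visually plain text.
    return ET.tostring(p, encoding="unicode")


html = render_paragraph(
    '<p>Intro. <named-content content-type="key-sentence">'
    "Main claim.</named-content> More.</p>"
)
print(html)
```

Without an entry in the mapping table, the same input degrades to an unstyled `<span>`, which is the silent loss of highlighting described above.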
2. ACM-Specific Schematron and Validation Layers
- The Issue: The document focuses heavily on validating against the standard JATS Article Authoring 1.4 DTD. Passing DTD validation is only the first baseline. Major publishers rarely ingest raw DTD-compliant XML without running it through a secondary validation layer called Schematron.
- The Pilot Risk: ACM’s internal Schematron rules likely mandate specific data points that standard JATS leaves optional. For example, ACM may strictly require:
  - Valid, live ORCID iDs for all authors within `<contrib>`.
  - Specific, pre-approved license URLs inside `<permissions>`.
  - Identification of specific funding registries (like Crossref Funder Registry IDs) in the metadata.
- The Action Item: Ask your ACM contacts for their house Schematron schema or publisher-specific XML validation rules. If your export module only tests against the generic JATS DTD, ACM’s ingestion scripts may still reject the file due to missing metadata fields required by their database architecture.
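Until ACM supplies its real rules, a pre-flight check in the export module can at least catch the kinds of omissions listed above. The following is a minimal sketch under stated assumptions: the two rules are hypothetical stand-ins for whatever ACM's Schematron actually enforces, the `preflight` function and `SAMPLE` document are invented for illustration, and the check only tests for presence (it does not verify that an ORCID iD is live or that a license URL is approved).

```python
import xml.etree.ElementTree as ET

# xlink:href in ElementTree's {namespace}local-name notation.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def preflight(jats_xml: str) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    root = ET.fromstring(jats_xml)
    problems = []

    # Hypothetical rule 1: every <contrib> carries a non-empty ORCID contrib-id.
    for i, contrib in enumerate(root.iter("contrib"), start=1):
        orcids = [
            c for c in contrib.findall("contrib-id")
            if c.get("contrib-id-type") == "orcid" and (c.text or "").strip()
        ]
        if not orcids:
            problems.append(f"contrib {i}: missing ORCID iD")

    # Hypothetical rule 2: a <license> exists and points at a URL.
    licenses = list(root.iter("license"))
    if not licenses:
        problems.append("permissions: no <license> element")
    for lic in licenses:
        if not lic.get(XLINK_HREF):
            problems.append("license: missing xlink:href")

    return problems


# Placeholder document: the second author has no ORCID, the license no URL.
SAMPLE = """<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front><article-meta>
    <contrib-group>
      <contrib contrib-type="author">
        <contrib-id contrib-id-type="orcid">https://orcid.org/0000-0002-1825-0097</contrib-id>
        <name><surname>Doe</surname><given-names>Jane</given-names></name>
      </contrib>
      <contrib contrib-type="author">
        <name><surname>Roe</surname><given-names>Sam</given-names></name>
      </contrib>
    </contrib-group>
    <permissions><license><license-p>Example text.</license-p></license></permissions>
  </article-meta></front>
</article>"""

for problem in preflight(SAMPLE):
    print(problem)
```

Once the publisher's Schematron file is available, running it directly is preferable to re-implementing its rules by hand like this.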
3. Fragile Citation Heuristics (Section 5.3)
- The Issue: The document still relies on regular expressions and fuzzy string-matching to scan flat RTF text for patterns like `(AuthorNames Year)` and retroactively bind them via `<xref>` to a reference ID in `Citations.plist`.
- The Pilot Risk: In a direct-to-JATS submission, broken internal links are a critical failure. If a regex parser misinterprets a customized in-text citation (e.g., `(see Halevi et al. 2015, p. 11)` or narrative styling like `Halevi and colleagues (2015) argued...`), the resulting JATS will have a broken or missing `rid` attribute on its `<xref>` element (`rid` is the attribute JATS uses to point at the reference's `id`). This breaks the hyperlink in the final ACM Digital Library layout.
- The Action Item: Do not reverse-engineer flat strings post-hoc. Because the “Author” app controls the native environment, it should embed a hidden structural token or UUID directly into the rich text/RTF stream at the exact position the citation is inserted. The export module should read these internal node markers to generate the `<xref>` tags deterministically, avoiding regex parsing entirely.
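The token-based approach can be sketched as follows. This is a simplified model, not the app's real data structure: the `Text` and `Citation` run types, the `runs_to_jats` function, and the ID format are all assumptions made for illustration. The one JATS-specific detail is that `<xref>` links to a reference through its `rid` attribute.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape, quoteattr


@dataclass
class Text:
    value: str


@dataclass
class Citation:
    ref_id: str  # stable ID stored at the moment the citation was inserted
    label: str   # visible text, however the author chose to phrase it


def runs_to_jats(runs, known_ref_ids) -> str:
    """Serialize one paragraph of runs; citations become <xref> by ID lookup."""
    parts = []
    for run in runs:
        if isinstance(run, Citation):
            if run.ref_id not in known_ref_ids:
                # Fail at export time rather than ship a dangling link.
                raise ValueError(f"unknown reference id {run.ref_id!r}")
            parts.append(
                f'<xref ref-type="bibr" rid={quoteattr(run.ref_id)}>'
                f"{escape(run.label)}</xref>"
            )
        else:
            parts.append(escape(run.value))
    return "<p>" + "".join(parts) + "</p>"


paragraph = [
    Text("As shown earlier "),
    Citation("ref-halevi-2015", "(see Halevi et al. 2015, p. 11)"),
    Text("."),
]
print(runs_to_jats(paragraph, known_ref_ids={"ref-halevi-2015"}))
```

Because the link comes from the stored ID and never from the visible label, a page locator or a narrative phrasing in the label cannot break it.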
4. Export Dialog Data Collection Gaps (Section 4.2 & 8)
- The Issue: Because the underlying `.liquid` schema lacks metadata slots for co-authors, multiple affiliations, abstracts, and keywords, the document suggests prompting the user via a modal dialog right at the moment of export (Option 2).
- The Architectural Risk: For a pilot to be successful, authors need a low-friction workflow. Forcing them to manually type out complex, multi-author, multi-department academic metadata into a transient pop-up box at the very end of their writing process is highly error-prone. Furthermore, if the app crashes, or if the author spots a typo in their paper and needs to re-export five minutes later, they will have to manually re-enter all that metadata from scratch.
- The Action Item: Shift Option 1 (updating the native `.liquid` schema to structurally support co-authors, abstracts, and keywords directly inside the document editor) from a “future roadmap” item to an immediate prerequisite for the pilot. The export step should be a single-click action, not a data-entry bottleneck.
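As a rough illustration of what storing this metadata with the document could look like, the sketch below models it as a small record that round-trips through JSON. Everything here is hypothetical: the real `.liquid` schema is not shown in the design document, so the field names, the `Author` and `DocumentMetadata` types, and the choice of JSON are assumptions.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Author:
    given_names: str
    surname: str
    affiliations: list[str] = field(default_factory=list)


@dataclass
class DocumentMetadata:
    title: str
    abstract: str = ""
    keywords: list[str] = field(default_factory=list)
    authors: list[Author] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize for storage alongside the document body."""
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, raw: str) -> "DocumentMetadata":
        """Rebuild the record, restoring nested Author objects."""
        data = json.loads(raw)
        data["authors"] = [Author(**a) for a in data.get("authors", [])]
        return cls(**data)


meta = DocumentMetadata(
    title="Example Paper",
    abstract="A placeholder abstract.",
    keywords=["JATS", "export"],
    authors=[Author("Jane", "Doe", ["Example University, Dept. A"])],
)
saved = meta.to_json()                       # written once, with the document
restored = DocumentMetadata.from_json(saved) # read back on every export
print(restored == meta)
```

With the metadata persisted this way, a re-export after a crash or a typo fix reads the stored record instead of asking the author to type it again.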
Summary Checklist for your ACM Dialogue
To make this JATS pilot successful, consider bringing these three specific technical questions to your next meeting with ACM’s production team:
- “Can you provide the specific Schematron rules or metadata constraints your ingestion engine enforces on top of standard JATS 1.4?”
- “How can we ensure that custom semantic tags like `<named-content content-type="key-sentence">` are preserved and styled in the final HTML/PDF rendering pipeline?”
- “Does your system require specific syntax for institutional metadata, such as ROR (Research Organization Registry) identifiers, within the `<aff>` tags?”