Editoria Typescript

Editoria Typescript transforms HTML into the format required by the Coko Foundation’s Wax WYSIWYG word processor for Ketida (previously named Editoria). While Wax has been built specifically for book editing and publication, that is by no means its only application, and it could be repurposed. Similar chains could be implemented to target other formats.

Editoria Typescript translates the document structure, inline and class formatting, endnotes and footnotes into a subset of near-HTML, while eliminating HTML attributes not used by Wax.

Contents

  • Pipeline
  • p-split-around-br.xsl
  • editoria-basic.xsl
  • editoria-reduce.xsl

Pipeline

The Editoria Typescript steps should be run in the following order:

  1. p-split-around-br.xsl
  2. editoria-basic.xsl
  3. editoria-reduce.xsl

p-split-around-br.xsl

It is possible to specify line breaks within paragraphs in Word (<w:br/>, which are extracted as XHTML <br class="br" /> tags).

As Wax does not support <br>s, this step simply divides paragraphs on breaks, removing the break and creating two separate <p> elements instead.

<p style="font-family: Times New Roman; text-indent: 36pt">
  Kṛṣṇadevarāya discusses this practice in the following verse:
  <br class="br"/>
  Make trustworthy Brahmins
  <br class="br"/>
  The commanders of your forts
</p>

becomes

<p>Kṛṣṇadevarāya discusses this practice in the following verse:</p>
<p>Make trustworthy Brahmins</p>
<p>The commanders of your forts</p>
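
The splitting described above can be illustrated with a short Python sketch. This is not the XSLT itself: the function names `split_paragraph_on_br` and `_trim` are hypothetical, and dropping the paragraph's attributes and trimming surrounding whitespace are assumptions made only to match the example output shown above.

```python
import copy
import xml.etree.ElementTree as ET


def _trim(p):
    """Strip leading/trailing whitespace of a paragraph (assumption, to match the example)."""
    if len(p) == 0:
        p.text = (p.text or "").strip()
    else:
        p.text = (p.text or "").lstrip()
        p[-1].tail = (p[-1].tail or "").rstrip()
    return p


def split_paragraph_on_br(p):
    """Return one new <p> per run of content between <br> children of `p`."""
    parts = []
    current = ET.Element("p")  # attributes are not copied in this sketch
    current.text = p.text
    for child in p:
        if child.tag == "br":
            # Close the current paragraph; text after the break starts a new one.
            parts.append(current)
            current = ET.Element("p")
            current.text = child.tail
        else:
            # Inline children (and their tail text) are kept as-is.
            current.append(copy.deepcopy(child))
    parts.append(current)
    return [_trim(part) for part in parts]
```

Calling `split_paragraph_on_br(ET.fromstring(source))` on a paragraph with two breaks yields three `<p>` elements, which can be serialized with `ET.tostring(part, encoding="unicode")`.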

editoria-basic.xsl

XSweet’s initial extraction divides the contents of the HTML document into sections: <div class="docx-content">, <div class="docx-endnotes">, and <div class="docx-footnotes">. This step rearranges the content:

  • <div class="docx-content"> becomes <container id="main">
  • Notes are reformatted and moved into a <div id="notes">

Notes and their ids are also rewritten, from:

<div class="docx-endnotes">
  <div class="docx-endnote" id="en1">
    <p class="EndnoteText">
      <span class="EndnoteReference">
        <span class="endnoteRef">1</span>
      </span> endnote</p>
  </div>
</div>
<div class="docx-footnotes">
  <div class="docx-footnote" id="fn1">
    <p class="FootnoteText">
      <span class="FootnoteReference">
        <span class="footnoteRef">a</span>
      </span> footnote</p>
  </div>
</div>

to

<div id="notes">
  <note-container id="container-en1">
    <p class="EndnoteText"> endnote</p>
  </note-container>
  <note-container id="container-fn1">
    <p class="FootnoteText"> footnote</p>
  </note-container>
</div>

These are then properly linked and nicely displayed in Wax. Endnotes and footnotes are combined into one sequential list.
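
The note rewriting shown above can be sketched in Python as follows. This is an illustration of the input/output mapping only, not the stylesheet's implementation; `build_notes`, `strip_reference_spans` and `REFERENCE_CLASSES` are hypothetical names, and the sketch assumes the note sections sit below the element passed in (for example a `<body>`).

```python
import copy
import xml.etree.ElementTree as ET

# Classes of the spans that carry the note number or letter (taken from the example above).
REFERENCE_CLASSES = {"EndnoteReference", "FootnoteReference"}


def strip_reference_spans(p):
    """Remove reference spans from a note paragraph, keeping the text that follows them."""
    prev = None
    for child in list(p):
        if child.tag == "span" and child.get("class") in REFERENCE_CLASSES:
            tail = child.tail or ""
            if prev is None:
                # Leading indentation is dropped (assumption, to match the example output).
                p.text = (p.text or "").lstrip() + tail
            else:
                prev.tail = (prev.tail or "") + tail
            p.remove(child)
        else:
            prev = child


def build_notes(body):
    """Collect endnotes, then footnotes, into a single <div id="notes">."""
    notes = ET.Element("div", {"id": "notes"})
    sections = (("docx-endnotes", "docx-endnote"), ("docx-footnotes", "docx-footnote"))
    for section_class, note_class in sections:
        path = f".//div[@class='{section_class}']/div[@class='{note_class}']"
        for note in body.findall(path):
            container = ET.SubElement(
                notes, "note-container", {"id": "container-" + note.get("id")}
            )
            for p in note.findall("p"):
                p = copy.deepcopy(p)  # leave the input tree untouched
                strip_reference_spans(p)
                container.append(p)
    return notes
```

Endnotes are emitted before footnotes here because that is the order in the example output; the real stylesheet may order or link the notes differently.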

editoria-basic.xsl writes some properties from CSS style attributes inline:

  • font-style: italic is written inline, with the content wrapped in an <em> tag
  • font-weight: bold is written inline as <strong> tags
  • text-decoration: underline is written inline as <i> tags, which are later converted to <em> (see the note below)
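
One way to picture this step is the Python sketch below. It assumes that "written inline" means wrapping an element's content in the corresponding tag, and it only matches exact property values; `STYLE_TO_TAG`, `parse_style` and `wrap_styled_content` are hypothetical names, not part of XSweet.

```python
import xml.etree.ElementTree as ET

# CSS property/value pairs and the inline tag each one is written as (from the list above).
STYLE_TO_TAG = {
    ("font-style", "italic"): "em",
    ("font-weight", "bold"): "strong",
    ("text-decoration", "underline"): "i",
}


def parse_style(style):
    """Parse a style attribute such as 'font-style: italic; font-weight: bold' into a dict."""
    props = {}
    for declaration in style.split(";"):
        if ":" in declaration:
            name, value = declaration.split(":", 1)
            props[name.strip().lower()] = value.strip().lower()
    return props


def wrap_styled_content(el):
    """Wrap the content of `el` in the inline tags implied by its style attribute."""
    props = parse_style(el.get("style", ""))
    for (name, value), tag in STYLE_TO_TAG.items():
        if props.get(name) == value:
            wrapper = ET.Element(tag)
            # Move the element's text and children into the new wrapper.
            wrapper.text, el.text = el.text, None
            for child in list(el):
                el.remove(child)
                wrapper.append(child)
            el.append(wrapper)
    return el
```

The style attribute itself is left in place in this sketch, since the text above says class and style information is only dropped later, by editoria-reduce.xsl.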

The following inline formatting tag mappings then occur:

  • <b>s are converted to <strong>
  • <u> is converted to <i>
  • <i> is then converted to <em>

Note that we have made the decision to convert underlining to italics, as Wax does not currently support underlining.
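
The chained mapping above (`<u>` to `<i>`, then `<i>` to `<em>`) can be sketched as a small lookup table in Python. `TAG_MAP` and `map_inline_tags` are hypothetical names used for illustration; the stylesheet itself is XSLT.

```python
import xml.etree.ElementTree as ET

# Tag renames from the list above; <u> reaches <em> by way of <i>.
TAG_MAP = {"b": "strong", "u": "i", "i": "em"}


def map_inline_tags(root):
    """Rename inline formatting tags in place, following the mapping chain to its end."""
    for el in root.iter():
        while el.tag in TAG_MAP:
            el.tag = TAG_MAP[el.tag]
    return root
```

Following the chain in a loop means underlined text ends up as `<em>` in a single pass, which matches the decision to render underlining as italics.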

editoria-reduce.xsl

  • All class and style information is dropped. Bye-bye class, bye-bye style!

    •  <p class="EndnoteText"> endnote</p> becomes <p> endnote</p>
  • Other tag attributes (e.g. id) are passed through

  • <sub> and <sup> tags are passed through

  • Inline markup on whitespace only (spaces, tabs) is removed, e.g. <b> </b>

  • Tabs (<span class="tab">) are removed

  • Paragraphs or headings with only whitespace or no content at all are removed, e.g. <p></p>, <p> </p>, <h1></h1>

  • Internal-to-Word bookmarks (see the example at /xsweet-core/#links) are removed

  • The <style> tag in <head> is removed
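
Two of the rules above, dropping class and style while passing other attributes through, and removing empty paragraphs and headings, can be sketched in Python as follows. This covers only those two rules and is not the stylesheet's implementation; `BLOCK_TAGS`, `is_blank` and `reduce_tree` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Elements that are removed when they hold no content (assumption: p and h1 to h6).
BLOCK_TAGS = {"p", "h1", "h2", "h3", "h4", "h5", "h6"}


def is_blank(el):
    """True if the element has no child elements and only whitespace (or no) text."""
    return len(el) == 0 and not (el.text or "").strip()


def reduce_tree(root):
    """Remove blank paragraphs/headings, then drop class and style attributes in place."""
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag in BLOCK_TAGS and is_blank(child):
                # Any text following the removed element (its tail) is dropped too.
                parent.remove(child)
    for el in root.iter():
        el.attrib.pop("class", None)
        el.attrib.pop("style", None)
        # Other attributes, such as id, are left untouched.
    return root
```

A paragraph that contains only whitespace-wrapping inline markup, such as `<p><b> </b></p>`, is not caught by this sketch; handling it would need the inline cleanup rule to run first.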

xSweet is made thanks to

  • ljaf foundation
  • Moore foundation

© 2021 Copyright Adam Hyde. CC-BY-SA.