1 Current system we are using to publish this website

The page you are looking at is written in Org Mode, and then converted to HTML with Pandoc and a Pandoc Lua filter. My only reason for using Org Mode is its Outline Mode.

2 Why do we want our own publishing system?

Because existing systems do not and will not satisfy our requirements. When nobody else is going to do it, we have to do it ourselves.

These are strong reasons for rolling our own publication system in Prolog:

As we age, we learn that a hack/quick-fix/duct-tape only leads to misery, regret, and self-hate.

However, the last time we tried to create our own publishing system was a big XML disaster.

2.1 What unsatisfactory things exist?

LaTeX2HTML (footnote 1)

Why not MathML? Chicken-and-egg problem (footnote 2).

These things do not satisfy us and will not satisfy us.

2.1.1 Pandoc

(WONTFIX) Org Mode Reader duplicates footnotes (footnote 3).

(workaround available) HTML writer does not separate adjacent footnote references (footnote 4).

amsthm, theorem/lemma/definition/proof

We want by default #+OPTIONS: ^:nil.

Multi-authorship?

Issue 1608 (footnote 5)

LaTeX command argument, hard to fix, practically wontfix (footnotes 6, 7)

2.1.2 LaTeX

2.1.3 Markdown

2.1.4 Org Mode

2.1.5 RST

2.2 What unsatisfactory things have we tried?

2.2.1 Org Exporter

I tried Org Exporter; it was too slow. On <2019-01-01>, I tried pandoc-scholar, but it was also too slow for my needs; I need to convert my website routinely. Thus I wrote a Pandoc Lua filter to translate Org Mode to Jekyll HTML. It is still somewhat slow, but its output is acceptable.

The alternative is to switch from Org to Markdown and find a Markdown editor that can fold sections (footnote 8): set up Vim Markdown folding, or use another editor that can fold sections, maybe VS Code. I prefer Vim to Emacs because I can move around much faster in Vim. Vim is designed for motion. Emacs is designed for extensibility.

I could also use a regex to replace Org *** with Markdown ### as a preprocessing step before calling Pandoc, but I would rather have a non-hacky long-term solution. I think I will need to reimplement some of Jekyll + Pandoc + Org in Prolog.
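For reference, the regex preprocessing step mentioned above could look roughly like the following Python sketch. It is the hacky variant, not the long-term solution. The function name org_headings_to_markdown is made up for illustration, and the sketch assumes that stars inside #+BEGIN_.../#+END_... blocks must be left alone.

```python
import re

# An Org heading: one or more stars at the start of the line, whitespace, then the title.
ORG_HEADING = re.compile(r"^(\*+)[ \t]+(.*)$")


def org_headings_to_markdown(text: str) -> str:
    """Rewrite Org heading lines ('*** Title') as Markdown headings ('### Title')."""
    out = []
    in_block = False  # True while inside a #+BEGIN_... / #+END_... block
    for line in text.split("\n"):
        marker = line.strip().upper()
        if marker.startswith("#+BEGIN_"):
            in_block = True
        elif marker.startswith("#+END_"):
            in_block = False
        elif not in_block:
            m = ORG_HEADING.match(line)
            if m:
                # Same depth, different marker character.
                line = "#" * len(m.group(1)) + " " + m.group(2)
        out.append(line)
    return "\n".join(out)
```

Only headings are touched; inline markup, links, and directives would still need Pandoc or another pass.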

3 Our requirements

<2018-12-19> Example 24 "Markdown citations" https://pandoc.org/demos.html

3.1 Input

It has to work with all my old documents: some Markdown, some LaTeX, some Org Mode.

The input may have Unicode characters, mathematics, physical quantities with units, and chemical equations.

The input encoding is UTF-8.

The input may be Org Mode or Modified LaTeX.

The input must be as semantic as possible (content, not presentation). Consider sTeX (footnote 9) [1].

Fortunately, most of my LaTeX files are already in such semantic/declarative orientation and oblivious to formatting.

3.2 Output

The output may be HTML, possibly with CSS and JavaScript.

The output may be LaTeX acceptable by xelatex for PDF output.

The output must be a static page that can easily be assembled into a static website.

The output must not depend on URL rewriting.

3.3 Build

Incremental build.
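A minimal sketch of what "incremental" could mean here, assuming a make-style rule where a page is rebuilt only when its output is missing or older than its source. The function name needs_rebuild is hypothetical, and the sketch ignores dependencies such as templates or bibliography files.

```python
import os


def needs_rebuild(source_path: str, output_path: str) -> bool:
    """Return True if the output is missing or older than the source."""
    if not os.path.exists(output_path):
        return True
    # Compare modification times, as make does.
    return os.path.getmtime(source_path) > os.path.getmtime(output_path)
```

A content hash would be more robust than modification times, at the cost of reading every source file on each build.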

For rendering mathematics in HTML, there are several choices: MathML, SVG fonts, HTML+CSS flex boxes.

LaTeX math requirements: underbrace, index, superscript, subscript, left superscript, left subscript, big delimiters, cases, equal-sign alignment, equation numbering, theorem environments.

3.4 Online-navigation features?

Breadcrumbs?

Tagging?

3.5 Book-publishing features

Table of contents with word count and reading time.
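A rough sketch of the word count and reading time computation, in Python. The reading speed of 200 words per minute is an assumption, not a measured value, and words are simply whitespace-separated tokens, so markup is counted as words.

```python
import math

WORDS_PER_MINUTE = 200  # assumed average reading speed


def word_count(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


def reading_time_minutes(text: str, wpm: int = WORDS_PER_MINUTE) -> int:
    """Estimated reading time, rounded up to whole minutes (0 for empty text)."""
    return math.ceil(word_count(text) / wpm)
```

For a table of contents, the count of a section would presumably include the counts of its subsections.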

Indexing, local table of contents, list of tables, list of figures, list of theorems.

LaTeX index entry format.

Cross-references to pages, sections, equations, etc.

3.6 LaTeX features

siunitx

3.7 Academic publishing features

Citations.

Bibliography. Bibtex/Biblatex input with all strange characters.

3.8 Features, goals, and constraints

This basically reimplements some of Pandoc in Prolog.

I need this big yak shaving because:

  • I need citations in my file:proglang.org.
  • I don't trust Emacs.
  • I don't trust org-ref.
  • I don't trust the interaction between org, pandoc, and pandoc-citeproc.
  • I feel Emacs is too slow.
  • I hate editing and compiling Pandoc; GHC is slow.
  • Markdown is not enough.

Problem:

  • pandoc can't read org metadata/directives?

The constraints:

  • The user edits the file using unmodified Emacs Org-Mode.

Work breakdown:

  • citeproc? bibtex? biblatex? bibliography management?
  • Citing from website.
  • The program exports the Org files to HTML for web publishing without using the slow Emacs Org Exporter.
  • Parse LaTeX \begin/\end environments.
  • YAML metadata preamble.
  • Generate a local table of contents for each section that has one or more children.
  • word count, to replace page numbers as navigation aid
  • Why bibtex? Because Google Scholar exports bibtex. We can then just copy the bibtex fragment from Google Scholar.
  • Parse Org Mode.
    • Inline: bold, italic, underline, code, at-at-html.
    • Heading.
    • Block: EXAMPLE, SRC, HTML
    • Syntax highlighting in code blocks? We skip this because this is not essential.
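Two of the items above, parsing Org headings and generating a local table of contents, can be sketched together. This is an illustration in Python, not the planned Prolog implementation. The names Section, parse_outline, and local_toc are hypothetical, and the sketch does not skip stars inside EXAMPLE or SRC blocks.

```python
import re
from dataclasses import dataclass, field

# An Org heading: leading stars give the depth, the rest is the title.
HEADING = re.compile(r"^(\*+)[ \t]+(.*\S)[ \t]*$")


@dataclass
class Section:
    level: int
    title: str
    children: list = field(default_factory=list)


def parse_outline(text: str) -> Section:
    """Build a tree of sections from Org heading lines; the root has level 0."""
    root = Section(0, "")
    stack = [root]  # path from the root to the most recent heading
    for line in text.splitlines():
        m = HEADING.match(line)
        if not m:
            continue
        sec = Section(len(m.group(1)), m.group(2))
        # Pop until the top of the stack is a proper ancestor of the new heading.
        while stack[-1].level >= sec.level:
            stack.pop()
        stack[-1].children.append(sec)
        stack.append(sec)
    return root


def local_toc(section: Section) -> list:
    """Titles of the direct children; empty for a section without children."""
    return [child.title for child in section.children]
```

A section gets a local table of contents exactly when local_toc returns a non-empty list.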

VERY IMPORTANT: the ability to handle citations without slowing down everything

We can use http://www.swi-prolog.org/pack/list?p=bibtex

We are going to use LMX instead of https://github.com/SWI-Prolog/packages-ltx2htm

4 Ontology

A page has a name, a title, an author list, a creation date, an update date, a file name.

A file name has a name and an optional dotted extension.
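The ontology above can be written down as data types. The following Python sketch uses field names that are my own guesses (authors, created, updated), and it assumes that a leading dot, as in ".bashrc", does not start an extension.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


def split_file_name(file_name: str) -> Tuple[str, Optional[str]]:
    """Split 'proglang.org' into ('proglang', 'org'); the extension is None if absent."""
    name, dot, ext = file_name.rpartition(".")
    if not dot or not name:
        # No dot at all, or only a leading dot: no extension.
        return file_name, None
    return name, ext


@dataclass
class Page:
    name: str
    title: str
    authors: List[str]
    created: date
    updated: date
    file_name: str
```

For example, split_file_name("proglang.org") gives the name "proglang" and the extension "org".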

5 Write a Prolog static site generator to translate Org-Mode and Markdown to HTML

I need a publishing system, preferably written in Prolog.

6 The LMX (LaTeX-Markdown-XML) markup language?

LMX is a LaTeX-like surface syntax for XML; this enables you to replace the closing tag with a closing brace.

LMX combines LaTeX-like commands, Markdown-like (or Org-Mode-like) paragraphs and lists, and XML elements.

This is a paragraph.
Separate a paragraph with a blank line.

This is another paragraph.

- This is a list.

\passthru{
WHAT.
}

\cite{foo}

\bibliography

LMX \Tag[Attr1=Val1,Attr2=Val2]{Content} translates to XML <Tag Attr1="Val1" Attr2="Val2">Content</Tag>
\ul{
  \li{This is the first item.}
  \li{This is the second item.}
}

\div[class=foo]{
  \div[class=bar]{
  }
}

Curly braces quote/escape attribute values.
\Tag[Attr={,}]
\Tag[Attr={[}]
\div[data-foo=4123]{}
\h1{bar}
\img[src=foo.png]


Escape a curly brace by prepending a backslash.
\Tag[Beg=\{,End=\}]{}

\table{
  \tr{\td{Foo}\td{Bar}}
}

inline math \( 1 + 2 \)

environments
\begin{align}
1 + 2 &= 3
\\
4 + 5 &\neq 6
\end{align}

\strong{\em{strong emphasized text}}

\p{This is an example paragraph. It may contain \em{emphasis}.}
\p{ Leading and trailing whitespaces are stripped away. }
\p{\ Backslash-space escapes a space that would otherwise be ignored.}

HTML entities:

\nbsp expands to &nbsp;

Difference from TeX/LaTeX:

  • In TeX, " b" inserts no space. We insert a space. If you don't want a space, write "b" or "{}b".
  • In LaTeX, you use "\bibliography{FILE}". In LMX, you use "\bibliography", and specify the FILE from the command line or from Prolog code.

Similar but not the same: "TEXML: Typesetting XML with TEX" (1999) https://www.tug.org/TUG99-web/pdf/lovell.pdf

"SXML: an XML document as an S-expression" http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.729.3639&rep=rep1&type=pdf

R markdown https://bookdown.org/

LaTeX syntax for XML.

\tag[attr1=val1,attr2=val2]{content} = <tag attr1="val1" attr2="val2">content</tag>
\amp = &amp;
\e{amp} = &amp;
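The translation rules above are small enough to sketch as a recursive-descent translator. The sketch is in Python for illustration only; the plan is still Prolog. It covers \tag[attrs]{content}, brace-quoted attribute values, and backslash escapes. It does not handle math, \begin/\end environments, whitespace stripping, or \e{...}. The ENTITIES table and all function names are assumptions.

```python
from xml.sax.saxutils import escape, quoteattr

ENTITIES = {"amp", "nbsp"}  # assumed: bare commands that expand to entity references


def lmx_to_xml(src: str) -> str:
    """Translate an LMX string to XML text."""
    xml, _ = _parse_content(src, 0, top=True)
    return xml


def _parse_content(s, i, top):
    """Parse text and tags until an unescaped '}' (or end of input at top level)."""
    parts = []
    while i < len(s):
        c = s[i]
        if c == "}":
            if top:
                raise ValueError(f"unmatched '}}' at {i}")
            return "".join(parts), i
        if c == "\\" and i + 1 < len(s) and s[i + 1] in "{}\\ ":
            parts.append(escape(s[i + 1]))  # escaped brace, backslash, or space
            i += 2
        elif c == "\\":
            xml, i = _parse_tag(s, i + 1)
            parts.append(xml)
        else:
            parts.append(escape(c))
            i += 1
    if not top:
        raise ValueError("missing '}'")
    return "".join(parts), i


def _parse_tag(s, i):
    """Parse NAME[attrs]{content} starting just after the backslash."""
    j = i
    while j < len(s) and (s[j].isalnum() or s[j] in "-_"):
        j += 1
    name = s[i:j]
    if not name:
        raise ValueError(f"expected tag name at {i}")
    attrs = ""
    has_attrs = j < len(s) and s[j] == "["
    if has_attrs:
        attrs, j = _parse_attrs(s, j + 1)
    if j < len(s) and s[j] == "{":
        body, j = _parse_content(s, j + 1, top=False)
        return f"<{name}{attrs}>{body}</{name}>", j + 1
    if not has_attrs and name in ENTITIES:
        return f"&{name};", j
    return f"<{name}{attrs}/>", j  # no content: empty element


def _parse_attrs(s, i):
    """Parse 'k=v,k=v]' starting just after '['; return (' k="v" ...', next index)."""
    attrs = []
    while True:
        eq = s.find("=", i)
        if eq < 0:
            raise ValueError("expected '=' in attribute list")
        key = s[i:eq].strip()
        i = eq + 1
        if i < len(s) and s[i] == "{":  # brace-quoted value, taken verbatim
            end = s.find("}", i)
            if end < 0:
                raise ValueError("missing '}' in attribute value")
            value, i = s[i + 1:end], end + 1
        else:  # bare value, ends at ',' or ']'
            chars = []
            while i < len(s) and s[i] not in ",]":
                if s[i] == "\\" and i + 1 < len(s):
                    i += 1  # backslash escapes the next character
                chars.append(s[i])
                i += 1
            value = "".join(chars)
        attrs.append(f" {key}={quoteattr(value)}")
        if i >= len(s) or s[i] not in ",]":
            raise ValueError(f"expected ',' or ']' at {i}")
        if s[i] == "]":
            return "".join(attrs), i + 1
        i += 1  # skip ','
```

In a Prolog implementation, the same three productions would probably become DCG rules, with the output built as a term instead of a string.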

Make a semantic-LaTeX parser/interpreter in Prolog. Treat LaTeX as a declarative markup language, not an imperative command language.

7 Designing our publishing system

7.1 How far can we go with HTML+CSS?

http://www.zipcon.net/~swhite/docs/math/math.html

http://jkorpela.fi/math/

Fractions with HTML tables or CSS flex boxes.
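As a rough idea of the flex-box variant, the following Python sketch emits an HTML fragment for a fraction. The inline styles are one possible choice, not a tested cross-browser recipe, and the function name fraction_html is hypothetical.

```python
from html import escape


def fraction_html(numerator: str, denominator: str) -> str:
    """Return an inline-flex fragment that stacks the numerator over the denominator."""
    return (
        '<span style="display:inline-flex;flex-direction:column;'
        'text-align:center;vertical-align:middle">'
        # The bottom border of the numerator serves as the fraction bar.
        f'<span style="border-bottom:1px solid">{escape(numerator)}</span>'
        f'<span>{escape(denominator)}</span>'
        '</span>'
    )
```

Because flex items stretch across the column by default, the bar should span the wider of the two parts.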

Superscripts, subscripts, indexes

Proof trees with HTML tables or CSS flex boxes.

Drupal will not do. We want a static website. https://math.stackexchange.com/questions/51903/best-way-to-set-up-a-wiki-for-maintaining-a-structured-math-journal

Radicals/surds/roots? 12th root of 2? https://www.mathworks.com/matlabcentral/answers/298325-how-to-make-published-html-look-as-good-as-the-blogs-on-matlab-central

https://www.authorea.com/users/5713/articles/19359-latex-is-dead-long-live-latex-typesetting-in-the-digital-age/_show_article

https://janmr.com/blog/2015/01/typesetting-math-with-html-and-css-fractions/

7.2 How far can we go with JavaScript?

https://stackoverflow.com/questions/796890/whats-the-best-way-to-write-mathematical-equations-on-the-web

DOM layout reflow is slow.

8 How should we publish on the Internet?

I have also written "Using Emacs and Org Mode", "Emacs for vi users", and "Working around Markdown".

9 Bibliography

[1] Kohlhase, M. 2008. Using LaTeX as a semantic markup format. Mathematics in Computer Science. 2, 2 (2008), 279–304. URL: https://kwarc.info/people/mkohlhase/papers/mcs08-stex.pdf


  1. https://texfaq.org/FAQ-LaTeX2HTML

  2. https://jblevins.org/log/mathml-and-chrome

  3. https://github.com/jgm/pandoc/issues/5196

  4. https://github.com/jgm/pandoc/issues/5197

  5. https://github.com/jgm/pandoc/issues/1608

  6. https://github.com/jgm/pandoc/issues/5261

  7. https://github.com/jgm/pandoc/issues/3047

  8. https://vi.stackexchange.com/questions/9543/how-to-fold-markdown-using-the-built-in-markdown-mode

  9. https://kwarc.info/systems/sTeX/