Backburner Month 01: Van de Graaff

September 3, 2021

This is an entry from a list of projects I hope to some day finish. Return to the backburnered projects index.

What is it? Van de Graaff is a static site generator built around declarative pipelines. The original design used a vaguely Lisp-like format which could describe how to take various source files and produce downstream files from them, with the goal of being unreasonably flexible in terms of the format and interpretation of input data. A small fragment of a site configuration looked like this:

# `every` produces a single output file for each input file
(every "posts/%.md" {
  # the output filename is a function of the input one
  produce: "posts/%.html"
  # these are the arguments passed to the standard page
  # template, defined elsewhere in the config file
  with: {
    # the template here is invoked with `content` set to
    # the result of rendering the file contents with markdown
    content: (render markdown _)
    # this pulls out of the front matter at the beginning of
    # the Markdown file
    title: (front-matter "title" _)
  }
})

The above example is pretty simple: take every Markdown file in one directory, produce a corresponding HTML page to each in another directory by rendering the Markdown and pulling metadata out of the front matter. But the goal of Van de Graaff is also that the underlying set of “functions” should allow for a massive amount of customizability. For example, maybe we don't want to use front-matter in a Markdown file: maybe each post should be contained in its own directory, with metadata living in a separate file from the source itself, and indeed with the source file's location and markup format defined in the metadata file. The goal of Van de Graaff was to be flexible enough to accomodate nearly any kind of data organization you wanted while remaining fundamentally declarative:

(every {meta: "posts/%/metadata.json" source: (json-elem "content" meta)} {
  produce: "posts/%.html"
  with: {
    content: (render (if (== (json-elem "format" meta) "rst")
                         rst
                         markdown)
                     source)
    title: (json-elem "title" meta)
  }
})

Notice some of what's going on here: our “input” file has become a pair of named files, one of which depends on the content of another. There's now conditional logic to decide how to actually render the input file, and which renderer to use depends on the other file. This is certainly a kind of data organization that'd be harder to express in most existing static site generators.

The original goal was to build Van de Graaff as a tool with a decidable (i.e. non-Turing-complete) language, hence the Lisp-inspired-but-clearly-data-driven formatting system. Whenever I return to this project, my current plan is to reconsider this decision and instead write the pipelines in something like Lua or Wren, but still keep the core idea intact: the script itself would not actually load and process any files, but rather define a set of declarative pipelines, and those pipelines would be used to process the files, allowing reuse of intermediate outputs as well as cached computation of inputs known to be identical.

Why write it? There are two custom static site generators that I've written which I still use: one of them generates my sadly-languishing random-topic blog What Happens When Computer, and the other generates my prose and quote-collection site. They're rather different in almost every way: the former is written in Scheme (complete with an S-expression based templating mechanism) and ingests a custom-designed markup format, while the latter is written in an unholy combination of Mustache, Python, Bash, and Make, and ingests a combination of Markdown and YAML.

Despite being radically different from one another, they do have features in common: they both care about taking collections of things (posts, snippets of text, specific pages, &s) which they read, process, and munge into several different output files. A blog like What Happens When Computer would be generally well-supported by an off-the-shelf solution like Hugo, but I'd have to do a little bit more tweaking for my prose-and-quotes site. On the other hand, I could easily port the prose-and-quotes site to use a CMS-like solution like Python's Lektor, but at the cost of abandoning my own data organization: I'd have to convert my existing data to use such a new format, which would break other tools I've already written that use this data in other ways.

Van de Graaff was my attempt to write a flexible-but-still-constrained system which could accommodate them both as well as anything else I'd like to write, replacing not just static site generators but any system where I write shell scripts or Makefiles to ran Pandoc for me. It's rather overengineered, and it's a terrible effort-to-reward ratio, but I still like many of the underlying ideas.

Why the name? It's a static generator, so I immediately thought of a Van de Graaff generator. (According to old notes and repos, I also at one point called it TMS but literally cannot remember what that was supposed to stand for.)

#backburner #software #tool