Librarian of Alexandria

This is an entry from a list of projects I hope to some day finish. Return to the backburnered projects index.

What is it? Various programming-language-centric approaches to procedural generation. I've done a lot of these, and Potrero and Treant are two of the newest ones, but I'm using this to talk about the whole family of these tools.

My approach here is to build tools for procedural generation as programming languages, and usually specifically as functional or logical languages: the UI is fundamentally text-centric, based on writing bits of structured text, and the operations defined can be broken down into atomic units that can be understood and reasoned about in isolation without having to reason about broader state. Additionally, because they are ideally decidable—i.e. you can ensure that they don't loop forever, but instead will always complete in a finite amount of time—you get a lot of ability to statically analyze them.

I think there's a really interesting design space here, and different experiments have focused on exploring it in different ways. There are some thoughts in common between all of them:

  • I think (as mentioned) that decidability is important here, both because “this will never loop forever” is a nice property in itself and because it opens the way for some sophisticated static analysis, like, “Will this program ever produce output with such-and-such a property?”
  • I think mostly-pure functional programming is a useful way to think about these things, where the only side effect is non-determinism
  • I think there's some interesting inspiration to be taken from logic programming or answer-set programming (c.f. Smith & Mateas, Answer Set Programming for Procedural Content Generation: A Design Space Approach and also Karth & Smith, WaveFunctionCollapse is Constraint Solving in the Wild)
  • I think there's a lot of API work to be done around exposing common resources (like Darius Kazemi's corpora) as “standard libraries” to draw on
  • I think there's also a lot of API work to be done around the right primitives for structuring output either textually or non-textually (although I'll freely admit that there's research here that I'm not conversant with!)

So the goal of my various tools has been to chip away at this space and experiment with these various theses!

Why write it? The big tool in this space right now is Tracery. Tracery is a great tool and I love it. That said, I'm very picky about tools.

To give an example of what I mean, here's a Tracery program which creates a brief description of meeting a person:

{
  "origin": ["#[#setNoun#]story#"],
  "setNoun": [
    "[person:man][them:him]",
    "[person:woman][them:her]",
    "[person:person][them:them]"],
  "adj": ["familiar", "strange", "mysterious"],
  "adv": ["warmly", "cautiously", "profusely"],
  "story": ["You meet a #adj# #person#. You greet #them# #adv#."]
}

You'll notice that the syntax is all JSON, with all the advantages and disadvantages that brings: it's a universally available format, which is good, but it's also finicky and particular, and important syntax now ends up embedded in string literals in a way that's not (for example) syntax-highlighted by default. Note also the "setNoun" rule above: what this does is effectively create new rules by injecting them into the set of rules, so depending on which choice is made, it'll define the rule "person" as being either "man", "woman", or "person", and define "them" as being a corresponding pronoun.

Here's the same generator expressed in Matzo, a dynamically typed programming language for procedural generation that was also the very first tool I built along these lines:

person := "man" | "woman" | "person";
pronoun :=
  { "man"    => "him"
  ; "woman"  => "her"
  ; "person" => "them"
  };
adj ::= familiar unfamiliar mysterious;
adv ::= warmly cautiously profusely;

n := person; fix n;
puts "You come across a " adj " " n ".";
puts "You greet " pronoun.n " " adv ".";

This has a lot of little convenience features, like the ::= taking a set of bare words that are implicitly understood to be a disjunction. (Indeed, you could rewrite the first line of the program to person ::= man woman person; and it would have the same meaning: I wrote it with := to showcase both approaches.) It also has functions—the value of pronoun is a function which branches on its argument—and the fix operation, which takes a rule which would otherwise vary, evaluates it once, and modifies its value to be the result of that: i.e. after that line, person will still randomly be one of three values, but n will always deterministically be one of those three values.
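To put the fix/vary distinction in more familiar terms, here's a rough Python sketch of the same idea (the names are just for illustration, and this isn't how Matzo is implemented): an unfixed rule is re-sampled every time it's used, while a fixed one is sampled once and then treated as a constant.

import random

# an unfixed rule: re-sampled on every use
person = lambda: random.choice(["man", "woman", "person"])

# "n := person; fix n;" -- sample once, then reuse the result
n = person()

print(person(), person())  # the two words may differ
print(n, n)                # always the same word, twice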

After Matzo, I worked on a tool called Latka which was similar but with a light static type system. You'll notice this one relies more heavily on things like (structurally typed) algebraic data types and helper functions to assemble sentences, like pa for a paragraph or se for a sentence. You might also notice that fixed is no longer a thing which changes an existing rule, but a property of the binding site of the variable.

let gender = M | F | N
let noun M = "man"
  | noun F = "woman"
  | noun N = "person"
let them M = "him"
  | them F = "her"
  | them N = "them"
let adj = %{familiar unfamiliar mysterious}
let adv = %{warmly cautiously profusely}

let main =
  let fixed n = gender
  pa.[
    se.["You come across a", adj, noun.n],
    se.["You greet", them.n, adv],
  ]

These obviously have tradeoffs relative to Tracery, but they more closely match the way I think about these problems and the way I'd naturally gravitate towards solving them. That said, I also don't think they're competing: in fact, one thing I've wanted to do (but never finished a prototype of) is to build cross-calling between them: I'd love to be able to directly include a Tracery program in a Latka program and vice versa.

As with other projects I've described, I think that the affordances of functional or expression-oriented programming have a lot of benefits, and I think decidable programming languages are really useful. And of course, one reason I want to write it is that I love random generation of things.

There's a reason this project is also the last backburner post: it's the project I most regret having left on the backburner, and the one I would love to spend more time on soon once I develop motivation to do so again.1

Why the name? Latka and Matzo were named together but otherwise chosen at random. I was originally gonna call another experimental language Mesa, which is Spanish for 'table' (by analogy with random tables you'd find in an RPG book), but there's already a graphics driver suite called mesa. So I decided to choose the name of a related geographical phenomenon, since a potrero is a kind of mesa. Similarly, the original version of Treant was designed around probabilistic tree rewriting, and treant is a name used in fantasy role-playing games to refer to a public-domain equivalent to Tolkien's Ents.

#backburner #tool #language #procedural


  1. Hell, I've even given a talk about these before—although a bad one which I'm thankful wasn't recorded, since I was feverish and recovering from a throat infection at the time—and I never did release Latka in a usable format afterwards, which is an ambient regret of mine. I would love to go back and release one of these properly instead!

This is an entry from a list of projects I hope to some day finish. Return to the backburnered projects index.

What is it? A reverse proxy and web server interface.

Aloysius was originally inspired by the old academic servers where a person could host static pages under their username. What I wondered was, how could I design a system like that for dynamic sites, where a given user account could run their own services via a reverse proxy but not conflict with any other service on the machine?

So the approach was based on directories and symlinks. The top-level configuration is a directory that contains zero or more subdirectories, each of which uses different files to describe both filters and forwarding logic. Filters are based on matching on either path (i.e. the HTTP request path) or domain (i.e. the server) or both, and then they can forward in one of three ways—maybe more over time, too, but these were the first three I experimented with—as specified by a mode file: either http (which forwards the request to host host on port port), or redir (which uses HTTP response code resp and redirects to host host), or aloysius, where you specify another directory by the symlink conf, and then recursively check configuration there.

That last one is important, because it allows you to symlink to any readable directory. One idea here is that different subdomains can map to different users on the same machine. For example, let's say you're keeping your Aloysius configuration in $ALOYSIUSDIR, and you've got a user on the machine yan who doesn't have root access but wants to be able to run some dynamic sites. You can set up something like the following:

# create a dir to forward yan's config
$ mkdir -p $ALOYSIUSDIR/00-yan
# match against yan.example.com
$ echo yan.example.com >$ALOYSIUSDIR/00-yan/domain
# use a recursive aloysius config
$ echo aloysius >$ALOYSIUSDIR/00-yan/mode
# and forward it to yan's home directory
$ ln -s /home/yan/aloysius $ALOYSIUSDIR/00-yan/conf

Now yan can write their own configuration for whatever dynamic web services they want—at least, so long as they're able to run user-level services—and it doesn't require giving them root access to change the rest of the reverse proxy's configuration. It falls directly out of Unix symlinks and permissions!
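To sketch what yan's side of this might look like, here's the same kind of configuration written with Python's pathlib instead of shell. This is illustrative rather than definitive: the 00-blog directory name, the port, and the path filter file name are all made up for the example, but the mode/host/port files follow the forwarding scheme described above.

from pathlib import Path

# yan's own configuration, living entirely in their home directory
conf = Path("/home/yan/aloysius/00-blog")
conf.mkdir(parents=True, exist_ok=True)

(conf / "path").write_text("/blog\n")       # hypothetical filter file: match this path prefix
(conf / "mode").write_text("http\n")        # forward the request over HTTP...
(conf / "host").write_text("127.0.0.1\n")   # ...to this host...
(conf / "port").write_text("8080\n")        # ...on this port, where yan's service listens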

Why write it? Honestly, it's not terribly useful anymore: I run my own servers using mostly nginx and am pretty comfortable with configuration thereof. But I still think it's a cool idea, and I clearly think this directory-and-symlink-based system of configuration has legs (which I know I've written about before).

I still think it'd be convenient! After all, I have for a long time used a directory of files (along with a wildcard) to configure Nginx: this simply makes that the first-class way of doing configuration! But also, there's a lot of work that goes into writing a good web server, and this is also solving a problem which the tech world seems to no longer have: namely, how to shunt around requests to different web services all living on the same machine, since instead we abstract those away into their own containers and treat them like separate machines.

Why the name? The name Aloysius was chosen pretty much at random. I originally called it Melvil after Melvil Dewey due to a loose metaphor between the tiny scraps of information used in configuration and the card catalogues that Melvil Dewey had helped popularize, but only because at the time I didn't realize what an absolutely awful human being Melvil Dewey was.

#backburner #software #tool #web

This is an entry from a list of projects I hope to some day finish. Return to the backburnered projects index.

What is it? A protocol for writing RSS readers, heavily inspired by the maildir format.

The core idea of Lektor is that you separate out the typical operation of an RSS reader into two programs which can work concurrently over a specific directory. One program is called a fetcher, and its purpose is to read remote feeds (of any format) and write them into the lektordir format. The other program is called a viewer, and its purpose is to take the files inside the lektordir and view them.

I've got the full format specified elsewhere but the core idea is that you've got a specified directory called your lektordir, which in turn contains four directories: src, tmp, new, and cur. The src directory contains representations of feeds, new contains unread posts, and cur contains read posts. (The tmp directory is an explicit staging place: fetchers will create posts in there and then atomically move them to new, which means that it's impossible for a viewer to ever observe an incomplete post. Even if a fetcher crashes, it will only be able to leave garbage in tmp, and not in new.)

The representations of feeds and posts are themselves also directories, with files and file contents standing in for a key-value structure. That means a given feed is represented by a directory which contains at least an id (its URI) and a name (its human-readable name), while a post is a directory which contains at least an id (its URI), a title (its human-readable name), a content file (which is its content represented as HTML), and a feed (a symlink to the feed that produced it.) In both cases, there are other optional files, and also ways of representing fetcher- or viewer-specific metadata.

The use of directories means that fetchers and viewers don't even need a specific data serialization or deserialization format: unlike maildir, you don't even need to “parse” the format of individual entries. This means fetchers need only a bare minimum of functionality in order to write feeds—I had test programs which used basic shell scripts that wouldn't have been able to safely serialize JSON, but were able to write this format trivially.
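As a flavor of how small a fetcher can be, here's a minimal sketch in Python (shell works just as well, as mentioned). It assumes a lektordir whose four subdirectories already exist; the file names id, title, content, and feed are the ones described above, while the lektordir path, the example feed, and the name given to the post directory are all made up for illustration.

import os
import tempfile
from pathlib import Path

def write_post(lektordir, feed_dir, post_id, title, content_html):
    lektordir = Path(lektordir)
    # build the post in tmp/ first, so viewers never see a half-written post
    staging = Path(tempfile.mkdtemp(dir=lektordir / "tmp"))
    (staging / "id").write_text(post_id)
    (staging / "title").write_text(title)
    (staging / "content").write_text(content_html)
    (staging / "feed").symlink_to(feed_dir)   # point back at the feed in src/
    # atomic publish: the rename makes the whole post appear in new/ at once
    os.rename(staging, lektordir / "new" / staging.name)

write_post(
    "/home/me/lektordir",
    "/home/me/lektordir/src/example-feed",
    "https://example.com/posts/1",
    "An example post",
    "<p>Hello!</p>",
)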

My intention is not just to design the format—which has already been pretty thoroughly specified, albeit in beta format and without having been subjected to much testing—but also to write the tools needed to write an RSS and maybe ActivityPub reader on top of this format.

Why write it? Well, I wanted to write an RSS reader, because none of the existing ones are exactly what I want, and this seemed like a good way of splitting up the implementation into bite-sized chunks.

There are some interesting features you get out of separating out the fetcher and viewer operation. For one, fetchers become ridiculously simple: they don't even need to be full services, they can just be cron jobs which hit an endpoint, grab some data, and toss it into the lektordir. Instead of a reader which needs to support multiple simultaneous formats (e.g. Atom, various versions of RSS, newer ActivityStreams-based formats) you can have a separate fetcher for each. For that matter, you can have fetchers which don't strictly correspond to a real “feed”: for example, you can have one which uses a weather API to include “posts” that are just sporadic weather updates, or one which chunks pieces of your system log and puts one in your feed every day to give you status updates, or whatnot.

Separating out viewers means that you can now have multiple views on the same data as well. Maybe one viewer only cares about webcomics, so you can pull it up and read webcomic pages but let it omit blog posts. Another gives you a digest of headlines and emails them to you, leaving them unread (but of course omits anything you happen to have read during the day.) Another does nothing but pop up a message if you let unread posts pile up too long. Hell, you could have one that takes your combined posts and… puts them together into an RSS feed. Why not?

The cool thing about lektordir to me is that it lowers the barrier to entry for writing RSS-reader-like applications: it takes the component parts and separates them behind a well-defined interface. And there's a lot of cool stuff you get from trying to apply a one-thing-well philosophy to problems like this!

Why the name? It's for reading, and lektor means “reader” in several languages. I'll probably change the name at some point, though, because there's also a plain-text CMS called Lektor.

#backburner #software #web #tool

This is an entry from a list of projects I hope to some day finish. Return to the backburnered projects index.

What is it? An authoring tool for simplistic shape grammars intended for generative art.

A shape grammar is a tool for building two-dimensional objects. Like all grammars, you can think of a shape grammar as either a recognizer of a language or as a generator of a language, and for our purposes it makes more sense to think of them in the latter sense. Also like other grammars, it consists of a set of rules, each of which has a left-hand side and a right-hand side: generating a language from a grammar is a process of finding rules whose left-hand side matches some part of an intermediate state, and then replacing that part with whatever is specified by the right-hand side. Similarly, some components are considered 'non-terminals', which are symbols that always appear on the left-hand side of rules and are meant to be replaced, while others are 'terminals' which form the final output: once a state contains no non-terminals, we can say it has been fully generated by the grammar.

The biggest distinction between a shape grammar and a traditional grammar, then, is that a shape grammar operates over, well, shapes. In the case of Palladio, the intention was for it to operate over sprites on a square grid. As an example, consider a grammar defined by these three rules:

A simple three-rule shape grammar

These rules are very simple: when we see a symbol on the left of an arrow, it means we try to find that symbol and replace it with whatever is on the right-hand side. That means if we see a black circle, we replace it with the symbols on the right: in this case, a specific configuration of a black square and two red circles. Similarly, when we see a red circle, we can choose one of the two possible rules that applies here: either we can replace it with a black square, or we can replace it with a light blue square.

This grammar is non-deterministic, but generates a very small number of possible outputs: rule A will produce one terminal and two non-terminals, and we can non-deterministically apply rules B or C to the resulting non-terminals. In fact, here we can show all four possible outputs along with the rules we apply to get them:

The four possible outputs of the above shape grammar

That said, it's not hard to write grammars which produce an infinite number of possible outputs. The following grammar is not terribly interesting, but it does exactly that: in particular, it produces an infinite number of vertical stacks of black squares.

A simple two-rule shape grammar that includes a rule which can apply recursively

We can't enumerate every possible output here, but you get the idea:

The first handful of outputs of the above shape grammar
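To make the mechanism concrete in code as well as pictures, here's a rough Python sketch of this kind of grid rewriting. This isn't Palladio itself, and the concrete rules below are only my approximation of the two-rule stack-of-squares grammar above: the grid is a map from coordinates to symbols, and generation repeatedly picks a non-terminal cell and replaces it with one of the right-hand sides that apply.

import random

NONTERMINAL = "o"   # stands in for the circle non-terminal in the figure
TERMINAL = "#"      # stands in for the black square

# each right-hand side is a list of (dx, dy, symbol) placed relative to the matched cell
RULES = {
    NONTERMINAL: [
        [(0, 0, TERMINAL)],                       # stop: replace with a single square
        [(0, 0, TERMINAL), (0, 1, NONTERMINAL)],  # a square, plus a new non-terminal above it
    ]
}

def generate(seed=None, max_steps=20):
    rng = random.Random(seed)
    grid = {(0, 0): NONTERMINAL}                  # start from a single non-terminal
    for _ in range(max_steps):                    # cap the recursion for the demo
        open_cells = [pos for pos, sym in grid.items() if sym in RULES]
        if not open_cells:
            break                                 # no non-terminals left: fully generated
        x, y = rng.choice(open_cells)
        rhs = rng.choice(RULES[grid[(x, y)]])
        del grid[(x, y)]
        for dx, dy, sym in rhs:
            grid[(x + dx, y + dy)] = sym
    return grid

print(generate(seed=1))   # a vertical stack of "#" cells of random height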

My goal with Palladio was to write an authoring environment for grammars like these, with the specific intention of using them to produce tile-based pixel art pieces. This means not just providing the ability to create and edit rules like these, but also things like live outputs—possibly with a fixed random seed, for stability, and possibly with a large set of seeds, to see wide varieties—and maybe analysis of the output, alongside the ability to export both specific results and the data used to create them.

Why write it? For fun and experimentation! I don't just love making art, I love making tools that make art, and my favorite way to do so is using grammars. Grammars act like tiny constrained programs over some value domain, and you can still bring a very human touch to writing a grammar that produces a specific kind of output.

Building a tool for this that's both graphical and constrained to grids is a good way to make it easy to get started with. My plan was to build tools to spin up Twitter bots as quickly as possible, too, so someone could design a grammar and then with minimal fuss start showing off the output of that grammar on the internet.

My original version (which I admittedly didn't get terribly far with) was written as a desktop app (in Rust and GTK) but I suspect that for the purposes of making it easy to use I'd probably reimplement it to have a web-based version as well, making it easy to open on any device. It'd also be nice to build simple-to-use libraries that can ingest the grammars described, so they can be embedded in other applications like video games or art projects.

Why the name? After the architect Andrea Palladio, but specifically because one of the famous early papers about shape grammars in architecture was Stiny & Mitchell's 1979 paper The Palladian Grammar. Palladio was famously schematic in terms of how he designed and constructed houses, so Stiny & Mitchell were able to take that schematic approach and turn it into mechanized rules which can express the set of possible villas that Palladio might have designed. For a more recent and accessible treatment of the subject, you might look at Possible Palladian Villas (Plus a Few Instructively Impossible Ones) by Freedman & Hersey.

#backburner #software #tool #procedural

This is an entry from a list of projects I hope to some day finish. Return to the backburnered projects index.

What is it? A language for expressing recipes in graph form.

The usual presentation of a recipe is as a snippet of prose, doing things step-by-step. However, actual recipes aren't strictly linear: there are often parallel preparations that need to be done, like making a roux while the sauce simmers elsewhere. Prose is certainly capable of capturing this—prose is, after all, a default way of representing information for a good reason—but I'm also fond of notation and alternative ways of representing information. What if we could make the various branches of a recipe more explicit?

Apicius is a way of writing recipes that encodes explicit parallel paths. It looks more like a programming language:

eggplant rougail {
  [2] eggplants
    -> scoop flesh, discard skin
    -> mash
    -> $combine
    -> mix & [4 tbsp] oil
    -> DONE;
  [2] white onions or shallots -> mince -> $combine;
  [2] hot peppers -> mince -> $combine;
}

How do we read this? Well, the bit before the curly braces is the name of the recipe: eggplant rougail. Inside the curly braces, we have a set of paths, each of which ends with a semicolon. The beginning of every path starts with one or more raw ingredients, with the amount set apart with square brackets. Once you're in a path, there are two things you can write: steps, which look like plain text, or join points, which start with a dollar sign.

What is a join point? It's a place where you combine multiple previously-separate preparations. In the above example, you have three different basic ingredients: eggplants, peppers, and onions or shallots. Each is prepared separately, and then later on mixed together: the join point indicates where the ingredients are combined.

There's also a fourth ingredient at one point above: at any given step or join point, you can add extra ingredients with &. The intention is that these are staples which require no initial preparation or processing: it's where you'd specify salt or pepper or oil in most recipes, ingredients which sometimes might not even make it onto the initial ingredients list because they're so simple and obvious.

Finally, there's a special step called DONE, which is always the last step.

In the above example, there's a relatively simple pattern happening: three ingredients are prepared and then joined. But with different uses of join points, you can express recipes that have multiple separate preparations that come together in a specific way, and that parallelism is now apparent in the writing. Here is a different recipe:

egg curry {
  [8] eggs -> boil 10m -> $a;
  [2] onions -> mince -> brown &oil -> stir -> $b;
  [2] garlic cloves + [1] piece ginger + [1] hot pepper + salt + pepper -> grind -> $b;
  [4] ripe tomatoes -> chop -> $c;
  $b -> stir &thyme -> $c;
  $c -> simmer 5m &saffron -> $a;
  $a -> stir 10m -> simmer 5m &water -> halve -> DONE;
}

This is rather more complicated! In particular, we now have three different join points, and a larger number of basic ingredients (combined with +). In parallel, we can boil the eggs, mince and brown the onions, grind seasonings together, and chop the tomatoes. The first two disparate sequences to join are actually the browned onions and the garlic/ginger/pepper/&c mixture: those are combined with some thyme, and afterwards the tomatoes are mixed in; after simmering with saffron, the boiled eggs are added, then all paths are joined.

I don't think this is terribly easy to follow in this format, especially when using non-descriptive join point names like $a. However, an advantage of this format is that it's machine-readable: this format can be ingested and turned into a graphic, like this:

The above recipe for egg curry, represented as a graph with certain steps leading to others

or a table, like this:

The above recipe for egg curry, represented as a table with cumulative preparations combined

or possibly even into a prose description, using natural language generation techniques to turn this into a more typical recipe. Marking the amounts in square brackets might also allow for automatic resizing of the recipe (although how to do this correctly for baking recipes—where linearly scaling the ingredients can result in an incorrect and non-working recipe—is a can of worms I haven't even considered opening yet.)
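To make the machine-readability point a little more concrete, here's a rough sketch (not the actual Apicius tooling) of how a recipe in this shape becomes a graph. The eggplant rougail paths are written out by hand in Python rather than parsed, and the output is Graphviz DOT; the key observation is that join points and DONE are shared names, so edges from different paths converge on the same node.

# each path is a chain of labels; "$..." names and DONE are join points
paths = [
    ["[2] eggplants", "scoop flesh, discard skin", "mash",
     "$combine", "mix & [4 tbsp] oil", "DONE"],
    ["[2] white onions or shallots", "mince", "$combine"],
    ["[2] hot peppers", "mince", "$combine"],
]

def node(label, path_idx):
    # join points and DONE are shared across paths; ordinary steps are kept
    # distinct per path, so "mince" in two paths stays two separate nodes
    if label.startswith("$") or label == "DONE":
        return label
    return f"{path_idx}: {label}"

print("digraph recipe {")
for i, path in enumerate(paths):
    for a, b in zip(path, path[1:]):
        print(f'  "{node(a, i)}" -> "{node(b, i)}";')
print("}")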

Why write it? Well, I love the presentation of recipes as graphs. I've actually got a whole web site of cocktail recipes in this form, which you can find at the domain cocktail.graphics, and I would love to get to the point that I could semi-automatically generate these diagrams from a snippet of Apicius:

A chart-based recipe for the Last Word cocktail

But I think—as I mentioned when I described the language for Parley—that there are many cool things you can do if you have generic data that can be re-interpreted and revisualized. As mentioned above, there are lots of ways you can take this format and use it to represent various structures. What I'd like to do, once I have the tooling to do so, is take a bunch of my favorite recipes (like sundubu-jjigae or ševid polow or jalapeño cream sauce) and convert them into Apicius, and then have some kind of front-end which can be used to view the same recipes using many different rendered formats.

Why the name? One of the few Latin-language cookbooks we have remaining copies of is usually called Apicius, although it's also referred to as De re coquinaria, which boringly translates to “On the topic of cooking.” It's not clear who Apicius was or if the Apicius referenced by the book was a single person who really existed: it may have been written by an otherwise-unattested person named Caelius Apicius, or it may have been named in honor of the 1st-century gourmet Marcus Gavius Apicius, or perhaps it was even authored by a group of people. (It doesn't help that the remaining copies we have are from much later and were probably copied down in the 5th century.)

#backburner #software #tool

This is an entry from a list of projects I hope to some day finish. Return to the backburnered projects index.

What is it? A typed, non-Turing-complete configuration language that is otherwise identical to JSON. The goal is that all valid JSON files will also be valid Virgil files.

{
  "name": "Ash",
  "age": 44,
  "pets": [
    {"name": "Spot", "species": "Dog"},
    {"name": "Henry David Thoreau", "species": "Dog"}
  ]
}

However, in addition to the usual JSON extensions like comments and trailing commas, it would also allow “splices”, which would allow for snippets of code to be included that would evaluate to other JSON expressions.

{
  "name": "Ash",
  "age": `(22 * 2),
  "pets": `[ {"name": n, "species": "Dog"}
           | n <- ["Spot", "Henry David Thoreau"]
           ],
}

It would also include a type system built largely around structural types, so that you can enforce that your functions are used correctly, but with the types inferred from use instead of explicitly written:

`let double(obj) {obj["n"] * 2}
[ `double({"n": 5}),      # okay
  `double({"n": false}),  # error, because `obj["n"]` is a boolean
  `double({"x": 0}),      # error, because no key `"n"` exists
]

The definable functions would also be limited to strongly normalizing (i.e. non-Turing-complete) ones, so recursion would be entirely disallowed:

`let fact(n) {
  if n == 0 { 1 }
  else { n * fact(n-1) }  # error: recursion
}
`[ fact(i) | i <- range(10) ]

Importantly, the goal of this project would be not simply to implement the Virgil language, but to implement the libraries in a way that has an API indistinguishable from existing JSON libraries. The goal would be that a project could decide to use Virgil by simply swapping an import: no other change would be necessary.
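In other words, the intended experience is something like the sketch below, where virgil is a hypothetical Python module whose only promise is to mirror the standard json API. None of this exists yet; it's a mock-up of the goal rather than a working example.

import virgil as json   # was: import json -- the rest of the program is unchanged

with open("config.vgl") as f:
    config = json.load(f)   # splices like `(22 * 2) are evaluated while loading

print(config["age"])        # prints 44 for the example above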

Why write it? It's worth noting that there are now two other major contenders in this space: one of them is Dhall, the other is Jsonnet. My original notes on Virgil actually predated Dhall itself (although they were indeed inspired by this tweet by Gabriella Gonzales, who would go on to create Dhall) but the end goal was a language that would in some ways sit exactly between the space occupied by Dhall and Jsonnet. Unlike Dhall and like Jsonnet, it would restrict itself to only expressing data that could be expressed in JSON (and therefore the final output could not include e.g. functions) and could exist as a near-drop-in replacement for any program that already used JSON. Unlike Jsonnet and like Dhall, it would include a static type system and would not be Turing-complete.

In theory, had I written it when I first considered the idea, it might have tackled some of the use-cases that Dhall and Jsonnet have now cornered. At this point, I don't think it necessarily brings enough to the table to unseat either of them in their intended use-cases, but I still think it'd be a fun project to write.

Why the name? Like the historical Virgil, it takes the story of J(a)SON and adds a bunch of extra stuff to it.

#backburner #software #tool

This is an entry from a list of projects I hope to some day finish. Return to the backburnered projects index.

What is it? Van de Graaff is a static site generator built around declarative pipelines. The original design used a vaguely Lisp-like format which could describe how to take various source files and produce downstream files from them, with the goal of being unreasonably flexible in terms of the format and interpretation of input data. A small fragment of a site configuration looked like this:

# `every` produces a single output file for each input file
(every "posts/%.md" {
  # the output filename is a function of the input one
  produce: "posts/%.html"
  # these are the arguments passed to the standard page
  # template, defined elsewhere in the config file
  with: {
    # the template here is invoked with `content` set to
    # the result of rendering the file contents with markdown
    content: (render markdown _)
    # this pulls out of the front matter at the beginning of
    # the Markdown file
    title: (front-matter "title" _)
  }
})

The above example is pretty simple: take every Markdown file in one directory, produce a corresponding HTML page for each in another directory by rendering the Markdown and pulling metadata out of the front matter. But the goal of Van de Graaff is also that the underlying set of “functions” should allow for a massive amount of customizability. For example, maybe we don't want to use front-matter in a Markdown file: maybe each post should be contained in its own directory, with metadata living in a separate file from the source itself, and indeed with the source file's location and markup format defined in the metadata file. The goal of Van de Graaff was to be flexible enough to accommodate nearly any kind of data organization you wanted while remaining fundamentally declarative:

(every {meta: "posts/%/metadata.json" source: (json-elem "content" meta)} {
  produce: "posts/%.html"
  with: {
    content: (render (if (== (json-elem "format" meta) "rst")
                         rst
                         markdown)
                     source)
    title: (json-elem "title" meta)
  }
})

Notice some of what's going on here: our “input” file has become a pair of named files, one of which depends on the content of another. There's now conditional logic to decide how to actually render the input file, and which renderer to use depends on the other file. This is certainly a kind of data organization that'd be harder to express in most existing static site generators.

The original goal was to build Van de Graaff as a tool with a decidable (i.e. non-Turing-complete) language, hence the Lisp-inspired-but-clearly-data-driven formatting system. Whenever I return to this project, my current plan is to reconsider this decision and instead write the pipelines in something like Lua or Wren, but still keep the core idea intact: the script itself would not actually load and process any files, but rather define a set of declarative pipelines, and those pipelines would be used to process the files, allowing reuse of intermediate outputs as well as cached computation of inputs known to be identical.
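As a sketch of that 'describe, don't execute' style, here's roughly what a pipeline definition might look like in Python (rather than the Lua or Wren mentioned above); the names every, render, and front-matter are invented for illustration. The script only registers descriptions, and a separate driver would later expand the patterns, run the renderers, and cache intermediate results.

pipelines = []

def every(pattern, produce, **settings):
    # record a pipeline description; no files are read or written here
    pipelines.append({"input": pattern, "produce": produce, "with": settings})

every(
    "posts/%.md",
    produce="posts/%.html",
    content=("render", "markdown"),     # render the file body as Markdown
    title=("front-matter", "title"),    # pull `title` out of the front matter
)

# a driver (not shown) would walk `pipelines`, expand the `%` patterns,
# and only re-run steps whose inputs changed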

Why write it? There are two custom static site generators that I've written which I still use: one of them generates my sadly-languishing random-topic blog What Happens When Computer, and the other generates my prose and quote-collection site. They're rather different in almost every way: the former is written in Scheme (complete with an S-expression based templating mechanism) and ingests a custom-designed markup format, while the latter is written in an unholy combination of Mustache, Python, Bash, and Make, and ingests a combination of Markdown and YAML.

Despite being radically different from one another, they do have features in common: they both care about taking collections of things (posts, snippets of text, specific pages, &c.) which they read, process, and munge into several different output files. A blog like What Happens When Computer would be generally well-supported by an off-the-shelf solution like Hugo, but I'd have to do a little bit more tweaking for my prose-and-quotes site. On the other hand, I could easily port the prose-and-quotes site to use a CMS-like solution like Python's Lektor, but at the cost of abandoning my own data organization: I'd have to convert my existing data to use such a new format, which would break other tools I've already written that use this data in other ways.

Van de Graaff was my attempt to write a flexible-but-still-constrained system which could accommodate them both as well as anything else I'd like to write, replacing not just static site generators but any system where I write shell scripts or Makefiles to run Pandoc for me. It's rather overengineered, and it has a terrible effort-to-reward ratio, but I still like many of the underlying ideas.

Why the name? It's a static generator, so I immediately thought of a Van de Graaff generator. (According to old notes and repos, I also at one point called it TMS but literally cannot remember what that was supposed to stand for.)

#backburner #software #tool