This is an entry from a list of projects I hope to some day finish. Return to the backburnered projects index.

What is it? Various programming-language-centric approaches to procedural generation. I've done a lot of these, and Potrero and Treant are two of the newest ones, but I'm using this to talk about the whole family of these tools.

My approach here is to build tools for procedural generation as programming languages, and usually specifically as functional or logical languages: the UI is fundamentally text-centric, based on writing bits of structured text, and the operations defined can be broken down into atomic units that can be understood and reasoned about in isolation, without having to reason about broader state. Additionally, because they are ideally decidable—i.e. you can ensure that they don't loop forever, but instead will always complete in a finite amount of time—you get a lot of room for static analysis.

I think there's a really interesting design space here, and my various experiments have focused on exploring different parts of it. There are some thoughts in common between all of them:

  • I think (as mentioned) that decidability is important here, both because “this will never loop forever” is a nice property but also because it opens the way for some sophisticated static analysis, like, “Will this program ever produce output with such-and-such a property?”
  • I think mostly-pure functional programming is a useful way to think about these things, where the only side effect is non-determinism
  • I think there's some interesting inspiration to be taken from logic programming or answer-set programming (cf. Smith & Mateas, Answer Set Programming for Procedural Content Generation: A Design Space Approach, and also Karth & Smith, WaveFunctionCollapse is Constraint Solving in the Wild)
  • I think there's a lot of API work to be done around exposing common resources (like Darius Kazemi's corpora) as “standard libraries” to draw on
  • I think there's also a lot of API work to be done around the right primitives for structuring output either textually or non-textually (although I'll freely admit that there's research here that I'm not conversant with!)

So the goal of my various tools has been to chip away at this space and experiment with these various theses!

Why write it? The big tool in this space right now is Tracery. Tracery is a great tool and I love it. That said, I'm very picky about tools.

To give an example of what I mean, here's a Tracery program which creates a brief description of meeting a person:

{
  "origin": ["#[#setNoun#]story#"],
  "setNoun": [
    "[person:man][them:him]",
    "[person:woman][them:her]",
    "[person:person][them:them]"],
  "adj": ["familiar", "strange", "mysterious"],
  "adv": ["warmly", "cautiously", "profusely"],
  "story": ["You meet a #adj# #person#. You greet #them# #adv#."]
}

You'll notice that the syntax is all JSON, with all the advantages and disadvantages that brings: it's a universally available format, which is good, but it's also more finicky and particular, and important syntax now ends up embedded in string literals in a way that's not (for example) syntax-highlighted by default. Note also the "setNoun" rule above: what it actually does is create new rules by injecting them into the set of rules, so depending on which choice is made, it'll define the rule "person" as being either "man", "woman", or "person", and define "them" as being a corresponding pronoun.
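
To make that rule-injection behavior concrete, here's a toy sketch in Python of how I think about it. This is not Tracery's actual API or expansion algorithm, just a model of the concept: a grammar is a dictionary of rules, and the setNoun step injects new single-alternative rules before the story gets expanded.

import random

# Toy model (not Tracery's actual API or syntax): a grammar maps rule names
# to lists of alternatives, and rule references in a string are written {rule}.
grammar = {
    "adj": ["familiar", "strange", "mysterious"],
    "adv": ["warmly", "cautiously", "profusely"],
    "story": ["You meet a {adj} {person}. You greet {them} {adv}."],
}

def set_noun(grammar):
    # The analogue of "setNoun": pick a person/pronoun pair and inject both
    # as new rules, so that later expansions agree with each other.
    person, them = random.choice(
        [("man", "him"), ("woman", "her"), ("person", "them")])
    grammar["person"] = [person]
    grammar["them"] = [them]

def expand(grammar, rule):
    # Pick one alternative, then recursively fill in any {rule} references.
    text = random.choice(grammar[rule])
    while "{" in text:
        start, end = text.index("{"), text.index("}")
        text = text[:start] + expand(grammar, text[start + 1:end]) + text[end + 1:]
    return text

set_noun(grammar)
print(expand(grammar, "story"))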

Here's the same generator expressed in Matzo, a dynamically typed programming language for procedural generation that was also the very first tool I built along these lines:

person := "man" | "woman" | "person";
pronoun :=
  { "man"    => "him"
  ; "woman"  => "her"
  ; "person" => "them"
  };
adj ::= familiar unfamiliar mysterious;
adv ::= warmly cautiously profusely;

n := person; fix n;
puts "You come across a " adj " " n ".";
puts "You greet " pronoun.n " " adv ".";

This has a lot of little convenience features, like ::= taking a set of bare words that are implicitly understood to be a disjunction. (Indeed, you could rewrite the first line of the program as person ::= man woman person; and it would have the same meaning: I wrote it with := to showcase both approaches.) It also has functions—the value of pronoun is a function which branches on its argument—and the fix operation, which takes a rule that would otherwise vary, evaluates it once, and replaces its value with the result: i.e. after that line, person will still randomly be one of three values, but n will deterministically be whichever single value was chosen.
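
To illustrate how I think about fix, here's a rough Python model of those semantics (my own sketch, not the actual Matzo implementation): a rule is a thunk that may choose randomly every time it's evaluated, and fix replaces a rule with a frozen copy of one sampled value.

import random

# A rough model of the semantics above: a rule is a thunk that may choose
# randomly each time it's evaluated, and fix() replaces a rule with a
# constant thunk holding one sampled value.
rules = {}

rules["person"] = lambda: random.choice(["man", "woman", "person"])
rules["pronoun"] = lambda n: {"man": "him", "woman": "her", "person": "them"}[n]
rules["adj"] = lambda: random.choice(["familiar", "unfamiliar", "mysterious"])
rules["adv"] = lambda: random.choice(["warmly", "cautiously", "profusely"])

def fix(name):
    value = rules[name]()          # evaluate the rule once...
    rules[name] = lambda: value    # ...and freeze the result

rules["n"] = rules["person"]       # n := person
fix("n")                           # fix n; "person" itself stays random

n = rules["n"]()
print(f"You come across a {rules['adj']()} {n}.")
print(f"You greet {rules['pronoun'](n)} {rules['adv']()}.")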

After Matzo, I worked on a tool called Latka which was similar but with a light static type system. You'll notice this one relies more heavily on things like (structurally typed) algebraic data types and helper functions to assemble sentences, like pa for a paragraph or se for a sentence. You might also notice that fixed is no longer a thing which changes an existing rule, but a property of the binding site of the variable.

let gender = M | F | N
let noun M = "man"
  | noun F = "woman"
  | noun N = "person"
let them M = "him"
  | them F = "her"
  | them N = "them"
let adj = %{familiar unfamiliar mysterious}
let adv = %{warmly cautiously profusely}

let main =
  let fixed n = gender
  pa.[
    se.["You come across a", adj, noun.n],
    se.["You greet", them.n, adv],
  ]

These obviously have tradeoffs relative to Tracery, but they more closely match the way I think about these problems and the way I'd naturally gravitate towards solving them. That said, I also don't think they're competing: in fact, one thing I've wanted to do (but never finished a prototype of) is building cross-calling between them: I'd love to be able to directly include a Tracery program in a Latka program and vice versa.

As with other projects I've described, I think that the affordances of functional or expression-oriented programming have a lot of benefits, and I think decidable programming languages are really useful. And of course, one reason I want to write it is that I love random generation of things.

There's a reason this project is also the last backburner post: it's the project I most regret having left on the backburner, and the one I would love to spend more time on soon once I develop motivation to do so again.1

Why the name? Latka and Matzo were named together but otherwise chosen at random. I was originally gonna call another experimental language Mesa, which is Spanish for 'table' (by analogy with the random tables you'd find in an RPG book), but there's already a graphics driver suite called mesa. So I decided to choose the name of a related geographical phenomenon, since a potrero is a kind of mesa. Similarly, the original version of Treant was designed around probabilistic tree rewriting, and treant is a name used in fantasy role-playing games to refer to a public-domain equivalent to Tolkien's Ents.

#backburner #tool #language #procedural


  1. Hell, I've even given a talk about these before—although a bad one which I'm thankful wasn't recorded, since I was feverish and recovering from a throat infection at the time—and I never did release Latka in a usable format afterwards, which is an ambient regret of mine. I would love to go back and release one of these properly instead!

This is an entry from a list of projects I hope to some day finish. Return to the backburnered projects index.

What is it? A tiny ML variant, designed for embedding.

It should of course have closures and anonymous functions and so forth. It should have algebraic data types—named sums, tuples, records for product types—and it should also be garbage-collected. It should have powerful pattern-matching constructs. It shouldn't be pure, in the same way that something like SML isn't pure: mutation would still need to use explicit ref types, but side-effectful functions should be allowed everywhere. It should have a basic exception system. It probably doesn't need a full module system, but some kind of namespacing would be nice.

I'm pretty neutral with respect to syntax, but I was imagining maybe borrowing some ideas from ReasonML. Over time I've become a fan of explicit delimiters for scope, but I also fully admit that syntax is probably the least interesting thing here.

Most importantly, it should be easily embeddable in any language, exposing a C API that makes it easy to initialize the runtime, call into it, and expose callbacks to it. PicoML wouldn't be for writing programs on its own: it would be for writing scripts in larger programs, which means making it accessible to larger programs should be as simple as possible.

Why write it? There are a number of languages which are explicitly designed for embedding in a larger program: Lua is one of the most prominent, but there are certainly many others, my personal favorites being Wren and Io.

These embedded languages rarely have types, and the handful that do (like Pike or Mun) often don't embrace functional programming in the way that I wanted. Entertainingly, though, the closest language to what I wanted PicoML to be is Gluon, and my biggest complaint there is that it's too Haskell-ey: it's got monads and GADTs, which feels to me like it goes too far in the other direction. There's a middle ground—an impure functional language with strong SML influence—and that's where I want PicoML to sit.

Why the name? Not that surprising: I wanted to get across the idea that this would be “an extra-tiny ML variant”.

#backburner #software #language

This is an entry from a list of projects I hope to some day finish. Return to the backburnered projects index.

What is it? A programming language designed as a tiny experiment: what if a dynamic language were built to emulate the affordances of a static language with typeclasses or traits?

In practice, many dynamic languages are designed around storing data in a handful of basic data structures—usually lists and associative arrays, sometimes with a few others like sets—and then doing more sophisticated data modeling with some kind of object system. There are exceptions, of course—Erlang, for example, omits the object system entirely and sticks to simple data modeling plus processes—but it's definitely true of most of the commonly-used dynamic languages like Python and Ruby and JavaScript and Lua.

The experiment in Petri is that it uses a dynamic version of algebraic data types—which it calls tags, borrowing loosely from the Tulip language—and then provides a feature which allows functions to be implicitly looked up based on the tag. For example, the following program—using a syntax where keywords are marked explicitly with @1—creates a tag called Person, then defines how to print it to the screen using the Repr class:

@tag Person {name, age, pets}
@impl Repr(Person) {
  @fn repr(p) {
    "Person({p.name}, {p.age}, {p.pets})"
  }
}

We can now write a function which creates a Person and prints a representation of that person to the screen:

@fn main() {
  person := Person {name: "Ari", age: 33, pets: []};
  print(Repr::repr(person))
}

Underneath the hood, Repr is a map from tags to bundles of functions, so when we call Repr::repr, it'll look up the tag of its argument, find the relevant implementation of the function, and then call that with the relevant parameter. We can also “open” any module-like thing—i.e. anything where we'd use :: to access a member—so we can @open Repr in the root scope and then call repr instead of Repr::repr. (In practice, the prelude will pre-open several commonly-used classes, and also provide syntactic sugar for classes like Eq and Add.)
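
Here's a rough Python sketch of that dispatch model (my own approximation, not Petri's actual implementation), in which a class like Repr is literally a map from tag names to bundles of functions:

# A rough model of the dispatch described above: Repr is a map from tag
# names to a bundle of functions, and Repr::repr dispatches on the tag of
# its argument.
class Tagged:
    def __init__(self, tag, **fields):
        self.tag = tag
        self.fields = fields

Repr = {}  # tag name -> {"repr": function}

Repr["Person"] = {
    "repr": lambda p: "Person({name}, {age}, {pets})".format(**p.fields),
}

def repr_(value):
    # Look up the bundle of functions for this value's tag, then call the
    # relevant one with the value itself.
    return Repr[value.tag]["repr"](value)

person = Tagged("Person", name="Ari", age=33, pets=[])
print(repr_(person))  # => Person(Ari, 33, [])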

There's a little bit more practical sophistication to the typeclass mechanism, as well. For example, Eq is a typeclass with two “type” parameters, and it provides a default implementation; that default does run-time introspection to find out whether the class has an implementation for two specific values! Notice that, in the declaration of Eq::eq, the parameters are capital letters, which reference the stand-in tag names that are parameters to the class Eq: this is how we indicate which argument tags to use to look up instances. It's possible to have methods that take extra parameters which are ignored for the purposes of instance lookup, in which case those parameters can be written as _.

@class Eq(A, B) {
  @fn eq(A, B);
  @default eq(a, b) {
    @if implemented(Eq(b, a)) {
      return Eq::eq(b, a)
    }
    False
  }
}

This also doesn't go over some other useful features for writing actual programs (e.g. a basic namespace mechanism, how file imports work, what other parts of the stdlib look like), but it gets at the core of the idea: a dynamic language that tries to replicate the affordances provided by features like Haskell's typeclasses or Rust's traits.

Why write it? To be totally clear, Petri is more of a UI experiment than any kind of programming language innovation! After all, there's nothing new semantically in this idea: it's just multimethods. In terms of what you can express, there's almost nothing you get that's not already present in CLOS. The thing I'm interested in trying with Petri is more about affordances, figuring out what programs look like when these features are expressed in this way. After all, Python and Ruby and (modern) JavaScript are all languages with very similar basic features—dynamic types, object systems which provide sugar over hashtables of values and functions, some simple built-in data structures, and so forth—but programs in all three can look radically different because of the specific way those low-level building-blocks are combined. What if we took a tack that, while not unique, was still several steps outside of the Python/Ruby/JavaScript world?

I've got a background in both static and dynamic languages: I used to program in a lot of Python and Scheme, but my first two jobs used primarily Haskell and eventually smatterings of Rust, and more recently I've been working on Ruby-related infrastructure at Stripe, which means I spend most of my work time these days in a dynamic language with a static type system over top of it. I'm always interested in things that can bring affordances from one into the other, whether that means bringing static features to dynamic languages (as here) or dynamic features to static languages.

Admittedly, I doubt I'd actually use this language for much, but I still love tiny experimental programming languages. It'd be fun to see how far it could be pushed in practice, to see what at least medium-sized example programs look like in something like Petri.

Why the name? It's a miniature experiment, and miniature experiments are often found in Petri dishes.

#backburner #software #language #experimental


  1. This is also a feature borrowed from Tulip, and one that I think is very cool in general but especially for languages which are still growing. Modern languages often have a need to expand their vocabulary of keywords, which means they sometimes reserve a set of keywords in case they're ever needed later, but even then languages might need to either use context-sensitive keywords to introduce new ones without breaking existing programs or break backwards compatibility to introduce new keywords which weren't previously reserved. None of these are inherently bad choices, to be clear, but I like the Tulip approach: if all keywords are in a particular namespace, then there's never any ambiguity or chance that a future revision will break your program because one of your variable names is now reserved. Admittedly, it also works best when the syntax has a relatively low number of keywords, like the Haskell style. If you used this same approach for something like SQL, your program would be incredibly noisy with the sheer number of @ characters!

This is an entry from a list of projects I hope to some day finish. Return to the backburnered projects index.

What is it? A modern, debuggable, interactive PostScript implementation.

The first part is boring, and in fact doesn't really fit my criteria for these posts, because “just implement PostScript well” is not a super interesting project idea. But it's really worth calling out that many of the existing implementations—not surprisingly—really aren't designed for human beings. For example, if I accidentally pop too many elements in Ghostscript, I get an informative but not terribly well-designed error like so:

GS>pop
Error: /stackunderflow in --pop--
Operand stack:

Execution stack:
   %interp_exit   .runexec2   --nostringval--   % removed for space
Dictionary stack:
   --dict:729/1123(ro)(G)--   --dict:0/20(G)--   --dict:75/200(L)--
Current allocation mode is local

It's not hard to improve on this, since something like Ghostscript isn't really designed for human ergonomics anyway!

But there's more to Endsay that I wanted to get around to, which is also designing an interactive debugger for PostScript: to begin with, I want step-by-step execution and breakpoints in code, so I can walk through the process of watching my document being drawn on the screen. But I think we can do even better: for one, supporting some amount of rewinding, so we can roll back the state of the world and undraw things that we have drawn. Even further, I suspect we can set breakpoints based on graphical content as well: for example, selecting an area of the page to be rendered and then requesting that we break whenever that area gets drawn, or choosing a color and requesting that we break when that color is used. Something that interests me about this project is figuring out the right way to build and expose those primitives!
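
As one sketch of how those primitives could be exposed (purely speculation on my part, not an existing Endsay API), imagine the interpreter emitting a record for every drawing operation, with a breakpoint being nothing more than a predicate over those records. In Python, for brevity:

from dataclasses import dataclass
from typing import Callable, Tuple

# Purely speculative sketch: the interpreter records each drawing operation,
# and a breakpoint is a predicate over those records.

@dataclass
class DrawOp:
    bbox: Tuple[float, float, float, float]  # (x0, y0, x1, y1) in page units
    color: Tuple[float, float, float]        # RGB, each component in [0, 1]

Breakpoint = Callable[[DrawOp], bool]

def break_in_region(x0, y0, x1, y1) -> Breakpoint:
    # Break whenever a drawing operation touches the selected area.
    def pred(op: DrawOp) -> bool:
        ox0, oy0, ox1, oy1 = op.bbox
        return not (ox1 < x0 or ox0 > x1 or oy1 < y0 or oy0 > y1)
    return pred

def break_on_color(r, g, b, tolerance=0.01) -> Breakpoint:
    # Break whenever a drawing operation uses (approximately) this color.
    def pred(op: DrawOp) -> bool:
        return all(abs(c1 - c2) <= tolerance for c1, c2 in zip(op.color, (r, g, b)))
    return pred

# The interpreter's drawing loop would consult the active breakpoints:
breakpoints = [break_in_region(72, 72, 144, 144), break_on_color(1.0, 0.0, 0.0)]
op = DrawOp(bbox=(100, 100, 120, 130), color=(0.2, 0.2, 0.2))
should_pause = any(bp(op) for bp in breakpoints)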

Why write it? The PostScript language is a weird beast, and in some ways it's the complete opposite of the languages I've pitched in these backburner posts: rather than being a non-Turing-complete language for a task conventionally understood as the purview of executable logic, as in Parley, it's a Turing-complete language for a format conventionally understood to be declarative. PostScript is a dynamically-typed stack-based language, roughly like Forth, that was originally created for printers: rather than rendering expensive raster versions of your document on your computer, you could produce a simple, terse program which would produce your document, send that program to the printer, and let it handle the rendering.

There was a very very narrow slice of history where this actually made sense—where printers were good enough to need bigger raster files than computers could comfortably generate and send, for example—and at no point was PostScript really intended for human writing. PostScript lives on in more restricted forms: for example, the EPS format is PostScript under the hood, while the PDF format was effectively designed as pre-executed PostScript.

That said, you absolutely can, as a human being, write PostScript by hand. I happen to like writing PostScript a lot. I wouldn't be interested in writing a PostScript implementation for practical rendering of images—those exist already, and are fine—but I'd love to write an implementation that's more user-friendly, even if the user is just me!

Plus, building debugging tools for creating 2D graphics sounds like a fascinating design question. I'd love to see what turns out to be useful and how best to expose these features!

Why the name? English is one of many languages to have had strong “linguistic purist” movements: that is to say, efforts to remove foreign influences by removing borrowed words and replacing them with native words or compounds. Thankfully, these efforts have mostly disappeared, despite the best efforts of tedious linguistic reactionaries like George Orwell.

While I by and large think that linguistic purism is an abysmally stupid endeavour, I nonetheless do appreciate the creative and artistic applications of linguistic purism. Consequently, I do enjoy reading about some of the alternative words once proposed as “pure” equivalents of foreign borrowings, just because they sound so whimsical and entertaining: for example, using starlore instead of “astronomy”, bendsome instead of “flexible”, and, of course, endsay instead of “postscript”.

#backburner #software #language

This is an entry from a list of projects I hope to some day finish. Return to the backburnered projects index.

What is it? A very simple library and engine for writing storylet-based narratives.

“Storylets” are an approach to writing procedural, interactive narratives. There are a lot of different ways you can approach storylets, but the core idea is that you've got some kind of world state, and based on that world state, you choose a piece of “content”—in Rakonteto's case, a fragment of prose—that you can present to the player. The player can then make choices which change the world state, and that in turn allows the game to present another piece of content to the player.

Probably the simplest way to contrast this to other interactive narrative systems is by comparing it to a Choose-Your-Own-Adventure-like branching narrative system. Imagine that you want to write a story in which the player must go to two different places (A and B) in either order, and only after that can they return to a third, final place (Z). In a pure CYOA-style game, you effectively must have two branches, one corresponding to the sequence A → B → Z, the other B → A → Z, which implies that you must duplicate the content for A and B even if there are otherwise no changes to the visit to A and the visit to B. (And that's just for two places: if we have three places to visit in any order, we'd have to duplicate the content for each of the six possible visit orders!)

In contrast, a storylet-based system would let you represent A and B separately, and each one would end with modifying the world state with information indicating that the location had been visited. Finally, Z would include as a precondition that A and B had both been visited. Many storylet systems allow for selecting the most salient next fragment, as well: that means that A and B can also include special content which will be surfaced in the specific situation that they're being visited first or second, but without having to duplicate all the content associated with A and B.

My implementation—Rakonteto—is designed as a textual language for producing interactive prose, loosely inspired by Ink. A given fragment of a story would look sort of like this:

=== [var: expr, var: expr, ...] ===

A fragment of text, to be shown to the player.

+ A choice. -> [var: new_expr, ...]
+ A different choice. -> [...]

The stuff in square brackets at the top is a set of “variables”—properties of the world, so to speak—with their current values. When the underlying story engine needs to choose a new story fragment to show to the player, it looks at each fragment to find whether the declared variables match the current state of the world, and then (using a handful of heuristics) chooses a matching fragment to show.

The bits at the bottom introduced with + are choices the player can make, and the stuff in brackets afterwards describes how to change the world-state so that a different fragment can be chosen next. A more “realistic” example might look like this:

=== [has_sword: false, in_location: forge] ===

As you walk into the forge, smoke stings your eyes. A
towering, muscular man with a singed beard is staring
pensively at the embers: as you approach, he is roused
from his thoughts and nods. Without a word, he turns,
walks to the back of the forge, and returns with a
beautiful and freshly polished sword, its surface only
marred by the man's sooty fingerprints.

+ Take the sword. -> [refused_sword: false, has_sword: true]
+ Decline the sword. -> [refused_sword: true]

The world state can include far more variables than are listed, and only the variables mentioned in the choices get updated: the others persist. One thing I hadn't figured out was the correct heuristics for handling selection among multiple fragments that match to the same degree of “salience”. Another was how to provide tools to make certain common patterns more convenient: for example, I planned to have syntax in which you could include or swap out individual phrases or sentences based on parts of the world state, so that certain bits of customization didn't need an entirely new fragment. I also planned to have some choices “gated” by aspects of the world state, and also syntactic sugar for small branching scenes (so that you don't need to introduce new variables for situations like singular yes/no decisions that only matter within a single fragment).
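
Here's a minimal sketch of the selection and update rules described above (my own model, not Rakonteto's implementation): a fragment matches when every variable it declares agrees with the world state, the fragment declaring the most variables is treated as the most salient, and a choice updates only the variables it mentions.

import random

# A minimal sketch of the selection and update rules (my model, not
# Rakonteto's implementation).
fragments = [
    {
        "requires": {"has_sword": False, "in_location": "forge"},
        "text": "The smith offers you a freshly polished sword.",
        "choices": [
            ("Take the sword.", {"refused_sword": False, "has_sword": True}),
            ("Decline the sword.", {"refused_sword": True}),
        ],
    },
    # ... more fragments ...
]

def pick_fragment(world):
    # A fragment matches if every variable it declares agrees with the world.
    candidates = [f for f in fragments
                  if all(world.get(var) == val for var, val in f["requires"].items())]
    if not candidates:
        return None
    # Treat "number of declared variables" as a crude salience score and break
    # ties randomly; this is exactly the heuristic question left open above.
    best = max(len(f["requires"]) for f in candidates)
    return random.choice([f for f in candidates if len(f["requires"]) == best])

def apply_choice(world, effects):
    # Only the mentioned variables change; everything else persists.
    world.update(effects)

world = {"has_sword": False, "in_location": "forge", "met_smith": True}
fragment = pick_fragment(world)
if fragment is not None:
    print(fragment["text"])
    _label, effects = fragment["choices"][0]  # pretend the player takes the sword
    apply_choice(world, effects)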

The original version was constrained entirely to finite sets of “world state” variables which could range over atomic strings, but I planned to also experiment with constrained integer arithmetic and eventually maybe even composite values like lists or sets.

Why write it? Mostly for experimentation! There were a few specific features I wanted to experiment with.

One such feature was that I wanted to have an engine for storylets which could be quickly and easily turned into something which can be played, sort of like the interfaces provided by Twine or Ink. My goal was that you could write a game and then “compile” it to a standalone HTML document that includes all the JavaScript necessary to run an interactive version of the game. For early playtesting, I also created a tiny command-line driver, which allowed you to choose from a list in order to move forward.

Another feature was an embeddable API. I wanted to be able to take this and link it into other programs which could interact with it: for example, exposing it as a library so I could integrate it into a Unity or Unreal game (although I'd likely have started by writing bindings in something much simpler, like Löve). The previous compile-to-HTML feature would let you get stories in front of people quickly, but I wanted to design the system such that you could use it as the “story backend” to a different kind of game, as well.

Finally, I wanted to experiment with debugging and modeling tools. With the state space constrained sufficiently, you could start throwing various solver techniques at a narrative. One reason you might want to do this is reachability: “I have these variables which have different states, and can change in this way. Can I actually ever get to this specific state?” Another is as a kind of automated play-testing. When your story is complicated enough, it might be possible for a player to “sequence break”: that is, to experience the story in a way that you didn't anticipate. I imagined addressing this with a solver as well: you could query your story model and ask questions like, “Is it possible that the player could experience fragment X before fragment Y?” and the tool could either confirm that it's impossible, or show that it is and give you an example of such a play-through.
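
For the reachability question specifically, even a brute-force search gets you surprisingly far before you need a real solver. Here's a tiny sketch, again my own model reusing the fragment representation from the earlier sketch, that treats world states as graph nodes and choices as edges:

from collections import deque

# A tiny sketch of the reachability check: world states are graph nodes and
# the choices of matching fragments are edges.

def successors(world, fragments):
    for f in fragments:
        if all(world.get(var) == val for var, val in f["requires"].items()):
            for _label, effects in f["choices"]:
                new_world = dict(world)
                new_world.update(effects)
                yield new_world

def reachable(start, goal, fragments):
    # goal is a partial state: we succeed once every goal variable matches.
    seen = {tuple(sorted(start.items()))}
    queue = deque(seen)
    while queue:
        state = dict(queue.popleft())
        if all(state.get(var) == val for var, val in goal.items()):
            return True
        for nxt in successors(state, fragments):
            key = tuple(sorted(nxt.items()))
            if key not in seen:
                seen.add(key)
                queue.append(key)
    return False

# e.g. reachable({"has_sword": False, "in_location": "forge"}, {"has_sword": True}, fragments)

The sequence-break question amounts to the same kind of search, just with a constraint on the order in which fragments get visited.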

Why the name? The word rakonteto is Esperanto: it's a diminutive form of rakonto “story”, so it means something like “little tale”.

#backburner #software #language #procedural