Librarian of Alexandria

He knows that there are in the soul tints more bewildering, more numberless and more nameless, than the colours of an autumn forest... Yet he seriously believes that these things can every one of them, in all their tones and semi-tones, in all their blends and unions, be accurately represented by an arbitrary system of grunts and squeals. He believes that an ordinary civilised stockbroker can really produce out of his own inside, noises which denote all the mysteries of memory and all the agonies of desire.

-- G.K. Chesterton


...
getty: I think the deal is men often have to go ▁▂▃▄▅▆▇█▆▃▁ ▂▃▄▅▆▇█▆▃▁
getty: And multiple orgasms would let them go ▁▂▃▄▅▆▇█▆▄▄▅▆▇█▆▃▁
getty: HA! I knew I could somehow represent that concept in Unicode.
getty: Just THINK! While I was doing that, OTHER people were having ACTUAL SEX!


I've had an idea kicking around in the back of my head for a while, and I finally got around to making a working prototype (which I will probably end up deciding is the final version at some point). It generates random words from grammar rules, with an emphasis on making certain things more likely and generating 'average' words. It came out of early D&D playing, where I wanted to have a consistent phonology for every fake language that D&D had.

It is called Matzo, because I asked a friend for a random name and that's what she said. She also suggested, among other things, Yoko, but I decided against that.

Here's the basic idea: the following is verbatim a Matzo source file, which I have saved as aquan.mtz for testing:

word := syllable . syllable . (6 @ (syllable));
use word;
syllable := 2: vowel | vowel . "'" | 4: consonant . vowel | consonant . vowel . "'";
consonant ::= p t k h wh l m n ng r w;
vowel ::= a i u e o;

These statements can be in any order, and whitespace isn't significant except as a separator for certain tokens, so you can format/arrange the lines however you want. There are three kinds of statements:

  • use statements name the rule that generation starts from. If you omit it, the file won't run. If you have two, it picks the first one.
  • Normal assignments (represented by :=) take an expression that mixes disjunction (a | b), concatenation (a . b), weighting (5: b) and repetition (5@(b)), and that can be parenthesized. Expressions also include literals, which are surrounded by quotation marks, and identifiers, which refer to other rules.
  • Literal assignments (represented by ::=) differ in that the right-hand side is assumed to be a space-separated list of literals, possibly with weighting. This is for the common case where you want one rule to contain a simple disjunction of literals, such as consonant and vowel above.
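
One way to picture how these pieces fit together is a small Python sketch. It is not Matzo's implementation (this post does not show it); the tuple encoding, the `lits` helper and the `generate` function are names invented here for illustration, and the sketch assumes each alternative of a disjunction is picked in proportion to its weight.

```python
import random

# Hypothetical encoding of Matzo expressions as nested tuples:
#   ("lit", "a")               a literal
#   ("ref", "vowel")           a reference to another rule
#   ("cat", [e1, e2, ...])     concatenation: e1 . e2 . ...
#   ("alt", [(w1, e1), ...])   disjunction with integer weights
def lits(words):
    """Model a literal assignment (::=): an equal-weight choice of literals."""
    return ("alt", [(1, ("lit", w)) for w in words.split()])

# A cut-down version of the aquan grammar (no repetition or "'", for brevity).
RULES = {
    "vowel": lits("a i u e o"),
    "consonant": lits("p t k h wh l m n ng r w"),
    "syllable": ("alt", [
        (2, ("ref", "vowel")),
        (4, ("cat", [("ref", "consonant"), ("ref", "vowel")])),
    ]),
    "word": ("cat", [("ref", "syllable"), ("ref", "syllable")]),
}

def generate(expr, rules, rng=random):
    """Recursively expand an expression into a random string."""
    kind, body = expr
    if kind == "lit":
        return body
    if kind == "ref":
        return generate(rules[body], rules, rng)
    if kind == "cat":
        return "".join(generate(e, rules, rng) for e in body)
    if kind == "alt":
        weights = [w for w, _ in body]
        options = [e for _, e in body]
        chosen = rng.choices(options, weights=weights, k=1)[0]
        return generate(chosen, rules, rng)
    raise ValueError(f"unknown expression kind: {kind}")

# Plays the role of "use word;": start generation from the word rule.
print(generate(("ref", "word"), RULES))
```

Weighting and repetition are left out of the encoding because both can be rewritten in terms of these three forms.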

A lot of these things are shorthand for other things; really, all you need is concatenation, disjunction, and literals, and you can do most everything. The syntax 5: foo is shorthand for repeating foo five times in a disjunction (foo | foo | foo | foo | foo), so it is used to make an option more prevalent. For example,

vowel ::= i u;

chooses i and u about equally often, whereas

vowel ::= 9:i u;

will choose i nine times out of ten.
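
The arithmetic behind that claim can be checked with a short Python sketch. It assumes, as described above, that a weight of n behaves like writing the alternative n times and then picking uniformly; `expand_weights` is a name invented here, not part of Matzo.

```python
import random
from collections import Counter

def expand_weights(weighted):
    """Turn [(9, "i"), (1, "u")] into a flat list with "i" repeated nine times."""
    flat = []
    for weight, literal in weighted:
        flat.extend([literal] * weight)
    return flat

# Stand-in for: vowel ::= 9:i u;
vowel = expand_weights([(9, "i"), (1, "u")])

rng = random.Random(42)  # fixed seed so the run is repeatable
counts = Counter(rng.choice(vowel) for _ in range(10_000))
print(counts["i"] / 10_000)  # a fraction close to 0.9
```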

The syntax 5@(foo) is another shorthand, somewhat less useful. The statement

word := 3@(syllable);

is equivalent to

word := syllable | syllable . syllable | syllable . syllable . syllable;

and I use it in many places in my grammars, although it's less useful elsewhere.
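
The expansion is mechanical enough to sketch in Python. The helper name `expand_repetition` is hypothetical, and the sketch only produces the text of the equivalent rule as described above; it says nothing about how Matzo handles repetition internally.

```python
def expand_repetition(n, name):
    """Return the expression text that n@(name) stands for."""
    alternatives = [" . ".join([name] * count) for count in range(1, n + 1)]
    return " | ".join(alternatives)

print("word := " + expand_repetition(3, "syllable") + ";")
# prints: word := syllable | syllable . syllable | syllable . syllable . syllable;
```

If the alternatives are chosen with equal weight, as in an unweighted disjunction, each length from 1 to n comes up about equally often.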

There are still problems with my implementation, which I am going to fix before putting it up on GitHub, but the grammar shown above (and others) works correctly. This is an idea I've had for ages; I can't believe it's taken me so long to sit down and write it, especially as it took no time at all. (I did the whole thing while an autograder was running for something I was grading.)

Here's an example of running the above file:

[getty@arjuna matzo]$ ./matzo aquan.mtz
nu'melamio'o
[getty@arjuna matzo]$ ./matzo aquan.mtz
hopuwho
[getty@arjuna matzo]$ ./matzo aquan.mtz
iloa
[getty@arjuna matzo]$ ./matzo aquan.mtz
nenopungau

It's not necessarily limited to random words, but it's lacking a lot of utility that would make it sufficient for other purposes, which I will eventually add. Still, as an example of what it could also be used for:

gender ::= man woman;
hair-color ::= black brown red blonde pink;
build ::= fat fit skinny;
job := "doctor" | "lawyer" | "janitor" | "systems analyst";
description := "This " . gender . " is a " . job . " with " . hair-color . " hair and a " . build . " build.";
use description;

Running this yields:

[getty@arjuna matzo]$ ./matzo description.mtz
This man is a systems analyst with brown hair and a skinny build.
[getty@arjuna matzo]$ ./matzo description.mtz
This woman is a janitor with blonde hair and a fit build.
[getty@arjuna matzo]$ ./matzo description.mtz
This woman is a lawyer with blonde hair and a fat build.
[getty@arjuna matzo]$ ./matzo description.mtz
This woman is a janitor with red hair and a fat build.

In the future: variables (e.g. reusing a single generated value) and predicates (e.g. pronoun(man) := he, pronoun(woman) := she) for use in complicated expressions. Still, I'm proud of how far it has come with so little work.
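
To make those two planned features concrete, here is a rough Python sketch of what they would buy. Nothing in it is Matzo syntax or behaviour; the `PRONOUN` table and the `describe` function are invented for illustration. The point is only that the gender is generated once and reused, and that the pronoun is derived from it rather than chosen independently.

```python
import random

GENDERS = ["man", "woman"]
JOBS = ["doctor", "lawyer", "janitor", "systems analyst"]
HAIR_COLORS = ["black", "brown", "red", "blonde", "pink"]

# Stand-in for the predicate pronoun(man) := he, pronoun(woman) := she.
PRONOUN = {"man": "he", "woman": "she"}

def describe(rng=random):
    gender = rng.choice(GENDERS)  # the "variable": generated once...
    job = rng.choice(JOBS)
    hair = rng.choice(HAIR_COLORS)
    pronoun = PRONOUN[gender]     # ...and reused here, so the two always agree
    return f"This {gender} is a {job}; {pronoun} has {hair} hair."

print(describe())
```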