Backburner Month 17: Lektor

This is an entry from a list of projects I hope to some day finish. Return to the backburnered projects index.

What is it? A protocol for writing RSS readers, heavily inspired by the maildir format.

The core idea of Lektor is that you separate out the typical operation of an RSS reader into two programs which can work concurrently over a specific directory. One program is called a fetcher, and its purpose is to read remote feeds (of any format) and write them into the lektordir format. The other program is called a viewer, and its purpose is to take the files inside the lektordir and view them.

I've got the full format specified elsewhere but the core idea is that you've got a specified directory called your lektordir, which in turn contains four directories: src, tmp, new, and cur. The src directory contains representations of feeds, new contains unread posts, and cur contains read posts. (The tmp directory is an explicit staging place: fetchers will create posts in there and then atomically move them to new, which means that it's impossible for a viewer to ever observe an incomplete post. Even if a fetcher crashes, it will only be able to leave garbage in tmp, and not in new.)

The representations of feeds and posts are themselves also directories, with files and file contents standing in for a key-value structure. That means a given feed is represented by a directory which contains at least an id (its URI) a name (its human-readable name), and a post is a directory which contains at least an id (its URI), a title (its human-readable name), a content file (which is its content represented as HTML), and a feed (a symlink to the feed that produced it.) In both cases, there are other optional files, and also ways of representing fetcher- or viewer-specific metadata.

The use of directories means that fetchers and viewers don't even need a specific data serialization or deserialization format: unlike maildir, you don't even need to “parse” the format of individual entries. This means fetchers need only a bare minimum of functionality in order to write feeds—I had test programs which used basic shell scripts that wouldn't have been able to safely serialize JSON, but were able to write this format trivially.

My intention is not just to design the format—which has already been pretty thoroughly specified, albeit in beta format and without having been subjected to much testing—but also to write the tools needed to write an RSS and maybe ActivityPub reader on top of this format.

Why write it? Well, I wanted to write an RSS reader, because none of the existing ones are exactly what I want, and this seemed like a good way of splitting up the implementation into bite-sized chunks.

There are some interesting features you get out of separating out the fetcher and viewer operation. For one, fetchers become ridiculously simple: they don't even need to be full services, even, they can just be cron jobs which hit an endpoint, grab some data, and toss it into the lektordir. Instead of a reader which needs to support multiple simultaneous versions (e.g. Atom, various versions of RSS, newer ActivityStreams-based formats) you can have a separate fetcher for each. For that matter, you can have fetchers which don't strictly correspond to a real “feed”: for example, you can have one which uses a weather API to include “posts” that are just sporadic weather updates, or one which chunks pieces of your system log and puts one in your feed every day to give you status updates, or whatnot.

Separating out viewers means that you can now have multiple views on the same data as well. Maybe one viewer only cares about webcomics, so you can pull it up and read webcomic pages but let it omit blog posts. Another gives you a digest of headlines and emails them to you, leaving them unread (but of course omits anything you happen to have read during the day.) Another does nothing but pop up a message if you let unread posts pile up too long. Hell, you could have one that takes your combined posts and… puts them together into an RSS feed. Why not?

The cool thing about lektordir to me is that it lowers the barrier to entry to writing RSS-reader-like applications. It takes the parts and splits them behind a well-defined interface. But there's a lot of cool stuff that you get from trying to apply a one-thing-well philosophy to problems like this!

Why the name? It's for reading, and lektor means “reader” in several languages. I'll probably change the name at some point, though, because there's also a plain-text CMS called Lektor.

#backburner #software #web #tool