Extending the Standard Streams

If I write my own shell (which I may very well do at some point), there's a particular process model I'd like to embed in it. To wit: in UNIX right now, each program in a pipeline has a single input stream and two output streams, with files, sockets, &c. for other kinds of communication.

A pipeline of programs like foo | bar | baz looks roughly like this:

    keyboard -> stdin\   /stdout -> stdin\   /stdout -> stdin\   /stdout ---+
                      foo                 bar                 baz           |
                         \stderr -+          \stderr -+          \stderr -+ |
                                  |                   |                   | |
                                  v                   v                   v v

This works pretty well. You can do some nice things by redirecting stderr here and stdin from there, and it enables some pleasantly terse shell invocations.

I'd like to preserve that basic system, but with the ability to easily create other named streams. For example, imagine a hypothetical version of wc, called wc-s, which still writes its usual output to stdout but also has three other streams with these names:

                     / newlines
                     | words
    stdin -> wc-s -> | bytes
                     | stdout
                     \ stderr

You can always see the normal output of wc on stdout:

    gdsh$ wc-s *
           2       3       6 this.txt
          10      20      30 that.c
         100    1000   10000 whatever.py
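
To make the idea a bit more concrete, here's a minimal Python sketch of the core of wc-s. It assumes (purely hypothetically; no existing shell does this) that gdsh tells a child which named streams are open by setting an environment variable, GDSH_STREAMS, to something like words=3,bytes=4, mapping each stream name to an already-open file descriptor. GDSH_STREAMS, parse_stream_map, and wc_s are all made-up names.

```python
import os

def parse_stream_map(spec):
    """Parse a hypothetical GDSH_STREAMS value such as "words=3,bytes=4"
    into a dict of {stream name: file descriptor}."""
    streams = {}
    for entry in filter(None, spec.split(",")):
        name, _, fd = entry.partition("=")
        streams[name] = int(fd)
    return streams

def wc_s(data, label, streams):
    """Count like wc, write each count to its named stream if the shell
    opened one, and return the usual wc-style line for stdout."""
    counts = {
        "newlines": data.count(b"\n"),
        "words": len(data.split()),  # split() with no args splits on whitespace
        "bytes": len(data),
    }
    for name, value in counts.items():
        fd = streams.get(name)
        if fd is not None:  # streams nobody asked for are silently skipped
            os.write(fd, f"{value}\n".encode())
    return f"{counts['newlines']:8d}{counts['words']:8d}{counts['bytes']:8d} {label}"
```

A real wc-s would call wc_s once per file, print the returned line to stdout, and let the extra counts land wherever the shell pointed them.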

But you could also extract an individual stream from that invocation using special redirection operators:

    gdsh$ wc-s * stdout>/dev/null bytes>&stdout
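
One wrinkle worth spelling out: in a POSIX shell, `cmd >/dev/null 3>&1` sends fd 3 to /dev/null too, because n>&m copies wherever m points at that moment. The example above only makes sense if bytes>&stdout means "wherever stdout originally went". Here's a hedged Python sketch of that reading; the NAME>PATH, NAME>&STREAM, and NAME<PATH grammar is inferred from the examples in this post, and parse_redirects and resolve_outputs are hypothetical names.

```python
import re

# Hypothetical grammar inferred from the examples: NAME>PATH, NAME>&STREAM, NAME<PATH
REDIRECT = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)(>&|>|<)(.+)$")

def parse_redirects(words):
    """Split argv-style words into plain arguments and (name, op, target) tuples."""
    args, redirects = [], []
    for word in words:
        m = REDIRECT.match(word)
        if m:
            redirects.append(m.groups())
        else:
            args.append(word)
    return args, redirects

def resolve_outputs(redirects, defaults):
    """Map each output stream to a destination. NAME>&OTHER refers to the
    destination OTHER had before any redirection on this command line
    (unlike POSIX n>&m); an unknown OTHER becomes a new pipeline stream.
    Input redirections (NAME<PATH) are ignored here."""
    table = dict(defaults)
    for name, op, target in redirects:
        if op == ">":
            table[name] = ("file", target)
        elif op == ">&":
            table[name] = defaults.get(target, ("stream", target))
    return table
```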

We could also have multiple input channels. I imagine an fmt command which can interpolate named streams, e.g.

    gdsh$ printf "1\n2 3\n" | wc-s | fmt "bytes: {bytes}\nwords: {words}\nnewlines: {newlines}\n"
    bytes: 6
    words: 3
    newlines: 2
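
Here's a rough Python sketch of the interpolation fmt would need to do, assuming the shell has already read each named stream into a string (a real fmt would read them from the streams themselves). The example output forces one design choice: a single trailing newline is trimmed from each stream, otherwise {bytes} would expand to "6" plus a newline and the output would be double-spaced. The names fmt, unescape, and PLACEHOLDER are illustrative.

```python
import re

PLACEHOLDER = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")

def unescape(template):
    """Interpret the literal \\n and \\t sequences the shell passes through."""
    return template.replace("\\n", "\n").replace("\\t", "\t")

def fmt(template, streams):
    """Replace each {name} with the contents of stream `name`, minus one
    trailing newline. An unknown stream name raises KeyError."""
    def fill(match):
        text = streams[match.group(1)]
        return text[:-1] if text.endswith("\n") else text
    return PLACEHOLDER.sub(fill, unescape(template))
```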

We can then have a handful of other utilities and built-in shell operators for manipulating these other streams:

      # the `select` command takes a stream name and outputs it
    gdsh$ wc-s * | select words
      # here we redirect stdout to the stream X and pass it to fmt
    gdsh$ cat this.txt stdout>&X | fmt "this is {X}\n"
    this is 1
    2 3
      # the same, using file redirection operators
    gdsh$ fmt "this is {X}\n" X<this.txt
    this is 1
    2 3
      # the same, using a shorthand for setting up a stream by taking
      # the stdout from some command
    gdsh$ !X='cat this.txt' fmt "this is {X}\n"
    this is 1
    2 3
      # the same, using a shorthand for setting up a stream by just
      # reading and outputting a file
    gdsh$ @X=this.txt fmt "this is {X}\n"
    this is 1
    2 3
      # using a shorthand for filling in a stream with a string directly
    gdsh$ ^Y=recidivism fmt "Y is {Y}\n"
    Y is recidivism
      # redirecting each output stream to a different file
    gdsh$ wc-s * words>words.txt bytes>bytes.txt newlines>newlines.txt
      # using a Smalltalk-like quoting mechanism to apply different shell
      # commands to different streams
    gdsh$ wc-s * | split words=[sort >sorted-word-count.txt] bytes=[uniq >uniq-bytes.txt]
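
Something glossed over here: for `wc-s * | select words` to work, the pipe has to carry every named stream, not just stdout. One possible way (an assumption, not a settled design) is to multiplex them over a single pipe as length-prefixed frames tagged with the stream name, a common trick for carrying several logical streams over one connection. Here's a Python sketch of that framing plus what select and split might do with it; the frame layout and every function name are hypothetical.

```python
import struct

# Hypothetical frame: 1-byte name length, name, 4-byte big-endian payload
# length, payload. Stream names are therefore limited to 255 bytes.
def encode_frame(name, payload):
    raw = name.encode()
    return struct.pack(">B", len(raw)) + raw + struct.pack(">I", len(payload)) + payload

def decode_frames(buf):
    """Yield (name, payload) pairs from a buffer of concatenated frames."""
    pos = 0
    while pos < len(buf):
        (nlen,) = struct.unpack_from(">B", buf, pos)
        pos += 1
        name = buf[pos:pos + nlen].decode()
        pos += nlen
        (plen,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        yield name, buf[pos:pos + plen]
        pos += plen

def select(buf, wanted):
    """What `select NAME` might do: keep one stream's bytes, drop the rest."""
    return b"".join(p for n, p in decode_frames(buf) if n == wanted)

def split(buf, handlers):
    """What `split a=[...] b=[...]` might do: hand each stream's bytes to its
    own handler (a plain callable here, standing in for a shell command)."""
    for name, payload in decode_frames(buf):
        handler = handlers.get(name)
        if handler is not None:
            handler(payload)
```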

This could also enable new idioms for programs and utilities. For example, rather than being controlled by a flag to the program, verbose output could always be written to a (possibly unused) stream called verbose. You could then see it by redirecting the verbose stream, or log it while seeing only the usual stderr messages:

      # here we only see stderr
    gdsh$ myprog
    myprog: config file not found
      # here we ignore stderr and see only the verbose output
    gdsh$ myprog stderr>/dev/null verbose>&stderr
    Setting up context
    Looking in user dir... NOT FOUND
    Looking in global dir... NOT FOUND
    myprog: config file not found
    Tearing down context
      # here we see stderr but log the verbose output
    gdsh$ myprog verbose>errmsgs
    myprog: config file not found
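
From the program's side, the only requirement is that writing to an unopened stream is harmless. A minimal Python sketch, assuming the shell hands over a name-to-file-descriptor map (how it does so is left open; Streams and load_config are hypothetical names):

```python
import os

class Streams:
    """Hypothetical view of a process's named streams: writes to a stream
    the shell did not open are dropped instead of failing."""
    def __init__(self, fds):
        self.fds = dict(fds)  # stream name -> file descriptor

    def write(self, name, text):
        fd = self.fds.get(name)
        if fd is not None:
            os.write(fd, text.encode())

def load_config(streams, search_dirs):
    """Always narrate to `verbose`; report the failure on `stderr` too."""
    streams.write("verbose", "Setting up context\n")
    found = None
    for label, path in search_dirs:
        ok = os.path.exists(path)
        streams.write("verbose", f"Looking in {label} dir... {'FOUND' if ok else 'NOT FOUND'}\n")
        if ok:
            found = path
            break
    if found is None:
        msg = "myprog: config file not found\n"
        streams.write("stderr", msg)
        streams.write("verbose", msg)  # echoed so the verbose log stands alone
    streams.write("verbose", "Tearing down context\n")
    return found
```

With no redirections the verbose lines simply go nowhere, which is what makes an always-on verbose stream cheap.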

Or maybe you could have human-readable error messages on stderr and machine-readable error messages on jsonerr:

      # here is a human-readable error message
    gdsh$ thatprog
    ERROR: no filename given
      # here is a machine-readable error message
    gdsh$ thatprog stderr>/dev/null jsonerr>&stderr
    {"error-type":"fatal","error-code":30,"error-msg":"no filename given"}
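
Producing both forms is cheap for the program. Here's a Python sketch, again assuming a hypothetical name-to-fd map; the JSON field names come from the example above, and report_error is a made-up name.

```python
import json
import os

def report_error(fds, error_type, code, message):
    """Emit one error twice: prose on stderr, JSON on the hypothetical
    jsonerr stream. Streams missing from `fds` are skipped."""
    human = f"ERROR: {message}\n"
    machine = json.dumps(
        {"error-type": error_type, "error-code": code, "error-msg": message},
        separators=(",", ":"),  # compact form, matching the example output
    ) + "\n"
    for name, text in (("stderr", human), ("jsonerr", machine)):
        fd = fds.get(name)
        if fd is not None:
            os.write(fd, text.encode())
```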

Or you could have a program which takes in data on one stream and commands on another:

      # someprog takes in raw data on the stream DATA, and commands
      # on the stream CMDS. Here we take the data from a local file
      # and accept commands from the network:
    gdsh$ @DATA=file.dat !CMDS='nc -l 8000' someprog
      # ...and here we have a set of commands we run through locally
      # while taking data from the network:
    gdsh$ !DATA='nc -l 8001' @CMDS=cmds.txt someprog
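
Since one of the two inputs comes from the network, someprog can't just read DATA to the end and then read CMDS; it has to service whichever stream has input. Here's a Python sketch using the standard selectors module, assuming the shell has already opened DATA and CMDS as file descriptors. serve and its callback arguments are hypothetical, and a real program would buffer CMDS into complete lines rather than acting on raw chunks.

```python
import os
import selectors

def serve(data_fd, cmds_fd, on_data, on_cmd):
    """Read DATA and CMDS as chunks arrive on either, until both hit EOF."""
    sel = selectors.DefaultSelector()
    sel.register(data_fd, selectors.EVENT_READ, on_data)
    sel.register(cmds_fd, selectors.EVENT_READ, on_cmd)
    open_count = 2
    while open_count:
        for key, _ in sel.select():
            chunk = os.read(key.fd, 4096)
            if chunk:
                key.data(chunk)  # key.data is the callback registered above
            else:  # EOF: stop watching this stream
                sel.unregister(key.fd)
                open_count -= 1
    sel.close()
```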

There are other considerations I've glossed over here, but here are a few notes, advantages, and interactions:

So those are some ideas that have been drifting around in my head for a while. No idea if I'll ever implement any of them, or if they'd even be worth implementing, but I might get around to it at some point. We'll see.


  1. I originally figured that err.in would be a useless stream, but after some thought I can imagine a use for it. Let's say my programming language of choice, Phosphorus, outputs its error messages in XML format. This is great for an IDE, but now I need to debug my program on a remote server which doesn't have my IDE installed. I could have a program ph-wrapper that passes all streams through unchanged except for err.in, which it parses as XML, converts into a pretty-printed trace, and passes to its own err.out. So:

        gdsh$ phosphorus src.ph
        Setting up program...
        gdsh$ phosphorus src.ph | ph-wrapper
        Setting up program...
        NoSuchIndex exception on line 3:
          x = args[3];

    So yes, I can imagine a class of programs which want to pay attention to err.in.

  2. Don't cringe. Look: the input device with the most information density is the keyboard, right? That's why you use the command line at all. However, graphical systems can convey more information than pure-text ones. You can take a pure-text system and extend it with position and color to convey more, then with charts and graphs to convey more still, and so forth. What I'm proposing is not drag-and-drop, although that might be useful to some users; it's a keyboard-driven system that displays information in a denser, richer style. I keep thinking of building this myself, but for the massive herds of yaks I'd have to shave first.

  3. PowerShell is the usual example given here, but I confess I haven't used it. Effectively, rather than streams of raw text, think of streams of well-formed data such as JSON, s-expressions, or some other structured representation. wc, then, might output fixed-size lists instead of columns of numbers.
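
To make note 1 a bit more concrete, here's a Python sketch of the core of ph-wrapper: turning one XML error into the readable trace shown there. Phosphorus's actual XML format isn't specified anywhere, so the element and attribute names below are assumptions, and pretty_trace is a made-up name; passing the other streams through unchanged is left out.

```python
import xml.etree.ElementTree as ET

def pretty_trace(xml_text):
    """Turn one (hypothetically shaped) Phosphorus XML error into a short trace.
    Assumed shape: <error type="..." line="..."><source>...</source></error>"""
    err = ET.fromstring(xml_text)
    source = err.findtext("source", default="").strip()
    return f"{err.get('type')} exception on line {err.get('line')}:\n  {source}\n"
```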