Tuesday, April 25, 2017

@tokens and XMacros

Crack annotations are a way to extend the compiler at the parser level. Many of them do code generation. For example, the @struct annotation generates a class with a constructor:

@import crack.ann struct;

@struct Foo {
    String name;
    int val;
}

# Equivalent to:

class Foo {
    String name;
    int val;

    oper init(String name, int val) : name = name, val = val {}
}

Annotations are just crack functions that are executed at compile time. The only restriction is that they must reside in a different module from the code that uses them. An annotation is just a public function that accepts a CrackContext object:

import crack.compiler CrackContext;

void struct(CrackContext ctx) {
  ...
}

The CrackContext object is an interface to the compiler and tokenizer which references the context in which the annotation was invoked. For example, we can consume tokens from the point where the annotation is specified and generate errors at that point:

    tok := ctx.getToken();
    if (!tok.isIdent())
        ctx.error(tok, 'Identifier expected!'.buffer);

Code generation in annotations has always been done by injecting tokens and strings into the tokenizer. We make use of the fact that the tokenizer has an unlimited putback queue and just "put back" the tokens that we want the parser to get next in reverse order. There is also an "inject()" method on the crack context that lets you inject a string to be tokenized.
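
As a minimal sketch of the put-back mechanism, the annotation below looks at the next token and then returns it to the tokenizer so the parser still sees it. The name expectIdent is hypothetical, and putBack() is assumed to be the CrackContext method that performs the "put back" described above:

    import crack.compiler CrackContext;

    # Hypothetical annotation: complains unless the next token is an
    # identifier, but leaves that token in place for the parser.
    void expectIdent(CrackContext ctx) {
        tok := ctx.getToken();
        if (!tok.isIdent())
            ctx.error(tok, 'Identifier expected!'.buffer);

        # Assumed method name: return the token to the putback queue.
        ctx.putBack(tok);
    }

Emitting new code the same way means putting back every generated token, last one first, which is where the verbosity comes from.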

Neither approach has been entirely satisfactory. Obviously, generating code by injecting one token at a time is far too verbose and tedious to use for anything of any size. And while inject() fixes that part of the problem, it relies on writing code in a string, so:

  • The line numbers of the code have to be provided to the inject() function, a technique which doesn't compose well.

  • Editors don't recognize it as crack code, breaking syntax highlighting and auto-indent.

A better solution relies on the recently introduced @tokens and @xmac annotations. @tokens is effectively a "token sequence literal." It consumes the delimited tokens following it and produces an expression that evaluates to a NodeList object containing those tokens.

This lets us generate crack code defined in crack code. For example, the following annotation emits code to print "hello world":

import crack.ann deserializeNodeList;
import crack.compiler CrackContext;
@import crack.ann tokens;

void hello(CrackContext ctx) {
    @tokens { cout `hello world!\n`; }.expand(ctx);
}
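
Since annotations must live in a different module from the code that uses them, using hello might look roughly like the sketch below. The module name myann is hypothetical; it stands for whatever module defines hello():

    # Module that uses the annotation.
    import crack.io cout;
    @import myann hello;

    # The parser sees the tokens emitted by hello() at this point, i.e.
    #   cout `hello world!\n`;
    @hello

Note that the generated code refers to cout, so the using module has to import it.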

In the example above, we use @tokens with curly braces as delimiters. We could also have used square brackets or parentheses. The chosen delimiters may be nested, but the symbols that are not being used as delimiters need not be paired, so we can also use @tokens for asymmetric constructs:

void begin_block(CrackContext ctx) {
    # The unbalanced '{' is allowed here.
    @tokens [ if (true) { ].expand(ctx);
}
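
A matching annotation can close the block again. This is only a sketch: end_block is a hypothetical companion to begin_block, and it relies on the same rule that braces need not be paired when square brackets are the delimiters:

    void end_block(CrackContext ctx) {
        # The unbalanced '}' closes the block opened by begin_block.
        @tokens [ } ].expand(ctx);
    }

In a module that imports both annotations, the pair would then be used like this:

    @begin_block
        cout `inside the generated block\n`;
    @end_block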

While useful, @tokens still doesn't let us do the kind of composition we need in order to generate code. There's nothing like macro parameters for @tokens: the results are essentially constants. For interpolation, we have @xmac.

@xmac is like @tokens, but with parameters that let you expand other NodeLists into it. For example, here's an annotation to emit exception classes:

import crack.ann deserializeXMac;
import crack.compiler CrackContext;
@import crack.ann xmac;

void exception(CrackContext ctx) {
    tok := ctx.getToken();
    if (!tok.isIdent()) ctx.error(tok, 'Identifier expected!');
    @xmac {
        class $className : Exception {
            oper init() {}
            oper init(String message) : Exception(message) {}
        }
    }.set('className', tok).expand(ctx);
}

We have to explicitly set each of the parameters with the set() method. We'll get an error if any of them are undefined when we expand. Alternatively, we can use @xmac* to do this automatically with variables of the same name:

void exception(CrackContext ctx) {
    className := ctx.getToken();
    if (!className.isIdent())
        ctx.error(className, 'Identifier expected!');
    @xmac* {
        class $className : Exception {
            oper init() {}
            oper init(String message) : Exception(message) {}
        }
    }.expand(ctx);
}
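
From the using side, the annotation consumes the identifier that follows it and leaves a class definition in its place. A rough sketch, again assuming a hypothetical module myann that defines exception(), with ParseError as an arbitrary example name:

    import crack.io cout;
    import crack.lang Exception;
    @import myann exception;

    # Expands to "class ParseError : Exception { ... }" with both
    # constructors from the @xmac body.
    @exception ParseError

    try {
        throw ParseError('unexpected token');
    } catch (ParseError ex) {
        cout `caught a ParseError\n`;
    }

The generated class refers to Exception, so the using module needs it in scope.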

Since @tokens just generates a NodeList, we can use it to directly generate values to interpolate into an @xmac:

    method := @tokens {
        void foo() { }
    };

    @xmac* {
        class A {
            $method
        }
    }.expand(ctx);

We can also expand an @xmac into a NodeList using the expand() method with no arguments:

    accessors := @xmac* {
        int $name() {
            return __state.$name;
        }

        void $name(int val) {
            __state.$name = val;
        }
    }.expand();

    @xmac* {
        class A {
            $accessors
        }
    }.expand(ctx);

@tokens and @xmac are both useful tools for doing code generation in Crack annotations. They will be released in Crack 1.1.

Thursday, April 13, 2017

Appendages

I've recently pushed the code to implement Appendages, which will likely be the primary feature of the 1.1 release.

Formally, appendages are a way to extend the functionality of a class orthogonal to its normal inheritance hierarchy, and specifically to its instance state. In other words, they are classes consisting only of methods that can be applied to any object derived from a specified base class, called the "anchor" class.

Taking the example from the manual, let's say we have a pair of classes for representing two-dimensional coordinates:


    class Coord {
        int x, y;
        oper init(int x, int y) : x = x, y = y {}
    }

    class NamedCoord : Coord {
        String name;
        oper init(int x, int y, String name) :
            Coord(x, y),
            name = name {
        }
    }

Now let's say we want to add some new functionality:

  • Get the distance of the coordinate from the origin (the "magnitude" of the coordinate's vector).

  • Get the area of the rectangle defined by the coordinate and the origin.

In the absence of other considerations, we might just add two methods to Coord and be done with it. However, this doesn't always work.

If Coord and NamedCoord are in a module we don't own (an "external" module), adding methods is more complicated. Our new methods might not be appropriate for general use, and for non-final classes adding new methods breaks compatibility. So our new methods might simply not be welcome upstream, and in any case, we might not want to be blocked on waiting for our change to come back around into a released version of the external module.

We could derive a new class, "SpatialCoord", from Coord and give it the new methods. But then NamedCoord won't have them. Furthermore, if Coord comes from an external module, we might not even control the allocation of the new object: it might be produced internally by some other subsystem and merely shared with our calling code.

Prior to appendages, the only way to solve this was to define our new methods as functions accepting Coord as an argument:

    int getMagnitude(Coord c) {
        return sqrt(c.x * c.x + c.y * c.y);
    }

    int getArea(Coord c) {
        return c.x * c.y;
    }

This works, but it has a few problems as compared to having them bundled as methods in a class:

  • As separate functions, a module using them would have to import each of them separately and also retain them as separate elements in a shared namespace.

  • We lose some syntactic niceties, such as the object.method() syntax and the implicit this.

  • As standalone functions, they are unable to access protected members of the class that would be accessible to methods of a derived class. (This is not an issue in the example, but it is an issue with the approach in general.)

Appendages provide a nicer solution. We can define an appendage on Coord by creating a derived class definition that uses an equal sign ("=") instead of a colon before the base class list:

    class SpatialCoord = Coord {
        int getMagnitude() {
            return sqrt(x * x + y * y);
        }

        int getArea() {
            return x * y;
        }
    }

Defining an appendage is very much like defining a class except that an appendage is limited to a set of methods. These can be applied to any instance of a class derived from the anchor class (Coord, in this case). To make this work, an appendage can have no instance data of its own. This means:

  • No instance variables.

  • No virtual methods (all methods are final or explicitly static).

  • No constructors or destructors.

To use an appendage, we must explicitly convert an instance of the anchor class using an overloaded "oper new" (which looks like ordinary instance construction):

    foo := SpatialCoord(Coord(10, 20));
    bar := SpatialCoord(NamedCoord(20, 30, 'fido'));
    cout I`bar has magnitude $(bar.getMagnitude()) \
           and area $(bar.getArea())\n`;

Note that the "foo" and "bar" assignments don't create new instances: they just convert existing instances of Coord and NamedCoord to SpatialCoord in order to allow the use of its methods. This is a zero-cost abstraction.
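
This also covers the case mentioned earlier where we don't control allocation. In the sketch below, getCursor() is a hypothetical stand-in for an external subsystem that creates the object and merely hands it to us:

    import crack.io cout;

    # Hypothetical: the object is allocated somewhere we don't control.
    Coord getCursor() {
        return NamedCoord(3, 4, 'cursor');
    }

    c := getCursor();

    # Convert the existing instance; no new object is created.
    sc := SpatialCoord(c);
    cout `area: $(sc.getArea())\n`;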

You can also compose an appendage from several other appendages (as long as they all have the same anchor class). For example, we could have done this:


    class MagnitudeCoord = Coord {
        int getMagnitude() {
            return sqrt(x * x + y * y);
        }
    }

    class AreaCoord = Coord {
        int getArea() {
            return x * y;
        }
    }

    class SpatialCoord = MagnitudeCoord, AreaCoord {}
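
Assuming the composed SpatialCoord is converted the same way as the single appendage shown earlier, a short usage sketch would be:

    import crack.io cout;

    sc := SpatialCoord(Coord(3, 4));

    # Methods from both MagnitudeCoord and AreaCoord are available.
    cout `magnitude: $(sc.getMagnitude()), area: $(sc.getArea())\n`;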

There are a number of places in the Crack library that will benefit from the use of appendages. Notably, appendages will make it possible to define encoding-specific String classes. The String class hierarchy (or, more properly, the Buffer class hierarchy) currently specializes around the concept of ownership: Buffer has no ownership assumptions, ManagedBuffer has an associated buffer and is growable, String owns an associated buffer that is to be treated as immutable, and so on. All of these are just byte buffers. There is no character concept, so the user is responsible for assuming an encoding and ensuring that the string is treated correctly with respect to that encoding.

With appendages, it will be possible to define ASCIIString and UTF8String, so instead of having to import individual methods from the crack.ascii and crack.strutil modules, we'll just be able to import (e.g.) ASCIIString and then call functions like strip() and toLower() as normal methods.

There are a few potential areas for improvement in appendages:

  • Implicit conversion (while generally an antipattern) would be useful here. If we have a function accepting an appendage as an argument, there's not a lot of value in having either the caller or the function itself do the conversion.

  • Having some way to explicitly require validation during conversion to an appendage would be nice (especially in the case of string appendages). You can currently define static members to do validation, but there's no way to exclude generation of the "oper new" methods that allow a user to more naturally bypass them.

  • It would be useful for appendages created from classes derived from the anchor class to preserve the methods of the derived class. For example, when we do

        bar := SpatialCoord(NamedCoord(20, 30, 'fido'));

    above, we lose the ability to access the name variable from bar. It should be possible to work around this right now by defining the appendage as a generic, though at the cost of generating multiple instances of the appendage code.
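
The first two points can be illustrated with a small sketch. Everything named here is hypothetical: QuadrantCoord, its checked() converter and the report() function are made up for illustration, the Coord class is the one defined earlier, and the @static annotation and InvalidArgumentError from crack.lang are assumed to behave as in ordinary classes:

    import crack.io cout;
    import crack.lang InvalidArgumentError;

    class QuadrantCoord = Coord {
        # Hypothetical validating converter, written as a static member.
        @static QuadrantCoord checked(Coord c) {
            if (c.x < 0 || c.y < 0)
                throw InvalidArgumentError('negative coordinate');
            return QuadrantCoord(c);
        }

        int getArea() {
            return x * y;
        }
    }

    # A function that accepts the appendage: every caller has to convert
    # explicitly, since there is no implicit conversion from Coord.
    void report(QuadrantCoord c) {
        cout `area: $(c.getArea())\n`;
    }

    # Goes through the validation.
    report(QuadrantCoord.checked(Coord(3, 4)));

    # The generated "oper new" is still available and skips the check.
    report(QuadrantCoord(Coord(-3, 4)));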

Nonetheless, appendages are a feature that I have long wished for, and they are very much in line with Crack's original goal of expanding upon existing concepts in the Object Oriented paradigm in a very natural way.