Macros progress report: D2 merged

The grant that I'm currently working on, the macros grant, has now reached its D2 milestone. That is, the so-called "unquotes" work as advertised in Rakudo:

macro apply($code, $argument) {
    quasi {
        {{{$code}}}( {{{$argument}}} );
    }
}

apply sub ($t) { say $t }, "OH HAI";    # prints "OH HAI"

Macros are routines, and so they take parameters. The above apply macro takes some $code and an $argument, and calls the former with the latter. It's as if, when the macro expansion is all done, what's left in the code is the following line:

(sub ($t) { say $t })("OH HAI");    # prints "OH HAI"

Of course, we never actually see this line, and in the compiler it's never textually substituted like that, because the substitution all happens on the level of syntax trees, not on the level of text.

The new thing in this picture is the {{{ }}} thingies: the so-called unquotes. Back in my last progress report, we still didn't have unquote support in nqp. Now we do. In fact, we've had unquote support since August. That, it turns out, was the easy bit.

Then the conceptual problems appeared. For a few months, whenever I thought about macros, my brain would melt trying to think about those problems. It took quite a while to go from unquotes existing to them being actually useful. What follows is an explanation of the problem and its solution.

The problem is one of context. By that, we mean the variable bindings seen by a piece of code. Psychologically, we expect a piece of code to see the variables in its lexical environment, that is, all the variables declared in all surrounding blocks.

my $a;

sub f {
    my $b;
    sub {
        my $c;
        # $a, $b, $c visible here
    }
}

The exciting and highly useful thing about closures is that they honor this expectation, while simultaneously being first-class values that you can pass around between parts of your program. This combination of static bindings and dynamic function values is so powerful that you can use it to emulate the object encapsulation so espoused by OO enthusiasts.

In the above case, the sub f implicitly returns its inner sub, which can be transported across the Russian tundras, stored in a dank wine cellar for 75 years before being uncorked... but when finally called, it will still remember its $a, $b, and $c bindings. That's because closures aren't just containers of statements. They also hold a reference to an OUTER block through which variable lookups can be made.

(And in the above case, $b and $c are properly encapsulated. $a isn't, since it's globally visible.)
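To make the encapsulation point concrete, here is a minimal sketch in the same language as the examples above. The name make-counter is hypothetical and chosen only for illustration. Each call creates a fresh $count that nothing outside the returned closure can reach.

```raku
# Hypothetical helper for illustration; not from the original post.
sub make-counter {
    my $count = 0;              # visible only to the closure below
    return sub { ++$count };    # the closure keeps a reference to its OUTER block
}

my &counter = make-counter();
say counter();    # 1
say counter();    # 2

my &other = make-counter();     # a fresh $count, independent of the first
say other();      # 1
```

The only way to touch $count is through the closure, which is roughly what a private attribute with one public method gives you in an OO setting.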

We want macros to behave the same. That is, quasi blocks should behave like closures.

macro f {
    my $a = "OH HAI";
    quasi {
        say $a;
    }
}

my $a = "B... BOOOOM!";
f;      # OH HAI

It's the same principle: after the f; call has been conceptually replaced by say $a; this code should still remember its context, its origins, namely the macro body. The fact that say $a; doesn't print "B... BOOOOM!", from the variable in the mainline scope, is part of what's called hygiene. Hygiene means that just like with closures, bindings inside are isolated from bindings outside by default.

(The term "hygiene" is often conflated in people's minds with the term "AST-based macros". The two are not the same. AST-based macros are necessary but not sufficient for hygiene. End of rant.)

But wait a minute. These two situations are obviously very similar. In the case of the closure, we know that the closure must keep an OUTER reference to remember its context. What is it in the macro case that remembers the context?

The quasi construct generates an AST, a syntax tree, that then gets spliced into the mainline code where the f; call used to be. This AST must be the vessel for the context information. So, just like a closure is a bunch of statements plus a context, an AST object must be a tree plus the context information. If the AST didn't have a context, the above macro expansion couldn't be hygienic.
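As a rough mental model only, "a tree plus the context information" can be sketched like this. ToyAST and its attributes are made-up names, and the "tree" is reduced to a single variable lookup; this is not how Rakudo's AST objects are actually implemented.

```raku
# Toy model only: ToyAST and its attributes are hypothetical names,
# not Rakudo's actual AST class.
class ToyAST {
    has Str $.name;       # the "tree": here just one variable lookup
    has %.context;        # bindings visible where the quasi was evaluated

    # Look the variable up in the remembered context, not at the splice site.
    method lookup() { %!context{$!name} }
}

my %macro-scope    = a => "OH HAI";
my %mainline-scope = a => "B... BOOOOM!";

my $ast = ToyAST.new(name => 'a', context => %macro-scope);

# Splicing the AST into the mainline does not change what it sees.
say $ast.lookup;             # OH HAI
say %mainline-scope<a>;      # B... BOOOOM!
```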

We must perform unholy surgery on the block that eventually results from the quasi AST. The block will naturally have mainline context, but we want to recontext it to have macro context. So in the Rakudo macro expansion code, there is some code that transplants the context from the AST object to the new block. It involves a Rakudo-specific op called perl6_get_outer_ctx. It's only used for this code path.

This much was clear already when I was merging D1. Now for the new complications.

Macro expansions consist of two stages of substitution, and this is what makes them useful:

Unquotes are replaced by ASTs, typically arguments originating from the outside of the macro.
The macro invocation is replaced by the AST returned from applying the macro to its arguments.

When implementing D1, I sorted out my thoughts by writing lots of ruminating gists. During this phase of the work, I've composed fewer gists, but an unexpected thing happened: the more time passed, the more I realized how much I had misnamed the variables in the macro code I had contributed to Rakudo.

It wasn't that I was careless about naming when I first wrote that code. Instead, my understanding of the macro domain had shifted so much that the choices of names I had made started to feel wrong. Today I landed a long-awaited refactor which not only unified the three macro-invocation code paths, but also fixed all the now slightly-off variable names. Quite a relief.

Here's part of what changed. During D1, a lot of AST objects in source ended up being called quasi ASTs. Nowadays, the following distinction is made:

quasi ASTs are what quasi blocks generate. Naturally.
argument ASTs are the things that the parser generates as it parses the macro arguments, just before it invokes the macro.
macro ASTs are what's returned from a macro, to be spliced back into the mainline code.

There is overlap. A macro AST has been generated as either a quasi AST or an argument AST at some point or other. But the focus here is where the ASTs are coming from. And it turns out that matters a lot. Quasi ASTs and argument ASTs are quite different. Hence the need for precision.

By the way, there is possibly a fourth kind of AST, one that we don't have yet, but that is totally possible once people start building macro libraries and stuff:

synthetic ASTs, syntax trees built up programmatically from individual AST nodes or smaller ASTs.

Don't know yet if that's going to become a reality. Until it does, quasi blocks fill much the same role.

Once we had unquotes working in Rakudo, the one glaring omission was that the unquotes didn't behave hygienically. Which was a shame because, again, people really expect hygiene to work:

macro test($value) {
    my $a = "B... BOOOOM!";
    quasi {
        say {{{$value}}};
    }
}

my $a = "OH HAI";
test $a;    # OH HAI nowadays, used to print B... BOOOOM!

Just as the quasi AST should remember its own original context, so should the argument AST that ends up in $value. It used not to, and so the context it got was the quasi's, resulting in "B... BOOOOM!" above. A little ironic that it was the successful recontexting of the quasi AST that messed things up for the argument AST.
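The expectation has a direct closure analogue, sketched below with an ordinary sub instead of a macro. This is an analogy under the assumption that a block argument stands in for an argument AST; it says nothing about how Rakudo does the recontexting internally.

```raku
# Closure analogue of the macro above: the argument is a block,
# which captures its context where it is written, at the call site.
sub test(&value) {
    my $a = "B... BOOOOM!";    # same name as the caller's variable
    return value();            # the block still sees the caller's $a
}

my $a = "OH HAI";
say test({ $a });    # OH HAI
```

The block picks up its bindings at the point where it is written, not where it is eventually run, and that is the behavior we want from an argument AST as well.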

For months I struggled with the problem of how to recontext the argument ASTs. I developed a solution in a branch, which finally worked as it should, except that it still didn't recontext the ASTs properly! Argh!

My plan of attack had been to set the context at the time of unquote evaluation, as the quasi is evaluated when running the macro. The other day, jnthn pointed out that this approach may be overly complicated: maybe the context could be set at the time of argument collection, just before calling the macro. This is definitely simpler. Not least because at this point, the parser actually is in the context it wants to set! And in particular, no block surgery was needed this time.

I tried it. It worked. This solution almost feels too simple, and I'm not sure yet it will let us do all the things we want to do. But all the tests pass, and I have hammered this solution with tricky situations that might break, and it's holding up so far. So, we now have hygienic macros with unquotes in Rakudo.

Here are the macro-related gists that I wrote during this period. They are in various states of obsolescence at this point, but still potentially informative:

The other artifacts that have emerged since D1 are as follows:

A new spectest file. Also, thanks to a suggestion by moritz++, the macro spectest files are now much better named.
A number of commits to the nom branch of Rakudo:
A number of commits to the nqp project:
Two more deliveries of the macros talk: one at French Perl Workshop in Strasbourg, and one at YAPC::Europe in Frankfurt.

And once again, it's time to glance at what's ahead in the grant work. D3 promises to deliver hygiene. As explained above, D2 already provides this; I actually could have declared the milestone D2 finished at the point I got unquotes working in Rakudo (in August), but it felt slightly disingenuous to do so, because unquotes aren't really useful until they're fully hygienic. Anyway, half of D3 ends up being already done. What still needs to be implemented is the COMPILING:: pseudopackage, which gives the macro author several ways to opt out of hygiene. This is sometimes very powerful, even if it makes sense for it not to be the default.

My grant reports have been sparse lately. I'm hopeful that the wait until the next one won't be as long.

Strangely Consistent

Theory, practice, and languages, braided together
