Strangely Consistent

Musings about programming, Perl 6, and programming Perl 6

Macros: what the FAQ are they?

Thank you sergot++ for eliciting these answers out of me, and prompting me to publish them.

Q: Is it common to be totally befuddled by all these macro-related concepts?

Yes! In fact, that seems to be the general state of mind not just for people who hear about these things now and then, but also for those of us who have chosen to implement a macro system! 😝

Seriously though, these are not easy ideas. When you don't deal with them every day, they naturally slip away from your attention. The brain prefers it that way.

Q: I keep seeing all these terms: quasis, unquotes, macros... I think I know what some of them are, but I'm not sure. Could you explain?

Yes. Let's take them in order.

Q: OK. What the heck is a quasi?

Quasis, or quasiquotes, are a way to express a piece of code as objects. An average program doesn't usually "program itself", inserting bits and pieces of program code using other program code. But that's exactly what quasis are for.

Quasis are not strictly necessary. You could create all those objects by hand.

quasi { say "OH HAI" }        # easy way

Q::Block.new(Q::Statement::Expr.new(Q::Postfix::Call.new(
    Q::Identifier("say"),
    Q::Literal::Str("OH HAI")
)));                          # hard way

It's easier to just write that quasi than to construct all those objects. Because when you want to structurally describe a bit of code, it turns out the easiest way to do that is usually to... write the code.

(By the way, take the Q::Block API with a huge grain of salt. It only exists for 007 so far, not for Perl 6. So the above is educated guesses.)

Q: Why would we want to create those Q objects? And in which situations?

Excellent question! Those Q objects are the "API for the structure of the language". So using it, we can query the program structure, change it during compile time, and even make our own Q objects and use them to extend the language for new things.

They are a mechanism for taking over the compiler's job when the language isn't flexible enough for you. Macros and slangs are a part of this.

Q: Do you have an example of this?

Imagine a format macro which takes a format string and some arguments:

say format "{}, {}!", "Hello", "World";

Since this is a macro, we can check things at compile time. For example, that we have the same number of directives as arguments after the format string:

say format "{}!", "Hello", "World";                        # compile-time error!

In the case of sufficient type information at compile time, we can even check that the types are right:

say format "{}, robot {:d}!", "Hello", "four-nineteen";    # compile-time error!

Q: Ooh, that's pretty cool!

I'm not hearing a question.

Q: Ooh, that's pretty... cool?

Yes! It is!

Q: Why is it called "quasiquote"? Why not "code quote" or something?

Historical reasons. In Lisp, the term is quasiquote. Perl 6's quasiquotes are not identical, but probably the nearest thing you'll get with Perl 6 still being Perl.

Traditionally, quasiquotes have unquotes in them, so let's talk about them.

Q: Right. What's an unquote?

In Java, you have to write something like "Hello, " + name + "!" when interpolating variables into a string. Java developers don't have it easy.

In Perl, you can do "Hello, $name!". This kind of thing is called "string interpolation".

Unquotes are like the $name interpolation, except that the string is a quasi instead, and $name is a Qtree that you want to insert somewhere into the quasi.

quasi {
    say "Hello," ~ {{{$name}}};
}

Just like the "Hello, $name" can be different every time (for example, if we loop over different $name from an array), unquotes make quasis potentially different every time, and therefore more flexible and more useful.

To tie it to a concrete example: every time we call the format macro at different points in the code, we can pass it different format strings and arguments. (Of course.) These could end up as unquotes in a quasi, and thus help to build different program fragments in the end.

In other words, a quasi is like a code template, and unquotes are like parametric holes in that template where you can pass in the code you want.

Q: Got it! So... macros?

Macros are very similar to subroutines. But where a sub call happens at run time, a macro call happens at compile time, when the parser sees it and knows what to send as arguments. At compile time, it's still early enough for us to be able to contribute/modify Q objects in the actual program.

So a macro in its simplest form is just a sub-like thing that says "here, insert this Qtree fragment that I just built".

Q: So quasis are used inside of a macro?

Yes. Well, they're no more tightly tied to each other than given and when are in Perl, but they're a good fit together. Since what you want to do in a macro is return Q objects representing some code, you'd naturally reach for a quasi to do that. (Or build the Q objects yourself. Or some combination of the two.)

Q: Nice! I get it!

Also not a question.

Q: I... get it?

Yeah! You do!

Q: Ok, final question: is there something that you've omitted from the above explanation that's important?

Oh gosh, yes. Unfortunately macros are still gnarly.

The most important bit that I didn't mention is hygiene. In the best case, this will just work out naturally, and Do What You Mean. But the deeper you go with macros, the more important it becomes to actually know what's going on.

Take the original quasiquote example from the top:

quasi { say "OH HAI" }

The identifier say refers to the usual say subroutine from the setting. Well, unless you were actually doing something like this:

macro moo() {
    sub say($arg) { callwith($arg.lc) }

    return quasi { say "OH HAI" }
}

moo();    # 'oh hai' in lower-case

What we mean by hygiene is that say (or any identifier) always refers to the say in the environment where the quasi was written. Even when the code gets inserted somewhere else in the program through macro mechanisms.

And, conversely, if you did this:

macro moo() {
    return quasi { say "OH HAI" }
}

{
    sub say($arg) { callwith("ARGLEBARGLE FLOOT GROMP") }
    moo();    # 'OH HAI'
}

Then say would still refer to the setting's say.

Basically, hygiene is a way to provide the macro author with basic guarantees that wherever the macro code gets inserted, it will behave like it would in the environment of the macro.

The same is not true if we manually return Q objects from the macro:

Q::Block.new(Q::Statement::Expr.new(Q::Postfix::Call.new(
    Q::Identifier("say"),
    Q::Literal::Str("OH HAI")
)));

In this case, say will be a "detached" identifier, and the corresponding two examples above would output "OH HAI" with all-caps and "ARGLEBARGLE FLOOT GROMP".

The simple explanation to this is that code inside a quasi can have a surrounding environment (namely that which surrounds the quasi)... but a bunch of synthetically created Q objects can't.

We're planning to use this to our advantage, providing the safe/sane quasiquoting mechanisms for most things, and the synthetic Q object creation mechanism for when you want to mess around with unhygiene.

Q: Excellent! So when will all this land in Perl 6? I'm so eager to...

Ok, that was all the questions we had time for today! Thank you very much, and see you next time!

Strategic rebasing

Just a quick mention of a Git pattern I discovered recently, and then started using a whole lot:

  1. Realize that a commit somewhere in the commit history contained a mistake (call it commit 00fbad).

  2. Unless it's been fixed already, fix it immediately and push the fix.

  3. Then, git checkout -b test-against-mistake 00fbad, creating a branch rooted in the bad commit.

  4. Write a test against the bad thing. See it fail. Commit it.

  5. git rebase master.

  6. Re-run the test. Confirm that it now passes.

  7. Check out master, merge test-against-mistake, push, delete the branch.

There are several things I like about this pattern.

First, we're using the full power of Git's beautiful (distributed) graph theory model. Basically, we're running the branch in two different environments: one where the thing is broken and one where the thing is fixed. Git doesn't much care where the two base commits are; it just takes your work and reconstitutes it in the new place. Typically, rebasing is done to "catch up" with other people's recent work. Here, we're doing strategic rebasing, intentionally starting from an old state and then upgrading, just to confirm the difference.

Second, there's a more light-weight pattern that does this:

  1. Fix the problem.

  2. Stash the fix.

  3. Write the test. See it fail.

  4. git stash pop

  5. Confirm test now passes.

  6. Commit test and fix.

This is sometimes fully adequate and even simpler (no branches). But what I like about the full pattern is (a) it prioritizes the fix (which makes sense if I get interrupted in the middle of the job), and (b) it still works fine even if the problem was fixed long ago in Git history.

Git and TDD keep growing closer together in my development. This is yet another step along that path.

Double-oh seven

It's now one year since I started working on 007. Dear reader, in case I haven't managed to corner you and bore you with the specifics already, here's the elevator pitch:

007 is a deliberately small language that's fun and easy to develop where I/we can iterate quickly on all the silly mistakes needed to get good macros in Perl 6.

At this stage, I'd say we're way into "iterate quickly", and we're starting to see the "good macros" part emerge. (Unless I'm overly optimistic, and we just went from "easy" into a "silly mistakes" phase.)

Except for my incessant blabbering, we have been doing this work mostly under the radar. The overarching vision is still to give 007 the three types of macros I imagine we'll get for Perl 6. Although we have the first type (just as in Rakudo), we're not quite there yet with the other two types.

Instead of giving a dry overview of the internals of the 007 parser and runtime — maybe some other time — I thought I would share some things that I've discovered in the past year.

The AST is the program

The first week of developing 007, the language didn't even have a syntax or a parser. Consider the following simple 007 script:

say("Greetings, Mister Bond!");

In the initial tests where we just wanted to run such a program without parsing it, this program would be written as just its AST (conveniently expressed using Lisp-y syntax):

(compunit (block
  (paramlist)
  (stmtlist (stexpr (postfix:<()>
    (ident "say")
    (arglist (str "Greetings, Mister Bond!")))))))

(In the tests we helpfully auto-wrap a compunit and a block on the outermost level, since these are always the same. So in a test you can start writing at the stmtlist.)

But 007 isn't cons lists internally — that's just a convenient way to write the AST. The thing that gets built up is a Qtree, consisting of specific Q nodes for each type of program element. When I ask 007 to output what it built, it gives me this:

Q::CompUnit Q::Block {
    parameterlist: Q::ParameterList [],
    statementlist: Q::StatementList [Q::Statement::Expr Q::Postfix::Call {
        expr: Q::Identifier "say",
        argumentlist: Q::ArgumentList [Q::Literal::Str "Greetings, Mister Bond!"]
    }]
}

Yes, it's exactly the same structure, but with objects/arrays instead of lists. This is where 007 begins, and ends: the program is a bunch of objects, a hierarchy that you can access from the program itself.

I've learned a lot (and I'm still learning) in designing the Qtree API. The initial inspiration comes from IntelliJ's PSI, a similar hierarchy for describing Java programs. (And to do refactors and analysis, etc.)

The first contact people have with object-oriented design tends to be awkward and full of learning experiences. People inevitably design according to physical reality and what they see, which is usually a bad fit for the digital medium. Only by experience does one learn to play to the strengths of the object system, the data model, and the language. I find the same to be true of the Qtree design: initially I designed it according to what I could see in the program syntax. Only gradually have my eyes been opened to the fact that Qtrees are their own thing (and in 007, the primary thing!) and need to be designed differently from textual reality and what I can see.

Some examples:

Qtrees are the primary thing. This is the deep insight (and the reason for the section title). We usually think of the source text as the primary representation of the program, and I guess I've been mostly thinking about it that way too. But the source text is tied to the textual medium in subtle ways, and not a good primary format for your program. Much of what Qtrees do is separate out the essential parts of your program, and store them as objects.

(I used to be fascinated by, and follow at a distance, a project called Eidola, aiming to produce a largely representation-independent programming medium. The vision is daring and interesting, and maybe not fully realistic. Qtrees are the closest I feel I've gotten to the notion of becoming independent of the source code.)

Another thing I've learned:

Custom operators are harder to implement than macros

I thought putting macros in place wold be hard and terrible. But it was surprisingly easy.

Part of why, of course, is that 007 is perfectly poised to have macros. Everything is already an AST, and so macros are basically just some clever copying-and-pasting of AST fragments. Another reason is that it's a small system, and we control all of it and can make it do what we want.

But custom operators, oh my. They're not rocket science, but there's just so many moving parts. I'm pretty sure they're the most tested feature in the language. Test after test with is looser, is tighter, is equal. Ways to do it right, ways to do it wrong. Phew! I like the end result, but... there are a few months of drastically slowed 007 activity in the early parts of 2015, where I was basically waiting for tuits to finish up the custom-operators branch. (Those tuits finally arrived during YAPC::Europe, and since then development has picked up speed again.)

I'm just saying I was surprised by custom operators being hairier than macros! But I stand by my statement: they are. At least in 007.

(Oh, and by the way: you can combine the two features, and get custom operator macros! That was somewhere in the middle in terms of difficulty-to-implement.)

Finally:

Toy language development is fun

Developing a small language just to explore certain aspects of language design space is unexpectedly addictive, and something I hope to do lots of in the next few years. I'm learning a lot. (If you've ever found yourself thinking that it's unfortunate that curly braces ({}) are used both for blocks and objects, just wait until you're implementing a parser that has to keep those two straight.)

There's lots of little tradeoffs a language designer makes all day. Even for a toy language that's never going to see any actual production use, taking those decisions seriously leads you to interesting new places. The smallest decision can have wide-ranging consequences.

If you're curious...

...here's what to look at next.

Looking forward to where we will take 007 in 2016. We're gearing up to an (internal) v1.0.0 release, and after that we have various language/macro ideas that we want to try out. We'll see how soon we manage to make 007 bootstrap its runtime and parser.

I want to take this opportunity to thank Filip Sergot and vendethiel for commits, pull requests, and discussions. Together we form both the developer team and user community of 007. 哈哈