Strangely Consistent

Musings about programming, Perl 6, and programming Perl 6

Refactoring the universe

I'm here to share a thing I'm working on, and chew gum; and I'm all out of gum. The purpose of this post is both to break a disagreeable silence that has befallen this blog, and to be able to geek out about a niche topic here in writing, partially sparing friends and family.

I'm currently refactoring 007's type system in a branch. Basically, ever since 007 was created, a type in 007-land has corresponded to a class/role in Perl 6, the host system that's implementing 007.

Here's what the implementation of Int looks like currently:

class Val::Int does Val {
    has Int $.value;

    method truthy {
        ?$.value;
    }
}

And here's what it looks like as I'm doing the refactor:

constant TYPE is export = {};

BEGIN {
    # ...
    TYPE<Int> = make-type "Int", :backed;
    # ...
}

So, instead of corresponding to types on the host level, all the 007 types are about to correspond to values. The former implementation was the one that felt obvious at the time (four-plus years ago), but it's become blindingly, painstakingly obvious that it really needs to be the latter.

Here's why: as soon as you want to implement class declarations in 007, in the former model you also need to bend over backwards and come up with an entirely new type in the host system. The Perl 6 code to do that looks like this:

return $.type.new(:type(EVAL qq[class :: \{
    method attributes \{ () \}
    method ^name(\$) \{ "{$name}" \}
\}]));

Which is... even someone as EVAL-positive as I wishes for a less clunky solution.

In the new model, a new class comes down to calling make-type and dropping the result in that TYPE hash. (Wait. Or not even dropping it in the TYPE hash. That hash is only for things used by 007 itself, not for user types.)

This is a refactor I've tried once before, back in 2017, but I failed back then because the code got too complicated and ran too slow. This time around I have a much better feeling.

By the way, there's also an is-type subroutine, and similarly a make-int and an is-int subroutine, and so on for every registered type. I figure why not wrap those simple things up in very short function names. So far that turns out to have been a very good decision. "Fold the language of your domain model into your code", and so on.

This is one of the things I'm doing better this time around; last time one of the problems was that each line I touched got longer and messier because there were more layers of indirection to dig through. Concerns were scattered all over the place. This time, it feels like the codebase is getting simpler thanks to those named subroutines. Maybe it can be likened to putting all your database-specific code in one place.

I sometimes get slight vertigo due to the bootstrapping aspects of this type system. One example: Object is an instance of Type, but the base class of Type is Object — a circularity. But, it turns out, if you have absolute power over the object graph, you can always bend things to your will:

BEGIN {
    TYPE<Type> = _007::Value.new(:type(__ITSELF__), slots => { name => "Type" });
    TYPE<Object> = make-type "Object";
    {
        # Bootstrap: now that we have Object, let's make it the base of Type and Object
        TYPE<Type>.slots<base> = TYPE<Object>;
        TYPE<Object>.slots<base> = TYPE<Object>;
    }
    # ...
}

I'm slightly reminded of a thing Gilad Bracha wrote once (which I can't find the exact quote for, unfortunately): that if mutual dependencies and recursive definitions are something that stump you, what you need is a healthy dose of letrec. It's twisty, yes, but it's basically a solved problem.

Like last time, I'm tackling the big task in small steps, one type at a time. I feel I've learned this from Martin Fowler's concept of asset capture. The idea is to end up back with a running system with passing tests often. I do this by replacing one old thing at a time by a new thing. Sounds obvious, but I'm actually not sure I would have been sensible enough on my own to tackle it this way, had I not known about asset capture.

One drawback is that you're sort of running the old system and the new system in parallel, as the old one is being phased out. Only once the whole thing has been asset-captured can complexity associated with the old system be completely removed.

It's a pleasant way to work. To me it's been at least a partial answer to the problem of the big rewrite. If we're free to refactor the insides, we can successively arrive at a point where the new better thing has completely replaced the old thing. The way there is allowed to be a little bit more complex (on the inside) than either endpoint. Importantly, you keep a running system throughout.

I don't have a concluding thought, except to say that I just managed to asset-capture arrays. Which is harder than it sounds, because arrays are everywhere in the compiler and the runtime.

Has it been three years?

007, the toy language, is turning three today. Whoa.

On its one-year anniversary, I wrote a blog post to chronicle it. It seriously doesn't feel like two years since I wrote that post.

On and off, in between long stretches of just being a parent, I come back to 007 and work intensely on it. I can't remember ever keeping a side project alive for three years before. (Later note: Referring to the language here, not my son.) So there is that.

So in a weird way, even though the language is not as far along as I would expect it to be after three years, I'm also positively surprised that it still exists and is active after three years!

In the previous blog post, I proudly announce that "We're gearing up to an (internal) v1.0.0 release". Well, we're still gearing up for v1.0.0, and we are closer to it. The details are in the roadmap, which has become much more detailed since then.

Noteworthy things that happened in these past two years:

Things that I'm looking forward to right now:

I tried to write those in increasing order of difficulty.

All in all, I'm quite eager to one day burst into #perl6 or #perl6-dev and actually showcase examples where macros quite clearly do useful, non-trivial things. 007 is and has always been about producing such examples, and making them run in a real (if toy) environment.

And while we're not quite over that hump yet, we're perceptibly closer than we were two years ago.

Belated addendum: Thanks and hugs to sergot++, to vendethiel++, to raiph++ and eritain++, for sharing the journey with me so far.

The root of all eval

Ah, the eval function. Loved, hated. Mostly the latter.

$ perl -E'my $program = q[say "OH HAI"]; eval $program'
OH HAI

I was a bit stunned when the eval function was renamed to EVAL in Perl 6 (back in 2013, after spec discussion here). I've never felt really comfortable with the rationale for doing so. I seem to be more or less alone in this opinion, though, which is fine.

The rationale was "the function does something really weird, so we should flag it with upper case". Like we do with BEGIN and the other phasers, for example. With BEGIN and others, the upper-casing is motivated, I agree. A phaser takes you "outside of the normal control flow". The eval function doesn't.

Other things that we upper-case are things like .WHAT, which look like attributes but are really specially code-generated at compile-time into something completely different. So even there the upper-casing is motivated because something outside of the normal is happening.

eval in the end is just another function. Yes, it's a function with potentially quite wide-ranging side effects, that's true. But a lot of fairly standard functions have wide-ranging side effects. (To name a few: shell, die, exit.) You don't see anyone clamoring to upper-case those.

I guess it could be argued that eval is very special because it hooks into the compiler and runtime in ways that normal functions don't, and maybe can't. (This is also how TimToady explained it in the commit message of the renaming commit.) But that's an argument from implementation details, which doesn't feel satisfactory. It applies with equal force to the lower-cased functions just mentioned.

To add insult to injury, the renamed EVAL is also made deliberately harder to use:

$ perl6 -e'my $program = q[say "OH HAI"]; EVAL $program'
===SORRY!=== Error while compiling -e
EVAL is a very dangerous function!!! (use the MONKEY-SEE-NO-EVAL pragma to override this error,
but only if you're VERY sure your data contains no injection attacks)
at -e:1
------> program = q[say "OH HAI"]; EVAL $program⏏<EOL>

$ perl6 -e'use MONKEY-SEE-NO-EVAL; my $program = q[say "OH HAI"]; EVAL $program'
OH HAI

Firstly, injection attacks are a real issue, and no laughing matter. We should educate each other and newcomers about them.

Secondly, that error message ("EVAL is a very dangerous function!!!") is completely over-the-top in a way that damages rather than helps. I believe when we explain the dangers of code injection to people, we need to do it calmly and matter-of-factly. Not with three exclamation marks. The error message makes sense to someone who already knows about injection attacks; it provides no hints or clues for people who are unaware of the risks.

(The Perl 6 community is not unique in eval-hysteria. Yesterday I stumbled across a StackOverflow thread about how to turn a string with a type name into the corresponding constructor in JavaScript. Some unlucky soul suggested eval, and everybody else immediately piled on to point out how irresponsible that was. Solely as a knee-jerk reaction "because eval is bad".)

Thirdly, MONKEY-SEE-NO-EVAL. Please, can we just... not. 😓 Random reference to monkies and the weird attempt at levity while switching on a nuclear-chainsaw function aside, I find it odd that a function that enables EVAL is called something with NO-EVAL. That's not Least Surprise.

Anyway, the other day I realized how I can get around both the problem of the all-caps name and the problem of the necessary pragma:

$ perl6 -e'my &eval = &EVAL; my $program = q[say "OH HAI"]; eval $program'
OH HAI

I was so happy to realize this that I thought I'd blog about it. Apparently the very dangerous function (!!!) is fine again if we just give it back its old name. 😜