Tim Chevalier ([personal profile] tim) wrote, 2012-07-18 11:29 pm

TMI: Tired of Swordfighting

I haven't been writing anything in a while. Honestly, I've been experiencing burnout. It's not that I've been working too hard or anything like that; just that when a colleague tells you "we don't want you around", it's harder to convince yourself to go to work in the morning, or even in the afternoon.

But I should talk about the new and exciting thing I'm working on. The Rust compiler is slow, and while there are any number of reasons for that, one of them is that it recompiles the entire crate every time you change any item, even in a way that's unobservable by any code that depends on the item. Compilers like GHC, with its --make mode, and SML/NJ, with its compilation manager, have been doing incremental recompilation for a long time, so there's no good excuse for us not to do the same for Rust. Incremental recompilation means that the compiler automatically determines dependencies between declared functions, and avoids recompiling files if they haven't changed since the last build and if nothing they depend on has changed either. For example, if I added a #debug statement to the body of a Rust function and made no other changes, it would be safe to recompile only the code for that function.
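To make that concrete, here's a minimal sketch of the decision the compiler has to make -- an item gets rebuilt only if its own fingerprint changed since the last build, or if something it (transitively) depends on did. This is written in present-day Rust rather than the dialect used in the snippets below, and every name in it (Fingerprint, items_to_recompile) is made up for illustration; it's not rustc code.

use std::collections::{HashMap, HashSet};

/// Hypothetical per-item fingerprint recorded at the end of the last build.
type Fingerprint = u64;

/// Decide which items need recompiling, given the current fingerprints,
/// the fingerprints saved from the previous build, and a map from each
/// item to the items it depends on.
fn items_to_recompile(
    current: &HashMap<String, Fingerprint>,
    previous: &HashMap<String, Fingerprint>,
    deps: &HashMap<String, Vec<String>>,
) -> HashSet<String> {
    // An item is dirty if its own fingerprint changed (or it is new) ...
    let mut dirty: HashSet<String> = current
        .iter()
        .filter(|(name, fp)| previous.get(name.as_str()) != Some(*fp))
        .map(|(name, _)| name.clone())
        .collect();

    // ... or if anything it depends on is dirty; propagate to a fixed point.
    // (Quadratic in the worst case, which is fine for a sketch.)
    loop {
        let mut changed = false;
        for (item, item_deps) in deps {
            if !dirty.contains(item) && item_deps.iter().any(|d| dirty.contains(d)) {
                dirty.insert(item.clone());
                changed = true;
            }
        }
        if !changed {
            break;
        }
    }
    dirty
}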

That sounds easy enough, but in Rust, determining what it means for one item to depend on another isn't straightforward. For one thing, there's more than one kind of dependency. For example, in:

fn f(x: int) {
   g(x);
}

fn g(x: int) {
   #error(x);
}
it's obvious that f depends on g in the sense that if g's type changes -- for example, if the programmer changed its signature to:
fn g(x: bool) { ...
-- the compiler needs to re-typecheck, and therefore recompile, f. But if g's body changes while its type stays the same, there's no need to recompile f. On the other hand, in:
fn f(x: int) {
   g(x);
}

fn g<T>(_x: T) {
    #error("Hello!");
}
f depends on g more strongly, because g is polymorphic and therefore the compiler effectively inlines its body into f; we can imagine the generated code looking as if it was compiled from:
fn f(x: int) {
   #error("Hello!");
}
If we changed g to:
fn g<T>(_x: T) {
   #error("Hi!");
}
and we also assumed that because g's type hasn't changed, f shouldn't be recompiled, the output -- "Hello!" instead of "Hi!" -- would probably surprise the programmer.
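One way to pin down the difference between the two kinds of dependency is to fingerprint an item's signature and its body separately, and to tag each dependency edge with which parts matter. The following is only a sketch under that assumption, in present-day Rust -- ItemHashes, DepKind, and must_recompile are invented names, not anything in the compiler:

/// Hypothetical fingerprints for one item, split the way the two examples
/// above suggest: one hash for the externally visible signature, one for
/// the body.
struct ItemHashes {
    signature: u64,
    body: u64,
}

/// How one item depends on another.
enum DepKind {
    /// Only the callee's type matters (ordinary call to a monomorphic fn).
    Signature,
    /// The callee's body is effectively inlined (generic fn), so body
    /// changes matter too.
    Inlined,
}

/// Does a change in the callee force the caller to be recompiled?
fn must_recompile(kind: &DepKind, old: &ItemHashes, new: &ItemHashes) -> bool {
    match kind {
        DepKind::Signature => old.signature != new.signature,
        DepKind::Inlined => old.signature != new.signature || old.body != new.body,
    }
}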

I'm also dancing around the issue of whether the unit of incremental compilation is the item (recompile code for item J only if any items that J depends on changed) or the file (recompile all the items in file F only if for some item J in F, one of J's dependencies changed). It's not clear to me right now whether the compile-time performance savings from doing the former would justify the added complexity involved.
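To spell out what the two granularities mean in code: with item granularity, the set of things to rebuild is exactly the dirty items; with file granularity, it balloons to every item that happens to share a file with a dirty one. Again a sketch with made-up names, shown only to make the trade-off concrete:

use std::collections::{HashMap, HashSet};

/// Given the dirty items and a map from item to the file it lives in,
/// compute what actually gets recompiled under file-level granularity:
/// every item in any file that contains at least one dirty item.
fn dirty_under_file_granularity(
    dirty_items: &HashSet<String>,
    item_to_file: &HashMap<String, String>,
) -> HashSet<String> {
    let dirty_files: HashSet<&String> = item_to_file
        .iter()
        .filter(|(item, _)| dirty_items.contains(item.as_str()))
        .map(|(_, file)| file)
        .collect();

    item_to_file
        .iter()
        .filter(|(_, file)| dirty_files.contains(file))
        .map(|(item, _)| item.clone())
        .collect()
}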

For now, all I'm doing is trying to write code to generate item dependencies and display them as a graph (something that's useful in and of itself). I'll freely admit to not knowing exactly where I'm going with this yet -- discussions that have yet to happen with people who know more about the compiler pipeline as a whole than I do will help with that -- but I'm strongly motivated by wanting to make rebuilding rustc faster, since getting blocked for several minutes every time I need to recompile is a big productivity killer after a while. And the xkcd "I'm not slacking off, my code is compiling" comic stopped being funny a long time ago.
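Since the immediate goal is just being able to look at the graph, the output could be something as simple as Graphviz DOT. A sketch of that last step, with hypothetical names and the hard part -- actually walking the AST and resolved paths to collect the edges -- left out:

use std::collections::HashMap;

/// Render an item-dependency map as Graphviz DOT, so it can be piped
/// through `dot -Tpng` and looked at.
fn deps_to_dot(deps: &HashMap<String, Vec<String>>) -> String {
    let mut out = String::from("digraph deps {\n");
    for (item, item_deps) in deps {
        for dep in item_deps {
            out.push_str(&format!("    \"{}\" -> \"{}\";\n", item, dep));
        }
    }
    out.push_str("}\n");
    out
}

fn main() {
    // Tiny example corresponding to the f/g snippets above.
    let mut deps = HashMap::new();
    deps.insert("f".to_string(), vec!["g".to_string()]);
    deps.insert("g".to_string(), vec![]);
    println!("{}", deps_to_dot(&deps));
}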

Dependencies

[personal profile] nikomatsakis 2012-07-19 01:42 pm (UTC)
It seems to me there are three options for caching the compiled code:

- ASTs
- Generated LLVM bitcode
- Compiled code

ASTs are probably useless since most of our time is spent in trans and LLVM. Generated bitcode avoids re-running trans and avoids issues around inlining as well, but still requires that we pay the full LLVM price. Caching compiled code will, I would guess, be the most complex and result in slower executables since LLVM will not be optimizing the whole crate as a unit, though if we're doing a non-optimizing build that's obviously fine. I kind of think caching LLVM bitcode is the best "bang for our buck" but I could be persuaded otherwise.

One other comment on items vs files: in general, rustc quickly forgets the file from which code originated, so I actually think doing the caching at the level of items or modules will be the easiest.
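To make the "cache generated bitcode" option a bit more concrete: one could imagine a cache keyed on an item's path plus a fingerprint covering its signature, its body, and the bodies of anything inlined into it, mapping to the .bc file produced last time. The sketch below (in present-day Rust) is purely hypothetical, names and structure alike, and says nothing about how trans actually organizes its output:

use std::collections::HashMap;
use std::path::PathBuf;

/// Hypothetical per-item bitcode cache: maps an item's path (e.g.
/// "mycrate::f") plus a fingerprint of everything that affects its
/// translation to the .bc file produced last time.
struct BitcodeCache {
    entries: HashMap<(String, u64), PathBuf>,
}

impl BitcodeCache {
    /// Reuse the cached bitcode if this item (and its inlined dependencies)
    /// hashes the same as last build; otherwise the caller must re-run
    /// trans and then call `store` with the fresh result.
    fn lookup(&self, item_path: &str, fingerprint: u64) -> Option<&PathBuf> {
        self.entries.get(&(item_path.to_string(), fingerprint))
    }

    fn store(&mut self, item_path: &str, fingerprint: u64, bc_file: PathBuf) {
        self.entries.insert((item_path.to_string(), fingerprint), bc_file);
    }
}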