tim | TMI: More typechecker refactoring

Almost finished with the big typechecker refactor. Or the typechecker refactor of moderate size, anyway. Getting libcore to compile with the refactored typechecker was painful, but once I got there, I got all the way through building stage1 and most of the run-pass tests! The first round of tests that failed were:

failures:
    [run-pass] /Users/tchevalier/rust3/src/test/run-pass/unreachable-code-1.rs
    [run-pass] /Users/tchevalier/rust3/src/test/run-pass/unreachable-code.rs
    [run-pass] /Users/tchevalier/rust3/src/test/run-pass/weird-exprs.rs

There was something satisfying about that. Little weird-exprs tests doing their job.

As for why the unreachable-code tests failed: trans (the part of rustc that translates Rust to LLVM) tries to use type information to avoid generating code for unreachable Rust expressions. This is all pretty ad hoc, but since we have the _|_ type in the type system, why not use it?

The expression in this case looks like return + 1. In the old typechecker, if you looked up the type for this instance of 1, it would have been int or uint. But return always returns and arguments to + in Rust are always evaluated left to right, so we know that the 1 is unreachable here. So the new typechecker says 1 has type _|_ in this context.

That caused the code to translate int literals to complain, since it's not expecting an int to have type _|_. Why would anyone write code like this, you ask? They probably wouldn't, but macros could.)

So I'll just make trans more consistent in when it avoids generating code for unreachable expressions. Arguably it might be better to make trans not approximate what code is unreachable and just treat _|_-typed things as if they are reachable and generate code for them, but that's a change for another day.

Flat | Top-Level Comments Only

From:

tim

That's true. In fact, one reason why typestate was less than useful when we had typestate was that codegen wasn't taking any of the typestate information into account, and in particular, couldn't eliminate patterns that typestate "knows" are unreachable. (It couldn't do that because our typestate predicates weren't guaranteed to actually mean anything, being written in Rust itself rather than in a less powerful subset, so it would have been unsound.)

If you're curious, Niko proposed this take on datasort refinements back in August, and we haven't discussed it much since then but I personally want it or something like it to be in some Rust version greater than 1.0.

etb

I just read Niko's post, and (if I understood it, which I may not have) it seems like a lot of work for something that's not that expressive. It's datasort refinements where the refinements have to just be subsets according to the head constructor. Then you can't express things like "list of booleans of even parity" (because whether something has even parity depends on the parity of the tail) or the colour invariant of red-black trees (because it depends on the colours of the children).

Maybe more importantly, you can't use that kind of refinement to express different sublanguages of ASTs—say you're writing a compiler, and you have some AST constructors that correspond to syntactic sugar ("derived forms"). Some pass is going to get rid of all the derived forms, and you want to write the subsequent passes without bothering with the cases for the derived forms, because you know they're impossible. If your refinements can look only at the head constructor, you can rule out derived forms at the top level of the case, but the type system doesn't know that the derived forms don't occur further down the AST. You can express that something is a binop, but you can't express that it's binops all the way down.

I was going to say that full datasort refinements shouldn't be much harder, but then I remembered that then you would need intersection types. (What? not everyone has those already?)

Maybe more importantly, you can't use that kind of refinement to express different sublanguages of ASTs [...]

Hmm, I think you're right about that, though I need to think about it more. And that would be too bad, since what you're describing about derived forms is exactly one of the sorts of things I'd like to use refinements for in the Rust compiler. More study needed, obviously.

As far as things like even-length lists (or vectors) or red-black tree invariants, I'm not sure that expressing these is a priority, because we're trying to focus on expressing patterns that occur frequently in systems programs. Obviously, you might use a red-black tree when writing an OS, but I'm not sure how urgent it would be, for people really writing systems code, to verify its correctness through types (rather than testing). In Rust we're focusing on trying to capture patterns that already arise in C and C++ programs and make them checkable by the compiler -- while integrating more ambitious static property checking on top of Rust would be an interesting research project for somebody, I'm not sure how much of a priority it is for us. I personally want refinements to eliminate "fail, the impossible happened" cases in the compiler, which could be addressed with subsets of enums (even if it's not exactly clear right now how to carve out a system that's just powerful enough).

S	M	T	W	T	F	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30

Tim's journal

Taking metaphors too far since 1995

TMI: More typechecker refactoring

(no subject)

(no subject)

(no subject)

Profile

November 2021

Links

Most Popular Tags

Page Summary

Style Credit

Expand Cut Tags