tim | TMI: More typechecker refactoring

Almost finished with the big typechecker refactor. Or the typechecker refactor of moderate size, anyway. Getting libcore to compile with the refactored typechecker was painful, but once I got there, I got all the way through building stage1 and most of the run-pass tests! The first round of tests that failed were:

failures:
    [run-pass] /Users/tchevalier/rust3/src/test/run-pass/unreachable-code-1.rs
    [run-pass] /Users/tchevalier/rust3/src/test/run-pass/unreachable-code.rs
    [run-pass] /Users/tchevalier/rust3/src/test/run-pass/weird-exprs.rs

There was something satisfying about that. Little weird-exprs tests doing their job.

As for why the unreachable-code tests failed: trans (the part of rustc that translates Rust to LLVM) tries to use type information to avoid generating code for unreachable Rust expressions. This is all pretty ad hoc, but since we have the _|_ type in the type system, why not use it?

The expression in this case looks like return + 1. In the old typechecker, if you looked up the type for this instance of 1, it would have been int or uint. But return always returns and arguments to + in Rust are always evaluated left to right, so we know that the 1 is unreachable here. So the new typechecker says 1 has type _|_ in this context.

That caused the code to translate int literals to complain, since it's not expecting an int to have type _|_. Why would anyone write code like this, you ask? They probably wouldn't, but macros could.)

So I'll just make trans more consistent in when it avoids generating code for unreachable expressions. Arguably it might be better to make trans not approximate what code is unreachable and just treat _|_-typed things as if they are reachable and generate code for them, but that's a change for another day.

Flat | Top-Level Comments Only

From:

cxcvi

It's at this point that I feel somewhat compelled to ask what exactly the _|_ type means. And how it's supposed to be pronounced.

tim

Sure! _|_ is pronounced "bottom" and at least here, it means the type of values whose evaluation never returns to the caller (immediate context). So return has type _|_ because it returns immediately to the callee -- and fail!() expands to code using a primitive operation with type _|_, because it halts the program. Keep in mind that this is different from the type () (nil) -- let x = (); binds x to a value of type nil, whereas let x = return; pops the current stack frame before x has a chance to get bound.

We care because currently, the Rust typechecker looks for unreachable expressions -- ones that would have to be preceded by evaluating a value of type _|_ in the same context -- and warns about them. Though this should really be done outside the typechecker anyway.

The term comes from the theory of lattices and partial orders, and I guess Wikipedia is as good for the background on that as anything, though you don't have to understand all the math to understand how it works in programming languages (as usual).

sanxiyn

http://en.wikipedia.org/wiki/Bottom_type is probably a better Wikipedia reference.

etb

(Disclaimer: I don't actually know anything about Rust.)

I think all you need to know about partial orders is that ⊥ is below everything in the subtyping relation (which is a partial order). And ⊥ is below everything because it's uninhabited; the empty set is a subset of every set.

(I guess you mean the type of expressions (statements?) whose evaluation never returns? Values should be already evaluated, and you'll never actually get a value of type ⊥.)

Whoops, yes -- expressions. (In Rust, almost everything is an expression and a statement is just an expression with a semicolon after it, or a let-binding.)

Arguably it might be better to make trans not approximate what code is unreachable

You've mentioned possibly wanting to add some kind of refinement types to Rust, so I just want to point out: If you have refinement types, you'll probably want to use them to write matches that aren't trivially exhaustive (but are exhaustive given the refinement type), like

  case (xs : nonemptylist) of
   cons(h,t) ⇒ e1

(ML syntax, sorry.) And you'd also want to not generate any code for the nil arm if it's given:

  case (xs : nonemptylist) of
    cons(h,t) ⇒ e1
  | nil ⇒ e2

because e2 will be unreachable. (Depending on exactly how you arrange the type system, you might not even be able to generate code from e2.)

That's true. In fact, one reason why typestate was less than useful when we had typestate was that codegen wasn't taking any of the typestate information into account, and in particular, couldn't eliminate patterns that typestate "knows" are unreachable. (It couldn't do that because our typestate predicates weren't guaranteed to actually mean anything, being written in Rust itself rather than in a less powerful subset, so it would have been unsound.)

If you're curious, Niko proposed this take on datasort refinements back in August, and we haven't discussed it much since then but I personally want it or something like it to be in some Rust version greater than 1.0.

I just read Niko's post, and (if I understood it, which I may not have) it seems like a lot of work for something that's not that expressive. It's datasort refinements where the refinements have to just be subsets according to the head constructor. Then you can't express things like "list of booleans of even parity" (because whether something has even parity depends on the parity of the tail) or the colour invariant of red-black trees (because it depends on the colours of the children).

Maybe more importantly, you can't use that kind of refinement to express different sublanguages of ASTs—say you're writing a compiler, and you have some AST constructors that correspond to syntactic sugar ("derived forms"). Some pass is going to get rid of all the derived forms, and you want to write the subsequent passes without bothering with the cases for the derived forms, because you know they're impossible. If your refinements can look only at the head constructor, you can rule out derived forms at the top level of the case, but the type system doesn't know that the derived forms don't occur further down the AST. You can express that something is a binop, but you can't express that it's binops all the way down.

I was going to say that full datasort refinements shouldn't be much harder, but then I remembered that then you would need intersection types. (What? not everyone has those already?)

Maybe more importantly, you can't use that kind of refinement to express different sublanguages of ASTs [...]

Hmm, I think you're right about that, though I need to think about it more. And that would be too bad, since what you're describing about derived forms is exactly one of the sorts of things I'd like to use refinements for in the Rust compiler. More study needed, obviously.

As far as things like even-length lists (or vectors) or red-black tree invariants, I'm not sure that expressing these is a priority, because we're trying to focus on expressing patterns that occur frequently in systems programs. Obviously, you might use a red-black tree when writing an OS, but I'm not sure how urgent it would be, for people really writing systems code, to verify its correctness through types (rather than testing). In Rust we're focusing on trying to capture patterns that already arise in C and C++ programs and make them checkable by the compiler -- while integrating more ambitious static property checking on top of Rust would be an interesting research project for somebody, I'm not sure how much of a priority it is for us. I personally want refinements to eliminate "fail, the impossible happened" cases in the compiler, which could be addressed with subsets of enums (even if it's not exactly clear right now how to carve out a system that's just powerful enough).

S	M	T	W	T	F	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30

Tim's journal

Taking metaphors too far since 1995

TMI: More typechecker refactoring

(no subject)

(no subject)

(no subject)

(no subject)

(no subject)

(no subject)

(no subject)

(no subject)

(no subject)

Profile

November 2021

Links

Most Popular Tags

Page Summary

Style Credit

Expand Cut Tags