Feb. 19th, 2013

tim: Tim with short hair, smiling, wearing a black jacket over a white T-shirt (Default)
I fixed #4736, which was easy enough, but that led to an adventure in refactoring the Rust typechecker.

Right now, most functions in the typechecker associate zero or more node IDs with types as a side effect (in a hash table that gets passed around), and return a bool. Every node in the abstract syntax tree (AST) has a node ID that identifies it uniquely.

The bool return value denotes whether or not the expression diverges. You can think of what that means like this: there's a primitive called fail, and fail or anything that definitively invokes fail diverges. (It's a little more complicated now because fail is a macro, but never you mind.) That information gets used to warn about statements that are statically known to be unreachable.

At the same time, one of the types that exists in the internal representation of types is ty_error, which represents an erroneous value. Normally a typechecker would abort at the first error it sees, but we want to report as many errors as possible during one compilation session, so we need a name for the type that erroneous subexpressions have. We need it because otherwise we end up reporting derived errors -- errors that are because ty_error doesn't match some other expected type. But if we saw a ty_error, we know we already reported an error about it, so we don't need to report lots more errors and overwhelm the programmer.

#4736 was because there was code that needed to execute to fill in certain data structures in error cases -- solely so we can keep typechecking and try to report more errors -- but it was getting skipped, reflecting an assumption that a certain error was fatal when it wasn't. At the same time, there were some missing checks for ty_error, so derived errors were getting reported.

But then I started thinking that I wanted to make the handling of ty_error more principled, so I changed the bool return value to a three-valued type that I called Outcome. An Outcome is either a ProperType (no errors), a BotType (saw something that diverges, which is not necessarily an error) or a TypeError (saw a type error). So in addition to the types that get written to tables as a side effect, typechecker functions now return an Outcome. This works, and is hopefully going to make it less likely for more derived errors to get reported accidentally.

But then I thought: why do we need a separate return flag in the first place? No function should ever return a type with a sub-component that is ty_error -- in that case, it should just return ty_error. For ty_bot, it's a little more complicated: for example, the expression true || fail!() doesn't always diverge (in fact, it never does) even though one of its components does. Nonetheless, I think if an expression isn't known to diverge, we don't need to know if any of its subcomponents are known to diverge.

The reason why the flag needs to be there right now is that the typing rules (as manifested in the typechecker) are non-compositional. We need the flag because we need to track information about an expression that isn't reflected in its type. But I think that's a bad code smell.

So I'm going to take a stab at making the rules compositional, which means that typechecker functions will return unit instead of bool or Outcome. This shouldn't be hard, because the internal representation of types already has a flag that says whether it contains ty_error -- so I can just make the constructors smart and return ty_error for anything whose representation would contain ty_error (like ~[ty_error]). It's a little harder to handle ty_bot correctly, but it shouldn't be too much harder.

Profile

tim: Tim with short hair, smiling, wearing a black jacket over a white T-shirt (Default)
Tim Chevalier

November 2021

S M T W T F S
 123456
78 910111213
14151617181920
21222324252627
282930    

Most Popular Tags

Style Credit

Expand Cut Tags

No cut tags