TMI: Bad copies
Jan. 11th, 2013 09:02 pm
While doing bug triage yesterday, I got sucked into the black hole of fixing bad copies. Actually, it wasn't that hard once I got the hang of it, but there was some fumbling involved.
To give some more context, Rust has three different kinds of pointers: ~, pronounced "owned"; @, pronounced "managed"; and &, pronounced "borrowed". So ~int is an owned pointer to a machine int; @int is a managed pointer to a machine int; and &int is a borrowed pointer to a machine int. @ pointers are garbage-collected and can be freely copied, while ~ pointers are guaranteed by the type system to always have a single "owner", meaning they can be freed at the end of the scope associated with the owner. Owned pointers can be "borrowed" as long as the span of time of the borrow can be statically guaranteed to be a sub-interval of the owner's lifetime.
Pushing so much of the memory model into the type system means you can relax while programming in Rust, knowing that the typechecker will likely catch mistakes involving undesired copying. By declaring data as type ~T for any type T, you're saying you don't want it to be copied (unless you really mean to). But when you don't care whether your pointers get copied or not, you can use @T and copy to your heart's content.
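Here's a rough sketch of what that looks like in practice. One big caveat: the ~ and @ sigils were later removed from the language, so this uses present-day Rust, with Box<T> standing in for ~T and Rc<T> standing in (loosely) for @T. Rc is reference-counted rather than garbage-collected, so it's an analogy and not an exact match. The helper names read_borrowed, consume_owned, and share_count are made up for illustration.

```rust
use std::rc::Rc;

// Hypothetical helper: borrows the int without taking ownership of it.
fn read_borrowed(x: &i32) -> i32 {
    *x
}

// Hypothetical helper: takes ownership of the box, so the allocation
// is freed when this function returns.
fn consume_owned(x: Box<i32>) -> i32 {
    *x
}

// Hypothetical helper: reports how many handles share one allocation.
fn share_count(x: &Rc<i32>) -> usize {
    Rc::strong_count(x)
}

fn main() {
    // Box<i32> plays the role of ~int: exactly one owner.
    let owned: Box<i32> = Box::new(1);

    // A borrow (&int) is fine as long as it ends before the owner goes away.
    let seen = read_borrowed(&owned);

    // Copying an owned box has to be spelled out explicitly.
    let explicit_copy = owned.clone();

    // Plain assignment moves ownership instead of copying.
    let moved = owned;
    // read_borrowed(&owned); // would not compile: `owned` was moved

    // Rc<i32> plays the role of @int: handles can be duplicated freely.
    let shared = Rc::new(2);
    let alias = Rc::clone(&shared);

    assert_eq!(seen, 1);
    assert_eq!(*explicit_copy, 1);
    assert_eq!(consume_owned(moved), 1);
    assert_eq!(share_count(&shared), 2);
    assert_eq!(*alias, 2);
}
```

The commented-out line is the interesting part: using the owned pointer after it has been moved is rejected at compile time, which is the kind of mistake the typechecker catches for you.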
Strings and vectors also come in three flavors, just like the ones for pointers.
I think this is all totally neato, but it was added to Rust fairly late in the language's history. So the Rust compiler has lots of awkward code that copies stuff around for no good reason other than history: for a while, there was only a ~str type (and in truth, @str still isn't well-supported), and references (borrowed pointers) weren't first-class. Now that these restrictions have been lifted, it should be straightforward to update the code to pass non-copyable types by reference and to use @-vectors and @-strings for references that we don't mind copying. Still, it's easy to make one change and find that, if you propagate changes naïvely, you're changing one of the most basic data types in the compiler. (I found this out when one of my changes inadvertently turned idents into @strs. I backed that out pretty quick.)
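To make the "bad copy" pattern concrete, here's a minimal sketch of the kind of fix I mean. It's written in present-day Rust (String and &str rather than ~str), and it isn't actual compiler code: the functions name_len_owned and name_len_borrowed are invented for the example.

```rust
// Before: the function demands an owned string, so a caller that still
// needs its string afterwards is forced to make a copy.
fn name_len_owned(name: String) -> usize {
    name.len()
}

// After: the function only borrows the string, so no copy is needed.
fn name_len_borrowed(name: &str) -> usize {
    name.len()
}

fn main() {
    let ident = String::from("main");

    // The bad copy: clone() allocates a second string just to pass it in.
    let a = name_len_owned(ident.clone());

    // The fix: pass a borrowed pointer to the existing string.
    let b = name_len_borrowed(&ident);

    assert_eq!(a, b);
    // `ident` is still usable here because it was never moved.
    println!("{} has length {}", ident, b);
}
```

The catch is that changing a signature like this ripples out to every caller, which is how a small fix can turn into a much bigger change than you meant to make.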
I finally got to a set of changes that still has a few copies, but fewer than where I started. Hopefully I've learned something from this adventure and will be able to progress more quickly later on.