/**
# Accurate mental model for Rust's reference types

<sup>*by [David Tolnay], 2019.10.01*</sup>

[David Tolnay]: https://github.com/dtolnay

<br>

Rust's [ownership and borrowing system][ownership] involves the use of
*references* to operate on borrowed data, and the type system distinguishes two
different fundamental reference types. In code they are spelled **`&T`** and
**`&mut T`**.

`&mut T` is commonly known as a "mutable reference" to data of type `T`. By
juxtaposition, `&T` is then an "immutable reference" or "const reference" to
`T`. These names are fine and reasonably intuitive for Rust beginners, but this
article lays out the motivation for preferring the names "shared reference" and
"exclusive reference" as you grow beyond the beginner stage and get into library
design and some more advanced aspects of the language.

[ownership]: https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html

<br>

## The beginner's understanding

As described in the [References and Borrowing][borrowing] chapter of the Rust
Book, a function that takes an argument by immutable reference is allowed to
read the data behind the reference:

[borrowing]: https://doc.rust-lang.org/book/ch04-02-references-and-borrowing.html

```
struct Point {
    x: u32,
    y: u32,
}

fn print_point(pt: &Point) {
    println!("x={} y={}", pt.x, pt.y);
}
```

but is not allowed to mutate that data:

```compile_fail
# struct Point {
#     x: u32,
#     y: u32,
# }
#
fn embiggen_x(pt: &Point) {
    pt.x = pt.x * 2;
}
```

```console
error[E0594]: cannot assign to `pt.x` which is behind a `&` reference
 --> src/main.rs
  |
1 | fn embiggen_x(pt: &Point) {
  |                   ------ help: consider changing this to be a mutable reference: `&mut Point`
2 |     pt.x = pt.x * 2;
  |     ^^^^^^^^^^^^^^^ `pt` is a `&` reference, so the data it refers to cannot be written
```

In order to mutate fields of a struct, or call mutating methods such as
appending to a vector, the argument must be taken by `&mut` reference.

```
# struct Point {
#     x: u32,
#     y: u32,
# }
#
fn embiggen_x(pt: &mut Point) {
    pt.x = pt.x * 2; // okay
}
```

This distinction, and the terminology of "immutable reference" and "mutable
reference", is typically adequate for writing one's first few toy programs with
Rust.

<br>

## It falls apart

Sooner or later you will encounter a library signature that flatly contradicts
the beginner's mental model of Rust references. Let's take a look at the `store`
method of [`AtomicU32`] from the standard library as one example of this. The
signature is:

[`AtomicU32`]: https://doc.rust-lang.org/std/sync/atomic/struct.AtomicU32.html

```
# struct AtomicU32;
#
impl AtomicU32 {
#   const IGNORE: &'static str = stringify! {
    pub fn store(&self, val: u32, order: Ordering);
#   };
}
```

You give it a u32 value, and it atomically changes the number inside the
`AtomicU32` to hold the value you gave. We might call the `store` method as:

```
# use std::sync::atomic::{AtomicU32, Ordering};
#
static COUNTER: AtomicU32 = AtomicU32::new(0);

fn reset() {
    COUNTER.store(0, Ordering::SeqCst);
}
```

The `Ordering` parameter can be ignored for the purpose of this discussion; it
has to do with the [C11 memory model for atomic operations][atomics].

[atomics]: https://doc.rust-lang.org/nomicon/atomics.html

But the fact that `AtomicU32::store` takes self by immutable reference **should
feel deeply uncomfortable** under the beginner's mental model. Sure the mutation
is done atomically, but how can it be correct that we mutate something under an
immutable reference? Is this a typo in the standard library? If intentional, it
certainly feels hacky, or even dangerous. How is this method safe? How is it not
undefined behavior?

For former C++ programmers it calls to mind certain abuses of `const_cast` in
C++, where maybe the author was never really sure whether they were violating
some esoteric language law that would break the behavior of the code later on,
even if it currently appears to work.

Certainly in C++ the atomic mutation methods like [`std::atomic<T>::store`] all
act on mutable references only. Storing through a const reference to a C++
atomic won't compile, as one should expect.

[`std::atomic<T>::store`]: https://en.cppreference.com/w/cpp/atomic/atomic/store

```cpp
// C++
#include <atomic>

void test(const std::atomic<unsigned>& val) {
  val.store(0);
}
```

```console
test.cc:4:7: error: no matching member function for call to 'store'
  val.store(0);
  ~~~~^~~~~
/usr/include/c++/5.4.0/bits/atomic_base.h:367:7: note: candidate function not viable: no known conversion from 'const std::atomic<unsigned int>' to 'std::__atomic_base<unsigned int>' for object argument
      store(__int_type __i, memory_order __m = memory_order_seq_cst) noexcept
      ^
/usr/include/c++/5.4.0/bits/atomic_base.h:378:7: note: candidate function not viable: no known conversion from 'const std::atomic<unsigned int>' to 'volatile std::__atomic_base<unsigned int>' for object argument
      store(__int_type __i,
      ^
```

Something is wrong. It turns out to be the beginner's understanding of what the
Rust `&` and `&mut` reference types mean.

<br>

## Better names

`&T` is not an "immutable reference" or "const reference" to data of type `T` —
it is a "shared reference". And `&mut T` is not a "mutable reference" — it is an
"exclusive reference".

An exclusive reference means that no other reference to the same value could
possibly exist at the same time. A shared reference means that other references
to the same value *might* exist, possibly on other threads (if `T` implements
`Sync`) or the caller's stack frame on the current thread. Guaranteeing that
exclusive references really are exclusive is one of the key roles of the Rust
borrow checker.

Let's stare at the signature of `AtomicU32::store` again.

```
# struct AtomicU32;
#
impl AtomicU32 {
#   const IGNORE: &'static str = stringify! {
    pub fn store(&self, val: u32, order: Ordering);
#   };
}
```

This time **it should feel totally natural** that this function takes the atomic
u32 by shared reference. *Of course* this function is fine with other references
to the same `AtomicU32` existing at the same time. *The whole point* of atomics
is allowing concurrent loads and stores without inducing a data race. If the
library refused to allow other references to exist during the call to `store`,
there would hardly be a point to doing it atomically.

The reason exclusive references always behave as mutable is because if no other
code is looking at the same data, we won't cause a data race by mutating it
care-free. A data race is when data is operated on from two or more places at
the same time and at least one is mutating, producing unspecifiable results or
memory unsafety. But via atomics or other forms of interior mutability discussed
below, mutating through a shared reference can be safe too.

Fully internalizing the terminology "shared reference" and "exclusive
reference", learning to think in terms of them, is an important milestone in
learning to make the most of Rust and its tremendous safety guarantees.

<br>

## Pedagogy

I don't think it is bad for `&` and `&mut` to be introduced at first as
immutable vs mutable references. The learning curve is difficult enough without
frontloading the content of this article. As far as a beginner would be
concerned, ability to mutate will be the most significant practical difference
between the two reference types.

What I would like to accomplish with this page is to establish that shifting
from the "immutable reference"/"mutable reference" mental model to the "shared
reference"/"exclusive reference" mental model is a necessary step that learners
should be encouraged to take at the right time, and for this page to help them
take it.
A good time to link someone to this page is when they are first surprised or
confused by some library function taking `&` when they would expect it to
require `&mut`.

After someone has internalized references as being about shared vs exclusive
access, I think it is fine to continue saying "mutable reference" as a
convenience since the keyword is `mut` after all; just keep in mind that data
behind a shared reference *may also* be mutable sometimes. On the other hand for
shared references I would recommend to always think and say "shared reference"
rather than "immutable reference" or "const reference".

<br>

## Addendum: interior mutability

The term for safe APIs that support mutation through a shared reference in Rust
is "interior mutability".

I used `AtomicU32` as an example above because I find that it evokes the most
striking rift between deeply-uncomfortable and totally-natural as you shift from
the beginner's mental model to the correct one. While atomics are an important
building block for multithreaded code, interior mutability is equally relevant
on a single thread as well.

The standard library type [`UnsafeCell<T>`] is *the only* way to hold data that
is mutable through a shared reference. This is an unsafe low-level building
block that we would almost never use directly. All other interior mutability is
built as safe abstractions around an `UnsafeCell`, with a variety of properties
and requirements as appropriate to different use cases. (Fundamentally Rust is a
language for building safe abstractions, and this is one of the areas where that
is most apparent.)

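
To make the idea of a safe abstraction around `UnsafeCell` concrete, here is a
minimal sketch. The `Counter` type and the `two_handles` helper are hypothetical
names invented for this illustration; they are not part of the standard library,
and real code would normally reach for an existing type instead of writing
`unsafe` by hand.

```rust
use std::cell::UnsafeCell;

// Hypothetical single-threaded counter, for illustration only.
struct Counter {
    value: UnsafeCell<u32>,
}

impl Counter {
    fn new(value: u32) -> Self {
        Counter { value: UnsafeCell::new(value) }
    }

    // Mutates through a shared reference.
    fn increment(&self) {
        // SAFETY: `UnsafeCell<u32>` is not `Sync`, so `Counter` is not `Sync`
        // either and no other thread can touch `value`. On this thread we
        // never hand out a reference to the inner `u32`, so this write cannot
        // invalidate one.
        unsafe {
            *self.value.get() += 1;
        }
    }

    // Reads by copying the value out, never by lending a reference.
    fn get(&self) -> u32 {
        // SAFETY: same reasoning as in `increment`.
        unsafe { *self.value.get() }
    }
}

// Two shared references to the same counter, both used to mutate it.
fn two_handles() -> u32 {
    let counter = Counter::new(0);
    let a = &counter;
    let b = &counter;
    a.increment();
    b.increment();
    counter.get()
}
```

The `unsafe` blocks are confined to two small methods, and the public surface
(`new`, `increment`, `get`) cannot be misused from safe code under the
assumptions stated in the comments.
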

[`UnsafeCell<T>`]: https://doc.rust-lang.org/std/cell/struct.UnsafeCell.html

Beyond atomics, other safe abstractions in the standard library built on
interior mutability include:

- [`Cell<T>`] — we can perform mutation even when other references to the same
  `Cell<T>` may exist, and it's safe because the API enforces:

  - it's impossible for more than one thread to hold references to the same
    `Cell<T>` at a time because `Cell<T>` does not implement the `Sync` trait,
    i.e. `Cell<T>` is single threaded;

  - and it's impossible to obtain a reference to the contents within the
    `Cell<T>`, as such references could be invalidated by a mutation; instead
    all access is done by copying data out of the cell.

- [`RefCell<T>`] — we can perform mutation even when other references to the
  same `RefCell<T>` may exist, and it's safe because the API enforces:

  - `RefCell<T>` is single threaded so it's impossible for multiple threads to
    refer to the same one, similar to `Cell<T>`;

  - and within the one thread, dynamically checked borrow rules will detect and
    prevent attempts to mutate while a reader is holding a reference into the
    content of the `RefCell`.

- [`Mutex<T>`] — we can perform mutation even when other references to the same
  `Mutex<T>` may exist, and it's safe because the API enforces:

  - only one of the references may operate on the inner `T` at a time, whether
    reading or writing; other accesses will block until the current one has
    released its lock.

- [`RwLock<T>`] — we can perform mutation even when other references to the same
  `RwLock<T>` may exist, and it's safe because the API enforces:

  - only one of the references may be used to mutate the `T` at a time, and only
    while no other references are being used for reading; accesses will block to
    meet these requirements.

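
As a rough sketch of how the four types listed above look in use, each helper
below takes only a shared reference and still mutates the value behind it. The
function names (`bump_cell`, `push_refcell`, `bump_mutex`, `bump_rwlock`,
`demo`) are made up for this example; the `std` calls are the ordinary
`Cell`, `RefCell`, `Mutex` and `RwLock` methods.

```rust
use std::cell::{Cell, RefCell};
use std::sync::{Mutex, RwLock};

// `Cell`: values are copied in and out; no reference to the contents escapes.
fn bump_cell(hits: &Cell<u32>) {
    hits.set(hits.get() + 1);
}

// `RefCell`: borrows are checked at run time; `borrow_mut` panics if the
// contents are already borrowed.
fn push_refcell(log: &RefCell<Vec<u32>>, value: u32) {
    log.borrow_mut().push(value);
}

// `Mutex`: `lock` blocks until no other holder has the lock. It returns `Err`
// only if a previous holder panicked while holding it.
fn bump_mutex(total: &Mutex<u32>) {
    *total.lock().unwrap() += 1;
}

// `RwLock`: `write` blocks until there are no readers and no other writer.
fn bump_rwlock(total: &RwLock<u32>) {
    *total.write().unwrap() += 1;
}

// Hypothetical driver that exercises all four helpers on one thread.
fn demo() -> (u32, Vec<u32>, u32, u32) {
    let hits = Cell::new(0);
    let log = RefCell::new(Vec::new());
    let locked = Mutex::new(0);
    let guarded = RwLock::new(0);

    bump_cell(&hits);
    bump_cell(&hits);
    push_refcell(&log, 7);
    bump_mutex(&locked);
    bump_rwlock(&guarded);

    (
        hits.get(),
        log.into_inner(),
        locked.into_inner().unwrap(),
        guarded.into_inner().unwrap(),
    )
}
```

None of the helpers needs `&mut`, which is the point: the signature says
"shared", and each type's API supplies the rule that keeps the mutation safe.
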

[`Cell<T>`]: https://doc.rust-lang.org/std/cell/struct.Cell.html
[`RefCell<T>`]: https://doc.rust-lang.org/std/cell/struct.RefCell.html
[`Mutex<T>`]: https://doc.rust-lang.org/std/sync/struct.Mutex.html
[`RwLock<T>`]: https://doc.rust-lang.org/std/sync/struct.RwLock.html
*/
#[macro_export]
macro_rules! _02__reference_types {
    ({
        date: "October 1, 2019",
        author: "David Tolnay",
    }) => {};
}