Accurate mental model for Rust's reference types
by David Tolnay, 2019.10.01
Rust's ownership and borrowing system involves the use of references to operate on borrowed data, and the type system distinguishes two different fundamental reference types. In code they are spelled &T and &mut T.
&mut T is commonly known as a "mutable reference" to data of type T. By juxtaposition, &T is then an "immutable reference" or "const reference" to T. These names are fine and reasonably intuitive for Rust beginners, but this article lays out the motivation for preferring the names "shared reference" and "exclusive reference" as you grow beyond the beginner stage and get into library design and some more advanced aspects of the language.
The beginner's understanding
As described in the References and Borrowing chapter of the Rust Book, a function that takes an argument by immutable reference is allowed to read the data behind the reference:
struct Point {
    x: u32,
    y: u32,
}

fn print_point(pt: &Point) {
    println!("x={} y={}", pt.x, pt.y);
}
but is not allowed to mutate that data:
fn embiggen_x(pt: &Point) {
    pt.x = pt.x * 2;
}
error[E0594]: cannot assign to `pt.x` which is behind a `&` reference
 --> src/main.rs
  |
1 | fn embiggen_x(pt: &Point) {
  |                   ------ help: consider changing this to be a mutable reference: `&mut Point`
2 |     pt.x = pt.x * 2;
  |     ^^^^^^^^^^^^^^^ `pt` is a `&` reference, so the data it refers to cannot be written
In order to mutate fields of a struct, or call mutating methods such as appending to a vector, the argument must be taken by &mut reference.
fn embiggen_x(pt: &mut Point) {
    pt.x = pt.x * 2; // okay
}
This distinction, and the terminology of "immutable reference" and "mutable reference", is typically adequate for writing one's first few toy programs with Rust.
It falls apart
Sooner or later you will encounter a library signature that flatly contradicts the beginner's mental model of Rust references. Let's take a look at the store method of AtomicU32 from the standard library as one example of this. The signature is:
impl AtomicU32 {
    pub fn store(&self, val: u32, order: Ordering);
}
You give it a u32 value, and it atomically changes the number inside the AtomicU32 to hold the value you gave. We might call the store method as:
use std::sync::atomic::{AtomicU32, Ordering};

static COUNTER: AtomicU32 = AtomicU32::new(0);

fn reset() {
    COUNTER.store(0, Ordering::SeqCst);
}
The Ordering parameter can be ignored for the purpose of this discussion; it has to do with the C11 memory model for atomic operations.
But the fact that AtomicU32::store takes self by immutable reference should feel deeply uncomfortable under the beginner's mental model. Sure the mutation is done atomically, but how can it be correct that we mutate something under an immutable reference? Is this a typo in the standard library? If intentional, it certainly feels hacky, or even dangerous. How is this method safe? How is it not undefined behavior?
For former C++ programmers it calls to mind certain abuses of const_cast in C++, where maybe the author was never really sure whether they were violating some esoteric language law that would break the behavior of the code later on, even if it currently appears to work.
Certainly in C++ the atomic mutation methods like std::atomic<T>::store all act on mutable references only. Storing through a const reference to a C++ atomic won't compile, as one should expect.
// C++
#include <atomic>
void test(const std::atomic<unsigned>& val) {
  val.store(0);
}
test.cc:4:7: error: no matching member function for call to 'store'
  val.store(0);
  ~~~~^~~~~
/usr/include/c++/5.4.0/bits/atomic_base.h:367:7: note: candidate function not viable: no known conversion from 'const std::atomic<unsigned int>' to 'std::__atomic_base<unsigned int>' for object argument
      store(__int_type __i, memory_order __m = memory_order_seq_cst) noexcept
      ^
/usr/include/c++/5.4.0/bits/atomic_base.h:378:7: note: candidate function not viable: no known conversion from 'const std::atomic<unsigned int>' to 'volatile std::__atomic_base<unsigned int>' for object argument
      store(__int_type __i,
      ^
Something is wrong. It turns out to be the beginner's understanding of what the Rust & and &mut reference types mean.
Better names
&T is not an "immutable reference" or "const reference" to data of type T; it is a "shared reference". And &mut T is not a "mutable reference"; it is an "exclusive reference".
An exclusive reference means that no other reference to the same value could possibly exist at the same time. A shared reference means that other references to the same value might exist, possibly on other threads (if T implements Sync) or the caller's stack frame on the current thread. Guaranteeing that exclusive references really are exclusive is one of the key roles of the Rust borrow checker.
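As a small sketch of this distinction: any number of shared references to a value can coexist, while an exclusive reference has to be the only reference in use for as long as it is live. The helper names sum_shared and bump_exclusive are made up for illustration. The commented-out line at the end shows the kind of code the borrow checker would reject.

```rust
// Hypothetical helpers for illustration; not part of any library.

// Both parameters may point at the same value: shared references can alias.
fn sum_shared(a: &u32, b: &u32) -> u32 {
    *a + *b
}

// While `n` is live, no other reference to the same u32 can be in use.
fn bump_exclusive(n: &mut u32) -> u32 {
    *n += 1;
    *n
}

fn main() {
    let mut value = 20;

    // Two shared references to the same value at the same time: fine.
    let first = &value;
    let second = &value;
    println!("sum = {}", sum_shared(first, second));

    // One exclusive reference: fine, because `first` and `second` are not
    // used again past this point.
    let only = &mut value;
    println!("bumped = {}", bump_exclusive(only));

    // Uncommenting the next line would keep `first` alive across the
    // exclusive borrow above, and the borrow checker would reject the program.
    // println!("{}", first);
}
```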
Let's stare at the signature of AtomicU32::store again.
impl AtomicU32 {
    pub fn store(&self, val: u32, order: Ordering);
}
This time it should feel totally natural that this function takes the atomic u32 by shared reference. Of course this function is fine with other references to the same AtomicU32 existing at the same time. The whole point of atomics is allowing concurrent loads and stores without inducing a data race. If the library refused to allow other references to exist during the call to store, there would hardly be a point to doing it atomically.
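To make that concrete, here is a minimal sketch in which several threads each hold a shared reference to the same AtomicU32 and mutate through it at the same time. The helper name count_concurrently is an assumption for illustration. It uses fetch_add rather than store so that the final value is deterministic; fetch_add is another AtomicU32 method that takes &self.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

// Hypothetical helper: every thread gets a shared reference to the same
// AtomicU32 and mutates through it concurrently.
fn count_concurrently(threads: u32, increments_per_thread: u32) -> u32 {
    let counter = AtomicU32::new(0);

    // Scoped threads are allowed to borrow `counter` from this stack frame.
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                // A plain shared reference is all that is needed to mutate.
                let shared: &AtomicU32 = &counter;
                for _ in 0..increments_per_thread {
                    shared.fetch_add(1, Ordering::SeqCst);
                }
            });
        }
    });

    // All scoped threads have been joined by the time scope() returns.
    counter.load(Ordering::SeqCst)
}

fn main() {
    println!("total = {}", count_concurrently(4, 1000));
}
```

No increment is lost, because each fetch_add is a single atomic read-modify-write, even though no thread ever held an exclusive reference to the counter.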
Exclusive references always behave as mutable because, if no other code is looking at the same data, we cannot cause a data race by mutating it freely. A data race occurs when data is accessed from two or more places at the same time and at least one of the accesses is a mutation, producing unpredictable results or memory unsafety. But via atomics or other forms of interior mutability discussed below, mutating through a shared reference can be safe too.
Fully internalizing the terminology "shared reference" and "exclusive reference", learning to think in terms of them, is an important milestone in learning to make the most of Rust and its tremendous safety guarantees.
Pedagogy
I don't think it is bad for & and &mut to be introduced at first as immutable vs mutable references. The learning curve is difficult enough without frontloading the content of this article. As far as a beginner would be concerned, ability to mutate will be the most significant practical difference between the two reference types.
What I would like to accomplish with this page is to establish that shifting from the "immutable reference"/"mutable reference" mental model to the "shared reference"/"exclusive reference" mental model is a necessary step that learners should be encouraged to take at the right time, and for this page to help them take it. A good time to link someone to this page is when they are first surprised or confused by some library function taking & when they would expect it to require &mut.
After someone has internalized references as being about shared vs exclusive access, I think it is fine to continue saying "mutable reference" as a convenience, since the keyword is mut after all; just keep in mind that data behind a shared reference may also be mutable sometimes. For shared references, on the other hand, I would recommend always thinking and saying "shared reference" rather than "immutable reference" or "const reference".
Addendum: interior mutability
The term for safe APIs that support mutation through a shared reference in Rust is "interior mutability".
I used AtomicU32 as an example above because I find that it evokes the most striking rift between deeply-uncomfortable and totally-natural as you shift from the beginner's mental model to the correct one. While atomics are an important building block for multithreaded code, interior mutability is equally relevant on a single thread.
The standard library type UnsafeCell<T> is the only way to hold data that is mutable through a shared reference. This is an unsafe low-level building block that we would almost never use directly. All other interior mutability is built as safe abstractions around an UnsafeCell, with a variety of properties and requirements as appropriate to different use cases. (Fundamentally Rust is a language for building safe abstractions, and this is one of the areas where that is most apparent.)
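As a rough sketch of what "a safe abstraction around an UnsafeCell" can look like, here is a deliberately tiny cell type. The name TinyCell is hypothetical and this is not how the standard library's Cell is written; it only illustrates the idea of keeping the unsafe code private behind an API that cannot be misused.

```rust
use std::cell::UnsafeCell;

// Hypothetical type for illustration; std::cell::Cell is the real thing.
pub struct TinyCell<T> {
    value: UnsafeCell<T>,
}

impl<T: Copy> TinyCell<T> {
    pub fn new(value: T) -> Self {
        TinyCell { value: UnsafeCell::new(value) }
    }

    // Copies the value out; never hands out a reference to the inside.
    pub fn get(&self) -> T {
        // SAFETY: TinyCell is not Sync (because UnsafeCell is not), so no
        // other thread can be touching the value, and this API never exposes
        // a reference to the inside that a write could invalidate.
        unsafe { *self.value.get() }
    }

    // Mutation through a shared reference.
    pub fn set(&self, new: T) {
        // SAFETY: same reasoning as in `get`.
        unsafe {
            *self.value.get() = new;
        }
    }
}

fn main() {
    let cell = TinyCell::new(1);
    let a = &cell;
    let b = &cell;
    a.set(5); // write through one shared reference
    println!("{}", b.get()); // read through another
}
```

The design choice is that callers only ever see safe methods; the restrictions (single threaded, copy in and copy out) are what make the unsafe blocks inside sound.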
Beyond atomics, other safe abstractions in the standard library built on interior mutability include:
- Cell<T>: we can perform mutation even when other references to the same Cell<T> may exist, and it's safe because the API enforces:
  - it's impossible for more than one thread to hold references to the same Cell<T> at a time because Cell<T> does not implement the Sync trait, i.e. Cell<T> is single threaded;
  - and it's impossible to obtain a reference to the contents within the Cell<T>, as such references could be invalidated by a mutation; instead all access is done by copying data out of the cell.
- RefCell<T>: we can perform mutation even when other references to the same RefCell<T> may exist, and it's safe because the API enforces:
  - RefCell<T> is single threaded so it's impossible for multiple threads to refer to the same one, similar to Cell<T>;
  - and within the one thread, dynamically checked borrow rules will detect and prevent attempts to mutate while a reader is holding a reference into the content of the RefCell.
- Mutex<T>: we can perform mutation even when other references to the same Mutex<T> may exist, and it's safe because the API enforces:
  - only one of the references may operate on the inner T at a time, whether reading or writing; other accesses will block until the current one has released its lock.
- RwLock<T>: we can perform mutation even when other references to the same RwLock<T> may exist, and it's safe because the API enforces:
  - only one of the references may be used to mutate the T at a time, and only while no other references are being used for reading; accesses will block to meet these requirements.
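The four types above can be sketched side by side as follows. The function names (cell_demo and so on) are made up for illustration; the standard library calls are used as documented. In every case the mutation happens through a shared reference, and the comments note which rule from the list keeps it safe.

```rust
use std::cell::{Cell, RefCell};
use std::sync::{Mutex, RwLock};
use std::thread;

// Cell: values are copied in and out; no reference to the inside is exposed.
fn cell_demo() -> u32 {
    let cell = Cell::new(1);
    let a = &cell;
    let b = &cell;
    a.set(b.get() + 1);
    cell.get()
}

// RefCell: the borrow rules are checked at run time instead of compile time.
fn refcell_demo() -> (bool, Vec<u32>) {
    let list = RefCell::new(vec![1, 2]);
    let reader = list.borrow();
    // A mutable borrow is refused while a reader is still holding a borrow.
    let conflict = list.try_borrow_mut().is_err();
    drop(reader);
    list.borrow_mut().push(3); // no readers left, so this succeeds
    (conflict, list.into_inner())
}

// Mutex: one accessor at a time; other threads block until the lock is free.
fn mutex_demo() -> u32 {
    let total = Mutex::new(0);
    thread::scope(|s| {
        for _ in 0..4 {
            s.spawn(|| {
                *total.lock().unwrap() += 10;
            });
        }
    });
    total.into_inner().unwrap()
}

// RwLock: one writer at a time, or any number of readers.
fn rwlock_demo() -> (usize, usize) {
    let words = RwLock::new(vec!["shared"]);
    words.write().unwrap().push("exclusive");
    thread::scope(|s| {
        // Two reader threads may hold the read lock at the same time.
        let r1 = s.spawn(|| words.read().unwrap().len());
        let r2 = s.spawn(|| words.read().unwrap().len());
        (r1.join().unwrap(), r2.join().unwrap())
    })
}

fn main() {
    println!("cell: {}", cell_demo());
    println!("refcell: {:?}", refcell_demo());
    println!("mutex: {}", mutex_demo());
    println!("rwlock: {:?}", rwlock_demo());
}
```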