atomic_memcpy/lib.rs
/*!
<!-- tidy:crate-doc:start -->
Byte-wise atomic memcpy.

This is an attempt to implement the equivalent of C++ ["P1478: Byte-wise atomic memcpy"][p1478] in Rust.

This is expected to allow algorithms such as Seqlock and the Chase-Lev deque to be implemented without the undefined behavior of data races.
See [P1478][p1478] for more.
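
For example, the read side of a Seqlock can be sketched roughly as follows. This is illustrative only; the `SeqLock` type, its fields, and its methods are not part of this crate, and a production-quality Seqlock needs a more careful design.

```rust
use std::{
    cell::UnsafeCell,
    sync::atomic::{AtomicUsize, Ordering},
};

// Minimal single-writer Seqlock sketch (illustrative only).
struct SeqLock<T> {
    // Sequence number; odd while a write is in progress.
    seq: AtomicUsize,
    data: UnsafeCell<T>,
}

// SAFETY: illustrative only; the reader below never creates a `&T`
// to the shared data, it only copies it byte-wise.
unsafe impl<T: Copy + Send> Sync for SeqLock<T> {}

impl<T: Copy> SeqLock<T> {
    fn read(&self) -> T {
        loop {
            let s1 = self.seq.load(Ordering::Acquire);
            if s1 % 2 == 1 {
                // A write is in progress; retry.
                continue;
            }
            // Byte-wise atomic load: a concurrent `atomic_store` is not a
            // data race, but the copy may be torn, so it must be validated
            // against the sequence number before use.
            let value =
                unsafe { atomic_memcpy::atomic_load(self.data.get(), Ordering::Acquire) };
            let s2 = self.seq.load(Ordering::Relaxed);
            if s1 == s2 {
                // SAFETY: the sequence number did not change, so no write
                // overlapped the copy and the bytes form a valid `T`.
                return unsafe { value.assume_init() };
            }
        }
    }
}
```

The writer side (not shown) bumps `seq` to an odd value, stores the data with `atomic_store`, and then bumps `seq` again to a new even value.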

## Status

- If the alignment of the type being copied is the same as the pointer width, `atomic_load` can produce assembly roughly equivalent to the case of using volatile read + atomic fence on many platforms. (e.g., [aarch64](https://github.com/taiki-e/atomic-memcpy/blob/HEAD/tests/asm-test/asm/aarch64-unknown-linux-gnu/atomic_memcpy_load_align8), [riscv64](https://github.com/taiki-e/atomic-memcpy/blob/HEAD/tests/asm-test/asm/riscv64gc-unknown-linux-gnu/atomic_memcpy_load_align8). See the [`tests/asm-test/asm`][asm-test] directory for more.)
- If the alignment of the type being copied is smaller than the pointer width, there will be some performance degradation. However, it is implemented in such a way that it does not cause extreme performance degradation, at least on x86_64. (See [the implementation comments of `atomic_load`][implementation] for more.) It is possible that there is still room for improvement, especially on non-x86_64 platforms.
- Optimization for the case where the alignment of the type being copied is larger than the pointer width has not yet been fully investigated. It is possible that there is still room for improvement.
- If the type being copied contains pointers, it is not compatible with strict provenance because the copy does ptr-to-int transmutes.
- If the type being copied contains uninitialized bytes (e.g., padding), [it is undefined behavior because the copy goes through integers][undefined-behavior] (see the example below). This problem will probably not be resolved until something like `AtomicMaybeUninit` is supported.
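
For example, an array of `u8` contains no uninitialized bytes and is fine to copy, while a `#[repr(C)]` struct with padding is not (hypothetical types for illustration):

```rust
// Fine: `[u8; 8]` has no padding, so every byte is initialized.
type NoPadding = [u8; 8];

// Problematic: on typical targets there are 3 padding bytes between `a` and
// `b`, and copying those uninitialized bytes through integers is undefined
// behavior.
#[repr(C)]
struct HasPadding {
    a: u8,
    b: u32,
}
```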

## Related Projects

- [portable-atomic]: Portable atomic types including support for 128-bit atomics, atomic float, etc. It uses byte-wise atomic memcpy to implement the Seqlock used in its fallback implementation.
- [atomic-maybe-uninit]: Atomic operations on potentially uninitialized integers.

[asm-test]: https://github.com/taiki-e/atomic-memcpy/tree/HEAD/tests/asm-test/asm
[atomic-maybe-uninit]: https://github.com/taiki-e/atomic-maybe-uninit
[implementation]: https://github.com/taiki-e/atomic-memcpy/blob/v0.2.0/src/lib.rs#L367-L427
[p1478]: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p1478r7.html
[portable-atomic]: https://github.com/taiki-e/portable-atomic
[undefined-behavior]: https://doc.rust-lang.org/reference/behavior-considered-undefined.html

<!-- tidy:crate-doc:end -->
*/

#![no_std]
#![doc(test(
    no_crate_inject,
    attr(
        deny(warnings, rust_2018_idioms, single_use_lifetimes),
        allow(dead_code, unused_variables)
    )
))]
#![warn(
    missing_debug_implementations,
    missing_docs,
    rust_2018_idioms,
    single_use_lifetimes,
    unreachable_pub
)]
#![cfg_attr(test, warn(unsafe_op_in_unsafe_fn))] // unsafe_op_in_unsafe_fn requires Rust 1.52
#![cfg_attr(not(test), allow(unused_unsafe))]
#![warn(
    clippy::pedantic,
    // lints for public library
    clippy::alloc_instead_of_core,
    clippy::exhaustive_enums,
    clippy::exhaustive_structs,
    clippy::std_instead_of_alloc,
    clippy::std_instead_of_core,
    // lints that help writing unsafe code
    clippy::as_ptr_cast_mut,
    clippy::default_union_representation,
    clippy::trailing_empty_array,
    clippy::transmute_undefined_repr,
    clippy::undocumented_unsafe_blocks,
    // misc
    clippy::missing_inline_in_public_items,
)]
#![allow(clippy::doc_markdown, clippy::inline_always, clippy::single_match_else)]

// This crate should work on targets with power-of-two pointer widths,
// but it is not clear how it would behave on targets whose pointer width is not a power of two.
// There are currently no 8-bit, 128-bit, or higher builtin targets.
// Note that Rust (and C99) pointers must be at least 16 bits: https://github.com/rust-lang/rust/pull/49305
#[cfg(not(any(
    target_pointer_width = "16",
    target_pointer_width = "32",
    target_pointer_width = "64",
)))]
compile_error!(
    "atomic-memcpy currently only supports targets with {16,32,64}-bit pointer width; \
     if you need support for others, \
     please submit an issue at <https://github.com/taiki-e/atomic-memcpy>"
);

#[cfg(not(target_os = "none"))]
use core::sync::atomic;
use core::sync::atomic::Ordering;

#[cfg(target_os = "none")]
use portable_atomic as atomic;

/// Byte-wise atomic load.
///
/// # Safety
///
/// Behavior is undefined if any of the following conditions are violated:
///
/// - `src` must be [valid] for reads.
/// - `src` must be properly aligned.
/// - `src` must go through [`UnsafeCell::get`](core::cell::UnsafeCell::get).
/// - `T` must not contain uninitialized bytes.
/// - There are no concurrent non-atomic write operations.
/// - There are no concurrent atomic write operations of different
///   granularity. The granularity of atomic operations is an implementation
///   detail, so the only concurrent write operation that can always
///   safely be used is [`atomic_store`].
///
/// Like [`ptr::read`](core::ptr::read), `atomic_load` creates a bitwise copy of `T`, regardless of
/// whether `T` is [`Copy`]. If `T` is not [`Copy`], using both the returned
/// value and the value at `*src` can [violate memory safety][read-ownership].
///
/// Note that even if `T` has size `0`, the pointer must be non-null.
///
/// ## Returned value
///
/// This function returns [`MaybeUninit<T>`](core::mem::MaybeUninit) instead of `T`.
///
/// - All bits in the returned value are guaranteed to be copied from `src`.
/// - There is *no* guarantee that all bits in the returned value were copied at
///   the same time, so if `src` is updated by a concurrent write operation,
///   it is up to the caller to make sure that the returned value is valid as `T`.
///
/// [read-ownership]: core::ptr::read#ownership-of-the-returned-value
/// [valid]: core::ptr#safety
///
/// # Panics
///
/// Panics if `order` is [`Release`](Ordering::Release) or [`AcqRel`](Ordering::AcqRel).
///
/// # Examples
///
/// ```rust
/// use std::{cell::UnsafeCell, sync::atomic::Ordering};
///
/// let v = UnsafeCell::new([0_u8; 64]);
/// let result = unsafe { atomic_memcpy::atomic_load(v.get(), Ordering::Acquire) };
/// // SAFETY: there were no concurrent write operations during the load.
/// assert_eq!(unsafe { result.assume_init() }, [0; 64]);
/// ```
#[cfg_attr(feature = "inline-always", inline(always))]
#[cfg_attr(not(feature = "inline-always"), inline)]
pub unsafe fn atomic_load<T>(src: *const T, order: Ordering) -> core::mem::MaybeUninit<T> {
    assert_load_ordering(order);
    // SAFETY: the caller must uphold the safety contract for `atomic_load`.
    let val = unsafe { imp::atomic_load(src) };
    match order {
        Ordering::Relaxed => { /* no-op */ }
        _ => atomic::fence(order),
    }
    val
}

/// Byte-wise atomic store.
///
/// # Safety
///
/// Behavior is undefined if any of the following conditions are violated:
///
/// - `dst` must be [valid] for writes.
/// - `dst` must be properly aligned.
/// - `dst` must go through [`UnsafeCell::get`](core::cell::UnsafeCell::get).
/// - `T` must not contain uninitialized bytes.
/// - There are no concurrent non-atomic operations.
/// - There are no concurrent atomic operations of different
///   granularity. The granularity of atomic operations is an implementation
///   detail, so the only concurrent operation that can always
///   safely be used is [`atomic_load`].
///
/// If there are concurrent write operations, the resulting value at `*dst` may
/// contain a mixture of bytes written by this thread and bytes written by
/// another thread. If `T` is not valid for all bit patterns, using the value at
/// `*dst` can violate memory safety.
///
/// Note that even if `T` has size `0`, the pointer must be non-null.
///
/// [valid]: core::ptr#safety
///
/// # Panics
///
/// Panics if `order` is [`Acquire`](Ordering::Acquire) or [`AcqRel`](Ordering::AcqRel).
///
/// # Examples
///
/// ```rust
/// use std::{cell::UnsafeCell, sync::atomic::Ordering};
///
/// let v = UnsafeCell::new([0_u8; 64]);
/// unsafe {
///     atomic_memcpy::atomic_store(v.get(), [1; 64], Ordering::Release);
/// }
/// let result = unsafe { atomic_memcpy::atomic_load(v.get(), Ordering::Acquire) };
/// // SAFETY: there were no concurrent write operations during the load.
/// assert_eq!(unsafe { result.assume_init() }, [1; 64]);
/// ```
#[cfg_attr(feature = "inline-always", inline(always))]
#[cfg_attr(not(feature = "inline-always"), inline)]
pub unsafe fn atomic_store<T>(dst: *mut T, val: T, order: Ordering) {
    assert_store_ordering(order);
    match order {
        Ordering::Relaxed => { /* no-op */ }
        _ => atomic::fence(order),
    }
    // SAFETY: the caller must uphold the safety contract for `atomic_store`.
    unsafe {
        imp::atomic_store(dst, val);
    }
}

// https://github.com/rust-lang/rust/blob/1.70.0/library/core/src/sync/atomic.rs#L3155
#[cfg_attr(feature = "inline-always", inline(always))]
#[cfg_attr(not(feature = "inline-always"), inline)]
fn assert_load_ordering(order: Ordering) {
    match order {
        Ordering::Acquire | Ordering::Relaxed | Ordering::SeqCst => {}
        Ordering::Release => panic!("there is no such thing as a release load"),
        Ordering::AcqRel => panic!("there is no such thing as an acquire-release load"),
        _ => unreachable!("{:?}", order),
    }
}

// https://github.com/rust-lang/rust/blob/1.70.0/library/core/src/sync/atomic.rs#L3140
#[cfg_attr(feature = "inline-always", inline(always))]
#[cfg_attr(not(feature = "inline-always"), inline)]
fn assert_store_ordering(order: Ordering) {
    match order {
        Ordering::Release | Ordering::Relaxed | Ordering::SeqCst => {}
        Ordering::Acquire => panic!("there is no such thing as an acquire store"),
        Ordering::AcqRel => panic!("there is no such thing as an acquire-release store"),
        _ => unreachable!("{:?}", order),
    }
}

mod imp {
    use core::{
        mem::{self, ManuallyDrop, MaybeUninit},
        ops::Range,
    };

    #[cfg(not(target_pointer_width = "16"))]
    use crate::atomic::AtomicU32;
    use crate::atomic::{AtomicU16, AtomicUsize, Ordering};

    // Boundary to make the fields of LoadState private.
    //
    // Note that this is not a complete safe/unsafe boundary[1], since it is still
    // possible to pass an invalid pointer to the constructor.
    //
    // [1]: https://www.ralfj.de/blog/2016/01/09/the-scope-of-unsafe.html
    mod load {
        use core::mem;

        use crate::atomic::{AtomicU8, AtomicUsize, Ordering};

        // Invariant: `src` and `result` will never change.
        // Invariant: Only the `advance` method can advance the offset and counter.
        pub(super) struct LoadState {
            src: *const u8,
            // Note: This is a pointer derived from a MaybeUninit.
            result: *mut u8,
            /// Counter to track remaining bytes in `T`.
            remaining: usize,
            offset: usize,
        }

        impl LoadState {
            #[cfg_attr(feature = "inline-always", inline(always))]
            #[cfg_attr(not(feature = "inline-always"), inline)]
            pub(super) fn new<T>(result: *mut T, src: *const T) -> Self {
                Self {
                    src: src as *const u8,
                    result: result as *mut u8,
                    remaining: mem::size_of::<T>(),
                    offset: 0,
                }
            }

            /// Advances pointers by `size` **bytes**.
            ///
            /// # Safety
            ///
            /// - The number of remaining bytes must be greater than or equal to `size`.
            /// - The range `self.result..self.result.add(size)` must be filled.
            #[cfg_attr(feature = "inline-always", inline(always))]
            #[cfg_attr(not(feature = "inline-always"), inline)]
            unsafe fn advance(&mut self, size: usize) {
                debug_assert!(self.remaining >= size);
                self.remaining -= size;
                self.offset += size;
            }

            #[cfg_attr(feature = "inline-always", inline(always))]
            #[cfg_attr(not(feature = "inline-always"), inline)]
            pub(super) fn remaining(&self) -> usize {
                self.remaining
            }

            #[cfg_attr(feature = "inline-always", inline(always))]
            #[cfg_attr(not(feature = "inline-always"), inline)]
            unsafe fn src<T>(&self) -> &T {
                // SAFETY: the caller must uphold the safety contract.
                unsafe { &*(self.src.add(self.offset) as *const T) }
            }

            #[cfg_attr(feature = "inline-always", inline(always))]
            #[cfg_attr(not(feature = "inline-always"), inline)]
            unsafe fn result<T>(&self) -> *mut T {
                // SAFETY: the caller must uphold the safety contract.
                unsafe { self.result.add(self.offset) as *mut T }
            }

            #[cfg_attr(feature = "inline-always", inline(always))]
            #[cfg_attr(not(feature = "inline-always"), inline)]
            pub(super) fn atomic_load_u8(&mut self, count: usize) {
                // This condition is also checked by the caller, so the compiler
                // will optimize this assertion away.
                assert!(self.remaining() >= count);
                for _ in 0..count {
                    // SAFETY:
                    // - we've checked that the number of remaining bytes is greater than or equal to `count`.
                    // Therefore, due to `LoadState`'s invariant:
                    // - `src` is valid for `count` atomic reads of u8.
                    // - `result` is valid for `count` writes of u8.
                    unsafe {
                        let val = self.src::<AtomicU8>().load(Ordering::Relaxed);
                        self.result::<u8>().write(val);
                        // SAFETY: we've filled 1 byte.
                        self.advance(1);
                    }
                }
            }

            /// Note: Remaining bytes smaller than `usize` are ignored (left for the caller).
            ///
            /// # Safety
            ///
            /// - `self.src` must be properly aligned for `usize`.
            ///
            /// There is no alignment requirement for `self.result`.
            #[cfg_attr(feature = "inline-always", inline(always))]
            #[cfg_attr(not(feature = "inline-always"), inline)]
            pub(super) unsafe fn atomic_load_usize_to_end(&mut self) {
                while self.remaining() >= mem::size_of::<usize>() {
                    // SAFETY:
                    // - the caller must guarantee that `src` is properly aligned for `usize`.
                    // - we've checked that the number of remaining bytes is greater than
                    //   or equal to `size_of::<usize>()`.
                    // Therefore, due to `LoadState`'s invariant:
                    // - `src` is valid for an atomic read of `usize`.
                    // - `result` is valid for an *unaligned* write of `usize`.
                    unsafe {
                        let val = self.src::<AtomicUsize>().load(Ordering::Relaxed);
                        self.result::<usize>().write_unaligned(val);
                        // SAFETY: we've filled `size_of::<usize>()` bytes.
                        self.advance(mem::size_of::<usize>());
                    }
                }
            }
        }
    }

    /// Byte-wise atomic load.
    ///
    /// # Safety
    ///
    /// See the documentation of [crate root's `atomic_load`](crate::atomic_load) for safety requirements.
    /**
    # Implementation

    It is implemented based on the assumption that atomic operations at a
    granularity greater than one byte are not a problem, as stated in [p1478].

    > Note that on standard hardware, it should be OK to actually perform the
    > copy at larger than byte granularity. Copying multiple bytes as part of
    > one operation is indistinguishable from running them so quickly that the
    > intermediate state is not observed. In fact, we expect that existing
    > assembly memcpy implementations will suffice when suffixed with the required fence.

    And it turns out that the granularity of the atomic operations is very important for performance.

    - Loading/storing all bytes one byte at a time is very slow, at least on x86/x86_64.
    - Pointer-width atomic operations are the fastest, at least on x86/x86_64.
    - Atomic operations with a granularity larger than the pointer width are slow,
      at least on x86/x86_64 (cmpxchg8b/cmpxchg16b).

    Note the following additional safety requirements:

    - The granularity of the atomic operations in load and store must be the same.
    - When performing an atomic operation as a type with alignment greater than 1,
      the pointer must be properly aligned.

    The caller of `atomic_load` guarantees that `src` is properly aligned, so in
    some cases we can read at a granularity greater than u8 without calling
    `align_offset`.

    The following is the strategy `atomic_load` currently uses (note:
    `atomic_store` uses exactly the same way of determining the granularity of
    atomic operations):

    Branch | Granularity of atomic operations | Conditions
    ------ | -------------------------------- | ----------
    1 | u8 ..., usize ..., u8 ... | `size_of::<T>() >= size_of::<usize>() * 4`, `align_of::<T>() < align_of::<AtomicUsize>()`
    2 | usize ... | `align_of::<T>() >= align_of::<AtomicUsize>()`
    3 | u32 ... | `align_of::<T>() >= align_of::<AtomicU32>()`, 64-bit or higher
    4 | u16 ... | `align_of::<T>() >= align_of::<AtomicU16>()`, 32-bit or higher
    5 | u8 ... |

    - Branch 1: If the alignment of `T` is less than usize, but `T` can be read
      as at least a few usizes, compute the align offset and read it
      like `(&[AtomicU8], &[AtomicUsize], &[AtomicU8])`.
    - Branch 2: If the alignment of `T` is greater than or equal to usize, we
      can read it as a chunk of usize from the first byte.
    - Branch 3, 4: If the alignment of `T` is greater than 1, we can read it as
      a chunk of smaller integers (u32 or u16). This is basically the same
      strategy as Branch 2.
    - Branch 5: Otherwise, we read it per byte.

    Note that only Branch 1 requires computing the align offset dynamically.
    Note that which branch is chosen is evaluated at compile time.

    - The fastest is Branch 2, which can read all bytes as a chunk of usize.
    - If the size of `T` is not too small, Branch 1 is the next fastest after Branch 2.
    - If the size of `T` is small, Branch 3/4/5 can be faster than Branch 1.

    Whether to choose Branch 1 or Branch 3/4/5 when `T` is small is currently
    based on a rough heuristic derived from simple benchmarks on x86_64.

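    As a rough sketch, the compile-time branch selection described above
    corresponds to conditions like the following (hypothetical helper for
    illustration only; the actual code inlines these checks directly, and
    Branch 1 additionally falls back to one of the later branches if the
    run-time `align_offset` check fails):

    ```rust
    use core::{
        mem,
        sync::atomic::{AtomicU16, AtomicU32, AtomicUsize},
    };

    // Hypothetical helper mirroring the branch table above.
    const fn branch<T>() -> u32 {
        if mem::align_of::<T>() < mem::align_of::<AtomicUsize>()
            && mem::size_of::<T>() >= mem::size_of::<usize>() * 4
        {
            1 // u8 ..., usize ..., u8 ...
        } else if mem::align_of::<T>() >= mem::align_of::<AtomicUsize>() {
            2 // usize ...
        } else if mem::size_of::<usize>() > 4
            && mem::align_of::<T>() >= mem::align_of::<AtomicU32>()
        {
            3 // u32 ...
        } else if mem::size_of::<usize>() > 2
            && mem::align_of::<T>() >= mem::align_of::<AtomicU16>()
        {
            4 // u16 ...
        } else {
            5 // u8 ...
        }
    }
    ```
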
    [p1478]: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p1478r7.html
    */
    #[cfg_attr(feature = "inline-always", inline(always))]
    #[cfg_attr(not(feature = "inline-always"), inline)]
    pub(crate) unsafe fn atomic_load<T>(src: *const T) -> MaybeUninit<T> {
        // Safety requirements guaranteed by the caller:
        // - `src` is valid for atomic reads.
        // - `src` is properly aligned for `T`.
        // - `src` went through `UnsafeCell::get`.
        // - `T` does not contain uninitialized bytes.
        // - there are no concurrent non-atomic write operations.
        // - there are no concurrent atomic write operations of different granularity.
        // Note that the safety of the code in this function relies on these guarantees,
        // whether or not they are explicitly mentioned in each safety comment.
        debug_assert!(!src.is_null());
        debug_assert!(src as usize % mem::align_of::<T>() == 0);

        let mut result = MaybeUninit::<T>::uninit();

        if mem::size_of::<T>() == 0 {
            return result;
        }

        // Branch 1: If the alignment of `T` is less than usize, but `T` can be read as
        // at least a few usizes, compute the align offset and read it
        // like `(&[AtomicU8], &[AtomicUsize], &[AtomicU8])`.
        if mem::align_of::<T>() < mem::align_of::<AtomicUsize>()
            && mem::size_of::<T>() >= mem::size_of::<usize>() * 4
        {
            let mut state = load::LoadState::new(result.as_mut_ptr(), src);
            let offset = (src as *const u8).align_offset(mem::align_of::<AtomicUsize>());
            // Note: align_offset may return usize::MAX: https://github.com/rust-lang/rust/issues/62420
            if state.remaining() >= offset {
                // Load `offset` bytes per byte to align `state.src`.
                state.atomic_load_u8(offset);
                debug_assert!(state.remaining() >= mem::size_of::<usize>());
                // SAFETY:
                // - align_offset succeeded and the `offset` bytes have been
                //   filled, so now `state.src` is definitely aligned.
                // - we've checked that the number of remaining bytes is greater than
                //   or equal to `size_of::<usize>()`.
                //
                // In this branch, the pointer to `state.result` is usually
                // not properly aligned, so we use `atomic_load_usize_to_end`,
                // which has no alignment requirement for `state.result`.
                unsafe { state.atomic_load_usize_to_end() }
                // Load the remaining bytes per byte.
                state.atomic_load_u8(state.remaining());
                debug_assert_eq!(state.remaining(), 0);
                return result;
            }
        }

        // Branch 2: If the alignment of `T` is greater than or equal to usize,
        // we can read it as a chunk of usize from the first byte.
        if mem::align_of::<T>() >= mem::align_of::<AtomicUsize>() {
            let src = src as *const AtomicUsize;
            let dst = result.as_mut_ptr() as *mut usize;
            for i in range(0..mem::size_of::<T>() / mem::size_of::<usize>()) {
                // SAFETY:
                // - the caller must guarantee that `src` is properly aligned for `T`.
                // - `T` has an alignment greater than or equal to usize.
                // - the number of remaining bytes is greater than or equal to `size_of::<usize>()`.
                unsafe {
                    let val: usize = (*src.add(i)).load(Ordering::Relaxed);
                    dst.add(i).write(val);
                }
            }
            return result;
        }

        #[cfg(not(target_pointer_width = "16"))]
        {
            // Branch 3: If the alignment of `T` is greater than or equal to u32,
            // we can read it as a chunk of u32 from the first byte.
            if mem::size_of::<usize>() > 4 && mem::align_of::<T>() >= mem::align_of::<AtomicU32>() {
                let src = src as *const AtomicU32;
                let dst = result.as_mut_ptr() as *mut u32;
                for i in range(0..mem::size_of::<T>() / mem::size_of::<u32>()) {
                    // SAFETY:
                    // - the caller must guarantee that `src` is properly aligned for `T`.
                    // - `T` has an alignment greater than or equal to u32.
                    // - the number of remaining bytes is greater than or equal to `size_of::<u32>()`.
                    unsafe {
                        let val: u32 = (*src.add(i)).load(Ordering::Relaxed);
                        dst.add(i).write(val);
                    }
                }
                return result;
            }
        }

        // Branch 4: If the alignment of `T` is greater than or equal to u16,
        // we can read it as a chunk of u16 from the first byte.
        if mem::size_of::<usize>() > 2 && mem::align_of::<T>() >= mem::align_of::<AtomicU16>() {
            let src = src as *const AtomicU16;
            let dst = result.as_mut_ptr() as *mut u16;
            for i in range(0..mem::size_of::<T>() / mem::size_of::<u16>()) {
                // SAFETY:
                // - the caller must guarantee that `src` is properly aligned for `T`.
                // - `T` has an alignment greater than or equal to u16.
                // - the number of remaining bytes is greater than or equal to `size_of::<u16>()`.
                unsafe {
                    let val: u16 = (*src.add(i)).load(Ordering::Relaxed);
                    dst.add(i).write(val);
                }
            }
            return result;
        }

        // Branch 5: Otherwise, we read it per byte.
        let mut state = load::LoadState::new(result.as_mut_ptr(), src);
        state.atomic_load_u8(state.remaining());
        debug_assert_eq!(state.remaining(), 0);
        result
    }

    // Boundary to make the fields of StoreState private.
    //
    // Note that this is not a complete safe/unsafe boundary, since it is still
    // possible to pass an invalid pointer to the constructor.
    mod store {
        use core::mem;

        use crate::atomic::{AtomicU8, AtomicUsize, Ordering};

        // Invariant: `src` and `dst` will never change.
        // Invariant: Only the `advance` method can advance the offset and counter.
        pub(super) struct StoreState {
            src: *const u8,
            dst: *const u8,
            /// Number of remaining bytes in `T`.
            remaining: usize,
            offset: usize,
        }

        impl StoreState {
            #[cfg_attr(feature = "inline-always", inline(always))]
            #[cfg_attr(not(feature = "inline-always"), inline)]
            pub(super) fn new<T>(dst: *mut T, src: *const T) -> Self {
                Self {
                    src: src as *const u8,
                    dst: dst as *mut u8 as *const u8,
                    remaining: mem::size_of::<T>(),
                    offset: 0,
                }
            }

            /// Advances pointers by `size` **bytes**.
            ///
            /// # Safety
            ///
            /// - The number of remaining bytes must be greater than or equal to `size`.
            /// - The range `self.dst..self.dst.add(size)` must be filled.
            #[cfg_attr(feature = "inline-always", inline(always))]
            #[cfg_attr(not(feature = "inline-always"), inline)]
            unsafe fn advance(&mut self, size: usize) {
                debug_assert!(self.remaining >= size);
                self.remaining -= size;
                self.offset += size;
            }

            #[cfg_attr(feature = "inline-always", inline(always))]
            #[cfg_attr(not(feature = "inline-always"), inline)]
            pub(super) fn remaining(&self) -> usize {
                self.remaining
            }

            #[cfg_attr(feature = "inline-always", inline(always))]
            #[cfg_attr(not(feature = "inline-always"), inline)]
            unsafe fn src<T>(&self) -> *const T {
                // SAFETY: the caller must uphold the safety contract.
                unsafe { self.src.add(self.offset) as *const T }
            }

            #[cfg_attr(feature = "inline-always", inline(always))]
            #[cfg_attr(not(feature = "inline-always"), inline)]
            unsafe fn dst<T>(&self) -> &T {
                // SAFETY: the caller must uphold the safety contract.
                unsafe { &*(self.dst.add(self.offset) as *const T) }
            }

            #[cfg_attr(feature = "inline-always", inline(always))]
            #[cfg_attr(not(feature = "inline-always"), inline)]
            pub(super) fn atomic_store_u8(&mut self, count: usize) {
                // This condition is also checked by the caller, so the compiler
                // will optimize this assertion away.
                assert!(self.remaining() >= count);
                for _ in 0..count {
                    // SAFETY:
                    // - we've checked that the number of remaining bytes is greater than or equal to `count`.
                    // Therefore, due to `StoreState`'s invariant:
                    // - `src` is valid for `count` reads of u8.
                    // - `dst` is valid for `count` atomic writes of u8.
                    unsafe {
                        let val = self.src::<u8>().read();
                        self.dst::<AtomicU8>().store(val, Ordering::Relaxed);
                        // SAFETY: we've filled 1 byte.
                        self.advance(1);
                    }
                }
            }

            /// Note: Remaining bytes smaller than `usize` are ignored (left for the caller).
            ///
            /// # Safety
            ///
            /// - `self.dst` must be properly aligned for `usize`.
            ///
            /// There is no alignment requirement for `self.src`.
            #[cfg_attr(feature = "inline-always", inline(always))]
            #[cfg_attr(not(feature = "inline-always"), inline)]
            pub(super) unsafe fn atomic_store_usize_to_end(&mut self) {
                while self.remaining() >= mem::size_of::<usize>() {
                    // SAFETY:
                    // - the caller must guarantee that `dst` is properly aligned for `usize`.
                    // - we've checked that the number of remaining bytes is greater than
                    //   or equal to `size_of::<usize>()`.
                    // Therefore, due to `StoreState`'s invariant:
                    // - `src` is valid for an *unaligned* read of `usize`.
                    // - `dst` is valid for an atomic write of `usize`.
                    unsafe {
                        let val = self.src::<usize>().read_unaligned();
                        self.dst::<AtomicUsize>().store(val, Ordering::Relaxed);
                        // SAFETY: we've filled `size_of::<usize>()` bytes.
                        self.advance(mem::size_of::<usize>());
                    }
                }
            }
        }
    }

    /// Byte-wise atomic store.
    ///
    /// See the [`atomic_load`] function for the detailed implementation comment.
    ///
    /// # Safety
    ///
    /// See the documentation of [crate root's `atomic_store`](crate::atomic_store) for safety requirements.
    #[cfg_attr(feature = "inline-always", inline(always))]
    #[cfg_attr(not(feature = "inline-always"), inline)]
    pub(crate) unsafe fn atomic_store<T>(dst: *mut T, val: T) {
        // Safety requirements guaranteed by the caller:
        // - `dst` is valid for atomic writes.
        // - `dst` is properly aligned for `T`.
        // - `dst` went through `UnsafeCell::get`.
        // - `T` does not contain uninitialized bytes.
        // - there are no concurrent non-atomic operations.
        // - there are no concurrent atomic operations of different granularity.
        // - if there are concurrent atomic write operations, `T` is valid for all bit patterns.
        // Note that the safety of the code in this function relies on these guarantees,
        // whether or not they are explicitly mentioned in each safety comment.
        debug_assert!(!dst.is_null());
        debug_assert!(dst as usize % mem::align_of::<T>() == 0);

        // In atomic_store, a panic *after* the first store operation is unsound
        // because `*dst` may be left with an invalid bit pattern.
        //
        // Our code is written very carefully so as not to cause a panic, but we
        // use an additional guard just in case.
        //
        // Note:
        // - If the compiler can prove at compile time that a panic will
        //   never occur, this guard will be removed (as with no-panic).
        // - atomic_load does not modify the data, so it does not have this requirement.
        // - If an invalid ordering is passed, the panic happens *before* the
        //   first store operation, so that is fine.
        let guard = PanicGuard;

        let val = ManuallyDrop::new(val); // Do not drop `val`.

        if mem::size_of::<T>() == 0 {
            mem::forget(guard);
            return;
        }

        // Branch 1: If the alignment of `T` is less than usize, but `T` can be written as
        // at least a few usizes, compute the align offset and write it
        // like `(&[AtomicU8], &[AtomicUsize], &[AtomicU8])`.
        if mem::align_of::<T>() < mem::align_of::<AtomicUsize>()
            && mem::size_of::<T>() >= mem::size_of::<usize>() * 4
        {
            let mut state = store::StoreState::new(dst, &*val);
            let offset = (dst as *mut u8).align_offset(mem::align_of::<AtomicUsize>());
            // Note: align_offset may return usize::MAX: https://github.com/rust-lang/rust/issues/62420
            if state.remaining() >= offset {
                // Store `offset` bytes per byte to align `state.dst`.
                state.atomic_store_u8(offset);
                debug_assert!(state.remaining() >= mem::size_of::<usize>());
                // SAFETY:
                // - align_offset succeeded and the `offset` bytes have been
                //   filled, so now `state.dst` is definitely aligned.
                // - we've checked that the number of remaining bytes is greater than
                //   or equal to `size_of::<usize>()`.
                //
                // In this branch, the pointer to `state.src` is usually
                // not properly aligned, so we use `atomic_store_usize_to_end`,
                // which has no alignment requirement for `state.src`.
                unsafe {
                    state.atomic_store_usize_to_end();
                }
                // Store the remaining bytes per byte.
                state.atomic_store_u8(state.remaining());
                debug_assert_eq!(state.remaining(), 0);
                mem::forget(guard);
                return;
            }
        }

        // Branch 2: If the alignment of `T` is greater than or equal to usize,
        // we can write it as a chunk of usize from the first byte.
        if mem::align_of::<T>() >= mem::align_of::<AtomicUsize>() {
            let src = &*val as *const T as *const usize;
            let dst = dst as *const AtomicUsize;
            for i in range(0..mem::size_of::<T>() / mem::size_of::<usize>()) {
                // SAFETY:
                // - the caller must guarantee that `dst` is properly aligned for `T`.
                // - `T` has an alignment greater than or equal to usize.
                // - the number of remaining bytes is greater than or equal to `size_of::<usize>()`.
                unsafe {
                    let val: usize = src.add(i).read();
                    (*dst.add(i)).store(val, Ordering::Relaxed);
                }
            }
            mem::forget(guard);
            return;
        }

        #[cfg(not(target_pointer_width = "16"))]
        {
            // Branch 3: If the alignment of `T` is greater than or equal to u32,
            // we can write it as a chunk of u32 from the first byte.
            if mem::size_of::<usize>() > 4 && mem::align_of::<T>() >= mem::align_of::<AtomicU32>() {
                let src = &*val as *const T as *const u32;
                let dst = dst as *const AtomicU32;
                for i in range(0..mem::size_of::<T>() / mem::size_of::<u32>()) {
                    // SAFETY:
                    // - the caller must guarantee that `dst` is properly aligned for `T`.
                    // - `T` has an alignment greater than or equal to u32.
                    // - the number of remaining bytes is greater than or equal to `size_of::<u32>()`.
                    unsafe {
                        let val: u32 = src.add(i).read();
                        (*dst.add(i)).store(val, Ordering::Relaxed);
                    }
                }
                mem::forget(guard);
                return;
            }
        }

        // Branch 4: If the alignment of `T` is greater than or equal to u16,
        // we can write it as a chunk of u16 from the first byte.
        if mem::size_of::<usize>() > 2 && mem::align_of::<T>() >= mem::align_of::<AtomicU16>() {
            let src = &*val as *const T as *const u16;
            let dst = dst as *const AtomicU16;
            for i in range(0..mem::size_of::<T>() / mem::size_of::<u16>()) {
                // SAFETY:
                // - the caller must guarantee that `dst` is properly aligned for `T`.
                // - `T` has an alignment greater than or equal to u16.
                // - the number of remaining bytes is greater than or equal to `size_of::<u16>()`.
                unsafe {
                    let val: u16 = src.add(i).read();
                    (*dst.add(i)).store(val, Ordering::Relaxed);
                }
            }
            mem::forget(guard);
            return;
        }

        // Branch 5: Otherwise, we write it per byte.
        let mut state = store::StoreState::new(dst, &*val);
        state.atomic_store_u8(state.remaining());
        debug_assert_eq!(state.remaining(), 0);
        mem::forget(guard);
    }

    // This allows read_volatile and atomic_load to be lowered to exactly the
    // same assembly on little-endian platforms such as aarch64 and riscv64.
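    // For example, `range(0..4)` yields 3, 2, 1, 0 on little-endian targets
    // and 0, 1, 2, 3 on big-endian targets.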
    #[cfg_attr(feature = "inline-always", inline(always))]
    #[cfg_attr(not(feature = "inline-always"), inline)]
    #[cfg(target_endian = "little")]
    fn range<T>(r: Range<T>) -> core::iter::Rev<Range<T>>
    where
        Range<T>: DoubleEndedIterator,
    {
        r.rev()
    }
    #[cfg_attr(feature = "inline-always", inline(always))]
    #[cfg_attr(not(feature = "inline-always"), inline)]
    #[cfg(target_endian = "big")]
    fn range<T>(r: Range<T>) -> Range<T>
    where
        Range<T>: DoubleEndedIterator,
    {
        r
    }

    struct PanicGuard;

    impl Drop for PanicGuard {
        fn drop(&mut self) {
            // This crate supports no-std environments, so we cannot use std::process::abort.
            // Instead, it relies on double panics being converted to aborts.
            panic!("abort");
        }
    }
}