atomic_memcpy/lib.rs

/*!
<!-- tidy:crate-doc:start -->
Byte-wise atomic memcpy.

This is an attempt to implement an equivalent of C++ ["P1478: Byte-wise atomic memcpy"][p1478] in Rust.

This is expected to allow algorithms such as Seqlock and Chase-Lev deque to be implemented without the undefined behavior of data races.
See [P1478][p1478] for more.
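
A minimal, hedged sketch of the reader side of a Seqlock built on `atomic_load` is shown below. It assumes (hypothetically) that writers set `seq` to an odd value before updating `data` with `atomic_store` and set it back to an even value afterwards; the `SeqLock` type here is made up for the example.

```rust
use std::{
    cell::UnsafeCell,
    sync::atomic::{AtomicUsize, Ordering},
};

struct SeqLock<T> {
    // Even while no write is in progress; odd while a write is in progress.
    seq: AtomicUsize,
    data: UnsafeCell<T>,
}

impl<T: Copy> SeqLock<T> {
    fn read(&self) -> T {
        loop {
            let s1 = self.seq.load(Ordering::Acquire);
            if (s1 & 1) != 0 {
                continue; // a write is in progress; retry
            }
            // SAFETY: concurrent writers only modify `data` via byte-wise
            // atomic stores (`atomic_memcpy::atomic_store`).
            let value = unsafe { atomic_memcpy::atomic_load(self.data.get(), Ordering::Acquire) };
            if self.seq.load(Ordering::Relaxed) == s1 {
                // The sequence number did not change, so no write overlapped
                // the load and the bytes read above form a consistent `T`.
                // SAFETY: in this sketch, `T` contains no uninitialized bytes.
                return unsafe { value.assume_init() };
            }
        }
    }
}
```
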
## Status

- If the alignment of the type being copied is the same as the pointer width, `atomic_load` can produce assembly roughly equivalent to that of a volatile read plus an atomic fence on many platforms (e.g., [aarch64](https://github.com/taiki-e/atomic-memcpy/blob/HEAD/tests/asm-test/asm/aarch64-unknown-linux-gnu/atomic_memcpy_load_align8), [riscv64](https://github.com/taiki-e/atomic-memcpy/blob/HEAD/tests/asm-test/asm/riscv64gc-unknown-linux-gnu/atomic_memcpy_load_align8); see the [`tests/asm-test/asm`][asm-test] directory for more).
- If the alignment of the type being copied is smaller than the pointer width, there is some performance degradation. However, the implementation avoids extreme degradation, at least on x86_64. (See [the implementation comments of `atomic_load`][implementation] for more.) There may still be room for improvement, especially on non-x86_64 platforms.
- Optimization for the case where the alignment of the type being copied is larger than the pointer width has not yet been fully investigated; there may still be room for improvement.
- If the type being copied contains pointers, it is not compatible with strict provenance, because the copy performs pointer-to-integer transmutes.
- If the type being copied contains uninitialized bytes (e.g., padding), [it is undefined behavior because the copy goes through integers][undefined-behavior]. This problem will probably not be resolved until something like `AtomicMaybeUninit` is supported. (See the sketch after this list.)
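
Which types contain padding is a property of their layout; the following is a hedged illustration (the type names are made up for this example):

```rust
// No padding, so copying these with `atomic_load`/`atomic_store` is fine:
type Bytes = [u8; 64];
#[repr(C)]
struct NoPadding {
    a: u32,
    b: u32,
}

// By contrast, `#[repr(C)] struct Padded { a: u8, b: u32 }` would have three
// padding bytes after `a`, and copying those uninitialized bytes through
// integers is undefined behavior.
```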

## Related Projects

- [portable-atomic]: Portable atomic types including support for 128-bit atomics, atomic float, etc. It uses byte-wise atomic memcpy to implement the Seqlock used in its fallback implementation.
- [atomic-maybe-uninit]: Atomic operations on potentially uninitialized integers.

[asm-test]: https://github.com/taiki-e/atomic-memcpy/tree/HEAD/tests/asm-test/asm
[atomic-maybe-uninit]: https://github.com/taiki-e/atomic-maybe-uninit
[implementation]: https://github.com/taiki-e/atomic-memcpy/blob/v0.2.0/src/lib.rs#L367-L427
[p1478]: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p1478r7.html
[portable-atomic]: https://github.com/taiki-e/portable-atomic
[undefined-behavior]: https://doc.rust-lang.org/reference/behavior-considered-undefined.html

<!-- tidy:crate-doc:end -->
*/

#![no_std]
#![doc(test(
    no_crate_inject,
    attr(
        deny(warnings, rust_2018_idioms, single_use_lifetimes),
        allow(dead_code, unused_variables)
    )
))]
#![warn(
    missing_debug_implementations,
    missing_docs,
    rust_2018_idioms,
    single_use_lifetimes,
    unreachable_pub
)]
#![cfg_attr(test, warn(unsafe_op_in_unsafe_fn))] // unsafe_op_in_unsafe_fn requires Rust 1.52
#![cfg_attr(not(test), allow(unused_unsafe))]
#![warn(
    clippy::pedantic,
    // lints for public library
    clippy::alloc_instead_of_core,
    clippy::exhaustive_enums,
    clippy::exhaustive_structs,
    clippy::std_instead_of_alloc,
    clippy::std_instead_of_core,
    // lints that help writing unsafe code
    clippy::as_ptr_cast_mut,
    clippy::default_union_representation,
    clippy::trailing_empty_array,
    clippy::transmute_undefined_repr,
    clippy::undocumented_unsafe_blocks,
    // misc
    clippy::missing_inline_in_public_items,
)]
#![allow(clippy::doc_markdown, clippy::inline_always, clippy::single_match_else)]

// This crate should work on targets with power-of-two pointer widths,
// but it is not clear how it would behave on targets where the pointer
// width is not a power of two.
// There are currently no 8-bit, 128-bit, or higher built-in targets.
// Note that Rust (and C99) pointers must be at least 16 bits: https://github.com/rust-lang/rust/pull/49305
#[cfg(not(any(
    target_pointer_width = "16",
    target_pointer_width = "32",
    target_pointer_width = "64",
)))]
compile_error!(
    "atomic-memcpy currently only supports targets with {16,32,64}-bit pointer width; \
     if you need support for others, \
     please submit an issue at <https://github.com/taiki-e/atomic-memcpy>"
);

#[cfg(not(target_os = "none"))]
use core::sync::atomic;
use core::sync::atomic::Ordering;

#[cfg(target_os = "none")]
use portable_atomic as atomic;

/// Byte-wise atomic load.
///
/// # Safety
///
/// Behavior is undefined if any of the following conditions are violated:
///
/// - `src` must be [valid] for reads.
/// - `src` must be properly aligned.
/// - `src` must go through [`UnsafeCell::get`](core::cell::UnsafeCell::get).
/// - `T` must not contain uninitialized bytes.
/// - There are no concurrent non-atomic write operations.
/// - There are no concurrent atomic write operations of different
///   granularity. The granularity of atomic operations is an implementation
///   detail, so the only concurrent write operation that can always
///   safely be used is [`atomic_store`].
///
/// Like [`ptr::read`](core::ptr::read), `atomic_load` creates a bitwise copy of `T`, regardless of
/// whether `T` is [`Copy`]. If `T` is not [`Copy`], using both the returned
/// value and the value at `*src` can [violate memory safety][read-ownership].
///
/// Note that even if `T` has size `0`, the pointer must be non-null.
///
/// ## Returned value
///
/// This function returns [`MaybeUninit<T>`](core::mem::MaybeUninit) instead of `T`.
///
/// - All bits in the returned value are guaranteed to be copied from `src`.
/// - There is *no* guarantee that all bits in the returned value have been
///   copied at the same time, so if `src` is updated by a concurrent write
///   operation, it is up to the caller to make sure that the returned value
///   is valid as `T`.
///
/// [read-ownership]: core::ptr::read#ownership-of-the-returned-value
/// [valid]: core::ptr#safety
///
/// # Panics
///
/// Panics if `order` is [`Release`](Ordering::Release) or [`AcqRel`](Ordering::AcqRel).
///
/// # Examples
///
/// ```rust
/// use std::{cell::UnsafeCell, sync::atomic::Ordering};
///
/// let v = UnsafeCell::new([0_u8; 64]);
/// let result = unsafe { atomic_memcpy::atomic_load(v.get(), Ordering::Acquire) };
/// // SAFETY: there were no concurrent write operations during the load.
/// assert_eq!(unsafe { result.assume_init() }, [0; 64]);
/// ```
#[cfg_attr(feature = "inline-always", inline(always))]
#[cfg_attr(not(feature = "inline-always"), inline)]
pub unsafe fn atomic_load<T>(src: *const T, order: Ordering) -> core::mem::MaybeUninit<T> {
    assert_load_ordering(order);
    // SAFETY: the caller must uphold the safety contract for `atomic_load`.
    let val = unsafe { imp::atomic_load(src) };
    match order {
        Ordering::Relaxed => { /* no-op */ }
        _ => atomic::fence(order),
    }
    val
}

/// Byte-wise atomic store.
///
/// # Safety
///
/// Behavior is undefined if any of the following conditions are violated:
///
/// - `dst` must be [valid] for writes.
/// - `dst` must be properly aligned.
/// - `dst` must go through [`UnsafeCell::get`](core::cell::UnsafeCell::get).
/// - `T` must not contain uninitialized bytes.
/// - There are no concurrent non-atomic operations.
/// - There are no concurrent atomic operations of different
///   granularity. The granularity of atomic operations is an implementation
///   detail, so the only concurrent operation that can always
///   safely be used is [`atomic_load`].
///
/// If there are concurrent write operations, the resulting value at `*dst` may
/// contain a mixture of bytes written by this thread and bytes written by
/// another thread. If `T` is not valid for all bit patterns, using the value at
/// `*dst` can violate memory safety.
///
/// Note that even if `T` has size `0`, the pointer must be non-null.
///
/// [valid]: core::ptr#safety
///
/// # Panics
///
/// Panics if `order` is [`Acquire`](Ordering::Acquire) or [`AcqRel`](Ordering::AcqRel).
///
/// # Examples
///
/// ```rust
/// use std::{cell::UnsafeCell, sync::atomic::Ordering};
///
/// let v = UnsafeCell::new([0_u8; 64]);
/// unsafe {
///     atomic_memcpy::atomic_store(v.get(), [1; 64], Ordering::Release);
/// }
/// let result = unsafe { atomic_memcpy::atomic_load(v.get(), Ordering::Acquire) };
/// // SAFETY: there were no concurrent write operations during the load.
/// assert_eq!(unsafe { result.assume_init() }, [1; 64]);
/// ```
#[cfg_attr(feature = "inline-always", inline(always))]
#[cfg_attr(not(feature = "inline-always"), inline)]
pub unsafe fn atomic_store<T>(dst: *mut T, val: T, order: Ordering) {
    assert_store_ordering(order);
    match order {
        Ordering::Relaxed => { /* no-op */ }
        _ => atomic::fence(order),
    }
    // SAFETY: the caller must uphold the safety contract for `atomic_store`.
    unsafe {
        imp::atomic_store(dst, val);
    }
}

// https://github.com/rust-lang/rust/blob/1.70.0/library/core/src/sync/atomic.rs#L3155
#[cfg_attr(feature = "inline-always", inline(always))]
#[cfg_attr(not(feature = "inline-always"), inline)]
fn assert_load_ordering(order: Ordering) {
    match order {
        Ordering::Acquire | Ordering::Relaxed | Ordering::SeqCst => {}
        Ordering::Release => panic!("there is no such thing as a release load"),
        Ordering::AcqRel => panic!("there is no such thing as an acquire-release load"),
        _ => unreachable!("{:?}", order),
    }
}

// https://github.com/rust-lang/rust/blob/1.70.0/library/core/src/sync/atomic.rs#L3140
#[cfg_attr(feature = "inline-always", inline(always))]
#[cfg_attr(not(feature = "inline-always"), inline)]
fn assert_store_ordering(order: Ordering) {
    match order {
        Ordering::Release | Ordering::Relaxed | Ordering::SeqCst => {}
        Ordering::Acquire => panic!("there is no such thing as an acquire store"),
        Ordering::AcqRel => panic!("there is no such thing as an acquire-release store"),
        _ => unreachable!("{:?}", order),
    }
}

mod imp {
    use core::{
        mem::{self, ManuallyDrop, MaybeUninit},
        ops::Range,
    };

    #[cfg(not(target_pointer_width = "16"))]
    use crate::atomic::AtomicU32;
    use crate::atomic::{AtomicU16, AtomicUsize, Ordering};

    // Boundary to make the fields of LoadState private.
    //
    // Note that this is not a complete safe/unsafe boundary[1], since it is still
    // possible to pass an invalid pointer to the constructor.
    //
    // [1]: https://www.ralfj.de/blog/2016/01/09/the-scope-of-unsafe.html
    mod load {
        use core::mem;

        use crate::atomic::{AtomicU8, AtomicUsize, Ordering};

        // Invariant: `src` and `result` will never change.
        // Invariant: Only the `advance` method can advance offset and counter.
        pub(super) struct LoadState {
            src: *const u8,
            // Note: This points into the `MaybeUninit<T>` result buffer.
            result: *mut u8,
            /// Counter to track remaining bytes in `T`.
            remaining: usize,
            offset: usize,
        }

        impl LoadState {
            #[cfg_attr(feature = "inline-always", inline(always))]
            #[cfg_attr(not(feature = "inline-always"), inline)]
            pub(super) fn new<T>(result: *mut T, src: *const T) -> Self {
                Self {
                    src: src as *const u8,
                    result: result as *mut u8,
                    remaining: mem::size_of::<T>(),
                    offset: 0,
                }
            }

            /// Advances pointers by `size` **bytes**.
            ///
            /// # Safety
            ///
            /// - The number of remaining bytes must be greater than or equal to `size`.
            /// - The range `self.result..self.result.add(size)` must be filled.
            #[cfg_attr(feature = "inline-always", inline(always))]
            #[cfg_attr(not(feature = "inline-always"), inline)]
            unsafe fn advance(&mut self, size: usize) {
                debug_assert!(self.remaining >= size);
                self.remaining -= size;
                self.offset += size;
            }

            #[cfg_attr(feature = "inline-always", inline(always))]
            #[cfg_attr(not(feature = "inline-always"), inline)]
            pub(super) fn remaining(&self) -> usize {
                self.remaining
            }

            #[cfg_attr(feature = "inline-always", inline(always))]
            #[cfg_attr(not(feature = "inline-always"), inline)]
            unsafe fn src<T>(&self) -> &T {
                // SAFETY: the caller must uphold the safety contract.
                unsafe { &*(self.src.add(self.offset) as *const T) }
            }

            #[cfg_attr(feature = "inline-always", inline(always))]
            #[cfg_attr(not(feature = "inline-always"), inline)]
            unsafe fn result<T>(&self) -> *mut T {
                // SAFETY: the caller must uphold the safety contract.
                unsafe { self.result.add(self.offset) as *mut T }
            }

            #[cfg_attr(feature = "inline-always", inline(always))]
            #[cfg_attr(not(feature = "inline-always"), inline)]
            pub(super) fn atomic_load_u8(&mut self, count: usize) {
                // This condition is also checked by the caller, so the compiler
                // will remove this assertion during optimization.
                assert!(self.remaining() >= count);
                for _ in 0..count {
                    // SAFETY:
                    // - we've checked that the number of remaining bytes is greater than or equal to `count`.
                    // Therefore, due to `LoadState`'s invariant:
                    // - `src` is valid for `count` atomic reads of u8.
                    // - `result` is valid for `count` writes of u8.
                    unsafe {
                        let val = self.src::<AtomicU8>().load(Ordering::Relaxed);
                        self.result::<u8>().write(val);
                        // SAFETY: we've filled 1 byte.
                        self.advance(1);
                    }
                }
            }

            /// Note: Any trailing bytes smaller than `usize` are left unread.
            ///
            /// # Safety
            ///
            /// - `self.src` must be properly aligned for `usize`.
            ///
            /// There is no alignment requirement for `self.result`.
            #[cfg_attr(feature = "inline-always", inline(always))]
            #[cfg_attr(not(feature = "inline-always"), inline)]
            pub(super) unsafe fn atomic_load_usize_to_end(&mut self) {
                while self.remaining() >= mem::size_of::<usize>() {
                    // SAFETY:
                    // - the caller must guarantee that `src` is properly aligned for `usize`.
                    // - we've checked that the number of remaining bytes is greater than
                    //   or equal to `size_of::<usize>()`.
                    // Therefore, due to `LoadState`'s invariant:
                    // - `src` is valid for an atomic read of `usize`.
                    // - `result` is valid for an *unaligned* write of `usize`.
                    unsafe {
                        let val = self.src::<AtomicUsize>().load(Ordering::Relaxed);
                        self.result::<usize>().write_unaligned(val);
                        // SAFETY: we've filled `size_of::<usize>()` bytes.
                        self.advance(mem::size_of::<usize>());
                    }
                }
            }
        }
    }

    /// Byte-wise atomic load.
    ///
    /// # Safety
    ///
    /// See the documentation of [crate root's `atomic_load`](crate::atomic_load) for safety requirements.
    /**
    # Implementation

    It is implemented based on the assumption that atomic operations at a
    granularity greater than a byte are not a problem, as stated in [p1478].

    > Note that on standard hardware, it should be OK to actually perform the
    > copy at larger than byte granularity. Copying multiple bytes as part of
    > one operation is indistinguishable from running them so quickly that the
    > intermediate state is not observed. In fact, we expect that existing
    > assembly memcpy implementations will suffice when suffixed with the required fence.

    And it turns out that the granularity of the atomic operations is very important for performance.

    - Loading/storing all bytes one byte at a time is very slow, at least on x86/x86_64.
    - Pointer-width atomic operations are the fastest, at least on x86/x86_64.
    - Atomic operations with a granularity larger than the pointer width are slow,
      at least on x86/x86_64 (cmpxchg8b/cmpxchg16b).

    Note the following additional safety requirements:

    - The granularity of the atomic operations in load and store must be the same.
    - When performing an atomic operation as a type with alignment greater than 1,
      the pointer must be properly aligned.

    The caller of `atomic_load` guarantees that `src` is properly aligned,
    so in some cases we can avoid calling `align_offset` or reading at a
    granularity greater than u8.

    The following is what `atomic_load` currently uses (note: `atomic_store`
    uses exactly the same way to determine the granularity of atomic operations):

    Branch | Granularity of atomic operations | Conditions
    ------ | -------------------------------- | ----------
    1      | u8 ..., usize ..., u8 ...        | `size_of::<T>() >= size_of::<usize>() * 4`, `align_of::<T>() < align_of::<AtomicUsize>()`
    2      | usize ...                        | `align_of::<T>() >= align_of::<AtomicUsize>()`
    3      | u32 ...                          | `align_of::<T>() >= align_of::<AtomicU32>()`, 64-bit or higher
    4      | u16 ...                          | `align_of::<T>() >= align_of::<AtomicU16>()`, 32-bit or higher
    5      | u8 ...                           |

    - Branch 1: If the alignment of `T` is less than usize, but `T` can be read
      as at least a few usizes, compute the align offset and read it
      like `(&[AtomicU8], &[AtomicUsize], &[AtomicU8])`.
    - Branch 2: If the alignment of `T` is greater than or equal to usize, we
      can read it as a chunk of usize from the first byte.
    - Branch 3, 4: If the alignment of `T` is greater than 1, we can read it as
      a chunk of smaller integers (u32 or u16). This is basically the same
      strategy as Branch 2.
    - Branch 5: Otherwise, we read it per byte.

    Note that only Branch 1 requires computing the align offset dynamically;
    which branch is chosen is decided at compile time.

    - The fastest is Branch 2, which can read all bytes as chunks of usize.
    - If the size of `T` is not too small, Branch 1 is the next fastest after Branch 2.
    - If the size of `T` is small, Branch 3/4/5 can be faster than Branch 1.

    Whether to choose Branch 1 or Branch 3/4/5 when `T` is small is currently
    decided by a rough heuristic derived from simple benchmarks on x86_64.

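    As a hedged illustration (assuming a 64-bit target, so `usize` is 8 bytes),
    some example types would take the following branches:

    ```rust
    // [u8; 64] : align 1, size 64 >= 8 * 4                      -> Branch 1
    // u64      : align_of::<u64>() >= align_of::<AtomicUsize>() -> Branch 2
    // [u32; 3] : align 4, size 12 < 8 * 4                       -> Branch 3
    // [u16; 3] : align 2, size 6 < 8 * 4                        -> Branch 4
    // [u8; 3]  : align 1, size 3 < 8 * 4                        -> Branch 5
    ```
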
    [p1478]: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p1478r7.html
    */
    #[cfg_attr(feature = "inline-always", inline(always))]
    #[cfg_attr(not(feature = "inline-always"), inline)]
    pub(crate) unsafe fn atomic_load<T>(src: *const T) -> MaybeUninit<T> {
        // Safety requirements guaranteed by the caller:
        // - `src` is valid for atomic reads.
        // - `src` is properly aligned for `T`.
        // - `src` goes through `UnsafeCell::get`.
        // - `T` does not contain uninitialized bytes.
        // - there are no concurrent non-atomic write operations.
        // - there are no concurrent atomic write operations of different granularity.
        // Note that the safety of the code in this function relies on these guarantees,
        // whether or not they are explicitly mentioned in each safety comment.
        debug_assert!(!src.is_null());
        debug_assert!(src as usize % mem::align_of::<T>() == 0);

        let mut result = MaybeUninit::<T>::uninit();

        if mem::size_of::<T>() == 0 {
            return result;
        }

        // Branch 1: If the alignment of `T` is less than usize, but `T` can be read as
        // at least one usize, compute the align offset and read it
        // like `(&[AtomicU8], &[AtomicUsize], &[AtomicU8])`.
        if mem::align_of::<T>() < mem::align_of::<AtomicUsize>()
            && mem::size_of::<T>() >= mem::size_of::<usize>() * 4
        {
            let mut state = load::LoadState::new(result.as_mut_ptr(), src);
            let offset = (src as *const u8).align_offset(mem::align_of::<AtomicUsize>());
            // Note: align_offset may return usize::MAX: https://github.com/rust-lang/rust/issues/62420
            if state.remaining() >= offset {
                // Load `offset` bytes per byte to align `state.src`.
                state.atomic_load_u8(offset);
                debug_assert!(state.remaining() >= mem::size_of::<usize>());
                // SAFETY:
                // - align_offset succeeded and the `offset` bytes have been
                //   filled, so now `state.src` is definitely aligned.
                // - we've checked that the number of remaining bytes is greater than
                //   or equal to `size_of::<usize>()`.
                //
                // In this branch, the pointer to `state.result` is usually
                // not properly aligned, so we use `atomic_load_usize_to_end`,
                // which has no alignment requirement for `state.result`.
                unsafe { state.atomic_load_usize_to_end() }
                // Load the remaining bytes per byte.
                state.atomic_load_u8(state.remaining());
                debug_assert_eq!(state.remaining(), 0);
                return result;
            }
        }

        // Branch 2: If the alignment of `T` is greater than or equal to usize,
        // we can read it as a chunk of usize from the first byte.
        if mem::align_of::<T>() >= mem::align_of::<AtomicUsize>() {
            let src = src as *const AtomicUsize;
            let dst = result.as_mut_ptr() as *mut usize;
            for i in range(0..mem::size_of::<T>() / mem::size_of::<usize>()) {
                // SAFETY:
                // - the caller must guarantee that `src` is properly aligned for `T`.
                // - `T` has an alignment greater than or equal to usize.
                // - the number of remaining bytes is greater than or equal to `size_of::<usize>()`.
                unsafe {
                    let val: usize = (*src.add(i)).load(Ordering::Relaxed);
                    dst.add(i).write(val);
                }
            }
            return result;
        }

        #[cfg(not(target_pointer_width = "16"))]
        {
            // Branch 3: If the alignment of `T` is greater than or equal to u32,
            // we can read it as a chunk of u32 from the first byte.
            if mem::size_of::<usize>() > 4 && mem::align_of::<T>() >= mem::align_of::<AtomicU32>() {
                let src = src as *const AtomicU32;
                let dst = result.as_mut_ptr() as *mut u32;
                for i in range(0..mem::size_of::<T>() / mem::size_of::<u32>()) {
                    // SAFETY:
                    // - the caller must guarantee that `src` is properly aligned for `T`.
                    // - `T` has an alignment greater than or equal to u32.
                    // - the number of remaining bytes is greater than or equal to `size_of::<u32>()`.
                    unsafe {
                        let val: u32 = (*src.add(i)).load(Ordering::Relaxed);
                        dst.add(i).write(val);
                    }
                }
                return result;
            }
        }

        // Branch 4: If the alignment of `T` is greater than or equal to u16,
        // we can read it as a chunk of u16 from the first byte.
        if mem::size_of::<usize>() > 2 && mem::align_of::<T>() >= mem::align_of::<AtomicU16>() {
            let src = src as *const AtomicU16;
            let dst = result.as_mut_ptr() as *mut u16;
            for i in range(0..mem::size_of::<T>() / mem::size_of::<u16>()) {
                // SAFETY:
                // - the caller must guarantee that `src` is properly aligned for `T`.
                // - `T` has an alignment greater than or equal to u16.
                // - the number of remaining bytes is greater than or equal to `size_of::<u16>()`.
                unsafe {
                    let val: u16 = (*src.add(i)).load(Ordering::Relaxed);
                    dst.add(i).write(val);
                }
            }
            return result;
        }

        // Branch 5: Otherwise, we read it per byte.
        let mut state = load::LoadState::new(result.as_mut_ptr(), src);
        state.atomic_load_u8(state.remaining());
        debug_assert_eq!(state.remaining(), 0);
        result
    }

    // Boundary to make the fields of StoreState private.
    //
    // Note that this is not a complete safe/unsafe boundary, since it is still
    // possible to pass an invalid pointer to the constructor.
    mod store {
        use core::mem;

        use crate::atomic::{AtomicU8, AtomicUsize, Ordering};

        // Invariant: `src` and `dst` will never change.
        // Invariant: Only the `advance` method can advance offset and counter.
        pub(super) struct StoreState {
            src: *const u8,
            dst: *const u8,
            /// Number of remaining bytes in `T`.
            remaining: usize,
            offset: usize,
        }

        impl StoreState {
            #[cfg_attr(feature = "inline-always", inline(always))]
            #[cfg_attr(not(feature = "inline-always"), inline)]
            pub(super) fn new<T>(dst: *mut T, src: *const T) -> Self {
                Self {
                    src: src as *const u8,
                    dst: dst as *mut u8 as *const u8,
                    remaining: mem::size_of::<T>(),
                    offset: 0,
                }
            }

            /// Advances pointers by `size` **bytes**.
            ///
            /// # Safety
            ///
            /// - The number of remaining bytes must be greater than or equal to `size`.
            /// - The range `self.dst..self.dst.add(size)` must be filled.
            #[cfg_attr(feature = "inline-always", inline(always))]
            #[cfg_attr(not(feature = "inline-always"), inline)]
            unsafe fn advance(&mut self, size: usize) {
                debug_assert!(self.remaining >= size);
                self.remaining -= size;
                self.offset += size;
            }

            #[cfg_attr(feature = "inline-always", inline(always))]
            #[cfg_attr(not(feature = "inline-always"), inline)]
            pub(super) fn remaining(&self) -> usize {
                self.remaining
            }

            #[cfg_attr(feature = "inline-always", inline(always))]
            #[cfg_attr(not(feature = "inline-always"), inline)]
            unsafe fn src<T>(&self) -> *const T {
                // SAFETY: the caller must uphold the safety contract.
                unsafe { self.src.add(self.offset) as *const T }
            }

            #[cfg_attr(feature = "inline-always", inline(always))]
            #[cfg_attr(not(feature = "inline-always"), inline)]
            unsafe fn dst<T>(&self) -> &T {
                // SAFETY: the caller must uphold the safety contract.
                unsafe { &*(self.dst.add(self.offset) as *const T) }
            }

            #[cfg_attr(feature = "inline-always", inline(always))]
            #[cfg_attr(not(feature = "inline-always"), inline)]
            pub(super) fn atomic_store_u8(&mut self, count: usize) {
                // This condition is also checked by the caller, so the compiler
                // will remove this assertion during optimization.
                assert!(self.remaining() >= count);
                for _ in 0..count {
                    // SAFETY:
                    // - we've checked that the number of remaining bytes is greater than or equal to `count`.
                    // Therefore, due to `StoreState`'s invariant:
                    // - `src` is valid for `count` reads of u8.
                    // - `dst` is valid for `count` atomic writes of u8.
                    unsafe {
                        let val = self.src::<u8>().read();
                        self.dst::<AtomicU8>().store(val, Ordering::Relaxed);
                        // SAFETY: we've filled 1 byte.
                        self.advance(1);
                    }
                }
            }

            /// Note: Any trailing bytes smaller than `usize` are left unwritten.
            ///
            /// # Safety
            ///
            /// - `self.dst` must be properly aligned for `usize`.
            ///
            /// There is no alignment requirement for `self.src`.
            #[cfg_attr(feature = "inline-always", inline(always))]
            #[cfg_attr(not(feature = "inline-always"), inline)]
            pub(super) unsafe fn atomic_store_usize_to_end(&mut self) {
                while self.remaining() >= mem::size_of::<usize>() {
                    // SAFETY:
                    // - the caller must guarantee that `dst` is properly aligned for `usize`.
                    // - we've checked that the number of remaining bytes is greater than
                    //   or equal to `size_of::<usize>()`.
                    // Therefore, due to `StoreState`'s invariant:
                    // - `src` is valid for an *unaligned* read of `usize`.
                    // - `dst` is valid for an atomic write of `usize`.
                    unsafe {
                        let val = self.src::<usize>().read_unaligned();
                        self.dst::<AtomicUsize>().store(val, Ordering::Relaxed);
                        // SAFETY: we've filled `size_of::<usize>()` bytes.
                        self.advance(mem::size_of::<usize>());
                    }
                }
            }
        }
    }

    /// Byte-wise atomic store.
    ///
    /// See the [`atomic_load`] function for the detailed implementation comment.
    ///
    /// # Safety
    ///
    /// See the documentation of [crate root's `atomic_store`](crate::atomic_store) for safety requirements.
    #[cfg_attr(feature = "inline-always", inline(always))]
    #[cfg_attr(not(feature = "inline-always"), inline)]
    pub(crate) unsafe fn atomic_store<T>(dst: *mut T, val: T) {
        // Safety requirements guaranteed by the caller:
        // - `dst` is valid for atomic writes.
        // - `dst` is properly aligned for `T`.
        // - `dst` goes through `UnsafeCell::get`.
        // - `T` does not contain uninitialized bytes.
        // - there are no concurrent non-atomic operations.
        // - there are no concurrent atomic operations of different granularity.
        // - if there are concurrent atomic write operations, `T` is valid for all bit patterns.
        // Note that the safety of the code in this function relies on these guarantees,
        // whether or not they are explicitly mentioned in each safety comment.
        debug_assert!(!dst.is_null());
        debug_assert!(dst as usize % mem::align_of::<T>() == 0);

        // In atomic_store, a panic *after* the first store operation is unsound
        // because `*dst` may be left as an invalid bit pattern.
        //
        // Our code is written very carefully so as not to panic, but we
        // use an additional guard just in case.
        //
        // Note:
        // - If the compiler can understand at compile time that a panic will
        //   never occur, this guard will be removed (as with no-panic).
        // - atomic_load does not modify the data, so it does not have this requirement.
        // - If an invalid ordering is passed, the panic happens *before* the
        //   first store operation, so that is fine.
        let guard = PanicGuard;

        let val = ManuallyDrop::new(val); // Do not drop `val`.

        if mem::size_of::<T>() == 0 {
            mem::forget(guard);
            return;
        }

        // Branch 1: If the alignment of `T` is less than usize, but `T` can be written as
        // at least one usize, compute the align offset and write it
        // like `(&[AtomicU8], &[AtomicUsize], &[AtomicU8])`.
        if mem::align_of::<T>() < mem::align_of::<AtomicUsize>()
            && mem::size_of::<T>() >= mem::size_of::<usize>() * 4
        {
            let mut state = store::StoreState::new(dst, &*val);
            let offset = (dst as *mut u8).align_offset(mem::align_of::<AtomicUsize>());
            // Note: align_offset may return usize::MAX: https://github.com/rust-lang/rust/issues/62420
            if state.remaining() >= offset {
                // Store `offset` bytes per byte to align `state.dst`.
                state.atomic_store_u8(offset);
                debug_assert!(state.remaining() >= mem::size_of::<usize>());
                // SAFETY:
                // - align_offset succeeded and the `offset` bytes have been
                //   filled, so now `state.dst` is definitely aligned.
                // - we've checked that the number of remaining bytes is greater than
                //   or equal to `size_of::<usize>()`.
                //
                // In this branch, the pointer to `state.src` is usually
                // not properly aligned, so we use `atomic_store_usize_to_end`,
                // which has no alignment requirement for `state.src`.
                unsafe {
                    state.atomic_store_usize_to_end();
                }
                // Store the remaining bytes per byte.
                state.atomic_store_u8(state.remaining());
                debug_assert_eq!(state.remaining(), 0);
                mem::forget(guard);
                return;
            }
        }

        // Branch 2: If the alignment of `T` is greater than or equal to usize,
        // we can write it as a chunk of usize from the first byte.
        if mem::align_of::<T>() >= mem::align_of::<AtomicUsize>() {
            let src = &*val as *const T as *const usize;
            let dst = dst as *const AtomicUsize;
            for i in range(0..mem::size_of::<T>() / mem::size_of::<usize>()) {
                // SAFETY:
                // - the caller must guarantee that `dst` is properly aligned for `T`.
                // - `T` has an alignment greater than or equal to usize.
                // - the number of remaining bytes is greater than or equal to `size_of::<usize>()`.
                unsafe {
                    let val: usize = src.add(i).read();
                    (*dst.add(i)).store(val, Ordering::Relaxed);
                }
            }
            mem::forget(guard);
            return;
        }

        #[cfg(not(target_pointer_width = "16"))]
        {
            // Branch 3: If the alignment of `T` is greater than or equal to u32,
            // we can write it as a chunk of u32 from the first byte.
            if mem::size_of::<usize>() > 4 && mem::align_of::<T>() >= mem::align_of::<AtomicU32>() {
                let src = &*val as *const T as *const u32;
                let dst = dst as *const AtomicU32;
                for i in range(0..mem::size_of::<T>() / mem::size_of::<u32>()) {
                    // SAFETY:
                    // - the caller must guarantee that `dst` is properly aligned for `T`.
                    // - `T` has an alignment greater than or equal to u32.
                    // - the number of remaining bytes is greater than or equal to `size_of::<u32>()`.
                    unsafe {
                        let val: u32 = src.add(i).read();
                        (*dst.add(i)).store(val, Ordering::Relaxed);
                    }
                }
                mem::forget(guard);
                return;
            }
        }

        // Branch 4: If the alignment of `T` is greater than or equal to u16,
        // we can write it as a chunk of u16 from the first byte.
        if mem::size_of::<usize>() > 2 && mem::align_of::<T>() >= mem::align_of::<AtomicU16>() {
            let src = &*val as *const T as *const u16;
            let dst = dst as *const AtomicU16;
            for i in range(0..mem::size_of::<T>() / mem::size_of::<u16>()) {
                // SAFETY:
                // - the caller must guarantee that `dst` is properly aligned for `T`.
                // - `T` has an alignment greater than or equal to u16.
                // - the number of remaining bytes is greater than or equal to `size_of::<u16>()`.
                unsafe {
                    let val: u16 = src.add(i).read();
                    (*dst.add(i)).store(val, Ordering::Relaxed);
                }
            }
            mem::forget(guard);
            return;
        }

        // Branch 5: Otherwise, we write it per byte.
        let mut state = store::StoreState::new(dst, &*val);
        state.atomic_store_u8(state.remaining());
        debug_assert_eq!(state.remaining(), 0);
        mem::forget(guard);
    }

    // This allows read_volatile and atomic_load to be lowered to exactly the
    // same assembly on little-endian platforms such as aarch64 and riscv64.
    #[cfg_attr(feature = "inline-always", inline(always))]
    #[cfg_attr(not(feature = "inline-always"), inline)]
    #[cfg(target_endian = "little")]
    fn range<T>(r: Range<T>) -> core::iter::Rev<Range<T>>
    where
        Range<T>: DoubleEndedIterator,
    {
        r.rev()
    }
    #[cfg_attr(feature = "inline-always", inline(always))]
    #[cfg_attr(not(feature = "inline-always"), inline)]
    #[cfg(target_endian = "big")]
    fn range<T>(r: Range<T>) -> Range<T>
    where
        Range<T>: DoubleEndedIterator,
    {
        r
    }

    struct PanicGuard;

    impl Drop for PanicGuard {
        fn drop(&mut self) {
            // This crate supports no-std environments, so we cannot use std::process::abort.
            // Instead, we rely on the fact that a panic while panicking (a double panic) aborts.
            panic!("abort");
        }
    }
}