ring-buffer-macro 0.2.0

A procedural macro for creating ring buffer (circular buffer) data structures at compile time.

# ring-buffer-macro Internals

A deep-dive into how this crate works, why it works, and what all the proc macro
nonsense is actually doing under the hood.

---

## Table of Contents

1. [What Even Is a Proc Macro?](#what-even-is-a-proc-macro)
2. [The Big Picture](#the-big-picture)
3. [Project Structure](#project-structure)
4. [The Entry Point: `src/lib.rs`](#the-entry-point-srclibrs)
5. [Parsing: `src/parser.rs`](#parsing-srcparserrs)
6. [Error Handling: `src/error.rs`](#error-handling-srcerrorrs)
7. [Standard Mode: `src/generator.rs`](#standard-mode-srcgeneratorrs)
8. [SPSC Mode: `src/spsc.rs`](#spsc-mode-srcspscrs)
9. [MPSC Mode: `src/mpsc.rs`](#mpsc-mode-srcmpscrs)
10. [The Macro Expansion Lifecycle](#the-macro-expansion-lifecycle)
11. [Memory Ordering (the hard part)](#memory-ordering-the-hard-part)
12. [Key Design Decisions](#key-design-decisions)

---

## What Even Is a Proc Macro?

Before diving into the code, you need to understand what a procedural macro (proc
macro) actually is, because it's genuinely one of the weirder things in Rust.

### Normal Code vs. Macro Code

When you write normal Rust code, the compiler reads your `.rs` files, parses them
into an AST (Abstract Syntax Tree), type-checks everything, and emits machine code.
Straightforward.

A proc macro is a **Rust program that runs at compile time**. It takes Rust source
code as input (as tokens), does whatever it wants with that input, and produces new
Rust source code as output (also as tokens). The compiler then takes that output and
continues compiling it as if you'd written it by hand.

So when you write:

```rust
#[ring_buffer(5)]
struct MyBuffer(i32);
```

The compiler sees the `#[ring_buffer(5)]` attribute, goes "oh, that's a proc macro",
and hands two things to our macro:

1. **The arguments**: `5` (what's inside the parentheses)
2. **The input**: `struct MyBuffer(i32);` (the item the attribute is attached to)

Our macro then runs, processes these, and returns a completely new chunk of Rust code
that replaces the original. That new code includes the struct (now with different
fields) and a full `impl` block with all the ring buffer methods.

### The Three Crates

Proc macros live in a special kind of crate (`proc-macro = true` in `Cargo.toml`).
This crate depends on three core libraries:

- **`proc_macro`** (shipped with the compiler): Provides the raw `TokenStream` type
  that the compiler hands to your macro. This is the "real" token stream that the
  compiler understands.

- **`syn`**: A parsing library. It takes raw tokens and parses them into a strongly
  typed AST. Instead of dealing with raw tokens like `Ident("struct")`, `Ident("MyBuffer")`,
  `Punct('(')`, etc., `syn` gives you nice Rust types like `DeriveInput` with fields
  like `.ident`, `.data`, `.generics`. It's basically a Rust parser in library form.

- **`quote`**: The inverse of `syn`. Where `syn` goes from tokens to structured data,
  `quote` goes from structured data back to tokens. The `quote!` macro lets you write
  Rust-looking code with interpolation holes (using `#variable_name`) and it produces
  a `TokenStream`.

The flow is: **raw tokens -> syn (parse) -> your logic -> quote (generate) -> raw tokens**.

### Attribute Macro vs. Derive Macro

There are actually multiple kinds of proc macros. The two most common are:

- **Derive macros**: `#[derive(Debug, Clone)]` -- these _add_ code alongside your
  struct. Your struct stays exactly as-is, and the macro adds an `impl Debug for ...`
  block.

- **Attribute macros**: `#[ring_buffer(5)]` -- these _replace_ the item they're
  attached to. The macro gets the entire struct, can modify it however it wants, and
  returns the replacement.

This crate uses an **attribute macro** because we need to _transform_ the struct.
We take a tuple struct `struct MyBuffer(i32)` and replace it with a completely
different named struct with fields like `data`, `head`, `tail`, etc. A derive macro
can't do that -- it can only add code, not modify existing code.

---

## The Big Picture

Here's what happens when someone writes:

```rust
#[ring_buffer(5)]
struct MyBuffer(i32);
```

**Step 1: Parse the arguments**
The `5` gets parsed into a `RingBufferArgs` struct: `{ capacity: 5, mode: Standard, power_of_two: false, cache_padded: false, blocking: false }`.

**Step 2: Parse the input**
The `struct MyBuffer(i32)` gets parsed into `syn::DeriveInput`. This is a structured
representation of the struct -- its name (`MyBuffer`), its visibility (`pub` or not),
its generics (none here), and its fields (one unnamed field of type `i32`).

**Step 3: Extract the element type**
We dig into the parsed struct and pull out `i32` as the element type. This is what
the buffer will store.

**Step 4: Transform the struct**
We replace the tuple struct's single unnamed field with named fields:

```rust
struct MyBuffer {
    data: Vec<i32>,
    capacity: usize,
    head: usize,
    tail: usize,
    size: usize,
}
```

This happens by mutating the `DeriveInput` in place. We literally rip out the old
`Fields::Unnamed` and replace it with `Fields::Named`.

**Step 5: Generate the implementation**
We use `quote!` to generate a full `impl MyBuffer { ... }` block with `new()`,
`enqueue()`, `dequeue()`, and all the other methods. The `#element_type` placeholder
gets replaced with `i32`, `#capacity` with `5`, etc.

**Step 6: Return everything**
We combine the modified struct definition and the generated impl block into one
`TokenStream` and hand it back to the compiler.

The compiler then compiles the output as if you had hand-written a 200+ line struct
with all those methods. The user just sees a 2-line annotation.

---

## Project Structure

```
src/
  lib.rs        -- Entry point. The #[proc_macro_attribute] function lives here.
  parser.rs     -- Parses macro arguments and extracts the element type from the struct.
  error.rs      -- Custom error types that produce nice compile-time error messages.
  generator.rs  -- Generates code for standard (single-threaded) mode.
  spsc.rs       -- Generates code for SPSC (single-producer, single-consumer) mode.
  mpsc.rs       -- Generates code for MPSC (multi-producer, single-consumer) mode.
```

Each mode gets its own file because the generated code is significantly different
between modes. Standard mode uses `Vec<T>` and simple integer indices. SPSC uses
`UnsafeCell<Vec<MaybeUninit<T>>>` and atomic indices. MPSC adds CAS loops and
per-slot `AtomicBool` flags. They share the same parser and error handling, but
the generation logic is mode-specific.

---

## The Entry Point: `src/lib.rs`

```rust
#[proc_macro_attribute]
pub fn ring_buffer(args: TokenStream, input: TokenStream) -> TokenStream {
    let args = parse_macro_input!(args as RingBufferArgs);
    let mut input = parse_macro_input!(input as DeriveInput);

    match expand_ring_buffer(args, &mut input) {
        Ok(tokens) => tokens,
        Err(e) => e.to_compile_error().into(),
    }
}
```

This is the actual macro function. The `#[proc_macro_attribute]` annotation tells the
Rust compiler "this function is an attribute macro". It takes two `TokenStream`
arguments:

- `args`: Everything inside the parentheses, e.g. `5` or `capacity = 1024, mode = "spsc"`.
- `input`: The struct definition the attribute is attached to.

`parse_macro_input!` is a `syn` macro that parses raw tokens into a typed struct. If
parsing fails, it automatically returns a compile error. `RingBufferArgs` is our
custom type (defined in `parser.rs`), and `DeriveInput` is syn's representation of
any item that could have a derive attribute (structs, enums, unions).

The `expand_ring_buffer` function does the real work. It's separated out so we can
use `?` for error handling (the main macro function can't use `?` because its return
type is `TokenStream`, not `Result`).

Inside `expand_ring_buffer`:

```rust
fn expand_ring_buffer(args: RingBufferArgs, input: &mut DeriveInput) -> Result<TokenStream> {
    let element_type = find_element_type(input)?;

    let expanded = match args.mode {
        BufferMode::Standard => { /* ... */ }
        BufferMode::Spsc => { /* ... */ }
        BufferMode::Mpsc => { /* ... */ }
    };

    Ok(expanded.into())
}
```

Each mode follows the same two-step pattern:

1. **`add_*_fields()`** -- Mutates the struct to replace tuple fields with named fields
2. **`generate_*_impl()`** -- Generates the `impl` block with all methods

The `quote! { #input #implementation }` at the end combines both pieces: the modified
struct definition and the generated implementation.

---

## Parsing: `src/parser.rs`

This file handles two things: parsing the macro arguments and extracting the element
type from the struct.

### `RingBufferArgs`

```rust
pub struct RingBufferArgs {
    pub capacity: usize,
    pub mode: BufferMode,
    pub power_of_two: bool,
    pub cache_padded: bool,
    pub blocking: bool,
}
```

This struct holds everything the user configured. The `Parse` trait implementation
handles two syntaxes:

**Simple syntax**: `#[ring_buffer(5)]`

The parser peeks at the first token. If it's a `LitInt` (integer literal), it takes
the simple path: parse the number, set defaults for everything else.

**Named parameter syntax**: `#[ring_buffer(capacity = 1024, mode = "spsc")]`

If the first token is an `Ident` (identifier), it enters a loop that parses
`key = value` pairs separated by commas. Each key is matched against known parameter
names.

The `peek` / `lookahead` pattern is how `syn` does speculative parsing. Instead of
trying to parse and handling errors, you look ahead at what the next token _is_ and
branch on that. This avoids consuming tokens you can't put back.

### `find_element_type`

```rust
pub fn find_element_type(input: &DeriveInput) -> Result<Type> {
    match &input.data {
        Data::Struct(data_struct) => match &data_struct.fields {
            Fields::Unnamed(fields) if fields.unnamed.len() == 1 => {
                Ok(fields.unnamed.first().unwrap().ty.clone())
            }
            // ... error cases
        },
        _ => Err(Error::not_a_struct(input.ident.span())),
    }
}
```

This digs through syn's AST to find the type inside the tuple struct. The path is:

`DeriveInput` -> `Data::Struct` -> `DataStruct.fields` -> `Fields::Unnamed` -> first
field -> `.ty` (the `Type`)

For `struct MyBuffer(i32)`, this returns the `Type` representing `i32`. For
`struct GenericBuffer<T: Clone>(Vec<T>)`, this returns the `Type` representing `Vec<T>`.

The function validates that:
- The item is a struct (not an enum or union)
- The struct uses tuple syntax (not named fields or unit struct)
- There's exactly one field (not zero, not two)

If any of these fail, it returns a descriptive error that shows up as a compile-time
error pointing at the right span (location in the source code).

---

## Error Handling: `src/error.rs`

```rust
pub enum Error {
    NotAStruct(Span),
    NotTupleStruct(Span),
    InvalidTupleStruct(Span),
    Syn(SynError),
}
```

Each variant carries a `Span`, which is a location in the user's source code. When
we convert an error to a compile error via `to_compile_error()`, the error message
points at the right line and column in the user's code, not somewhere in our macro
internals.

For example, if someone writes:

```rust
#[ring_buffer(5)]
enum NotAStruct { A, B }
```

They'll see: `error: ring_buffer can only be applied to structs` pointing at the
`enum` keyword. The span makes this possible.

The `From<SynError>` impl lets us use `?` with syn's parsing functions, which return
`syn::Error`. Those automatically get wrapped in our `Error::Syn` variant.

---

## Standard Mode: `src/generator.rs`

This generates the simplest version of the ring buffer -- single-threaded, no atomics,
no unsafe.

### `add_fields`

This function transforms:

```rust
struct MyBuffer(i32);
```

into:

```rust
struct MyBuffer {
    data: Vec<i32>,
    capacity: usize,
    head: usize,
    tail: usize,
    size: usize,
}
```

It does this by:

1. Creating new `syn::Field` values using `syn::parse_quote!`. This macro lets you
   write Rust syntax that gets parsed into syn types. So `syn::parse_quote! { data: Vec<#element_type> }`
   creates a field named `data` of type `Vec<i32>` (with `#element_type` interpolated).

2. Building a `FieldsNamed` struct and pushing the fields into it.

3. Replacing `data_struct.fields` (which was `Fields::Unnamed`) with `Fields::Named`.

If `cache_padded = true`, the `head` and `tail` fields use a generated cache-padded
wrapper type instead of plain `usize`. This type is `#[repr(C, align(64))]` which
forces 64-byte alignment (a typical cache line size). This prevents false sharing
when head and tail are accessed from different threads (more relevant in the
concurrent modes, but available here too for consistency).
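
The alignment effect is easy to check with a toy wrapper (illustrative type with the same repr attributes; not the macro's actual generated wrapper):

```rust
use std::mem::{align_of, size_of};

// Illustrative wrapper; the generated type uses the same repr attributes.
#[repr(C, align(64))]
struct CachePadded(usize);

fn main() {
    // The wrapper is padded out to a full 64-byte cache line,
    // so two CachePadded fields can never share a line.
    assert_eq!(align_of::<CachePadded>(), 64);
    assert_eq!(size_of::<CachePadded>(), 64);
    // A bare usize is far smaller, so adjacent head/tail fields
    // would land on the same cache line and false-share.
    assert!(size_of::<usize>() < 64);
}
```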

### `generate_impl`

This is where the bulk of the generated code lives. The function builds an entire
`impl` block using `quote!`.

Key things happening:

**Index calculation**: The `next_head` and `next_tail` expressions compute the next
index with wraparound. Normally this is `(index + 1) % capacity`. If `power_of_two`
is enabled, it's `(index + 1) & mask` where `mask = capacity - 1`. The bitwise AND
is faster than modulo on most CPUs because modulo requires a division instruction.
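
The equivalence only holds when the capacity is a power of two, which is quick to verify (standalone check, not the generated code):

```rust
// Wraparound via mask; valid only when capacity is a power of two.
fn next_masked(index: usize, capacity: usize) -> usize {
    (index + 1) & (capacity - 1)
}

fn main() {
    // For a power-of-two capacity, mask and modulo agree at every index...
    for i in 0..32 {
        assert_eq!(next_masked(i, 8), (i + 1) % 8);
    }
    // ...but for capacity 5 at index 5: (5 + 1) & 4 = 4, while (5 + 1) % 5 = 1.
    assert_ne!(next_masked(5, 5), (5 + 1) % 5);
}
```

This is why the macro validates the capacity when `power_of_two` is requested: the mask trick silently produces wrong indices for any other capacity.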

**`enqueue`**: Checks if full, writes to the tail slot, advances tail with wraparound,
increments size. If the backing `Vec` hasn't been filled to this index yet, it uses
`push()`; otherwise it overwrites with direct indexing. This is because the `Vec`
starts empty and grows lazily.

**`dequeue`**: Checks if empty, clones the item at head (this is why standard mode
requires `Clone`), advances head with wraparound, decrements size. The `where T: Clone`
bound is placed on the method, not the struct, so you can create a buffer of
non-Clone types -- you just can't dequeue from it (which is enforced at compile time).
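
The method-level bound is visible on a simplified stand-in type (hypothetical `Buf`, not the generated struct, but the bound placement is the same):

```rust
// Hypothetical stand-in type to show where the bound lives.
struct Buf<T> {
    items: Vec<T>,
}

impl<T> Buf<T> {
    fn enqueue(&mut self, item: T) {
        self.items.push(item);
    }

    // The Clone bound is on the method, not on `struct Buf<T>`.
    fn dequeue(&mut self) -> Option<T>
    where
        T: Clone,
    {
        if self.items.is_empty() {
            return None;
        }
        let item = self.items[0].clone(); // clone out, like standard mode
        self.items.remove(0);
        Some(item)
    }
}

struct NotClone;

fn main() {
    // A buffer of a non-Clone type compiles and can enqueue...
    let mut b = Buf { items: Vec::new() };
    b.enqueue(NotClone);
    // ...but calling `b.dequeue()` here would fail to compile:
    // "the trait bound `NotClone: Clone` is not satisfied".

    let mut nums = Buf { items: Vec::new() };
    nums.enqueue(5);
    assert_eq!(nums.dequeue(), Some(5));
}
```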

**`drain`**: This creates a separate struct (`MyBufferDrain`) that acts as an iterator.
It holds a raw pointer to the buffer and calls `dequeue()` on each `next()`. The
`Drop` impl ensures remaining items are drained even if the iterator is dropped early.
The raw pointer is necessary because you can't hold a `&mut` reference to the buffer
while also having the drain struct (which borrows the buffer) -- Rust's borrow checker
doesn't allow it. This is a common pattern in std (see `Vec::drain()`).
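
Here's a minimal sketch of that pattern on a hypothetical two-method buffer (the generated drain has the same shape but walks the ring indices instead of a plain `Vec`):

```rust
// Hypothetical simplified buffer standing in for the generated struct.
struct Buf {
    items: Vec<i32>,
}

impl Buf {
    fn dequeue(&mut self) -> Option<i32> {
        if self.items.is_empty() { None } else { Some(self.items.remove(0)) }
    }

    fn drain(&mut self) -> BufDrain {
        BufDrain { buffer: self as *mut Buf }
    }
}

struct BufDrain {
    // The raw pointer sidesteps the borrow checker: the drain struct
    // logically borrows the buffer mutably, but without a lifetime tie.
    buffer: *mut Buf,
}

impl Iterator for BufDrain {
    type Item = i32;
    fn next(&mut self) -> Option<i32> {
        unsafe { (*self.buffer).dequeue() }
    }
}

impl Drop for BufDrain {
    fn drop(&mut self) {
        // Consume anything left, even if iteration stopped early.
        while self.next().is_some() {}
    }
}

fn main() {
    let mut buf = Buf { items: vec![1, 2, 3] };
    let drained: Vec<i32> = buf.drain().collect();
    assert_eq!(drained, vec![1, 2, 3]);
    assert!(buf.items.is_empty());
}
```

The safety obligation (don't touch the buffer while a drain is live) falls on the generated code's structure rather than the type system, which is exactly the trade `Vec::drain` makes internally.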

**Visibility propagation**: The `#vis` interpolation ensures generated methods and
types inherit the same visibility as the original struct. If you write
`pub struct MyBuffer(i32)`, the generated `new()`, `enqueue()`, etc. are all `pub` too.

**Generics propagation**: `#impl_generics`, `#ty_generics`, and `#where_clause` are
produced by `generics.split_for_impl()`. This is syn's standard way of handling
generics in generated code. For a struct like `GenericBuffer<T: Clone>`, these expand
to:

- `impl_generics`: `<T: Clone>` (goes after `impl`)
- `ty_generics`: `<T>` (goes after the struct name)
- `where_clause`: empty here (but would hold additional `where` bounds)

---

## SPSC Mode: `src/spsc.rs`

SPSC (Single-Producer, Single-Consumer) mode generates a lock-free ring buffer that
can be safely used from two threads: one writing, one reading.

### How It's Different From Standard Mode

The fundamental difference is **no locks, no mutexes**. Standard mode uses `&mut self`
for enqueue/dequeue, which means only one thread can touch the buffer at a time. SPSC
mode uses shared references (`&self`) and atomic operations to allow concurrent access.

The generated struct looks like:

```rust
struct SpscBuffer {
    data: UnsafeCell<Vec<MaybeUninit<i32>>>,
    head: AtomicUsize,        // only the consumer modifies this
    tail: AtomicUsize,        // only the producer modifies this
    _marker: PhantomData<i32>,
}
```

Key differences from standard mode:

- **`UnsafeCell`**: This is Rust's escape hatch for interior mutability. Normally,
  you can't mutate data through a `&` reference. `UnsafeCell` says "I know what I'm
  doing, let me mutate through a shared reference." This is safe here because the
  producer only writes to slots the consumer isn't reading, and vice versa.

- **`MaybeUninit<T>`**: Instead of cloning items out like standard mode, SPSC mode
  _moves_ items. A slot contains `MaybeUninit<T>` -- it might have a valid value or
  it might be garbage. When the producer writes, it puts a real value in with
  `MaybeUninit::new()`. When the consumer reads, it takes the value out with
  `assume_init_read()`. No clone needed, which means `T` doesn't need `Clone` --
  it just needs `Send` (can be transferred between threads).

- **`AtomicUsize`**: The head and tail indices are atomic, meaning they can be read
  and written from different threads without data races. The specific memory orderings
  used (Acquire/Release) ensure that when the consumer sees a new tail value, all the
  data the producer wrote to that slot is actually visible.

- **`PhantomData<T>`**: The struct doesn't directly own a `T` (it owns
  `UnsafeCell<Vec<MaybeUninit<T>>>`), so the compiler doesn't know it logically owns
  `T` values. `PhantomData` tells the compiler "pretend this struct owns `T`" so that
  the drop checker and variance rules work correctly.
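
Putting those pieces together, a hand-written miniature of the SPSC core looks roughly like this. It's a sketch, not the generated code: the element type is fixed to `i32`, and it uses the common one-slot-left-empty convention to tell full from empty, which may differ in detail from what the macro emits:

```rust
use std::cell::UnsafeCell;
use std::mem::MaybeUninit;
use std::sync::atomic::{AtomicUsize, Ordering};

struct Spsc {
    data: UnsafeCell<Vec<MaybeUninit<i32>>>,
    head: AtomicUsize, // written only by the consumer
    tail: AtomicUsize, // written only by the producer
}

// Promise the compiler that the access protocol makes sharing safe.
unsafe impl Sync for Spsc {}

impl Spsc {
    fn new(capacity: usize) -> Self {
        let mut v = Vec::with_capacity(capacity);
        v.resize_with(capacity, MaybeUninit::uninit);
        Spsc {
            data: UnsafeCell::new(v),
            head: AtomicUsize::new(0),
            tail: AtomicUsize::new(0),
        }
    }

    // Producer side only.
    fn try_enqueue(&self, item: i32) -> Result<(), i32> {
        let cap = unsafe { (*self.data.get()).len() };
        let tail = self.tail.load(Ordering::Relaxed); // sole writer of tail
        let next = (tail + 1) % cap;
        if next == self.head.load(Ordering::Acquire) {
            return Err(item); // full (one slot deliberately kept empty)
        }
        unsafe { (*self.data.get())[tail] = MaybeUninit::new(item) };
        self.tail.store(next, Ordering::Release); // publish the slot
        Ok(())
    }

    // Consumer side only.
    fn try_dequeue(&self) -> Option<i32> {
        let cap = unsafe { (*self.data.get()).len() };
        let head = self.head.load(Ordering::Relaxed); // sole writer of head
        if head == self.tail.load(Ordering::Acquire) {
            return None; // empty
        }
        let item = unsafe { (*self.data.get())[head].assume_init_read() };
        self.head.store((head + 1) % cap, Ordering::Release); // free the slot
        Some(item)
    }
}

fn main() {
    let buf = Spsc::new(4);
    assert!(buf.try_enqueue(1).is_ok());
    assert!(buf.try_enqueue(2).is_ok());
    assert_eq!(buf.try_dequeue(), Some(1));
    assert_eq!(buf.try_dequeue(), Some(2));
    assert_eq!(buf.try_dequeue(), None);
}
```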

### The Split Pattern

Instead of directly calling `enqueue`/`dequeue` on the buffer, SPSC mode uses a
split pattern:

```rust
let buf = SpscBuffer::new();
let (producer, consumer) = buf.split();
```

`split()` returns two lightweight handle structs: `SpscBufferProducer` and
`SpscBufferConsumer`. Each holds a `&SpscBuffer` reference. The producer can only
enqueue; the consumer can only dequeue. This separation is enforced at the type level.

Why not just have `enqueue` and `dequeue` on the buffer directly? Because then
there's nothing stopping you from calling both from the same thread, or having two
threads both enqueue. The split pattern makes it structurally impossible to misuse --
you physically can't call `try_dequeue` on a producer.

### The `unsafe impl Send/Sync` Block

```rust
unsafe impl Send for SpscBuffer where i32: Send {}
unsafe impl Sync for SpscBuffer where i32: Send {}
unsafe impl<'a> Send for SpscBufferProducer<'a> where i32: Send {}
unsafe impl<'a> Send for SpscBufferConsumer<'a> where i32: Send {}
```

The compiler can't automatically verify that our `UnsafeCell`-based concurrent access
is safe, so we manually promise it with `unsafe impl`. The `where T: Send` bound
ensures we only do this for types that are safe to transfer between threads (which
excludes things like `Rc<T>`).

`Sync` on the buffer means it can be shared between threads via `&` references (which
is necessary for `Arc<SpscBuffer>` to work). `Send` on the handles means they can be
moved to other threads.

### Blocking Mode

When `blocking = true`, three extra fields are added:

```rust
mutex: Mutex<()>,
not_empty: Condvar,
not_full: Condvar,
```

The `Mutex` doesn't actually protect any data (it guards `()`, nothing). It exists
purely because `Condvar::wait()` requires a `MutexGuard`. The condition variables
provide efficient waiting:

- `enqueue_blocking()`: If the buffer is full, the producer locks the mutex, calls
  `not_full.wait()`, and goes to sleep. The OS wakes it up when a consumer calls
  `not_full.notify_one()` after dequeueing an item.

- `dequeue_blocking()`: Same idea in reverse. If the buffer is empty, the consumer
  waits on `not_empty` until a producer notifies it.

This is much better than busy-spinning (which wastes CPU) but has higher latency
than pure lock-free (because waking a thread involves a syscall).
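
A stripped-down version of that handshake, with an atomic counter standing in for the buffer's occupancy (hypothetical names, not the generated API):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

// Hypothetical stand-in: `size` plays the role of the buffer's occupancy.
struct Signal {
    size: AtomicUsize,
    mutex: Mutex<()>, // guards no data; exists only for the Condvar
    not_empty: Condvar,
}

// Consumer side: sleep until there is something to take.
fn dequeue_blocking(s: &Signal) -> usize {
    let mut guard = s.mutex.lock().unwrap();
    // Re-check in a loop: Condvar wakeups can be spurious.
    while s.size.load(Ordering::Acquire) == 0 {
        guard = s.not_empty.wait(guard).unwrap();
    }
    s.size.fetch_sub(1, Ordering::AcqRel) // returns the previous value
}

// Producer side: publish the item, then wake one sleeping consumer.
fn enqueue_blocking(s: &Signal) {
    s.size.fetch_add(1, Ordering::Release);
    let _guard = s.mutex.lock().unwrap(); // pair with the waiter's lock
    s.not_empty.notify_one();
}

fn main() {
    let s = Arc::new(Signal {
        size: AtomicUsize::new(0),
        mutex: Mutex::new(()),
        not_empty: Condvar::new(),
    });
    let consumer = {
        let s = Arc::clone(&s);
        thread::spawn(move || dequeue_blocking(&s))
    };
    enqueue_blocking(&s);
    assert_eq!(consumer.join().unwrap(), 1); // saw exactly the one item
}
```

Note the producer takes the (empty) lock before notifying: checking the condition and sleeping both happen under the lock on the consumer side, so this ordering is what rules out a lost wakeup.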

---

## MPSC Mode: `src/mpsc.rs`

MPSC (Multi-Producer, Single-Consumer) adds support for multiple threads enqueuing
concurrently. This is significantly more complex than SPSC.

### The Problem

In SPSC, only one thread writes the tail index, so a simple atomic store is enough.
In MPSC, multiple threads are competing to write to the next slot. If Thread A and
Thread B both read `tail = 5` and both try to write to slot 5, you get a data race.

### The Solution: CAS (Compare-And-Swap)

The core of MPSC is a CAS loop in `try_enqueue`:

```rust
loop {
    let tail = self.buffer.tail.load(Ordering::Relaxed);
    let next_tail = (tail + 1) % capacity;
    let head = self.buffer.head.load(Ordering::Acquire);

    if next_tail == head { return Err(item); }  // Full

    match self.buffer.tail.compare_exchange_weak(
        tail,
        next_tail,
        Ordering::AcqRel,  // success: see prior producers' data, publish ours
        Ordering::Relaxed, // failure: we just retry
    ) {
        Ok(_) => {
            // We won! Write our data to slot `tail`
            data[tail] = MaybeUninit::new(item);
            written[tail].store(true, Ordering::Release);
            return Ok(());
        }
        Err(_) => continue,  // Someone else got there first, retry
    }
}
```

`compare_exchange_weak` is the atomic CAS operation. It says: "If `tail` is still
the value I read earlier, atomically change it to `next_tail`. If someone else already
changed it, fail and tell me the new value."

So if Thread A and Thread B both read `tail = 5`:
- Thread A does CAS(5 -> 6) -- succeeds, writes to slot 5
- Thread B does CAS(5 -> 6) -- fails (tail is now 6), loops back
- Thread B reads `tail = 6`, does CAS(6 -> 7) -- succeeds, writes to slot 6

This guarantees that every producer gets a unique slot to write to.
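
That uniqueness guarantee can be demonstrated in isolation (standalone demo of the CAS loop; no buffer, no wraparound):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Claim the next slot index via CAS. The point is only that concurrent
// callers never receive the same index.
fn claim_slot(tail: &AtomicUsize) -> usize {
    loop {
        let current = tail.load(Ordering::Relaxed);
        match tail.compare_exchange_weak(
            current,
            current + 1,
            Ordering::AcqRel,
            Ordering::Relaxed,
        ) {
            Ok(_) => return current,
            Err(_) => continue, // another producer advanced tail first
        }
    }
}

fn main() {
    let tail = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let tail = Arc::clone(&tail);
            thread::spawn(move || (0..100).map(|_| claim_slot(&tail)).collect::<Vec<_>>())
        })
        .collect();

    let mut slots: Vec<usize> = handles
        .into_iter()
        .flat_map(|h| h.join().unwrap())
        .collect();
    slots.sort();
    // 400 total claims across 4 threads -> indices 0..400, each exactly once.
    assert_eq!(slots, (0..400).collect::<Vec<_>>());
}
```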

### The Written Flags

There's a subtle problem: after a producer wins the CAS, there's a gap between
"I claimed slot 5" and "I finished writing data to slot 5". If the consumer sees
`tail = 6` and tries to read slot 5, the producer might not have finished writing yet.

The `written` array (`Box<[AtomicBool]>`) solves this. Each slot has a boolean flag:

1. Producer claims slot via CAS on tail
2. Producer writes data to the slot
3. Producer sets `written[slot] = true` (with Release ordering)
4. Consumer checks `written[slot]` before reading (with Acquire ordering)
5. Consumer reads data
6. Consumer sets `written[slot] = false`

This ensures the consumer never reads partially-written data.

### Producer vs Consumer Handles

Unlike SPSC's `split()` which returns both handles at once, MPSC has separate methods:

```rust
let producer = buffer.producer();  // Can call this multiple times / clone
let consumer = buffer.consumer();  // Should only have one
```

The producer handle is `#[derive(Clone)]` because you need multiple producers. The
consumer handle is not clonable because the protocol only supports one consumer (the
consumer doesn't use CAS for the head pointer, so two consumers would race).

Note: there's no runtime enforcement of "only one consumer". It's documented but
not prevented at the type level. If you call `consumer()` twice and use both from
different threads, you'll get data races. This is a known limitation.

---

## The Macro Expansion Lifecycle

To make this concrete, here's exactly what `#[ring_buffer(5)] struct Buf(i32);`
expands to in standard mode (simplified):

```rust
// The struct (transformed from tuple to named):
struct Buf {
    data: Vec<i32>,
    capacity: usize,
    head: usize,
    tail: usize,
    size: usize,
}

// The generated impl:
impl Buf {
    fn new() -> Self {
        Self {
            data: Vec::with_capacity(5),
            capacity: 5,
            head: 0,
            tail: 0,
            size: 0,
        }
    }

    fn enqueue(&mut self, item: i32) -> Result<(), i32> {
        if self.is_full() { return Err(item); }
        let tail = self.tail;
        if self.data.len() <= tail {
            self.data.push(item);
        } else {
            self.data[tail] = item;
        }
        self.tail = (self.tail + 1) % self.capacity;
        self.size += 1;
        Ok(())
    }

    fn dequeue(&mut self) -> Option<i32> where i32: Clone {
        if self.is_empty() { return None; }
        let head = self.head;
        let item = self.data[head].clone();
        self.head = (self.head + 1) % self.capacity;
        self.size -= 1;
        Some(item)
    }

    fn peek(&self) -> Option<&i32> { /* ... */ }
    fn peek_mut(&mut self) -> Option<&mut i32> { /* ... */ }
    fn peek_back(&self) -> Option<&i32> { /* ... */ }
    fn is_full(&self) -> bool { self.size == self.capacity }
    fn is_empty(&self) -> bool { self.size == 0 }
    fn len(&self) -> usize { self.size }
    fn capacity(&self) -> usize { self.capacity }
    fn clear(&mut self) { /* ... */ }
    fn iter(&self) -> impl Iterator<Item = &i32> { /* ... */ }
    fn drain(&mut self) -> BufDrain { /* ... */ }
}

struct BufDrain { buffer: *mut Buf }
impl Iterator for BufDrain { /* ... */ }
impl Drop for BufDrain { /* ... */ }
```

You write 2 lines. The macro generates ~100 lines. That's the point.

---

## Memory Ordering (the hard part)

If you're not familiar with atomics, this section will hurt. But it's the most
important part of understanding the concurrent modes.

### Why Memory Ordering Matters

CPUs don't execute instructions in order. They reorder reads and writes for
performance. On a single thread, this is invisible -- the CPU guarantees the
_illusion_ of sequential execution. But across threads, reordering is visible
and causes bugs.

Example without ordering guarantees:

```
Thread A (producer):              Thread B (consumer):
  data[5] = 42;                     if tail == 6 {
  tail = 6;                           x = data[5];  // might see garbage!
                                    }
```

The CPU might reorder Thread A's operations, writing `tail = 6` _before_ `data[5] = 42`.
Thread B sees the new tail, reads slot 5, and gets garbage.

### Acquire and Release

This crate primarily uses **Acquire** and **Release** ordering:

- **Release** (on stores): "All writes I did before this store are visible to anyone
  who does an Acquire load of this value."

- **Acquire** (on loads): "I can see all writes that happened before the Release store
  that wrote the value I just read."

Together, they form a _happens-before_ relationship. In the SPSC code:

```rust
// Producer:
data[tail] = MaybeUninit::new(item);           // Write data
self.buffer.tail.store(next_tail, Release);     // Release: data is visible

// Consumer:
let tail = self.buffer.tail.load(Acquire);      // Acquire: sees the data
let item = data[head].assume_init_read();        // Read data (safe now)
```

The Release store on tail _synchronizes with_ the Acquire load on tail. Everything
the producer wrote before the Release (including the data) is guaranteed to be visible
to the consumer after the Acquire.
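
The guarantee is observable in a tiny message-passing program: the payload itself is written with a plain Relaxed store, and only the Release/Acquire pair on the flag makes reading it safe (standalone demo, not the generated code):

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::thread;

static DATA: AtomicUsize = AtomicUsize::new(0);
static READY: AtomicBool = AtomicBool::new(false);

fn main() {
    let producer = thread::spawn(|| {
        DATA.store(42, Ordering::Relaxed);    // write the payload
        READY.store(true, Ordering::Release); // publish: everything above is now visible
    });
    let consumer = thread::spawn(|| {
        // Spin until the Acquire load observes the Release store...
        while !READY.load(Ordering::Acquire) {
            std::hint::spin_loop();
        }
        // ...at which point this read is guaranteed to see 42.
        DATA.load(Ordering::Relaxed)
    });
    producer.join().unwrap();
    assert_eq!(consumer.join().unwrap(), 42);
}
```

With both orderings weakened to Relaxed, the assert could legitimately fail on a weakly-ordered CPU (e.g. ARM); the happens-before edge is what forbids that.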

### Relaxed

**Relaxed** ordering means "just do the atomic operation, no ordering guarantees."
It's the cheapest ordering. We use it when:

- A thread is reading its own index (the producer reads `tail` with Relaxed because
  it's the only writer of `tail`)
- Ordering doesn't matter for correctness (e.g., approximate `len()` calculations)

### AcqRel (in MPSC)

`compare_exchange_weak` in the MPSC producer uses `AcqRel` (Acquire + Release
combined). This is because the CAS both reads the old tail value (needs Acquire to see
the previous producer's data) and writes the new tail value (needs Release so the
consumer can see our data).

---

## Key Design Decisions

### Why a tuple struct as input?

The macro requires `struct Buffer(i32)`, not `struct Buffer { element_type: i32 }`.
This is purely ergonomic. The tuple struct syntax `(T)` is a concise way to specify
"this buffer holds T". The single field is immediately extracted and thrown away -- the
actual struct gets completely different fields. The tuple struct is just a vehicle for
specifying the element type and the struct name.

### Why `Vec<T>` in standard mode instead of an array?

Arrays in Rust need a const generic size: `[T; N]`. While this would work fine with
our known-at-compile-time capacity, `Vec` is simpler to generate and avoids
complications with uninitialized memory. The `Vec` is pre-allocated to full capacity
on creation (`Vec::with_capacity(N)`) so there's no reallocation. The trade-off is
one heap allocation instead of inline storage, but for a runtime-created buffer
this is fine.
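
The no-reallocation claim is checkable: filling a pre-allocated `Vec` up to its capacity leaves the backing storage at the same address (standalone check):

```rust
fn main() {
    let mut v: Vec<i32> = Vec::with_capacity(5);
    let ptr = v.as_ptr();
    for i in 0..5 {
        v.push(i);
    }
    // Pushing up to the pre-allocated capacity never reallocates,
    // so the buffer address is stable.
    assert_eq!(v.as_ptr(), ptr);
    assert!(v.capacity() >= 5); // with_capacity guarantees *at least* 5
}
```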

### Why `MaybeUninit<T>` in concurrent modes?

Standard mode uses `Vec<T>` and clones items on dequeue. This requires `T: Clone`.
Concurrent modes use `MaybeUninit<T>` and _move_ items (write with
`MaybeUninit::new()`, read with `assume_init_read()`). This drops the `Clone`
requirement to just `Send`, which is important for performance-critical concurrent
code where cloning in a hot loop is unacceptable.

### Why `UnsafeCell` instead of `Mutex`?

The whole point of lock-free data structures is to avoid mutex overhead. A `Mutex`
would make the implementation trivially safe but would serialize all access through
a single lock. `UnsafeCell` lets us do concurrent access without locks, at the cost
of us being responsible for correctness (via atomic orderings and careful protocol
design).

### Why no runtime enforcement of "one consumer" in MPSC?

The split pattern in SPSC structurally prevents misuse (you get one producer, one
consumer, that's it). MPSC doesn't do this for the consumer -- you can call
`consumer()` twice. Adding runtime enforcement (e.g., an `AtomicBool` flag) would
add overhead to a path that's supposed to be zero-cost. The choice was to document
the constraint rather than enforce it.

### Why not use `crossbeam` or `std::sync::mpsc`?

Because the point of this crate is compile-time code generation with zero dependencies
(beyond the proc-macro tooling). The generated code is self-contained -- it doesn't
pull in any runtime library. This makes it suitable for environments where you want
minimal dependency trees or need to understand exactly what code is running.

---

That's the full picture. The crate is fundamentally a code generator: it takes a
two-line struct annotation and emits a complete, specialized ring buffer implementation.
The proc macro machinery (`syn`/`quote`) is the plumbing; the actual ring buffer
algorithms (standard FIFO, lock-free SPSC, CAS-based MPSC) are the substance.