memx 0.2.0

memory functions like libc memcmp(), memchr(), memmem(), memcpy(), memset()
# Design of the memx Library

## 1. Overview

The `memx` library is a `no_std`-compatible Rust library for memory manipulation. It provides functions modeled on the C standard library's memory functions (`memcmp`, `memchr`, `memmem`, `memcpy`, `memset`), with a focus on safety, performance, and ease of use from Rust. It is intended as a replacement for basic memory operations, with optimized implementations on supported architectures.

## 2. Architecture

The library is structured into three main modules:

- **`mem`**: Contains the core, platform-agnostic implementations of the memory functions. These implementations are written in safe Rust and serve as a fallback for architectures that do not have specialized, optimized versions.
- **`arch`**: Provides architecture-specific, highly optimized implementations of the memory functions. Currently, it includes optimizations for `x86` and `x86_64` using SSE2 and AVX2 intrinsics.
- **`iter`**: Contains iterator-based versions of the search functions, allowing for more idiomatic and flexible usage in Rust.

The top-level `lib.rs` file serves as the main entry point, dispatching calls to the appropriate implementation (either `arch` or `mem`) based on the target architecture and available CPU features.
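The layering described above can be pictured with a small sketch. It assumes compile-time selection through `cfg` attributes, and the module bodies are placeholders: the names mirror the modules listed above, but the code is illustrative and is not taken from the `memx` sources.

```rust
// Hypothetical layout: names mirror the modules above, bodies are placeholders.
mod mem {
    /// Portable fallback: first index of `needle` in `buf`.
    pub fn memchr(buf: &[u8], needle: u8) -> Option<usize> {
        buf.iter().position(|&b| b == needle)
    }
}

mod arch {
    /// Stand-in for an optimized routine; a real one would use SIMD intrinsics.
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    pub fn memchr(buf: &[u8], needle: u8) -> Option<usize> {
        buf.iter().position(|&b| b == needle)
    }
}

/// Entry point on x86/x86_64: forwards to the `arch` implementation.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn memchr(buf: &[u8], needle: u8) -> Option<usize> {
    arch::memchr(buf, needle)
}

/// Entry point everywhere else: forwards to the portable `mem` implementation.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn memchr(buf: &[u8], needle: u8) -> Option<usize> {
    mem::memchr(buf, needle)
}
```

Callers only ever see the top-level `memchr`; which module answers the call is decided per target.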

## 3. Core Components

### 3.1. Search Functions

The search functions are the heart of the library, providing a rich set of tools for finding bytes and sub-sequences within a byte slice. The design of these functions follows a consistent pattern:

- **Single-byte search (`memchr`, `memrchr`)**: These functions are the simplest, finding the first or last occurrence of a single byte.
- **Multi-byte search (`memchr_dbl`, `memchr_tpl`, `memchr_qpl`)**: These functions extend the single-byte search to a set of 2, 3, or 4 bytes. They are implemented using a combination of bitwise operations and SIMD instructions for efficient searching.
- **Negated search (`memnechr`, `memrnechr`)**: These functions find the first or last byte that does *not* match the given byte. They are implemented by inverting the comparison used by the corresponding `memchr` functions.
- **Sub-slice search (`memmem`, `memrmem`)**: These functions find the first or last occurrence of a sub-slice (the needle) within a larger byte slice (the haystack). The implementation uses a stochastic naive algorithm: it chooses the search direction from the statistical properties of the needle's first and last bytes.
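To make the ideas in this list concrete, here is a minimal sketch of three of them: a single-byte search that tests 8 bytes per step with bitwise operations, a negated search with the comparison inverted, and a naive sub-slice search that anchors on whichever end byte of the needle is assumed to be rarer. All function names, the `assumed_rank` table, and the choice to return `Some(0)` for an empty needle are assumptions made for illustration; the actual `memx` heuristics may differ.

```rust
use std::convert::TryInto;

const LO: u64 = 0x0101_0101_0101_0101; // 0x01 in every byte lane
const HI: u64 = 0x8080_8080_8080_8080; // 0x80 in every byte lane

/// True if at least one byte lane of `word` is zero.
fn has_zero_byte(word: u64) -> bool {
    (word.wrapping_sub(LO) & !word & HI) != 0
}

/// First index of `needle`, testing 8 bytes per step with bitwise operations.
pub fn memchr_swar(buf: &[u8], needle: u8) -> Option<usize> {
    let pattern = LO * needle as u64; // `needle` repeated in every lane
    let mut offset = 0;
    for chunk in buf.chunks_exact(8) {
        let word = u64::from_ne_bytes(chunk.try_into().unwrap());
        // XOR turns every lane equal to `needle` into zero.
        if has_zero_byte(word ^ pattern) {
            break;
        }
        offset += 8;
    }
    // Pinpoint the match (or scan the tail) byte by byte.
    buf[offset..].iter().position(|&b| b == needle).map(|i| i + offset)
}

/// Negated search: the same scan with the comparison inverted.
pub fn memnechr_sketch(buf: &[u8], needle: u8) -> Option<usize> {
    buf.iter().position(|&b| b != needle)
}

/// Hypothetical frequency table: a higher rank means "assumed more common".
fn assumed_rank(b: u8) -> u8 {
    match b {
        b' ' | b'e' | b't' | b'a' | b'o' => 3,
        b'a'..=b'z' => 2,
        b'A'..=b'Z' | b'0'..=b'9' => 1,
        _ => 0,
    }
}

/// Naive sub-slice search that first checks the needle's rarer end byte.
pub fn memmem_sketch(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.is_empty() {
        return Some(0); // assumption: an empty needle matches at index 0
    }
    if needle.len() > haystack.len() {
        return None;
    }
    let last = needle.len() - 1;
    // Anchor on the end byte that is assumed to occur less often.
    let anchor = if assumed_rank(needle[0]) <= assumed_rank(needle[last]) { 0 } else { last };
    let max_start = haystack.len() - needle.len();
    (0..=max_start).find(|&start| {
        haystack[start + anchor] == needle[anchor]
            && &haystack[start..start + needle.len()] == needle
    })
}
```

Checking one rare byte before the full comparison rejects most candidate positions cheaply, which is why a naive algorithm can perform well on typical input.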

### 3.2. Comparison Functions

- **`memcmp`**: This function performs a lexicographical comparison of two byte slices. The implementation is optimized to compare bytes in chunks (e.g., 8 or 16 bytes at a time) for improved performance.
- **`memeq`**: This function checks for equality between two byte slices. It is a specialized version of `memcmp` that returns a boolean value.
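A chunked comparison can be sketched as follows, assuming 8-byte chunks and a return type of `Ordering`. The function names are hypothetical and this is not the `memx` implementation. Loading each chunk as a big-endian integer keeps the first byte most significant, so integer order equals lexicographic byte order.

```rust
use std::cmp::Ordering;
use std::convert::TryInto;

/// Lexicographic comparison that inspects 8 bytes per step.
pub fn memcmp_sketch(a: &[u8], b: &[u8]) -> Ordering {
    let common = a.len().min(b.len());
    let split = common - common % 8; // end of the last full 8-byte chunk
    for (x, y) in a[..split].chunks_exact(8).zip(b[..split].chunks_exact(8)) {
        let x = u64::from_be_bytes(x.try_into().unwrap());
        let y = u64::from_be_bytes(y.try_into().unwrap());
        if x != y {
            return x.cmp(&y);
        }
    }
    // Compare the leftover bytes; if they tie, the shorter slice sorts first.
    a[split..common]
        .cmp(&b[split..common])
        .then(a.len().cmp(&b.len()))
}

/// Equality check expressed through the comparison above.
pub fn memeq_sketch(a: &[u8], b: &[u8]) -> bool {
    a.len() == b.len() && memcmp_sketch(a, b) == Ordering::Equal
}
```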

### 3.3. Manipulation Functions

- **`memcpy`**: This function copies bytes from a source to a destination slice. The implementation is optimized for performance by copying data in large chunks and ensuring proper memory alignment.
- **`memset`**: This function fills a byte slice with a given byte. Like `memcpy`, it is optimized to operate on large chunks of memory at a time.
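The chunked approach can be illustrated in safe Rust. This sketch assumes 16-byte chunks and leaves alignment handling out entirely; the function names are hypothetical. The copy helper expects the caller to have checked the lengths already and panics otherwise.

```rust
/// Fill `buf` with `byte`, 16 bytes per step, then finish the tail.
pub fn memset_sketch(buf: &mut [u8], byte: u8) {
    let pattern = [byte; 16];
    let mut chunks = buf.chunks_exact_mut(16);
    for chunk in chunks.by_ref() {
        chunk.copy_from_slice(&pattern);
    }
    // Fewer than 16 bytes remain; fill them one at a time.
    for b in chunks.into_remainder() {
        *b = byte;
    }
}

/// Copy `src` into the front of `dst`, 16 bytes per step.
/// Precondition (not checked gracefully here): `dst.len() >= src.len()`.
pub fn copy_chunked_sketch(dst: &mut [u8], src: &[u8]) {
    let dst = &mut dst[..src.len()]; // panics if `dst` is too short
    // Both slices now have equal length, so the chunks pair up exactly.
    for (d, s) in dst.chunks_mut(16).zip(src.chunks(16)) {
        d.copy_from_slice(s);
    }
}
```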

### 3.4. Iterators

All search functions have a corresponding iterator version (e.g., `memchr_iter`). These iterators provide a more idiomatic Rust interface for working with search results, allowing for easy chaining with other iterator methods.
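As a rough illustration of the iterator form, the sketch below yields every index at which a byte occurs. The type `ByteMatches` and the function `memchr_iter_sketch` are hypothetical names; the real `memchr_iter` may have a different signature.

```rust
/// Iterator over the positions of `needle` in `haystack`.
pub struct ByteMatches<'a> {
    haystack: &'a [u8],
    needle: u8,
    pos: usize, // index at which the next search starts
}

pub fn memchr_iter_sketch(haystack: &[u8], needle: u8) -> ByteMatches<'_> {
    ByteMatches { haystack, needle, pos: 0 }
}

impl<'a> Iterator for ByteMatches<'a> {
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        let rest = self.haystack.get(self.pos..)?;
        let found = rest.iter().position(|&b| b == self.needle)?;
        let index = self.pos + found;
        self.pos = index + 1; // resume just past this match
        Some(index)
    }
}
```

Because the result is an ordinary `Iterator`, it composes with adapters such as `map`, `take`, and `collect`, for example to split a record on every comma position.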

## 4. Performance Optimizations

The key to `memx`'s performance is its use of architecture-specific optimizations. The library uses a dynamic dispatch mechanism to select the most efficient implementation at runtime based on the available CPU features.

- **x86/x86-64 Optimizations**: On x86 and x86-64, the library uses SSE2 and AVX2 intrinsics to process 16 bytes (SSE2) or 32 bytes (AVX2) per instruction instead of one byte at a time.
- **Fallback Mechanism**: For architectures that do not have specialized implementations, the library falls back to the safe and reliable Rust implementations in the `mem` module. This ensures that the library is portable and can be used on a wide range of platforms.
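A minimal sketch of runtime dispatch with a fallback is shown below. It makes several simplifying assumptions: it covers only `x86_64` and SSE2, it uses the `std` macro `is_x86_feature_detected!` (a `no_std` build would need a different detection mechanism), and all function names are hypothetical.

```rust
/// Portable fallback used when no SIMD path applies.
fn memchr_fallback(buf: &[u8], needle: u8) -> Option<usize> {
    buf.iter().position(|&b| b == needle)
}

/// SSE2 path: compares 16 bytes per step.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse2")]
unsafe fn memchr_sse2(buf: &[u8], needle: u8) -> Option<usize> {
    use std::arch::x86_64::{
        __m128i, _mm_cmpeq_epi8, _mm_loadu_si128, _mm_movemask_epi8, _mm_set1_epi8,
    };
    let pattern = _mm_set1_epi8(needle as i8);
    let mut offset = 0;
    while offset + 16 <= buf.len() {
        // SAFETY: `offset + 16 <= buf.len()`, so the unaligned load stays in bounds.
        let block = unsafe { _mm_loadu_si128(buf.as_ptr().add(offset) as *const __m128i) };
        // One mask bit per byte lane that equals `needle`.
        let mask = _mm_movemask_epi8(_mm_cmpeq_epi8(block, pattern));
        if mask != 0 {
            return Some(offset + mask.trailing_zeros() as usize);
        }
        offset += 16;
    }
    // Fewer than 16 bytes remain; finish byte by byte.
    buf[offset..].iter().position(|&b| b == needle).map(|i| i + offset)
}

/// Picks an implementation at runtime from the detected CPU features.
pub fn memchr_dispatch(buf: &[u8], needle: u8) -> Option<usize> {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("sse2") {
            // SAFETY: SSE2 support was confirmed on the line above.
            return unsafe { memchr_sse2(buf, needle) };
        }
    }
    memchr_fallback(buf, needle)
}
```

An AVX2 path would follow the same shape with 32-byte loads, checked before the SSE2 branch.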

## 5. `no_std` Compatibility

The library is designed to be fully compatible with `no_std` environments: it does not depend on the standard library and uses only the `core` library.

## 6. Error Handling

The only function that can fail is `memcpy`, which returns a `RangeError` if the destination slice is smaller than the source slice. All other functions are guaranteed to be safe and will not panic.
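The failure mode can be sketched as a length check in front of the copy. The `RangeError` below is a hypothetical stand-in defined only for this example, and the `Result<(), RangeError>` return type and the name `memcpy_checked` are assumptions; the real type may carry more information. A `no_std` crate would implement `core::fmt::Display` rather than `std::fmt::Display`.

```rust
use std::fmt;

/// Hypothetical stand-in for the library's error type.
#[derive(Debug, PartialEq, Eq)]
pub struct RangeError;

impl fmt::Display for RangeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "destination slice is smaller than source slice")
    }
}

/// Copies `src` into the front of `dst`, or reports that `dst` is too small.
pub fn memcpy_checked(dst: &mut [u8], src: &[u8]) -> Result<(), RangeError> {
    if dst.len() < src.len() {
        return Err(RangeError);
    }
    dst[..src.len()].copy_from_slice(src);
    Ok(())
}
```

Because the length is validated before any byte is written, a failed call leaves the destination untouched and the caller can recover with an ordinary `match` or `?`.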