getattrlistbulk 0.1.0

Safe Rust bindings for macOS getattrlistbulk() system call for high-performance directory enumeration
Documentation
# getattrlistbulk

[![Crates.io](https://img.shields.io/crates/v/getattrlistbulk.svg)](https://crates.io/crates/getattrlistbulk)
[![Documentation](https://docs.rs/getattrlistbulk/badge.svg)](https://docs.rs/getattrlistbulk)
[![License](https://img.shields.io/crates/l/getattrlistbulk.svg)](LICENSE)

Safe Rust bindings for the macOS `getattrlistbulk()` system call. Enumerate directories and retrieve file metadata in bulk with minimal syscalls.

## Why?

Traditional directory reading requires N+1 syscalls for N files:

```
opendir() → readdir() × N → stat() × N → closedir()
```

`getattrlistbulk()` retrieves entries AND metadata together:

```
open() → getattrlistbulk() × ceil(N/batch) → close()
```

For a directory with 10,000 files, this means ~10 syscalls instead of ~20,000.

## Requirements

- **macOS 10.10+** (Yosemite or later)
- **Rust 1.70+**

This crate only compiles on macOS. On other platforms, it will fail to compile with a clear error message.

## Installation

```toml
[dependencies]
getattrlistbulk = "0.1"
```

## Usage

### Basic Example

```rust
use getattrlistbulk::{read_dir, RequestedAttributes};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let attrs = RequestedAttributes {
        name: true,
        size: true,
        object_type: true,
        ..Default::default()
    };

    for entry in read_dir("/Users/me/Documents", attrs)? {
        let entry = entry?;
        println!("{}: {} bytes", entry.name, entry.size.unwrap_or(0));
    }

    Ok(())
}
```

### Get All Available Metadata

```rust
use getattrlistbulk::{read_dir, RequestedAttributes};

let attrs = RequestedAttributes {
    name: true,
    object_type: true,
    size: true,
    alloc_size: true,
    modified_time: true,
    permissions: true,
    inode: true,
    entry_count: true,  // for directories
};

for entry in read_dir("/path/to/dir", attrs)? {
    let entry = entry?;

    if let Some(modified) = entry.modified_time {
        println!("{} last modified: {:?}", entry.name, modified);
    }
}
```

### Custom Buffer Size

Larger buffers mean fewer syscalls for large directories:

```rust
use getattrlistbulk::{read_dir_with_buffer, RequestedAttributes};

let attrs = RequestedAttributes::default().with_name().with_size();

// 256KB buffer for very large directories
let entries = read_dir_with_buffer("/big/directory", attrs, 256 * 1024)?;
```

### Using the Builder

```rust
use getattrlistbulk::DirReader;

let entries = DirReader::new("/path/to/dir")
    .name()
    .size()
    .object_type()
    .buffer_size(128 * 1024)
    .follow_symlinks(false)
    .read()?;
```

## Performance

### Syscall Comparison

The performance advantage comes entirely from reducing syscalls:

| Approach | Syscall Pattern | Complexity | 10K Files |
|----------|-----------------|------------|-----------|
| **Traditional POSIX** | `readdir()` + `stat()` per file | `O(n)` | ~20,000 syscalls |
| **Swift FileManager** | Uses POSIX internally | `O(n)` | ~10,000-20,000 syscalls |
| **getattrlistbulk** | Bulk metadata per call | `O(n/batch)` | **~12 syscalls** |

### Why This Matters

```
Traditional:  opendir() → [readdir() + stat()] × N → closedir()
This crate:   open() → getattrlistbulk() × ceil(N/800) → close()
```

Each syscall requires a user→kernel context switch. Reducing syscalls by **~1,600x** eliminates this overhead.

### Benchmarks (10,000 files)

Run the included benchmark yourself:

```bash
cargo bench --bench compare
```

Example output (Apple Silicon SSD):

| Method | Avg Time | Speedup |
|--------|----------|---------|
| `std::fs::read_dir` + `metadata()` | ~19ms | 1.0x |
| **`getattrlistbulk`** | **~5ms** | **~4x faster** |

**Syscall reduction: ~1,600x fewer** (from ~20,000 to ~12)

### Why "only" 4x faster with 1,600x fewer syscalls?

On fast NVMe SSDs, the kernel's VFS cache handles most metadata requests in-memory. The syscall overhead (~1μs each) becomes the bottleneck only partially.

Expected speedups by storage type:

| Storage | Speedup | Why |
|---------|---------|-----|
| NVMe SSD (cached) | ~4x | VFS cache masks I/O, syscall overhead partial |
| SATA SSD | ~5-8x | More I/O latency exposed |
| HDD | ~10-20x | Seek time dominates, batching helps significantly |
| Network (NFS/SMB) | ~20-50x | Round-trip latency makes batching critical |

### Swift Comparison

Swift's `FileManager` does NOT use `getattrlistbulk` internally—it wraps POSIX calls:

```swift
// Swift - still O(n) syscalls under the hood
let contents = try FileManager.default.contentsOfDirectory(
    at: url,
    includingPropertiesForKeys: [.fileSizeKey, .isDirectoryKey]
)
```

Swift *can* call `getattrlistbulk` via C interop, but Apple's high-level frameworks don't. This crate provides the optimized path that Apple's own tools use internally (Finder, `ls`, etc.).

## Comparison with Alternatives

| Crate | Bulk Metadata | macOS Optimized | Cross-Platform |
|-------|---------------|-----------------|----------------|
| `std::fs` | No | No | Yes |
| `walkdir` | No | No | Yes |
| `jwalk` | No | No | Yes |
| **`getattrlistbulk`** | **Yes** | **Yes** | No |

Use this crate when:
- You're targeting macOS only
- You need to read large directories quickly
- You need metadata along with filenames

Use `std::fs` or `walkdir` when:
- You need cross-platform support
- You're reading small directories
- You don't need metadata

## Error Handling

```rust
use getattrlistbulk::{read_dir, RequestedAttributes, Error};

match read_dir("/some/path", RequestedAttributes::default()) {
    Ok(entries) => { /* ... */ }
    Err(Error::Open(e)) => eprintln!("Failed to open directory: {}", e),
    Err(Error::Syscall(e)) => eprintln!("System call failed: {}", e),
    Err(Error::Parse(msg)) => eprintln!("Buffer parsing error: {}", msg),
    Err(Error::NotSupported) => eprintln!("Not running on macOS"),
}
```

## Safety

This crate uses `unsafe` internally to call the C system call, but exposes a fully safe public API. All buffer parsing is bounds-checked, and file descriptors are properly managed.

## License

Licensed under either of:

- Apache License, Version 2.0 ([LICENSE-APACHE]LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license ([LICENSE-MIT]LICENSE-MIT or http://opensource.org/licenses/MIT)

at your option.

## Contributing

Contributions welcome! Please read the [SPECIFICATION.md](SPECIFICATION.md) for implementation details and requirements.