# getattrlistbulk
Safe Rust bindings for the macOS getattrlistbulk() system call. Enumerate directories and retrieve file metadata in bulk with minimal syscalls.
## Why?
Traditional directory reading costs roughly two syscalls per file — about 2N+2 syscalls for N files:

```
opendir() → readdir() × N → stat() × N → closedir()
```

getattrlistbulk() retrieves entries AND metadata together:

```
open() → getattrlistbulk() × ceil(N/batch) → close()
```

For a directory with 10,000 files, this means about a dozen syscalls instead of ~20,000.
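The arithmetic behind those counts is straightforward. The sketch below models it, assuming the ~800-entries-per-call batch size cited later in this README (the real batch depends on buffer size and which attributes you request):

```rust
// Estimate syscall counts for each strategy when listing `n` files.
fn traditional_syscalls(n: u64) -> u64 {
    // opendir + (readdir + stat) per file + closedir
    2 + 2 * n
}

fn bulk_syscalls(n: u64, batch: u64) -> u64 {
    // open + ceil(n / batch) bulk calls + close
    2 + (n + batch - 1) / batch
}

fn main() {
    let n = 10_000;
    println!("traditional:      ~{} syscalls", traditional_syscalls(n)); // ~20,002
    println!("bulk (800/batch): ~{} syscalls", bulk_syscalls(n, 800));   // ~15
}
```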
## Requirements
- macOS 10.10+ (Yosemite or later)
- Rust 1.70+
This crate only compiles on macOS. On other platforms, it will fail to compile with a clear error message.
## Installation

```toml
[dependencies]
getattrlistbulk = "0.1"
```
## Usage

### Basic Example

```rust
use getattrlistbulk::{read_dir, RequestedAttributes};

// Signatures reconstructed from the other examples in this README;
// consult the crate docs for the exact API.
let entries = read_dir("/usr/lib", &RequestedAttributes::default())?;
for entry in entries {
    println!("{:?}", entry);
}
```
### Get All Available Metadata

```rust
use getattrlistbulk::{read_dir, RequestedAttributes};

// Request everything the call supports (constructor name illustrative;
// see the crate docs for the actual API).
let attrs = RequestedAttributes::all();
for entry in read_dir("/usr/lib", &attrs)? {
    println!("{:?}", entry);
}
```
### Custom Buffer Size

Larger buffers mean fewer syscalls for large directories:

```rust
use getattrlistbulk::{read_dir_with_buffer, RequestedAttributes};

let attrs = RequestedAttributes::default().with_name().with_size();

// 256KB buffer for very large directories
let entries = read_dir_with_buffer("/usr/lib", &attrs, 256 * 1024)?;
```
### Using the Builder

```rust
use getattrlistbulk::DirReader;

// Builder-method arguments reconstructed; see the crate docs.
let entries = DirReader::new("/usr/lib")
    .name()
    .size()
    .object_type()
    .buffer_size(256 * 1024)
    .follow_symlinks(true)
    .read()?;
```
## Performance

### Syscall Comparison
The performance advantage comes entirely from reducing syscalls:
| Approach | Syscall Pattern | Complexity | 10K Files |
|---|---|---|---|
| Traditional POSIX | readdir() + stat() per file | O(n) | ~20,000 syscalls |
| Swift FileManager | Uses POSIX internally | O(n) | ~10,000-20,000 syscalls |
| getattrlistbulk | Bulk metadata per call | O(n/batch) | ~12 syscalls |
### Why This Matters

```
Traditional: opendir() → [readdir() + stat()] × N → closedir()
This crate:  open() → getattrlistbulk() × ceil(N/800) → close()
```
Each syscall requires a user→kernel context switch. Cutting syscalls by ~1,600x removes nearly all of that transition overhead.
### Benchmarks (10,000 files)
Run the included benchmark yourself:
Example output (Apple Silicon SSD):
| Method | Avg Time | Speedup |
|---|---|---|
| std::fs::read_dir + metadata() | ~19ms | 1.0x |
| getattrlistbulk | ~5ms | ~4x faster |
Syscall reduction: ~1,600x fewer (from ~20,000 to ~12)
### Why "only" 4x faster with 1,600x fewer syscalls?
On fast NVMe SSDs, the kernel's VFS cache serves most metadata requests from memory, so per-syscall overhead (~1μs each) is only one component of the total runtime; the in-memory metadata work remains the same either way.
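A simple cost model makes this concrete: bulk reads eliminate only the syscall-transition term, not the shared metadata work, so the overall speedup is bounded by how large that term is (Amdahl's law). The per-syscall and fixed-work costs below are illustrative assumptions, not measurements:

```rust
// Toy cost model: total time = syscall transitions + fixed metadata work.
// With everything in the VFS cache, the metadata work is identical for both
// strategies, so removing syscalls only removes the transition term.
fn runtime_us(syscalls: u64, per_syscall_us: f64, fixed_work_us: f64) -> f64 {
    syscalls as f64 * per_syscall_us + fixed_work_us
}

fn main() {
    let per_syscall_us = 1.0;     // ~1us per user->kernel transition (assumed)
    let cached_work_us = 5_000.0; // in-memory metadata work for 10k entries (assumed)

    let traditional = runtime_us(20_000, per_syscall_us, cached_work_us);
    let bulk = runtime_us(12, per_syscall_us, cached_work_us);

    // ~1,600x fewer syscalls, yet only a single-digit speedup.
    println!("speedup: ~{:.1}x", traditional / bulk);
}
```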
Expected speedups by storage type:
| Storage | Speedup | Why |
|---|---|---|
| NVMe SSD (cached) | ~4x | VFS cache masks I/O, syscall overhead partial |
| SATA SSD | ~5-8x | More I/O latency exposed |
| HDD | ~10-20x | Seek time dominates, batching helps significantly |
| Network (NFS/SMB) | ~20-50x | Round-trip latency makes batching critical |
## Swift Comparison
Swift's FileManager does NOT use getattrlistbulk internally—it wraps POSIX calls:
```swift
// Swift - still O(n) syscalls under the hood
let contents = try FileManager.default.contentsOfDirectory(
    at: url,
    includingPropertiesForKeys: [.fileSizeKey, .isDirectoryKey]
)
```
Swift can call getattrlistbulk via C interop, but Apple's high-level frameworks don't. This crate provides the optimized path that Apple's own tools use internally (Finder, ls, etc.).
## Comparison with Alternatives
| Crate | Bulk Metadata | macOS Optimized | Cross-Platform |
|---|---|---|---|
| std::fs | No | No | Yes |
| walkdir | No | No | Yes |
| jwalk | No | No | Yes |
| getattrlistbulk | Yes | Yes | No |
Use this crate when:
- You're targeting macOS only
- You need to read large directories quickly
- You need metadata along with filenames
Use std::fs or walkdir when:
- You need cross-platform support
- You're reading small directories
- You don't need metadata
## Error Handling

```rust
use getattrlistbulk::{read_dir, RequestedAttributes};

// Error type reconstructed loosely; see the crate docs for exact variants.
match read_dir("/usr/lib", &RequestedAttributes::default()) {
    Ok(entries) => println!("{} entries", entries.len()),
    Err(e) => eprintln!("failed to read directory: {e}"),
}
```
## Safety
This crate uses unsafe internally to call the C system call, but exposes a fully safe public API. All buffer parsing is bounds-checked, and file descriptors are properly managed.
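For context on what "bounds-checked buffer parsing" means here: the kernel fills the caller's buffer with variable-length records, each prefixed by a `u32` byte length, and a safe parser must validate that length before slicing. A minimal sketch of that style of walk, independent of this crate's actual internal types:

```rust
// Walk a buffer of records, each prefixed with a little-endian u32 giving the
// record's total length (prefix included). Returns each record's payload,
// rejecting any length that would run past the end of the buffer.
fn walk_records(buf: &[u8]) -> Vec<&[u8]> {
    let mut out = Vec::new();
    let mut pos = 0usize;
    while pos + 4 <= buf.len() {
        let len = u32::from_le_bytes(buf[pos..pos + 4].try_into().unwrap()) as usize;
        // Bounds check: a record must at least contain its own length field
        // and must not extend past the end of the buffer.
        if len < 4 || pos + len > buf.len() {
            break; // malformed record: stop rather than read out of bounds
        }
        out.push(&buf[pos + 4..pos + len]);
        pos += len;
    }
    out
}

fn main() {
    // Two records: length 7 (3-byte payload "abc") and length 6 (2-byte "xy").
    let buf = [7, 0, 0, 0, b'a', b'b', b'c', 6, 0, 0, 0, b'x', b'y'];
    let records = walk_records(&buf);
    assert_eq!(records, vec![b"abc".as_slice(), b"xy".as_slice()]);
}
```

Because every slice is checked against `buf.len()` first, a corrupt or truncated record ends the walk instead of causing out-of-bounds access.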
## License
Licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
## Contributing
Contributions welcome! Please read the SPECIFICATION.md for implementation details and requirements.