rawgrep
Grep at the speed of raw disk - search text by reading data directly from raw block devices.
Benchmarks
benchmark script: bench.sh
corpus: 674,283 files across mixed C/C++/Rust/Python projects
pattern: `TODO` (literal)
system: Intel i5-13400F, 16 threads, NVMe SSD, 16GB RAM (10GB free), Debian 6.12
rawgrep 0.1.4 vs ripgrep 15.1.0
| scenario | rawgrep | ripgrep | speedup |
|---|---|---|---|
| cold cache + fragment cache | 1.26s ± 0.02s | 11.08s ± 0.50s | 8.8x |
| cold cache, no fragment cache | 6.24s ± 0.08s | 11.03s ± 0.97s | 1.8x |
| warm cache + fragment cache | 173ms ± 9ms | 436ms ± 45ms | 2.5x |
| warm cache, no fragment cache | 389ms ± 9ms | 454ms ± 73ms | 1.2x |
fragment cache stores per-file search metadata to skip unchanged files on repeat searches.
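The speedup column in the table above appears to be the ratio of the two mean wall-clock times (ripgrep divided by rawgrep), rounded to one decimal. A small sketch that reproduces it from the published means; the `speedup` helper is hypothetical and not part of bench.sh:

```shell
# speedup SLOWER FASTER: ratio of two mean times, rounded to one decimal.
# Both arguments must use the same unit (seconds or milliseconds).
speedup() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.1fx\n", slow / fast }'
}

speedup 11.08 1.26   # cold cache + fragment cache    -> 8.8x
speedup 11.03 6.24   # cold cache, no fragment cache  -> 1.8x
speedup 436 173      # warm cache + fragment cache    -> 2.5x
speedup 454 389      # warm cache, no fragment cache  -> 1.2x
```

The ratio ignores the reported standard deviations, so small differences such as the 1.2x row should be read with the ± ranges in mind.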
Correctness notes
rawgrep and ripgrep differ in which files they search by design:
| category | files |
|---|---|
| files matched by ripgrep only | 60 |
| files matched by rawgrep only | 257 |
files ripgrep found that rawgrep missed: mostly .github/ yaml files, python venv files,
and large test data files. these are gitignore/binary detection policy differences rather than missed matches.
files rawgrep found that ripgrep missed: .recording files and other files ripgrep
treats as binary. rawgrep searches these by default.
no text file that ripgrep searched was missed by rawgrep.
How is rawgrep so fast?
- rawgrep reads files DIRECTLY from your partition, completely bypassing the filesystem.
- rawgrep is cache-friendly and insanely memory efficient, simply streaming through your device and outputting the matches.
- rawgrep uses work-stealing parallel traversal to keep all CPU cores busy during directory scanning.
- rawgrep uses a sophisticated fragment-based caching system (inspired by nowgrep) that learns which files can be skipped for repeated searches.
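To make the "learns which files can be skipped" idea concrete, here is a deliberately simplified sketch built on ordinary `grep` and GNU `stat`. It is not rawgrep's cache format or algorithm: the `cached_grep` function, the one-cache-file-per-pattern layout, and the "mtime:size path" signature are all assumptions chosen for illustration.

```shell
# cached_grep PATTERN CACHE_FILE FILE...
# Hypothetical sketch, NOT rawgrep's real cache format or algorithm.
# CACHE_FILE remembers files that did not match PATTERN, one line per file,
# as "mtime:size path". Use a separate cache file for each pattern.
cached_grep() {
    local pattern=$1 cache=$2 f sig
    shift 2
    touch -- "$cache"
    for f in "$@"; do
        sig="$(stat -c '%Y:%s' -- "$f") $f"
        # Unchanged since it last failed to match: skip without reading it.
        if grep -qxF -- "$sig" "$cache"; then
            continue
        fi
        # Search the file; remember it only when nothing matched.
        grep -Hn -- "$pattern" "$f" || printf '%s\n' "$sig" >> "$cache"
    done
}
```

On a repeat search, files whose signature is already in the cache are never opened, which is where the saving would come from. A changed file gets a new signature and is searched again.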
Installation
Prerequisites
- Linux system with an ext4 or NTFS filesystem (contributions to add Windows support are welcome)
- Rust toolchain (for building from source)
- Root access, or the ability to set capabilities
Option 1: One-Time Setup with Capabilities (Recommended)
# If you want maximum speed possible (requires nightly):
# cargo +nightly build --profile=release-fast --target=<your_target> --features=use_nightly
# Run the one-time setup command. Why? Read "Why Elevated Permissions?" section
Now you can run it without sudo:
Option 2: Use sudo Every Time
If you prefer not to use capabilities, just build and run with sudo:
# Again, if you want maximum speed possible (requires nightly):
# cargo +nightly build --profile=release-fast --target=<your_target> --features=use_nightly
# Run with sudo each time
Usage
Basic Search
# Search current directory
# Search specific directory
# Regex patterns
Advanced Options
# Specify device manually (auto-detected by default)
# Print statistics at the end of the search
# Disable filtering (search everything)
# or
# Disable specific filters
Filtering Levels
# Default: respects .gitignore, skips binaries and large files (> 5 MB)
# -u: ignore .gitignore
# -uu: also search binary files
# -uuu: search everything, including large files
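The four levels above can be read as a single decision per file. The sketch below encodes that reading as a hypothetical `should_search` function; it is not rawgrep's code, and it assumes that "5 MB" means 5 * 1024 * 1024 bytes.

```shell
# should_search LEVEL IGNORED BINARY SIZE_BYTES
# LEVEL is the number of -u flags (0 to 3); IGNORED and BINARY are 0 or 1.
# Succeeds (exit status 0) when the file would be searched at that level.
should_search() {
    local level=$1 ignored=$2 binary=$3 size=$4
    local max_size=$((5 * 1024 * 1024))   # assumed meaning of "> 5 MB"
    # Default: files excluded by .gitignore are skipped until -u.
    if [ "$ignored" -eq 1 ] && [ "$level" -lt 1 ]; then return 1; fi
    # Binary files are skipped until -uu.
    if [ "$binary" -eq 1 ] && [ "$level" -lt 2 ]; then return 1; fi
    # Large files are skipped until -uuu.
    if [ "$size" -gt "$max_size" ] && [ "$level" -lt 3 ]; then return 1; fi
    return 0
}
```

For example, `should_search 1 1 0 2048` succeeds (a small, gitignored text file under `-u`), while `should_search 1 0 1 2048` fails because binary files still need `-uu`.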
Why Elevated Permissions?
rawgrep reads raw block devices (e.g., /dev/sda1), which are protected by the OS. Instead of requiring full root access via sudo every time, we use Linux capabilities to grant only the specific permission needed.
What is CAP_DAC_READ_SEARCH?
This capability lets a process bypass file read permission checks, plus directory read and execute (search) permission checks. It does not grant any write access.
rawgrep only reads data, it never writes anything to disk.
Verifying Capabilities
You can verify what capabilities the binary has:
# Output: ./target/release-fast/rawgrep = cap_dac_read_search+eip
Removing Capabilities
If you want to revoke the capability and go back to using sudo:
Limitations (IMPORTANT)
- ext4/ntfs only: Currently only supports ext4/ntfs filesystems.
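Before running a search, it can be useful to check which filesystem type backs the current directory. The sketch below uses GNU `df`; the `is_supported_fs` helper is hypothetical, and how rawgrep itself detects the filesystem is not described here.

```shell
# is_supported_fs FSTYPE: succeeds for the filesystem types listed above.
# Assumption: FSTYPE is a name as printed by `df -T`. NTFS mounted through
# ntfs-3g shows up as "fuseblk" and through the ntfs3 driver as "ntfs3";
# this sketch does not try to recognise those.
is_supported_fs() {
    case "$1" in
        ext4|ntfs) return 0 ;;
        *) return 1 ;;
    esac
}

# The first line of df output is a header, so keep only the last line.
fstype=$(df --output=fstype . | tail -n 1)
if is_supported_fs "$fstype"; then
    echo "$fstype: supported"
else
    echo "$fstype: not supported"
fi
```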
Development
Note: Capabilities are tied to the binary file itself, so you'll need to re-run setcap after each rebuild.
Why no automation script? I intentionally decided not to provide a script that runs sudo commands. If you want automation, write your own script; it's just a few lines of bash, and you'll understand exactly what it does.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Roadmap
- Support for Windows (some physical partition handling still needs to be fixed on Windows; everything else should already work)
- Support for macOS + APFS
- Symlink support
Emacs Integration
To use rawgrep from Emacs with jumpable locations:
- Download emacs/rawgrep.el and place it in your load path
- Add to your .emacs or init.el:
(require 'rawgrep)
(global-set-key (kbd "M-e") 'rawgrep)
Or if you use use-package:
(use-package rawgrep
:load-path "path/to/rawgrep.el"
:bind ("M-e" . rawgrep))
Works exactly like 'grep-find but better.
FAQ
Q: Is this safe to use?
A: Yes. The tool only reads data and never writes. The CAP_DAC_READ_SEARCH capability is narrowly scoped.
Q: Is rawgrep faster than ripgrep?
A: Yeah.
Q: Why am I missing some matches?
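A: By default, rawgrep respects .gitignore and skips binary/large files. Use -u to ignore .gitignore, -uu to also search binaries, or -uuu to search everything. The -u/-uu/-uuu scheme follows the same pattern as ripgrep's, although ripgrep's levels differ slightly (its -uu adds hidden files and -uuu adds binary files).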
Q: Can I use this on other filesystems?
A: Currently only ext4/ntfs is supported. Support for other filesystems may be added in the future. (Motivate me with stars)
Q: Will this damage my filesystem?
A: No. The tool only performs read operations. It cannot modify your filesystem.
Q: What if partition auto-detection fails?
A: Specify the device manually with --device=/dev/sdXY. Use df -Th to find your partition.
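Plain `df -Th` lists every mounted filesystem. To narrow it down to the partition that holds the directory you want to search, a sketch like the following may help; `device_for` is a hypothetical helper, and it relies on the `--output` option of GNU `df`.

```shell
# device_for [PATH]: print the device and filesystem type backing PATH
# (default: current directory). The first line of df output is a header,
# so only the last line is kept.
device_for() {
    df --output=source,fstype "${1:-.}" | tail -n 1
}

device_for .
```

The first field of the output is the value to pass as `--device=...`. Inside containers or on stacked setups the source may be something like `overlay` rather than a `/dev/...` path, in which case it is not a block device rawgrep can read.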
Acknowledgments
Inspired by ripgrep and nowgrep, and the need for high-quality software in the big 25.