# fdf - High-Performance POSIX File Finder

An experimental alternative to the fd/find tools for regex/glob matching, with colourised output.

Not production-ready: API unstable, renaming pending. NOT IN A STATE FOR CONTRIBUTION.
(It works, it just hasn't got the feature set I'd like yet; see the copious tests!)
(Mostly this started as a project to learn C + assembly, and it somehow just got bigger.)
I do have benchmark suites!
## Important Notes

As I fix and improve certain features, I will open the project to contributions.
Honestly, this is still a hobby project that needs much more work.
It works perfectly for the subset I've implemented, but it's far from complete.
It has better performance than fd on equivalent feature sets, but fd has an immense feature set which I'm not going to replicate.
Rather, I'm working on this project for myself because I really wanted to know what happens when you write optimal hardware-specific code (and how to write it!).
## How to test
BE WARNED: I clone the LLVM repo to create a sustainable environment for testing. I do this specifically in /tmp, so it will be deleted at the next shutdown. The same goes for macOS (the script provides an option to delete it afterwards), but not BSD (well, I've only played around in QEMU; they seem to have a distinctly different system).
The test script runs a comprehensive suite of internal library + CLI tests as well as benchmarks.
## Cool bits (full benchmarks can be seen in speed_benchmarks.txt)
Testing on my local filesystem (to show a non-toy example):
(Benchmark table omitted: searching my whole PC.)
Some of my benchmarks, repeatable on your own PC (see the *How to test* section above):

Searching for the extension c in the LLVM repo:
(Benchmark output omitted here; see speed_benchmarks.txt for the full, repeatable results.)
## Extra bits
- `cstr!`: a macro to use a byte slice as a pointer (automatically initialises memory, then adds a null terminator for FFI use).
```rust
// NOTE: the import path and exact macro arguments below are illustrative;
// see the crate source for the real signature.
use fdf::cstr;

let who_is_that_pointer_over_there: *const u8 = unsafe { cstr!(b"some/path") };
// automatically creates an inline, null-terminated, stack-allocated buffer of length LOCAL_PATH_MAX (4096)
// LOCAL_PATH_MAX defaults to 4096, but setting e.g. `export LOCAL_PATH_MAX=13000 && cargo b -r -q`
// will recompile with LOCAL_PATH_MAX as 13000.

// this one is self-explanatory!
let leave_me_alone: *const u8 = unsafe { cstr!(b"longer than five", 5) }; // this will CRASH because you've only told it to stack-allocate 5 bytes
/* explosions */
// hence why it's unsafe!
let this_is_fine_though: *const u8 = unsafe { cstr!(b"ok", 5) };

// previous cstr! macros I've seen only worked on literals, which got fixed in Rust 1.77+
// (via the c"....." prefix on literals, which auto-adds a null terminator)
// https://docs.rs/rustix/latest/rustix/macro.cstr.html is an example of what I've specifically tried to generalise.
let oh_it_doesnt_need_literals: &[u8] = b".";
let dot_as_pointer: *const u8 = unsafe { cstr!(oh_it_doesnt_need_literals) };
```
- A black magic macro that can colour filepaths based on a compile-time perfect hashmap (I made it into a separate crate); it's defined in another GitHub repo of mine at https://github.com/alexcu2718/compile_time_ls_colours
- Then this function, a really nice way to avoid branch misses during dirent parsing (a really hot loop):
```rust
// The code is explained better in the true function definition (this is crate agnostic).
// This is the little-endian implementation; see the crate for the modified big-endian version.
// Only used on Linux systems; OpenBSD/macOS store the name length trivially.
// A const SWAR function (SIMD within a register, so no architecture dependence);
// the body here is a sketch of the classic trick, the exact definition lives in the crate.
pub const unsafe fn find_zero_byte_u64(word: u64) -> usize {
    // a byte is zero iff (byte - 0x01) & !byte & 0x80 has its high bit set
    let matches = word.wrapping_sub(0x0101_0101_0101_0101) & !word & 0x8080_8080_8080_8080;
    (matches.trailing_zeros() >> 3) as usize // 8 if no zero byte is present
}
```
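To show why this avoids per-byte branching, here is a hedged sketch (an illustrative helper, not the crate's actual call site) of using the function above to compute a NUL-terminated dirent name length eight bytes at a time:

```rust
// Hedged sketch: scan 8 bytes per iteration with the SWAR helper above instead of
// branching on every byte. The buffer handling is illustrative, not the real parser.
fn dirent_name_len(name_bytes: &[u8]) -> usize {
    let mut offset = 0;
    while offset + 8 <= name_bytes.len() {
        // read one little-endian word from the name buffer
        let word = u64::from_le_bytes(name_bytes[offset..offset + 8].try_into().unwrap());
        let pos = unsafe { find_zero_byte_u64(word) };
        if pos < 8 {
            return offset + pos; // the terminating NUL is inside this word
        }
        offset += 8;
    }
    // fall back to a plain byte scan for any short tail
    offset + name_bytes[offset..]
        .iter()
        .position(|&b| b == 0)
        .unwrap_or(name_bytes.len() - offset)
}
```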
## WHY?
Well, I found find slow, I didn't know fd existed, and I didn't expect some random test project to actually be good.
Then finally, the reward is a tool I can use for the rest of my life to find stuff.
Mostly though, I just enjoy learning.
To put it in perspective, I did not know any C before I started this project. I noticed that every kind of file-finding tool inevitably relies on some iterator that heap-allocates regardless of whether or not the entry is a match,
so we're talking a lot of random allocations, which I suspect may be a big bottleneck. (I think arenas might just be the best option; it's a simplicity/complexity trade-off.)
Even though my project in its current state is faster, I've got some experiments to try filtering before allocating. Unfortunately, in the standard library you have to heap-allocate for directories regardless (because they're necessary for the next call), and the same would probably apply here.
That is partially why I felt the need to rewrite it on top of libc: the standard library was just too high level.
I'm curious to see what happens when you filter before allocation. This is something I have partially working in my current crate, but implementation details like that aren't accessible via the CLI. If it proves to be performant, it will eventually be in there. Obviously, I'm having to learn a lot to do these things, and it takes TIME to understand, get inspired, and implement them...
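To make the filter-before-allocating idea concrete, here is a minimal, hedged sketch; `RawEntry`, its fields, and the function shape are illustrative and not this crate's implementation. The point is that the pattern runs against bytes borrowed from the directory-read buffer, and only matches (plus directories, which must be owned for the next call) pay for an allocation.

```rust
use regex::bytes::Regex;

// Illustrative borrowed view over one entry in a getdents/readdir buffer.
struct RawEntry<'a> {
    name: &'a [u8], // borrowed straight from the syscall buffer, no allocation yet
    is_dir: bool,
}

// Filter on the borrowed bytes first; allocate only for matches and for
// directories that must be kept around for the next directory read.
fn filter_before_alloc<'a>(
    entries: impl Iterator<Item = RawEntry<'a>>,
    pattern: &Regex,
    results: &mut Vec<Vec<u8>>,
    pending_dirs: &mut Vec<Vec<u8>>,
) {
    for entry in entries {
        if entry.is_dir {
            pending_dirs.push(entry.name.to_vec()); // unavoidable: needed for the next call
        }
        if pattern.is_match(entry.name) {
            results.push(entry.name.to_vec()); // only matches allocate on the results side
        }
    }
}
```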
I intend to only add features and not break anything; until I can somewhat promise that, I won't entertain wasting other people's time, but eventually, if anyone feels like adding something, they can!
(Notably, there are some obvious things I haven't touched in a while and things that are just less interesting; ideally one day someone could do those, just not now.)
## NECESSARY DISCLAIMERS (I might have a conscience somewhere)
I've directly taken code from https://docs.rs/fnmatch-regex/latest/src/fnmatch_regex/glob.rs.html#3-574 and modified it so I could convert globs to regex patterns trivially; this simplifies the string-filtering model by delegating it to Rust's extremely fast regex crate. Notably, I modified it because it's quite old and has dependencies I was able to remove.
(I have emailed the author above and received approval.)
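For flavour, here is a toy, hedged illustration of the glob-to-regex idea (it only handles `*`, `?`, and literals; the adapted fnmatch-regex code covers far more, such as character classes and escaping rules):

```rust
use regex::Regex;

// Toy glob -> regex translation: `*` becomes `.*`, `?` becomes `.`, everything
// else is escaped literally, and the result is anchored at both ends.
fn glob_to_regex(glob: &str) -> Result<Regex, regex::Error> {
    let mut pattern = String::from("^");
    for ch in glob.chars() {
        match ch {
            '*' => pattern.push_str(".*"),
            '?' => pattern.push('.'),
            other => pattern.push_str(&regex::escape(&other.to_string())),
        }
    }
    pattern.push('$');
    Regex::new(&pattern)
}

// e.g. glob_to_regex("*.c") yields ^.*\.c$, and matching is then delegated
// entirely to the regex crate.
```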
I've also done the same for https://doc.rust-lang.org/src/core/slice/memchr.rs.html#111-161; I found a much more rigorous way of doing some bit tricks via this.
I enjoy relying on validated work like the standard library to ideally 'co-validate' my own, i.e. fewer leaps of logic are required to make the assessment.
## Future plans?
### Separation of utilities
Right now it's a bit monolithic. Some aspects might deserve their own crate (although I dislike the idea of having 500 crates that each do one specific thing).
(That said, writing FFI like this for multiple POSIX systems with distinct peculiarities will tend to be a lot of code.)
I'd probably just keep the CLI side simple. Features to be added include datetime-based filtering (it could be done quickly; I just rarely use time-based filtering, which is why it's been slow to arrive) as well as other things, e.g. searching for device drivers/etc.
Add POSIX compatibility in general (illumos/Solaris in QEMU isn't straightforward, quite esoteric).
Add Windows... well, this would take a fundamental rewrite because of architectural differences, but I might do it (it could actually be interesting to learn the differences).
Additional features in my compile_time_ls_colours crate would be nice. I want to explore compile-time hashmaps more; I'm only really scratching the surface of metaprogramming, and Rust's utilities are great (one day I'll try C++ template metaprogramming and become a convert...).
Fundamentally, I want to develop something that's simple to use (doing --help shouldn't give you the bible) and exceedingly efficient.
## COMPATIBILITY STATE
1. Working on Linux (glibc dynamic linking / musl static linking), 64-bit. Tested on Debian/Ubuntu/Arch/Fedora, varying versions.
2. AArch64 Linux/Android (Debian).
3. macOS 64-bit (tested on Sonoma).
4. FreeBSD/OpenBSD/NetBSD/DragonFly BSD, 64-bit (OK, it compiles on these platforms, but it has only been tested on FreeBSD + OpenBSD).
5. Works on big-endian systems; tested on Ubuntu PPC64 (took 20 minutes to compile...).
## Installation
Clone & build, then optionally install system-wide. (The exact commands are omitted here; it's a standard Cargo project, so `cargo build --release` in a checkout does the build.)
Example usages (commands omitted here): find all JPG files in the home directory (excluding hidden files); find all Python files in /usr/local (including hidden files).
## Options
(The options table is omitted here; `--help` prints the full list.)
## TODO LIST (Maybe)
-- Arena Allocator (potentially): written from scratch. See Microsoft's edit for a nice example: https://github.com/microsoft/edit/tree/main/src/arena (a rough sketch of the general idea follows after this list).
-- io_uring for Batched Syscalls: e.g., batched open/read operations. This will be extremely challenging.
Unfortunately, io_uring lacks the opcode required for getdents; other opcodes are available, but this would require a LOT of work.
It would also require an async runtime, which inevitably means tokio, which means most of my work in avoiding dependencies goes down the bin (I'm already unhappy about being reliant on rayon, but that's on the list to remove).
-- I might continue developing my compile-time hashmap for LS_COLORS and make an easier way to build these maps; it has a good general use case and the macro use is pretty fun! I do have a separate branch at https://github.com/alexcu2718/compile_time_ls_colours/tree/no_phf_build which has no dependencies, although it's annoying to do without doing a HELLA lot of byte manipulation yourself (also, it's statically initialised at runtime, which is not as cool!).
-- Threading Without Rayon: My attempts have come close, but aren’t quite there yet. I'll rely on Rayon for now until I can come up with a smart way to implement an appropriate work-distributing algorithm. TODO!
-- Iterator Adaptor + Filter: Some kind of adaptor that avoids a lot of unnecessary allocations on non-directories.
-- Syscall Limits: I think there's ultimately a hard limit imposed by syscall overhead.
I've experimented with an early Zig io_uring + getdents implementation, but it's well outside my comfort zone (by A LOT).
I'll probably give it a go anyway (if possible).
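For the arena allocator item above, here is a minimal, hedged sketch of the general shape (an append-only byte arena with offset handles); it only shows the idea, not a design this project has committed to:

```rust
// Minimal byte-arena sketch: names are copied into one growable buffer and referred
// to by (start, end) handles, so there's a single allocation strategy instead of one
// heap allocation per entry, and everything is freed in a single drop.
struct ByteArena {
    buf: Vec<u8>,
}

#[derive(Clone, Copy)]
struct ArenaHandle {
    start: usize,
    end: usize,
}

impl ByteArena {
    fn with_capacity(bytes: usize) -> Self {
        Self { buf: Vec::with_capacity(bytes) } // one up-front reservation
    }

    // Copy `data` to the end of the buffer and hand back a cheap handle.
    fn push(&mut self, data: &[u8]) -> ArenaHandle {
        let start = self.buf.len();
        self.buf.extend_from_slice(data);
        ArenaHandle { start, end: self.buf.len() }
    }

    fn get(&self, handle: ArenaHandle) -> &[u8] {
        &self.buf[handle.start..handle.end]
    }
}
```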
**THIS IS NOT FINISHED.** I have no idea what the long-term plans are; I'm just trying to make stuff go fast and learn, OK?