file-identify
A Rust library for identifying file types based on extensions, content, and shebangs.
Given a file path (or pre-loaded file information), it returns a set of standardized tags describing what the file is. It supports 315+ file types, with lookup tables built at compile time via PHF (perfect hash functions).
Rust port of the Python identify library.
Quick start
[dependencies]
file-identify = "0.3"
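use file_identify::{tags_from_filename, tags_from_path};
// Full identification from a filesystem path
// (example path; the expected tags follow the CLI output shown under "CLI")
let tags = tags_from_path("src/main.rs").unwrap();
assert!(tags.contains("file"));
assert!(tags.contains("rust"));
assert!(tags.contains("text"));
// Filename-only identification (no I/O)
let tags = tags_from_filename("Cargo.toml");
assert!(tags.contains("toml"));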
I/O-free identification
For use with mocked or virtual filesystems (e.g., in tests), tags_from_info
accepts pre-loaded file data with no filesystem access:
use file_identify::{tags_from_info, FileInfo};
let info = FileInfo { /* ... */ };
let tags = tags_from_info(&info);
assert!(/* ... */);
assert!(/* ... */);
The FileIdentifier builder works with both paths and FileInfo:
use file_identify::FileIdentifier;
let id = FileIdentifier::new()
    .skip_content_analysis()
    .skip_shebang_analysis();
// Example path
let tags = id.identify("src/main.rs").unwrap();
// Or: id.identify_from(&info);
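To make the builder idea concrete, here is a minimal std-only sketch of how chained skip options could switch analysis stages off. `SketchIdentifier` and `stages` are hypothetical names invented for this illustration; they are not the crate's `FileIdentifier`, and the real builder may combine its options differently.

```rust
/// Hypothetical options builder, for illustration only.
/// The method names mirror the README; the internals are assumptions.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct SketchIdentifier {
    skip_content: bool,
    skip_shebang: bool,
}

impl SketchIdentifier {
    /// Starts with every analysis stage enabled.
    fn new() -> Self {
        Self::default()
    }

    /// Consumes the builder and returns it with content analysis disabled.
    fn skip_content_analysis(mut self) -> Self {
        self.skip_content = true;
        self
    }

    /// Consumes the builder and returns it with shebang parsing disabled.
    fn skip_shebang_analysis(mut self) -> Self {
        self.skip_shebang = true;
        self
    }

    /// Lists the stages that would still run with the current options.
    /// Assumption: the two skip flags are independent of each other.
    fn stages(&self) -> Vec<&'static str> {
        let mut stages = vec!["file type", "permissions", "filename"];
        if !self.skip_content {
            stages.push("content");
        }
        if !self.skip_shebang {
            stages.push("shebang");
        }
        stages
    }
}
```

Each option method takes `self` by value and returns it, which is what allows the calls to be chained in a single expression.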
How it works
A call to tags_from_path does this:
- Checks the file type (file, symlink, directory, socket). If not a regular file, stop.
- Checks permissions and adds executable or non-executable.
- Matches the filename or extension. If recognized, adds tags (including text/binary) and stops.
- Reads the first 1KB to determine text or binary.
- For text executables, parses the shebang to identify the interpreter.
By design, a file with a recognized filename or extension is never opened or read.
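The steps above can be sketched with the standard library alone, working on in-memory data so no filesystem is needed. This is an illustration, not the crate's implementation. `Kind`, `Info`, `TABLE`, and every function name are hypothetical, the three-entry table stands in for the real 315+ type PHF tables, and the NUL-byte check is a simplified stand-in for the real text/binary detection. The sketch also uses the raw interpreter name as the tag, which the real library may map differently.

```rust
use std::collections::BTreeSet;

/// Kind of filesystem entry (hypothetical; not the crate's type).
#[allow(dead_code)]
#[derive(Clone, Copy, PartialEq)]
enum Kind { File, Directory, Symlink, Socket }

/// Hypothetical pre-loaded file data, so the sketch needs no I/O.
struct Info<'a> {
    kind: Kind,
    executable: bool,
    filename: &'a str,
    content: &'a [u8],
}

/// Tiny illustrative extension table (assumption: extension match only).
const TABLE: &[(&str, &[&str])] = &[
    ("png", &["binary", "image", "png"]),
    ("rs", &["rust", "text"]),
    ("toml", &["text", "toml"]),
];

fn extension_tags(filename: &str) -> Option<&'static [&'static str]> {
    let (_, ext) = filename.rsplit_once('.')?;
    TABLE.iter().find(|(e, _)| *e == ext).map(|(_, tags)| *tags)
}

/// Simplified heuristic: no NUL byte in the first 1 KB means text.
fn looks_like_text(content: &[u8]) -> bool {
    let head = &content[..content.len().min(1024)];
    !head.contains(&0)
}

/// Returns the interpreter named on a `#!` first line, looking through `env`.
fn shebang_interpreter(content: &[u8]) -> Option<String> {
    let first_line = content.split(|&b| b == b'\n').next()?;
    let line = std::str::from_utf8(first_line).ok()?;
    let mut parts = line.strip_prefix("#!")?.split_whitespace();
    let name = parts.next()?.rsplit('/').next()?;
    if name == "env" {
        parts.next().map(str::to_string)
    } else {
        Some(name.to_string())
    }
}

fn identify(info: &Info) -> BTreeSet<String> {
    let mut tags = BTreeSet::new();
    // Step 1: file type; anything other than a regular file stops here.
    let kind_tag = match info.kind {
        Kind::File => "file",
        Kind::Directory => "directory",
        Kind::Symlink => "symlink",
        Kind::Socket => "socket",
    };
    tags.insert(kind_tag.to_string());
    if info.kind != Kind::File {
        return tags;
    }
    // Step 2: permissions.
    let mode = if info.executable { "executable" } else { "non-executable" };
    tags.insert(mode.to_string());
    // Step 3: a recognized extension supplies all tags; content is not examined.
    if let Some(known) = extension_tags(info.filename) {
        tags.extend(known.iter().map(|t| t.to_string()));
        return tags;
    }
    // Step 4: text or binary, judged from the first 1 KB.
    let is_text = looks_like_text(info.content);
    tags.insert(if is_text { "text" } else { "binary" }.to_string());
    // Step 5: shebang parsing, only for text executables.
    if is_text && info.executable {
        if let Some(interpreter) = shebang_interpreter(info.content) {
            tags.insert(interpreter);
        }
    }
    tags
}
```

Because step 3 returns early, a file named `main.rs` gets its tags from the table even when its content is empty, which is the behavior the paragraph above describes.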
CLI
The CLI is gated behind the cli feature so that the library itself does not depend on clap:
# ["file", "non-executable", "rust", "text"]
# ["cargo", "text", "toml"]
Tag categories
| Category | Tags |
|---|---|
| Type | file, directory, symlink, socket |
| Mode | executable, non-executable |
| Encoding | text, binary |
| Language | python, rust, javascript, c++, ... (315+ types) |
Minimum supported Rust version
The MSRV is 1.94.0 and is checked in CI.
License
MIT