Crate file_identify

§file-identify

A Rust library for identifying file types based on extensions, content, and shebangs.

This library identifies files by analyzing:

  • File extensions and special filenames
  • File content (binary vs text detection)
  • Shebang lines for executable scripts
  • File system metadata (permissions, file type)
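
To make the content and shebang steps above more concrete, here is a minimal sketch that uses only the standard library. It is not the crate's implementation: the names `shebang_components` and `looks_like_text` are hypothetical, the 1024-byte cap is an arbitrary choice, and the text heuristic (no NUL bytes, valid UTF-8) is a deliberate simplification of what a real detector would do.

```rust
use std::io::{BufRead, BufReader, Read};

/// Hypothetical helper: return the whitespace-separated components of a
/// shebang line, or an empty Vec if the input does not start with "#!".
fn shebang_components<R: Read>(reader: R) -> std::io::Result<Vec<String>> {
    let mut first_line = Vec::new();
    // Read at most 1024 bytes of the first line, as raw bytes, so that
    // binary input cannot trigger a UTF-8 error or an unbounded read.
    BufReader::new(reader.take(1024)).read_until(b'\n', &mut first_line)?;

    let Some(rest) = first_line.strip_prefix(b"#!") else {
        return Ok(Vec::new());
    };

    let line = String::from_utf8_lossy(rest);
    let mut parts: Vec<String> = line.split_whitespace().map(str::to_owned).collect();

    // "#!/usr/bin/env python3" names python3 as the interpreter,
    // so drop the env wrapper (assumption: no env flags are handled).
    if parts.first().map(String::as_str) == Some("/usr/bin/env") {
        parts.remove(0);
    }
    Ok(parts)
}

/// Hypothetical heuristic: treat a byte sample as text if it contains
/// no NUL bytes and is valid UTF-8.
fn looks_like_text(sample: &[u8]) -> bool {
    !sample.contains(&0) && std::str::from_utf8(sample).is_ok()
}
```

Any `Read` implementation works as input, so the helper can be exercised with an in-memory byte slice such as `&b"#!/bin/sh -e\n"[..]` as well as with an open file.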

§Quick Start

use file_identify::{tags_from_path, tags_from_filename, FileIdentifier};

// Simple filename identification
let tags = tags_from_filename("script.py");
assert!(tags.contains("python"));
assert!(tags.contains("text"));

// Full file identification from filesystem path
// (`file_path` is assumed to point to an existing Python file on disk)
let tags = tags_from_path(&file_path).unwrap();
assert!(tags.contains("file"));
assert!(tags.contains("python"));

// Customized identification with builder pattern
let identifier = FileIdentifier::new()
    .skip_content_analysis()  // Skip text vs binary detection
    .skip_shebang_analysis(); // Skip shebang parsing

let tags = identifier.identify(&file_path).unwrap();
assert!(tags.contains("file"));
assert!(tags.contains("python"));

§Tag System

Files are identified using a set of standardized tags:

  • Type tags: file, directory, symlink, socket
  • Mode tags: executable, non-executable
  • Encoding tags: text, binary
  • Language/format tags: python, javascript, json, xml, etc.
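
Because a file carries several tags at once, a result is naturally handled as a set. The sketch below models that with a plain `HashSet` from the standard library; the crate's actual tag-set type is not shown in this section, and `example_tags` and `has_all` are hypothetical names used only for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical illustration: tags that an executable Python script
/// might carry, one from each category listed above.
fn example_tags() -> HashSet<&'static str> {
    HashSet::from(["file", "executable", "text", "python"])
}

/// Hypothetical helper: true if the tag set contains every wanted tag.
fn has_all(tags: &HashSet<&str>, wanted: &[&str]) -> bool {
    wanted.iter().all(|t| tags.contains(*t))
}
```

A caller could then filter files with a check such as `has_all(&tags, &["file", "python"])`.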

§Error Handling

Functions that access the filesystem return Result types; the errors that can occur are described by the IdentifyError enum.

Modules§

extensions
interpreters
tags

Structs§

FileIdentifier
Configuration for file identification behavior.
ShebangTuple
A tuple-like immutable container for shebang components that matches Python’s tuple behavior.

Enums§

IdentifyError
Errors that can occur during file identification.

Functions§

file_is_text
Determine if a file contains text or binary data.
is_text
Determine if data from a reader contains text or binary content.
parse_shebang
Parse a shebang line from a reader and return raw shebang components.
parse_shebang_from_file
Parse a shebang line from an executable file and return raw shebang components.
tags_from_filename
Identify a file based only on its filename.
tags_from_interpreter
Identify tags based on a shebang interpreter.
tags_from_path
Identify a file from its filesystem path.

Type Aliases§

Result
Result type for file identification operations.