Module parser

Magic file parser module

This module handles parsing of magic files into an Abstract Syntax Tree (AST) that can be evaluated against file buffers for type identification.

§Overview

The parser implements a complete pipeline for transforming magic file text into a hierarchical rule structure suitable for evaluation. The pipeline consists of:

  1. Preprocessing: Line handling, comment removal, continuation processing
  2. Parsing: Individual magic rule parsing using nom combinators
  3. Hierarchy Building: Constructing parent-child relationships based on indentation
  4. Validation: Type checking and offset resolution
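
To make the preprocessing stage concrete, here is a minimal sketch of what line handling, comment removal, and continuation processing could look like. It is not the crate's implementation: the function name `preprocess` is hypothetical, and it assumes that comments are full lines starting with `#` and that a trailing backslash continues a rule on the next line.

```rust
/// Hypothetical preprocessing pass (not the crate's actual code): joins
/// continuation lines, then drops blank lines and full-line `#` comments.
fn preprocess(input: &str) -> Vec<String> {
    let mut logical_lines = Vec::new();
    let mut pending = String::new();

    for raw in input.lines() {
        // Assumption: a trailing backslash continues the rule on the next line.
        if let Some(head) = raw.strip_suffix('\\') {
            pending.push_str(head);
            continue;
        }
        pending.push_str(raw);
        let line = std::mem::take(&mut pending);
        let trimmed = line.trim();
        // Skip blank lines and full-line comments.
        if trimmed.is_empty() || trimmed.starts_with('#') {
            continue;
        }
        logical_lines.push(trimmed.to_string());
    }

    // A continuation left dangling at end of input is kept as a final line.
    let tail = pending.trim();
    if !tail.is_empty() && !tail.starts_with('#') {
        logical_lines.push(tail.to_string());
    }
    logical_lines
}
```

Each returned string is one logical rule line, ready to be handed to a rule parser.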

§Format Detection and Loading

The module distinguishes three kinds of magic file input. Text files and directories are loaded; binary files are detected but not yet parsed:

  • Text files: Human-readable magic rule definitions
  • Directories: Collections of magic files (Magdir pattern)
  • Binary files: Compiled .mgc files (currently unsupported)
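
One plausible way to tell the three inputs apart is sketched below. The enum `SketchFormat` and the function `classify` are hypothetical stand-ins, not the crate's `MagicFileFormat` or `detect_format()`, and the real detection logic may differ. The sketch relies on the magic number `0xF11E041C` that file(1) writes at the start of compiled `.mgc` databases.

```rust
/// Hypothetical stand-in for the crate's `MagicFileFormat`; variant names are assumed.
#[derive(Debug, PartialEq, Eq)]
enum SketchFormat {
    Text,
    Directory,
    Binary,
}

/// Magic number at the start of compiled `.mgc` databases written by file(1).
const MGC_MAGIC: u32 = 0xF11E_041C;

/// Classifies an input from two facts: whether the path is a directory,
/// and the first bytes of the file (ignored for directories).
fn classify(is_dir: bool, header: &[u8]) -> SketchFormat {
    if is_dir {
        return SketchFormat::Directory;
    }
    if let Some(first) = header.get(..4) {
        let bytes = [first[0], first[1], first[2], first[3]];
        // Accept either byte order, since the database is written in the
        // byte order of the machine that compiled it.
        if u32::from_le_bytes(bytes) == MGC_MAGIC || u32::from_be_bytes(bytes) == MGC_MAGIC {
            return SketchFormat::Binary;
        }
    }
    // Anything else, including files shorter than four bytes, is treated as text.
    SketchFormat::Text
}
```

A caller would typically pass `path.is_dir()` and the first few bytes read from the file.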

§Unified Loading API

The recommended entry point for loading magic files is load_magic_file(), which automatically detects the format and dispatches to the appropriate handler:

use libmagic_rs::parser::load_magic_file;
use std::path::Path;

// Works with text files
let rules = load_magic_file(Path::new("/usr/share/misc/magic"))?;

// Also works with directories
let rules = load_magic_file(Path::new("/usr/share/misc/magic.d"))?;

// Binary .mgc files return an error with guidance
match load_magic_file(Path::new("/usr/share/misc/magic.mgc")) {
    Ok(rules) => { /* ... */ },
    Err(e) => eprintln!("Use --use-builtin for binary files: {}", e),
}

§Three-Tier Loading Strategy

The loading process works as follows:

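  1. Format Detection: detect_format() examines the path to determine the file type
  2. Dispatch to Handler: text files are parsed with parse_text_magic_file(), directories are loaded with load_magic_directory(), and binary .mgc files return an error
  3. Return Merged Rules: All rules are returned in a single Vec<MagicRule>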

§Examples

Use the unified load_magic_file() API for automatic format detection:

use libmagic_rs::parser::load_magic_file;
use std::path::Path;

let rules = load_magic_file(Path::new("/usr/share/misc/magic"))?;
println!("Loaded {} magic rules", rules.len());

§Parsing Text Content Directly

For parsing magic rule text that’s already in memory:

use libmagic_rs::parser::parse_text_magic_file;

let magic_content = r#"
0 string \x7fELF ELF executable
>4 byte 1 32-bit
>4 byte 2 64-bit
"#;

let rules = parse_text_magic_file(magic_content)?;
assert_eq!(rules.len(), 1);
assert_eq!(rules[0].children.len(), 2);
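
The single top-level rule with two children in the example above comes from the hierarchy-building stage. The following sketch shows one way to derive that nesting from the number of leading `>` characters. `SketchRule`, `build_hierarchy`, and `close_top` are hypothetical names; the crate's `MagicRule` carries more fields and its builder may work differently.

```rust
/// Hypothetical, simplified rule node; the crate's `MagicRule` has more fields.
#[derive(Debug)]
struct SketchRule {
    text: String,
    children: Vec<SketchRule>,
}

/// Groups rule lines into a tree, using the count of leading `>` as depth.
fn build_hierarchy(lines: &[&str]) -> Vec<SketchRule> {
    let mut roots: Vec<SketchRule> = Vec::new();
    // Currently open rules, outermost first.
    let mut stack: Vec<SketchRule> = Vec::new();

    for &line in lines {
        let depth = line.chars().take_while(|c| *c == '>').count();
        // `>` is one byte, so the char count is also a valid byte offset.
        let text = line[depth..].to_string();
        // Close every open rule at this depth or deeper.
        while stack.len() > depth {
            close_top(&mut stack, &mut roots);
        }
        // A line that skips a level attaches to the nearest open rule.
        stack.push(SketchRule { text, children: Vec::new() });
    }
    while !stack.is_empty() {
        close_top(&mut stack, &mut roots);
    }
    roots
}

/// Pops the innermost open rule and attaches it to its parent,
/// or to the root list when no parent is open.
fn close_top(stack: &mut Vec<SketchRule>, roots: &mut Vec<SketchRule>) {
    if let Some(done) = stack.pop() {
        match stack.last_mut() {
            Some(parent) => parent.children.push(done),
            None => roots.push(done),
        }
    }
}
```

Keeping the open rules on a stack means each line is visited once, and children keep their source order.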

§Loading a Directory Explicitly

For Magdir-style directories containing multiple magic files:

use libmagic_rs::parser::load_magic_directory;
use std::path::Path;

// Directory structure:
// /usr/share/file/magic.d/
//   ├── elf
//   ├── archive
//   └── text

let rules = load_magic_directory(Path::new("/usr/share/file/magic.d"))?;
// Rules from all files are merged in alphabetical order by filename
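
Alphabetical merging matters because directory listings carry no ordering guarantee. As a rough illustration, a loader could collect and sort the file paths before parsing them. The helper `sorted_magic_files` is hypothetical and uses only the standard library; it is not part of the crate's API.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: lists the regular files directly inside `dir`,
/// sorted by file name so that later merging is deterministic.
fn sorted_magic_files(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        // Subdirectories and other non-file entries are ignored in this sketch.
        if path.is_file() {
            files.push(path);
        }
    }
    // `read_dir` does not guarantee any order, so sort explicitly.
    files.sort_by(|a, b| a.file_name().cmp(&b.file_name()));
    Ok(files)
}
```

Parsing the files in the returned order and appending their rules gives the same merged result on every platform.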

§Migration Note

For users upgrading from direct function calls:

  • Old approach: Call detect_format() then dispatch manually
  • New approach: Use load_magic_file() for automatic dispatching

The individual functions (parse_text_magic_file(), load_magic_directory()) remain available for advanced use cases where you need direct control.

Key differences:

  • load_magic_file(): Unified API with automatic format detection (recommended)
  • parse_text_magic_file(): Parses a single text string containing magic rules
  • load_magic_directory(): Loads and merges all magic files from a directory
  • detect_format(): Low-level format detection (now called internally by load_magic_file())

Error handling in load_magic_directory():

  • Critical errors (I/O failures, invalid UTF-8): Returns ParseError immediately
  • Non-critical errors (parse failures in individual files): Logs a warning to stderr and continues with the remaining files
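
The two error classes can be pictured as a merge loop that aborts on critical errors and skips files that fail to parse. This is a sketch only: `SketchError` and `merge_results` are hypothetical, rules are represented as plain strings, and the crate's `ParseError` has its own variants.

```rust
/// Hypothetical error type for this sketch; the crate's `ParseError` differs.
#[derive(Debug, PartialEq)]
enum SketchError {
    /// Critical: the file could not be read.
    Io(String),
    /// Critical: the file content is not valid UTF-8.
    InvalidUtf8(String),
    /// Non-critical: the file was read but its rules failed to parse.
    Syntax(String),
}

/// Merges per-file results in the given order. Parse failures are reported
/// on stderr and skipped; any other error aborts the whole load.
fn merge_results(
    results: Vec<(String, Result<Vec<String>, SketchError>)>,
) -> Result<Vec<String>, SketchError> {
    let mut merged = Vec::new();
    for (file, outcome) in results {
        match outcome {
            Ok(rules) => merged.extend(rules),
            Err(SketchError::Syntax(msg)) => {
                eprintln!("warning: skipping {}: {}", file, msg);
            }
            // Io and InvalidUtf8 are treated as critical.
            Err(critical) => return Err(critical),
        }
    }
    Ok(merged)
}
```

With this policy, one malformed file in a large directory does not prevent the remaining files from loading.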

Re-exports§

pub use ast::Endianness;
pub use ast::MagicRule;
pub use ast::OffsetSpec;
pub use ast::Operator;
pub use ast::StrengthModifier;
pub use ast::TypeKind;
pub use ast::Value;
pub use grammar::parse_number;
pub use grammar::parse_offset;

Modules§

ast
Abstract Syntax Tree definitions for magic rules
grammar
Grammar parsing for magic files using nom parser combinators
types
Type keyword parsing for magic file types

Enums§

MagicFileFormat
Represents the format of a magic file or directory

Functions§

detect_format
Detects the format of a magic file or directory.
load_magic_directory
Loads and parses all magic files from a directory, merging them into a single rule set.
load_magic_file
Loads magic rules from a file or directory, automatically detecting the format.
parse_text_magic_file
Parses a complete magic file from raw text input.