Crate dirgrab_lib


§dirgrab 📁⚡


dirgrab walks a directory (or Git repository), selects the files that matter, and concatenates their contents for easy copy/paste into language models. It can write to stdout, a file, or your clipboard, and it ships with a library crate so the same logic can be embedded elsewhere.

§Highlights

  • 🔧 Configurable defaults – merge built-in defaults with global config.toml, project-local .dirgrab.toml, .dirgrabignore, and CLI flags.
  • 🧭 Git-aware out of the box – untracked files are included by default, scoped to the selected subdirectory, with --tracked-only and --all-repo to opt out.
  • 🗂️ Structured context – optional directory tree, per-file headers, PDF text extraction, and deterministic file ordering for stable diffs.
  • 🧮 Better stats – -s/--stats now prints summary totals plus a per-file token leaderboard, and you can pick which reports to show each run.
  • 🙅 Safety nets – automatically ignores the active output file, respects .gitignore, and gracefully skips binary/non-UTF8 files.

§Installation

cargo install dirgrab
# or from a local checkout
# cargo install --path .

Check it worked:

dirgrab --version

§Usage

dirgrab [OPTIONS] [TARGET_PATH]

TARGET_PATH defaults to the current directory. When invoked inside a Git repo, dirgrab scopes the listing to that subtree unless you pass --all-repo.
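
The subtree scoping described above can be pictured as a prefix filter over repository-relative paths. The following is only a minimal sketch under that assumption; `scope_to_subtree` is a hypothetical helper written for illustration and is not part of dirgrab's API.

```rust
use std::path::{Path, PathBuf};

/// Keep only the paths that fall under `subdir`, unless `all_repo` is set.
/// All paths are assumed to be relative to the repository root.
/// (Hypothetical helper for illustration; not dirgrab's actual code.)
fn scope_to_subtree(paths: &[PathBuf], subdir: &Path, all_repo: bool) -> Vec<PathBuf> {
    paths
        .iter()
        // `Path::starts_with` compares whole components,
        // so "src" does not match "src2/lib.rs".
        .filter(|p| all_repo || p.starts_with(subdir))
        .cloned()
        .collect()
}
```

Comparing by path component rather than by string prefix avoids accidentally pulling in sibling directories whose names merely begin with the target's name.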

§Common Options

  • -o, --output [FILE] – write to a file (defaults to dirgrab.txt if no name is given). Conflicts with --clipboard.
  • -c, --clipboard – copy to the system clipboard instead of stdout or a file.
  • --no-headers / --no-tree / --no-pdf – disable headers, the directory tree, or PDF extraction.
  • -e, --exclude <PATTERN> – add glob-style excludes (applied after config files).
  • --tracked-only – Git mode: limit to tracked files. (Compatibility note: -u/--include-untracked still forces inclusion if you need it.)
  • --all-repo – Git mode: operate on the entire repository even if the target is a subdirectory.
  • --include-default-output – allow dirgrab.txt back into the run.
  • --no-git – ignore Git context entirely and walk the filesystem.
  • --no-config – ignore global/local config files and .dirgrabignore.
  • --config <FILE> – load an additional TOML config file (applied after global/local unless --no-config).
  • --token-ratio <FLOAT> – override the characters-to-tokens ratio used by --stats (defaults to 3.6).
  • --tokens-exclude-tree / --tokens-exclude-headers – subtract tree or header sections when estimating tokens.
  • -s, --stats [REPORT...] – print stats reports to stderr. Defaults to overview + top-files=5; provide explicit reports like --stats overview top-files=10.
  • -v, -vv, -vvv – increase log verbosity (Warn, Info, Debug, Trace).
  • -h, --help / -V, --version – CLI boilerplate.
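
To make the --token-ratio option concrete, here is a rough sketch of a ratio-based token estimate: count characters and divide by the ratio. The function name `estimate_tokens` and the choice to round up are assumptions made for this example; dirgrab's own rounding and counting rules may differ.

```rust
/// Estimate a token count from text length and a characters-per-token ratio.
/// Rounding up is an assumption of this sketch, not documented dirgrab behaviour.
fn estimate_tokens(text: &str, chars_per_token: f64) -> usize {
    if chars_per_token <= 0.0 {
        return 0; // guard against division by zero or a negative ratio
    }
    // Count Unicode scalar values rather than bytes, since the ratio is per character.
    let chars = text.chars().count() as f64;
    (chars / chars_per_token).ceil() as usize
}
```

With the default ratio of 3.6, a 1000-character input comes out at about 278 estimated tokens under this scheme; a larger ratio yields a smaller estimate.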

§Configuration Files

dirgrab layers configuration in the following order (later wins):

  1. Built-in defaults
  2. Global config + ignore
    • Linux: ~/.config/dirgrab/config.toml & ~/.config/dirgrab/ignore
    • macOS: ~/Library/Application Support/dirgrab/config.toml & …/ignore
    • Windows: %APPDATA%\dirgrab\config.toml & ignore
  3. Project-local config: <target>/.dirgrab.toml
  4. Project-local ignore patterns: <target>/.dirgrabignore
  5. CLI flags (--tracked-only, --no-tree, etc.)
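
The "later wins" layering above can be sketched as a fold over optional settings, where each layer overrides only the options it actually sets. The `Layer` and `Resolved` types below are hypothetical stand-ins for illustration (only the field names come from the sample config); they are not dirgrab's real types.

```rust
/// One configuration layer; `None` means "this layer does not set the option".
/// (Hypothetical type for illustration.)
#[derive(Debug, Clone, Default, PartialEq)]
struct Layer {
    include_tree: Option<bool>,
    add_headers: Option<bool>,
    tracked_only: Option<bool>,
}

/// Fully resolved settings after all layers are applied.
#[derive(Debug, Clone, PartialEq)]
struct Resolved {
    include_tree: bool,
    add_headers: bool,
    tracked_only: bool,
}

/// Apply layers in order; a later layer overrides an earlier one
/// only for the options it sets.
fn resolve(defaults: Resolved, layers: &[Layer]) -> Resolved {
    layers.iter().fold(defaults, |acc, layer| Resolved {
        include_tree: layer.include_tree.unwrap_or(acc.include_tree),
        add_headers: layer.add_headers.unwrap_or(acc.add_headers),
        tracked_only: layer.tracked_only.unwrap_or(acc.tracked_only),
    })
}
```

Passing the layers as `[global, local, cli]` mirrors the order in the list, so a CLI flag such as --no-tree would override whatever the config files say about `include_tree`.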

Sample config.toml:

[dirgrab]
exclude = ["Cargo.lock", "*.csv", "node_modules/", "target/"]
include_tree = true
add_headers = true
convert_pdf = true
tracked_only = false
all_repo = false

[stats]
enabled = true
token_ratio = 3.6
tokens_exclude = ["tree"]
reports = ["overview", "top-files=8"]

Ignore files use the same syntax as .gitignore. CLI -e patterns and the active output file name are appended last, so the freshly written file is never re-ingested accidentally.
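
As an illustration of that ordering, the sketch below assembles an exclude list with config patterns first, CLI patterns next, and the output file name last. `build_excludes` is a hypothetical helper for this example; how dirgrab actually stores and matches its patterns is not shown here.

```rust
/// Build the final, ordered list of exclude patterns:
/// config/ignore-file patterns, then CLI `-e` patterns, then the output file.
/// (Hypothetical helper for illustration.)
fn build_excludes(
    config_patterns: &[&str],
    cli_patterns: &[&str],
    output_file: Option<&str>,
) -> Vec<String> {
    let mut patterns: Vec<String> =
        config_patterns.iter().map(|p| p.to_string()).collect();
    patterns.extend(cli_patterns.iter().map(|p| p.to_string()));
    // Appending the output file last means it is always excluded for this run.
    if let Some(name) = output_file {
        patterns.push(name.to_string());
    }
    patterns
}
```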

§Examples

# Grab the current repo subtree (includes untracked files) and show stats
dirgrab -s

# Limit to tracked files only and exclude build artifacts
dirgrab --tracked-only -e "*.log" -e "target/"

# Force a whole-repo snapshot from within a subdirectory
dirgrab --all-repo

# Plain directory mode with custom excludes, writing to the default file
dirgrab --no-git -e "*.tmp" -o

# Use built-in defaults only, ignoring config files, for a “clean” run
dirgrab --no-config --no-tree --no-headers

§Behaviour Notes

  • Git scope & ordering – Paths are gathered via git ls-files, scoped to the target subtree unless --all-repo is set, and the final list is sorted for deterministic output. Non-Git mode uses walkdir with the same ordering.
  • File headers & tree – Headers and tree sections remain enabled by default; toggle them per run or through config files.
  • PDF handling – Text is extracted from PDFs unless disabled. Failures and binary files are skipped with informative (but less noisy) logs.
  • Stats – When --stats is active (or enabled in config), stderr shows the requested reports (default: totals + top files). Exclude tree/headers, adjust the ratio, or pick different reports via config or CLI.
  • Safety – dirgrab.txt stays excluded unless explicitly re-enabled, and any active -o FILE target is auto-excluded for that run.
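
Two of the behaviours above, deterministic ordering and skipping non-UTF-8 files, can be sketched with the standard library alone. `collect_text` is a hypothetical function for illustration that works on in-memory bytes; it does not reproduce dirgrab's file reading, logging, or PDF handling.

```rust
use std::path::PathBuf;

/// Sort entries by path, then decode each file's bytes,
/// skipping anything that is not valid UTF-8.
/// (Hypothetical helper for illustration.)
fn collect_text(mut files: Vec<(PathBuf, Vec<u8>)>) -> Vec<(PathBuf, String)> {
    // Sorting by path makes the output order independent of discovery order.
    files.sort_by(|a, b| a.0.cmp(&b.0));
    files
        .into_iter()
        // `String::from_utf8` returns Err for invalid UTF-8;
        // `.ok()` turns that error into a silent skip.
        .filter_map(|(path, bytes)| String::from_utf8(bytes).ok().map(|text| (path, text)))
        .collect()
}
```

Sorting before concatenation is what keeps successive runs diff-friendly: the same set of files always produces the same sequence of sections.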

§Library (dirgrab-lib)

The same engine powers dirgrab-lib; import it to drive custom tooling:

use dirgrab_lib::{grab_contents, GrabConfig};

See docs.rs for API details.

§Changelog

See CHANGELOG.md for the full release history.

§License

Licensed under either of:

§Contributing

Issues and PRs are welcome! Please run cargo fmt, cargo clippy, and cargo test before submitting.

§Structs

GrabConfig
Configuration for the dirgrab operation.
GrabOutput
GrabbedFile

§Enums

GrabError
Errors that can occur during the dirgrab library operations.

§Functions

grab_contents
Performs the main dirgrab operation based on the provided configuration.
grab_contents_detailed
Performs the main dirgrab operation and returns file-level metadata along with the content.

§Type Aliases

GrabResult
A convenience type alias for Result<T, GrabError>.