codebook_config 0.3.10

Configuration handling for the Codebook spell checker
# Technical Specification: `codebook-config` Crate

## Overview

The `codebook-config` crate provides hierarchical configuration management for the Codebook application, supporting both global (user-level) and project-specific configurations. It reads the global configuration from platform-specific configuration directories and, where the two levels conflict, gives project settings precedence over global ones.

## Core Components

### `CodebookConfig` Struct

The main configuration container with the following capabilities:

- **Hierarchical Configuration Management**:
  - Loads global configuration from platform-specific config directories
  - Loads project-specific configuration from the current directory or parent directories
  - Merges the two configurations, with project settings taking precedence

- **Configuration Storage**:
  - `project_settings`: The project-specific configuration
  - `global_settings`: The global (user-level) configuration
  - `effective_settings`: The merged result used for actual operations

- **Configuration Discovery and Loading**:
  - Automatically finds global config in platform-specific locations
  - Recursively searches for project config files (`codebook.toml` or `.codebook.toml`)
  - Properly handles missing or invalid configuration files
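
The upward search for a project config file can be sketched with the standard library alone. This is a minimal illustration, not the crate's actual code: `find_project_config` and `CONFIG_NAMES` are hypothetical names, and checking `codebook.toml` before `.codebook.toml` within a directory is an assumption based only on the order in which the list above mentions them.

```rust
use std::path::{Path, PathBuf};

/// File names checked in each directory (order is an assumption).
const CONFIG_NAMES: [&str; 2] = ["codebook.toml", ".codebook.toml"];

/// Walks from `start` up through its parent directories and returns the
/// first configuration file found, or `None` if no directory contains one.
pub fn find_project_config(start: &Path) -> Option<PathBuf> {
    // `ancestors()` yields `start` itself first, then each parent in turn.
    for dir in start.ancestors() {
        for name in CONFIG_NAMES {
            let candidate = dir.join(name);
            if candidate.is_file() {
                return Some(candidate);
            }
        }
    }
    None
}
```

Returning `Option` rather than an error keeps "no project config found" a normal outcome, which fits the requirement that missing files be handled gracefully.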

### `ConfigSettings` Struct

The data model for configuration settings:

- `dictionaries`: List of dictionary IDs for spell-checking
- `words`: Custom allowlist of words that should be considered correct
- `flag_words`: Words that should always be flagged as problematic
- `ignore_paths`: Glob patterns for file paths to exclude
- `ignore_patterns`: Regex patterns for text content to ignore
- `use_global`: Whether to incorporate global configuration (project-config only)

## Key Features

### Global Configuration Support

- **Platform-Specific Paths**:
  - Linux/macOS: `$XDG_CONFIG_HOME/codebook/codebook.toml` if `XDG_CONFIG_HOME` is set
  - Linux/macOS fallback: `~/.config/codebook/codebook.toml`
  - Windows: `%APPDATA%\codebook\codebook.toml`

- **Configuration Precedence**:
  - Project configuration overrides global configuration
  - Global configuration is loaded first, then extended/overridden by project settings
  - Project config can entirely ignore global config via `use_global = false`
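
The platform-specific lookup order listed above can be expressed as a small pure function. The sketch below is an assumption-laden illustration rather than the crate's implementation (which depends on `dirs`): the `Platform` enum and `global_config_path` are hypothetical, the environment values are passed in explicitly so the rules are easy to test, and treating an empty `XDG_CONFIG_HOME` as unset is an assumption.

```rust
use std::path::PathBuf;

/// Platform family used for the lookup (hypothetical helper type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Platform {
    /// Linux and macOS.
    Unix,
    Windows,
}

/// Resolves the global config path from explicit inputs.
/// Returns `None` when the variable the platform needs is missing.
pub fn global_config_path(
    platform: Platform,
    xdg_config_home: Option<&str>,
    home: Option<&str>,
    appdata: Option<&str>,
) -> Option<PathBuf> {
    let base = match platform {
        // Assumption: an empty XDG_CONFIG_HOME counts as unset.
        Platform::Unix => match xdg_config_home.filter(|v| !v.is_empty()) {
            Some(xdg) => PathBuf::from(xdg),
            // Fallback: ~/.config
            None => PathBuf::from(home?).join(".config"),
        },
        Platform::Windows => PathBuf::from(appdata?),
    };
    Some(base.join("codebook").join("codebook.toml"))
}
```

A caller would typically feed it `std::env::var("XDG_CONFIG_HOME").ok().as_deref()` and the equivalent lookups for the home and `APPDATA` values.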

### Configuration Management

- **Read-Only Global Config by Default**: Global configuration is modified only through the explicit `*_global` methods, never by project-level methods
- **Project-Only Modifications**: Methods like `add_word()` only affect project configuration
- **Global-Only Modifications**: Methods like `add_word_global()` only affect global configuration
- **Effective Settings**: All validation methods use the merged effective settings
- **Transparent Reloading**: Changes to either global or project config are detected on reload

### Case-Insensitive Word Management

All word lists are stored and matched in lowercase, ensuring consistent behavior regardless of capitalization.
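
One way to get this behavior is to normalize on both insertion and lookup. The following is a minimal sketch under that assumption; `WordList` is a hypothetical type, not part of the crate's API.

```rust
use std::collections::HashSet;

/// A word list that stores and matches entries in lowercase.
#[derive(Default)]
pub struct WordList {
    words: HashSet<String>,
}

impl WordList {
    /// Adds a word in lowercase form; returns `true` if it was not present.
    pub fn add(&mut self, word: &str) -> bool {
        self.words.insert(word.to_lowercase())
    }

    /// Looks up a word regardless of the caller's capitalization.
    pub fn contains(&self, word: &str) -> bool {
        self.words.contains(&word.to_lowercase())
    }
}
```

With this approach, adding `"RustC"` and later checking `"RUSTC"` both resolve to the single stored entry `"rustc"`.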

### Pattern-Based Ignoring

- **Path Ignoring**: Uses glob patterns for ignoring specified file paths
- **Content Ignoring**: Uses regular expressions for ignoring specified content patterns
- **Lazy RegexSet Initialization**: Compiles regex patterns only when needed
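
The lazy-initialization idea can be sketched with `std::sync::OnceLock`. To stay standard-library-only, this example uses a stand-in for the compiled form: `CompiledPatterns` does plain substring matching where the crate would hold a `regex::RegexSet`. All type and method names here are hypothetical, and `compile_count` exists only to make the laziness observable.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Stand-in for a compiled pattern set (substring matching, NOT regex).
pub struct CompiledPatterns {
    needles: Vec<String>,
}

pub struct IgnorePatterns {
    raw: Vec<String>,
    compiled: OnceLock<CompiledPatterns>,
    compile_count: AtomicUsize,
}

impl IgnorePatterns {
    /// Stores the raw patterns without compiling anything yet.
    pub fn new(raw: Vec<String>) -> Self {
        Self {
            raw,
            compiled: OnceLock::new(),
            compile_count: AtomicUsize::new(0),
        }
    }

    /// Compiles on first use; later calls reuse the stored result.
    fn compiled(&self) -> &CompiledPatterns {
        self.compiled.get_or_init(|| {
            self.compile_count.fetch_add(1, Ordering::Relaxed);
            CompiledPatterns { needles: self.raw.clone() }
        })
    }

    /// Returns `true` if any pattern matches the text.
    pub fn is_ignored(&self, text: &str) -> bool {
        self.compiled().needles.iter().any(|n| text.contains(n.as_str()))
    }

    /// Number of times the patterns have been compiled so far.
    pub fn compile_count(&self) -> usize {
        self.compile_count.load(Ordering::Relaxed)
    }
}
```

Constructing the value costs nothing beyond storing the pattern strings; the compile step runs once, on the first `is_ignored` call.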

### Error Handling

Uses standard `std::io::Error` with appropriate context for all operations, without external dependencies for error handling.
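
A minimal sketch of what attaching context to `std::io::Error` can look like without an error-handling crate. The helper names and message wording are hypothetical, and mapping parse failures to `ErrorKind::InvalidData` is an assumption rather than something the crate documents.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Reads a config file, adding the path to any I/O error while
/// preserving the original `ErrorKind`.
pub fn read_config_text(path: &Path) -> io::Result<String> {
    fs::read_to_string(path).map_err(|err| {
        io::Error::new(
            err.kind(),
            format!("failed to read config {}: {}", path.display(), err),
        )
    })
}

/// Builds an error for a file that was read but could not be parsed.
pub fn invalid_config(path: &Path, reason: &str) -> io::Error {
    io::Error::new(
        io::ErrorKind::InvalidData,
        format!("invalid config {}: {}", path.display(), reason),
    )
}
```

Keeping the original `ErrorKind` lets callers still distinguish a missing file (`NotFound`) from a permission problem or malformed content.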

## Technical Details

### Dependencies

- `serde` with `derive` feature: For serialization/deserialization
- `toml`: For TOML parsing and generation
- `dirs`: For locating platform-specific home and configuration directories
- `glob`: For file path pattern matching
- `log`: For logging operations
- `regex`: For text pattern matching

### Thread Safety

Uses `RwLock` for interior mutability, allowing thread-safe access to configuration settings.
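
The pattern can be illustrated as follows: mutation goes through `&self`, so a single value can be shared across threads behind an `Arc`. `SharedWords` is a hypothetical, simplified type, and panicking on a poisoned lock is a simplification chosen for this sketch.

```rust
use std::collections::HashSet;
use std::sync::RwLock;

/// A word list that can be read and modified through a shared reference.
#[derive(Default)]
pub struct SharedWords {
    words: RwLock<HashSet<String>>,
}

impl SharedWords {
    /// Takes the write lock briefly; returns `true` if the word was new.
    pub fn add_word(&self, word: &str) -> bool {
        let mut guard = self.words.write().expect("word list lock poisoned");
        guard.insert(word.to_lowercase())
    }

    /// Takes the read lock, so concurrent lookups do not block each other.
    pub fn is_allowed_word(&self, word: &str) -> bool {
        let guard = self.words.read().expect("word list lock poisoned");
        guard.contains(&word.to_lowercase())
    }
}
```

Because many readers can hold the lock at once, frequent lookups stay cheap while the comparatively rare writes get exclusive access.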

### Cache Management

Maintains a temporary cache directory that can be cleaned as needed.

## Integration Points

The crate exposes the following main integration points:

1. `CodebookConfig::load()`: Load both global and project configurations
2. Core validation methods:
   - `is_allowed_word()`
   - `should_flag_word()`
   - `should_ignore_path()`
3. Configuration manipulation methods:
   - `add_word()`: Add words to project allowlist only
   - `add_word_global()`: Add words to global allowlist only
   - `save()`: Save project configuration only
   - `save_global()`: Save global configuration only
   - `reload()`: Reload both global and project configurations

## Usage Example

```rust
use codebook_config::CodebookConfig;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load configurations (both global and project)
    let config = CodebookConfig::load(None)?;

    // Check if a path should be ignored (uses effective settings)
    let should_ignore = config.should_ignore_path("target/debug/build");

    // Check if a word is allowed (uses effective settings)
    let is_allowed = config.is_allowed_word("rustc");
    println!("ignore path: {should_ignore}, word allowed: {is_allowed}");

    // Add a new word to the project allowlist (project config only)
    config.add_word("newterm")?;
    config.save()?;

    // Add a new word to the global allowlist (global config only)
    config.add_word_global("newterm")?;
    config.save_global()?;

    // Clean the cache directory
    config.clean_cache();
    Ok(())
}
```

## Configuration File Format

The configuration files use TOML format with the following structure:

```toml
# List of dictionaries to use for spell checking
dictionaries = ["en_us", "en_gb"]

# Custom allowlist of words
words = ["codebook", "rustc"]

# Words that should always be flagged
flag_words = ["todo", "fixme"]

# Glob patterns for paths to ignore
ignore_paths = ["target/**/*", "**/*.md"]

# Regex patterns for text to ignore
ignore_patterns = ["^[A-Z0-9]+$", "\\d{3}-\\d{2}-\\d{4}"]

# Whether to use global configuration (project config only)
use_global = true
```

## Implementation Details

- Word lists are case-insensitive and automatically converted to lowercase for consistent matching
- When merging global and project configurations, duplicate entries are automatically removed
- Project configuration takes precedence over global configuration for all settings
- Regular expressions are compiled lazily to improve performance
- Configuration changes are detected by comparing content, not just timestamps
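
The merge rules above can be sketched as follows. This is an illustration under stated assumptions, not the crate's code: `Settings` is a reduced, hypothetical stand-in for `ConfigSettings` with only two list fields, and sorting before deduplicating is a choice made here for a deterministic result.

```rust
/// Reduced stand-in for the settings struct (hypothetical).
#[derive(Clone, Debug, Default, PartialEq)]
pub struct Settings {
    pub words: Vec<String>,
    pub flag_words: Vec<String>,
    pub use_global: bool,
}

/// Concatenates global then project entries, lowercases them,
/// and removes duplicates.
fn merge_list(global: &[String], project: &[String]) -> Vec<String> {
    let mut merged: Vec<String> = global
        .iter()
        .chain(project)
        .map(|w| w.to_lowercase())
        .collect();
    merged.sort(); // assumption: sorted output; also makes dedup complete
    merged.dedup();
    merged
}

/// Computes the effective settings. When the project sets
/// `use_global = false`, the global lists are ignored entirely.
pub fn effective_settings(global: &Settings, project: &Settings) -> Settings {
    let no_global = Settings::default();
    let global = if project.use_global { global } else { &no_global };
    Settings {
        words: merge_list(&global.words, &project.words),
        flag_words: merge_list(&global.flag_words, &project.flag_words),
        use_global: project.use_global,
    }
}
```

Routing the `use_global = false` case through the same `merge_list` path means project-only lists still get lowercased and deduplicated.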