Content processing utilities for transforming file contents
This module provides utilities for processing and transforming file content, particularly for optimizing content for LLM consumption by removing or truncating large binary/encoded data.
§Features
- Base64 Detection and Truncation: Automatically detects and truncates base64-encoded content (data URIs, embedded images, etc.) to save tokens
- Pattern-based Processing: Uses pre-compiled regex patterns for efficient content transformation
§Examples
§Truncating Base64 Content
```rust
use infiniloom_engine::content_processing::truncate_base64;

// Data URI with embedded image
let content = "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAAB...";
let truncated = truncate_base64(content);
assert!(truncated.contains("[BASE64_TRUNCATED]"));

// Regular text is preserved
let text = "This is normal text";
let result = truncate_base64(text);
assert_eq!(result, text);
```
§Performance
- Uses `once_cell::sync::Lazy` for one-time regex compilation
- Regex patterns are compiled once and reused across all calls
- Suited to processing large codebases with many files, since no pattern is recompiled per call
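The one-time initialization idea can be illustrated with the standard library's `std::sync::OnceLock`, which fills the same role as `once_cell::sync::Lazy`. This is a sketch, not the module's code: it caches a hypothetical base64 alphabet lookup table (`base64_alphabet`) in place of compiled regexes. The first call builds the table, and every later call returns a reference to the same static value.

```rust
use std::sync::OnceLock;

/// Hypothetical lookup table marking which bytes belong to the standard
/// base64 alphabet (A-Z, a-z, 0-9, '+', '/', '='). It stands in for the
/// module's pre-compiled regexes to show the same one-time initialization.
fn base64_alphabet() -> &'static [bool; 256] {
    static TABLE: OnceLock<[bool; 256]> = OnceLock::new();
    // The closure runs only on the first call; later calls reuse the cached table.
    TABLE.get_or_init(|| {
        let mut table = [false; 256];
        for b in b'A'..=b'Z' {
            table[b as usize] = true;
        }
        for b in b'a'..=b'z' {
            table[b as usize] = true;
        }
        for b in b'0'..=b'9' {
            table[b as usize] = true;
        }
        for b in [b'+', b'/', b'='] {
            table[b as usize] = true;
        }
        table
    })
}

fn main() {
    let first = base64_alphabet();
    let second = base64_alphabet();
    // Both calls return a reference to the same static table.
    assert!(std::ptr::eq(first, second));
    assert!(first[b'Q' as usize]);
    assert!(!first[b'-' as usize]);
}
```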
§Detection Rules
The base64 detection looks for:
- Data URIs: `data:[mimetype];base64,[content]`
- Long base64 strings: sequences of 200 or more base64 characters
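The two detection rules can be expressed as plain predicates. This is a minimal sketch assuming the standard base64 alphabet (`A-Z`, `a-z`, `0-9`, `+`, `/`, and `=` padding); the helpers `is_base64_byte`, `is_data_uri`, and `is_long_base64_run` are hypothetical and are not part of `infiniloom_engine`, which uses pre-compiled regexes instead.

```rust
/// Hypothetical helper: true for bytes in the standard base64 alphabet, including '=' padding.
fn is_base64_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'+' || b == b'/' || b == b'='
}

/// Hypothetical rule 1: `data:[mimetype];base64,[content]`, where the
/// mimetype is non-empty and the content starts with a base64 character.
fn is_data_uri(s: &str) -> bool {
    match s
        .strip_prefix("data:")
        .and_then(|rest| rest.split_once(";base64,"))
    {
        Some((mime, payload)) => {
            !mime.is_empty()
                && !mime.contains(char::is_whitespace)
                && payload.bytes().next().map_or(false, is_base64_byte)
        }
        None => false,
    }
}

/// Hypothetical rule 2: a run of 200 or more base64 characters.
fn is_long_base64_run(s: &str) -> bool {
    s.len() >= 200 && s.bytes().all(is_base64_byte)
}

fn main() {
    assert!(is_data_uri("data:image/png;base64,iVBORw0KGgo"));
    assert!(!is_data_uri("data:text/plain,hello")); // not base64-encoded
    assert!(is_long_base64_run(&"QUJD".repeat(50))); // exactly 200 characters
    assert!(!is_long_base64_run(&"QUJD".repeat(49))); // 196 characters
    assert!(!is_long_base64_run("This is normal text"));
}
```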
Truncation behavior:
- Data URIs: the prefix is preserved and the content is replaced with `[BASE64_TRUNCATED]`
- Long strings (over 100 characters, containing `+` or `/`): the first 50 characters are kept, followed by `...[BASE64_TRUNCATED]`
- Short strings (fewer than 200 characters): not truncated
- Non-base64 text: preserved unchanged
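Putting the rules together, a std-only sketch of the truncation can scan the text for maximal runs of base64 characters. It is an illustration, not the crate's implementation: `truncate_base64_sketch` is a hypothetical name, a data URI payload is assumed to be any run that directly follows `;base64,`, and only the 200-character threshold is applied (the separate condition on strings over 100 characters containing `+` or `/` is not modeled).

```rust
const MARKER: &str = "[BASE64_TRUNCATED]";
const MIN_RUN: usize = 200; // assumed detection threshold for standalone runs
const KEEP: usize = 50; // characters kept from a long standalone run

/// True for bytes in the standard base64 alphabet, including '=' padding.
fn is_base64_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'+' || b == b'/' || b == b'='
}

/// Hypothetical sketch of the truncation rules; not the crate's `truncate_base64`.
fn truncate_base64_sketch(text: &str) -> String {
    let bytes = text.as_bytes();
    let mut out = String::with_capacity(text.len());
    let mut i = 0;
    while i < bytes.len() {
        if !is_base64_byte(bytes[i]) {
            // Copy one whole character so multi-byte UTF-8 stays intact.
            let ch = text[i..].chars().next().unwrap();
            out.push(ch);
            i += ch.len_utf8();
            continue;
        }
        // Advance to the end of the maximal run of base64 bytes.
        let start = i;
        while i < bytes.len() && is_base64_byte(bytes[i]) {
            i += 1;
        }
        let run = &text[start..i];
        if text[..start].ends_with(";base64,") {
            // Data URI payload: the prefix was already copied; drop the content.
            out.push_str(MARKER);
        } else if run.len() >= MIN_RUN {
            // Long standalone run: keep a short head, then the marker.
            out.push_str(&run[..KEEP]);
            out.push_str("...");
            out.push_str(MARKER);
        } else {
            // Short run (ordinary words, identifiers): keep unchanged.
            out.push_str(run);
        }
    }
    out
}

fn main() {
    let uri = "data:image/png;base64,iVBORw0KGgo";
    assert_eq!(
        truncate_base64_sketch(uri),
        "data:image/png;base64,[BASE64_TRUNCATED]"
    );
    let long = "A".repeat(300);
    assert_eq!(
        truncate_base64_sketch(&long),
        format!("{}...[BASE64_TRUNCATED]", "A".repeat(50))
    );
    assert_eq!(truncate_base64_sketch("This is normal text"), "This is normal text");
}
```

Scanning byte runs instead of using regexes keeps the sketch dependency-free. Every base64 character is ASCII, so run boundaries always fall on UTF-8 character boundaries and slicing `text` at them cannot panic.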
§Functions
- `truncate_base64` - Truncate base64-encoded content in text to save tokens