pub fn truncate_base64(content: &str) -> String
Truncate base64-encoded content in text to save tokens
This function detects and truncates large base64-encoded content (such as embedded images in data URIs or long base64 strings) to reduce token count while preserving the structure and meaning of the text.
§Arguments
content - The text content to process
§Returns
A new string with base64 content truncated and replaced with markers.
§Detection and Truncation Rules
- Data URIs (e.g., data:image/png;base64,iVBORw0KG...):
  - Preserves the MIME type prefix: data:image/png;base64,
  - Replaces the base64 content with: [BASE64_TRUNCATED]
  - Result: data:image/png;base64,[BASE64_TRUNCATED]
- Long base64 strings (200+ characters with + or /):
  - Shows first 50 characters
  - Appends: ...[BASE64_TRUNCATED]
  - Result: SGVsbG8gV29...ybGQ=...[BASE64_TRUNCATED]
- Short base64 strings (<200 characters):
  - Not truncated (kept as-is)
- Long strings without base64 characters (no + or /):
  - Not truncated (likely not base64)
- Regular text:
  - Completely preserved
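The rules above can be approximated with a hand-rolled scanner that uses only the standard library. This is a minimal sketch, not the crate's implementation (which is regex-based). The name truncate_base64_sketch, the constants, and the choices to count up to two trailing = padding characters toward the 200-character threshold and to treat any run following ";base64," as a data URI payload are all assumptions made for illustration.

```rust
// Hypothetical stand-in for the documented rules; not the crate's code.
const MARKER: &str = "[BASE64_TRUNCATED]";
const MIN_LONG_RUN: usize = 200; // assumed threshold for "long" runs
const KEEP_PREFIX: usize = 50; // characters kept from a long run

fn is_base64_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'+' || b == b'/'
}

pub fn truncate_base64_sketch(content: &str) -> String {
    let bytes = content.as_bytes();
    let mut out = String::with_capacity(content.len());
    let mut i = 0;
    while i < bytes.len() {
        if !is_base64_byte(bytes[i]) {
            // Copy everything up to the next base64-alphabet byte unchanged.
            let start = i;
            while i < bytes.len() && !is_base64_byte(bytes[i]) {
                i += 1;
            }
            out.push_str(&content[start..i]);
            continue;
        }
        // Collect one run of base64-alphabet bytes plus up to two '=' pads.
        let start = i;
        while i < bytes.len() && is_base64_byte(bytes[i]) {
            i += 1;
        }
        let body_end = i;
        while i < bytes.len() && bytes[i] == b'=' && i - body_end < 2 {
            i += 1;
        }
        let run = &content[start..i];
        let body = &content[start..body_end];
        if content[..start].ends_with(";base64,") {
            // Data URI payload: the prefix is already in `out`, drop the payload.
            out.push_str(MARKER);
        } else if run.len() >= MIN_LONG_RUN && (body.contains('+') || body.contains('/')) {
            // Long run that looks like base64: keep a short prefix only.
            out.push_str(&run[..KEEP_PREFIX]);
            out.push_str("...");
            out.push_str(MARKER);
        } else {
            // Short runs, long runs without '+' or '/', and ordinary words.
            out.push_str(run);
        }
    }
    out
}
```

Under these assumptions, truncate_base64_sketch("data:image/png;base64,iVBORw0KGgo") yields the data URI prefix followed by the marker, while ordinary words pass through untouched because each one is a short run.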
§Examples
use infiniloom_engine::content_processing::truncate_base64;
// Data URI truncation
let data_uri = "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAAB...";
let result = truncate_base64(data_uri);
assert_eq!(result, "data:image/png;base64,[BASE64_TRUNCATED]");
// Long base64 string truncation
let long_base64 = "A".repeat(250) + "+/";
let result = truncate_base64(&long_base64);
assert!(result.contains("[BASE64_TRUNCATED]"));
// Short base64 preserved
let short = "SGVsbG8gV29ybGQ="; // "Hello World" in base64 (16 chars)
let result = truncate_base64(short);
assert_eq!(result, short); // Unchanged
// Regular text preserved
let text = "This is regular code with no base64";
let result = truncate_base64(text);
assert_eq!(result, text); // Unchanged
§Performance
- Uses pre-compiled regex pattern (compiled once, reused forever)
- Efficient replacement with Regex::replace_all()
- Only allocates new string if matches are found
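The two ideas behind these points, one-time initialization and allocating only on a match, can be illustrated with the standard library alone. In the sketch below, Matcher is a hypothetical stand-in for a compiled pattern, matcher() and strip_first_payload() are invented names, and only the first ";base64," occurrence is handled. The Cow return type mirrors what Regex::replace_all returns in the regex crate, which borrows the input when nothing matched. How the crate actually stores its pattern is not stated in this page.

```rust
use std::borrow::Cow;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// Counts how often the (hypothetical) matcher is built, for demonstration.
static BUILD_COUNT: AtomicUsize = AtomicUsize::new(0);

// Stand-in for an expensive-to-build compiled pattern.
struct Matcher {
    prefix: &'static str,
}

// Built on first use, then shared by every later call.
fn matcher() -> &'static Matcher {
    static MATCHER: OnceLock<Matcher> = OnceLock::new();
    MATCHER.get_or_init(|| {
        BUILD_COUNT.fetch_add(1, Ordering::Relaxed);
        Matcher { prefix: ";base64," }
    })
}

fn is_payload_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'+' || b == b'/' || b == b'='
}

// Borrows the input when there is nothing to replace; allocates otherwise.
fn strip_first_payload(content: &str) -> Cow<'_, str> {
    let m = matcher();
    match content.find(m.prefix) {
        None => Cow::Borrowed(content),
        Some(pos) => {
            let payload_start = pos + m.prefix.len();
            let payload_len = content[payload_start..]
                .bytes()
                .take_while(|&b| is_payload_byte(b))
                .count();
            let mut out = String::with_capacity(content.len());
            out.push_str(&content[..payload_start]);
            out.push_str("[BASE64_TRUNCATED]");
            out.push_str(&content[payload_start + payload_len..]);
            Cow::Owned(out)
        }
    }
}
```

Returning Cow lets a caller that processes many files skip a copy for every file that contains no base64 at all, which is typically the common case in a source repository.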
§Use Cases
- Reducing token count when packing repositories with embedded images
- Removing large data URIs from HTML/CSS files
- Truncating base64-encoded assets in configuration files
- Optimizing content for LLM context windows