truncate_base64

Function truncate_base64 

pub fn truncate_base64(content: &str) -> String

Truncate base64-encoded content in text to save tokens

This function detects and truncates large base64-encoded content (such as embedded images in data URIs or long base64 strings) to reduce token count while preserving the structure and meaning of the text.

§Arguments

  • content - The text content to process

§Returns

A new string with base64 content truncated and replaced with markers.

§Detection and Truncation Rules

  1. Data URIs (e.g., ...):

    • Preserves the MIME type prefix: data:image/png;base64,
    • Replaces the base64 content with: [BASE64_TRUNCATED]
    • Result: data:image/png;base64,[BASE64_TRUNCATED]
  2. Long base64 strings (200+ characters with + or /):

    • Shows first 50 characters
    • Appends: ...[BASE64_TRUNCATED]
    • Result: the first 50 characters of the string, followed by ...[BASE64_TRUNCATED]
  3. Short base64 strings (<200 characters):

    • Not truncated (kept as-is)
  4. Long strings that contain no + or / characters:

    • Not truncated (likely not base64)
  5. Regular text:

    • Completely preserved
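The five rules can be sketched without a regex as a single pass that splits the text into runs of base64-alphabet characters (A-Z, a-z, 0-9, +, /, =) and everything else. This is a hypothetical illustration, not the crate's implementation (which, per the Performance notes, uses a pre-compiled regex). The name truncate_base64_sketch is made up, the 200 and 50 thresholds come from the rules above, and rule 1 is simplified to "any run that directly follows ;base64," with no length threshold.

```rust
/// Marker text taken from the documented output format.
const MARKER: &str = "[BASE64_TRUNCATED]";
/// Rule 2 thresholds, taken from the documentation above.
const MIN_LONG_LEN: usize = 200;
const KEEP_PREFIX: usize = 50;

/// True for bytes in the standard base64 alphabet (A-Z, a-z, 0-9, +, /, =).
fn is_base64_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'+' || b == b'/' || b == b'='
}

/// Hypothetical sketch of the documented rules; NOT infiniloom_engine's code.
fn truncate_base64_sketch(content: &str) -> String {
    let bytes = content.as_bytes();
    let mut out = String::with_capacity(content.len());
    let mut i = 0;
    while i < bytes.len() {
        let start = i;
        let in_run = is_base64_byte(bytes[i]);
        // Extend to the end of the current base64 (or non-base64) run.
        while i < bytes.len() && is_base64_byte(bytes[i]) == in_run {
            i += 1;
        }
        let run = &content[start..i];
        if !in_run {
            // Separators (spaces, punctuation, non-ASCII) are copied unchanged.
            out.push_str(run);
        } else if out.ends_with(";base64,") {
            // Rule 1 (simplified assumption): a run right after ";base64," is a
            // data-URI payload; the "data:<mime>;base64," prefix is already in `out`.
            out.push_str(MARKER);
        } else if run.len() >= MIN_LONG_LEN && run.bytes().any(|b| b == b'+' || b == b'/') {
            // Rule 2: a long run containing + or /; keep only its first 50 chars.
            out.push_str(&run[..KEEP_PREFIX]);
            out.push_str("...");
            out.push_str(MARKER);
        } else {
            // Rules 3-5: short runs (including ordinary words) and long runs
            // without + or / are kept as-is.
            out.push_str(run);
        }
    }
    out
}
```

Working on bytes is safe here because every base64-alphabet character is ASCII, so each run boundary falls on a UTF-8 character boundary and the string slices cannot panic.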

§Examples

use infiniloom_engine::content_processing::truncate_base64;

// Data URI truncation
let data_uri = "...";
let result = truncate_base64(data_uri);
assert_eq!(result, "data:image/png;base64,[BASE64_TRUNCATED]");

// Long base64 string truncation
let long_base64 = "A".repeat(250) + "+/";
let result = truncate_base64(&long_base64);
assert!(result.contains("[BASE64_TRUNCATED]"));

// Short base64 preserved
let short = "SGVsbG8gV29ybGQ="; // "Hello World" in base64 (16 chars)
let result = truncate_base64(short);
assert_eq!(result, short); // Unchanged

// Regular text preserved
let text = "This is regular code with no base64";
let result = truncate_base64(text);
assert_eq!(result, text); // Unchanged

§Performance

  • Uses a pre-compiled regex pattern (compiled once, then reused on every call)
  • Efficient replacement with Regex::replace_all()
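  • Regex::replace_all() only builds a new string when a match is found; because this function returns an owned String, unmatched input is still copied once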
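The allocation behaviour can be illustrated with std::borrow::Cow, the same return type Regex::replace_all() uses. The helper below is hypothetical: it is not part of infiniloom_engine, it handles only the data-URI rule, and it avoids the regex crate so it stays self-contained. It returns Cow::Borrowed when the input contains no ;base64, marker, so unchanged text is not copied.

```rust
use std::borrow::Cow;

/// Hypothetical helper, not part of infiniloom_engine: replaces the payload after
/// each ";base64," marker with [BASE64_TRUNCATED]. Unchanged input is borrowed.
fn strip_data_uri_payloads(input: &str) -> Cow<'_, str> {
    const TAG: &str = ";base64,";
    // Fast path: nothing to replace, so hand back the input without allocating.
    if !input.contains(TAG) {
        return Cow::Borrowed(input);
    }
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(pos) = rest.find(TAG) {
        let payload_start = pos + TAG.len();
        // Copy everything up to and including the ";base64," marker.
        out.push_str(&rest[..payload_start]);
        // Measure the run of base64-alphabet bytes that forms the payload.
        let payload_len = rest[payload_start..]
            .bytes()
            .take_while(|&b| b.is_ascii_alphanumeric() || matches!(b, b'+' | b'/' | b'='))
            .count();
        if payload_len > 0 {
            out.push_str("[BASE64_TRUNCATED]");
        }
        rest = &rest[payload_start + payload_len..];
    }
    out.push_str(rest);
    Cow::Owned(out)
}
```

Calling into_owned() on the result yields a String, matching truncate_base64's return type, but that copies the input in the Borrowed case. A Cow-returning signature would avoid the copy, at the cost of tying the result's lifetime to the input.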

§Use Cases

  • Reducing token count when packing repositories with embedded images
  • Removing large data URIs from HTML/CSS files
  • Truncating base64-encoded assets in configuration files
  • Optimizing content for LLM context windows