kodegen_utils
Memory-efficient, blazing-fast utilities for code generation agents. Part of the KODEGEN.ᴀɪ ecosystem.
Overview
kodegen_utils provides high-performance text processing utilities for AI coding assistants and MCP (Model Context Protocol) tools. It focuses on the invisible character problems that often break AI-generated code edits (zero-width characters, mixed line endings, tabs vs. spaces), using character-level analysis to diagnose them and fuzzy string matching to find near-miss matches.
Features
- 🔍 Fuzzy String Matching: Levenshtein distance-based search with recursive optimization
- 🔬 Character-Level Analysis: Deep diagnostics for invisible Unicode issues (zero-width chars, mixed line endings, tabs vs spaces)
- 📊 Visual Diffs: Character-precise diff visualization in the format prefix{-removed-}{+added+}suffix
- ⚡ Async Telemetry: Non-blocking logging with fire-and-forget patterns
- 🎯 Smart Suggestions: Actionable error messages for edit failures
- 💾 LRU Caching: Results of repeated character analyses are cached, avoiding redundant work
- 📈 Usage Tracking: Built-in statistics for MCP tool operations
Installation
Add to your Cargo.toml:
[dependencies]
kodegen_utils = "0.1.0"
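This crate requires the Rust nightly toolchain (pinned in rust-toolchain.toml; see Requirements). If nightly is not installed yet:
rustup toolchain install nightly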
Usage
Fuzzy String Matching
Find approximate matches in text using Levenshtein distance:
use ;
let text = "The quick brown fox jumps over the lazy dog";
let result = recursive_fuzzy_index_of_with_defaults;
println!;
println!;
// Calculate similarity ratio (0.0 to 1.0)
let similarity = get_similarity_ratio;
println!;
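The fuzzy-matching example relies on edit distance. Below is a minimal, self-contained sketch of Levenshtein distance and a 0.0 to 1.0 similarity ratio. The names levenshtein, similarity_ratio, and naive_fuzzy_index_of are hypothetical and are not kodegen_utils' API; the ratio formula (1 minus distance divided by the longer length) is an assumption, and the windowed search is a deliberately naive stand-in, not the crate's recursive algorithm.

```rust
// Hypothetical sketch, not kodegen_utils' API.

/// Classic Levenshtein distance (insert, delete, substitute each cost 1),
/// computed over chars with two rolling rows instead of a full matrix.
pub fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] = distance between a[..i-1] and b[..j] from the previous row.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0; b.len() + 1];
    for i in 1..=a.len() {
        curr[0] = i;
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            curr[j] = (prev[j] + 1) // deletion
                .min(curr[j - 1] + 1) // insertion
                .min(prev[j - 1] + cost); // substitution or match
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

/// Assumed similarity formula: 1 - distance / length of the longer string.
pub fn similarity_ratio(a: &str, b: &str) -> f64 {
    let max_len = a.chars().count().max(b.chars().count());
    if max_len == 0 {
        return 1.0; // two empty strings are identical
    }
    1.0 - levenshtein(a, b) as f64 / max_len as f64
}

/// Naive fuzzy search: slide a window of the query's length over `text` and
/// return (char offset, distance) of the first closest window.
pub fn naive_fuzzy_index_of(text: &str, query: &str) -> Option<(usize, usize)> {
    let chars: Vec<char> = text.chars().collect();
    let n = query.chars().count();
    if n == 0 || chars.len() < n {
        return None;
    }
    (0..=chars.len() - n)
        .map(|start| {
            let window: String = chars[start..start + n].iter().collect();
            (start, levenshtein(&window, query))
        })
        .min_by_key(|&(_, dist)| dist) // first minimum wins on ties
}
```

The naive search costs one full distance computation per window, which is exactly the kind of cost a smarter search strategy would try to cut down.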
Character-Level Diff
Generate visual diffs to identify invisible character differences:
use CharDiff;
let expected = "function getUserData()";
let actual = "function  getUserData()"; // Extra space
let diff = new;
println!;
// Output: function {--}{+ +}getUserData()
if diff.is_whitespace_only
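One simple way to produce the prefix{-removed-}{+added+}suffix format is to trim the longest common prefix and suffix and show what remains in the middle. The sketch below assumes that approach; char_diff and is_whitespace_only_diff are hypothetical names, not CharDiff's actual implementation, and a single middle span cannot describe several separate edits.

```rust
// Hypothetical sketch, not CharDiff's implementation.

/// Render `expected` vs `actual` as prefix{-removed-}{+added+}suffix using the
/// longest common prefix and suffix (on chars, never overlapping).
pub fn char_diff(expected: &str, actual: &str) -> String {
    let e: Vec<char> = expected.chars().collect();
    let a: Vec<char> = actual.chars().collect();

    // Longest common prefix.
    let prefix_len = e.iter().zip(&a).take_while(|(x, y)| x == y).count();

    // Longest common suffix of what remains after the prefix.
    let suffix_len = e[prefix_len..]
        .iter()
        .rev()
        .zip(a[prefix_len..].iter().rev())
        .take_while(|(x, y)| x == y)
        .count();

    let s = |chars: &[char]| chars.iter().collect::<String>();
    format!(
        "{}{{-{}-}}{{+{}+}}{}",
        s(&e[..prefix_len]),
        s(&e[prefix_len..e.len() - suffix_len]),
        s(&a[prefix_len..a.len() - suffix_len]),
        s(&e[e.len() - suffix_len..]),
    )
}

/// True when the strings differ, but only in whitespace characters.
pub fn is_whitespace_only_diff(expected: &str, actual: &str) -> bool {
    let strip = |t: &str| t.chars().filter(|c| !c.is_whitespace()).collect::<String>();
    expected != actual && strip(expected) == strip(actual)
}
```

For "function getUserData()" versus the same text with a doubled space after function, char_diff returns function {--}{+ +}getUserData(). Computing the suffix only on the text after the prefix keeps the two from overlapping.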
Character Analysis
Deep analysis for diagnosing invisible character issues:
use ;
// Results are cached automatically in an LRU cache
let analysis: CharCodeData = analyze_string_diff;
println!;
println!;
// Check for specific issues
if analysis.has_zero_width
for issue in &analysis.whitespace_issues
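A rough sketch of the kind of checks described here: scanning for zero-width code points and for lines indented with tabs versus spaces. The Report struct, the analyze function, and the list of code points are illustrative assumptions and do not reflect the fields of CharCodeData; caching and line-ending checks are left out.

```rust
// Hypothetical sketch, not the char_analysis module's API.

/// Invisible code points that commonly sneak into copied or generated text.
const INVISIBLE: &[(char, &str)] = &[
    ('\u{200B}', "ZERO WIDTH SPACE"),
    ('\u{200C}', "ZERO WIDTH NON-JOINER"),
    ('\u{200D}', "ZERO WIDTH JOINER"),
    ('\u{2060}', "WORD JOINER"),
    ('\u{FEFF}', "ZERO WIDTH NO-BREAK SPACE (BOM)"),
];

#[derive(Debug, Default, PartialEq)]
pub struct Report {
    /// (char offset, code point name) for each invisible character found.
    pub invisible: Vec<(usize, &'static str)>,
    /// 1-based line numbers whose first character is a tab.
    pub tab_indented: Vec<usize>,
    /// 1-based line numbers whose first character is a space.
    pub space_indented: Vec<usize>,
}

impl Report {
    pub fn has_zero_width(&self) -> bool {
        !self.invisible.is_empty()
    }
    pub fn mixed_indentation(&self) -> bool {
        !self.tab_indented.is_empty() && !self.space_indented.is_empty()
    }
}

pub fn analyze(text: &str) -> Report {
    let mut report = Report::default();
    // Record every invisible code point with its char offset.
    for (offset, c) in text.chars().enumerate() {
        if let Some(&(_, name)) = INVISIBLE.iter().find(|(ch, _)| *ch == c) {
            report.invisible.push((offset, name));
        }
    }
    // Classify each line by the character it is indented with.
    for (i, line) in text.lines().enumerate() {
        match line.chars().next() {
            Some('\t') => report.tab_indented.push(i + 1),
            Some(' ') => report.space_indented.push(i + 1),
            _ => {}
        }
    }
    report
}
```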
Async Edit Logging
Non-blocking telemetry for edit operations:
use ;
use Utc;
let logger = get_edit_logger;
let entry = EditBlockLogEntry ;
// Fire-and-forget logging (never blocks)
logger.log;
println!;
User-Facing Suggestions
Generate actionable error messages:
use ;
let context = SuggestionContext ;
let reason = FuzzyMatchBelowThreshold ;
let suggestion = for_failure;
println!;
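Suggestion generation boils down to mapping a failure reason plus its context to an actionable message. The sketch below assumes a hypothetical EditFailure enum and Context struct; their fields and the message wording are invented for illustration and are not the crate's SuggestionContext or failure types.

```rust
// Hypothetical sketch: these types are NOT kodegen_utils' SuggestionContext
// or failure reasons; fields and wording are invented for illustration.

pub enum EditFailure {
    FuzzyMatchBelowThreshold { similarity: f64, threshold: f64 },
    NoMatch,
}

pub struct Context<'a> {
    pub file_path: &'a str,
    pub search: &'a str,
}

/// Turn a failure reason into a message that tells the caller what to do next.
pub fn suggestion_for(reason: &EditFailure, ctx: &Context) -> String {
    match reason {
        EditFailure::FuzzyMatchBelowThreshold { similarity, threshold } => format!(
            "Closest match in {} is {:.0}% similar (need {:.0}%). \
             Re-read the file and copy the exact text, including whitespace.",
            ctx.file_path,
            similarity * 100.0,
            threshold * 100.0
        ),
        EditFailure::NoMatch => format!(
            "No text resembling {:?} was found in {}. Check the file path and search string.",
            ctx.search, ctx.file_path
        ),
    }
}
```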
Architecture
The library is organized into focused modules:
- fuzzy_search: Levenshtein distance and recursive fuzzy matching
- char_diff: Character-level diff generation
- char_analysis: Deep character diagnostics with LRU caching
- edit_log: Async telemetry for edit operations
- fuzzy_logger: Async fuzzy search logging
- usage_tracker: MCP tool usage statistics
- suggestions: User-facing error messages
- line_endings: Cross-platform line ending handling
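Cross-platform line ending handling usually comes down to two operations: classifying which endings a text uses and normalizing them. A minimal sketch, assuming hypothetical LineEnding, detect_line_ending, and normalize_to_lf names (not the line_endings module's API):

```rust
// Hypothetical sketch, not the line_endings module's API.

#[derive(Debug, PartialEq)]
pub enum LineEnding {
    Lf,
    CrLf,
    Cr,
    Mixed,
    NoLineBreaks,
}

/// Classify the line endings used in `text`.
pub fn detect_line_ending(text: &str) -> LineEnding {
    let (mut lf, mut crlf, mut cr) = (0usize, 0usize, 0usize);
    let bytes = text.as_bytes();
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            // "\r\n" counts once; skip the '\n' that follows.
            b'\r' if bytes.get(i + 1) == Some(&b'\n') => {
                crlf += 1;
                i += 1;
            }
            b'\r' => cr += 1,
            b'\n' => lf += 1,
            _ => {}
        }
        i += 1;
    }
    match (lf > 0, crlf > 0, cr > 0) {
        (false, false, false) => LineEnding::NoLineBreaks,
        (true, false, false) => LineEnding::Lf,
        (false, true, false) => LineEnding::CrLf,
        (false, false, true) => LineEnding::Cr,
        _ => LineEnding::Mixed,
    }
}

/// Normalize every line ending ("\r\n" or lone "\r") to "\n".
pub fn normalize_to_lf(text: &str) -> String {
    text.replace("\r\n", "\n").replace('\r', "\n")
}
```

Scanning bytes is safe here because \r and \n are ASCII and never occur inside a multi-byte UTF-8 sequence.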
Performance Design
All logging operations use fire-and-forget async patterns:
- Unbounded channels, so logging calls never block the caller
- Background tasks batch disk writes
- Periodic flushes (5-second intervals)
- Graceful shutdown handling
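The fire-and-forget pattern described in this list can be sketched with the standard library alone. This version swaps the crate's async tasks for a plain OS thread and uses std::sync::mpsc::channel, which is unbounded; BatchLogger and its flush callback are hypothetical names, and the flush interval is a parameter rather than a fixed 5 seconds.

```rust
// Hypothetical sketch: an OS thread stands in for the crate's async tasks.
use std::sync::mpsc::{self, RecvTimeoutError, Sender};
use std::thread::{self, JoinHandle};
use std::time::{Duration, Instant};

/// Callers push onto an unbounded channel; a background thread batches
/// entries and hands each batch to `flush` at most once per interval.
pub struct BatchLogger {
    tx: Option<Sender<String>>,
    worker: Option<JoinHandle<()>>,
}

impl BatchLogger {
    /// `flush` receives each batch; a real logger would append to a file.
    pub fn new<F>(flush_every: Duration, mut flush: F) -> Self
    where
        F: FnMut(Vec<String>) + Send + 'static,
    {
        let (tx, rx) = mpsc::channel::<String>(); // unbounded: send never blocks
        let worker = thread::spawn(move || {
            let mut batch = Vec::new();
            let mut last_flush = Instant::now();
            loop {
                let timeout = flush_every.saturating_sub(last_flush.elapsed());
                match rx.recv_timeout(timeout) {
                    Ok(entry) => batch.push(entry),
                    Err(RecvTimeoutError::Timeout) => {}
                    // All senders dropped and the buffer is drained.
                    Err(RecvTimeoutError::Disconnected) => break,
                }
                if last_flush.elapsed() >= flush_every {
                    if !batch.is_empty() {
                        flush(std::mem::take(&mut batch));
                    }
                    last_flush = Instant::now();
                }
            }
            if !batch.is_empty() {
                flush(batch); // final flush on shutdown
            }
        });
        Self { tx: Some(tx), worker: Some(worker) }
    }

    /// Never blocks; the entry is silently dropped if the worker has exited.
    pub fn log(&self, entry: impl Into<String>) {
        if let Some(tx) = &self.tx {
            let _ = tx.send(entry.into());
        }
    }
}

impl Drop for BatchLogger {
    /// Graceful shutdown: close the channel, then wait for the final flush.
    fn drop(&mut self) {
        drop(self.tx.take());
        if let Some(worker) = self.worker.take() {
            let _ = worker.join();
        }
    }
}
```

Dropping the logger closes the channel; the worker still receives any buffered entries, performs a final flush, and drop waits for it via join.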
Development
Building
# Build library
cargo build
# Build with optimizations
cargo build --release
Testing
# Run all tests
cargo test
# Run specific test
cargo test <test_name>
# Show test output
cargo test -- --nocapture
Linting and Formatting
# Format code
cargo fmt
# Run clippy
cargo clippy
# Check without building
cargo check
Requirements
- Rust: Nightly toolchain (2024 edition)
- Targets: x86_64-apple-darwin, wasm32-unknown-unknown
- Components: rustfmt, clippy
See rust-toolchain.toml for exact configuration.
License
Licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Contributing
Contributions are welcome! This library is part of the KODEGEN.ᴀɪ project.
See the repository for contribution guidelines.
Links
- Homepage: https://kodegen.ai
- Repository: https://github.com/cyrup-ai/kodegen
- Documentation: docs.rs
Built with ❤️ by the KODEGEN.ᴀɪ team