# Code Digest Documentation
Welcome to the code-digest documentation! This high-performance CLI tool converts codebases to Markdown format optimized for Large Language Model (LLM) consumption.
## Quick Links
- [Installation Guide](installation.md)
- [Usage Guide](usage.md)
- [Configuration Reference](configuration.md)
- [API Reference](api.md)
- [Examples](examples.md)
- [Troubleshooting](troubleshooting.md)
- [Contributing](../CONTRIBUTING.md)
## What is Code Digest?
Code Digest is a Rust-based CLI tool that:
- **Converts** entire codebases to structured Markdown
- **Prioritizes** files based on importance and token limits
- **Optimizes** output for LLM context windows
- **Supports** 20+ programming languages
- **Integrates** with LLM CLI tools (gemini, codex)
- **Processes** projects in parallel for maximum performance
## Key Features
### **High Performance**
- Parallel file processing with Rayon
- Intelligent token counting with tiktoken-rs
- Memory-efficient streaming for large projects
- Benchmark: 2.4K files/sec end-to-end processing
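The parallel processing idea above can be illustrated with a minimal sketch. It uses `std::thread::scope` from the standard library as a dependency-free stand-in for Rayon, and a whitespace word count as a stand-in for tiktoken-rs; `approx_tokens` and `count_tokens_parallel` are hypothetical names, not part of code-digest.

```rust
use std::thread;

/// Hypothetical stand-in for a real tokenizer: counts whitespace-separated words.
fn approx_tokens(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Splits `files` into at most `workers` chunks and counts tokens on scoped threads.
/// Results are returned in the same order as the input.
fn count_tokens_parallel(files: &[&str], workers: usize) -> Vec<usize> {
    if files.is_empty() {
        return Vec::new();
    }
    let workers = workers.max(1);
    // Ceiling division so every file lands in some chunk.
    let chunk_size = (files.len() + workers - 1) / workers;
    thread::scope(|scope| {
        // Spawn one thread per chunk; collecting forces all spawns before any join.
        let handles: Vec<_> = files
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || chunk.iter().map(|f| approx_tokens(f)).collect::<Vec<_>>())
            })
            .collect();
        // Join in spawn order so the output order matches the input order.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Calling `count_tokens_parallel(&["a b c", "x y"], 2)` counts each input on its own thread. A work-stealing pool such as Rayon balances uneven file sizes better than the fixed chunks used here.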
### **Smart Prioritization**
- File importance scoring based on type and location
- Token limit enforcement with optimal file selection
- Configurable priority weights and patterns
- Automatic structure overhead calculation
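To make token limit enforcement concrete, here is a minimal sketch of one possible strategy: a greedy pass that takes files in descending priority order and skips any file that would exceed the budget. This is a simple heuristic, not necessarily the selection algorithm code-digest uses. The `Candidate` type, `select_within_budget`, and the flat `overhead_per_file` cost are all assumptions made for illustration.

```rust
/// A candidate file with an importance score and a token cost (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    path: String,
    priority: f64,
    tokens: usize,
}

/// Greedily keeps files in descending priority order while they fit in `max_tokens`.
/// `overhead_per_file` is an assumed flat cost for the heading and code fence
/// that wrap each file in the generated Markdown.
fn select_within_budget(
    mut files: Vec<Candidate>,
    max_tokens: usize,
    overhead_per_file: usize,
) -> Vec<Candidate> {
    // Highest priority first; ties are broken by path so the result is deterministic.
    files.sort_by(|a, b| {
        b.priority
            .total_cmp(&a.priority)
            .then_with(|| a.path.cmp(&b.path))
    });
    let mut used = 0usize;
    let mut selected = Vec::new();
    for file in files {
        let cost = file.tokens + overhead_per_file;
        // Skip files that do not fit; smaller, lower-priority files may still fit later.
        if used + cost <= max_tokens {
            used += cost;
            selected.push(file);
        }
    }
    selected
}
```

A greedy pass is fast and predictable, but it is not guaranteed to maximize total priority within the budget; that would require a knapsack-style search.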
### **Flexible Configuration**
- TOML configuration files with inheritance
- CLI argument overrides
- .digestignore support (like .gitignore)
- Environment variable integration
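Layered configuration of this kind can be sketched as a fold over optional values, where each later layer overrides only the fields it sets. The `Layer` struct, its field names, and the precedence order (inherited file, then project file, then environment, then CLI arguments) are assumptions for illustration; see the Configuration Reference for the actual schema and precedence.

```rust
/// Hypothetical settings layer; the field names are illustrative, not the tool's schema.
#[derive(Debug, Clone, Default, PartialEq)]
struct Layer {
    max_tokens: Option<usize>,
    output: Option<String>,
}

impl Layer {
    /// Returns a layer in which values set in `higher` win over values in `self`.
    fn overridden_by(self, higher: Layer) -> Layer {
        Layer {
            max_tokens: higher.max_tokens.or(self.max_tokens),
            output: higher.output.or(self.output),
        }
    }
}

/// Folds layers ordered from lowest to highest precedence into one resolved layer.
fn resolve(layers: Vec<Layer>) -> Layer {
    layers
        .into_iter()
        .fold(Layer::default(), Layer::overridden_by)
}
```

With this shape, a CLI layer that sets only `output` leaves a `max_tokens` value from a configuration file untouched.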
### **LLM Integration**
- Direct integration with gemini and codex
- Optimized token usage for context windows
- Structured output with table of contents
- File tree visualization
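As an illustration of file tree visualization, the sketch below builds a tree from relative paths and renders it as a nested Markdown list. The `Node` type and both function names are hypothetical, and the output format is an assumption rather than the exact layout code-digest emits.

```rust
use std::collections::BTreeMap;

/// A tree node; a node with no children is rendered as a file.
#[derive(Default)]
struct Node {
    children: BTreeMap<String, Node>,
}

/// Builds a tree from slash-separated relative paths.
fn build_tree(paths: &[&str]) -> Node {
    let mut root = Node::default();
    for path in paths {
        let mut node = &mut root;
        // Walk (and create) one level per path component.
        for part in path.split('/').filter(|p| !p.is_empty()) {
            node = node.children.entry(part.to_string()).or_default();
        }
    }
    root
}

/// Appends the tree to `out` as an indented bullet list, two spaces per level.
/// Entries are sorted by name because `BTreeMap` iterates in key order.
fn render_tree(node: &Node, depth: usize, out: &mut String) {
    for (name, child) in &node.children {
        let suffix = if child.children.is_empty() { "" } else { "/" };
        out.push_str(&format!("{}- {}{}\n", "  ".repeat(depth), name, suffix));
        render_tree(child, depth + 1, out);
    }
}
```

Because the tree is built from file paths only, an empty directory would not appear in the output.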
### **Production Ready**
- Comprehensive test suite (77 tests)
- CI/CD with GitHub Actions
- Release automation
- Performance benchmarks
## Quick Start
```bash
# Install
cargo install code-digest
# Basic usage
code-digest -d /path/to/project -o project.md
# With token limits
code-digest -d /path/to/project --max-tokens 50000 -o project.md
# Direct LLM integration
code-digest -d /path/to/project "Explain the architecture of this codebase"
# With configuration
code-digest -d /path/to/project -c config.toml -o project.md
```
## Use Cases
### **Code Review & Analysis**
- Generate comprehensive project overviews
- Create documentation for legacy codebases
- Prepare code for AI-assisted reviews
- Export codebases for external analysis
### **LLM Context Preparation**
- Convert projects for ChatGPT/GPT-4 analysis
- Prepare context for code generation tasks
- Create training data for custom models
- Generate structured prompts for AI tools
### **Documentation & Knowledge Transfer**
- Create onboarding materials for new developers
- Generate technical documentation automatically
- Export codebases for architecture discussions
- Prepare materials for technical interviews
### **Project Understanding**
- Quickly understand unfamiliar codebases
- Generate project summaries and insights
- Identify key components and dependencies
- Analyze code patterns and structures
## Architecture Overview
```
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   CLI Parser    │───▶│  Configuration   │───▶│ Directory Walker│
│     (clap)      │    │  (TOML + Args)   │    │    (walkdir)    │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                                        │
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│ LLM Integration │◀───│   Markdown Gen   │◀───│ File Prioritizer│
│ (gemini/codex)  │    │   (templates)    │    │  (tiktoken-rs)  │
└─────────────────┘    └──────────────────┘    └─────────────────┘
```
## Performance Characteristics
| Operation | Performance |
|-----------|-------------|
| Directory Walking | 160-276K files/sec |
| Token Counting | 680MB/s - 1.6GB/s |
| File Prioritization | 10K files/sec |
| Markdown Generation | 80K files/sec |
| End-to-End | 2.4K files/sec |
| Parallel Speedup | ~40% improvement |
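For rough planning, the end-to-end figure translates into a run-time estimate by simple division. The sketch below assumes the quoted 2.4K files/sec holds for your project, which depends on hardware and file sizes; the function and constant names are hypothetical.

```rust
/// End-to-end throughput quoted in the table above, in files per second.
const END_TO_END_FILES_PER_SEC: f64 = 2_400.0;

/// Estimates wall-clock seconds needed to process `file_count` files at `files_per_sec`.
fn estimated_seconds(file_count: u64, files_per_sec: f64) -> f64 {
    file_count as f64 / files_per_sec
}
```

For example, a project with 12,000 files would take about 12,000 / 2,400 = 5 seconds at that rate.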
## Supported Languages
| Language | Extensions | Priority | Notes |
|----------|------------|----------|-------|
| Rust | `.rs` | High | Native optimization |
| Python | `.py` | High | Complete support |
| JavaScript | `.js` | High | ES6+ features |
| TypeScript | `.ts`, `.tsx` | High | Full type support |
| Go | `.go` | Medium | Standard library aware |
| Java | `.java` | Medium | Package structure |
| C++ | `.cpp`, `.hpp` | Medium | Header handling |
| C | `.c`, `.h` | Medium | Include processing |
| C# | `.cs` | Medium | Namespace support |
| Ruby | `.rb` | Medium | Gem structure |
| PHP | `.php` | Medium | Framework aware |
| Swift | `.swift` | Medium | iOS/macOS focus |
| Kotlin | `.kt` | Medium | Android support |
| Scala | `.scala` | Medium | JVM integration |
| Haskell | `.hs` | Medium | Functional focus |
| Markdown | `.md` | Low | Documentation |
| JSON | `.json` | Low | Configuration |
| YAML | `.yml`, `.yaml` | Low | Configuration |
| TOML | `.toml` | Low | Configuration |
| XML | `.xml` | Low | Data format |
| HTML | `.html` | Low | Web content |
| CSS | `.css` | Low | Styling |
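The extension-to-priority mapping in the table above can be expressed as a small lookup. This is a sketch only: the `Priority` enum and `priority_for` function are hypothetical, and the case-insensitive match and the `None` result for unlisted extensions are assumptions rather than documented behavior.

```rust
use std::path::Path;

/// Priority tiers from the table; the derived ordering makes `High` compare greatest.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Priority {
    Low,
    Medium,
    High,
}

/// Maps a file name to the tier listed in the table.
/// Returns `None` for files with no extension or an unlisted one.
fn priority_for(file_name: &str) -> Option<Priority> {
    let ext = Path::new(file_name).extension()?.to_str()?;
    // Assumption: extensions are matched case-insensitively.
    match ext.to_ascii_lowercase().as_str() {
        "rs" | "py" | "js" | "ts" | "tsx" => Some(Priority::High),
        "go" | "java" | "cpp" | "hpp" | "c" | "h" | "cs" | "rb" | "php" | "swift" | "kt"
        | "scala" | "hs" => Some(Priority::Medium),
        "md" | "json" | "yml" | "yaml" | "toml" | "xml" | "html" | "css" => Some(Priority::Low),
        _ => None,
    }
}
```

A real scorer would likely combine this tier with file location and configured weights, as described under Smart Prioritization.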
## Project Status
- ✅ **Core Features**: Complete
- ✅ **Testing**: 77 tests, 100% critical path coverage
- ✅ **Performance**: Optimized and benchmarked
- ✅ **Documentation**: Comprehensive guides
- ✅ **CI/CD**: GitHub Actions pipeline
- 🚧 **Examples**: In progress
- 🚧 **Release**: Preparing v1.0.0
## Community & Support
- **Issues**: [GitHub Issues](https://github.com/matiasvillaverde/code-digest/issues)
- **Discussions**: [GitHub Discussions](https://github.com/matiasvillaverde/code-digest/discussions)
- **Contributing**: See [CONTRIBUTING.md](../CONTRIBUTING.md)
- **License**: MIT License
## What's Next?
- Package distribution (Homebrew, apt, etc.)
- Plugin system for custom processors
- Template system for custom output formats
- Watch mode for continuous processing
- Web interface for team collaboration
- Analytics and usage insights