# cadi
CADI CLI - Content-Addressed Development Interface
A universal build and distribution system treating all artifacts as content-addressed chunks with support for multiple representations (source → WASM IR → binaries → containers), registry federation, provenance tracking, and LLM optimization.
## Installation
```bash
cargo install cadi
```
## Quick Start
### Scrape a Project
```bash
cadi scrape ./my-project --strategy semantic --output ./chunks
```
### Publish to Registry
```bash
cadi publish \
--registry https://registry.example.com \
--auth-token YOUR_TOKEN \
--namespace myorg/project
```
### Build Artifacts
```bash
cadi build ./cadi.yaml
```
### Query Registry
```bash
cadi query --registry URL --namespace myorg
```
## Commands
### scrape
Convert source code and files into CADI chunks.
```bash
cadi scrape <INPUT> [OPTIONS]
Options:
--output DIR Output directory (default: ./cadi-chunks)
--include-overlap Include context between chunks
--hierarchy Create hierarchical relationships
--extract-api Extract API surfaces
--detect-licenses Detect licenses
--publish Publish after scraping
--namespace NAMESPACE Registry namespace
--format FORMAT table|json
```
### publish
Publish chunks to a registry.
```bash
cadi publish [OPTIONS]
Options:
--registry URL Registry URL
--auth-token TOKEN Authentication token
--namespace NAMESPACE Namespace/project
--batch-size N Chunks per request
--dry-run Show what would be published
```
### build
Execute build recipes.
```bash
cadi build <RECIPE> [OPTIONS]
Options:
--output DIR Output directory
--config FILE Configuration file
```
### query
Query registry for chunks.
```bash
cadi query [OPTIONS]
Options:
--registry URL Registry URL
--namespace NAMESPACE Filter by namespace
--search QUERY Search terms
--limit N Result limit
```
## Configuration
### Global Config
Create `~/.cadi/config.yaml`:
```yaml
default_registry: https://registry.example.com
default_namespace: myorg
auth:
token: your-token-here
scraper:
strategy: semantic
max_chunk_size: 52428800
extract_api_surface: true
```
### Project Config
Create `./cadi.yaml` in your project:
```yaml
name: my-project
version: 1.0.0
namespace: myorg/my-project
scraping:
strategy: semantic
include_overlap: true
languages:
- rust
- python
publishing:
registry: https://registry.example.com
namespace: myorg/my-project
```
## Examples
### Scrape and Publish
```bash
# Scrape with semantic chunking
cadi scrape ./src --strategy semantic --output ./chunks
# Publish to registry
cadi publish --registry https://registry.example.com \
--auth-token $CADI_TOKEN \
--namespace myorg/project
```
### Batch Processing
```bash
for dir in ./projects/*; do
echo "Processing $(basename $dir)..."
cadi scrape "$dir" --strategy semantic
done
```
### Integration with Pipelines
```bash
# GitHub Actions
cadi scrape . --strategy semantic --publish \
--registry $REGISTRY_URL \
--auth-token $REGISTRY_TOKEN \
--namespace $GITHUB_REPOSITORY
```
## Environment Variables
```bash
CADI_REGISTRY_URL # Default registry
CADI_AUTH_TOKEN # Authentication token
CADI_NAMESPACE # Default namespace
CADI_CHUNKING_STRATEGY # Default strategy
CADI_MAX_CHUNK_SIZE # Default chunk size
CADI_EXTRACT_API # Extract API surfaces
CADI_DETECT_LICENSES # Detect licenses
```
## Output Formats
### Table Format (default)
```
ID Content Type Size Chunks
abc123def456 application/rs 2,048 3
xyz789 text/markdown 512 1
```
### JSON Format
```bash
cadi scrape ./project --format json
```
Outputs structured JSON with full metadata:
```json
{
"chunks": [
{
"id": "abc123def456",
"content_type": "application/rs",
"size": 2048,
"metadata": {
"title": "hello.rs",
"description": "Main binary",
"keywords": ["rust", "cli"]
}
}
],
"statistics": {
"total_chunks": 10,
"total_bytes": 102400,
"duration_ms": 1200
}
}
```
## Troubleshooting
### Authentication Failed
```bash
export CADI_AUTH_TOKEN="your-token"
cadi query --registry https://registry.example.com
```
### Rate Limiting
Configure in `~/.cadi/config.yaml`:
```yaml
scraper:
rate_limit_per_sec: 20
request_timeout_secs: 30
```
### Large Projects
For large projects, use hierarchical chunking:
```bash
cadi scrape ./large-project --strategy hierarchical
```
## Integration with Other Tools
### Docker
```dockerfile
FROM rust:latest
RUN cargo install cadi
WORKDIR /project
COPY . .
RUN cadi scrape . --strategy semantic
```
### CI/CD
```yaml
# GitLab CI
build_chunks:
image: rust:latest
script:
- cargo install cadi
- cadi scrape . --strategy semantic --publish
```
## System Requirements
- Rust 1.70+
- 4GB RAM (2GB minimum)
- 100MB disk space
- Network access for registry operations
## Performance
Typical performance on modern hardware:
- Scraping: 50-100 MB/sec
- Publishing: Limited by network bandwidth
- Querying: < 100ms per chunk
## Documentation
- **User Guide**: See SCRAPER-GUIDE.md for detailed documentation
- **API Docs**: `cargo doc --open` (library crates)
- **Examples**: See examples/ directory
- **Quick Start**: SCRAPER-QUICKSTART.md
## License
MIT License
## Contributing
Contributions welcome! Please check repository for guidelines.
## Support
- GitHub Issues: https://github.com/ConflictingTheories/cadi/issues
- Documentation: https://github.com/ConflictingTheories/cadi