AgenticVision
Core vision library for AI agents — image capture, CLIP embedding, similarity search, and persistent visual memory.
What it does
AgenticVision gives AI agents persistent visual memory. Capture images, embed them with CLIP ViT-B/32 into 512-dimensional vectors, store them in a compact .avis binary format, and query them by similarity, time, or description.
Install
Or add to your Cargo.toml:
[]
= "0.1"
Usage
use ;
let mut store = open?;
// Capture from file
let id = store.capture?;
// Find similar
let matches = store.similar?;
for m in matches
Key features
- CLIP ViT-B/32 embeddings: 512-dimensional vectors via ONNX Runtime, with fallback mode when model is not present
- Binary .avis format: 64-byte header, JSON payload, JPEG thumbnails. Single file, portable, no database
- Similarity search: Brute-force cosine in 1-2 ms (top-5)
- Visual diff: Pixel-level differencing with 8×8 grid region detection in <1 ms
- Image capture: From files, base64, screenshots, or clipboard. Auto-resize and JPEG compression. Native screenshot support on macOS (screencapture) and Linux (gnome-screenshot/scrot/maim); clipboard capture via osascript (macOS) or xclip/wl-paste (Linux)
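To illustrate what brute-force cosine search over stored embeddings involves, here is a minimal standalone sketch. It is not the crate's implementation or API: the function names `cosine_similarity` and `top_k` are hypothetical, and embeddings are assumed to be plain `f32` slices of equal length (512 for CLIP ViT-B/32).

```rust
/// Cosine similarity between two equal-length vectors.
/// Returns 0.0 if either vector has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embeddings must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Brute-force top-k (hypothetical helper): score every stored embedding
/// against the query and return (index, score) pairs, best match first.
fn top_k(query: &[f32], stored: &[Vec<f32>], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = stored
        .iter()
        .enumerate()
        .map(|(i, e)| (i, cosine_similarity(query, e)))
        .collect();
    // Sort by score, descending; total_cmp gives a total order over f32.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

Calling `top_k(&query, &stored, 5)` mirrors the top-5 case above. A linear scan like this needs no index structure, which fits a single-file store with no database.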
Performance
| Operation | Time |
|---|---|
| Image capture (file → embed → store) | 47 ms |
| Similarity search (top-5) | 1-2 ms |
| Visual diff (pixel-level) | <1 ms |
| Storage per capture | ~4.26 KB |
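The pixel-level diff with 8×8 grid region detection can be pictured with the sketch below. It is an assumption-laden illustration, not the crate's code: the function `changed_cells` is hypothetical, images are assumed to be same-sized 8-bit grayscale buffers in row-major order, and a cell counts as changed when its mean absolute pixel difference exceeds a caller-chosen `threshold` (on the 0-255 scale).

```rust
/// Compare two same-sized grayscale images (row-major, one byte per pixel)
/// and return an 8x8 grid of flags marking which cells changed.
/// Hypothetical helper for illustration only.
fn changed_cells(
    a: &[u8],
    b: &[u8],
    width: usize,
    height: usize,
    threshold: f32,
) -> [[bool; 8]; 8] {
    const GRID: usize = 8;
    assert_eq!(a.len(), width * height, "buffer a has the wrong size");
    assert_eq!(b.len(), width * height, "buffer b has the wrong size");

    // Accumulate absolute differences and pixel counts per grid cell.
    let mut sums = [[0u64; GRID]; GRID];
    let mut counts = [[0u64; GRID]; GRID];
    for y in 0..height {
        let gy = y * GRID / height; // grid row for this pixel row
        for x in 0..width {
            let gx = x * GRID / width; // grid column for this pixel
            let i = y * width + x;
            sums[gy][gx] += a[i].abs_diff(b[i]) as u64;
            counts[gy][gx] += 1;
        }
    }

    // A cell is flagged when its mean absolute difference exceeds the threshold.
    let mut changed = [[false; GRID]; GRID];
    for gy in 0..GRID {
        for gx in 0..GRID {
            if counts[gy][gx] > 0 {
                let mean = sums[gy][gx] as f32 / counts[gy][gx] as f32;
                changed[gy][gx] = mean > threshold;
            }
        }
    }
    changed
}
```

The work is a single pass over the pixels plus 64 cell checks, so the cost grows linearly with image size.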
MCP Server
For LLM integration via the Model Context Protocol, see agentic-vision-mcp.
Links
- GitHub
- MCP Server
- AgenticMemory — Persistent cognitive memory for AI agents
License
MIT