embeddenator-interop
Interoperability layer for Embeddenator: format conversions, FFI bindings, and language integrations.
Independent component extracted from the Embeddenator monolithic repository. Part of the Embeddenator workspace.
Repository: https://github.com/tzervas/embeddenator-interop
Status
Phase 2B Component Migration - Core functionality complete.
Implementation includes:
- ✅ Format conversion (JSON, bincode, text)
- ✅ C/C++ FFI bindings with automated header generation
- ✅ Python bindings (PyO3), requires the `python` feature
- ✅ Envelope compression support (Zstd, LZ4)
- ✅ High-level adapter layers
Features
Format Conversions
- JSON: Human-readable, cross-language compatible
- Bincode: Efficient binary serialization
- Text: Debug-friendly output format
- Round-trip guarantees for JSON and bincode formats
- Support for all core Embeddenator types (SparseVec, Engram, Manifest, VSAConfig)
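A round-trip guarantee means that for any value `x`, decoding the encoded bytes returns `x` exactly. The std-only sketch below checks that property for a hypothetical `TernaryVec` (sorted `pos`/`neg` index lists). It uses a made-up length-prefixed little-endian layout. That layout is not bincode's wire format, and `encode`/`decode` are not this crate's API.

```rust
/// Hypothetical sparse ternary vector: indices holding +1 (`pos`) and -1 (`neg`).
#[derive(Debug, Clone, PartialEq)]
pub struct TernaryVec {
    pub pos: Vec<u64>,
    pub neg: Vec<u64>,
}

/// Write a u64 count followed by that many u64 indices, all little-endian.
fn write_list(out: &mut Vec<u8>, items: &[u64]) {
    out.extend_from_slice(&(items.len() as u64).to_le_bytes());
    for i in items {
        out.extend_from_slice(&i.to_le_bytes());
    }
}

/// Read one u64 at `*at` and advance the cursor; None if the input is truncated.
fn read_u64(bytes: &[u8], at: &mut usize) -> Option<u64> {
    let end = (*at).checked_add(8)?;
    let chunk = bytes.get(*at..end)?;
    let mut buf = [0u8; 8];
    buf.copy_from_slice(chunk);
    *at = end;
    Some(u64::from_le_bytes(buf))
}

/// Read a count-prefixed list; stops at the first truncated element.
fn read_list(bytes: &[u8], at: &mut usize) -> Option<Vec<u64>> {
    let n = read_u64(bytes, at)?;
    (0..n).map(|_| read_u64(bytes, at)).collect()
}

pub fn encode(v: &TernaryVec) -> Vec<u8> {
    let mut out = Vec::new();
    write_list(&mut out, &v.pos);
    write_list(&mut out, &v.neg);
    out
}

/// Returns None on truncated input or trailing bytes, so a successful decode
/// consumed exactly what `encode` produced.
pub fn decode(bytes: &[u8]) -> Option<TernaryVec> {
    let mut at = 0;
    let pos = read_list(bytes, &mut at)?;
    let neg = read_list(bytes, &mut at)?;
    if at == bytes.len() {
        Some(TernaryVec { pos, neg })
    } else {
        None
    }
}
```

Rejecting trailing bytes keeps the check strict. Otherwise two different byte strings could decode to the same value.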
FFI Bindings (C/C++)
- C-compatible interface for cross-language integration
- Opaque handle-based API for memory safety
- Core operations: encode, decode, bundle, bind, cosine similarity
- Serialization to/from JSON for data interchange
- Well-documented safety requirements
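The opaque handle pattern can be sketched as follows. The Rust object lives on the Rust heap, and C only ever holds a pointer it passes back. `DemoVec`, `demo_vec_new`, `demo_vec_len` and `demo_vec_free` are hypothetical names, not the crate's exported symbols. An exported build would also mark each function `no_mangle` so that C sees the unmangled name.

```rust
/// Hypothetical Rust-side object hidden behind an opaque handle.
pub struct DemoVec {
    indices: Vec<usize>,
}

/// Allocate on the Rust heap and transfer ownership to the caller as a raw pointer.
pub extern "C" fn demo_vec_new(len: usize) -> *mut DemoVec {
    let v = DemoVec { indices: (0..len).collect() };
    Box::into_raw(Box::new(v))
}

/// Borrow through the handle; a null handle yields 0 instead of undefined behavior.
///
/// # Safety
/// `handle` must be null or a live pointer returned by `demo_vec_new`.
pub unsafe extern "C" fn demo_vec_len(handle: *const DemoVec) -> usize {
    match unsafe { handle.as_ref() } {
        Some(v) => v.indices.len(),
        None => 0,
    }
}

/// Reclaim ownership and drop the object; null is a no-op.
///
/// # Safety
/// `handle` must be null or a pointer from `demo_vec_new` that was not freed yet.
pub unsafe extern "C" fn demo_vec_free(handle: *mut DemoVec) {
    if !handle.is_null() {
        drop(unsafe { Box::from_raw(handle) });
    }
}
```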
Python Bindings (Optional)
- PyO3-based Python API (enable the `python` feature)
- Pythonic interface with property accessors
- Native integration with Python bytes and strings
- JSON and bincode serialization support
Adapter Layers
- EnvelopeAdapter: Full compression support with Zstd and LZ4 codecs
- FileAdapter: High-level file I/O operations
- StreamAdapter: Streaming encode/decode for large data
- BatchAdapter: Batch operations for efficiency
- AutoFormatAdapter: Automatic format detection
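Automatic format detection can be approximated by sniffing the leading bytes. The minimal sketch below assumes only two candidates: JSON and a binary encoding such as bincode. `detect_format` and `Detected` are hypothetical. The heuristic can misfire on binary data that happens to start with `{` or `[` and is also valid UTF-8.

```rust
#[derive(Debug, PartialEq)]
pub enum Detected {
    Json,
    Binary,
}

/// Guess the payload format: a JSON document begins with `{` or `[` after optional
/// ASCII whitespace and must be valid UTF-8; anything else is treated as binary.
pub fn detect_format(bytes: &[u8]) -> Detected {
    let first = bytes.iter().copied().find(|b| !b.is_ascii_whitespace());
    match first {
        Some(b'{') | Some(b'[') if std::str::from_utf8(bytes).is_ok() => Detected::Json,
        _ => Detected::Binary,
    }
}
```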
Compression Support
- Zstd: High compression ratio (enable the `compression-zstd` feature)
- LZ4: Fast compression (enable the `compression-lz4` feature)
- None: No compression for maximum speed
- Full round-trip guarantees for all compression codecs
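Sometimes a compressed payload must be decoded without out-of-band metadata. In that case the codec can be recognized from the standard frame magic numbers. Zstd frames start with 0xFD2FB528 and LZ4 frames with 0x184D2204, both stored little-endian. This sketch assumes the envelope payload uses those standard frame formats. If the envelope records the codec in its own header, that header is the authoritative source. `sniff_codec` and `Codec` are hypothetical.

```rust
#[derive(Debug, PartialEq)]
pub enum Codec {
    Zstd,
    Lz4,
    Uncompressed,
}

/// Zstd frame magic 0xFD2FB528, as it appears on the wire (little-endian).
const ZSTD_MAGIC: [u8; 4] = [0x28, 0xB5, 0x2F, 0xFD];
/// LZ4 frame magic 0x184D2204, as it appears on the wire (little-endian).
const LZ4_FRAME_MAGIC: [u8; 4] = [0x04, 0x22, 0x4D, 0x18];

/// Identify the codec from the first four bytes; anything else is treated as raw.
pub fn sniff_codec(payload: &[u8]) -> Codec {
    if payload.starts_with(&ZSTD_MAGIC) {
        Codec::Zstd
    } else if payload.starts_with(&LZ4_FRAME_MAGIC) {
        Codec::Lz4
    } else {
        Codec::Uncompressed
    }
}
```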
Automated C Header Generation
- Automatic C header generation via cbindgen (enable the `c-bindings` feature)
- Headers generated in include/embeddenator_interop.h
- C++ compatible with proper include guards
- Full documentation included in generated headers
Kernel Interop
- Backend-agnostic VSA operations
- Vector store abstraction
- Candidate generation and reranking
- Runtime integration support
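Candidate generation plus reranking is a two-stage search. A cheap score shortlists candidates, and an exact score then orders the shortlist. The sketch below stays backend-agnostic through a hypothetical `VectorStore` trait. The in-memory backend, its prefix dot-product approximation, and cosine reranking are illustrative assumptions, not the crate's kernel_interop API.

```rust
/// Hypothetical backend-agnostic store interface.
pub trait VectorStore {
    /// Number of stored vectors; ids are 0..len().
    fn len(&self) -> usize;
    /// Cheap, approximate score used to shortlist candidates.
    fn approx_score(&self, id: usize, query: &[f32]) -> f32;
    /// Exact (more expensive) score used to rerank the shortlist.
    fn exact_score(&self, id: usize, query: &[f32]) -> f32;
}

/// Stage 1 keeps the `k_candidates` best ids by approximate score.
/// Stage 2 reranks them by exact score and returns the best `k` (id, score) pairs.
pub fn search<S: VectorStore>(store: &S, query: &[f32], k_candidates: usize, k: usize) -> Vec<(usize, f32)> {
    let mut candidates: Vec<(usize, f32)> = (0..store.len())
        .map(|id| (id, store.approx_score(id, query)))
        .collect();
    candidates.sort_by(|a, b| b.1.total_cmp(&a.1)); // descending
    candidates.truncate(k_candidates);

    let mut reranked: Vec<(usize, f32)> = candidates
        .into_iter()
        .map(|(id, _)| (id, store.exact_score(id, query)))
        .collect();
    reranked.sort_by(|a, b| b.1.total_cmp(&a.1));
    reranked.truncate(k);
    reranked
}

/// Toy backend: dense rows; approximate score is the dot product over the first
/// `prefix` dimensions, exact score is cosine similarity over all dimensions.
pub struct InMemoryStore {
    pub rows: Vec<Vec<f32>>,
    pub prefix: usize,
}

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

impl VectorStore for InMemoryStore {
    fn len(&self) -> usize {
        self.rows.len()
    }
    fn approx_score(&self, id: usize, query: &[f32]) -> f32 {
        let (row, p) = (&self.rows[id], self.prefix);
        dot(&row[..p.min(row.len())], &query[..p.min(query.len())])
    }
    fn exact_score(&self, id: usize, query: &[f32]) -> f32 {
        let row = &self.rows[id];
        let norms = (dot(row, row) * dot(query, query)).sqrt();
        if norms == 0.0 { 0.0 } else { dot(row, query) / norms }
    }
}
```

Only the shortlist pays for the exact score, so `k_candidates` trades recall against cost.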
Usage
Format Conversion
use embeddenator_interop::…;
use …::SparseVec;
// Create a vector
let vec = SparseVec { … };
// Convert to JSON
let json_bytes = sparse_vec_to_format(…).unwrap();
let from_json = sparse_vec_from_format(…).unwrap();
// Convert to bincode
let bincode_bytes = sparse_vec_to_format(…).unwrap();
let from_bincode = sparse_vec_from_format(…).unwrap();
// Debug text format
let text = sparse_vec_to_format(…).unwrap();
println!(…);
File Operations
use …::FileAdapter;
use …;
use …::Manifest;
// Save and load sparse vectors
let vec = …::new();
FileAdapter::save_sparse_vec(…).unwrap();
let loaded = FileAdapter::load_sparse_vec(…).unwrap();
// Save and load config
let config = …::default();
FileAdapter::save_vsa_config(…).unwrap();
let loaded_config = FileAdapter::load_vsa_config(…).unwrap();
// Save and load manifests
let manifest = Manifest { … };
FileAdapter::save_manifest(…).unwrap();
let loaded_manifest = FileAdapter::load_manifest(…).unwrap();
Batch Operations
use …::BatchAdapter;
use …::ReversibleVSAConfig;
let config = ReversibleVSAConfig::default();
let data_chunks = vec![…];
// Batch encode
let vectors = BatchAdapter::batch_encode(…);
// Batch similarity
let query = vectors[…].clone();
let similarities = BatchAdapter::batch_similarity(…);
// Batch bundle
let bundled = BatchAdapter::batch_bundle(…);
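The batch calls above wrap per-vector VSA operations. The sketch below illustrates what those operations compute, using one common definition for sparse ternary vectors (indices holding +1 or -1):

- cosine is derived from the dot product over shared indices;
- bundling is an elementwise sum reduced to its sign;
- binding is an elementwise product.

`Sparse` and `batch_similarity` are hypothetical stand-ins. Embeddenator's own SparseVec operations may differ in detail, for example in how bundling resolves ties.

```rust
use std::collections::BTreeMap;

/// Hypothetical sparse ternary vector: indices holding +1 (`pos`) and -1 (`neg`).
#[derive(Debug, Clone, PartialEq)]
pub struct Sparse {
    pub pos: Vec<usize>,
    pub neg: Vec<usize>,
}

impl Sparse {
    fn to_map(&self) -> BTreeMap<usize, i32> {
        let mut m = BTreeMap::new();
        for &i in &self.pos { m.insert(i, 1); }
        for &i in &self.neg { m.insert(i, -1); }
        m
    }

    /// Rebuild from index -> value; zeros are dropped, output indices come out sorted.
    fn from_map(m: &BTreeMap<usize, i32>) -> Sparse {
        let mut out = Sparse { pos: Vec::new(), neg: Vec::new() };
        for (&i, &v) in m {
            if v > 0 { out.pos.push(i) } else if v < 0 { out.neg.push(i) }
        }
        out
    }

    /// Cosine: dot product over shared indices divided by sqrt(nnz_a * nnz_b).
    pub fn cosine(&self, other: &Sparse) -> f64 {
        let (a, b) = (self.to_map(), other.to_map());
        let dot: i32 = a.iter().filter_map(|(i, x)| b.get(i).map(|y| x * y)).sum();
        let denom = ((a.len() * b.len()) as f64).sqrt();
        if denom == 0.0 { 0.0 } else { dot as f64 / denom }
    }

    /// Bundle (superposition): elementwise sum, keep only the sign; conflicts cancel to 0.
    pub fn bundle(&self, other: &Sparse) -> Sparse {
        let mut m = self.to_map();
        for (i, v) in other.to_map() {
            *m.entry(i).or_insert(0) += v;
        }
        let m: BTreeMap<usize, i32> = m.into_iter().map(|(i, v)| (i, v.signum())).collect();
        Sparse::from_map(&m)
    }

    /// Bind: elementwise product, so only indices present in both inputs survive.
    pub fn bind(&self, other: &Sparse) -> Sparse {
        let b = other.to_map();
        let m: BTreeMap<usize, i32> = self
            .to_map()
            .into_iter()
            .filter_map(|(i, x)| b.get(&i).map(|y| (i, x * y)))
            .collect();
        Sparse::from_map(&m)
    }
}

/// Batch similarity: score one query against many vectors.
pub fn batch_similarity(query: &Sparse, vectors: &[Sparse]) -> Vec<f64> {
    vectors.iter().map(|v| query.cosine(v)).collect()
}
```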
C FFI Example
// Create vectors
SparseVecHandle* vec1 = ;
SparseVecHandle* vec2 = ;
// Perform operations
SparseVecHandle* bundled = ;
double similarity = ;
// Encode data
VSAConfigHandle* config = ;
const char* data = "Hello, C!";
SparseVecHandle* encoded = ;
// Serialize to JSON
ByteBuffer json = ;
// Use json.data, json.len...
;
// Cleanup
;
;
;
;
;
Python Example
# Create vectors
=
=
# Operations
=
=
# Encode data
=
= b
=
# Serialize
=
=
Cargo Features
Default features: None
Optional features:
- `python`: Enable Python bindings via PyO3
- `c-bindings`: Enable automated C header generation with cbindgen
- `compression-zstd`: Enable the Zstd compression codec
- `compression-lz4`: Enable the LZ4 compression codec
- `compression`: Enable all compression codecs (Zstd + LZ4)
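Optional features compile code in or out at build time rather than toggling it at runtime. The minimal sketch below shows how a build could report its available codecs with `cfg!`. `enabled_codecs` is hypothetical. The sketch assumes the umbrella `compression` feature is declared in Cargo.toml as enabling both `compression-zstd` and `compression-lz4`, so only the individual codec features are checked.

```rust
/// List the codecs compiled into this build (feature names as listed above).
pub fn enabled_codecs() -> Vec<&'static str> {
    let mut codecs = vec!["none"]; // uncompressed is always available
    if cfg!(feature = "compression-zstd") {
        codecs.push("zstd");
    }
    if cfg!(feature = "compression-lz4") {
        codecs.push("lz4");
    }
    codecs
}
```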
Dependencies
[dependencies]
embeddenator-interop = { version = "0.20.0-alpha.1" }
# With Python support
embeddenator-interop = { version = "0.20.0-alpha.1", features = ["python"] }
Development
Build
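# Standard build
cargo build
# With compression support
cargo build --features compression
# Generate C headers (creates include/embeddenator_interop.h)
cargo build --features c-bindings
# With all features
cargo build --all-features
# With Python support
cargo build --features python
# Release build
cargo build --release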
Test
# Run all tests
cargo test
# Run with Python tests
cargo test --features python
Architecture
- formats.rs: Format conversion utilities (JSON, bincode, text)
- ffi.rs: C FFI bindings with opaque handles
- bindings.rs: Python bindings via PyO3 (optional)
- adapters.rs: High-level adapter layers
- kernel_interop.rs: Backend-agnostic VSA operations
Supported Formats
| Type | JSON | Bincode | Text |
|---|---|---|---|
| SparseVec | ✓ | ✓ | ✓ (output only) |
| Engram | ✓ | ✓ | ✓ (output only) |
| Manifest | ✓ | ✓ | ✓ (output only) |
| SubEngram | ✓ | ✓ | ✓ (output only) |
| VSAConfig | ✓ | ✓ | ✓ (output only) |
FFI Safety
All FFI functions are marked unsafe and require:
- Valid, non-null pointers
- Proper memory management (caller frees returned memory)
- Null-terminated UTF-8 strings
- No use-after-free violations
See FFI documentation for detailed safety contracts.
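The sketch below shows how an FFI entry point can enforce the pointer and string rules above before touching any data. `demo_str_len` and its status codes are hypothetical. The checks rely on two standard-library calls: `CStr::from_ptr`, which requires a NUL terminator, and `CStr::to_str`, which rejects invalid UTF-8.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical status codes returned across the FFI boundary.
pub const OK: i32 = 0;
pub const ERR_NULL: i32 = -1;
pub const ERR_UTF8: i32 = -2;

/// Validate a C string argument and write its length in bytes to `out_len`.
///
/// # Safety
/// `s` must be null or point to a NUL-terminated buffer valid for the call;
/// `out_len` must be null or point to writable memory for one `usize`.
pub unsafe extern "C" fn demo_str_len(s: *const c_char, out_len: *mut usize) -> i32 {
    // Reject null pointers up front instead of dereferencing them.
    if s.is_null() || out_len.is_null() {
        return ERR_NULL;
    }
    let c_str = unsafe { CStr::from_ptr(s) };
    // Reject non-UTF-8 input rather than producing an invalid &str.
    match c_str.to_str() {
        Ok(text) => {
            unsafe { *out_len = text.len() };
            OK
        }
        Err(_) => ERR_UTF8,
    }
}
```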
Integration Recommendations
For C/C++ Projects
- Include the generated header (requires the `c-bindings` feature, which runs cbindgen)
- Link against the embeddenator_interop static (.a) or shared (.so) library
- Follow the opaque handle pattern
- Always free allocated resources
For Python Projects
- Build with --features python
- Install as a Python module
- Use Pythonic interface with native types
- Serialization integrates with pickle/JSON
For Rust Projects
- Use adapter layers for high-level operations
- Use formats module for conversion needs
- Use kernel_interop for backend integration
- Direct access to all functionality
Performance Notes
- Bincode is ~10x faster than JSON for serialization
- Batch operations reduce overhead for multiple items
- Streaming adapters minimize memory usage for large data
- FFI calls have minimal overhead (single indirection)
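Streaming keeps memory bounded because it processes fixed-size chunks instead of buffering the whole input. The minimal std-only sketch below assumes each chunk can be processed independently. `for_each_chunk` is hypothetical and is not the crate's StreamAdapter.

```rust
use std::io::{self, Read};

/// Read `reader` in chunks of `chunk_size` bytes and pass each to `on_chunk`, so peak
/// buffer memory is one chunk rather than the whole input. Returns total bytes read.
pub fn for_each_chunk<R: Read, F: FnMut(&[u8])>(
    mut reader: R,
    chunk_size: usize,
    mut on_chunk: F,
) -> io::Result<u64> {
    let mut buf = vec![0u8; chunk_size];
    let mut total = 0u64;
    loop {
        // Fill the buffer as far as possible so every chunk but the last is full-size.
        let mut filled = 0;
        while filled < buf.len() {
            match reader.read(&mut buf[filled..]) {
                Ok(0) => break, // end of input
                Ok(n) => filled += n,
                Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
                Err(e) => return Err(e),
            }
        }
        if filled == 0 {
            return Ok(total);
        }
        on_chunk(&buf[..filled]);
        total += filled as u64;
        if filled < buf.len() {
            return Ok(total); // a short chunk means the input is exhausted
        }
    }
}
```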
License
MIT
See Also
- ADR-016 - Component decomposition rationale
- embeddenator - Main repository
- embeddenator-vsa - VSA implementation
- embeddenator-fs - Filesystem types
- embeddenator-io - I/O utilities