Crate oxillama_py

§oxillama-py

PyO3 Python bindings for the OxiLLaMa Pure-Rust LLM inference engine.

§Quick start

import oxillama_py

config = oxillama_py.EngineConfig(model_path="model.gguf", context_size=4096)
engine = oxillama_py.Engine(config)
engine.load_model()

text = engine.generate("Hello", max_tokens=128)
emb  = engine.embed("Hello world")   # List[float]
toks = engine.tokenize("Hello")      # List[int]

engine.generate_streaming(
    "Hello",
    max_tokens=128,
    callback=lambda tok: print(tok, end="", flush=True),
)
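The streaming call above passes each token to a Python callable. As a minimal sketch, the callable can also accumulate the output instead of only printing it. `TokenCollector` and `fake_stream` are hypothetical helpers written for this illustration and are not part of `oxillama_py`. The sketch also assumes the callback receives each token as a `str`, which the quick start suggests but does not state.

```python
from typing import Callable, List, Optional


class TokenCollector:
    """Hypothetical helper (not part of oxillama_py).

    A callable that records every streamed token and, optionally,
    forwards it to another callable such as a printer.
    """

    def __init__(self, sink: Optional[Callable[[str], None]] = None) -> None:
        self.tokens: List[str] = []
        self._sink = sink

    def __call__(self, tok: str) -> None:
        # Assumption: the engine passes each token as a str.
        self.tokens.append(tok)
        if self._sink is not None:
            self._sink(tok)

    @property
    def text(self) -> str:
        # Join the pieces back into the full generated string.
        return "".join(self.tokens)


def fake_stream(callback: Callable[[str], None]) -> None:
    """Stand-in for engine.generate_streaming so the sketch runs without a model."""
    for tok in ["Hel", "lo", ",", " world"]:
        callback(tok)


collector = TokenCollector()
fake_stream(collector)

# With a loaded engine, the same object would be passed as the callback:
# engine.generate_streaming("Hello", max_tokens=128, callback=collector)
```

Keeping the accumulation in a small object rather than a closure makes the collected tokens easy to inspect after the call returns.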

§Module structure

Python class	Rust source
`EngineConfig`	`engine.rs`
`Engine`	`engine.rs`
`SamplerConfig`	`sampler.rs`
`SpeculativeConfig`	`speculative.rs`
`SpeculativeEngine`	`speculative.rs`
`Lora`	`lora.rs`

§Modules

async_support: Async Python support for OxiLLaMa.
callback: Python-callable streaming bridge utilities.
cancel: Python-accessible `CancellationToken` for cooperative cancellation of generation.
chat_template: Pure-Rust chat template engine for common HuggingFace prompt formats.
dlpack: DLPack v0.8 capsule producer/consumer for f32 CPU tensors.
engine: Python wrappers for `InferenceEngine` and `EngineConfig`.
error: Error conversion from Rust error types to `pyo3::PyErr`.
lora: Python wrapper for `LoadedLora`.
sampler: Python wrapper for `SamplerConfig`.
snapshot
speculative: Python wrappers for `SpeculativeEngine` and `SpeculativeConfig`.
tokenizer: Python wrapper for `TokenizerBridge`, providing standalone tokenizer access.
torch_interop: Torch interop registration hook.
