Skip to main content

Crate oxillama_py

Crate oxillama_py 

Source
Expand description

§oxillama-py

PyO3 Python bindings for the OxiLLaMa Pure-Rust LLM inference engine.

§Quick start

import oxillama_py

config = oxillama_py.EngineConfig(model_path="model.gguf", context_size=4096)
engine = oxillama_py.Engine(config)
engine.load_model()

text = engine.generate("Hello", max_tokens=128)
emb  = engine.embed("Hello world")   # List\[float\]
toks = engine.tokenize("Hello")      # List[int]

engine.generate_streaming(
    "Hello",
    max_tokens=128,
    callback=lambda tok: print(tok, end="", flush=True),
)

§Module structure

Python classRust source
EngineConfigengine.rs
Engineengine.rs
SamplerConfigsampler.rs
SpeculativeConfigspeculative.rs
SpeculativeEnginespeculative.rs
Loralora.rs

Modules§

async_support
Async Python support for OxiLLaMa.
callback
Python-callable streaming bridge utilities.
cancel
Python-accessible CancellationToken for cooperative cancellation of generation.
chat_template
Pure-Rust chat template engine for common HuggingFace prompt formats.
dlpack
DLPack v0.8 capsule producer/consumer for f32 CPU tensors.
engine
Python wrappers for InferenceEngine and EngineConfig.
error
Error conversion from Rust error types to pyo3::PyErr.
lora
Python wrapper for LoadedLora.
sampler
Python wrapper for SamplerConfig.
snapshot
speculative
Python wrappers for SpeculativeEngine and SpeculativeConfig.
tokenizer
Python wrapper for TokenizerBridge — standalone tokenizer access.
torch_interop
Torch interop registration hook.