# llama-cpp-v3
Safe and ergonomic Rust wrapper for [llama.cpp](https://github.com/ggml-org/llama.cpp) with **runtime dynamic loading** and **auto-downloading** support.
## Features
- 🚀 **Runtime Backend Switching**: Switch between CPU, CUDA, Vulkan, and SYCL without recompiling.
- 📦 **Zero-Configuration Build**: No need for a C++ compiler or `llama.cpp` source locally.
- ⏬ **Auto-Download**: Automatically downloads pre-built `llama.cpp` binaries from GitHub releases based on the selected backend and OS.
- 🛡️ **Safe API**: RAII-style wrappers for models, contexts, and samplers.
- 🔄 **Latest llama.cpp**: Support for the modern GGUF and vocabulary APIs.
## Installation
Add this to your `Cargo.toml`:
```toml
[dependencies]
llama-cpp-v3 = "0.1.1" # Example version
```
## Quick Start
```rust
use llama_cpp_v3::{LlamaBackend, LlamaModel, LlamaContext, LlamaSampler, LoadOptions, Backend};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // 1. Load the backend (downloads the shared library if missing)
    let backend = LlamaBackend::load(LoadOptions {
        backend: Backend::Cpu,
        app_name: "my_app",
        version: None, // uses latest
        explicit_path: None,
        cache_dir: None,
    })?;

    // 2. Load a model
    let model = LlamaModel::load_from_file(&backend, "tinyllama.gguf", LlamaModel::default_params(&backend))?;

    // 3. Create a context
    let mut ctx = LlamaContext::new(&model, LlamaContext::default_params(&model))?;

    // 4. Tokenize and generate
    let tokens = model.tokenize("Hello, my name is", true, true)?;
    // ... fill batch and decode ...

    Ok(())
}
```
## How it Works
This crate uses `llama-cpp-sys-v3` to dynamically load `llama.dll` (Windows) or `libllama.so` (Linux). When you call `LlamaBackend::load`, the library:
1. Checks the local cache for the specified backend.
2. If missing, it uses the GitHub API to find the latest `llama.cpp` release.
3. Downloads the appropriate binary zip for your OS and architecture.
4. Extracts and loads the library at runtime.
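The cache-or-download resolution in steps 1&ndash;2 can be sketched roughly as follows. This is a minimal, self-contained illustration, not the crate's actual implementation: the `Backend` enum here is a local stand-in, and names like `cached_library_path` and `resolve_library` are hypothetical.

```rust
use std::path::{Path, PathBuf};

// Local stand-in for the crate's Backend enum (illustrative only).
#[derive(Debug, Clone, Copy)]
enum Backend { Cpu, Cuda, Vulkan, Sycl }

impl Backend {
    fn tag(self) -> &'static str {
        match self {
            Backend::Cpu => "cpu",
            Backend::Cuda => "cuda",
            Backend::Vulkan => "vulkan",
            Backend::Sycl => "sycl",
        }
    }
}

/// Platform-specific shared-library file name.
fn library_file_name() -> &'static str {
    if cfg!(windows) { "llama.dll" } else { "libllama.so" }
}

/// Step 1: where the cached library for this backend/version would live.
fn cached_library_path(cache_dir: &Path, backend: Backend, version: &str) -> PathBuf {
    cache_dir
        .join(version)
        .join(backend.tag())
        .join(library_file_name())
}

/// Steps 1-2: return the cached path if present; otherwise signal that a
/// release lookup and download (steps 3-4) would be needed.
fn resolve_library(cache_dir: &Path, backend: Backend, version: &str) -> Result<PathBuf, String> {
    let path = cached_library_path(cache_dir, backend, version);
    if path.exists() {
        Ok(path)
    } else {
        Err(format!(
            "not cached; would download release '{}' for backend '{}'",
            version,
            backend.tag()
        ))
    }
}

fn main() {
    let cache = PathBuf::from("/tmp/my_app_cache");
    match resolve_library(&cache, Backend::Cuda, "some-release-tag") {
        Ok(p) => println!("loading {}", p.display()),
        Err(why) => println!("cache miss: {}", why),
    }
}
```

Keying the cache path on both the release version and the backend tag is what lets several backends (and several versions) coexist on disk, so switching backends later is just a different path lookup rather than a re-download of everything.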
## License
MIT