oxbitnet
Run BitNet b1.58 ternary LLMs with wgpu.
Part of 0xBitNet — also available as 0xbitnet (npm) and oxbitnet (Python).
Quick Start
use oxbitnet::BitNet;
use futures::StreamExt;
#[tokio::main]
async fn main() -> oxbitnet::Result<()> {
let mut model = BitNet::load("model.gguf", Default::default()).await?;
let options = oxbitnet::GenerateOptions {
max_tokens: 256,
temperature: 0.7,
top_k: 40,
repeat_penalty: 1.1,
..Default::default()
};
let mut stream = model.generate_chat(
&[
oxbitnet::ChatMessage { role: "system".into(), content: "You are a helpful assistant.".into() },
oxbitnet::ChatMessage { role: "user".into(), content: "Hello!".into() },
],
options,
);
while let Some(token) = stream.next().await {
print!("{token}");
}
model.dispose();
Ok(())
}
Features
- Native wgpu — Vulkan, Metal, DX12 backends automatically
- GGUF loading — Handles I2_S ternary packing (Microsoft BitNet fork)
- Streaming — Token-by-token via
impl Stream<Item = String>
- Chat templates — Built-in LLaMA 3 chat formatting
- Disk caching — Models cached at
~/.cache/.0xbitnet/
Example
Interactive chat CLI:
cargo run --example chat --release
cargo run --example chat --release -- --url /path/to/model.gguf
License
MIT