Please check the build logs for more information.
See Builds for ideas on how to fix a failed build, or Metadata for how to configure docs.rs builds.
If you believe this is docs.rs' fault, open an issue.
Mullama
Run any LLM locally. Use it from any language. Deploy anywhere.
Mullama is a local LLM server and library that works just like Ollama — same CLI commands, same model format, same Modelfile syntax — but with native language bindings for Python, Node.js, Go, PHP, Rust, and C/C++. Embed inference directly in your app with zero HTTP overhead, or run it as a server with OpenAI and Anthropic-compatible APIs.
Install
# One-liner (Linux/macOS)
|
# Windows (PowerShell)
|
# Or via package managers
Quick Start
# Run a model (daemon auto-starts)
# Interactive chat
# Start an OpenAI-compatible server
Coming from Ollama? Your commands work unchanged — run, pull, serve, list, ps, create, show, rm, cp.
Use as a Library
Embed LLM inference directly in your application — no server, no HTTP overhead, no separate process.
Python:
=
=
=
Node.js:
const = require;
const model = await ;
const ctx = ;
const response = await ctx.;
console.log;
Rust:
use ;
let model = load?;
let mut ctx = new?;
let response = ctx.generate?;
println!;
Go:
import "github.com/cognisoc/mullama"
model, _ := mullama.LoadModel("llama3.2-1b.gguf", &mullama.ModelParams)
ctx, _ := mullama.NewContext(model, &mullama.ContextParams)
response, _ := ctx.Generate("Hello, AI!", 256, nil)
fmt.Println(response)
PHP:
See the bindings documentation for full API details.
Why Mullama?
| Native bindings for 6 languages | Python, Node.js, Go, PHP, Rust, C/C++ — call models directly, no HTTP roundtrips |
| Drop-in Ollama replacement | Same CLI commands, same Modelfile format, same model registry |
| OpenAI + Anthropic API compatible | Use your existing SDKs and tools without changes |
| Embed in any app | Run inference in-process — no separate daemon required |
| 7 GPU backends | CUDA, Metal, ROCm, OpenCL, Vulkan, SYCL, RPC |
| Multimodal | Text, image, and real-time audio with voice activity detection |
| Built-in Web UI and TUI | Chat interface, model management, and API playground |
What You Can Build
- Chatbots and assistants — Streaming responses, multi-turn context, and custom system prompts
- RAG pipelines — Embeddings, ColBERT-style semantic search, and grammar-constrained generation
- Voice assistants — Real-time audio capture with VAD, speech-to-text, and streaming LLM responses
- API servers — Production-ready OpenAI-compatible endpoints with streaming SSE
- Edge deployments — Embed a model directly in your app with no network dependency
- Batch processing — Parallel inference across documents with work-stealing scheduling
Ollama Compatibility
| Feature | Mullama | Ollama |
|---|---|---|
CLI commands (run, pull, serve, etc.) |
Same syntax | — |
| Modelfile format | Compatible | — |
| GGUF models | Yes | Yes |
| OpenAI API | Yes | Yes |
| Anthropic API | Yes | No |
| Native language bindings | 6 languages | HTTP only |
| Embed in your app (no daemon) | Yes | No |
| Built-in Web UI | Yes | No |
| Built-in TUI chat | Yes | No |
Full comparison | Migration guide
GPU Acceleration
Set the environment variable for your hardware before building:
# NVIDIA CUDA
# Apple Silicon
# AMD ROCm
# Intel/AMD OpenCL
# Vulkan (cross-platform)
# Intel SYCL/oneAPI
# Distributed RPC backend
Roadmap
- v0.4 — Speculative decoding, prompt caching, improved quantization support
- v0.5 — Distributed inference across multiple nodes, model sharding
- v0.6 — Built-in fine-tuning (LoRA/QLoRA), training data pipelines
- v1.0 — Stable API, LTS release, comprehensive benchmarks
Documentation
Full documentation is available at docs.cognisoc.com/mullama.
Guides cover installation, library usage, daemon configuration, language bindings, advanced features, API reference, and tutorials.
Contributing
See CONTRIBUTING.md for guidelines.
License
MIT License — see LICENSE for details.