web_rwkv/lib.rs
//! # Web-RWKV
//!
//! This is an inference engine for the [RWKV language model](https://github.com/BlinkDL/RWKV-LM), implemented in pure WebGPU.
//!
//! ## Features
//!
//! - No dependencies on CUDA/Python.
//! - Supports Nvidia/AMD/Intel GPUs, including integrated GPUs.
//! - Vulkan/Dx12/OpenGL backends.
//! - Batched inference.
//! - Int8 and NF4 quantization.
//! - Very fast.
//! - LoRA merging at loading time.
//! - Supports RWKV V4, V5 and V6.
//!
//! ## Notes
//!
//! Note that `web-rwkv` is only an inference engine. It provides only the following functionality:
//! - A tokenizer.
//! - Model loading.
//! - State creation and updating.
//! - A `run` function that takes in prompt tokens and returns logits (which become predicted next-token probabilities after applying `softmax`).
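//!
//! Since `run` returns raw logits, the caller is responsible for turning them into probabilities.
//! The following is a minimal sketch of that step, assuming the logits have already been read back as a slice of `f32`.
//! The helper name `softmax` is hypothetical and is *not* part of the `web-rwkv` API.
//!
//! ```rust
//! /// Hypothetical helper (not part of `web-rwkv`): converts raw logits into
//! /// probabilities with a numerically stable softmax.
//! fn softmax(logits: &[f32]) -> Vec<f32> {
//!     // Subtract the maximum logit so that `exp` cannot overflow.
//!     let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
//!     let exps: Vec<f32> = logits.iter().map(|&x| (x - max).exp()).collect();
//!     // Normalize so that the outputs sum to one.
//!     let sum: f32 = exps.iter().sum();
//!     exps.iter().map(|&e| e / sum).collect()
//! }
//! ```
//!
//! For instance, `softmax(&[1.0, 2.0, 3.0])` yields three probabilities that sum to one, with the largest assigned to the last entry.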
//!
//! It *does not* provide the following:
//! - OpenAI API or APIs of any kind.
//!   - If you would like to deploy an API server, check [AI00 RWKV Server](https://github.com/cgisky1980/ai00_rwkv_server), which is a fully functional OpenAI-compatible API server built upon `web-rwkv`.
//!   - You could also check the [`web-rwkv-axum`](https://github.com/Prunoideae/web-rwkv-axum) project if you want some fancy inference pipelines, including Classifier-Free Guidance (CFG), Backus–Naur Form (BNF) guidance, and more.
//! - Samplers. The examples implement a basic nucleus sampler, but it is *not* included in the library itself.
//! - A state caching or management system.
//! - Bindings for Python (or any other language).
//! - A runtime. Not bundling one makes `web-rwkv` easy to integrate into any application, from servers and front-end apps (yes, `web-rwkv` can run in the browser) to game engines.
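//!
//! Because samplers are left to the caller, here is a minimal sketch of nucleus (top-p) sampling over a probability vector.
//! It is an illustration only, not the sampler from the examples: the name `sample_top_p` and its parameters are hypothetical,
//! and the uniform random number `u` is passed in by the caller so that the sketch needs no RNG crate.
//!
//! ```rust
//! /// Hypothetical helper (not part of `web-rwkv`): picks a token index by
//! /// nucleus (top-p) sampling.
//! ///
//! /// `probs` holds probabilities (e.g. softmax output), `top_p` is the
//! /// cumulative probability mass to keep, and `u` is a uniform random number
//! /// in `[0, 1)` supplied by the caller. Panics if `probs` is empty.
//! fn sample_top_p(probs: &[f32], top_p: f32, u: f32) -> usize {
//!     assert!(!probs.is_empty(), "`probs` must not be empty");
//!
//!     // Sort token indices by descending probability.
//!     let mut order: Vec<usize> = (0..probs.len()).collect();
//!     order.sort_by(|&a, &b| probs[b].total_cmp(&probs[a]));
//!
//!     // Keep the smallest prefix whose cumulative probability reaches `top_p`.
//!     let mut kept = 0;
//!     let mut mass = 0.0;
//!     for &i in &order {
//!         kept += 1;
//!         mass += probs[i];
//!         if mass >= top_p {
//!             break;
//!         }
//!     }
//!
//!     // Draw from the kept tokens, renormalized by their total mass.
//!     let mut threshold = u * mass;
//!     for &i in &order[..kept] {
//!         if threshold < probs[i] {
//!             return i;
//!         }
//!         threshold -= probs[i];
//!     }
//!     // Fallback for rounding error: return the least likely kept token.
//!     order[kept - 1]
//! }
//! ```
//!
//! With `top_p` below the probability of the most likely token, this sketch always returns that token; with `top_p = 1.0`, every token can be drawn.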
//!
//! ## Crate Features
//!
#![doc = document_features::document_features!()]

pub mod context;
pub mod num;
pub mod runtime;
pub mod tensor;
pub mod tokenizer;

pub use wgpu;