# multiscreen-rs
A Rust implementation of the Multiscreen neural language model, covering both training and inference, powered by Burn. Based on *Screening Is Enough*.
## Quickstart

```sh
# Train (10M params, ~25 min on CUDA)
# Chat
```

Data and tokenizer are bundled in `examples/data/`; no setup is needed.
## CLI

### Train

```sh
# Quick test (CPU, ~1 min)
# 10M on CUDA (~25 min)
# 50M on CUDA (~2-3 hrs)
# Your own data (put tokenizer.model + .jsonl in a directory)
```
### Chat

```sh
# Interactive (streaming)
# One-shot
# Custom sampling
```
Run with `--help` to see all options (training and chat).
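For intuition, "custom sampling" usually means temperature scaling plus top-k truncation. A std-only Rust sketch of that idea (this is not this crate's actual sampler; the function name, signature, and the xorshift RNG are illustrative):

```rust
// Temperature + top-k sampling over raw logits.
// Temperature reshapes the distribution (low = near-greedy, high = flatter);
// top-k restricts the draw to the k most likely tokens.
fn sample_top_k(logits: &[f64], temperature: f64, k: usize, rng_state: &mut u64) -> usize {
    // Softmax over temperature-scaled logits (max-subtracted for stability).
    let scaled: Vec<f64> = logits.iter().map(|l| l / temperature).collect();
    let max = scaled.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let mut probs: Vec<(usize, f64)> = scaled
        .iter()
        .enumerate()
        .map(|(i, l)| (i, (l - max).exp()))
        .collect();
    // Keep the k highest-probability candidates.
    probs.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    probs.truncate(k);
    let total: f64 = probs.iter().map(|(_, p)| p).sum();
    // Draw uniformly in [0, total) with a tiny xorshift RNG (illustrative).
    *rng_state ^= *rng_state << 13;
    *rng_state ^= *rng_state >> 7;
    *rng_state ^= *rng_state << 17;
    let mut u = (*rng_state as f64 / u64::MAX as f64) * total;
    for (i, p) in &probs {
        u -= p;
        if u <= 0.0 {
            return *i;
        }
    }
    probs[0].0
}

fn main() {
    // With k = 1 this degenerates to greedy decoding: always the argmax.
    assert_eq!(sample_top_k(&[0.1, 2.0, 0.3], 1.0, 1, &mut 42), 1);
    // With k > 1 the result is random but always a valid index.
    assert!(sample_top_k(&[0.1, 2.0, 0.3], 0.8, 2, &mut 7) < 3);
}
```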
## Data Formats

Put training data alongside `tokenizer.model` in any directory:
| Format | Description |
|---|---|
| `.txt` | Blank lines separate samples; each sample is split 60/40 into prompt/response |
| `.jsonl` (text) | `{"text": "your text here"}`, one sample per line |
| `.jsonl` (chat) | `{"messages": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}`, OpenAI format |
Tip: aim for at least 10× more tokens than parameters; for a 10M-parameter model, that is roughly 100M+ tokens.
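As an illustration, chat-format lines can be produced with std-only Rust. This is a sketch: `chat_line` is a hypothetical helper, the JSON is assembled by hand to stay dependency-free, and real code should use `serde_json` so the content is properly escaped.

```rust
use std::fs;

// Build one OpenAI-style chat sample as a single JSONL line.
// NOTE: naive interpolation; content containing quotes would need escaping
// (use serde_json in practice).
fn chat_line(user: &str, assistant: &str) -> String {
    format!(
        r#"{{"messages": [{{"role": "user", "content": "{user}"}}, {{"role": "assistant", "content": "{assistant}"}}]}}"#
    )
}

fn main() -> std::io::Result<()> {
    let lines = [
        chat_line("What is 2 + 2?", "4."),
        chat_line("Name a prime number.", "7 is prime."),
    ];
    // Write alongside tokenizer.model in your data directory.
    fs::write("train.jsonl", lines.join("\n"))?;
    Ok(())
}
```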
## Library Usage
```toml
[dependencies]
multiscreen-rs = "0.2"
# multiscreen-rs = { version = "0.2", features = ["cuda"] }
```
```rust
use multiscreen_rs::*;

// Training. The builder and method names are the crate's; the argument
// values below are illustrative placeholders — see the crate docs.
let device = Default::default(); // backend device (Burn)
let mut trainer = builder()
    .vocab_size(32_000)
    .budget("10m")
    .device(device)
    .batch_size(16)
    .seq_len(512)
    .steps(5_000)
    .build()?;
let report = trainer.train_on_token_sequences(&token_sequences)?;
// or: trainer.train_on_chat_sequences(&chat_pairs)?;

// Inference (paths and callback shown are illustrative)
let model = load("path/to/checkpoint")?;
let tokens = model.generate(&prompt_tokens)?;
model.generate_stream(&prompt_tokens, |token| print!("{token}"))?;
```
## Architecture

Transformer: `score = QK^T/√d → softmax → attn · V`

Multiscreen:

```
alpha     = clamp(1 - r*(1 - QK^T), 0, 1)²    (trim-and-square)
relevance = alpha * causal_cosine_window       (NO softmax)
out       = tanh_norm(relevance · V) * silu_gate(g)
```

No softmax means no attention dilution at long contexts; the advantages show at sequence lengths of 4K+.
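For intuition, the trim-and-square weight can be sketched in a few lines of plain Rust (toy scalar scores; `r` is the trim rate from the formula above, and the value used here is illustrative):

```rust
// Trim-and-square screening weight: alpha = clamp(1 - r*(1 - qk), 0, 1)^2.
// Each weight is independent of the others (no softmax normalization), so
// adding more low-relevance positions never dilutes the relevant ones.
fn alpha(qk: f64, r: f64) -> f64 {
    (1.0 - r * (1.0 - qk)).clamp(0.0, 1.0).powi(2)
}

fn main() {
    let r = 2.0; // illustrative trim rate
    // A perfectly aligned query/key pair keeps full weight...
    assert_eq!(alpha(1.0, r), 1.0);
    // ...a weakly similar one is trimmed toward zero, then squared...
    assert!((alpha(0.6, r) - 0.04).abs() < 1e-9);
    // ...and anything with qk below 1 - 1/r is screened out entirely.
    assert_eq!(alpha(0.4, r), 0.0);
}
```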
Parameter budgets: `1m`, `5m`, `10m`, `50m`, `100m`.