multiscreen-rs 0.3.0

A Rust implementation of the Multiscreen neural language model — training and inference — powered by Burn. Based on Screening Is Enough.

Quickstart

# Train (10M params, ~25 min on CUDA)
cargo run --release --features cuda --example train_with_tokenizer -- \
    --budget 10m --steps 10000 --batch-size 16 --seq-len 256

# Chat
cargo run --release --features cuda --example chat_with_tokenizer -- --run-dir runs/my-model

Data and tokenizer bundled in examples/data/. No setup needed.


CLI

Train

# Quick test (CPU, ~1 min)
cargo run --release --example train_with_tokenizer -- --budget 1m --steps 500

# 10M on CUDA (~25 min)
cargo run --release --features cuda --example train_with_tokenizer -- \
    --budget 10m --steps 10000 --batch-size 16 --seq-len 256

# 50M on CUDA (~2-3 hrs)
cargo run --release --features cuda --example train_with_tokenizer -- \
    --budget 50m --steps 10000 --batch-size 4 --seq-len 256

# Your own data (put tokenizer.model + .jsonl in a directory)
cargo run --release --features cuda --example train_with_tokenizer -- \
    --train-dir /path/to/data --run-dir runs/custom --budget 10m --steps 50000

Chat

# Interactive (streaming)
cargo run --release --features cuda --example chat_with_tokenizer -- --run-dir runs/my-model

# One-shot
cargo run --release --features cuda --example chat_with_tokenizer -- \
    --run-dir runs/my-model --prompt "Once upon a time"

# Custom sampling
cargo run --release --features cuda --example chat_with_tokenizer -- \
    --run-dir runs/my-model --temperature 0.8 --top-k 40 --max-new-tokens 256

Run with --help to see all options (training & chat).


Data Formats

Put training data alongside tokenizer.model in any directory:

Format         Description
.txt           Blank lines separate samples; each sample is split 60/40 into prompt/response
.jsonl (text)  {"text": "your text here"} — one sample per line
.jsonl (chat)  {"messages": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]} — OpenAI chat format
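As a concrete illustration of the chat format above, the following sketch writes a minimal two-sample dataset (the directory and file names here are placeholders; the tokenizer.model must be supplied separately as described above):

```shell
# Create an illustrative training directory with two chat-format samples,
# one JSON object per line. Pair it with your tokenizer.model before training.
mkdir -p mydata
cat > mydata/train.jsonl <<'EOF'
{"messages": [{"role": "user", "content": "Hi"}, {"role": "assistant", "content": "Hello!"}]}
{"messages": [{"role": "user", "content": "What is 2+2?"}, {"role": "assistant", "content": "4"}]}
EOF
```

You would then point training at it with --train-dir mydata, as in the CLI examples above.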

Tip: Aim for ≥10× more tokens than parameters. 10M params → ≥100M tokens.


Library Usage

[dependencies]
multiscreen-rs = "0.3"
# multiscreen-rs = { version = "0.3", features = ["cuda"] }

use multiscreen_rs::prelude::*;

// Training
let mut trainer = Trainer::builder()
    .vocab_size(1000)
    .budget(ParameterBudget::Params10M)
    .device(auto_device()?)
    .batch_size(16).seq_len(128).steps(50_000)
    .build()?;

let report = trainer.train_on_token_sequences(&sequences)?;
// or: trainer.train_on_chat_sequences(&chat_pairs)?;

// Inference
let model = ChatModel::load("checkpoints/latest.mpk")?;
let tokens = model.generate(&[1, 2, 3], GenerationConfig::default())?;

model.generate_stream(&[1, 2, 3], GenerationConfig::default(), |id, _| {
    print!("{id} "); true
})?;

Architecture

Transformer:    score = QK^T/√d → softmax → attn · V
Multiscreen:    alpha = clamp(1 - r*(1 - QK^T), 0, 1)²   (trim-and-square)
                relevance = alpha * causal_cosine_window   (NO softmax)
                out = tanh_norm(relevance · V) * silu_gate(g)

No softmax → no attention dilution at long contexts. Advantages show at seq_len 4K+.
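The trim-and-square score above can be sketched for a single scalar query–key dot product. This is an illustrative standalone function (the name trim_and_square and scalar signature are mine; the actual model applies this elementwise to tensors via Burn):

```rust
/// Illustrative scalar version of the trim-and-square score:
/// alpha = clamp(1 - r*(1 - qk), 0, 1)^2, where `qk` is a query-key
/// dot product (assumed roughly in [-1, 1]) and `r` controls how
/// aggressively weak matches are trimmed.
fn trim_and_square(qk: f32, r: f32) -> f32 {
    let alpha = (1.0 - r * (1.0 - qk)).clamp(0.0, 1.0);
    alpha * alpha
}

fn main() {
    // Perfect alignment (qk = 1) keeps full relevance.
    assert_eq!(trim_and_square(1.0, 4.0), 1.0);
    // Weak matches are clamped to exactly zero rather than being
    // diluted across positions, as softmax normalization would do.
    assert_eq!(trim_and_square(0.5, 4.0), 0.0);
}
```

Because scores can be exactly zero, irrelevant positions contribute nothing to the output sum, which is the mechanism behind the "no attention dilution" claim above.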

Parameter budgets: 1m, 5m, 10m, 50m, 100m

License

MIT