# 🧠 DeepSeek R1 (Rust) – Research-Grade Reasoning Model Prototype
A Rust implementation of a DeepSeek R1-inspired reasoning model focused on clarity, testability, and strong engineering practices. The project is intended as a portfolio-quality piece: it includes a modular transformer architecture, reasoning-aware inference, an evaluation harness, examples, comprehensive tests, and CI.
Highlights:
- Fully-typed Rust 2024 crate with modules for model, inference, training, and utilities
- Transformer stack with rotary embeddings, standard attention, pre-norm layers, and an LM head
- MLA (Multi-head Latent Attention) and MoE (Mixture of Experts) components implemented and tested
- Reasoning-aware generation pipeline with `<think>` parsing and structured analysis
- Evaluation harness for benchmarks across math, logic, programming, and general reasoning
- Examples that compile and run via cargo
- GitHub Actions CI with fmt, clippy, build, unit + integration tests, benchmarks (artifacts), and docs publishing
## 🚀 Quick Start
Prerequisites: Rust (stable), Cargo.
Build:
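The crate lives in `ds-r1_rs/` (see Project Structure below), so build from that directory:

```bash
cd ds-r1_rs
cargo build --release
```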
Run the CLI:
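A minimal sketch; `--help` assumes the binary exposes the usual help flag:

```bash
# Help (shows available commands)
cargo run -- --help
```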
Core commands (see the hedged sketch below for concrete invocations):
- Show the default model configuration
- Show version and build info
- Run basic checks and smoke tests
- Generate text from a prompt (uses the simple model forward pass)
- Evaluate reasoning benchmarks (math, logic, programming, general)
- Export evaluation results as JSON (for dashboards)
- Save the current model weights (full checkpoint)
- Save only `lm_head` parameters, excluding embeddings
- Save a small demo-size checkpoint (size-conscious)
- Load weights and generate deterministically (temperature = 0)
- Load only `lm_head` from a checkpoint, allowing other parameters to be missing
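A hedged sketch of typical invocations. The subcommand names mirror the CLI entry point in `src/main.rs` (config/version/test/generate/eval); prompt, export, and checkpoint flags are omitted because their exact names are not documented here, so consult `--help`:

```bash
cargo run -- config     # show the default model configuration
cargo run -- version    # show version and build info
cargo run -- test       # run basic checks and smoke tests
cargo run -- generate   # generate text from a prompt (see --help for prompt/sampling flags)
cargo run -- eval       # evaluate the reasoning benchmarks
```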
Run examples:
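The example names come from `ds-r1_rs/examples/`; run them from the crate directory:

```bash
cargo run --example generation_demo
cargo run --example math_solver_demo
cargo run --example training_demo
cargo run --example config_demo
```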
Tests + checks cover unit + doc tests, CLI integration tests, optional heavier integration tests, lints/format, and Criterion benchmarks; see the sketch below.
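A sketch using standard Cargo commands; the `--ignored` filter assumes the heavier integration tests are marked `#[ignore]`:

```bash
cargo test                   # unit + doc tests
cargo test --test '*'        # integration tests (CLI)
cargo test -- --ignored      # optional heavier integration tests (assumes #[ignore] markers)
cargo fmt --all -- --check   # formatting
cargo clippy --all-targets   # lints
cargo bench                  # Criterion benchmarks
```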
## 🧰 Devcontainer
A ready-to-use devcontainer is provided at .devcontainer/devcontainer.json for reproducible development with VS Code or compatible editors.
Requirements:
- Docker (or a compatible container runtime)
- VS Code with the "Dev Containers" extension (or an equivalent)
Usage:
- Open the project folder in VS Code.
- When prompted, choose "Reopen in Container" (or use the Command Palette: "Dev Containers: Reopen in Container").
- The container installs Rust stable, rustfmt, clippy, llvm-tools, and utilities like cargo-tarpaulin and cargo-criterion.
Common commands inside the devcontainer cover unit + integration tests, lints/format, and Criterion benchmarks:
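These map to the same standard Cargo commands as in the Quick Start:

```bash
cargo test                                                  # unit + integration tests
cargo fmt --all -- --check && cargo clippy --all-targets    # lints/format
cargo bench                                                 # Criterion benchmarks
```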
Notes:
- Cargo registries are cached via container volumes for faster builds.
- The environment enables colored output and backtraces by default.
## 🐳 Docker Usage
A minimal Dockerfile is included for reproducible builds and tests.
Build the image:
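A minimal sketch; the `ds-r1-rs` image tag is an arbitrary name chosen here, not something the Dockerfile prescribes:

```bash
docker build -t ds-r1-rs .
```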
Run the CLI (print the version, show the default config, generate text):
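A hedged sketch that assumes the image's entrypoint is the crate's CLI, so trailing arguments become subcommands; adjust if the Dockerfile defines a different entrypoint:

```bash
docker run --rm ds-r1-rs version    # print version
docker run --rm ds-r1-rs config     # show default config
docker run --rm ds-r1-rs generate   # generate text (see --help for prompt/sampling flags)
```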
Mount the project and run from source (optional), e.g. to evaluate and export JSON results using the container's toolchain:
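A sketch using the official `rust` image so a Cargo toolchain is guaranteed inside the container; the mount path is arbitrary and the JSON-export flag is left to the CLI's `--help`:

```bash
docker run --rm -v "$PWD":/workspace -w /workspace/ds-r1_rs rust:1 \
  cargo run -- eval   # add the CLI's JSON-export flag to write results for dashboards
```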
Run tests inside the container, using the container's toolchain against your mounted workspace:
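The same mounting pattern works for tests, assuming the image (or the official `rust` image, as here) provides a toolchain:

```bash
docker run --rm -v "$PWD":/workspace -w /workspace/ds-r1_rs rust:1 cargo test
```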
Tip:
- For faster local iteration, you can keep the container warm and re-run commands without rebuilding the image unless dependencies change.
## 🏗️ Project Structure
```
├── ds-r1_rs/                    # Rust crate
│   ├── Cargo.toml
│   ├── src/
│   │   ├── main.rs              # CLI: config/version/test/generate/eval
│   │   ├── lib.rs               # Public crate API & re-exports
│   │   ├── model/               # Core model components
│   │   │   ├── config.rs        # Model configuration & validation
│   │   │   ├── transformer.rs   # Transformer stack + LM head (implemented)
│   │   │   ├── attention.rs     # Standard attention + MLA + Linear
│   │   │   ├── layers.rs        # Pre-norm TransformerLayer, FFN (SwiGLU), LayerNorm
│   │   │   ├── embeddings.rs    # Token + Rotary embeddings
│   │   │   └── moe.rs           # Mixture of Experts (router, experts, load balancing)
│   │   ├── inference/           # Inference & reasoning
│   │   │   ├── engine.rs        # InferenceEngine + high-level solve/explain APIs
│   │   │   ├── generation.rs    # Text generation & configs (KV cache placeholder)
│   │   │   ├── sampling.rs      # Greedy/temperature/top-k sampling
│   │   │   ├── reasoning.rs     # <think> parsing, states, analysis
│   │   │   ├── math_solver.rs   # Structured math solver utilities
│   │   │   └── code_analyzer.rs
│   │   ├── training/            # Training infrastructure (supervised + RL scaffolding)
│   │   │   ├── data.rs          # Datasets + loaders + synthetic generator
│   │   │   ├── loss.rs          # CrossEntropy + metrics
│   │   │   ├── optimizer.rs     # Adam optimizer
│   │   │   └── trainer.rs       # BasicTrainer + RLTrainer (REINFORCE scaffolding)
│   │   └── utils/               # Errors, math, tokenizer, evaluation harness
│   └── examples/                # Ready-to-run demos
│       ├── generation_demo.rs
│       ├── math_solver_demo.rs
│       ├── training_demo.rs
│       └── config_demo.rs
└── .github/workflows/ci.yml     # CI: build, lint, test, examples, coverage
```
## 🧩 What's Implemented
- Model
  - Token embeddings (+ scaling), Rotary embedding (RoPE)
  - Transformer layers with pre-norm and residuals
  - Standard multi-head attention with causal masking
  - Feed-forward with SwiGLU activation
  - Final layer norm + LM head (Linear)
  - Forward pass returning flattened logits `[seq_len * vocab_size]`
- Advanced Modules (standalone, tested)
  - MLA (Multi-head Latent Attention) with compressed KV via LoRA-style compression
  - Mixture of Experts (experts, router, load balancer)
- Inference & Reasoning
  - `InferenceEngine` with text generation APIs
  - Reasoning-aware generation with `<think>` support
  - Reasoning chain parsing, analysis, and structured outputs
- Evaluation
  - `EvaluationHarness` to run curated benchmarks (math, logic, programming, science, general)
  - Per-problem metrics, performance placeholders, category & difficulty breakdowns
- Training (Prototype)
  - Basic supervised training scaffold (cross-entropy)
  - RL training scaffold (REINFORCE with a simple reward function)
- Utilities
  - Tokenizer powered by tiktoken-rs (BPE), math helpers, error handling (thiserror)
- Engineering
  - Unit tests across modules
  - CI (fmt, clippy, build, test, run examples, coverage with tarpaulin)
  - Examples showing end-to-end flows
## 🧪 Usage Examples
Programmatic usage:
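A sketch of the intended flow. The module paths follow the project layout above, but the constructor and method names (`ModelConfig::default`, `InferenceEngine::new`, `generate`) are assumptions, not the crate's confirmed API; check `lib.rs` and the docs for the real signatures.

```rust
// Hedged sketch: type paths follow the project layout; the exact constructor
// and method signatures are assumptions, so consult lib.rs for the real API.
use ds_r1_rs::inference::engine::InferenceEngine;
use ds_r1_rs::model::config::ModelConfig;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = ModelConfig::default();            // demo-size default configuration
    let mut engine = InferenceEngine::new(config)?; // build the inference engine
    let output = engine.generate("What is 12 * 7? Think step by step.")?;
    println!("{output}");
    Ok(())
}
```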
CLI usage (quick):
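For example (subcommand names as in `src/main.rs`; flags may differ):

```bash
cargo run -- generate   # quick text generation
cargo run -- eval       # run the reasoning benchmarks
```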
## 💾 Checkpointing & Reproducibility
You can save/load checkpoints in JSON v1 format. Partial save/load is supported via name-prefix filters. For size-conscious artifacts, use the demo-small model configuration.
Examples:
- Full save
- Partial save/load of only `lm_head.*`
- Size-conscious small artifact
- Deterministic generation (temperature = 0 is applied automatically in the load-weights flow)
## 🧠 How Reasoning Works Here
This prototype uses special thinking tokens and a reasoning state machine to parse and structure "thoughts" during generation:
- The generator can produce `<think> ... </think>` sections.
- The `ReasoningEngine` tracks states (Normal/Thinking/Answering), captures steps, and produces a `ReasoningOutput`.
- The `EvaluationHarness` aggregates metrics (accuracy, clarity, verification presence) across curated benchmarks and reports performance and breakdowns.
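To make the flow concrete, here is a toy sketch of this kind of state machine. It is not the crate's `ReasoningEngine`; the names and the whitespace-based tag scanning are illustrative only.

```rust
// Toy illustration only: NOT the crate's ReasoningEngine. It shows the idea of
// a small state machine that splits generated text into "thinking" steps and a
// final answer based on <think> ... </think> markers.
enum ReasoningState {
    Normal,
    Thinking,
    Answering,
}

#[derive(Debug, Default)]
struct ReasoningOutput {
    thoughts: Vec<String>,
    answer: String,
}

fn parse_reasoning(text: &str) -> ReasoningOutput {
    let mut state = ReasoningState::Normal;
    let mut out = ReasoningOutput::default();
    let mut current = String::new();

    for token in text.split_whitespace() {
        match token {
            "<think>" => state = ReasoningState::Thinking,
            "</think>" => {
                // Close the thinking section and switch to answering.
                out.thoughts.push(current.trim().to_string());
                current.clear();
                state = ReasoningState::Answering;
            }
            _ => match state {
                ReasoningState::Thinking => {
                    current.push_str(token);
                    current.push(' ');
                }
                ReasoningState::Normal | ReasoningState::Answering => {
                    out.answer.push_str(token);
                    out.answer.push(' ');
                }
            },
        }
    }
    out
}

fn main() {
    let generated = "<think> 12 * 7 = 84 , check: 84 / 7 = 12 </think> The answer is 84.";
    let parsed = parse_reasoning(generated);
    println!("thoughts: {:?}", parsed.thoughts);
    println!("answer: {}", parsed.answer.trim());
}
```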
## ⚙️ Implementation Notes
- The transformer forward pass is implemented and functional:
  - Embeddings → N × TransformerLayer → FinalNorm → LM Head
  - Standard attention uses RoPE and causal masking.
  - The output is flattened to `[seq_len * vocab_size]` for simple integration with training and demos (see the indexing sketch after this list).
- MLA and MoE are integrated into the Transformer stack via config toggles (Standard|MLA attention, Dense|MoE FFN), with support for mixed-depth patterns (e.g., periodic MLA/MoE) and telemetry (compression, routing).
- Generation includes sampling strategies (greedy, temperature, top-k) and incremental decoding with a per-layer KV cache; tokens/sec is reported in CLI and evaluation.
- Training code is intentionally conservative: the scaffolding and examples demonstrate the APIs, not production SGD for large checkpoints.
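Since the forward pass returns logits flattened to `[seq_len * vocab_size]`, downstream code indexes them row-major; a minimal, purely illustrative helper:

```rust
/// Purely illustrative: read the logit for position `t` and vocabulary id `v`
/// from a row-major buffer of shape [seq_len * vocab_size].
fn logit_at(logits: &[f32], vocab_size: usize, t: usize, v: usize) -> f32 {
    logits[t * vocab_size + v]
}

fn main() {
    // 2 positions, vocabulary of 3: row t = 1 starts at index 3.
    let logits = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6];
    assert!((logit_at(&logits, 3, 1, 2) - 0.6).abs() < 1e-6);
}
```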
## 🔬 Benchmarks & Evaluation
Use:
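Assuming the `eval` subcommand from the CLI entry point (release mode is noticeably faster):

```bash
cargo run --release -- eval
```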
This runs curated reasoning benchmarks via the `EvaluationHarness`:
- Mathematics (arithmetic, algebra, word problems, equations)
- Logical reasoning
- Programming logic
- Science reasoning
- General reasoning
Metrics reported:
- Accuracy proxy with numeric tolerance for math answers
- Reasoning depth, clarity, verification presence
- Tokens/sec and reasoning overhead
## 🚧 Roadmap
What's next (post v0.1):
- Inference
  - True streaming token-by-token callbacks
  - Beam search, top-p sampling, and repetition penalty with full history
- Architecture
  - Additional telemetry for MLA compression and MoE routing balance
  - Configurable dropouts, norms, activations; adapter/residual options for MLA paths
- Training
  - Extend the backward pass beyond the LM head/embeddings; broader parameter updates
  - Mixed precision and larger-batch experiments
- Evaluation
  - Exact-match datasets and code-execution-based tasks
  - Richer telemetry and standardized result schemas
- Tooling
  - More integration tests and benchmark coverage
## 🧰 CI/CD
GitHub Actions workflow runs on PRs and main:
- rustfmt and clippy (CI runs them warn-only; locally, `-D warnings` is recommended)
- build + unit and integration tests
- run examples and Criterion benchmarks (artifacts uploaded)
- coverage via tarpaulin (artifacts) and docs published (docs.rs per release, GitHub Pages via workflow)
## 🤝 Contributing
This is a research/education project. Issues and PRs are welcome. Please:
- Keep code modular, documented, and tested
- Maintain CI green (fmt, clippy, tests)
- Include examples or docs for new features
## 📄 License
MIT; see the crate manifest for details.
Made with insistence by Khaled.