tsetlin-rs
Highlights
Performance: 4.4x parallel speedup | 116x bitwise evaluation | O(1) inference
Memory: Zero-allocation SmallClause | Cache-aligned structures | no_std support
Correctness: 99%+ test coverage | Property-based testing | Deterministic seeds
Table of Contents
- Overview
- Installation
- Quick Start
- Models
- Parallel Training
- Clause Implementations
- Advanced Features
- Benchmarks
- Algorithm Reference
- API Reference
- Coverage
- In Memory of Mikhail Tsetlin
- References
- License
Overview
The Tsetlin Machine is a machine learning algorithm based on propositional logic and game theory. Unlike neural networks, it learns human-readable rules (conjunctions of literals) that can be directly interpreted and verified.
Key Properties:
| Property | Tsetlin Machine | Neural Network |
|---|---|---|
| Interpretability | Rules in propositional logic | Black box |
| Training | Reinforcement learning | Gradient descent |
| Inference | Boolean operations | Matrix multiplication |
| Hardware | FPGA/ASIC friendly | GPU optimized |
| Memory | O(clauses × features) | O(layers × neurons²) |
Glossary:
| Term | Definition | Category |
|---|---|---|
| AoS | Array of Structures — traditional object layout | opt |
| Bit-plane | Transposed bit representation for parallel operations | opt |
| Clause | Conjunction (AND) of literals; votes for/against a class | core |
| CoTM | Coalesced Tsetlin Machine | abbr |
| CTM | Convolutional Tsetlin Machine | abbr |
| False sharing | Cache line contention between CPU cores | opt |
| FPGA | Field-Programmable Gate Array | abbr |
| Literal | Boolean variable (xₖ) or its negation (¬xₖ) | core |
| MSB | Most Significant Bit — encodes automaton action | opt |
| Polarity | Clause vote direction: +1 or −1 | core |
| Ripple-carry | Bit-level addition/subtraction algorithm | opt |
| RNG | Random Number Generator | abbr |
| SIMD | Single Instruction Multiple Data | abbr |
| SoA | Structure of Arrays — cache-friendly layout | opt |
| Specificity (s) | Controls pattern granularity; higher s = more literals (finer patterns) | train |
| TA | Tsetlin Automaton | abbr |
| Threshold (T) | Controls feedback probability | train |
| TM | Tsetlin Machine | abbr |
core — fundamentals · train — training · opt — optimization · abbr — abbreviation
Installation
```toml
[dependencies]
tsetlin-rs = "0.2"
```
With parallel training and serialization:
```toml
[dependencies]
tsetlin-rs = { version = "0.2", features = ["parallel", "serde"] }
```
Feature Flags
| Feature | Default | Description |
|---|---|---|
| `std` | Yes | Standard library (disable for embedded) |
| `parallel` | No | Lock-free parallel training via rayon |
| `serde` | No | Serialization/deserialization |
| `simd` | No | SIMD optimization (requires nightly) |
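For embedded targets, a no_std build disables the default `std` feature (same dependency name as assumed above):
```toml
[dependencies]
tsetlin-rs = { version = "0.2", default-features = false }
```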
Quick Start
```rust
use tsetlin_rs::{Config, TsetlinMachine};

// Configure: 20 clauses, 2 features
let config = Config::builder()
    .clauses(20)
    .features(2)
    .build()
    .unwrap();

// Create machine with threshold T=15
let mut tm = TsetlinMachine::new(config, 15);

// XOR dataset
let x = vec![vec![0, 0], vec![0, 1], vec![1, 0], vec![1, 1]];
let y = vec![0, 1, 1, 0];

// Train for 200 epochs with seed=42
tm.fit(&x, &y, 200, 42);

// Evaluate
let accuracy = tm.evaluate(&x, &y);
println!("Accuracy: {:.1}%", accuracy * 100.0);

// Extract learned rules
for rule in tm.rules() {
    println!("{rule}");
}
```
Output:
```text
Accuracy: 100.0%
+: x₀ ∧ ¬x₁
+: ¬x₀ ∧ x₁
-: x₀ ∧ x₁
-: ¬x₀ ∧ ¬x₁
```
Models
Binary Classification — TsetlinMachine
Standard two-class classification with weighted clause voting.
```rust
use tsetlin_rs::{Config, TsetlinMachine};

let config = Config::builder().clauses(20).features(2).build().unwrap(); // values illustrative
let mut tm = TsetlinMachine::new(config, 15);
tm.fit(&x, &y, 200, 42);
let prediction = tm.predict(&sample); // 0 or 1
```
Multi-class Classification — MultiClass
One-vs-all ensemble of binary classifiers.
```rust
use tsetlin_rs::{Config, MultiClass};

let config = Config::builder().clauses(200).features(64).build().unwrap(); // values illustrative
let mut tm = MultiClass::new(config, 15, 10); // 10 classes
tm.fit(&x, &y, 100, 42);
let class = tm.predict(&sample); // 0..9
```
Regression — Regressor
Continuous output via clause voting with binning.
```rust
use tsetlin_rs::{Config, Regressor};

let config = Config::builder().clauses(100).features(32).build().unwrap(); // values illustrative
let mut reg = Regressor::new(config, 15);
reg.fit(&x, &y, 100, 42);
let value = reg.predict(&sample); // f32
```
Convolutional — Convolutional
2D patch extraction for image-like data.
```rust
use tsetlin_rs::{ConvConfig, Convolutional};

// The config body was elided in the source; the fields are indicative only.
let config = ConvConfig { /* image size, patch size, clauses, ... */ };
let mut ctm = Convolutional::new(config);
```
Parallel Training
This implementation provides lock-free parallel training based on the Massively Parallel and Asynchronous Tsetlin Machine Architecture (ICML 2021).
The Problem
Traditional TM training requires synchronization barriers:
┌────────────────────────────────────────────────────────────────────────┐
│ TRADITIONAL (SYNCHRONOUS) │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ [Clause 0] ──┐ │
│ [Clause 1] ──┼──► BARRIER ──► Sum Votes ──► BARRIER ──► Feedback │
│ [Clause 2] ──┤ ▲ ▲ │
│ ... │ │ │ │
│ [Clause N] ──┘ wait all wait all │
│ │
│ Problem: Threads idle during barriers = poor scaling │
└────────────────────────────────────────────────────────────────────────┘
The Solution: Async Local Voting Tallies
Each training sample maintains its own atomic vote accumulator. Clauses update tallies via fetch_add without synchronization.
┌────────────────────────────────────────────────────────────────────────┐
│ PARALLEL V1 (ASYNC TALLIES) │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ Sample 0: LocalTally ─────────────────────────────────────────────┐ │
│ Sample 1: LocalTally ─────────────────────────────────────────────┤ │
│ Sample N: LocalTally ─────────────────────────────────────────────┘ │
│ ▲ │
│ │ atomic fetch_add (no locks) │
│ │ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ Thread 0 │ Thread 1 │ Thread 2 │ ... │ Thread M │ │
│ │ [Sample 0] │ [Sample 1] │ [Sample 2] │ │ [Sample N] │ │
│ │ eval all │ eval all │ eval all │ │ eval all │ │
│ │ clauses │ clauses │ clauses │ │ clauses │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ Speedup: 1.5x (evaluation parallel, feedback sequential) │
└────────────────────────────────────────────────────────────────────────┘
Parallel V2: Fully Parallel Feedback
V2 parallelizes the feedback phase by processing clauses instead of samples:
┌────────────────────────────────────────────────────────────────────────┐
│ PARALLEL V2 (FULLY PARALLEL) │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ Key insight: Each clause's automata occupy DISJOINT memory regions │
│ │
│ states: [clause_0_automata | clause_1_automata | ... | clause_N] │
│ ▲ ▲ ▲ │
│ │ │ │ │
│ Thread 0 Thread 1 Thread N │
│ (no conflict) (no conflict) (no conflict) │
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ Clause 0 │ Clause 1 │ ... │ Clause N │ │
│ │ ├─ sample 0 │ ├─ sample 0 │ │ ├─ sample 0 │ │
│ │ ├─ sample 1 │ ├─ sample 1 │ │ ├─ sample 1 │ │
│ │ └─ sample N │ └─ sample N │ │ └─ sample N │ │
│ │ [feedback all] │ [feedback all] │ │ [feedback all] │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ Speedup: 4.4x (both evaluation AND feedback parallel) │
└────────────────────────────────────────────────────────────────────────┘
Usage
```rust
use tsetlin_rs::{ClauseBank, ParallelBatch};

// Create clause bank and batch
let mut bank = ClauseBank::new(100, 64); // 100 clauses, 64 features
let batch = ParallelBatch::new(&x, &y);

for epoch in 0..100 {
    // The training call was elided in the source; an epoch-level method is assumed here.
    bank.train_epoch(&batch, epoch as u64);
}

// Inference
let vote_sum = bank.sum_votes(&sample);
let prediction = if vote_sum >= 0.0 { 1 } else { 0 };
```
Implementation Details
LocalTally — Atomic Vote Accumulator
- CachePadded: Prevents false sharing between CPU cores
- AtomicI64: Lock-free concurrent updates
- Scaled integers: f32 weights are scaled to i64 (× 10,000) for atomic updates
```rust
use std::sync::atomic::Ordering;

const WEIGHT_SCALE: i64 = 10_000; // 4 decimal digits of precision

// Accumulate vote (thread-safe)
tally.fetch_add((vote * WEIGHT_SCALE as f32) as i64, Ordering::Relaxed);

// Read final sum
let sum = tally.load(Ordering::Acquire) as f32 / WEIGHT_SCALE as f32;
```
ParallelBatch — Training Data Container
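The batch pairs every training sample with its own tally. A minimal sketch of that layout, with assumed field names and crossbeam_utils::CachePadded standing in for the cache-line padding:
```rust
use std::sync::atomic::AtomicI64;
use crossbeam_utils::CachePadded;

// Sketch only: field names are assumptions, not the crate's exact definition.
struct LocalTally {
    sum: CachePadded<AtomicI64>, // padded to its own cache line: no false sharing
}

struct ParallelBatch {
    samples: Vec<Vec<u8>>,    // binarized inputs
    labels: Vec<u8>,          // class labels
    tallies: Vec<LocalTally>, // one accumulator per sample
}
```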
Memory Ordering
| Operation | Ordering | Rationale |
|---|---|---|
| `tally.fetch_add()` | `Relaxed` | Addition is commutative |
| `tally.load()` | `Acquire` | Synchronizes-with all prior stores |
| `tally.store(0)` | `Relaxed` | Reset before new epoch |
Performance Comparison
100 clauses, 64 features, 1 epoch:
| Samples | Sequential | V1 (par eval) | V2 (fully par) | Speedup |
|---|---|---|---|---|
| 100 | 2.93 ms | 2.00 ms | 0.81 ms | 3.6x |
| 500 | 14.96 ms | 10.11 ms | 3.39 ms | 4.4x |
| 1000 | 29.15 ms | 19.92 ms | 6.58 ms | 4.4x |
Original CUDA implementation reports 50x speedup on GPU.
Clause Implementations
This library provides four clause types optimized for different scenarios:
Decision Guide
┌─────────────────────────┐
│ Feature count known at │
│ compile time? │
└───────────┬─────────────┘
│
┌───────────┴───────────┐
│ │
YES NO
│ │
┌───────┴───────┐ │
│ N ≤ 64? │ ▼
└───────┬───────┘ ┌───────────────┐
│ │ Clause │
┌───────┴───────┐ │ (heap, serde) │
│ │ └───────────────┘
YES NO
│ │
▼ ▼
┌───────────────┐ ┌─────────────────────┐
│ SmallClause<N>│ │SmallBitwiseClause │
│ (fastest, no │ │<N,W> (bitwise ops, │
│ heap) │ │ no heap) │
└───────────────┘ └─────────────────────┘
1. Clause — Dynamic, Heap-Allocated
```rust
use tsetlin_rs::Clause;

let clause = Clause::new(64, 100, 1); // 64 features, 100 states, +1 polarity
let fires = clause.evaluate(&x);
let vote = clause.vote(&x); // polarity × fires × weight
```
Memory Layout:
┌─────────────────────────────────────────────────────────────┐
│ Clause (64-byte aligned) │
├─────────────────────────────────────────────────────────────┤
│ automata: Vec<Automaton> │ 24 bytes (ptr + len + cap) │
│ n_features: usize │ 8 bytes │
│ weight: f32 │ 4 bytes │
│ activations: u32 │ 4 bytes │
│ correct: u32 │ 4 bytes │
│ incorrect: u32 │ 4 bytes │
│ polarity: i8 │ 1 byte + 7 padding │
├─────────────────────────────────────────────────────────────┤
│ + heap: [Automaton; 2N] │ 8N bytes │
└─────────────────────────────────────────────────────────────┘
Best for: Unknown dimensions, serialization, large feature sets (1000+)
2. SmallClause<N> — Const Generic, Stack-Allocated
```rust
use tsetlin_rs::Clause16;

let clause: Clause16 = Clause16::new(100, 1); // states, polarity; values illustrative
let fires = clause.evaluate(&x); // x: [u8; 16]
```
Memory Layout:
┌─────────────────────────────────────────────────────────────┐
│ SmallClause<N> (stack-allocated) │
├─────────────────────────────────────────────────────────────┤
│ include: [Automaton; N] │ 4N bytes │
│ negated: [Automaton; N] │ 4N bytes │
│ weight: f32 │ 4 bytes │
│ activations: u32 │ 4 bytes │
│ correct: u32 │ 4 bytes │
│ incorrect: u32 │ 4 bytes │
│ polarity: i8 │ 1 byte │
├─────────────────────────────────────────────────────────────┤
│ Total: 8N + 17 bytes (NO HEAP) │
└─────────────────────────────────────────────────────────────┘
Type Aliases:
```rust
type Clause2 = SmallClause<2>;   // XOR problems
type Clause4 = SmallClause<4>;   // Iris dataset
type Clause8 = SmallClause<8>;
type Clause16 = SmallClause<16>;
type Clause32 = SmallClause<32>;
type Clause64 = SmallClause<64>; // Maximum before bitwise
```
Best for: Known dimensions, embedded/no_std, maximum performance
3. BitwiseClause — Packed Bitmask Evaluation
Evaluates 64 features per CPU instruction using bitwise AND.
```rust
use tsetlin_rs::{pack_input, BitwiseClause};

let mut clause = BitwiseClause::new(1024, 100, 1); // features, states, polarity; values illustrative
// ... training ...
clause.rebuild_masks(); // Compile automata → bitmasks
let packed = pack_input(&x); // Vec<u64>
let fires = clause.evaluate_packed(&packed);
```
Evaluation Algorithm:
Input: x = [1,0,1,1,0,0,1,0, ...] → packed: 0b01001101...
Include mask: 0b00001101 (positions where x_k must be 1)
Negated mask: 0b00100010 (positions where x_k must be 0)
Clause fires iff: (include & !x) | (negated & x) == 0
Example:
include = 0b00001101
negated = 0b00100010
x = 0b01001101
include & !x = 0b00001101 & 0b10110010 = 0b00000000
negated & x = 0b00100010 & 0b01001101 = 0b00000000
result = 0b00000000 → fires = true
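The same check as a small Rust helper (a sketch of the test above, not this crate's internal API):
```rust
/// A clause fires iff no required-1 bit is 0 and no required-0 bit is 1.
fn fires(include: u64, negated: u64, x: u64) -> bool {
    ((include & !x) | (negated & x)) == 0
}

// The worked example above:
assert!(fires(0b0000_1101, 0b0010_0010, 0b0100_1101));
```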
Performance:
64 features: 1 u64 AND → 43x faster than scalar
256 features: 4 u64 ANDs → 85x faster
1024 features: 16 u64 ANDs → 116x faster
4. SmallBitwiseClause<N, W> — Const Generic Bitwise
```rust
use tsetlin_rs::{pack_input_small, BitwiseClause64};

let mut clause: BitwiseClause64 = BitwiseClause64::new(100, 1); // values illustrative
clause.rebuild_masks();
let packed: [u64; 1] = pack_input_small(&x);
let fires = clause.evaluate_packed(&packed);
```
Type Aliases:
```rust
type BitwiseClause64 = SmallBitwiseClause<64, 1>;   // 1 word
type BitwiseClause128 = SmallBitwiseClause<128, 2>; // 2 words
type BitwiseClause256 = SmallBitwiseClause<256, 4>; // 4 words
```
Performance Comparison
| Type | 16 features | 64 features | 256 features | Heap |
|---|---|---|---|---|
| `Clause` | 23 ns | 100 ns | 300 ns | Yes |
| `SmallClause<N>` | 15 ns | 60 ns | 240 ns | No |
| `BitwiseClause` | 8 ns | 3 ns | 4.8 ns | Yes |
| `SmallBitwiseClause<N,W>` | 6 ns | 2.5 ns | 4 ns | No |
Advanced Features
Weighted Clauses
Clauses learn accuracy-based weights during training:
```rust
use tsetlin_rs::{AdvancedOptions, Config, TsetlinMachine};

// The options body was elided in the source; field names are indicative only.
let opts = AdvancedOptions { /* weighted clauses on, learning rate, bounds */ };
let config = Config::builder().clauses(100).features(16).build().unwrap(); // values illustrative
let mut tm = TsetlinMachine::with_advanced(config, 15, opts);
```
Weight Update Rule:
After each prediction:
if clause_correct:
weight += weight_lr × (weight_max - weight)
else:
weight -= weight_lr × (weight - weight_min)
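In Rust, the rule is a pair of pulls toward the weight bounds (a sketch; the names are illustrative, not the crate's API):
```rust
/// Pull the weight toward `w_max` after a correct vote,
/// toward `w_min` after an incorrect one.
fn update_weight(w: &mut f32, correct: bool, lr: f32, w_min: f32, w_max: f32) {
    if correct {
        *w += lr * (w_max - *w);
    } else {
        *w -= lr * (*w - w_min);
    }
}
```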
Adaptive Threshold
Dynamic T adjustment based on training error:
```rust
// Struct body elided in the source; adaptive-T fields are indicative only.
let opts = AdvancedOptions { /* target accuracy, t_min, t_max, t_lr */ };
```
Adaptation Rule:
if accuracy < target:
T += t_lr × (t_max - T) // Increase → more aggressive learning
else:
T -= t_lr × (T - t_min) // Decrease → more conservative
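The same shape as the weight update, applied to T (a sketch; names illustrative):
```rust
/// Raise T toward `t_max` while accuracy lags the target,
/// relax it toward `t_min` once the target is met.
fn adapt_threshold(t: &mut f32, accuracy: f32, target: f32, lr: f32, t_min: f32, t_max: f32) {
    if accuracy < target {
        *t += lr * (t_max - *t); // more aggressive learning
    } else {
        *t -= lr * (*t - t_min); // more conservative
    }
}
```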
Clause Pruning
Automatic reset of ineffective clauses:
```rust
// Struct body elided in the source; pruning fields are indicative only.
let opts = AdvancedOptions { /* prune_threshold, prune_weight */ };
```
Pruning Criteria:
if clause.activations < prune_threshold OR clause.weight < prune_weight:
reset_clause_to_initial_state()
Complete Example
```rust
use tsetlin_rs::{AdvancedOptions, Config, TsetlinMachine};

// Weights, adaptive T, and pruning enabled together (body elided in the source).
let opts = AdvancedOptions { /* ... */ };
let config = Config::builder().clauses(100).features(16).build().unwrap(); // values illustrative
let mut tm = TsetlinMachine::with_advanced(config, 15, opts);
tm.fit(&x, &y, 200, 42);
```
When to Use Advanced Features
| Scenario | Standard | Advanced | Winner |
|---|---|---|---|
| Clean data (0% noise) | 97.5% | 100% | Advanced |
| High noise (30%+) | 55.9% | 58.8% | Advanced |
| Large scale (32+ features) | 76.4% | 77.7% | Advanced |
| Complex patterns (parity) | 49.1% | 50.5% | Advanced |
| Simple patterns, low noise | 82.8% | 81.1% | Standard |
Benchmarks
Parallel Training
100 clauses, 64 features, 1 epoch:
| Samples | Sequential | Parallel v1 | Parallel v2 | Speedup |
|---|---|---|---|---|
| 100 | 2.93 ms | 2.00 ms | 0.81 ms | 3.6x |
| 500 | 14.96 ms | 10.11 ms | 3.39 ms | 4.4x |
| 1000 | 29.15 ms | 19.92 ms | 6.58 ms | 4.4x |
Bitwise Evaluation
| Features | Scalar | Bitwise | Speedup |
|---|---|---|---|
| 64 | 82 ns | 1.9 ns | 43x |
| 256 | 342 ns | 4 ns | 85x |
| 1024 | 1.32 µs | 11 ns | 116x |
Storage Layouts
| Clauses | AoS (Vec) | SoA (ClauseBank) | Speedup |
|---|---|---|---|
| 50 | 3.64 µs | 3.50 µs | 1.04x |
| 100 | 8.55 µs | 8.39 µs | 1.02x |
| 200 | 18.50 µs | 16.02 µs | 1.15x |
BitPlaneBank (Parallel State Updates)
| Operation | ClauseBank | BitPlaneBank | Speedup |
|---|---|---|---|
| Type II (1024 features) | 1.48 µs | 759 ns | ~2x |
| Type II (256 features) | 426 ns | 330 ns | ~1.3x |
| Type I (1024 features) | 5.05 µs | 5.25 µs | ~1x |
Run Benchmarks
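The suite runs through Cargo's standard bench harness:
```sh
cargo bench
```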
Algorithm Reference
Finite State Machine
A Tsetlin Automaton (TA) is a 2N-state FSM that learns a binary action through reinforcement:
EXCLUDE ZONE │ INCLUDE ZONE
◄─────────────────────────────│───────────────────────────────►
│
┌───┬───┬───┬───┬─────────────┼───┬───────────┬───┬───┬───┐
│ 1 │ 2 │ 3 │...│ N │N+1│ ... │...│2N-1│2N │
└───┴───┴───┴───┴─────────────┼───┴───────────┴───┴───┴───┘
▲ │ ▲
│ │ │
floor threshold ceiling
(min) (max)
Action = EXCLUDE │ Action = INCLUDE
if state ∈ [1, N] │ if state ∈ [N+1, 2N]
State Transitions:
| Feedback | Current State | Action |
|---|---|---|
| Reward | s ∈ [1, N] | s → max(1, s-1) |
| Reward | s ∈ [N+1, 2N] | s → min(2N, s+1) |
| Penalty | s ∈ [1, N] | s → s+1 (may cross into INCLUDE) |
| Penalty | s ∈ [N+1, 2N] | s → s−1 (may cross into EXCLUDE) |
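A compact sketch of one transition under these rules (not the crate's internal representation):
```rust
/// One TA step over states 1..=2N: rewards deepen the current action,
/// penalties walk toward, and across, the boundary between N and N+1.
fn step(state: u32, n: u32, reward: bool) -> u32 {
    let include = state > n;
    match (reward, include) {
        (true, false)  => state.saturating_sub(1).max(1), // deeper into EXCLUDE
        (true, true)   => (state + 1).min(2 * n),         // deeper into INCLUDE
        (false, false) => state + 1,                      // toward INCLUDE
        (false, true)  => state - 1,                      // toward EXCLUDE
    }
}
```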
Conjunction of Literals
A clause is a conjunction (AND) of literals that votes for or against a class:
Clause_j = L₁ ∧ L₂ ∧ ... ∧ Lₘ
where each Lᵢ ∈ {xₖ, ¬xₖ} for some feature k
Example (XOR):
Clause₊₁: x₀ ∧ ¬x₁ (fires when x=[1,0])
Clause₊₂: ¬x₀ ∧ x₁ (fires when x=[0,1])
Clause₋₁: x₀ ∧ x₁ (fires when x=[1,1])
Clause₋₂: ¬x₀ ∧ ¬x₁ (fires when x=[0,0])
Evaluation: a clause outputs 1 iff every included literal is satisfied by the input, 0 otherwise.
Weighted Majority Vote
Classification is determined by summing weighted clause votes:
        C
v(x) =  Σ  [polarityⱼ × weightⱼ × clauseⱼ(x)]
       j=1
ŷ = { 1 if v(x) ≥ 0
{ 0 if v(x) < 0
Polarity Assignment:
- Even-indexed clauses: polarity = +1 (vote for class 1)
- Odd-indexed clauses: polarity = -1 (vote for class 0)
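The same rule as a small Rust sketch, with clause outputs and weights assumed precomputed:
```rust
/// Weighted majority vote: even-indexed clauses vote +1, odd-indexed −1.
fn predict(clauses: &[(f32, bool)]) -> u8 {
    let v: f32 = clauses
        .iter()
        .enumerate()
        .map(|(j, &(weight, fired))| {
            let polarity = if j % 2 == 0 { 1.0 } else { -1.0 };
            polarity * weight * (fired as u8 as f32)
        })
        .sum();
    if v >= 0.0 { 1 } else { 0 }
}
```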
Type I Feedback — Reinforcing Target Patterns
Applied when the sample belongs to the clause's target class.
When clause FIRES (satisfies input):
| Input | Literal | Probability | Action |
|---|---|---|---|
| xₖ = 1 | xₖ | (s-1)/s | Increment (strengthen) |
| xₖ = 1 | ¬xₖ | 1/s | Decrement (weaken) |
| xₖ = 0 | xₖ | 1/s | Decrement (weaken) |
| xₖ = 0 | ¬xₖ | (s-1)/s | Increment (strengthen) |
When clause DOES NOT FIRE:
All automata decremented with probability 1/s (drift toward exclusion).
Effect: Reinforces literals that match the input pattern.
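A sketch of the fired-clause case (probabilities from the table above; function and parameter names are illustrative):
```rust
use rand::Rng;

/// Type I update for one literal of a clause that fired: strengthen
/// matching literals with probability (s-1)/s, weaken the rest with 1/s.
fn type_i_fired(state: &mut u32, n: u32, matches: bool, s: f32, rng: &mut impl Rng) {
    if matches {
        if rng.gen::<f32>() < (s - 1.0) / s {
            *state = (*state + 1).min(2 * n); // toward INCLUDE
        }
    } else if rng.gen::<f32>() < 1.0 / s {
        *state = state.saturating_sub(1).max(1); // toward EXCLUDE
    }
}
```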
Type II Feedback — Blocking False Positives
Applied when a clause fires for the WRONG class.
Algorithm:
if the clause fires on a sample of the opposite class:
    for each literal Lₖ that evaluates to 0 on input x:
        if Lₖ is currently EXCLUDED:
            increment its automaton (toward INCLUDE)
Effect: Adds a contradicting literal that will fail on this input.
Complete Training Loop
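One epoch of standard TM training, in the same pseudocode style (condensed from Granmo 2018; a sketch rather than this crate's exact control flow):
for each sample (x, y):
    v = clamp(vote_sum(x), -T, T)
    p = (T - v) / 2T if y == 1, else (T + v) / 2T
    for each clause j, with probability p:
        if polarity_j matches y: apply Type I feedback (reinforce)
        else: apply Type II feedback (block false positives)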
Parameter Effects
| Parameter | Low Value | High Value |
|---|---|---|
| Clauses (C) | Fewer patterns, faster | More patterns, slower |
| States (N) | Fast adaptation, unstable | Slow adaptation, stable |
| Threshold (T) | Conservative learning | Aggressive learning |
| Specificity (s) | General patterns (fewer literals) | Specific patterns (more literals) |
API Reference
Models
| Type | Description |
|---|---|
| `TsetlinMachine` | Binary classification |
| `MultiClass` | Multi-class (one-vs-all) |
| `Regressor` | Continuous output |
| `Convolutional` | 2D patch extraction |
Clauses
| Type | Heap | Best For |
|---|---|---|
| `Clause` | Yes | Dynamic dimensions, serde |
| `SmallClause<N>` | No | N ≤ 64, maximum speed |
| `BitwiseClause` | Yes | N ≥ 64, 25-92x speedup |
| `SmallBitwiseClause<N,W>` | No | 64-256 features |
Storage
| Type | Description |
|---|---|
| `ClauseBank` | SoA layout for bulk operations |
| `BitPlaneBank` | Bit-plane layout for parallel updates |
Parallel Training
| Type | Description |
|---|---|
| `LocalTally` | Cache-padded atomic accumulator |
| `ParallelBatch` | Training batch with tallies |
Configuration
| Type | Description |
|---|---|
| `Config` | Basic TM configuration |
| `ConfigBuilder` | Fluent configuration builder |
| `ConvConfig` | Convolutional TM configuration |
| `AdvancedOptions` | Weights, adaptive T, pruning |
| `FitOptions` | Early stopping, callbacks |
Utilities
| Function | Description |
|---|---|
| `pack_input(&[u8])` | Pack input for `BitwiseClause` |
| `pack_input_small(&[u8])` | Pack for `SmallBitwiseClause` |
| `pack_batch(&[Vec<u8>])` | Pack multiple inputs |
| `rng_from_seed(u64)` | Deterministic RNG |
| `rng_from_entropy()` | Random RNG |
Coverage
Coverage visualizations: sunburst, grid, and icicle graphs.
In Memory of Mikhail Tsetlin
Mikhail Lvovich Tsetlin (Михаил Львович Цетлин, 1924–1966) — Soviet mathematician and one of the founders of cybernetics in the USSR.
A veteran of World War II who served as a scout and tank gunner, he later became a brilliant scientist. Working alongside I.M. Gelfand, he pioneered the theory of learning automata — finite state machines that learn optimal behavior through interaction with the environment.
His work on collective automata behavior laid the theoretical foundation for what we now call the Tsetlin Machine. Despite his life being cut short at 41, his ideas continue to influence machine learning research today.
This library honors his legacy by bringing his concepts to modern systems programming.
References
Original Paper
The Tsetlin Machine - A Game Theoretic Bandit Driven Approach to Optimal Pattern Recognition with Propositional Logic. Ole-Christoffer Granmo, 2018. arXiv:1804.01508
Parallel Training
Massively Parallel and Asynchronous Tsetlin Machine Architecture Supporting Almost Constant-Time Scaling. K. Darshana Abeyrathna et al., ICML 2021. arXiv:2009.04861
Implementations
| Repository | Description |
|---|---|
| cair/TsetlinMachineC | Original C implementation |
| cair/pyTsetlinMachine | Python library |
| cair/PyTsetlinMachineCUDA | CUDA implementation |
| cair/tmu | Unified TM framework |
License
MIT