# pubsat architecture
**Status:** Living document — 2026-05-11
**Owner:** julian.bravard@gmail.com
**Targets:** 0.1.x stable surface, 1.0 eventual API freeze
This document captures the architectural decisions behind
`pubsat`, the layering of its modules, and the hot-path lessons
that informed the trait shapes. New contributors should be able to
pick the work up cold by reading this plus the README.
## What pubsat is
`pubsat` is a **backend-agnostic, ecosystem-agnostic SAT-based
dependency resolver** for semver-shaped constraint problems.
Callers describe their world to it through trait implementations
(`PackageRegistry`) and a root [`VersionMetadata`]; pubsat returns
a [`Resolution`] of `(name, version)` pairs satisfying the
constraints, or a structured conflict report.
The same engine should serve:
- A JS-style package manager
- A capability-bound or private artifact registry
- A build-tool internal resolver
- An offline dependency-audit tool
- Anywhere semver-shaped constraints meet a "pick the right
versions" problem
## Why SAT (and not PubGrub)
Cargo and uv use [PubGrub][pubgrub], a conflict-driven
backtracking algorithm designed specifically for version
resolution. `pubsat` lowers the problem to CNF and uses a
general-purpose SAT solver (Varisat by default) instead. The
trade-off:
[pubgrub]: https://github.com/pubgrub-rs/pubgrub
**Why SAT:**
- **Soft preferences are clean.** Version-preference ("newest
wins") is expressed as a set of *assumptions* on the solver,
not as part of the algorithm's branching strategy. On UNSAT
with all preferences, the resolver drops preferences one at a
time in deterministic order and lets the solver re-check.
PubGrub bakes its preference order into the search; ours is
swappable.
- **Adding new constraint kinds is mechanical.** Workspace
protocols, hoisting preferences, security exclusions,
conditional deps — anything that lowers to CNF plugs in as
another `Constraint` variant or an extra clause. PubGrub
requires reshaping the term language for each new constraint
kind.
- **Conflict reporting is dataflow, not search-history.** A CNF
UNSAT core gives a minimal clause set the caller can walk back
to the originating `Constraint`s and produce a structured
conflict report that doesn't depend on which order the solver
tried things.
**Trade-offs we accept:**
- SAT is asymptotically worse than PubGrub on pathological inputs
(NP-complete in the worst case). In practice, real npm-shaped
dep graphs solve in 200–500 ms on a modern machine — validated
against constraint sets like `express` (68 transitive deps),
`lodash`, and `chalk`.
- CNF encodings are less human-readable than PubGrub's
incompatibility terms when something does go wrong; the
caller-side conflict-report layer is what compensates.
## Layering
```
┌─────────────────────────────────┐
│ caller code │
│ (implements PackageRegistry, │
│ constructs root manifest) │
└──────────────┬──────────────────┘
│
▼
┌───────────────────────────────────────┐
│ pubsat │
│ │
│ resolver (orchestrator) │
│ ▲ │
│ builder (graph builder, │
│ ▲ concurrent fetches) │
│ graph (dep graph) │
│ ▲ │
│ encoding (Constraint → CNF) │
│ ▲ │
│ constraint (vocabulary) │
│ ▲ │
│ registry (registry abstraction)│
│ ▲ │
│ sat (solver abstraction) │
│ ▲ │
│ version (semver ranges) │
│ error (cross-cutting) │
└───────────────────────────────────────┘
```
Each module can be used standalone — consume just `version` to
parse npm-flavour range strings, or stop at `encoding` to produce
a SAT problem that something else solves. The full pipeline runs
through `resolver`.
## What's in 0.1.x
- **`version`** — node-semver-compatible range parser. Caret,
tilde, hyphen ranges, partial versions, `>=`/`<=`/`>`/`<` with
optional whitespace, x-ranges, unions, dist tags. Permissive
enough to handle the punctuation drift real npm registries
contain (e.g. `>= 2.1.2 < 3.0.0`).
- **`constraint`** — ecosystem-independent vocabulary. Nine
variants: `Dependency`, `Conflict`, `AtMostOne`, `AtLeastOne`,
`Exactly`, `Exclude`, `Implies`, `OptionalDependency`,
`ConditionalDependency`. Ecosystem-specific concepts (npm's
`workspace:` protocol, pnpm-style hoisting, Cargo features)
live in caller crates and lower themselves to these primitives.
- **`registry`** — `PackageRegistry` trait with a pubsat-owned
`VersionMetadata` boundary type. Registries lower their native
range syntax to `VersionSet` at the trait edge, so the
resolver doesn't carry ecosystem types. Ships with
`CachedRegistry` (single-flighted via `OnceCell` across all
three reads — the failure-mode is the "all reads must intercept"
hot-path note below) and `MockRegistry`.
- **`graph`** — `DependencyGraph<DependencyNode, DependencyEdge>`,
petgraph-backed. `DependencyType` distinguishes runtime, dev,
peer, and optional edges. Cycle detection, topological sort,
BFS depth analysis, DOT export.
- **`encoding`** — `ConstraintEncoder` that lowers `Constraint`s
to CNF. Pairwise at-most-one for `n ≤ 4`, sequential/ladder
encoding for larger cardinalities (O(n) clauses with auxiliary
variables). Primable local-version cache so the encoding phase
stays in-process — see the hot-path notes.
- **`builder`** — `DependencyGraphBuilder` that walks a registry
to build a graph from a root `VersionMetadata`. Per-parent dep
fetches fan out concurrently via `buffer_unordered`.
- **`resolver`** — `DependencyResolver` orchestrating the full
pipeline. Greedy newest-version preferences via SAT assumptions
with per-package fallback on UNSAT.
- **`sat`** — backend-agnostic SAT problem types and the
`SatSolver` trait. A `VarisatSolver` impl ships behind the
default `varisat-solver` feature with both one-shot and
incremental solving. A `cadical-solver` placeholder feature is
reserved so adding a second backend later is a small change.
`BasicSolver` (pure-Rust DPLL) is included for testing.
- **`error`** — `ResolutionError`, `SatError`, `EncodingError`
and their `Result` aliases. The structured-conflict surface
(`ConflictDetail`, `ConflictReason`) is ready for caller-side
conflict reporting.
## What's intentionally out of scope
The following live in caller code (or in companion crates that
might appear post-0.1.x):
### Workspace and hoisting
npm-style workspaces (`workspace:` protocol, hoisting
preferences, install-tree shape) are caller-domain concerns. The
constraint vocabulary in `pubsat::constraint` deliberately does
*not* include `Workspace*` or `Hoisting*` variants — they were in
the prior-art resolver this crate's design grew out of and were
dropped here because they baked npm-shaped semantics into the
vocabulary. Callers needing those should:
- Encode workspace constraints as standard `Constraint::Exactly`
/ `Constraint::Implies` / `Constraint::Exclude` clauses against
the SAT abstraction directly.
- Or maintain a separate `pubsat-npm` (or similar) crate with the
ecosystem-specific lowering.
### Lockfile bridging
Reading and writing lockfile formats (npm's `package-lock.json`,
yarn's `yarn.lock`, pnpm's `pnpm-lock.yaml`, Cargo's `Cargo.lock`,
etc.) is ecosystem-specific. pubsat returns a [`Resolution`]
which is the input to whatever format the caller wants to emit.
### Install / fetch / build pipeline
pubsat resolves what versions to use. Downloading tarballs,
verifying signatures, extracting files, running install hooks — all
caller concerns. The boundary between pubsat and an install
pipeline is the `Resolution` struct.
## Hot paths (and why they matter)
These are bugs that bit in the prior-art resolver this design
grew out of. The patterns are preserved in pubsat's
implementation; anyone touching the encoder, builder, or
`CachedRegistry` should understand the failure modes.
### Prime the encoder's version cache before encoding starts
`ConstraintEncoder` consults a `local_versions` map (set via
[`set_local_versions`]) before falling back to the registry. The
resolver populates this from its own version cache before calling
`encode_constraint`. If you skip the priming and the encoder
falls through to `registry.get_satisfying_versions(...)` for every
`(package, version, dep_edge)` tuple, an `express`-sized graph
explodes into ~7000 redundant calls — a multi-minute encoding
phase that should be in-process and sub-second. The resolver in
this crate does the priming correctly; callers building their own
encoder pipelines need to remember.
### `CachedRegistry` intercepts *all* three reads
`CachedRegistry` wraps an inner `PackageRegistry`. The cache is
correct only because every read goes through the wrapper:
`get_package_versions`, `get_satisfying_versions`, and
`get_version_metadata`. The earlier shape (in the prior-art
resolver) where `get_package_versions` went straight to the inner
registry bypassed the cache and silently double-counted network
fetches, defeating the wrapper. The trait's default
implementations route through the primary read so impls that
override one method without the others don't break the invariant.
### Greedy fallback for version preferences
When using SAT assumptions to pin "newest version" preferences,
try all-at-once first (the fast path). On UNSAT, drop preferences
one at a time in *deterministic order*, keeping each only if SAT
remains. The original bug was all-or-nothing fallback — dropping
every preference on a single conflict, leading to non-newest
selections everywhere. The greedy fix is what the `resolver`
module implements.
### HashMap iteration order must be sorted before SAT encoding
Varisat's branching is deterministic given a deterministic
variable-allocation order. Iterating `HashMap`s directly makes
the resolution output flap between runs. The encoder and resolver
both sort package names before iteration so identical inputs
produce identical outputs.
### `^0.x.y` semantics differ by component
Node-semver caret has three regimes:
- `^1.2.3` ⇒ `>=1.2.3 <2.0.0` (major > 0)
- `^0.2.3` ⇒ `>=0.2.3 <0.3.0` (major=0, minor > 0)
- `^0.0.3` ⇒ exact `0.0.3` (major=0, minor=0)
The check in `VersionSet::Caret::satisfies` handles all three;
easy to break in a naïve refactor.
## Comparison with prior art
| Algorithm | SAT (Varisat) | PubGrub | Backtracking |
| Backend-swappable | yes | no | no |
| Ecosystem-agnostic | yes | yes | Cargo-specific |
| Soft preferences | assumptions | search order | search order |
| Workspace-aware | caller | caller | built-in |
| Ships registry + graph + builder + resolver | yes | partial | yes |
## Roadmap
| `version` + `constraint` + `sat` + `error` | done |
| `registry` + boundary types | done |
| `graph` + `encoding` | done |
| `builder` (concurrent fetches) | done |
| `resolver` (greedy preferences) | done |
| `examples/` end-to-end with `MockRegistry` | done |
| `VarisatSolver` incremental solving | done |
| 0.1.0 crates.io publish | next |
| `cadical-solver` backend | once a Rust binding exposes the surface |
| 1.0 API freeze | after the API has carried real consumers for a while |
Breaking changes between 0.1.x minor versions will be called out
in `CHANGELOG.md` with migration notes; the CNF representation
(literal/clause/problem shapes) is the most stable, and the
higher-level `PackageRegistry` trait and `Constraint` vocabulary
are the layers most likely to grow.
## See also
- [README.md](../../README.md) — landing page + 0.1.x scope
- [pubgrub-rs](https://github.com/pubgrub-rs/pubgrub) — alternative algorithm
- [Varisat](https://crates.io/crates/varisat) — current default SAT backend
[`VersionMetadata`]: ../../src/registry.rs
[`Resolution`]: ../../src/resolver.rs
[`set_local_versions`]: ../../src/encoding.rs