pubsat 0.1.0

Building blocks for SAT-based dependency resolvers: a node-semver-compatible range parser, an ecosystem-independent constraint vocabulary, and a backend-agnostic SAT problem/solver abstraction with a Varisat backend.
Documentation
# pubsat architecture

**Status:** Living document — 2026-05-11
**Owner:** julian.bravard@gmail.com
**Targets:** 0.1.x stable surface, 1.0 eventual API freeze

This document captures the architectural decisions behind
`pubsat`, the layering of its modules, and the hot-path lessons
that informed the trait shapes. New contributors should be able to
pick the work up cold by reading this plus the README.

## What pubsat is

`pubsat` is a **backend-agnostic, ecosystem-agnostic SAT-based
dependency resolver** for semver-shaped constraint problems.
Callers describe their world to it through trait implementations
(`PackageRegistry`) and a root [`VersionMetadata`]; pubsat returns
a [`Resolution`] of `(name, version)` pairs satisfying the
constraints, or a structured conflict report.

The same engine should serve:

- A JS-style package manager
- A capability-bound or private artifact registry
- A build-tool internal resolver
- An offline dependency-audit tool
- Anywhere semver-shaped constraints meet a "pick the right
  versions" problem

## Why SAT (and not PubGrub)

Cargo and uv use [PubGrub][pubgrub], a conflict-driven
backtracking algorithm designed specifically for version
resolution. `pubsat` lowers the problem to CNF and uses a
general-purpose SAT solver (Varisat by default) instead. The
trade-off:

[pubgrub]: https://github.com/pubgrub-rs/pubgrub

**Why SAT:**

- **Soft preferences are clean.** Version-preference ("newest
  wins") is expressed as a set of *assumptions* on the solver,
  not as part of the algorithm's branching strategy. On UNSAT
  with all preferences, the resolver drops preferences one at a
  time in deterministic order and lets the solver re-check.
  PubGrub bakes its preference order into the search; ours is
  swappable.
- **Adding new constraint kinds is mechanical.** Workspace
  protocols, hoisting preferences, security exclusions,
  conditional deps — anything that lowers to CNF plugs in as
  another `Constraint` variant or an extra clause. PubGrub
  requires reshaping the term language for each new constraint
  kind.
- **Conflict reporting is dataflow, not search-history.** A CNF
  UNSAT core gives a minimal clause set the caller can walk back
  to the originating `Constraint`s and produce a structured
  conflict report that doesn't depend on which order the solver
  tried things.

**Trade-offs we accept:**

- SAT is asymptotically worse than PubGrub on pathological inputs
  (NP-complete in the worst case). In practice, real npm-shaped
  dep graphs solve in 200–500 ms on a modern machine — validated
  against constraint sets like `express` (68 transitive deps),
  `lodash`, and `chalk`.
- CNF encodings are less human-readable than PubGrub's
  incompatibility terms when something does go wrong; the
  caller-side conflict-report layer is what compensates.

## Layering

```
                ┌─────────────────────────────────┐
                │  caller code                    │
                │  (implements PackageRegistry,   │
                │   constructs root manifest)     │
                └──────────────┬──────────────────┘
            ┌───────────────────────────────────────┐
            │             pubsat                    │
            │                                       │
            │   resolver      (orchestrator)        │
            │      ▲                                │
            │   builder       (graph builder,       │
            │      ▲           concurrent fetches)  │
            │   graph         (dep graph)           │
            │      ▲                                │
            │   encoding      (Constraint → CNF)    │
            │      ▲                                │
            │   constraint    (vocabulary)          │
            │      ▲                                │
            │   registry      (registry abstraction)│
            │      ▲                                │
            │   sat           (solver abstraction)  │
            │      ▲                                │
            │   version       (semver ranges)       │
            │   error         (cross-cutting)       │
            └───────────────────────────────────────┘
```

Each module can be used standalone — consume just `version` to
parse npm-flavour range strings, or stop at `encoding` to produce
a SAT problem that something else solves. The full pipeline runs
through `resolver`.

## What's in 0.1.x

- **`version`** — node-semver-compatible range parser. Caret,
  tilde, hyphen ranges, partial versions, `>=`/`<=`/`>`/`<` with
  optional whitespace, x-ranges, unions, dist tags. Permissive
  enough to handle the punctuation drift real npm registries
  contain (e.g. `>= 2.1.2 < 3.0.0`).
- **`constraint`** — ecosystem-independent vocabulary. Nine
  variants: `Dependency`, `Conflict`, `AtMostOne`, `AtLeastOne`,
  `Exactly`, `Exclude`, `Implies`, `OptionalDependency`,
  `ConditionalDependency`. Ecosystem-specific concepts (npm's
  `workspace:` protocol, pnpm-style hoisting, Cargo features)
  live in caller crates and lower themselves to these primitives.
- **`registry`**`PackageRegistry` trait with a pubsat-owned
  `VersionMetadata` boundary type. Registries lower their native
  range syntax to `VersionSet` at the trait edge, so the
  resolver doesn't carry ecosystem types. Ships with
  `CachedRegistry` (single-flighted via `OnceCell` across all
  three reads — the failure-mode is the "all reads must intercept"
  hot-path note below) and `MockRegistry`.
- **`graph`**`DependencyGraph<DependencyNode, DependencyEdge>`,
  petgraph-backed. `DependencyType` distinguishes runtime, dev,
  peer, and optional edges. Cycle detection, topological sort,
  BFS depth analysis, DOT export.
- **`encoding`**`ConstraintEncoder` that lowers `Constraint`s
  to CNF. Pairwise at-most-one for `n ≤ 4`, sequential/ladder
  encoding for larger cardinalities (O(n) clauses with auxiliary
  variables). Primable local-version cache so the encoding phase
  stays in-process — see the hot-path notes.
- **`builder`**`DependencyGraphBuilder` that walks a registry
  to build a graph from a root `VersionMetadata`. Per-parent dep
  fetches fan out concurrently via `buffer_unordered`.
- **`resolver`**`DependencyResolver` orchestrating the full
  pipeline. Greedy newest-version preferences via SAT assumptions
  with per-package fallback on UNSAT.
- **`sat`** — backend-agnostic SAT problem types and the
  `SatSolver` trait. A `VarisatSolver` impl ships behind the
  default `varisat-solver` feature with both one-shot and
  incremental solving. A `cadical-solver` placeholder feature is
  reserved so adding a second backend later is a small change.
  `BasicSolver` (pure-Rust DPLL) is included for testing.
- **`error`**`ResolutionError`, `SatError`, `EncodingError`
  and their `Result` aliases. The structured-conflict surface
  (`ConflictDetail`, `ConflictReason`) is ready for caller-side
  conflict reporting.

## What's intentionally out of scope

The following live in caller code (or in companion crates that
might appear post-0.1.x):

### Workspace and hoisting

npm-style workspaces (`workspace:` protocol, hoisting
preferences, install-tree shape) are caller-domain concerns. The
constraint vocabulary in `pubsat::constraint` deliberately does
*not* include `Workspace*` or `Hoisting*` variants — they were in
the prior-art resolver this crate's design grew out of and were
dropped here because they baked npm-shaped semantics into the
vocabulary. Callers needing those should:

- Encode workspace constraints as standard `Constraint::Exactly`
  / `Constraint::Implies` / `Constraint::Exclude` clauses against
  the SAT abstraction directly.
- Or maintain a separate `pubsat-npm` (or similar) crate with the
  ecosystem-specific lowering.

### Lockfile bridging

Reading and writing lockfile formats (npm's `package-lock.json`,
yarn's `yarn.lock`, pnpm's `pnpm-lock.yaml`, Cargo's `Cargo.lock`,
etc.) is ecosystem-specific. pubsat returns a [`Resolution`]
which is the input to whatever format the caller wants to emit.

### Install / fetch / build pipeline

pubsat resolves what versions to use. Downloading tarballs,
verifying signatures, extracting files, running install hooks — all
caller concerns. The boundary between pubsat and an install
pipeline is the `Resolution` struct.

## Hot paths (and why they matter)

These are bugs that bit in the prior-art resolver this design
grew out of. The patterns are preserved in pubsat's
implementation; anyone touching the encoder, builder, or
`CachedRegistry` should understand the failure modes.

### Prime the encoder's version cache before encoding starts

`ConstraintEncoder` consults a `local_versions` map (set via
[`set_local_versions`]) before falling back to the registry. The
resolver populates this from its own version cache before calling
`encode_constraint`. If you skip the priming and the encoder
falls through to `registry.get_satisfying_versions(...)` for every
`(package, version, dep_edge)` tuple, an `express`-sized graph
explodes into ~7000 redundant calls — a multi-minute encoding
phase that should be in-process and sub-second. The resolver in
this crate does the priming correctly; callers building their own
encoder pipelines need to remember.

### `CachedRegistry` intercepts *all* three reads

`CachedRegistry` wraps an inner `PackageRegistry`. The cache is
correct only because every read goes through the wrapper:
`get_package_versions`, `get_satisfying_versions`, and
`get_version_metadata`. The earlier shape (in the prior-art
resolver) where `get_package_versions` went straight to the inner
registry bypassed the cache and silently double-counted network
fetches, defeating the wrapper. The trait's default
implementations route through the primary read so impls that
override one method without the others don't break the invariant.

### Greedy fallback for version preferences

When using SAT assumptions to pin "newest version" preferences,
try all-at-once first (the fast path). On UNSAT, drop preferences
one at a time in *deterministic order*, keeping each only if SAT
remains. The original bug was all-or-nothing fallback — dropping
every preference on a single conflict, leading to non-newest
selections everywhere. The greedy fix is what the `resolver`
module implements.

### HashMap iteration order must be sorted before SAT encoding

Varisat's branching is deterministic given a deterministic
variable-allocation order. Iterating `HashMap`s directly makes
the resolution output flap between runs. The encoder and resolver
both sort package names before iteration so identical inputs
produce identical outputs.

### `^0.x.y` semantics differ by component

Node-semver caret has three regimes:

- `^1.2.3``>=1.2.3 <2.0.0` (major > 0)
- `^0.2.3``>=0.2.3 <0.3.0` (major=0, minor > 0)
- `^0.0.3` ⇒ exact `0.0.3` (major=0, minor=0)

The check in `VersionSet::Caret::satisfies` handles all three;
easy to break in a naïve refactor.

## Comparison with prior art

| Property                       | pubsat 0.1.x         | pubgrub-rs    | cargo's resolver |
|--------------------------------|----------------------|---------------|------------------|
| Algorithm                      | SAT (Varisat)        | PubGrub       | Backtracking     |
| Backend-swappable              | yes                  | no            | no               |
| Ecosystem-agnostic             | yes                  | yes           | Cargo-specific   |
| Soft preferences               | assumptions          | search order  | search order     |
| Workspace-aware                | caller               | caller        | built-in         |
| Ships registry + graph + builder + resolver | yes     | partial       | yes              |

## Roadmap

| Milestone                                       | Status |
|-------------------------------------------------|--------|
| `version` + `constraint` + `sat` + `error`      | done   |
| `registry` + boundary types                     | done   |
| `graph` + `encoding`                            | done   |
| `builder` (concurrent fetches)                  | done   |
| `resolver` (greedy preferences)                 | done   |
| `examples/` end-to-end with `MockRegistry`      | done   |
| `VarisatSolver` incremental solving             | done   |
| 0.1.0 crates.io publish                         | next   |
| `cadical-solver` backend                        | once a Rust binding exposes the surface |
| 1.0 API freeze                                  | after the API has carried real consumers for a while |

Breaking changes between 0.1.x minor versions will be called out
in `CHANGELOG.md` with migration notes; the CNF representation
(literal/clause/problem shapes) is the most stable, and the
higher-level `PackageRegistry` trait and `Constraint` vocabulary
are the layers most likely to grow.

## See also

- [README.md]../../README.md — landing page + 0.1.x scope
- [pubgrub-rs]https://github.com/pubgrub-rs/pubgrub — alternative algorithm
- [Varisat]https://crates.io/crates/varisat — current default SAT backend

[`VersionMetadata`]: ../../src/registry.rs
[`Resolution`]: ../../src/resolver.rs
[`set_local_versions`]: ../../src/encoding.rs