bevy_autodiff
Automatic differentiation using Bevy ECS as the computational graph backend.
Write normal Rust functions, get exact derivatives — including gradients, Jacobians, and Hessians. GPU batch evaluation, WGSL code generation, and native Bevy ECS integration included.
Why bevy_autodiff?
Arbitrary-order exact derivatives. Most Rust AD crates provide first-order gradients. bevy_autodiff computes exact partial derivatives to any order — second-order Hessians, third-order tensors, mixed partials — via successive symbolic differentiation. No finite differences, no truncation.
GPU batch evaluation. Evaluate a compiled function and all its derivatives at millions of input points in parallel on the GPU. Generate standalone WGSL shader functions from any compiled graph. As far as we know, no other Rust AD crate offers either capability.
Data-oriented design. Most AD libraries represent computation graphs as opaque internal structures with behavior baked in. bevy_autodiff takes a different approach: the graph is ECS data. Variables are entities, operations are components, and differentiation is a pure traversal that reads the graph and writes new entities. Data is separated from code at every level — a direct application of data-oriented programming to automatic differentiation. The compiled output (CompiledGraph) is a flat, cache-friendly array with no entity references or indirection.
This design makes the graph open and extensible: adding new metadata to nodes is just adding a new component. It also means CompiledGraph is a Bevy Component and Resource out of the box — compiled graphs live directly in your app's ECS, evaluated in parallel with par_iter_mut().
Features
- Write normal Rust functions, get exact derivatives — `#[autodiff]` transforms regular functions into computation graphs
- GPU batch evaluation — evaluate at millions of input points in parallel via wgpu (Metal, Vulkan, DX12)
- WGSL code generation — embed derivative functions directly in any shader
- Bevy-native — `CompiledGraph` is a `Component` + `Resource`; use `par_iter_mut()` for parallel evaluation
- Reverse-mode gradient — full gradient in a single backward pass, O(1) in input count
- Higher-order exact derivatives — Hessians, mixed partials via successive symbolic differentiation
- f32-stable second-order — `#[autodiff(stable_derivatives)]` routes through logarithmic derivative variants
- 23 elementary operations — all with symbolic differentiation rules and reverse-mode adjoints
Installation
```toml
[dependencies]
bevy_autodiff = { version = "0.7", features = ["proc-macros"] }
```
Without proc-macros (uses expr! macro or builder API only):
```toml
[dependencies]
bevy_autodiff = "0.7"
```
Quick Start
The #[autodiff] attribute lets you write normal Rust functions with operator overloading — the macro transforms them into computation graph builders:
```rust
use bevy_autodiff::prelude::*;

// f(x, y) = (1 - x)² + 100(y - x²)² — minimum at (1, 1)
#[autodiff]
fn rosenbrock(x: Expr, y: Expr) -> Expr {
    (1.0 - x) * (1.0 - x) + 100.0 * (y - x * x) * (y - x * x)
}

let mut ad = AutoDiff::new();
let x = ad.var("x", 0.0).unwrap();
let y = ad.var("y", 0.0).unwrap();
let f = rosenbrock(&mut ad, x, y);

// Reverse-mode gradient (recommended for first-order)
let mut cg = ad.compile_primal(f).unwrap();
cg.eval(&[0.0, 0.0]).unwrap();
let grad = cg.gradient(); // [df/dx, df/dy]

// Re-evaluate at a new point without recompiling
cg.eval(&[1.0, 1.0]).unwrap();
let grad = cg.gradient(); // [0.0, 0.0] — the minimum
```
For second-order derivatives, use compile_order:
```rust
let mut cg = ad.compile_order(f, 2).unwrap();
cg.eval(&[1.0, 1.0]).unwrap();
let dfdx = cg.partial(&[1, 0]).unwrap();   // df/dx
let dfdy = cg.partial(&[0, 1]).unwrap();   // df/dy
let d2fdx2 = cg.partial(&[2, 0]).unwrap(); // d²f/dx²
let d2fdy2 = cg.partial(&[0, 2]).unwrap(); // d²f/dy²
let d2mix = cg.partial(&[1, 1]).unwrap();  // d²f/dxdy
```
The `stable_derivatives` attribute automatically routes power and division operations to their logarithmic variants, which avoid catastrophic cancellation in f32 second-order derivatives.
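The failure mode being guarded against here is ordinary floating-point cancellation. A standalone illustration in plain Rust (nothing crate-specific): subtracting nearly equal values in f32 discards most of the mantissa, which is the kind of precision loss the logarithmic variants are designed to sidestep.

```rust
fn main() {
    // 1.001 is not exactly representable; in f32 the leading 1.0 consumes
    // most of the 24-bit mantissa, so the difference keeps far fewer digits.
    let a: f64 = 1.001;
    let b: f64 = 1.0;
    let diff64 = a - b;                   // ~1.0e-3, accurate to ~1e-16
    let diff32 = (a as f32) - (b as f32); // same subtraction in f32
    let rel = ((diff32 as f64 - diff64) / diff64).abs();
    assert!(rel > 1e-6); // several decimal digits already lost
}
```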
Expression Macro
The expr! macro provides natural mathematical syntax without requiring the proc-macros feature:
```rust
use bevy_autodiff::{expr, AutoDiff};

let mut ad = AutoDiff::new();
let x = ad.var("x", 2.0).unwrap();
let y = ad.var("y", 3.0).unwrap();
let f = expr!(ad, x * x + x * y);
assert_eq!(ad.eval(f).unwrap(), 10.0); // 4 + 6
```
See the Usage Guide for a comparison of all three API tiers (#[autodiff], expr!, and the builder API).
GPU Batch Evaluation
Evaluate compiled graphs at millions of input points in parallel on the GPU. Useful for Monte Carlo simulation, batch trajectory optimization, or any workload that evaluates the same function at many different inputs.
```toml
[dependencies]
bevy_autodiff = { version = "0.7", features = ["wgpu"] }
```
```rust
use bevy_autodiff::AutoDiff;
use bevy_autodiff::gpu::GpuContext;

let gpu = GpuContext::new().unwrap();

let mut ad = AutoDiff::new();
let x = ad.var("x", 0.0).unwrap();
let y = ad.var("y", 0.0).unwrap();
let xy = ad.mul(x, y);
let f = ad.add(xy, x); // f(x, y) = xy + x

let graph = ad.compile_order(f, 1).unwrap();
let gpu_graph = gpu.prepare(&graph).unwrap();

// Evaluate at 1M points in parallel
let x_samples: Vec<f32> = (0..1_000_000).map(|i| i as f32 * 1e-6).collect();
let y_samples: Vec<f32> = (0..1_000_000).map(|i| i as f32 * 2e-6).collect();
let results = gpu_graph.eval_batch(&[&x_samples, &y_samples]).unwrap();

let values = results.values;  // f(x,y) for each sample
let dfdx = results.partials;  // df/dx for each sample
```
The GPU path uses f32 precision (the CPU path uses f64). A WGSL interpreter kernel dispatches one GPU thread per sample with zero warp divergence.
WGSL Code Generation
Generate standalone WGSL functions from compiled graphs — no wgpu dependency required. The output is a struct + function that can be embedded in any WGSL shader (custom compute kernels, fragment shaders, procedural generation).
```rust
use bevy_autodiff::AutoDiff;

let mut ad = AutoDiff::new();
let x = ad.var("x", 0.0).unwrap();
let y = ad.var("y", 0.0).unwrap();
let xy = ad.mul(x, y);
let f = ad.sin(xy); // f(x, y) = sin(xy)

let graph = ad.compile_order(f, 1).unwrap();
let wgsl = graph.to_wgsl("my_func").unwrap();
```
This emits a self-contained WGSL snippet with a result struct and a pure function using direct WGSL expressions (no interpreter loop):
```wgsl
struct MyFuncOutput {
    value: f32,
    d1_0: f32, // df/dx
    d0_1: f32, // df/dy
}

fn my_func(p0: f32, p1: f32) -> MyFuncOutput {
    let v0 = p0;
    let v1 = p1;
    let v2 = v0 * v1;
    let v3 = sin(v2);
    // ... derivative nodes ...
    return MyFuncOutput(v3, ...);
}
```
Complements the interpreter-based GPU dispatch: the interpreter is a self-contained "eval at N points" path, while codegen produces an embeddable function for use inside other shaders.
Bevy Integration
CompiledGraph derives Clone, Component, and Resource, so compiled graphs can live directly in your Bevy app's ECS. Use par_iter_mut() to evaluate many graphs in parallel via Bevy's ComputeTaskPool.
```rust
use bevy::prelude::*;
use bevy_autodiff::{AutoDiff, CompiledGraph};

// Build once with AutoDiff, clone to many entities
let mut ad = AutoDiff::new();
let x = ad.var("x", 0.0).unwrap();
let y = ad.var("y", 0.0).unwrap();
let f = ad.add(x, y);
let template = ad.compile_primal(f).unwrap();

// Parallel evaluation system
fn evaluate(mut graphs: Query<&mut CompiledGraph>) {
    graphs.par_iter_mut().for_each(|mut cg| {
        cg.eval(&[1.0, 2.0]).unwrap();
    });
}
```
AutoDiff uses its own private World for graph construction. CompiledGraph is the compiled artifact that crosses into your application's ECS — a flat array with no entity references.
GPU types also integrate: GpuContext derives Resource (singleton), GpuGraph derives Component and Resource.
Performance
CompiledGraph flattens the ECS graph into a plain array — evaluation is ~130x faster than rebuilding through ECS. Reverse-mode computes the full gradient in the same time regardless of input count. A realistic 3x6 Jacobian (two-body gravity, 6 state variables) takes about 2x the time of a single row — reverse-mode pays for itself with just 2 inputs.
GPU batch evaluation processes millions of function+derivative evaluations per second on consumer hardware.
Run `cargo bench` for performance numbers on your system.
Supported Operations
| Category | Operations |
|---|---|
| Arithmetic | add, sub, mul, div, neg, square |
| Powers | sqrt, pow, powi, powf |
| Trigonometric | sin, cos, tan, asin, acos, atan |
| Hyperbolic | sinh, cosh, tanh, asinh, acosh, atanh |
| Exponential | exp, ln |
| Logarithmic derivatives | pow_log, powi_log, powf_log, div_log |
The logarithmic derivative variants (pow_log, div_log) produce identical primal values but use a different symbolic differentiation rule that avoids catastrophic cancellation in f32 at second order. Use them when computing Hessians or second-order partials that will be evaluated in f32 (e.g., on GPU), or use #[autodiff(stable_derivatives)] to route automatically. See Numerical Precision for details.
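For intuition, the logarithmic route rests on the identity u^v = exp(v·ln u), which gives a different but mathematically equivalent differentiation rule. A sketch in plain f64 (illustrative math only, not the crate's actual rewrite rules — `pow_direct` and `pow_log` are hypothetical names) shows the two routes agree on both primal and first derivative:

```rust
// Direct rule for f(x) = x^p: f' = p·x^(p-1)
fn pow_direct(x: f64, p: f64) -> (f64, f64) {
    (x.powf(p), p * x.powf(p - 1.0))
}

// Logarithmic route: x^p = exp(p·ln x), so f' = exp(p·ln x) · p/x
fn pow_log(x: f64, p: f64) -> (f64, f64) {
    let primal = (p * x.ln()).exp();
    (primal, primal * p / x)
}

fn main() {
    let (v1, d1) = pow_direct(3.0, 2.5);
    let (v2, d2) = pow_log(3.0, 2.5);
    assert!((v1 - v2).abs() < 1e-9); // same primal value
    assert!((d1 - d2).abs() < 1e-9); // same derivative value
}
```

In f64 the two routes are numerically indistinguishable; the README's claim is that at second order in f32 the symbolic expansions of the two rules differ in how much cancellation they incur.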
Higher-Order Derivatives
compile_order pre-compiles symbolic derivative subgraphs up to the requested order. Useful for Hessians, mixed partials, and beyond:
```rust
use bevy_autodiff::AutoDiff;

let mut ad = AutoDiff::new();
let x = ad.var("x", 1.0).unwrap();
let y = ad.var("y", 2.0).unwrap();
let xy = ad.mul(x, y);
let x2 = ad.square(x);
let f = ad.add(x2, xy); // f(x, y) = x² + xy

// Compile with all partials up to order 2
let mut cg = ad.compile_order(f, 2).unwrap();
cg.eval(&[1.0, 2.0]).unwrap();
let dfdx = cg.partial(&[1, 0]).unwrap();  // df/dx = 2x + y = 4
let dfdy = cg.partial(&[0, 1]).unwrap();  // df/dy = x = 1
let d2fdx = cg.partial(&[2, 0]).unwrap(); // d²f/dx² = 2
let d2mix = cg.partial(&[1, 1]).unwrap(); // d²f/dxdy = 1
```
For first-order gradients, prefer compile_primal + gradient() (reverse-mode) — it computes all partial derivatives in a single backward pass, O(1) in the number of inputs.
How It Works
bevy_autodiff builds a computation graph as ECS entities in a private Bevy World. differentiate(output, wrt) walks the graph in topological order and applies the chain rule at every node, creating new entities for the derivative subgraph. Constant folding eliminates zero and one terms during differentiation to prevent graph bloat.
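A toy version of that differentiation step, with plain enums standing in for ECS entities (`Node`, `diff`, and the folding helpers here are illustrative, not the crate's API): the chain rule is applied per node, and the smart constructors fold away zero and one terms so repeated differentiation doesn't bloat the graph.

```rust
#[derive(Clone)]
enum Node {
    Const(f64),
    Var, // the single variable x
    Add(Box<Node>, Box<Node>),
    Mul(Box<Node>, Box<Node>),
}

// Smart constructors with constant folding.
fn add(a: Node, b: Node) -> Node {
    match (a, b) {
        (Node::Const(c), n) if c == 0.0 => n, // 0 + n = n
        (n, Node::Const(c)) if c == 0.0 => n, // n + 0 = n
        (a, b) => Node::Add(Box::new(a), Box::new(b)),
    }
}

fn mul(a: Node, b: Node) -> Node {
    match (a, b) {
        (Node::Const(c), _) if c == 0.0 => Node::Const(0.0), // 0 · n = 0
        (_, Node::Const(c)) if c == 0.0 => Node::Const(0.0), // n · 0 = 0
        (Node::Const(c), n) if c == 1.0 => n,                // 1 · n = n
        (n, Node::Const(c)) if c == 1.0 => n,                // n · 1 = n
        (a, b) => Node::Mul(Box::new(a), Box::new(b)),
    }
}

// Symbolic differentiation: chain rule at every node.
fn diff(n: &Node) -> Node {
    match n {
        Node::Const(_) => Node::Const(0.0),
        Node::Var => Node::Const(1.0),
        Node::Add(a, b) => add(diff(a), diff(b)),
        // Product rule: (ab)' = a'b + ab'
        Node::Mul(a, b) => add(mul(diff(a), (**b).clone()), mul((**a).clone(), diff(b))),
    }
}

fn eval(n: &Node, x: f64) -> f64 {
    match n {
        Node::Const(c) => *c,
        Node::Var => x,
        Node::Add(a, b) => eval(a, x) + eval(b, x),
        Node::Mul(a, b) => eval(a, x) * eval(b, x),
    }
}

fn main() {
    // f(x) = x·x  →  f' = x + x (after folding away the 1· factors), f'' = 2
    let f = Node::Mul(Box::new(Node::Var), Box::new(Node::Var));
    let d1 = diff(&f);
    assert_eq!(eval(&d1, 3.0), 6.0);
    assert_eq!(eval(&diff(&d1), 3.0), 2.0);
}
```

Because `diff` emits ordinary nodes, it can be applied to its own output, which is the "successive symbolic differentiation" idea behind higher-order derivatives.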
compile() flattens the entire graph (primal + derivative subgraphs) into a Vec<NodeOp> — a topologically sorted flat array. A single forward pass evaluates all values. For reverse-mode, one additional backward pass propagates adjoints to compute the full gradient.
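A minimal sketch of that flat-tape design (a simplified stand-in for `Vec<NodeOp>`, with only three op kinds; the types and method names are illustrative, not the crate's): operands are indices into earlier tape slots, so the forward pass is one loop, and the reverse pass is the same loop backwards accumulating adjoints.

```rust
#[derive(Clone, Copy)]
enum Op {
    Input(usize),      // read input slot i
    Add(usize, usize), // indices of earlier tape entries
    Mul(usize, usize),
}

struct Tape {
    ops: Vec<Op>, // topologically sorted: operands always precede users
}

impl Tape {
    /// Forward pass: evaluate every node in order; last slot is the output.
    fn eval(&self, inputs: &[f64]) -> Vec<f64> {
        let mut vals = Vec::with_capacity(self.ops.len());
        for op in &self.ops {
            let v = match *op {
                Op::Input(i) => inputs[i],
                Op::Add(a, b) => vals[a] + vals[b],
                Op::Mul(a, b) => vals[a] * vals[b],
            };
            vals.push(v);
        }
        vals
    }

    /// Reverse pass: a single backward sweep yields d(output)/d(every input).
    fn gradient(&self, inputs: &[f64]) -> Vec<f64> {
        let vals = self.eval(inputs);
        let mut adj = vec![0.0; self.ops.len()];
        *adj.last_mut().unwrap() = 1.0; // seed: d(out)/d(out) = 1
        let mut grad = vec![0.0; inputs.len()];
        for (k, op) in self.ops.iter().enumerate().rev() {
            let d = adj[k];
            match *op {
                Op::Input(i) => grad[i] += d,
                Op::Add(a, b) => { adj[a] += d; adj[b] += d; }
                Op::Mul(a, b) => { adj[a] += d * vals[b]; adj[b] += d * vals[a]; }
            }
        }
        grad
    }
}

fn main() {
    // f(x, y) = x·y + x at (2, 3): value 8, gradient [y + 1, x] = [4, 2]
    let tape = Tape {
        ops: vec![Op::Input(0), Op::Input(1), Op::Mul(0, 1), Op::Add(2, 0)],
    };
    assert_eq!(*tape.eval(&[2.0, 3.0]).last().unwrap(), 8.0);
    assert_eq!(tape.gradient(&[2.0, 3.0]), vec![4.0, 2.0]);
}
```

Note the cost structure: the backward sweep visits each tape entry once regardless of how many inputs there are, which is why reverse mode delivers the full gradient for the price of roughly one extra pass.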
For more detail, see Architecture.
ECS Findings
This project explores what ECS can do for automatic differentiation. ECS is used for graph construction and symbolic differentiation only — all hot paths (eval(), gradient(), GPU eval_batch()) operate on flat arrays with no entity lookups. ECS provides open extensibility (new metadata = new component), cache-friendly SoA layout, and natural Bevy integration. The main cost is bevy_ecs compile time, which is confined to the cold path.
See Architecture — What Does ECS Actually Bring? for the full analysis.
Builder API
For full control over graph construction, use the builder methods directly on AutoDiff:
```rust
use bevy_autodiff::AutoDiff;

let mut ad = AutoDiff::new();
let x = ad.var("x", 2.0).unwrap();

// Build computation graph: f(x) = x² + 3x + 1
let x_squared = ad.square(x);
let three = ad.constant(3.0);
let three_x = ad.mul(three, x);
let one = ad.constant(1.0);
let sum = ad.add(x_squared, three_x);
let f = ad.add(sum, one);
assert_eq!(ad.eval(f).unwrap(), 11.0); // f(2) = 4 + 6 + 1

// Symbolic differentiation
let dfdx = ad.differentiate(f, x).unwrap();
assert_eq!(ad.eval(dfdx).unwrap(), 7.0); // f'(2) = 2·2 + 3

// Higher-order via successive differentiation
let d2 = ad.differentiate(dfdx, x).unwrap();
assert_eq!(ad.eval(d2).unwrap(), 2.0); // f''(x) = 2
let d3 = ad.differentiate(d2, x).unwrap();
assert_eq!(ad.eval(d3).unwrap(), 0.0); // f'''(x) = 0
```
Examples
See examples/README.md for descriptions. Run with `cargo run --example <name>`.
The examples use the builder API for clarity. For ergonomic function definition, see #[autodiff] and expr! in the Usage Guide.
Testing
Run `cargo test` for the main suite. The oracle tests that compare against the `autodiff` crate require nightly Rust with `RUSTFLAGS="-Zautodiff=Enable"`.
The test suite validates correctness through:
| Test type | What it validates | Count |
|---|---|---|
| Unit tests | Graph construction, all 23 operations, derivative properties, constant folding, CompiledGraph eval, reverse-mode adjoint formulas, reverse-mode backward pass, WGSL codegen, f32 stability, Bevy trait bounds | 313 |
| Proc-macro tests | `#[autodiff]`, `expr!` macro, `stable_derivatives` attribute | 76 |
| GPU unit tests | NodeOp conversion, GPU dispatch, buffer readback, error paths, Bevy trait bounds | 16 |
| Oracle (autodiff crate) | First derivatives against independent forward-mode AD | 22 |
| Oracle (GPU vs CPU) | GPU f32 results against CPU f64 for all ops, compositions, partials, batch sizes | 27 |
| Doc-tests | Code examples in documentation | 15 |
| Cross-validation | Reverse-mode gradient matches forward-mode symbolic partials | 8 (within unit) |
Documentation
- Usage Guide -- three API tiers: `#[autodiff]`, `expr!`, builder
- Architecture -- ECS graph representation, compilation pipeline, differentiation approaches
- Numerical Precision -- precision tiers, tolerance justification, known considerations
- API Reference -- rustdoc on docs.rs
Development
This project was co-developed with Claude, an AI assistant by Anthropic.
License
MIT