// Crate-level lint baseline. Mirrors the ferrotorch-whisper / ferrotorch-bert
// posture: deny correctness / idiom / Debug / docs problems; warn pedantic
// stylistic issues. Specific pedantic lints are allowed crate-wide where
// the lint is consistently wrong for ML/numeric kernel code.
// Casts: dimension math (`as usize`, `as f32`, `as u32`) is intrinsic
// to tensor indexing — every kernel call would otherwise need a
// per-call allow.
// Builder-style accessors don't all need `#[must_use]`.
// `doc_markdown` flags identifiers like `bf16`, `f32`, `VAE`, `SD`, `SiLU`
// as missing backticks even when they appear in code-fenced text.
// `needless_pass_by_value` would force `&VaeDecoderConfig` signatures
// throughout, hiding intent in the API.
// `unnecessary_wraps` flags `Result`-returning helpers that today
// always succeed but are part of an extensible API surface.
// `uninlined_format_args` flags `format!("x={}", x)` vs
// `format!("x={x}")`. Both are equally clear; the fixup churn is high.
// `many_single_char_names` flags conventional ML kernel locals
// (`q`, `k`, `v`, `h`).
// `similar_names` flags variable pairs that are intentionally similar
// (e.g. `q2` / `q_h`).
// `module_name_repetitions`: every type starts with `Vae` / `UNet`
// (matching the HF / diffusers naming) — the lint would force renames
// that lose the upstream-1:1 mapping.
// `too_many_lines`: the decoder / UNet forward is one cohesive sequence
// of ops mirroring the diffusers reference; splitting it hurts
// cross-reading.
// `too_many_arguments`: UNet builders take a long parameter list (in_c,
// out_c, temb, layers, heads, dim_head, cross_dim, groups, …); the explicit
// list is shorter than the struct-of-args alternative for an internal builder.
// `items_after_statements` flags the in-test helper layout used widely.
// `redundant_else` flags `if x { return …; } else { … }`; the
// alternative (`if x { return …; } …`) loses the structural shape.
// `needless_range_loop`: tensor ops naturally use `for i in 0..n { … }` over
// `.iter()` when the index itself is used; clippy's preferred form hurts
// readability.
//! Stable-Diffusion model composition for ferrotorch.
//!
//! Phase B.3 of real-artifact-driven development. This crate implements
//! the **VAE decoder** (Phase B.3a) and the **UNet2DConditionModel**
//! (Phase B.3b) of `runwayml/stable-diffusion-v1-5`. The VAE encoder, the
//! CLIP text encoder, and the scheduler are out of scope and tracked
//! under follow-up dispatches.
//!
//! ## VAE decoder
//!
//! Mirrors `vae/config.json`: `VaeDecoder` decodes a latent
//! `[B, 4, 64, 64]` into an image `[B, 3, 512, 512]`. See [`vae`].
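//!
//! The shapes above imply an 8x spatial upscale (64 to 512) and a change
//! from 4 latent channels to 3 image channels. A minimal sketch of that shape
//! arithmetic follows; `VAE_SCALE` and `decoded_shape` are hypothetical names
//! used only for illustration, and applying the same factor to other latent
//! sizes is an assumption rather than something this crate documents.
//!
//! ```rust
//! // Assumed spatial scale factor, read off the documented shapes (512 / 64).
//! const VAE_SCALE: usize = 8;
//!
//! // Maps a latent shape `[B, 4, H, W]` to the image shape `[B, 3, 8H, 8W]`.
//! fn decoded_shape(latent: [usize; 4]) -> [usize; 4] {
//!     let [b, _c, h, w] = latent;
//!     [b, 3, h * VAE_SCALE, w * VAE_SCALE]
//! }
//!
//! assert_eq!(decoded_shape([1, 4, 64, 64]), [1, 3, 512, 512]);
//! ```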
//!
//! ## UNet2DConditionModel
//!
//! Mirrors `unet/config.json` — `UNet2DConditionModel` consumes
//! `(noisy_latent [B, 4, 64, 64], timestep [B], text_embed [B, S, 768])`
//! and returns predicted noise `[B, 4, 64, 64]`. See [`unet`].
//!
//! `ResnetBlock2DTime` (UNet flavour with a time-embedding bias):
//!
//! ```text
//! h = silu(norm1(x)); h = conv1(h)
//! t = silu(temb); h = h + Linear(t).view(B, out, 1, 1)
//! h = silu(norm2(h)); h = conv2(h)
//! out = h + (x if in==out else conv_shortcut(x))
//! ```
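//!
//! The time-bias step above (`h + Linear(t).view(B, out, 1, 1)`) amounts to
//! adding one scalar per `(batch, channel)` pair to every spatial position.
//! A minimal plain-Rust sketch, assuming row-major `[B, C, H, W]` storage in a
//! flat `f32` slice; `silu` and `add_time_bias` are hypothetical helper names,
//! not part of this crate's API.
//!
//! ```rust
//! // SiLU activation: x * sigmoid(x).
//! fn silu(x: f32) -> f32 {
//!     x / (1.0 + (-x).exp())
//! }
//!
//! // Adds `bias[n * c + ch]` to every element of plane `(n, ch)` in `h`,
//! // where `h` is `[B, C, H, W]` flattened row-major and `hw = H * W`.
//! fn add_time_bias(h: &mut [f32], bias: &[f32], b: usize, c: usize, hw: usize) {
//!     assert!(hw > 0, "spatial size must be non-zero");
//!     assert_eq!(h.len(), b * c * hw);
//!     assert_eq!(bias.len(), b * c);
//!     // Each contiguous chunk of `hw` values is one (batch, channel) plane.
//!     for (plane, &t) in h.chunks_exact_mut(hw).zip(bias) {
//!         for v in plane.iter_mut() {
//!             *v += t;
//!         }
//!     }
//! }
//!
//! // B = 1, C = 2, H * W = 4: each channel plane gets its own bias.
//! let mut h = vec![0.0_f32; 8];
//! add_time_bias(&mut h, &[1.0, -1.0], 1, 2, 4);
//! assert_eq!(h, [1.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0]);
//! assert_eq!(silu(0.0), 0.0);
//! ```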
//!
//! `Transformer2DModel` (SD UNet flavour):
//!
//! ```text
//! h = GroupNorm(x); h = proj_in (Conv2d k=1, [B, inner, H, W])
//! h = flatten to [B, HW, inner]; for block in blocks: h = block(h, ehs)
//! h = reshape back; h = proj_out (Conv2d k=1); out = h + residual
//! ```
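//!
//! The "flatten to `[B, HW, inner]`" and "reshape back" steps above are a
//! layout permutation between channel-major and token-major order. A minimal
//! plain-Rust sketch, assuming row-major flat `f32` storage;
//! `nchw_to_tokens` and `tokens_to_nchw` are hypothetical helper names used
//! only for illustration, not this crate's API.
//!
//! ```rust
//! // Permutes `[B, C, HW]` (channel-major) into `[B, HW, C]` (token-major).
//! fn nchw_to_tokens(x: &[f32], b: usize, c: usize, hw: usize) -> Vec<f32> {
//!     assert_eq!(x.len(), b * c * hw);
//!     let mut out = vec![0.0; x.len()];
//!     for n in 0..b {
//!         for ch in 0..c {
//!             for p in 0..hw {
//!                 out[(n * hw + p) * c + ch] = x[(n * c + ch) * hw + p];
//!             }
//!         }
//!     }
//!     out
//! }
//!
//! // Inverse permutation: `[B, HW, C]` back to `[B, C, HW]`.
//! fn tokens_to_nchw(t: &[f32], b: usize, c: usize, hw: usize) -> Vec<f32> {
//!     assert_eq!(t.len(), b * c * hw);
//!     let mut out = vec![0.0; t.len()];
//!     for n in 0..b {
//!         for p in 0..hw {
//!             for ch in 0..c {
//!                 out[(n * c + ch) * hw + p] = t[(n * hw + p) * c + ch];
//!             }
//!         }
//!     }
//!     out
//! }
//!
//! // B = 1, C = 2, H * W = 3: channel 0 is [0, 1, 2], channel 1 is [3, 4, 5].
//! let x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0];
//! let tokens = nchw_to_tokens(&x, 1, 2, 3);
//! assert_eq!(tokens, [0.0, 3.0, 1.0, 4.0, 2.0, 5.0]);
//! assert_eq!(tokens_to_nchw(&tokens, 1, 2, 3), x);
//! ```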
//!
//! Each `BasicTransformerBlock` is the canonical pre-LN
//! (self-attn → cross-attn → GEGLU FF) stack.
//!
//! ## REQ status (per `.design/ferrotorch-diffusion/lib.md`)
//!
//! | REQ | Status | Evidence |
//! |---|---|---|
//! | REQ-1 | SHIPPED | `pub mod` block at `ferrotorch-diffusion/src/lib.rs:100..114` declares every sub-module; consumer: `ferrotorch-diffusion/src/safetensors_loader.rs:17..21` imports six of them |
//! | REQ-2 | SHIPPED | `pub use` block at `ferrotorch-diffusion/src/lib.rs:116..139` re-exports the top-level types; consumer: `ferrotorch-hub/src/registry.rs` references `ClipTextEncoder` through the re-export |
//! | REQ-3 | SHIPPED | crate-level lint attributes at `ferrotorch-diffusion/src/lib.rs:6..59`; consumer: `cargo clippy -p ferrotorch-diffusion --lib -- -D warnings` enforces it |
//! | REQ-4 | SHIPPED | crate `//!` doc-comment at `ferrotorch-diffusion/src/lib.rs:61..98`; consumer: `cargo doc -p ferrotorch-diffusion` renders this as the crate landing page |
pub use ;
pub use ;
pub use ;
pub use VaeDecoderConfig;
pub use ;
pub use ResnetBlock2DTime;
pub use ;
pub use ;
pub use ;
pub use ;
pub use UNet2DConditionConfig;
pub use ;
pub use ;