Stable-Diffusion VAE encoder composition.
Mirrors diffusers.AutoencoderKL.encode(image).latent_dist for
runwayml/stable-diffusion-v1-5:
image (pixel-space, [B, 3, H, W])
-> Encoder.conv_in
-> Encoder.down_blocks[0..N] (last block has no Downsample2D)
-> Encoder.mid_block
-> Encoder.conv_norm_out -> SiLU -> Encoder.conv_out
(output: [B, 2 * latent_channels, H/8, W/8] — mean/logvar concat)
-> quant_conv ([2*L -> 2*L], 1x1)
-> DiagonalGaussianDistribution::from_parameters
The encoder-side mirror of crate::vae::VaeDecoder. The
encode_with_scaling helper composes
latent_dist.sample(seed) * scaling_factor, matching
AutoencoderKL.encode(x).latent_dist.sample() * vae.config.scaling_factor.
The bare Module::forward returns the raw [B, 2*L, h, w] parameters
tensor (no split, no sample, no scaling) so callers can swap in their
own sampling strategy (e.g. .mode() for a deterministic, noise-free latent).
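The split, sample, and scale steps described above can be sketched on plain buffers. This is a minimal illustration, not the crate's implementation: `DiagGaussian`, `sample_with_noise`, and `scale_latent` are hypothetical names, flat `Vec<f32>` buffers stand in for the crate's tensors, and the noise is passed in explicitly rather than drawn from a seed. The clamp of `logvar` to `[-30, 20]` follows the diffusers reference and is assumed, not confirmed, for this crate.

```rust
/// Hypothetical stand-in for the distribution type. Flat row-major buffers
/// replace the crate's tensors.
struct DiagGaussian {
    mean: Vec<f32>,
    logvar: Vec<f32>,
}

impl DiagGaussian {
    /// Split a `[B, 2*L, h, w]` buffer into mean and logvar halves along the
    /// channel axis. `hw` is `h * w`.
    fn from_parameters(params: &[f32], b: usize, l: usize, hw: usize) -> Self {
        assert_eq!(params.len(), b * 2 * l * hw);
        let mut mean = Vec::with_capacity(b * l * hw);
        let mut logvar = Vec::with_capacity(b * l * hw);
        for batch in params.chunks_exact(2 * l * hw) {
            // The first L channels are the mean, the last L the log-variance.
            let (m, lv) = batch.split_at(l * hw);
            mean.extend_from_slice(m);
            // Assumption: clamp as the diffusers reference does.
            logvar.extend(lv.iter().map(|v| v.clamp(-30.0, 20.0)));
        }
        Self { mean, logvar }
    }

    /// Deterministic choice: the mean of the distribution.
    fn mode(&self) -> Vec<f32> {
        self.mean.clone()
    }

    /// Reparameterized sample: mean + exp(0.5 * logvar) * eps.
    fn sample_with_noise(&self, eps: &[f32]) -> Vec<f32> {
        assert_eq!(eps.len(), self.mean.len());
        self.mean
            .iter()
            .zip(&self.logvar)
            .zip(eps)
            .map(|((m, lv), e)| m + (0.5 * lv).exp() * e)
            .collect()
    }
}

/// Multiply a latent by the VAE scaling factor (0.18215 for SD v1.5).
fn scale_latent(latent: &[f32], scaling_factor: f32) -> Vec<f32> {
    latent.iter().map(|v| v * scaling_factor).collect()
}
```

A caller wanting the deterministic path would scale the result of `mode()` instead of a sample, which is the kind of substitution the raw forward output is meant to allow.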
§REQ status (per .design/ferrotorch-diffusion/vae_encoder.md)
| REQ | Status | Evidence |
|---|---|---|
| REQ-1 | SHIPPED | Encoder<T> at vae_encoder.rs:51..79 and Encoder::new at vae_encoder.rs:81..153; consumer: VaeEncoder::new at vae_encoder.rs:307 builds it; itself consumed by safetensors_loader.rs:425 load_vae_encoder |
| REQ-2 | SHIPPED | VaeEncoder<T> at vae_encoder.rs:288..297 and VaeEncoder::new at vae_encoder.rs:299..316; consumer: safetensors_loader.rs:425 load_vae_encoder; gpu/vae_encoder.rs:317 GpuVaeEncoder::from_module consumes its state_dict() |
| REQ-3 | SHIPPED | VaeEncoder::encode at vae_encoder.rs:325..328 and DiagonalGaussianDistribution::from_parameters at vae_encoder.rs:471..501; consumer: vae_encoder.rs:349 encode_with_scaling invokes it |
| REQ-4 | SHIPPED | DiagonalGaussianDistribution::sample_with_seed at vae_encoder.rs:527..539, mode at vae_encoder.rs:506..508, randn_with_seed at vae_encoder.rs:548..587; consumer: vae_encoder.rs:350 encode_with_scaling calls dist.sample_with_seed(seed) |
| REQ-5 | SHIPPED | encode_with_scaling at vae_encoder.rs:348..361; consumer: re-exported via lib.rs:148 pub use vae_encoder::VaeEncoder (boundary method IS the public API per goal.md S5 grandfathering) |
| REQ-6 | SHIPPED | Module<T>::forward at vae_encoder.rs:369..382; consumer: vae_encoder.rs:326 encode calls self.forward(image)? to produce the [B, 2L, h, w] parameters |
| REQ-7 | SHIPPED | Module<T>::load_state_dict at vae_encoder.rs:421..444; consumer: safetensors_loader.rs:394 VaeEncoder::load_hf_state_dict calls self.load_state_dict(&remapped, strict) after stripping the vae. prefix |
Structs§
- DiagonalGaussianDistribution: Diagonal Gaussian over latent space, the same parameterization diffusers.models.autoencoders.vae.DiagonalGaussianDistribution uses. Holds mean and logvar tensors (both [B, L, h, w]) split from the encoder's concatenated parameters output.
- Encoder: The bare Encoder half; matches diffusers.models.autoencoders.vae.Encoder.
- VaeEncoder: AutoencoderKL-style VAE encoder = Encoder + quant_conv.
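The quant_conv stage that VaeEncoder adds on top of Encoder is a 1x1 convolution from 2*L to 2*L channels, which amounts to the same linear map over channels applied independently at every spatial position. A minimal sketch of that operation for a single batch element follows; `conv1x1` is a hypothetical helper on flat buffers, not the crate's convolution layer.

```rust
/// 1x1 convolution for one batch element.
/// `input` is `[C_in, H*W]` channel-major, `weight` is `[C_out, C_in]`
/// row-major, `bias` has `C_out` entries. Returns `[C_out, H*W]`.
fn conv1x1(input: &[f32], weight: &[f32], bias: &[f32], c_in: usize, hw: usize) -> Vec<f32> {
    let c_out = bias.len();
    assert_eq!(input.len(), c_in * hw);
    assert_eq!(weight.len(), c_out * c_in);
    let mut out = vec![0.0f32; c_out * hw];
    for o in 0..c_out {
        for p in 0..hw {
            // Each output pixel mixes only the channels at the same position.
            let mut acc = bias[o];
            for i in 0..c_in {
                acc += weight[o * c_in + i] * input[i * hw + p];
            }
            out[o * hw + p] = acc;
        }
    }
    out
}
```

Because the kernel is 1x1, the spatial size is unchanged, so the output keeps the [B, 2*L, H/8, W/8] shape produced by Encoder.conv_out.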
Type Aliases§
- VaeEncoderConfig: Type alias; the SD VAE encoder and decoder share their config shape (mirrors diffusers.AutoencoderKL.config, which spans both halves).
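One way to picture a config shared by both halves is a single struct with two aliases. The sketch below is illustrative only: `SharedVaeConfig`, its field names, `VaeDecoderConfig`, and both shape helpers are assumptions, not the crate's definitions. The values in `sd_v1_5` (4 latent channels, 4 down blocks, scaling factor 0.18215) are those of the SD v1.5 VAE. The shape helper assumes H and W are divisible by the downsample factor.

```rust
/// Hypothetical config shared by encoder and decoder. Field names are
/// assumptions for illustration.
#[derive(Debug, Clone, PartialEq)]
struct SharedVaeConfig {
    latent_channels: usize,
    num_down_blocks: usize,
    scaling_factor: f32,
}

// Both halves refer to the same config type.
type VaeEncoderConfig = SharedVaeConfig;
type VaeDecoderConfig = SharedVaeConfig;

/// Values of the SD v1.5 VAE.
fn sd_v1_5() -> SharedVaeConfig {
    SharedVaeConfig {
        latent_channels: 4,
        num_down_blocks: 4,
        scaling_factor: 0.18215,
    }
}

/// Downsample factor: every down block except the last halves H and W.
fn downsample_factor(cfg: &SharedVaeConfig) -> Option<usize> {
    Some(1usize << cfg.num_down_blocks.checked_sub(1)?)
}

/// Shape of the encoder's parameters tensor, `[B, 2*L, H/f, W/f]`.
/// Returns `None` if H or W is not divisible by the factor (an assumption
/// made to keep the sketch simple).
fn encoder_param_shape(cfg: &VaeEncoderConfig, b: usize, h: usize, w: usize) -> Option<[usize; 4]> {
    let f = downsample_factor(cfg)?;
    if h % f != 0 || w % f != 0 {
        return None;
    }
    Some([b, 2 * cfg.latent_channels, h / f, w / f])
}

/// Shape of the sampled latent the decoder consumes, `[B, L, H/f, W/f]`.
fn latent_shape(cfg: &VaeDecoderConfig, b: usize, h: usize, w: usize) -> Option<[usize; 4]> {
    let [b, c2, lh, lw] = encoder_param_shape(cfg, b, h, w)?;
    Some([b, c2 / 2, lh, lw])
}
```

For a 512x512 input under these assumptions, the parameters tensor is [1, 8, 64, 64] and the sampled latent is [1, 4, 64, 64].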