unit3-01_stable_diffusion_introduction

Jun 20, 2025

What is the Stable Diffusion pipeline responsible for?

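Put simply, the UNet works entirely in the VAE's latent space: it takes a noisy latent and predicts the noise in it, and only the final denoised latent is decoded by the VAE into an image.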
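To make the latent-space point concrete, here is a small arithmetic sketch. It assumes the usual Stable Diffusion VAE layout (4 latent channels and three 2x downsampling stages, giving a spatial factor of 8), which is consistent with the (4, 64, 64) latents in the graph below for a 512x512 image. The function names `latent_shape` and `compression_ratio` are made up for illustration.

```python
def latent_shape(height, width, latent_channels=4, num_downsamples=3):
    """Shape (C, H, W) of the VAE latent for an RGB image of the given size.

    Assumes each downsampling stage halves height and width, so three
    stages give a spatial factor of 2**3 = 8.
    """
    factor = 2 ** num_downsamples
    if height % factor or width % factor:
        raise ValueError(f"height and width must be multiples of {factor}")
    return (latent_channels, height // factor, width // factor)


def compression_ratio(height, width):
    """How many times fewer values the latent holds than the RGB image."""
    c, h, w = latent_shape(height, width)
    return (3 * height * width) / (c * h * w)


print(latent_shape(512, 512))       # (4, 64, 64)
print(latent_shape(64, 64))         # (4, 8, 8)
print(compression_ratio(512, 512))  # 48.0
```

Under these assumptions the UNet processes 48 times fewer values than it would in pixel space, which is the main reason for denoising latents instead of images.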

pipe:

  • unet
  • vae
  • text_encoder
  • image_encoder
  • feature_extractor
  • tokenizer (a plain Python object, not an nn.Module; it has no parameters)
  • scheduler
  • safety_checker ?
graph
    text --> t(tokenizer & text_encoder) --> e[text_embedding]
    e --> unet(unet)
    n["noisy_latents (4, 64, 64)"] --> unet
    timestep --> unet
    unet --> p["noise prediction (4, 64, 64)"]
    p --> s(scheduler step)
    s -->|next step| n
    s -->|final latents| v_d(vae decoder)
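The data flow in the graph can be walked through by hand. The following is a minimal sketch, not the pipeline's own implementation: it assumes `pipe` is an already loaded diffusers `StableDiffusionPipeline` (for example from `StableDiffusionPipeline.from_pretrained(...)`), it leaves out classifier-free guidance and the safety checker, and the function names `latent_hw` and `generate` are hypothetical. The scheduler is what turns each noise prediction into the latents for the next step.

```python
def latent_hw(height, width, vae_scale_factor=8):
    """Spatial size of the latent for a given image size (factor 8 assumed)."""
    return height // vae_scale_factor, width // vae_scale_factor


def generate(pipe, prompt, height=512, width=512, num_steps=30, seed=0):
    """Text -> embedding -> UNet/scheduler loop -> VAE decode, without guidance."""
    import torch  # imported lazily so latent_hw works without torch installed

    device, dtype = pipe.unet.device, pipe.unet.dtype
    generator = torch.Generator().manual_seed(seed)

    # text --> tokenizer & text_encoder --> text_embedding
    tokens = pipe.tokenizer(
        prompt,
        padding="max_length",
        max_length=pipe.tokenizer.model_max_length,
        truncation=True,
        return_tensors="pt",
    )
    with torch.no_grad():
        text_embedding = pipe.text_encoder(tokens.input_ids.to(device))[0]

        # Timesteps must be set before reading init_noise_sigma.
        pipe.scheduler.set_timesteps(num_steps, device=device)

        # noisy_latents: pure noise at the start, scaled for the scheduler
        h, w = latent_hw(height, width)
        latents = torch.randn(
            (1, pipe.unet.config.in_channels, h, w), generator=generator
        ).to(device=device, dtype=dtype)
        latents = latents * pipe.scheduler.init_noise_sigma

        for t in pipe.scheduler.timesteps:
            # noisy_latents + timestep + text_embedding --> unet --> noise prediction
            model_input = pipe.scheduler.scale_model_input(latents, t)
            noise_pred = pipe.unet(
                model_input, t, encoder_hidden_states=text_embedding
            ).sample
            # The scheduler uses the prediction to produce slightly cleaner latents.
            latents = pipe.scheduler.step(noise_pred, t, latents).prev_sample

        # final latents --> vae decoder --> image tensor, roughly in [-1, 1]
        image = pipe.vae.decode(latents / pipe.vae.config.scaling_factor).sample
    return image
```

Calling `generate(pipe, "a photo of a cat")` should return a tensor of shape (1, 3, 512, 512). Without guidance the result will follow the prompt only loosely; the point here is the order of the calls, not image quality.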

VAE (AutoencoderKL) architecture

Layer (type:depth-idx)                 Output Shape        Param #
AutoencoderKL                          [1, 3, 64, 64]      --
├─Encoder: 1-1                         [1, 8, 8, 8]        --
│ └─Conv2d: 2-1                        [1, 128, 64, 64]    3,584
│ └─ModuleList: 2-2                    --                  --
│ │ └─DownEncoderBlock2D: 3-1          [1, 128, 32, 32]    738,944
│ │ └─DownEncoderBlock2D: 3-2          [1, 256, 16, 16]    2,690,304
│ │ └─DownEncoderBlock2D: 3-3          [1, 512, 8, 8]      10,754,560
│ │ └─DownEncoderBlock2D: 3-4          [1, 512, 8, 8]      9,443,328
│ └─UNetMidBlock2D: 2-3                [1, 512, 8, 8]      --
│ │ └─ModuleList: 3-7                  --                  (recursive)
│ │ └─ModuleList: 3-6                  --                  1,051,648
│ │ └─ModuleList: 3-7                  --                  (recursive)
│ └─GroupNorm: 2-4                     [1, 512, 8, 8]      1,024
│ └─SiLU: 2-5                          [1, 512, 8, 8]      --
│ └─Conv2d: 2-6                        [1, 8, 8, 8]        36,872
├─Conv2d: 1-2                          [1, 8, 8, 8]        72
├─Conv2d: 1-3                          [1, 4, 8, 8]        20
├─Decoder: 1-4                         [1, 3, 64, 64]      --
│ └─Conv2d: 2-7                        [1, 512, 8, 8]      18,944
│ └─UNetMidBlock2D: 2-8                [1, 512, 8, 8]      --
│ │ └─ModuleList: 3-10                 --                  (recursive)
│ │ └─ModuleList: 3-9                  --                  1,051,648
│ │ └─ModuleList: 3-10                 --                  (recursive)
│ └─ModuleList: 2-9                    --                  --
│ │ └─UpDecoderBlock2D: 3-11           [1, 512, 16, 16]    16,524,800
│ │ └─UpDecoderBlock2D: 3-12           [1, 512, 32, 32]    16,524,800
│ │ └─UpDecoderBlock2D: 3-13           [1, 256, 64, 64]    4,855,296
│ │ └─UpDecoderBlock2D: 3-14           [1, 128, 64, 64]    1,067,648
│ └─GroupNorm: 2-10                    [1, 128, 64, 64]    256
│ └─SiLU: 2-11                         [1, 128, 64, 64]    --
│ └─Conv2d: 2-12                       [1, 3, 64, 64]      3,459
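As a sanity check on the summary, the parameter counts of the plain Conv2d and GroupNorm layers can be reproduced by hand. The sketch below assumes details the summary does not print: 3x3 kernels for the convolutions inside the Encoder and Decoder, 1x1 kernels for the two top-level convolutions (1-2 and 1-3), bias terms everywhere, and affine GroupNorm. The channel counts are read off the output shapes. The Encoder ends with 8 channels because it outputs a mean and a log-variance for each of the 4 latent channels.

```python
def conv2d_params(in_channels, out_channels, kernel_size, bias=True):
    """Weights (out * in * k * k) plus one bias per output channel."""
    weights = out_channels * in_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)


def groupnorm_params(num_channels):
    """One scale and one shift per channel (affine GroupNorm)."""
    return 2 * num_channels


# (layer in the summary, computed count, count reported in the summary)
checks = [
    ("Conv2d 2-1:  3 -> 128, 3x3", conv2d_params(3, 128, 3), 3_584),
    ("Conv2d 2-6:  512 -> 8, 3x3", conv2d_params(512, 8, 3), 36_872),
    ("Conv2d 1-2:  8 -> 8, 1x1", conv2d_params(8, 8, 1), 72),
    ("Conv2d 1-3:  4 -> 4, 1x1", conv2d_params(4, 4, 1), 20),
    ("Conv2d 2-7:  4 -> 512, 3x3", conv2d_params(4, 512, 3), 18_944),
    ("Conv2d 2-12: 128 -> 3, 3x3", conv2d_params(128, 3, 3), 3_459),
    ("GroupNorm 2-4:  512 channels", groupnorm_params(512), 1_024),
    ("GroupNorm 2-10: 128 channels", groupnorm_params(128), 256),
]

for label, computed, reported in checks:
    status = "ok" if computed == reported else "MISMATCH"
    print(f"{label:30s} {computed:>7,d}  {status}")
```

Every row comes out as "ok", which supports the assumed kernel sizes. The summary also shows that only three of the four DownEncoderBlock2D blocks halve the resolution (64 -> 32 -> 16 -> 8 -> 8), giving the overall spatial factor of 8.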
Article author:Julyfun