Posts with tag `diffusion-models-class`

Stable Diffusion Pipeline

graph
    text --> t(tokenizer & text_encoder) --> e[text_embedding]
    e --> unet(unet)
    n["noisy_latents (4, 64, 64)"] --> unet
    timestep --> unet
    unet --> p["noise prediction (4, 64, 64)"]
    p --> v_d(vae decoder)

Components of pipe: unet, vae, text_encoder, image_encoder, feature_extractor, tokenizer (not an nn.Module; it has no parameters), scheduler, safety_checker.

What is the UNet's job? Put simply, the UNet predicts in the VAE's latent space. It takes the (4, 64, 64) noisy latents and outputs a (4, 64, 64) noise prediction, and the VAE decoder is what maps latents back to pixels.

AutoencoderKL architecture (input [1, 3, 64, 64]):

Layer (type:depth-idx)                   Output Shape          Param #
AutoencoderKL                            [1, 3, 64, 64]        --
├─Encoder: 1-1                           [1, 8, 8, 8]          --
│    └─Conv2d: 2-1                       [1, 128, 64, 64]      3,584
│    └─ModuleList: 2-2                   --                    --
│    │    └─DownEncoderBlock2D: 3-1      [1, 128, 32, 32]      738,944
│    │    └─DownEncoderBlock2D: 3-2      [1, 256, 16, 16]      2,690,304
│    │    └─DownEncoderBlock2D: 3-3      [1, 512, 8, 8]        10,754,560
│    │    └─DownEncoderBlock2D: 3-4      [1, 512, 8, 8]        9,443,328
│    └─UNetMidBlock2D: 2-3               [1, 512, 8, 8]        --
│    │    └─ModuleList: 3-7              --                    (recursive)
│    │    └─ModuleList: 3-6              --                    1,051,648
│    │    └─ModuleList: 3-7              --                    (recursive)
│    └─GroupNorm: 2-4                    [1, 512, 8, 8]        1,024
│    └─SiLU: 2-5                         [1, 512, 8, 8]        --
│    └─Conv2d: 2-6                       [1, 8, 8, 8]          36,872
├─Conv2d: 1-2                            [1, 8, 8, 8]          72
├─Conv2d: 1-3                            [1, 4, 8, 8]          20
├─Decoder: 1-4                           [1, 3, 64, 64]        --
│    └─Conv2d: 2-7                       [1, 512, 8, 8]        18,944
│    └─UNetMidBlock2D: 2-8               [1, 512, 8, 8]        --
│    │    └─ModuleList: 3-10             --                    (recursive)
│    │    └─ModuleList: 3-9              --                    1,051,648
│    │    └─ModuleList: 3-10             --                    (recursive)
│    └─ModuleList: 2-9                   --                    --
│    │    └─UpDecoderBlock2D: 3-11       [1, 512, 16, 16]      16,524,800
│    │    └─UpDecoderBlock2D: 3-12       [1, 512, 32, 32]      16,524,800
│    │    └─UpDecoderBlock2D: 3-13       [1, 256, 64, 64]      4,855,296
│    │    └─UpDecoderBlock2D: 3-14       [1, 128, 64, 64]      1,067,648
│    └─GroupNorm: 2-10                   [1, 128, 64, 64]      256
│    └─SiLU: 2-11                        [1, 128, 64, 64]      --
│    └─Conv2d: 2-12                      [1, 3, 64, 64]        3,45
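The parameter counts of the plain layers in the AutoencoderKL summary can be checked by hand. A Conv2d with bias has in·out·k² + out parameters, and an affine GroupNorm has 2·channels. The plain-Python sketch below recomputes a few rows and the latent shape. The kernel sizes (3×3 for the encoder/decoder convs, 1×1 for the two standalone convs) and the 8× downsampling to 4 latent channels are assumptions inferred from the shapes and counts in the table. The 512×512 case assumes the usual Stable Diffusion v1 image resolution.

```python
def conv2d_params(in_ch: int, out_ch: int, k: int, bias: bool = True) -> int:
    """Weights (one k x k kernel per input/output channel pair) plus one bias per output channel."""
    return in_ch * out_ch * k * k + (out_ch if bias else 0)


def groupnorm_params(num_channels: int) -> int:
    """An affine GroupNorm learns one scale and one shift per channel."""
    return 2 * num_channels


def latent_shape(height: int, width: int, factor: int = 8, latent_ch: int = 4) -> tuple:
    """Latent shape (C, H, W), assuming `factor`x spatial downsampling and `latent_ch` channels."""
    if height % factor or width % factor:
        raise ValueError("image size must be divisible by the downsampling factor")
    return (latent_ch, height // factor, width // factor)


# A few rows of the AutoencoderKL summary, recomputed by hand.
rows = [
    ("Encoder Conv2d 2-1 (3 -> 128, 3x3)", conv2d_params(3, 128, 3)),
    ("Encoder GroupNorm 2-4 (512 ch)", groupnorm_params(512)),
    ("Encoder Conv2d 2-6 (512 -> 8, 3x3)", conv2d_params(512, 8, 3)),
    ("Conv2d 1-2 (8 -> 8, 1x1)", conv2d_params(8, 8, 1)),
    ("Conv2d 1-3 (4 -> 4, 1x1)", conv2d_params(4, 4, 1)),
    ("Decoder Conv2d 2-7 (4 -> 512, 3x3)", conv2d_params(4, 512, 3)),
    ("Decoder GroupNorm 2-10 (128 ch)", groupnorm_params(128)),
]
for name, n in rows:
    print(f"{name}: {n:,}")  # 3,584 / 1,024 / 36,872 / 72 / 20 / 18,944 / 256

print(latent_shape(64, 64))    # (4, 8, 8): the [1, 4, 8, 8] row for a 64x64 input
print(latent_shape(512, 512))  # (4, 64, 64): the noisy_latents shape in the graph
```

The encoder ends with 8 channels because AutoencoderKL predicts a mean and a log-variance for each of the 4 latent channels. Only a 4-channel sample reaches Conv2d 1-3 and the decoder.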

unit2-02_class_conditioned_diffusion_model_example

2025-06-10

diffusion-models-class · julyfunnotes · 技术学习 (tech learning)

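Class-conditioned means conditioned on the class, i.e. class-label-conditioned. How does the network input change? It is simply a concat. The UNet's input channels become in_channels=1 + class_emb_size:

UNet2DModel(in_channels=1 + class_emb_size, ...)

In forward, the class embedding is broadcast over the spatial dimensions and concatenated onto x with torch.cat:

def forward(self, x, t, class_labels):
    bs, ch, w, h = x.shape
    # self.class_emb = nn.Embedding(num_classes, class_emb_size), defined in __init__
    class_cond = self.class_emb(class_labels)  # (bs, class_emb_size)
    # Broadcast to (bs, class_emb_size, w, h) so it can be stacked with x as extra channels
    class_cond = class_cond.view(bs, class_cond.shape[1], 1, 1).expand(bs, class_cond.shape[1], w, h)
    net_input = torch.cat((x, class_cond), dim=1)  # (bs, 1 + class_emb_size, w, h)
    # The model returns a ModelOutput.
    # sample: the predicted noise tensor.
    # additional_residuals: stores extra residual information; usually unused.
    return self.model(net_input, t).sample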
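The broadcast-and-concat step can be checked in isolation. Below is a minimal NumPy sketch that needs neither torch nor diffusers. Indexing a random table with `class_labels` stands in for `nn.Embedding`, and `reshape`, `np.broadcast_to` and `np.concatenate` play the roles of `view`, `expand` and `torch.cat`. The sizes (10 classes, embedding size 4, a batch of 8 single-channel 28×28 images) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

num_classes, class_emb_size = 10, 4  # hypothetical sizes for illustration
bs, ch, w, h = 8, 1, 28, 28          # a batch of 1-channel 28x28 images

# Stand-in for nn.Embedding: a (num_classes, class_emb_size) lookup table.
class_emb_table = rng.standard_normal((num_classes, class_emb_size))

x = rng.standard_normal((bs, ch, w, h))
class_labels = rng.integers(0, num_classes, size=bs)


def class_conditioned_input(x, class_labels, table):
    bs, ch, w, h = x.shape
    class_cond = table[class_labels]                         # (bs, E), like self.class_emb(class_labels)
    emb = class_cond.shape[1]
    class_cond = class_cond.reshape(bs, emb, 1, 1)           # like .view(bs, E, 1, 1)
    class_cond = np.broadcast_to(class_cond, (bs, emb, w, h))  # like .expand(bs, E, w, h)
    return np.concatenate((x, class_cond), axis=1)           # like torch.cat(..., dim=1)


net_input = class_conditioned_input(x, class_labels, class_emb_table)
print(net_input.shape)  # (8, 5, 28, 28): channels = 1 + class_emb_size
```

Like `expand`, `np.broadcast_to` returns a view rather than a copy, so memory is only allocated by the concatenation. Every pixel of the extra channels carries the same embedding vector, which is why the UNet's first layer must accept `1 + class_emb_size` input channels.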

unit2-01_finetuning_and_guidance

2025-06-10

diffusion-models-class · julyfunnotes · 技术学习 (tech learning)

Generating process:

import torch
from tqdm.auto import tqdm

# Assumes device, scheduler and image_pipe are already set up.
x = torch.randn(4, 3, 256, 256).to(device)
for i, t in tqdm(enumerate(scheduler.timesteps)):
    model_input = scheduler.scale_model_input(x, t)
    with torch.no_grad():
        noise_pred = image_pipe.unet(model_input, t)["sample"]
    x = scheduler.step(noise_pred, t, sample=x).prev_sample

Guidance:

x = torch.randn(4, 3, 256, 256).to(device)
for i, t in tqdm(enumerate(scheduler.timesteps)):
    x = x.detach().requires_grad_()
    model_input = scheduler.scale_model_input(x, t)
    noise_pred = image_pipe.unet(model_input, t)["sample"]
    x0 = scheduler.step(noise_pred, t, x).pred_original_sample
    # custom_loss and guidance_loss_scale are placeholders for your own loss function and weight
    loss = custom_loss(x0) * guidance_loss_scale
    cond_grad = -torch.autograd.grad(loss, x)[0]
    x = x.detach() + cond_grad
    x = scheduler.step(noise_pred, t, x).prev_sample

CLIP Guidance:

# Assumes clip_model, text, n_cuts and guidance_scale are defined; clip_loss is a placeholder for your own CLIP-based loss.
with torch.no_grad():
    text_features = clip_model.encode_text(text)

x = torch.randn(4, 3, 256, 256).to(device)
for i, t in tqdm(enumerate(scheduler.timesteps)):
    # print(i, t) shows i counting up from 0 while t counts down
    model_input = scheduler.scale_model_input(x, t)  # scheduler is a DDIM scheduler
    with torch.no_grad():
        # image_pipe is loaded from the same model name
        noise_pred = image_pipe.unet(model_input, t)["sample"]
    cond_grad = 0
    for cut in range(n_cuts):
        x = x.detach().requires_grad_()
        x0 = scheduler.step(noise_pred, t, sample=x).pred_original_sample
        loss = clip_loss(x0, text_features) * guidance_scale
        cond_grad -= torch.autograd.grad(loss, x)[0] / n_cuts
    if i % 25 == 0:
        print(f"Step {i} loss: {loss.item()}")
    alpha_bar = scheduler.alphas_cumprod[i]
    # `alpha_bar` here is decreasing and works for textures.
    # It can be replaced with increasing coefficients!
    x = x.detach() + cond_grad * alpha_bar.sqrt()
    x = scheduler.step(noise_pred, t, x).prev_sample
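The guidance update is easier to see without a UNet. Below is a minimal NumPy sketch of one guided step under stated assumptions:

- noise_pred is a random stand-in held fixed, as in the CLIP variant where the UNet runs under torch.no_grad().
- `pred_original_sample` is taken to be the standard epsilon-prediction estimate x̂₀ = (x_t − √(1−ᾱ_t)·ε)/√ᾱ_t.
- `prev_sample` is a deterministic DDIM step (η = 0).
- The loss is a hypothetical mean-squared distance to a target value.

Because x̂₀ is linear in x_t, the gradient that torch.autograd.grad would return can be written by hand with the chain rule.

```python
import numpy as np


def predict_x0(x_t, eps, alpha_bar):
    """x0 estimate for an epsilon-predicting model (assumed to match pred_original_sample)."""
    return (x_t - np.sqrt(1.0 - alpha_bar) * eps) / np.sqrt(alpha_bar)


def ddim_prev_sample(x_t, eps, alpha_bar, alpha_bar_prev):
    """Deterministic DDIM update (eta = 0) toward the previous timestep."""
    x0 = predict_x0(x_t, eps, alpha_bar)
    return np.sqrt(alpha_bar_prev) * x0 + np.sqrt(1.0 - alpha_bar_prev) * eps


def guidance_loss(x0, target, scale):
    """Hypothetical stand-in for custom_loss: scaled mean squared distance of x0 to a target value."""
    return scale * np.mean((x0 - target) ** 2)


def guidance_grad(x_t, eps, alpha_bar, target, scale):
    """d loss / d x_t with eps held fixed (as when noise_pred is computed under no_grad)."""
    x0 = predict_x0(x_t, eps, alpha_bar)
    dloss_dx0 = scale * 2.0 * (x0 - target) / x0.size
    return dloss_dx0 / np.sqrt(alpha_bar)  # chain rule: d x0 / d x_t = 1 / sqrt(alpha_bar)


rng = np.random.default_rng(0)
x_t = rng.standard_normal((1, 3, 8, 8))
eps = rng.standard_normal(x_t.shape)    # pretend this came from the UNet
alpha_bar, alpha_bar_prev = 0.5, 0.6    # hypothetical schedule values at t and the previous step
target, scale = 0.7, 10.0               # hypothetical guidance target and loss scale

cond_grad = -guidance_grad(x_t, eps, alpha_bar, target, scale)
x_guided = x_t + cond_grad                                      # x = x.detach() + cond_grad
x_prev = ddim_prev_sample(x_guided, eps, alpha_bar, alpha_bar_prev)  # scheduler.step(...).prev_sample

before = guidance_loss(predict_x0(x_t, eps, alpha_bar), target, scale)
after = guidance_loss(predict_x0(x_guided, eps, alpha_bar), target, scale)
print(f"loss before: {before:.4f}, after: {after:.4f}")
```

In the plain Guidance loop, `x` requires grad before the UNet call, so autograd also differentiates through the UNet. This sketch only covers the path through the x̂₀ formula, which is all the CLIP variant uses.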


unit4-01_ddim_inversion

unit3-01_stable_diffusion_introduction

unit2-02_class_conditioned_diffusion_model_example

unit2-01_finetuning_and_guidance
