Posts with tag diffusion-models-class

unit2-02_class_conditioned_diffusion_model_example

2025-06-10
diffusion-models-class, julyfunnotes, technical learning

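Class-conditioned means conditioned on a class, or more precisely, on a class label. How does the network input change? It is simply a concatenation. The UNet's input channels become in_channels=1 + class_emb_size, and forward just broadcasts the class embedding over the image and calls torch.cat.

def forward(self, x, t, class_labels):
    bs, ch, h, w = x.shape
    # Defined in __init__: self.class_emb = nn.Embedding(num_classes, class_emb_size)
    class_cond = self.class_emb(class_labels)  # (bs, class_emb_size)
    # Broadcast each embedding over the spatial dims: (bs, class_emb_size, h, w)
    class_cond = class_cond.view(bs, class_cond.shape[1], 1, 1).expand(bs, class_cond.shape[1], h, w)
    # Concatenate along the channel dim: (bs, 1 + class_emb_size, h, w)
    net_input = torch.cat((x, class_cond), dim=1)
    # The model returns a ModelOutput:
    #   sample: the predicted noise tensor.
    #   additional_residuals: extra residual information; usually unused.
    return self.model(net_input, t).sample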
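The broadcast-and-concat step can be checked on its own. Below is a minimal NumPy sketch that mirrors the torch calls one to one:

- `emb_table[class_labels]` plays the role of `nn.Embedding`.
- `reshape` plays the role of `.view`.
- `np.broadcast_to` plays the role of `.expand`.
- `np.concatenate(..., axis=1)` plays the role of `torch.cat(..., dim=1)`.

The function name `class_conditioned_input` is hypothetical, and so are the sizes (10 classes, a 4-dim embedding, 28×28 single-channel images). They are chosen only for illustration.

```python
import numpy as np


def class_conditioned_input(x, class_labels, emb_table):
    """Append a per-class embedding to x as extra channels (hypothetical helper).

    x:            (bs, ch, h, w) noisy images
    class_labels: (bs,) integer labels
    emb_table:    (num_classes, class_emb_size), stands in for nn.Embedding weights
    """
    bs, ch, h, w = x.shape
    class_cond = emb_table[class_labels]                        # lookup -> (bs, emb)
    emb = class_cond.shape[1]
    class_cond = class_cond.reshape(bs, emb, 1, 1)              # like .view(bs, emb, 1, 1)
    class_cond = np.broadcast_to(class_cond, (bs, emb, h, w))   # like .expand(bs, emb, h, w)
    return np.concatenate((x, class_cond), axis=1)              # like torch.cat(..., dim=1)


# Illustrative sizes: 10 classes, 4-dim class embedding, 1-channel 28x28 images
rng = np.random.default_rng(0)
num_classes, class_emb_size = 10, 4
emb_table = rng.normal(size=(num_classes, class_emb_size))
x = rng.normal(size=(8, 1, 28, 28))
labels = np.arange(8) % num_classes
net_input = class_conditioned_input(x, labels, emb_table)
print(net_input.shape)  # (8, 5, 28, 28): 1 image channel + 4 class channels
```

Only the channel count changes and the spatial size is untouched. That is why the UNet only needs in_channels=1 + class_emb_size.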

unit2-01_finetuning_and_guidance

2025-06-10
diffusion-models-class, julyfunnotes, technical learning

Generating process:

import torch
from tqdm.auto import tqdm

# Assumes image_pipe, scheduler (DDIM) and device were set up earlier.
x = torch.randn(4, 3, 256, 256).to(device)
for i, t in tqdm(enumerate(scheduler.timesteps)):
    model_input = scheduler.scale_model_input(x, t)
    with torch.no_grad():
        noise_pred = image_pipe.unet(model_input, t)["sample"]
    x = scheduler.step(noise_pred, t, sample=x).prev_sample

Guidance:

x = torch.randn(4, 3, 256, 256).to(device)
for i, t in tqdm(enumerate(scheduler.timesteps)):
    x = x.detach().requires_grad_()
    model_input = scheduler.scale_model_input(x, t)
    # No torch.no_grad() here: the gradient also flows back through the UNet.
    noise_pred = image_pipe.unet(model_input, t)["sample"]
    x0 = scheduler.step(noise_pred, t, x).pred_original_sample
    # custom_loss: placeholder for any differentiable loss on the predicted x0
    loss = custom_loss(x0) * guidance_loss_scale
    cond_grad = -torch.autograd.grad(loss, x)[0]
    x = x.detach() + cond_grad
    x = scheduler.step(noise_pred, t, x).prev_sample

CLIP guidance:

x = torch.randn(4, 3, 256, 256).to(device)
with torch.no_grad():
    text_features = clip_model.encode_text(text)
for i, t in tqdm(enumerate(scheduler.timesteps)):
    # print(i, t) shows i counting up from 0 while t counts down
    model_input = scheduler.scale_model_input(x, t)  # scheduler is a DDIMScheduler
    with torch.no_grad():  # image_pipe: the same pipeline loaded earlier
        noise_pred = image_pipe.unet(model_input, t)["sample"]
    cond_grad = 0
    for cut in range(n_cuts):
        x = x.detach().requires_grad_()
        x0 = scheduler.step(noise_pred, t, sample=x).pred_original_sample
        # clip_loss: placeholder for a CLIP loss between x0 and text_features
        loss = clip_loss(x0, text_features) * guidance_scale
        # Average the gradient over n_cuts calls (clip_loss presumably uses random cutouts)
        cond_grad -= torch.autograd.grad(loss, x)[0] / n_cuts
    if i % 25 == 0:
        print(f"Step {i} loss: {loss.item()}")
    alpha_bar = scheduler.alphas_cumprod[i]
    # alpha_bar (indexed by i, not t) decreases over the loop; this works for textures.
    # An increasing coefficient can be used instead.
    x = x.detach() + cond_grad * alpha_bar.sqrt()
    x = scheduler.step(noise_pred, t, x).prev_sample
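The guidance loop can be tried end to end without downloading a model. Below is a toy NumPy sketch built on loud assumptions, none of which come from the post:

- The "data" pixels are i.i.d. N(0, 1). The hypothetical `toy_unet` then returns the exact noise prediction sqrt(1 - alpha_bar_t) * x_t.
- `ddim_step` is a hand-written deterministic DDIM update (eta = 0) that stands in for `scheduler.step`.
- The schedule is the linear DDPM one (betas from 1e-4 to 0.02) with 50 evenly spaced steps.
- The guidance loss is a mean squared distance to a target value, standing in for `custom_loss`.

As in the CLIP variant, `noise_pred` is held fixed, so the gradient only flows through the x0 formula. That makes it analytic: d x0 / d x_t = 1 / sqrt(alpha_bar_t).

```python
import numpy as np

# Toy assumptions: N(0, 1) data, linear DDPM betas, 50 DDIM steps.
NUM_TRAIN_STEPS, STEP = 1000, 20
betas = np.linspace(1e-4, 0.02, NUM_TRAIN_STEPS)
alphas_cumprod = np.cumprod(1.0 - betas)                 # alpha_bar_t
timesteps = np.arange(0, NUM_TRAIN_STEPS, STEP)[::-1]    # 980, 960, ..., 0


def toy_unet(x, t):
    """Hypothetical noise predictor, exact for N(0, 1) data (stands in for image_pipe.unet)."""
    return np.sqrt(1.0 - alphas_cumprod[t]) * x


def ddim_step(noise_pred, t, x):
    """Deterministic DDIM update (eta = 0). Returns (prev_sample, pred_original_sample)."""
    a = alphas_cumprod[t]
    a_prev = alphas_cumprod[t - STEP] if t - STEP >= 0 else 1.0
    x0 = (x - np.sqrt(1.0 - a) * noise_pred) / np.sqrt(a)
    prev = np.sqrt(a_prev) * x0 + np.sqrt(1.0 - a_prev) * noise_pred
    return prev, x0


def sample(x, target=0.0, guidance_scale=0.0):
    for t in timesteps:
        # Computed once and held fixed for this step, like the torch.no_grad() call.
        noise_pred = toy_unet(x, t)
        if guidance_scale > 0:
            _, x0 = ddim_step(noise_pred, t, x)
            # loss = guidance_scale * mean((x0 - target) ** 2); with noise_pred fixed,
            # d x0 / d x = 1 / sqrt(alpha_bar_t), so by the chain rule:
            grad = guidance_scale * 2.0 * (x0 - target) / x0.size / np.sqrt(alphas_cumprod[t])
            # Step against the gradient, rescaled by sqrt(alpha_bar_t) (indexed by t here).
            x = x - grad * np.sqrt(alphas_cumprod[t])
        x, _ = ddim_step(noise_pred, t, x)  # same noise_pred, nudged x
    return x


def mse_to(img, target):
    return float(np.mean((img - target) ** 2))


rng = np.random.default_rng(0)
x_T = rng.standard_normal((2, 3, 16, 16))
target = 0.5
guidance_scale = 0.2 * x_T.size  # hypothetical value; offsets the 1/N of the mean

plain = sample(x_T)
guided = sample(x_T, target=target, guidance_scale=guidance_scale)
print(mse_to(plain, target), mse_to(guided, target))  # expect the guided value to be much lower
```

The `* sqrt(alphas_cumprod[t])` factor cancels the 1 / sqrt(alpha_bar_t) from the chain rule. As a result, the very noisy early steps do not receive huge updates. The CLIP snippet uses the same kind of factor but indexes `alphas_cumprod` with the loop counter `i` rather than `t`.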
