Lab4 - How to?

The 5 steps [PyTorch]

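opt zero: clear the gradients left over from the previous batch
forward: run the model on the batch
loss: compute the loss from the model outputs
backward: backpropagate the loss to get gradients
opt step: update the parameters with those gradients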

for batch_idx, (data, _) in enumerate(train_loader):
    data = data.to(device)
    optimizer_vae.zero_grad()                        # opt zero
    recon_batch, mu, logvar = vae(data)              # forward
    loss = vae_loss(recon_batch, data, mu, logvar)   # loss
    loss.backward()                                  # backward
    optimizer_vae.step()                             # opt step
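
The loop above calls `vae_loss` without showing it. A minimal sketch of what such a function commonly looks like, assuming a sigmoid decoder output in [0, 1] and a diagonal Gaussian posterior; the `kl_weight` parameter is a hypothetical addition, not part of the lab code, and is only there to make it easy to switch the KL term off:

```python
import torch
import torch.nn.functional as F


def vae_loss(recon_x, x, mu, logvar, kl_weight=1.0):
    """Reconstruction loss plus KL divergence to N(0, I), summed over the batch."""
    # Assumes recon_x and x hold values in [0, 1] (e.g. sigmoid output, MNIST pixels).
    recon = F.binary_cross_entropy(recon_x, x, reduction="sum")
    # Closed-form KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    # kl_weight is a hypothetical knob; kl_weight=0.0 drops the KL term entirely.
    return recon + kl_weight * kl
```

Calling it with `kl_weight=0.0` is one way to run the "remove the KL term" experiment from the VAE discussion without editing the loop.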

💡 Discussion: VAE

Question 1: Why do the generated digits from the VAE look somewhat blurry compared to real MNIST images?

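The samples land in between the features of different digits. The latent space is continuous, so a sampled code often sits between digit classes, and the pixel-wise reconstruction loss pushes the decoder toward an average of the plausible images, which looks blurry.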

Question 2: What would happen if we remove the KL-divergence term from the loss?

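Let me try it. The generated results were very bad. Without the KL term nothing pulls the encoder's latent codes toward the N(0, I) prior, so codes sampled from that prior decode poorly.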

💡 Discussion: GAN

Question 1: Why do GANs generate sharper images than VAEs?

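Because the adversarial loss pushes the generator hard toward high-frequency detail, which is what the discriminator uses to tell real from fake. The VAE instead has a pixel-wise reconstruction loss, which averages over plausible outputs, plus the KL term, which limits how much detail the latent code can carry: encoding more fine detail costs more KL.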

Question 2: What is “Mode Collapse”? (You might even observe it in your output grid)

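The generator learns only a few of the styles (modes) in the data. The discriminator judges each sample on its own and never judges diversity, so nothing penalizes the generator for repeating the same few outputs.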

💡 Discussion: Diffusion Models

Question 1: What exactly does num_inference_steps do?

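It sets how many denoising steps are used at generation time, and therefore how far each single step has to move: fewer steps means bigger jumps per step (faster, usually lower quality), more steps means smaller jumps.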
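
To make the "how far each step goes" idea concrete, here is a simplified sketch of how a sampler might pick its timesteps from the training schedule. The function name `inference_timesteps` and the even spacing are assumptions for illustration; real schedulers may space the steps differently.

```python
def inference_timesteps(num_train_timesteps, num_inference_steps):
    """Pick evenly spaced timesteps from the training schedule, noisiest first."""
    if not 1 <= num_inference_steps <= num_train_timesteps:
        raise ValueError("num_inference_steps must be in [1, num_train_timesteps]")
    # The stride is how many training timesteps one inference step jumps over.
    stride = num_train_timesteps // num_inference_steps
    return [i * stride for i in range(num_inference_steps)][::-1]


print(inference_timesteps(1000, 5))  # [800, 600, 400, 200, 0]
```

With 1000 training timesteps, 5 inference steps jump 200 timesteps at a time, while 50 inference steps jump only 20.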

Question 2: What is guidance_scale (Classifier-Free Guidance / CFG)?

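It is essentially the weight on the prompt: it amplifies the conditioning signal. How classifier-free guidance works: at each step the model makes two predictions, one unconditional (uncond) and one prompt-conditioned, and the direction the diffusion step takes is a weighted combination of the two. The classifier-based method (classifier guidance) instead steers sampling with the gradient of an external classifier.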
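
The weighted combination in classifier-free guidance is usually written as the unconditional prediction plus a scaled difference. A minimal sketch, assuming both noise predictions are already available as tensors of the same shape; `cfg_combine` is a hypothetical helper name, not a library function:

```python
import torch


def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """Push the unconditional prediction toward the conditional one."""
    # guidance_scale = 0 gives the unconditional prediction, 1 gives the
    # conditional one, and values above 1 extrapolate past it.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)


eps_uncond = torch.zeros(1, 4, 8, 8)  # stand-in for the model output with an empty prompt
eps_cond = torch.ones(1, 4, 8, 8)     # stand-in for the model output with the real prompt
eps_guided = cfg_combine(eps_uncond, eps_cond, guidance_scale=7.5)
```

In a real pipeline the two predictions would come from running the same denoiser twice, once with the prompt embedding and once with an empty one.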

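\(\epsilon_{\text{guided}}\): the guided noise prediction, used for the reverse sampling update.
\(\epsilon_\theta(x_t,t)\): the raw noise prediction of the diffusion model (parameters \(\theta\)) at timestep \(t\) for the current noisy image \(x_t\).
\(x_t\): the noisy image at step \(t\) (latent or pixel space, depending on the model).
\(t\): the diffusion timestep.
\(s\): guidance strength / scale, a hyperparameter.
\(\sigma_t\): the noise scale at timestep \(t\) (set by the noise scheduler).
\(\nabla_{x_t}\): the gradient with respect to \(x_t\).
\(\log p_\phi(y|x_t,t)\): the log-probability the classifier (parameters \(\phi\)) assigns to the target class \(y\) given \(x_t, t\).
\(p_\phi(y|x_t,t)\): the class posterior probability output by the classifier.
\(y\): the target class label the generated result should match.
\(\phi\): parameters of the external classifier.
\(\theta\): parameters of the diffusion model.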
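
The update that ties these symbols together is not written out in these notes. Judging from the symbol list and from the line `eps_guided = eps - s * sigma_t * grad` in the code below, it is presumably:

```latex
\[
\epsilon_{\text{guided}}
  = \epsilon_\theta(x_t, t)
  - s \, \sigma_t \, \nabla_{x_t} \log p_\phi(y \mid x_t, t)
\]
```

A larger \(s\) moves the noise prediction further in the direction that raises the classifier's log-probability for class \(y\).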

import torch
import torch.nn.functional as F

# unet, classifier and ddpm_step are assumed to be defined elsewhere.
# x_t: [B, C, H, W]
# t:   [B] or scalar timestep
# y:   [B] target class ids
# s: guidance scale (float)
# sigma_t: noise scale at timestep t (float or tensor broadcastable to x_t)

# 1) Track gradients with respect to x_t
x_t = x_t.detach().requires_grad_(True)

# 2) The diffusion model gives the raw noise prediction
with torch.no_grad():
    eps = unet(x_t, t)  # epsilon_theta(x_t, t)

# 3) The external classifier gives class scores (needs gradients)
logits = classifier(x_t, t)              # [B, num_classes]
log_probs = F.log_softmax(logits, dim=-1)
# log p of the target class for each sample
selected = log_probs[torch.arange(x_t.size(0), device=x_t.device), y]
obj = selected.sum()  # scalar, convenient for autograd

# 4) grad = ∇_{x_t} log p(y|x_t,t)
grad = torch.autograd.grad(obj, x_t)[0]

# 5) Shift the noise prediction along the gradient and take one reverse step
eps_guided = eps - s * sigma_t * grad
with torch.no_grad():
    x_prev = ddpm_step(x_t.detach(), eps_guided.detach(), t)

x_t = x_prev