Lab4

Jan 1, 2025
A quick look at deep learning
Six steps

  • data to device
  • zero optimizer grads
  • forward pass
  • compute loss
  • backward pass
  • optimizer step
```python
for batch_idx, (data, _) in enumerate(train_loader):
    data = data.to(device)                          # 1. data to device
    optimizer_vae.zero_grad()                       # 2. zero gradients
    recon_batch, mu, logvar = vae(data)             # 3. forward pass
    loss = vae_loss(recon_batch, data, mu, logvar)  # 4. compute loss
    loss.backward()                                 # 5. backward pass
    optimizer_vae.step()                            # 6. optimizer step
```
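The loop assumes a `vae_loss` that computes the negative ELBO. A minimal sketch, assuming the usual BCE-reconstruction-plus-KL form with `sum` reduction (the lab's actual definition may differ):

```python
import torch
import torch.nn.functional as F

def vae_loss(recon_x, x, mu, logvar):
    # Reconstruction term: per-pixel binary cross-entropy, summed over the batch.
    bce = F.binary_cross_entropy(recon_x, x, reduction="sum")
    # KL(q(z|x) || N(0, I)) in closed form for a diagonal Gaussian posterior.
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return bce + kld
```

With `mu = 0` and `logvar = 0` the KL term vanishes, so the loss reduces to pure reconstruction error.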

💡 Discussion: VAE

Question 1: Why do the generated digits from the VAE look somewhat blurry compared to real MNIST images?

The generated features land in between the features of different digits: the pixel-wise reconstruction loss averages over plausible outputs, so samples blur across digit modes.

Question 2: What would happen if we remove the KL-divergence term from the loss?

Let me try it: removing the KL term produces very poor samples. Without the KL regularizer, the encoder is free to scatter q(z|x) anywhere in latent space, so sampling z ~ N(0, I) at generation time lands in regions the decoder never saw during training.

💡 Discussion: GAN

Question 1: Why do GANs generate sharper images than VAEs?

Because the adversarial loss directly rewards samples the discriminator cannot tell apart from real images, which forces the generator to reproduce sharp high-frequency detail. The VAE instead optimizes a pixel-wise reconstruction term (regularized by KL), which averages over plausible outputs and smooths high frequencies away.

Question 2: What is “Mode Collapse”? (You might even observe it in your output grid)

The generator only learns a few of the modes. The discriminator does not judge diversity: it scores samples one at a time, so producing a single convincing mode is enough to fool it.


💡 Discussion: Diffusion Models

Question 1: What exactly does num_inference_steps do?

It controls how many denoising steps the sampler takes during generation: more steps means smaller, more accurate denoising updates (and slower sampling); fewer steps is faster but can degrade quality.
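For intuition, a scheduler typically selects an evenly spaced subset of the training timesteps and visits them from high noise to low. A minimal sketch, assuming 1000 training timesteps (the exact spacing rule varies by scheduler):

```python
def inference_timesteps(num_inference_steps, num_train_timesteps=1000):
    # Pick an evenly spaced subset of the training timesteps,
    # returned high -> low, the order the sampler visits them.
    step = num_train_timesteps // num_inference_steps
    return list(range(0, num_train_timesteps, step))[::-1]
```

So `num_inference_steps=50` means each denoising update has to span 20 training timesteps at once.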

Question 2: What is guidance_scale (Classifier-Free Guidance / CFG)?

It is the weight given to the prompt (it amplifies the conditional signal). How classifier-free guidance works: at each step, make two noise predictions, one unconditional and one prompt-conditioned, then move the diffusion along a weighted combination of the two. The classifier-guided alternative: use the gradient of an external classifier to steer sampling.
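The two-prediction combination can be sketched as follows (the function name `cfg_combine` is illustrative; `s` plays the role of `guidance_scale`):

```python
def cfg_combine(eps_uncond, eps_cond, s):
    # Move s times further in the conditional direction than the
    # unconditional prediction: s = 1 recovers plain conditional
    # sampling, s > 1 amplifies the prompt signal.
    return eps_uncond + s * (eps_cond - eps_uncond)
```

At `s = 0` the prompt is ignored entirely; very large `s` tends to oversaturate and distort samples.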

\[
\epsilon_{\text{guided}} = \epsilon_\theta(x_t, t) - s\,\sigma_t\,\nabla_{x_t} \log p_\phi(y \mid x_t, t)
\]

  • (\epsilon_{\text{guided}}): the guided noise prediction, used for the reverse-sampling update.
  • (\epsilon_\theta(x_t,t)): the diffusion model's (parameters (\theta)) raw noise prediction for the current noisy image (x_t) at timestep (t).
  • (x_t): the noisy image at step (t) (latent or pixel space, depending on the model).
  • (t): the diffusion timestep.
  • (s): guidance strength / scale, a hyperparameter.
  • (\sigma_t): the noise scale at timestep (t) (set by the noise scheduler).
  • (\nabla_{x_t}): gradient with respect to (x_t).
  • (\log p_\phi(y|x_t,t)): the classifier's (parameters (\phi)) log-probability of the target class (y) given (x_t,t).
  • (p_\phi(y|x_t,t)): the class posterior output by the classifier.
  • (y): the target class label you want the generated sample to match.
  • (\phi): external classifier parameters.
  • (\theta): diffusion model parameters.
```python
# x_t: [B, C, H, W]
# t: [B] or scalar timestep
# y: [B] target class ids
# s: guidance scale (float)
# sigma_t: noise scale at timestep t (float or tensor broadcastable to x_t)

x_t = x_t.detach().requires_grad_(True)

# 1) raw noise prediction from the diffusion model
with torch.no_grad():
    eps = unet(x_t, t)  # epsilon_theta(x_t, t)

# 2) class scores from the external classifier (needs gradients)
logits = classifier(x_t, t)  # [B, num_classes]
log_probs = F.log_softmax(logits, dim=-1)

# log p of the target class for each sample
selected = log_probs[torch.arange(x_t.size(0), device=x_t.device), y]
obj = selected.sum()  # scalar, convenient for autograd

# 3) grad = ∇_{x_t} log p(y|x_t, t)
grad = torch.autograd.grad(obj, x_t, only_inputs=True)[0]

# 4) shift the noise prediction along the classifier gradient
eps_guided = eps - s * sigma_t * grad

with torch.no_grad():
    x_prev = ddpm_step(x_t.detach(), eps_guided.detach(), t)

x_t = x_prev
```
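Wrapped as a function, the guidance computation above can be exercised with stub networks; `ddpm_step` is left out since it depends on the chosen scheduler (a sketch, not the lab's exact code):

```python
import torch
import torch.nn.functional as F

def classifier_guided_eps(unet, classifier, x_t, t, y, s, sigma_t):
    # Returns epsilon_guided = eps_theta - s * sigma_t * grad log p(y|x_t, t).
    x_t = x_t.detach().requires_grad_(True)
    with torch.no_grad():
        eps = unet(x_t, t)                      # raw noise prediction
    logits = classifier(x_t, t)                 # [B, num_classes], differentiable in x_t
    log_probs = F.log_softmax(logits, dim=-1)
    selected = log_probs[torch.arange(x_t.size(0), device=x_t.device), y]
    grad = torch.autograd.grad(selected.sum(), x_t, only_inputs=True)[0]
    return eps - s * sigma_t * grad
```

Any callables with the right shapes work as `unet` and `classifier`, which makes the function easy to smoke-test before plugging in real models.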
Article author: Julyfun