7.3. NiN - how to

Notes on convolutional layers

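Note that the input channels and output channels are fully connected. Concretely:

With n input channels and m output channels, the layer holds n * m two-dimensional kernels, one per (input channel, output channel) pair, each of size kernel_size * kernel_size. The kernels are shared across all spatial positions, so their number does not depend on the image size.

The layer therefore has $n \times m \times \texttt{kernel\_size} \times \texttt{kernel\_size}$ weights, plus m biases. If a kernel is applied at X positions of a single-channel image (X depends on the image height and width, kernel_size, padding and stride), it is the computation that scales with X: about $X \times n \times m \times \texttt{kernel\_size} \times \texttt{kernel\_size}$ multiply-accumulates per image.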

ref: https://zh.d2l.ai/chapter_convolutional-neural-networks/channels.html
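
The counts above can be checked directly in PyTorch. The sketch below is only an illustration: the values of n, m and k and the two image sizes are arbitrary assumptions, not taken from these notes. It reads the weight tensor of an `nn.Conv2d` layer and then applies the same layer to two input sizes to show that only the number of output positions changes.

```python
import torch
from torch import nn

# Illustrative values (assumptions, not taken from the notes).
n, m, k = 3, 8, 5
conv = nn.Conv2d(n, m, kernel_size=k, padding=2, stride=1)

weight_count = conv.weight.numel()  # n * m * k * k
bias_count = conv.bias.numel()      # m
print(tuple(conv.weight.shape), weight_count, bias_count)

# Apply the same layer to two image sizes: the parameters stay fixed,
# only the number of output positions X (and hence the work) changes.
mac_counts = {}
for size in (32, 64):
    y = conv(torch.rand(1, n, size, size))
    positions = y.shape[2] * y.shape[3]           # X
    mac_counts[size] = positions * n * m * k * k  # multiply-accumulates
    print(size, positions, mac_counts[size])
```

The weight tensor has shape (m, n, k, k), so the parameter count is the same for both inputs, while the multiply-accumulate estimate grows with the number of output positions.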

Key features

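Compared with VGG, NiN has two main distinguishing features:

Each block is Conv + ReLU + Conv(kernel_size = 1) + ReLU + Conv(kernel_size = 1) + ReLU, followed by a MaxPool (in the code below the MaxPool sits outside nin_block). The 1 * 1 convolutions act as a channel-to-channel fully connected layer applied at every pixel.
The last stage is a NiN block that maps 384 channels directly to as many channels as there are classes; there is no Linear layer.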
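
The claim that a 1 * 1 convolution is a per-pixel fully connected layer over channels can be checked numerically. The following is a minimal sketch under assumed sizes (c_in, c_out and the input shape are arbitrary): it copies the weights of a 1 * 1 `nn.Conv2d` into an `nn.Linear` and compares the two outputs.

```python
import torch
from torch import nn

torch.manual_seed(0)
c_in, c_out = 4, 6  # illustrative channel counts (assumptions)

conv = nn.Conv2d(c_in, c_out, kernel_size=1)
linear = nn.Linear(c_in, c_out)

# Give the linear layer the same weights and biases as the 1x1 convolution.
with torch.no_grad():
    linear.weight.copy_(conv.weight.reshape(c_out, c_in))
    linear.bias.copy_(conv.bias)

x = torch.rand(2, c_in, 5, 5)
y_conv = conv(x)

# Move channels last so that Linear acts on the channel vector of each pixel,
# then move them back to the (batch, channels, height, width) layout.
y_lin = linear(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)

same = torch.allclose(y_conv, y_lin, atol=1e-5)
print(same)
```

Both paths compute the same affine map over the channel vector at each pixel, which is why the two outputs agree up to floating-point error.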

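The global average pooling layer shrinks each feature map to 1 * 1. It leaves the batch size n and the number of channels unchanged, and the tensor keeps its four dimensions.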
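
As a small illustration of this behaviour (the tensor shape here is an arbitrary assumption), `nn.AdaptiveAvgPool2d((1, 1))` should give the same result as averaging over the two spatial dimensions while keeping them as size-1 axes:

```python
import torch
from torch import nn

x = torch.rand(2, 10, 5, 5)  # (batch, channels, height, width), illustrative

pooled = nn.AdaptiveAvgPool2d((1, 1))(x)
manual = x.mean(dim=(2, 3), keepdim=True)  # plain mean over height and width

flat = nn.Flatten()(pooled)  # drops the two size-1 spatial axes

print(pooled.shape, flat.shape)
print(torch.allclose(pooled, manual, atol=1e-6))
```

The pooled tensor is still four-dimensional with shape (2, 10, 1, 1); it only becomes (2, 10) after Flatten.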

import torch
from torch import nn


def nin_block(in_channels, out_channels, kernel_size, strides, padding):
    # One ordinary convolution followed by two 1x1 convolutions,
    # each with a ReLU activation.
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size, strides, padding),
        nn.ReLU(),
        nn.Conv2d(out_channels, out_channels, kernel_size=1), nn.ReLU(),
        nn.Conv2d(out_channels, out_channels, kernel_size=1), nn.ReLU())


net = nn.Sequential(
    nin_block(1, 96, kernel_size=11, strides=4, padding=0),
    nn.MaxPool2d(3, stride=2),
    nin_block(96, 256, kernel_size=5, strides=1, padding=2),
    nn.MaxPool2d(3, stride=2),
    nin_block(256, 384, kernel_size=3, strides=1, padding=1),
    nn.MaxPool2d(3, stride=2),
    nn.Dropout(0.5),
    # There are 10 label classes
    nin_block(384, 10, kernel_size=3, strides=1, padding=1),
    nn.AdaptiveAvgPool2d((1, 1)),
    # Convert the 4-D output into a 2-D output of shape (batch size, 10)
    nn.Flatten())

X = torch.rand(size=(1, 1, 224, 224))
for layer in net:
    X = layer(X)
    print(layer.__class__.__name__, 'output shape:\t', X.shape)

Sequential output shape:     torch.Size([1, 96, 54, 54]) # the entire nin_block
MaxPool2d output shape:      torch.Size([1, 96, 26, 26])
Sequential output shape:     torch.Size([1, 256, 26, 26])
MaxPool2d output shape:      torch.Size([1, 256, 12, 12])
Sequential output shape:     torch.Size([1, 384, 12, 12])
MaxPool2d output shape:      torch.Size([1, 384, 5, 5])
Dropout output shape:        torch.Size([1, 384, 5, 5])
Sequential output shape:     torch.Size([1, 10, 5, 5])
AdaptiveAvgPool2d output shape:      torch.Size([1, 10, 1, 1])
Flatten output shape:        torch.Size([1, 10])
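
The spatial sizes in this output can be reproduced by hand with the usual output-size formula for convolution and pooling, floor((size + 2 * padding - kernel_size) / stride) + 1. The helper `out_size` below is a hypothetical name introduced only for this sketch. The 1 * 1 convolutions inside each nin_block keep the size unchanged, so only the first convolution of each block and the pooling layers are listed.

```python
def out_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (floor mode)."""
    return (size + 2 * padding - kernel_size) // stride + 1


# (description, kernel_size, stride, padding) for each size-changing layer
layers = [
    ("nin_block 11x11, stride 4", 11, 4, 0),
    ("MaxPool2d 3x3, stride 2", 3, 2, 0),
    ("nin_block 5x5, padding 2", 5, 1, 2),
    ("MaxPool2d 3x3, stride 2", 3, 2, 0),
    ("nin_block 3x3, padding 1", 3, 1, 1),
    ("MaxPool2d 3x3, stride 2", 3, 2, 0),
    ("nin_block 3x3, padding 1", 3, 1, 1),
]

size = 224
sizes = []
for name, k, s, p in layers:
    size = out_size(size, k, s, p)
    sizes.append(size)
    print(f"{name}: {size} x {size}")
```

The resulting sequence 54, 26, 26, 12, 12, 5, 5 matches the heights and widths printed above.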