Notes on convolutional layers
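Note that the input channels and output channels are fully connected. That is:
Suppose the kernel is applied at X positions on a single-channel image (X depends on the image height and width, kernel_size, padding, and stride), with n input channels and m output channels. The layer then has n × m two-dimensional kernels, which are applied X × n × m times in total.
Because the weights are shared across all X positions, the layer stores $n \times m \times \texttt{kernel\_size} \times \texttt{kernel\_size}$ weights (plus m biases), while one forward pass takes about $X \times n \times m \times \texttt{kernel\_size} \times \texttt{kernel\_size}$ multiplications.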
ref: https://zh.d2l.ai/chapter_convolutional-neural-networks/channels.html
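As a quick check of these counts, here is a minimal sketch in PyTorch. The values of `n`, `m`, `k` and the 32 × 32 input are illustrative assumptions, not taken from the notes. It reads the stored weight shape of an `nn.Conv2d` and derives the number of kernel positions X from the output shape, which shows that X affects the amount of computation but not the number of stored parameters.

```python
import torch
from torch import nn

# Illustrative values (assumptions): n input channels, m output channels, k x k kernel
n, m, k = 3, 8, 5
conv = nn.Conv2d(n, m, kernel_size=k, stride=1, padding=2)

# Stored parameters: one k x k kernel per (output channel, input channel) pair
num_weights = conv.weight.numel()   # m * n * k * k = 600
num_biases = conv.bias.numel()      # m = 8

# X = number of positions at which each kernel is applied (output height * width)
x = torch.rand(1, n, 32, 32)
y = conv(x)
positions = y.shape[2] * y.shape[3]  # 32 * 32 = 1024

# Multiplications in one forward pass for a single sample
macs = positions * n * m * k * k     # 614400

print(tuple(conv.weight.shape), num_weights, num_biases, positions, macs)
```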
Features
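Compared with VGG, NiN has two main distinguishing features:
- Each block consists of `Conv + ReLU + Conv(kernel_size = 1) + ReLU + Conv(kernel_size = 1) + ReLU` and is followed by a `MaxPool` (the last block is followed by global average pooling instead). The 1 × 1 convolutions act as a channel-to-channel fully connected layer applied at every pixel.
- The network ends with a NiN block that maps 384 channels directly to the number of classes; there is no `Linear` layer.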
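To make the "fully connected at every pixel" reading concrete, the sketch below compares a 1 × 1 convolution with the same weights applied as a matrix multiplication over the channel dimension. The channel counts and input size are arbitrary values chosen for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)

# A 1x1 convolution from 3 channels to 2 channels (sizes are illustrative)
conv = nn.Conv2d(3, 2, kernel_size=1)
x = torch.rand(1, 3, 4, 4)
y_conv = conv(x)

# Reuse the same weights as a (2, 3) fully connected weight matrix
w = conv.weight.reshape(2, 3)

# Move channels last, apply the linear map to every pixel, move channels back
y_fc = (x.permute(0, 2, 3, 1) @ w.T + conv.bias).permute(0, 3, 1, 2)

print(y_conv.shape)                              # torch.Size([1, 2, 4, 4])
print(torch.allclose(y_conv, y_fc, atol=1e-6))   # True
```

Each pixel is treated as a sample with 3 features, and the same 3-to-2 linear map is shared by all pixels.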
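The global average pooling layer shrinks each feature map to 1 × 1. It changes neither the batch size n nor the number of channels, and the tensor keeps the same number of dimensions (it is still four-dimensional).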
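A small sketch of this behaviour, assuming an arbitrary input of shape (2, 10, 5, 5): `nn.AdaptiveAvgPool2d((1, 1))` gives the same result as averaging over the height and width dimensions while keeping them as size-1 axes, and only `nn.Flatten` removes those axes.

```python
import torch
from torch import nn

# Illustrative input: batch of 2, 10 channels, 5 x 5 feature maps
x = torch.rand(2, 10, 5, 5)

pooled = nn.AdaptiveAvgPool2d((1, 1))(x)
manual = x.mean(dim=(2, 3), keepdim=True)  # per-channel mean over height and width

print(pooled.shape)                               # torch.Size([2, 10, 1, 1])
print(torch.allclose(pooled, manual, atol=1e-6))  # True

# Flatten then drops the two size-1 spatial axes
flat = nn.Flatten()(pooled)
print(flat.shape)                                 # torch.Size([2, 10])
```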
```python
import torch
from torch import nn


def nin_block(in_channels, out_channels, kernel_size, strides, padding):
    # One ordinary convolution followed by two 1x1 convolutions, each with a ReLU
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size, strides, padding),
        nn.ReLU(),
        nn.Conv2d(out_channels, out_channels, kernel_size=1), nn.ReLU(),
        nn.Conv2d(out_channels, out_channels, kernel_size=1), nn.ReLU())


net = nn.Sequential(
    nin_block(1, 96, kernel_size=11, strides=4, padding=0),
    nn.MaxPool2d(3, stride=2),
    nin_block(96, 256, kernel_size=5, strides=1, padding=2),
    nn.MaxPool2d(3, stride=2),
    nin_block(256, 384, kernel_size=3, strides=1, padding=1),
    nn.MaxPool2d(3, stride=2),
    nn.Dropout(0.5),
    # There are 10 label classes
    nin_block(384, 10, kernel_size=3, strides=1, padding=1),
    nn.AdaptiveAvgPool2d((1, 1)),
    # Convert the 4D output into a 2D output of shape (batch size, 10)
    nn.Flatten())
```
```python
# Pass a dummy input through the network and print the output shape of each layer
X = torch.rand(size=(1, 1, 224, 224))
for layer in net:
    X = layer(X)
    print(layer.__class__.__name__, 'output shape:\t', X.shape)
```
```
Sequential output shape:         torch.Size([1, 96, 54, 54])  # covers the whole nin_block
MaxPool2d output shape:          torch.Size([1, 96, 26, 26])
Sequential output shape:         torch.Size([1, 256, 26, 26])
MaxPool2d output shape:          torch.Size([1, 256, 12, 12])
Sequential output shape:         torch.Size([1, 384, 12, 12])
MaxPool2d output shape:          torch.Size([1, 384, 5, 5])
Dropout output shape:            torch.Size([1, 384, 5, 5])
Sequential output shape:         torch.Size([1, 10, 5, 5])
AdaptiveAvgPool2d output shape:  torch.Size([1, 10, 1, 1])
Flatten output shape:            torch.Size([1, 10])
```