Samar-blog


P17. Neural Networks: Convolution Layers

17.1 Official docs: Docs > PyTorch > torch.nn > Convolution Layers

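1. nn.Conv2d: images are essentially 2D matrices, so this is the layer used most often

class torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros', device=None, dtype=None)
in_channels: number of input channels; out_channels: number of output channels; kernel_size: size of the convolution kernel; stride: step size of the kernel; padding: padding added around the input; dilation: spacing between kernel elements; bias: bias term (usually left as True)

The first five parameters are the ones used most often.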
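As a rough illustration of how the five common parameters interact, the sketch below runs two layers on a random tensor shaped like one CIFAR10 image. The random input and the names `same_size` and `downsample` are assumptions made for this example only.

```python
import torch
from torch import nn

# Random stand-in for one RGB image of 32x32 pixels: (N, C, H, W)
x = torch.randn(1, 3, 32, 32)

# padding=1 with a 3x3 kernel and stride 1 keeps the spatial size
same_size = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=3, stride=1, padding=1)
# stride=2 roughly halves the spatial size
downsample = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=3, stride=2, padding=1)

same_shape = tuple(same_size(x).shape)
down_shape = tuple(downsample(x).shape)
print(same_shape)  # (1, 6, 32, 32)
print(down_shape)  # (1, 6, 16, 16)
```

In both cases out_channels alone decides the channel dimension of the result, while kernel_size, stride, and padding decide the height and width.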

2. The mathematical formula for convolution:

Applies a 2D convolution over an input signal composed of several input planes.

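In the simplest case, the output value of the layer with input size (N, C_in, H, W) and output (N, C_out, H_out, W_out) can be precisely described as:

$$\text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) + \sum_{k=0}^{C_{\text{in}}-1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k)$$

where ⋆ is the valid 2D cross-correlation operator, N is a batch size, C denotes a number of channels, H is a height of input planes in pixels, and W is width in pixels.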
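The formula from the docs can be checked numerically. The following is a minimal sketch, assuming stride 1, no padding, and no dilation; `conv2d_naive` is a hypothetical helper written for this note, not a PyTorch function. It sums, over all input channels, the cross-correlation of each kernel slice with the input, adds the bias, and compares the result with `torch.nn.functional.conv2d`.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(2, 3, 5, 5)       # (N, C_in, H, W)
weight = torch.randn(4, 3, 3, 3)  # (C_out, C_in, kH, kW)
bias = torch.randn(4)             # (C_out,)

def conv2d_naive(x, weight, bias):
    """Hypothetical helper: direct loop version of the formula (stride 1, no padding)."""
    n, _, h, w = x.shape
    c_out, _, kh, kw = weight.shape
    h_out, w_out = h - kh + 1, w - kw + 1
    out = torch.zeros(n, c_out, h_out, w_out)
    for i in range(n):              # batch index N_i
        for j in range(c_out):      # output channel C_out_j
            for r in range(h_out):
                for c in range(w_out):
                    patch = x[i, :, r:r + kh, c:c + kw]
                    # Sum over all input channels k and kernel positions, then add the bias
                    out[i, j, r, c] = bias[j] + (patch * weight[j]).sum()
    return out

naive = conv2d_naive(x, weight, bias)
reference = F.conv2d(x, weight, bias)
print(naive.shape)
print(torch.allclose(naive, reference, atol=1e-5))
```

Note that the kernel is not flipped anywhere, which is why the docs call the operation cross-correlation rather than convolution in the strict mathematical sense.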

3. Conv2d parameters

# Conv2d Parameters
in_channels (int) – Number of channels in the input image
out_channels (int) – Number of channels produced by the convolution
kernel_size (int or tuple) – Size of the convolving kernel
stride (int or tuple, optional) – Stride of the convolution. Default: 1
padding (int, tuple or str, optional) – Padding added to all four sides of the input. Default: 0
dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1
groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1
bias (bool, optional) – If True, adds a learnable bias to the output. Default: True
padding_mode (str, optional) – 'zeros', 'reflect', 'replicate' or 'circular'. Default: 'zeros'

dilation controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this link has a nice visualization of what dilation does.

After following the link:

[Figure 2: the dilation visualization shown on the linked page]
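Since the visualization itself is not reproduced here, a small sketch may help. It assumes a single-channel 7×7 random input and compares a plain 3×3 kernel with the same kernel at dilation=2. The weight tensor stays 3×3, but the kernel covers a 5×5 window, so the output shrinks further.

```python
import torch
from torch import nn

x = torch.randn(1, 1, 7, 7)

plain = nn.Conv2d(1, 1, kernel_size=3, dilation=1)
dilated = nn.Conv2d(1, 1, kernel_size=3, dilation=2)

# Both layers hold the same number of weights
plain_weight_shape = tuple(plain.weight.shape)
dilated_weight_shape = tuple(dilated.weight.shape)

# Window covered by the kernel: dilation * (kernel_size - 1) + 1
effective_size = 2 * (3 - 1) + 1

plain_out = tuple(plain(x).shape)
dilated_out = tuple(dilated(x).shape)
print(plain_weight_shape, dilated_weight_shape)  # (1, 1, 3, 3) (1, 1, 3, 3)
print(effective_size)                            # 5
print(plain_out)                                 # (1, 1, 5, 5)
print(dilated_out)                               # (1, 1, 3, 3)
```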

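4. in_channels and out_channels

Setting out_channels=2 means the layer uses two convolution kernels and therefore produces two output channels:

[Figure 3: one input channel convolved with two kernels, giving two output channels]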
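The relationship between the channel arguments and the stored kernels can be inspected directly. This is a minimal sketch, assuming a single-channel 5×5 random input; the sizes are chosen for illustration only.

```python
import torch
from torch import nn

conv = nn.Conv2d(in_channels=1, out_channels=2, kernel_size=3)

# weight is stored as (out_channels, in_channels, kH, kW): one 3x3 kernel per output channel
weight_shape = tuple(conv.weight.shape)
# one bias value per output channel
bias_shape = tuple(conv.bias.shape)

x = torch.randn(1, 1, 5, 5)
out_shape = tuple(conv(x).shape)
print(weight_shape)  # (2, 1, 3, 3)
print(bias_shape)    # (2,)
print(out_shape)     # (1, 2, 3, 3)
```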

17.2 Running a convolution layer in PyCharm

1. Enter the code and create the dyl network

import torch
import torchvision
from torch.nn import Conv2d
from torch.utils.data import DataLoader
from torch import nn

# Load the CIFAR10 test split as tensors (downloaded to ./dataset if missing)
dataset = torchvision.datasets.CIFAR10(root="./dataset", train=False,
                                       transform=torchvision.transforms.ToTensor(),
                                       download=True)
dataloader = DataLoader(dataset, batch_size=64)

# 1. Define the network template
class Dyl(nn.Module):
    def __init__(self) -> None:
        # Initialize the parent class first
        super(Dyl, self).__init__()
        # CIFAR10 images are RGB, so in_channels=3; out_channels is set to 6
        self.conv1 = Conv2d(in_channels=3, out_channels=6, kernel_size=3, stride=1, padding=0)

    def forward(self, x):
        x = self.conv1(x)
        return x

# 2. Create the dyl network
dyl = Dyl()
print(dyl)

2. The output is as follows

D:\anaconda3\envs\pytorch\python.exe D:/DeepLearning/Learn_torch/P17_conv2d.py
Files already downloaded and verified
Dyl(
  (conv1): Conv2d(3, 6, kernel_size=(3, 3), stride=(1, 1))
)

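Process finished with exit code 0
The output shows a network named Dyl that contains one convolution layer, conv1, defined with 3 input channels, 6 output channels, a 3×3 kernel, and a stride of 1.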
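Beyond the printed summary, the learnable tensors behind such a layer can be listed. The sketch below builds a standalone layer with the same arguments as conv1 (a simplification; inside a module like Dyl the parameter names would carry a `conv1.` prefix) and counts its parameters.

```python
from torch import nn

conv1 = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=3, stride=1, padding=0)

# Collect each parameter's name and shape
param_shapes = {name: tuple(p.shape) for name, p in conv1.named_parameters()}
# 6 kernels * 3 channels * 3 * 3 weights, plus 6 biases
total_params = sum(p.numel() for p in conv1.parameters())

print(param_shapes)  # {'weight': (6, 3, 3, 3), 'bias': (6,)}
print(total_params)  # 168
```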

3. Now feed every batch of images from the dataloader through the network

import torch
import torchvision
from torch.nn import Conv2d
from torch.utils.data import DataLoader
from torch import nn

# Load the CIFAR10 test split as tensors (downloaded to ./dataset if missing)
dataset = torchvision.datasets.CIFAR10(root="./dataset", train=False,
                                       transform=torchvision.transforms.ToTensor(),
                                       download=True)
dataloader = DataLoader(dataset, batch_size=64)

# 1. Define the network template
class Dyl(nn.Module):
    def __init__(self) -> None:
        # Initialize the parent class first
        super(Dyl, self).__init__()
        # CIFAR10 images are RGB, so in_channels=3; out_channels is set to 6
        self.conv1 = Conv2d(in_channels=3, out_channels=6, kernel_size=3, stride=1, padding=0)

    def forward(self, x):
        x = self.conv1(x)
        return x

# 2. Create the dyl network
dyl = Dyl()
# print(dyl)

# 3. Pass every batch from the dataloader through the network
for data in dataloader:
    imgs, targets = data
    output = dyl(imgs)
    print(imgs.shape)
    print(output.shape)

4. The output is as follows

D:\anaconda3\envs\pytorch\python.exe D:/DeepLearning/Learn_torch/P17_conv2d.py
Files already downloaded and verified
torch.Size([64, 3, 32, 32])
torch.Size([64, 6, 30, 30])
......
torch.Size([64, 3, 32, 32])
torch.Size([64, 6, 30, 30])
torch.Size([16, 3, 32, 32])
torch.Size([16, 6, 30, 30])

Process finished with exit code 0
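The last two shapes in the output show a batch of 16 rather than 64. A minimal sketch of why, using a dummy dataset of 10,000 items (the size of the CIFAR10 test split) instead of the real images; `drop_last=True` is shown only as one option for skipping the short batch, not something the original script uses.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy stand-in with as many items as the CIFAR10 test split
dummy = TensorDataset(torch.zeros(10000, 1))

loader = DataLoader(dummy, batch_size=64)
sizes = [batch[0].shape[0] for batch in loader]
print(len(sizes), sizes[0], sizes[-1])  # 157 64 16

# drop_last=True discards the final incomplete batch
full_only = DataLoader(dummy, batch_size=64, drop_last=True)
num_full_batches = len(full_only)
print(num_full_batches)  # 156
```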

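The convolution layer processes the input images as expected, and output.shape changes according to the convolution parameters (input channels, output channels, kernel size, and so on).

Before the convolution, each batch has shape [64, 3, 32, 32]: batch_size is 64, in_channels is 3, and the images are 32×32. After the convolution, the channel count becomes 6 and the images shrink to 30×30, because a 3×3 kernel with stride 1 and no padding gives 32 − 3 + 1 = 30. The final batch holds only 16 images because the 10,000-image test set is not a multiple of 64.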
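The 32 to 30 change follows the output-size formula given in the Conv2d docs, H_out = floor((H + 2 × padding − dilation × (kernel_size − 1) − 1) / stride + 1). A small sketch in plain Python; the function name `conv2d_output_size` is an assumption for this example.

```python
def conv2d_output_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Hypothetical helper: output height or width of Conv2d along one dimension."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

# The layer used in this post: 3x3 kernel, stride 1, no padding
print(conv2d_output_size(32, kernel_size=3))                       # 30
# padding=1 would keep the size at 32
print(conv2d_output_size(32, kernel_size=3, padding=1))            # 32
# stride=2 with padding=1 would halve it
print(conv2d_output_size(32, kernel_size=3, stride=2, padding=1))  # 16
```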

5. Visualize the results more directly in TensorBoard

(1) The code:
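# Continues the script from step 3: dataset, dataloader, and dyl are already defined
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("conv2d")
step = 0
for data in dataloader:
    imgs, targets = data
    output = dyl(imgs)
    print(imgs.shape)
    print(output.shape)
    # imgs: torch.Size([64, 3, 32, 32])
    writer.add_images("input", imgs, step)
    # output: torch.Size([64, 6, 30, 30])
    writer.add_images("output", output, step)
    step = step + 1
writer.close()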
(2) This raises an error:
D:\anaconda3\envs\pytorch\python.exe D:/DeepLearning/Learn_torch/P17_conv2d.py
Files already downloaded and verified
torch.Size([64, 3, 32, 32])
torch.Size([64, 6, 30, 30])
Traceback (most recent call last):
  File "D:/DeepLearning/Learn_torch/P17_conv2d.py", line 43, in <module>
    writer.add_images("output",output,step)  # this call raises the error
  File "D:\anaconda3\envs\pytorch\lib\site-packages\torch\utils\tensorboard\writer.py", line 662, in add_images
    image(tag, img_tensor, dataformats=dataformats), global_step, walltime
  File "D:\anaconda3\envs\pytorch\lib\site-packages\torch\utils\tensorboard\summary.py", line 436, in image
    tensor = convert_to_HWC(tensor, dataformats)
  File "D:\anaconda3\envs\pytorch\lib\site-packages\torch\utils\tensorboard\_utils.py", line 113, in convert_to_HWC
    tensor_CHW = make_grid(tensor_NCHW)
  File "D:\anaconda3\envs\pytorch\lib\site-packages\torch\utils\tensorboard\_utils.py", line 74, in make_grid
    assert I.ndim == 4 and I.shape[1] == 3
AssertionError

Process finished with exit code 1
(3) [DeepSeek's analysis]

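You changed add_image to add_images here (note the extra s). That change is reasonable, because add_images is meant for batches of images: 4-D tensors in [B, C, H, W] layout by default, or [B, H, W, C] when dataformats='NHWC' is passed. One problem remains: output has 6 channels, while add_images expects 1 (grayscale) or 3 (RGB) channels, so a 6-channel tensor cannot be displayed directly as an image. (That's right: I originally typed add_image() rather than add_images().)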

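Detailed analysis:
Input images (imgs):

The shape is [64, 3, 32, 32] ([B, C, H, W]) with 3 channels, which matches the RGB image format.
add_images defaults to dataformats='NCHW', which matches this layout, so no extra argument is needed and the input images display correctly.

Output feature map (output):

The shape is [64, 6, 30, 30] with 6 channels (the number of feature channels produced by the convolution layer).
Image display supports only 1 or 3 channels, so the 6-channel tensor fails the check I.ndim == 4 and I.shape[1] == 3 shown in the traceback, which raises a bare AssertionError with no explanatory message.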
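Because the AssertionError carries no message, it can help to check the shape before logging. The following is a sketch of such a pre-check; `can_log_as_images` is a hypothetical helper that mirrors the 1-or-3-channel rule described above, not part of the TensorBoard API, and the random tensors stand in for imgs and output.

```python
import torch

def can_log_as_images(t: torch.Tensor) -> bool:
    """Hypothetical pre-check: 4-D NCHW batch with 1 (grayscale) or 3 (RGB) channels."""
    return t.ndim == 4 and t.shape[1] in (1, 3)

imgs = torch.rand(64, 3, 32, 32)    # same shape as a CIFAR10 batch
output = torch.rand(64, 6, 30, 30)  # same shape as the conv1 output

print(can_log_as_images(imgs))              # True
print(can_log_as_images(output))            # False
print(can_log_as_images(output[:, 0:3]))    # True
```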

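6. Process the 6-channel output feature map

Keep only the first 3 channels (or use another approach, such as reshape, to obtain 3 channels) so that the tensor meets the requirements for image display.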
import torch
import torchvision
from torch.nn import Conv2d
from torch.utils.data import DataLoader
from torch import nn
from torch.utils.tensorboard import SummaryWriter

# Load the CIFAR10 test split as tensors (downloaded to ./dataset if missing)
dataset = torchvision.datasets.CIFAR10(root="./dataset", train=False,
                                       transform=torchvision.transforms.ToTensor(),
                                       download=True)
dataloader = DataLoader(dataset, batch_size=64)

# 1. Define the network template
class Dyl(nn.Module):
    def __init__(self) -> None:
        # Initialize the parent class first
        super(Dyl, self).__init__()
        # CIFAR10 images are RGB, so in_channels=3; out_channels is set to 6
        self.conv1 = Conv2d(in_channels=3, out_channels=6, kernel_size=3, stride=1, padding=0)

    def forward(self, x):
        x = self.conv1(x)
        return x

# 2. Create the dyl network
dyl = Dyl()
# print(dyl)

# 3. Pass every batch from the dataloader through the network
# for data in dataloader:
#     imgs, targets = data
#     output = dyl(imgs)
#     print(imgs.shape)
#     print(output.shape)

# 4. Visualize the results more directly in TensorBoard
writer = SummaryWriter("conv2d")
step = 0
for data in dataloader:
    imgs, targets = data
    output = dyl(imgs)
    print(imgs.shape)
    print(output.shape)

    # imgs: torch.Size([64, 3, 32, 32])
    writer.add_images("input", imgs, step)
    # output: torch.Size([64, 6, 30, 30])
    # writer.add_images("output", output, step) raises an AssertionError here:
    # output has 6 channels, but add_images expects 1 (grayscale) or 3 (RGB),
    # so a 6-channel tensor cannot be displayed directly as an image.

    # (1) Keep the first 3 channels of output; the shape becomes [64, 3, 30, 30]
    # writer.add_images("output", output[:, 0:3, :, :], step)

    # (2) Reshape output from [64, 6, 30, 30] to [xxx, 3, 30, 30]
    # Write -1 when the first dimension is unknown; it is inferred from the
    # remaining dimensions (here 64 * 6 / 3 = 128)
    output = torch.reshape(output, (-1, 3, 30, 30))
    writer.add_images("output", output, step)

    step = step + 1
writer.close()
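The two options in the script are not equivalent, which a small sketch can make concrete. It assumes a random tensor with the same shape as the conv1 output: slicing drops channels 3 to 5, while reshape keeps every value and splits each 6-channel feature map into two consecutive 3-channel images.

```python
import torch

output = torch.randn(64, 6, 30, 30)  # same shape as the conv1 output

# Option (1): keep channels 0..2, discard channels 3..5
first_three = output[:, 0:3, :, :]
# Option (2): keep everything, doubling the batch dimension
reshaped = torch.reshape(output, (-1, 3, 30, 30))

print(first_three.shape)  # torch.Size([64, 3, 30, 30])
print(reshaped.shape)     # torch.Size([128, 3, 30, 30])

# Images 0 and 1 of the reshaped batch come from the same original sample
same_first_half = torch.equal(reshaped[0], output[0, 0:3])
same_second_half = torch.equal(reshaped[1], output[0, 3:6])
print(same_first_half, same_second_half)  # True True
```

With reshape, the "output" panel in TensorBoard therefore holds twice as many tiles as the "input" panel for the same step.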

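7. TensorBoard now displays both the input batches and the convolved feature maps (reshaped into 3-channel images, so each batch of 64 yields 128 output images). Start it by entering the following command in the terminal: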

tensorboard --logdir=conv2d

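8. Result:

The input and the output obtained after the convolution:

[Figure: P17_tensorboard, TensorBoard screenshot of the input and output images]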

posted on 2025-11-05 10:40 by 风居住的街道DYL