小土堆 PyTorch study notes: P15, using DataLoader

DataLoader link: https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader

image-20230704090353627

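Q: What does the figure mean?

A: DataLoader combines a dataset and a sampler, and provides an iterable over the given dataset. It supports both map-style and iterable-style datasets, with single-process or multi-process loading, a customizable loading order, optional automatic batching (collation), and memory pinning.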

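Parameters

Most DataLoader parameters have default values. The exception is dataset, which must always be supplied.

batch_size: how many samples are taken from the dataset for each batch.

shuffle: whether to reshuffle the data each time the dataset is iterated again (each epoch), since a dataset is usually read more than once.

num_workers: how many subprocesses are used to load data. 0 means loading happens in the main process. Note that these are processes, not threads.

drop_last: whether to drop the final incomplete batch when the dataset size is not divisible by batch_size. In TensorBoard, this decides whether the last, smaller batch of images is displayed.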

sampler

batch_sampler
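
To make batch_size and drop_last concrete, here is a minimal sketch on a toy dataset of 10 numbers. The toy dataset and the variable names are assumptions for illustration only; they stand in for CIFAR10 so the example runs without downloading anything.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in dataset (an assumption for illustration): 10 samples, values 0..9.
toy_data = TensorDataset(torch.arange(10))

# drop_last=False keeps the final, smaller batch.
keep_last = DataLoader(toy_data, batch_size=4, shuffle=False, drop_last=False)
# drop_last=True discards the final batch because it holds only 2 samples.
drop_last = DataLoader(toy_data, batch_size=4, shuffle=False, drop_last=True)

# Each batch is a list with one tensor, so batch[0] is the batched tensor.
keep_sizes = [len(batch[0]) for batch in keep_last]
drop_sizes = [len(batch[0]) for batch in drop_last]

print(keep_sizes)  # [4, 4, 2]
print(drop_sizes)  # [4, 4]
```

With 10 samples and batch_size=4, the third batch has only 2 samples, which is the "irregular" last batch that drop_last controls.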

Example

import torchvision
from tensorboardX import SummaryWriter
from torch.utils.data import DataLoader

# CIFAR10 test split, with each PIL image converted to a tensor.
# Pass download=True if the dataset is not already present under root.
test_data = torchvision.datasets.CIFAR10(
    root="hymenoptera_data/val/CIFAR10",
    train=False,
    transform=torchvision.transforms.ToTensor(),
)
test_loader = DataLoader(
    dataset=test_data, batch_size=64,
    shuffle=True, num_workers=0,
    drop_last=False,
)
# CIFAR10.__getitem__ returns an (img, target) pair
img, target = test_data[0]

# First image in the test set and its target
print(f"img.shape = {img.shape}")
print(f"target is {target}")

Output 👇

image-20230704101110706

Next, the batches are logged to TensorBoard. The first experiment varies batch_size, using 4 and 128.

# The original snippet does not show how writer and step are created;
# this setup is an assumption (default log directory).
writer = SummaryWriter()

test_loader = DataLoader(
    dataset=test_data, batch_size=4,
    shuffle=True, num_workers=0,
    drop_last=False,
)
step = 0
for data in test_loader:
    imgs, targets = data
    writer.add_images("test_loader_4", imgs, step)
    step = step + 1

test_loader = DataLoader(
    dataset=test_data, batch_size=128,
    shuffle=True, num_workers=0,
    drop_last=False,
)
step = 0  # restart the counter so the new tag also begins at step 0
for data in test_loader:
    imgs, targets = data
    writer.add_images("test_loader_128", imgs, step)
    step = step + 1
writer.close()

Opening TensorBoard shows the following 👇

image-20230704101428357
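
As a rough check on how many steps each of these two runs logs, the sketch below counts batches on a stand-in dataset with the same number of samples as the CIFAR10 test split (10,000). The zero-filled stand-in is an assumption used only to avoid loading the real images.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in (assumption): same length as the CIFAR10 test split, contents are zeros.
stand_in = TensorDataset(torch.zeros(10_000))

steps = {}
for batch_size in (4, 128):
    loader = DataLoader(stand_in, batch_size=batch_size, drop_last=False)
    steps[batch_size] = len(loader)  # number of batches in one pass

print(steps)  # {4: 2500, 128: 79}

# Size of the final batch when batch_size=128 (10,000 is not a multiple of 128).
last_batch_128 = 10_000 - (steps[128] - 1) * 128
print(last_batch_128)  # 16
```

So with batch_size=128 the last step in TensorBoard holds only 16 images, while batch_size=4 divides the dataset evenly.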

shuffle can also be adjusted, to True or False. The figure below shows the result with batch_size set to 64 and shuffling enabled.

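# The original snippet does not show how writer is created;
# this setup is an assumption (default log directory).
writer = SummaryWriter()

test_loader = DataLoader(
    dataset=test_data, batch_size=64,
    shuffle=True, num_workers=0,
    drop_last=False,
)
for epoch in range(2):
    print(f"epoch = {epoch}")
    step = 0  # each epoch logs under its own tag, so restart the counter
    for data in test_loader:
        imgs, targets = data
        writer.add_images(f"test_loader_epoch_{epoch}", imgs, step)
        step += 1
writer.close()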

Opening TensorBoard shows the following 👇

image-20230704101444823
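
The effect of shuffle across epochs can also be seen without images. The following is a minimal sketch on a toy dataset of 8 numbers; the dataset and the helper epoch_order are hypothetical names introduced here for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in dataset (an assumption for illustration): values 0..7.
toy_data = TensorDataset(torch.arange(8))


def epoch_order(loader):
    """Return the sample values in the order one pass over the loader yields them."""
    return [int(x) for (batch,) in loader for x in batch]


ordered = DataLoader(toy_data, batch_size=4, shuffle=False)
shuffled = DataLoader(toy_data, batch_size=4, shuffle=True)

# shuffle=False: every epoch follows the dataset order.
print(epoch_order(ordered))  # [0, 1, 2, 3, 4, 5, 6, 7]
print(epoch_order(ordered))  # [0, 1, 2, 3, 4, 5, 6, 7]

# shuffle=True: each epoch draws a fresh random permutation, so these two
# orders usually differ, but both contain every sample exactly once.
first_epoch = epoch_order(shuffled)
second_epoch = epoch_order(shuffled)
print(first_epoch)
print(second_epoch)
```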

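The dataset has a __getitem__ method, while DataLoader does not. __getitem__ returns (img, target). With DataLoader(batch_size=4), the loader effectively packs four images from the dataset (img0, img1, img2, img3 when not shuffling) into one batch and the matching targets (target0, target1, target2, target3) into another, and yields the two as its return value.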
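
The packing described above can be checked with a minimal sketch. FakeImages is a hypothetical dataset made up for this example (random tensors shaped like CIFAR10 images), and shuffle is turned off so the first batch is exactly samples 0 to 3.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class FakeImages(Dataset):
    """Hypothetical dataset: 8 random 3x32x32 'images' with integer targets."""

    def __init__(self):
        self.imgs = torch.rand(8, 3, 32, 32)
        self.targets = list(range(8))

    def __len__(self):
        return len(self.targets)

    def __getitem__(self, index):
        # Same contract as CIFAR10: one (img, target) pair per index.
        return self.imgs[index], self.targets[index]


data = FakeImages()
loader = DataLoader(data, batch_size=4, shuffle=False)
imgs, targets = next(iter(loader))  # first batch

# Pack samples 0..3 by hand for comparison.
manual_imgs = torch.stack([data[i][0] for i in range(4)])
manual_targets = torch.tensor([data[i][1] for i in range(4)])

print(imgs.shape)                            # torch.Size([4, 3, 32, 32])
print(targets)                               # tensor([0, 1, 2, 3])
print(torch.equal(imgs, manual_imgs))        # True
print(torch.equal(targets, manual_targets))  # True
```

The extra leading dimension of size 4 is the batch dimension added when the four images are stacked.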

posted @ 2023-07-10 16:00  西红柿爆炒鸡蛋