Day155: torchvision.transforms (PyTorch and torchvision version correspondence)

ztj100 2024-10-31 16:12

torchvision

The PyTorch ecosystem includes a very important and useful package, torchvision. It consists mainly of three subpackages: torchvision.datasets, torchvision.models and torchvision.transforms.

torchvision.transforms

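The torchvision.transforms package contains common data augmentation operations such as resize and crop; almost all data augmentation in PyTorch can be done through this interface. The package consists mainly of two scripts, transforms.py and functional.py. The former defines one class per data augmentation operation, and each class does its work by calling the corresponding function in functional.py.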

PyTorch's image transform module

PyTorch's image transform module consists of five parts: Transforms on PIL Image, Transforms on torch.Tensor, Conversion Transforms, Generic Transforms and Functional Transforms.
The common transforms (Transforms on PIL Image, Transforms on torch.Tensor and Conversion Transforms) can be chained together with Compose.

import torchvision
import torch
train_augmentation = torchvision.transforms.Compose([
    torchvision.transforms.Resize(256),
    torchvision.transforms.RandomCrop(224),
    torchvision.transforms.RandomHorizontalFlip(),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),  # ImageNet mean / std
])

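from PIL import Image

# image_paths, labels, batch_size and workers are placeholders you define yourself
class custom_dataread(torch.utils.data.Dataset):
    def __init__(self, image_paths, labels, transform=None):
        self.image_paths = image_paths
        self.labels = labels
        self.transform = transform

    def __getitem__(self, index):
        img = Image.open(self.image_paths[index]).convert("RGB")
        if self.transform is not None:
            img = self.transform(img)  # use self.transform for input image
        return img, self.labels[index]

    def __len__(self):
        return len(self.image_paths)

train_loader = torch.utils.data.DataLoader(
    custom_dataread(image_paths, labels, transform=train_augmentation),
    batch_size=batch_size, shuffle=True,
    num_workers=workers, pin_memory=True)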

Note

torchvision.transforms.ToTensor(),
torchvision.transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])

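The order of these two lines must not be swapped: Normalize requires Tensor input, so the data has to be converted to a Tensor before it can be normalized.
In general, image transforms are combined with torchvision.transforms.Compose().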

Transforms on PIL Image

torchvision.transforms.CenterCrop

torchvision.transforms.CenterCrop(size)

Parameters

size (sequence or int) - Desired output size. If size is an int instead of a sequence like (h, w), a square crop (size, size) is made.

torchvision.transforms.ColorJitter

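torchvision.transforms.ColorJitter(brightness=0, contrast=0, saturation=0, hue=0)
# Randomly change the brightness, contrast, saturation and hue of an image.

Parameters

brightness (float or tuple of float (min, max)) - How much to jitter brightness. brightness_factor is chosen uniformly from [max(0, 1 - brightness), 1 + brightness] or the given [min, max]. Should be non-negative.
contrast (float or tuple of float (min, max)) - How much to jitter contrast. contrast_factor is chosen uniformly from [max(0, 1 - contrast), 1 + contrast] or the given [min, max]. Should be non-negative.
saturation (float or tuple of float (min, max)) - How much to jitter saturation. saturation_factor is chosen uniformly from [max(0, 1 - saturation), 1 + saturation] or the given [min, max]. Should be non-negative.
hue (float or tuple of float (min, max)) - How much to jitter hue. hue_factor is chosen uniformly from [-hue, hue] or the given [min, max]. Should satisfy 0 <= hue <= 0.5 or -0.5 <= min <= max <= 0.5.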

torchvision.transforms.FiveCrop

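torchvision.transforms.FiveCrop(size)
# Crop the given PIL Image into the four corners and the central crop

Parameters

size (sequence or int) - Desired output size of the crop. If size is an int instead of a sequence like (h, w), a square crop of size (size, size) is made.
Example: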

import torch
from torchvision.transforms import Compose, FiveCrop, Lambda, ToTensor

# size, batch and model are placeholders defined elsewhere in your code
transform = Compose([
    FiveCrop(size),  # this is a tuple of 5 PIL Images
    Lambda(lambda crops: torch.stack([ToTensor()(crop) for crop in crops]))  # returns a 4D tensor
])
# In your test loop you can do the following:
input, target = batch  # input is a 5d tensor, target is 2d
bs, ncrops, c, h, w = input.size()
result = model(input.view(-1, c, h, w))  # fuse batch size and ncrops
result_avg = result.view(bs, ncrops, -1).mean(1)  # avg over crops

torchvision.transforms.RandomCrop

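torchvision.transforms.RandomCrop(size, padding=None, pad_if_needed=False, fill=0,
                                  padding_mode='constant')
# Crop the given PIL Image at a random location.

Parameters

size (sequence or int) - Desired output size of the crop. If size is an int instead of a sequence like (h, w), a square crop (size, size) is made.
padding (int or sequence, optional) - Optional padding on each border of the image. Default is None, i.e. no padding. If a sequence of length 4 is provided, it is used to pad the left, top, right and bottom borders respectively. If a sequence of length 2 is provided, it is used to pad the left/right and top/bottom borders respectively.
pad_if_needed (boolean) - Pads the image if it is smaller than the desired size, to avoid raising an exception. Since cropping is done after padding, the padding appears to be done at a random offset.
fill - Pixel fill value for constant fill. Default is 0. If a tuple of length 3, it is used to fill the R, G, B channels respectively. This value is only used when padding_mode is constant.
padding_mode - Type of padding. Should be one of: constant, edge, reflect or symmetric. Default is constant.

constant: pads with a constant value

edge: pads with the last value at the edge of the image

reflect: pads with the reflection of the image, without repeating the last value on the edge. For example, padding [1, 2, 3, 4] with 2 elements on both sides in reflect mode results in [3, 2, 1, 2, 3, 4, 3, 2]

symmetric: pads with the reflection of the image, repeating the last value on the edge. For example, padding [1, 2, 3, 4] with 2 elements on both sides in symmetric mode results in [2, 1, 1, 2, 3, 4, 4, 3]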

torchvision.transforms.RandomHorizontalFlip

torchvision.transforms.RandomHorizontalFlip(p=0.5)
# Horizontally flip the given PIL Image randomly with the given probability

p (float) - probability of the image being flipped. Default value is 0.5

torchvision.transforms.RandomResizedCrop


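torchvision.transforms.RandomResizedCrop(size, scale=(0.08, 1.0), ratio=(0.75, 1.3333333333333333),
                                         interpolation=2)
# Crop the given PIL Image to a random size and aspect ratio

A crop of random size (default: 0.08 to 1.0 of the original image area) and random aspect ratio (default: 3/4 to 4/3, as width / height) is taken, and this crop is finally resized to the given size. This is commonly used to train Inception networks.
Parameters:

size - expected output size of each edge

scale - range of the cropped area, relative to the area of the original image
ratio - range of the aspect ratio (width / height) of the crop
interpolation - Default: PIL.Image.BILINEAR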

torchvision.transforms.RandomRotation

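torchvision.transforms.RandomRotation(degrees, resample=False,
                                      expand=False, center=None)
# Rotate the image by a random angle.

Parameters:

degrees (sequence or float or int) - Range of degrees to select from. If degrees is a number instead of a sequence like (min, max), the range of degrees will be (-degrees, +degrees).

resample ({PIL.Image.NEAREST, PIL.Image.BILINEAR, PIL.Image.BICUBIC}, optional) - An optional resampling filter. See filters for more information. If omitted, or if the image has mode "1" or "P", it is set to PIL.Image.NEAREST.
expand (bool, optional) - Optional expansion flag. If true, expands the output to make it large enough to hold the entire rotated image. If false or omitted, makes the output image the same size as the input image. Note that the expand flag assumes rotation around the center and no translation.
center (2-tuple, optional) - Optional center of rotation. The origin is the upper-left corner. Default is the center of the image.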

torchvision.transforms.RandomVerticalFlip

torchvision.transforms.RandomVerticalFlip(p=0.5)
# Vertically flip the given PIL Image randomly with the given probability

p (float) - probability of the image being flipped. Default value is 0.5

torchvision.transforms.TenCrop

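torchvision.transforms.TenCrop(size, vertical_flip=False)
# Crop the given PIL Image into the four corners and the central crop, plus the flipped versions of these (horizontal flipping is used by default)

Parameters:

size (sequence or int) - Desired output size of the crop. If size is an int instead of a sequence like (h, w), a square crop (size, size) is made.
vertical_flip (bool) - Use vertical flipping instead of horizontal flipping

Example: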

import torch
from torchvision.transforms import Compose, TenCrop, Lambda, ToTensor

# size, batch and model are placeholders defined elsewhere in your code
transform = Compose([
    TenCrop(size),  # this is a tuple of 10 PIL Images
    Lambda(lambda crops: torch.stack([ToTensor()(crop) for crop in crops]))  # returns a 4D tensor
])
# In your test loop you can do the following:
input, target = batch  # input is a 5d tensor, target is 2d
bs, ncrops, c, h, w = input.size()
result = model(input.view(-1, c, h, w))  # fuse batch size and ncrops
result_avg = result.view(bs, ncrops, -1).mean(1)  # avg over crops

torchvision.transforms.Grayscale

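torchvision.transforms.Grayscale(num_output_channels=1)
# Convert the image to grayscale.

Parameters:

num_output_channels (int) - (1 or 3) number of channels desired for the output image

- If num_output_channels == 1: the returned image is single-channel

- If num_output_channels == 3: the returned image is 3-channel with r == g == b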

torchvision.transforms.Pad

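torchvision.transforms.Pad(padding, fill=0, padding_mode='constant')
# Pad the given PIL Image on all sides with the given "pad" value

Parameters:

padding (int or tuple) - Padding on each border. If a single int is provided, it is used to pad all borders. If a tuple of length 2 is provided, it is the padding on left/right and top/bottom respectively. If a tuple of length 4 is provided, it is the padding for the left, top, right and bottom borders respectively.
fill (int or tuple) - Pixel fill value for constant fill. Default is 0. If a tuple of length 3, it is used to fill the R, G, B channels respectively. This value is only used when padding_mode is constant.
padding_mode (str) - Type of padding. Should be one of: constant, edge, reflect or symmetric. Default is constant.

constant: pads with a constant value

edge: pads with the last value at the edge of the image

reflect: pads with the reflection of the image, without repeating the last value on the edge. For example, padding [1, 2, 3, 4] with 2 elements on both sides in reflect mode results in [3, 2, 1, 2, 3, 4, 3, 2]

symmetric: pads with the reflection of the image, repeating the last value on the edge. For example, padding [1, 2, 3, 4] with 2 elements on both sides in symmetric mode results in [2, 1, 1, 2, 3, 4, 4, 3]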

torchvision.transforms.RandomAffine

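torchvision.transforms.RandomAffine(degrees, translate=None, scale=None, shear=None,
                                    resample=False, fillcolor=0)
# Random affine transformation of the image, keeping the center invariant

degrees (sequence or float or int) - Range of degrees to select from. If degrees is a number instead of a sequence like (min, max), the range of degrees will be (-degrees, +degrees). Set to 0 to deactivate rotations.
translate (tuple, optional) - Tuple of maximum absolute fractions for horizontal and vertical translations. For example, with translate=(a, b), the horizontal shift is randomly sampled in the range -img_width * a < dx < img_width * a and the vertical shift in the range -img_height * b < dy < img_height * b. No translation by default.
scale (tuple, optional) - Scaling factor interval, e.g. (a, b); the scale is then randomly sampled from the range a <= scale <= b. The original scale is kept by default.
shear (sequence or float or int, optional) - Range of degrees to select from. If shear is a number instead of a sequence like (min, max), the range will be (-shear, +shear). No shear is applied by default.
resample ({PIL.Image.NEAREST, PIL.Image.BILINEAR, PIL.Image.BICUBIC}, optional) - An optional resampling filter. See filters for more information. If omitted, or if the image has mode "1" or "P", it is set to PIL.Image.NEAREST.
fillcolor (tuple or int) - Optional fill color for the area outside the transform in the output image (a tuple for RGB images, an int for grayscale). (Pillow >= 5.0.0)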

torchvision.transforms.RandomApply

torchvision.transforms.RandomApply(transforms, p=0.5)
# Randomly apply a list of transformations with the given probability

transforms (list or tuple) - list of transformations
p (float) - probability

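torchvision.transforms.RandomChoice(transforms)  # Apply a single transformation randomly picked from a list
torchvision.transforms.RandomGrayscale(p=0.1)
              # Randomly convert the image to grayscale with probability p (default 0.1)
              # p (float) - probability that the image should be converted to grayscale
              # Returns: a grayscale version of the input image with probability p, unchanged with probability (1 - p)
              #   - if the input image is 1-channel: the grayscale version is 1-channel
              #   - if the input image is 3-channel: the grayscale version is 3-channel with r == g == b
torchvision.transforms.RandomOrder(transforms)  # Apply a list of transformations in a random order
torchvision.transforms.RandomPerspective(distortion_scale=0.5, p=0.5, interpolation=3)
              # Randomly perform a perspective transformation of the given PIL Image with the given probability

interpolation - Default: Image.BICUBIC
p (float) - probability of the image being perspectively transformed. Default is 0.5
distortion_scale (float) - controls the degree of distortion, ranging from 0 to 1. Default is 0.5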

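Example (RandomErasing randomly selects a rectangular region in a tensor image and erases its pixels, so it is placed after ToTensor and Normalize):

from torchvision import transforms

transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
    transforms.RandomErasing(),
])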

torchvision.transforms.Resize

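torchvision.transforms.Resize(size, interpolation=2)
# Resize the input PIL Image to the given size

size (sequence or int) - Desired output size. If size is a sequence like (h, w), the output size will match it. If size is an int, the smaller edge of the image will be matched to this number, i.e. if height > width, the image will be rescaled to (size * height / width, size)
interpolation (int, optional) - Desired interpolation. Default is PIL.Image.BILINEAR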

Transforms on torch.Tensor

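torchvision.transforms.Normalize(mean, std, inplace=False)
# Normalize a tensor image with mean and standard deviation. Given mean (M1, ..., Mn) and
# std (S1, ..., Sn) for n channels, this transform normalizes each channel of the input
# torch.*Tensor, i.e. input[channel] = (input[channel] - mean[channel]) / std[channel]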

Conversion Transforms

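torchvision.transforms.ToPILImage(mode=None)  # Convert a tensor or ndarray to a PIL Image
# Converts a torch.*Tensor of shape C x H x W or a numpy ndarray of shape H x W x C to a PIL Image, while preserving the value range

mode (PIL.Image mode) - color space and pixel depth of the input data (optional). If mode is None (default), some assumptions are made about the input data:

If the input has 4 channels, the mode is assumed to be RGBA.

If the input has 3 channels, the mode is assumed to be RGB.

If the input has 2 channels, the mode is assumed to be LA.

If the input has 1 channel, the mode is determined by the data type (i.e. int, float, short).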

torchvision.transforms.ToTensor
# Convert a PIL Image or numpy.ndarray to tensor.

https://blog.csdn.net/sinat_42239797/article/details/93857364
