Interacting with the Contrastive Language-Image Pre-training (CLIP) Model on AMD GPUs (Part 2)

3. Step 3: Inspect the images and text

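Feed eight sample images from the COCO dataset, together with their text descriptions, into the model and compare the similarity between the corresponding features.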

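```python
import os

import matplotlib.pyplot as plt
from PIL import Image

# Sample images from the COCO dataset and their text descriptions.
# The URL host prefix is elided ("*") in this listing; substitute the
# full image URLs before running.
image_urls = [
    "*/6/8378612_34ab6787ae_z.jpg",
    "*/8456/8033451486_aa38ee006c_z.jpg",
    "*/8344/8221561363_a6042ba9e0_z.jpg",
    "*/4147/5210232105_b22d909ab7_z.jpg",
    "*/3098/2852057907_29f1f35ff7_z.jpg",
    "*/3324/3289158186_155a301760_z.jpg",
    "*/3718/9148767840_a30c2c7dcb_z.jpg",
    "*/8030/7989105762_4ef9e7a03c_z.jpg",
]

# text_descriptions[i] describes image_urls[i]
text_descriptions = [
    "a cat standing on a wooden floor",
    "an airplane on the runway",
    "a white truck parked next to a tree",
    "an elephant standing in a zoo",
    "a laptop on a desk by the window",
    "a giraffe standing in the dirt",
    "a bus parked at a bus stop",
    "two bunches of bananas at the market",
]
```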

Display the eight images together with their text descriptions.

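```python
from io import BytesIO

import matplotlib.pyplot as plt
import requests
from PIL import Image

# `preprocess` is assumed to come from `model, preprocess = clip.load(...)` in Part 1.
images_for_display = []  # resized PIL images, used for plotting
images = []              # preprocessed tensors, fed to the model

# Create a new figure
plt.figure(figsize=(12, 6))
size = (400, 320)

# Loop over each URL and draw its image in a subplot
for i, url in enumerate(image_urls):
    # Download the image from the URL
    response = requests.get(url)
    response.raise_for_status()
    image = Image.open(BytesIO(response.content)).convert("RGB")
    image = image.resize(size)

    # Add a subplot (2 rows, 4 columns, index i + 1)
    plt.subplot(2, 4, i + 1)
    plt.imshow(image)
    plt.axis("off")                   # hide the axes
    plt.title(text_descriptions[i])   # caption (optional)

    images_for_display.append(image)
    images.append(preprocess(image))

# Adjust the layout to prevent overlap
plt.tight_layout()

# Show the figure
plt.show()
```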
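The `preprocess` transform used in the loop is not shown in this part. Assuming it is CLIP's default pipeline (resize the shorter side to `n_px` while keeping the aspect ratio, center-crop to `n_px × n_px`, then normalize each RGB channel), the plain-Python sketch below computes the resize size, the crop box, and the per-channel normalization. The helper names are hypothetical, and `n_px = 224` is an assumption that matches ViT-B/32; other checkpoints such as ViT-L/14@336px use 336.

```python
# Hypothetical helpers sketching the geometry of CLIP-style preprocessing,
# assuming: resize shorter side to n_px (aspect ratio kept), then center-crop.

# Per-channel RGB mean/std used by CLIP's Normalize step
CLIP_MEAN = (0.48145466, 0.4578275, 0.40821073)
CLIP_STD = (0.26862954, 0.26130258, 0.27577711)


def resized_size(width, height, n_px=224):
    """(width, height) after scaling the shorter side to n_px."""
    if width <= height:
        return n_px, int(n_px * height / width)
    return int(n_px * width / height), n_px


def center_crop_box(width, height, n_px=224):
    """(left, top, right, bottom) of an n_px x n_px center crop."""
    left = int(round((width - n_px) / 2.0))
    top = int(round((height - n_px) / 2.0))
    return left, top, left + n_px, top + n_px


def normalize_pixel(rgb):
    """Map one RGB pixel with values in [0, 255] to CLIP's normalized scale."""
    return tuple((v / 255.0 - m) / s for v, m, s in zip(rgb, CLIP_MEAN, CLIP_STD))


# The 400x320 display images produced by the loop
w, h = resized_size(400, 320)
print((w, h))                  # (280, 224)
print(center_crop_box(w, h))   # (28, 0, 252, 224)
```

Under these assumptions, because every image is first stretched to 400×320 for display and only then passed to `preprocess`, the model sees the central 224×224 region of a 280×224 rescaled image, so 28 pixels are cropped from each side at that scale.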

Figure 3-2 shows the eight sample images from the COCO dataset with their text descriptions.

Figure 3-2 Eight sample images from the COCO dataset with their text descriptions

4. Step 4: Generate the features

Next, prepare the image and text inputs and run the model's forward pass. This step extracts the corresponding image and text features.

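```python
import clip
import torch

# `model` is assumed to come from `model, preprocess = clip.load(...)` in Part 1.
# On ROCm builds of PyTorch, .cuda() places tensors on the AMD GPU (HIP backend).

# Stack the preprocessed images into one batch: (N, 3, H, W)
image_inputs = torch.stack(images).cuda()

# Prefix each description with "It is " and tokenize: (N, 77)
text_tokens = clip.tokenize(["It is " + text for text in text_descriptions]).cuda()

# Inference only, so disable gradient tracking
with torch.no_grad():
    image_features = model.encode_image(image_inputs).float()
    text_features = model.encode_text(text_tokens).float()
```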
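`clip.tokenize` turns each prompt into a fixed-length row of token IDs: a start-of-text marker, the text's tokens, an end-of-text marker, then zero padding up to the context length (77 by default). That is why all eight prompts fit into a single `(8, 77)` batch. The sketch below mimics only that layout; the whitespace "tokenizer", its vocabulary, and the `SOT`/`EOT` IDs are hypothetical stand-ins for CLIP's real BPE tokenizer.

```python
# Hypothetical sketch of the *layout* produced by clip.tokenize:
# [SOT] + token ids + [EOT], zero-padded to a fixed context length.
# The whitespace tokenizer and the ids below are made up for illustration.

CONTEXT_LENGTH = 77   # clip.tokenize's default context length
SOT, EOT = 1, 2       # hypothetical start/end-of-text ids


def toy_tokenize(texts, context_length=CONTEXT_LENGTH):
    vocab = {}        # word -> hypothetical id, assigned on first use
    rows = []
    for text in texts:
        ids = [vocab.setdefault(w, len(vocab) + 3) for w in text.lower().split()]
        tokens = [SOT] + ids + [EOT]
        if len(tokens) > context_length:
            raise ValueError(f"prompt too long for context length {context_length}: {text!r}")
        rows.append(tokens + [0] * (context_length - len(tokens)))
    return rows


descriptions = ["a cat standing on a wooden floor", "an airplane on the runway"]
prompts = ["It is " + text for text in descriptions]
batch = toy_tokenize(prompts)
print(len(batch), len(batch[0]))   # 2 77
```

The real `clip.tokenize` likewise refuses prompts that exceed the context length unless it is called with `truncate=True`; short captions like the eight used here are far below that limit.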

5. Step 5: Compute the similarity scores between text and images

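Normalize each feature vector to unit (L2) length, then take the dot product of every text-image pair. For unit vectors, this dot product is the cosine similarity.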
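Written out, with $\mathbf{t}_i$ the feature of the $i$-th text prompt and $\mathbf{v}_j$ the feature of the $j$-th image (notation introduced here for illustration), the normalization and dot product fill cell $(i, j)$ of `similarity_score` with the cosine similarity:

$$
\hat{\mathbf{t}}_i = \frac{\mathbf{t}_i}{\lVert \mathbf{t}_i \rVert_2}, \qquad
\hat{\mathbf{v}}_j = \frac{\mathbf{v}_j}{\lVert \mathbf{v}_j \rVert_2}, \qquad
S_{ij} = \hat{\mathbf{t}}_i \cdot \hat{\mathbf{v}}_j
       = \frac{\mathbf{t}_i \cdot \mathbf{v}_j}{\lVert \mathbf{t}_i \rVert_2 \, \lVert \mathbf{v}_j \rVert_2} \in [-1, 1]
$$

Stacking the normalized vectors as rows of $\hat{T}$ and $\hat{V}$ gives $S = \hat{T}\hat{V}^{\top}$, which is the `text_features @ image_features.T` product in the code.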

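```python
# L2-normalize each feature vector along the feature dimension
image_features /= image_features.norm(dim=-1, keepdim=True)
text_features /= text_features.norm(dim=-1, keepdim=True)

# similarity_score[i, j] = cosine similarity of text i and image j
similarity_score = text_features.cpu().numpy() @ image_features.cpu().numpy().T
```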
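To make the two normalization lines and the `@` product concrete, here is the same calculation in plain Python on made-up three-dimensional vectors (real CLIP features are much longer, for example 512 dimensions for ViT-B/32). The function names are hypothetical; the point is that after L2 normalization, each entry of the text-by-image product is the cosine similarity of one pair.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (like x / x.norm(dim=-1))."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def similarity_matrix(text_feats, image_feats):
    """S[i][j] = dot(normalized text i, normalized image j), i.e. T @ V.T."""
    texts = [l2_normalize(t) for t in text_feats]
    images = [l2_normalize(v) for v in image_feats]
    return [[sum(a * b for a, b in zip(t, v)) for v in images] for t in texts]


# Made-up toy features: 2 texts x 3 dims, 2 images x 3 dims
text_feats = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
image_feats = [[3.0, 0.0, 0.0], [1.0, 1.0, 0.0]]

S = similarity_matrix(text_feats, image_feats)
for row in S:
    print([round(s, 3) for s in row])
# [1.0, 0.707]
# [0.0, 0.707]
```

The second text vector has length 2 and the first image vector has length 3, yet their scores depend only on direction. This is why normalizing first lets raw dot products be compared across all pairs.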

6. Step 6: Visualize the similarity between text and images

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_similarity(text_descriptions, similarity_score, images_for_display):
    count = len(text_descriptions)
    fig, ax = plt.subplots(figsize=(18, 15))

    # Heatmap: rows are text descriptions, columns are images
    im = ax.imshow(similarity_score, cmap=plt.cm.YlOrRd)
    plt.colorbar(im, ax=ax)

    # y-axis ticks: text descriptions
    ax.set_yticks(np.arange(count))
    ax.set_yticklabels(text_descriptions, fontsize=12)
    ax.set_xticklabels([])
    ax.xaxis.set_visible(False)

    # Draw each image as a thumbnail above its column
    for i, image in enumerate(images_for_display):
        ax.imshow(image, extent=(i - 0.5, i + 0.5, -1.6, -0.6), origin="lower")

    # Print the score in every cell
    for x in range(similarity_score.shape[1]):
        for y in range(similarity_score.shape[0]):
            ax.text(x, y, f"{similarity_score[y, x]:.2f}", ha="center", va="center", size=10)

    ax.spines[["left", "top", "right", "bottom"]].set_visible(False)

    # Set the x- and y-axis limits (leave room above the heatmap for thumbnails)
    ax.set_xlim([-0.5, count - 0.5])
    ax.set_ylim([count + 0.5, -2])

    # Add a title to the plot
    ax.set_title("Text-image similarity scores computed with CLIP", size=14)
    plt.show()


plot_similarity(text_descriptions, similarity_score, images_for_display)
```
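Because text `i` and image `i` were paired when the two lists were built, a good result in the heatmap is one where each row's largest score lies on the diagonal. The hypothetical helpers below (not part of the original code) check that for any score matrix given as a list of rows, such as `similarity_score.tolist()`. The numbers in the example are invented for illustration and are not the results in Figure 3-3.

```python
def top1_matches(scores):
    """For each row (text), return the column index (image) with the highest score."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


def diagonal_accuracy(scores):
    """Fraction of texts whose best-scoring image is the one at the same index."""
    best = top1_matches(scores)
    return sum(1 for i, j in enumerate(best) if i == j) / len(best)


# Invented 3x3 example: text 2 scores higher with image 1 than with its own image
example = [
    [0.31, 0.12, 0.10],
    [0.11, 0.29, 0.14],
    [0.09, 0.22, 0.20],
]
print(top1_matches(example))        # [0, 1, 1]
print(diagonal_accuracy(example))   # 0.6666666666666666
```

With the eight COCO pairs, `top1_matches(similarity_score.tolist())` lists the image each description picks, which is a quick numeric check on what the heatmap shows visually.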

Figure 3-3 shows the text-image similarity scores computed with CLIP.

Figure 3-3 Text-image similarity scores computed with CLIP

Posted @ 2025-03-30 04:17 by 吴建明 (wujianming)
