
"Deep Learning with Python" notes --- 5.4-1 Convnet Visualization: Visualizing Intermediate Activations


I. Summary

One-sentence summary:

【Higher-layer activations carry less and less information about the specific input, and more and more about the target】: as you go deeper, the features extracted by the layers become increasingly abstract. The activations of higher layers carry less and less information about the specific input being seen, and more and more information about the target (here, the class of the image: cat or dog).
【Information distillation pipeline】: a deep neural network effectively acts as an information distillation pipeline: raw data goes in (here, RGB images) and is repeatedly transformed so that irrelevant information (such as the specific visual appearance of the image) is filtered out, while useful information (such as the class of the image) is magnified and refined.

 

 

1. Are convnets really "black boxes"?

【Convnets are by no means black boxes】: deep learning models are often described as "black boxes", meaning that the representations they learn are hard to extract and present in a human-readable form. While that is partly true for certain kinds of deep learning models, it is definitely not true for convnets.
【Representations of visual concepts】: the representations learned by convnets are highly amenable to visualization, in large part because they are representations of visual concepts.

 

 

2. What are the three most common convnet visualization techniques?

Visualizing intermediate convnet outputs (intermediate activations): useful for understanding how successive convnet layers transform their input, and for getting a first idea of the meaning of individual convnet filters.
Visualizing convnet filters: useful for understanding precisely what visual pattern or concept each filter in a convnet is receptive to.
Visualizing heatmaps of class activation in an image: useful for understanding which parts of an image were identified as belonging to a given class, and thus for localizing objects in images.

 

 

3. What does a layer's "activation" mean?

【A layer's output】: the output of a layer is often called its activation, i.e. the output of the layer's activation function.
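A minimal sketch of what this means in Keras (it assumes the `model` and `img_tensor` variables built in part II below): a layer's activation is just its output tensor, and you can wire that tensor into a new Model to read it out for a given input.

from tensorflow.keras import models

# Sketch: the "activation" of a layer is simply its output tensor,
# i.e. what comes out of its activation function.
# Assumes `model` and `img_tensor` as defined in part II below.
first_layer = model.layers[0]
probe = models.Model(inputs=model.input, outputs=first_layer.output)
first_activation = probe.predict(img_tensor)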

 

 

4. What is visualizing intermediate activations?

【Displaying the feature maps】: visualizing intermediate activations consists of displaying the feature maps that are output by the various convolution and pooling layers in a network, given a certain input (the output of a layer is often called its activation, i.e. the output of the activation function).
【How an input is decomposed by the filters】: this gives a view into how an input is decomposed into the different filters learned by the network. We want to visualize feature maps with three dimensions: width, height, and depth (channels). Each channel encodes relatively independent features, so the proper way to visualize these feature maps is to plot the contents of every channel independently, as a 2D image.

 

 

5. How does the Model class differ from the Sequential model in terms of outputs?

【Multiple outputs】: the Model class allows a model to have multiple outputs, unlike the Sequential model.
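A short sketch of the difference (the layers and shapes here are illustrative, not the book's model): a Sequential model always maps one input to one output, while the functional Model class accepts a list of output tensors and returns one array per output.

import numpy as np
from tensorflow.keras import layers, models

# Sketch: a functional Model may return several outputs at once.
inp = layers.Input(shape=(150, 150, 3))
conv = layers.Conv2D(32, (3, 3), activation='relu')(inp)
pool = layers.MaxPooling2D((2, 2))(conv)

multi_output_model = models.Model(inputs=inp, outputs=[conv, pool])
conv_act, pool_act = multi_output_model.predict(np.zeros((1, 150, 150, 3)))
print(conv_act.shape, pool_act.shape)  # (1, 148, 148, 32) (1, 74, 74, 32)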

 

 

6. What do the visualizations of the convnet's intermediate outputs show?

【The first layer acts as a collection of various edge detectors】: at that stage, the activations retain almost all of the information present in the initial picture.
【As you go deeper, the activations become increasingly abstract】: and less visually interpretable. They begin to encode higher-level concepts such as "cat ear" and "cat eye". Deeper representations carry less and less information about the visual contents of the image, and more and more information about its class.
【The sparsity of the activations increases with the depth of the layer】: in the first layer, all filters are activated by the input image, but in subsequent layers more and more filters are blank. This means that the patterns encoded by those filters are not found in the input image.

 

 

7. Why does the sparsity of the activations increase with layer depth?

【The patterns encoded by these filters are not found in the input image】: the sparsity of the activations increases with the depth of the layer. In the first layer, all filters are activated by the input image, but in subsequent layers more and more filters are blank, meaning that the patterns those filters encode cannot be found in the input image.
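One way to check this claim numerically is the rough sketch below (it assumes the `layer_names` and `activations` variables computed in part II of this post), which counts, per layer, how many channels are entirely zero:

# Sketch: count blank (all-zero) channels per layer as a crude measure of
# activation sparsity. Assumes `layer_names` and `activations` from part II.
for layer_name, layer_activation in zip(layer_names, activations):
    n_channels = layer_activation.shape[-1]
    n_blank = sum(1 for c in range(n_channels)
                  if not layer_activation[0, :, :, c].any())
    print(f'{layer_name}: {n_blank}/{n_channels} blank channels')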

 

 

8. What do deep networks and humans have in common in how they perceive objects?

【Higher-layer activations carry less and less information about the specific input, and more and more about the target】: we have just evidenced an important universal characteristic of the representations learned by deep neural networks: as you go deeper, the features extracted by the layers become increasingly abstract. The activations of higher layers carry less and less information about the specific input being seen, and more and more information about the target (here, the class of the image: cat or dog).
【Information distillation pipeline】: a deep neural network effectively acts as an information distillation pipeline: raw data goes in (here, RGB images) and is repeatedly transformed so that irrelevant information (such as the specific visual appearance of the image) is filtered out, while useful information (such as the class of the image) is magnified and refined.
【You remember which abstract objects were present (e.g. a bicycle, a tree) but not their specific appearance】: this is analogous to the way humans and animals perceive the world: after observing a scene for a few seconds, a human can remember which abstract objects were in it (such as a bicycle or a tree), but cannot remember the specific appearance of those objects. In fact, if you tried to draw a generic bicycle from memory, chances are you could not get it even remotely right, even though you have seen thousands of bicycles in your lifetime.
【Your brain abstracts its visual input】: try it right now: the effect is absolutely real. Your brain has learned to fully abstract its visual input, transforming it into high-level visual concepts while filtering out irrelevant visual details, which makes it remarkably hard to remember how the things around you actually look.

 

 

 

 

II. 5.4-1 Convnet Visualization: Visualizing Intermediate Activations

Video location for this post in the corresponding course:

import pandas as pd
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
In [3]:
from tensorflow.keras.models import load_model 
model = load_model('../5.2_dogs-vs-cats/cats_and_dogs_small_4.h5') 
model.summary()  # as a reminder
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 148, 148, 32)      896       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 74, 74, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 72, 72, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 36, 36, 64)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 34, 34, 128)       73856     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 17, 17, 128)       0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 15, 15, 128)       147584    
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 7, 7, 128)         0         
_________________________________________________________________
flatten (Flatten)            (None, 6272)              0         
_________________________________________________________________
dropout (Dropout)            (None, 6272)              0         
_________________________________________________________________
dense (Dense)                (None, 512)               3211776   
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 513       
=================================================================
Total params: 3,453,121
Trainable params: 3,453,121
Non-trainable params: 0
_________________________________________________________________

Preprocessing a single image

In [5]:
img_path = 'E:\\78_recorded_lesson\\001_course_github\\AI_dataSet\\dogs-vs-cats\\cats_and_dogs_small\\test\\cats\\cat.1700.jpg'

# Preprocess the image into a 4D tensor
from tensorflow.keras.preprocessing import image 
import numpy as np 

img = image.load_img(img_path, target_size=(150, 150))  
img_tensor = image.img_to_array(img) 
img_tensor = np.expand_dims(img_tensor, axis=0)  
# Remember that the model was trained on inputs that were preprocessed this way
img_tensor /= 255. 

# Its shape is (1, 150, 150, 3)
print(img_tensor.shape) 
(1, 150, 150, 3)
In [7]:
plt.imshow(img)
plt.show()

Instantiating a model from an input tensor and a list of output tensors

In [8]:
from tensorflow.keras import models 

# Extract the outputs of the top 8 layers
layer_outputs = [layer.output for layer in model.layers[:8]] 
# Create a model that will return these outputs, given the model input
activation_model = models.Model(inputs=model.input, outputs=layer_outputs) 

Running the model in predict mode

In [11]:
# Returns a list of 8 NumPy arrays: one array per layer activation
activations = activation_model.predict(img_tensor) 
# Note: activations is a Python list, so it has no .shape attribute
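Because `activations` is a plain Python list (one NumPy array per layer), `activations.shape` would raise an AttributeError; to inspect the shapes, iterate over the list:

# activations is a list, not an array; inspect each layer's shape instead
for layer, activation in zip(model.layers[:8], activations):
    print(layer.name, activation.shape)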
In [12]:
# This is what the first convolution layer's activation looks like for the cat image input:
first_layer_activation = activations[0] 
print(first_layer_activation.shape) 
(1, 148, 148, 32)

Visualizing the 4th channel

In [13]:
plt.matshow(first_layer_activation[0, :, :, 4], cmap='viridis')
Out[13]:
<matplotlib.image.AxesImage at 0x24c91a3f908>

This channel appears to encode a diagonal edge detector.

Let's look at the 7th channel. Note that your own channels may vary, because the specific filters learned by convolution layers are not deterministic.
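If you need the learned filters to be reproducible across runs, one option (a sketch; the seed value is arbitrary) is to seed the relevant random number generators before building and training the model:

import random
import numpy as np
import tensorflow as tf

# Sketch: seed the RNGs before building/training so that the learned
# filters are reproducible across runs. The value 42 is arbitrary.
random.seed(42)
np.random.seed(42)
tf.random.set_seed(42)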

Visualizing the 7th channel

In [17]:
plt.matshow(first_layer_activation[0, :, :, 7], cmap='viridis')
Out[17]:
<matplotlib.image.AxesImage at 0x24c82b95b88>

Visualizing every channel in every intermediate activation

In [16]:
# These are the names of the layers, so we can have them as part of our plot
layer_names = []
for layer in model.layers[:8]:
    layer_names.append(layer.name)

images_per_row = 16

# Now let's display our feature maps
for layer_name, layer_activation in zip(layer_names, activations):
    # This is the number of features (channels) in the feature map
    n_features = layer_activation.shape[-1]

    # The feature map has shape (1, size, size, n_features)
    size = layer_activation.shape[1]

    # We will tile the activation channels in this matrix
    n_cols = n_features // images_per_row
    display_grid = np.zeros((size * n_cols, images_per_row * size))

    # We'll tile each filter into this big horizontal grid
    for col in range(n_cols):
        for row in range(images_per_row):
            channel_image = layer_activation[0,
                                             :, :,
                                             col * images_per_row + row]

            # Post-process the feature to make it visually palatable.
            # The small epsilon guards against division by zero for blank
            # (all-zero) channels, which would otherwise produce NaNs.
            channel_image -= channel_image.mean()
            channel_image /= (channel_image.std() + 1e-5)
            channel_image *= 64
            channel_image += 128
            channel_image = np.clip(channel_image, 0, 255).astype('uint8')
            display_grid[col * size : (col + 1) * size,
                         row * size : (row + 1) * size] = channel_image

    # Display the grid
    scale = 1. / size
    plt.figure(figsize=(scale * display_grid.shape[1],
                        scale * display_grid.shape[0]))
    plt.title(layer_name)
    plt.grid(False)
    plt.imshow(display_grid, aspect='auto', cmap='viridis')

plt.show()

A few remarkable things to note here:

  • The first layer acts as a collection of various edge detectors. At that stage, the activations are still retaining almost all of the information present in the initial picture.
  • As we go higher-up, the activations become increasingly abstract and less visually interpretable. They start encoding higher-level concepts such as "cat ear" or "cat eye". Higher-up representations carry increasingly less information about the visual contents of the image, and increasingly more information related to the class of the image.
  • The sparsity of the activations is increasing with the depth of the layer: in the first layer, all filters are activated by the input image, but in the following layers more and more filters are blank. This means that the pattern encoded by the filter isn't found in the input image.

We have just evidenced a very important universal characteristic of the representations learned by deep neural networks: the features extracted by a layer get increasingly abstract with the depth of the layer. The activations of layers higher-up carry less and less information about the specific input being seen, and more and more information about the target (in our case, the class of the image: cat or dog). A deep neural network effectively acts as an information distillation pipeline, with raw data going in (in our case, RGB pictures), and getting repeatedly transformed so that irrelevant information gets filtered out (e.g. the specific visual appearance of the image) while useful information gets magnified and refined (e.g. the class of the image).

This is analogous to the way humans and animals perceive the world: after observing a scene for a few seconds, a human can remember which abstract objects were present in it (e.g. bicycle, tree) but could not remember the specific appearance of these objects. In fact, if you tried to draw a generic bicycle from mind right now, chances are you could not get it even remotely right, even though you have seen thousands of bicycles in your lifetime. Try it right now: this effect is absolutely real. Your brain has learned to completely abstract its visual input, to transform it into high-level visual concepts while completely filtering out irrelevant visual details, making it tremendously difficult to remember how things around us actually look.


 

 
posted @ 2020-10-12 15:14 范仁义