TensorFlow 2.0 Quick Start Guide (Complete)

Original: TensorFlow 2.0 Quick Start Guide

License: CC BY-NC-SA 4.0

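Preface

TensorFlow is one of the most popular machine learning frameworks for Python. With this book, you will improve your knowledge of the latest TensorFlow features and be able to perform supervised and unsupervised machine learning with Python.

Who this book is for

As the title suggests, this book introduces the reader to TensorFlow and its latest features up to and including the 2.0.0 alpha release: eager execution, tf.data, tf.keras, TensorFlow Hub, machine learning, and neural network applications.

This book is for anyone with some knowledge of machine learning and its applications: data scientists, machine learning engineers, computer scientists, computer science students, and hobbyists.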

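What this book covers

Chapter 1, "Introduction to TensorFlow 2", introduces TensorFlow by looking at a number of code snippets that illustrate some basic operations. We give an overview of the modern TensorFlow ecosystem and see how to install TensorFlow.

Chapter 2, "Keras, a High-Level API for TensorFlow 2", introduces the Keras API, including some general comments and insights, followed by the same basic architecture expressed in four different ways for training on the MNIST dataset.

Chapter 3, "ANN Technologies Using TensorFlow 2", explores a number of techniques that support the creation and use of neural networks. The chapter covers data presentation to an ANN, ANN layers, creating the model, gradient calculations for gradient descent algorithms, loss functions, and saving and restoring models.

Chapter 4, "Supervised Machine Learning Using TensorFlow 2", describes examples of using TensorFlow in two situations involving linear regression, where features are mapped to known labels that have continuous values, so that predictions can be made for unseen features.

Chapter 5, "Unsupervised Learning Using TensorFlow 2", looks at two applications of autoencoders in unsupervised learning: first, for compressing data; second, for denoising, in other words removing noise from images.

Chapter 6, "Recognizing Images with TensorFlow 2", looks first at the Google Quick Draw 1 image dataset and then at the CIFAR 10 image dataset.

Chapter 7, "Neural Style Transfer Using TensorFlow 2", shows how to take a content image and a style image and then produce a hybrid image. We use layers from the trained VGG19 model to accomplish this.

Chapter 8, "Recurrent Neural Networks Using TensorFlow 2", first discusses the general principles of RNNs and then shows how to acquire and prepare some text for use by a model.

Chapter 9, "TensorFlow Estimators and TensorFlow Hub", first looks at an estimator trained on the fashion dataset. We see how estimators provide a simple, intuitive API for TensorFlow. We also look at a neural network for analyzing the IMDb database of film reviews.

Appendix, "Converting from tf1.12 to tf2", contains some tips for converting tf1.12 files to tf2.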

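To get the most out of this book

Familiarity with Python 3.6 and with Jupyter Notebooks is assumed.

This book is written on the assumption that the reader prefers explanations given as code snippets and complete programs to lengthy prose explanations, although the latter do appear in the book in a different style.

Some knowledge of machine learning concepts and techniques is strongly recommended, but it is not essential if the reader is willing to do some background reading on these topics.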

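Conventions used

A number of text conventions are used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "Mount the downloaded WebStorm-10*.dmg disk image file as another disk in your system."

A block of code is set as follows:

image1 = tf.zeros([7, 28, 28, 3]) #  example-within-batch by height by width by color

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

r1 = tf.reshape(t2,[2,6]) # 2 rows 6 cols
r2 = tf.reshape(t2,[1,12]) # 1 row 12 cols
r1
# <tf.Tensor: id=33, shape=(2, 6), dtype=float32,
# numpy= array([[ 0., 1., 2., 3., 4., 5.], [ 6., 7., 8., 9., 10., 11.]], dtype=float32)>

Any command-line input or output is written as follows:

var = tf.Variable([3, 3])

Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Select System info from the Administration panel."

Warnings or important notes appear like this.

Tips and tricks appear like this.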

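1. Introduction to TensorFlow 2

TensorFlow began life in 2011 as DistBelief, an internal, closed-source project at Google. DistBelief was a machine learning system that used deep learning neural networks. The system evolved into TensorFlow, which was released to the developer community under the Apache 2.0 open source license on November 9, 2015. Version 1.0.0 appeared on February 11, 2017. There have been many releases since then, incorporating a wealth of new features.

At the time of writing, the most recent release is TensorFlow 2.0.0 alpha, which was announced at the TensorFlow Dev Summit on March 6, 2019.

TensorFlow takes its name from tensors. A tensor is a generalization of vectors and matrices to higher dimensions. The rank of a tensor is the number of indices it takes to uniquely specify each element of that tensor. A scalar (a simple number) is a tensor of rank 0, a vector is a tensor of rank 1, a matrix is a tensor of rank 2, and a 3-dimensional array is a tensor of rank 3. A tensor has a data type and a shape (all of the data items in a tensor must have the same type). An example of a 4-dimensional tensor (that is, rank 4) is a batch of images, where the dimensions are, for example, the example within the batch, the height, the width, and the color channel: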

image1 = tf.zeros([7, 28, 28, 3]) #  example-within-batch by height by width by color
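
As a quick check on the definition of rank given above, the following sketch builds one tensor of each rank mentioned and reads the rank off the length of its shape. The variable names are illustrative only and do not come from the book.

```python
import tensorflow as tf

scalar = tf.constant(3.0)                        # rank 0: no index needed
vector = tf.constant([1.0, 2.0, 3.0])            # rank 1: one index
matrix = tf.constant([[1.0, 2.0], [3.0, 4.0]])   # rank 2: two indices
images = tf.zeros([7, 28, 28, 3])                # rank 4: example, height, width, color

# The rank is the number of entries in the shape.
ranks = [len(t.shape) for t in (scalar, vector, matrix, images)]
print(ranks)

# Four indices pick out exactly one element of the rank-4 tensor.
last_pixel = images[6, 27, 27, 2]
print(last_pixel.shape)
```

The final element has an empty shape because four indices fully specify a single value in a rank-4 tensor.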

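Although TensorFlow can be used in many areas of numerical computing in general, and machine learning in particular, its main area of research and development is in applications of Deep Neural Networks (DNNs). It has been used in areas as diverse as voice and sound recognition, for example in the now widespread voice-activated assistants; text-based applications such as language translators; image recognition, for example exoplanet hunting and cancer detection and diagnosis; and time series applications such as recommendation systems.

In this chapter, we will discuss the following:

The modern TensorFlow ecosystem
Installing TensorFlow
Eager operations
Useful TensorFlow operations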

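The modern TensorFlow ecosystem

Let's start with eager execution. The first incarnation of TensorFlow involved constructing a computational graph made up of operations and tensors, which then had to be evaluated in what Google called a session (this is known as declarative programming). This is still a common way to write TensorFlow programs. However, eager execution, available in research form since release 1.5 and baked into TensorFlow proper from release 1.7, evaluates operations immediately, with the result that tensors can be treated like NumPy arrays (this is known as imperative programming).

Google says that eager execution is the preferred method for research and development, but that computational graphs are to be preferred for serving TensorFlow production applications.

tf.data is an API that lets you build complex data input pipelines from simpler, reusable parts. The highest-level abstraction is Dataset, which comprises both elements of nested structures of tensors and a plan of transformations that act on those elements. The following classes are available:

A Dataset comprising fixed-length records from at least one binary file (FixedLengthRecordDataset)
A Dataset comprising records from at least one TFRecord file (TFRecordDataset)
A Dataset comprising records that are lines from at least one text file (TextLineDataset)
There is also a class that represents the state of iterating through a Dataset (tf.data.Iterator)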
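
To make the idea of building a pipeline from reusable parts concrete, here is a minimal sketch that chains a transformation and a batching step onto an in-memory Dataset. The numbers and variable names are made up for illustration; the file-based classes listed above are used in the same chained style.

```python
import tensorflow as tf

# Source: six integers held in memory.
dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5, 6])

# Transformation plan: double every element, then group into batches of four.
dataset = dataset.map(lambda x: x * 2).batch(4)

# With eager execution a Dataset can be iterated like any Python iterable.
batches = [batch.numpy().tolist() for batch in dataset]
print(batches)
```

Each call returns a new Dataset, so the steps can be reordered or reused without touching the source data.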

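Let's move on to estimators, a high-level API that lets you build greatly simplified machine learning programs. Estimators take care of training, evaluation, prediction, and exporting for serving.

TensorFlow.js is a collection of APIs that lets you build and train models using either the low-level JavaScript linear algebra library or the high-level layers API. Models can therefore be trained and run in a browser.

TensorFlow Lite is a lightweight version of TensorFlow for mobile and embedded devices. It consists of a runtime interpreter and a set of utilities. The idea is that you train a model on a more powerful machine and then convert the model into the .tflite format using the utilities. You then load the model onto the device of your choice. At the time of writing, TensorFlow Lite is supported on Android and iOS with a C++ API and has a Java wrapper for Android. If an Android device supports the Android Neural Networks API (NNAPI) for hardware acceleration, the interpreter will use it; otherwise it will default to the CPU for execution.

TensorFlow Hub is a library designed to foster the publication, discovery, and use of reusable modules of machine learning models. In this context, a module is a self-contained piece of a TensorFlow graph, together with its weights and other assets. The module can be reused in different tasks in a technique known as transfer learning. The idea is that you train a model on a large dataset and then reuse the appropriate module for your different but related task. This approach brings a number of advantages: you can train a model with a smaller dataset, you can improve generalization, and you can significantly speed up training.

For example, the ImageNet dataset, together with a number of different neural network architectures such as inception_v3, has been used very successfully to address many other image processing training problems.

TensorFlow Extended (TFX) is a TensorFlow-based general-purpose machine learning platform. Libraries that have been open sourced to date include TensorFlow Transform, TensorFlow Model Analysis, and TensorFlow Serving.

tf.keras is a high-level neural networks API, written in Python, that interfaces to TensorFlow (and various other tensor tools). tf.keras supports fast prototyping and is user friendly, modular, and extensible. It supports both convolutional and recurrent networks and will run on CPUs and GPUs. Keras is the API of choice for development in TensorFlow 2.

TensorBoard is a suite of visualization tools that supports the understanding, debugging, and optimization of TensorFlow programs. It is compatible with both eager and graph execution environments. You can use TensorBoard to visualize various metrics of your model during training.

A recent development (still experimental at the time of writing) integrates TensorFlow directly into the Swift programming language. TensorFlow applications in Swift are written using imperative code, that is, code that executes eagerly (at runtime). The Swift compiler automatically turns this source code into a single TensorFlow graph, and this compiled code then executes with the full performance of TensorFlow Sessions on CPU, GPU, and TPU.

In this book, we will focus on those TensorFlow tools that get you up and running with TensorFlow using Python 3.6 and TensorFlow 2.0.0 alpha. In particular, we will use eager execution rather than computational graphs, and we will use tf.keras wherever possible to build networks, since this is the modern approach to research and experimentation.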

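Installing TensorFlow

The best programming support for TensorFlow is provided for Python (although libraries do exist for Java, C, and Go, and libraries for other languages are under active development).

There is a wealth of information on the web about installing TensorFlow for Python.

Google also recommends installing TensorFlow in a virtual environment. This is standard practice: such an environment isolates a set of APIs and code from other APIs and code and from the system-wide environment.

TensorFlow comes in two distinct versions, one for execution on a CPU and one for execution on a GPU. The latter requires the numerical libraries CUDA and CuDNN to be installed. TensorFlow will default to GPU execution where possible. See the GPU section of the official installation documentation.

Rather than attempt to reinvent the wheel here, we suggest following existing resources for creating virtual environments and installing TensorFlow.

In summary, TensorFlow may be installed on Windows 7 or later, Ubuntu Linux 16.04 or later, and macOS 10.12.6 or later.

A full introduction to virtual environments is available in the Python documentation.

Very detailed information on every aspect of installing TensorFlow is available in Google's official documentation.

Once installed, you can check your TensorFlow installation from a command terminal. The official documentation has instructions for doing this and for installing the nightly version of TensorFlow, which contains all the latest updates.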

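Eager operations

We will look first at how to import TensorFlow, then at TensorFlow coding style, and then at some basic housekeeping. After that, we will look at some basic TensorFlow operations. You can either create a Jupyter Notebook for these snippets or use your favorite IDE to create your source code. The code is all available in the GitHub repository.

Importing TensorFlow

Importing TensorFlow is straightforward. Note the few system checks: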

import tensorflow as tf
print("TensorFlow version: {}".format(tf.__version__))
print("Eager execution is: {}".format(tf.executing_eagerly()))
print("Keras version: {}".format(tf.keras.__version__))

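Coding style conventions for TensorFlow

For Python applications, Google adheres to the PEP8 standard conventions. In particular, they use CamelCase for classes (for example, hub.LatestModuleExporter) and snake_case for functions, methods, and properties (for example, tf.math.squared_difference). Google also adheres to the Google Python Style Guide.

Using eager execution

Eager execution is the default in TensorFlow 2 and, as such, needs no special setup.

The following code can be used to find out whether a CPU or a GPU is in use and, if it is a GPU, whether that GPU is #0.

We suggest typing the code in rather than using copy and paste; this way you will get a feel for the commands: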

var = tf.Variable([3, 3])

if tf.test.is_gpu_available(): 
 print('Running on GPU')
 print('GPU #0?')
 print(var.device.endswith('GPU:0'))
else: 
 print('Running on CPU')

Declaring eager variables

The way to declare a TensorFlow eager variable is as follows:

t0 = 24 # python variable
t1 = tf.Variable(42) # rank 0 tensor
t2 = tf.Variable([ [ [0., 1., 2.], [3., 4., 5.] ], [ [6., 7., 8.], [9., 10., 11.] ] ]) #rank 3 tensor
t0, t1, t2

The output will be as follows:

(24,
 <tf.Variable 'Variable:0' shape=() dtype=int32, numpy=42>,
 <tf.Variable 'Variable:0' shape=(2, 2, 3) dtype=float32, numpy=
 array([[[ 0.,  1.,  2.],
         [ 3.,  4.,  5.]],
        [[ 6.,  7.,  8.],
         [ 9., 10., 11.]]], dtype=float32)>)

TensorFlow will infer the data type, defaulting to tf.float32 for floats and tf.int32 for integers (see the preceding examples).

Alternatively, the data type can be explicitly specified, as here:

f64 = tf.Variable(89, dtype = tf.float64)
f64.dtype

TensorFlow has a large number of built-in data types.

Examples, in addition to those seen previously, include tf.int16, tf.complex64, and tf.string. The tf.dtypes documentation has the full list. To reassign a variable, use var.assign(), as here:

f1 = tf.Variable(89.)
f1

# <tf.Variable 'Variable:0' shape=() dtype=float32, numpy=89.0>

f1.assign(98.)
f1

# <tf.Variable 'Variable:0' shape=() dtype=float32, numpy=98.0>

Declaring TensorFlow constants

TensorFlow constants may be declared as in the following example:

m_o_l = tf.constant(42)

m_o_l

# <tf.Tensor: id=45, shape=(), dtype=int32, numpy=42>

m_o_l.numpy()

# 42

Again, TensorFlow will infer the data type, or it can be explicitly specified, as is the case with variables:

unit = tf.constant(1, dtype = tf.int64)

unit

# <tf.Tensor: id=48, shape=(), dtype=int64, numpy=1>

Shaping a tensor

The shape of a tensor is accessed via a property (rather than a function):

t2 = tf.Variable([ [ [0., 1., 2.], [3., 4., 5.] ], [ [6., 7., 8.], [9., 10., 11.] ] ]) # tensor variable
print(t2.shape)

The output will be as follows:

(2, 2, 3)

Tensors may be reshaped while retaining the same values, as is often required when constructing neural networks.

Here is an example:

r1 = tf.reshape(t2,[2,6]) # 2 rows 6 cols
r2 = tf.reshape(t2,[1,12]) # 1 row 12 cols
r1
# <tf.Tensor: id=33, shape=(2, 6), dtype=float32,
# numpy= array([[ 0., 1., 2., 3., 4., 5.], [ 6., 7., 8., 9., 10., 11.]], dtype=float32)>

Here is another example:

r2 = tf.reshape(t2,[1,12]) # 1 row 12 columns
r2
# <tf.Tensor: id=36, shape=(1, 12), dtype=float32,
# numpy= array([[ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11.]], dtype=float32)>

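Ranking (dimensions) of a tensor

The rank of a tensor is the number of dimensions it has, that is, the number of indices required to specify any particular element of that tensor.

The rank of a tensor can be ascertained with this, for example:

tf.rank(t2)

The output will be as follows:

<tf.Tensor: id=53, shape=(), dtype=int32, numpy=3>
(the shape is () because the output here is a scalar value)

Specifying an element of a tensor

Specifying an element of a tensor is done, as you would expect, by specifying the required indices.

Take this, for example:

t3 = t2[1, 0, 2] # slice 1, row 0, column 2
t3

The output will be as follows:

<tf.Tensor: id=75, shape=(), dtype=float32, numpy=8.0>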

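Casting a tensor to a NumPy/Python variable

If required, a tensor can be converted to a NumPy variable as follows:

print(t2.numpy())

The output will be as follows:

[[[ 0. 1. 2.] [ 3. 4. 5.]] [[ 6. 7. 8.] [ 9. 10. 11.]]]

Take this, also:

print(t2[1, 0, 2].numpy())

The output will be as follows:

8.0

Finding the size (number of elements) of a tensor

The number of elements in a tensor is easily obtained. Notice again the use of the .numpy() method to extract the Python value from the tensor:

s =  tf.size(input=t2).numpy()
s

The output will be as follows:

12

Finding the data type of a tensor

TensorFlow supports all the data types you would expect. The full list is in the tf.dtypes documentation and includes tf.int32 (the default integer type), tf.float32 (the default floating-point type), and tf.complex64 (the complex type).

To find the data type of a tensor, use the dtype property:

t3.dtype

The output will be as follows:

tf.float32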

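Specifying element-wise primitive tensor operations

Element-wise primitive tensor operations are specified, as you would expect, using the overloaded operators +, -, *, and /, as here:

t2*t2

The output will be as follows:

<tf.Tensor: id=555332, shape=(2, 2, 3), dtype=float32, numpy= array([[[ 0., 1., 4.], [ 9., 16., 25.]], [[ 36., 49., 64.], [ 81., 100., 121.]]], dtype=float32)>

Broadcasting

Element-wise tensor operations support broadcasting in the same way that NumPy arrays do. The simplest example is multiplying a tensor by a scalar:

t4 = t2*4
print(t4)

The output will be as follows:

tf.Tensor( [[[ 0. 4. 8.] [12. 16. 20.]] [[24. 28. 32.] [36. 40. 44.]]], shape=(2, 2, 3), dtype=float32)

In this example, the scalar multiplier 4 is, conceptually at least, expanded into an array that can be multiplied element-wise with t2. Broadcasting is discussed in great detail in the NumPy documentation.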
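
Broadcasting is not limited to scalars. The sketch below, which is an illustration rather than part of the original text, adds a rank-1 tensor of length 3 to a tensor of shape (2, 2, 3); the vector is conceptually repeated along the two leading dimensions. The name row is hypothetical.

```python
import tensorflow as tf

# Same values as t2 in the text: 0..11 arranged as shape (2, 2, 3).
t2 = tf.reshape(tf.range(12, dtype=tf.float32), [2, 2, 3])

# A length-3 vector lines up with the last dimension of t2.
row = tf.constant([10., 20., 30.])

# row is broadcast across the first two dimensions before the addition.
result = t2 + row
print(result.shape)
print(result[1, 1].numpy())
```

The shapes are compared from the trailing dimension backwards, so a vector of length 2 would not broadcast against t2 here.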

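Transposing tensors and matrix multiplication

To transpose a matrix and perform matrix multiplication eagerly, use the following:

u = tf.constant([[3,4,3]]) 
v = tf.constant([[1,2,1]])
tf.matmul(u, tf.transpose(a=v))

The output will be as follows:

<tf.Tensor: id=555345, shape=(1, 1), dtype=int32, numpy=array([[14]], dtype=int32)>

Note again that the default integer type is tf.int32 and the default float type is tf.float32.

All the operations that are available for tensors that form part of a computational graph are also available for eager execution variables.

There is a complete list of these operations in the TensorFlow API documentation.

Casting a tensor to another (tensor) data type

TensorFlow variables of one type may be cast to another type. More details may be found in the tf.cast documentation.

Take the following example:

i = tf.cast(t1, dtype=tf.int32) # 42
i

The output will be as follows:

<tf.Tensor: id=116, shape=(), dtype=int32, numpy=42>

With truncation, it looks like this:

j = tf.cast(tf.constant(4.9), dtype=tf.int32) # 4
j

The output will be as follows:

<tf.Tensor: id=119, shape=(), dtype=int32, numpy=4>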
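
Because the cast from float to integer truncates rather than rounds, it can be worth rounding explicitly first when the nearest integer is wanted. The following sketch contrasts the two; the sample values are chosen for illustration and deliberately avoid exact halves.

```python
import tensorflow as tf

values = tf.constant([4.9, -4.9, 2.4])

# A plain cast discards the fractional part (truncation toward zero).
truncated = tf.cast(values, tf.int32)

# Rounding first gives the nearest integer before the cast.
rounded = tf.cast(tf.round(values), tf.int32)

print(truncated.numpy(), rounded.numpy())
```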

Declaring ragged tensors

A ragged tensor is a tensor with one or more ragged dimensions. Ragged dimensions are dimensions that have slices that may have different lengths.

There are a number of ways to declare ragged arrays, the simplest being a constant ragged array.

The following example shows how to declare a constant ragged array and the lengths of the individual slices:

ragged =tf.ragged.constant([[5, 2, 6, 1], [], [4, 10, 7], [8], [6,7]])

print(ragged)
print(ragged[0,:])
print(ragged[1,:])
print(ragged[2,:])
print(ragged[3,:])
print(ragged[4,:])

The output is as follows:

<tf.RaggedTensor [[5, 2, 6, 1], [], [4, 10, 7], [8], [6, 7]]>
tf.Tensor([5 2 6 1], shape=(4,), dtype=int32)
tf.Tensor([], shape=(0,), dtype=int32)
tf.Tensor([ 4 10  7], shape=(3,), dtype=int32)
tf.Tensor([8], shape=(1,), dtype=int32)
tf.Tensor([6 7], shape=(2,), dtype=int32)

Note the shape of the individual slices.

A common way of creating a ragged array is with the tf.RaggedTensor.from_row_splits() method, which has the following signature:

@classmethod
from_row_splits(
    cls,
    values,
    row_splits,
    name=None
)

Here, values is a list of the values to be turned into the ragged array, and row_splits is a list of the positions at which that list of values is to be split, so that the values for row ragged[i] are stored in ragged.values[ragged.row_splits[i]:ragged.row_splits[i+1]]:

print(tf.RaggedTensor.from_row_splits(values=[5, 2, 6, 1, 4, 10, 7, 8, 6, 7],
row_splits=[0, 4, 4, 7, 8, 10]))

The RaggedTensor is as follows:

<tf.RaggedTensor [[5, 2, 6, 1], [], [4, 10, 7], [8], [6, 7]]>
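
The row_splits layout can be easier to see without TensorFlow in the way. The following pure-Python sketch applies the slicing rule stated above to the same values; rows_from_splits is a hypothetical helper written for this illustration, not a TensorFlow function.

```python
def rows_from_splits(values, row_splits):
    """Row i is values[row_splits[i]:row_splits[i + 1]]."""
    return [values[start:end]
            for start, end in zip(row_splits[:-1], row_splits[1:])]

values = [5, 2, 6, 1, 4, 10, 7, 8, 6, 7]
row_splits = [0, 4, 4, 7, 8, 10]

rows = rows_from_splits(values, row_splits)
row_lengths = [len(row) for row in rows]
print(rows)
print(row_lengths)
```

Two equal consecutive entries in row_splits (here 4 and 4) produce an empty row, and a list of n + 1 splits always describes n rows.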

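Useful TensorFlow operations

There is a complete list of all TensorFlow Python modules, classes, and functions in the TensorFlow API documentation.

All of the maths functions can be found in the tf.math documentation.

In this section, we will look at some useful TensorFlow operations, especially within the context of neural network programming.

Finding the squared difference between two tensors

Later in this book, we will need to find the square of the difference between two tensors. The method is as follows:

tf.math.squared_difference( x,  y, name=None)

Take the following example:

x = [1,3,5,7,11]
y = 5
s = tf.math.squared_difference(x,y)
s

The output will be as follows:

<tf.Tensor: id=279, shape=(5,), dtype=int32, numpy=array([16, 4, 0, 4, 36], dtype=int32)>

Note that the Python variables, x and y, are cast into tensors in this example and that y is then broadcast across x. So, for example, the first calculation is (1 - 5)^2 = 16.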
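
One common reason to need the squared difference is a mean squared error. The following is a minimal sketch under that assumption, reusing the numbers from the example above as float values; the names predictions, target, and mse are illustrative.

```python
import tensorflow as tf

predictions = tf.constant([1., 3., 5., 7., 11.])
target = tf.constant(5.)

# Element-wise (prediction - target)^2, with target broadcast across predictions.
squared_errors = tf.math.squared_difference(predictions, target)

# Averaging the squared errors gives the mean squared error.
mse = tf.reduce_mean(squared_errors)
print(squared_errors.numpy(), mse.numpy())
```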

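Finding a mean

The following is the signature of tf.reduce_mean().

Note that, in what follows, all TensorFlow operations have a name argument that can safely be left at its default of None when using eager execution, since its purpose is to identify the operation in a computational graph.

Note that this is equivalent to np.mean, except that it infers the return data type from the input tensor, whereas np.mean allows you to specify the output type (defaulting to float64):

tf.reduce_mean(input_tensor, axis=None, keepdims=False, name=None)

It is frequently necessary to find the mean value of a tensor. When this is done across a single axis, that axis is said to be reduced.

Here are some examples:

numbers = tf.constant([[4., 5.], [7., 3.]])

Finding the mean across all axes

Find the mean across all axes (that is, use the default axis=None):

tf.reduce_mean(input_tensor=numbers)
#( 4. + 5. + 7. + 3.)/4 = 4.75

The output will be as follows:

<tf.Tensor: id=272, shape=(), dtype=float32, numpy=4.75>

Finding the mean across columns

Find the mean across columns (that is, reduce rows) with this:

tf.reduce_mean(input_tensor=numbers, axis=0) # [ (4. + 7. )/2 , (5. + 3.)/2 ] = [5.5, 4.]

The output will be as follows:

<tf.Tensor: id=61, shape=(2,), dtype=float32, numpy=array([5.5, 4. ], dtype=float32)>

When keepdims is True, the reduced axis is retained with a length of 1:

tf.reduce_mean(input_tensor=numbers, axis=0, keepdims=True)

The output is as follows:

array([[5.5, 4.]])        (1 row, 2 columns)

Finding the mean across rows

Find the mean across rows (that is, reduce columns) with this:

tf.reduce_mean(input_tensor=numbers, axis=1) # [ (4. + 5. )/2 , (7. + 3. )/2] = [4.5, 5]

The output will be as follows:

<tf.Tensor: id=64, shape=(2,), dtype=float32, numpy=array([4.5, 5. ], dtype=float32)>

When keepdims is True, the reduced axis is retained with a length of 1:

tf.reduce_mean(input_tensor=numbers, axis=1, keepdims=True)

The output is as follows:

([[4.5], [5]])      (2 rows, 1 column)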
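
The earlier remark that tf.reduce_mean() infers its return type from the input has a practical consequence for integer tensors, which the following sketch illustrates. The tensor name votes is made up for the example.

```python
import numpy as np
import tensorflow as tf

votes = tf.constant([1, 0, 1, 0])  # dtype is inferred as int32

# The result stays int32, so the fractional part of the mean is lost.
tf_mean = tf.reduce_mean(votes)

# np.mean promotes to float64 by default.
np_mean = np.mean(votes.numpy())

# Casting to a float type first gives the expected mean in TensorFlow too.
float_mean = tf.reduce_mean(tf.cast(votes, tf.float32))

print(tf_mean.numpy(), np_mean, float_mean.numpy())
```

If a fractional mean matters, casting the input to a floating-point type before reducing is the safer habit.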

Generating tensors filled with random values

Random values are frequently required when developing neural networks, for example when initializing weights and biases. TensorFlow provides a number of methods for generating these random values.

Using `tf.random.normal()`

tf.random.normal() outputs a tensor of the given shape filled with values of the dtype type drawn from a normal distribution.

The signature is as follows:

tf.random.normal(shape, mean=0.0, stddev=1.0, dtype=tf.float32, seed=None, name=None)

Take this, for example:

tf.random.normal(shape = (3,2), mean=10, stddev=2, dtype=tf.float32, seed=None,  name=None)
ran = tf.random.normal(shape = (3,2), mean=10.0, stddev=2.0)
print(ran)

The output will be as follows:

<tf.Tensor: id=13, shape=(3, 2), dtype=float32, numpy= array([[ 8.537131 , 7.6625767], [10.925293 , 11.804686 ], [ 9.3763075, 6.701221 ]], dtype=float32)>

Using `tf.random.uniform()`

The signature is this:

tf.random.uniform(shape, minval = 0, maxval= None, dtype=tf.float32, seed=None,  name=None)

This outputs a tensor of the given shape filled with values from a uniform distribution in the range minval to maxval, where the lower bound is inclusive but the upper bound is not.

Take this, for example:

tf.random.uniform(shape = (2,4),  minval=0, maxval=None, dtype=tf.float32, seed=None,  name=None)

The result is a tensor of shape (2, 4) and dtype float32. With maxval left at None and a floating-point dtype, the values lie in the range [0, 1), and they differ on every run.

Note that, for both of these random operations, if you want the random values generated to be repeatable, then use tf.random.set_seed(). The use of a non-default data type is also shown here:

tf.random.set_seed(11)
ran1 = tf.random.uniform(shape = (2,2), maxval=10, dtype = tf.int32)
ran2 =  tf.random.uniform(shape = (2,2), maxval=10, dtype = tf.int32)
print(ran1) #Call 1
print(ran2)

tf.random.set_seed(11) #same seed
ran1 = tf.random.uniform(shape = (2,2), maxval=10, dtype = tf.int32)
ran2 = tf.random.uniform(shape = (2,2), maxval=10, dtype = tf.int32)
print(ran1) #Call 2
print(ran2)

Call 1 and Call 2 will return the same set of values.

The output will be as follows:

tf.Tensor(
[[4 6]
 [5 2]], shape=(2, 2), dtype=int32)
tf.Tensor(
[[9 7]
 [9 4]], shape=(2, 2), dtype=int32)

tf.Tensor(
[[4 6]
 [5 2]], shape=(2, 2), dtype=int32)
tf.Tensor(
[[9 7]
 [9 4]], shape=(2, 2), dtype=int32)

A practical example using random values

Here is a small example adapted for eager execution.

Note that this example shows how to initialize an eager variable with a call to a TensorFlow function.

dice1 = tf.Variable(tf.random.uniform([10, 1], minval=1, maxval=7, dtype=tf.int32))
dice2 = tf.Variable(tf.random.uniform([10, 1], minval=1, maxval=7, dtype=tf.int32))

# We may add dice1 and dice2 since they share the same shape and size.
dice_sum = dice1 + dice2

# We've got three separate 10x1 matrices. To produce a single
# 10x3 matrix, we'll concatenate them along dimension 1.
resulting_matrix = tf.concat(values=[dice1, dice2, dice_sum], axis=1)

print(resulting_matrix)

A sample output is as follows:

tf.Tensor( 
[[ 5 4 9] 
[ 5 1 6] 
[ 2 4 6] 
[ 5 6 11]
[ 4 4 8] 
[ 4 6 10]
[ 2 2 4]
[ 5 6 11] 
[ 2 6 8] 
[ 5 4 9]], shape=(10, 3), dtype=int32)

Finding the indices of the largest and smallest element

We will now look at how to find the indices of the elements with the largest and smallest values across the axes of a tensor.

The signatures of the functions are as follows:

tf.argmax(input, axis=None, name=None, output_type=tf.int64 )

tf.argmin(input, axis=None, name=None, output_type=tf.int64 )

Take this, for example:

# 1-D tensor
t5 = tf.constant([2, 11, 5, 42, 7, 19, -6, -11, 29])
print(t5)
i = tf.argmax(input=t5)
print('index of max; ', i)
print('Max element: ',t5[i].numpy())

i = tf.argmin(input=t5,axis=0).numpy()
print('index of min: ', i)
print('Min element: ',t5[i].numpy())

t6 = tf.reshape(t5, [3,3])

print(t6)
i = tf.argmax(input=t6,axis=0).numpy() # max arg down rows
print('indices of max down rows; ', i)
i = tf.argmin(input=t6,axis=0).numpy() # min arg down rows
print('indices of min down rows ; ',i)

print(t6)
i = tf.argmax(input=t6,axis=1).numpy() # max arg across cols
print('indices of max across cols: ',i)
i = tf.argmin(input=t6,axis=1).numpy() # min arg across cols
print('indices of min across cols: ',i)

The output will be as follows (values extracted with .numpy() print as plain numbers and arrays):

tf.Tensor([ 2 11 5 42 7 19 -6 -11 29], shape=(9,), dtype=int32)

index of max;  tf.Tensor(3, shape=(), dtype=int64)
Max element:  42

index of min:  7
Min element:  -11

tf.Tensor( [[ 2 11 5] [ 42 7 19] [ -6 -11 29]], shape=(3, 3), dtype=int32)
indices of max down rows;  [1 0 2]
indices of min down rows ;  [2 2 0]

tf.Tensor( [[ 2 11 5] [ 42 7 19] [ -6 -11 29]], shape=(3, 3), dtype=int32)
indices of max across cols:  [1 0 2]
indices of min across cols:  [0 1 1]

Saving and restoring tensor values using a checkpoint

In order to save and load the values of tensors, here is the best method (see Chapter 2, "Keras, a High-Level API for TensorFlow 2", for ways to save complete models):

variable = tf.Variable([[1,3,5,7],[11,13,17,19]])
checkpoint= tf.train.Checkpoint(var=variable)
save_path = checkpoint.save('./vars')
variable.assign([[0,0,0,0],[0,0,0,0]])
variable
checkpoint.restore(save_path)
print(variable)

The output will be as follows:

<tf.Variable 'Variable:0' shape=(2, 4) dtype=int32, numpy= array([[ 1, 3, 5, 7],  [11, 13, 17, 19]], dtype=int32)>

Using `tf.function`

tf.function is a function that takes a Python function and returns a TensorFlow graph. The advantage of this is that graphs can apply optimizations and exploit parallelism in the Python function (func). tf.function is new to TensorFlow 2.

Its signature is as follows:

tf.function(
    func=None,
    input_signature=None,
    autograph=True,
    experimental_autograph_options=None
)

An example is as follows:

def f1(x, y):
    return tf.reduce_mean(input_tensor=tf.multiply(x ** 2, 5) + y**2)

f2 = tf.function(f1)

x = tf.constant([4., -5.])
y = tf.constant([2., 3.])

# f1 and f2 return the same value, but f2 executes as a TensorFlow graph

assert f1(x,y).numpy() == f2(x,y).numpy()

The assert passes, so there is no output.
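
tf.function can also be applied as a decorator. The sketch below uses that form and adds a Python-side list, trace_log (a name invented for this illustration), to show that the Python body runs while the graph is being traced rather than on every call, at least when later calls pass tensors of the same shape and dtype.

```python
import tensorflow as tf

trace_log = []  # records each time the Python body is actually executed

@tf.function
def scaled_mean(x, y):
    trace_log.append("traced")  # plain Python side effect: runs during tracing only
    return tf.reduce_mean(tf.multiply(x ** 2, 5) + y ** 2)

x = tf.constant([4., -5.])
y = tf.constant([2., 3.])

first = scaled_mean(x, y).numpy()       # traces the function, then runs the graph
second = scaled_mean(x * 2, y).numpy()  # same shape and dtype: the graph is reused

print(first, second, len(trace_log))
```

This is why Python side effects such as print() or list appends are unreliable inside a function wrapped by tf.function; TensorFlow operations, by contrast, run on every call.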

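Summary

In this chapter, we started to become familiar with TensorFlow by looking at a number of snippets of code illustrating some basic operations. We had an overview of the modern TensorFlow ecosystem and of how to install TensorFlow. We also looked at some housekeeping operations, some eager operations, and a variety of TensorFlow operations that will be useful in the rest of this book. There is an excellent introduction to TensorFlow 2 at www.youtube.com/watch?v=k5c-vg4rjBw.

See also "Appendix A" for details of a tf1.12 to tf2 conversion tool. In the next chapter, we will take a look at Keras, a high-level API for TensorFlow 2.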

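2. Keras, a High-Level API for TensorFlow 2

In this chapter, we will discuss Keras, a high-level API for TensorFlow 2. Keras was developed by François Chollet at Google. Keras has become extremely popular for fast prototyping, for building and training deep learning models, and for research and production. Keras is a very rich API. As we shall see, it supports eager execution and data pipelines, among other features.

Keras has been available for TensorFlow since 2017, but its use has been extended and further integrated into TensorFlow with the release of TensorFlow 2.0. TensorFlow 2.0 has embraced Keras as the API of choice for the majority of deep learning development work.

It is possible to import Keras as a standalone module, but in this book we will concentrate on using Keras from within TensorFlow 2. The module is therefore tensorflow.keras.

In this chapter, we will cover the following topics:

The adoption and advantages of Keras
The features of Keras
The default Keras configuration file
The Keras backend
Keras data types
Keras models
Keras datasets

The adoption and advantages of Keras

The following figure shows the wide adoption of Keras in both industry and research. The Power Score ranking was devised by Jeff Hale, who used 11 data sources across 7 distinct categories to assess framework usage, interest, and popularity. He then weighted and combined the data, as described in his article of September 2018:

Keras has a number of advantages, including the following:

It is designed for both new users and experts alike, offering consistent and simple APIs
It is user friendly, with a simple, consistent interface that is optimized for common use cases
It provides excellent feedback for user errors that is easily understood and often accompanied by helpful advice
It is modular and composable; models in Keras are constructed by joining up configurable building blocks
It is easy to extend by writing custom building blocks
It is not necessary to import Keras separately, as it is available as tensorflow.keras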

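The features of Keras

If you want to know which version of Keras came with your TensorFlow, use the following commands:

import tensorflow as tf
print(tf.keras.__version__)

At the time of writing, this produced the following (from the alpha build of TensorFlow 2):

2.2.4-tf

Other features of Keras include built-in support for multi-GPU data parallelism and the fact that Keras models can be turned into TensorFlow Estimators and trained on clusters of GPUs on Google Cloud.

Keras is, perhaps, unusual in that it has a reference implementation maintained as an independent open source project, located at www.keras.io.

Although TensorFlow does have a full implementation of Keras in the tf.keras module, the reference implementation is maintained independently of TensorFlow. The tf.keras implementation has TensorFlow-specific enhancements, including support for eager execution, by default.

Eager execution means that code runs in an imperative programming environment rather than a graph-based one, which was the only way to work in the initial offering of TensorFlow (prior to v1.5). This imperative (immediate) style allows for intuitive debugging, fast development iteration, support for the TensorFlow SavedModel format, and built-in support for distributed training on CPUs, GPUs, and even Google's own hardware, Tensor Processing Units (TPUs).

The TensorFlow implementation also supports tf.data, distribution strategies, exporting models (which can then be deployed on mobile and embedded devices via TensorFlow Lite), and feature columns for representing and classifying structured data.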

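The default Keras configuration file

The default configuration file for Linux users is as follows:

$HOME/.keras/keras.json

For Windows users, replace $HOME with %USERPROFILE%.

It is created the first time you use Keras and may be edited to change the defaults. Here is what the .json file contains:

{ "image_data_format": "channels_last", "epsilon": 1e-07, "floatx": "float32", "backend": "tensorflow" }

The fields are as follows:

image_data_format: This is a string specifying the image format, either "channels_last" or "channels_first". Keras running on top of TensorFlow uses "channels_last" by default.
epsilon: This is a float, a numeric fuzz constant used to avoid division by zero in some operations.
floatx: This is a string that specifies the default float precision, being one of "float16", "float32", or "float64".
backend: This is a string that specifies the tool that Keras finds itself running on top of, being one of "tensorflow", "theano", or "cntk".

There are getter and setter methods in keras.backend for all of these values; see the keras.backend documentation.

For example, the following call sets the floating-point type for Keras to use, where the floatx argument is one of the three precisions listed above:

keras.backend.set_floatx(floatx)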
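
To illustrate how a configuration file of this shape combines with default values, here is a standard-library sketch that reads a keras.json-style file and falls back to the defaults listed above for any missing key. This is an assumption-laden illustration, not how Keras itself loads its configuration: load_keras_config and the temporary file are hypothetical.

```python
import json
import os
import tempfile

# Defaults as listed in the text.
DEFAULTS = {
    "image_data_format": "channels_last",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "tensorflow",
}

def load_keras_config(path):
    """Return the defaults, overridden by whatever the JSON file at path provides."""
    config = dict(DEFAULTS)
    if os.path.exists(path):
        with open(path) as f:
            config.update(json.load(f))
    return config

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "keras.json")  # stand-in for $HOME/.keras/keras.json
    with open(demo_path, "w") as f:
        json.dump({"floatx": "float64"}, f)      # a user override of one field
    config = load_keras_config(demo_path)
    missing = load_keras_config(os.path.join(tmp, "absent.json"))

print(config["floatx"], config["backend"])
```

Only the overridden field changes; every other value keeps its default, which mirrors the effect of editing a single entry in the real file.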

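The Keras backend

Because of its model-level library structure, Keras may have different tensor manipulation engines that handle low-level operations such as convolutions, tensor products, and the like. These engines are called backends. Other backends are available; we do not consider them here.

The same documentation lists the many keras.backend functions.

The canonical way to use the Keras backend is with the following:

from tensorflow.keras import backend as K

For example, here is the signature of a useful function:

K.constant(value, dtype=None, shape=None, name=None)

Here, value is the value to be given to the constant, dtype is the type of the tensor that is created, shape is the shape of the tensor that is created, and name is an optional name.

An instance of this is as follows:

import tensorflow as tf
from tensorflow.keras import backend as K
const = K.constant([[42,24],[11,99]], dtype=tf.float16, shape=[2,2])
const

This produces the following constant tensor. Notice that, because eager execution is enabled (by default), the value of the constant is given in the output:

<tf.Tensor: id=1, shape=(2, 2), dtype=float16, numpy= array([[42., 24.], [11., 99.]], dtype=float16)>

Without eager execution enabled, the output would be as follows:

<tf.Tensor 'Const:0' shape=(2, 2) dtype=float16>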

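Keras data types

Keras data types (dtypes) are the same as TensorFlow Python data types, as shown in the following table:

Python type	Description
`tf.float16`	16-bit floating point
`tf.float32`	32-bit floating point
`tf.float64`	64-bit floating point
`tf.int8`	8-bit signed integer
`tf.int16`	16-bit signed integer
`tf.int32`	32-bit signed integer
`tf.int64`	64-bit signed integer
`tf.uint8`	8-bit unsigned integer
`tf.string`	Variable-length byte array
`tf.bool`	Boolean
`tf.complex64`	Complex number made of two 32-bit floating points: one real part and one imaginary part
`tf.complex128`	Complex number made of two 64-bit floating points: one real part and one imaginary part
`tf.qint8`	8-bit signed integer used in quantized ops
`tf.qint32`	32-bit signed integer used in quantized ops
`tf.quint8`	8-bit unsigned integer used in quantized ops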

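Keras models

Keras is based on the concept of a neural network model. The predominant model is called Sequential, a linear stack of layers. There is also a system that uses the Keras functional API.

The Keras Sequential model

To build a Keras Sequential model, you add layers to it in the same order in which you want the computations to be undertaken by the network.

After you have built your model, you compile it; this optimizes the computations that are to be undertaken and is where you allocate the optimizer and the loss function that you want your model to use.

The next stage is to fit the model to the data. This is commonly known as training the model and is where all the computations take place. The data can be presented to the model either in batches or all at once.

Next, you evaluate your model to establish its accuracy, loss, and other metrics. Finally, having trained your model, you can use it to make predictions on new data. So, the workflow is: build, compile, fit, evaluate, make predictions.

There are two ways to create a Sequential model. Let's take a look at each of them.

The first way to create a Sequential model

Firstly, you can pass a list of layer instances to the constructor, as in the following example.

We will have much more to say about layers in the next chapter. For now, we will explain just enough for you to understand what is happening here.

Acquire the data. mnist is a dataset of hand-drawn digits, each on a 28 x 28 pixel grid. Every individual data point is an unsigned 8-bit integer (uint8), as are the labels:

mnist = tf.keras.datasets.mnist
(train_x,train_y), (test_x, test_y) = mnist.load_data()

The epochs variable stores the number of times we are going to present the data to the model:

epochs=10
batch_size = 32 # 32 is default in fit method but specify anyway

Next, normalize all of the data points (x) to the float range 0 to 1, with type float32. Also, cast the labels (y) to int64, as required:

train_x, test_x = tf.cast(train_x/255.0, tf.float32), tf.cast(test_x/255.0, tf.float32)
train_y, test_y = tf.cast(train_y,tf.int64),tf.cast(test_y,tf.int64)

The model definition follows.

Notice in the model definition how we pass a list of layers:

Flatten takes the input of 28 x 28 (that is, 2D) pixel images and produces a 784-element (that is, 1D) vector, because the next (dense) layer is one-dimensional.
Dense is a fully connected layer, meaning that all of its neurons are connected to every neuron in the previous and next layers. The following example has 512 neurons, and its inputs are passed through a ReLU (non-linear) activation function.
Dropout randomly turns off a fraction (in this case, 0.2) of the neurons in the previous layer. This is done to prevent any particular neuron from becoming too specialized and causing the model to overfit the data, which would harm the accuracy metric of the model on the test data (more on this in later chapters).
The final Dense layer has a special activation function called softmax, which assigns a probability to each of the 10 possible output units: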

model1 = tf.keras.models.Sequential([
 tf.keras.layers.Flatten(),
 tf.keras.layers.Dense(512,activation=tf.nn.relu),
 tf.keras.layers.Dropout(0.2),
 tf.keras.layers.Dense(10,activation=tf.nn.softmax)
])
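
To make the softmax activation of the final layer less abstract, here is a small pure-Python sketch of the function itself. It is an illustration of the mathematics, not the TensorFlow implementation, and the three input scores are invented for the example (the model above has 10 outputs).

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)
    # Subtracting the maximum leaves the result unchanged but avoids overflow.
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
predicted = probs.index(max(probs))  # index of the most probable class

print(probs, predicted)
```

The largest score always receives the largest probability, which is why the predicted class is simply the index of the maximum output.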

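The model.summary() method is useful for obtaining a synopsis of the model: it lists each layer together with its output shape and number of parameters.

The figure of 401,920 parameters for the first Dense layer comes from its 28 x 28 = 784 inputs, each connected to all 512 neurons, which gives 784 * 512 = 401,408 weights, plus one bias unit for each of the 512 neurons, so 401,408 + 512 = 401,920.

The figure of 5,130 for the final Dense layer comes from 512 * 10 + 10 = 5,130.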
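
The parameter arithmetic above follows one rule for every fully connected layer: one weight per input per neuron, plus one bias per neuron. A short sketch, with dense_params as a hypothetical helper name:

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

hidden = dense_params(28 * 28, 512)  # Flatten output feeding the 512-unit layer
output = dense_params(512, 10)       # 512-unit layer feeding the 10-unit layer
total = hidden + output              # Flatten and Dropout add no parameters

print(hidden, output, total)
```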

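Next, we compile our model, as shown in the following code:

optimiser = tf.keras.optimizers.Adam()
model1.compile (optimizer= optimiser, loss='sparse_categorical_crossentropy', metrics = ['accuracy'])

optimizer is the method by which the weights of the weighted connections in the model are adjusted to reduce the loss.

loss is a measure of the difference between the required output and the actual output of the model, and metrics is how we evaluate the model.

To train our model, we next use the fit method, as follows:

model1.fit(train_x, train_y, batch_size=batch_size, epochs=epochs)

The output from the call to fit() is as follows, showing the epoch training time, loss, and accuracy:

Epoch 1/10 60000/60000 [==============================] - 5s 77us/step - loss: 0.2031 - acc: 0.9394 ...
Epoch 10/10 60000/60000 [==============================] - 4s 62us/step - loss: 0.0098 - acc: 0.9967

Finally, we can check the accuracy of our trained model with the evaluate method:

model1.evaluate(test_x, test_y)

This produces the following output:

10000/10000 [==============================] - 0s 39us/step [0.09151900197149189, 0.9801]

This shows a loss of 0.09 and an accuracy of 0.9801 on the test data. An accuracy of 0.98 means that the model correctly identified, on average, 98 out of every 100 test data points.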

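The second way to create a Sequential model

The alternative to passing a list of layers to the Sequential model's constructor is to use the add method, as follows, for the same architecture:

model2 = tf.keras.models.Sequential()
model2.add(tf.keras.layers.Flatten())
model2.add(tf.keras.layers.Dense(512, activation='relu'))
model2.add(tf.keras.layers.Dropout(0.2))
model2.add(tf.keras.layers.Dense(10,activation=tf.nn.softmax))
model2.compile (optimizer= tf.keras.optimizers.Adam(), loss='sparse_categorical_crossentropy', 
 metrics = ['accuracy'])

As we have seen, the fit() method performs the training, fitting the inputs to the outputs using the model:

model2.fit(train_x, train_y, batch_size=batch_size, epochs=epochs)

Then, we evaluate the performance of our model using the test data:

model2.evaluate(test_x, test_y)

This gives us a loss of 0.07 and an accuracy of 0.981.

Thus, this way of defining a model produces almost identical results to the first one, which is to be expected, since it is the same architecture, albeit expressed slightly differently, with the same optimizer and loss functions. Now let's look at the functional API.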

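The Keras functional API

The functional API lets you build much more complex architectures than the simple linear stack of Sequential models we have seen previously. It also supports more advanced models. These models include multi-input and multi-output models, models with shared layers, and models with residual connections.

Here is a short example, with an architecture identical to the previous two, of the use of the functional API.

The setup code is the same as previously demonstrated:

import tensorflow as tf
mnist = tf.keras.datasets.mnist
(train_x,train_y), (test_x, test_y) = mnist.load_data()
train_x, test_x = train_x/255.0, test_x/255.0
epochs=10

Here is the model definition.

Notice how layers are callable on a tensor and return a tensor as output, and how these input and output tensors are then used to define the model:

inputs = tf.keras.Input(shape=(28,28)) # Returns a 'placeholder' tensor
x = tf.keras.layers.Flatten()(inputs)
x = tf.keras.layers.Dense(512, activation='relu',name='d1')(x)
x = tf.keras.layers.Dropout(0.2)(x)
predictions = tf.keras.layers.Dense(10,activation=tf.nn.softmax, name='d2')(x)

model3 = tf.keras.Model(inputs=inputs, outputs=predictions)

Note that this code produces an identical architecture to that of model1 and model2.

In the output of model3.summary(), None appears in the output shapes because we have not specified how many input items we have (that is, the batch size). It really does mean not provided.

The rest of the code is the same as in the previous example:

optimiser = tf.keras.optimizers.Adam()
model3.compile (optimizer= optimiser, loss='sparse_categorical_crossentropy', metrics = ['accuracy'])

model3.fit(train_x, train_y, batch_size=32, epochs=epochs)

model3.evaluate(test_x, test_y)

This again produces a loss of 0.067 and an accuracy of 0.982 for the same architecture.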
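
The residual connections mentioned at the start of this section are one thing a Sequential model cannot express. The following is a minimal sketch of such a connection with the functional API; the layer sizes and names are chosen for illustration and do not come from the book's MNIST example.

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(16,))

# Two dense layers form the "main" path.
h = tf.keras.layers.Dense(16, activation='relu')(inputs)
h = tf.keras.layers.Dense(16)(h)

# The residual (skip) connection adds the original input back in.
residual = tf.keras.layers.Add()([inputs, h])

outputs = tf.keras.layers.Dense(10, activation='softmax')(residual)
residual_model = tf.keras.Model(inputs=inputs, outputs=outputs)

print(residual_model.output_shape, residual_model.count_params())
```

The Add layer takes two tensors of the same shape, which is why the main path ends with 16 units to match the 16 input features. It contributes no trainable parameters of its own.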

Next, let's see how to subclass the Keras Model class.

Subclassing the Keras Model class

The Keras Model class may be subclassed, as shown in the code that follows. Google states that the pure functional style (as in the preceding example) is to be preferred to the subclassing style; we include it here for completeness and because it is interesting.

First, notice how the layers are declared and named separately in the constructor (.__init__()).

Then, notice how the layers are chained together in a functional style in the call() method. This method encapsulates the forward pass:

class MyModel(tf.keras.Model):
    def __init__(self, num_classes=10):
        super(MyModel, self).__init__()
        # Define your layers here.
        self.x0 = tf.keras.layers.Flatten()
        self.x1 = tf.keras.layers.Dense(512, activation='relu', name='d1')
        self.x2 = tf.keras.layers.Dropout(0.2)
        self.predictions = tf.keras.layers.Dense(num_classes, activation=tf.nn.softmax, name='d2')

    def call(self, inputs):
        # This is where to define your forward pass
        # using the layers previously defined in `__init__`
        x = self.x0(inputs)
        x = self.x1(x)
        x = self.x2(x)
        return self.predictions(x)

model4 = MyModel()

This definition can be used in place of any of the earlier model definitions in this chapter, with the same supporting code for downloading the data and similar code for training and evaluation. The last example is shown in the following code:

model4 = MyModel()
batch_size = 32
steps_per_epoch = len(train_x)//batch_size
print(steps_per_epoch)

model4.compile (optimizer= tf.keras.optimizers.Adam(), loss='sparse_categorical_crossentropy', 
 metrics = ['accuracy'])

model4.fit(train_x, train_y, batch_size=batch_size, epochs=epochs)

model4.evaluate(test_x, test_y)

The result is a loss of 0.068 with an accuracy of 0.982; again, almost identical to the results produced by the other three styles of model construction in this chapter.

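Using data pipelines

Data may also be passed into the fit method as a tf.data.Dataset() iterator, using the following code (the data acquisition code is the same as previously described). The from_tensor_slices() method converts the NumPy arrays into a dataset. Notice the batch() and shuffle() methods chained together. Next, the map() method invokes a method on the input images, x, that randomly flips one in two of them across the y-axis, effectively increasing the size of the image set. The labels, y, are left unchanged here. Finally, the repeat() method means that the dataset will be re-fed from the beginning when its end is reached (continuously):

batch_size = 32
buffer_size = 10000

train_dataset = tf.data.Dataset.from_tensor_slices((train_x, train_y)).batch(batch_size).shuffle(buffer_size)

train_dataset = train_dataset.map(lambda x, y: (tf.image.random_flip_left_right(x), y))
train_dataset = train_dataset.repeat()

The code for the test set is similar, except that no flipping is done:

test_dataset = tf.data.Dataset.from_tensor_slices((test_x, test_y)).batch(batch_size).shuffle(buffer_size)

test_dataset = test_dataset.repeat()

Now, in the fit() function, we can pass the dataset in directly, as follows (model5 is assumed to have been built in the same way as the earlier models):

steps_per_epoch = len(train_x)//batch_size # required because of the repeat on the dataset
optimiser = tf.keras.optimizers.Adam()
model5.compile (optimizer= optimiser, loss='sparse_categorical_crossentropy', metrics = ['accuracy'])
model5.fit(train_dataset, epochs=epochs, steps_per_epoch=steps_per_epoch) # the dataset is already batched, so batch_size is not passed

The compile and evaluate code is similar to that seen previously.

The advantage of using data.Dataset iterators is that the pipeline takes care of much of the plumbing that usually goes into preparing data, such as batching and shuffling. We have also seen that the various operations can be chained together.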
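
The order of the chained operations matters. As a hedged sketch on toy data (the arrays below are invented, not MNIST), shuffling before batching mixes individual examples, whereas batching first and shuffling afterwards only reorders whole batches. The sketch also shows how steps_per_epoch bounds a repeated dataset.

```python
import numpy as np
import tensorflow as tf

# Toy data: feature i carries label i, so pairing can be checked after shuffling.
features = np.arange(10, dtype=np.float32).reshape(10, 1)
labels = np.arange(10, dtype=np.int64)
batch_size = 4

dataset = (tf.data.Dataset.from_tensor_slices((features, labels))
           .shuffle(buffer_size=10)   # shuffle individual examples first
           .batch(batch_size)         # then group them into batches
           .repeat())                 # restart from the beginning indefinitely

# A repeated dataset never ends, so an epoch has to be bounded explicitly.
steps_per_epoch = len(features) // batch_size
epoch = list(dataset.take(steps_per_epoch))

batch_shapes = [x.numpy().shape for x, _ in epoch]
pairs_intact = all((x.numpy()[:, 0] == y.numpy()).all() for x, y in epoch)
print(steps_per_epoch, batch_shapes, pairs_intact)
```

Because features and labels are sliced together as a tuple, shuffling keeps each feature with its own label.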

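Saving and loading Keras models

The Keras API in TensorFlow has the ability to save and restore models easily. This is done as follows and saves the model in the current directory. Of course, a longer path may be passed here:

model.save('./model_name.h5')

This saves the model architecture, the weights, the training configuration (loss, optimizer), and the state of the optimizer, so that you can resume training the model where you left off.

Loading a saved model is done as follows. Note that if you compiled your model before saving it, the load will compile the model using the saved training configuration:

from tensorflow.keras.models import load_model
new_model = load_model('./model_name.h5')

It is also possible to save just the model weights and load them again with the following (in which case, you must build your architecture to load the weights into):

model.save_weights('./model_weights.h5')

Then use the following to load them:

model.load_weights('./model_weights.h5')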

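Keras datasets

The following datasets are available from within Keras: boston_housing, cifar10, cifar100, fashion_mnist, imdb, mnist, and reuters.

They are all accessed with the load_data() function. For example, to load the fashion_mnist dataset, use the following:

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()

Further details can be found in the tf.keras.datasets documentation.

Summary

In this chapter, we explored the Keras API with general comments and insights, followed by the same basic architecture expressed in four different ways for training on the mnist dataset.

In the next chapter, we will start to use TensorFlow in earnest by exploring a number of supervised learning scenarios, including linear regression, logistic regression, and k-nearest neighbors.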

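3. ANN Technologies Using TensorFlow 2

In this chapter, we will discuss and exemplify those parts of TensorFlow 2 that are needed to build, train, and evaluate artificial neural networks and to use them for inference. Initially, we will not present a complete application. Instead, we will focus on individual concepts and techniques before putting them all together and presenting full models in subsequent chapters.

In this chapter, we will cover the following topics:

Presenting data to an Artificial Neural Network (ANN)
ANN layers
Gradient calculations for gradient descent algorithms
Loss functions

Presenting data to an ANN

The canonical way to present data to a TensorFlow ANN, as recommended by Google, is through a data pipeline made up of a tf.data.Dataset object and a tf.data.Iterator method. A tf.data.Dataset object consists of a sequence of elements, each of which contains one or more tensor objects. tf.data.Iterator is a method for looping over a dataset so that successive individual elements in it can be accessed.

We will look at two important ways of constructing a data pipeline: first, from in-memory NumPy arrays and, second, from Comma-Separated Value (CSV) files. We will also look at the binary TFRecord format.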

Using NumPy arrays with datasets

Let's look at some straightforward examples first. Here are two NumPy arrays:

import tensorflow as tf
import numpy as np 

num_items = 11
num_list1 = np.arange(num_items)
num_list2 = np.arange(num_items,num_items*2)

This is how to create a dataset, using the from_tensor_slices() method:

num_list1_dataset = tf.data.Dataset.from_tensor_slices(num_list1)

This is how to create an iterator on it, using the make_one_shot_iterator() method:

iterator = tf.compat.v1.data.make_one_shot_iterator(num_list1_dataset)

And this is how to use them together, with the get_next method:

for item in num_list1_dataset:
    num = iterator.get_next().numpy()
    print(num)

Note that executing this code twice in the same program run will raise an error, because we are using a one-shot iterator.

It is also possible to access the data in batches with the batch method. Note that the first argument is the number of elements to put in each batch and the second is the self-explanatory drop_remainder argument:

num_list1_dataset = tf.data.Dataset.from_tensor_slices(num_list1).batch(3, drop_remainder = False)
iterator = tf.compat.v1.data.make_one_shot_iterator(num_list1_dataset)
for item in num_list1_dataset:
    num = iterator.get_next().numpy()
    print(num)
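
The effect of drop_remainder is easiest to see side by side. The following sketch batches the same 11 items both ways and iterates the datasets directly rather than through a one-shot iterator; the variable names are illustrative.

```python
import tensorflow as tf

items = tf.data.Dataset.range(11)  # the integers 0 to 10

# Keep the final, shorter batch.
kept = [b.numpy().tolist() for b in items.batch(3, drop_remainder=False)]

# Discard it, so that every batch has exactly three elements.
dropped = [b.numpy().tolist() for b in items.batch(3, drop_remainder=True)]

print(kept)
print(dropped)
```

Dropping the remainder is useful when later code requires a fixed batch dimension, at the cost of ignoring up to batch_size - 1 elements per pass.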

There is also a zip method, which is useful for presenting features and labels together:

dataset1 = [1,2,3,4,5]
dataset2 = ['a','e','i','o','u']
dataset1 = tf.data.Dataset.from_tensor_slices(dataset1)
dataset2 = tf.data.Dataset.from_tensor_slices(dataset2)
zipped_datasets = tf.data.Dataset.zip((dataset1, dataset2))
iterator = tf.compat.v1.data.make_one_shot_iterator(zipped_datasets)
for item in zipped_datasets:
    num = iterator.get_next()
    print(num)

We can concatenate two datasets as follows, using the concatenate method:

ds1 = tf.data.Dataset.from_tensor_slices([1,2,3,5,7,11,13,17])
ds2 = tf.data.Dataset.from_tensor_slices([19,23,29,31,37,41])
ds3 = ds1.concatenate(ds2)
print(ds3)
iterator = tf.compat.v1.data.make_one_shot_iterator(ds3)
for i in range(14):
  num = iterator.get_next()
  print(num)

We can also do away with iterators altogether, as shown here:

epochs=2
for e in range(epochs):
  for item in ds3:
    print(item)

Note that the outer loop here does not raise an error, so this will be the preferred method in most circumstances.

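Using Comma-Separated Value (CSV) files with datasets

CSV files are a very popular method of storing data. TensorFlow 2 contains flexible methods for dealing with them. The main method here is tf.data.experimental.CsvDataset.

CSV example 1

With the following arguments, our dataset will consist of two items taken from each row of the filename file, both of the float type, with the first line of the file ignored and columns 1 and 2 used (column numbering is, of course, 0-based):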

filename = ["./size_1000.csv"]
record_defaults = [tf.float32] * 2 # two required float columns
dataset = tf.data.experimental.CsvDataset(filename, record_defaults, header=True, select_cols=[1,2])
for item in dataset:
  print(item)
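
Since size_1000.csv is not reproduced here, the following self-contained sketch writes a tiny CSV file to a temporary directory and reads it back with the same arguments. The file name, column names, and values are all invented for the illustration.

```python
import os
import tempfile

import tensorflow as tf

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.csv")  # hypothetical stand-in for size_1000.csv
    with open(path, "w") as f:
        f.write("id,width,height\n0,1.5,2.5\n1,3.0,4.0\n")

    record_defaults = [tf.float32] * 2    # two required float columns
    dataset = tf.data.experimental.CsvDataset(
        [path], record_defaults, header=True, select_cols=[1, 2])

    # Each element is a tuple with one scalar tensor per selected column.
    rows = [(float(width), float(height)) for width, height in dataset]

print(rows)
```

The header row and column 0 are both skipped, so only the two float columns reach the dataset.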

CSV 示例 2

在此示例中，使用以下参数，我们的数据集将包含一个必需的浮点数，一个默认值为0.0的可选浮点和一个int，其中 CSV 文件中没有标题，而只有列 1 ，2 和 3 被导入：

#file Chapter_2.ipynb
filename = "mycsvfile.txt"
record_defaults = [tf.float32, tf.constant([0.0], dtype=tf.float32), tf.int32,]
dataset = tf.data.experimental.CsvDataset(filename, record_defaults, header=False, select_cols=[1,2,3])
for item in dataset:
  print(item)

CSV 示例 3

For our final example, our dataset will consist of two required floats and a required string, where the CSV file has a header row:

filename = "file1.txt"
record_defaults = [tf.float32, tf.float32, tf.string ,]
dataset = tf.data.experimental.CsvDataset(filename, record_defaults, header=True)
for item in dataset:
    print(item[0].numpy(), item[1].numpy(),item[2].numpy().decode() ) 
# decode as string is in binary format.

TFRecord

另一种流行的存储数据选择是 TFRecord 格式。这是一个二进制文件格式。对于大文件，这是一个不错的选择，因为二进制文件占用的磁盘空间更少，复制所需的时间更少，并且可以非常有效地从磁盘读取。所有这些都会对数据管道的效率以及模型的训练时间产生重大影响。该格式还以多种方式与 TensorFlow 一起进行了优化。这有点复杂，因为在存储之前必须将数据转换为二进制格式，并在回读时将其解码。

TFRecord 示例 1

我们在此处显示的第一个示例将演示该技术的基本内容。（文件为TFRecords.ipynb）。

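Since a TFRecord file is a sequence of binary strings, its structure must be specified prior to saving so that it can be properly written and subsequently read back. TensorFlow has two structures for this, tf.train.Example and tf.train.SequenceExample. What you have to do is store each sample of your data in one of these structures, then serialize it, and use tf.io.TFRecordWriter to save it to disk.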

在下面的示例中，浮点数组data被转换为二进制格式，然后保存到磁盘。 feature是一个字典，包含在序列化和保存之前传递给tf.train.Example的数据。 “TFRecord 示例 2”中显示了更详细的示例：

TFRecords 支持的字节数据类型为FloatList，Int64List和BytesList。

# file: TFRecords.ipynb
import tensorflow as tf
import numpy as np

data=np.array([10.,11.,12.,13.,14.,15.])

def npy_to_tfrecords(fname,data):
    writer = tf.io.TFRecordWriter(fname)
    feature={}
    feature['data'] = tf.train.Feature(float_list=tf.train.FloatList(value=data))
    example = tf.train.Example(features=tf.train.Features(feature=feature))
    serialized = example.SerializeToString()
    writer.write(serialized)
    writer.close()

npy_to_tfrecords("./myfile.tfrecords",data)

读回记录的代码如下。构造了parse_function函数，该函数对从文件读回的数据集进行解码。这需要一个字典（keys_to_features），其名称和结构与保存的数据相同：

dataset = tf.data.TFRecordDataset("./myfile.tfrecords")

def parse_function(example_proto):
    keys_to_features = {'data':tf.io.FixedLenSequenceFeature([], dtype = tf.float32, allow_missing = True) }
    parsed_features = tf.io.parse_single_example(serialized=example_proto, features=keys_to_features)
    return parsed_features['data']

dataset = dataset.map(parse_function)
iterator = tf.compat.v1.data.make_one_shot_iterator(dataset)
# array is retrieved as one item
item = iterator.get_next()
print(item)
print(item.numpy())
print(item[2].numpy())

TFRecord 示例 2

在这个例子中，我们看一下这个字典给出的更复杂的记录结构：

filename = './students.tfrecords'
data = {
            'ID': 61553,
            'Name': ['Jones', 'Felicity'],
            'Scores': [45.6, 97.2] 
        }

使用此方法，我们可以再次使用Feature()方法构造一个tf.train.Example类。注意我们如何编码字符串：

ID = tf.train.Feature(int64_list=tf.train.Int64List(value=[data['ID']]))

Name = tf.train.Feature(bytes_list=tf.train.BytesList(value=[n.encode('utf-8') for n in data['Name']]))

Scores = tf.train.Feature(float_list=tf.train.FloatList(value=data['Scores']))

example = tf.train.Example(features=tf.train.Features(feature={'ID': ID, 'Name': Name, 'Scores': Scores }))

Serializing this record and writing it to disk is the same as in TFRecord example 1:

writer = tf.io.TFRecordWriter(filename)
writer.write(example.SerializeToString())
writer.close()

为了回读这一点，我们只需要构造我们的parse_function函数即可反映记录的结构：

dataset = tf.data.TFRecordDataset("./students.tfrecords")

def parse_function(example_proto):
    keys_to_features = {'ID':tf.io.FixedLenFeature([], dtype = tf.int64),
                       'Name':tf.io.VarLenFeature(dtype = tf.string),
                        'Scores':tf.io.VarLenFeature(dtype = tf.float32)
                       }
    parsed_features = tf.io.parse_single_example(serialized=example_proto, features=keys_to_features)
    return parsed_features["ID"], parsed_features["Name"],parsed_features["Scores"]

下一步与之前相同：

dataset = dataset.map(parse_function)

iterator = tf.compat.v1.data.make_one_shot_iterator(dataset)
item = iterator.get_next()
# record is retrieved as one item
print(item)

输出如下：

(<tf.Tensor: id=264, shape=(), dtype=int64, numpy=61553>, <tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x7f1bfc7567b8>, <tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x7f1bfc771e80>)

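Now we can extract our data from item (note that the strings must be decoded from bytes, where the default for Python 3 is utf8). Note also that the string and the array of floats are returned as sparse tensors, and to extract them from the record we use the sparse tensor's values attribute: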

print("ID: ",item[0].numpy())
name = item[1].values.numpy()
name1 = name[0].decode()
name2 = name[1].decode('utf8')
print("Name:",name1,",",name2)
print("Scores: ",item[2].values.numpy())

单热编码

单热编码（OHE）是根据数据标签构造张量的方法，在每个标签中，与标签值相对应的每个元素中的数字为 1，其他地方为 0；也就是说，张量中的位之一是热的（1）。

OHE 示例 1

In this example, we use the tf.one_hot() method to convert the decimal value 5 to the one-hot encoded value 0000010000:

y = 5
y_train_ohe = tf.one_hot(y, depth=10).numpy() 
print(y, "is ",y_train_ohe,"when one-hot encoded with a depth of 10")
# 5 is  [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.] when one-hot encoded with a depth of 10
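
What one-hot encoding does to a single label can be shown with a plain-Python sketch. The function name one_hot below is a hypothetical stand-in written for this illustration, not tf.one_hot itself, and the range check is a choice of this sketch.

```python
def one_hot(index, depth):
    """Return a list of length depth with 1.0 at position index and 0.0 elsewhere."""
    if not 0 <= index < depth:
        raise ValueError("index must lie in the range [0, depth)")
    return [1.0 if position == index else 0.0 for position in range(depth)]


encoded = one_hot(5, 10)
print(encoded)  # [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
```

Positions are counted from zero, so the label 5 switches on the sixth element.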

OHE 示例 2

在下面的示例中，还使用从时尚 MNIST 数据集导入的示例代码很好地展示了这一点。

原始标签是从 0 到 9 的整数，因此，例如2的标签在进行一次热编码时变为0010000000，但请注意索引与该索引处存储的标签之间的区别：

import tensorflow as tf
from tensorflow.keras.datasets import fashion_mnist
# eager execution is on by default in TensorFlow 2, so tf.enable_eager_execution() is not needed
width, height, = 28,28
n_classes = 10

# load the dataset
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
split = 50000
#split feature training set into training and validation sets
(y_train, y_valid) = y_train[:split], y_train[split:]

# one-hot encode the labels using TensorFlow. 
# then convert back to numpy for display 
y_train_ohe = tf.one_hot(y_train, depth=n_classes).numpy() 
y_valid_ohe = tf.one_hot(y_valid, depth=n_classes).numpy()
y_test_ohe = tf.one_hot(y_test, depth=n_classes).numpy()

# show difference between the original label and a one-hot-encoded label

i=5
print(y_train[i]) # 'ordinary' number value of label at index i=5 is 2
# 2
# note the difference between the *index* of 5 and the *label* at that index which is 2
print(y_train_ohe[i]) # 
# [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]

接下来，我们将检查神经网络的基本数据结构：神经元的层。

层

ANN 使用的基本数据结构是层，许多相互连接的层构成了一个完整的 ANN。可以将一层设想为神经元的数组，尽管使用单词神经元可能会产生误导，因为在人脑神经元和构成一层的人工神经元之间只有很少的对应关系。记住这一点，我们将在下面使用术语神经元。与任何计算机处理单元一样，神经元的特征在于其输入和输出。通常，神经元具有许多输入和一个输出值。每个输入连接均带有权重wᵢ。

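The following diagram shows a neuron. It is important to note that the activation function, f, is non-linear for anything other than trivial ANNs. A general neuron in the network receives inputs from other neurons, and each of these inputs has a weight, wᵢ, as shown; the network learns by adjusting these weights so that the input generates the required output: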

图 1：人工神经元

通过将输入乘以权重，将偏差乘以其权重相加，然后应用激活函数，可以得出神经元的输出（请参见下图）。

下图显示了如何配置各个人工神经元和层以创建 ANN：

图 2：人工神经网络

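The output of a layer is given by the following equation:

$$Y = f(W \cdot X)$$

Here, W holds the weights of the inputs, X is the input vector, and f is the non-linear activation function.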
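
The computation described above (weighted inputs plus a bias, passed through a non-linear activation) can be sketched in plain Python. The function names, the choice of a sigmoid activation, and the sample numbers are assumptions made for illustration only.

```python
import math


def sigmoid(z):
    """A common non-linear activation; chosen here only as an example."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias, activation=sigmoid):
    """Multiply inputs by weights, add the bias, then apply the activation."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(weighted_sum)


def layer_output(inputs, weight_rows, biases, activation=sigmoid):
    """A layer is an array of neurons that all see the same input vector."""
    return [neuron_output(inputs, row, b, activation)
            for row, b in zip(weight_rows, biases)]


x = [1.0, 2.0]
W = [[0.5, -0.25],   # weights of the first neuron
     [1.0, 1.0]]     # weights of the second neuron
b = [0.0, -3.0]
y = layer_output(x, W, b)
print(y)  # [0.5, 0.5]
```

Both weighted sums happen to be zero here, and the sigmoid of zero is 0.5.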

层的类型很多，支持大量的 ANN 模型结构。可以在这个页面中找到非常全面的列表。

在这里，我们将研究一些更流行的方法，以及 TensorFlow 如何实现它们。

密集（完全连接）层

密集层是完全连接的层。这意味着上一层中的所有神经元都连接到下一层中的所有神经元。在密集的网络中，所有层都是密集的。（如果网络具有三个或更多隐藏层，则称为深度网络）。

layer = tf.keras.layers.Dense(n)行构成了一个密集层，其中n是输出单元的数量。

注意，密集层是一维的。请参考“模型”的部分。

卷积层

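A convolutional layer is a layer in which the neurons are grouped into small patches by a filter, usually square, that is slid over the layer. Each patch is convolved, that is, multiplied by the filter and summed. Put simply, convolutional nets, or ConvNets, have proved themselves to be very good at image recognition and processing.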

对于图像，卷积层具有部分签名tf.keras.layers.Conv2D(filters, kernel_size, strides=1, padding='valid')。

因此，在下面的示例中，该第一层具有一个大小为(1, 1)的过滤器，并且其填充'valid'。其他填充可能性是'same'。

区别在于，使用'same'填充，必须在外部填充该层（通常用零填充），以便在卷积发生后，输出大小与该层大小相同。如果使用'valid'填充，则不会进行填充，并且如果跨度和内核大小的组合不能完全适合该层，则该层将被截断。输出大小小于正在卷积的层：

seqtial_Net = tf.keras.Sequential([tf.keras.layers.Conv2D(1, (1, 1), strides=1, padding='valid')])
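
The difference between the two padding modes is easiest to see in the size of the output along one spatial dimension. The sketch below uses the usual output-size arithmetic for the two modes; the function name conv_output_size is hypothetical and the numbers are illustrative.

```python
import math


def conv_output_size(input_size, kernel_size, stride=1, padding='valid'):
    """Output length along one dimension of a convolution.

    'valid': no padding, so positions where the kernel does not fit are dropped.
    'same':  the input is padded so that the output length is ceil(input / stride).
    """
    if padding == 'valid':
        return (input_size - kernel_size) // stride + 1
    if padding == 'same':
        return math.ceil(input_size / stride)
    raise ValueError("padding must be 'valid' or 'same'")


print(conv_output_size(28, 3, stride=1, padding='valid'))  # 26
print(conv_output_size(28, 3, stride=1, padding='same'))   # 28
print(conv_output_size(28, 3, stride=2, padding='valid'))  # 13
print(conv_output_size(28, 3, stride=2, padding='same'))   # 14
print(conv_output_size(28, 1, stride=1, padding='valid'))  # 28
```

The last line corresponds to the (1, 1) filter used above: a kernel of size one always fits, so 'valid' loses nothing in that case.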

最大池化层

当窗口在层上滑动时，最大池化层在其窗口内取最大值，这与卷积发生的方式几乎相同。

空间数据（即图像）的最大池签名如下：

tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=None, padding='valid', data_format=None)

因此，要使用默认值，您只需拥有以下内容：

layer = tf.keras.layers.MaxPooling2D()

批量归一化层和丢弃层

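Batch normalization is a layer that takes its inputs and outputs the same number of outputs, in which the activations have zero mean and unit variance, as this has been found to be beneficial to learning. Batch normalization regulates the activations so that they neither become vanishingly small nor explosively big, both of which situations prevent the network from learning.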

BatchNormalization层的签名如下：

tf.keras.layers.BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001, center=True, scale=True, beta_initializer='zeros', gamma_initializer='ones', moving_mean_initializer='zeros', moving_variance_initializer='ones', beta_regularizer=None, gamma_regularizer=None, beta_constraint=None, gamma_constraint=None)

因此，要使用默认值，只需使用以下命令：

layer = tf.keras.layers.BatchNormalization()

丢弃层是其中一定百分比的神经元在训练过程中（而不是在推理过程中）随机关闭的层。由于不鼓励单个神经元对其输入进行专门化，因此这迫使网络在泛化方面变得更好。

Dropout层的签名如下：

tf.keras.layers.Dropout(rate, noise_shape=None, seed=None)

rate参数是神经元被关闭的部分。

因此，要使用它，例如，您需要：

layer = tf.keras.layers.Dropout(rate = 0.5)

随机选择的 50% 的神经元将被关闭。
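
A plain-Python sketch of what a dropout layer does to a list of activations is given below. It assumes the common "inverted dropout" scheme, in which each unit is dropped independently with probability rate and the surviving activations are scaled up by 1 / (1 - rate); the function name and the seed argument are hypothetical.

```python
import random


def dropout(activations, rate, training=True, seed=None):
    """Zero each activation with probability rate during training only.

    Kept activations are scaled by 1 / (1 - rate) so that the expected
    sum of the layer's output is unchanged.
    """
    if not training or rate == 0.0:
        return list(activations)  # inference: the layer passes values through
    rng = random.Random(seed)
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else a * keep_scale
            for a in activations]


acts = [1.0] * 8
train_out = dropout(acts, rate=0.5, seed=0)
infer_out = dropout(acts, rate=0.5, training=False)
print(train_out)  # a mixture of 0.0 (dropped) and 2.0 (kept and rescaled)
print(infer_out)  # unchanged: [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```

Because each unit is dropped independently, the fraction switched off in any one pass is 50% only on average.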

Softmax 层

softmax 层是其中每个输出单元的激活对应于输出单元与给定标签匹配的概率的层。因此，具有最高激活值的输出神经元是网络的预测。当要学习的类互斥时使用此函数，以使 softmax 层输出的概率总计为 1。

它被实现为在密集层上的激活。

因此，例如，我们有以下内容：

model2.add(tf.keras.layers.Dense(10,activation=tf.nn.softmax))

This adds a dense softmax layer with 10 neurons, in which the activations of the neurons sum to 1.
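
The softmax calculation itself is short enough to write out in plain Python. The sketch below is illustrative rather than TensorFlow's implementation; the input values are made up.

```python
import math


def softmax(logits):
    """Map a list of raw scores to probabilities that sum to 1."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
predicted_class = probs.index(max(probs))
print(probs)
print(sum(probs))
print(predicted_class)  # 0, the position of the largest input
```

The output with the highest activation is taken as the prediction, which is why the largest raw score and the largest probability always share the same position.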

接下来，我们将进一步讨论激活函数。

激活函数

重要的是要注意，神经网络具有非线性激活函数，即应用于神经元加权输入之和的函数。除了平凡的神经网络模型外，线性激活单元无法将输入层映射到输出层。

有许多常用的激活函数，包括 Sigmoid，tanh，ReLU 和泄漏的 ReLU。一个很好的总结，以及这些函数的图表，可以在这里找到。

建立模型

使用 Keras 创建 ANN 模型的方法有四种：

方法 1 ：参数已传递给tf.keras.Sequential
方法 2 ：使用tf.keras.Sequential的.add方法
方法 3 ：使用 Keras 函数式 API
方法 4 ：通过将tf.keras.Model对象子类化

有关这四种方法的详细信息，请参考第 2 章“TensorFlow 2 的高级 API，Keras”。

梯度下降算法的梯度计算

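One of the great strengths of TensorFlow is its ability to automatically compute gradients for use in gradient descent algorithms, which, of course, are a vital part of most machine learning models. TensorFlow offers a number of methods for gradient computation.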

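There are four ways to automatically compute gradients when eager execution is enabled (they also work in graph mode). Note that tfe is the TensorFlow 1.x alias for tf.contrib.eager, and tf.contrib is not part of TensorFlow 2, so only the first of the four is available there under the name shown:

tf.GradientTape: the context records computations so that you can call tape.gradient() to get the gradients of any tensor computed while recording with respect to any trainable variable
tfe.gradients_function(): takes a function (say f()) and returns a gradient function (say fg()) that can compute the gradients of the outputs of f() with respect to the parameters of f(), or a subset of them
tfe.implicit_gradients(): this is very similar, but fg() computes the gradients of the outputs of f() with respect to all trainable variables that these outputs depend on
tfe.implicit_value_and_gradients(): almost identical, but fg() also returns the output of the function f()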

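We will have a look at the most popular of these, tf.GradientTape. Again, within its context, as a computation takes place, a record (tape) is made of those computations so that the tape can be replayed with tape.gradient() and the appropriate automatic differentiation carried out.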

在以下代码中，当计算sum方法时，磁带将在tf.GradientTape()上下文中记录计算结果，以便可以通过调用tape.gradient()找到自动微分。

注意在[weight1_grad] = tape.gradient(sum, [weight1])中的此示例中如何使用列表。

By default, only one call to tape.gradient() may be made:

# by default, you can only call tape.gradient once in a GradientTape context
weight1 = tf.Variable(2.0)
def weighted_sum(x1):
   return weight1 * x1
with tf.GradientTape() as tape:
   sum = weighted_sum(7.)
   [weight1_grad] = tape.gradient(sum, [weight1])
print(weight1_grad.numpy()) # 7 , weight1*x diff w.r.t. weight1 is x, 7.0, also see below.

在下一个示例中，请注意，参数persistent=True已传递给tf.GradientTape()。这使我们可以多次调用tape.gradient()。同样，我们在tf.GradientTape上下文中计算一个加权和，然后调用tape.gradient()来计算每项相对于weight变量的导数：

# if you need to call tape.gradient() more than once
# use GradientTape(persistent=True) 
weight1 = tf.Variable(2.0)
weight2 = tf.Variable(3.0)
weight3 = tf.Variable(5.0)

def weighted_sum(x1, x2, x3):
    return weight1*x1 + weight2*x2 + weight3*x3

with tf.GradientTape(persistent=True) as tape:
   sum = weighted_sum(7.,5.,6.)

[weight1_grad] = tape.gradient(sum, [weight1])
[weight2_grad] = tape.gradient(sum, [weight2])
[weight3_grad] = tape.gradient(sum, [weight3])

print(weight1_grad.numpy()) #7.0
print(weight2_grad.numpy()) #5.0
print(weight3_grad.numpy()) #6.0
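
The gradients printed above can be checked without TensorFlow by a numerical (finite-difference) approximation. This is a different technique from the automatic differentiation that GradientTape performs; it is shown only as a sanity check, and the helper names are hypothetical.

```python
def weighted_sum(weights, inputs):
    """Same form as the weighted sum above: w1*x1 + w2*x2 + w3*x3."""
    return sum(w * x for w, x in zip(weights, inputs))


def numerical_gradient(f, weights, inputs, eps=1e-6):
    """Central-difference estimate of df/dw for each weight in turn."""
    grads = []
    for i in range(len(weights)):
        up = list(weights)
        down = list(weights)
        up[i] += eps
        down[i] -= eps
        grads.append((f(up, inputs) - f(down, inputs)) / (2 * eps))
    return grads


grads = numerical_gradient(weighted_sum, [2.0, 3.0, 5.0], [7.0, 5.0, 6.0])
print([round(g, 4) for g in grads])  # [7.0, 5.0, 6.0]
```

Since the sum is linear in each weight, the derivative with respect to a weight is simply the input it multiplies, which matches the values 7.0, 5.0, and 6.0 reported by the tape.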

接下来，我们将研究损失函数。这些是在训练神经网络模型期间优化的函数。

损失函数

loss函数（即，误差测量）是训练 ANN 的必要部分。它是网络在训练期间计算出的输出与其所需输出的差异程度的度量。通过微分loss函数，我们可以找到一个量，通过该量可以调整各层之间的连接权重，以使 ANN 的计算输出与所需输出更紧密匹配。

最简单的loss函数是均方误差：

$$MSE = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

在此， y是实际标签值，y_hat是预测标签值。

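Of particular note is the categorical cross-entropy loss function, which is given by the following equation:

$$L = -\sum_{i} y_i \log\left(\hat{y}_i\right)$$

This loss function is used when only one of all the possible classes is correct, and so it is used when the softmax function is the output of the final layer of an ANN.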

请注意，这两个函数可以很好地微分，这是反向传播所要求的。
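
The two losses discussed in this section can be computed by hand on small examples. The sketch below is plain Python with made-up numbers; the function names are hypothetical, and the small eps guard against log(0) is an assumption of this sketch rather than something taken from the text.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between labels and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def categorical_cross_entropy(y_true_one_hot, y_pred_probs, eps=1e-12):
    """Negative log of the probability assigned to the correct class."""
    return -sum(t * math.log(max(p, eps))
                for t, p in zip(y_true_one_hot, y_pred_probs))


mse = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
cce = categorical_cross_entropy([0.0, 1.0, 0.0], [0.1, 0.8, 0.1])
print(mse)  # (0 + 0 + 4) / 3
print(cce)  # -log(0.8)
```

With a one-hot label, only the term for the correct class survives in the cross-entropy sum, so the loss shrinks towards zero as the predicted probability of that class approaches one.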

总结

在本章中，我们研究了许多支持神经网络创建和使用的技术。

我们涵盖了到 ANN 的数据表示，ANN 的各层，创建模型，梯度下降算法的梯度计算，损失函数以及保存和恢复模型的内容。这些主题是在开发神经网络模型时将在后续章节中遇到的概念和技术的重要前提。

确实，在下一章中，我们将通过探索许多监督的学习场景，包括线性回归，逻辑回归和 K 近邻，来认真地使用 TensorFlow。

四、TensorFlow 2 和监督机器学习

在本章中，我们将讨论并举例说明 TensorFlow 2 在以下情况下的监督机器学习问题中的使用：线性回归，逻辑回归和 K 最近邻（KNN）。

在本章中，我们将研究以下主题：

监督学习
线性回归
我们的第一个线性回归示例
波士顿住房数据集
逻辑回归（分类）
K 最近邻（KNN）

监督学习

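Supervised learning is the machine learning scenario in which one or more data points from a set of data points is associated with a label. A model then learns to predict the labels for unseen data points. For our purposes, each data point will normally be a tensor and will be associated with a label. Supervised learning problems abound in computer vision; for example, an algorithm is shown many pictures of ripe and unripe tomatoes, together with a categorical label indicating whether or not they are ripe, and when the training has concluded, the model is able to predict the status of tomatoes that were not in its training set. This could have a very direct application in a physical sorting mechanism for tomatoes. Another example is an algorithm that learns to predict the gender and age of a new face after it has been shown many examples together with their genders and ages. Similarly, a model that has been trained on many tree images and their type labels could usefully learn to predict the type of a tree from an image of it.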

线性回归

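Linear regression problems are those in which you have to predict the value of one continuous variable, given the value of one or more other variables (data points); for example, predicting the selling price of a house, given its floor space. In such examples, you can plot the known features with their associated labels on a simple linear graph, such as the familiar x, y scatter plot, and draw the line that best fits the data. This is known as the line of best fit. You can then read off the label corresponding to any value of your feature that lies within the x range of the plot.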

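However, linear regression problems may involve several features, in which case the term multiple or multivariate linear regression is used. In this case, it is not a line that best fits the data, but a plane (two features) or a hyperplane (more than two features). In the house price example, we could add the number of rooms and the length of the garden to the features. There is a famous dataset, known as the Boston housing dataset, that involves 13 features. The regression problem here is to predict the median value of a home in the Boston suburbs, given these 13 features.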

术语：特征也称为预测变量或自变量。标签也称为响应变量或因变量。

我们的第一个线性回归示例

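We will start with a simple, artificial, linear regression problem to set the scene. In this problem, we construct an artificial dataset, so we create, and therefore know, the line to which we are fitting, but we then use TensorFlow to find this line.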

我们执行以下操作-在导入和初始化之后，我们进入一个循环。在此循环内，我们计算总损失（定义为点的数据集y的均方误差）。然后，我们根据我们的权重和偏置来得出这种损失的导数。这将产生可用于调整权重和偏差以降低损失的值；这就是所谓的梯度下降。通过多次重复此循环（技术上称为周期），我们可以将损失降低到尽可能低的程度，并且可以使用训练有素的模型进行预测。

首先，我们导入所需的模块（回想一下，急切执行是默认的）：

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt # used for the scatter plots later in this example

接下来，我们初始化重要的常量，如下所示：

n_examples = 1000 # number of training examples
training_steps = 1000 # number of steps we are going to train for
display_step = 100 # after multiples of this, we display the loss
learning_rate = 0.01 # multiplying factor on gradients
m, c = 6, -5 # gradient and y-intercept of our line, edit these for a different linear problem

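Next come two functions: one that generates our training data, and one that calculates our predicted y, given a weight and a bias (m and c):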

def train_data(n, m, c):
    x = tf.random.normal([n]) # n values taken from a normal distribution,
    noise = tf.random.normal([n])# n values taken from a normal distribution
    y = m*x + c + noise # our scatter plot
    return x, y
def prediction(x, weight, bias):
    return weight*x + bias # our predicted (learned) m and c, expression is like y = m*x + c

用于获取初始或预测的权重和偏差并根据y计算均方损失（偏差）的函数：

def loss(x, y, weights, biases): 
    error = prediction(x, weights, biases) - y # how 'wrong' our predicted (learned) y is
    squared_error = tf.square(error)
    return tf.reduce_mean(input_tensor=squared_error) # overall mean of squared error, scalar value.

这就是 TensorFlow 发挥作用的地方。使用名为GradientTape()的类，我们可以编写一个函数来计算相对于weights和bias的损失的导数（梯度）：

def grad(x, y, weights, biases):
    with tf.GradientTape() as tape:
        loss_ = loss(x, y, weights, biases)
    return tape.gradient(loss_, [weights, biases]) # direction and value of the gradient of our weights and biases

为训练循环设置回归器，并显示初始损失，如下所示：

x, y = train_data(n_examples,m,c) # our training values x and y
plt.scatter(x,y)
plt.xlabel("x")
plt.ylabel("y")
plt.title("Figure 1: Training Data")
W = tf.Variable(np.random.randn()) # initial, random, value for predicted weight (m)
B = tf.Variable(np.random.randn()) # initial, random, value for predicted bias (c)

print("Initial loss: {:.3f}".format(loss(x, y, W, B)))

输出如下所示：

接下来，我们的主要训练循环。这里的想法是根据我们的learning_rate来少量调整weights和bias，以将损失依次降低到我们最适合的线上收敛的点：

for step in range(training_steps): # iterate for each training step
    deltaW, deltaB = grad(x, y, W, B) # direction (sign) and value of the gradients of our loss
                                      # with respect to our weights and bias
    change_W = deltaW * learning_rate # adjustment amount for weight
    change_B = deltaB * learning_rate # adjustment amount for bias
    W.assign_sub(change_W) # subtract change_W from W
    B.assign_sub(change_B) # subtract change_B from B
    if step == 0 or step % display_step == 0:
        # print(deltaW.numpy(), deltaB.numpy()) # uncomment if you want to see the gradients
        print("Loss at step {:02d}: {:.6f}".format(step, loss(x, y, W, B)))

最终结果如下：

print("Final loss: {:.3f}".format(loss(x, y, W, B)))
print("W = {}, B = {}".format(W.numpy(), B.numpy()))
print("Compared with m = {:.3f}, c = {:.3f}".format(m, c)," of the original line")
xs = np.linspace(-3, 4, 50)
ys = W.numpy()*xs + B.numpy()
plt.scatter(xs,ys)
plt.xlabel("x")
plt.ylabel("y")
plt.title("Figure 2: Line of Best Fit")

You should see that the values found for W and B are very close to the values we used for m and c, as is to be expected.
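
The same training loop can be followed step by step in plain Python, with the gradients of the mean squared error written out by hand instead of being obtained from GradientTape. This is a sketch under stated assumptions: the sample size, the seed, and the starting values of zero are choices made here for illustration and are not the notebook's values.

```python
import random

random.seed(0)
m_true, c_true = 6.0, -5.0  # the line we generate data from
xs = [random.gauss(0.0, 1.0) for _ in range(500)]
ys = [m_true * x + c_true + random.gauss(0.0, 1.0) for x in xs]


def mse_gradients(xs, ys, w, b):
    """Derivatives of mean((w*x + b - y)^2) with respect to w and b."""
    n = len(xs)
    errors = [w * x + b - y for x, y in zip(xs, ys)]
    grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
    grad_b = 2.0 * sum(errors) / n
    return grad_w, grad_b


w, b = 0.0, 0.0
learning_rate = 0.01
for step in range(1000):
    grad_w, grad_b = mse_gradients(xs, ys, w, b)
    w -= learning_rate * grad_w  # move against the gradient
    b -= learning_rate * grad_b

print(round(w, 2), round(b, 2))  # close to 6 and -5, up to the noise in the data
```

The learned values do not match m and c exactly, because the noise added to y shifts the line of best fit slightly for any finite sample.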

波士顿住房数据集

接下来，我们将类似的回归技术应用于波士顿房屋数据集。

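The main difference between this model and our previous artificial dataset, which had just one feature, is that the Boston housing dataset is real data and has 13 features. This is a regression problem because house prices (the label) are taken to be continuously valued.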

同样，我们从导入开始，如下所示：

import tensorflow as tf
from sklearn.datasets import load_boston # note: load_boston was removed in scikit-learn 1.2, so this needs an older release
from sklearn.preprocessing import scale
import numpy as np

我们的重要常数如下所示：

learning_rate = 0.01
epochs = 10000
display_epoch = epochs//20
n_train = 300
n_valid = 100

接下来，我们加载数据集并将其分为训练，验证和测试集。我们在训练集上进行训练，并在验证集上检查和微调我们的训练模型，以确保例如没有过拟合。然后，我们使用测试集进行最终精度测量，并查看我们的模型在完全看不见的数据上的表现如何。

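Note the scale method. This is used to convert the data into sets with zero mean and unit standard deviation. The sklearn.preprocessing method scale does this by subtracting the mean of a feature from each data point of that feature and then dividing by the standard deviation of that feature.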
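
The standardization described above can be worked through on a single feature column in plain Python. The function name scale_column is hypothetical, and the use of the population standard deviation (dividing by n rather than n - 1) is an assumption of this sketch.

```python
import math


def scale_column(values):
    """Subtract the column mean and divide by the column standard deviation."""
    n = len(values)
    mean = sum(values) / n
    # population standard deviation: divide by n, not n - 1
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]


column = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # mean 5, standard deviation 2
scaled = scale_column(column)
print(scaled)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

After scaling, every feature is measured in units of its own standard deviation, so no single feature dominates the gradient simply because of its numeric range.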

这样做是因为它有助于我们模型的收敛。所有特征也都转换为float32数据类型：

features, prices = load_boston(return_X_y=True)
n_test = len(features) - n_train - n_valid

# Keep n_train samples for training
train_features = tf.cast(scale(features[:n_train]), dtype=tf.float32)
train_prices = prices[:n_train]

# Keep n_valid samples for validation
valid_features = tf.cast(scale(features[n_train:n_train+n_valid]), dtype=tf.float32)
valid_prices = prices[n_train:n_train+n_valid]

# Keep remaining n_test data points as test set
test_features = tf.cast(scale(features[n_train+n_valid:n_train+n_valid+n_test]), dtype=tf.float32)
test_prices = prices[n_train + n_valid : n_train + n_valid + n_test]

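Next, we have functions very similar to those of our previous example. First, notice that we are now using the more popular root mean squared error: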

# A loss function using root mean-squared error
def loss(x, y, weights, bias):
  error = prediction(x, weights, bias) - y # how 'wrong' our predicted (learned) y is
  squared_error = tf.square(error)
  return tf.sqrt(tf.reduce_mean(input_tensor=squared_error)) # square root of overall mean of squared error.
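
The loss above calls a prediction function, but this section does not show one for the 13-feature case, where each prediction is a weighted sum over all of a row's features rather than a single product. The plain-Python sketch below shows what such a function has to compute; the names predict_rows and rmse and the sample numbers are hypothetical. In TensorFlow the same computation would presumably be a matrix product such as tf.matmul(x, weights) + bias, which is an assumption here and not code taken from the text.

```python
def predict_rows(rows, weights, bias):
    """One prediction per row: the dot product of features and weights, plus bias."""
    return [sum(w * v for w, v in zip(weights, row)) + bias for row in rows]


def rmse(rows, targets, weights, bias):
    """Root mean squared error between predictions and targets."""
    preds = predict_rows(rows, weights, bias)
    mean_sq = sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)
    return mean_sq ** 0.5


rows = [[1.0, 2.0], [3.0, 4.0]]  # two examples with two features each
weights = [0.5, -1.0]
bias = 2.0
preds = predict_rows(rows, weights, bias)
loss_value = rmse(rows, [0.5, 3.5], weights, bias)
print(preds)       # [0.5, -0.5]
print(loss_value)  # sqrt((0^2 + 4^2) / 2) = sqrt(8)
```

When this is written with tensors, it is worth checking that the predictions and the prices have the same shape before subtracting them, since a column of predictions minus a flat vector of prices would be broadcast into a matrix.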

接下来，我们找到相对于weights和bias的损失梯度的方向和值：

# Find the derivative of loss with respect to weight and bias
def gradient(x, y, weights, bias):
  with tf.GradientTape() as tape:
    loss_value = loss(x, y, weights, bias)
  return tape.gradient(loss_value, [weights, bias])# direction and value of the gradient of our weight and bias

然后，我们查询设备，将初始权重设置为随机值，将bias设置为0，然后打印初始损失。

Note that W is now a 13 by 1 vector, as follows:

# Start with random values for W and B on the same batch of data
W = tf.Variable(tf.random.normal([13, 1],mean=0.0, stddev=1.0, dtype=tf.float32))
B = tf.Variable(tf.zeros(1) , dtype = tf.float32)
print(W,B)
print("Initial loss: {:.3f}".format(loss(train_features, train_prices,W, B)))

现在，进入我们的主要训练循环。这里的想法是根据我们的learning_rate将weights和bias进行少量调整，以将损失逐步降低至我们已经收敛到最佳拟合线的程度。如前所述，此技术称为梯度下降：

for e in range(epochs): #iterate for each training epoch
    deltaW, deltaB = gradient(train_features, train_prices, W, B) # direction (sign) and value of the gradient of our weight and bias
    change_W = deltaW * learning_rate # adjustment amount for weight
    change_B = deltaB * learning_rate # adjustment amount for bias
    W.assign_sub(change_W) # subtract from W
    B.assign_sub(change_B) # subtract from B
    if e==0 or e % display_epoch == 0:
        # print(deltaW.numpy(), deltaB.numpy()) # uncomment if you want to see the gradients
        print("Validation loss after epoch {:02d}: {:.3f}".format(e, loss(valid_features, valid_prices, W, B)))

最后，让我们将实际房价与其预测值进行比较，如下所示：

example_house = 69
y = test_prices[example_house]
y_pred = prediction(test_features,W.numpy(),B.numpy())[example_house]
print("Actual median house value",y," in $1000s")
print("Predicted median house value ",y_pred.numpy()," in $1000s")

逻辑回归（分类）

这类问题的名称令人迷惑，因为正如我们所看到的，回归意味着连续值标签，例如房屋的中位数价格或树的高度。

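Not so with logistic regression. When you have a problem requiring logistic regression, it means that the label is categorical; for example, zero or one, True or False, yes or no, cat or dog, or it may take more than two categorical values; for example, red, blue, or green, or one, two, three, four, or five, or the type of a given flower. The labels normally have probabilities associated with them; for example, P(cat) = 0.92, P(dog) = 0.08. Thus, logistic regression is also known as classification.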

在下一个示例中，我们将使用fashion_mnist数据集使用逻辑回归来预测时尚商品的类别。

这里有一些例子：

逻辑回归以预测项目类别

我们可以在 50,000 张图像上训练模型，在 10,000 张图像上进行验证，并在另外 10,000 张图像上进行测试。

First, we import the modules needed to build our initial model and to train it (eager execution is enabled by default in TensorFlow 2, so it does not need to be switched on):

import numpy as np
import tensorflow as tf
from tensorflow.keras.datasets import fashion_mnist # this is our dataset
from tensorflow.keras.callbacks import ModelCheckpoint

接下来，我们初始化重要的常量，如下所示：

# important constants
batch_size = 128
epochs = 20
n_classes = 10
learning_rate = 0.1
width = 28 # of our images
height = 28 # of our images

然后，我们将我们训练的时尚标签的indices与它们的标签相关联，以便稍后以图形方式打印出结果：

fashion_labels = ["T-shirt/top","Trousers","Pullover","Dress","Coat","Sandal","Shirt","Sneaker","Bag","Ankle boot"]
#indices             0           1          2         3      4       5        6        7       8        9

# Next, we load our fashion data set
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()

然后，我们将每个图像中的每个整数值像素转换为float32并除以 255 以对其进行归一化：

# normalize the features for better training
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.

x_train now consists of 60000 images of float32 values, and x_test holds 10000 similar images.

然后，我们展平特征集，准备进行训练：

# flatten the feature set for use by the training algorithm
x_train = x_train.reshape((60000, width * height))
x_test = x_test.reshape((10000, width * height))

然后，我们将训练集x_train和y_train进一步分为训练集和验证集：

split = 50000
# split training sets into training and validation sets
(x_train, x_valid) = x_train[:split], x_train[split:]
(y_train, y_valid) = y_train[:split], y_train[split:]

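Many machine learning algorithms work best if the labels are one-hot encoded, so we do this next. Note, though, that we convert the resulting one-hot tensors back into (one-hot) NumPy arrays, ready for use by Keras later: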

# one hot encode the labels using TensorFLow.
# then convert back to numpy as we cannot combine numpy
# and tensors as input to keras later
y_train_ohe = tf.one_hot(y_train, depth=n_classes).numpy()
y_valid_ohe = tf.one_hot(y_valid, depth=n_classes).numpy()
y_test_ohe = tf.one_hot(y_test, depth=n_classes).numpy()
# or use tf.keras.utils.to_categorical(y_train,10)

这是一段代码，其中显示了一个介于零到九之间的值以及其单热编码版本：

# show difference between original label and one-hot-encoded label
i=5
print(y_train[i]) # 'ordinary' number value of label at index i
print(tf.one_hot(y_train[i], depth=n_classes)) # same value as a 1. in the correct position in a length 10 1D tensor
print(y_train_ohe[i]) # same value as a 1. in the correct position in a length 10 1D numpy array

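It is important here to note the difference between the index i and the label stored at index i. Here is another piece of code that displays the first 10 fashion items in x_train, together with their labels from y_train: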

# print sample fashion images.
# we have to reshape the image held in x_train back to width by height
# as we flattened it for training into width*height
import matplotlib.pyplot as plt
%matplotlib inline
_,image = plt.subplots(1,10,figsize=(8,1))

for i in range(10):
    image[i].imshow(np.reshape(x_train[i],(width, height)), cmap="Greys")
    print(fashion_labels[y_train[i]],sep='', end='')

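Now we come to an important and generalizable part of the code. Google recommends that, for creating any type of machine learning model, the model be created by subclassing tf.keras.Model.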

这具有直接的优势，即我们可以在我们的子类化模型中使用tf.keras.Model的所有功能，包括编译和训练例程以及层功能，在后续的章节中，我们将详细介绍。

对于我们的逻辑回归示例，我们需要在子类中编写两个方法。首先，我们需要编写一个构造器，该构造器调用超类的构造器，以便正确创建模型。在这里，我们传入正在使用的类数（10），并在实例化模型以创建单个层时使用此构造器。我们还必须声明call方法，并使用该方法来编程在模型训练的正向传递过程中发生的情况。

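Later, when we look at neural networks with forward and backward passes, we will say more about this. For our present purposes, we just need to know that, in the call method, we take the softmax of our inputs to produce the output. The softmax function takes a vector (or tensor) and maps it to values between zero and one that sum to one, so that the position holding the maximum of the vector receives the largest value, often close to one, and the other positions receive smaller values, often close to zero. This is much like one-hot encoding. Note that the code below forces the softmax to execute on the CPU, which the original notebook does on the grounds that softmax was not implemented for the GPU at the time of writing: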

# model definition (the canonical Google way)
class LogisticRegression(tf.keras.Model):

    def __init__(self, num_classes):
        super(LogisticRegression, self).__init__() # call the constructor of the parent class (Model)
        self.dense = tf.keras.layers.Dense(num_classes) #create an empty layer called dense with 10 elements.

    def call(self, inputs, training=None, mask=None): # required for our forward pass
        output = self.dense(inputs) # copy training inputs into our layer

        # softmax op does not exist on the gpu, so force execution on the CPU
        with tf.device('/cpu:0'):
            output = tf.nn.softmax(output) # softmax is near one for maximum value in output
                                           # and near zero for the other values.

        return output

现在，我们准备编译和训练我们的模型。

首先，我们确定可用的设备，然后使用它。然后，使用我们开发的类声明模型。声明要使用的优化程序后，我们将编译模型。我们使用的损失，分类交叉熵（也称为对数损失），通常用于逻辑回归，因为要求预测是概率。

优化器是一个选择和有效性的问题，有很多可用的方法。接下来是带有三个参数的model.compile调用。我们将很快看到，它为我们的训练模型做准备。

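At the time of writing, the choice of optimizers is limited. categorical_crossentropy is the usual loss function for multi-class logistic regression problems, and the 'accuracy' metric is the one normally used for classification problems.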

请注意，接下来，我们必须使用样本大小仅为输入图像之一的model.call方法进行虚拟调用，否则model.fit调用将尝试将整个数据集加载到内存中以确定输入特征的大小。

接下来，我们建立一个ModelCheckpoint实例，该实例用于保存训练期间的最佳模型，然后使用model.fit调用训练模型。

找出model.compile和model.fit（以及所有其他 Python 或 TensorFlow 类或方法）的所有不同参数的最简单方法是在 Jupyter 笔记本中工作，然后按Shift + TAB + TAB，当光标位于相关类或方法调用上时。

从代码中可以看到，model.fit在训练时使用callbacks方法（由验证准确率确定）保存最佳模型，然后加载最佳模型。最后，我们在测试集上评估模型，如下所示：

# build the model
model = LogisticRegression(n_classes)
# compile the model
#optimiser = tf.train.GradientDescentOptimizer(learning_rate) # TF 1.x optimizer, not available under this name in TF 2
optimiser = tf.keras.optimizers.Adam()
model.compile(optimizer=optimiser, loss='categorical_crossentropy', metrics=['accuracy'], )

# TF Keras tries to use the entire dataset to determine the shape without this step when using .fit()
# So, use one sample of the provided input dataset size to determine input/output shapes for the model
dummy_x = tf.zeros((1, width * height))
model.call(dummy_x)

checkpointer = ModelCheckpoint(filepath="./model.weights.best.hdf5", verbose=2, save_best_only=True, save_weights_only=True)
    # train the model
model.fit(x_train, y_train_ohe, batch_size=batch_size, epochs=epochs,
              validation_data=(x_valid, y_valid_ohe), callbacks=[checkpointer], verbose=2)
    #load model with the best validation accuracy
model.load_weights("./model.weights.best.hdf5")

    # evaluate the model on the test set
scores = model.evaluate(x_test, y_test_ohe, batch_size, verbose=2)
print("Final test loss and accuracy :", scores)
y_predictions = model.predict(x_test)

最后，对于我们的逻辑回归示例，我们有一些代码可以检查一个时尚的测试项目，以查看其预测是否准确：

    # example of one predicted versus one true fashion label
index = 42
index_predicted = np.argmax(y_predictions[index]) # largest label probability
index_true = np.argmax(y_test_ohe[index]) # pick out index of element with a 1 in it
print("When prediction is ",index_predicted)
print("ie. predicted label is", fashion_labels[index_predicted])
print("True label is ",fashion_labels[index_true])

print ("\n\nPredicted V (True) fashion labels, green is correct, red is wrong")
size = 12 # i.e. 12 random numbers chosen out of x_test.shape[0] = 10000, we do not replace them
fig = plt.figure(figsize=(15,3))
rows = 3
cols = 4

检查 12 个预测的随机样本，如下所示：

for i, index in enumerate(np.random.choice(x_test.shape[0], size = size, replace = False)):
          axis = fig.add_subplot(rows,cols,i+1, xticks=[], yticks=[]) # position i+1 in grid with rows rows and cols columns
          axis.imshow(x_test[index].reshape(width,height), cmap="Greys")
          index_predicted = np.argmax(y_predictions[index])
          index_true = np.argmax(y_test_ohe[index])
          axis.set_title(("{} ({})").format(fashion_labels[index_predicted],fashion_labels[index_true]),
                                                  color=("green" if index_predicted==index_true else "red"))

The following screenshot shows the predicted versus (true) fashion labels:

时尚标签

到此结束我们对逻辑回归的研究。现在，我们将看看另一种非常强大的监督学习技术，即 K 最近邻。

K 最近邻（KNN）

KNN 背后的想法相对简单。给定新的特定数据点的值，请查看该点的 KNN，并根据该 k 个邻居的标签为该点分配标签，其中k是算法的参数。

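There is no model, as such, constructed in this scenario. The algorithm merely looks at all the distances between the new point and all the other data points in the dataset. Next, we will make use of the famous dataset that consists of three types of iris flower: iris setosa, iris virginica, and iris versicolor. For each of these labels, the features are petal length, petal width, sepal length, and sepal width. For a chart showing this dataset, see here.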

有 150 个数据点（每个数据点都包含前面提到的四个测量值）和 150 个相关标签。我们将它们分为 120 个训练数据点和 30 个测试数据点。

首先，我们有通常的导入，如下所示：

import numpy as np
from sklearn import datasets
import tensorflow as tf
# and we next load our data:

iris = datasets.load_iris()
x = np.array([i for i in iris.data])
y = np.array(iris.target)

x.shape, y.shape

然后，我们将花标签放在列表中以备后用，如下所示：

flower_labels = ["iris setosa", "iris versicolor", "iris virginica"] # order matches the integer targets 0, 1, 2 returned by load_iris

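Now it's time to one-hot encode the labels. np.eye returns a two-dimensional array with ones on the diagonal, defaulting to the main diagonal. Indexing it with y then gives us the required one-hot encoding of y: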

#one hot encoding, another method
y = np.eye(len(set(y)))[y]
y[0:10]

接下来，我们将特征规格化为零到一，如下所示：

x = (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))

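For the algorithm to work properly, we have to use a random set of training features. Next, we also set up the test indices by removing the training indices from the full range of the dataset: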

# create indices for the train-test split
np.random.seed(42)
split = 0.8 # this makes 120 train and 30 test features
train_indices = np.random.choice(len(x), round(len(x) * split), replace=False)
test_indices =np.array(list(set(range(len(x))) - set(train_indices)))

我们现在可以创建我们的训练和测试特征，以及它们的相关标签：

# the train-test split
train_x = x[train_indices]
test_x = x[test_indices]
train_y = y[train_indices]
test_y = y[test_indices]

现在，我们将k的值设置为5，如下所示：

k = 5

接下来，在 Jupyter 笔记本中，我们具有预测测试数据点类别的函数。我们将逐行对此进行细分。

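First comes our distance computation. After it has executed, the variable distances contains all the (Manhattan) distances between our 120 training points and our 30 test points, that is, an array of 30 rows by 120 columns. The Manhattan distance, sometimes called the city block distance, between two data points x[1] and x[2] is the absolute value of the difference of their values, that is, |x[1] - x[2]|. Where the points have several features, as in this example, the sum of the absolute differences of the individual features is used.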

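tf.expand_dims adds an extra dimension to test_x so that, before the subtraction takes place, broadcasting can make the two arrays compatible for subtraction. Since x has four features, and reduce_sum is taken over axis=2, the result is 30 rows of distances between our 30 test points and our 120 training points. So our prediction function is: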

def prediction(train_x, test_x, train_y,k):
    print(test_x)
    d0 = tf.expand_dims(test_x, axis =1)
    d1 = tf.subtract(train_x, d0)
    d2 = tf.abs(d1)
    distances = tf.reduce_sum(input_tensor=d2, axis=2)
    print(distances)
    # or
    # distances = tf.reduce_sum(tf.abs(tf.subtract(train_x, tf.expand_dims(test_x, axis =1))), axis=2)

然后，我们使用tf.nn.top_k返回 KNN 的索引作为其第二个返回值。请注意，此函数的第一个返回值是距离本身的值，我们不需要这些距离，因此我们将其“扔掉”（带下划线）：

_, top_k_indices = tf.nn.top_k(tf.negative(distances), k=k)

接下来，我们gather，即使用索引作为切片，找到并返回与我们最近的邻居的索引相关联的所有训练标签：

top_k_labels = tf.gather(train_y, top_k_indices)

之后，我们对预测进行汇总，如下所示：

predictions_sum = tf.reduce_sum(input_tensor=top_k_labels, axis=1)

最后，我们通过找到最大值的索引来返回预测的标签：

pred = tf.argmax(input=predictions_sum, axis=1)

返回结果预测pred。作为参考，下面是一个完整的函数：

def prediction(train_x, test_x, train_y,k):
     distances = tf.reduce_sum(tf.abs(tf.subtract(train_x, tf.expand_dims(test_x, axis =1))), axis=2)
     _, top_k_indices = tf.nn.top_k(tf.negative(distances), k=k)
     top_k_labels = tf.gather(train_y, top_k_indices)
     predictions_sum = tf.reduce_sum(top_k_labels, axis=1)
     pred = tf.argmax(predictions_sum, axis=1)
     return pred
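
The same sequence of steps (Manhattan distances, k nearest neighbours, then a vote over their labels) can be followed on a tiny dataset in plain Python. This sketch works on integer labels with an explicit vote rather than summing one-hot rows as the TensorFlow function does; the function names and the six sample points are hypothetical, and ties are not handled in any special way.

```python
from collections import Counter


def manhattan(a, b):
    """Sum of the absolute feature differences between two points."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train_x, train_labels, test_point, k):
    """Label a test point by majority vote among its k nearest training points."""
    # Sort training indices by distance to the test point and keep the k nearest
    nearest = sorted(range(len(train_x)),
                     key=lambda i: manhattan(train_x[i], test_point))[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


train_x = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
           [1.0, 1.0], [0.9, 0.8], [0.8, 1.0]]
train_labels = [0, 0, 0, 1, 1, 1]
first = knn_predict(train_x, train_labels, [0.15, 0.1], k=3)
second = knn_predict(train_x, train_labels, [0.9, 0.9], k=3)
print(first)   # 0
print(second)  # 1
```

Summing the one-hot labels of the k neighbours and taking the argmax, as the TensorFlow version does, amounts to the same majority vote.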

打印在此函数中出现的各种张量的形状可能非常有启发性。

代码的最后一部分很简单。我们将花朵标签的预测与实际标签压缩（连接）在一起，然后我们可以遍历它们，打印出来并求出正确性总计，然后将精度打印为测试集中数据点数量的百分比：

i, total = 0 , 0
results = zip(prediction(train_x, test_x, train_y,k), test_y) #concatenate predicted label with actual label
print("Predicted Actual")
print("--------- ------")
for pred, actual in results:
    print(i, flower_labels[pred.numpy()],"\t",flower_labels[np.argmax(actual)] )
    if pred.numpy() == np.argmax(actual):
        total += 1
    i += 1
accuracy = round(total/len(test_x),3)*100
print("Accuracy = ",accuracy,"%")

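If you enter the code yourself, or run the notebook provided, you will see that the accuracy is 96.7%, with just one misclassification, between iris versicolor and iris virginica (at test index 25).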

总结

在本章中，我们看到了在涉及线性回归的两种情况下使用 TensorFlow 的示例。其中将特征映射到具有连续值的已知标签，从而可以对看不见的特征进行预测。我们还看到了逻辑回归的一个示例，更好地描述为分类，其中将特征映射到分类标签，再次允许对看不见的特征进行预测。最后，我们研究了用于分类的 KNN 算法。

我们现在将在第 5 章“将 TensorFlow 2 用于无监督学习”，继续进行无监督学习，在该过程中，特征和标签之间没有初始映射，并且 TensorFlow 的任务是发现特征之间的关系。

五、TensorFlow 2 和无监督学习

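In this chapter, we will investigate unsupervised learning using TensorFlow 2. The object of unsupervised learning is to find patterns or relationships in data whose data points have not previously been labeled; hence, we have only features. This contrasts with supervised learning, where we are supplied with both features and their labels and want to predict the labels of new, previously unseen, features. In unsupervised learning, we want to find out whether there is an underlying structure to our data. For example, can it be grouped or organized in any way without any prior knowledge of its structure? This is known as clustering. For example, Amazon uses unsupervised learning in its recommendation system to make suggestions as to what you might like to buy in the way of books, by identifying categories of items you have previously purchased.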

无监督学习的另一种用途是在数据压缩技术中，其中数据中的模式可以用更少的内存表示，而不会损害数据的结构或完整性。在本章中，我们将研究两个自编码器，以及如何将它们用于压缩数据以及如何消除图像中的噪声。

在本章中，我们将深入探讨自编码器。

自编码器

自编码是一种使用 ANN 实现的数据压缩和解压缩算法。由于它是学习算法的无监督形式，因此我们知道只需要未标记的数据。它的工作方式是通过强制输入通过瓶颈（即，宽度小于原始输入的一层或多层）来生成输入的压缩版本。要重建输入（即解压缩），我们可以逆向处理。我们使用反向传播在中间层中创建输入的表示形式，并重新创建输入作为表示形式的输出。

自编码是有损的，也就是说，与原始输入相比，解压缩后的输出将变差。这与 MP3 和 JPEG 压缩格式相似。

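Autoencoding is data-specific, that is to say, only data similar to the data on which the autoencoder was trained can be compressed properly. For example, an autoencoder trained on pictures of cars would perform poorly on pictures of anything else, because the features it has learned are specific to cars.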

一个简单的自编码器

让我们编写一个非常简单的自编码器，该编码器仅使用一层 ANN。首先，像往常一样，让我们从导入开始，如下所示：

from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.datasets import fashion_mnist
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping
from tensorflow.keras import regularizers

import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

预处理数据

然后，我们加载数据。对于此应用，我们将使用fashion_mnist数据集，该数据集旨在替代著名的 MNIST 数据集。本节末尾有这些图像的示例。每个数据项（图像中的像素）都是 0 到 255 之间的无符号整数，因此我们首先将其转换为float32，然后将其缩放为零至一的范围，以使其适合以后的学习过程：

(x_train, _), (x_test, _) = fashion_mnist.load_data() # we don't need the labels
x_train = x_train.astype('float32') / 255. # normalize
x_test = x_test.astype('float32') / 255.

print(x_train.shape) # shape of input
print(x_test.shape)

这将给出形状，如以下代码所示：

(60000, 28, 28)
(10000, 28, 28)

接下来，我们将图像展平，因为我们要将其馈送到一维的密集层：

x_train = x_train.reshape(( x_train.shape[0], np.prod(x_train.shape[1:]))) #flatten
x_test = x_test.reshape((x_test.shape[0], np.prod(x_test.shape[1:])))

print(x_train.shape)
print(x_test.shape)

现在的形状如下：

(60000, 784)
(10000, 784)

分配所需的尺寸，如以下代码所示：

image_dim = 784 # this is the size of our input image, 784
encoding_dim = 32 # this is the length of our encoded items.Compression of factor=784/32=24.5

接下来，我们构建单层编码器和自编码器模型，如下所示：

input_image = Input(shape=(image_dim, )) # the input placeholder

encoded_image = Dense(encoding_dim, activation='relu',
 activity_regularizer=regularizers.l1(10e-5))(input_image)# "encoded" is the encoded representation of the input

encoder = Model(input_image, encoded_image)

decoded_image = Dense(image_dim, activation='sigmoid')(encoded_image)# "decoded" is the lossy reconstruction of the input

autoencoder = Model(input_image, decoded_image) # this model maps an input to its reconstruction

然后，我们构造解码器模型，如下所示：

encoded_input = Input(shape=(encoding_dim,))# create a placeholder for an encoded (32-dimensional) input

decoder_layer = autoencoder.layers[-1]# retrieve the last layer of the autoencoder model

decoder = Model(encoded_input, decoder_layer(encoded_input))# create the decoder model

接下来，我们可以编译我们的自编码器。由于数据几乎是二元的，因此选择了binary_crossentropy损失，因此，我们可以最小化每个像素的二元交叉熵：

autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')

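We can define two useful callbacks. The first, ModelCheckpoint, saves the model after every epoch. If save_best_only=True, the latest best model according to the monitored quantity (the validation loss) will not be overwritten.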

其签名如下：

keras.callbacks.ModelCheckpoint(filepath, monitor='val_loss', verbose=0, save_best_only=False, save_weights_only=False, mode='auto', period=1)

我们声明如下：

checkpointer1 = ModelCheckpoint(filepath= 'model.weights.best.hdf5' , verbose =2, save_best_only = True)

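The second callback, EarlyStopping, stops training when the change in the monitored quantity (the validation loss) is less than min_delta; that is, a change smaller than min_delta does not count as an improvement. This must persist for patience epochs before training is stopped. Its signature is as follows: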

EarlyStopping(monitor='val_loss', min_delta=0, patience=0, verbose=0, mode='auto', baseline=None)

我们声明如下：

checkpointer2 = EarlyStopping(monitor='val_loss', min_delta=0.0005, patience=2, verbose=2, mode='auto')
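The interplay of min_delta and patience can be sketched in plain Python. This is only an illustration of the rule described above for a quantity that should decrease; the function name early_stopping_epoch and the loss values are hypothetical and are not part of Keras.

```python
def early_stopping_epoch(val_losses, min_delta=0.0, patience=0):
    """Return the 0-based epoch at which training would stop, or None.

    A loss counts as an improvement only if it beats the best value
    seen so far by more than min_delta.
    """
    best = float("inf")
    wait = 0  # number of epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Hypothetical validation losses, chosen only for illustration.
losses = [0.50, 0.40, 0.3998, 0.3997, 0.30]
stop_epoch = early_stopping_epoch(losses, min_delta=0.0005, patience=2)
print(stop_epoch)
```

With these values, the two tiny improvements after the second epoch are smaller than min_delta, so they use up the patience budget and the final loss of 0.30 is never reached.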

训练

训练运行使用.fit方法，其签名如下：

autoencoder.fit(x=None, y=None, batch_size=None, epochs=1, verbose=1, callbacks=None, validation_split=0.0, validation_data=None, shuffle=True, class_weight=None, sample_weight=None, initial_epoch=0, steps_per_epoch=None, validation_steps=None, max_queue_size=10, workers=1, use_multiprocessing=False, **kwargs)

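A vanilla training run looks as follows. Note how we pass x_train as both x and y, because we want to take the input x and try to reproduce it at the output (y=x). Note the following code: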

epochs = 50
autoencoder.fit(x_train, x_train, epochs=epochs, batch_size=256, verbose=2, shuffle=True, validation_data=(x_test, x_test))

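This is followed by some code that compresses and decompresses (encodes and decodes) the test data. Remember that encoder and decoder are both models, so we can call the predict method on them to generate their outputs: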

encoded_images = encoder.predict(x_test) #compress
decoded_images = decoder.predict(encoded_images) #decompress

我们还可以使用ModelCheckpoint检查点，在这种情况下，我们的.fit调用如下：

epochs = 50
autoencoder.fit(x_train, x_train, epochs=epochs, batch_size=256, verbose=2, callbacks=[checkpointer1], shuffle=True, validation_data=(x_test, x_test))

我们还需要按如下方式加载保存的权重，以获取最佳模型：

autoencoder.load_weights('model.weights.best.hdf5' )
encoded_images = encoder.predict(x_test)
decoded_images = decoder.predict(encoded_images)

以类似的方式，我们可以使用EarlyStopping，在这种情况下，.fit调用如下：

epochs = 50
autoencoder.fit(x_train, x_train, epochs=epochs, batch_size=256, verbose=2, callbacks=[checkpointer2], shuffle=True, validation_data=(x_test, x_test))

显示结果

Next is some code that displays a few items on the screen before and after compression. We use the following call:

plt.subplot(nrows, ncols, index, **kwargs)

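This places a subplot at position index on a grid with nrows rows and ncols columns; index starts at 1 in the upper-left corner and increases to the right. We use it to position the fashion items: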

number_of_items = 12 # how many items we will display
plt.figure(figsize=(20, 4))
for i in range(number_of_items):
    # display items before compression 
    graph = plt.subplot(2, number_of_items, i + 1)
    plt.imshow(x_test[i].reshape(28, 28))
    plt.gray()
    graph.get_xaxis().set_visible(False)
    graph.get_yaxis().set_visible(False)

    # display items after decompression
    graph = plt.subplot(2, number_of_items, i + 1 + number_of_items)
    plt.imshow(decoded_images[i].reshape(28, 28))
    plt.gray()
    graph.get_xaxis().set_visible(False)
    graph.get_yaxis().set_visible(False)
plt.show()
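The way the 1-based subplot index maps onto the two-row grid can be checked with a small pure-Python sketch. The helper grid_position is hypothetical and is not a matplotlib function; it only mirrors the numbering rule described above.

```python
def grid_position(index, ncols):
    """Map a 1-based subplot index to a 0-based (row, column) pair."""
    return (index - 1) // ncols, (index - 1) % ncols


number_of_items = 12
for i in (0, 11):
    before = grid_position(i + 1, number_of_items)  # original image
    after = grid_position(i + 1 + number_of_items, number_of_items)  # reconstruction
    print(i, before, after)
```

Each reconstructed item therefore lands directly below its original, in the second row of the grid.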

压缩前的结果如下：

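After decompression, the results are as follows:

The lossy nature of the compression/decompression is therefore clear. As a possible sanity check, if we use encoding_dim = 784 (the same number of hidden-layer nodes as input values), we get the following result:

This is probably only slightly different from the original. Next, we will look at an application of autoencoding.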

自编码器应用–去噪

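A good application of autoencoders is denoising: the process of removing small random artifacts (noise) from an image. We will replace our simple one-layer autoencoder with a multi-layer convolutional one.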

我们将人造噪声添加到我们的时装中，然后将其消除。我们还将借此机会研究使用 TensorBoard 来检查一些网络训练指标。

构建模型

我们最初的导入包括我们的卷积网络的导入。

注意，我们不必显式地使用 Keras，因为它是 TensorFlow 本身的模块，如以下代码所示：

from tensorflow.keras.layers import Input, Dense, Conv2D, MaxPooling2D, UpSampling2D
from tensorflow.keras.models import Model
from tensorflow.keras.datasets import fashion_mnist
from tensorflow.keras.callbacks import TensorBoard
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

预处理数据

首先，加载图像数据；我们不需要标签，因为我们只关注图像本身：

(train_x, _), (test_x, _) = fashion_mnist.load_data()

接下来，像以前一样，将图像数据点转换为零至一范围内的float32值：

train_x = train_x.astype('float32') / 255.
test_x = test_x.astype('float32') / 255.

检查形状，如以下代码所示：

print(train_x.shape)
print(test_x.shape)

它给出以下结果：

(60000, 28, 28)
(10000, 28, 28)

输入卷积层需要以下形状：

train_x = np.reshape(train_x, (len(train_x), 28, 28, 1)) 
test_x = np.reshape(test_x, (len(test_x), 28, 28, 1))

Here, the 1 in the shape is for the grayscale channel; the following is a sanity check of the shapes:

print(train_x.shape)
print(test_x.shape)

得到以下结果：

(60000, 28, 28, 1)
(10000, 28, 28, 1)

为了在图像中引入一些随机噪声，我们在训练和测试集中添加了np.random.normal（即高斯）值数组。所需的签名如下：

numpy.random.normal(loc=0.0, scale=1.0, size=None)

在这里，loc是分布的中心，scale是标准差，size是输出形状。因此，我们使用以下代码：

noise = 0.5
train_x_noisy = train_x + noise * np.random.normal(loc=0.0, scale=1.0, size=train_x.shape) 
test_x_noisy = test_x + noise * np.random.normal(loc=0.0, scale=1.0, size=test_x.shape)

由于这可能会使我们的值超出零至一的范围，因此我们将值裁剪到该范围：

train_x_noisy = np.clip(train_x_noisy, 0., 1.)
test_x_noisy = np.clip(test_x_noisy, 0., 1.)
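The effect of adding Gaussian noise and then clipping can be tried on a small stand-in array without loading the dataset. This is a sketch only: the random array and the fixed seed are assumptions made for illustration, not part of the original program.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, assumed here for reproducibility
images = rng.random((4, 28, 28, 1)).astype("float32")  # stand-in for train_x

noise = 0.5
noisy = images + noise * rng.normal(loc=0.0, scale=1.0, size=images.shape)
print(noisy.min(), noisy.max())  # values may now fall outside [0, 1]

clipped = np.clip(noisy, 0.0, 1.0)
print(clipped.min(), clipped.max())  # back inside [0, 1]
```

Clipping keeps the noisy inputs in the same range as the clean targets, which matters because the final layer of the autoencoder uses a sigmoid activation.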

噪声图像

下面的代码从测试集中打印出一些嘈杂的图像。注意如何调整图像的显示形状：

plt.figure(figsize=(20, 2))
for i in range(number_of_items):
    display = plt.subplot(1, number_of_items,i+1)
    plt.imshow(test_x_noisy[i].reshape(28, 28))
    plt.gray()
    display.get_xaxis().set_visible(False)
    display.get_yaxis().set_visible(False)
plt.show()

这是结果，如以下屏幕快照所示：

因此很明显，原始图像与噪点几乎没有区别。

创建编码层

接下来，我们创建编码和解码层。我们将使用 Keras 函数式 API 风格来设置模型。我们从一个占位符开始，以（下一个）卷积层所需的格式输入：

input_image = Input(shape=(28, 28, 1))

接下来，我们有一个卷积层。回忆卷积层的签名：

Conv2D(filters, kernel_size, strides=(1, 1), padding='valid', data_format=None, dilation_rate=(1, 1), activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None, **kwargs)

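We will mostly use the defaults; what follows is our first Conv2D. Note the kernel size of (3,3); this is the size of the sliding window that Keras applies to the input image. Recall also that padding='same' means the image is padded with zeros around its border so that the output of the convolution has the same height and width as its input; this is the size obtained when the kernel (filter) starts with its center cell on the first pixel of the image. The default stride of (1, 1) means the sliding window moves one pixel at a time horizontally from the left of the image to the right, then down one pixel, and so on. Next, we will look at the shape of each layer, as follows: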

im = Conv2D(filters=32, kernel_size=(3, 3), activation='relu', padding='same')(input_image)
print(im.shape)

得到以下结果：

(?, 28, 28, 32)

?代表输入项目的数量。
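The output sizes reported for these layers follow from simple arithmetic, which can be sketched as below. The helper conv_output_size is hypothetical (it is not a Keras function); it applies the usual size rules for 'same' and 'valid' padding along one spatial dimension.

```python
import math


def conv_output_size(n, kernel, stride=1, padding="valid"):
    """Output length along one spatial dimension of a convolution or pooling layer."""
    if padding == "same":
        return math.ceil(n / stride)
    if padding == "valid":
        return math.ceil((n - kernel + 1) / stride)
    raise ValueError("padding must be 'same' or 'valid'")


print(conv_output_size(28, 3, stride=1, padding="same"))   # 3x3 convolution, zero-padded
print(conv_output_size(28, 3, stride=1, padding="valid"))  # 3x3 convolution, no padding
print(conv_output_size(28, 2, stride=2, padding="same"))   # 2x2 pooling with stride 2
```

With padding='same' and a stride of 1, the 28 x 28 input keeps its height and width, which is why only the last dimension (the number of filters) changes in the shape printed above.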

接下来，我们有一个MaxPooling2D层。回想一下，在此情况下，此操作将在图像上移动(2, 2)大小的滑动窗口，并采用在每个窗口中找到的最大值。其签名如下：

MaxPooling2D(pool_size=(2, 2), strides=None, padding='valid', data_format=None, **kwargs)

这是下采样的示例，因为生成的图像尺寸减小了。我们将使用以下代码：

im = MaxPooling2D((2, 2), padding='same')(im)
print(im.shape)

得到以下结果：

(?, 14, 14, 32)
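What a (2, 2) max pooling window does to a single channel can be shown with NumPy alone. This is a minimal sketch that assumes even height and width and non-overlapping windows; max_pool_2x2 is a hypothetical helper, not the Keras layer.

```python
import numpy as np


def max_pool_2x2(image):
    """Take the maximum of each non-overlapping 2x2 window of a 2-D array."""
    h, w = image.shape
    blocks = image.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))


image = np.arange(16).reshape(4, 4)
pooled = max_pool_2x2(image)
print(image)
print(pooled)
```

Each output value summarizes four input pixels, so both height and width are halved, just as 28 x 28 becomes 14 x 14 in the layer above.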

其余的编码层如下：

im = Conv2D(32, (3, 3), activation='relu', padding='same')(im)
print(im.shape)
encoded = MaxPooling2D((2, 2), padding='same')(im)
print(encoded.shape)

所有这些都结束了编码。

创建解码层

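To decode, we reverse the process and use the upsampling layer UpSampling2D in place of the max pooling layer. An upsampling layer repeats the rows and columns of the data size[0] and size[1] times, respectively.

In this case, therefore, it undoes the size reduction of the max pooling layer, although fine-grained detail is lost. Its signature is as follows: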

 UpSampling2D(size=(2, 2), data_format=None, **kwargs)

我们使用以下内容：

im = UpSampling2D((2, 2))(im)
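The row and column repetition performed by upsampling can be imitated with np.repeat. This is a sketch on a tiny 2-D array; the helper upsample is hypothetical and only mimics the behavior described above for a single channel.

```python
import numpy as np


def upsample(image, size=(2, 2)):
    """Repeat rows size[0] times and columns size[1] times."""
    rows_repeated = np.repeat(image, size[0], axis=0)
    return np.repeat(rows_repeated, size[1], axis=1)


small = np.array([[1, 2],
                  [3, 4]])
large = upsample(small)
print(large)
```

The array doubles in height and width, but no new information is created: every 2 x 2 block simply holds one repeated value, which is why the convolutional layers that follow are needed to restore detail.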

以下是解码层：

im = Conv2D(32, (3, 3), activation='relu', padding='same')(encoded)
print(im.shape)
im = UpSampling2D((2, 2))(im)
print(im.shape)
im = Conv2D(32, (3, 3), activation='relu', padding='same')(im)
print(im.shape)
im = UpSampling2D((2, 2))(im)
print(im.shape)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(im)
print(decoded.shape)

得到以下结果：

(?, 7, 7, 32)
(?, 14, 14, 32)
(?, 14, 14, 32)
(?, 28, 28, 32)
(?, 28, 28, 1)

因此，您可以看到解码层如何逆转编码层的过程。

模型摘要

这是我们模型的摘要：

看看我们如何得出参数数字很有启发性。

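The formula is: number of parameters = number of filters x kernel size x depth of the previous layer + number of filters (for the biases):

input_1: this is a placeholder and has no trainable parameters
conv2d: number of filters = 32, kernel size = 3 * 3 = 9, depth of the previous layer = 1, so 32 * 9 * 1 + 32 = 320
max_pooling2d: a max pooling layer has no trainable parameters
conv2d_1: number of filters = 32, kernel size = 3 * 3 = 9, depth of the previous layer = 32, so 32 * 9 * 32 + 32 = 9,248
conv2d_2, conv2d_3: the same as conv2d_1
conv2d_4: 1 * 9 * 32 + 1 = 289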
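The parameter counts listed above can be reproduced with a few lines of Python. The helper conv2d_params is hypothetical; it simply encodes the formula given in the text.

```python
def conv2d_params(filters, kernel_height, kernel_width, previous_depth):
    """Weights (filters x kernel size x input depth) plus one bias per filter."""
    return filters * kernel_height * kernel_width * previous_depth + filters


layer_params = {
    "conv2d": conv2d_params(32, 3, 3, 1),
    "conv2d_1": conv2d_params(32, 3, 3, 32),
    "conv2d_2": conv2d_params(32, 3, 3, 32),
    "conv2d_3": conv2d_params(32, 3, 3, 32),
    "conv2d_4": conv2d_params(1, 3, 3, 32),
}
total = sum(layer_params.values())
for name, count in layer_params.items():
    print(name, count)
print("total", total)
```

The pooling and upsampling layers contribute nothing to the total, since they have no trainable parameters.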

模型实例化，编译和训练

接下来，我们用输入层和输出层实例化模型，然后使用.compile方法设置模型以进行训练：

autoencoder = Model(inputs=input_image, outputs=decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')

现在，我们准备训练模型以尝试恢复时尚商品的图像。请注意，我们已经为 TensorBoard 提供了回调，因此我们可以看一下一些训练指标。 Keras TensorBoard 签名如下：

keras.callbacks.TensorBoard(log_dir='./logs', histogram_freq=0, batch_size=32, write_graph=True, write_grads=False, write_images=False, embeddings_freq=0, embeddings_layer_names=None, embeddings_metadata=None, embeddings_data=None, update_freq='epoch')

我们将主要使用默认值，如下所示：

tb = [TensorBoard(log_dir='./tmp/tb', write_graph=True)]

接下来，我们使用.fit()方法训练自编码器。以下代码是其签名：

fit(x=None, y=None, batch_size=None, epochs=1, verbose=1, callbacks=None, validation_split=0.0, validation_data=None, shuffle=True, class_weight=None, sample_weight=None, initial_epoch=0, steps_per_epoch=None, validation_steps=None, validation_freq=1)

Note how we use train_x_noisy for the features (input) and train_x for the labels (output):

epochs=100
batch_size=128

autoencoder.fit(train_x_noisy, train_x, epochs=epochs,batch_size=batch_size, shuffle=True, validation_data=(test_x_noisy, test_x), callbacks=tb)

去噪图像

现在，通过解码以下第一行中的所有测试集，然后循环遍历一个固定数字（number_of_items）并显示它们，来对测试集中的一些噪点图像进行去噪。请注意，在显示每个图像（im）之前，需要对其进行重塑：

decoded_images = autoencoder.predict(test_x_noisy)
number_of_items = 10
plt.figure(figsize=(20, 2))
for item in range(number_of_items):
    display = plt.subplot(1, number_of_items,item+1)
    im = decoded_images[item].reshape(28, 28)
    plt.imshow(im, cmap="gray")
    display.get_xaxis().set_visible(False)
    display.get_yaxis().set_visible(False)
plt.show()

我们得到以下结果：

考虑到图像最初模糊的程度，降噪器已经做了合理的尝试来恢复图像。

TensorBoard 输出

要查看 TensorBoard 输出，请在命令行上使用以下命令：

tensorboard  --logdir=./tmp/tb

然后，您需要将浏览器指向http://localhost:6006。

下图显示了作为训练和验证时间的函数（x轴）的损失（y轴）：

下图显示了训练损失：

验证损失如下图所示：

到此结束我们对自编码器的研究。

总结

在本章中，我们研究了自编码器在无监督学习中的两种应用：首先用于压缩数据，其次用于降噪，这意味着从图像中去除噪声。

在下一章中，我们将研究如何在图像处理和识别中使用神经网络。

六、使用 TensorFlow 2 识别图像

This chapter is in two parts, both of which deal with image classification using TensorFlow.

在本章中，我们将涵盖以下主要主题：

QuickDraw – 使用 TensorFlow 进行图像分类
使用 TensorFlow 的 CIFAR 10 图像分类

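In the first part, we will develop a TensorFlow 2 model for image recognition using the techniques learned in earlier chapters, especially Chapter 2, "Keras, a High-Level API for TensorFlow 2". This will let us see how all the relevant techniques come together in TensorFlow 2 to create, train, and evaluate a complete model. To help with this, we will use the QuickDraw image dataset provided by Google.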

QuickDraw – 使用 TensorFlow 进行图像分类

我们将使用从 Google QuickDraw 拍摄的图像数据集。这是一个公开的开放源代码，它包含 345 个类别的 5000 万张图像的数据集，所有这些图像都是由参与挑战的 1500 万名用户在 20 秒或更短的时间内绘制的。我们将训练 10 个类别的 10,000 张图像，其中一些被选择为相似图像，以便我们可以测试模型的区分能力。您可以在这个页面上查看这些图像的示例。这些图片有多种格式，请参见这个页面中的所有格式。

在这里，我们将使用已存储为.npy文件的图像。 .npy文件的公共数据集托管在这个页面上。从这里可以一次下载一组。要使用不同的图像运行此示例，请从数据目录中删除图像文件，然后将所需的图像下载到存储库中的同一目录中。该程序从文件名中读取标签。

在本节中，我们将涵盖以下主题：

采集数据
预处理数据
建立模型
训练和测试模型
保存，加载和重新测试模型
使用.h5格式保存和加载 NumPy 图像数据
加载预训练的模型
使用预训练的模型

We will develop and present the code in snippets, step by step. These snippets are bolted together into a complete program in the repository.

采集数据

我们将需要从 Google 下载数据。您可以将数据下载到一个空目录data_files。

转到这里并将 10 个数据集下载到data_files文件夹中。以下是将要下载的文件的示例：

'alarm_clock.npy', 'broom.npy', 'ant.npy', 'bee.npy', 'cell_phone.npy', 'baseball.npy', 'dolphin.npy', 'crocodile.npy', 'aircraft_carrier.npy', 'asparagus.npy'

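The files you download will have some extra text at the start of their names, for example, full_numpy_bitmap_alarm clock.npy.

To keep things tidy, remove the leading text and rename the files so that, in our example, the filename becomes alarm_clock.npy. Do this for all 10 files.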

建立环境

首先，我们需要导入依赖项：

import tensorflow as tf
import keras
import numpy as np
from sklearn.model_selection import train_test_split
from os import walk

You may need to run pip install scikit-learn. Next, we will set up some constants for later use:

batch_size = 128
img_rows, img_cols = 28, 28 # image dimensions

接下来，我们将使用os.walk方法从data_files文件夹中收集数据集的文件名：

请注意，文件名收集在列表变量filenames中。

data_path = "data_files/" 
for (dirpath, dirnames, filenames) in walk(data_path):
     pass # filenames accumulate in list 'filenames'
print(filenames)

对于我们的示例，文件名（对应于label类别）如下：

['alarm_clock.npy', 'broom.npy', 'ant.npy', 'bee.npy', 'cell_phone.npy', 'baseball.npy', 'dolphin.npy', 'crocodile.npy', 'aircraft_carrier.npy', 'asparagus.npy']

To run the example with different images, simply download 10 different files into the data_files folder.

接下来，我们将定义模型所需的更多值。图像总数（num_images）可在此处更改：

num_images = 1000000 ### was 100000, reduce this number if memory issues.
num_files = len(filenames) # we have 10 files
images_per_category = num_images//num_files
seed = np.random.randint(1, 10e7)
i=0
print(images_per_category)

预处理数据

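Next comes the code that loads the images into memory. We loop over the files and, after building the file path, load that file, which is one set of images (x). We then convert x to floating point and divide by 255 to put it in the range 0 to 1. After that, we create a numeric label, y, for that set of images x. This is 0 for the first set of images, 1 for the next, and so on up to 9 for the last set, controlled by the variable i. We then slice x and y so that only the first images_per_category images and labels are kept. After that, we accumulate x and y into x_all and y_all: if this is the first time through the loop (that is, i=0), x_all and y_all are created from x and y; otherwise (that is, i>0), x and y are concatenated onto x_all and y_all. When the loop terminates, x_all and y_all contain the images and their labels, respectively: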

i=0
for file in filenames:
     file_path = data_path + file
     x = np.load(file_path)
     x = x.astype('float32') ##normalize images
     x /= 255.0
     y = [i] * len(x) # create numeric label for this image

     x = x[:images_per_category] # get the sample of images 
     y = y[:images_per_category] # get the sample of labels 

     if i == 0: 
         x_all = x
         y_all = y
     else: 
         x_all = np.concatenate((x,x_all), axis=0)
         y_all = np.concatenate((y,y_all), axis=0)
     i += 1

之后，我们将使用sklearn.model_selection模块中的train_test_split方法将x_all和y_all分为训练和测试集，并以 80/20 的训练/测试进行分割：

#split data arrays into train and test segments
x_train, x_test, y_train, y_test = train_test_split(x_all, y_all, test_size=0.2, random_state=42)

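Since we will use a convolutional neural network (convNet) to classify the Quick, Draw! images, the next thing to do is reshape x_train and x_test into the 28 x 28 x 1 images they started out as, where the first two dimensions are the height and width of the image in pixels and the third is the grayscale value of each pixel. We will also set up input_shape, which is used in the first layer of the convNet: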

x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1) 
x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1) 
input_shape = (img_rows, img_cols, 1)

此后，我们将根据convNet的要求对y_train和y_test标签进行一次热编码：

y_train = tf.keras.utils.to_categorical(y_train, num_files) 
y_test = tf.keras.utils.to_categorical(y_test, num_files)
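The effect of one-hot encoding can be seen with a NumPy sketch. The helper one_hot is hypothetical and is shown only to illustrate the shape of the result; the program itself uses tf.keras.utils.to_categorical.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Turn integer class labels into rows with a single 1 at the class index."""
    return np.eye(num_classes, dtype="float32")[np.asarray(labels)]


y = one_hot([0, 2, 9], num_classes=10)
print(y.shape)
print(y)
print(np.argmax(y, axis=1))  # recovers the integer labels
```

Taking np.argmax along the last axis reverses the encoding, which is how predicted and actual labels are compared later in this section.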

Next, we split the training set further, 90/10, into a smaller training set and a validation set:

x_train, x_valid, y_train, y_valid = train_test_split(x_train, y_train, test_size=0.1, random_state=42)
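The sizes that result from the two successive splits can be worked out with integer arithmetic. This sketch assumes that every file supplies the full images_per_category images, so that num_images images are loaded in total; split_sizes is a hypothetical helper, and scikit-learn's own rounding may differ by one sample when the sizes do not divide evenly.

```python
def split_sizes(n, holdout_percent):
    """Return (kept, held_out) sizes for a percentage split of n samples."""
    held_out = n * holdout_percent // 100
    return n - held_out, held_out


num_images = 1_000_000  # assumes all ten files contain enough images
train_and_valid, test = split_sizes(num_images, 20)  # first split: 80/20
train, valid = split_sizes(train_and_valid, 10)      # second split: 90/10
print(train, valid, test)
```

So the model is fitted on 72% of the images, validated on 8% during training, and evaluated on the remaining 20%.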

建立模型

现在，我们准备创建convNet模型。

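There are two convolutional layers (with ReLU activation), each followed by a max pooling layer and a dropout layer, then a layer that flattens the output of the convolutional layers to one dimension. These are followed by a dense (fully connected) one-dimensional layer (again with ReLU activation), a final dropout layer, and, lastly, a softmax layer with 10 units. The activation of each output unit of the softmax layer gives the probability that the image belongs to the corresponding one of the 10 categories. There is plenty of room for experimentation with this ANN architecture.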
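How a softmax layer turns raw scores into class probabilities can be sketched in NumPy. The input values here are hypothetical and the vector is shortened to three classes for readability.

```python
import numpy as np


def softmax(logits):
    """Exponentiate and normalize so that the outputs sum to 1."""
    shifted = logits - np.max(logits)  # subtracting the maximum avoids overflow
    exp = np.exp(shifted)
    return exp / exp.sum()


logits = np.array([1.0, 3.0, 0.5])  # hypothetical pre-activation values
probs = softmax(logits)
print(probs, probs.sum())
print(np.argmax(probs))  # index of the predicted class
```

The predicted category is simply the index of the largest probability, which is what np.argmax is used for in the testing code later in this section.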

然后使用分类交叉熵的损失来编译模型：

model = tf.keras.Sequential()

model.add(tf.keras.layers.Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape)) 
model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2))) 
model.add(tf.keras.layers.Dropout(0.25))

model.add(tf.keras.layers.Conv2D(64, (3, 3), activation='relu')) 
model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2))) 
model.add(tf.keras.layers.Dropout(0.25))

model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(128, activation='relu')) 
model.add(tf.keras.layers.Dropout(0.5))
model.add(tf.keras.layers.Dense(num_files, activation='softmax')) 

print("Compiling...........")
model.compile(loss=tf.keras.losses.categorical_crossentropy,
 optimizer=tf.keras.optimizers.Adadelta(),
 metrics=['accuracy'])

训练和测试模型

现在，我们可以使用fit方法训练模型。注意验证集的使用，它不同于训练集。 callbacks列表还可以用于诸如保存最佳模型或在学习停止时终止训练（如果在所有周期完成之前发生这种情况）的操作。有关详细信息，请参见这里：

epochs=25
callbacks=[tf.keras.callbacks.TensorBoard(log_dir = "./tb_log_dir")]
model.fit( x_train, y_train,
 batch_size=batch_size,
 epochs=epochs,
 callbacks=callbacks,
 verbose=1,
 validation_data=(x_valid, y_valid)
)

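Depending on your hardware configuration, training will be very fast if the model runs on a GPU, or slow if it runs on a CPU. For the purposes of illustration, the number of epochs can be reduced. On an NVIDIA GTX 1080 GPU, the time per epoch is about 38 seconds.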

为了确定模型的准确率，按以下方法使用evaluate方法。请注意，测试集用于此评估：

score = model.evaluate(x_test, y_test, verbose=1)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

我们还可以对测试图像进行随机采样，并使用以下代码查看模型的效果。从文件名中检索标签并打印以供参考，然后再打印成对的预测标签与实际标签：

import os
labels = [os.path.splitext(file)[0] for file in filenames]
print(labels)
print("\nFor each pair in the following, the first label is predicted, second is actual\n")
for i in range(20):
  t = np.random.randint(len(x_test) )
  x1= x_test[t]
  x1 = x1.reshape(1,28,28,1) 
  p = model.predict(x1)
  print("-------------------------")
  print(labels[np.argmax(p)])
  print(labels[np.argmax(y_test[t])])
  print("-------------------------")

TensorBoard 回调

TensorBoard 是用于训练模型的可视化工具。 TensorBoard 回调的完整签名如下：

tf.keras.callbacks.TensorBoard(log_dir='./logs', histogram_freq=0, batch_size=32, write_graph=True, write_grads=False, write_images=False, embeddings_freq=0, embeddings_layer_names=None, embeddings_metadata=None, embeddings_data=None, update_freq='epoch')

在这个页面上有所有这些参数的非常清晰而详细的描述。 TensorBoard 可以从命令行调用，如下所示：

tensorboard --logdir=/full_path_to_your_logs

例如，我们可以使用tensorboard --logdir=./logs作为默认目录。将histogram_freq设置为非 0 的值会导致在写入数据时epochs之间有明显的停顿，并且仅在需要模型所有层的激活和权重直方图时才应使用。

保存，加载和重新测试模型

现在，我们可以保存模型并将其删除：

model.save("./QDrawModel.h5")
del model

然后，我们需要重新加载它：

from tensorflow.keras.models import load_model
model = load_model('./QDrawModel.h5')

Then, we print its summary to show that the saved model has been reloaded successfully:

model.summary()

Finally, we print a test sample of 20 QuickDraw images to make sure the network is working properly:

print("For each pair, first is predicted, second is actual")
for i in range(20):
  t = np.random.randint(len(x_test))
  x1= x_test[t]
  x1 = x1.reshape(1,28,28,1) 
  p = model.predict(x1)
  print("-------------------------")
  print(labels[np.argmax(p)])
  print(labels[np.argmax(y_test[t])])
  print("-------------------------")

使用`.h5`格式保存和加载 NumPy 图像数据

如果需要保存先前程序中的训练和测试数据，则可以使用以下代码：

import h5py
with h5py.File('x_train.h5', 'w') as hf:
  hf.create_dataset("QuickDraw", data=x_train)
with h5py.File('y_train.h5', 'w') as hf:
  hf.create_dataset("QuickDraw", data=y_train)
with h5py.File('x_test.h5', 'w') as hf:
  hf.create_dataset("QuickDraw", data=x_test)
with h5py.File('y_test.h5', 'w') as hf:
  hf.create_dataset("QuickDraw", data=y_test)

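Note that the dataset name used to index the opened file when loading the data must be the same as the name used when the dataset was saved with the h5py.File.create_dataset() method: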

import h5py
hf = h5py.File('x_train.h5', 'r')
x_train = np.array(hf["QuickDraw"][:])
hf = h5py.File('x_test.h5', 'r')
x_test = np.array(hf["QuickDraw"][:])
hf = h5py.File('y_train.h5', 'r')
y_train = np.array(hf["QuickDraw"][:])
hf = h5py.File('y_test.h5', 'r')
y_test = np.array(hf["QuickDraw"][:])

使用预训练的模型进行加载和推断

经过训练的模型'QDrawModel.h5'已运行 25 个周期，并且达到了 90% 以上的测试准确率，已保存在存储库中。您已经看过此代码；为方便起见，在此复制。

因此，重申一下，您可以使用以下代码加载此经过训练的模型：

from tensorflow.keras.models import load_model
model = load_model('./QDrawModel.h5')
model.summary()

同样，可以使用以下代码加载训练/测试数据：

import h5py
import numpy as np
hf = h5py.File('x_train.h5', 'r')
x_train = np.array(hf["QuickDraw"][:])
hf = h5py.File('x_test.h5', 'r')
x_test = np.array(hf["QuickDraw"][:])
hf = h5py.File('y_train.h5', 'r')
y_train = np.array(hf["QuickDraw"][:])
hf = h5py.File('y_test.h5', 'r')
y_test = np.array(hf["QuickDraw"][:])

再次重申，我们可以使用以下代码获取标签（我们看到的标签对应于图像文件名）：

from os import walk
import os
data_path = "data_files/" # folder for image files
for (dirpath, dirnames, filenames) in walk(data_path):
  pass # filenames accumulate in list 'filenames'
labels = [os.path.splitext(file)[0] for file in filenames]
print(labels)

然后，可以通过以下代码使用我们加载的模型进行推理。请注意，如果有必要，这还将演示如何强制在 CPU 上进行计算：

import tensorflow as tf
with tf.device('/cpu:0'):
     for i in range(10):
         t = np.random.randint(len(x_test) )
         x1= x_test[t]
         x1 = x1.reshape(1,28,28,1) 
         p = model.predict(x1)
         y1 = y_test[t]
         print("-------------------------")
         print(labels[np.argmax(p)])
         print(labels[np.argmax(y1)]) # y_test is one-hot encoded, so recover the class index first
         print("-------------------------")

使用 TensorFlow 的 CIFAR 10 图像分类

在第二部分中，我们将研究训练模型以识别 CIFAR10 图像数据集中的图像。这将使我们有机会举例说明顺序模型创建的稍有不同的风格。

介绍

具有 10 个类别的 CIFAR 10 图像数据集是 8000 万个微型图像数据集的标记子集。这些图像由 Alex Krizhevsky，Vinod Nair 和 Geoffrey Hinton 收集。有关此数据集的完整详细信息，请访问这里。

在 10 个类别中，总共有 60,000 个32 x 32彩色图像，包括 50,000 个训练图像和 10,000 个测试图像。

类别如下：

labels = ['airplane','automobile','bird','cat','deer','dog','frog','horse','ship','truck']

以下是这些类别的图像的一些示例：

应用

首先，以下是设置所需的导入：

import tensorflow as tf
import numpy as np
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation, Flatten
from tensorflow.keras.layers import Conv2D, MaxPooling2D,BatchNormalization
from tensorflow.keras import regularizers
from tensorflow.keras.models import load_model
import os
from matplotlib import pyplot as plt
from PIL import Image

You may need to run pip install pillow.

接下来，我们将在其余的代码中使用一组值：

batch_size = 32
number_of_classes = 10
epochs = 100 # number of training epochs (~30 secs/epoch on CPU); use a smaller value for a quick test
weight_decay = 1e-4
save_dir = os.path.join(os.getcwd(), 'saved_models')
model_name = 'keras_cifar10_trained_model.h5'
number_of_images = 5

然后，我们可以加载并查看数据的形状：

(x_train, y_train), (x_test, y_test) = cifar10.load_data()
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

这将产生预期的输出：

x_train shape: (50000, 32, 32, 3)
50000 train samples
10000 test samples

现在，我们有了一个显示图像子集的函数。这将在网格中显示它们：

def show_images(images):
    plt.figure(1)
    image_index = 0
    for i in range(0,number_of_images):
        for j in range(0,number_of_images):
            plt.subplot2grid((number_of_images, number_of_images),(i,j))
            plt.imshow(Image.fromarray(images[image_index]))
            image_index +=1
            plt.gca().axes.get_yaxis().set_visible(False)
            plt.gca().axes.get_xaxis().set_visible(False) 
    plt.show()

让我们执行以下函数的调用：

show_images(x_test[:number_of_images*number_of_images])

这给我们以下输出：

请注意，图像在原始数据集中故意很小。

Now, we can cast the images to floating point and change their range to [0, 1]:

x_train = x_train.astype('float32')/255
x_test = x_test.astype('float32')/255

Labels are best learned when they are supplied as one-hot vectors, so that is what we do now:

y_train = tf.keras.utils.to_categorical(y_train, number_of_classes) # or use tf.one_hot()
y_test = tf.keras.utils.to_categorical(y_test, number_of_classes)

接下来，我们可以指定模型的架构。请注意，与之前的操作相比，我们使用的激活指定方法略有不同：

model.add(Activation('elu'))

elu激活函数代表指数线性单元。在这个页面中有很好的描述。

注意，我们正在使用具有卷积层，BatchNormalization和 MaxPooling 层的顺序模型。倒数第二层使结构变平，最后一层使用 softmax 激活，因此我们预测的类将显示为具有最高激活的输出神经元：

model = Sequential()
model.add(Conv2D(32, (3,3), padding='same', kernel_regularizer=regularizers.l2(weight_decay), input_shape=x_train.shape[1:]))
model.add(Activation('elu'))
model.add(BatchNormalization())
model.add(Conv2D(32, (3,3), padding='same', kernel_regularizer=regularizers.l2(weight_decay)))
model.add(Activation('elu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.2))

model.add(Conv2D(64, (3,3), padding='same', kernel_regularizer=regularizers.l2(weight_decay)))
model.add(Activation('elu'))
model.add(BatchNormalization())
model.add(Conv2D(64, (3,3), padding='same', kernel_regularizer=regularizers.l2(weight_decay)))
model.add(Activation('elu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.3))

model.add(Conv2D(128, (3,3), padding='same', kernel_regularizer=regularizers.l2(weight_decay)))
model.add(Activation('elu'))
model.add(BatchNormalization())
model.add(Conv2D(128, (3,3), padding='same', kernel_regularizer=regularizers.l2(weight_decay)))
model.add(Activation('elu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.4))

model.add(Flatten())
model.add(Dense(number_of_classes, activation='softmax'))

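Next, we must define our optimizer, RMSprop. decay is the rate at which the learning rate is reduced after each update:

decay = 1e-6 # assumed value; the text does not define decay before using it
opt = tf.keras.optimizers.RMSprop(lr=0.0001, decay=decay)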

现在，我们将编译我们的模型：

model.compile(loss='categorical_crossentropy', optimizer=opt,metrics=['accuracy'])

为了帮助模型学习和推广，我们将实现实时数据增强。

这是通过ImageDataGenerator()函数完成的。其签名如下：

keras.preprocessing.image.ImageDataGenerator(featurewise_center=False, samplewise_center=False, featurewise_std_normalization=False, samplewise_std_normalization=False, zca_whitening=False, zca_epsilon=1e-06, rotation_range=0, width_shift_range=0.0, height_shift_range=0.0, brightness_range=None, shear_range=0.0, zoom_range=0.0, channel_shift_range=0.0, fill_mode='nearest', cval=0.0, horizontal_flip=False, vertical_flip=False, rescale=None, preprocessing_function=None, data_format=None, validation_split=0.0, dtype=None)

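However, we will mostly use the defaults shown in the preceding signature. The data is looped over in batches.

The generator applies various transformations to the images, such as horizontal flips, height shifts, width shifts, and rotations. We will use the following code: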

 # This will do preprocessing and real-time data augmentation:
datagen = ImageDataGenerator(
 rotation_range=10, # randomly rotate images in the range 0 to 10 degrees

 width_shift_range=0.1,# randomly shift images horizontally (fraction of total width)

 height_shift_range=0.1,# randomly shift images vertically (fraction of total height)

 horizontal_flip=True, # randomly flip images

 validation_split=0.1)

我们还将建立一个回调，以便如果模型的准确率停止提高，训练将停止，并且将为模型恢复最佳权重。

EarlyStopping回调的签名如下：

keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0, patience=0, verbose=0, mode='auto', baseline=None, restore_best_weights=False)

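monitor is the quantity to be tracked, min_delta is the minimum change in the tracked quantity that counts as an improvement, patience is the number of epochs with no improvement after which training is stopped, and mode is one of 'min', 'max', or 'auto', which determine, respectively, whether the tracked value should decrease, should increase, or has its direction inferred from its name. baseline is a value that the tracked quantity must reach, and restore_best_weights determines whether the model weights from the best epoch are restored (if False, the weights from the latest epoch are used).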

我们将有以下代码：

callback = tf.keras.callbacks.EarlyStopping(monitor='accuracy', min_delta=0, patience=1, verbose=1,mode='max', restore_best_weights=True)

Now we can train the model. The fit_generator() function trains the model on data presented batch by batch by the flow() generator. More details can be found on this page:

model.fit_generator(datagen.flow(x_train, y_train, batch_size=batch_size), epochs=epochs, callbacks=[callback])

让我们保存模型，以便以后可以重新加载它：

if not os.path.isdir(save_dir):
  os.makedirs(save_dir)

model_path = os.path.join(save_dir, model_name)
model.save(model_path)
print('Model saved at: %s ' % model_path)

现在让我们重新加载它：

model1 = tf.keras.models.load_model(model_path)
model1.summary()

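Finally, let's see how well our model does on the test set. We need to reload the data because it has been modified (rescaled and one-hot encoded) since it was first loaded:

(x_train, y_train), (x_test, y_test) = cifar10.load_data()
show_images(x_test[:number_of_images*number_of_images])
x_test = x_test.astype('float32')/255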

这里又是标签：

labels = ['airplane','automobile','bird','cat','deer','dog','frog','horse','ship','truck']

最后，我们可以检查预测的标签：

indices = tf.argmax(input=model1.predict(x_test[:number_of_images*number_of_images]),axis=1)
i = 0
print('Learned \t True')
print('=====================')
for index in indices:
    print(labels[index], '\t', labels[y_test[i][0]])
    i+=1

In one run, early stopping kicked in after 43 epochs, with a test accuracy of 81.4%, and the results for the first 25 test images were as follows:

Learned  True
=====================
cat      cat
ship     ship
ship     ship
ship     airplane
frog     frog
frog     frog
automobile       automobile
frog     frog
cat      cat
automobile       automobile
airplane         airplane
truck    truck
dog      dog
horse    horse
truck    truck
ship     ship
dog      dog
horse    horse
ship     ship
frog     frog
horse    horse
airplane         airplane
deer     deer
truck    truck
deer     dog

可以通过进一步调整模型架构和超参数（例如学习率）来提高此准确率。

到此结束了我们对 CIFAR 10 图像数据集的了解。

总结

本章分为两个部分。在第一部分中，我们研究了来自 Google 的数据集 QuickDraw。我们介绍了它，然后看到了如何将其加载到内存中。这很简单，因为 Google 善意地将数据集作为一组.npy文件提供，这些文件可以直接加载到 NumPy 数组中。接下来，我们将数据分为训练，验证和测试集。创建ConvNet模型后，我们对数据进行了训练并进行了测试。在测试中，经过 25 个周期，该模型的准确率刚好超过 90%，我们注意到，通过进一步调整模型，可能会改善这一精度。最后，我们看到了如何保存经过训练的模型，然后如何重新加载它并将其用于进一步的推断。

在第二部分中，我们训练了一个模型来识别 CIFAR 10 图像数据集中的图像。该数据集包含 10 类图像，是用于测试体系结构和进行超参数研究的流行数据集。我们的准确率刚刚超过 81%。

在下一章中，我们将研究神经风格迁移，其中涉及获取一个图像的内容并将第二个图像的风格强加于其上，以生成第三个混合图像。

七、TensorFlow 2 和神经风格迁移

神经风格迁移是一种使用神经网络将一幅图像的艺术风格施加到另一幅图像的内容上的技术，因此最终得到的是两种图像的混合体。您开始使用的图像称为内容图像。您在内容图像上加上风格的图像称为风格参考图像。 Google 将转换后的图像称为输入图像，这似乎令人困惑（输入是从两个不同来源获取输入的意思）；让我们将其称为混合图像。因此，混合图像是具有风格参考图像风格的内容图像。

神经风格迁移通过定义两个损失函数来工作-一个描述两个图像的内容之间的差异，另一个描述两个图像之间的风格差异。

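To start the process, the mixed image is initialized with the content image. Then, backpropagation is used to minimize the differences (also known as losses or distances) between the content of the mixed image and that of the content image, and between the style of the mixed image and that of the style reference image. This creates a new image (the mixed image) with the style of the style reference image and the content of the content image.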

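Several techniques are involved in this process: using the functional API, using a pretrained model and its feature maps, and using a custom training loop to minimize a loss function. We will meet all of these in the code that follows.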

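There are two prerequisites for getting the most out of this technique. The original 2015 paper by Gatys et al., although not essential, explains the technique very well, and an understanding of how losses are reduced by gradient descent is very much needed.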

我们将使用 VGG19 架构中的特征层（已在著名的 ImageNet 数据集上进行了训练，其中包含 1400 万张图像和 1000 个类别）。

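The code we will examine is derived from code supplied by Google; it uses eager execution, which of course we do not need to enable in code because it is the default in TensorFlow 2. The code runs much faster on a GPU, but with a little patience it can still be trained in a reasonable time on a CPU.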

在本章中，我们将介绍以下主题：

配置导入
预处理图像
查看原始图像
使用 VGG19 架构
建立模型
计算损失
执行风格迁移

配置导入

要对您自己的图像使用此实现，您需要将这些图像保存在下载的存储库的./tmp/nst目录中，然后编辑content_path和style_path路径，如以下代码所示。

与往常一样，我们要做的第一件事是导入（并配置）所需的模块：

import numpy as np
from PIL import Image
import time
import functools

import matplotlib.pyplot as plt
import matplotlib as mpl
# set things up for images display
mpl.rcParams['figure.figsize'] = (10,10)
mpl.rcParams['axes.grid'] = False

您可能需要pip install pillow，这是 PIL 的分支。接下来是 TensorFlow 模块：

import tensorflow as tf

from tensorflow.keras.preprocessing import image as kp_image
from tensorflow.keras import models
from tensorflow.keras import losses
from tensorflow.keras import layers
from tensorflow.keras import backend as K
from tensorflow.keras import optimizers

这是我们最初将使用的两个图像：

content_path = './tmp/nst/elephant.jpg'#Andrew Shiva / Wikipedia / CC BY-SA 4.0
style_path = './tmp/nst/zebra.jpg' # zebra:Yathin S Krishnappa, https://creativecommons.org/licenses/by-sa/4.0/deed.en

预处理图像

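The next function loads an image with a little preprocessing. Image.open() is what is known as a lazy operation: the function finds the file and opens it for reading, but the image data is not actually read from the file until you try to process it or load the data. The next group of three lines resizes the image so that its largest dimension in either direction is 512 (max_dimension) pixels. For example, if the image is 1,024 x 768, then scale will be 0.5 (512 / 1,024), and this is applied to both dimensions of the image, giving a resized image of 512 x 384. The Image.ANTIALIAS argument preserves the best image quality. Next, the PIL image is converted into a NumPy array with a call to img_to_array() (a method of tensorflow.keras.preprocessing).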

最后，为了与以后的使用兼容，图像需要沿零轴的批次尺寸（由于图像是彩色的，因此共给出了四个尺寸）。这可以通过调用np.expand_dims()实现：

def load_image(path_to_image):
    max_dimension = 512
    image = Image.open(path_to_image)
    longest_side = max(image.size)
    scale = max_dimension/longest_side
    image = image.resize((round(image.size[0]*scale), round(image.size[1]*scale)), Image.ANTIALIAS)

    image = kp_image.img_to_array(image) # keras preprocessing

    # Broadcast the image array so that it has a batch dimension on axis 0
    image = np.expand_dims(image, axis=0)
    return image
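The resizing arithmetic inside load_image() can be checked in isolation. The helper scaled_size below is hypothetical; it only repeats the scale calculation on a (width, height) pair and does not touch any image data.

```python
def scaled_size(size, max_dimension=512):
    """Scale a (width, height) pair so that its longest side equals max_dimension."""
    scale = max_dimension / max(size)
    return round(size[0] * scale), round(size[1] * scale)


print(scaled_size((1024, 768)))  # the example from the text
print(scaled_size((600, 300)))   # a smaller image is scaled in the same way
```

Because the same scale is applied to both sides, the aspect ratio of the image is preserved.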

下一个函数显示已由load_image()预处理过的图像。由于我们不需要额外的尺寸来显示，因此可以通过调用np.squeeze()将其删除。之后，根据对plt.imshow()的调用（后面带有可选标题）的要求，将图像数据中的值转换为无符号的 8 位整数：

def show_image(image, title=None):
    # Remove the batch dimension from the image
    image1 = np.squeeze(image, axis=0)
    # Convert to unsigned 8-bit integers for display
    image1 = image1.astype('uint8')
    plt.imshow(image1)
    if title is not None:
        plt.title(title)

查看原始图像

接下来，我们使用对前面两个函数的调用来显示内容和风格图像，请记住图像像素必须是无符号 8 位整数类型。 plt.subplot(1,2,1)函数意味着在位置 1 使用一排两列的网格； plt.subplot(1,2,2)表示在位置 2 使用一排两列的网格：

channel_means = [103.939, 116.779, 123.68] # means of the BGR channels, for VGG processing

plt.figure(figsize=(10,10))

content_image = load_image(content_path).astype('uint8')
style_image = load_image(style_path).astype('uint8')

plt.subplot(1, 2, 1)
show_image(content_image, 'Content Image')

plt.subplot(1, 2, 2)
show_image(style_image, 'Style Image')

plt.show()

输出显示在以下屏幕截图中：

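Next is a function that loads an image for the model. Because, as already mentioned, we are going to use the trained vgg19 model, we need to preprocess our image data accordingly.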

tf.keras模块为我们提供了执行此操作的方法。这里的预处理将我们的 RGB 彩色图像翻转为 BGR：

def load_and_process_image(path_to_image):
  image = load_image(path_to_image)
  image = tf.keras.applications.vgg19.preprocess_input(image)
  return image

为了显示我们的图像，我们需要一个函数来获取用load_and_process_image处理的数据，并将图像数据返回到其原始状态。这必须手动完成。

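First, we check that the image has the right number of dimensions (three, or four including a batch dimension, which is removed); otherwise, an error is raised.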

预处理从每个通道中减去其平均值，因此通道的平均值为零。减去的值来自 ImageNet 分析，其中 BGR 通道的均值分别为103.939，116.779和123.68。

因此，接下来，我们将这些值添加回 BGR（彩色）通道以恢复原始值，然后将 BGR 序列翻转回 RGB。

最后，对于此函数，我们需要确保我们的值是无符号的 8 位整数，其值在 0 到 255 之间；这可以通过np.clip()函数实现：

def deprocess_image(processed_image):
  im = processed_image.copy()
  if len(im.shape) == 4:
    im = np.squeeze(im, 0)
  assert len(im.shape) == 3, ("Input to deprocess image must be an image of "
                             "dimension [1, height, width, channel] or [height, width, channel]")
  if len(im.shape) != 3:
    raise ValueError("Invalid input to deprocessing image")

  # the inverse of the preprocessing step
  im[:, :, 0] += channel_means[0] # these are the means subtracted by the preprocessing step
  im[:, :, 1] += channel_means[1]
  im[:, :, 2] += channel_means[2]
  im= im[:, :, ::-1] # flip the channel order from BGR back to RGB

  im = np.clip(im, 0, 255).astype('uint8')
  return im
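The round trip between preprocessing and deprocessing can be imitated in NumPy without loading VGG19. This is a sketch under stated assumptions: preprocess_like_vgg is a hypothetical stand-in that performs only the two steps described in the text (RGB to BGR, then subtracting the channel means), and the rounding before the final cast is an addition made here to guard against floating-point error.

```python
import numpy as np

channel_means = np.array([103.939, 116.779, 123.68])  # BGR means quoted in the text


def preprocess_like_vgg(rgb):
    """Stand-in for the preprocessing step: flip RGB to BGR and zero-center."""
    bgr = rgb[..., ::-1].astype("float64")
    return bgr - channel_means


def deprocess(processed):
    """Inverse step: add the means back, flip BGR to RGB, convert to uint8."""
    bgr = processed + channel_means
    rgb = bgr[..., ::-1]
    return np.clip(np.rint(rgb), 0, 255).astype("uint8")


rng = np.random.default_rng(0)  # fixed seed, assumed for reproducibility
image = rng.integers(0, 256, size=(4, 4, 3)).astype("uint8")  # random stand-in image
restored = deprocess(preprocess_like_vgg(image))
print(np.array_equal(image, restored))
```

If the means were added back in the wrong channel order, or the final flip were omitted, the restored image would show a visible color cast.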

使用 VGG19 架构

了解下一个代码片段的最好方法是查看 VGG19 架构。这是一个好地方（大约位于页面的一半）。

在这里，您将看到 VGG19 是一个相当简单的体系结构，由卷积层的块组成，每个块的末尾都有一个最大池化层。

对于内容层，我们使用block5中的第二个卷积层。之所以使用这个最高的块，是因为较早的块具有更能代表单个像素的特征图。网络中的高层会根据对象及其在输入图像中的排列来捕获高级内容，但不会限制重建的实际精确像素值。

For the style layers, we will use the first convolutional layer in each block of layers, that is, block1_conv1 through block5_conv1.

然后保存内容和风格层的长度，以供以后使用：

# The feature maps are obtained from this content layer
content_layers = ['block5_conv2']

# Style layers we need
style_layers = ['block1_conv1',
                'block2_conv1',
                'block3_conv1',
                'block4_conv1',
                'block5_conv1'
               ]

number_of_content_layers = len(content_layers)
number_of_style_layers = len(style_layers)

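Building the model

What follows is a series of functions that lead up to the main function that performs the style transfer, run_style_transfer().

The first function in this sequence, get_model(), creates the model we will use.

It first loads the trained vgg_model (trained on ImageNet) without its classification layers (include_top=False). Next, it freezes the loaded model (vgg_model.trainable = False).

The outputs of the style and content layers are then collected with list comprehensions that iterate over the layer names we specified in the previous section.

These outputs are then used, together with the VGG input, to create a new model that gives access to the VGG layers; that is, get_model() returns a Keras model that outputs the style and content intermediate layers of the trained VGG19 model. The top layers are not needed, because they are the final classification layers of VGG19, which we do not use.

We are going to create an output image such that the distance (difference) between the output and the corresponding content and style inputs, measured on those feature layers, is minimized: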

def get_model():
  vgg_model = tf.keras.applications.vgg19.VGG19(include_top=False, weights='imagenet')
  vgg_model.trainable = False

  # Acquire the output layers corresponding to the style layers and the content layers
  style_outputs = [vgg_model.get_layer(name).output for name in style_layers]
  content_outputs = [vgg_model.get_layer(name).output for name in content_layers]
  model_outputs = style_outputs + content_outputs

  # Build model
  return tf.keras.models.Model(vgg_model.input, model_outputs)

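Computing the losses

We now need the losses between the contents and the styles of the two images. We will use the mean squared loss, as follows (despite its name, rms_loss takes no square root). Note that the subtraction in image1 - image2 is element-wise between the two image arrays. This subtraction works because the images have been resized to the same size in load_image: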

def rms_loss(image1,image2):
    loss = tf.reduce_mean(input_tensor=tf.square(image1 - image2))
    return loss

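Next, we define the content_loss function. This is simply the mean squared difference between what are named content and target in the function signature: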

def content_loss(content, target):
  return rms_loss(content, target)

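The style loss is defined in terms of a quantity called a Gram matrix. A Gram matrix (also known as the metric) is the dot product of the style matrix with its own transpose. Since this means that each column of the image matrix is multiplied by each row, we can think of the spatial information contained in the original representation as having been distributed. The result is non-localized information about the image, such as texture, shapes, and weights, that is, its style.

The code that produces the gram_matrix is as follows: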

def gram_matrix(input_tensor):
  channels = int(input_tensor.shape[-1]) # channels is last dimension
  tensor = tf.reshape(input_tensor, [-1, channels]) # flatten the spatial dimensions: [height*width, channels]
  number_of_locations = tf.shape(input=tensor)[0] # number of spatial locations (height*width)
  gram = tf.matmul(tensor, tensor, transpose_a=True) # produce tensorT*tensor, shape [channels, channels]
  return gram / tf.cast(number_of_locations, tf.float32) # scaled by the number of spatial locations
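To make the claim about non-localized information concrete, here is a small NumPy sketch (an illustration under assumed shapes, not the book's code). It computes the same quantity for a made-up [height, width, channels] array and shows that shuffling the spatial positions leaves the Gram matrix unchanged.

```python
import numpy as np

def gram_matrix_np(feature_map):
    """NumPy counterpart of gram_matrix for a [height, width, channels] array."""
    channels = feature_map.shape[-1]
    flat = feature_map.reshape(-1, channels)   # [height*width, channels]
    return flat.T @ flat / flat.shape[0]       # [channels, channels]

rng = np.random.RandomState(1)
features = rng.rand(6, 7, 4)                   # made-up feature map with 4 channels
gram = gram_matrix_np(features)

# Shuffle the spatial positions: the channel correlations should not change.
flat = features.reshape(-1, 4)
shuffled = flat[rng.permutation(flat.shape[0])].reshape(6, 7, 4)
gram_shuffled = gram_matrix_np(shuffled)

print(gram.shape)
print(np.allclose(gram, gram.T))
print(np.allclose(gram, gram_shuffled))
```

The result is a symmetric channels-by-channels matrix that depends only on which feature values occur together, not on where in the image they occur.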

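Hence, the style loss (where style holds the style-layer activations of the blended image and gram_target is the Gram matrix of the corresponding style-image activations) is as follows: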

def style_loss(style, gram_target):
  gram_style = gram_matrix(style)
  return rms_loss(gram_style, gram_target)

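Next, we find the content_features and style_features representations by taking content_image and style_image and feeding them through the model. This code is in two blocks, one for content_features and one for style_features. For the content block, we load the image, call our model on it, and finally extract the outputs of the feature layers assigned earlier. The code for style_features is the same, except that we first load the style image: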

def get_feature_representations(model, content_path, style_path):
  # Function to compute content and style feature representations.

  content_image = load_and_process_image(content_path)
  content_outputs = model(content_image)
  # the model outputs are ordered style layers first, then content layers
  content_features = [content_layer[0] for content_layer in content_outputs[number_of_style_layers:]]

  style_image = load_and_process_image(style_path)
  style_outputs = model(style_image)
  style_features = [style_layer[0] for style_layer in style_outputs[:number_of_style_layers]]

  return style_features, content_features

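Next, we need to compute the total loss. Looking at the signature of the method, we can see that, first, we pass in our model (which gives access to the intermediate layers of VGG19). Next comes loss_weights, the weights of the contribution of each loss function (style_weight and content_weight; the code here does not use a total variation weight). Then we have the initial image, that is, the image we are updating through the optimization process. This is followed by gram_style_features and content_features, which correspond to the style layers and content layers we are using.

We start by unpacking style_weight and content_weight from loss_weights. Then we call our model on our initial image. Our model is directly callable because we are using eager execution which, as we have seen, is the default in TensorFlow 2. This call returns all of the model's output values.

Then we have two similar blocks, one for content and one for style. For the first (content) block, we take the content representations from our desired layers. Next, we accumulate the content losses from all content loss layers, where the contribution of each layer is equally weighted.

The second block is similar to the first, except that here we accumulate the style losses from all style loss layers, again weighting the contribution of each layer equally.

Finally, the function returns the total loss, the style loss, and the content loss, as shown in the following code: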

def compute_total_loss(model, loss_weights, init_image, gram_style_features, content_features):

  style_weight, content_weight = loss_weights
  model_outputs = model(init_image)

  content_score = 0
  content_output_features = model_outputs[number_of_style_layers:]
  weight_per_content_layer = 1.0 / float(number_of_content_layers)
  for target_content, comb_content in zip(content_features, content_output_features):
    content_score += weight_per_content_layer * content_loss(comb_content[0], target_content)
  content_score *= content_weight

  style_score = 0
  style_output_features = model_outputs[:number_of_style_layers]
  weight_per_style_layer = 1.0 / float(number_of_style_layers)
  for target_style, comb_style in zip(gram_style_features, style_output_features):
    style_score += weight_per_style_layer * style_loss(comb_style[0], target_style)
  style_score *= style_weight

  total_loss = style_score + content_score
  return total_loss, style_score, content_score

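Next, we have a function that computes the gradients:

def compute_grads(config):
  with tf.GradientTape() as tape:
    all_loss = compute_total_loss(**config)
  # Compute gradients wrt input image
  total_loss = all_loss[0]
  return tape.gradient(total_loss, config['init_image']), all_loss

import IPython.display

Performing the style transfer

The function that performs the style transfer, run_style_transfer(), is quite long, so we will present it in sections. Its signature is as follows: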

def run_style_transfer(content_path,
                       style_path,
                       number_of_iterations=1000,
                       content_weight=1e3,
                       style_weight=1e-2):

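Since we don't actually want to train any of the layers in our model, but just use the output values from the layers as described previously, we set their trainable properties accordingly: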

model = get_model() 
for layer in model.layers:
  layer.trainable = False

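Next, we get the style_features and content_features representations from the layers of our model, using the function defined previously:

style_features, content_features = get_feature_representations(model, content_path, style_path)

The gram_style_features are then computed with a loop (a list comprehension) over style_features, as follows: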

gram_style_features = [gram_matrix(style_feature) for style_feature in style_features]

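Next, we initialize the image that will become the output of the algorithm, that is, the blended image (also known as the pastiche image), by loading the content image and converting it into a tensor variable: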

initial_image = load_and_process_image(content_path)
initial_image = tf.Variable(initial_image, dtype=tf.float32)

The next line defines the required AdamOptimizer:

optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate=5, beta1=0.99, epsilon=1e-1)

We are going to keep track of best_image and best_loss, so we initialize variables to store them:

best_loss, best_image = float('inf'), None

Next, we set up the dictionary of configuration values that will be passed to the compute_grads() function:

loss_weights = (style_weight, content_weight)
config = {
    'model': model,
    'loss_weights': loss_weights,
    'init_image': initial_image,
    'gram_style_features': gram_style_features,
    'content_features': content_features
}

Here are the display constants:

number_rows = 2
number_cols = 5
display_interval = number_of_iterations/(number_rows*number_cols)

Next, we compute the image bounds, as follows:

norm_means = np.array(channel_means)
minimum_vals = -norm_means
maximum_vals = 255 - norm_means
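These bounds are expressed in the preprocessed (mean-subtracted) space, so that after the means are added back every channel lies in the range 0 to 255. A minimal NumPy sketch with made-up pixel values (np.clip stands in here for tf.clip_by_value, and the tolerance only absorbs floating-point rounding):

```python
import numpy as np

channel_means = [103.939, 116.779, 123.68]
norm_means = np.array(channel_means)
minimum_vals = -norm_means
maximum_vals = 255 - norm_means

# Made-up preprocessed pixels, some of which have drifted out of range.
pixels = np.array([[-200.0,    0.0, 200.0],
                   [  50.0, -150.0, 140.0]])
clipped = np.clip(pixels, minimum_vals, maximum_vals)

# Adding the means back gives values inside the displayable range.
restored = clipped + norm_means
print(restored)
```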

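This list will store the blended images:

images = []

Next, we start the main image processing loop, as follows:

for i in range(number_of_iterations):

So, next, we compute the gradients, compute the losses, call our optimizer to apply the gradients, and clip the image to the bounds we calculated previously: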

   grads, all_loss = compute_grads(config)
   loss, style_score, content_score = all_loss
   optimizer.apply_gradients([(grads, initial_image)])
   clipped_image = tf.clip_by_value(initial_image, minimum_vals, maximum_vals)
   initial_image.assign(clipped_image)

We continue to keep track of best_loss and best_image, so next we have the following:

   if loss < best_loss:
     # Update best loss and best image from total loss.
     best_loss = loss
     best_image = deprocess_image(initial_image.numpy())

We then conditionally save the blended image (10 images in total) and display it along with the training metrics:

   if i % display_interval == 0:
     # Use the .numpy() method to get the numpy image array, needs eager execution
     plot_image = initial_image.numpy()
     plot_image = deprocess_image(plot_image)
     images.append(plot_image)
     IPython.display.clear_output(wait=True)
     IPython.display.display_png(Image.fromarray(plot_image))
     print('Iteration: {}'.format(i))
     print('Total loss: {:.4e}, '
           'style loss: {:.4e}, '
           'content loss: {:.4e} '
           .format(loss, style_score, content_score))

Finally, for this function, we display all of the saved images and return best_image and best_loss:

 IPython.display.clear_output(wait=True)
 plt.figure(figsize=(14,4))
 for i,image in enumerate(images):
   plt.subplot(number_rows,number_cols,i+1)
   plt.imshow(image)
   plt.xticks([])
   plt.yticks([])

 return best_image, best_loss

Next, we call the preceding function to obtain best_image and best_loss (this also displays the 10 images). The code is as follows:

best_image, best_loss = run_style_transfer(content_path, style_path, number_of_iterations=100)
Image.fromarray(best_image)

The following is the display of best_image:

Final displays

Finally, we have a function that displays the content and style images together with best_image:

def show_results(best_image, content_path, style_path, show_large_final=True):
  plt.figure(figsize=(10, 5))
  content = load_image(content_path)
  style = load_image(style_path)

  plt.subplot(1, 2, 1)
  show_image(content, 'Content Image')

  plt.subplot(1, 2, 2)
  show_image(style, 'Style Image')

  if show_large_final:
    plt.figure(figsize=(10, 10))

    plt.imshow(best_image)
    plt.title('Output Image')
    plt.show()

Next comes the call to that function, as follows:

show_results(best_image, content_path, style_path)

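Summary

This concludes our look at neural style transfer. We saw how to take a content image and a style image and produce a blended image. We used layers from the trained VGG19 model to accomplish this.

In the next chapter, we will examine recurrent neural networks. These networks can process inputs that are sequential, and where either or both of the input and output values have variable length.

8. TensorFlow 2 and Recurrent Neural Networks

One of the main drawbacks of a number of neural network architectures, including convolutional networks (CNNs), is that they do not allow sequential data to be processed. In other words, an entire feature, for example an image, has to be presented all at once. So the input is a fixed-length tensor, and the output has to be a fixed-length tensor. Nor do the output values of previous features affect the current feature in any way. Also, all of the input values (and output values) are taken to be independent of one another. For example, in our fashion_mnist model (Chapter 4, "Supervised Machine Learning Using TensorFlow 2"), each input fashion image is independent of, and totally ignorant of, previous images.

Recurrent neural networks (RNNs) overcome this problem and make a wide range of new applications possible.

In this chapter, we will review the following topics:

Neural network processing modes
Recurrent architectures
An application of RNNs
The code for our RNN example
Building and instantiating our model
Training and using our model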

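Neural network processing modes

The following diagram illustrates the variety of neural network processing modes:

Rectangles represent tensors, arrows represent functions, red is input, blue is output, and green is the tensor state.

From left to right, we have the following:

Plain feed-forward network, fixed-size input and fixed-size output, for example, image classification
Sequence output, for example, image captioning, which takes one image and outputs a set of words identifying items in the image
Sequence input, for example, sentiment identification (like our IMDb application), where a sentence is classed as being of positive or negative sentiment
Sequence input and output, for example, machine translation, where an RNN takes an English sentence and translates it into French output
Synced sequence input and output, frame by frame, for example, video classification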

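Recurrent architectures

Hence, a new architecture is needed for handling data that arrives sequentially and where either or both of its input values and output values are of variable length, for example, the words in a sentence in a language translation application. In this case, both the input and the output of the model are of varying length, as in the fourth mode previously. Also, in order to predict the subsequent word given the current word, the previous words need to be known as well. This new neural network architecture is called an RNN, and it is designed specifically to handle sequential data.

The term recurrent arises because such models perform the same computation on every element of a sequence, where each output is dependent upon previous output. Theoretically, each output depends on all of the previous output items but, in practical terms, RNNs are limited to looking back just a few steps. This arrangement is equivalent to an RNN having a memory, which can make use of the results of previous computations.

RNNs are used for sequential input values, such as time series, audio, video, speech, text, financial, and weather data. Examples of their use in consumer products include Apple's Siri, Google's Translate, and Amazon's Alexa.

A schematic comparing a traditional feed-forward network with an RNN is as follows:

The loop back on each RNN unit represents memory. A feed-forward network cannot distinguish the order of the items in a sequence, whereas an RNN depends fundamentally on that order. For example, suppose a feed-forward network receives the input string aardvark: by the time the input is d, the network has forgotten that the previous input values were a, a, and r, and so it cannot predict that va comes next. A recurrent network given the same input, on the other hand, "remembers" that the previous input values were a, a, and r, and so may be able to predict, on the basis of its previous training, that va is next.

Each individual item input to an RNN is called a time step. So, for example, in a character-level RNN, the input of each character is a time step. The following diagram illustrates the unrolling of an RNN.

The time steps start at t = 0 with an input of X₀ and run to a final time step t, with an input of Xₜ and corresponding output values of h₀ to hₜ, as shown in the following diagram:

An unrolled recurrent neural network

An RNN is trained by backpropagation, in a process called backpropagation through time (BPTT). Here, we can imagine that unrolling (also called unfolding) the RNN creates a series of neural networks, and that the errors are calculated for each time step and combined so that the weights in the network can be updated with backpropagation. For example, to calculate the gradient, and hence the error, at time step t = 6, we would backpropagate five steps and sum the gradients. There are problems with this approach, however, when trying to learn long-term dependencies (that is, between time steps that are far apart), because the gradients can become so small that learning becomes impossible or very slow, or so large that they swamp the network. This is known as the vanishing/exploding gradient problem, and various modifications have been invented to address it, including long short-term memory (LSTM) networks and gated recurrent units (GRUs), which we will use later.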
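A toy calculation gives a feel for the vanishing/exploding gradient problem. The sketch below is a deliberate simplification, not how BPTT is implemented: it assumes the gradient is scaled by the same scalar factor at every time step (gradient_scale and its arguments are illustrative names), whereas a real RNN multiplies by matrices.

```python
def gradient_scale(recurrent_factor, steps):
    """Size of a gradient after being multiplied by the same factor `steps` times."""
    scale = 1.0
    for _ in range(steps):
        scale *= recurrent_factor
    return scale

# A factor below 1 shrinks towards zero; a factor above 1 blows up.
for factor in (0.5, 1.0, 1.5):
    print(factor, gradient_scale(factor, 50))
```

Over 50 steps, a factor of 0.5 leaves almost nothing of the gradient, while a factor of 1.5 inflates it by many orders of magnitude.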

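The following diagram shows more detail about unrolling (or unfolding):

A schematic of a recurrent neural network

In the diagram, we can see the following:

xₜ is the input at time step t. For example, with t = 10, xₜ could be the tenth character in a character-based RNN, represented as an index into a character set.
sₜ is the hidden state at time step t, and is therefore the memory of the network.
sₜ is calculated as sₜ = f(U xₜ + W sₜ₋₁), where f is a non-linearity function such as ReLU. U, V, and W are weights.
oₜ is the output at time step t. For example, if we are calculating the next letter in a sequence of characters, it would be a vector of probabilities over our character set, derived from oₜ = V sₜ.

As noted previously, we can think of sₜ as the memory of the network, since it contains information about what has happened at earlier time steps. Note that the weights U, V, and W are shared across every step, because we perform the same computation at each step, just with different input values (the consequence being that the number of weights to learn is greatly reduced). Note also that we may not need output values at every time step (as the diagram shows). If we were doing sentiment analysis, with each step being a word of, say, a film review, then we might only care about the final output (positive or negative).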
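The recurrence can be written out in a few lines of NumPy. This is only an illustrative sketch: the sizes, the random weights, and the choice of tanh as the non-linearity f are assumptions made for the example, and the outputs are left as raw scores rather than probabilities.

```python
import numpy as np

def rnn_forward(inputs, U, W, V, activation=np.tanh):
    """Run s_t = f(U x_t + W s_{t-1}) and o_t = V s_t over a sequence."""
    state = np.zeros(W.shape[0])          # s before the first step
    outputs = []
    for x_t in inputs:                    # the same U, W, V are used at every time step
        state = activation(U @ x_t + W @ state)
        outputs.append(V @ state)
    return np.array(outputs), state

rng = np.random.RandomState(0)
vocab_size, hidden_size = 5, 3
U = rng.randn(hidden_size, vocab_size) * 0.1
W = rng.randn(hidden_size, hidden_size) * 0.1
V = rng.randn(vocab_size, hidden_size) * 0.1

# One-hot encode a made-up sequence of four character indices.
indices = [0, 2, 2, 4]
inputs = np.eye(vocab_size)[indices]

outputs, final_state = rnn_forward(inputs, U, W, V)
print(outputs.shape, final_state.shape)
```

There is one output vector per time step, but only one set of weights, which is the weight sharing described above.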

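Now, let's look at an interesting example of the use of an RNN, in which we try to create text in a given writing style.

An application of RNNs

In this application, we will see how to create text with a character-based recurrent neural network. It is easy to change the corpus of text to be used (see the following example); here, we will use the novel Great Expectations by Charles Dickens. We will train the network on this text so that if we give it a sequence of characters, such as thousan, it will produce the next character in the sequence, d. This process can be continued, and longer sequences of text can be created by calling the model repeatedly on the evolving sequence.

Here is an example of the text created before the model is trained: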

Input: 
 'o else is there to inform?”\n\n“Is there no chance person who might identify you in the street?” said\n'
Next Char Predictions: 
 "dUFdZ!mig())'(ZIon“4g&HZ”@\nWGWtlinnqQY*dGJ7ioU'6(vLKL&cJ29LG'lQW8n-,M!JSVy”cjN;1cH\ndEEeMXhtW$U8Mt&sp"

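Here is some text, starting from the sequence Pip, that was created after the model had been trained for 100 epochs (about 10 seconds each) and sampled with a temperature of 0.1 (see below):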

Pip; it was not to be done. I had been a little while I was a look out and the strength of considerable particular by the windows of the rest of his prospering look at the windows of the room wing and the courtyard in the morning was the first time I had been a very much being strictly under the wall of my own person to me that he had done my sister, and I went on with the street common, I should have been a very little for an air of the river by the fire. For the man who was all the time of the money. My dear Herbert, who was a little way to the marshes he had ever seemed to have had once more than once and the more was a ragged hand before I had ever seemed to have him a dreadful loveriement in his head and with a falling to the table, and I went on with his arms, I saw him ever so many times, and we all the courtyard to the fire to be so often to be on some time when I saw his shoulder as if it were a long time in the morning I was a woman and a singer at the tide was remained by the

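This is not a bad result for a system that knows nothing about grammar or spelling. It is plainly nonsensical, but then we were not aiming for sense. There is only one non-existent word (loveriement). So the network has done a reasonable job of learning to spell and of learning that words are the units of text. Note also that, in the code that follows, the network is trained on short sequences only (sequence_length = 100).

Next, we will look at the code used to set up, train, and test our recurrent neural network.

The code for our RNN example

This application is based on one provided by Google under an Apache 2 license.

As usual, we will break the code down into snippets and refer you to the repository for the license and the full working version. First, we have our module imports, as follows: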

import tensorflow as tf
import numpy as np
import os
import time

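Next, we have the download link for the text file.

You can easily change this to any text you want by specifying the file name in file and the full URL of the file in url: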

file='1400-0.txt'
url='https://www.gutenberg.org/files/1400/1400-0.txt' # Great Expectations by Charles Dickens

Then, we set up the Keras get_file() utility for this file, as follows:

path = tf.keras.utils.get_file(file,url)

Then, we open and read the file, and look at the length of the file in characters:

text = open(path).read()
print ('Length of text: {} characters'.format(len(text)))

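There is text at the head of the file that we don't need, so we strip that off; it is then helpful to look at the first few characters, which we do next: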

# strip off text we don't need
text = text[835:]

# Take a look at the first 300 characters in text
print(text[:300])

The output should be as follows:

My father's family name being Pirrip, and my Christian name Philip, my
infant tongue could make of both names nothing longer or more explicit
than Pip. So, I called myself Pip, and came to be called Pip.

I give Pirrip as my father's family name, on the authority of his
tombstone and my sister,--Mrs

Now, let's see how many unique characters there are in the text, using a set to get them and sorting them in order of their character codes:

# The unique characters in the file
vocabulary = sorted(set(text))
print ('{} unique characters.'.format(len(vocabulary)))

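This should give 84 unique characters.

Next, we create a dictionary in which the characters are the keys and successive integers are the values.

This lets us find the index, that is, the numeric value representing any given character: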

# Create a  dictionary of unique character keys to index values
char_to_index = {char:index for index, char in enumerate(vocabulary)}
print(char_to_index)

The output is as follows:

{'\n': 0, ' ': 1, '!': 2, '$': 3, '%': 4, '&': 5, "'": 6, '(': 7, ')': 8, '*': 9, ',': 10, '-': 11, '.': 12, '/': 13, '0': 14, '1': 15, '2': 16, '3': 17, '4': 18, '5': 19, '6': 20, '7': 21, '8': 22, '9': 23, ':': 24, ';': 25, '?': 26, '@': 27, 'A': 28, 'B': 29, 'C': 30, 'D': 31, 'E': 32, 'F': 33, 'G': 34, 'H': 35, 'I': 36, 'J': 37, 'K': 38, 'L': 39, 'M': 40, 'N': 41, 'O': 42, 'P': 43, 'Q': 44, 'R': 45, 'S': 46, 'T': 47, 'U': 48, 'V': 49, 'W': 50, 'X': 51, 'Y': 52, 'Z': 53, 'a': 54, 'b': 55, 'c': 56, 'd': 57, 'e': 58, 'f': 59, 'g': 60, 'h': 61, 'i': 62, 'j': 63, 'k': 64, 'l': 65, 'm': 66, 'n': 67, 'o': 68, 'p': 69, 'q': 70, 'r': 71, 's': 72, 't': 73, 'u': 74, 'v': 75, 'w': 76, 'x': 77, 'y': 78, 'z': 79, 'ê': 80, 'ô': 81, '“': 82, '”': 83}

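We also need to store the characters in an array, so that we can find the character corresponding to any given numeric value, that is, its index: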

index_to_char = np.array(vocabulary)
print(index_to_char)

The output is as follows:

['\n' ' ' '!' '$' '%' '&' "'" '(' ')' '*' ',' '-' '.' '/' '0' '1' '2' '3' '4' '5' '6' '7' '8' '9' ':' ';' '?' '@' 'A' 'B' 'C' 'D' 'E' 'F' 'G' 'H' 'I' 'J' 'K' 'L' 'M' 'N' 'O' 'P' 'Q' 'R' 'S' 'T' 'U' 'V' 'W' 'X' 'Y' 'Z' 'a' 'b' 'c' 'd' 'e' 'f' 'g' 'h' 'i' 'j' 'k' 'l' 'm' 'n' 'o' 'p' 'q' 'r' 's' 't' 'u' 'v' 'w' 'x' 'y' 'z' 'ê' 'ô' '“' '”']

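Now, the whole of the text we are working with is converted to an array of integers, using the dictionary char_to_index that we created:

text_as_int = np.array([char_to_index[char] for char in text])

Here is a sample of the characters and their indices: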

print('{')
for char,_ in zip(char_to_index, range(20)):
    print(' {:4s}: {:3d},'.format(repr(char), char_to_index[char]))
print(' ...\n}')

The output is as follows:

{
  '\n':   0,
  ' ' :   1,
  '!' :   2,
  '$' :   3,
  '%' :   4,
  '&' :   5,
  "'" :   6,
  '(' :   7,
  ')' :   8,
  '*' :   9,
  ',' :  10,
  '-' :  11,
  '.' :  12,
  '/' :  13,
  '0' :  14,
  '1' :  15,
  '2' :  16,
  '3' :  17,
  '4' :  18,
  '5' :  19,
  ...
}
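The two lookup structures are inverses of one another, which a short round trip makes clear. This sketch uses a short made-up sample string in place of the full novel; the variable names follow the ones used in this chapter.

```python
import numpy as np

sample_text = "My father's family name"       # stand-in for the full text
vocabulary = sorted(set(sample_text))
char_to_index = {char: index for index, char in enumerate(vocabulary)}
index_to_char = np.array(vocabulary)

# Characters -> integers -> characters.
encoded = np.array([char_to_index[char] for char in sample_text])
decoded = ''.join(index_to_char[encoded])
print(decoded == sample_text)
```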

Next, it is useful to see how the text is mapped to integers; here are the first few characters:

# Show how the first 15 characters from the text are mapped to integers
print ('{} ---- characters mapped to int ---- > {}'.format(repr(text[:15]), text_as_int[:15]))

The output is as follows:

"My father's fam" ---- characters mapped to int ---- > [40 78  1 59 54 73 61 58 71  6 72  1 59 54 66]

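Then, we set the sentence length for each input, and hence the number of examples in an epoch of training:

# The maximum length sentence we want for a single input in characters
sequence_length = 100
examples_per_epoch = len(text)//sequence_length

Next, we create a data.Dataset to be used in our training later: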

# Create training examples / targets
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)
# Display , sanity check
for char in char_dataset.take(5):
  print(index_to_char[char.numpy()])

The output is as follows:

M y f a

We need to batch this data to feed it to our RNN, so we do that next:

sequences = char_dataset.batch(sequence_length+1, drop_remainder=True)

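Remember that we have set sequence_length = 100, so the number of characters in a batch is 101.

We now have a function that creates our input data and our target data (the required output).

The function returns the text we have been working with and the same text shifted along by one character; that is, if the first word is Python and sequence_length = 5, then the function returns Pytho and ython.

We then create our dataset by mapping this function over the sequences, giving pairs of input and target character sequences: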

def split_input_target(chunk):
   input_text = chunk[:-1]
   target_text = chunk[1:]
   return input_text, target_text

dataset = sequences.map(split_input_target)
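The chunking and splitting can be traced by hand on a short string. In this plain-Python sketch, make_chunks is a hypothetical helper that imitates batch(sequence_length + 1, drop_remainder=True) on a string; it is not part of tf.data.

```python
def split_input_target(chunk):
    input_text = chunk[:-1]
    target_text = chunk[1:]
    return input_text, target_text

def make_chunks(text, sequence_length):
    """Cut text into chunks of sequence_length + 1 characters, dropping any remainder."""
    size = sequence_length + 1
    return [text[i:i + size] for i in range(0, len(text) - size + 1, size)]

chunks = make_chunks("Python is fun", sequence_length=5)
print(chunks)                                            # ['Python', ' is fu']
print([split_input_target(chunk) for chunk in chunks])   # [('Pytho', 'ython'), (' is f', 'is fu')]
```

Each target is the input shifted along by one character, so at every position the model is asked to predict the character that follows.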

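Next, we perform another sanity check, using the dataset we created previously to display the input and target data.

Note that the dataset.take(n) method returns the first n elements of the dataset.

Note here, too, that since eager execution is enabled (by default, of course, in TensorFlow 2), we can use the numpy() method to look up the values of the tensors:

for input_example, target_example in dataset.take(1):
 print ('Input data: ', repr(''.join(index_to_char[input_example.numpy()]))) # 100 characters
 print ('Target data:', repr(''.join(index_to_char[target_example.numpy()]))) # the same text shifted by one character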

The output is as follows:

Input data: "My father's family name being Pirrip, and my Christian name Philip, my\ninfant tongue could make of b"
Target data: "y father's family name being Pirrip, and my Christian name Philip, my\ninfant tongue could make of bo"

Now, we can display the input and expected output for a few steps:

for char, (input_index, target_index) in enumerate(zip(input_example[:5], target_example[:5])):
    print("Step {:4d}".format(char))
    print(" input: {} ({:s})".format(input_index, repr(index_to_char[input_index])))
    print(" expected output: {} ({:s})".format(target_index, repr(index_to_char[target_index])))

The following is the output:

Step 0: input: 40 ('M'), expected output: 78 ('y')
Step 1: input: 78 ('y'), expected output: 1 (' ')
Step 2: input: 1 (' '), expected output: 59 ('f')
Step 3: input: 59 ('f'), expected output: 54 ('a')
Step 4: input: 54 ('a'), expected output: 73 ('t')

Next, we make our setup for training, as follows:

# how many sequences in a batch
batch = 64

# the number of training steps taken in each epoch
steps_per_epoch = examples_per_epoch//batch # note integer division

# TF data maintains a buffer in memory in which to shuffle data 
# since it is designed to work with possibly endless data
buffer = 10000

dataset = dataset.shuffle(buffer).batch(batch, drop_remainder=True)

# call repeat() on dataset so data can be re-fed into the model from the beginning
dataset = dataset.repeat()

dataset

This gives the following dataset structure:

<RepeatBatchDataset shapes: ((64, 100), (64, 100)), types: (tf.int64, tf.int64)>

Here, 64 is the batch size and 100 is the sequence length. The following are some of the values we need for our training:

# The vocabulary length in characters
vocabulary_length = len(vocabulary)

# The embedding dimension 
embedding_dimension = 256

# The number of recurrent neural network units
recurrent_nn_units = 1024

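We are using GRUs. If the code runs on a GPU, the routines in the CUDA Deep Neural Network (cuDNN) library can be used for fast computation. GRUs are one way of implementing memory in an RNN. The next section of code implements this idea, as follows: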

if tf.test.is_gpu_available():
    recurrent_nn = tf.compat.v1.keras.layers.CuDNNGRU
    print("GPU in use")
else:
    import functools
    recurrent_nn = functools.partial(tf.keras.layers.GRU, recurrent_activation='sigmoid')
    print("CPU in use")

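Building and instantiating our model

As we have seen previously, one technique for building a model is to pass the required layers into the tf.keras.Sequential() constructor. In this case, we have three layers: an embedding layer, an RNN layer, and a dense layer.

The first, embedding, layer is a lookup table of vectors, one vector for the numeric value of each character. It has the dimension embedding_dimension. The middle, recurrent, layer is the GRU; its size is recurrent_nn_units. The last layer is a dense output layer with vocabulary_length units.

What the model does is look up the embedding, run the GRU for a single time step using the embedding as input, and pass this to the dense layer, which generates the logits (log-odds) for the next character.

A diagram showing this is as follows:

The code that implements this model is, therefore, as follows: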

def build_model(vocabulary_size, embedding_dimension, recurrent_nn_units, batch_size):
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocabulary_size, embedding_dimension, batch_input_shape=[batch_size, None]),
        recurrent_nn(recurrent_nn_units, return_sequences=True, recurrent_initializer='glorot_uniform', stateful=True),
        tf.keras.layers.Dense(vocabulary_size)
    ])
    return model

Now we can instantiate our model, as follows:

model = build_model(
  vocabulary_size = len(vocabulary),
  embedding_dimension=embedding_dimension,
  recurrent_nn_units=recurrent_nn_units,
  batch_size=batch)

Now, we can run a sanity check to make sure that our model outputs the correct shape. Note the use of dataset.take() to extract an element of the dataset:

for batch_input_example, batch_target_example in dataset.take(1):
    batch_predictions_example = model(batch_input_example)
    print(batch_predictions_example.shape, "# (batch, sequence_length, vocabulary_length)")

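The following is the output:

(64, 100, 84) # (batch, sequence_length, vocabulary_length)

This is as expected; recall that we have 84 unique characters in our character set.

Here is the code that shows what our model looks like: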

model.summary()

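The output of the summary of our model architecture is as follows:

Recall, again, that we have 84 input values, and we can see that, for the embedding layer, 84 * 256 = 21,504 and, for the dense layer, 1024 * 84 + 84 (bias units) = 86,100.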
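The two parameter counts quoted here can be reproduced with simple arithmetic, using the values set earlier in the chapter. The GRU layer is left out of this sketch because its parameter count depends on the GRU implementation in use.

```python
vocabulary_length = 84
embedding_dimension = 256
recurrent_nn_units = 1024

# Embedding: one vector of length embedding_dimension per character.
embedding_parameters = vocabulary_length * embedding_dimension

# Dense: one weight per (GRU unit, output character) pair, plus one bias per output.
dense_parameters = recurrent_nn_units * vocabulary_length + vocabulary_length

print(embedding_parameters)  # 21504
print(dense_parameters)      # 86100
```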

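Getting predictions with our model

To get predictions from our model, we need to draw a sample from the output distribution. This sampling gives us the characters we need from that output distribution (sampling the output distribution is important because taking an argmax of it, as we might usually do, can easily get the model stuck in a loop).

tf.random.categorical does this sampling, and tf.squeeze with axis=-1 removes the last dimension of the tensor before the indices are displayed.

The signature of tf.random.categorical is as follows:

tf.random.categorical(logits, num_samples, dtype=None, seed=None, name=None)

Comparing this with the call, we see that we are taking one sample for each of the sequence_length = 100 positions of the first prediction in the batch (batch_predictions_example[0]). The extra dimension is then removed, so that we can look up the characters corresponding to the samples: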

sampled_indices = tf.random.categorical(logits=batch_predictions_example[0], num_samples=1)

sampled_indices = tf.squeeze(sampled_indices,axis=-1).numpy()

sampled_indices

This produces the following output:

array([79, 43, 3, 12, 20, 24, 54, 10, 61, 43, 46, 15, 0, 24, 39, 77, 2, 73, 4, 78, 5, 60, 13, 65, 1, 75, 47, 33, 61, 13, 64, 41, 32, 42, 40, 20, 37, 10, 60, 51, 21, 17, 69, 8, 3, 74, 64, 68, 2, 3, 35, 13, 67, 16, 46, 48, 47, 1, 38, 80, 47, 8, 32, 53, 50, 28, 63, 33, 35, 72, 80, 0, 7, 64, 2, 79, 1, 56, 61, 13, 55, 28, 62, 30, 40, 22, 32, 40, 27, 46, 21, 51, 10, 76, 64, 47, 72, 83, 45, 8])

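Let's look at some input and output prior to training:

print("Input: \n", repr("".join(index_to_char[batch_input_example[0]])))

print("Next Char Predictions: \n", repr("".join(index_to_char[sampled_indices ])))

The output is as follows: the input text is followed by the next-character predictions (prior to training):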

Input: 
 'r, that I might refer to it again; but I could not find it, and\nwas uneasy to think that it must hav'
Next Char Predictions: 
 "hFTzJe;rAô:G*'”x4d?&ôce9QekL:*O7@KuoZM&“$r0mg\n%/2-6QaE&$)/'Y8m.x)94b?fKp.rRô.3IMMTMjMMag.iL1LuM6 ?';"

Next, we define our loss function:

def loss(labels, logits):
 return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)

Then, we look at the loss for our model prior to training, and perform another sanity check on the dimensions:

batch_loss_example = tf.compat.v1.losses.sparse_softmax_cross_entropy(batch_target_example, batch_predictions_example)
print("Prediction shape: ", batch_predictions_example.shape, " # (batch_size, sequence_length, vocab_size)")
print("scalar_loss: ", batch_loss_example.numpy())

This produces the following output:

Prediction shape: (64, 100, 84) # (batch, sequence_length, vocabulary_length) 
scalar_loss: 4.429237
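A quick way to judge whether this initial loss is plausible is to compare it with the cross-entropy of a uniform guess, which is the natural logarithm of the number of classes. The sketch below assumes only that an untrained model predicts all 84 characters with roughly equal probability.

```python
import math

vocabulary_length = 84

# Cross-entropy when every character is given probability 1/84.
uniform_loss = -math.log(1.0 / vocabulary_length)
print(round(uniform_loss, 4))
```

The result is close to the scalar_loss reported above, which suggests that the untrained network starts out as little better than a uniform guesser.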

To prepare our model for training, we now compile it with the Adam optimizer and the softmax cross-entropy loss:

#next produced by upgrade script.... 
#model.compile(optimizer = tf.compat.v1.train.AdamOptimizer(), loss = loss) 
#.... but following optimizer is available.
model.compile(optimizer = tf.optimizers.Adam(), loss = loss)

We are going to save our model's weights, so next we prepare our checkpoints for this:

# The checkpoints will be saved in this directory
directory = './checkpoints'

# checkpoint files
file_prefix = os.path.join(directory, "ckpt_{epoch}")
callback=[tf.keras.callbacks.ModelCheckpoint(filepath=file_prefix, save_weights_only=True)]

Finally, we can train the model with a call to model.fit():

epochs=45 # *much* faster on GPU, ~10s / epoch, reduce this figure significantly if on CPU 

history = model.fit(dataset, epochs=epochs, steps_per_epoch=steps_per_epoch, callbacks=callback)

This gives the following output (the log shown comes from a run of 50 epochs):

Epoch 1/50
158/158 [==============================] - 10s 64ms/step - loss: 2.6995
....................
Epoch 50/50
158/158 [==============================] - 10s 65ms/step - loss: 0.6143

The following gives the latest checkpoint:

tf.train.latest_checkpoint(directory)

This gives the following result:

'./checkpoints/ckpt_45'

So, we can rebuild our model (to show how it's done):

model = build_model(vocabulary_length, embedding_dimension, recurrent_nn_units, batch_size=1)

model.load_weights(tf.train.latest_checkpoint(directory))

model.build(tf.TensorShape([1, None]))

model.summary()

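The summary of our model is as follows:

Next, we use a function to generate new text, given our trained model, a start string, and a temperature, whose value determines how random the text is (low values give more predictable text; high values give more random text).

First, we determine the number of characters to generate, then vectorize the start string and add an extra dimension to it. We add the extra dimension to the input_string variable because the RNN cell requires it (the two required dimensions are the batch length and the sequence length). Then we initialize a variable that is used to store the generated text.

The value of temperature determines how random the generated text is (lower is less random, meaning more predictable).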
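The effect of temperature can be seen by dividing a few made-up logits by different temperatures and converting the result to probabilities. This is a plain-Python sketch of the idea (softmax_with_temperature is an illustrative helper, not part of TensorFlow); the generation code itself samples directly from the scaled logits.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Probabilities obtained from logits after dividing them by the temperature."""
    scaled = [value / temperature for value in logits]
    largest = max(scaled)                     # subtracted for numerical stability
    exps = [math.exp(value - largest) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]

logits = [2.0, 1.0, 0.5]                      # made-up scores for three characters
for temperature in (0.1, 1.0, 10.0):
    probabilities = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probabilities])
```

At a low temperature almost all of the probability goes to the highest-scoring character, so sampling is nearly deterministic; at a high temperature the distribution flattens and the samples become more random.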

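In a loop, for each new character to be generated, we use the model, which holds the RNN state, to get the prediction distribution for the next character. A categorical (multinomial) distribution is then used to find the index of the predicted character, which is then used as the next input to the model. Because of the loop, the RNN state returned by the model is fed back into the model, so that it now has more information than just a single character. After each character is predicted, the modified RNN state is again fed back into the model, so the predictions draw on a growing context from the previously predicted characters.

The following diagram shows how this works:

The multinomial sampling is implemented here with tf.random.categorical; we are now ready to generate our predicted text: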

def generate_text(model, start_string, temperature, characters_to_generate):

  # Vectorise the start string into numbers
  input_string = [char_to_index[char] for char in start_string]
  # add extra dimension to input_string
  input_string = tf.expand_dims(input_string, 0)

  # Empty list to store generated text
  generated = []

  # (batch size is 1)
  model.reset_states()
  for i in range(characters_to_generate):
    predictions = model(input_string) # here's where we need the extra dimension

    # remove the batch dimension
    predictions = tf.squeeze(predictions, 0)

    # sample from a random categorical (multinomial) distribution over the model's predicted characters
    predictions = predictions / temperature
    predicted_id = tf.random.categorical(logits=predictions, num_samples=1)[-1,0].numpy()

    # Pass the predicted character as the next input to the model, along with the previous hidden state
    input_string = tf.expand_dims([predicted_id], 0)

    generated.append(index_to_char[predicted_id])
  return (start_string + ''.join(generated)) # generated is a list

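So, having defined our function, we can call it to return our generated text.

In the given arguments to the function, low temperatures give more predictable text, whereas high temperatures give more random text. Also, this is where you can change the start string and the number of characters the function generates: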

generated_text = generate_text(model=model, start_string="Pip", temperature=0.1, characters_to_generate = 1000)
print(generated_text)

After 30 epochs of training, this produces the following output:

Pip; it was a much better to and the Aged and weaking his hands of the windows of the way who went them on which the more I had been a very little for me, and I went on with his back in the soldiers of the room with the whole hand the other gentleman with the hand on the service, when I was a look of half of the room was was the first time of the money. I forgetter, Mr. Pip?” “I don't know that I have no more than I know what I have no inquiry with the rest of its being straight up again. He came out of the room, and in the midst of the room was was all the words, “and he came into the Castle. One would repeat it to your expectations condition of the courtyard. In a moment was the first time in the house to the fork, and we all lighted and at his being so beautiful looking at the convicts. My depression of the morning, I looked at him in the morning, I should not have been made a strong for the first time of the wall before the table to the forefinger of the room, and had not quite diffi

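Loss = 0.6761. The text is more or less correctly spelled and punctuated, although its meaning (which we did not attempt to achieve) is largely nonsense. It has also not learned how to use speech marks correctly. There are only two nonsense words (forgetter and weaking), both of which, on inspection, are nevertheless plausible semantically. Whether or not the generated text is in the style of Charles Dickens is an open question.

Experimenting with the number of epochs shows that the loss reaches a minimum at about 45 epochs, after which it starts to increase.

After 45 epochs, the output is as follows: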

Pip; or I should
have felt painfully consciousness that he was the man with his back to the kitchen, and he seemed to have no
strength, and as I had often seen her shutters with the poker on
the parlor, through having been every disagreeable to be seen; I thought I would give him more letters of my own
eyes and flared about the fire, and showed the greatest state of mind,
I thought I would give up of his having fastened out of the room, and had
made some advance in that respect to me to feel an
indescribable awe as it was a to be even than ever of her steps, or for old
asked, “Yes.”

“What is it?” repeated Mr. Jaggers. “You know I was in my mind by his blue eyes most of all admirers,
and that she had shaken hands contributing the poker out of his
hands in his pockets and his dinner loosely tied in a busy preparation for the reference to my United and
self-possession when Miss Havisham and Estella now that I had been too much to be the salvey dark night, which seemed so long
ago. “Yes, de

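Loss = 0.6166. The model now seems to have learned to pair speech marks correctly, and there are no nonsense words.

Summary

That concludes our look at RNNs. In this chapter, we first discussed the general principles of RNNs and then saw how to acquire and prepare some text for use by a model, noting that it is straightforward to use an alternative source of text here. We then saw how to create and instantiate our model. We then trained our model and used it to produce text from a starting string, noting that the network had learned that words are units of text and how to spell a wide variety of words (somewhat in the style of the author of the text), with only a few non-words.

In the next chapter, we will look at the use of TensorFlow Hub, which is a software library.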

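9. TensorFlow Estimators and TensorFlow Hub

This chapter is in two parts, but the techniques covered here are related. First, we will look at how TensorFlow estimators provide a simple high-level API for TensorFlow and, second, we will look at how TensorFlow Hub contains modules that can be used in your own applications.

In this chapter, we will cover the following main topics:

TensorFlow Estimators
TensorFlow Hub

TensorFlow Estimators

tf.estimator is a high-level API for TensorFlow. It simplifies machine learning programming by providing methods for straightforward training, evaluation, prediction, and export of models for serving.

Estimators confer many advantages on the TensorFlow developer. It is easier and more intuitive to develop models with estimators than with the low-level APIs. In particular, the same model can be run on a local machine or on a distributed multi-server system. The model is also agnostic as to the processor it finds itself on, that is, CPU, GPU, or TPU. Estimators also simplify the development process by making it easier for model developers to share implementations and, being built on Keras layers, make customization simpler.

Estimators take care of all of the background plumbing that goes with a TensorFlow model. They provide safe, distributed training loops that handle graph building, variable initialization, data loading, exception handling, creating checkpoint files, recovering from failures, and saving summaries for TensorBoard. As we shall see, since they create checkpoints, they support stopping and restarting training after a given number of steps.

The process of developing an estimator model has four steps:

Acquire the data and create the data functions
Create the feature columns
Instantiate the estimator
Evaluate the model's performance

We will exemplify these steps in the following code.

We have seen the fashion_mnist dataset previously (in Chapter 5, "Unsupervised Learning Using TensorFlow 2"), so we will use it again to demonstrate the use of estimators.

The code

First, here are the required imports: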

import tensorflow as tf
import numpy as np

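Next, we acquire and preprocess our data. Note that fashion_mnist is conveniently present in tf.keras.datasets. The x values in the dataset are in the form of integer NumPy arrays, with each element in the range 0 to 255, representing the grayscale value of each pixel in a 28 x 28 pixel fashion image. For training purposes, these values must be converted to floats in the range 0 to 1. The y values are in the form of unsigned 8-bit integers (uint8) and must be converted to 32-bit integers (int32), again for use by the estimator.

The learning rate is set to a small value, although you can experiment with this hyperparameter: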

fashion = tf.keras.datasets.fashion_mnist
(x_train, y_train),(x_test, y_test) = fashion.load_data()
print(type(x_train))
x_train, x_test = x_train / 255.0, x_test / 255.0

y_train, y_test = np.int32(y_train), np.int32(y_test)

learning_rate = 1e-4
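The effect of these two conversions on the data types can be checked on a tiny made-up array, without downloading the dataset. In this sketch, x and y are stand-ins for the real image and label arrays.

```python
import numpy as np

# A tiny made-up stand-in for image pixels, and two made-up labels.
x = np.array([[[0, 128, 255]]], dtype=np.uint8)
y = np.array([3, 7], dtype=np.uint8)

x_scaled = x / 255.0          # dividing by a float promotes uint8 to floating point
y_int32 = np.int32(y)         # labels become 32-bit integers

print(x_scaled.dtype, x_scaled.min(), x_scaled.max())
print(y_int32.dtype)
```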

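After that comes our training input function.

tf.compat.v1.estimator.inputs.numpy_input_fn is used when you have the full dataset in arrays and need a quick way to batch, shuffle, and/or repeat it.

Its signature is as follows: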

tf.compat.v1.estimator.inputs.numpy_input_fn(
 x,
 y=None,
 batch_size=128,
 num_epochs=1,
 shuffle=None,
 queue_capacity=1000,
 num_threads=1
)

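Comparing this with our call to the function, you can see how the x values are passed as a dictionary of NumPy arrays (compatible with tensors) and that y is passed as is. At this stage, we have not specified a number of epochs, that is, the function will run forever (the steps are specified later); our batch size (the number of images presented in one step) is 50; and the data is shuffled in a queue before each step. The other parameters are left at their default values: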

train_input_fn = tf.compat.v1.estimator.inputs.numpy_input_fn(
    x={"x": x_train},
        y=y_train,
        num_epochs=None,
        batch_size=50,
        shuffle=True
)
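The meaning of batch_size, num_epochs, and shuffle can be illustrated with a small generator. This is a hypothetical NumPy sketch of the semantics only (numpy_batches is an invented name): the real numpy_input_fn returns an input function that builds queue-based TensorFlow ops, and its shuffling works differently in detail.

```python
import itertools
import numpy as np

def numpy_batches(x, y, batch_size, num_epochs, shuffle, seed=0):
    """Hypothetical generator yielding (features, labels) batches from NumPy arrays."""
    rng = np.random.RandomState(seed)
    epoch = 0
    while num_epochs is None or epoch < num_epochs:   # None means repeat forever
        order = rng.permutation(len(x)) if shuffle else np.arange(len(x))
        for start in range(0, len(x), batch_size):
            chosen = order[start:start + batch_size]
            yield {"x": x[chosen]}, y[chosen]
        epoch += 1

x = np.arange(10, dtype=np.float32).reshape(10, 1)
y = np.arange(10, dtype=np.int32)

# One pass over the data, in order, as for a test input function.
one_epoch = list(numpy_batches(x, y, batch_size=4, num_epochs=1, shuffle=False))
print([len(labels) for _, labels in one_epoch])       # [4, 4, 2]

# With num_epochs=None the stream never ends, so the caller decides when to stop.
endless = numpy_batches(x, y, batch_size=4, num_epochs=None, shuffle=True)
first_five = list(itertools.islice(endless, 5))
print(len(first_five))                                # 5
```

This is why the training call later has to specify a number of steps: with num_epochs=None, the input itself never signals the end of the data.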

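It is worth mentioning that although convenience functions such as this are not available in TensorFlow 2.0 alpha itself (hence the tf.compat.v1 prefix), they are expected to return to TensorFlow 2.

The test function has the same signature but, in this case, we specify just one epoch and, as recommended by Google, we do not shuffle the data. Again, the remaining parameters are left at their default values: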

test_input_fn = tf.compat.v1.estimator.inputs.numpy_input_fn(
    x={"x": x_test},
        y=y_test,
        num_epochs=1,
        shuffle=False
)

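Next, we establish our feature columns. Feature columns are a way of passing data to an estimator.

The signature of the feature column function is as follows. key is a unique string, the column name, which corresponds to the dictionary key we specified previously in our input functions (the original text links to more detail on the different types of feature column):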

tf.feature_column.numeric_column(
    key,
    shape=(1,),
    default_value=None,
    dtype=tf.float32,
    normalizer_fn=None
)

In our particular feature column, we can see that the key is "x" and that the shape is just the 28 x 28 pixel shape of the images of the fashion_mnist dataset:

feature_columns = [tf.feature_column.numeric_column("x", shape=[28, 28])]

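Next, we instantiate our estimator, which will do the classification. It constructs a deep neural network for us. Its signature is long and detailed, so we refer you to its documentation (linked from the original text), as we will be using mostly its default parameters. Its first argument is the features we have just specified, and the second is the size of our network, that is, its hidden layers. (The input layer and the output layer are added behind the scenes by the estimator.) AdamOptimizer is a safe choice of optimizer. n_classes corresponds to the number of y labels in our fashion_mnist dataset, and we have added a modest dropout of 0.1. Then, model_dir is the directory in which our model parameters and its graph and checkpoints are saved. This directory is also used to reload checkpoints into an estimator to continue training: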

# Build 2 layer DNN classifier
classifier = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[256, 32],
    optimizer=tf.compat.v1.train.AdamOptimizer(learning_rate),
    n_classes=10,
    dropout=0.1,
    model_dir="./tmp/mnist_modelx",
    loss_reduction=tf.compat.v1.losses.Reduction.SUM)

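Now, we are ready to train our model. If you run the .train loop for a second or subsequent time, the estimator will load its model parameters from model_dir and train for a further steps number of steps (to start completely afresh, just delete the directory specified by model_dir):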

classifier.train(input_fn=train_input_fn, steps=10000)

Typical output lines look like this:

INFO:tensorflow:loss = 25.540459, step = 1600 (0.179 sec)
INFO:tensorflow:global_step/sec: 523.471

The final output looks like this:

INFO:tensorflow:Saving checkpoints for 10000 into ./tmp/mnist_modelx/model.ckpt.
INFO:tensorflow:Loss for final step: 13.06977.
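The loss values in these logs can look large. A plausible reading, assuming that passing Reduction.SUM makes the logged loss the sum over the 50 examples in a batch, is to divide by the batch size to get a per-example figure, as in this small sketch:

```python
batch_size = 50
# Loss values copied from the log lines above.
summed_losses = {"step 1600": 25.540459, "final step": 13.06977}

# Assumption: each logged value is a sum over one batch of 50 examples.
mean_losses = {name: loss / batch_size for name, loss in summed_losses.items()}
for name, value in mean_losses.items():
    print(f"{name}: {value:.4f} per example")
```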

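The directory specified in model_dir looks as follows:

[Figure: contents of the model_dir directory]

The classifier.evaluate method is used to assess the model's performance. Its signature is as follows:

classifier.evaluate(input_fn, steps=None, hooks=None, checkpoint_path=None, name=None)

This returns a dictionary, so in our call we extract the accuracy metric.

Here, steps defaults to None. This evaluates the model until input_fn raises an end-of-input exception, that is, it evaluates the entire test set: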

accuracy_score = classifier.evaluate(input_fn=test_input_fn)["accuracy"]
print("\nTest Accuracy: {0:f}%\n".format(accuracy_score*100))

We can also view the progress of training in TensorBoard using the following command:

tensorboard --logdir=./tmp/mnist_modelx

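Here, the loss graph looks as follows, where the x axis is in units of 1,000 (k):

[Figure: TensorBoard loss graph]

This concludes our look at an estimator classifier for the fashion dataset. We will now look at TensorFlow Hub.

TensorFlow Hub

TensorFlow Hub is a software library. Its purpose is to provide reusable components (known as modules) that can be used in contexts other than the one in which they were originally developed. By a module, we mean a self-contained piece of a TensorFlow graph, together with its weights, that can be reused across other, similar tasks.

IMDb (database of movie reviews)

In this section, we will examine an application, based on one by Google, that performs sentiment analysis on a subset of the IMDb database of movie reviews. The subset is hosted by Stanford and contains reviews of movies, each with a sentiment positivity rating of 1 to 4 (poor) or 7 to 10 (good). The problem is to determine the polarity of the views expressed in the text about each movie, that is, to decide for each review whether it is positive or negative. We will use a module from TensorFlow Hub that has been pretrained to generate word embeddings.

Word embeddings are vectors of numbers such that words with similar meanings have similar vectors. This is an example of supervised learning, because the training set of reviews is used to train the model with the positivity values supplied by the IMDb database. We will then use the trained model on the test set and see how its predictions compare with the polarities stored in the IMDb database, which gives us a measure of accuracy.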
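The idea that similar words have similar vectors can be sketched with cosine similarity. The three-dimensional vectors below are made up purely for illustration; they are not taken from any TensorFlow Hub module, whose embeddings have many more dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy embeddings, invented for this example only.
toy_embeddings = {
    "good": [0.9, 0.1, 0.3],
    "great": [0.8, 0.2, 0.4],
    "terrible": [-0.7, 0.1, 0.2],
}

sim_close = cosine_similarity(toy_embeddings["good"], toy_embeddings["great"])
sim_far = cosine_similarity(toy_embeddings["good"], toy_embeddings["terrible"])
print(sim_close > sim_far)  # True
```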

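A citation for the paper describing this database can be found on the dataset's web page.

The dataset

The following is taken from the README file that accompanies the database: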

"The core dataset contains 50,000 reviews split evenly into 25k train and 25k test sets. The overall distribution of labels is balanced (25k pos and 25k neg)."
"In the entire collection, no more than 30 reviews are allowed for any given movie because reviews for the same movie tend to have correlated ratings. Further, the train and test sets contain a disjoint set of movies, so no significant performance is obtained by memorizing movie-unique terms and their associated with observed labels. In the labeled train/test sets, a negative review has a score <= 4 out of 10, and a positive review has a score >= 7 out of 10. Thus, reviews with more neutral ratings are not included in the train/test sets."

Here are the first five rows of the IMDb training set:

	sentence	sentiment	polarity
0	`I came here for a review last night before dec...`	3	0
1	`Look, I'm reading and reading these comments and...`	4	0
2	`I was overtaken by the emotion. Unforgettable ...`	10	1
3	`This movie could have been a decent B-movie if...`	4	0
4	`I have a thing for old black and white movies ...`	10	1

Here are its last five rows:

	sentence	sentiment
24995	`I have watched some pretty poor films in the p...`	1
24996	`This film is a calculated attempt to cash in t...`	1
24997	`This movie was so very badly written. The char...`	1
24998	`I am a huge Stooges fan but the one and only r...`	2
24999	`Well, let me start off by saying how utterly H...`	3

The following is the test set:

[Figure: sample of the test set]

The code

Now, let's look at the code that trains on this data. At the top of the program, we have the usual imports, plus three extra ones that may need to be installed with pip: tensorflow_hub, pandas, and seaborn. As mentioned earlier, we will use a module from tensorflow_hub; we will also use some of the DataFrame features of pandas and some of the plotting methods of seaborn:

import tensorflow as tf
import tensorflow_hub as hub
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
import re
import seaborn as sns

In addition, here are some values and an optimizer that we will need later:

n_classes = 2
hidden_units = [500,100]
learning_rate = 1e-4
steps = 1000
optimizer = tf.optimizers.Adagrad(learning_rate=learning_rate)
# upgrade script gave this:
#optimizer = tf.compat.v1.train.AdagradOptimizer(learning_rate = learning_rate)

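It is important to realize that the IMDb data used here takes the form of a hierarchy of directories.

The top-level IMDb directory contains two subdirectories: train and test. The train and test subdirectories each contain two further subdirectories, pos and neg:

pos: Contains a collection of text files. Each text file is a positive review (polarity of 1).
neg: Contains a collection of text files. Each text file is a negative review (polarity of 0).

The sentiment (7 to 10 or 1 to 4, respectively) is recorded in the filename; for example, the review in the text file named 18_7.txt has a sentiment of 7 (pos), and the review in the text file named 38_2.txt has a sentiment of 2 (neg):

IMDb directory/file hierarchy

We start with a call hierarchy of three functions that fetch and preprocess the review data.

In the first function, load_data(directory), directory_data is a dictionary that is loaded with the data found in directory, which is passed in as an argument, and is then returned as a pandas DataFrame.

The directory_data dictionary is initialized with the keys description and sentiment, each of which is assigned an empty list as its value.

The function then loops over every file in directory and, for each text file, reads its contents (a movie review) and appends them to the description list. It then uses a regular expression to parse the filename and extract the numeric sentiment, which, as shown previously, immediately follows the underscore (_) in the filename. The function appends this numeric sentiment to the sentiment list. When all the .txt files have been looped over, the function returns the dictionary converted into a pandas DataFrame: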

# Load all files from a directory into a Pandas DataFrame.
def load_data(directory):
    directory_data = {}
    directory_data["description"] = []
    directory_data["sentiment"] = []
    for file in os.listdir(directory):
        with tf.io.gfile.GFile(os.path.join(directory, file), "r") as f:
            directory_data["description"].append(f.read())
            # Raw string so that \d and \. reach the regex engine unchanged.
            directory_data["sentiment"].append(re.match(r"\d+_(\d+)\.txt", file).group(1))
    return pd.DataFrame.from_dict(directory_data)
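The filename parsing can be tried on its own. The sketch below uses the same regular expression on the two example filenames mentioned earlier; the sentiment_from_filename helper is hypothetical, and unlike load_data it converts the captured digits to an integer and returns None for names that do not match.

```python
import re

# Same pattern as in load_data: digits, an underscore, then the rating.
FILENAME_PATTERN = re.compile(r"\d+_(\d+)\.txt")

def sentiment_from_filename(filename):
    """Return the rating encoded in a review filename, or None."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        return None
    return int(match.group(1))

print(sentiment_from_filename("18_7.txt"))  # 7
print(sentiment_from_filename("38_2.txt"))  # 2
print(sentiment_from_filename("README"))    # None
```

Note that load_data stores the captured group directly, so its sentiment column holds strings such as "7" rather than integers.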

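The next function, load(directory), calls load_data(directory), as described previously, to create a DataFrame from each of the pos and neg subdirectories. It adds the appropriate polarity to each DataFrame as an extra field. It then returns a new DataFrame consisting of the concatenation of the pos and neg DataFrames, shuffled (sample(frac=1)), with a new numeric index inserted (because we have shuffled the rows):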

# Merge positive and negative examples, add a polarity column and shuffle.
def load(directory):
    positive_df = load_data(os.path.join(directory, "pos"))
    positive_df["polarity"] = 1

    negative_df = load_data(os.path.join(directory, "neg"))
    negative_df["polarity"] = 0
    return pd.concat([positive_df, negative_df]).sample(frac=1).reset_index(drop=True)
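The concatenate, shuffle, and reindex idiom is easier to see on a tiny example. The sketch below uses made-up two-row DataFrames; random_state is added here only to make the shuffle repeatable and is not used in the function above.

```python
import pandas as pd

# Tiny stand-ins for the positive and negative review DataFrames.
positive_df = pd.DataFrame({"description": ["loved it", "a classic"]})
positive_df["polarity"] = 1
negative_df = pd.DataFrame({"description": ["dull", "too long"]})
negative_df["polarity"] = 0

# Concatenation keeps each frame's own index, so labels repeat.
combined = pd.concat([positive_df, negative_df])
print(list(combined.index))  # [0, 1, 0, 1]

# sample(frac=1) returns all rows in random order; reset_index(drop=True)
# then replaces the jumbled index with a fresh 0..n-1 range.
shuffled = combined.sample(frac=1, random_state=0).reset_index(drop=True)
print(list(shuffled.index))  # [0, 1, 2, 3]
```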

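The third and final function is acquire_data(). This uses a Keras utility to fetch the files we need from a Stanford URL if they are not already in the cache. By default, the cache is a directory located at ~/.keras/datasets, and the archive is extracted to this location if necessary. The utility returns a path in the cache, from which the code derives the location of the aclImdb directory. This is then passed to two calls of load() to obtain the training and test DataFrames:

# Download and process the dataset files.
def acquire_data():
    data = tf.keras.utils.get_file(
        fname="aclImdb.tar.gz",
        origin="http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz",
        extract=True)

    train_df = load(os.path.join(os.path.dirname(data), "aclImdb", "train"))
    test_df = load(os.path.join(os.path.dirname(data), "aclImdb", "test"))

    return train_df, test_df

tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)

The first thing the main program does is obtain the training and test pandas DataFrames by calling the functions we have just described: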

train_df, test_df = acquire_data()

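At this point, train_df and test_df contain the data we want to work with.

Before looking at the next snippet, let's look at the signature of the function it uses. This is an estimator utility that returns an input function for feeding a pandas DataFrame into the model: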

tf.compat.v1.estimator.inputs.pandas_input_fn(x, y=None, batch_size=128, num_epochs=1, shuffle=None, queue_capacity=1000, num_threads=1, target_column='target')

The call itself is as follows:

# Training input on the whole training set with no limit on training epochs
train_input_fn = tf.compat.v1.estimator.inputs.pandas_input_fn(train_df, train_df["polarity"], num_epochs=None, shuffle=True)

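Comparing this call with the function signature, we can see that the training DataFrame, train_df, is passed in together with the polarity of each review. num_epochs=None means that there is no limit on the number of training epochs, because we specify the number of steps later; shuffle=True means that the records, that is, the rows of the DataFrame, are read in random order.

Next comes the input function for predicting on the training set: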

# Prediction on the whole training set.
predict_train_input_fn = tf.compat.v1.estimator.inputs.pandas_input_fn(train_df, train_df["polarity"], shuffle=False)

We also have an input function for predicting on the test set:

# Prediction on the test set.
predict_test_input_fn = tf.compat.v1.estimator.inputs.pandas_input_fn(test_df, test_df["polarity"], shuffle=False)

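Then, we have the feature column. Feature columns are the intermediaries between the raw data and the estimator. There are nine types of feature column. Depending on their type, they take either numerical or categorical data and convert it into a format suitable for the estimator. The TensorFlow guide to feature columns has an excellent description, with many examples.

Note that the embedding comes from TensorFlow Hub (the tensorflow_hub package, imported as hub):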

embedded_text_feature_column = hub.text_embedding_column(
    key="description",
    module_spec="https://tfhub.dev/google/nnlm-en-dim128/1")

Next, we have our deep neural network estimator. Estimators are high-level tools for working with models.

Examples of estimators include DNNClassifier, a classifier for TensorFlow deep neural networks (used in the following code), and LinearRegressor, an estimator for linear regression problems. The signature of DNNClassifier is as follows:

tf.estimator.DNNClassifier(hidden_units, feature_columns, model_dir=None, n_classes=2, weight_column=None, label_vocabulary=None, optimizer='Adagrad', activation_fn=tf.nn.relu, dropout=None, input_layer_partitioner=None, config=None, warm_start_from=None, loss_reduction='weighted_sum', batch_norm=False)

Let's compare this with our call:

estimator = tf.estimator.DNNClassifier(
    hidden_units=hidden_units,
    feature_columns=[embedded_text_feature_column],
    n_classes=n_classes,
    optimizer=optimizer,  # the Adagrad optimizer defined earlier
    model_dir="./tmp/IMDbModel",
    loss_reduction=tf.compat.v1.losses.Reduction.SUM
)

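We can see that we will use a neural network with hidden layers of 500 and 100 units, the feature column we defined earlier, two output classes (labels), and the Adagrad optimizer.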

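Note that, as in the previous example, because we have specified model_dir, the estimator saves a checkpoint and various model parameters, so that on retraining the model is loaded from that directory and trained for a further steps steps.

We can now train our network with the following code: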

estimator.train(input_fn=train_input_fn, steps=steps);

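The next part of the code creates a confusion matrix for our results.

In our context, a confusion matrix is a chart that shows the following for the trained model:

True positives: reviews with a true positive sentiment that were correctly predicted as positive (bottom right)
True negatives: reviews with a true negative sentiment that were correctly predicted as negative (top left)
False positives: reviews with a true negative sentiment that were incorrectly predicted as positive (top right)
False negatives: reviews with a true positive sentiment that were incorrectly predicted as negative (bottom left)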
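The four cells can be computed by hand on a small example. The pure-Python sketch below uses made-up labels and predictions and follows the layout described above, with rows for true classes and columns for predicted classes; the confusion_matrix helper here is written for illustration and is not the TensorFlow function.

```python
def confusion_matrix(labels, predictions, num_classes=2):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true_class, predicted_class in zip(labels, predictions):
        matrix[true_class][predicted_class] += 1
    return matrix

# Made-up data: 0 is negative, 1 is positive.
labels = [0, 0, 0, 1, 1, 1, 1]
predictions = [0, 0, 1, 1, 1, 0, 1]

matrix = confusion_matrix(labels, predictions)
(tn, fp), (fn, tp) = matrix  # top row: true negatives; bottom row: true positives
print(matrix)  # [[2, 1], [1, 3]]
```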

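The following is the confusion matrix for our training set:

Confusion matrix for the training set

The raw figures are as follows:

| 9,898 | 2,602 |
| 2,314 | 10,186 |

Note that the total is 25,000, the number of training examples we used.

This is the confusion matrix for our test set:

Confusion matrix for the test set

The raw figures are as follows:

| 9,859 | 2,641 |
| 2,500 | 10,000 |

For a confusion matrix, the important thing is that the values on the diagonal (top left to bottom right) are much higher than the values off the diagonal. We can see immediately from the confusion matrices that our model performs well on both the training and test sets (if slightly worse on the test set).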
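The diagonal can be turned into a single number: accuracy is the sum of the diagonal divided by the total. The short sketch below applies this to the raw figures quoted above; the accuracy helper is written for illustration.

```python
def accuracy(matrix):
    """Fraction of examples on the diagonal of a square confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Raw figures from the training and test confusion matrices above.
train_matrix = [[9898, 2602], [2314, 10186]]
test_matrix = [[9859, 2641], [2500, 10000]]

print(f"train: {accuracy(train_matrix):.4f}")  # train: 0.8034
print(f"test:  {accuracy(test_matrix):.4f}")   # test:  0.7944
```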

In the code, we first have a function that gets the predictions:

def get_predictions(estimator, input_fn):
    return [prediction["class_ids"][0] for prediction in estimator.predict(input_fn=input_fn)]

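TensorFlow has a method for creating confusion matrices (which, as shown previously, can be displayed as raw figures).

Its signature is as follows:

tf.math.confusion_matrix(labels, predictions, num_classes=None, dtype=tf.int32, name=None, weights=None)

Here, labels are the true labels.

Our code calls the method as follows: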

confusion_train = tf.math.confusion_matrix(labels=train_df["polarity"], predictions=get_predictions(estimator, predict_train_input_fn))
print("Raw figures:")
print(confusion_train.numpy())

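Next, we normalize the confusion matrix so that each of its rows sums to 1. The code divides by half of the grand total, which equals each row total here only because the two classes are balanced (12,500 reviews each):

# Normalize the confusion matrix so that each row sums to 1.
# Each row holds half of all examples, so dividing by sum/2 normalizes the rows.
top = confusion_train.numpy()
bottom = np.sum(top)
confusion_train = 2*top/bottom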
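For data where the classes are not balanced, dividing each row by its own total is the more general form. The NumPy sketch below, using the raw training figures quoted earlier, is an alternative written for illustration rather than the book's code, and it checks that both approaches agree on this balanced dataset.

```python
import numpy as np

# Raw training-set confusion matrix from earlier in the chapter.
raw = np.array([[9898, 2602], [2314, 10186]])

# General form: divide every row by that row's total.
row_normalized = raw / raw.sum(axis=1, keepdims=True)

# Shortcut used in the chapter: valid only when both rows have equal totals.
shortcut = 2 * raw / np.sum(raw)

print(row_normalized.sum(axis=1))             # each row sums to 1
print(np.allclose(row_normalized, shortcut))  # True for balanced classes
```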

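Finally, we plot the confusion matrix using the seaborn heatmap method. Its signature is long and detailed, so the easiest way to view it is to place the cursor on it in a Jupyter notebook and press Shift + Tab.

We need only four arguments here: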

sns.heatmap(confusion_train, annot=True, xticklabels=LABELS, yticklabels=LABELS)
plt.xlabel("Predicted")
plt.ylabel("True")

Here, LABELS is defined as follows (it must be defined before the call to heatmap):

LABELS = ["negative", "positive"]

The code to display the confusion matrix for the test set is identical, except that the test set is used in place of the training set:

# Create a confusion matrix on test data.
confusion_test = tf.math.confusion_matrix(labels=test_df["polarity"], predictions=get_predictions(estimator, predict_test_input_fn))
print(confusion_test.numpy())
# Normalize the confusion matrix so that each row sums to 1.
top = confusion_test.numpy()
bottom = np.sum(top)
confusion_test = 2*top/bottom
sns.heatmap(confusion_test, annot=True, xticklabels=LABELS, yticklabels=LABELS);
plt.xlabel("Predicted");
plt.ylabel("True");

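This concludes our look at sentiment analysis with IMDb.

Summary

In this chapter, we introduced estimators, which we used to train on the fashion dataset. We saw how estimators provide a simple, intuitive API for TensorFlow.

We then looked at another application, this time the sentiment classification of movie reviews from IMDb. We saw how TensorFlow Hub provides us with text embeddings, that is, vectors for words, in which words with similar meanings have similar vectors.

In this book, we have seen an overview of TensorFlow 2.0 alpha.

10. Converting from tf1.12 to tf2

Google provides a command-line script named tf_upgrade_v2 that converts version 1.12 files (both .py and .ipynb files) into TensorFlow 2-compatible files.

The syntax for this conversion is as follows:

tf_upgrade_v2 --infile file_to_convert --outfile converted_file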

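A demonstration of the upgrade script in action, together with further details about it, is available in the TensorFlow documentation.

It is important to note that you should not update parts of your code manually before running the script.

The script will not fix everything, but the report it generates identifies the issues that must be fixed manually.

In particular, tf.contrib has been removed from TF2, so functions that previously lived there must be tracked down and fixed manually.

Here is an example of a report generated by the script: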

Processing file 'Chapter1_TF2_Snippets.ipynb'
 outputting to 'Chapter1_TF2_alpha'
 --------------------------------------------------------------------------------

 37:4: INFO: Added keywords to args of function 'tf.size'
 48:13: INFO: Added keywords to args of function 'tf.transpose'
 74:0: INFO: Added keywords to args of function 'tf.reduce_mean'
 75:0: INFO: Added keywords to args of function 'tf.reduce_mean'
 76:0: INFO: Added keywords to args of function 'tf.reduce_mean'
 77:0: INFO: Added keywords to args of function 'tf.reduce_mean'
 78:0: INFO: Added keywords to args of function 'tf.reduce_mean'
 110:4: INFO: Added keywords to args of function 'tf.argmax'
 114:4: INFO: Added keywords to args of function 'tf.argmin'
 121:4: INFO: Added keywords to args of function 'tf.argmax'
 123:4: INFO: Added keywords to args of function 'tf.argmin'
 127:4: INFO: Added keywords to args of function 'tf.argmax'
 129:4: INFO: Added keywords to args of function 'tf.argmin'
 136:0: ERROR: Using member tf.contrib.integrate.odeint in deprecated module tf.contrib. tf.contrib.integrate.odeint cannot be converted automatically. tf.contrib will not be distributed with TensorFlow 2.0, please consider an alternative in non-contrib TensorFlow, a community-maintained repository, or fork the required code.
 162:10: INFO: Added keywords to args of function 'tf.transpose'
 173:11: INFO: Added keywords to args of function 'tf.reduce_mean'
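In a long report, the entries that need manual attention are the ERROR lines. The sketch below picks them out, assuming every entry has the line:column: LEVEL: message shape seen in the report above; the manual_fixes helper is hypothetical, and the sample text is abridged from that report.

```python
import re

# Assumed entry shape, based on the report above: "line:col: LEVEL: message".
REPORT_LINE = re.compile(r"^\s*(\d+):(\d+): (\w+): (.*)$")

def manual_fixes(report_text):
    """Return (line, column, message) for every ERROR entry."""
    fixes = []
    for line in report_text.splitlines():
        match = REPORT_LINE.match(line)
        if match and match.group(3) == "ERROR":
            fixes.append((int(match.group(1)), int(match.group(2)), match.group(4)))
    return fixes

# Abridged sample taken from the report above.
sample_report = """\
 37:4: INFO: Added keywords to args of function 'tf.size'
 136:0: ERROR: Using member tf.contrib.integrate.odeint in deprecated module tf.contrib.
 162:10: INFO: Added keywords to args of function 'tf.transpose'
"""

fixes = manual_fixes(sample_report)
for line_no, column, message in fixes:
    print(line_no, column, message)
```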

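Part 1: Introduction to TensorFlow 2.00 Alpha

In this part, we introduce TensorFlow 2.00 alpha. We begin with an overview of the main features of this machine learning ecosystem and look at some examples of its use. We then introduce TensorFlow's high-level Keras API. We end the part by examining artificial neural network techniques.

This part contains the following chapters:

Chapter 1, "Introduction to TensorFlow 2"
Chapter 2, "Keras, a High-Level API for TensorFlow 2"
Chapter 3, "TensorFlow 2 and ANN Techniques"

Part 2: Supervised and Unsupervised Learning in TensorFlow 2.00 Alpha

In this part, we first see a number of applications of TensorFlow in supervised machine learning, including linear regression, logistic regression, and clustering. We then look at unsupervised learning, in particular autoencoding applied to data compression and denoising.

This part contains the following chapters:

Chapter 4, "TensorFlow 2 and Supervised Machine Learning"
Chapter 5, "TensorFlow 2 and Unsupervised Learning"

Part 3: Neural Network Applications of TensorFlow 2.00 Alpha

In this part, we examine a number of artificial neural network (ANN) applications. These include image recognition, neural style transfer, text style generation, fashion recognition, and sentiment analysis of the IMDb database of movie reviews.

This part contains the following chapters:

Chapter 6, "Recognizing Images with TensorFlow 2"
Chapter 7, "TensorFlow 2 and Neural Style Transfer"
Chapter 8, "TensorFlow 2 and Recurrent Neural Networks"
Chapter 9, "TensorFlow Estimators and TensorFlow Hub"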

Posted 2026-03-25 10:35 by 布客飞龙II
