Simon J. D. Prince, Computer Vision: Models, Learning and Inference

13 SIFT

13.2.3 SIFT detector

https://blog.csdn.net/jinshengtao/article/details/50167533

scale-invariant feature transform

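Theory:

Goal: detect, describe, and match local feature points in an image (identifying interest points).
Output: the feature points (key points / interest points) and their descriptors.

Key point information: coordinates, orientation, neighborhood diameter.
Descriptor information: used to match feature points across multiple images. I am not yet clear on the details; they are in 13.3, which was not assigned reading.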

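Pipeline / principle:

Difference-of-Gaussians (DoG) filtering: filter the image with difference-of-Gaussian kernels at K increasingly coarse scales, giving K scale layers.
Search the resulting $I \times J \times K$ 3D volume ($I, J$ are the image dimensions) for extrema: a voxel is an extremum if it is larger than, or smaller than, all $26$ neighbors in its $3 \times 3 \times 3$ cube.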
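The extremum test over the 26 neighbors can be sketched as below. This is a minimal illustration only: the function name `find_scale_space_extrema` and the toy volume are assumptions made for the example, and a real detector would vectorize the loops and build the volume from actual DoG responses.

```python
import numpy as np

def find_scale_space_extrema(dog):
    """Return (i, j, k) indices whose value is strictly greater than, or strictly
    less than, all 26 neighbours in the surrounding 3 x 3 x 3 cube."""
    I, J, K = dog.shape
    extrema = []
    for i in range(1, I - 1):
        for j in range(1, J - 1):
            for k in range(1, K - 1):
                cube = dog[i - 1:i + 2, j - 1:j + 2, k - 1:k + 2]
                centre = dog[i, j, k]
                # Index 13 is the centre of the flattened 3 x 3 x 3 cube.
                neighbours = np.delete(cube.ravel(), 13)
                if centre > neighbours.max() or centre < neighbours.min():
                    extrema.append((i, j, k))
    return extrema

# Toy volume (hypothetical data): one isolated peak and one isolated trough.
dog = np.zeros((7, 7, 7))
dog[2, 2, 2] = 1.0
dog[4, 4, 4] = -1.0
extrema = find_scale_space_extrema(dog)
```

Border voxels are skipped because they do not have a full set of 26 neighbors.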

Refinement of SIFT detector candidates:

Eliminate candidates in smooth regions.
Eliminate candidates on edges, using the image structure tensor.

Implementation:

As in the Harris corner detector, compute a matrix / tensor $S_{i,j}$ at each position and examine its singular values.

Local quadratic approximation

Definition: a second-order Taylor expansion around the candidate.

Each extremum is localized to sub-voxel accuracy by fitting a local quadratic approximation and returning the position of its peak or trough.

Effect: this provides a position estimate that has sub-pixel resolution and an estimate of the scale that is more accurate than the resolution of the scale sampling.

The final result is a set of key points, each of which is assigned a unique orientation.

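Compute the amplitude and orientation of the local gradients in a region around the interest point; the size of the region is proportional to the identified scale.

Compute an orientation histogram with 36 bins covering $360°$.

Each gradient's contribution to the histogram depends on its amplitude and is weighted by a Gaussian centered on the interest point, so nearby regions contribute more.

The orientation of the interest point is assigned to be the peak of this histogram. If there is a second peak within 80% of the maximum, we may compute descriptors at two orientations at this point. Each final detected point is therefore associated with a particular orientation and scale.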
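A rough sketch of the orientation assignment follows. The function name, the Gaussian width `sigma`, and the rule of reporting bin centers are assumptions made for this example; angles are measured in image coordinates (y pointing down).

```python
import numpy as np

def dominant_orientations(patch, sigma=1.5, n_bins=36, peak_ratio=0.8):
    """Return the orientation histogram and the bin-centre angles (degrees) of
    every local peak that reaches peak_ratio of the largest bin."""
    gy, gx = np.gradient(patch.astype(float))  # derivatives along rows (y), columns (x)
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 360.0
    h, w = patch.shape
    yy, xx = np.mgrid[0:h, 0:w]
    # Gaussian weight centred on the patch centre (the interest point).
    weight = np.exp(-((yy - (h - 1) / 2) ** 2 + (xx - (w - 1) / 2) ** 2)
                    / (2 * sigma ** 2))
    bin_width = 360.0 / n_bins
    bins = (angle // bin_width).astype(int) % n_bins
    hist = np.bincount(bins.ravel(), weights=(magnitude * weight).ravel(),
                       minlength=n_bins)
    # A peak is a bin larger than both circular neighbours.
    is_peak = (hist > np.roll(hist, 1)) & (hist > np.roll(hist, -1))
    keep = is_peak & (hist >= peak_ratio * hist.max())
    return hist, [(b + 0.5) * bin_width for b in np.flatnonzero(keep)]

# Toy patch: brightness increases left to right, so every gradient points along +x.
patch = np.tile(np.arange(9.0), (9, 1))
hist, orientations = dominant_orientations(patch)
```

For this ramp all of the weight falls into the first bin, so a single orientation is reported; a patch with two comparably strong gradient directions would return two angles.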

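Result of the SIFT detector: each final interest point is drawn as an arrow. The length of the arrow indicates the scale at which the interest point was identified, and its angle indicates the associated orientation. Where the orientation at a position is not unique, two interest points are used, one for each orientation.

Open question: what exactly are the descriptors computed for the SIFT (13.2.3) key points used for, and does our project need them?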

13.3 Descriptors

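compact representations

Histograms

Gradients are computed for every pixel in a region around the interest point.
The region is subdivided into cells, and the information is pooled within each cell to form an 8D histogram.
These histograms are concatenated to give the final descriptor. Pooling locally provides invariance to small deformations while retaining some spatial information about the image gradients.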

13.3.2 SIFT descriptors

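Each key point is associated with a particular scale and rotation, so the SIFT descriptor is typically computed over a square region transformed by these values.

The goal is to characterize the image region in a way that is partially invariant to intensity and contrast changes and to small geometric deformations.

Pipeline:

Input: maps of gradient orientation and amplitude over the detected region.
The orientation is quantized into 8 bins (divide the full circle into eight directions).
The 16 × 16 detector region is then divided into a regular 4 × 4 grid of non-overlapping cells (each cell is 4 × 4 pixels).
Within each cell, an eight-dimensional histogram of the gradient orientations is computed.
Each contribution to a histogram is weighted by the associated gradient amplitude, and positions farther from the interest point contribute less.
The 4 × 4 = 16 histograms are concatenated into a single 128 × 1 vector, which is then normalized.
The descriptor is invariant to constant intensity changes because it is based on gradients.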
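The pipeline above can be sketched in a few lines. This is a simplified illustration, not the reference SIFT implementation: the function name, the Gaussian width `sigma`, and hard assignment of each pixel to a single bin are assumptions made for the example.

```python
import numpy as np

def sift_like_descriptor(magnitude, angle_deg, sigma=8.0):
    """Turn 16 x 16 maps of gradient magnitude and orientation (degrees)
    into a unit-length 128-vector (4 x 4 cells, 8 orientation bins each)."""
    assert magnitude.shape == (16, 16) and angle_deg.shape == (16, 16)
    yy, xx = np.mgrid[0:16, 0:16]
    # Positions farther from the region centre contribute less.
    weight = np.exp(-((yy - 7.5) ** 2 + (xx - 7.5) ** 2) / (2.0 * sigma ** 2))
    contrib = magnitude * weight
    bins = ((angle_deg % 360.0) // 45.0).astype(int) % 8
    histograms = []
    for cy in range(0, 16, 4):
        for cx in range(0, 16, 4):
            cell = (slice(cy, cy + 4), slice(cx, cx + 4))
            histograms.append(np.bincount(bins[cell].ravel(),
                                          weights=contrib[cell].ravel(),
                                          minlength=8))
    descriptor = np.concatenate(histograms)
    norm = np.linalg.norm(descriptor)
    return descriptor / norm if norm > 0 else descriptor

# Hypothetical input: random gradient maps standing in for a real image region.
rng = np.random.default_rng(0)
mag = rng.random((16, 16))
ang = rng.random((16, 16)) * 360.0
desc = sift_like_descriptor(mag, ang)
desc_double_contrast = sift_like_descriptor(2.0 * mag, ang)
```

Because of the final normalization, doubling every gradient magnitude (a contrast change) leaves the descriptor unchanged.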

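So what is a descriptor actually for?

OpenCV implementation

Goal: obtain key points that are unaffected by image scaling.

Input: a single image.

Descriptors: a set of feature vectors (invariant to translation, scaling, and rotation; reasonably robust to illumination changes and to affine and projective transformations).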

https://docs.opencv.org/4.x/da/df5/tutorial_py_sift_intro.html

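import cv2

# NumPy array of shape (height, width, 3); each pixel holds three channels in BGR order
img = cv2.imread("a.jpg")
if img is None:
    raise FileNotFoundError("a.jpg could not be read")

# Convert to grayscale: a 2D array, each pixel in 0..255, larger is brighter
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

sift = cv2.SIFT_create()

# Second argument is an optional mask (None = whole image); detect() returns a sequence of cv2.KeyPoint
# interest_points = sift.detect(gray, None)

# descriptors has one 128-dimensional row per key point
interest_points, descriptors = sift.detectAndCompute(gray, None)


# for kp in interest_points:
#     print(kp.pt)
#     # Point2f pt;      // position
#     # float size;      // diameter of the key point neighborhood
#     # float angle;     // orientation in [0, 360); a negative value means not used
#     # float response;  // detector response
#     # int octave;      // octave of the image pyramid in which the key point was found
#     # int class_id;    // id usable for clustering


# drawKeypoints(source image, key points, output image, color, flags)
# DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS: also draw size and orientation
out = cv2.drawKeypoints(gray, interest_points, img, flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)

cv2.imwrite("out.jpg", out)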
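As for what the descriptors are good for: the usual use is matching key points between two images by comparing descriptor rows. The sketch below uses plain NumPy on two descriptor arrays of the kind returned by `detectAndCompute`. The function name and the nearest/second-nearest ratio test with threshold 0.75 are assumptions added for illustration; they do not come from the notes above.

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.75):
    """Return (index_in_a, index_in_b) pairs whose nearest neighbour in desc_b is
    clearly closer than the second nearest. desc_b needs at least two rows."""
    # Pairwise Euclidean distances, shape (len(desc_a), len(desc_b)).
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    matches = []
    for i, row in enumerate(dists):
        order = np.argsort(row)
        best, second = order[0], order[1]
        if row[best] < ratio * row[second]:
            matches.append((i, int(best)))
    return matches

# Hypothetical data: desc_a holds slightly perturbed copies of rows 3 and 0 of desc_b.
rng = np.random.default_rng(0)
desc_b = rng.random((5, 128))
desc_a = desc_b[[3, 0]] + 1e-3
matches = match_descriptors(desc_a, desc_b)
```

The matched pairs give point correspondences between images, which is the input needed by the reconstruction problems in chapter 14.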

14 The Pinhole Camera

projective camera

A simple geometric model of how points are projected into a camera.

sparse stereo reconstruction

14.1

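The pinhole camera consists of a closed chamber with a small hole (the pinhole) in the front.

Rays from an object pass through the pinhole and form an inverted image on the back face (the image plane).

optical center: the pinhole itself, taken as the origin.

An arbitrary 3D point is written $\textbf{w} = [u, v, w] ^T$.

Motivating example: given several photographs, recover the original 3D positions. Nearer points are shown in red and farther points in blue.

$w$-axis / optical axis. The intersection of the optical axis with the image plane is the principal point.

focal length: the distance from the principal point to the optical center.

pinhole camera model

Compute the conditional probability $Pr(\textbf{x}|\textbf{w})$: the probability of observing the image point $\textbf{x} = [x,y]^T$ given the $\text{3D}$ point $\textbf{w}$.

In the noise-free model, $\textbf{x}$ is uniquely determined by $\textbf{w}$.

We start from a very simple camera model (the normalized camera) and build up to the full camera parameterization.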

14.1.1 The normalized camera

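The focal length is $1$ and the optical axis is the $w$-axis.
The image plane ($x, y$ axes) is 2D with its origin at the principal point.
The three axes are $x, y, w$.

$\textbf{w} = [u, v, w] ^T $ is mapped to $x = u/w, y = v/w$ (by similar triangles).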

14.1.2 Focal length parameters

Introduce scaling factors $\phi_x, \phi_y$:

\[x = \frac{\phi_x u}{ w},y= \frac{\phi_y v}{ w} \]

(The spacing of the photoreceptors may differ between the two axes.)

These are also called the focal length parameters.

14.1.3 Offset and skew parameters

In most images the pixel position $\textbf{x} = [0, 0]$ is not at the principal point; it is usually the top-left corner.

So we introduce offset parameters $[\delta_x,\delta_y]$:

\[x = \frac{\phi_x u}{ w}+\delta_x,y= \frac{\phi_y v}{ w}+\delta_y \]

Skew parameter $\gamma$: we also introduce a skew term $\gamma$ which moderates the projected position $x$ as a function of the height $v$ in the world. This parameter has no clear physical interpretation, but can help explain the projection of points into the image in practice.

\[x = \frac{\phi_x u+\gamma v}{ w}+\delta_x,y= \frac{\phi_y v}{ w}+\delta_y \]

14.1.4 Position and orientation of camera

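The camera is not always centered at the world origin, so we define a rotation and a translation that map world coordinates into camera coordinates: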

$\textbf{w}' = \Omega \textbf{w} + \tau$

14.1.5 Full pinhole camera model

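Two kinds of parameters:

intrinsic or camera parameters: $\{ \phi_x, \phi_y, \gamma, \delta_x, \delta_y \}$
extrinsic parameters: $\Omega, \tau$

The intrinsic parameters are collected into the intrinsic matrix $\Lambda$.

The pinhole camera model is then written $\textbf{x} = \text{pinhole}[\textbf{w}, \Lambda, \Omega, \tau]$.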
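The full model can be written as a short function. In this sketch the intrinsic matrix is assumed to be arranged as $\Lambda = [[\phi_x, \gamma, \delta_x], [0, \phi_y, \delta_y], [0, 0, 1]]$, and the numeric values are made up for the example.

```python
import numpy as np

def pinhole(w, Lambda, Omega, tau):
    """Project a 3D world point w to a 2D image point."""
    w_cam = Omega @ w + tau      # world coordinates -> camera coordinates
    x_h = Lambda @ w_cam         # apply focal length, skew and offset
    return x_h[:2] / x_h[2]      # perspective division by the depth

# Hypothetical camera: phi_x = phi_y = 500, no skew, offset (320, 240),
# placed at the world origin with no rotation.
Lambda = np.array([[500.0, 0.0, 320.0],
                   [0.0, 500.0, 240.0],
                   [0.0, 0.0, 1.0]])
Omega = np.eye(3)
tau = np.zeros(3)
x = pinhole(np.array([0.1, -0.2, 2.0]), Lambda, Omega, tau)
```

For this point the result is $x = 500 \cdot 0.1 / 2 + 320 = 345$ and $y = 500 \cdot (-0.2) / 2 + 240 = 190$, matching the component-wise formulas of 14.1.3.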

Noise in the measured image positions also has to be accounted for.

14.1.6 Radial distortion

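Real cameras rarely use a pinhole; most use a lens.

Lenses introduce radial distortion, which becomes noticeable when the field of view of the lens is large. It is commonly modeled as a polynomial function of the distance $r$ from the image center:

$x' = x(1 + β_1r^2 + β_2r^4 ) $

$y' = y(1 + β_1r^2 + β_2r^4),$

This distortion is applied right after the perspective projection (the division by $w$), before the focal length, offset, and skew parameters are applied.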
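A direct transcription of the two distortion equations, assuming $r^2 = x^2 + y^2$ in normalized coordinates. The function name and the coefficient values are illustrative only.

```python
def radial_distort(x, y, beta1, beta2):
    """Apply polynomial radial distortion to a normalized image point (x, y)."""
    r2 = x * x + y * y                       # squared distance from the centre
    factor = 1.0 + beta1 * r2 + beta2 * r2 * r2
    return x * factor, y * factor

# Hypothetical coefficients: a point at r = 0.5 with beta1 = 0.1 and beta2 = 0.
xd, yd = radial_distort(0.5, 0.0, 0.1, 0.0)
```

Here the factor is $1 + 0.1 \cdot 0.25 = 1.025$, so the point moves outward from $0.5$ to $0.5125$; a negative $β_1$ would pull it inward instead.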

14.2 Three geometric problems

14.2.1 Problem 1: Learning extrinsic parameters

Recovering the camera pose: the perspective-n-point (PnP) problem, or the exterior orientation problem.

Application: rendering for augmented reality.

Problem

Given $I$ 3D points $\textbf{w}_i$, their projections $\textbf{x}_i$, and the intrinsic parameters $\textbf{Λ}$,

find the rotation $\Omega$ and translation $\tau$ that maximize the product of the likelihoods $Pr(\textbf{x}_i|\textbf{w}_i, \textbf{Λ}, \Omega, \tau)$.

Analysis

This is a maximum likelihood learning problem.

14.2.2 Problem 2: Learning intrinsic parameters

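Calibration: find the optimal $\Lambda$.

All three are estimated together: $\Lambda, \Omega,\tau$.

Calibration requires a known 3D object with distinct points that can be identified and whose projections can be found in the image. A common approach is to build a custom 3D calibration target.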

14.2.3 Problem 3: Inferring 3D world points

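Two cameras: calibrated stereo reconstruction

$J \ge 3$ cameras: multi-view reconstruction.

sparse 3D point cloud: used, for example, by an autonomous vehicle to observe its environment, or to synthesize photographs from new viewpoints.

Formal problem: given $J$ calibrated cameras with known $\Lambda_j, \Omega_j,\tau_j$, and the $J$ projections $\textbf{x}_j$ of a single world point $\textbf{w}$,

find the $\text{3D}$ coordinates of $\textbf{w}$.

triangulation

homogeneous coordinates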

14.2.4 Solving the problems

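The objective functions have no closed-form solution, so nonlinear optimization is required.

This needs a good starting point, which we obtain by optimizing a different objective function whose solution is close to the true optimum.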

14.3 Homogeneous coordinates

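To get good initial estimates of the geometric quantities in the optimization problems above, we use a simple trick: we change the representation of both the 2D image points and the 3D world points so that the projection equations become linear. After this change, closed-form solutions for the unknowns can be found. (It should be emphasized that these solutions do not directly address the original optimization criterion: they minimize a more abstract objective function based on algebraic error, whose solution is not guaranteed to match that of the original problem. They are, however, generally close enough to provide a good starting point for the nonlinear optimization of the true cost function.)

2D to 3D homogeneous coordinates: $\tilde{x} = \lambda [x, y, 1]^T$,

3D to 4D works the same way, appending a $1$: $\tilde w = \lambda [u, v,w,1]^T$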

14.3.1 Camera model in homogeneous coordinates

$\lambda x = \phi_xu +\gamma v+\delta_xw$

$\lambda y = \phi_yv +\delta_yw$

$\lambda = w$

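The division has been eliminated: the mapping from 3D to 2D is now linear.

Adding the extrinsic parameters is also just a matrix multiplication.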
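Collected into one matrix equation, the three relations of this section read as follows. This is only a restatement in the symbols already used here; the arrangement of the intrinsic entries in the matrix is the conventional one and is assumed.

\[ \lambda \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} \phi_x & \gamma & \delta_x & 0 \\ 0 & \phi_y & \delta_y & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} u \\ v \\ w \\ 1 \end{bmatrix} \]

Reading off the rows gives $\lambda x = \phi_x u + \gamma v + \delta_x w$, $\lambda y = \phi_y v + \delta_y w$, and $\lambda = w$. With the extrinsic parameters included, the world point is first mapped by $[\Omega, \tau]$:

\[ \lambda \tilde{x} = \Lambda [\Omega, \tau] \tilde{w} \]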

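14.4 Learning extrinsic parameters

This is the problem from 14.2.1, rewritten as a matrix product in the homogeneous coordinates above.

The problem is non-convex, so it has no convenient guarantees.

Remove the effect of the intrinsic parameters $\Lambda$ by premultiplying both sides by $\Lambda^{-1}$.

The resulting image coordinates are called normalized image coordinates: they are what the normalized camera from 14.1.1 would have produced.

In homogeneous coordinates the scale $\lambda$ can then be eliminated by simple substitution.

Each point gives two linear equations; the equations from all $I$ points are stacked into a single matrix.

The problem becomes: given $\text{A}$, find the vector $\text{b}$ that minimizes $|Ab|^2$ subject to $|b| = 1$ (which rules out the trivial solution $b = 0$).

This can be solved with the singular value decomposition.

The estimated rotation part is generally not a valid rotation and has an arbitrary scale. The closest orthogonal matrix is found by solving an orthogonal Procrustes problem, and the scale factor is estimated as the average ratio of the nine matrix coefficients between the initial estimate and the corrected rotation.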
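The two SVD steps mentioned in this section can be sketched independently of the camera geometry. The function names and the toy matrices are assumptions made for the example, and the determinant check is an added safeguard to return a proper rotation.

```python
import numpy as np

def min_norm_direction(A):
    """Unit vector b minimising |A b|^2: the right singular vector that
    belongs to the smallest singular value of A."""
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1]

def closest_rotation(M):
    """Orthogonal Procrustes: the rotation closest to M in Frobenius norm."""
    U, _, Vt = np.linalg.svd(M)
    if np.linalg.det(U @ Vt) < 0:   # avoid returning a reflection
        U[:, -1] *= -1
    return U @ Vt

# Toy check 1: every row of A is orthogonal to b_true, so A @ b_true = 0.
b_true = np.array([1.0, 2.0, 2.0]) / 3.0
A = np.array([[2.0, -1.0, 0.0],
              [0.0, 1.0, -1.0],
              [2.0, 0.0, -1.0]])
b = min_norm_direction(A)

# Toy check 2: a rotation about the w-axis that has been scaled by 2.5.
c, s = np.cos(0.3), np.sin(0.3)
R_true = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
R = closest_rotation(2.5 * R_true)
```

The sign of `b` is arbitrary, which is one reason the scale and sign of the estimate have to be fixed afterwards.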

14.5 Learning intrinsic parameters

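Coordinate ascent method (inefficient)

First fix an estimate of the extrinsic parameters, then solve for the intrinsic parameters; this step does not even need homogeneous coordinates.

At this point $\textbf{x}, \textbf{w}, \Omega,\tau$ are known.

The maximum likelihood problem becomes a least squares problem,

which can also be written in linear matrix form.

A better approach optimizes all parameters simultaneously, for example with Newton's method.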

14.6 Inferring 3D world points

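Finally we can localize points. Given the projections $\textbf{x}_j$ and the parameters $\Lambda_j, \Omega_j,\tau_j$ of $J$ calibrated cameras, we can compute the 3D point $\textbf{w}$.

The constraints from the $J$ cameras can be written as a linear matrix system in $\textbf{w}$; solving it by least squares gives a good starting point.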
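A minimal sketch of this linear triangulation, assuming each camera is summarized by its $3 \times 4$ matrix $P_j = \Lambda_j[\Omega_j, \tau_j]$. The function names and the two-camera setup are made up for the example; the result minimizes an algebraic error, so it is only a starting point for nonlinear refinement.

```python
import numpy as np

def project(P, w):
    """Project 3D point w with a 3 x 4 camera matrix P."""
    x_h = P @ np.append(w, 1.0)
    return x_h[:2] / x_h[2]

def triangulate(xs, Ps):
    """Least-squares 3D point from image points xs and camera matrices Ps."""
    rows, rhs = [], []
    for (x, y), P in zip(xs, Ps):
        # x * (P[2] . w~) = P[0] . w~, rearranged to be linear in w (same for y).
        rows.append(x * P[2, :3] - P[0, :3])
        rhs.append(P[0, 3] - x * P[2, 3])
        rows.append(y * P[2, :3] - P[1, :3])
        rhs.append(P[1, 3] - y * P[2, 3])
    w, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return w

# Hypothetical stereo pair: same intrinsics, second camera shifted sideways.
Lambda = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
P1 = Lambda @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = Lambda @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
w_true = np.array([0.2, -0.1, 4.0])
w_est = triangulate([project(P1, w_true), project(P2, w_true)], [P1, P2])
```

Each camera contributes two equations, so two cameras already give four equations for the three unknowns.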

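correspondence: finding the $J$ image points that are projections of the same world point.

Reconstruction is also possible in the uncalibrated case. This is known as projective reconstruction, because the result is ambiguous up to a 3D projective transformation. If a single camera was used to take all of the images, it is possible to estimate the single intrinsic matrix and the extrinsic parameters from the sequence and to reconstruct the points in the scene up to a constant scaling factor. Chapter 16 presents an extended discussion of this method.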

14.7 Applications

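Two applications:

Building 3D models.
Generating new views of an object from an approximate model built from its silhouettes.

14.7.1 Depth from structured light

14.6 computes the depth of a point from several calibrated cameras, but does not discuss how to match the corresponding points between images.

The full answer is in Chapter 16.

Here there is one projector and one camera, with known intrinsic parameters and known relative position.

The projector has an optical center and a regular pixel array that is analogous to the sensor in the camera.

Each projector pixel corresponds to one ray of light.

The most basic method: light up one pixel at a time and find its correspondence.
structured light: a sequence of horizontal and vertical stripe patterns.
- Top half bright and bottom half dark in $I_1$, inverted in $I_2$, then subdivide repeatedly, as in a binary search.
- Gray codes address problems such as artifacts.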
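A sketch of Gray-coded stripe patterns for one axis of the projector. The function names are made up for the example, and the code assumes every pattern is observed without error. The property that makes Gray codes attractive is that neighboring columns differ in exactly one pattern, so a misread bit at a stripe boundary shifts the decoded column by at most one.

```python
def binary_to_gray(n):
    """Gray code of the non-negative integer n."""
    return n ^ (n >> 1)

def gray_to_binary(g):
    """Inverse of binary_to_gray."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

def stripe_patterns(n_columns, n_bits):
    """patterns[b][c] is 1 if projector column c is lit in pattern b (MSB first)."""
    return [[(binary_to_gray(c) >> (n_bits - 1 - b)) & 1 for c in range(n_columns)]
            for b in range(n_bits)]

def decode_column(bits):
    """Recover the projector column from the bits seen at one camera pixel."""
    g = 0
    for bit in bits:
        g = (g << 1) | bit
    return gray_to_binary(g)

# Eight projector columns need three patterns.
patterns = stripe_patterns(8, 3)
decoded = [decode_column([p[c] for p in patterns]) for c in range(8)]
```

With $n$ patterns, $2^n$ columns can be told apart, compared with one pattern per pixel in the basic method.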

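14.7.2 Shape from silhouette

Estimate the shape of an object from its silhouettes in several images.

background subtraction approach

What is recovered this way is the visual hull; concave regions cannot be recovered from silhouettes.

$r = Λ^{-1}\tilde{x}$ (this is the ray direction because, in camera coordinates, $\lambda\tilde{x} = Λ\textbf{w}$, so every point on the ray through the pixel is a multiple of $Λ^{-1}\tilde{x}$)

$w = τ + κΩr$

Set up a virtual camera (pretend there is a camera at the new viewpoint) and generate an image from it.

For each pixel, find the first surface point hit by the ray cast through it, and copy the color of the corresponding pixel from another camera (if the correspondence does not fall exactly on one pixel, interpolate, for example with bilinear or bicubic interpolation).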

OpenCV implementation

https://docs.opencv.org/3.4/d9/d0c/group__calib3d.html

https://blog.csdn.net/s1t16/article/details/134380651

Single-camera calibration + 3D reconstruction

15 Models for Transformations

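We now consider the pinhole camera model when the scene is a plane in the world.

There is a one-to-one mapping between points on the plane and points in the image.

This chapter looks at how the three geometric problems play out for a plane.

To motivate the ideas of this chapter, consider an augmented reality application in which 3D content is superimposed on a planar marker.

Two steps:

Estimate the 2D transformation between the marker's points and their image positions.
Extract the rotation and translation from it.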

15.1 2D transformation models

simple $\rightarrow$ general

15.1.1 Euclidean transformation model

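View a fronto-parallel plane.

The distance $D$ from the pinhole to the plane is known.

Points on the plane have coordinates $\textbf{w} = [u,v,0]^T$ in real-world units, such as millimeters.

Pinhole camera model: $λ\tilde{x} = Λ[Ω, τ ]\tilde{w} $; this is the special case in which the third world coordinate is 0.

We move $D$ out of the extrinsic parameters and into the intrinsic matrix.

Premultiplying by $\Lambda^{-1}$ again leaves a Euclidean transformation.

It can also be written without homogeneous coordinates; in Cartesian coordinates it is still a matrix expression.

$x' = \textbf{euc}[w, Ω, τ ]$; the $2 \times 2$ rotation $Ω$ can be parameterized by a single angle $\theta$.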

15.1.2 Similarity transformation model

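Now suppose $D$ is unknown.

Multiply both sides by $\rho = \frac{1}{D}$ (on the left it is absorbed into the scale outside the matrix; on the right it goes into the extrinsic matrix).

$x' = \text{sim}[w, Ω, τ, ρ]. $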

15.1.3 Affine transformation model

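So far the plane was fronto-parallel; now we extend to a more general case.

The plane can now be oriented more freely. As before we use a linear matrix transformation, but the $2 \times 2$ matrix $Φ$ is now arbitrary.

$x' = \textbf{aff}[w, Φ, τ] $

Note that $Λ$ also has the form of an affine transformation, and affine transformations compose by matrix multiplication.

An arbitrary $Φ$ covers rotation and similarity and can additionally shear (for example, turn a square into a parallelogram).

Question: does the affine transformation give a good approximation of the true mapping? It depends on the situation.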

15.1.4 Projective transformation model

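Combine the intrinsic and extrinsic matrices into a single $3 \times 3$ matrix $Φ$.

In homogeneous coordinates the mapping is linear (a matrix product), but in Cartesian coordinates it requires a division, which makes it nonlinear.

$\text{x} = \textbf{hom}[w, Φ]$

$Φ$ has nine entries but only eight degrees of freedom, because its overall scale is arbitrary.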
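The scale ambiguity is easy to see numerically. In this sketch the function name `hom` follows the notation above, and the matrix entries are arbitrary values chosen for the example.

```python
import numpy as np

def hom(w, Phi):
    """Apply the 3 x 3 projective transformation Phi to the 2D point w."""
    x_h = Phi @ np.array([w[0], w[1], 1.0])   # linear in homogeneous coordinates
    return x_h[:2] / x_h[2]                   # division back to Cartesian coordinates

# Hypothetical homography and point.
Phi = np.array([[1.0, 0.1, 2.0],
                [0.0, 1.0, 3.0],
                [0.001, 0.0, 1.0]])
w = np.array([10.0, 20.0])
x_a = hom(w, Phi)
x_b = hom(w, 3.0 * Phi)   # rescaling Phi changes nothing after the division
```

Multiplying $Φ$ by any nonzero constant scales all three homogeneous components equally, and the final division cancels it, which is why one of the nine entries carries no information.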

15.1.5 Adding uncertainty

Add normally distributed noise:

$Pr(x|w) = \text{Norm}_x[ \textbf{hom}[w, Φ], σ^2I] $

15.2 Learning in transformation models

Plane points $\textbf{w}_i$, image points $\textbf{x}_i$.

Transformation family $\textbf{trans}[w_i, θ]$; learn $\theta$.

maximum likelihood approach $\rightarrow$ least squares problem

15.2.1 Learning Euclidean parameters

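Three degrees of freedom, so at least two point pairs are needed.

Set the derivative of the cost with respect to $τ$ to $0$ to obtain the estimate $\hat{τ}$.

Dividing the summed equation by the number of points expresses it through the means: $\hat{τ} = \mu_x - Ω\mu_w $

Substituting back gives a cost of the form $|B - ΩA|$,

a Frobenius norm, which makes this an orthogonal Procrustes problem.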
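A compact sketch of this estimate, assuming $A$ and $B$ hold the mean-centered plane points and image points as columns. The function name and the synthetic data are made up for the example, and the determinant check is an added safeguard against returning a reflection.

```python
import numpy as np

def fit_euclidean(w, x):
    """Fit x ~ Omega @ w + tau from point pairs given as (I, 2) arrays."""
    mu_w, mu_x = w.mean(axis=0), x.mean(axis=0)
    A = (w - mu_w).T                      # 2 x I, centred plane points
    B = (x - mu_x).T                      # 2 x I, centred image points
    # Orthogonal Procrustes: minimise |B - Omega A| in Frobenius norm.
    U, _, Vt = np.linalg.svd(B @ A.T)
    if np.linalg.det(U @ Vt) < 0:         # guard against a reflection
        U[:, -1] *= -1
    Omega = U @ Vt
    tau = mu_x - Omega @ mu_w             # tau_hat = mu_x - Omega mu_w
    return Omega, tau

# Synthetic, noise-free data (hypothetical values).
theta = 0.4
Omega_true = np.array([[np.cos(theta), -np.sin(theta)],
                       [np.sin(theta), np.cos(theta)]])
tau_true = np.array([1.0, -2.0])
rng = np.random.default_rng(0)
w_pts = rng.random((5, 2))
x_pts = w_pts @ Omega_true.T + tau_true
Omega_est, tau_est = fit_euclidean(w_pts, x_pts)
```

With noisy data the same code returns the least-squares rotation and translation rather than an exact recovery.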

15.2.2 Learning similarity parameters

$I \ge 2$

First compute $Ω$, then take the maximum likelihood solution for the scale, $\hat{ρ}$.

15.2.3 Learning affine parameters

Six unknowns

$I \ge 3$

Restack the equations into one large matrix

a linear least squares problem
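The stacked system can be handed directly to a linear least-squares solver. In this sketch the six unknowns are the four entries of $Φ$ and the two entries of $τ$; the function name and the data are made up for the example.

```python
import numpy as np

def fit_affine(w, x):
    """Fit x ~ Phi @ w + tau from point pairs given as (I, 2) arrays, I >= 3."""
    W = np.hstack([w, np.ones((len(w), 1))])     # each row is [u, v, 1]
    # Solve W @ P ~ x for the 3 x 2 parameter matrix P = [Phi^T; tau^T].
    P, *_ = np.linalg.lstsq(W, x, rcond=None)
    return P[:2].T, P[2]

# Three non-collinear points determine the six unknowns exactly.
Phi_true = np.array([[2.0, 0.5], [-0.3, 1.5]])
tau_true = np.array([4.0, -1.0])
w_pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
x_pts = w_pts @ Phi_true.T + tau_true
Phi_est, tau_est = fit_affine(w_pts, x_pts)
```

More than three pairs simply add rows to `W`, and the solver returns the least-squares fit.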

15.2.4 Learning projective parameters

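Eight degrees of freedom, $I \ge 4$

A nonlinear problem

gradient-based optimization techniques

Constrain $Φ$ so that the sum of its squared entries is 1, which removes the scale ambiguity

direct linear transformation or DLT algorithm

Convert to homogeneous coordinates first, then solve for $Φ$

a least squares problem solved with the SVD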
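A minimal sketch of the DLT idea: each point pair contributes two equations that are linear in the nine entries of $Φ$, and the unit-norm solution is the last right singular vector. The function names and the test homography are assumptions for the example; practical implementations usually normalize the coordinates first and refine the result with nonlinear optimization.

```python
import numpy as np

def apply_homography(Phi, w):
    """Map 2D point w through Phi, including the division."""
    x_h = Phi @ np.array([w[0], w[1], 1.0])
    return x_h[:2] / x_h[2]

def fit_homography_dlt(w, x):
    """Estimate Phi (up to scale) from I >= 4 point pairs given as (I, 2) arrays."""
    rows = []
    for (u, v), (xi, yi) in zip(w, x):
        # From y * (phi_3 . w~) = phi_2 . w~ and x * (phi_3 . w~) = phi_1 . w~.
        rows.append([0, 0, 0, -u, -v, -1, yi * u, yi * v, yi])
        rows.append([u, v, 1, 0, 0, 0, -xi * u, -xi * v, -xi])
    _, _, Vt = np.linalg.svd(np.array(rows, dtype=float))
    return Vt[-1].reshape(3, 3)       # unit-norm solution of the stacked system

# Hypothetical ground truth and the four corners of a unit square.
Phi_true = np.array([[1.2, 0.1, 0.5],
                     [-0.2, 0.9, 0.3],
                     [0.1, 0.2, 1.0]])
w_pts = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
x_pts = np.array([apply_homography(Phi_true, p) for p in w_pts])
Phi_est = fit_homography_dlt(w_pts, x_pts)
Phi_est = Phi_est / Phi_est[2, 2]     # fix the arbitrary scale for comparison
```

The returned matrix has unit Frobenius norm and an arbitrary sign, so it is rescaled before being compared with the ground truth.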

15.3 Inference in transformation models

four transformations (Euclidean, similarity, affine, projective)

$x = \textbf{trans}[w, θ] $

Given $x$, recover $w$ by inverting the transformation (multiply by the inverse matrix).

15.4 Three geometric problems for planes

So far: 2D-to-2D mappings and correspondences.

Now revisit the three problems.

15.4.1 Problem 1: learning extrinsic parameters

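With the intrinsic parameters known, first estimate the homography $Φ$, then decompose it into $Ω ,τ$ using the SVD followed by a rescaling (algebraic manipulation).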
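One way to carry out this decomposition is sketched below, assuming $Φ = \lambda' Λ[ω_1, ω_2, τ]$ where $ω_1, ω_2$ are the first two columns of $Ω$. The function name is made up, and the scale is estimated here by a least-squares fit rather than by averaging entry ratios, which avoids dividing by entries that happen to be zero.

```python
import numpy as np

def pose_from_homography(Phi, Lambda):
    """Recover rotation Omega and translation tau of a plane from Phi and Lambda."""
    Phi_n = np.linalg.inv(Lambda) @ Phi            # lambda' * [omega_1, omega_2, tau]
    # Closest 3 x 2 matrix with orthonormal columns to the first two columns.
    U, _, Vt = np.linalg.svd(Phi_n[:, :2], full_matrices=False)
    R12 = U @ Vt
    scale = np.sum(Phi_n[:, :2] * R12) / 2.0       # least-squares estimate of lambda'
    tau = Phi_n[:, 2] / scale
    if tau[2] < 0:                                 # keep the plane in front of the camera
        R12, tau = -R12, -tau
    omega3 = np.cross(R12[:, 0], R12[:, 1])        # third column completes the rotation
    return np.column_stack([R12, omega3]), tau

# Synthetic check with hypothetical values; Phi carries an arbitrary factor of -2.
Lambda = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
cx, sx, cz, sz = np.cos(0.3), np.sin(0.3), np.cos(0.2), np.sin(0.2)
Rx = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
Rz = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
Omega_true = Rz @ Rx
tau_true = np.array([0.1, -0.2, 5.0])
Phi = -2.0 * Lambda @ np.column_stack([Omega_true[:, :2], tau_true])
Omega_est, tau_est = pose_from_homography(Phi, Lambda)
```

With a noisy $Φ$ the result is only an initial estimate and would normally be refined by nonlinear optimization.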

15.4.2 Problem 2: learning intrinsic parameters

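camera calibration: the basic calibration procedure is slow but easy to understand. Modern implementations use more sophisticated techniques to find the intrinsic parameters.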

15.4.3 Problem 3: inferring 3D position relative to camera

Given $\textbf{x}$, find $\textbf{w}$.

If the scene is planar the mapping is one-to-one, so the invertible matrix can simply be inverted.

15.5 Transformations between images

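So far we considered one camera viewing one plane. Now consider the mapping between two cameras viewing the same plane: it is still one-to-one, so it can be written as a matrix and inverted directly.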

16 Multiple Cameras

posted @ 2023-11-28 12:58 DMoRanSky
