1. Question 1

You are building a 3-class object classification and localization algorithm. The classes are: pedestrian (c=1), car (c=2), motorcycle (c=3). What would be the label for the following image? Recall $y = [p_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3]$.

2. Question 2

Continuing from the previous problem, what should y be for the image below? Remember that “?” means “don’t care”, which means that the neural network loss function won’t care what the neural network gives for that component of the output. As before, $y = [p_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3]$.

3. Question 3

You are working on a factory automation task. Your system will see a soft-drink can coming down a conveyor belt, and you want it to take a picture and decide whether (i) there is a soft-drink can in the image, and if so (ii) its bounding box. Since the soft-drink can is round, the bounding box is always square, and the soft-drink can always appears as the same size in the image. There is at most one soft-drink can in each image. Here are some typical images in your training set:

What is the most appropriate set of output units for your neural network?

Logistic unit (for classifying if there is a soft-drink can in the image)

Logistic unit, $b_x$ and $b_y$ √

Logistic unit, $b_x$, $b_y$, $b_h$ (since $b_w = b_h$)

Logistic unit, $b_x, b_y, b_h, b_w$

4. Question 4

If you build a neural network that inputs a picture of a person’s face and outputs N landmarks on the face (assume the input image always contains exactly one face), how many output units will the network have?

N

2N √

3N

$N^2$

5. Question 5

When training one of the object detection systems described in lecture, you need a training set that contains many pictures of the object(s) you wish to detect. However, bounding boxes do not need to be provided in the training set, since the algorithm can learn to detect the objects by itself.

True

False √

6. Question 6

Suppose you are applying a sliding windows classifier (non-convolutional implementation). Increasing the stride would tend to increase accuracy, but decrease computational cost.

True

False √
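
The cost half of this statement can be checked with a small count. The sketch below is illustrative only: the function name `count_windows` and the image and window sizes are assumptions, not values from the quiz. It counts how many crops a non-convolutional sliding-windows classifier would have to evaluate for a given stride.

```python
def count_windows(image_size, window_size, stride):
    """Number of positions of a square window slid over a square image."""
    if window_size > image_size:
        return 0
    # Positions along one axis, then squared for the 2-D grid of positions.
    per_axis = (image_size - window_size) // stride + 1
    return per_axis * per_axis


# Hypothetical sizes: a 128x128 image scanned with a 64x64 window.
for stride in (1, 2, 4, 8):
    print(stride, count_windows(image_size=128, window_size=64, stride=stride))
```

A larger stride means fewer crops to classify, so the cost drops, but the windows land on a coarser set of positions, which tends to hurt localization accuracy rather than improve it.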

7. Question 7

In the YOLO algorithm, at training time, only one cell (the one containing the center/midpoint of an object) is responsible for detecting this object.

True √

False
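
A minimal sketch of how the responsible cell could be looked up. It assumes the midpoint $(b_x, b_y)$ is normalized to $[0, 1]$ over the whole image, and the helper name `responsible_cell` is hypothetical.

```python
def responsible_cell(b_x, b_y, grid_size=19):
    """Return (row, col) of the grid cell that contains the box midpoint.

    Assumes b_x and b_y are normalized to [0, 1] over the whole image.
    """
    # Clamp so that a midpoint exactly on the right/bottom edge stays in the grid.
    col = min(int(b_x * grid_size), grid_size - 1)
    row = min(int(b_y * grid_size), grid_size - 1)
    return row, col


print(responsible_cell(0.5, 0.5, grid_size=3))  # middle cell of a 3x3 grid
```

Only this one cell gets $p_c = 1$ for the object in the training label, even when the object's box extends over neighbouring cells.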

8. Question 8

What is the IoU between these two boxes? The upper-left box is 2x2, and the lower-right box is 2x3. The overlapping region is 1x1.

1/6

1/9 √

1/10

None of the above

9. Question 9

Suppose you run non-max suppression on the predicted boxes above. The parameters you use for non-max suppression are that boxes with probability $\leq 0.4$ are discarded, and the IoU threshold for deciding if two boxes overlap is 0.5. How many boxes will remain after non-max suppression?

3

4

5 √

6

7
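
The procedure in this question can be sketched as below. The quiz image is not reproduced here, so the detections are made-up illustrative data and the result does not correspond to the quiz answer. The sketch also assumes suppression is applied per class and that boxes are given as `(x1, y1, x2, y2)` corners.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, score_threshold=0.4, iou_threshold=0.5):
    """detections: list of (score, label, box). Returns the kept detections."""
    # Step 1: discard boxes whose probability is <= the score threshold.
    candidates = sorted(
        (d for d in detections if d[0] > score_threshold),
        key=lambda d: d[0],
        reverse=True,
    )
    kept = []
    # Step 2: go from the highest score down; keep a box only if it does not
    # overlap (IoU >= threshold) an already kept box of the same class.
    for score, label, box in candidates:
        if all(label != k[1] or iou(box, k[2]) < iou_threshold for k in kept):
            kept.append((score, label, box))
    return kept


# Hypothetical detections, not the boxes from the quiz image.
detections = [
    (0.9, "car", (0, 0, 4, 4)),
    (0.6, "car", (1, 0, 5, 4)),         # IoU 0.6 with the 0.9 car: suppressed
    (0.3, "car", (10, 10, 14, 14)),     # score <= 0.4: discarded
    (0.7, "pedestrian", (1, 0, 5, 4)),  # different class: kept
    (0.4, "pedestrian", (20, 20, 22, 22)),  # score <= 0.4: discarded
]
kept = non_max_suppression(detections)
print(len(kept))
```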

10. Question 10

Suppose you are using YOLO on a 19x19 grid, on a detection problem with 20 classes, and with 5 anchor boxes. During training, for each image you will need to construct an output volume

19x19x(5x25) √

19x19x(25x20)

19x19x(5x20)

19x19x(20x25)

-----------------------------------------------------------Chinese version-------------------------------------------------------------------------

Chinese version excerpted from: https://blog.csdn.net/u013733326/article/details/80306093

Detection algorithms

You are building an algorithm that classifies and localizes three kinds of objects: pedestrian (c=1), car (c=2), motorcycle (c=3). Which label is correct for the image below? Note: $y = [p_{c}, b_{x}, b_{y}, b_{h}, b_{w}, c_{1}, c_{2}, c_{3}]$
- 【★】 y=[1, 0.3, 0.7, 0.3, 0.3, 0, 1, 0]
- 【】 y=[1, 0.7, 0.5, 0.3, 0.3, 0, 1, 0]
- 【】 y=[1, 0.3, 0.7, 0.5, 0.5, 0, 1, 0]
- 【】 y=[1, 0.3, 0.7, 0.5, 0.5, 1, 0, 0]
- 【】 y=[0, 0.2, 0.4, 0.5, 0.5, 0, 1, 0]
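Continuing from the previous question, what should y be for the image below? Note: “?” means “don’t care”, which means that the neural network loss function does not care what the network outputs for that component. As before, $y = [p_{c}, b_{x}, b_{y}, b_{h}, b_{w}, c_{1}, c_{2}, c_{3}]$
- 【】 y=[1, ?, ?, ?, ?, 0, 0, 0]
- 【★】y=[0, ?, ?, ?, ?, ?, ?, ?]
- 【】 y=[?, ?, ?, ?, ?, ?, ?, ?]
- 【】 y=[0, ?, ?, ?, ?, 0, 0, 0]
- 【】 y=[1, ?, ?, ?, ?, ?, ?, ?]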
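
The two label formats above can be sketched in code. This is an illustrative sketch, not the course's implementation: `None` stands in for “?”, the helper names are hypothetical, and a plain squared-error loss is assumed for simplicity.

```python
def make_label(box=None, class_index=None, num_classes=3):
    """Build y = [p_c, b_x, b_y, b_h, b_w, c_1, ..., c_K].

    None marks a "don't care" component.
    """
    if box is None:
        # No object: only p_c = 0 is meaningful, everything else is "?".
        return [0] + [None] * (4 + num_classes)
    b_x, b_y, b_h, b_w = box
    one_hot = [1 if k == class_index else 0 for k in range(1, num_classes + 1)]
    return [1, b_x, b_y, b_h, b_w] + one_hot


def squared_error(y_true, y_pred):
    """Sum of squared errors over the components that are not "don't care"."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred) if t is not None)


car = make_label(box=(0.3, 0.7, 0.3, 0.3), class_index=2)
background = make_label()
print(car)
print(background)
```

With this masking, a background image only penalizes the predicted $p_c$; whatever the network outputs for the box and class components is ignored.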
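You are working on a factory automation task. Your system will see a soft-drink can coming down a conveyor belt, and you want it to take a picture and decide whether there is a can in the image, and if so its bounding box. The can is round, so the bounding box is always square, and every can appears at the same size. There is at most one can in each image. Here are some typical training set images. Which set of output units is most appropriate?
- 【】 Logistic unit (for classifying whether there is a can in the image)
- 【★】Logistic unit, $b_{x}$ and $b_{y}$
- 【】 Logistic unit, $b_{x}$, $b_{y}$, $b_{h}$ (since $b_{w} = b_{h}$)
- 【】 Logistic unit, $b_{x}$, $b_{y}$, $b_{h}$, $b_{w}$
Blogger's note: because every can has the same size, we only need to know its center position.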
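If you build a neural network that takes a picture of a face as input and outputs N landmarks (assume the image contains exactly one face), how many output units will the network have?
- 【】 N
- 【★】2N
- 【】 3N
- 【】 $N^{2}$
Blogger's note: the image is two-dimensional, so a position is specified as (x, y), and each landmark needs two output units.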
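When training one of the object detection systems described in the lecture videos, you need a training set that contains many pictures of the objects you wish to detect. However, bounding boxes do not need to be provided in the training set, since the algorithm can learn to detect the objects by itself. Is this statement correct?
- 【】True
- 【★】False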
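Suppose you are applying a sliding windows classifier (non-convolutional implementation). Increasing the stride would tend to increase accuracy, but decrease computational cost.
- 【】True
- 【★】False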
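In the YOLO algorithm, at training time, only one cell (the one containing the center/midpoint of an object) is responsible for detecting this object.
- 【★】True
- 【】False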
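What is the IoU between these two boxes? The upper-left box is 2x2, the lower-right box is 2x3, and the overlapping region is 1x1.
- 【】 1/6
- 【★】1/9
- 【】 1/10
- 【】None of the above
Blogger's note: $\frac{1 \times 1}{2 \times 2 + 2 \times 3 - 1 \times 1} = \frac{1}{9}$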
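
The same arithmetic as a small function. The corner coordinates below are an assumption chosen to match the stated sizes (a 2x2 box, a 2x3 box, and a 1x1 overlap); the figure itself is not reproduced here.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at 0 when the boxes are disjoint.
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


upper_left = (0, 0, 2, 2)    # 2 wide, 2 tall
lower_right = (1, 1, 3, 4)   # 2 wide, 3 tall, overlapping in a 1x1 region
print(iou(upper_left, lower_right))  # 1/9
```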
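Suppose you run non-max suppression on the predicted boxes in the image below. The parameters are that boxes with probability ≤ 0.4 are discarded, and the IoU threshold for deciding whether two boxes overlap is 0.5. How many boxes will remain after non-max suppression?
- 【】 3
- 【】 4
- 【★】5
- 【】 6
- 【】 7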
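Suppose you are using YOLO on a 19x19 grid, on a detection problem with 20 classes, and with 5 anchor boxes. During training, for each image you will need to construct the output volume $y$, whose shape is:
- 【】 19x19x(25x20)
- 【】 19x19x(20x25)
- 【★】19x19x(5x25)
- 【】 19x19x(5x20)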
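
The shape follows from counting the values per anchor box: $p_c$, four box coordinates, and one score per class. A minimal sketch of that count (the function name `yolo_output_shape` is hypothetical):

```python
def yolo_output_shape(grid_size, num_classes, num_anchors):
    """Shape of the YOLO target volume for one image."""
    # Each anchor box carries p_c, b_x, b_y, b_h, b_w plus one score per class.
    per_anchor = 5 + num_classes
    return (grid_size, grid_size, num_anchors * per_anchor)


print(yolo_output_shape(grid_size=19, num_classes=20, num_anchors=5))
# (19, 19, 125), i.e. 19x19x(5x25)
```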

Andrew Ng deep learning notes: course4 week3 quiz
