detector head
In some domains, "head" refers to the start or the beginning of something. In this context it means something different. Many computer vision tasks use a "backbone", a network that is usually pre-trained on ImageNet. The backbone acts as a feature extractor and gives you a feature-map representation of the input. Once you have that feature map, you still need to perform the actual task, such as detection or segmentation. You usually do this by applying a "detection head" to the feature map(s), so it is like a head attached to the backbone.
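To make the backbone/head split concrete, here is a minimal pure-Python sketch. Everything in it is an assumption made for illustration: `toy_backbone`, `linear_head` and `make_params` are hypothetical names, the "backbone" is just a pooling function rather than a pre-trained network, and the head weights are random rather than trained. It only shows how a head maps each feature-map cell to task outputs.

```python
import random

def toy_backbone(image, stride=2):
    """Stand-in for a real backbone (assumption): summarize each
    stride x stride patch by its mean and max, giving 2 channels per cell."""
    h, w = len(image), len(image[0])
    fmap = []
    for y in range(0, h - stride + 1, stride):
        row = []
        for x in range(0, w - stride + 1, stride):
            patch = [image[y + dy][x + dx]
                     for dy in range(stride) for dx in range(stride)]
            row.append([sum(patch) / len(patch), max(patch)])
        fmap.append(row)
    return fmap

def linear_head(fmap, weights, bias):
    """Apply the same linear map at every cell (what a 1x1 convolution does)."""
    return [[[sum(w_i * f_i for w_i, f_i in zip(w_row, cell)) + b
              for w_row, b in zip(weights, bias)]
             for cell in row]
            for row in fmap]

def make_params(out_dim, in_dim, rng):
    """Random, untrained weights; a real head would learn these."""
    weights = [[rng.uniform(-1, 1) for _ in range(in_dim)]
               for _ in range(out_dim)]
    return weights, [0.0] * out_dim

rng = random.Random(0)
num_classes = 3
image = [[rng.random() for _ in range(8)] for _ in range(8)]

features = toy_backbone(image)                 # 4 x 4 cells, 2 channels each
cls_w, cls_b = make_params(num_classes, 2, rng)
box_w, box_b = make_params(4, 2, rng)

# Two decoupled heads reading the same feature map.
cls_out = linear_head(features, cls_w, cls_b)  # 4 x 4 x num_classes scores
box_out = linear_head(features, box_w, box_b)  # 4 x 4 x 4 box values
```

The point of the sketch is only the data flow: the image goes through the backbone once, and each head then reads the same feature map and produces its own output per cell.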
In object detection, you need two types of output: classification confidences and bounding boxes. These can come from two separate, decoupled heads (e.g. RetinaNet) or from a single head that computes both outputs (e.g. SSD). In both cases, you need to specify exactly how the output should be interpreted. For example, are the bounding box regression outputs relative to an anchor, or relative to the entire image? For the classification confidences, do you apply softmax to the output to obtain them? And so on.
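The interpretation questions above can be illustrated with a short sketch. It assumes one common anchor parameterization (center offsets scaled by the anchor size, log-scaled width and height) and, for the image-relative case, coordinates normalized to [0, 1]. A given detector may use a different convention, and the function names here are hypothetical.

```python
import math

def decode_anchor_relative(deltas, anchor):
    """Assumed convention: deltas = (dx, dy, dw, dh),
    anchor = (center_x, center_y, width, height).
    Returns corner coordinates (x1, y1, x2, y2)."""
    dx, dy, dw, dh = deltas
    acx, acy, aw, ah = anchor
    cx = acx + dx * aw       # shift the center by a fraction of anchor size
    cy = acy + dy * ah
    w = aw * math.exp(dw)    # scale width and height in log space
    h = ah * math.exp(dh)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

def decode_image_relative(box, image_w, image_h):
    """Assumed convention: box = (x1, y1, x2, y2) normalized to [0, 1]."""
    x1, y1, x2, y2 = box
    return (x1 * image_w, y1 * image_h, x2 * image_w, y2 * image_h)

def softmax(logits):
    """Mutually exclusive classes: confidences sum to 1."""
    m = max(logits)          # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(logits):
    """Independent per-class confidences: each lies in (0, 1) on its own."""
    return [1.0 / (1.0 + math.exp(-v)) for v in logits]

# Zero deltas reproduce the anchor itself.
print(decode_anchor_relative((0, 0, 0, 0), (50, 50, 20, 40)))
# -> (40.0, 30.0, 60.0, 70.0)
```

The same four raw numbers give very different boxes depending on which decoding you assume, and the same logits give different confidences under softmax and sigmoid, which is why the head's output convention has to be stated explicitly.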