This post is a list of miscellaneous techniques scattered across the computer vision (CV) field. They are not listed in any particular order for now.
- Hard-negative mining. For each image in your negative training set, and for each scale of that image, slide a window across the image. At each window, compute the HOG descriptor and apply your classifier. If the classifier incorrectly labels a window as an object (and it will; there will absolutely be false positives), record the feature vector of that false-positive patch along with the classification probability. This procedure is called hard-negative mining. Take the false-positive samples it finds, sort them by confidence (i.e. probability), and re-train your classifier using these hard-negative samples. (Note: you can repeat the mining and re-training steps iteratively, but in practice one round of hard-negative mining usually, though not always, tends to be enough. The accuracy gains from subsequent rounds tend to be minimal.)
- Non-Maximum Suppression (link) can be used if multiple bounding boxes are returned for the same detected object.
- Finding bounding shapes (OpenCV link): boundingRect() finds the up-right bounding rectangle of a point set; minAreaRect() finds the rotated bounding rectangle of a point set and is often used with boxPoints(), which returns the four vertices of that rectangle; minEnclosingTriangle() finds the bounding triangle with minimum area; minEnclosingCircle() finds the bounding circle with minimum area.
- To detect overlapping objects, use the watershed algorithm (OpenCV tutorial, PIS tutorial).
- Thresholding reduces a grayscale image to a binary image. Automatic (parameterless) threshold selection is usually more computationally intensive than methods that require manual tuning. Two widely used methods are Otsu’s method and Ridler-Calvard’s method, both of which are histogram-based.
- Otsu’s method assumes that the pixels of a grayscale image fall into two classes, foreground and background, which produce a bimodal histogram. It finds the globally optimal threshold that minimizes the intra-class variance or, equivalently, maximizes the inter-class variance. However, when the image background is uneven, finding a single global threshold that gives good results may simply be impossible. The original method can be extended to a 2D adaptive Otsu’s method, which finds a local threshold based on the gray level of each pixel and the average gray level of its neighboring pixels. This helps greatly with noise-corrupted images or images with an uneven background (nonuniform illumination). In theory, any threshold estimation method can be made adaptive by applying it locally, block by block or in a sliding window, but the computational cost may be quite high, as it is for the 2D Otsu’s method. Ridler-Calvard’s method is an iterative relative of Otsu’s method and is generally faster and less computationally intensive than Otsu’s method.
- Drawbacks of Otsu’s method: it assumes the histogram is bimodal; it applies a global threshold and thus does not work with uneven background; it breaks when the two classes have extremely different sizes.
- Multilevel thresholding can be applied when there are more than 2 modes in the histogram, but it proves to be more difficult in practice.
- Histogram-based thresholding methods work best when the histogram peaks are tall, narrow, symmetric, and separated by deep valleys. If there is no clear valley in the histogram, some background pixels have gray levels similar to those of object pixels. In this case, hysteresis thresholding, which employs two threshold values, one on each side of the valley, can be used. The ratio of the high threshold to the low threshold is generally between 2:1 and 3:1. In hysteresis thresholding, low-thresholded edges that are connected to high-thresholded edges are retained, and low-thresholded edges that are not connected to high-thresholded edges are removed. Among the histogram-based methods above, hysteresis thresholding is the only one that considers some form of spatial proximity. The others ignore spatial information completely.
- Niblack’s method is a much less computationally intensive local thresholding method. It sets the local threshold to t(i, j) = μ(i, j) + wσ(i, j), a weighted sum of the local mean μ and the local standard deviation σ, but the weight w needs to be tuned manually.
- Edge detection employs gradients to find edge-like regions. The Sobel operator finds the partial derivatives of the image along the x- and y-axes by convolving with a ksize x ksize kernel. When ksize is 3, the Sobel operator may produce noticeable inaccuracies. The similar Scharr operator is just as fast but gives more accurate results. The Laplacian operator adds up the second-order derivatives along the x- and y-axes, which are calculated with the Sobel operator.
- The Canny detector is also called the optimal detector because it aims for a low error rate, good localization, and minimal response (a single response per edge). It has four steps: filter out noise, find the intensity gradient of the image, apply non-maximum suppression (only thin lines remain), and apply hysteresis thresholding with two thresholds.
- Contours can be found by calling the cv2.findContours() function, which takes an 8-bit single-channel (grayscale) image as input. The image is treated as binary, since nonzero values are treated as one. This binary image can be generated using threshold(), adaptiveThreshold(), Canny(), etc. The function uses a border-following algorithm (Suzuki and Abe, per the OpenCV documentation) and can return a topological hierarchy of contours.
- to be continued…
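
As a rough illustration of the non-maximum suppression item above, here is a minimal greedy sketch in NumPy. The function name `non_max_suppression`, the `[x1, y1, x2, y2]` box layout, and the default IoU threshold of 0.5 are assumptions made for this example, not something taken from a particular library.

```python
import numpy as np

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS sketch. boxes: (N, 4) as [x1, y1, x2, y2]. Returns kept indices."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    if len(boxes) == 0:
        return []
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = np.argsort(scores)[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of the current best box with every remaining box.
        xx1 = np.maximum(x1[i], x1[rest])
        yy1 = np.maximum(y1[i], y1[rest])
        xx2 = np.minimum(x2[i], x2[rest])
        yy2 = np.minimum(y2[i], y2[rest])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[rest] - inter)
        # Drop boxes that overlap the current best box too much.
        order = rest[iou <= iou_threshold]
    return keep

boxes = [[10, 10, 50, 50], [12, 12, 52, 52], [100, 100, 140, 140]]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
```

In this toy input the first two boxes overlap heavily, so only the higher-scoring one survives, while the distant third box is kept.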
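
The Otsu item above describes choosing a global threshold that maximizes the inter-class variance. A minimal NumPy sketch of that search over a 256-bin histogram could look like the following. The name `otsu_threshold` and the convention that pixels less than or equal to the threshold form the first class are assumptions of this sketch.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the gray level t in 0..255 that maximizes the between-class variance."""
    pixels = np.asarray(gray, dtype=np.uint8).ravel()
    hist = np.bincount(pixels, minlength=256).astype(float)
    prob = hist / hist.sum()
    levels = np.arange(256)
    w0 = np.cumsum(prob)                 # weight of class 0 (pixels <= t)
    w1 = 1.0 - w0                        # weight of class 1 (pixels > t)
    cum_mean = np.cumsum(prob * levels)  # cumulative first moment up to t
    total_mean = cum_mean[-1]
    denom = w0 * w1
    sigma_b = np.zeros(256)
    valid = denom > 0                    # skip thresholds that leave a class empty
    sigma_b[valid] = (total_mean * w0[valid] - cum_mean[valid]) ** 2 / denom[valid]
    return int(np.argmax(sigma_b))

# Synthetic bimodal image: left half dark (50), right half bright (200).
img = np.full((20, 20), 50, dtype=np.uint8)
img[:, 10:] = 200
t = otsu_threshold(img)
binary = img > t
```

On real images, OpenCV's threshold() with the THRESH_OTSU flag serves the same purpose; the sketch is only meant to make the variance criterion concrete.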
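
Niblack's formula t(i, j) = μ(i, j) + wσ(i, j) from the list above can be sketched directly with sliding windows. The function name `niblack_threshold`, the odd window size of 15, the reflect padding at the borders, and the default w = -0.2 are all assumptions chosen for illustration; w still has to be tuned for real data.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def niblack_threshold(gray, window=15, w=-0.2):
    """Per-pixel threshold map: local mean + w * local standard deviation.

    `window` is assumed to be odd and smaller than the image dimensions.
    """
    gray = np.asarray(gray, dtype=float)
    pad = window // 2
    # Reflect-pad so every pixel has a full window around it.
    padded = np.pad(gray, pad, mode="reflect")
    windows = sliding_window_view(padded, (window, window))
    mean = windows.mean(axis=(-2, -1))
    std = windows.std(axis=(-2, -1))
    return mean + w * std

# Flat background with a single dark pixel.
img = np.full((20, 20), 100.0)
img[10, 10] = 0.0
t_map = niblack_threshold(img)
binary = img > t_map
```

This brute-force version materializes every window, which is fine for small images. An integral-image or box-filter formulation would be the usual choice when speed matters.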
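
The hysteresis thresholding rule above (keep low-threshold pixels only when they connect to high-threshold pixels) can be sketched as a breadth-first flood fill. The function name `hysteresis_threshold` and the use of 8-connectivity are assumptions of this sketch.

```python
from collections import deque

import numpy as np

def hysteresis_threshold(values, low, high):
    """Boolean mask of pixels >= high, plus pixels >= low connected to them."""
    values = np.asarray(values, dtype=float)
    strong = values >= high
    weak = values >= low
    out = strong.copy()
    rows, cols = values.shape
    # Start from every strong pixel and grow into neighboring weak pixels.
    queue = deque(zip(*np.nonzero(strong)))
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                inside = 0 <= nr < rows and 0 <= nc < cols
                if inside and weak[nr, nc] and not out[nr, nc]:
                    out[nr, nc] = True
                    queue.append((nr, nc))
    return out

# One strong pixel (120) with weak neighbors (60), and an isolated weak pixel.
row = np.array([[0, 60, 120, 60, 0, 0, 60]])
mask = hysteresis_threshold(row, low=50, high=100)
```

The isolated weak pixel at the end of the row is dropped because no chain of weak pixels links it to a strong one, which is the spatial-proximity behavior the list item describes.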