Important concepts in CV

This post is a list of miscellaneous techniques scattered across the CV field. They are not listed in any particular order for now.

  1. Hard-negative mining. For each image in your negative training set, and for each scale of that image, slide a window across the image. At each window position, compute the HOG descriptor and apply your classifier. Your classifier will (incorrectly) classify some windows as objects; there will absolutely be false positives. For each one, record the feature vector of the false-positive patch along with the classification probability. Then sort these false-positive samples by confidence (i.e., probability), add them to the negative training set as hard negatives, and re-train your classifier. (Note: you can repeat the mining and re-training steps. In practice, one round of hard-negative mining is usually, though not always, enough, because later rounds tend to give minimal gains in accuracy.)
  2. Non-Maximum Suppression (NMS) can be used when multiple bounding boxes are returned for the same detected object.
  3. Finding bounding shapes in OpenCV: boundingRect() finds the upright bounding rectangle of a point set; minAreaRect() finds the rotated bounding rectangle of a point set and is often used with boxPoints(), which returns the rectangle’s four vertices; minEnclosingTriangle() finds the enclosing triangle with minimum area; minEnclosingCircle() finds the enclosing circle with minimum area.
  4. For detecting overlapping objects, use the watershed algorithm (see the OpenCV tutorial and the PIS tutorial).
  5. Thresholding reduces a grayscale image to a binary image. Automatic (parameterless) threshold selection is usually more computationally intensive than methods that require manual tuning. Two widely used methods are Otsu’s method and the Ridler-Calvard method. Both are histogram-based thresholding methods.
  6. Otsu’s method assumes that the pixels of a grayscale image fall into two classes, foreground and background, which give a bimodal histogram. It finds the globally optimal threshold that minimizes the intra-class variance, or equivalently maximizes the inter-class variance. However, when the image background is uneven, a single global threshold that gives good results may simply not exist. The original method can be extended to the adaptive 2D Otsu method, which finds the threshold from both the gray value of each pixel and the average gray value of its neighboring pixels. This helps greatly with noise-corrupted images and images with an uneven background (nonuniform illumination). In principle, any threshold-estimation method can be made adaptive by applying it locally, block-wise or in a sliding window, but the computational cost may be quite high, as it is for the 2D Otsu method. The Ridler-Calvard method is an iterative version of Otsu’s method, and it is generally faster and less computationally intensive than Otsu’s method.
  7. Drawbacks of Otsu’s method: it assumes the histogram is bimodal; it applies a global threshold and thus does not work with uneven background; it breaks when the two classes have extremely different sizes.
  8. Multilevel thresholding can be applied when the histogram has more than two modes, but it tends to be more difficult in practice.
  9. Histogram-based thresholding methods work best when the histogram peaks are tall, narrow and symmetric, and are separated by deep valleys. If the histogram has no clear valley, some background pixels have gray levels similar to those of object pixels. In this case, hysteresis thresholding can be used. It employs two threshold values, one on each side of the valley, and the ratio of the high threshold to the low threshold is generally between 2:1 and 3:1. Edges above the low threshold that are connected to edges above the high threshold are retained. Low-threshold edges that are not connected to high-threshold edges are removed. Hysteresis thresholding is the only one of these methods that considers some form of spatial proximity; the others ignore spatial information completely.
  10. Niblack’s method is a much less computationally intensive adaptive method. It sets the local threshold to t(i, j) = μ(i, j) + wσ(i, j), a weighted combination of the local mean and the local standard deviation. The weight w needs to be tuned manually.
  11. Edge detection uses gradients to find edge-like regions. The Sobel operator computes the partial derivatives of the image along the x- and y-axes by convolving it with a ksize x ksize kernel. When ksize is 3, the Sobel operator may produce noticeable inaccuracies. The similar Scharr operator is just as fast but more accurate. The Laplacian operator adds up the second-order derivatives along the x- and y-axes, which are calculated with the Sobel operator.
  12. The Canny detector is also called the optimal detector. It aims for a low error rate, good localization and a minimal response (a single response per edge). It has four steps:
      1. Filter out noise.
      2. Find the intensity gradient of the image.
      3. Apply non-maximum suppression, so that only thin lines remain.
      4. Apply hysteresis thresholding with two thresholds.
  13. Contours can be found with the cv2.findContours() function, which takes an 8-bit single-channel (grayscale) image as input. The image is treated as binary, because nonzero pixel values are treated as 1. Such a binary image can be generated with threshold(), adaptiveThreshold(), Canny(), etc. The function implements Suzuki and Abe’s border-following algorithm and returns a topological hierarchy of the contours.
  14. to be continued…
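Items 1 and 2 (hard-negative mining and NMS) can be sketched in plain NumPy. This is a minimal sketch under stated assumptions, not a full HOG detector:

- `extract_features` and `predict_proba` are hypothetical callables that stand in for a HOG descriptor and a trained classifier.
- The image pyramid (multiple scales) is left out.
- The default 64×128 window, the size commonly used for HOG person detection, is an assumption.
- Boxes are (x1, y1, x2, y2) in continuous coordinates.

The NMS shown is the common greedy, IoU-based variant.

```python
import numpy as np


def sliding_windows(image, window=(64, 128), step=16):
    """Yield (x, y, patch) for every window position; window is (width, height)."""
    win_w, win_h = window
    for y in range(0, image.shape[0] - win_h + 1, step):
        for x in range(0, image.shape[1] - win_w + 1, step):
            yield x, y, image[y:y + win_h, x:x + win_w]


def mine_hard_negatives(negative_images, extract_features, predict_proba,
                        window=(64, 128), step=16, threshold=0.5):
    """Collect (probability, feature) pairs for windows wrongly classified as objects.

    The images contain no objects, so every positive prediction is a false positive.
    Multiple scales are omitted here for brevity.
    """
    hard = []
    for image in negative_images:
        for _, _, patch in sliding_windows(image, window, step):
            feature = extract_features(patch)   # hypothetical HOG stand-in
            prob = predict_proba(feature)       # hypothetical classifier stand-in
            if prob > threshold:
                hard.append((prob, feature))
    # Most confidently wrong windows first.
    hard.sort(key=lambda pair: pair[0], reverse=True)
    return hard


def non_max_suppression(boxes, scores, overlap_thresh=0.3):
    """Greedy NMS: keep the best box, drop boxes whose IoU with it exceeds overlap_thresh."""
    boxes = np.asarray(boxes, dtype=float)
    order = np.argsort(scores)[::-1]            # indices by descending score
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection rectangle between box i and every remaining box.
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou <= overlap_thresh]
    return keep
```

The returned hard negatives would then be added to the negative training set before re-training. Passing the detector's raw boxes and scores through `non_max_suppression` collapses duplicate detections of the same object into one.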
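The three threshold selectors from items 5–10 (Otsu, Ridler-Calvard and Niblack) each fit in a few lines of NumPy. This is a sketch that assumes an 8-bit grayscale array. A few values are assumptions rather than tuned settings:

- the convergence tolerance in `ridler_calvard_threshold`;
- the defaults `window=15` and `w=-0.2` in `niblack_threshold` (w = -0.2 is a commonly quoted starting point).

For real work, OpenCV already offers Otsu through `cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def otsu_threshold(gray):
    """Return the level t that maximizes between-class variance (foreground: gray > t)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total = hist.sum()
    levels = np.arange(256)
    omega = np.cumsum(hist) / total            # fraction of pixels with gray <= t
    mu = np.cumsum(hist * levels) / total      # first moment up to level t
    mu_total = mu[-1]                          # global mean gray level
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (mu_total * omega - mu) ** 2 / denom
    between[denom == 0] = 0.0                  # one class empty: not a valid split
    return int(np.argmax(between))


def ridler_calvard_threshold(gray, tol=0.5, max_iter=100):
    """Iterative selection: move t to the midpoint of the two class means until it settles."""
    pixels = gray.astype(float).ravel()
    t = pixels.mean()                          # initial guess: the global mean
    for _ in range(max_iter):
        below, above = pixels[pixels <= t], pixels[pixels > t]
        if below.size == 0 or above.size == 0:
            break                              # degenerate input, e.g. a constant image
        new_t = 0.5 * (below.mean() + above.mean())
        converged = abs(new_t - t) < tol
        t = new_t
        if converged:
            break
    return t


def niblack_threshold(gray, window=15, w=-0.2):
    """Per-pixel threshold t(i, j) = mu(i, j) + w * sigma(i, j); window must be odd."""
    pad = window // 2
    padded = np.pad(gray.astype(float), pad, mode="reflect")
    patches = sliding_window_view(padded, (window, window))
    return patches.mean(axis=(-2, -1)) + w * patches.std(axis=(-2, -1))
```

Use `gray > otsu_threshold(gray)` for a global foreground mask and `gray > niblack_threshold(gray)` for a locally thresholded one. The Niblack window must be odd so that the output matches the input shape.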

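The gradient and hysteresis ideas from items 9, 11 and 12 can be illustrated with NumPy alone. This sketch correlates the image with the 3×3 Sobel and Scharr x-kernels, and with their transposes for y. It covers only the valid region, whereas OpenCV pads the borders. `hysteresis_threshold` is a hypothetical helper: it keeps weak pixels only when they are 8-connected to a strong pixel, which is the last step of Canny.

```python
from collections import deque

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# 3x3 x-derivative kernels; the y kernels are their transposes.
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SCHARR_X = np.array([[-3, 0, 3], [-10, 0, 10], [-3, 0, 3]], dtype=float)


def gradient(gray, kernel_x):
    """Correlate with kernel_x and kernel_x.T; returns (gx, gy) for the valid region only."""
    patches = sliding_window_view(gray.astype(float), kernel_x.shape)
    gx = np.sum(patches * kernel_x, axis=(-2, -1))
    gy = np.sum(patches * kernel_x.T, axis=(-2, -1))
    return gx, gy


def hysteresis_threshold(magnitude, low, high):
    """Keep pixels >= high, plus pixels >= low that are 8-connected to them."""
    strong = magnitude >= high
    weak = magnitude >= low                    # includes the strong pixels
    keep = strong.copy()
    queue = deque(zip(*np.nonzero(strong)))    # flood fill outward from every strong pixel
    rows, cols = magnitude.shape
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and weak[nr, nc] and not keep[nr, nc]:
                    keep[nr, nc] = True
                    queue.append((nr, nc))
    return keep
```

The magnitude `np.hypot(gx, gy)` could then be passed to `hysteresis_threshold` with, for example, `high = 2 * low`, in line with the 2:1 to 3:1 ratio above. Canny's non-maximum suppression step, which thins the edges before hysteresis, is not included here.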
ML notes

The ROC curve characterizes the performance of a binary classifier as its discrimination threshold varies. It plots the true positive rate (TPR) against the false positive rate (FPR). In other words, it plots recall/sensitivity against 1 − specificity.
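To make the TPR/FPR definitions concrete, here is a minimal NumPy sketch. It sweeps the threshold over every distinct score and records one (FPR, TPR) point per threshold. `roc_curve_points` is a hypothetical helper; scikit-learn's `roc_curve` does the same job with more options.

```python
import numpy as np


def roc_curve_points(y_true, scores):
    """Return (fpr, tpr), one point per distinct score threshold, strictest threshold first."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    n_pos, n_neg = y_true.sum(), (~y_true).sum()
    fpr, tpr = [0.0], [0.0]                    # threshold above every score: nothing predicted positive
    for t in np.unique(scores)[::-1]:          # distinct scores, descending
        predicted = scores >= t
        tpr.append((predicted & y_true).sum() / n_pos)    # recall / sensitivity
        fpr.append((predicted & ~y_true).sum() / n_neg)   # 1 - specificity
    return np.array(fpr), np.array(tpr)


fpr, tpr = roc_curve_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

For these labels and scores, the sketch gives FPR = [0, 0, 0.5, 0.5, 1] and TPR = [0, 0.5, 0.5, 1, 1].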




points and matrix accessing order in openCV

In openCV,

  • Points are in the order of (x, y);
  • Sizes are in the order of (width, height);
  • Matrices are accessed in the order of (row, col);
  • Images are stored in the order of (height, width).

For example, in

cv2.resize(image, dsize, interpolation=cv2.INTER_AREA)

dsize is the output (destination, ergo d) image size in the order of (width, height).
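The ordering rules above can be checked on a plain NumPy array, which is what cv2 images are in Python. The 480×640 zero image is a hypothetical stand-in for a loaded image. No OpenCV call is needed to see the difference between `shape`, which is (rows, cols), and `dsize`, which is (width, height).

```python
import numpy as np

# Hypothetical 480-row x 640-column, 3-channel (BGR) image.
image = np.zeros((480, 640, 3), dtype=np.uint8)

height, width = image.shape[:2]     # matrix order: (rows, cols) = (height, width)
dsize = (width // 2, height // 2)   # OpenCV size order: (width, height)

# The point (x, y) = (100, 50) is column 100, row 50, so index it as [y, x].
x, y = 100, 50
image[y, x] = (255, 0, 0)           # pure blue in BGR order
```

Passing this `dsize` of (320, 240) to `cv2.resize` would return an array of shape (240, 320, 3), i.e. back in (height, width, channels) order.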

cv2.warpAffine(image, M, dsize)

M is the 2×3 transformation matrix and dsize is the output image size in the order of (w, h). According to the documentation of warpAffine (with the default flags, i.e. without WARP_INVERSE_MAP),

x' = M_11 * x + M_12 * y + M_13
y' = M_21 * x + M_22 * y + M_23

where (x, y) are the coordinates of the source pixel and (x', y') are the coordinates of its destination after the warp.

Using the affine transformation matrix M, we can handle scaling, rotation and translation at the same time. However, since y increases downward in an image, the rotation matrix is

[M_11, M_12] = [ cos(theta),  sin(theta)]
[M_21, M_22]   [-sin(theta),  cos(theta)]

which is different from the rotation matrix in the Cartesian system (where y increases upward):

[M_11, M_12] = [ cos(theta), -sin(theta)]
[M_21, M_22]   [ sin(theta),  cos(theta)]

An easy way to remember this: a 45-degree rotation with the image matrix takes the point (x, y) = (1, 1) to (√2, 0) ≈ (1.41, 0), which lies on the x-axis. In the Cartesian system, the same rotation would take it to (0, 1.41) on the y-axis.
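Here is a quick NumPy check of the (1, 1) example. The image-convention matrix below matches the 2×2 rotation part of what `cv2.getRotationMatrix2D` documents: the layout is [[α, β], [−β, α]], with α = cos θ and β = sin θ at scale 1. The translation column is left out for simplicity.

```python
import numpy as np


def image_rotation_matrix(theta_deg):
    """Rotation block for image coordinates (y axis pointing down)."""
    t = np.deg2rad(theta_deg)
    return np.array([[np.cos(t), np.sin(t)],
                     [-np.sin(t), np.cos(t)]])


def cartesian_rotation_matrix(theta_deg):
    """Standard counter-clockwise rotation for y pointing up."""
    t = np.deg2rad(theta_deg)
    return np.array([[np.cos(t), -np.sin(t)],
                     [np.sin(t), np.cos(t)]])


p = np.array([1.0, 1.0])
print(image_rotation_matrix(45) @ p)      # approximately (1.414, 0): on the x-axis
print(cartesian_rotation_matrix(45) @ p)  # approximately (0, 1.414): on the y-axis
```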


Why openCV uses BGR (not RGB)

When OpenCV loads an image, it stores the color channels in Blue-Green-Red order instead of the now more popular RGB order.

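import cv2
image = cv2.imread(args["image"])  # args: parsed command-line arguments, e.g. from argparse
cv2.imshow("Image", image)
cv2.waitKey(0)  # keep the window open until a key is pressed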

This reads the image file and displays it correctly. An alternative way to do this with matplotlib is as follows.

import matplotlib.image as mpimg
import matplotlib.pyplot as plt
image = mpimg.imread(args["image"])  # matplotlib loads channels in RGB order
plt.imshow(image)
plt.show()

However, if we read the image file with cv2 and display it with matplotlib, or vice versa, the image is displayed incorrectly because the R and B channels are swapped. Luckily, cv2 has a built-in way to correct this.

import cv2
import matplotlib.pyplot as plt
image = cv2.imread(args["image"])  # channels in BGR order
plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))  # convert to RGB for matplotlib
plt.show()

Alternatively, we can swap the B and R channels ourselves by reversing the third dimension of the array, which holds the channels.

image = image[:, :, ::-1] # or image = image[:, :, (2, 1, 0)]
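The two channel-swapping expressions above give the same values, but they are different kinds of object. The `::-1` slice is a view with a negative stride, while the tuple index creates a new array. The tiny 1×2 image below is hypothetical.

```python
import numpy as np

# Hypothetical 1x2 BGR image: one pure-blue pixel, one pure-red pixel.
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

rgb_slice = bgr[:, :, ::-1]        # reversed view along the channel axis (no copy)
rgb_index = bgr[:, :, (2, 1, 0)]   # fancy indexing: same values in a new array

assert np.array_equal(rgb_slice, rgb_index)
# The blue pixel now reads (R, G, B) = (0, 0, 255).
```

Some OpenCV functions can reject the negative-stride view. Wrapping it in `np.ascontiguousarray(...)`, or using `cv2.cvtColor` instead, avoids that.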

According to a post on this topic, BGR was adopted in OpenCV at a time when BGR was the most popular format, and it got stuck. This is reminiscent of the funny story about why the US railway gauge is 4’8.5″.

Installing OpenCV3 and Python3 in virtualenv on MacOS Sierra

I spent almost 3 hours today and finally got OpenCV 3 installed. Early on, I found a detailed tutorial by Adrian Rosebrock.

My Hackintosh desktop has not been upgraded since I built it in 2013 (it still runs Mountain Lion), and I could not get Homebrew properly installed because of a “Command Line Tool version too old” error. So I decided to upgrade my MacBook Air to the latest macOS Sierra and install OpenCV on it. This also gives me a more (physically) portable development environment.

One thing I already had installed on my MBA was Anaconda. As it turned out, this made the installation a bit more complicated, and I had to adjust the tutorial to make OpenCV work.

$ python --version
Python 2.7.10 :: Anaconda 2.3.0 (x86_64)
$ conda --version
conda 3.14.1

Here is a list of adjustments I had to make:

  • is not in /usr/local/bin/ but in /anaconda/bin/. Specify this path in ~/.bash_profile instead.

  • After installing virtualenv, I created a virtual environment named cv. When I later called workon cv, I got the following error message:
    Usage: source deactivate
    removes the 'bin' directory of the environment activated with 'source
    activate' from PATH. 

The error message is a bit cryptic, but it turns out that Anaconda ships its own deactivate command in /anaconda/bin/, which clashes with virtualenv's. So I renamed Anaconda's activate and deactivate commands.

$ cd /anaconda/bin/
$ mv activate activate_ana
$ mv deactivate deactivate_ana

  • OpenCV compiled correctly, but when I tried to import it, it threw the error message “libhdf5.10.dylib not loaded opencv”. Once again, the culprit was the previously installed Anaconda.
$ sudo find / -name libhdf5.10.dylib

Luckily, I found a blog post in Japanese (glad I can read Japanese!) whose author had almost exactly the same problem. The problem was solved by:

$ brew tap homebrew/science
$ brew install hdf5
$ brew search

(cv) [ llgc @ megatron ~ ]$ python
Python 3.6.0 (default, Dec 24 2016, 08:01:42) 
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import cv2
>>> cv2.__version__

Now I have Python 3.6 and OpenCV 3.2 installed and importing successfully. I will update this post if Anaconda causes more hiccups down the road.

Edit: 01/11/2017

An alternative to cv2 for loading and displaying images is the matplotlib module. However, the latest version (matplotlib-1.5.3) does not work in a virtual environment (e.g., with a non-system version of Python) and throws a “Framework” error. The fix is to install an older version that works in a virtual environment, such as 1.4.3.

$ pip uninstall matplotlib
$ pip install matplotlib==1.4.3

Embarking on a new journey

Our company, the MI branch of PE, has signed a deal to be acquired by company V. The news came during the Christmas shutdown of our SC facility. It still surprised me, although most of my colleagues (myself included) had sensed that management had some secretive M&A plan going on. We met the new leadership team from our soon-to-be parent company this Friday. To be honest, I was quite impressed by the eloquence and zeal of the new CEO as he presented his blueprint for the merged company. However, the future company he painted, a cost leader in the MI and NDT imaging component market, was hardly the technology-focused, innovative new player I had anticipated. He even briskly dismissed the possibility of us entering the system integration market. That was when I knew I might have to reconsider my future with the current company.

I majored in physics in college and in graduate school, and I always told myself that I was lucky to find a position closely related to my graduate research. And I really was, given the situation many of my classmates are in today. In this part of the country almost everyone works as a software engineer, and it feels good to see the awe in my peers’ eyes when I tell them about my job. I had a very similar experience when I told my better half’s colleague in the financial industry that I had studied physics for the last decade of my life. I truly loved my job for many months, while I was learning new things about our products every day.

Now, two years into my first job out of the ivory tower, my enthusiasm for the daily routine is gradually waning. I often find myself wondering about the opportunity cost (both financial and career) of staying at the company for the next few years. Most of the time, I spend hours implementing a simple solution I came up with in seconds. The job requires more attention to detail than problem solving. I believe it is time to steer my career onto a slightly different course. The foreseeable future belongs to software and artificial intelligence. Working as a physicist at a hardware company that manufactures a device based on a century-old concept and decade-old technology is not the optimal way to spend my most creative and productive years. Not to mention that the new company we are evolving into is determined to wage a price war in an ever-crowded market.

Don’t get me wrong: the past two years at the company have been rewarding. I led a team to evaluate a new type of key product component. I wrote and presented a paper. My PR has been approved. I also went to a different country and led the responsibility transfer on my own. When I look back at my first two years out of college at a later stage of my life, I hope I can tell myself and my audience an overall positive tale. However, the achievements I am most proud of seem to have happened outside of work: the DS projects, learning Cantonese and developing a fitness habit. This may suggest that my current job is not suited to be a lifetime pursuit.

The first step in switching jobs/careers is to evaluate the core skills I have developed. I believe this is a relatively easy question, since image processing has been a constant theme in my past positions, and I would like this theme to continue in my future career.

The second step is to figure out which field I would like to be in. This question carries more uncertainty and calls for more thinking. One field I have been interested in for years is machine learning. ML was not my major in graduate school, and I am only starting to dabble in it. But my philosophy is not to make choices I would regret later, so I believe my best next job would involve some degree of machine learning.

The third step is to establish the missing link: image processing + machine learning = computer vision. The answer is simple, so I need to brush up on computer vision.

The M&A deal will close in Q2, so I have six months or so to make myself an expert in computer vision.

  • The goal: finding a computer vision related job around the end of Q2.
  • How to get there:
  1. First, use my time at work to develop CV applications that are genuinely useful for the company as well.
  2. Second, use nights and weekends to read books and learn new things. Dedicate at least 3 hours a day on weekday nights and 4 hours a day on weekends. That is 3*5+4*2=23 hours a week, and 6 months would be 23*26=598 hours. 600 hours should get me somewhere around the intermediate level. At least one hour a day should be spent summarizing the topics I learned that day. Use this blog to keep track of progress and organize my new thoughts.
  3. During the first few weeks, I expect to read several books and follow several online courses to get a general feel for the field. Make plans and adapt them to keep track of time.

Resources to start with:

  1. OpenCV crash course. The OpenCV book “Practical Python and OpenCV + Case Studies” also looks good.
  2. A list of books on OpenCV on OpenCV’s website.