Otsu's method

Otsu's method is an image processing algorithm commonly used to compute a threshold level for generating a mask of the objects in an image. The original algorithm^[1] calculates a global threshold: a single level that divides the pixels of an image into two classes (often called the background and the foreground). This threshold is typically found by an exhaustive search for the grayscale value that maximizes the variance between the two classes. The algorithm therefore works best on images whose objects differ markedly in intensity from the background, e.g. fluorescence images.

How it works

Otsu's original algorithm was developed for grayscale images, whose pixel values range from $0$ to $2^{N}-1$ , where $N$ is the number of bits per pixel. The output of the algorithm is a threshold level $k$ that splits the pixels into two classes: those with values in $[0,\ldots ,k]$ and those with values in $[k+1,\ldots ,2^{N}-1]$ . These classes are labeled 1 and 2 in the equations below.

To find the threshold level, the algorithm computes either the within-class variance $\sigma _{W}^{2}$ or the between-class variance $\sigma _{B}^{2}$ . These variances are statistical metrics defined by the following equations:

$\sigma _{W}^{2}=\omega _{1}\sigma _{1}^{2}+\omega _{2}\sigma _{2}^{2}$ $\sigma _{B}^{2}=\omega _{1}\omega _{2}(\mu _{1}-\mu _{2})^{2}$

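Note that only one of the two metrics, the within-class variance $\sigma _{W}^{2}$ or the between-class variance $\sigma _{B}^{2}$ , needs to be computed, because they are related by the equation $\sigma _{W}^{2}+\sigma _{B}^{2}=\sigma _{T}^{2}$ , where $\sigma _{T}^{2}$ is the total variance of all pixel intensities in the image and does not depend on the threshold. Thus maximizing $\sigma _{B}^{2}$ is the same as minimizing $\sigma _{W}^{2}$ .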
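
The relation can be checked with a short derivation. The sketch below introduces $\mu _{T}=\omega _{1}\mu _{1}+\omega _{2}\mu _{2}$ for the mean intensity of the whole image (a symbol added here for convenience) and uses $\omega _{1}+\omega _{2}=1$ . Within class $c$ the deviations $(i-\mu _{c})$ average to zero, so the spread of that class around $\mu _{T}$ is $\sigma _{c}^{2}+(\mu _{c}-\mu _{T})^{2}$ . With $\mu _{1}-\mu _{T}=\omega _{2}(\mu _{1}-\mu _{2})$ and $\mu _{2}-\mu _{T}=-\omega _{1}(\mu _{1}-\mu _{2})$ this gives:

${\begin{aligned}\sigma _{T}^{2}&=\omega _{1}\left[\sigma _{1}^{2}+(\mu _{1}-\mu _{T})^{2}\right]+\omega _{2}\left[\sigma _{2}^{2}+(\mu _{2}-\mu _{T})^{2}\right]\\&=\sigma _{W}^{2}+\omega _{1}\omega _{2}^{2}(\mu _{1}-\mu _{2})^{2}+\omega _{2}\omega _{1}^{2}(\mu _{1}-\mu _{2})^{2}\\&=\sigma _{W}^{2}+\omega _{1}\omega _{2}(\mu _{1}-\mu _{2})^{2}\\&=\sigma _{W}^{2}+\sigma _{B}^{2}\end{aligned}}$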

The probabilities of a pixel being in class 1 or class 2, $\omega _{1}$ and $\omega _{2}$ , are given by

$\omega _{1}=\sum _{i=0}^{k}p_{i}={\frac {N_{1}}{N_{1}+N_{2}}}$ $\omega _{2}=\sum _{i=k+1}^{2^{N}-1}p_{i}={\frac {N_{2}}{N_{1}+N_{2}}}$

where $p_{i}$ is the probability of a pixel having value $i$ (i.e., the normalized height of bar $i$ in the image histogram), and $N_{1}$ and $N_{2}$ are the numbers of pixels in class 1 and class 2. The mean values $\mu _{1,2}$ and the variances $\sigma _{1,2}^{2}$ of the two classes are given by the following equations:

$\mu _{1}=\sum _{i=0}^{k}i\,p_{i}/\omega _{1}$ $\mu _{2}=\sum _{i=k+1}^{2^{N}-1}i\,p_{i}/\omega _{2}$ $\sigma _{1}^{2}=\sum _{i=0}^{k}(i-\mu _{1})^{2}p_{i}/\omega _{1}$ $\sigma _{2}^{2}=\sum _{i=k+1}^{2^{N}-1}(i-\mu _{2})^{2}p_{i}/\omega _{2}$

The algorithm performs an exhaustive search, calculating these values for every possible value for the threshold (i.e., between $0$ and $2^{N}-1$ ). The optimal threshold (at least according to the original paper) is the one that best separates the two classes, i.e. provides the largest between-class variance $\sigma _{B}^{2}$ .
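
The search can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `otsu_exhaustive` and the choice of a plain list of per-level pixel counts as input are assumptions made here, and every candidate threshold recomputes its sums from scratch for clarity rather than speed.

```python
def otsu_exhaustive(counts):
    """Return (best_k, best_sigma_b2) for a histogram of pixel counts.

    counts[i] is the number of pixels with gray level i. The name and the
    input format are illustrative assumptions, not part of Otsu's paper.
    """
    total = sum(counts)
    probs = [c / total for c in counts]
    levels = len(probs)
    best_k, best_sigma_b2 = 0, 0.0

    for k in range(levels):
        # Class 1 holds levels 0..k, class 2 holds levels k+1..levels-1.
        w1 = sum(probs[: k + 1])
        w2 = sum(probs[k + 1:])
        if w1 == 0 or w2 == 0:
            continue  # one class is empty, so its mean is undefined

        mu1 = sum(i * probs[i] for i in range(0, k + 1)) / w1
        mu2 = sum(i * probs[i] for i in range(k + 1, levels)) / w2

        sigma_b2 = w1 * w2 * (mu1 - mu2) ** 2
        if sigma_b2 > best_sigma_b2:
            best_k, best_sigma_b2 = k, sigma_b2

    return best_k, best_sigma_b2
```

Because the comparison is strict, ties are resolved in favor of the lowest threshold; other tie-breaking rules are equally defensible.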

Implementation

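Note that all of the quantities above (class probabilities, means, and variances) can be calculated from the intensity histogram alone. Most implementations exploit this by building the histogram once and working from it, rather than revisiting the individual pixels for every candidate threshold.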

Additionally, as noted in the original paper, the statistic $\sigma _{W}^{2}$ relies on second-order statistics (variances) while $\sigma _{B}^{2}$ relies only on first-order statistics (means). Hence, $\sigma _{B}^{2}$ is often used, as it is the simplest to compute (i.e., it requires the fewest operations).
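
One way to combine both observations is to sweep the histogram once while keeping running sums. The sketch below assumes the equivalent form $\sigma _{B}^{2}(k)=(\mu _{T}\,\omega (k)-\mu (k))^{2}/(\omega (k)(1-\omega (k)))$ , where $\omega (k)=\sum _{i=0}^{k}p_{i}$ and $\mu (k)=\sum _{i=0}^{k}i\,p_{i}$ are cumulative sums and $\mu _{T}$ is the mean of the whole image. This form follows from the definition of $\sigma _{B}^{2}$ by substituting $\mu _{1}=\mu (k)/\omega (k)$ and $\mu _{2}=(\mu _{T}-\mu (k))/(1-\omega (k))$ . The function name `otsu_cumulative` and the list-of-counts input are illustrative assumptions.

```python
def otsu_cumulative(counts):
    """Single-pass search for the threshold that maximizes sigma_B^2.

    counts[i] is the number of pixels with gray level i (assumed input
    format). Only running zeroth- and first-order sums are needed.
    """
    total = sum(counts)
    mu_total = sum(i * c for i, c in enumerate(counts)) / total

    cum_n = 0  # running pixel count for levels 0..k
    cum_s = 0  # running sum of intensities for levels 0..k
    best_k, best_sigma_b2 = 0, 0.0

    for k, c in enumerate(counts):
        cum_n += c
        cum_s += k * c
        if cum_n == 0 or cum_n == total:
            continue  # one class is empty at this threshold

        omega = cum_n / total  # omega(k)
        mu = cum_s / total     # mu(k)
        sigma_b2 = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
        if sigma_b2 > best_sigma_b2:
            best_k, best_sigma_b2 = k, sigma_b2

    return best_k, best_sigma_b2
```

The empty-class check uses integer counts rather than floating-point probabilities, which avoids dividing by a value that is zero only up to rounding error.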

Example

The following example illustrates how the algorithm works. As shown in Fig. 1, the example image is 6 × 6 pixels and shows a bright object in the center (e.g., a bright fluorescent bead). The intensity histogram for this image is shown on the right.

The algorithm starts by choosing a threshold value and calculating the within-class and between-class variances, using the equations for $\sigma _{W}^{2}$ and $\sigma _{B}^{2}$ given above. For example, assuming a threshold value of 1, the pixels in the image will be split into two classes as shown in Fig. 2.

The probability, mean, and variance for the pixels in Class 1 are then given by:

${\begin{aligned}\omega _{1}&={\frac {N_{1}}{N_{1}+N_{2}}}\\&={\frac {9+6}{9+6+4+5+8+4}}\\&=0.42\end{aligned}}$

${\begin{aligned}\mu _{1}&={\frac {\sum _{i=0}^{1}i\,p_{i}}{\omega _{1}}}\\&={\frac {(9\cdot 0)+(6\cdot 1)}{9+6}}\\&=0.4\end{aligned}}$

${\begin{aligned}\sigma _{1}^{2}&={\frac {(9\cdot (0-0.4)^{2}+6\cdot (1-0.4)^{2})}{9+6}}\\&=0.24\end{aligned}}$

The probability, mean, and variance for the pixels in Class 2 are given by:

${\begin{aligned}\omega _{2}&={\frac {N_{2}}{N_{1}+N_{2}}}\\&={\frac {4+5+8+4}{9+6+4+5+8+4}}\\&=0.58\end{aligned}}$

${\begin{aligned}\mu _{2}&={\frac {\sum _{i=2}^{5}i\,p_{i}}{\omega _{2}}}\\&={\frac {(4\cdot 2)+(5\cdot 3)+(8\cdot 4)+(4\cdot 5)}{4+5+8+4}}\\&=3.57\end{aligned}}$

${\begin{aligned}\sigma _{2}^{2}&={\frac {4\cdot (2-3.57)^{2}+5\cdot (3-3.57)^{2}+8\cdot (4-3.57)^{2}+4\cdot (5-3.57)^{2}}{4+5+8+4}}\\&=1.01\end{aligned}}$

Using these values, the within-class and between-class variances are:

${\begin{aligned}\sigma _{W}^{2}&=\omega _{1}\sigma _{1}^{2}+\omega _{2}\sigma _{2}^{2}\\&=0.42\cdot 0.24+0.58\cdot 1.01\\&=0.69\end{aligned}}$

${\begin{aligned}\sigma _{B}^{2}&=\omega _{1}\omega _{2}(\mu _{1}-\mu _{2})^{2}\\&=0.42\cdot 0.58\cdot (0.4-3.57)^{2}\\&=2.44\end{aligned}}$
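
The arithmetic for threshold 1 can be checked with a short script. The histogram counts (9, 6, 4, 5, 8, 4 pixels at intensities 0 to 5) are taken from the fractions above. The helper name `class_stats` is an assumption made for this sketch.

```python
def class_stats(counts, levels):
    """Weight, mean, and variance of the pixels whose gray level is in `levels`.

    counts[i] is the number of pixels with gray level i. The helper name is
    hypothetical, and the class is assumed to contain at least one pixel.
    """
    total = sum(counts)
    n = sum(counts[i] for i in levels)
    weight = n / total
    mean = sum(i * counts[i] for i in levels) / n
    var = sum(counts[i] * (i - mean) ** 2 for i in levels) / n
    return weight, mean, var


counts = [9, 6, 4, 5, 8, 4]  # histogram of the 6 x 6 example image
k = 1                        # threshold used in the worked example

w1, mu1, var1 = class_stats(counts, range(0, k + 1))
w2, mu2, var2 = class_stats(counts, range(k + 1, len(counts)))

sigma_w2 = w1 * var1 + w2 * var2
sigma_b2 = w1 * w2 * (mu1 - mu2) ** 2

for name, value in [("w1", w1), ("mu1", mu1), ("var1", var1),
                    ("w2", w2), ("mu2", mu2), ("var2", var2),
                    ("sigma_W^2", sigma_w2), ("sigma_B^2", sigma_b2)]:
    print(f"{name:10s} {value:.2f}")
```

Changing `k` and rerunning the script gives the corresponding values for the other thresholds, provided both classes contain at least one pixel.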

To find the optimal threshold value, the algorithm performs these calculations for each possible threshold. In this example, the algorithm would evaluate thresholds from 0 to 5, since this is the range of intensities present in the image. The table below summarizes the calculations:

Calculated values for different thresholds
Threshold	0	1	2	3	4	5
$\omega _{1}$	0.25	0.42	0.53	0.67	0.89	1
$\mu _{1}$	0	0.4	0.74	1.21	1.91	2.25
$\sigma _{1}^{2}$	0	0.24	0.62	1.33	2.46	3.13
$\omega _{2}$	0.75	0.58	0.47	0.33	0.11	0
$\mu _{2}$	3	3.57	3.94	4.33	5	0
$\sigma _{2}^{2}$	1.93	1.01	0.53	0.22	0	0
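$\sigma _{W}^{2}$	1.44	0.69	0.57	0.96	2.19	3.13
$\sigma _{B}^{2}$	1.69	2.44	2.56	2.17	0.95	0

At threshold 5 every pixel falls into class 1, so class 2 is empty and $\mu _{2}$ and $\sigma _{2}^{2}$ are undefined; the table lists them as 0.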

The algorithm then outputs the threshold value that gives the lowest value of $\sigma _{W}^{2}$ or, equivalently, the highest value of $\sigma _{B}^{2}$ , which is 2. Figure 3 below shows the mask generated by the thresholding operation $I>2$ .
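
The masking step itself is a per-pixel comparison. In the sketch below the pixel layout of `image` is hypothetical: it was chosen only so that its histogram matches the example (9, 6, 4, 5, 8, 4 pixels at intensities 0 to 5), and it is not the arrangement shown in the figures.

```python
from collections import Counter

# Hypothetical 6 x 6 layout with the same histogram as the example image.
image = [
    [0, 0, 0, 1, 0, 0],
    [0, 1, 2, 3, 2, 1],
    [1, 3, 4, 5, 4, 2],
    [1, 3, 5, 5, 4, 3],
    [0, 2, 4, 5, 4, 3],
    [0, 1, 4, 4, 4, 0],
]
k = 2  # threshold selected in the worked example

# Build the intensity histogram (pixel count per gray level).
flat = [pix for row in image for pix in row]
freq = Counter(flat)
counts = [freq[level] for level in range(max(flat) + 1)]

# Apply the thresholding operation I > k to obtain a boolean mask.
mask = [[pix > k for pix in row] for row in image]
foreground = sum(sum(row) for row in mask)  # True counts as 1

print("histogram:", counts)
print("foreground pixels:", foreground)
```

Pixels equal to the threshold are assigned to the background here, which matches the class definition $[0,\ldots ,k]$ used earlier.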

Advantages and disadvantages

Otsu's method provides a (computationally) fast way to determine a suitable threshold level during masking. However, the original algorithm assumes that there are two distinct classes of intensity (i.e., that the intensity histogram shows two peaks separated by a valley). The algorithm therefore does poorly if the image has objects with multiple distinct intensity distributions or if the objects are very dim, and therefore close in intensity to the background. It also performs poorly if the image suffers from uneven illumination. Enhancements to the algorithm have since been made to improve its performance in these situations.

References

[1] N. Otsu, "A Threshold Selection Method from Gray-Level Histograms," in IEEE Transactions on Systems, Man, and Cybernetics, vol. 9, no. 1, pp. 62-66, Jan. 1979, doi: 10.1109/TSMC.1979.4310076.
