# Comparison of Histogram Feature Based Thresholding with 3S Multi-Thresholding and Fuzzy C-Means

#### Abstract

### Introduction:

Thresholding is one of the most important parts of segmentation whenever a specific part of an image must be detected. Several thresholding methods have been used frequently by previous researchers, either bi-level techniques such as DBT or multilevel techniques such as 3S. A new histogram feature based thresholding method is implemented to detect the lesion area in digital mammograms and is compared with 3S (Shrinking-Search-Space) multi-thresholding and the Fuzzy C-Means (FCM) method, used as benchmarks, in terms of segmentation quality and segmentation time.

### Material and Methods:

The algorithms were tested on 188 digital mammograms. Each image was preprocessed by cropping the unnecessary area, resizing to 1024 by 1024 pixels, and normalizing the pixel values with a simple contrast stretching method.

### Results:

The results show that the suggested method does not produce the same results as the 3S and FCM methods, and that it is faster than both, which is a further advantage of the suggested method. Previous studies showed that FCM is not a reliable clustering algorithm and needs several runs to give a reliable result; the results of this study confirm that finding.

#### INTRODUCTION

Thresholding is one of the most important parts of segmentation whenever a specific part of an image must be detected. Several thresholding methods have been used frequently by previous researchers, either bi-level techniques such as DBT [1] or multilevel techniques such as 3S [2]. Bi-level techniques can also be used for multilevel thresholding, provided that the optimum search space is found. The 3S multilevel thresholding method has been applied to MRI images, and the results showed that it is faster and more reliable than FCM.

C-Means and Fuzzy C-Means are two of the earliest unsupervised learning methods for clustering. When either is used, each data point is assigned to a cluster according to a similarity criterion.

According to previous research, several methods have been developed to produce effective multilevel thresholding techniques [3-8]. In this study, a new thresholding method is suggested and compared with FCM and 3S in terms of detecting the lesion area in digital mammograms. Because digital mammograms contain different tissues (e.g. fatty, dense, mass), almost no bi-level thresholding method is useful for separating the different areas. Using bi-level methods can lead radiologists to miss malignancies and reduces sensitivity in the early detection of breast lesions. Therefore, an optimum thresholding method is highly necessary to help radiologists, as a second reader, to select the lesion area.

The main aim of this research was to develop a new thresholding method to detect the lesion area in digital mammograms. The other objective was to compare the newly suggested method with two other common methods, FCM and 3S.

C-Means and Fuzzy C-means

The C-Means method was suggested by MacQueen in 1967 and has been developed further over the following decades. The method works with a fixed number of clusters into which the whole data set is divided. A center point is identified for each cluster, and each data point belongs to the cluster whose center point is closest to it [9].

In the first step, the initial values of the k center points are defined. The number of center points is chosen based on the user's knowledge of the data distribution. In the second step, all data points are clustered according to their distance to the center points.

In the following steps, the center points are recalculated and the clustering is rearranged around the new points; this is repeated until the objective function no longer changes. The objective function is defined in equation (1).

$\mathit{obj\_func}=\sum_{j=1}^{k}\sum_{i=1}^{n}{\parallel x_{i}^{(j)}-c_{j}\parallel}^{2}\qquad (1)$

where $x_{i}^{(j)}$ is the $i^{th}$ data point that has been assigned to the $j^{th}$ cluster, with center point $c_{j}$ [2].

The C-Means method can therefore be summarized as: a) place the center points as representatives of the clusters; b) assign each data point to the closest cluster; c) assign labels to the clusters; and d) repeat steps b and c until the objective function no longer changes. The C-Means method has some limitations: it needs to be run several times, and changing the predefined center points affects the results.
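The loop described above can be sketched in a few lines of Python. This is only an illustrative sketch on a flat list of gray levels, not the implementation used in the study: the function name `c_means`, the `max_iter` cap and the choice to leave an empty cluster's center unchanged are all assumptions.

```python
def c_means(values, centers, max_iter=100):
    """Hard C-Means on 1-D values (e.g. gray levels), from given initial centers."""
    centers = list(centers)
    previous_objective = None
    for _ in range(max_iter):
        # Assign every value to the cluster with the closest center point.
        labels = [min(range(len(centers)), key=lambda j: (x - centers[j]) ** 2)
                  for x in values]
        # Recalculate each center as the mean of its members.
        # Assumption: a cluster with no members keeps its previous center.
        for j in range(len(centers)):
            members = [x for x, lab in zip(values, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
        # Objective of equation (1): squared distances to the assigned centers.
        objective = sum((x - centers[lab]) ** 2 for x, lab in zip(values, labels))
        if objective == previous_objective:
            break  # the objective function no longer changes
        previous_objective = objective
    return centers, labels, objective


centers, labels, objective = c_means([10, 12, 11, 200, 205, 198], [0, 255])
```

Starting the same call from different initial centers is a quick way to see the sensitivity to predefined center points mentioned above.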

Fuzzy C-Means (FCM) was developed from the same idea, with a number of differences. The most important difference between C-Means and FCM lies in the clustering itself: in C-Means each data point can belong to only one cluster, whereas in FCM each data point may be assigned to more than one cluster according to its membership degree [10-12]. This method, which has been used by many researchers [13, 14], is based on the following objective function:

$\mathit{obj\_func}_{\mathit{FCM}}=\sum_{j=1}^{k}\sum_{i=1}^{n}u_{ij}^{m}{\parallel x_{i}-c_{j}\parallel}^{2},\quad 1<m<\infty$

where n is the number of data points, k is the number of clusters, m is the fuzziness degree, which is any real number greater than 1, $x_{i}$ is the $i^{th}$ data point, $c_{j}$ is the center point of the $j^{th}$ cluster, and $u_{ij}$ is the membership degree of the $i^{th}$ data point in the $j^{th}$ cluster.

The membership degrees and center points are updated using the following equations:

$u_{ij}=\frac{1}{\sum_{l=1}^{k}{\left(\frac{\parallel x_{i}-c_{j}\parallel}{\parallel x_{i}-c_{l}\parallel}\right)}^{\frac{2}{m-1}}}$

${c}_{j}=\frac{\sum _{i=1}^{n}{u}_{\mathit{ij}}^{m}{x}_{i}}{\sum _{i=1}^{n}{u}_{\mathit{ij}}^{m}}$

Although this method is better optimized than C-Means, it still depends on the number of clusters and the predefined centers [15, 16].
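The membership and center updates can be illustrated with a short Python sketch on 1-D gray levels. It uses the standard FCM membership update with exponent 2/(m-1). The function names, the stopping rule (largest center shift below `tol`) and the handling of a point that coincides with a center are assumptions made for the example, not details taken from the study.

```python
def fcm_step(values, centers, m=2.0):
    """One FCM iteration on 1-D values: update memberships, then centers."""
    k = len(centers)
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for x in values:
        dists = [abs(x - c) for c in centers]
        if 0.0 in dists:
            # Assumption: a point lying exactly on a center belongs fully to it.
            row = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(row)
            row = [u / total for u in row]
        else:
            row = [1.0 / sum((dists[j] / dists[l]) ** exponent for l in range(k))
                   for j in range(k)]
        memberships.append(row)
    new_centers = []
    for j in range(k):
        weights = [row[j] ** m for row in memberships]
        total_weight = sum(weights)
        if total_weight > 0.0:
            new_centers.append(sum(w * x for w, x in zip(weights, values)) / total_weight)
        else:
            new_centers.append(centers[j])  # degenerate cluster: keep old center
    return memberships, new_centers


def fuzzy_c_means(values, centers, m=2.0, tol=1e-6, max_iter=200):
    """Iterate fcm_step until the centers move by less than tol (assumed rule)."""
    centers = list(centers)
    for _ in range(max_iter):
        memberships, new_centers = fcm_step(values, centers, m)
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return memberships, centers


memberships, centers = fuzzy_c_means([10, 12, 11, 200, 205, 198], [0.0, 255.0])
```

Each row of `memberships` sums to 1, which is what lets a single pixel contribute to more than one cluster.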

3S method

This method is built on the DBT method, which is a bi-level thresholding technique [1]. DBT computes the total average information for discriminating class C_{0} from class C_{1}. The discriminating information for class C_{1} versus class C_{0} can be measured using the logarithm of the likelihood ratios.

$J\left(C_{0},C_{1}\right)=\frac{1}{w_{0}}\ln\frac{w_{1}}{w_{0}}+\frac{1}{w_{1}}\ln\frac{w_{0}}{w_{1}}$

where w_{0} and w_{1} are the probabilities of occurrence of classes C_{0} and C_{1}, respectively [1]. To obtain the threshold value T that separates object from background, the value of this function needs to be minimized. As a bi-level method, DBT is useful for images that contain just one object and a background, such as text. In other situations, such as digital mammograms, multilevel thresholding methods are needed, and the 3S method results from extending DBT.

The 3S method works through the following steps (Fig 1):

1. Identify the region of interest (ROI) that fits the whole image.
2. Perform DBT to find the best thresholding value.
3. Save class C_{0} and continue with C_{1} as the original image.
4. Class C_{1} (as the original image) is divided into two classes, C_{1} and C_{2}.
5. The procedure is repeated until the highest value of the histogram is reached.

During the above steps, different optimum thresholding values are found [2].
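The shrinking of the search space can be sketched as a loop around any bi-level criterion. In the hedged Python sketch below, `dbt_threshold` evaluates J exactly as it is printed in this section, which may differ from the original DBT formulation in [1], so it should be read as a placeholder criterion. The function names, the 256 gray levels and the `max_thresholds` cap are assumptions.

```python
import math


def dbt_threshold(hist, lo, hi):
    """Return the level T in [lo, hi) minimizing J as printed above, or None."""
    total = sum(hist[lo:hi + 1])
    best_t, best_j = None, math.inf
    cumulative = 0
    for t in range(lo, hi):
        cumulative += hist[t]
        w0 = cumulative / total if total else 0.0  # probability of class C0
        w1 = 1.0 - w0                              # probability of class C1
        if w0 <= 0.0 or w1 <= 0.0:
            continue  # J is undefined when one class is empty
        j = math.log(w1 / w0) / w0 + math.log(w0 / w1) / w1
        if j < best_j:
            best_t, best_j = t, j
    return best_t


def shrinking_search_space(pixels, levels=256, max_thresholds=8):
    """Repeat bi-level thresholding on the upper class until the range is exhausted."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    lo, hi = 0, max(pixels)  # hi is the highest occupied histogram level
    thresholds = []
    while lo < hi and len(thresholds) < max_thresholds:
        t = dbt_threshold(hist, lo, hi)
        if t is None:
            break
        thresholds.append(t)  # save class C0 (levels lo..t)
        lo = t + 1            # continue with C1 as the new search space
    return thresholds


thresholds = shrinking_search_space([10] * 4 + [100] * 2 + [200] * 2)
```

Because each pass restarts at `t + 1`, the search space strictly shrinks and the loop always terminates.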

#### MATERIALS AND METHODS

The suggested method is built on a bi-level thresholding technique as the core of a multi-thresholding technique. The block diagram (Fig 2) shows the architecture of the suggested method. First, the original images are opened. Then one of the original images (right or left) is rotated so that both have the same direction, and a subtraction image of the right and left sides is produced. In the next step, the statistical histogram features, the mean and the standard deviation (SD), are calculated. In a normal distribution, the range from -∞ up to the mean covers 50% of the area under the curve, and the range from the mean to the mean plus one SD covers about 34.1%. Therefore, the range from -∞ up to the mean plus one SD covers about 84.1% of the area under the curve. It has been shown previously that lesions lie in the last 10 percentile of pixel values [13], which falls above this point. For these reasons, the threshold value used to extract the lesion area in digital mammograms is Mean + SD. All pixel values equal to or greater than this threshold are set to 1 and the rest are set to 0, which produces a binary image that contains only the objects. Thresholding depends heavily on pixel values. Because preprocessing, and filtering in particular, changes the pixel values, no filtering methods were employed in this research and the original images were used for system evaluation.
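The Mean + SD rule itself is short enough to show directly. The sketch below works on a flat list of pixel values, for example the flattened subtraction image. The function name `mean_sd_threshold` and the use of the population standard deviation are assumptions, since the text does not say which SD estimator was used.

```python
from statistics import fmean, pstdev


def mean_sd_threshold(pixels):
    """Binarize pixel values at mean + one (population) standard deviation."""
    threshold = fmean(pixels) + pstdev(pixels)
    # Pixels at or above the threshold become 1 (object), the rest 0.
    mask = [1 if p >= threshold else 0 for p in pixels]
    return threshold, mask


threshold, mask = mean_sd_threshold([0, 0, 0, 0, 0, 0, 0, 0, 0, 10])
```

In this toy input the mean is 1 and the SD is 3, so only the single bright pixel exceeds the threshold of 4. The method needs one pass for the statistics and one for the comparison, with no iteration.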

#### RESULTS

The detection processes for FCM, 3S and the suggested method were run on the same computer, with an Intel Core 2 Duo processor (1.88 GHz) and 2 GB of RAM. All algorithms were tested on 188 digital mammograms after preprocessing, which consisted of cropping the unnecessary area to select an ROI of 1024 by 1024 pixels and then normalizing the pixel values with a simple contrast stretching method. Contrast stretching was employed because the proposed method is based on the normal distribution (Fig 3).
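The text does not specify which contrast stretching variant was used. A minimal sketch, assuming a linear min-max stretch to the 0 to 255 range, could look like this (the function name and the output range are assumptions):

```python
def contrast_stretch(pixels, out_min=0, out_max=255):
    """Linearly map pixel values so they span [out_min, out_max]."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        # A flat image has no contrast to stretch.
        return [out_min for _ in pixels]
    return [round(out_min + (p - lo) * (out_max - out_min) / (hi - lo))
            for p in pixels]


stretched = contrast_stretch([50, 75, 150])
```

The lowest input value maps to `out_min` and the highest to `out_max`, so every image occupies the same intensity range before thresholding.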

The FCM algorithm produced its threshold levels after an average processing time of 235.47 seconds. Three clusters were defined for this algorithm, so the gray levels of the input image were separated into three groups (Fig 4).

3S multi-thresholding took an average of 33.896 seconds for processing. Compared with the results obtained from the FCM method, the thresholding quality of 3S was not equal to that of FCM, and the processing time also differed (Fig 5).

The results of the proposed histogram feature based thresholding method showed clearly that it is helpful: it is less complex than the other methods, it is less time consuming (21.45 seconds), and it selected the correct area, including the lesion (Fig 6).

Three expert radiologists, each with at least 5 years of experience in mammogram interpretation, were invited to select the lesion area. The performance of the three methods in correctly detecting the lesion area was compared with the radiologists' selections. The results show that FCM had the lowest performance of the three methods: its detection performance was lower and its processing time was much higher than those of the other methods. The 3S and proposed histogram feature based methods had almost similar results, but the processing time of 3S was higher than that of the proposed method because of the repeated processing that 3S performs (Table 1).

#### DISCUSSION

We implemented the FCM and 3S algorithms on the same data to compare their functionality with that of our proposed method. The results showed that FCM is a time consuming method, since it has to repeat all steps to obtain a better result [17, 18]. On the other hand, 3S had better results, since it gave more correct answers in the detection of breast lesions. Compared with both FCM and 3S, the proposed method performed better because it takes less processing time and gives more correct answers. This shows that our proposed method is an acceptable method in comparison with 3S and FCM in terms of processing time and accurate detection of lesions.

#### CONFLICTS OF INTEREST

The authors declare no conflicts of interest regarding the publication of this study.

#### FINANCIAL DISCLOSURE

No financial interests related to the material of this manuscript have been declared.

#### References

*IEEE* 1995;

*IEEE* 2009;

*Pattern Recognition Letters* 2007;28:662–9.

*Journal Information Science Engineering* 2001;17:713–27.

*Pattern Recognition Letters* 2009;30:275–84.

*Sensors (Basel)* 2012;12(3):2373–99.

*BMC Bioinformatics* 2010;11(6):S26.

*Comput Med Imaging Graph* 2011;35(1):42–50.

*Journal of Shanghai Jiaotong University (Science)* 2018;23(5):636–42.

*Med Phys* 2011;38(7):4365–71.

*Med Phys* 2011;38(6):2879–91.

*IEEE Transactions on Information Technology in Biomedicine* 2008;12(1):55–65.

*Med Image Comput Comput Assist Interv* 2011;14(3):562–9.