Cluster Analysis in Practice: Dealing with Outliers in Managerial Research


Humberto Elias Garcia Lopes
Marlusa de Sevilha Gosling


Context: in recent years, cluster analysis has encouraged researchers to explore new ways of understanding data behavior. The method's computational ease and its ability to generate consistent outputs, even from small datasets, explain this to some extent. However, researchers often mistakenly assume that clustering is a terrain in which anything goes. The literature shows the opposite: they must be careful, especially regarding the effect of outliers on cluster formation. Objective: in this tutorial paper, we contribute to this discussion by presenting four clustering techniques and their respective advantages and disadvantages in the treatment of outliers. Methods: we analyzed a managerial dataset using the k-means, PAM, DBSCAN, and FCM techniques. Results: our analyses indicate that these techniques handle outliers in distinct ways, so researchers can match the technique to their data accordingly. Conclusion: we conclude that researchers need a more diversified repertoire of clustering techniques. After all, this would give them two relevant empirical alternatives: choosing the technique most appropriate to their research objectives or adopting a multi-method approach.
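The contrast among these techniques can be illustrated with a minimal, hypothetical sketch (this is not the paper's code, and the function names and parameters `eps` and `min_pts` are illustrative). On one-dimensional data, a k-means-style center is the arithmetic mean and is pulled toward an outlier; a PAM-style medoid must be an actual observation and resists it; and a DBSCAN-style density rule flags the isolated point as noise rather than forcing it into a cluster.

```python
def centroid(points):
    """k-means-style center: the arithmetic mean (sensitive to outliers)."""
    return sum(points) / len(points)

def medoid(points):
    """PAM-style center: the observation minimizing total distance to the rest."""
    return min(points, key=lambda c: sum(abs(c - p) for p in points))

def dbscan_noise(points, eps, min_pts):
    """DBSCAN-style rule: points with fewer than min_pts neighbors
    within distance eps (counting themselves) are labeled noise."""
    return [p for p in points
            if sum(1 for q in points if abs(q - p) <= eps) < min_pts]

data = [1, 2, 3, 100]   # 100 is an outlier
print(centroid(data))   # 26.5 -- the mean is dragged toward 100
print(medoid(data))     # 2    -- the medoid stays inside the dense group
print(dbscan_noise(data, eps=2.0, min_pts=2))  # [100] -- flagged as noise
```

FCM softens this contrast further: instead of a hard assignment, each observation receives a graded membership in every cluster, so an outlier's influence is diluted across clusters rather than concentrated in one.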




How to Cite
Lopes, H. E. G., & Gosling, M. de S. (2020). Cluster Analysis in Practice: Dealing with Outliers in Managerial Research. Journal of Contemporary Administration, 25(1), e200081.

