Abstract
This paper centers on a novel data mining technique we term supervised clustering. Unlike traditional clustering, supervised clustering is applied to classified examples, and its goal is to identify class-uniform clusters that have a high probability density. We focus on how data mining techniques in general, and classification techniques in particular, can benefit from the knowledge obtained through supervised clustering. We discuss how better nearest-neighbor classifiers can be constructed from the knowledge generated by supervised clustering, and we provide experimental evidence that they are more efficient and more accurate than a traditional 1-nearest-neighbor classifier. Finally, we demonstrate how supervised clustering can be used to enhance simple classifiers.
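The idea of building a nearest-neighbor classifier from supervised clustering results can be illustrated with a small Python sketch. It assumes that a supervised clustering algorithm has already selected a set of representative examples (the list `reps` is simply given here, not computed). The helper names `group_by_representative`, `cluster_impurity`, `fit_representative_classifier`, and `predict` are hypothetical, and the impurity measure (the fraction of examples outside their cluster's majority class) is an illustrative stand-in, not necessarily the exact fitness function used in the paper. Each training example joins the cluster of its nearest representative, each representative is labeled with its cluster's majority class, and queries are classified by 1-NN over the representatives only.

```python
import math
from collections import Counter


def nearest(point, reps):
    """Index of the representative closest to `point` (Euclidean distance)."""
    return min(range(len(reps)), key=lambda i: math.dist(point, reps[i]))


def group_by_representative(X, y, reps):
    """Map each representative index to the class labels of the examples it attracts."""
    clusters = {}
    for point, label in zip(X, y):
        clusters.setdefault(nearest(point, reps), []).append(label)
    return clusters


def cluster_impurity(X, y, reps):
    """Fraction of examples outside the majority class of their cluster.
    0.0 means every cluster is class-uniform (illustrative measure only)."""
    clusters = group_by_representative(X, y, reps)
    minority = sum(len(labels) - Counter(labels).most_common(1)[0][1]
                   for labels in clusters.values())
    return minority / len(X)


def fit_representative_classifier(X, y, reps):
    """Label every non-empty cluster's representative with its majority class."""
    clusters = group_by_representative(X, y, reps)
    return [(reps[i], Counter(labels).most_common(1)[0][0])
            for i, labels in clusters.items()]


def predict(model, point):
    """1-NN over the labeled representatives instead of over all training examples."""
    _, label = min(model, key=lambda pair: math.dist(point, pair[0]))
    return label


if __name__ == "__main__":
    # Toy data: two well-separated classes.
    X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    y = ["a", "a", "a", "b", "b", "b"]
    reps = [(0, 0), (5, 5)]  # assumed output of a supervised clustering step
    print(cluster_impurity(X, y, reps))  # 0.0
    model = fit_representative_classifier(X, y, reps)
    print(predict(model, (4, 4)))  # b
```

On this toy data both clusters are class-uniform, so the impurity is 0.0, and classifying a query needs two distance computations instead of the six a plain 1-NN classifier would perform. That reduction is the efficiency argument in miniature. Accuracy gains depend on how well the representatives are chosen, which this sketch leaves to the (unspecified) clustering step.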
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Eick, C.F., Zeidat, N. (2005). Using Supervised Clustering to Enhance Classifiers. In: Hacid, MS., Murray, N.V., Raś, Z.W., Tsumoto, S. (eds) Foundations of Intelligent Systems. ISMIS 2005. Lecture Notes in Computer Science, vol 3488. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11425274_26
DOI: https://doi.org/10.1007/11425274_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25878-0
Online ISBN: 978-3-540-31949-8
eBook Packages: Computer Science, Computer Science (R0), Springer Nature Proceedings Computer Science
