Automatic detection of the support points in relational clustering

Verde R.
2019

Abstract

Clustering is both a challenging and a very important task in Artificial Intelligence. One of the most popular families of clustering algorithms is the prototype-based approach. Prototype-based algorithms represent the clusters by a set of prototypes, usually vectors approximating each cluster's barycenter. However, the objects in a data set are not necessarily vectors, especially in real-world applications. Such non-vectorial data sets are often described by the dissimilarities, distances, or relations between all pairs of objects, and are usually referred to as relational data sets. For this kind of data, clustering algorithms must be adapted to different distance measures. A few state-of-the-art algorithms handle relational data sets through the barycentric coordinates formalism, in which the objects of a relational data set are embedded in a space defined by the distances among a subset of the objects, called support points. In this paper, we propose an approach that automatically selects the optimal set of support points. We also extend the method to relational data streams in order to detect changes over time in the intrinsic dimensionality of the representation space. We experimentally compare the quality of the proposed algorithms on real and artificial data sets, and show that automatic selection of support points achieves optimal quality with minimal computation time.
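The abstract mentions the barycentric coordinates formalism without detailing it. As a minimal sketch, assuming the dissimilarities are squared Euclidean distances, the Python below illustrates a standard identity used in relational prototype-based clustering: a prototype is a convex combination (barycentric coefficients summing to 1) of a few support points, and its distance to every object can be computed from the dissimilarity matrix alone, without ever building vectors. The function names, the fixed support points, and the coefficients are hypothetical illustrations; this is not the paper's automatic support-point selection.

```python
import random
from typing import List, Sequence


def squared_distance_matrix(points: Sequence[Sequence[float]]) -> List[List[float]]:
    """Squared Euclidean dissimilarities; used here only to build a toy relational data set."""
    return [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points] for p in points]


def relational_distances(
    D: Sequence[Sequence[float]],
    support: Sequence[int],
    alpha: Sequence[float],
) -> List[float]:
    """Squared distance from every object i to the prototype w = sum_j alpha_j * x_{support[j]}.

    Only the dissimilarity matrix D is used, via
        d(x_i, w) = sum_j alpha_j D[i][s_j] - 1/2 * sum_{j,k} alpha_j alpha_k D[s_j][s_k],
    which is exact when D holds squared Euclidean distances and sum(alpha) == 1.
    """
    if len(support) != len(alpha):
        raise ValueError("one coefficient per support point is required")
    if abs(sum(alpha) - 1.0) > 1e-9:
        raise ValueError("barycentric coefficients must sum to 1")
    # Prototype self-term: depends only on the support points, computed once.
    self_term = 0.5 * sum(
        aj * ak * D[sj][sk]
        for sj, aj in zip(support, alpha)
        for sk, ak in zip(support, alpha)
    )
    # Cross-term for each object: weighted dissimilarities to the support points.
    return [
        sum(aj * D[i][sj] for sj, aj in zip(support, alpha)) - self_term
        for i in range(len(D))
    ]


if __name__ == "__main__":
    random.seed(0)
    points = [[random.uniform(0, 10), random.uniform(0, 10)] for _ in range(8)]
    D = squared_distance_matrix(points)
    support = [0, 3, 5]      # hypothetical support points
    alpha = [0.2, 0.3, 0.5]  # hypothetical barycentric coefficients
    rel = relational_distances(D, support, alpha)
    # Sanity check against the explicit vector prototype.
    w = [sum(a * points[s][d] for s, a in zip(support, alpha)) for d in range(2)]
    explicit = [sum((p[d] - w[d]) ** 2 for d in range(2)) for p in points]
    print("max abs difference:", max(abs(r - e) for r, e in zip(rel, explicit)))
```

In this sketch the self-term involves only the m support points, so each prototype costs a one-off O(m²) term plus O(m) per object. This gives an intuition for why the number and choice of support points bear on computation time.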
ISBN: 978-1-7281-1985-4

Use this identifier to cite or link to this document: https://hdl.handle.net/11591/429957