Clustering quantified ordinal data distributions
VERDE, Rosanna; IRPINO, Antonio; BALZANELLA, Antonio
2013
Abstract
This paper introduces a strategy for clustering grouped ordinal categorical data, based on partitioning the set of distributions obtained by quantifying the ordinal categorical variables. The analyzed data come from the 2003 edition of the International Social Survey Programme, which studied feelings of national identity and involved about 46 thousand respondents in 36 countries. The ordinal categorical variables, corresponding to each respondent's answers to several questions, are measured on Likert-type scales. We propose to quantify them with an Optimal Scaling procedure, Categorical Principal Component Analysis (CATPCA). From the results of the quantification step, we take the distribution of the individuals of each country on the first two axes and use it to partition the countries. The main novelty of our proposal is the use of a Dynamic Clustering Algorithm that partitions the set of distributions describing the countries, rather than the means of the country distributions. In the conclusions, we compare the proposed approach with a clustering algorithm applied to the means of the country distributions, in order to point out the advantages of considering distributions in the analysis.
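To make the contrast between clustering distributions and clustering means concrete, here is a minimal sketch in Python. It is not the authors' implementation, and several choices are assumptions that the abstract does not confirm: each group is summarised by its per-axis quantiles, the dissimilarity is a discretised squared L2 Wasserstein distance between those quantile functions (the abstract does not name the distance), the two axes are treated separately, and the prototypes are initialised with a farthest-first rule. The names `quantile_profile` and `dynamic_clustering` are hypothetical, and the data are synthetic, not the ISSP survey.

```python
import numpy as np

def quantile_profile(scores, n_quantiles=20):
    """Summarise a group's 2-D scores by per-axis quantiles (length 2 * n_quantiles)."""
    probs = (np.arange(n_quantiles) + 0.5) / n_quantiles
    # np.quantile returns shape (n_quantiles, 2); put each axis in one contiguous block
    return np.quantile(scores, probs, axis=0).T.ravel()

def sq_dist(profiles, prototypes):
    """Mean squared difference of quantiles: assumed stand-in for squared Wasserstein."""
    return ((profiles[:, None, :] - prototypes[None, :, :]) ** 2).mean(axis=2)

def dynamic_clustering(profiles, k, n_iter=50):
    """Alternate allocation and representation steps until the partition is stable."""
    # Farthest-first initialisation (an assumption made here for determinism)
    chosen = [0]
    while len(chosen) < k:
        d = sq_dist(profiles, profiles[chosen]).min(axis=1)
        chosen.append(int(d.argmax()))
    prototypes = profiles[chosen].copy()
    labels = np.full(profiles.shape[0], -1)
    for _ in range(n_iter):
        # Allocation step: assign each distribution to its closest prototype
        new_labels = sq_dist(profiles, prototypes).argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Representation step: averaging quantiles gives the prototype distribution
        for c in range(k):
            members = profiles[labels == c]
            if len(members) > 0:
                prototypes[c] = members.mean(axis=0)
    return labels, prototypes

# Synthetic groups: identical means, different spreads (illustrative data only)
rng = np.random.default_rng(0)
spreads = [0.3, 0.3, 0.3, 2.0, 2.0, 2.0]
groups = [rng.normal(0.0, s, size=(500, 2)) for s in spreads]
profiles = np.stack([quantile_profile(g) for g in groups])
labels, prototypes = dynamic_clustering(profiles, k=2)
group_means = np.stack([g.mean(axis=0) for g in groups])
```

In this toy setting all six groups have nearly the same mean, so `group_means` carries almost no information for separating them, while the quantile profiles separate the narrow groups from the wide ones. This is only an illustration of why the shape of a distribution can matter; it says nothing about the results reported in the paper.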