Supervised classification of curves via a combined use of functional data analysis and tree-based methods
Maturo, Fabrizio; Verde, Rosanna
2023
Abstract
Technological advances have produced tools that collect vast amounts of data, usually recorded at time stamps or arriving over time, e.g. data from sensors. Such data are commonly analysed with supervised classification techniques; however, despite constant improvements in the literature, learning from high-dimensional data remains challenging because of issues such as the curse of dimensionality and the trade-off between complexity and accuracy. Research in functional data analysis (FDA) and statistical learning is currently very active in addressing these drawbacks. This study proposes a supervised classification strategy that combines FDA and tree-based procedures. Specifically, we introduce functional classification trees, functional bagging, and functional random forests, which use the functional principal components decomposition to extract new features and build functional classifiers. In addition, we introduce new tools to support the understanding of the classification rules, such as the functional empirical separation prototype, the functional predicted separation prototype, and the leaves' functional deviance. Furthermore, we suggest possible solutions for choosing the number of functional principal components and the number of functional classification trees used in the supervised classification procedure. The aim is to improve the accuracy of the functional classifier, support the interpretation of the functional classification rules, and overcome the classical drawbacks due to the high dimensionality of the data. An application to a real dataset on daily electrical power demand illustrates how the proposal works. A simulation study with nine scenarios assesses the performance of the approach and compares it with other functional classification methods. The results suggest that this line of research is promising: in addition to the benefits of the suggested interpretative tools, we exceed the previously established accuracy records on a dataset available online.
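The core idea of the abstract, using functional principal component scores as features for a tree-based classifier, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes curves observed on a common grid, approximates the functional principal components decomposition by an SVD of the centred discretised curves (no basis smoothing), and uses a single-split tree (a stump) in place of a full functional classification tree. All function names and the synthetic data are hypothetical.

```python
import numpy as np


def fpca_scores(curves, n_components):
    """Return the mean curve, the leading components, and the scores.

    curves: array of shape (n_curves, n_grid_points) on a common grid.
    Assumption: an SVD of the centred discretised curves stands in for
    the functional principal components decomposition.
    """
    mean = curves.mean(axis=0)
    centered = curves - mean
    # Rows of vt are the discretised principal component functions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    return mean, components, centered @ components.T


def gini(labels):
    """Gini impurity of a 1-D array of non-negative integer labels."""
    if labels.size == 0:
        return 0.0
    p = np.bincount(labels) / labels.size
    return 1.0 - np.sum(p ** 2)


def fit_stump(scores, labels):
    """Find the single split (score index, threshold) with lowest weighted Gini."""
    best = None
    for j in range(scores.shape[1]):
        values = np.unique(scores[:, j])
        # Candidate thresholds: midpoints between consecutive sorted values.
        for t in (values[:-1] + values[1:]) / 2:
            left = labels[scores[:, j] <= t]
            right = labels[scores[:, j] > t]
            impurity = (left.size * gini(left) + right.size * gini(right)) / labels.size
            if best is None or impurity < best[0]:
                best = (impurity, j, t,
                        np.bincount(left).argmax(), np.bincount(right).argmax())
    return best[1:]


def predict_stump(stump, scores):
    """Assign each curve the majority class of the side of the split it falls on."""
    j, t, left_class, right_class = stump
    return np.where(scores[:, j] <= t, left_class, right_class)


# Synthetic example (hypothetical data): two classes of noisy sine curves
# that differ only in amplitude.
rng = np.random.default_rng(0)
grid = np.linspace(0, 1, 50)
labels = np.repeat([0, 1], 40)
amplitude = np.where(labels == 0, 1.0, 2.0)[:, None]
curves = amplitude * np.sin(2 * np.pi * grid) + rng.normal(0, 0.2, (80, 50))

mean, components, scores = fpca_scores(curves, n_components=2)
stump = fit_stump(scores, labels)
accuracy = np.mean(predict_stump(stump, scores) == labels)
print(f"training accuracy: {accuracy:.2f}")
```

To classify new curves in this sketch, one would project them with `(new_curves - mean) @ components.T` and pass the resulting scores to `predict_stump`. Bagging or a random forest would repeat the tree-fitting step on bootstrap resamples of the scores and aggregate the predictions by majority vote.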