
Parallel primitives for vendor-agnostic implementation of big data mining algorithms

D'Angelo S.; Di Martino B.; Esposito A.
2018

Abstract

In the age of Big Data, scalable algorithm implementations as well as powerful computational resources are required. For data mining and data analytics, the support of big data platforms is becoming increasingly important, since they provide algorithm implementations with all the resources needed for their execution. However, choosing the best platform may depend on several constraints, including but not limited to computational resources, storage resources, target tasks, and service costs. It may sometimes be necessary to switch from one platform to another depending on these constraints. As a consequence, it is desirable to reuse as much algorithm code as possible, so as to simplify the setup on new target platforms. Unfortunately, each big data platform has its own peculiarities, especially in how it handles parallelism. This affects algorithm implementations, which generally need to be modified before they can be executed. This work introduces functional parallel primitives to define the parallelizable parts of algorithms in a uniform way, independent of the target platform. The primitives are then transformed by a compiler into skeletons, which are finally deployed on vendor-dependent frameworks. The proposed procedure helps not only with code reuse but also with parallelization, because no parallelization expertise is demanded of the programmer. Indeed, it is the compiler that entirely manages and optimizes algorithm parallelization. The experiments performed show that the transformation process does not negatively affect algorithm performance.
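
To illustrate the general idea of vendor-agnostic functional parallel primitives, the following is a minimal sketch in Scala; the names (ParCollection, LocalBackend) are illustrative assumptions and do not reproduce the primitives or compiler defined in the paper. The algorithm is written once against an abstract primitive interface, and a backend binds those primitives to a concrete execution framework (here a plain local backend so the sketch stays self-contained; a real system would instead target, e.g., Spark RDD operations).

```scala
// Hypothetical sketch of a platform-independent primitive layer.
// Names are assumptions for illustration, not the paper's actual API.

// Abstract functional primitives the algorithm is expressed against.
trait ParCollection[A] {
  def map[B](f: A => B): ParCollection[B]
  def filter(p: A => Boolean): ParCollection[A]
  def reduce(op: (A, A) => A): A
}

// One possible backend: ordinary in-memory Scala collections, used only
// to keep the example runnable. A vendor-specific backend would map the
// same calls onto framework operations (e.g. Spark transformations).
final class LocalBackend[A](data: Seq[A]) extends ParCollection[A] {
  def map[B](f: A => B): ParCollection[B]          = new LocalBackend(data.map(f))
  def filter(p: A => Boolean): ParCollection[A]    = new LocalBackend(data.filter(p))
  def reduce(op: (A, A) => A): A                   = data.reduce(op)
}

// The (toy) algorithm is written once, only in terms of the primitives,
// so it needs no change when the backend is swapped.
object Example {
  def sumOfSquaresOfEvens(xs: ParCollection[Int]): Int =
    xs.filter(_ % 2 == 0).map(x => x * x).reduce(_ + _)

  def main(args: Array[String]): Unit = {
    val data = new LocalBackend((1 to 10).toSeq)
    println(sumOfSquaresOfEvens(data)) // prints 220
  }
}
```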
ISBN: 978-1-5386-5395-1

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11591/430555
Citations
  • Scopus: 0