A Big Data Pipeline and Machine Learning for Uniform Semantic Representation of Data and Documents From IT Systems of the Italian Ministry of Justice
Beniamino Di Martino (Supervision);
Luigi Colucci Cante (Writing – Original Draft Preparation);
Salvatore D'Angelo;
Mariangela Graziano (Writing – Original Draft Preparation);
Fiammetta Marulli (Writing – Original Draft Preparation)
2022
Abstract
This paper presents a big data pipeline that handles both structured and unstructured data made available by the Italian Ministry of Justice concerning its telematic civil process. The complexity and volume of these data require big data analysis techniques, combined with machine learning and deep learning frameworks, to analyse them correctly and to obtain meaningful information that can support the ministry in managing civil processes more effectively. The pipeline has two main objectives: to provide a consistent workflow of activities applied to the incoming data, aimed at extracting information useful for the ministry's decision-making tasks, and to homogenize the incoming data so that they can be stored in a centralized, coherent data lake and used as a reference for further analysis.