Anomalous Witnesses and Registrations Detection in the Italian Justice System Based on Big Data and Machine Learning Techniques
Di Martino B.; D'Angelo S.; Esposito A.
2022
Abstract
It is estimated that about 44 zettabytes of data were produced in 2020, corresponding to a per capita daily production of about 16 gigabytes. These numbers prompt the question of how much knowledge can be extracted from the data produced every day by every single inhabitant of the Earth. Supported by the growing diffusion of frameworks and tools for the automatic analysis of Big Data, Machine Learning and Deep Learning, we can try to extract knowledge from all this information and use it to refine human knowledge. In this paper we present two techniques that exploit the knowledge provided by data analysis to identify anomalies in the Italian judicial system, in particular in civil proceedings. The first anomaly concerns the presence of “serial witnesses”: people who lend themselves to testify in different proceedings in which places, dates and events overlap, which points to false testimony. The second anomaly relates to “multiple entries” by lawyers, made with the aim of happening upon a judge “favorable” to the case. These two anomalies, which are only examples among many possible ones, are identified through the definition of Big Data pipelines for data aggregation, information extraction and data analysis.