CaseID Detection for Process Mining: A Heuristic-Based Methodology
De Fazio, Roberta; Balzanella, Antonio; Marrone, Stefano; Marulli, Fiammetta; Verde, Laura
2024
Abstract
Process Mining is attracting growing interest in many contexts where performance bottlenecks are critical for the business. Unfortunately, real cyber-physical systems are usually not designed with these techniques in mind. One of the most frequent problems is transforming the acquired data, which are often heterogeneous and unlabeled, into a form that allows Process Mining techniques to be applied. In this study, we propose an automated, unsupervised methodology for extracting CaseIDs from an unlabeled event log. The proposed CaseID detection relies on appropriate heuristic metrics that highlight the correlation between events belonging to the same process instance, based on temporal and topological features (e.g., kinds of functionally related devices, topological distance, etc.). These features are the inputs to a clustering technique, which is used to extract the different cases. The performance of the proposed methodology was evaluated on a real diagnostic management system that supports maintenance decisions in railway infrastructures. The system was reproduced and tested in Gematica's laboratory to simulate the data used in this work.
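
The general idea of grouping unlabeled events into cases by temporal and topological closeness can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the heuristic metrics or the clustering technique, so the weighted dissimilarity, the hop-count table, the scaling constants, and the threshold-based single-linkage grouping below are all assumptions chosen for illustration, as are the names `Event`, `hop_distance`, `dissimilarity`, and `assign_case_ids`.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Event:
    """One row of an unlabeled event log (hypothetical schema)."""
    timestamp: float  # seconds since an arbitrary origin
    device: str       # identifier of the device that emitted the event
    activity: str     # event/activity name


def hop_distance(d1, d2, hops):
    """Topological distance between two devices.

    `hops` is an assumed lookup table of hop counts between device pairs;
    pairs missing from the table are treated as unrelated (infinite distance).
    """
    if d1 == d2:
        return 0.0
    return float(hops.get((d1, d2), hops.get((d2, d1), float("inf"))))


def dissimilarity(e1, e2, hops, time_scale=60.0, hop_scale=2.0, w_time=0.5):
    """Illustrative heuristic: weighted sum of a scaled time gap and a
    scaled topological distance. Scales and weight are assumptions."""
    time_term = abs(e1.timestamp - e2.timestamp) / time_scale
    topo_term = hop_distance(e1.device, e2.device, hops) / hop_scale
    return w_time * time_term + (1.0 - w_time) * topo_term


def assign_case_ids(events, hops, threshold=1.0):
    """Group events into cases with threshold-based single linkage.

    Two events share a case if a chain of pairwise dissimilarities
    not exceeding `threshold` connects them (union-find over all pairs).
    Returns one integer CaseID per event, numbered by first appearance.
    """
    parent = list(range(len(events)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(events)), 2):
        if dissimilarity(events[i], events[j], hops) <= threshold:
            parent[find(i)] = find(j)

    case_of_root = {}
    return [case_of_root.setdefault(find(i), len(case_of_root))
            for i in range(len(events))]


# Toy data: two bursts of events on two unrelated pairs of devices.
example_hops = {("S1", "S2"): 1, ("S7", "S8"): 1}
example_events = [
    Event(0.0, "S1", "alarm_raised"),
    Event(20.0, "S2", "alarm_acknowledged"),
    Event(600.0, "S7", "alarm_raised"),
    Event(630.0, "S8", "alarm_acknowledged"),
]
print(assign_case_ids(example_events, example_hops))
```

In this toy log, the events on `S1`/`S2` and those on `S7`/`S8` fall into two separate cases, because each pair is close in both time and topology while the two pairs share no topological link. A real application would replace the pairwise threshold with whichever clustering technique and feature definitions the methodology actually prescribes.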