The increasing availability of digital data, heterogeneous in nature and format, calls for advanced tools capable of extracting knowledge, organising it in a structured manner, and making it usable by both humans and automated systems. This thesis addresses that need by proposing an integrated methodological approach that combines Semantic Annotation techniques, Natural Language Processing (NLP) methodologies, Artificial Intelligence techniques, and models based on Semantic Web technologies. The aim is to overcome the limitations of traditional approaches to the analysis of texts, images, and processes, favouring a richer, more transparent, and interoperable representation of data. The work develops along five main lines of research: (1) the semantic annotation of texts and images, with applications in the cultural heritage sector; (2) the automatic discovery of patterns in processes, useful for the verification of complex administrative procedures; (3) the integration of Electronic Institutions (EI) with processes modelled in Business Process Model and Notation (BPMN), aimed at formally representing roles and interactions in multi-agent scenarios; (4) the conformity verification of legal processes, applied to compensation procedures for road accidents; and (5) the improvement of the inferential capacities of Large Language Models (LLMs) through Ontology Alignment and Ontology Augmented Generation (OAG) techniques, tested in the medical domain. The main contribution of the research is a unified methodological framework that integrates symbolic methodologies (ontologies, rules, semantic models) with statistical ones (word embeddings, NLP, LLMs), illustrating how their convergence can improve accuracy, explainability, and applicability across various domains.
The experiments carried out demonstrate the efficacy of the method, delineate its limitations, and pave the way for future advancements, particularly in the development of intelligent systems that are more reliable, transparent, and capable of operating in real and complex contexts.
Integrating Semantic Web Technologies, Natural Language Processing, and Large Language Model Techniques to Manage and Enhance Knowledge / Graziano, Mariangela. - (2026 Jan 27).
Integrating Semantic Web Technologies, Natural Language Processing, and Large Language Model Techniques to Manage and Enhance Knowledge.
GRAZIANO, MARIANGELA
2026


