Tuning SyntaxNet for POS Tagging Italian Sentences

Fiammetta Marulli (Methodology); 2018

Abstract

Part-of-speech (POS) tagging is a Natural Language Processing (NLP) technique that is highly relevant to Question Answering systems, and it becomes more complex when these systems operate on spoken language. In the use case considered here, spoken Italian, enclitic forms are very difficult to tag, since they consist of one or more pronouns appended as suffixes to verbs. This work describes a case study investigating how to refine SyntaxNet, the NLP framework released by Google, to tag enclitic forms in Italian efficiently. First, a forward selection of different features is presented, aimed at assessing their influence on the POS tagging performance of SyntaxNet in Italian. Second, further features are added, as suggested by the morphological rules characterizing Italian enclitics, in order to improve POS tagging performance. Finally, a qualitative and quantitative evaluation on sentences from real spoken dialogs is performed, showing very promising results.
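
To make the notion of an enclitic form concrete, the following is a minimal sketch of a suffix-based feature of the kind the abstract alludes to. It is not the feature set used in the paper, and it does not use SyntaxNet: the clitic lists, the function name `enclitic_suffix`, and the length threshold are all assumptions chosen for illustration. A check this simple over-generates (for example, the noun "tavolo" ends in "lo"), so it could only serve as one input feature to a tagger, not as a decision rule.

```python
# Illustrative sketch only: hypothetical clitic lists, not taken from the paper.
# Single clitic pronouns that can attach to the end of an Italian verb form.
CLITICS = ("mi", "ti", "ci", "vi", "si", "lo", "la", "li", "le", "gli", "ne")
# Forms the first clitic takes inside a two-clitic cluster (mi -> me, gli -> glie).
COMBINED_FIRST = ("me", "te", "ce", "ve", "se", "glie")
# Clitics that can close a two-clitic cluster.
COMBINED_SECOND = ("lo", "la", "li", "le", "ne")


def enclitic_suffix(token):
    """Return the candidate clitic cluster ending `token`, or None.

    Two-clitic clusters (e.g. "glielo") are tried before single clitics,
    and at least two characters must remain for the verb stem.
    """
    word = token.lower()
    for first in COMBINED_FIRST:
        for second in COMBINED_SECOND:
            cluster = first + second
            if word.endswith(cluster) and len(word) > len(cluster) + 1:
                return cluster
    # Longest single clitics first, so "gli" is preferred over "li".
    for clitic in sorted(CLITICS, key=len, reverse=True):
        if word.endswith(clitic) and len(word) > len(clitic) + 1:
            return clitic
    return None


if __name__ == "__main__":
    for w in ("dammelo", "portarglielo", "dimmi", "parlargli", "casa"):
        print(w, enclitic_suffix(w))
```

In this sketch the returned string (or its absence) would be passed to the tagger as an additional token-level feature alongside the usual word and suffix features.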
Marulli, Fiammetta; Pota, Marco; Esposito, Massimo; Maisto, Alessandro; Guarasci, Raffaele


Use this identifier to cite or link to this document: https://hdl.handle.net/11591/418521