Automatic Ontology Extraction with Text Clustering

DI MARTINO, Beniamino; CANTIELLO, Pasquale
2009

Abstract

This paper presents a technique for automatically deriving ontologies, based on hierarchical clustering of document corpora. The procedure takes a set of texts forming a domain document corpus and builds a hierarchical structure (a tree) in which every node is associated with a set of terms derived from the document feature vectors. Clusters are labeled using a new algorithm presented in this work. The derived terms are candidate concepts for building a domain taxonomy, from which the hierarchical relationships among the classes of the domain ontology can be extracted. To test the technique, a prototype tool named OntoClust has been built.
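The pipeline outlined in the abstract (feature vectors, hierarchical clustering, a set of terms per tree node) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not describe the paper's labeling algorithm or OntoClust's internals, so this sketch substitutes plain term-frequency vectors, cosine similarity, greedy agglomerative merging, and a naive "most frequent terms" labeling. The names `tf_vector`, `cosine`, `build_tree`, and `label_tree` are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text):
    """Term-frequency vector for one document (assumed feature representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def label_tree(node, top_k):
    """Naive labeling (NOT the paper's algorithm): keep the top_k most frequent terms."""
    node["terms"] = [term for term, _ in node["vec"].most_common(top_k)]
    for child in node["children"]:
        label_tree(child, top_k)


def build_tree(docs, top_k=3):
    """Greedy agglomerative clustering: repeatedly merge the two most similar nodes."""
    nodes = [
        {"vec": tf_vector(d), "children": [], "docs": [i]}
        for i, d in enumerate(docs)
    ]
    while len(nodes) > 1:
        pairs = [(i, j) for i in range(len(nodes)) for j in range(i + 1, len(nodes))]
        i, j = max(pairs, key=lambda p: cosine(nodes[p[0]]["vec"], nodes[p[1]]["vec"]))
        a, b = nodes[i], nodes[j]
        merged = {
            "vec": a["vec"] + b["vec"],       # summed term frequencies of the cluster
            "children": [a, b],
            "docs": a["docs"] + b["docs"],
        }
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    root = nodes[0]
    label_tree(root, top_k)
    return root


# Toy corpus with two obvious topics (made-up example data).
docs = [
    "cat dog cat pet",
    "dog cat dog pet",
    "stock market stock price",
    "market stock market price",
]
root = build_tree(docs, top_k=2)
for child in root["children"]:
    print(sorted(child["docs"]), child["terms"])
```

In this toy run the two inner nodes group the animal documents and the finance documents, and their term sets could serve as candidate concepts for a taxonomy level. A real corpus would need at least stop-word removal and a weighting such as TF-IDF, which this sketch leaves out.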
Use this identifier to cite or link to this document: https://hdl.handle.net/11591/172596