In this paper, we introduce the problem of cross-modal visuo-tactile object recognition with robotic active exploration. With this term, we mean that the robot observes a set of objects with visual perception, and later on, it is able to recognize such objects only with tactile exploration, without having touched any object before. Using a machine learning terminology, in our application, we have a visual training set and a tactile test set, or vice versa. To tackle this problem, we propose an approach constituted by four steps: finding a visuo-tactile common representation, defining a suitable set of features, transferring the features across the domains, and classifying the objects. We show the results of our approach using a set of 15 objects, collecting 40 visual examples and five tactile examples for each object. The proposed approach achieves an accuracy of 94.7%, which is comparable with the accuracy of the monomodal case, i.e., when using visual data both as training set and test set. Moreover, it performs well compared to the human ability, which we have roughly estimated carrying out an experiment with ten participants.

A Transfer Learning Approach to Cross-Modal Object Recognition: From Visual Observation to Robotic Haptic Exploration

Natale C.;Pirozzi S.;
2019

Abstract

In this paper, we introduce the problem of cross-modal visuo-tactile object recognition with robotic active exploration. With this term, we mean that the robot observes a set of objects with visual perception, and later on, it is able to recognize such objects only with tactile exploration, without having touched any object before. Using a machine learning terminology, in our application, we have a visual training set and a tactile test set, or vice versa. To tackle this problem, we propose an approach constituted by four steps: finding a visuo-tactile common representation, defining a suitable set of features, transferring the features across the domains, and classifying the objects. We show the results of our approach using a set of 15 objects, collecting 40 visual examples and five tactile examples for each object. The proposed approach achieves an accuracy of 94.7%, which is comparable with the accuracy of the monomodal case, i.e., when using visual data both as training set and test set. Moreover, it performs well compared to the human ability, which we have roughly estimated carrying out an experiment with ten participants.
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11591/415003
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 28
  • ???jsp.display-item.citation.isi??? 25
social impact