Navigating IoT Complexity: Developing Datasets for Smart-Home Device Interactions

Abstract

The rapid growth of Internet of Things (IoT) devices introduces new challenges for network security and reliability. The heterogeneous nature of IoT environments complicates the task of network operators and security experts, who must face increasingly sophisticated threats. Additionally, relying only on network traffic to detect user actions is problematic: the complexity of IoT environments and the variability of user actions make the distinction between legitimate activities and threats difficult to track. Recently, machine learning techniques have emerged as a way to identify threats in networked systems. Although such techniques are very powerful, they rely on reliable datasets that collect examples of both licit and malicious traffic. However, datasets are often limited in the number of examples collected and in the documentation of how the traffic was monitored; moreover, labelling is not always reliable. Accordingly, this paper describes the development of a procedure to generate datasets using a dedicated test bed to capture user actions associated with smart-home IoT devices. Unlike most datasets in the literature, the proposed approach aims to offer a way to easily collect and label continuously produced data, generating datasets enriched with detailed descriptions of each device involved in traffic generation. We believe that this paper offers a first step towards the systematic production of datasets that are more suitable for the efficient use of machine learning techniques.
2024
Rak, M.; Granata, D.; Esposito, A.; Ferretti, A.

Use this identifier to cite or link to this document: https://hdl.handle.net/11591/544980