Currently, AI-based assistive technologies, particularly those involving sensitive data, such as systems for detecting mental illness and emotional disorders, are full of confidentiality, integrity, and security compromises. In the aforesaid context, this work proposes an algorithm for detecting depressive states based on only three never utilized speech markers. This reduced number of markers offers a valuable protection of personal (sensitive) data by not allowing for the retrieval of the speaker’s identity. The proposed speech markers are derived from the analysis of pitch variations measured in speech data obtained through a tale reading task performed by typical and depressed subjects. A sample of 22 subjects (11 depressed and 11 healthy, according to both psychiatric diagnosis and BDI classification) were involved. The reading wave files were listened to and split into a sequence of intervals, each lasting two seconds. For each subject’s reading and each reading interval, the average pitch, the pitch variation (T), the average pitch variation (A), and the inversion percentage (also called the oscillation percentage O) were automatically computed. The values of the triplet (Ti, Ai, Oi) for the i-th subject provide, all together, a 100% correct discrimination between the speech produced by typical and depressed individuals, while requiring a very low computational cost and offering a valuable protection of personal data.

A privacy-oriented approach for depression signs detection based on speech analysis

Vitale F.;Carbonaro B.;Cordasco G.;Esposito A.;Marrone S.;Verde L.
2021

Abstract

Currently, AI-based assistive technologies, particularly those involving sensitive data, such as systems for detecting mental illness and emotional disorders, are full of confidentiality, integrity, and security compromises. In the aforesaid context, this work proposes an algorithm for detecting depressive states based on only three never utilized speech markers. This reduced number of markers offers a valuable protection of personal (sensitive) data by not allowing for the retrieval of the speaker’s identity. The proposed speech markers are derived from the analysis of pitch variations measured in speech data obtained through a tale reading task performed by typical and depressed subjects. A sample of 22 subjects (11 depressed and 11 healthy, according to both psychiatric diagnosis and BDI classification) were involved. The reading wave files were listened to and split into a sequence of intervals, each lasting two seconds. For each subject’s reading and each reading interval, the average pitch, the pitch variation (T), the average pitch variation (A), and the inversion percentage (also called the oscillation percentage O) were automatically computed. The values of the triplet (Ti, Ai, Oi) for the i-th subject provide, all together, a 100% correct discrimination between the speech produced by typical and depressed individuals, while requiring a very low computational cost and offering a valuable protection of personal data.
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11591/458012
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 1
  • ???jsp.display-item.citation.isi??? 1
social impact