Abstract | Supportive smart home systems with integrated, personalized cough analysis can enable independent living and aging in place by helping to monitor acute and chronic health conditions. The stages of recognizing coughs, associating them with an individual, and analyzing their characteristics have traditionally been handled independently, using task-specific networks or algorithms. In contrast, recent transformer-based speech foundation models trained on internet-scale datasets have demonstrated strong performance across a wide range of tasks. Learning a comparable general-purpose representation for coughs has been hampered by the lack of large-scale cough-specific datasets. In this work we demonstrate that the embeddings from a speech foundation model (w2v BERT 2.0) can be used as a powerful multi-purpose cough representation. We show that cough information is well encoded in the model, even though it was trained on speech data with no cough-specific fine-tuning or adapters. Zero-shot linear classification on the cough embeddings achieves strong performance on cough/breathing/speech discrimination (100%), cougher verification (96.9%), cougher identification (84.4%), and wet/dry cough classification (93.8%) tasks. We also show that distances between cough embeddings are meaningful, and we use them to conduct explainable analysis of an unlabelled sample through similarity-based retrieval from a labelled dataset. We note that these capabilities emerge in the early layers of the network and that the cough embeddings occupy a small region of the embedding space, motivating future work on lower-complexity cough-specific representations suitable for embedded cough analysis. |
---|
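
The similarity-based retrieval mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each cough has already been reduced to a fixed-length embedding vector and that cosine similarity is the distance measure; the abstract does not specify either, so both are assumptions. The embeddings here are synthetic stand-ins, not outputs of w2v BERT 2.0, and the function names `cosine_similarities` and `retrieve_neighbors` are hypothetical.

```python
import numpy as np


def cosine_similarities(query, references):
    """Cosine similarity between one query vector and each reference row."""
    q = query / np.linalg.norm(query)
    r = references / np.linalg.norm(references, axis=1, keepdims=True)
    return r @ q


def retrieve_neighbors(query, ref_embeddings, ref_labels, k=3):
    """Return the k most similar labelled references as (label, similarity)."""
    sims = cosine_similarities(query, ref_embeddings)
    top = np.argsort(-sims)[:k]  # indices sorted by descending similarity
    return [(ref_labels[i], float(sims[i])) for i in top]


# Synthetic stand-in data (assumption): two well-separated clusters that
# play the role of "wet" and "dry" cough embeddings.
rng = np.random.default_rng(0)
dim = 16
wet_center = np.zeros(dim)
wet_center[0] = 1.0
dry_center = np.zeros(dim)
dry_center[1] = 1.0

ref_embeddings = np.vstack([
    wet_center + 0.05 * rng.normal(size=(5, dim)),
    dry_center + 0.05 * rng.normal(size=(5, dim)),
])
ref_labels = ["wet"] * 5 + ["dry"] * 5

# An unlabelled query drawn near the "wet" cluster.
query = wet_center + 0.05 * rng.normal(size=dim)
neighbors = retrieve_neighbors(query, ref_embeddings, ref_labels, k=3)

for label, sim in neighbors:
    print(f"{label}: similarity {sim:.3f}")
```

The retrieved neighbours and their labels serve as the explanation: the unlabelled sample is described by the labelled coughs it most resembles, rather than by an opaque classifier score.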