Abstract | Smart home systems with integrated sensors capable of measuring cough frequency and severity can support independent living and aging in place by helping monitor the state of acute and chronic health conditions. Previously, we showed that embeddings from speech foundation models are effective cough representations for a range of cough measurement applications. While powerful, these models have large compute and memory requirements that prevent them from being deployed in embedded smart sensors. In this work, we use knowledge distillation to train edge-compute-focused student models that make the measurement, identification, and classification of cough sounds feasible directly in the smart home. This embedded processing avoids the privacy and security concerns associated with transmission and storage of sensitive audio recordings in the cloud. We show that the student networks preserve the universal cough representation capabilities of the teacher, even generalizing to unseen classes such as speech, allowing the same network to be used for multiple downstream applications without any task-specific fine-tuning. A student network based on a 14-layer variant of ResNet achieved the highest aggregate quality score across the downstream evaluation tasks, even outperforming the foundation model teacher on certain tasks despite having over 200× fewer parameters. Linear classification on the embeddings from the proposed student network achieves strong performance on a diverse set of cough measurement tasks, scoring 98.3% on cough/noncough discrimination, 90.3% on human sound classification, 94.8% on cougher verification, 84.4% on cougher identification, and 87.8% on wet/dry cough classification. |
---|