DOI | https://doi.org/10.1109/ICASSP49660.2025.10889032 |
---|
Author | Wang, Huimeng; Xie, Xurong; Geng, Mengzhe; Hu, Shujie; Xu, Haoning; Chen, Youjun; Li, Zhaoqing; Deng, Jiajun; Liu, Xunying |
---|
Affiliation | National Research Council of Canada. Digital Technologies |
---|
Funder | Youth Innovation Promotion Association |
---|
Format | Text, Article |
---|
Conference | 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP, April 6–11, 2025, Hyderabad, India |
---|
Subject | speech disorders; speech recognition; discrete tokens; speech foundation models |
---|
Abstract | Discrete tokens provide compact and domain-adaptable representations of speech features. However, their application to disordered speech remains unexplored. Such speech is characterized by imprecise articulation and a significant mismatch with normal speech. To this end, this paper proposes novel phone-purity guided (PPG) discrete tokens to address the weakened phonetic discrimination that arises during unsupervised K-means clustering or vector quantization of continuous features. Phonetic label supervision is incorporated to regularize the maximum likelihood and reconstruction error costs in standard K-means and VAE-VQ-based token extraction, respectively. Experiments on the UASpeech corpus show that hybrid TDNN and End-to-End (E2E) Conformer systems using PPG-based discrete tokens extracted from HuBERT consistently outperform those using non-PPG tokens. Statistically significant word error rate (WER) reductions of up to 0.99% and 1.77% absolute (3.21% and 4.82% relative) are achieved across varying codebook sizes for the 16 dysarthric speakers in the UASpeech test set. The lowest WER of 23.25% is obtained by combining systems that use complementary token features. Phone purity also improves consistently. In addition, t-SNE visualizations show sharper decision boundaries between K-means/VAE-VQ clusters once phone-purity guidance is introduced. |
---|
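The abstract reports gains in phone purity without defining it. The sketch below assumes the HuBERT-style definition: each discrete token (cluster) is mapped to its most frequent phone label, and purity is the fraction of frames whose label matches that majority phone. It is not taken from the paper. The function name `phone_purity` and the toy frame alignment are hypothetical illustrations, not UASpeech data.

```python
from collections import Counter, defaultdict


def phone_purity(cluster_ids, phone_labels):
    """Frame-level phone purity of a discrete tokenization (assumed HuBERT-style definition).

    cluster_ids: list of token/cluster indices, one per frame.
    phone_labels: list of aligned phone labels, one per frame.
    Returns the fraction of frames whose phone equals the majority phone of their cluster.
    """
    if len(cluster_ids) != len(phone_labels):
        raise ValueError("cluster_ids and phone_labels must have the same length")
    if len(cluster_ids) == 0:
        raise ValueError("need at least one frame")

    # Count phone occurrences inside each cluster.
    counts = defaultdict(Counter)
    for cid, phone in zip(cluster_ids, phone_labels):
        counts[cid][phone] += 1

    # Each cluster contributes the count of its most frequent phone.
    majority_total = sum(c.most_common(1)[0][1] for c in counts.values())
    return majority_total / len(cluster_ids)


# Hypothetical toy alignment: 8 frames, 3 token ids.
tokens = [0, 0, 0, 1, 1, 2, 2, 2]
phones = ["ah", "ah", "t", "t", "t", "s", "s", "ah"]
print(phone_purity(tokens, phones))  # 0.75
```

Purity rises with codebook size, and giving every frame its own cluster yields 1.0. Under this definition, purity values are only comparable between tokenizations with the same number of clusters.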
Publication date | 2025-04-06 |
---|
Publisher | IEEE |
---|
Language | English |
---|
Peer reviewed | Yes |
---|
Record identifier | 0457618b-9591-4752-97df-27ce17c1dca2 |
---|
Record created | 2025-04-03 |
---|
Record modified | 2025-04-08 |
---|