Abstract

In this paper, we explore the use of large language models (LLMs) to analyze electroencephalogram (EEG) reports by clustering them into normal and abnormal classes. We apply fuzzy clustering to features extracted from the clinical text by three models: Clinical-BERT, Meditron 7B, and BioMistral. Because these embeddings are high-dimensional, we reduce them with Principal Component Analysis (PCA), which lowers the dimensionality while retaining most of the information they carry. The reduced embeddings are then clustered with the Fuzzy C-Means (FCM) and Fuzzy J-Means (FJM) algorithms. Fuzzy clustering assigns each report a degree of membership in every cluster rather than a single hard label, which lets the approach tolerate the noise and variability inherent in medical data. FCM produced consistent results, but its computation time grew as the embedding dimensionality increased. FJM ran faster and was effective at avoiding local minima, although its clustering quality showed some variability. These findings highlight the potential of domain-specialized language models combined with fuzzy clustering for improving the analysis of medical reports.
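The PCA and FCM stages described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes PCA computed via SVD and the textbook FCM update rules with fuzzifier m = 2, and it uses synthetic vectors as a stand-in for real report embeddings. The helper names `pca_reduce` and `fuzzy_c_means` and all parameter values are hypothetical. FJM is not sketched here.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project the rows of X onto the top principal components (PCA via SVD)."""
    Xc = X - X.mean(axis=0)  # center each feature
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = (S ** 2) / np.sum(S ** 2)  # fraction of variance per component
    return Xc @ Vt[:n_components].T, explained[:n_components]

def fuzzy_c_means(X, c=2, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Textbook Fuzzy C-Means. Returns (centers, U), where U[i, k] is the
    membership of sample i in cluster k and each row of U sums to 1."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)  # random initial memberships
    for _ in range(max_iter):
        W = U ** m
        # Each center is the membership-weighted mean of the samples.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every sample to every center, shape (n, c).
        D = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        D = np.fmax(D, 1e-12)  # guard against division by zero
        # u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1))
        ratio = (D[:, :, None] / D[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        converged = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if converged:
            break
    return centers, U

# Hypothetical stand-in data: two synthetic groups of 768-dimensional vectors,
# playing the role of "normal" and "abnormal" report embeddings.
rng = np.random.default_rng(42)
normal = rng.normal(0.0, 1.0, size=(50, 768))
abnormal = rng.normal(0.0, 1.0, size=(50, 768)) + 3.0
X = np.vstack([normal, abnormal])

Z, explained = pca_reduce(X, n_components=10)
centers, U = fuzzy_c_means(Z, c=2)
labels = U.argmax(axis=1)  # hard label = cluster with the highest membership
```

The soft membership matrix `U` is what distinguishes this from k-means: a borderline report keeps partial membership in both clusters instead of being forced into one, and `argmax` is applied only when a hard normal/abnormal label is needed.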