SCUT: multi-class imbalanced data classification using SMOTE and cluster-based undersampling

By National Research Council Canada

Download	View final version: SCUT: multi-class imbalanced data classification using SMOTE and cluster-based undersampling (PDF, 664 KiB)
DOI	https://doi.org/10.5220/0005595502260234
Author	Agrawal, Astha; Viktor, Herna L.; Paquet, Eric¹
Affiliation	National Research Council Canada. Information and Communication Technologies
Format	Text, Article
Conference	The 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, November 12-14, 2015, Lisbon, Portugal
Subject	clustering and classification methods; machine learning; pre-processing and post-processing for data mining; multi-class imbalance; undersampling; oversampling
Abstract	Class imbalance is a crucial problem in machine learning and occurs in many domains. Specifically, the two-class problem has received interest from researchers in recent years, leading to solutions for oil spill detection, tumour discovery and fraudulent credit card detection, amongst others. However, handling class imbalance in datasets that contain multiple classes, with varying degrees of imbalance, has received limited attention. In such a multi-class imbalanced dataset, the classification model tends to favour the majority classes and to incorrectly classify instances from the minority classes as belonging to the majority classes, leading to poor predictive accuracy. Further, there is a need both to handle the imbalance between classes and to address the selection of examples within a class (i.e. the so-called within-class imbalance). In this paper, we propose the SCUT hybrid sampling method, which is used to balance the number of training examples in such a multi-class setting. Our SCUT approach oversamples minority class examples through the generation of synthetic examples and employs cluster analysis in order to undersample majority classes. In addition, it handles both within-class and between-class imbalance. Our experimental results on a number of multi-class problems show that, when the SCUT method is used for pre-processing the data before classification, we obtain highly accurate models that compare favourably to the state-of-the-art.
Publication date	2015-11-14
Publisher	SCITEPRESS
In	Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management.
Language	English
Peer reviewed	Yes
NPARC number	21277623
Record identifier	e8c7556d-9f94-466f-a1e5-72cdf9b9513f
Record created	2016-05-05
Record modified	2020-06-02
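
Illustration	The hybrid sampling idea summarised in the abstract (synthetic oversampling of minority classes, cluster-based undersampling of majority classes) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say what size each class is balanced to, so the mean class size is assumed as the target; a simple k-means stands in for whichever cluster analysis the paper uses; and the names `scut_sketch`, `smote_like`, `kmeans` and `cluster_undersample`, along with the parameters `k`, `n_clusters` and `seed`, are hypothetical.

```python
import math
import random
from collections import defaultdict


def smote_like(points, n_new, k, rng):
    """Create n_new synthetic points by interpolating toward same-class neighbours."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(points))
        base = points[i]
        others = points[:i] + points[i + 1:]
        if not others:  # a single-instance class can only be duplicated
            synthetic.append(base)
            continue
        # Pick one of the k nearest same-class neighbours of the base point.
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
    return synthetic


def kmeans(points, n_clusters, rng, n_iter=20):
    """Plain k-means, assumed here as a stand-in clustering step."""
    centres = rng.sample(points, n_clusters)
    for _ in range(n_iter):
        clusters = [[] for _ in centres]
        for p in points:
            idx = min(range(len(centres)), key=lambda c: math.dist(p, centres[c]))
            clusters[idx].append(p)
        centres = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centres[j]
            for j, cl in enumerate(clusters)
        ]
    return [cl for cl in clusters if cl]


def cluster_undersample(points, target, n_clusters, rng):
    """Keep `target` points, drawn round-robin so every cluster stays represented."""
    clusters = kmeans(points, min(n_clusters, len(points)), rng)
    for cl in clusters:
        rng.shuffle(cl)
    kept = []
    while len(kept) < target:
        for cl in clusters:
            if cl and len(kept) < target:
                kept.append(cl.pop())
    return kept


def scut_sketch(X, y, k=5, n_clusters=3, seed=0):
    """Balance every class to the (assumed) target of the mean class size."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(tuple(xi))
    target = round(sum(len(v) for v in by_class.values()) / len(by_class))
    X_out, y_out = [], []
    for label, pts in by_class.items():
        if len(pts) < target:    # minority class: add synthetic examples
            pts = pts + smote_like(pts, target - len(pts), k, rng)
        elif len(pts) > target:  # majority class: undersample per cluster
            pts = cluster_undersample(pts, target, n_clusters, rng)
        X_out.extend(pts)
        y_out.extend([label] * len(pts))
    return X_out, y_out
```

Calling `scut_sketch(X, y)` on a list of numeric feature tuples `X` and a list of labels `y` returns a resampled pair in which each class has the target number of examples; the result would then be passed to any classifier as training data. Drawing from every cluster, rather than from the class as a whole, is what lets the undersampling step preserve sparse sub-regions of a majority class.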

Date modified: 2024-11-09