Feature selection for high-dimensional class-imbalanced data sets using Support Vector Machines

By National Research Council of Canada

DOI	https://doi.org/10.1016/j.ins.2014.07.015
Author	Maldonado, Sebastián; Weber, Richard; Famili, Fazel¹
Affiliation	National Research Council of Canada. Information and Communication Technologies
Format	Text, Article
Subject	Feature selection; Imbalanced data set; Dimensionality reduction; Support vector machine; Data mining
Abstract	Feature selection and classification of imbalanced data sets are two of the most interesting machine learning challenges, attracting growing attention from both industry and academia. Feature selection addresses the dimensionality reduction problem by determining a subset of available features to build a good model for classification or prediction, while the class-imbalance problem arises when the class distribution is too skewed. Both issues have been independently studied in the literature, and a plethora of methods to address high dimensionality as well as class imbalance have been proposed. The aim of this work is to simultaneously explore both issues, proposing a family of methods that select those attributes that are relevant for the identification of the target class in binary classification. We propose a backward elimination approach based on successive holdout steps, whose contribution measure is based on a balanced loss function obtained on an independent subset. Our experiments are based on six highly imbalanced microarray data sets, comparing our methods with well-known feature selection techniques, and obtaining a better prediction with consistently fewer relevant features.
Publication date	2014-07-30
In	Information Sciences 286 (30 July 2014): 228–246.
Language	English
Peer reviewed	Yes
NPARC number	21272943
Record identifier	b2f7ea62-d0cc-4a85-b613-3f6a3d43e1eb
Record created	2014-12-03
Record modified	2020-04-22
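
The abstract describes backward elimination driven by a balanced loss measured on a holdout subset. The following is a minimal sketch of that general idea, not the authors' method: a nearest-centroid classifier stands in for the Support Vector Machine so the example needs only the standard library, the model is refit for every candidate feature, and one third of each class is held out at each step. All function names and these design choices are assumptions made for illustration. The sketch also assumes binary labels 0 and 1, at least two samples per class, and n_keep of at least 1.

```python
import random


def balanced_error(y_true, y_pred):
    """Average of the per-class error rates, so the minority class counts equally."""
    rates = []
    for cls in (0, 1):
        idx = [i for i, label in enumerate(y_true) if label == cls]
        if idx:
            rates.append(sum(y_pred[i] != cls for i in idx) / len(idx))
    return sum(rates) / len(rates)


def fit_centroids(X, y, features):
    """Stand-in classifier (assumption): one centroid per class on the given features."""
    centroids = {}
    for cls in (0, 1):
        rows = [x for x, label in zip(X, y) if label == cls]
        centroids[cls] = [sum(r[f] for r in rows) / len(rows) for f in features]
    return centroids


def predict(centroids, X, features):
    """Assign each row to the class whose centroid is nearest (squared Euclidean)."""
    preds = []
    for x in X:
        dist = {
            cls: sum((x[f] - c) ** 2 for f, c in zip(features, cent))
            for cls, cent in centroids.items()
        }
        preds.append(min(dist, key=dist.get))
    return preds


def holdout_backward_elimination(X, y, n_keep, seed=0):
    """Drop one feature per step, using a fresh stratified holdout split each time."""
    rng = random.Random(seed)
    features = list(range(len(X[0])))
    while len(features) > n_keep:
        train, valid = [], []
        for cls in (0, 1):
            idx = [i for i, label in enumerate(y) if label == cls]
            rng.shuffle(idx)
            cut = max(1, len(idx) // 3)  # hold out about one third of each class
            valid += idx[:cut]
            train += idx[cut:]
        Xt, yt = [X[i] for i in train], [y[i] for i in train]
        Xv, yv = [X[i] for i in valid], [y[i] for i in valid]

        def loss_without(f):
            # Balanced holdout error of a model trained without feature f.
            rest = [g for g in features if g != f]
            model = fit_centroids(Xt, yt, rest)
            return balanced_error(yv, predict(model, Xv, rest))

        # Remove the feature whose absence hurts the balanced holdout loss least.
        features.remove(min(features, key=loss_without))
    return features
```

Calling holdout_backward_elimination(X, y, n_keep=k) with X as a list of numeric rows returns the indices of the k surviving features. Averaging the error per class, rather than over all samples, keeps a classifier that always predicts the majority class from looking good on skewed data.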

Date modified: 2025-04-24