Abstract | Data streams, where an instance is seen only once and where only a limited amount of data can be buffered for later processing, are omnipresent in today’s real-world applications. In this context, adaptive online ensembles that are able to learn incrementally have been developed. However, the issue of handling data that arrives asynchronously has not received enough attention. Often, the true class label arrives after a time lag, which is problematic for existing adaptive learning techniques. It is not realistic to require that all class labels be made available at training time. This issue is further complicated by the presence of late-arriving, slowly changing dimensions (i.e., late-arriving descriptive attributes). The aim of active learning is to construct accurate models when few labels are available. Thus, active learning has been proposed as a way to obtain such missing labels in a data stream classification setting. To this end, this paper introduces an active online ensemble (AOE) algorithm that extends online ensembles with an active learning component. Our experimental results demonstrate that our AOE algorithm builds accurate models with much smaller ensemble sizes than traditional ensemble learning algorithms. Further, our models are constructed from small, incremental data sets, thus reducing the number of examples that are required to build accurate ensembles. |
---|