An ensemble of the distance-based and Naive Bayes classifiers for the online classification with data reduction

Joanna Jędrzejowicz, Piotr Jędrzejowicz


The paper proposes two variants of an ensemble of distance-based and Naive Bayes online classifiers with data reduction. In the first variant, the reduced dataset is obtained by applying bias-correction fuzzy clustering. In the second, kernel-based fuzzy clustering is used as the data reduction tool. It is assumed that data vectors with unknown class labels arrive one by one, and that an initial chunk of data with known class labels is available to serve as the initial training set. Classification is carried out in rounds. Each round involves a number of classification decisions equal to the chunk size. For each round, a set of base classifiers is constructed using different distance metrics. This set is then extended with the Naive Bayes classifier. The unknown label of each incoming vector is determined through weighted majority voting. After each round has been completed, the training set is replaced by a fresh one and the classification process continues. The approach is validated through a computational experiment involving a number of datasets often used for testing data stream mining algorithms.
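The round-based scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not confirm: the three distance metrics, the 1-nearest-neighbour base classifiers, the accuracy-based weight update, and the premise that the true label of each vector becomes known after its decision are all hypothetical choices. The fuzzy-clustering data reduction step and the Naive Bayes member are omitted; a Naive Bayes classifier could be supplied as one more callable in `classifiers`.

```python
import math
from collections import defaultdict


# Hypothetical metric set; the abstract does not list the metrics actually used.
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


def make_nn_classifier(metric):
    """Build a base classifier returning the label of the nearest training vector."""
    def classify(train, x):
        return min(train, key=lambda item: metric(item[0], x))[1]
    return classify


def weighted_vote(votes, weights):
    """Return the label whose supporting classifiers carry the largest total weight."""
    totals = defaultdict(float)
    for label, w in zip(votes, weights):
        totals[label] += w
    return max(totals, key=totals.get)


def run_online(initial_chunk, stream, chunk_size, classifiers):
    """Classify vectors one by one; replace the training set after each round."""
    train = list(initial_chunk)
    weights = [1.0] * len(classifiers)   # equal weights in the first round
    hits = [0] * len(classifiers)
    fresh, decisions = [], []
    for x, true_label in stream:
        votes = [clf(train, x) for clf in classifiers]
        decisions.append(weighted_vote(votes, weights))
        # Assumption: the true label is revealed after the decision is made.
        for i, vote in enumerate(votes):
            hits[i] += vote == true_label
        fresh.append((x, true_label))
        if len(fresh) == chunk_size:
            # Assumption: next-round weight = accuracy in the round just finished.
            weights = [h / chunk_size for h in hits]
            train, fresh = fresh, []
            hits = [0] * len(classifiers)
    return decisions
```

A call such as `run_online(initial_chunk, stream, chunk_size=2, classifiers=[make_nn_classifier(m) for m in (euclidean, manhattan, chebyshev)])` expects `initial_chunk` and `stream` to be sequences of `(vector, label)` pairs. In the proposed approach each fresh training set would first be reduced by fuzzy clustering; here it is used unreduced.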
Authors: Joanna Jędrzejowicz (II, Institute of Informatics), Piotr Jędrzejowicz
Journal series: Journal of Intelligent & Fuzzy Systems, ISSN 1064-1246 [1875-8967]
Issue year: 2017
Publication size in sheets: 0.5
Keywords in English: Online classification, kernel-based fuzzy clustering, bias-correction fuzzy clustering
Language: en (English)
Score (nominal): 25
Score: Ministerial score = 20.0, 20-12-2017, ArticleFromJournal
Ministerial score (2013-2016) = 25.0, 20-12-2017, ArticleFromJournal
Publication indicators: WoS Impact Factor: 2016 = 1.261 (2) - 2016 = 1.284 (5)
Citation count*: 0

* The presented citation count is obtained through analysis of information available on the Internet and is close to the number calculated by the Publish or Perish system.