Towards de-duplication framework in big data analysis : a case study

Jacek Maślankowski


Big Data analysis gives access to wider perspectives of information. In particular, it allows processing unstructured and structured data together. However, a large number of data sources does not guarantee that data quality is sufficient to provide reliable results. Several different quality indicators relate to Big Data analysis. In this paper we focus on two of them that are the most critical in the first phase of data processing: ambiguousness and duplicates. The goal of this paper is to propose a framework for eliminating duplicates in large datasets acquired in Big Data analysis.
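The abstract describes eliminating duplicates from large datasets. As a minimal illustration of the general idea of record de-duplication (not the framework proposed in the paper), one common approach is to normalize each record into a comparison key and keep only the first occurrence of each key; the fields and normalization rules below are hypothetical:

```python
def normalize(record):
    """Build a comparison key: lowercase each field and drop non-alphanumerics."""
    return tuple(
        "".join(ch for ch in field.lower() if ch.isalnum())
        for field in record
    )

def deduplicate(records):
    """Keep the first occurrence of each record with an identical normalized key."""
    seen = set()
    unique = []
    for record in records:
        key = normalize(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

# Hypothetical records: the second differs from the first only in casing,
# so a normalized comparison treats them as duplicates.
records = [
    ("Jacek Maslankowski", "Gdansk"),
    ("jacek maslankowski", "Gdansk"),
    ("S. Wrycza", "Gdansk"),
]
print(deduplicate(records))  # two unique records remain
```

A production framework would additionally have to handle the paper's second indicator, ambiguousness (e.g. near-duplicate or fuzzy matches), which exact key comparison cannot detect.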
Author: Jacek Maślankowski (FM / DBI) - Department of Business Informatics
Publication size in sheets: 0.5
Book: Wrycza Stanisław (ed.): Information Systems: Development, Research, Applications, Education: 9th SIGSAND/PLAIS EuroSymposium 2016, Gdansk, Poland, September 29, 2016: Proceedings, Lecture Notes in Business Information Processing, no. 264, 2016, Springer International Publishing, ISBN 978-3-319-46641-5 [978-3-319-46642-2], 215 p., DOI: 10.1007/978-3-319-46642-2
Keywords in English: business informatics, Big Data, unstructured data, data analysis, data quality
Language: en (English)
Score (nominal): 15
Ministerial score = 15.0, 21-05-2018, BookChapterSeriesAndMatConfByIndicator
Ministerial score (2013-2016) = 15.0, 21-05-2018, BookChapterSeriesAndMatConfByIndicator
Citation count*

* The presented citation count is obtained through Internet information analysis and is close to the number calculated by the Publish or Perish system.