Towards de-duplication framework in big data analysis : a case study
Abstract

Big Data analysis gives access to a wider perspective on information. In particular, it allows structured and unstructured data to be processed together. However, a large number of data sources does not guarantee that the quality of the data is sufficient to provide reliable results. There are several quality indicators related to Big Data analysis. In this paper we focus on the two that are most critical in the first phase of data processing: ambiguity and duplicates. The goal of this paper is to present a proposal for a framework for eliminating duplicates in large datasets acquired through Big Data analysis.
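The abstract describes eliminating duplicates in large datasets. As a minimal illustration of the general idea only (the field names, normalization rules, and exact-key matching below are assumptions for this sketch, not the framework proposed in the paper), duplicate records can be collapsed by reducing each record to a normalized comparison key and keeping the first record per key:

```python
# Hypothetical sketch of duplicate elimination: records are reduced to a
# normalized key (lower-cased, whitespace-collapsed values of selected
# fields), and only the first record per key is kept. Real frameworks
# typically add fuzzy matching; this shows exact-key de-duplication only.

def normalize(record: dict) -> tuple:
    """Build a comparison key from assumed fields 'title' and 'author'."""
    return tuple(
        " ".join(str(record.get(field, "")).lower().split())
        for field in ("title", "author")
    )

def deduplicate(records: list) -> list:
    seen = set()
    unique = []
    for rec in records:
        key = normalize(rec)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

records = [
    {"title": "Big Data Analysis", "author": "Smith"},
    {"title": "big data  analysis", "author": "SMITH"},  # duplicate after normalization
    {"title": "Data Quality", "author": "Jones"},
]
print(len(deduplicate(records)))  # prints 2
```

Because only exact normalized keys are compared, this sketch handles case and whitespace variants but not misspellings or reordered name parts, which is where ambiguity (the other quality indicator named in the abstract) would require more elaborate matching.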
|Publication size in sheets|0.5|
|Book|Wrycza Stanisław (ed.): Information Systems: Development, Research, Applications, Education: 9th SIGSAND/PLAIS EuroSymposium 2016, Gdansk, Poland, September 29, 2016: Proceedings, Lecture Notes in Business Information Processing, no. 264, 2016, Springer International Publishing, ISBN 978-3-319-46641-5 [978-3-319-46642-2], 215 p., DOI: 10.1007/978-3-319-46642-2|
|Keywords in English|business informatics, Big Data, unstructured data, data analysis, data quality|
|Score|15.0, 21-05-2018, BookChapterSeriesAndMatConfByIndicator|
* The presented citation count is obtained through Internet information analysis and is close to the number calculated by the Publish or Perish system.