Dimensions of semantic similarity
Paweł Szmeja, Maria Ganzha, Marcin Paprzycki, Wiesław Pawłowski
Abstract
Semantic similarity is a broad term used to describe many tools, models and methods applied in knowledge bases, semantic graphs, text disambiguation, ontology matching and more. Because of such a broad scope, it is difficult to properly capture and formalize in the “general” case. So far, many models and algorithms have been proposed that, albeit often very different in design and implementation, each produce a single score (a number). These scores all fall under the single term of semantic similarity. Whether one is comparing documents, ontologies, entities, or terms, existing methods often propose a universal score: a single number that “captures all aspects of similarity”. In opposition to this approach, we claim that there are many ways in which semantic entities can be similar. We propose a division of knowledge (and, consequently, similarity) into categories (dimensions) of semantic relationships. Each dimension represents a different “type” of similarity, and its implementation is guided by an interpretation of the meaning (semantics) of the similarity score in that particular dimension. Our proposal makes it possible to add extra information to the similarity score and to highlight differences and similarities between the results of existing methods.
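To make the contrast between a single score and a per-dimension view concrete, the following is a minimal sketch, not the authors' model. It assumes three hypothetical dimension names ("taxonomic", "relational", "lexical") chosen only for illustration, since the abstract does not enumerate the proposed dimensions, and it assumes an unweighted mean as the way a single score is produced. Two made-up entity pairs collapse to the same number even though they are similar in different ways.

```python
from statistics import fmean

# Hypothetical dimension names, for illustration only; the abstract does not
# list the dimensions the authors actually propose.
DIMENSIONS = ("taxonomic", "relational", "lexical")

Profile = dict[str, float]  # dimension name -> similarity score in [0, 1]


def single_score(profile: Profile) -> float:
    """Collapse a profile into one number (assumed here: an unweighted mean)."""
    return fmean(profile[d] for d in DIMENSIONS)


def compare_profiles(a: Profile, b: Profile) -> dict[str, float]:
    """Per-dimension difference a - b: the information a single score hides."""
    return {d: a[d] - b[d] for d in DIMENSIONS}


# Two made-up entity pairs that are similar in different ways.
pair_a: Profile = {"taxonomic": 0.75, "relational": 0.25, "lexical": 0.5}
pair_b: Profile = {"taxonomic": 0.25, "relational": 0.75, "lexical": 0.5}

if __name__ == "__main__":
    print(single_score(pair_a), single_score(pair_b))  # 0.5 0.5
    print(compare_profiles(pair_a, pair_b))
```

Both pairs receive the same collapsed score, while the per-dimension comparison shows that one pair is closer taxonomically and the other relationally. Any real implementation would define its dimensions and scoring functions according to the semantics each dimension is meant to capture.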