A generic open world named entity disambiguation approach for tweets

Authors:	Mena Badieh Habib, M. van Keulen
Contributors:	Databases (Former)
Year of publication:	2013
Subject:	Focus (computing); Information retrieval; Computer science; business.industry; Home page; Rank (computer programming); InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL; Twitter; EWI-23365; Context (language use); IR-86471; Named Entity Disambiguation; Support vector machine; METIS-297649; Entity linking; Knowledge base; Social media; Named Entity Recognition; Named Entity Linking; Named Entity Extraction; Tweets; Microblogs; business
Source:	Proceedings of the 5th International Conference on Knowledge Discovery and Information Retrieval, KDIR 2013, Vilamoura, Portugal, pp. 267-276; KDIR/KMIS; Scopus-Elsevier
DOI:	10.5220/0004536302670276
Description:	Social media is a rich source of information. To make use of this information, it is sometimes necessary to extract and disambiguate named entities. In this paper we focus on named entity disambiguation (NED) in Twitter messages. NED in tweets is challenging in two ways. First, the limited length of a tweet makes it hard to obtain enough context, while many disambiguation techniques depend on it. Second, many named entities in tweets do not exist in a knowledge base (KB). We combine ideas from information retrieval (IR) and NED to propose solutions for both challenges. For the first problem, we make use of the gregarious nature of tweets to gather the context needed for disambiguation. For the second problem, we look for an alternative home page if no Wikipedia page represents the entity. Given a mention, we obtain a list of Wikipedia candidates from the YAGO KB in addition to the top-ranked pages from the Google search engine. We use a Support Vector Machine (SVM) to rank the candidate pages and find the best representative entities. Experiments conducted on two data sets show better disambiguation results compared with the baselines and a competitor.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::b1a7cac2cf3537823e56fe8f0810296c https://doi.org/10.5220/0004536302670276