LIPN: Introducing a new Geographical Context Similarity Measure and a Statistical Similarity Measure based on the Bhattacharyya coefficient

Authors: Nadi Tomeh, Davide Buscaldi, Belém Priego Sánchez, Joseph Le Roux, Jorge García Flores
Contributors: Buscaldi, Davide, Université Sorbonne Paris Cité - - USPC2011 - ANR-11-IDEX-0005 - IDEX - VALID, Laboratoire d'Informatique de Paris-Nord (LIPN), Université Sorbonne Paris Cité (USPC)-Institut Galilée-Université Paris 13 (UP13)-Centre National de la Recherche Scientifique (CNRS), Lexiques, Dictionnaires, Informatique (LDI), Université Sorbonne Paris Cité (USPC)-Université de Cergy Pontoise (UCP), Université Paris-Seine-Université Paris-Seine-Université Paris 13 (UP13)-Centre National de la Recherche Scientifique (CNRS), LabEx EFL, ANR-11-IDEX-0005,EFL,Empirical Foundations of Linguistics : data, methods, models(2011), Université Paris 13 (UP13)-Université de Cergy Pontoise (UCP), Université Paris-Seine-Université Paris-Seine-Université Sorbonne Paris Cité (USPC)-Centre National de la Recherche Scientifique (CNRS)
Language: English
Year of publication: 2014
Subject:
Source: HAL
SemEval@COLING
SemEval 2014
SemEval 2014, Aug 2014, Dublin, Ireland. pp. 400-405
Description: This paper describes the system used by the LIPN team in Task 10, Multilingual Semantic Textual Similarity, at SemEval 2014, in both the English and Spanish sub-tasks. The system uses a support vector regression model that combines different text similarity measures as features. Compared with our 2013 participation, we added a new feature that takes the geographical context into account and a new semantic distance based on the Bhattacharyya distance, calculated on co-occurrence distributions derived from the Spanish Google Books n-grams dataset.
Database: OpenAIRE
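
Note: the description mentions a semantic distance based on the Bhattacharyya distance between co-occurrence distributions. The following is a minimal sketch of the standard textbook definitions only, not the authors' implementation. The function names and the toy co-occurrence counts are hypothetical, and the record does not say how the actual distributions were built from the Google Books n-grams.

```python
import math


def normalize(counts):
    """Turn raw co-occurrence counts into a probability distribution."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    return {word: count / total for word, count in counts.items()}


def bhattacharyya_coefficient(p, q):
    """BC(p, q) = sum over shared outcomes of sqrt(p_i * q_i); lies in [0, 1]."""
    shared = p.keys() & q.keys()
    bc = sum(math.sqrt(p[w] * q[w]) for w in shared)
    # Guard against floating-point rounding slightly above 1.
    return min(1.0, bc)


def bhattacharyya_distance(p, q):
    """D_B(p, q) = -ln BC(p, q); infinite when the supports do not overlap."""
    bc = bhattacharyya_coefficient(p, q)
    return math.inf if bc == 0.0 else -math.log(bc)


# Hypothetical co-occurrence counts for two words (illustration only).
river = normalize({"water": 6, "bank": 3, "boat": 1})
stream = normalize({"water": 5, "bank": 1, "flow": 4})

print(bhattacharyya_coefficient(river, stream))
print(bhattacharyya_distance(river, stream))
```

A coefficient near 1 (distance near 0) means the two words occur in similar contexts. Words with no shared context get a coefficient of 0 and an infinite distance, so a system using this as a regression feature would likely prefer the bounded coefficient or apply smoothing; that choice is an assumption here, not something the record states.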