LIPN: Introducing a new Geographical Context Similarity Measure and a Statistical Similarity Measure based on the Bhattacharyya coefficient
Authors: | Nadi Tomeh, Davide Buscaldi, Belém Priego Sánchez, Joseph Le Roux, Jorge García Flores |
---|---|
Contributors: | Buscaldi, Davide, Université Sorbonne Paris Cité - - USPC2011 - ANR-11-IDEX-0005 - IDEX - VALID, Laboratoire d'Informatique de Paris-Nord (LIPN), Université Sorbonne Paris Cité (USPC)-Institut Galilée-Université Paris 13 (UP13)-Centre National de la Recherche Scientifique (CNRS), Lexiques, Dictionnaires, Informatique (LDI), Université Sorbonne Paris Cité (USPC)-Université de Cergy Pontoise (UCP), Université Paris-Seine-Université Paris-Seine-Université Paris 13 (UP13)-Centre National de la Recherche Scientifique (CNRS), LabEx EFL, ANR-11-IDEX-0005,EFL,Empirical Foundations of Linguistics : data, methods, models(2011), Université Paris 13 (UP13)-Université de Cergy Pontoise (UCP), Université Paris-Seine-Université Paris-Seine-Université Sorbonne Paris Cité (USPC)-Centre National de la Recherche Scientifique (CNRS) |
Language: | English |
Year of publication: | 2014 |
Subject: |
[INFO.INFO-TT]Computer Science [cs]/Document and Text Processing
Information retrieval; Semantic similarity; Similarity (network science); Computer science; Textual Semantic Similarity; Feature (machine learning); Bhattacharyya distance; Context (language use); Semantic similarity measures; Similarity measure; SemEval |
Source: | HAL; SemEval@COLING; SemEval 2014, Aug 2014, Dublin, Ireland. pp. 400-405 |
Description: | International audience; This paper describes the system used by the LIPN team in Task 10, Multilingual Semantic Textual Similarity, at SemEval 2014, in both the English and Spanish sub-tasks. The system uses a support vector regression model that combines different text similarity measures as features. Compared with our 2013 participation, we added a new feature that takes the geographical context into account and a new semantic distance based on the Bhattacharyya distance, calculated on co-occurrence distributions derived from the Spanish Google Books n-grams dataset. |
Database: | OpenAIRE |
External link: |
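
The description mentions a semantic distance based on the Bhattacharyya distance computed over co-occurrence distributions. The record does not give the authors' exact formulation, so the following is only a minimal sketch of the standard definitions: the coefficient BC(p, q) = Σ_x sqrt(p(x) · q(x)) and the distance −ln BC(p, q). The function names and the toy co-occurrence counts are assumptions made for illustration and are not taken from the paper or from the Google Books n-grams dataset.

```python
import math
from collections import Counter


def normalize(counts):
    """Turn raw co-occurrence counts into a probability distribution."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot normalize an empty distribution")
    return {word: count / total for word, count in counts.items()}


def bhattacharyya_coefficient(p, q):
    """Standard Bhattacharyya coefficient: sum over shared events of sqrt(p * q)."""
    # Events missing from either distribution contribute zero, so only
    # the intersection of the two supports needs to be visited.
    return sum(math.sqrt(p[w] * q[w]) for w in p.keys() & q.keys())


def bhattacharyya_distance(p, q):
    """Standard Bhattacharyya distance: -ln(BC); infinite when supports are disjoint."""
    bc = bhattacharyya_coefficient(p, q)
    return math.inf if bc == 0 else -math.log(bc)


# Hypothetical co-occurrence counts for two Spanish words (illustrative only).
cooc_rio = Counter({"agua": 6, "puente": 3, "orilla": 1})
cooc_lago = Counter({"agua": 5, "orilla": 4, "barco": 1})

p = normalize(cooc_rio)
q = normalize(cooc_lago)

print(bhattacharyya_coefficient(p, q))
print(bhattacharyya_distance(p, q))
```

A coefficient near 1 means the two words occur in very similar contexts, while 0 means their contexts never overlap. In a system like the one described, such a score would be one feature among several passed to the regression model; how the authors actually integrate it is not stated in this record.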