Approaching terminological ambiguity in cross-disciplinary communication as a word sense induction task: a pilot study
Author: | Julie Mennes, Els Lefever, Ted Pedersen |
---|---|
Year of publication: | 2019 |
Subject: |
050101 languages & linguistics
Linguistics and Language; Computer science; media_common.quotation_subject; Context (language use); 02 engineering and technology; Library and Information Sciences; computer.software_genre; Language and Linguistics; Education; 0202 electrical engineering, electronic engineering, information engineering; Word-sense induction; 0501 psychology and cognitive sciences; Set (psychology); Cluster analysis; media_common; business.industry; 05 social sciences; Ambiguity; Term (time); Language technology; 020201 artificial intelligence & image processing; Artificial intelligence; Computational linguistics; business; computer; Natural language processing; Sentence |
Source: | Language Resources and Evaluation. 53:889-917 |
ISSN: | 1574-0218 1574-020X |
Description: | Cross-disciplinary communication is often impeded by terminological ambiguity. Hence, cross-disciplinary teams would greatly benefit from using a language technology-based tool that allows for the (at least semi-) automated resolution of ambiguous terms. Although no such tool is readily available, an interesting theoretical outline of one does exist. The main obstacle for the concrete realization of this tool is the current lack of an effective method for the automatic detection of the different meanings of ambiguous terms across different disciplinary jargons. In this paper, we set up a pilot study to experimentally assess whether the word sense induction technique of ‘context clustering’, as implemented in the software package ‘SenseClusters’, might be a solution. More specifically, given several sets of sentences coming from a cross-disciplinary corpus containing a specific ambiguous term, we verify whether this technique can classify each sentence in accordance with the meaning of the ambiguous term in that sentence. For the experiments, we first compile a corpus that represents the disciplinary jargons involved in a project on Bone Tissue Engineering. Next, we conduct two series of experiments. The first series focuses on determining appropriate SenseClusters parameter settings using manually selected test data for the ambiguous target terms ‘matrix’ and ‘model’. The second series evaluates the actual performance of SenseClusters using randomly selected test data for an extended set of target terms. We observe that SenseClusters can successfully classify sentences from a cross-disciplinary corpus according to the meaning of the ambiguous term they contain. Hence, we argue that this implementation of context clustering shows potential as a method for the automatic detection of the meanings of ambiguous terms in cross-disciplinary communication. |
Database: | OpenAIRE |
External link: |
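
The context clustering idea described in the abstract can be illustrated with a minimal sketch. This is not the SenseClusters implementation and does not reproduce its parameter settings. It assumes plain bag-of-words context vectors, cosine similarity, and average-link agglomerative clustering, and the function names, stopword list, and example sentences are all hypothetical, written only for illustration around the target term 'matrix'.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "of", "in", "is", "to", "and", "for", "with", "on", "by"}


def context_vector(sentence, target):
    """Bag-of-words counts for a sentence, excluding the target term and stopwords."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return Counter(t for t in tokens if t != target and t not in STOPWORDS)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_contexts(sentences, target, k):
    """Average-link agglomerative clustering of sentence contexts into k clusters."""
    vectors = [context_vector(s, target) for s in sentences]
    clusters = [[i] for i in range(len(sentences))]
    while len(clusters) > k:
        best = None
        # Find the pair of clusters with the highest average pairwise similarity.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sims = [cosine(vectors[i], vectors[j])
                        for i in clusters[a] for j in clusters[b]]
                score = sum(sims) / len(sims)
                if best is None or score > best[0]:
                    best = (score, a, b)
        _, a, b = best
        clusters[a] += clusters[b]
        del clusters[b]
    # Turn the cluster membership lists into one label per sentence.
    labels = [0] * len(sentences)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Invented example sentences: two biological and two mathematical uses of 'matrix'.
sentences = [
    "The extracellular matrix supports cell adhesion and tissue growth.",
    "Cells were seeded in a collagen matrix to promote tissue growth.",
    "The stiffness matrix is assembled from the finite element equations.",
    "We invert the matrix to solve the linear equations.",
]
labels = cluster_contexts(sentences, target="matrix", k=2)
```

In this toy setting the number of senses `k` is fixed by hand. A realistic setup would need a far larger corpus, richer context features, and a principled way of choosing the number of clusters.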