Approaching terminological ambiguity in cross-disciplinary communication as a word sense induction task: a pilot study

Autor: Julie Mennes, Els Lefever, Ted Pedersen
Rok vydání: 2019
Předmět:
Zdroj: Language Resources and Evaluation. 53:889-917
ISSN: 1574-0218
1574-020X
Popis: Cross-disciplinary communication is often impeded by terminological ambiguity. Hence, cross-disciplinary teams would greatly benefit from using a language technology-based tool that allows for the (at least semi-) automated resolution of ambiguous terms. Although no such tool is readily available, an interesting theoretical outline of one does exist. The main obstacle for the concrete realization of this tool is the current lack of an effective method for the automatic detection of the different meanings of ambiguous terms across different disciplinary jargons. In this paper, we set up a pilot study to experimentally assess whether the word sense induction technique of ‘context clustering’, as implemented in the software package ‘SenseClusters’, might be a solution. More specifically, given several sets of sentences coming from a cross-disciplinary corpus containing a specific ambiguous term, we verify whether this technique can classify each sentence in accordance to the meaning of the ambiguous term in that sentence. For the experiments, we first compile a corpus that represents the disciplinary jargons involved in a project on Bone Tissue Engineering. Next, we conduct two series of experiments. The first series focuses on determining appropriate SenseClusters parameter settings using manually selected test data for the ambiguous target terms ‘matrix’ and ‘model’. The second series evaluates the actual performance of SenseClusters using randomly selected test data for an extended set of target terms. We observe that SenseClusters can successfully classify sentences from a cross-disciplinary corpus according to the meaning of the ambiguous term they contain. Hence, we argue that this implementation of context clustering shows potential as a method for the automatic detection of the meanings of ambiguous terms in cross-disciplinary communication.
Databáze: OpenAIRE