A Comparison of Methods for Automatic Term Extraction for Domain Analysis
Authors: | Gregory Kulczycki, William B. Frakes, Jason Tilley |
---|---|
Year of publication: | 2014 |
Subject: | Measure (data warehouse); Vocabulary; Computer science; Speech recognition; Automatic term extraction; Term (time); Domain (software engineering); Word lists by frequency; Domain engineering; Domain analysis; Artificial intelligence; Natural language processing |
Source: | Lecture Notes in Computer Science, ISBN: 9783319141299, ICSR |
DOI: | 10.1007/978-3-319-14130-5_19 |
Description: | Fourteen word frequency metrics were tested to evaluate their effectiveness in identifying the vocabulary of a domain. Fifteen domain-engineering projects were examined to measure how closely the vocabularies selected by the fourteen metrics matched the vocabularies produced by domain engineers. Stemming and stopword removal were also evaluated to measure their impact on selecting proper vocabulary terms. The results of the experiment show that stemming and stopword removal improve performance and that term frequency is a valuable contributor to performance. Most word frequency metrics gave similar results, although a few performed poorly compared to the others. |
Database: | OpenAIRE |
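
The kind of pipeline the description refers to (stopword removal, stemming, ranking words by frequency, then comparing the selected terms against a vocabulary chosen by a domain engineer) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the record does not name the fourteen metrics, the stemmer, the stop list, or the scoring method used in the study. Here plain term frequency stands in for a metric, `crude_stem` is a deliberately rough stand-in for a real stemmer, and `STOPWORDS`, `top_terms`, and `precision_recall` are hypothetical names.

```python
import re
from collections import Counter

# Hypothetical stop list; the record does not say which list the study used.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "for", "by", "with", "on"}


def crude_stem(word):
    """Rough suffix stripping; a stand-in for a real stemmer, not the study's."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def top_terms(text, k, use_stopwords=True, use_stemming=True):
    """Return the k most frequent terms, ranked by raw term frequency."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if use_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    if use_stemming:
        tokens = [crude_stem(t) for t in tokens]
    return [term for term, _ in Counter(tokens).most_common(k)]


def precision_recall(selected, reference):
    """Compare selected terms with a reference vocabulary (e.g. an engineer's list)."""
    selected, reference = set(selected), set(reference)
    hits = len(selected & reference)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


# Toy document and reference vocabulary, invented for this example.
text = "The parser reads tokens. The parsers build trees. Tokens feed the parser and the tree."
reference = {"parser", "token", "grammar"}

with_preprocessing = top_terms(text, 3)
without_preprocessing = top_terms(text, 3, use_stopwords=False, use_stemming=False)

print(with_preprocessing, precision_recall(with_preprocessing, reference))
print(without_preprocessing, precision_recall(without_preprocessing, reference))
```

On this toy input the preprocessed run merges "parser"/"parsers" and "tokens"/"token" and drops "the", so more of its top terms overlap with the reference list than in the raw run. That mirrors the direction of the reported finding, but it says nothing about the size of the effect in the actual experiment.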