Description: |
Much research and many inventions have emerged at the intersection of linguistics and technology. Even so, techniques that combine the two do not work reliably for every language, because each language is unique in its linguistic nature and rules. This paper presents a lemmatization technique for Bahasa Indonesia (the Indonesian language). It achieves good precision by using the Indonesian Dictionary and a set of rules to remove affixes. The technique builds on an earlier algorithm, the Indonesian stemmer. Indonesian stemming and lemmatization share the same characteristics but differ slightly in implementation. The core difference lies in how each method reaches its goal, and that part can therefore be modified. The results show that the algorithm achieves roughly 98% precision on a collection of 57,261 valid words, 7,839 of them unique, gathered from articles on Kompas.com, an Indonesian online news site. |
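
The general idea of combining a dictionary lookup with affix-removal rules can be illustrated with a minimal Python sketch. This is not the algorithm evaluated in the paper: the word list `ROOT_WORDS`, the affix tuples, and the function names are all hypothetical, chosen only to show how candidates produced by stripping affixes can be checked against a dictionary. The spelling changes that real Indonesian morphology requires (for example, restoring an initial "s" after the prefix "meny-") are deliberately left out.

```python
# Toy root-word list (assumption): a real system would load a full
# Indonesian dictionary instead of this handful of entries.
ROOT_WORDS = {"makan", "baca", "main", "ajar", "buku"}

# Illustrative affix lists only (assumption); not the paper's rule set.
SUFFIXES = ("kan", "an", "i", "nya", "lah", "kah")
PREFIXES = ("meng", "mem", "men", "me", "ber", "di", "ter", "pe", "ke", "se")


def strip_suffix(word):
    """Yield the word itself, then each variant with one listed suffix removed."""
    yield word
    for suffix in SUFFIXES:
        # Keep at least three characters so very short words are not destroyed.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            yield word[: -len(suffix)]


def strip_prefix(word):
    """Yield the word itself, then each variant with one listed prefix removed."""
    yield word
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) > len(prefix) + 2:
            yield word[len(prefix):]


def lemmatize(word):
    """Return the first affix-stripped candidate found in the dictionary.

    If no candidate is a known root word, the lowercased input is returned
    unchanged.
    """
    word = word.lower()
    if word in ROOT_WORDS:
        return word
    for without_suffix in strip_suffix(word):
        for candidate in strip_prefix(without_suffix):
            if candidate in ROOT_WORDS:
                return candidate
    return word


if __name__ == "__main__":
    for token in ["makanan", "membaca", "bermain", "dimakan", "komputer"]:
        print(token, "->", lemmatize(token))
```

Checking every candidate against the dictionary is what stops the rules from over-stripping: "dimakan" yields the candidate "makan" before the shorter, invalid "dima" is ever accepted, and a word with no known root, such as "komputer", is returned as-is.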