Entropy of Tamil Language and Prioritized Coding Algorithm for Encoding of Tamil Letters
Author: | S. Ewins Pon Pushpa, S Narasimhan, N M Dinesh, P Prashanth |
---|---|
Publication year: | 2018 |
Subject: |
Coding algorithm; Computer science; Binary number; Computation and Language (Computational Linguistics and Natural Language and Speech Processing); Huffman coding; Tamil; Entropy (information theory); Statistical analysis; Artificial intelligence; Natural language processing |
Source: | 2018 International Conference on Smart Systems and Inventive Technology (ICSSIT). |
DOI: | 10.1109/icssit.2018.8748705 |
Description: | This paper presents the entropy of the Tamil language as estimated by statistical analysis. The statistical method uses samples drawn from different Tamil texts, and the entropy is calculated for each sample. The entropy of the language as a whole is then obtained by averaging the entropies of the individual text samples. The paper also presents a prioritized coding algorithm for efficient translation of Tamil letters to binary digits. It discusses the basic model of the prioritized coding algorithm with its implementation steps and compares its performance with other algorithms. The cases where this algorithm performs better than the Huffman coding algorithm are discussed, and the relationship between the distribution of letters in their respective positions and its effect on the success of the prioritized coding algorithm is highlighted. The text is also encoded using an algorithm called predictive guess coding, based on the idea of converting each symbol to the number of guesses needed to predict it, as proposed by C. E. Shannon in his paper. |
Database: | OpenAIRE |
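
The per-sample entropy estimate and the averaging step mentioned in the description can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names `sample_entropy` and `language_entropy` are hypothetical, the estimate is a zeroth-order one based only on symbol frequencies, and each Unicode code point is treated as one symbol. A Tamil letter is often written with more than one code point (for example a consonant followed by a vowel sign), so a faithful reproduction would probably need to segment the text into letters first.

```python
import math
from collections import Counter


def sample_entropy(text: str) -> float:
    """Shannon entropy of one text sample, in bits per symbol.

    Assumption: every non-whitespace Unicode code point is one symbol.
    """
    symbols = [ch for ch in text if not ch.isspace()]
    total = len(symbols)
    if total == 0:
        return 0.0
    counts = Counter(symbols)
    # H = -sum(p * log2(p)) over the observed symbol probabilities.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def language_entropy(samples: list[str]) -> float:
    """Average of the per-sample entropies, as the description outlines."""
    if not samples:
        raise ValueError("at least one text sample is required")
    return sum(sample_entropy(s) for s in samples) / len(samples)


if __name__ == "__main__":
    # Placeholder strings; real use would load Tamil text samples instead.
    demo_samples = ["abab", "abcd"]
    print(language_entropy(demo_samples))
```

The entropy obtained this way is a lower bound on the average code length, in bits per symbol, of any symbol-by-symbol prefix code built from the same frequencies, which is why it serves as a baseline when comparing Huffman coding with other schemes.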