Automatic Encoding and Language Detection in the GSDL -- Part II.

Author: Pinkas, Otakar
Subject:
Source: Journal of Systems Integration (1804-2724); 2015, Vol. 6 Issue 4, p45-51, 8p
Abstract: Processing of the older MS Word format in the GSDL depends on the correct encoding of the temporary HTML file. The "windows-scripting" fails, but the wvware.exe program succeeds. The current .docx format requires the user to change a setting in the Word configuration. The temporary HTML file should be encoded in UTF-8 instead of Windows-1250, which is the preset in the Czech environment. The automatic conversion from ISO-8859-2 to Windows-1250 for HTML pages is wrong, but the conversion from ISO-8859-1 to Windows-1252 is valid. Automatic language detection is sometimes incorrect because a similar language model predominates, and it needs further investigation. [ABSTRACT FROM AUTHOR]
Database: Complementary Index
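
The abstract's claim about the two encoding pairs can be illustrated with a minimal Python sketch. It is not the paper's method or GSDL code. It assumes that "conversion" means treating bytes of one encoding as if they were the other, and it compares only the printable range 0xA0-0xFF (below that, ISO-8859-1 has control codes where Windows-1252 has printable characters). The helper name `differing_bytes` and the sample sentence are hypothetical.

```python
def differing_bytes(codec_a, codec_b, start=0xA0, stop=0x100):
    """Return the byte values in [start, stop) that the two codecs decode differently."""
    diffs = []
    for value in range(start, stop):
        raw = bytes([value])
        # errors="replace" keeps the loop safe if a codec leaves a byte undefined
        a = raw.decode(codec_a, errors="replace")
        b = raw.decode(codec_b, errors="replace")
        if a != b:
            diffs.append(value)
    return diffs


# Western European pair: identical in the printable upper range
latin1_vs_cp1252 = differing_bytes("iso-8859-1", "cp1252")

# Central European pair: several letters sit at different byte positions
latin2_vs_cp1250 = differing_bytes("iso-8859-2", "cp1250")

# Hypothetical Czech sample: bytes written as ISO-8859-2 but read as Windows-1250
sample = "Žluťoučký kůň šel"
raw = sample.encode("iso-8859-2")
mislabelled = raw.decode("cp1250")

print("ISO-8859-1 vs Windows-1252 differences:", [hex(v) for v in latin1_vs_cp1252])
print("ISO-8859-2 vs Windows-1250 differences:", [hex(v) for v in latin2_vs_cp1250])
print("Original:   ", sample)
print("Mislabelled:", mislabelled)
```

Under these assumptions the first list is empty while the second is not, which is consistent with the abstract: relabelling ISO-8859-1 text as Windows-1252 preserves the printable characters, whereas relabelling ISO-8859-2 text as Windows-1250 corrupts letters such as Š, Ž, and ť unless the bytes are actually transcoded.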