A base62 transformation format of ISO 10646 for multilingual identifiers

Author:	Pei-Chi Wu
Year of publication:	2001
Subject:	Identifier; UTF-8; Alphanumeric; Computer science; Programming language; Universal Character Set; computer.software_genre; Lexicographical order; computer; Software
Source:	Software: Practice and Experience. 31:1125-1130
ISSN:	1097-024X 0038-0644
DOI:	10.1002/spe.408
Description:	ISO 10646 Universal Character Set (UCS) is a 31-bit coding architecture that covers symbols in most of the world's written languages. Identifiers in programming languages are usually defined by using alphanumeric characters of ASCII, which represent mainly English words. An approach for working around this deficiency is to encode multilingual identifiers into the alphanumeric range of ASCII. For case-sensitive languages, an encoding that utilizes [0–9][A–Z][a–z] can be more space-efficient for multilingual identifiers. This paper proposes a base62 transformation format of ISO 10646 called UTF-62. The resulting string of UTF-62 is within a [0–9][A–Z][a–z] range, a total of 62 base characters. UTF-62 also preserves the lexicographic sorting order of UCS-4. Copyright © 2001 John Wiley & Sons, Ltd.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::6593b46d3b6e03b6844aeb81c45a08b7 https://doi.org/10.1002/spe.408
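
Illustration:	The idea of mapping multilingual identifiers into the 62 ASCII alphanumerics while keeping UCS-4 sort order can be shown with a small Python sketch. This is not the UTF-62 format defined in the paper, whose details this record does not give. It is a simpler fixed-width scheme assumed purely for illustration, in which each code point becomes six base62 digits (six because 62^5 < 2^31 <= 62^6). The names ALPHABET, WIDTH, encode_code_point, encode_identifier and decode_identifier are hypothetical.

```python
# Illustrative fixed-width base62 encoding of code points.
# Assumption: this is NOT the paper's UTF-62, only a sketch of the idea.

# Digits listed in ascending ASCII order, so digit order matches byte order.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
BASE = len(ALPHABET)  # 62
WIDTH = 6             # smallest width with 62**WIDTH >= 2**31


def encode_code_point(cp: int) -> str:
    """Return the code point as exactly WIDTH base62 digits."""
    if not 0 <= cp < 2**31:
        raise ValueError("code point outside the 31-bit UCS range")
    digits = []
    for _ in range(WIDTH):
        cp, remainder = divmod(cp, BASE)
        digits.append(ALPHABET[remainder])
    # Digits were produced least significant first.
    return "".join(reversed(digits))


def encode_identifier(text: str) -> str:
    """Encode every character of text; the result is pure [0-9A-Za-z]."""
    return "".join(encode_code_point(ord(ch)) for ch in text)


def decode_identifier(encoded: str) -> str:
    """Invert encode_identifier for code points that Python's chr accepts."""
    if len(encoded) % WIDTH != 0:
        raise ValueError("length is not a multiple of WIDTH")
    chars = []
    for start in range(0, len(encoded), WIDTH):
        value = 0
        for digit in encoded[start:start + WIDTH]:
            # str.index raises ValueError for a character outside ALPHABET.
            value = value * BASE + ALPHABET.index(digit)
        chars.append(chr(value))
    return "".join(chars)


if __name__ == "__main__":
    names = ["變數", "b", "a", "ab", "é"]
    for name in names:
        print(name, encode_identifier(name))
    # Sorting by the encoded form matches sorting by code point.
    print(sorted(names, key=encode_identifier) == sorted(names))
```

Because every code point has the same width and the digit alphabet is in ascending ASCII order, sorting the encoded strings gives the same order as sorting the originals by code point. Every encoded string in this sketch starts with a digit, so practical use as an identifier in most languages would need a letter prefix, and a variable-length format would presumably be more compact.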