A base62 transformation format of ISO 10646 for multilingual identifiers

Author: Pei-Chi Wu
Year of publication: 2001
Subject:
Source: Software: Practice and Experience. 31:1125-1130
ISSN: 1097-024X, 0038-0644
DOI: 10.1002/spe.408
Description: ISO 10646 Universal Character Set (UCS) is a 31-bit coding architecture that covers symbols in most of the world's written languages. Identifiers in programming languages are usually defined by using alphanumeric characters of ASCII, which represent mainly English words. An approach for working around this deficiency is to encode multilingual identifiers into the alphanumeric range of ASCII. For case-sensitive languages, an encoding that utilizes [0–9][A–Z][a–z] can be more space-efficient for multilingual identifiers. This paper proposes a base62 transformation format of ISO 10646 called UTF-62. The resulting string of UTF-62 is within a [0–9][A–Z][a–z] range, a total of 62 base characters. UTF-62 also preserves the lexicographic sorting order of UCS-4. Copyright © 2001 John Wiley & Sons, Ltd.
Database: OpenAIRE
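
Illustration (not part of the record): the description above mentions two properties, an output alphabet of [0–9][A–Z][a–z] and preservation of UCS-4 sort order. The sketch below shows one simple way to obtain both. It is not the UTF-62 format defined in the paper, whose details this record does not give. It assumes a fixed width of six base62 digits per code point (enough because 62^5 < 2^31 <= 62^6), and the function names are hypothetical.

```python
import string

# Digits in ascending ASCII order, so comparing encoded strings
# character by character matches comparing the numeric values.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase
BASE = len(ALPHABET)  # 62
WIDTH = 6             # assumption: fixed width, since 62**5 < 2**31 <= 62**6


def encode_code_point(cp: int) -> str:
    """Encode one 31-bit code point as exactly WIDTH base62 digits."""
    if not 0 <= cp < 2**31:
        raise ValueError("code point outside the 31-bit UCS range")
    digits = []
    for _ in range(WIDTH):
        cp, remainder = divmod(cp, BASE)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))  # most significant digit first


def encode_identifier(text: str) -> str:
    """Encode every character of text; the result is purely alphanumeric."""
    return "".join(encode_code_point(ord(ch)) for ch in text)


def decode_identifier(encoded: str) -> str:
    """Invert encode_identifier for code points that Python can represent."""
    if len(encoded) % WIDTH != 0:
        raise ValueError("length is not a multiple of the digit width")
    chars = []
    for start in range(0, len(encoded), WIDTH):
        value = 0
        for digit in encoded[start:start + WIDTH]:
            value = value * BASE + ALPHABET.index(digit)
        chars.append(chr(value))
    return "".join(chars)


if __name__ == "__main__":
    print(encode_identifier("A"))  # 65 = 1*62 + 3, so this prints 000013
    words = ["b", "a", "變數", "ab", "Z", "é"]
    # Sorting by encoded form gives the same order as sorting by code point.
    print(sorted(words) == sorted(words, key=encode_identifier))  # True
```

A fixed width is the easiest way to keep the sort order, but it spends six characters on every code point, including plain ASCII letters, and the output can begin with a digit, which most programming languages do not allow at the start of an identifier. A practical scheme such as the one the paper proposes would have to address both points.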