Internationalization Part 4 – Capitalization II
Posted on May 6th, 2009 by Cassius Figueiredo

One of the most appealing features of the ASCII character set is how easily capital letters and lower case letters can be converted into each other. A lower case letter is obtained by adding 0x0020 (hexadecimal) to the code point of its capital counterpart, and a capital letter by subtracting 0x0020 from the code point of the lower case one. See the example and the figure provided below for a better understanding.
Example: A [0x0041] + 0x0020 = a [0x0061]
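The offset trick is easy to express in code. What follows is a minimal sketch in Python (our choice of language; the helper names ascii_to_lower and ascii_to_upper are hypothetical, not part of any library). It only touches the letters A-Z and a-z, so any other character, such as an accented É, is left unchanged.

```python
# Minimal sketch of the ASCII case trick: in ASCII, capital and lower case
# letters differ by exactly 0x20 in their code points.
CASE_OFFSET = 0x20  # ord('a') - ord('A') == 0x61 - 0x41


def ascii_to_lower(text: str) -> str:
    """Convert A-Z to a-z by adding 0x20; leave every other character alone."""
    return "".join(
        chr(ord(ch) + CASE_OFFSET) if "A" <= ch <= "Z" else ch
        for ch in text
    )


def ascii_to_upper(text: str) -> str:
    """Convert a-z to A-Z by subtracting 0x20; leave every other character alone."""
    return "".join(
        chr(ord(ch) - CASE_OFFSET) if "a" <= ch <= "z" else ch
        for ch in text
    )


print(hex(ord("A")), hex(ord("A") + CASE_OFFSET), chr(ord("A") + CASE_OFFSET))
# 0x41 0x61 a
print(ascii_to_lower("Hello, ASCII!"))  # hello, ascii!
print(ascii_to_upper("Hello, ASCII!"))  # HELLO, ASCII!
print(ascii_to_lower("ÉCOLE"))          # École  (the accented É is not touched)
```

Because 0x20 is a single bit, the same conversion is sometimes written as a bit operation, for example toggling it with XOR (ord(ch) ^ 0x20). Like the addition above, this is only valid for the ASCII letters.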

We can explore the “hexadecimal system” topic in a future post. If you would like to understand it better and cannot wait for the post to be published, then you should read the fairly good Wikipedia article at http://en.wikipedia.org/wiki/Hexadecimal.
Unfortunately, ASCII has no accented characters at all, so the strategy above could not be used to convert accented upper case characters into lower case ones, or vice versa. In addition, ASCII cannot meet today's internationalization needs. A much more comprehensive encoding is used today: Unicode, which will be discussed further in future posts.
There are several other reasons why a simple, or even complex, algorithm does not cover all conversion needs. Here are a few examples:
• Some languages do not have a one-to-one mapping between their upper case and lower case characters
• While European French characters often lose their accents when they are capitalized (é => E), the same does not happen in Canadian French (é => É)
• The corresponding capital letter for the German ‘ß’ is ‘SS’
• Many languages written in non-Latin scripts do not have the concept of upper case and lower case characters (e.g. Chinese, Japanese and Thai).
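Some of the cases listed above can be seen directly in Python 3, whose built-in str.upper() and str.lower() apply Unicode's default, locale-independent case mappings. The sketch below assumes Python 3; upper_without_accents is a hypothetical helper written only to mimic the European French convention, not a standard library function. Python's built-in methods do not apply locale-specific rules, so real locale-aware conversion would need a dedicated library such as ICU.

```python
import unicodedata


def upper_without_accents(text: str) -> str:
    """Hypothetical helper: upper-case the text, then drop combining accents,
    mimicking the European French convention (é => E)."""
    # NFD splits 'É' into 'E' + U+0301 COMBINING ACUTE ACCENT.
    decomposed = unicodedata.normalize("NFD", text.upper())
    # Keep only characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# German: one lower case letter becomes two capitals, so the length changes.
print("straße".upper())               # STRASSE
# The mapping is not one-to-one: the round trip does not bring 'ß' back.
print("straße".upper().lower())       # strasse

# Ligatures are another case without a one-to-one mapping.
print("ﬁ".upper())                    # FI

# French: the default Unicode mapping keeps the accent (Canadian style)...
print("école".upper())                # ÉCOLE
# ...while the hypothetical helper drops it (European style).
print(upper_without_accents("école"))  # ECOLE

# Scripts without case are returned unchanged.
print("中文".upper(), "ไทย".upper())   # 中文 ไทย
```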
Therefore, capitalization becomes a highly sophisticated procedure in several languages, far from the simplicity of the good old ASCII days.
In the next post: Addresses… Do not get lost on your way there, OK?