Internationalization Part 8 – Fonts

06/08/2009

There are two aspects to observe as far as fonts are concerned. They will be covered in the topics below.

a) Font Names Specified in the Code

Remember Unicode? I promised to talk about this subject but have not done so yet… Be prepared, because the subject is so extensive that it is worthy of a series of its own. The advent of Unicode has made it possible to represent many thousands of distinct characters, rather than the 128 characters that ASCII offers.

Most fonts do not cover the full Unicode character set. Therefore, if the software (or Web page) code names a font that contains only Latin characters and does not specify one that includes Japanese ideograms, all the localized Japanese text will be displayed as question marks, squares or other strange placeholder characters.

As an example, consider the Google homepage in Japanese. The first case is displayed incorrectly, whereas the second one is shown with its correct character set.



Another reason for not hard-coding font names is that the desired font may not be available on the system where the text is displayed. That is, if your client’s computer does not have the AuntieLucyBold font, the text will be displayed with a substitute font, which may cause display problems.

b) Font Sizes Specified in the Code

Some writing systems are more complex than others. The most complex ones need more pixels (and therefore larger point sizes) to be displayed correctly.

For instance, most Latin characters may be displayed in a 5×7 grid; Japanese characters, however, need at least a 16×16 grid for clear visualization. Chinese characters need a 24×24 grid!

The chart below illustrates why some characters displayed in small fonts become illegible.

Font Size

As you can see, it is impossible to understand a Japanese ideogram with a font of size 7, while the Latin character “E” is perfectly legible, regardless of its size.

In the next post we will talk about number formats… 1, 2, 3 and… Get ready!

Internationalization Part 7 – Date Format

06/04/2009

The format used to write dates is not consistent worldwide. Although dates basically include the day, month and year, their order and separators vary considerably. Differences can even be found between regions of the same country.

There are two basic forms of writing dates:

a) Full Date

The chart provided below illustrates the different forms of writing full dates.

Obviously, month and weekday names vary from location to location. Beyond that, in Spanish-speaking countries the day is placed before the month, the names are written entirely in lower case, and “de” (in English, “of”) is added.

In Japan, the weekday is omitted, and the words for year, month and day act more like separators.

Date formats in internationalized Java

b) Numeric Date

Here you will find different abbreviated date formats:

In numeric dates, Spanish-speaking countries use the order day/month/year (also known as “dd/mm/yy”), unlike the U.S. pattern, which is month/day/year. In Japan the order is year/month/day. These differences may cause serious problems if they are not handled carefully.

For instance, depending on the country, the date 07/04/01 can represent:

• July 4, 2001 (USA)
• April 7, 2001 (Mexico)
• April 1, 2007 (Japan)
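
To make the ambiguity concrete, here is a minimal Python sketch that parses the same string using the three field orders described above. The `ORDERS` table and the `interpret` function are hypothetical names made up for this illustration; they only encode the examples from this post.

```python
from datetime import datetime

# Hypothetical table: the field order each region is assumed to use,
# following the examples in this post.
ORDERS = {
    "USA": "%m/%d/%y",     # month/day/year
    "Mexico": "%d/%m/%y",  # day/month/year
    "Japan": "%y/%m/%d",   # year/month/day
}

def interpret(text):
    """Return the date each region would read from the same string."""
    return {
        region: datetime.strptime(text, fmt).date()
        for region, fmt in ORDERS.items()
    }

for region, value in interpret("07/04/01").items():
    print(region, value.isoformat())
```

The same seven characters yield three different dates, which is why a date should be stored in an unambiguous form and formatted for the user's locale only when it is displayed.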

Well, no more dates for today. In the next post we will talk about fonts.

Internationalization Part 6 – Formatting of Financial Symbols

05/14/2009

As promised, we are talking about money (something rare in this financial crisis period =).

With regard to currency formatting, the following elements should be considered. Before that, a brief explanation: the upcoming examples refer to European currencies prior to the adoption of the Euro (€), so as to show the wide variety of existing possibilities.

a) Currency Symbol
The currency symbol can be a single predefined symbol, such as the Euro (€), or a combination of letters, like the Deutsche Mark (DM). It may be placed before or after the numerical value.

b) Negative Values
There are a number of ways to represent negative values, namely:
   •  A negative sign before the currency symbol and the number:
      o  UK:  -£127.54
      o  France:  -127,54 F

   •  A negative sign before the number, but after the currency symbol:
      o  Denmark:  kr-127,54

   •  A negative sign after the number and the currency symbol:
      o  Netherlands:  127,54 F-

   •  The use of parentheses:
      o  USA:  ($127.54)
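
The placements listed above can be sketched as a small lookup table in Python. This is only an illustration under stated assumptions: `NEGATIVE_STYLES` and `format_negative` are hypothetical names, the table contains nothing beyond the five examples from this post, and thousands separators are ignored.

```python
# Hypothetical pattern table built only from the examples in this post.
# Each entry is (currency symbol, decimal separator, pattern), where
# {s} stands for the symbol and {n} for the absolute value as text.
NEGATIVE_STYLES = {
    "UK":          ("£",  ".", "-{s}{n}"),
    "France":      ("F",  ",", "-{n} {s}"),
    "Denmark":     ("kr", ",", "{s}-{n}"),
    "Netherlands": ("F",  ",", "{n} {s}-"),
    "USA":         ("$",  ".", "({s}{n})"),
}

def format_negative(amount, region):
    """Format a negative amount using the region's sign placement."""
    symbol, decimal_sep, pattern = NEGATIVE_STYLES[region]
    # Two decimals, then swap in the regional decimal separator.
    number = f"{abs(amount):.2f}".replace(".", decimal_sep)
    return pattern.format(s=symbol, n=number)

for region in NEGATIVE_STYLES:
    print(region, format_negative(-127.54, region))
```

In a real application these rules would normally come from locale data rather than a hand-written table; the point here is only that sign placement is data, not something to hard-code into the formatting logic.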

c) Decimal Separator
Most currencies use the same decimal and thousands separators as the local number format, but this is not always true. In some Swiss regions, for example, a full stop is used as the decimal separator for Swiss Francs (Sfr. 127.54); however, a comma is used as the decimal separator in the rest of the country (Sfr. 127,54).

Money does indeed make the world go round

In the next post, we will talk about dates again, shifting the focus a little this time.

See you!

Internationalization Part 5 – Addresses

05/11/2009

I guess the last post was rather tedious, with all that talk about the hexadecimal system. This post will be lighter, I promise!

One of the least standardized items to be carefully monitored during the internationalization process is the address format. The input fields and routines that process information related to addresses must be able to accept and handle the most varied address formats.

One of the most common errors is to force the user to enter information in a field called “State” (or “Province,” for Canada). While this information makes sense to people living in the U.S. or Canada, it may create confusion for users in other regions of the globe who cannot complete the “State” field because it does not exist in their addresses.

You should also be flexible when validating input data. For instance, it is best to avoid strict validation of zip code fields, as their formats vary widely from country to country and may contain letters as well as numbers.

Therefore, you should be very careful when including fields for entering address information in a Web form, for example. There are so many address formats around the world that using the most flexible data entry form you can think of is your best choice. This will prevent your users from spending their precious time trying to understand how to enter their contact information. They might even give up and do something more useful.
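
If some validation of the postal code cannot be avoided, a permissive check is one option. The Python sketch below is only an assumption about what “flexible” could look like: the function name, the set of allowed characters and the length limit of 12 are all made up for illustration and are not taken from any standard.

```python
import re

# Deliberately permissive: letters, digits, spaces and hyphens only,
# with an upper bound on length. The limit of 12 is an assumption.
_POSTAL_CODE = re.compile(r"[A-Za-z0-9][A-Za-z0-9 \-]{0,11}")

def is_plausible_postal_code(text):
    """Accept an empty value or anything that looks vaguely like a code."""
    text = text.strip()
    if not text:
        return True  # the field is optional: not every address has a code
    return _POSTAL_CODE.fullmatch(text) is not None

for sample in ["90210", "SW1A 1AA", "K1A 0B1", "1000-001", "", "!!!"]:
    print(repr(sample), is_plausible_postal_code(sample))
```

The check rejects obvious garbage but does not assume a fixed length, digits only, or even that the field is filled in at all.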

In the next post we will talk about money. Ding, ding, jackpot!

Internationalization Part 4 – Capitalization II

05/06/2009

One of the most appealing features of the ASCII character set is the easy conversion between capital letters and lower case letters. One can be obtained from the other by adding 0x0020 (in the hexadecimal system) to, or subtracting it from, the letter’s code point. See the example below for a better understanding of what is being explained here.

Example: A [0x0041] + 0x0020 = a [0x0061]
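
The same arithmetic can be tried out in a few lines of Python. This is a minimal sketch that works on one ASCII letter at a time; the function names are hypothetical and chosen only for this illustration.

```python
CASE_OFFSET = 0x20  # distance between 'A' (0x41) and 'a' (0x61)

def ascii_to_lower(ch):
    """Lower-case a single ASCII capital by adding 0x20 to its code."""
    if "A" <= ch <= "Z":
        return chr(ord(ch) + CASE_OFFSET)
    return ch  # anything else is left untouched

def ascii_to_upper(ch):
    """Upper-case a single ASCII lower case letter by subtracting 0x20."""
    if "a" <= ch <= "z":
        return chr(ord(ch) - CASE_OFFSET)
    return ch

print(hex(ord("A")), "->", hex(ord(ascii_to_lower("A"))))
```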

We can explore the “hexadecimal system” topic in a future post. If you would like to understand it better and cannot wait for the post to be published, then you should read the fairly good Wikipedia article at http://en.wikipedia.org/wiki/Hexadecimal.

Unfortunately, even in the ASCII days the above-mentioned strategy could not be used to convert accented upper case characters into lower case ones, and vice versa, since accented letters are not part of the 7-bit ASCII set. In addition, ASCII cannot meet current internationalization needs. A much more comprehensive coding is used today: Unicode, which will be further discussed in future posts.

There are several other reasons why a simple, or even complex, algorithm does not cover all conversion needs. Here are a few examples:

• Some languages do not have a one-to-one mapping between their upper case and lower case characters
• While accented characters in European French are often written without their accents when capitalized (é => E), the same does not happen in Canadian French (é => É)
• The corresponding capital letter for the German ‘ß’ is ‘SS’
• Most non-Latin languages do not have the concept of upper case and lower case characters (e.g. Chinese, Japanese and Thai).
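
Some of these cases can be observed with Python's built-in `str.upper()`, which applies Unicode case mappings. The short sketch below is only an illustration; note that `str.upper()` does not take the locale into account, so it cannot reproduce the difference between European and Canadian French described above.

```python
# Python's str.upper() applies Unicode case mappings, which are
# not always one character in, one character out.
print("straße".upper())        # the ß expands to two letters
print("\u00e9".upper())        # é keeps its accent
print("日本語".upper())          # no case distinction, so unchanged

# The ASCII trick of subtracting 0x20 gives the wrong answer for
# ß (0xDF): it lands on a completely different character.
naive = chr(ord("ß") - 0x20)
print(repr(naive))
```

The last line shows why the simple offset arithmetic cannot be extended beyond plain ASCII letters.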

Therefore, capitalization becomes a highly sophisticated procedure in several languages, far from the simplicity of the good old days of ASCII.

In the next post: Addresses… Do not get lost on your way there, OK?

Internationalization Part 3 – Capitalization I

04/13/2009

In this post, we will talk about the challenges related to capitalization. For this, we need to explain how letters and numbers were represented in the early eras of personal computing.

In the past most computers “spoke” English, with each character related to a number in a character table. In this type of representation, only 128 codes were necessary to map all the characters in use, as you can see below.

Upper Case Letters (A – Z)                 26
Lower Case Letters (a – z)                 26
Digits (0 – 9)                             10
Punctuation Marks (. , + { [ ) % $)        32
Space                                       1
Control Characters (TAB, CR, LF etc.)      33
Total                                     128


The ASCII Chart

As 7 bits are necessary to represent the 128 existing positions (2^7 = 128), a 7-bit coding was created and called “ASCII”, the acronym for American Standard Code for Information Interchange.
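
The counts in the table can be double-checked with Python's standard `string` module. This is just a quick sanity check; the category labels in the dictionary are my own.

```python
import string

counts = {
    "upper case letters": len(string.ascii_uppercase),
    "lower case letters": len(string.ascii_lowercase),
    "digits": len(string.digits),
    "punctuation marks": len(string.punctuation),
    "space": 1,
    # Codes 0-31 plus DEL (127) are the control characters.
    "control characters": len([c for c in range(128) if c < 32 or c == 127]),
}

total = sum(counts.values())
for name, count in counts.items():
    print(f"{name:20} {count:3}")
print(f"{'total':20} {total:3}")
print("2 to the power of 7 =", 2 ** 7)
```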

I suggest you read the article on ASCII on Wikipedia. It will serve as a useful reference for most of the next post.

See you then!

Internationalization Part 2 – Differences in Calendars

03/12/2009

Hi there! In this second post on internationalization, we will be talking about an aspect that seems simple and irrelevant, but which can create quite a few problems.

As you probably know, the Gregorian calendar is used in most English-speaking countries; however, when developing an internationalized application, one should take into consideration other calendars in use today, such as the Japanese, Buddhist, Islamic, Hebrew and Chinese calendars.

Here are some examples of differences among calendars:

• The year of each calendar can be different. The Gregorian year of 2000 is equivalent to the 12th year of the Japanese Heisei era and 1421 in the Islamic calendar.
• The first day of the year may not be January 1. The Chinese New Year, for example, was celebrated on February 5, 2000 in the Gregorian calendar.
• The length of months and years may vary, as may the rules for leap years.
• The first day of the week may not be Sunday… In most European calendars, the week begins on Monday.
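
Two of the differences above, the year number and the first day of the week, can be sketched in Python as follows. The function names are hypothetical, and the era conversion is a deliberate simplification: it works at the level of whole years and ignores the fact that an era changes partway through a year.

```python
import calendar

HEISEI_OFFSET = 1988  # Heisei 1 corresponds to Gregorian 1989

def heisei_year(gregorian_year):
    """Convert a Gregorian year to a Heisei-era year (1989-2019 only)."""
    if not 1989 <= gregorian_year <= 2019:
        raise ValueError("year falls outside the Heisei era")
    return gregorian_year - HEISEI_OFFSET

# Index 0 is Monday, matching the numbering of the calendar module.
DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def week_header(first_weekday):
    """Weekday abbreviations in the order a calendar row would show them."""
    cal = calendar.Calendar(firstweekday=first_weekday)
    return [DAY_NAMES[d] for d in cal.iterweekdays()]

print(heisei_year(2000))
print(week_header(calendar.SUNDAY))
print(week_header(calendar.MONDAY))
```

Keeping the first weekday as a parameter, rather than assuming Sunday or Monday, lets the same calendar widget serve both conventions.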


The Japanese calendar in Windows Vista

Also, there are cultures that may use more than one type of calendar. In Arabic regions of Nigeria, for example, all the following calendars are available:

• Hijri or Islamic calendar
• English Gregorian calendar in the local language
• French Gregorian calendar in the local language
• Arabic Gregorian calendar in the local language
• Gregorian calendar in English
• Gregorian calendar in French

Therefore, when creating a calendar in your application, research the countries for which your product will be internationalized, as you will have to adapt the calendar to each locale.

In the next post we will be talking about capitalization. See you there!

Internationalization Part 1 – Introduction

03/06/2009

Hello there! Shall we talk about internationalization?

As you may know, the term “internationalization” is usually shortened to the abbreviation “i18n”. There are 18 letters between the word’s initial “i” and its final “n”, right? And that is where the “18” in i18n comes from!
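
For the curious, the counting rule behind this abbreviation fits in a few lines of Python. The function name `numeronym` is my own choice for this sketch.

```python
def numeronym(word):
    """Abbreviate a word as first letter + count of inner letters + last letter."""
    if len(word) < 3:
        return word  # nothing between the first and last letter to count
    return f"{word[0]}{len(word) - 2}{word[-1]}"

print(numeronym("internationalization"))
print(numeronym("localization"))
```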

Internationalization is the field of product development and analysis that covers all the concepts relevant to launching a product in a new market. But why is it necessary to worry about this? Here are some reasons:

• Each year companies invest millions of dollars in computer software production and trading worldwide.
• In today’s world, there is a need to sell products in markets other than the one where they were manufactured.
• Because of market demands, an even greater investment is necessary to adapt and translate products so that they reach their target consumers.

The subject has already been explored in the Ccaps Newsletter, but the concept of i18n can also be applied to many other kinds of products for export (e.g. food packaging, machinery manuals, etc). However, since we are a technology company, we will focus on the application of i18n in software development.

In this context, the ultimate goal of internationalization is to give users, in each of the languages into which the application has been localized, an application that is visually and functionally identical to the original. Users (and developers, especially) who are not familiar with the requirements for the development of internationalized products will certainly be surprised by the number of unknown aspects involved in such a process and by the project details that need solving.

This initial series of articles on internationalization will cover the basic aspects of the process and provide examples of how cultural diversity can affect software functioning.

To begin with, we will analyze a very simple aspect: the legal issues related to each target market. Since this is a key concept, you should seek advice from a consultant regarding the lawfulness of each “sensitive” feature of the application. As an example, some countries have specific legislation on the use and export of encryption algorithms, which are used even in simple procedures involving file compression and copy protection.

Another example relates to the documentation that accompanies the product. In some countries, it can remain in English, while in others it has to be translated into the local language. In addition, the laws regulating corporate life in certain countries restrict a company’s ability to declare that its products are superior to those of the competition. This directly impacts the company’s marketing strategy, and possibly the documentation, help systems, and other materials packaged with the product.

This is only the tip of the iceberg. In future posts, we will be talking about other very important aspects. Be prepared for a few other surprises!
