This page explains in more detail what an encoding system is and what Unicode is. If you just want to use Japanese with your web browser or email and are not curious about the technical details, you can start on one of the earlier pages on this site, as shown in this site map:
To represent character data numerically (as in a computer file), one needs to decide on an encoding system that matches each numerical value to a specific character. Each national language has one or more different encodings associated with it. So a single numerical value could represent any one of several different characters in several different languages, depending on the encoding system in use. Japanese alone has three common encoding schemes: EUC, SJIS (Shift JIS), and ISO-2022-JP. The last is sometimes referred to as JIS.
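As a small illustration of the paragraph above, here is a Python sketch (Python is my choice for the examples on this page, not something the tools discussed here require). It reads one byte value under two different encodings, then stores the same Japanese text under the three Japanese encodings. The codec names "euc_jp", "shift_jis", and "iso2022_jp" are Python's names for EUC, SJIS, and ISO-2022-JP; the sample text is arbitrary.

```python
# One byte value, read under two different encodings.
raw = bytes([0xB1])
as_latin1 = raw.decode("latin-1")    # Western European interpretation
as_sjis = raw.decode("shift_jis")    # Japanese interpretation
print(repr(as_latin1), repr(as_sjis))

# The same Japanese text, stored under the three common Japanese encodings.
text = "日本語"
encoded = {name: text.encode(name) for name in ("euc_jp", "shift_jis", "iso2022_jp")}
for name, data in encoded.items():
    # Each encoding produces a different byte sequence for the same characters.
    print(f"{name:<12}{data.hex()}")
```

The first print shows a plus-minus sign next to a half-width katakana character: the same number, two different characters. The loop shows the reverse situation, one text and three different sets of numbers.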
For a computer to display text that has been encoded as numerical data (an email message or a web page, for example), it needs to know what encoding system was used. If the encoding system is not specified explicitly in the email or on the web page, the email program or web browser will try to guess what encoding system was used, but this is not always easy. If the guess is wrong, the text will display as nonsense. You as the user can then tell the browser or email program to try interpreting the page with a different encoding system by selecting a different choice from the encoding menu. Most email programs don't have this option. (Apple's Mail application is an exception.) I show how to do this for your browser on an earlier page in this site. (See the list of pages above.)
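The guessing described above can be sketched in Python. This is only a crude stand-in for what a browser or email program does: the function name guess_decode and the candidate order are my own assumptions, and real programs use more careful statistical detection. The sketch simply tries each candidate encoding and keeps the first one that decodes without an error.

```python
CANDIDATES = ("utf-8", "euc_jp", "shift_jis")  # hypothetical guess order

def guess_decode(data, candidates=CANDIDATES):
    """Return (encoding, text) for the first candidate that decodes cleanly."""
    for name in candidates:
        try:
            return name, data.decode(name)
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this encoding; try the next
    raise ValueError("none of the candidate encodings fit")

# Bytes as they might arrive with no encoding label attached.
message = "日本語".encode("shift_jis")

# A wrong guess that does not fail outright simply displays nonsense.
garbled = message.decode("cp1252", errors="replace")
print(garbled)

# Trying the candidates in turn recovers the original text.
guessed, recovered = guess_decode(message)
print(guessed, recovered)
```

Note that a byte sequence can be valid in more than one encoding, so "the first one that decodes" is not always the right one. That is why the guess sometimes fails and why the encoding menu exists.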
Unicode is a standard that defines several closely related encoding systems. These solve some of the problems above by using a much larger set of numerical values and trying to represent all the characters of all major languages in a single set, so that each number identifies a unique character. The encodings themselves are named with the prefix UTF, for Unicode Transformation Format, and in the encoding menu of your browser you may see choices for Unicode encodings like UTF-8 and UTF-16. (If you are wondering why a single universal standard should produce several different encodings, see the note on encoding versus character set below.)
Unicode encoding systems naturally include encodings for Japanese and other Asian languages. If a Mac program bills itself as "Unicode savvy," it generally means you will be able to use Japanese with it once you have enabled Japanese on your system. (See the front page for how to do this.)
Unicode has several advantages. One is that it can serve as a more or less universal standard: everyone can use Unicode to encode data in their own language, without the confusion of having many different encoding systems in use. For example, this makes it possible to open and save the same Japanese text file in several different software applications, even on different platforms like Mac, Unix, and Windows. For web development, another immediate advantage is that with Unicode you can represent multiple languages on one web page. Normally a single web page (or any pure text file) must use the same encoding throughout, so a page using a Japanese encoding scheme cannot contain characters that are not included in that set, such as Korean characters. (Japanese encodings do contain roman characters, so you can mix English and Japanese on one page even without Unicode.)
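The multilingual point above can be checked with a short Python sketch. The sample strings are arbitrary, and "shift_jis" is Python's codec name for SJIS. English mixed with Japanese fits in the Japanese encoding, Korean does not, and UTF-8 holds all three.

```python
english_japanese = "English and 日本語"
with_korean = "日本語 and 한국어"

# Japanese encodings include roman characters, so this succeeds.
sjis_bytes = english_japanese.encode("shift_jis")

# Korean characters are outside the Japanese set, so this fails.
try:
    with_korean.encode("shift_jis")
    fits_sjis = True
except UnicodeEncodeError:
    fits_sjis = False
print("Korean fits in Shift JIS:", fits_sjis)

# A Unicode encoding such as UTF-8 can hold all three languages together.
utf8_bytes = with_korean.encode("utf-8")
print(utf8_bytes.decode("utf-8"))
```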
Note that an encoding system like Unicode is quite distinct from a font. If software that supports Unicode encounters text in Japanese, it may know that the message is composed of certain Japanese characters, but the computer may not have the fonts required to display those characters on the screen or printer. For this reason, to work in a language you also need a font for the language.
Apple's TextEdit application can convert a text file to different encodings, but you need to save the file in plain text format instead of the Rich Text Format (RTF) that is the program's default. From the Format menu select "Make Plain Text," then select "File>Save As" and choose the encoding from the pop-up encoding menu in the save dialog box. Other text editors that support Unicode can convert files in a similar way.
If you need to convert a large file like a dictionary file from one encoding system to another, you can use Cyclone, which allows you to access Mac OS X's encoding converters directly.
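If you prefer a script to the applications above, the same kind of conversion can be sketched in Python. This is not how TextEdit or Cyclone work internally; it is just one way to do the job. The function name convert_encoding, the file names, and the chunk size are all assumptions made for the example. The file is read in pieces so that a large dictionary file does not have to fit in memory at once.

```python
import tempfile
from pathlib import Path

def convert_encoding(src, dst, from_enc, to_enc, chunk_chars=64 * 1024):
    """Copy a text file, decoding it as from_enc and re-encoding it as to_enc."""
    # newline="" keeps the original line endings untouched.
    with open(src, "r", encoding=from_enc, newline="") as fin, \
         open(dst, "w", encoding=to_enc, newline="") as fout:
        while True:
            chunk = fin.read(chunk_chars)  # read a block of characters
            if not chunk:
                break
            fout.write(chunk)

# Demonstration with a tiny stand-in for a dictionary file.
sample = "辞書\n日本語\n"
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "dict_euc.txt"    # hypothetical file names
    dst = Path(tmp) / "dict_utf8.txt"
    src.write_bytes(sample.encode("euc_jp"))
    convert_encoding(src, dst, "euc_jp", "utf-8")
    converted = dst.read_bytes()

print(converted.decode("utf-8"))
```

The conversion only works if you name the source encoding correctly. If the file is really SJIS and you tell the script it is EUC, you will get an error or garbled output.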
If you are wondering why a single unified standard like Unicode should produce several different encodings, the answer is that Unicode actually defines a character set that matches every character with a unique integer. But there are different ways of representing those integers in the binary code (the ones and zeroes) that the computer uses. So UTF-8, UTF-16, and UTF-32 are three different encodings that all use the Unicode character set: each assigns the same integer to a given character, but each represents that integer with a different number and combination of ones and zeroes. (In the same way, the Japanese encodings EUC, Shift JIS, and ISO-2022-JP are all based on the same character set. EUC is most common on UNIX, Shift JIS on the web, and ISO-2022-JP in email.) The most common Unicode encoding on the web is UTF-8. That is the encoding these pages use.
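The difference between character set and encoding can be seen directly in a short Python sketch. The character chosen is arbitrary. I use the big-endian variants "utf-16-be" and "utf-32-be" so that Python does not prepend a byte order mark, which would obscure the comparison.

```python
ch = "日"

# The character set: one integer (code point) for this character.
code_point = ord(ch)
print(f"U+{code_point:04X}")

# The encodings: three different byte representations of that same integer.
utf8 = ch.encode("utf-8")
utf16 = ch.encode("utf-16-be")
utf32 = ch.encode("utf-32-be")
for name, data in (("UTF-8", utf8), ("UTF-16", utf16), ("UTF-32", utf32)):
    print(f"{name:<8}{len(data)} bytes  {data.hex()}")
```

The code point is U+65E5 in every case, but UTF-8 stores it as three bytes (e6 97 a5), UTF-16 as two (65 e5), and UTF-32 as four (00 00 65 e5).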
For more detailed information on Japanese encodings and Unicode applications, I recommend the following excellent resources.
Back to Christopher Bolton's Home Page