Tips for Adding Japanese to Your Own Web Pages

I am not a professional web developer, but the following advice may be helpful for those who have experience developing English pages and would like to include some Japanese text.

Software

Use a text or html editor that supports Japanese input and can save the file in the encoding you want to use. I use BBEdit.

HTML Tips

On your web page, make sure to specify the encoding in a meta tag in the head section: <meta http-equiv="Content-type" content="text/html; charset=shift_jis">. The charset value should match the encoding the file is actually saved in. The most common flavor of Unicode encoding is indicated with charset=utf-8.
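As a rough sketch, the top of a page saved as Shift JIS might look like the following. The title and body text are placeholders I made up for illustration; the only part taken from the advice above is the meta tag itself.

```html
<!DOCTYPE html>
<html>
<head>
  <!-- Declare the encoding first, before any Japanese text such as the title -->
  <meta http-equiv="Content-type" content="text/html; charset=shift_jis">
  <title>日本語のページ</title>
</head>
<body>
  <p>こんにちは。</p>
</body>
</html>
```

If the file is saved as UTF-8 instead, change the value to charset=utf-8. HTML5 also accepts the shorter form <meta charset="utf-8">.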

It is less critical but good practice to specify the language of the page in your html code. You can do this by adding a lang="en" or lang="ja" attribute to the <html> tag at the start of the file. On a page that is mostly English, you can also specify that one portion is Japanese by adding the lang attribute to a particular element, like this: <p lang="ja">. You can also add it to the span and div elements to designate smaller or larger sections of Japanese text.
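Here is a small sketch of a mostly English page that marks its Japanese portions with the lang attribute. The sentences and the title are invented examples, and the page is assumed to be saved as UTF-8.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta http-equiv="Content-type" content="text/html; charset=utf-8">
  <title>Study notes</title>
</head>
<body>
  <!-- A few Japanese words inside an English sentence: use span -->
  <p>The word <span lang="ja">日本語</span> means "Japanese language."</p>

  <!-- A whole paragraph in Japanese: put lang on the p element -->
  <p lang="ja">これは日本語の段落です。</p>

  <!-- A larger Japanese section: put lang on a div; its contents inherit it -->
  <div lang="ja">
    <p>一つ目の段落です。</p>
    <p>二つ目の段落です。</p>
  </div>
</body>
</html>
```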

Ruby (Furigana) Markup

Particularly if you are a teacher, you may be interested in how to display ruby, or furigana, over kanji characters on your web pages. Many browsers, including recent versions of Safari and Google Chrome, can display ruby, and Firefox can display it with a plugin. (Details are on the browsers page.) Simple ruby markup looks like this:

<ruby>
  <rb>日本語</rb>
  <rt>にほんご</rt>
</ruby>

The <rb> tag is for "ruby body," and the <rt> tag is for "ruby text." To make the pronunciation display more nicely in browsers that don't support ruby, you can add <rp> "ruby parentheses" tags like this:

<ruby>
  <rb>日本語</rb>
  <rp>(</rp> <rt>にほんご</rt> <rp>)</rp>
</ruby>

Ruby tags are not part of the HTML 4 specification, however. For detailed information, see the Ruby Annotation Specification at the World Wide Web Consortium.
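To show how the ruby pieces above fit into running text, here is a sketch of a full sentence with furigana on two words. The sentence is my own example. The whole sentence is kept on one line because a line break between Japanese words can show up as an unwanted space.

```html
<p lang="ja">
  <ruby><rb>日本語</rb><rp>(</rp><rt>にほんご</rt><rp>)</rp></ruby>を<ruby><rb>勉強</rb><rp>(</rp><rt>べんきょう</rt><rp>)</rp></ruby>します。
</p>
```

A browser that supports ruby should hide the <rp> contents and place the readings above the kanji. One that does not should show the readings in parentheses after each word.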

Encoding Tips

The most common encoding for Japanese web pages is Shift JIS, but Unicode is more modern, and has a number of advantages. (If you don't know what an encoding is, see the "About Encodings" page of this site.)

If you use Shift JIS encoding and have accented characters in your English text, as in vis-à-vis and garçon, make sure to code these using Latin character entities like &agrave; and &ccedil;. Otherwise these characters will not display correctly.
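As a short illustration, the two example words could be written as follows in a page saved as Shift JIS. The numeric forms on the last line are an equivalent alternative to the named entities.

```html
<!-- Named entities: agrave is a with a grave accent, ccedil is c with a cedilla -->
<p>vis-&agrave;-vis</p>
<p>gar&ccedil;on</p>

<!-- The same two letters as numeric character references -->
<p>vis-&#224;-vis, gar&#231;on</p>
```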

Unlike Shift JIS and other specifically Japanese encodings, Unicode can represent any character in any common language, which allows you to combine English with multiple non-roman languages on a single page. BBEdit has so many different flavors of Unicode in its encoding menu that it can be confusing which to choose. UTF-8 seems to be the most common flavor of Unicode for html. W3C recommends using an encoding without a byte order mark (BOM) included.
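For example, a single UTF-8 page can mix English, Japanese, and other scripts, which would not be possible in Shift JIS. This is only a sketch: the greetings and the choice of Korean and Russian are my own, and the file is assumed to be saved as UTF-8 without a BOM.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta http-equiv="Content-type" content="text/html; charset=utf-8">
  <title>Greetings</title>
</head>
<body>
  <p>Hello</p>
  <p lang="ja">こんにちは</p>
  <p lang="ko">안녕하세요</p>
  <p lang="ru">Здравствуйте</p>
</body>
</html>
```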

If you are using CSS and external style sheets, be careful about the encoding of the style sheets. Even if your html files are encoded using Unicode, your style sheets do not have to be. In the past I've read (and found) that certain Unicode encodings can cause some browsers to be unable to read the style sheet. My style sheets don't contain any Japanese, so I'm setting their encoding to Western (iso-8859-1). You can specify the encoding of an external style sheet just as you can specify it for an html file, but the syntax differs: for CSS, put the following declaration at the very start of the file, with nothing before it, and end it with a semicolon: @charset "iso-8859-1";