May 21, 2003
Googling in Japanese

An interesting conversation is going on over at the DigitalEve Japan discussion list about searching in Japanese vs English. One poster commented that searching for red Pajero at images.google.com and images.google.co.jp doesn’t bring up the same results. Later she revealed that she was searching for it in Japanese on the Japanese Google, and in English on the English Google.

Which is not the same search at all. Why not?

Well, as I explained on the list, if you search for aka pajero and for akai pajero (with aka/akai in kanji and pajero in katakana), you get different results: 11 for aka pajero and 3 for akai pajero. If you spell aka or akai in hiragana instead, you get 0 results.

Yet all four variations are definitely the same idea of “red Pajero” that any Japanese reader would understand.

This must give Japanese search engine developers nightmares. And I haven’t even started on whether or not to put spaces between words. Japanese is generally written without spaces between words, though I usually put spaces in my searches.
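Search engines have to cope with the missing spaces somehow. One technique commonly used for Chinese and Japanese text is to skip word segmentation entirely and index overlapping two-character chunks (bigrams). Here is a minimal Python sketch of that idea. The `char_bigrams` and `matches` helpers are hypothetical, and this is not a claim about how Google or Yahoo actually indexed Japanese. Because all whitespace (including the full-width space U+3000) is stripped first, spaced and unspaced queries produce the same tokens, while 赤 and 赤い still produce different ones.

```python
import unicodedata


def char_bigrams(text: str) -> list[str]:
    """Split text into overlapping two-character tokens, ignoring whitespace.

    Hypothetical helper: a simplified form of the bigram indexing that
    many CJK search systems use instead of relying on word boundaries.
    """
    # Normalize to composed form so that e.g. パ is always one code point.
    text = unicodedata.normalize("NFC", text)
    # str.split() with no arguments also splits on the full-width space
    # U+3000, so "赤 パジェロ" and "赤パジェロ" end up identical here.
    chars = "".join(text.split())
    if len(chars) < 2:
        return [chars] if chars else []
    return [chars[i : i + 2] for i in range(len(chars) - 1)]


def matches(query: str, document: str) -> bool:
    """True if every bigram of the query occurs somewhere in the document."""
    doc_grams = set(char_bigrams(document))
    return all(gram in doc_grams for gram in char_bigrams(query))


if __name__ == "__main__":
    page = "赤いパジェロの写真"  # sample page text: "photo of a red Pajero"
    print(char_bigrams("赤 パジェロ") == char_bigrams("赤パジェロ"))  # True
    print(matches("赤い パジェロ", page))  # True
    print(matches("赤 パジェロ", page))  # False
```

With the sample page, the last query fails because the bigram 赤パ never occurs in 赤いパジェロ. Spaces stop mattering, but the aka/akai problem is still there.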

If you search for red Pajero in English on either images.google.co.jp or images.google.com, you get 49 hits, quite a few more than any of the Japanese searches.

So there are two issues involved:

1. There’s more than one way to write “red Pajero” in Japanese.
2. There are more results in English than in Japanese.

Regarding 1, you must try all the variations to find all the results. There’s no way around it.
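Trying every variation by hand gets tedious. Here is a minimal Python sketch, assuming a small hand-built table of alternative spellings, that spits out every combination to paste into the search box. The `SPELLINGS` table and the `query_variants` function are illustrative only, not part of any real search API; the table lists just the four spellings of “red” discussed above, and each combination is produced both with and without a space.

```python
from itertools import product

# Hypothetical spelling table: each concept maps to the ways a Japanese
# page might write it (kanji, kanji + okurigana, hiragana, katakana).
SPELLINGS = {
    "red": ["赤", "赤い", "あか", "あかい"],
    "Pajero": ["パジェロ"],
}


def query_variants(words, spellings=SPELLINGS, separators=(" ", "")):
    """Return every combination of spellings, with and without spaces."""
    variants = []
    for forms in product(*(spellings[w] for w in words)):
        for sep in separators:
            variants.append(sep.join(forms))
    # dict.fromkeys drops duplicates (e.g. one-word queries) but keeps order.
    return list(dict.fromkeys(variants))


if __name__ == "__main__":
    for q in query_variants(["red", "Pajero"]):
        print(q)  # 8 queries: 4 spellings of "red" x spaced/unspaced
```

Even this tiny table gives eight separate searches, and a real list of alternative spellings would grow quickly.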

As for 2, I’m not sure whether this search gets more hits in English than in Japanese because there are simply more English pages on the web, or because Japanese webmasters tend to name their images and pages in English or romaji even on otherwise Japanese pages.

Does anyone know the ratio of English pages to Japanese pages on the web? I assume there are a whole lot more in English than in Japanese, but I don’t know where to dredge up the actual numbers.

Posted by kuri at May 21, 2003 11:26 AM

Comments

It’s interesting!! I tried the search in HIRAGANA on Yahoo and got only two hits. Next time I’ll try in KATAKANA. Anyway, English is more difficult for me, because I’m Japanese!

Posted by: Mieko on May 21, 2003 01:52 PM

You’ve been spotted and mentioned on the FG site for this post. You may get a few more hits today.

Posted by: Tracey on May 21, 2003 02:28 PM

From a technical point of view, Japanese is especially difficult to index, because the word you are searching for may be written in:

Kanji
Kanji + Hiragana mix
Hiragana
Full Width Katakana (Zenkaku)
Half Width Katakana (Hankaku)
and possibly Romaji

Some Japanese magazines used to compare the number of results that various search engines returned for popular searches written in different ways. Comparisons like this can sometimes reveal how each engine’s indexer works, since each one works differently. Some engines may handle one of the writing methods above better than others.
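Some of these differences can be folded away before indexing. Here is a minimal Python sketch, assuming only the standard `unicodedata` module: NFKC normalization turns half-width (hankaku) katakana into full-width (zenkaku) katakana, and shifting hiragana code points up by 0x60 maps them onto the matching katakana. The `fold_kana` name is hypothetical, and this is one possible normalization, not how any particular engine works. Kanji and romaji cannot be folded this way; relating 赤 to あか needs a reading dictionary.

```python
import unicodedata

# Hiragana U+3041..U+3096 sit exactly 0x60 code points below the
# corresponding katakana U+30A1..U+30F6.
_HIRA_START, _HIRA_END = 0x3041, 0x3096
_KANA_OFFSET = 0x60


def fold_kana(text: str) -> str:
    """Normalize Japanese text so kana spelling variants compare equal.

    1. NFKC turns half-width katakana into full-width katakana and
       composes the voiced marks, so ﾊﾟ becomes パ.
    2. Hiragana is shifted onto katakana, so あか and アカ match.
    Kanji and romaji are left untouched.
    """
    text = unicodedata.normalize("NFKC", text)
    return "".join(
        chr(ord(ch) + _KANA_OFFSET) if _HIRA_START <= ord(ch) <= _HIRA_END else ch
        for ch in text
    )


if __name__ == "__main__":
    half_width = "\uff8a\uff9f\uff7c\uff9e\uff6a\uff9b"  # ﾊﾟｼﾞｪﾛ (hankaku)
    print(fold_kana(half_width) == fold_kana("パジェロ"))  # True
    print(fold_kana("あかい"))  # アカイ
```

An indexer that stores `fold_kana(text)` and folds queries the same way would treat hiragana, full-width katakana, and half-width katakana spellings as one search, leaving only the kanji and romaji variants to handle separately.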

Posted by: Stuart Woodward on May 22, 2003 10:46 AM

mediatinker.com