Search Engine Optimization (SEO) in China


China - an emerging giant. If you want to be a player on the Internet, you'd better take the Chinese search engines into account. Large corporate websites won't be very popular on Chinese search engines, because they are not written in Chinese. However, any Chinese-language website should aim for a proper ranking.

I originally wrote this post in an email to a colleague, but hey, content is content, right?

The largest Chinese search engines are:

To improve visibility in the Chinese search engines, there are a few organizational tricks.
  • The website should be submitted to all important Chinese search engines, every six months or so. Usually, a search engine has a procedure to do this. However, results will not be instantaneous. Baidu has a reputation for being slow.
  • The website should be listed in the appropriate directories on the largest portals in China (Sohu and Yahoo). Sohu has commercial directory listings and commercial "packages" as well. Yahoo has a free service.
There are also other, more technical solutions.
  • The character encoding of a Chinese website should be Big5 (for traditional characters) or GB2312 (for simplified characters), instead of the widely used UTF-8. I'll explain later what this means. This is usually a rather expensive solution, because it is a change at a very technical level.
  • Another, cheaper adjustment would be to put in a proper language declaration in the HTML head to let the search engine know that this is a Chinese website.
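To make the two technical points a bit more concrete, here is a minimal sketch in Python. The sample page, the `zh-CN` language tag, and the helper names (`SAMPLE_PAGE`, `DeclarationFinder`, `find_declarations`) are my own assumptions for illustration. How much weight any particular search engine gives these declarations is not something I can promise.

```python
from html.parser import HTMLParser

# Hypothetical page head: a language declaration on <html> plus a
# charset declaration in a <meta> tag. The values are illustrative.
SAMPLE_PAGE = """<!DOCTYPE html>
<html lang="zh-CN">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=gb2312">
<title>Example</title>
</head>
<body></body>
</html>"""


class DeclarationFinder(HTMLParser):
    """Collects the declared language and character encoding of a page."""

    def __init__(self):
        super().__init__()
        self.lang = None
        self.charset = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "html" and attributes.get("lang"):
            self.lang = attributes["lang"]
        elif tag == "meta":
            if attributes.get("charset"):
                # Short form: <meta charset="...">
                self.charset = attributes["charset"].lower()
            elif (attributes.get("http-equiv") or "").lower() == "content-type":
                # Long form: <meta http-equiv="Content-Type" content="...; charset=...">
                content = (attributes.get("content") or "").lower()
                if "charset=" in content:
                    self.charset = content.split("charset=", 1)[1].strip()


def find_declarations(html):
    """Return (language, charset) as declared in the given HTML text."""
    finder = DeclarationFinder()
    finder.feed(html)
    return finder.lang, finder.charset


print(find_declarations(SAMPLE_PAGE))
```

Running a check like this over your own pages is a cheap way to spot templates that forgot either declaration.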
Most importantly, there are also the basics of search engine optimization: make sure there is plenty of new content every day, use proper keywords, mention the most important words on a page more than once, try not to have more than two levels in your navigation, link to quality content from the homepage, use descriptive hyperlinks, and so on.

The character encoding

I had to choose between the short-but-incredibly-complex explanation and the slightly more elaborate version. I chose the latter, so I will have to tell you something about "Unicode" first.

Unicode is an international standard. Its goal is to provide the means by which text of all forms and languages can be encoded for use by computers. Basically, computers don't understand text at all. A website is just a bunch of textual characters - so, if there was no Unicode standard, browsers wouldn't know how to show a website.

So far, Unicode has appeared simply as a means to assign a unique number to each character used by humans in written language. However, for technical reasons, the storage of these numbers on computers is a problem. Most software can only deal with specific storage formats that allow only a limited number of Unicode characters to be stored. Such limits do not suffice for the needs of the Chinese language - simply because Chinese has a huge number of characters.
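The "unique number" idea is easy to see for yourself. This is a small Python sketch, with a helper name (`code_point`) that I made up for the occasion. It prints the Unicode number of a few characters in the usual `U+XXXX` notation.

```python
def code_point(character):
    """Return the Unicode code point of one character as 'U+XXXX'."""
    return f"U+{ord(character):04X}"


# A Latin letter, an accented letter, and two Chinese characters.
for character in ["A", "é", "中", "文"]:
    print(character, code_point(character))
```

The number itself says nothing about how many bytes end up on disk. That is decided by the storage format.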

Systems designers have therefore suggested several mechanisms for implementing Unicode, called "mappings"; which one implementers choose depends on available storage space, source code compatibility, and interoperability with other systems.

UTF-8 (Unicode Transformation Format, 8-bit) is such a mapping. It uses groups of one to four bytes to represent Unicode characters, covering the scripts of many of the world's languages, including Chinese. It is used widely to make websites "readable" by your browser. However, Chinese, Japanese, and Korean characters typically take three bytes each in UTF-8, where plain Western (ASCII) letters take only one. This makes UTF-8 less space-efficient for Chinese websites than encodings designed specifically for Chinese, which store each character in two bytes - GB2312 is the most well-known.
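The byte counts can be checked directly. This is a minimal sketch using the codecs that ship with Python. The helper name (`encoded_size`) and the sample strings are my own choices.

```python
def encoded_size(text, encoding):
    """Return the number of bytes needed to store text in the given encoding."""
    return len(text.encode(encoding))


# One plain ASCII letter and a two-character Chinese word.
print("A   utf-8 :", encoded_size("A", "utf-8"))
print("中文 utf-8 :", encoded_size("中文", "utf-8"))
print("中文 gb2312:", encoded_size("中文", "gb2312"))
```

For a page that is mostly Chinese text, the difference adds up to roughly a third fewer bytes for the text itself. Markup and ASCII content take the same space either way.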

Since most Chinese websites use the GB2312 mapping, most Chinese search engines prefer this mapping, too. And there we are (-;
