Language check: Passed!

I really wanted to be sure I wasn’t breaking anything, so I checked the Wikipedia dumps for de (ok), it (ok), es (ok), en (ok), fr (ok, but I can’t figure out the prefix for templates) and zh (Chinese; for the template prefix, see fr).

Chinese is cool, and it works because of the UTF-8 encoding. It simply works: Safari is able to display the characters, and Wikipedia-style linking works fine. I’m impressed! My first search phrase was the number “7”. A good start. I really can’t imagine how Chinese people enter characters. Maybe I will see it sometime.

Maybe you are asking yourself: do I really upload every language-specific database to my telephone? No! The sources for Wiki2Touch compile fine and run well on my Mac.

3 Responses to “Language check: Passed!”

  1. Sanford Says:

    FYI: We (the Chinese) use input methods to input characters. Each character corresponds to a keyboard sequence of usually one to five English letters, depending on the input method. Each English letter is associated with a root and its variations. By dissecting the character itself, we get the roots needed to input it.

    Some people, especially those in mainland China (as opposed to people in Taiwan and Hong Kong), use the pronunciation instead. But with a pronunciation-based input system it is quite difficult to input one character at a time, since many characters share the same pronunciation, so these systems are usually word-based.

  2. Ole Says:

    Cool! Can anybody tell me how much space the recompressed English wiki takes up?

  3. Tom Says:

    Ole-

    Sure: English 2.5 GB, 872 MB, French 742 MB, Italian 460 MB, Spanish 446 MB and Chinese 209 MB

    Some of them, maybe also the English one, can be made smaller with the indexer out of the box. You may add article prefixes which are never displayed: things like “Image:” (contains the image metadata, not the image itself), “Portal:”, “Category:” and so on.

    -Tom
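
The prefix idea from Tom’s comment can be sketched roughly as follows. This is a minimal, hypothetical Python sketch and not the Wiki2Touch indexer itself: the names `EXCLUDED_PREFIXES` and `keep_title` and the sample titles are assumptions made up for illustration, and only the three prefixes come from the comment.

```python
# Hypothetical sketch of skipping articles by title prefix while indexing.
# The prefix list is taken from the comment above; everything else is assumed.
EXCLUDED_PREFIXES = ("Image:", "Portal:", "Category:")


def keep_title(title, excluded=EXCLUDED_PREFIXES):
    """Return True if an article with this title should stay in the index."""
    # str.startswith accepts a tuple and matches any of its entries.
    return not title.startswith(excluded)


# Made-up sample titles, only to show the filter in action.
titles = ["Berlin", "Image:Berlin.jpg", "Category:Cities", "Portal:Science", "7"]
kept = [t for t in titles if keep_title(t)]
print(kept)
```

Only titles that start with one of the listed prefixes are dropped; ordinary articles, including ones whose names merely resemble a prefix, are kept.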
