Wiki2Touch 0.52

And once again a new version.

This release finalizes image support. I will describe how it works and how you can create image packs over at Google Code (or in the forum). The tools to build the packages are out, too.

I was able to download approx. 332,000 thumbnail images from the German edition of Wikipedia. They are smaller than the original ones (120px instead of 180px). Still, that is 2.85 GB!

But there are ways to get it smaller. I’m now using a version which “only” needs 1.5 GB. I will write another blog entry about that in a couple of hours.

28 Responses to “Wiki2Touch 0.52”

  1. Sanford Says:

    My source has been updated to reflect the new version.
    The url is still: http://168weedon.com/i
    Feel free to use it until you have submitted it to some community source; bandwidth is not a big concern for just an XML file.

    Furthermore, I have tested the app and it still has the seg fault (template resolution) I reported earlier. With some more testing I have narrowed the problem down to Template:Bd in zhwiki:
    http://zh.wikipedia.org/w/index.php?title=Template:Bd&action=edit

  2. Tom Says:

    Sanford-

    thanks for the fast reaction and for providing the installer package.

    Yes, I know, the bug is still in there, simply because I haven’t looked at it. For a couple of reasons I wanted to have this version out. I assume preparing the image packages will take a while for everyone. But everyone wanted to have them, so here it is.

    I hope I find the time to write about “What’s next” in the next few hours. But be sure that this bug is now at the top of the list. Afterwards I will “try” to add support for simplified Chinese. You know, these characters mean nothing to me. But I think these two steps are the most important now.

    -Tom

  3. Chris Says:

    Hey there!

    You’re releasing updates really fast, thanks for that :-)

    But now that I try the new indexer, I’m slightly confused.

    Which arguments do I have to use? You said Image and Bild are used for the German Wiki. Or do I have to run the indexer twice?

    In addition to that, I would like to know how to compile the ImageGetter under Mac OS, and which libraries are needed to compile it.

    If all these questions will be answered in your upcoming blog entry, consider this comment as non-existent ;-)

    Greetings,

    Chris

  4. Tom Says:

    Chris-

    just a quick note. For the English edition simply use “Image”. For some languages (French, Italian, German) you have to use the language-dependent name. For some others (English, Chinese) you don’t. In the latter case use “Image”, because the parameter is mandatory.

    The ImageGetter is written in C#. It was faster for me to do it in C# than in C++. So maybe it will work using Mono, which is a free (GNU?) .NET-Framework running under Linux and MacOS.

    I used Windows, running in “Parallels” on my Mac.

    -Tom

  5. dionysus623 Says:

    Tom-
    a little more than a week ago you told me this:
    “Dionysus- sure, that is possible. For any database create a “new” two letter language code. Digits should also work but be sure to use only two letter codes. I.e. use “wq” for wikiquote, put the articles.bin into a subfolder “wq” and add the “language.config” from the directory of the proper language. I.e. use the “language.config” from the “de” folder if you’re using the German “wikiquote”. Access the articles in the web frontend by prefixing “wq:” in front of the article titles. Please drop me a note if it’s working for you. -Tom”

    It works perfectly, but I have to enter the new language code in the address bar every time after I search. I was wondering if there is a way to make a toggle switch, or some other way to switch the search between Wikipedia and Wikiquote besides inserting wq in the address bar.

  6. Sanford Says:

    Tom:

    Thanks for your reply. With Wiki2Touch 0.52, the Chinese wiki is working very nicely on the iPhone, and the Simplified Chinese thing is just icing on the cake, so I suppose it can wait if you have higher priorities or are just busy with other things.

    The seg fault, on the other hand, seems likely to affect not just the Chinese wiki but other languages with a similar template as well. I will see if I can make some simplified test cases to further narrow down the problem afterwards.

    Separately, I would like to report my results on compiling the indexer on Linux. I made the following changes to get it to compile on my 64-bit CentOS box:

    - For every direct use of fpos_t type, (e.g. 100*currentBlockPos) change to fpos_t.__pos (i.e. 100*currentBlockPos.__pos)

    - FILEHEADER.titlePos and indexPos are now of type unsigned long long instead of fpos_t, to use them, a temporary fpos_t needs to be created and its __pos copied into them.

    - Very strangely, after making the changes above, the code will compile but will seg fault at the loop right after “lowering and indexing articles titles” and I had to change SIZEOF_POSITION_INFORMATION to 32 to fix that

    - Changing SIZEOF_POSITION_INFORMATION to 32 seems to have affected the actual indexes, and therefore the resulting articles.bin cannot be read properly by wikisrvd.
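
The fpos_t workarounds listed above depend on glibc internals such as __pos. One possible alternative, sketched here, is to use the POSIX calls ftello() and fseeko(), which report the file position as a plain integer. This is not what indexer.cpp does; the helper names currentOffset and seekTo are made up for this illustration.

```cpp
// Ask glibc for a 64-bit off_t even on 32-bit builds.
// Only effective when defined before the first system header is included.
#define _FILE_OFFSET_BITS 64

#include <cassert>
#include <cstdint>
#include <cstdio>
#include <sys/types.h>

// Hypothetical helper: current position of `f` as a 64-bit integer, -1 on error.
// Uses POSIX ftello() instead of fgetpos(), so no fpos_t internals are touched.
inline std::int64_t currentOffset(std::FILE* f) {
    return static_cast<std::int64_t>(ftello(f));
}

// Hypothetical helper: move `f` to an absolute byte offset; true on success.
inline bool seekTo(std::FILE* f, std::int64_t offset) {
    assert(offset >= 0);  // absolute offsets are never negative
    return fseeko(f, static_cast<off_t>(offset), SEEK_SET) == 0;
}
```

With helpers like these, an expression such as 100*currentBlockPos becomes ordinary integer arithmetic on the value returned by currentOffset(), and the result can be stored in an unsigned long long field without a temporary fpos_t.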

  7. Tom Says:

    Sanford-

    I did a quick check on that template. If I use it on a test page it works fine:

    分類:出生不详 | 在世人物

    Whatever that means. But the template itself takes four parameters or so. So it looks like the problem arises when parameters are added. I have no idea what the template is doing.

    So please can you give me an article name for which you get the error? Simply post the Chinese character here, that works.

    If you’re changing the size of “SIZEOF_POSITION_INFORMATION”, this will not work inside the wikisrv. The index itself points to the start of a title entry, and after SIZEOF_POSITION_INFORMATION bytes the name of the title is expected. That will fail.

    A title record is built like this:

    8 bytes (fpos_t, unsigned long long should be fine): Position of a 900k bzip2 block inside the file

    4 bytes (unsigned int): Position of the article itself inside the 900k bzip block

    4 bytes (unsigned int): Length of the article itself

    Hence, SIZEOF_POSITION_INFORMATION is 8 + 4 + 4 = 16.
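
The layout described above can be written down with fixed-width integer types, so that it stays 16 bytes no matter how large fpos_t happens to be on a given compiler. A minimal sketch, assuming host byte order and a NUL-terminated title; the struct and function names are hypothetical and not taken from the Wiki2Touch sources:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Size of the fixed part of a title record: 8 + 4 + 4 bytes.
constexpr std::size_t SIZEOF_POSITION_INFORMATION = 16;

// Hypothetical struct; the field widths follow the description above.
struct TitlePositionInfo {
    std::uint64_t blockPos;       // position of the bzip2 block inside the file
    std::uint32_t articlePos;     // position of the article inside that block
    std::uint32_t articleLength;  // length of the article
};

// Fails at compile time if the platform would lay the record out differently.
static_assert(sizeof(TitlePositionInfo) == SIZEOF_POSITION_INFORMATION,
              "title record header must be exactly 16 bytes");

// Copies the fixed part out of a raw record (assumption: host byte order).
inline TitlePositionInfo readPositionInfo(const unsigned char* record) {
    assert(record != nullptr);
    TitlePositionInfo info;
    std::memcpy(&info.blockPos, record, 8);
    std::memcpy(&info.articlePos, record + 8, 4);
    std::memcpy(&info.articleLength, record + 12, 4);
    return info;
}

// The title bytes start right after the fixed part.
inline const char* titleOf(const unsigned char* record) {
    return reinterpret_cast<const char*>(record + SIZEOF_POSITION_INFORMATION);
}
```

The static_assert is the point of the sketch: if a build would produce a different record size, compilation stops instead of producing an articles.bin that the server cannot read.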

    Do you think it would help if you just #define fpos_t unsigned long long?

    Thanks,

    Tom

  8. Jim123 Says:

    First let me say I’m on Windows XP.

    “1) indexer dewiki-latests-pages-articles.xml.bz2 Bild (use “Image” or anything else for the “en” Wikipedia)”

    I’m trying to get the images for the English Wikipedia, so you say to “use Image”. I have no idea what that means or how to do it. Please explain (step by step).

    For indexing enwiki-latest-pages-articles.xml I just dragged the file onto the indexer file and it created articles.bin. Is it something like that? If not: that was really easy, and it should be one way to do it.
    -Thanks

  9. Tom Says:

    Sanford-

    sorry to annoy you. I found a link which led to that error in one of the other comments. When I check that link on my Mac it’s fine:

    /wiki/zh/%E6%A4%8E%E5%90%8D%E7%A2%A7%E6%B5%81

    I assume that it is a problem with one of the wprintf() calls. If the format is “%S” and the parameter contains such characters, it sometimes breaks. I have had that error a few times in different places.

    I will copy the Chinese database to my device and check again. This will be nasty to track down.

    -Tom

  10. Tom Says:

    Jim123-

    drag and drop will not work in the latest version. The second parameter is now mandatory. But I will change that soon. In the meantime you can simply use the indexer inside the .51 package. It will produce lists for “Image” (which is fine for you) and “Bild” (which will never be found in the English edition).

    Besides the now mandatory second parameter there are no other changes to the newer indexer, so the older one is as good as the newer one.

    -Tom

  11. Achim Says:

    “…no other changes to the newer indexer”, good to know. So I can simply live with the old code in the GUI indexer for now. I’ll upload a new version of the GUI indexer/uploader tonight.

  12. in7ane Says:

    Tom, passing unicode Image arguments to indexer.exe does not seem to work (I am trying be.wikipedia.org and have tried Выява and %D0%92%D1%8B%D1%8F%D0%B2%D0%B0).

    Alternatively, and sorry for the stupid question, where should I place libbz2.a, and how do I install it, for make indexer to work (running your precompiled indexer gives me a Bus error)?

  13. Sanford Says:

    #define fpos_t long long does not help, as fgetpos and fsetpos are defined as:
    int fgetpos(FILE *stream, fpos_t *pos);
    int fsetpos(FILE *stream, fpos_t *pos);
    And there will be errors like:
    indexer.cpp:333: error: cannot convert ‘long long int*’ to ‘fpos_t*’ for argument ‘2’ to ‘int fgetpos(FILE*, fpos_t*)’

    But now I know what’s wrong with my indexer.cpp after your explanation… since my Linux box is 64 bit, instead of having 4 byte ints I am having 8 byte ints. So it worked when I changed SIZEOF_POSITION_INFORMATION to 24, and it indexes happily. It proves that if I compile my code under 32 bit Linux it will work perfectly.

    As a further proof I changed “int” (64 bit) into “short” (32 bit) inside while ((help-articlesTitles) < (int) read). It got past the index stage with #define SIZEOF_POSITION_INFORMATION 16 but fails at the sort stage. I think I will need to find a better way to make int 32 bit on my machine.

    For the Template:Bd problem, please see below a list of pages that all share the problem
    http://zh.wikipedia.org/wiki/Special:Whatlinkshere/Template:Bd

    Template:Bd is a template for displaying birthdates on zhwiki. Its usage is as follows:
    {{bd|b1|b2|d1|d2|index}}
    where,
    b1 is a year of birth. If b1 is not empty, the article is added to the category “[[Category:{{{b1}}}出生]]”(born on b1) else add category 出生不详 (unknown birth date)
    b2 is date of birth. If b2 is a valid date (in format X月 or X月Y日 [English: Month X, or Month X Day Y]) then link to b2, else display b2
    d1 and d2 are similar but they are year and date of death instead.
    If d1 and d2 are not inputted, the person is not dead and therefore added into category 在世人物 (still alive)
    index is just a sort index for the categories.
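
The category rules described above can be restated as a small function. This is only an illustration of the description in this comment, not the actual wiki markup of Template:Bd; the function name bdCategories is made up, and the death category is left out because its name is not given here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustration only: returns the categories the description above assigns
// for a given year of birth (b1) and year/date of death (d1, d2).
inline std::vector<std::string> bdCategories(const std::string& b1,
                                             const std::string& d1,
                                             const std::string& d2) {
    std::vector<std::string> categories;
    if (!b1.empty()) {
        categories.push_back(b1 + "出生");   // born in b1
    } else {
        categories.push_back("出生不详");    // unknown birth date
    }
    if (d1.empty() && d2.empty()) {
        categories.push_back("在世人物");    // still alive
    }
    // A death category would be added here; its name is not stated above.
    assert(!categories.empty());  // there is always a birth category
    return categories;
}
```

Called with all parameters empty, this yields 出生不详 and 在世人物, which matches the two categories Tom saw on his test page.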

  14. Tom Says:

    Sanford-

    ok, that makes perfect sense. On a 64-bit machine int is too long. I’m glad that this is at least solved and you’re able to index using your Linux machine.

    Thanks for the further explanation. The Bd template is working on Mac OS, so this bug is going to be tough to find. Maybe it’s a memory issue, but maybe a compiler one. That happens from time to time. I hope it’s the memory issue.

    One of the articles you’ve listed is about a Chinese singer (a woman). Hey, looks like I’m going to learn it. Ahh, just kidding, name it “guess it”.

    -Tom

  15. Tom Says:

    in7ane-

    I’m sure that is working on Mac OS. The shell works perfectly with unicode characters. But adding support for %-style encoding is easy. Will do that in the next release.
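
Decoding a %-style argument such as the one in7ane tried could look roughly like the following. This is a sketch and not code from the indexer; the function names are hypothetical, and copying malformed escapes through unchanged is just one possible design choice.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Value of one hexadecimal digit, or -1 if `c` is not a hex digit.
inline int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Replaces every well-formed "%XX" escape by the byte it encodes.
// Incomplete or invalid escapes are copied through unchanged.
inline std::string percentDecode(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            const int hi = hexValue(in[i + 1]);
            const int lo = hexValue(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out.push_back(static_cast<char>(hi * 16 + lo));
                i += 2;  // skip the two hex digits just consumed
                continue;
            }
        }
        out.push_back(in[i]);
    }
    assert(out.size() <= in.size());  // decoding never grows the string
    return out;
}
```

The decoded result is a UTF-8 byte string, so percentDecode("%D0%92%D1%8B%D1%8F%D0%B2%D0%B0") gives the bytes of Выява, while a plain argument such as Image passes through untouched.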

    Using the precompiled libbz2.a will not help you. I assume you get the bus error because you’re running a PPC machine. But it’s Intel code. And the precompiled library is, too.

    But compiling the bz2lib is easy. Just download the sources from the internet and execute make. Worked for me right from the start. The makefile generates everything, including the static library.

    -Tom

  16. Sanford Says:

    About the indexer: yes I have borrowed a 32bit Linux box and the indexer is now finally working. But the compiled binary does not work on my 64bit Linux system, so there’s still work to be done.

    About the seg fault: to my surprise it works now after removing the cache directory in /var/root/Media/Wikipedia/zh!!

  17. Tom Says:

    Sanford-

    great news. Yes, I made changes to the template processing in 0.50. This solved a lot of issues. But the corrupt templates were still in the cache. Maybe I should add an automatic deletion of the cache when such changes are made.

    Anyway, this is working now. Glad to hear that. So back to simplified Chinese, and a lot of other more minor stuff.

    Thanks for the feedback.

    -Tom

  18. Achim Says:

    Hi Tom,
    some links seem to be broken with 0.52. Example: Open the page on PNG in the German edition. Try to follow one of the links in the first paragraph, like “Rastergrafiken” or “GIF”. Although the pages are there (via main page), I always get the “article not found” error.

    (Thought I’d better post it here, might be an issue for others too)

  19. Tom Says:

    Achim-

    Works fine for me. But I had never opened the article “PNG” before.

    I’ve changed the way linking works internally (from 0.50 to 0.51). So do me a favor and reload the page. I assume this will fix it.

    Or clear the cache of your Mobile Safari.

    -Tom

  20. Auge Says:

    hi,
    where can I find the program pack.exe for Windows? I think I need it to pack all my images together.

    thanks

  21. Achim Says:

    It’s inside http://wiki2touch.googlecode.com/files/Wiki2Touch_052.zip

  22. JoPhone Says:

    Hey,

    I just finished the packing process of the pictures for the German version of Wikipedia… The ImageGetter downloaded ~360.000 pictures, but the packer only packed 244.736. Why is that? It stopped, stating: “Added so far: 244.736”. Can I continue somehow to have it add the missing pictures to the images.bin?

    Thx for your help, I appreciate your project very much, great job, keep up the fantastic work!

    Regards

  23. Tom Says:

    JoPhone-

    I’ve posted a comment in the forums. Look here

    http://wiki2touch.ipodhelp.de/viewtopic.php?pid=99#p99

    -Tom

  24. Achim Says:

    JoPhone, I found a bug in the packer, see the thread that Tom mentioned:
    http://wiki2touch.ipodhelp.de/viewtopic.php?pid=108#p108

  25. in7ane Says:

    Sanford, or anyone else who has an idea about how Installer sources work, I am considering having a go at hosting the English Wikipedia dump through Installer, however it is not practical to have it in a zip that is downloaded and extracted taking up twice as much space. Instead I am trying to run a shell script that runs a curl command to download the uncompressed files.

    Here is the problem: the shell script runs fine from vt100, but fails on:

    Exec
    /bin/sh ~/Media/Wikipedia/be/a.sh

    Any ideas why, or how to fix this? Or is it that Installer just will not allow this?

    The shell script runs (the /be/ directory is there):

    curl -o ~/Media/Wikipedia/be/language.config http://www.in7ane.com/iphone/wiki/Belarusian/language.config
    curl -o ~/Media/Wikipedia/be/articles.bin http://www.in7ane.com/iphone/wiki/Belarusian/articles.bin

    A copy of the installer source (sorry for the long post) if someone wants to try it:

    info

    category
    in7ane.com Source

    name
    in7ane.com Secondary

    description
    Test source

    maintainer
    in7ane

    url
    http://www.in7ane.com

    packages

    bundleIdentifier
    com.in7anemirror.wiki.be

    name
    Wikipedia - Belarusian

    version
    2008.02.08v2

    location
    http://www.in7ane.com/iphone/wiki/Belarusian/sh.zip

    size
    1448

    description
    The Belarusian Wikipedia dump (semi-proper language.config)

    category
    Wiki Apps

    url
    http://www.in7ane.com

    scripts

    install

    Confirm
    This is actually 5MB, continue?
    Yes
    No

    CopyPath
    sh/
    ~/Media/Wikipedia/be

    SetStatus
    Downloading…

    Exec
    /bin/sh ~/Media/Wikipedia/be/a.sh

    RemovePath
    ~/Media/Wikipedia/be/a.sh

    RemovePath
    ~/Media/Wikipedia/be/i.sh

    uninstall

    RemovePath
    ~/Media/Wikipedia/be/language.config

    RemovePath
    ~/Media/Wikipedia/be/articles.bin

  26. Tom Says:

    in7ane-

    I think that is a great approach. You’re right, dealing with 2 GB is not easy. Ask me (images).

    This blog comments section is not a good tool for having such a discussion. So let’s move over to the forums (http://wiki2touch.ipodhelp.de/) for further discussion.

    -Tom

  27. Achim Says:

    Tom-
    don’t you think it would be better to have one central source for information and one central spot for discussions?
    At the moment, there are user comments here at your blog, at the forum, and at the Wiki pages. Quite confusing.
    Maybe just close down user comments at the blog and at the Wiki, and direct everybody to the forum?
