Wiki2Touch 0.52
And once again, a new version.
This one brings the now-finalized image support. I will describe how it works and how you can create image packs over at Google Code (or in the forum). The tools to build the packages are out, too.
I was able to download approx. 332,000 thumbnail images from Wikipedia (German edition). They were smaller than the original ones (120px instead of 180px). 2.85 GB!
But there was a way to get it smaller. I’m now using a version which “only” needs 1.5 GB. I will write another blog entry about that in a couple of hours.
March 3rd, 2008 at 15:47
My source has been updated to reflect the new version.
The URL is still: http://168weedon.com/i
Feel free to use it until you have submitted it to a community source; bandwidth is not a big concern for just an XML file.
Furthermore, I have tested the app and it still has the seg fault (template resolution) I reported earlier. With some more testing I have narrowed the problem down to Template:Bd in zhwiki:
http://zh.wikipedia.org/w/index.php?title=Template:Bd&action=edit
March 3rd, 2008 at 15:56
Sanford-
Thanks for the fast reaction and for providing the installer package.
Yes, I know, the bug is still in there simply because I haven’t looked at it yet. For a couple of reasons I wanted to get this version out. I assume preparing the image packages will take a while for anyone. But everyone wanted them, so here it is.
I hope I find the time to write about “What’s next” in the next few hours. But rest assured that this bug is now at the top of the list. Afterwards I will “try” to add support for Simplified Chinese. You know, these characters mean nothing to me. But I think these two steps are the most important now.
-Tom
March 3rd, 2008 at 17:33
Hey there!
You’re releasing updates really fast, thanks for that.
But now that I’m trying the new indexer, I’m slightly confused.
Which arguments do I have to use? You said both Image and Bild are used in the German wiki. Or do I have to run the indexer twice?
In addition, I would like to know how to compile the ImageGetter under Mac OS, and which libraries are needed to compile it.
If all these questions will be answered in your upcoming blog entry, consider this comment non-existent.
Greetings,
Chris
March 3rd, 2008 at 18:20
Chris-
Just a quick note. For the English edition simply use “Image”. For some languages (French, Italian, German) you have to use the language-dependent name; for others (English, Chinese) you don’t. In the latter case, use “Image”, because the parameter is mandatory.
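Why the second parameter matters can be sketched in C. This is a minimal illustration, not the indexer’s actual code: it assumes (hypothetically) that a link target counts as an image when it starts with the canonical “Image:” prefix or with the localized name passed on the command line, such as “Bild”. The names `has_namespace` and `is_image_link` are made up for this example, and matching is assumed to be exact and case-sensitive.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if target starts with "<ns>:". */
static bool has_namespace(const char *target, const char *ns)
{
    size_t n = strlen(ns);
    /* strncmp() == 0 guarantees target has at least n chars, so target[n] is valid */
    return n > 0 && strncmp(target, ns, n) == 0 && target[n] == ':';
}

/* Hypothetical check: accept the canonical "Image" namespace as well as the
 * localized name given as the indexer's second argument (e.g. "Bild"). */
bool is_image_link(const char *target, const char *localized_ns)
{
    return has_namespace(target, "Image") || has_namespace(target, localized_ns);
}
```

For the English edition, passing “Image” simply makes the second check repeat the first, which is harmless and matches the advice above.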
The ImageGetter is written in C#. It was faster for me to do it in C# than in C++. So it may work with Mono, a free, open-source implementation of the .NET Framework that runs on Linux and Mac OS.
I’ve used Windows, running under Parallels on my Mac.
-Tom
March 3rd, 2008 at 18:55
Tom-
A little more than a week ago you told me this:
“Dionysus- sure, that is possible. For any database, create a “new” two-letter language code. Digits should also work, but be sure to use only two-letter codes. E.g. use “wq” for Wikiquote, put the articles.bin into a subfolder “wq” and add the “language.config” from the directory of the proper language, e.g. the “language.config” from the “de” folder if you’re using the German Wikiquote. Access the articles in the web frontend by prefixing “wq:” to the article titles. Please drop me a note if it’s working for you. -Tom”
It works perfectly, but I have to enter the new language code in the address bar every time I search. I was wondering if there is a way to add a toggle switch, or some other way to switch between searching Wikipedia and Wikiquote besides inserting wq in the address bar.
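A search-box toggle would essentially have to add or strip this two-letter prefix automatically. As a rough illustration of the prefix scheme described in the quote, here is a minimal C sketch that splits “wq:Title” into code and title. `split_lang_prefix` is a hypothetical name and this is not the wikisrv code; it naively accepts any two letters or digits followed by a colon, so a real implementation would probably also check that a matching language folder exists, otherwise an ordinary title that happens to start that way would be misread.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper: splits "wq:Goethe" into code "wq" and rest "Goethe".
 * A code is exactly two letters or digits followed by ':'. Returns false and
 * leaves code empty (rest = whole title) if there is no such prefix. */
bool split_lang_prefix(const char *title, char code[3], const char **rest)
{
    /* && short-circuits, so we never read past the terminating '\0' */
    if (isalnum((unsigned char)title[0]) && isalnum((unsigned char)title[1]) &&
        title[2] == ':') {
        code[0] = title[0];
        code[1] = title[1];
        code[2] = '\0';
        *rest = title + 3;
        return true;
    }
    code[0] = '\0';
    *rest = title;
    return false;
}
```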
March 3rd, 2008 at 19:26
Tom:
Thanks for your reply. With Wiki2Touch 0.52, the Chinese wiki is working very nicely on the iPhone, and the Simplified Chinese thing is just icing on the cake, so I suppose it can wait if you have higher priorities or are just busy with other things.
The seg fault, on the other hand, seems likely to affect not just the Chinese wiki but other languages with a similar template as well. I will see if I can make some simplified test cases to further narrow down the problem afterwards.
Also, I would like to report my results on compiling the indexer on Linux. I made the following changes to get it to compile on my 64-bit CentOS box:
- For every direct use of an fpos_t value (e.g. 100*currentBlockPos), change it to use its __pos member instead (i.e. 100*currentBlockPos.__pos).
- FILEHEADER.titlePos and indexPos are now of type unsigned long long instead of fpos_t; to set them, a temporary fpos_t needs to be created and its __pos copied into them.
- Very strangely, after making the changes above, the code compiles but seg faults at the loop right after “lowering and indexing articles titles”, and I had to change SIZEOF_POSITION_INFORMATION to 32 to fix that.
- Changing SIZEOF_POSITION_INFORMATION to 32 seems to have affected the actual indexes, and therefore the resulting articles.bin cannot be read properly by wikisrvd.
March 3rd, 2008 at 20:26
Sanford-
I did a quick check on that template. If I use it on a test page, it works fine:
分類:出生不详 | 在世人物
Whatever that means. But the template itself takes four parameters or so, so it looks like the problem arises when parameters are passed. I have no idea what the template is doing.
So could you please give me an article name for which you get the error? Simply post the Chinese characters here; that works.
—
If you change the size of “SIZEOF_POSITION_INFORMATION”, this will not work inside wikisrv. The index itself points to the start of a title entry, and the name of the title is expected SIZEOF_POSITION_INFORMATION bytes after that start. That will fail.
A title record is built like this:
8 bytes (fpos_t; unsigned long long should be fine): position of a 900k bzip2 block inside the file
4 bytes (unsigned int): position of the article itself inside the 900k bzip2 block
4 bytes (unsigned int): length of the article itself
Hence, SIZEOF_POSITION_INFORMATION is 8 + 4 + 4 = 16.
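Since the record layout is fixed on disk, one way to keep it independent of the platform’s type sizes is to read and write the 16 bytes field by field with fixed-width integer types. A minimal C sketch, assuming little-endian byte order (as produced on Intel; the byte order is not stated above). `PositionInfo`, `encode_position` and `decode_position` are hypothetical names, not wikisrv’s API; per the description above, the title text would follow directly after these 16 bytes.

```c
#include <stdint.h>

#define SIZEOF_POSITION_INFORMATION 16 /* 8 + 4 + 4 bytes */

typedef struct {
    uint64_t block_pos;      /* position of the 900k bzip2 block in the file */
    uint32_t article_pos;    /* position of the article inside that block */
    uint32_t article_length; /* length of the article */
} PositionInfo;

/* Assumption: little-endian on disk. Writes the low nbytes of v. */
static void put_le(uint8_t *p, uint64_t v, int nbytes)
{
    for (int i = 0; i < nbytes; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

static uint64_t get_le(const uint8_t *p, int nbytes)
{
    uint64_t v = 0;
    for (int i = 0; i < nbytes; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

void encode_position(const PositionInfo *info, uint8_t out[SIZEOF_POSITION_INFORMATION])
{
    put_le(out, info->block_pos, 8);
    put_le(out + 8, info->article_pos, 4);
    put_le(out + 12, info->article_length, 4);
}

void decode_position(const uint8_t in[SIZEOF_POSITION_INFORMATION], PositionInfo *info)
{
    info->block_pos = get_le(in, 8);
    info->article_pos = (uint32_t)get_le(in + 8, 4);
    info->article_length = (uint32_t)get_le(in + 12, 4);
}
```

Because the offsets are written explicitly, sizeof(int) or sizeof(fpos_t) on the build machine no longer affects the 16-byte layout.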
Do you think it would help if you just #define fpos_t unsigned long long?
Thanks,
Tom
March 3rd, 2008 at 21:42
First let me say I’m on Windows XP.
“1) indexer dewiki-latest-pages-articles.xml.bz2 Bild (use “Image” or anything else for the “en” Wikipedia)”
I’m trying to get the images for the English Wikipedia, so you say to “use Image”. I have no idea what that means or how to do it. Please explain (step by step).
For indexing enwiki-latest-pages-articles.xml I just dragged the file onto the indexer and it created articles.bin. Is it something like that? If not, that was really easy and it should be one way to do it.
-thnx
March 3rd, 2008 at 21:46
Sanford-
Sorry to bother you. I found a link in one of the other comments which led to that error. When I check that link on my Mac, it’s fine:
/wiki/zh/%E6%A4%8E%E5%90%8D%E7%A2%A7%E6%B5%81
I assume it is a problem with one of the wprintf() calls. If the format specifier is “%S” and the parameter contains such characters, it sometimes breaks. I have had that error several times in different places.
I will copy the Chinese database to my device and check again. This will be nasty to track down.
-Tom
March 3rd, 2008 at 21:49
Jim123-
Drag and drop will not work with the latest version, because the second parameter is now mandatory. But I will change that soon. In the meantime you can simply use the indexer from the .51 package. It produces lists for “Image” (which is fine for you) and “Bild” (which will never be found in the English edition).
Besides the now-mandatory second parameter, there are no other changes in the newer indexer, so the older one is as good as the newer one.
-Tom
March 3rd, 2008 at 22:04
“…no other changes to the newer indexer”, good to know. So I can simply live with the old code in the GUI indexer for now. I’ll upload a new version of the GUI indexer/uploader tonight.
March 4th, 2008 at 02:22
Tom, passing Unicode Image arguments to indexer.exe does not seem to work (I am trying be.wikipedia.org and have tried Выява and %D0%92%D1%8B%D1%8F%D0%B2%D0%B0).
Alternatively, and sorry for the stupid question: where should I place libbz2.a, and how do I install it, so that make indexer works? (Running your precompiled indexer gives me a bus error.)
March 4th, 2008 at 02:26
#define fpos_t long long does not help, as fgetpos and fsetpos are declared as:
int fgetpos(FILE *stream, fpos_t *pos);
int fsetpos(FILE *stream, const fpos_t *pos);
And there will be errors like:
indexer.cpp:333: error: cannot convert ‘long long int*’ to ‘fpos_t*’ for argument ‘2’ to ‘int fgetpos(FILE*, fpos_t*)’
But now, after your explanation, I know what’s wrong with my indexer.cpp… since my Linux box is 64-bit, instead of 4-byte ints I have 8-byte ints. So it worked when I changed SIZEOF_POSITION_INFORMATION to 24, and it indexes happily. This suggests that if I compile my code under 32-bit Linux, it will work perfectly.
As a further test, I changed “int” (64 bit) into “short” (32 bit) inside while ((help-articlesTitles) < (int) read); with #define SIZEOF_POSITION_INFORMATION 16 it got past the index stage but failed at the sort stage. I think I will need to find a better way to make int 32 bit on my machine.
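A side note on the diagnosis: on 64-bit Linux (the LP64 model) int normally stays 4 bytes (and short is usually 16 bits), while long, pointers and glibc’s fpos_t are the types that get bigger, so those may be what actually pushes the record to 24 bytes. Since fpos_t is an opaque type that cannot portably be used in arithmetic, one alternative to touching __pos is POSIX ftello()/fseeko(), which use the integer type off_t. A minimal sketch, assuming a POSIX system; `get_pos_u64` and `set_pos_u64` are hypothetical wrappers, not functions from the indexer.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* expose fseeko()/ftello() */
#endif
#define _FILE_OFFSET_BITS 64    /* 64-bit off_t on 32-bit systems too */
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical wrapper: stores the current file position as a plain 64-bit
 * integer that can be multiplied, compared, and written to the index. */
int get_pos_u64(FILE *f, uint64_t *pos)
{
    off_t p = ftello(f);
    if (p < 0) /* ftello() returns -1 on error */
        return -1;
    *pos = (uint64_t)p;
    return 0;
}

/* Hypothetical wrapper: seeks back to a position saved by get_pos_u64(). */
int set_pos_u64(FILE *f, uint64_t pos)
{
    return fseeko(f, (off_t)pos, SEEK_SET);
}
```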
For the Template:Bd problem, please see the list below of pages that all share the problem:
http://zh.wikipedia.org/wiki/Special:Whatlinkshere/Template:Bd
Template:Bd is a template for displaying birth dates on zhwiki. Its usage is as follows:
{{bd|b1|b2|d1|d2|index}}
where:
b1 is the year of birth. If b1 is not empty, the article is added to the category “[[Category:{{{b1}}}出生]]” (born in b1); otherwise it is added to the category 出生不详 (unknown birth date).
b2 is the date of birth. If b2 is a valid date (in the format X月 or X月Y日 [English: Month X, or Month X Day Y]), it is linked; otherwise b2 is displayed as is.
d1 and d2 are similar, but they are the year and date of death instead.
If d1 and d2 are not given, the person is not dead, and the article is therefore added to the category 在世人物 (still alive).
index is just a sort index for the categories.
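To make the category rules above concrete, here is a minimal C sketch that assumes only what is stated above: it covers the birth-year and still-alive categories and leaves out the b2/d2 date links, any death-year category and the index sort key, whose exact output is not described here. `bd_categories` is a hypothetical name; this is not how wikisrv expands templates. Called with no parameters it yields the two categories from Tom’s test page (出生不详 and 在世人物).

```c
#include <stdio.h>

/* NULL or "" counts as a missing template parameter. */
static int is_empty(const char *s) { return s == NULL || s[0] == '\0'; }

/* Hypothetical sketch of Template:Bd's category rules.
 * Returns the number of bytes written, or -1 if out is too small. */
int bd_categories(const char *b1, const char *d1, const char *d2,
                  char *out, size_t size)
{
    int n;
    if (!is_empty(b1))
        n = snprintf(out, size, "[[Category:%s出生]]", b1); /* born in b1 */
    else
        n = snprintf(out, size, "[[Category:出生不详]]");    /* birth unknown */
    if (n < 0 || (size_t)n >= size)
        return -1;
    if (is_empty(d1) && is_empty(d2)) {                       /* no death data */
        int m = snprintf(out + n, size - n, "[[Category:在世人物]]");
        if (m < 0 || (size_t)m >= size - n)
            return -1;
        n += m;
    }
    return n;
}
```

The sketch reports a too-small buffer with -1 instead of silently truncating the output.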
March 4th, 2008 at 07:52
Sanford-
OK, that makes perfect sense. On a 64-bit machine int is too long. I’m glad that this is at least solved and you’re able to index on your Linux machine.
Thanks for the further explanation. The Bd template works on Mac OS, so this bug is going to be tough to find. Maybe it’s a memory issue, but maybe a compiler one. That happens from time to time. I hope it’s the memory issue.
One of the complete articles you’ve listed is about a Chinese singer (a woman). Hey, looks like I’m learning Chinese. Ahh, just kidding, call it “guessing”.
-Tom
March 4th, 2008 at 07:55
in7ane-
I’m sure it works on Mac OS; the shell handles Unicode characters perfectly. But adding support for %-style encoding is easy. I will do that in the next release.
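Decoding such an argument is indeed short. A minimal C sketch, assuming the indexer would turn the %-escaped argument into raw UTF-8 bytes before using it as the namespace name; `percent_decode` and `hex_value` are hypothetical helpers, not part of the released indexer. With the Belarusian example from above, “%D0%92%D1%8B%D1%8F%D0%B2%D0%B0” decodes to the ten UTF-8 bytes of “Выява”.

```c
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* Hypothetical helper: decodes %XX escapes into raw bytes; other characters
 * are copied unchanged. Returns 0 on success, or -1 on a malformed escape or
 * if out (outsize bytes, including the terminating '\0') is too small. */
int percent_decode(const char *in, char *out, size_t outsize)
{
    size_t n = 0;
    while (*in != '\0') {
        int byte;
        if (*in == '%') {
            int hi = hex_value((unsigned char)in[1]);
            /* only look at in[2] if in[1] was a valid digit (not '\0') */
            int lo = hi < 0 ? -1 : hex_value((unsigned char)in[2]);
            if (lo < 0)
                return -1;
            byte = hi * 16 + lo;
            in += 3;
        } else {
            byte = (unsigned char)*in++;
        }
        if (n + 1 >= outsize) /* keep room for the terminator */
            return -1;
        out[n++] = (char)byte;
    }
    if (outsize == 0)
        return -1;
    out[n] = '\0';
    return 0;
}
```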
Using the precompiled libbz2.a will not help you. I assume you get the bus error because you’re running a PPC machine, while the indexer is Intel code, and so is the precompiled library.
But compiling the bzip2 library is easy. Just download the sources from the internet and run make. It worked for me right away. The makefile generates everything, including the static library.
-Tom
March 4th, 2008 at 08:33
About the indexer: yes, I have borrowed a 32-bit Linux box and the indexer is finally working. But the compiled binary does not work on my 64-bit Linux system, so there’s still work to be done.
About the seg fault: to my surprise, it works now after removing the cache directory in /var/root/Media/Wikipedia/zh!
March 4th, 2008 at 09:07
Sanford-
Great news. Yes, I made changes to the template processing in 0.50, which solved a lot of issues. But the corrupt templates were still in the cache. Maybe I should make the cache delete itself automatically whenever such changes are made.
Anyway, this is working now. Glad to hear that. So back to Simplified Chinese, and a lot of other, more minor stuff.
Thanks for the feedback.
-Tom
March 4th, 2008 at 14:40
Hi Tom,
Some links seem to be broken in 0.52. Example: open the page on PNG in the German edition and try to follow one of the links in the first paragraph, like “Rastergrafiken” or “GIF”. Although the pages are there (reachable via the main page), I always get the “article not found” error.
(Thought I’d better post it here, might be an issue for others too)
March 4th, 2008 at 16:40
Achim-
Works fine for me. But I had never opened the article “PNG” before.
I changed the way internal linking works (from 0.50 to 0.51), so do me a favor and reload the page. I assume this will fix it.
Or clear the cache of your Mobile Safari.
-Tom
March 4th, 2008 at 17:21
Hi,
Where can I find the program pack.exe for Windows? I need it to pack all my images together, I think?
thanks
March 4th, 2008 at 18:30
It’s inside http://wiki2touch.googlecode.com/files/Wiki2Touch_052.zip
March 4th, 2008 at 23:05
Hey,
I just finished packing the pictures for the German version of Wikipedia… The ImageGetter downloaded ~360,000 pictures, but the packer only packed 244,736. Why is that? It stopped with the message “Added so far: 244.736”. Is there any way to continue, so that it adds the missing pictures to images.bin?
Thanks for your help. I appreciate your project very much. Great job, keep up the fantastic work!
Regards
March 5th, 2008 at 09:30
JoPhone-
I’ve posted a comment in the forums. Look here:
http://wiki2touch.ipodhelp.de/viewtopic.php?pid=99#p99
-Tom
March 5th, 2008 at 22:25
JoPhone, I found a bug in the packer; see the thread that Tom mentioned:
http://wiki2touch.ipodhelp.de/viewtopic.php?pid=108#p108
March 6th, 2008 at 06:27
Sanford, or anyone else who knows how Installer sources work: I am considering having a go at hosting the English Wikipedia dump through Installer. However, it is not practical to ship it in a zip that is downloaded and then extracted, taking up twice as much space. Instead, I am trying to run a shell script that uses curl to download the uncompressed files.
Here is the problem: the shell script runs fine from vt100, but fails on:
Exec
/bin/sh ~/Media/Wikipedia/be/a.sh
Any ideas why, or how to fix this? Or does Installer just not allow this?
The shell script runs the following commands (the /be/ directory is already there):
curl -o ~/Media/Wikipedia/be/language.config http://www.in7ane.com/iphone/wiki/Belarusian/language.config
curl -o ~/Media/Wikipedia/be/articles.bin http://www.in7ane.com/iphone/wiki/Belarusian/articles.bin
Here is a copy of the Installer source (sorry for the long post), in case someone wants to try it:
info
    category: in7ane.com Source
    name: in7ane.com Secondary
    description: Test source
    maintainer: in7ane
    url: http://www.in7ane.com
packages
    bundleIdentifier: com.in7anemirror.wiki.be
    name: Wikipedia - Belarusian
    version: 2008.02.08v2
    location: http://www.in7ane.com/iphone/wiki/Belarusian/sh.zip
    size: 1448
    description: The Belarusian Wikipedia dump (semi-proper language.config)
    category: Wiki Apps
    url: http://www.in7ane.com
    scripts
        install
            Confirm
                This is actually 5MB, continue?
                Yes
                No
            CopyPath
                sh/
                ~/Media/Wikipedia/be
            SetStatus
                Downloading…
            Exec
                /bin/sh ~/Media/Wikipedia/be/a.sh
            RemovePath
                ~/Media/Wikipedia/be/a.sh
            RemovePath
                ~/Media/Wikipedia/be/i.sh
        uninstall
            RemovePath
                ~/Media/Wikipedia/be/language.config
            RemovePath
                ~/Media/Wikipedia/be/articles.bin
March 6th, 2008 at 08:03
in7ane-
I think that is a great approach. You’re right, dealing with 2 GB is not easy. Just ask me about the images.
This blog comments section is not a good tool for such a discussion, so let’s move over to the forums (http://wiki2touch.ipodhelp.de/) for further discussion.
-Tom
March 6th, 2008 at 15:45
Tom-
Don’t you think it would be better to have one central source for information and one central place for discussions?
At the moment, there are user comments here on your blog, in the forum, and on the wiki pages. Quite confusing.
Maybe just close down user comments on the blog and the wiki, and direct everybody to the forum?