Development of a new indexer

Guess what: as of a couple of hours ago I’m the proud owner of an iPhone. Look on eBay if you need an iPod Touch/16GB.

As my model has only 8GB (the new 16GB iPhones are released today), I will enhance the indexer. It will also repackage the data files so that they should be a lot smaller. Further, I hope to have everything in one data file. Copying thousands of files is not very comfortable.

This time this will be one independent program: Splitting (if necessary at all), repackaging, indexing. Maybe downloading the source. And from the start for Mac OS X and Windows.

Stay tuned.

21 Responses to “Development of a new indexer”

  1. Dexter Says:

    great, I look forward to the Windows indexer application
    btw, could you make a guide on how to get a wiki working in a language other than German or English? I would like to have the Czech wiki on my iPod because of its small size, but I still don’t know how to get it working
    and is there a way to have wikis in more than one language in one app?

  2. Tom Says:

    Dexter-

    Sure I will. That’s no problem because the server is able to handle more than one language out of the box. At the same time! For instance, I have the English and German editions on my device at the same time. Even more, at the end of an article there are usually links to the same article in other languages. And that also works: every link to an installed language is displayed.

    Czech will look great to me, I assume. Lots of slashes above and below the letters :)

    -Tom

  3. Henry S. Says:

    Tom,

    Enjoy the new iPhone!! and looking forward to the new Wiki program.

  4. Sanford Says:

    Okay, I converted the zh wiki and configured the thing to work. However it does not let me go (using the Go button) to any page with a unicode title!!
    Also some pages are behaving very strangely, see http://img136.imageshack.us/img136/7968/img0031ky9.jpg (refer to http://zh.wikipedia.org/wiki/%E6%B8%AF%E9%90%B5 for the proper rendering)

  5. Tom Says:

    Sanford-

    Wow, that looks, hmm, unusual. Which language is ZH? Chinese?

    Anyway, the “height” stuff is an error of the wiki markup parser. There are a couple of mostly minor rendering issues.

    I will address the unicode problem. Not sure what causes it; searching or sorting might be the problem. But that will be hard for me to debug.

    -Tom

  6. michi Says:

    great!

    what i’d like to see in a future version is the implementation of an ajax style search. you know like it’s on the wiki mainpage already : it completes your already typed in characters to the words you’re searching for..
    is that possible?
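The lookup behind such a completion box can be sketched as a prefix search over a sorted title list. This is only an illustration: the function name `complete` and the in-memory list are assumptions, not part of the wiki server.

```python
import bisect

def complete(sorted_titles, prefix, limit=10):
    """Return up to `limit` titles that start with `prefix`.

    `sorted_titles` must already be sorted; all titles sharing a prefix
    then sit next to each other, starting at the bisect position.
    """
    start = bisect.bisect_left(sorted_titles, prefix)
    matches = []
    for title in sorted_titles[start:start + limit]:
        if not title.startswith(prefix):
            break  # left the block of titles sharing the prefix
        matches.append(title)
    return matches
```

A browser-side script would call something like this on each keystroke and show the returned titles as suggestions.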

  7. Tom Says:

    Michi-

    Sure, AJAX is possible. I have already thought about that myself. But currently I’m working on the new indexer, which is nearly done. Next, it has to be integrated into the wiki-server.

    The indexer works on the PC as well as on the Mac. It produces three files: the compressed data, the titles, and the index into the titles. So handling is much easier than before.

    -Tom
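One way a three-file layout like this could work is sketched below. Everything here is an assumption made for illustration: the file names (`articles.dat`, `titles.dat`, `titles.idx`), the record layout, and the use of zlib are hypothetical and not the indexer's actual format.

```python
import os
import struct
import zlib

REC = struct.Struct("<QIH")  # data offset, compressed length, title length
PTR = struct.Struct("<Q")    # offset of one title record in titles.dat

def build(articles, directory):
    """Write the compressed data, the titles, and the index into the titles."""
    # Sort by UTF-8 bytes so lookup can binary-search on the same ordering.
    items = sorted((t.encode("utf-8"), text) for t, text in articles.items())
    with open(os.path.join(directory, "articles.dat"), "wb") as data, \
         open(os.path.join(directory, "titles.dat"), "wb") as titles, \
         open(os.path.join(directory, "titles.idx"), "wb") as index:
        for raw_title, text in items:
            blob = zlib.compress(text.encode("utf-8"))
            index.write(PTR.pack(titles.tell()))
            titles.write(REC.pack(data.tell(), len(blob), len(raw_title)))
            titles.write(raw_title)
            data.write(blob)

def _read_record(titles, index, i):
    """Return (title bytes, data offset, compressed length) of record i."""
    index.seek(i * PTR.size)
    (pos,) = PTR.unpack(index.read(PTR.size))
    titles.seek(pos)
    offset, length, title_len = REC.unpack(titles.read(REC.size))
    return titles.read(title_len), offset, length

def lookup(title, directory):
    """Binary-search the title index; return the article text or None."""
    key = title.encode("utf-8")
    idx_path = os.path.join(directory, "titles.idx")
    with open(os.path.join(directory, "titles.dat"), "rb") as titles, \
         open(idx_path, "rb") as index:
        lo, hi = 0, os.path.getsize(idx_path) // PTR.size
        while lo < hi:
            mid = (lo + hi) // 2
            raw, offset, length = _read_record(titles, index, mid)
            if raw == key:
                with open(os.path.join(directory, "articles.dat"), "rb") as data:
                    data.seek(offset)
                    return zlib.decompress(data.read(length)).decode("utf-8")
            if raw < key:
                lo = mid + 1
            else:
                hi = mid
    return None
```

Because titles are compared as UTF-8 bytes both when sorting and when searching, non-ASCII titles are found the same way as ASCII ones.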

  8. Fil Says:

    Tom-

    A suggestion/request for an image container format: perhaps something along the lines of an SQLite database with (image_name, image_blob) indexed by name, so that thousands of images don’t have to be individually copied across to /Images/. Preferably with the ability to have several such containers usable at once, eg. Countries.dat, Cities.dat etc.
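A container along these lines can be sketched with Python's built-in `sqlite3` module. The table name `images` and the helper names are assumptions made for illustration; only the `(image_name, image_blob)` columns come from the suggestion itself.

```python
import sqlite3

def create_container(path):
    """Open (or create) one image container, e.g. Countries.dat."""
    con = sqlite3.connect(path)
    # The primary key gives the lookup-by-name index for free.
    con.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        "image_name TEXT PRIMARY KEY, image_blob BLOB NOT NULL)"
    )
    con.commit()
    return con

def add_image(con, name, data):
    """Store or overwrite one image under its name."""
    con.execute("INSERT OR REPLACE INTO images VALUES (?, ?)", (name, data))
    con.commit()

def find_image(containers, name):
    """Search several open containers in order; return bytes or None."""
    for con in containers:
        row = con.execute(
            "SELECT image_blob FROM images WHERE image_name = ?", (name,)
        ).fetchone()
        if row is not None:
            return bytes(row[0])
    return None
```

With this shape, several containers can be open at once and are simply searched one after another.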

  9. Sanford Says:

    Yes, that’s Chinese. The beauty of using the Chinese wiki db is that it’s small enough (200MB) yet perfectly enough for everyday use for Chinese readers. Anyway looking forward to seeing your project in Sourceforge / Google code!

  10. Sanford Says:

    Well, I tried it again today, and going (using the Go button) to any page with a unicode title does work for me.. hmm, very nice!

  11. gulibamba Says:

    A functional one data file installation would be awesome!
    But in my opinion….the “winner” of the race between the two offline wiki-apps (yours and the one patrick collison made) will be the one who first brings out a “one click solution” in the installer app.
    I know it’s arduous because of the huge dump file but people are stupid and lazzzzyyyy :-)

    Anyways your one data file idea is the right way….keep up the good work….great tool!

    Thumbs up!

  12. onesilentlight Says:

    I think Tom’s wikiapp is the winner hands down. Simply because it preserves the formatting of the original articles, and does not omit data. Thanks Tom!!

    I’ve finally got it working on my ipod, and it’s great. I wonder why wikipedia doesn’t let you get the images from a dump somewhere. I mean, they are all copyright free images on the wiki. Maybe you could make a program that would spider through the wiki and download the files and give them the proper name to use in your offline wiki, and let you compress them all to a desired level (very high, to save a lot of space and just have blurry images, or low to use a lot of space and have very sharp images). I would rather have blurry, highly compressed images than no images at all. Thanks so much!!

  13. John Says:

    hey tom, i’ve got a question concerning your wiki server application. i’m truly loving it but there appears to be one problem when using both your wiki server and audioscrobbler.

    it seems that the wiki server makes the iphone think it is online. and since this is the case audioscrobbler (which, in offline mode, stores the scrobbled titles and doesn’t send them to the server until connected to the internet again) tries to do its job and does actually send scrobbled titles. but they end up, well, nowhere while audioscrobbler thinks it’s all great.

    are you aware of this issue and is there perhaps a possibility to solve it?

  14. John Says:

    oh, what i meant by audioscrobbler was mobilescrobbler which is audioscrobbler’s equivalent for the iphone.

  15. Tom Says:

    John-

    The “only” thing the WikiServer is doing is installing an http server on port 80 that listens for requests from anywhere. So I don’t know why mobilescrobbler thinks the device is online. But you can try some tests:

    - Start the WikiServer on a different port. Look around in this blog; I explained somewhere how to do it.
    - Disable WikiServer and install Apache. Apache is doing more or less the same: listening on an IP port. Don’t forget to uninstall Apache at the end.

    Hope that helps a little bit.

    -Tom
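For readers who want to see what "an http server listening on a port" amounts to, here is a minimal stand-in using Python's standard library. It is not the WikiServer and says nothing about how the WikiServer is configured; `HelloHandler` and `start_server` are hypothetical names, and the sketch binds to 127.0.0.1 rather than to all interfaces.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    """Answers every GET with a fixed plain-text body."""

    def do_GET(self):
        body = b"offline wiki placeholder\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet

def start_server(port=8080, host="127.0.0.1"):
    """Start the server in a background thread and return it.

    Port 0 lets the operating system pick a free port; the chosen one
    is available as server.server_address[1].
    """
    server = HTTPServer((host, port), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Moving such a server from port 80 to 8080 only changes the number passed when the socket is bound; clients then have to include the port in the URL.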

  16. bwd Says:

    I am using Apache on port 80, WikiServer on 8080 and MobileScrobbler, and they all seem to be working fine together.

  17. John Says:

    hey again tom,
    thank you for the reply. however, now that i tried again, i wasn’t able to make that issue appear again: the scrobbler just kept its scrobbles as it is intended to when using the wiki server.

    i must say, i’m quite bamboozled and maybe i was too fast to blame it on the wiki server but i just couldn’t and actually still can’t imagine what else to consider a possible cause of this.

    thanks anyway, i’ll keep an eye on it.
    two thumbs up for you, keep it up!

  18. moeshou Says:

    just wondering, are you going to make the application compatible with 1.1.3? so far it’s not working for me..

  19. Ganxta91 Says:

    i got it working on 1.1.3, but i did many things to get it to work.
    i dont know what was responsible to make it work, but i think you have to type “chown -hR mobile /var/mobile” in the terminal (of winscp)
    i hope, it was the right command.

  20. Tom Says:

    A short update:

    The indexer is ready and working, as is the function to get an article back out of the file. It’s one huge file containing everything. I can’t say yet whether that is fast enough on the devices. I hope so. If not, I will have to split it into two parts: the articles themselves and the index.

    Besides indexing, the archive is repackaged, the unnecessary XML metadata is stripped, and the XML entities inside the articles are replaced (i.e. &gt; will become > and so on).

    For the German edition (1.03 GB + approx. 40 MB for the index) the result is 873 MB (total). Approx. 170 MB less. Not bad.

    I expect to have it in the reader during the weekend.

    -Tom
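The repackaging step described in this update (dropping metadata and decoding XML entities) can be sketched with Python's standard XML parser. This assumes the usual MediaWiki dump layout of `page` elements containing `title` and `revision/text`; the function names are hypothetical, and this is not the indexer's actual code.

```python
import xml.etree.ElementTree as ET

def local(tag):
    """Drop a '{namespace}' prefix from an element tag, if present."""
    return tag.rsplit("}", 1)[-1]

def iter_articles(stream):
    """Yield (title, text) per page, ignoring all other metadata.

    The parser decodes entities such as &gt; and &amp; while reading,
    so the yielded strings already contain the plain characters.
    """
    title = text = None
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title = text = None
            elem.clear()  # free the finished page to keep memory flat
```

Only the title and the article text survive; ids, timestamps and contributor records are read and discarded, which is where the size saving on the metadata side would come from.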
