Evan Martin (evan) wrote in evan_tech,

wikipedia / databases

I'm looking to play with the Wikipedia data, but first I need to load their database dumps. I tried the naive way (tell FreeBSD to install mysql, then run their database dump) and it just sat there, grinding. Thankfully, I know a MySQL expert, and he told me enough to point me in the right direction. (The main problem was that the Wikipedia database dump wanted InnoDB, which needs more configuring before it'll work.)

The bottleneck was CPU while un-bzipping the data (0.39 GB compressed, 1.4 GB uncompressed), then disk speed. systat -vmstat indicates only 10 MB/sec on my RAID, which feels low, but I also really don't want to fight with it right now. The load took under half an hour to run, and now it's building the index (/*!40000 ALTER TABLE cur ENABLE KEYS */).
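For the curious, here's a rough sketch of the decompress-and-feed step in Python. It isn't what I actually ran; the function name stream_bz2 and the chunk size are made up for illustration. It just shows the shape of the pipeline: read the bzip2 file in chunks and write the uncompressed SQL to whatever sink you hand it, without ever holding the whole 1.4 GB in memory.

```python
import bz2
import io


def stream_bz2(src, dst, chunk_size=1 << 20):
    """Decompress bzip2 data from the binary file object `src` into `dst`.

    Works in chunks of `chunk_size` uncompressed bytes, so memory use stays
    flat regardless of dump size. Returns the number of bytes written.
    (Hypothetical helper, not part of any Wikipedia or MySQL tooling.)
    """
    total = 0
    # BZ2File accepts an already-open binary file object and leaves it
    # open when the wrapper is closed.
    with bz2.BZ2File(src) as decompressed:
        while True:
            data = decompressed.read(chunk_size)
            if not data:
                break
            dst.write(data)
            total += len(data)
    return total


if __name__ == "__main__":
    # Tiny in-memory round trip standing in for a real dump file.
    sample = b"INSERT INTO cur VALUES (1);\n" * 1000
    out = io.BytesIO()
    written = stream_bz2(io.BytesIO(bz2.compress(sample)), out)
    print(written == len(sample))
```

To feed a real dump to the server, the sink could be the stdin pipe of a mysql client started with subprocess.Popen(["mysql", "wikipedia"], stdin=subprocess.PIPE), where the database name "wikipedia" is an assumption. Since decompression is the CPU-bound part, doing it in a separate process from the loader at least lets the two overlap.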

It's weird struggling with a gigabyte of data now; it feels like such a small amount of data compared to the stuff we deal with at work.

