12:54 am, 4 Oct 04

wikipedia / databases

I'm looking to play with the Wikipedia data, but first I need to load their database dumps. I tried the naive way (tell FreeBSD to install MySQL, then run their database dump) and it just sat there, grinding. Thankfully, I know a MySQL expert, and he told me enough to point me in the right direction. (The main problem was that the Wikipedia database dump wanted InnoDB, which needs more configuration before it'll work.)
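For the curious, "more configuration" mostly means telling InnoDB where to put its tablespace and giving it a decent buffer pool; a minimal sketch of the sort of my.cnf settings involved, where the paths and sizes are illustrative assumptions rather than anything from my actual setup:

```ini
# my.cnf -- illustrative InnoDB settings (values are assumptions)
[mysqld]
innodb_data_home_dir    = /var/db/mysql
innodb_data_file_path   = ibdata1:512M:autoextend
innodb_buffer_pool_size = 256M      # biggest single win for bulk loads
innodb_log_file_size    = 64M       # larger log files mean fewer flushes
innodb_flush_log_at_trx_commit = 0  # relax durability during the import
```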

The bottleneck was CPU when un-bzipping the data (0.39 GB compressed, 1.4 GB uncompressed), then disk speed. systat -vmstat indicates only 10 MB/sec on my RAID, which feels low, but I also really don't want to fight with it right now. It took under half an hour to run, and now it's building the index (/*!40000 ALTER TABLE cur ENABLE KEYS */).
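The import can be streamed so the 1.4 GB of uncompressed SQL never hits disk; a sketch, where the fake dump stands in for Wikipedia's real one and the database name would be whatever you created:

```shell
# Sketch of the streaming import. The fake dump below stands in for
# the real cur table dump; with the actual file you'd pipe into
# `mysql -u root -p <database>` instead of just decompressing.
printf -- '-- fake dump for demonstration\nSELECT 1;\n' | bzip2 -c > cur_table.sql.bz2
bunzip2 -c cur_table.sql.bz2   # real use: bunzip2 -c dump.sql.bz2 | mysql ...
```

The /*!40000 ... */ wrapper in the dump is a MySQL versioned comment: servers 4.0 and newer execute the statement inside (the dump disables keys before the bulk insert and rebuilds the index afterward, which is why the index build shows up as a separate phase), while older servers treat it as an ordinary comment.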

It's weird struggling with a gigabyte of data now; it feels like such a small amount of data compared to the stuff we deal with at work.