December 5th, 2005

  • evan

reverse engineering -- Across Lite's .puz format

I spent a few hours today reverse engineering the ".puz" crossword file format. It's the format read by a popular Java applet used by, for example, the New York Times. Reverse engineering can be fun: it's a mixture of patience and creativity. You combine knowledge at the bit level (byte order, file offsets) with work at a higher level, looking for patterns within examples and then patterns across examples. It's also about putting yourself in the creator's position: did they encode strings as length+data or null-terminated? If they have shorts/longs, are they in network order (big-endian) or Windows order (little-endian)? How can you tell? (Usually there's more information in the lower bits: small values leave the high-order byte at zero.) And why would they have chosen that? What information does the file need to contain, and how would you have gone about representing it?
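To make the byte-order trick concrete, here is a minimal Python sketch of the "more information in the lower bits" heuristic. The function name `guess_byte_order` and the sample bytes are made up for illustration; they are not taken from any real .puz file.

```python
def guess_byte_order(samples):
    """Guess the byte order of a 16-bit field from several 2-byte samples.

    Small values leave the high-order byte at zero, so whichever byte
    position is zero more often is probably the high-order one.
    """
    first_zero = sum(1 for s in samples if s[0] == 0)
    second_zero = sum(1 for s in samples if s[1] == 0)
    if first_zero > second_zero:
        return "big"      # network order: high byte comes first
    if second_zero > first_zero:
        return "little"   # Windows/x86 order: low byte comes first
    return "unknown"      # not enough evidence either way


# Hypothetical data: the same 2-byte field pulled out of four example files.
samples = [b"\x0f\x00", b"\x15\x00", b"\x4e\x00", b"\x0d\x00"]
order = guess_byte_order(samples)
values = [int.from_bytes(s, order) for s in samples]
print(order, values)
```

It's only a heuristic: a field that happens to hold large values, or a pair of unrelated single-byte fields sitting side by side, will fool it, so it's worth checking the decoded values against something you can see in the puzzle itself.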

I spent quite a bit of time on some mysterious patterns within the bytes in the headers of some files I had until I looked at a new one and saw that region containing random text. I now suspect it's either random memory garbage (they forgot to initialize that part of the struct to zero) or perhaps some sort of cut'n'paste buffer -- either way, a big waste of time. Oh well.
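In hindsight, a quick comparison across several files would have flagged that region sooner. Here's a rough Python sketch of the idea: line up the headers from a few examples and see which offsets never change and which ones do. The `classify_offsets` helper and the sample headers (including the "MAGIC" prefix) are invented for illustration and don't reflect the real .puz layout.

```python
def classify_offsets(headers):
    """Split byte offsets into those constant across all samples and those that vary."""
    length = min(len(h) for h in headers)  # only compare the shared prefix
    constant, varying = [], []
    for i in range(length):
        if len({h[i] for h in headers}) == 1:
            constant.append(i)
        else:
            varying.append(i)
    return constant, varying


# Hypothetical headers from three files: a fixed prefix, two small
# numeric fields, then a region that looks like leftover garbage.
headers = [
    b"MAGIC\x00\x0f\x0fjunk",
    b"MAGIC\x00\x15\x15text",
    b"MAGIC\x00\x0d\x0d\x00\x00\x00\x00",
]
constant, varying = classify_offsets(headers)
print("constant offsets:", constant)
print("varying offsets:", varying)
```

Constant offsets are likely magic numbers or padding. Varying offsets that change with the puzzle are probably real fields, while ones that change with no relation to the content are good candidates for uninitialized memory.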

For future reference, then, here's what I have: