Evan Martin (evan) wrote in evan_tech,

optimization, end of the c class

Think of Psyco as a kind of just-in-time (JIT) compiler, a little bit like Java's, that emits machine code on the fly instead of interpreting your Python program step by step. The difference is that Psyco writes several versions of the same blocks (a block is a bit of a function), which are optimized by being specialized to some kinds of variables (a "kind" can mean a type, but it is more general). The result is that your unmodified Python programs run faster.
Last day of my C safety class featured student projects. There were only two people officially signed up for the class, so only two presentations.

One was a system for preventing %n format-string attacks. They used CIL to rewrite each call to printf and friends into two parts: a call to __push_whitelist(x) for every argument x that is an int pointer, followed by a call to a function that verifies each %n would write to a whitelisted address and then calls the underlying print. (The whitelist is popped afterwards.)
Takeaway: This is actually pretty good. It even works with va_lists (the top call to whatever is generating the varargs gets rewritten if the varargs eventually get passed to a format-string-using function), something other systems don't do. His performance numbers were pretty good, too. (Any printf that uses a static format string doesn't need any processing, nor does anything that doesn't use int* arguments: with an empty whitelist, any %n in the format string is rejected.)
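To make the whitelist idea concrete, here is a rough sketch of what the rewritten call might boil down to at runtime. It is not the student's code: push_whitelist, pop_whitelist and checked_snprintf are names I made up (without the leading underscores, which are reserved), the whitelist is a small fixed array, and the scanner only understands a handful of conversions with no length modifiers. Anything it doesn't recognize is rejected.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define WHITELIST_MAX 16

/* Hypothetical whitelist of addresses that %n is allowed to write to. */
static int *whitelist[WHITELIST_MAX];
static int whitelist_len = 0;

void push_whitelist(int *p) {
    if (whitelist_len < WHITELIST_MAX) whitelist[whitelist_len++] = p;
}

/* Drop the n most recently pushed entries. */
void pop_whitelist(int n) {
    whitelist_len = n > whitelist_len ? 0 : whitelist_len - n;
}

static int is_whitelisted(const int *p) {
    for (int i = 0; i < whitelist_len; i++)
        if (whitelist[i] == p) return 1;
    return 0;
}

/* Walk the format and the arguments together. Returns 1 if every %n
   target is whitelisted, 0 if not or if the format uses a conversion
   this sketch does not understand. */
static int format_is_safe(const char *fmt, va_list ap) {
    for (const char *s = fmt; *s; s++) {
        if (*s != '%') continue;
        s++;
        if (*s == '%') continue;               /* literal percent sign */
        while (*s && strchr("-+ #0123456789.", *s)) s++; /* flags, width, precision */
        switch (*s) {
        case 'd': case 'i': case 'c': (void)va_arg(ap, int); break;
        case 'u': case 'x': (void)va_arg(ap, unsigned); break;
        case 's': (void)va_arg(ap, char *); break;
        case 'p': (void)va_arg(ap, void *); break;
        case 'n':
            if (!is_whitelisted(va_arg(ap, int *))) return 0;
            break;
        default:
            return 0;  /* end of string, '*', length modifiers, floats, ... */
        }
    }
    return 1;
}

/* Check first, then hand off to the real formatter. Returns -1 on rejection. */
int checked_snprintf(char *buf, size_t size, const char *fmt, ...) {
    va_list ap, scan;
    va_start(ap, fmt);
    va_copy(scan, ap);
    int ok = format_is_safe(fmt, scan);
    va_end(scan);
    int ret = ok ? vsnprintf(buf, size, fmt, ap) : -1;
    va_end(ap);
    return ret;
}
```

With an empty whitelist any %n is refused, which matches the observation above that calls with no int* arguments need no extra work. A rewritten call site would look like push_whitelist(&n); checked_snprintf(buf, sizeof buf, "%d ab%n", 7, &n); pop_whitelist(1);.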

And CIL sounds really awesome: you just set GCC in your makefile to "cilly --load-my-module" and it hands your module an abstract syntax tree. It even has a cute hack to transparently handle whole-program analysis (which is required to do the above varargs handling) by just putting source code in the .o files and then actually doing the compilation in the linking step.

Someone pointed out an amusing trick: you could subvert this by something like printf(fmt, (int*)(fmt+x)); where fmt contained enough bytes followed by a %n that the length written would insert a "%n" later into the format string at runtime. But of course nobody is going to write code that does that.

Really, it seems more useful to me to just not allow %n. Does anybody actually use this? It would seem you could transform any code like
printf("foo %n bar", p);
into
*p = printf("foo "); printf(" bar");
and just permanently ban %n.
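As a sanity check on that rewrite, here is a small sketch that does both forms with snprintf into a buffer so the results can be compared. The function names are mine, and it assumes a libc that still honors %n in a literal format string (some hardened ones don't).

```c
#include <stdio.h>
#include <string.h>

/* Original form: %n stores the number of characters written so far in *p. */
int with_percent_n(char *buf, size_t size, int *p) {
    return snprintf(buf, size, "foo %n bar", p);
}

/* Rewritten form: split the format at the %n and use the return value of
   the first call as the count. Returns -1 if the buffer is too small. */
int without_percent_n(char *buf, size_t size, int *p) {
    int first = snprintf(buf, size, "foo ");
    if (first < 0 || (size_t)first >= size) return -1;
    *p = first;
    int second = snprintf(buf + first, size - (size_t)first, " bar");
    if (second < 0) return -1;
    return first + second;
}
```

Both versions should leave the same text in the buffer and the same count in *p. The rewritten one needs an extra call and a bit of error handling, which is about the only cost of banning %n.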

The other guy was playing with gprof stuff but didn't have any major results. But:
ld --wrap is neat! I didn't know it existed. It redirects all calls to a function f into calls to __wrap_f, and makes the original f reachable as __real_f, letting you easily wrap library functions.
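A minimal sketch of what a wrapper looks like, using puts as the wrapped function (my choice, not his). The weak declaration and the null check are my additions so the file still links if you forget the flag; they rely on a GCC/Clang extension, and wrapped_call_count is a made-up helper.

```c
#include <stdio.h>

/* When linked with -Wl,--wrap=puts, the linker resolves __real_puts to the
   original puts. Declared weak (an assumption of this sketch) so that the
   symbol is simply null when the flag is missing. */
extern int __real_puts(const char *s) __attribute__((weak));

static int wrapped_calls = 0;

/* With --wrap=puts, calls to puts from other object files land here. */
int __wrap_puts(const char *s) {
    wrapped_calls++;
    if (!__real_puts) return EOF;  /* not linked with --wrap=puts */
    return __real_puts(s);
}

/* Hypothetical helper: how many times the wrapper has run. */
int wrapped_call_count(void) {
    return wrapped_calls;
}
```

Build with something like gcc demo.c -Wl,--wrap=puts. Note that only undefined references get redirected, so wrapping a function that is defined and called in the same source file generally won't do anything.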
He wanted to intercept the gprof hooks, which isn't hard with the above, except that the __real_whatever functions of gprof look back up the call stack to record where you were, so his __wrap_whatever functions would have screwed that up. He said: GCC has no method for tail calls, but asm ("leave\r\n jmp __real_whatever"); (er, I don't really know the right syntax for asm stuff, sorry if I got that wrong) at the end of his wrapper functions did the trick for x86.
