evan_tech

Previous Entry Share Next Entry
03:15 pm, 19 Feb 04

microsoft pl talk

Guest lecturer today was Daniel Weise, who talked about Microsoft's attempts at code quality.

The basic gist was this:
They have developed tools that let them plug in modules into their compilers. These tools let them add special attributes to their declarations that allow better checking. For example,
strncpy(char * __ecount(n) dest, const char *src, size_t n);
Then the compiler can verify that dest really has at least n bytes available. (The example he used was some horrible MS Office function, named, uh, MsoWzCchCopy. The whole class snickered at that.)

After Code Red and NIMDA, security became a high and visible concern, and so they actually deployed this. He talked a lot about how programmers are reluctant to use new tools, but that attributes like these are apparently used throughout Office, and are gradually being added to Windows itself.

The compiler module system lets them add specific flags depending on the project; another example he gave was tagging pointers as kernel or user space so that you don't mix the two. (And another was managing HANDLEs allocated by Windows so you can be sure to free them.) I was impressed to hear that they now have people working full time just to improve their languages, albeit in small ways.

Overall takeaway:
  • They're much more concerned with software engineering (how do you convince programmers to do this?) than theory (their system doesn't work for stuff like arrays of strings, or structs that carry a string and length field). Apparently they've had most of this technology since the mid-nineties but it takes so long to work with the 18-million-line Office.
  • The compiler extension idea is cool, too—desktops are powerful enough to do more intensive computations on every compile, and you could imagine a system that was more or less strict depending on what level of development you're at (like optimization levels, but for checking).
  • OCaml has its meta-language stuff but you still can't add stuff that violates ML's safety; using C as your base language and then adding checks lets you extend an unsafe language with more safety.


Regarding the emphasis on theory: Cyclone guarantees memory safety (and handles the above struct case, I think). The Microsoft project's observations:
  • Doesn't even pretend to guarantee safety, but finds and prevents a significant class of bugs.
  • Even rather stupid logic works for 90% of code (sometimes you need to tweak things to make them pass the checks).
  • Someone asked, "couldn't you just use STL strings"? And he said, basically: yes, but they're stuck in C-land. C++-style smart pointers would solve the HANDLE allocation problem, too. (Similarly, any higher-level language, like C#. But you still can't use that for kernels.)


At one point, he asked, "Word 6 was our second most buggy product. Guess what our first was?" Some people mumbled some guesses, then he and I said simultaneously: "ME." Hah. I never even used ME but I knew that it sucked. But at least they're making attempts at improving, while they're still finding embarrassing shit in Linux.