evan_tech

09:37 am, 17 Nov 05

subtypes

Some programming languages that have pointer-like types support a "notnull" qualifier. The semantics are pretty easy to imagine: C could allow declarations like
size_t strlen(const char * notnull str);
where the compiler would complain if you passed a possibly-null string to strlen. This would be especially nice considering that passing NULL today is undefined behavior, which in practice usually means a segfault.
(If that declaration confuses you, see footnote*.)
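Standard C has no notnull qualifier, but GCC and Clang have a nonnull function attribute that gets part of the way there. Here's a minimal sketch, assuming one of those compilers. my_strlen and the NOTNULL_ARGS macro are names made up for illustration. The attribute only catches arguments the compiler can prove are null (typically a literal NULL), so it's weaker than the qualifier imagined above.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: GCC or Clang. On other compilers the macro expands to nothing. */
#if defined(__GNUC__)
#define NOTNULL_ARGS(...) __attribute__((nonnull(__VA_ARGS__)))
#else
#define NOTNULL_ARGS(...)
#endif

/* Hypothetical stand-in for strlen; argument 1 is declared non-null. */
size_t my_strlen(const char *str) NOTNULL_ARGS(1);

size_t my_strlen(const char *str)
{
    size_t n = 0;
    while (str[n] != '\0')
        n++;
    return n;
}

/* Calls the function with an argument that is known to be non-null. */
void my_strlen_demo(void)
{
    assert(my_strlen("subtype") == 7);
    /* my_strlen(NULL); should draw a -Wnonnull warning from GCC and Clang */
}
```

The attribute goes on a separate declaration because GCC does not accept it after the declarator of a function definition.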

This simple example is a nice place to start thinking about subtypes. Most OO programmers have a conception of what these are (despite generally having them confused with subclasses), but it's surprisingly difficult to keep things straight. So, two questions:
  1. Is const char * a subtype of char *?
  2. Is const char * notnull a subtype of const char *?
When we say A is a subtype of B, we mean that we can pass an A where we expect a B. This gets confusing because the "sub" suggests less, yet a subtype generally has a superset of its supertype's features: an HTTPConnection is a subtype of a TCPConnection, so you could pass either to connect() (which works with TCPConnections) but you can only pass the former to do_POST() (an operation specific to HTTP).
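C has no real subtyping between structs, but the connection example can be approximated by embedding the TCP part as the first member of the HTTP struct. This is only a sketch: the struct fields and the names tcp_connect and http_do_post are hypothetical stand-ins for connect() and do_POST(), and nothing here touches a real socket.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    int connected;
} TCPConnection;

/* The TCP part comes first, so an HTTPConnection starts with a TCPConnection. */
typedef struct {
    TCPConnection tcp;
    size_t bytes_posted;
} HTTPConnection;

/* Works on any TCPConnection, including the one inside an HTTPConnection. */
void tcp_connect(TCPConnection *conn)
{
    conn->connected = 1;
}

/* Needs the HTTP-specific field, so a bare TCPConnection is not enough. */
void http_do_post(HTTPConnection *conn, const char *body)
{
    assert(conn->tcp.connected);
    conn->bytes_posted += strlen(body);
}

size_t subtype_demo(void)
{
    TCPConnection plain = {0};
    HTTPConnection http = {{0}, 0};

    tcp_connect(&plain);     /* a TCPConnection where one is expected */
    tcp_connect(&http.tcp);  /* the HTTP connection is accepted there too */
    http_do_post(&http, "a=1");
    /* http_do_post(&plain, "a=1"); is diagnosed: incompatible pointer type */
    return http.bytes_posted;
}
```

The "more specific" type is accepted in both places, and the "more general" one is accepted in only one of them.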

The way I usually keep this straight is with examples like I just gave. So to answer the questions:
  1. When a function wants a const char *, it's saying that it won't modify the chars it's passed, but also that it'll still accept plain char *. Subset of functionality + accepting the other = no, const char * is a supertype of char *.
  2. You can no longer pass plain const char * to my modified strlen(), because they might be NULL. More specific requirement on the argument means yes, const char * notnull is a subtype of const char *.
#1 is really unintuitive to me. One way to untangle it is to imagine instead that all types were immutable to begin with and you had to declare them mutable explicitly: then both mutable and notnull are qualifiers that are further constraining the types, and both would indicate subtypes.
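Answer #1 is something a C compiler will confirm today. In this small sketch (count_char and upcase_first are made-up helpers), a char * is accepted wherever a const char * is expected, while going the other way draws a diagnostic about discarding the qualifier:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Promises not to modify the chars, so it accepts both kinds of pointer. */
size_t count_char(const char *s, char c)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (*s == c)
            n++;
    return n;
}

/* Writes through the pointer, so it needs a plain char *. */
void upcase_first(char *s)
{
    *s = (char)toupper((unsigned char)*s);
}

size_t const_demo(void)
{
    char buf[] = "hello";       /* mutable storage */
    const char *ro = "hello";   /* read-only view of a string literal */

    upcase_first(buf);          /* char * where char * is expected */
    /* upcase_first(ro); is diagnosed: it discards the const qualifier */
    return count_char(buf, 'l') /* char * where const char * is expected */
         + count_char(ro, 'l'); /* const char * where const char * is expected */
}
```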

I think this is further confused by the fact that these qualifiers behave more like attributes than subtypes, but I don't really have a strong understanding of why I see a distinction there. (I can just imagine Graydon wincing at all of this now...)


* Footnote: C types are read backwards, which takes a bit to get used to. In a const char *, it's the char that's immutable; an immutable pointer is char * const, which is read "constant pointer to char". ML fell into this same trap, so expressions like ref 3 have type int ref. Haskell (do I even need to write this sentence?) has it in the other order, which is both cleaner and works analogously with functions (where the "argument" to the type is on the right: list of int, not int list).
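The two placements of const from the footnote are easiest to tell apart by what the compiler lets you assign. A short sketch (the function name const_placement_demo is made up, and the commented-out lines are the ones a compiler rejects):

```c
#include <assert.h>
#include <string.h>

int const_placement_demo(void)
{
    char first[] = "abc";
    char second[] = "xyz";

    const char *p = first;  /* pointer to const char */
    p = second;             /* fine: the pointer itself can change */
    /* *p = 'q';               error: the chars are read-only through p */

    char *const q = first;  /* const pointer to char */
    *q = 'A';               /* fine: the chars can change */
    /* q = second;             error: the pointer itself is read-only */

    /* first is now "Abc" and p points at second, so this returns 1. */
    return strcmp(first, "Abc") == 0 && p == second;
}
```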