evan_tech

Previous Entry Share Next Entry
I searched online for the combination of keywords in the subject of this post and failed to find anything, so I'll write a post about it myself. I guess "existential type" is a relatively obscure theory-word, but you'll see the concept is simple enough.

Imagine you want a simple typesafe "Callback" class in C++ that works with objects. There are millions of these online with varying levels of complexity, so let's stay simple:
template<class T>
struct Callback {
  T* obj;
  void (T::*cb)();

  Callback(T* o, void (T::*c)()) : obj(o), cb(c) {}

  // Don't worry if this syntax is unfamiliar;
  // it just calls the member through a pointer.
  void run() { (obj->*cb)(); }
};
You can then construct and use this with code like:
SomeObj myobj;
Callback<SomeObj> my_cb(&myobj, &SomeObj::some_member_function);
...
my_cb.run();
This is all well and good until you want to make a collection of callbacks on objects of different types, for applications like "notify every listener that my value has changed". The problem is that the type of the object held within the callback "leaks" into the Callback's type (as exemplified by the SomeObj template parameter to my_cb). Really, the caller of the callback doesn't care what the type of the object hidden within the callback is; it just wants to invoke it.

(Brief OO aside: the way people often solve these sorts of callback-y problems in the OO world is through an interface. I'll make it concrete with an example: imagine a Socket class that uses some ISocketWatcher interface which has a virtual data_ready() method to notify its watchers that there is data available on the socket. Any class that wants to be notified of data has to implement the ISocketWatcher interface and provide a function exactly named data_ready(). Already this name may conflict with some other name your class has, or maybe it's not descriptive enough in the context of your new class, but worse if you want to use two sockets: suppose you're implementing a proxy and you have a server and client socket. What you'd prefer is to hook up a server_data_ready() method to one and a client_data_ready() to the other, but the interface again constrains it so that both can only call you back via the same name.)

Back to the Callback class. What you really want is to drop the template parameter; the user of the class doesn't care about the type carried inside. But then you can't make the internals type-safe. (I found implementations online that solve this with void pointers inside the class, but that gets especially ugly apparently because you can't guarantee that method pointers are the same size for every class -- yikes!) What you really want the type to say is "a callback has no parameters, but inside of it it uses some type T in such a way that the caller cannot work with T -- it's only used inside". If you squint at it a bit you can phrase this as an existential statement: given an instantiated Callback object, "there exists" some type T inside of it such that the interface it provides works, but you the caller don't get to know that type.

In Cyclone (briefly for new readers: a language that tries to retrofit fancier typing back on to C), they provide existential types with a cute syntax, something like:
struct Foo { <T> ... }.
I think the idea is the type is stored "inside" the curly braces, which is another way of thinking about it.

We don't have those in C++, though. But existential types are sorta (maybe "deeply") related to good ol' objects. When an object provides an interface and you use it through a pointer to the interface, the real type of the object (the thing on the other end of the interface pointer) is hidden from you. So you can hijack this:
struct Callback {
  virtual void run() = 0;
};

template<class T>
struct RealCallback : public Callback {
  // ...exactly the code in the first implementation of Callback
};
You can now create a bunch of RealCallbacks with different types T but stuff them all together into a collection of Callback* and invoke them all through run(). Note that this isn't in the OO spirit; you could imagine collapsing and resolving Callback and RealCallback into one blob, even at compile-time, if the typing only worked out. (It's also no panacea, as you have to manage the memory of Callback*s yourself...)

(ObHaskell: Autrijus(!) on behalf of Perl6 hacked existential types into GHC in 2005, using an object-like pattern as a demo of why you'd use this.)