Wednesday, August 29, 2012

A usable variant type for C++

A tagged union (also known as a variant) is an aggregation of two or more types. An object of such a type can be any one of a predefined set of types. They are insanely useful for modelling many real-world concepts. A network connection might be either ready, open or disconnected – and each of these states has different pieces of data attached to them. A message that comes over that network connection might be a heartbeat, broadcast packet, or an RCP call – and again, each type is associated with a certain set of data fields. Tagged unions pop up everywhere in the real world, and the core C++ language has no decent way to model this concept.

Enter Boost Variant, a library that provides tagged unions to C++. Conceptually, it sounds like just what we want – a Boost Variant is a template type that you can instantiate to hold a client-defined subset of types, and it provides protections to make sure you correctly unwrap the variant to the right type. The problem is, to take advantage of this type safety involves writing a lot of ugly code!

I will present in this post my library function that sprinkles in C++11 lambdas to make Boost Variants a little more palatable.

Take as a case study, the "motivating example" from the Boost Variant documentation:

#include <boost/variant.hpp>
#include <iostream>

class my_visitor : public boost::static_visitor<int>
{
public:
    int operator()(int i) const
    {
        return i;
    }

    int operator()(const std::string & str) const
    {
        return str.length();
    }
};

int main()
{
    boost::variant< int, std::string > u("hello world");
    std::cout << u; // output: hello world

    int result = boost::apply_visitor( my_visitor(), u );
    std::cout << result; // output: 11 (i.e., length of "hello world")
}

Besides the obvious weight of the class definition, notice that the logic you want to run is now removed from the original context in which you wanted to run it.

To make it better, we turn to C++11 lambdas. Here is the code rewritten to use the library presented here:

#include <boost/variant.hpp>
#include "inline_variant.hpp"
#include <iostream>

int main()
{
    boost::variant< int, std::string > u("hello world");
    std::cout << u; // output: hello world

    int result = match(u,
        [](int i) -> int { return i; },
        [](const std::string & str) -> int { return str.length(); });
    std::cout << result; // output: 11 (i.e., length of "hello world")
}

The magic part is the call to the match function, which accepts the variant followed by the handler functions. The functions can be specified in any order. A compiler error will be thrown if the functions do not match the types of the variant, if the return types are not all the same, or if a non-unary function is supplied.

This is a C++11 only library because it's quite pointless without lambda functions. I've tested it with gcc 4.7. It doesn't work on gcc 4.6, which can't handle the variadic templates.

If you think this sounds useful, check the inline_variant_visitor project on my Github.

The making of...

This was quite a fun library to implement – behind the scenes, it involves Boost MPL and Fusion metaprogramming, as well as C++11 variadic templates. I will try to explain what each of these mean in the rest of this post.

A metaprogram is a program that operates on a program. In this case, we are talking about compile-time metaprogramming on types. We dive into a world where our program runs during compilation, and the inputs and outputs are types.

Boost MPL is like the STL for metaprogramming. It gives you a library of type containers – such as vectors and maps of types. For example, boost::mpl::vector<int, string, char*> is a vector of the types int, string and char*. For my variant visitor, I use an MPL vector to store a list of the function types that are passed into match.

Boost Fusion is the bridge between the compile-time world and the runtime world. Like MPL, it gives you containers of types, but these containers can be instantiated at runtime to hold values. For example, boost::fusion::vector<int, string, char*> not only represents the list of types, but can be instantiated to actually hold an int, string and char* with specific values. It's actually equivalent to a tuple. But it doesn't stop there – more interesting is the Fusion map. To illustrate, take this Fusion data structure (I will omit the boost::fusion namespace for concision):

typedef map<
    pair<int, double>,
    pair<char, std::string> > example_map_type;

This means that the type int is associated with some value which is a double, and the type char is associated with some value which is a string. Let that sink in for a bit. The left side (the key) of the map is a type. The right side of the map is a runtime value. What value? You decide that by constructing it:

example_map_type m(
    make_pair<int>(3.14)
  , make_pair<char>("Pie"));

So we've just made a map that says int maps to 3.14, and char maps to the string "Pie". How does this help us? Inside my match function (instantiated in the example above), lives a type that looks something like this:

map<
    pair<int, std::function<int (int)> >
    pair<std::string, std::function<int (std::string)> >

This associates the type int with a function that accepts an int and returns an int, and it associates the type std::string with a function that accepts a std::string and returns an int. Given this compile-time/runtime data structure, all that's left in order to perform the function dispatch is to determine the type held in the variant and look up that type in the map to get the right function, which is then invoked.

I hope that made sense, because it's a heavily abridged account of what the code does. Perhaps a more thorough walkthrough of the implementation could be a topic for a future post.