Statically Typed

because Hindley-Milner rocks

Implementing Scala’s pattern matching in C++


Scala has many beautiful features which I’m absolutely in love with: closures, currying of functions, functions as first class objects, higher order functions, traits, and pattern matching (and there’s probably more I’m forgetting.)  When going from Scala to Java or C++ I’m often confronted by how verbose and flat other languages seem.  Worse, I miss my favorite avenues for addressing problems.

It seems I’m not the only one.  A few years back I questioned the usefulness and rational of some Boost libraries.  They seemed on the surface to promote hideously deformed syntax or added boilerplate for little to no gain.  What was the point of this Lambda library or all these Functional libraries?  Who could make sense of this MPL library?  Fast forward to today and I can’t get enough of them.  They solve many of the limitations of C++ but they’re lacking in some of the more profound abstractions.

Scala’s Pattern Matching

Speaking of great abstractions C++ could use, no feature of Scala is more profound than pattern matching.  For those who are not familiar with the concept of pattern matching let me give you an example:

  val aPattern = "[^d]+.*([10]/d+).*([pdk]/d+)/w*".r
  case class SomeType( n:Int, s:String )
  def saySomething( x:Any ) = x match{
    case "Nicole" => println("Hi, Nicole")
    case n:Int => println("I gots " + n + " fingers.")
    case a:Map[_,_] => println("We got a map")
    case (a:Int, b:String) => println("I have " + a + " of " + b + ".")
    case List(_, 0, _*) => println("This list starts with zero.")
    case SomeType( 4, s ) => println("Got four of " + s)
    case aPattern( a, b ) => println("Found " + a + " and " + b)
    case _ => println("You've reached the default case.")
  }

Confusing?  Looks almost like a switch statement or a visitor pattern.  It essentially is but only more powerful.  Let’s break down each case statement to find out what’s happening.

  1. case “Nicole” := We’re matching on a specific value passed into the function.  This is as close to a switch statement that accepts more than int types as we can get.
  2. case x:Int := We’re matching on type.  That is, we’ll accept any value as long as it’s an Int.
  3. case a:Map[_,_] := Again, we’re matching on type but this time it’s a container object.  The container is not bound to contain any specific type or either key or value objects.
  4. case (a:Int, b:String) := We’re matching on a container type again.  This time the container is a tuple of an Int and a String.  We’re also labeling the individual items of the tuple so that we may interact with them individually.
  5. case List(_, 0, _*) := We’ll accept a list of any length as long as the second item of the list is a zero.  Subtle yet profound in that we can match on a type of container, type contained by the container and the contents of the container in a specific location.
  6. case SomeType( 4, s ) := We’ll accept our own defined case class type as long as a 4 was called in the first argument of the constructor.  Under the hood we’re using some of the methods supplied to all case classes to peer into the construction of our object.
  7. case aPattern( a, b ) := We’re decomposing a regex pattern.  The variables “a” and “b” are similar to catch blocks we’d use in most regex captures.
  8. case _ := the default case.  We’ll accept anything, really.

Phew that’s a big list.  So how can we capture that same functionality in C++?  There’s a ton of syntactic sugar up there hiding all the boilerplate that is invariably being written to Java byte code.

The Strategy Pattern

As I eluded to before pattern matching looks like a switch statement on steroids.  The switch statement in C++ can be though of a mapped set of functions on integer values.  Generalizing it to objects is straight forward and we have several choices.  The Gang of Four book gives a fine example and our first inclination might be to use an abstract class representing an interface to the methods we’d employ:

struct IDoSomething : public std::unary_function<void,double>{
   virtual ~IDoSomething(){}
   virtual void operator()(const double& _value) const =0;
};
std::map<int,IDoSomething*> unsafely_mapped_functions;

There is nothing wrong with this route.  It’s perfectly reasonable OO-design.  But with most things OO it’s utterly verbose, requiring us to create a new class inheriting from this interface for each object value we’d like to treat.  It invites repetition, code bloat and has the feel of something completely over-engineered.

In C++ there’s an additional wrinkle.  I’ve used raw pointers in a container that hasn’t been specifically designed to deal with pointers.  Not using RAII invites improper disposal of system resources.  We really want to take advantage of RAII as much as possible and should convert that into a map wrapping smart pointers.  It’ll look like:

std::map<int,boost::shared_ptr(IDoSomething)> safely_mapped_functions;
safely_mapped_functions[4] = boost::shared_ptr(new DoSomethingWithFour() );

All this for an object which is quintessentially a function?  Surely there’s a better way and there is.

Boost gives us another way to abstract away from interfaces and avoids the types of problems we might encounter with the previous approach.  The Function library contains object definitions which accept any object which behaves like a function call, be they function pointers, member functions or functors.  They’ll even accept function like objects which do not explicitly match our signature as long as there exists an implicit conversion in the arguments to what we’d desire.  Hence, we no longer need a hierarchy of closely related objects:

std::map<int,boost::function<void(double)> > mapped_functions;
mapped_functions[4] = DoSomethingWithFour();
mapped_funcitons[2] = &divide_by_two;
mapped_functions[1] = boost::bind(&Foo::AddValue, boost::ref(*this), _1, 42.3);

But as I said above, we’re matching on the value of a very specific object type.

Abusing the Visitor Pattern

Like Scala, C++ allows for function overloading.  This is generally how the Visitor pattern is implemented in C++.  Visitors match on object type, not value.  This seemingly small detail presents us with a rather large problem.  In the example Scala code I’m passing in an object of type Any.  This is similar to Java’s Object or C++’s void pointer but not quite.

The Scala compiler is smart enough to be able to deduce the object type of the Any passed into the function.  In C++ there is no Any.  We need to resort of casting for the same effect and casting one object type to another replaces the type information of that object.  To say that in simple terms, if I cast an int to a char from that point forward the program does not know it should be an int.  Any Visitor will then treat it as a char.

Thankfully Boost provides us with a useful type known as a Variant.  Variants are statically typed objects which can represent a specifically defined set of other objects.  Variants can be used to model finite state machines with the State pattern, as a means to return several types of objects from the same function, or to move us closer to Scala’s pattern matching.

typedef boost::variant<int, double, std::string> numeric_type;
struct EqualsFour : public boost::static_visitor<bool>{
    bool operator()(const int& _value) const { return _value == 4; }
    bool operator()(const double& _value) const { return _value == 4.0; }
    bool operator()(const std::string& _value) const { return _value.compare("4"); }
}
bool itIsFour = boost::apply_visitor( EqualsFour(), numeric_type("4") );

This code creates a definition of a variadic type which can be an int, double or string.  It then creates an object which functions in a similar fashion to a functor with overloaded methods for each of the types in the variadic type.  Finally, the two meet and magic happens.

What isn’t clear from the above is that at compile time we’re required to have no less but possibly more than every type defined in the variant by our visitor patterned functor.  Failure to do so results in a compile time, not run time error.  The same is possible in Scala using sealed case classes (outside the scope of this discussion.)

This begs the question, how far can we take it?  How many of the case expressions in the Scala code can we satisfy using this method?

struct Match : public boost::static_visitor<std::size_t>{
    template<typename T>
    std::size_t operator()(const std::list<T>& _aList) const {
        return _aList.size(); }
    std::size_t operator()(const boost::tuple<int,double>& _aTuple) const {
        return 2; }
    template<typename T>
    std::size_t operator()(const T& _aDefault) const { return 1; }
};

Believe it or not a fairly broad category.  The compiler is smart enough to use the most restrictive overloaded operator it can.  Hence, a std::list<int> would compile down to the first branch in the code, returning the length of the list instead of the default value of 1.

But… and you knew this was coming, we can still not peer inside of an object to match on value, only type.  As said above, this is the limitation of a Visitor pattern.

A Solution?

Forget C++, just use Scala.  Kidding (not really.)  If we use a visitor pattern we can match on object type and if we use a strategy pattern we can match on object value.  To yield the same power of the pattern matching of Scala in C++ we can combine the two.  Here’s an abbreviated version:

class Match : public boost::static_visitor<> {
public:
    Match();//I'm leaving this part out for brevity!
    void operator()(const int& _value) const {
       std::map<int,boost::function<void(void)>::const_iterator operand 
           = m_IntMatch.find(_value);
       if(operand != m_IntMatch.end()){
           (*operand)();
        }
        else{
            defaultCase();
        }
    }
private:
    void defaultCause() const { std::cout << "Hey, what the..." << std::endl; }
    boost::unordered_map<int,boost::function<void(void)> > m_IntMatch;
};

And that’s just for matching multiple values on ints!

So, while it can be done the question is is it worth it?  Perhaps.  Depends on the situation and the amount of code reuse this would prevent.  YMMV.

Conclusion

I’m an advocate of Scala, it’s true, but no expert.  People often ask me “Why Scala?”  This is one of the reasons.  Pattern matching is powerful, concise, and beautiful.  While I’ve skipped writing the regex pattern, trust me when I say that to fit it into this article would have required several more pages of code and explanation.  I honestly didn’t feel like it and I hope you can see why.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Information

This entry was posted on January 18, 2011 by in C++, Scala and tagged .
%d bloggers like this: