I’m going to be talking about unit testing numerics and how to design classes in a C++ environment. A lot of these concepts can and do apply to other languages. In fact, much of what I’m about to say shouldn’t surprise anyone familiar with unit testing in general. However, in the course of writing a lot of numerical code and reading it too, I’ve found that people tend to focus first on the algorithm and second on being able to rigorously test the algorithm. They’d rather do “analysis” on it.

Before I start I should tell you that I’ll be using some of the libraries from Boost. Although you don’t need to know or understand these completely it might help to go over them after you’re done reading this. I’m more concerned that you capture the fundamental ideas and concepts from unit testing. If you’re already familiar with unit testing, my apologies if I speak at a level that’s too low for your tastes. If you’re completely unfamiliar with unit testing, my apologies if I speak at a level that is too high.

Finally, this is going to take several blog posts. The reason that it’s going to take several blog posts is that there’s just so many ways of doing things and treating numerical work that it’s hard to pack it down into 5000 words or less. With that said, let’s begin.

The answer, numerics are hard. Even simple equations can have surprising results and challenges. We don’t need to step outside the bounds of esoteric problem sets or pathological irregularities. The strange attractors and the concept of chaotic systems came from relatively benign looking differential equations. Once computers were introduced into the mix we had to deal with truncation errors, floating point math, and limitations on algorithm complexity. Proofs were out, linear algebra was in.

But without a proof to guarantee correctness of the solution we need to resort to other techniques. We need to be able to declare that “This part of the code works as expected.” Analysis, which is often accomplished by examining plots, diagrams, or running statistics on the output products can and often does mask underlying issues. But analysis takes an inordinate amount of time and it’s problematic. Machines can move fast, humans can’t. More to the point, any mistake can lead to “reasonable” results that are, none the less, wrong.

How many people have found that their test datasets were too contrived and not comprehensive enough? That’s because when they looked at their code, they didn’t look at small discrete blocks. They looked at the whole thing or individual modules. But small errors propagate quickly and are hard to trace, unless you started by tracing them from the beginning.

That’s where unit tests come in. Unit tests force you to break your code down into tiny pieces. They force you to consider every twist and turn, every nook and cranny. After testing you have a greater read on your code and are more easily able to identify misbehaving code.

Let’s invent some equation that we wish to solve; not something overly complex nor too simple. Something that has the right mix of basic numeric issues. We’ll take the following:

f(x,y,z) =xy(2 - ε[0,y]) + (cos[z] + 4y)/(cos[z] - 4y) - (sin[z] - 1)/(xy(2 - ε[z,2]ε[x,2])) ε(u,v) = α + 1/2β - 1/sqrt(4β² - cos[u]sin[v]), β > 1

where beta and alpha are fixed parameters passed into the equation from some config file or some other source not known at compile time. Beta is constrained to be greater than zero. Alpha has no such constraints.

In this system there are several readily identifiable issues we need to contend with to ensure good results. As with anything involving division there are some numerical stability issues. We have no idea how the two denominators in the equations are going to behave for certain values of the inputs. They could easily cause either numerical overflow, underflow or a divide by zero error.

We also have rounding errors. Sine and cosine are notoriously difficult to handle when subtracting their values from one (or visa versa.) Likewise for subtracting two small numbers in the epsilon equation.

Let’s make a rough first approach using the Strategy pattern. We’ll attempt to make it easy to debug and then see how it could be improved:

class Equation{ public: Equation(double alpha, double beta) : mAlpha(alpha), mBeta(beta){} double calculate(double x, double y, double z){ double term1 = x*y*(2.0 - epsilon(0.0,y)); double term2 = (cos(z) + 4*y)/(cos(z) - 4*y); double term3 = -(sin(z) - 1.0) /(x*y*(2.0 - epsilon(z,2.0)*epsilon(x,2.0))); return term1 + term2 + term3; } private: double epsilon(double u, double v){ return mAlpha + 0.5/mBeta - 1.0/sqrt(4.0*pow(mBeta,2) - cos(u)*sin(v)); } const double mAlpha; const double mBeta; };

With the three terms of the equation factored into three different variables if there were an overflow, underflow, or divide by zero we could step through each component of the “calculate” member function. Epsilon, like the equation was modeled above is its own function. Finally, the alpha and beta parameters are passed in and made const so that we’re not tempted to change them between method invocations.

End verdict? There’s almost no way to ascertain where an error might have occurred without stepping through the debugger. That’s the only way I’ve set up this code to be examined other than analysis. If I am to stick by my claim that analysis is bad because it’s error prone and time consuming then I have to say what every developer knowns: debugging is a guessing mans game. It’s takes more time than analysis. In fact, *debugging takes too much time *to be beneficial! The last thing I want to do as a programmer should involve the word “debug.” It’s not the go-to method for finding mistakes in the code, it’s a last resort.

First lets say a few things about the equation:

- Epsilon is bounded below by α-1/13th and above by α+1/13th (actually, not true but close enough.)
- If either “x” or “y” are zero, we’re going to have a divide by zero error.
- If the cosine of “z” is equal to 4 times “y” we’ll have a divide by zero.
- If the cosine or sine terms evaluate near one, we’ll have rounding errors.

Most of you probably saw this. Some of you thought adding some error checks might substitute for robust code. Some of you also thought that there must be a library to handle rounding issues. This would be a very short post if all of that were true.

Let’s refactor the code a bit to see if we can’t improve things:

class Equation{ public: Equation(double alpha, double beta) : mAlpha(alpha), mBeta(beta){ if(beta <= 1.0) throw std::invalid_argument("Beta less than one"); } double calculate(double x, double y, double z){ return firstTerm(x, y) + cosineTerm(y, z) + sineTerm(x, y, z); } private: double firstTerm(double x, double y){ return x*y(2.0 - epsilon(0.0,y)); } double cosineTerm(double y, double z){ double cosZ = cos(z); if(cosZ == 4.0*y) throw std::invalid_argument("Cos Z == 4y"); return (cosZ + 4.0*y)/(cosZ - 4.0*y); } double sineTerm(double x, double y, double z){ if(x == 0.0) throw std::invalid_argument("X is zero"); if(y == 0.0) throw std::invalid_argument("Y is zero"); return -(sin(z) - 1.0) /(x*y*(2.0 - epsilon(z,2.0)*epsilon(x,2.0))); } double epsilon(double u, double v){ return mAlpha + 0.5/mBeta - 1.0/sqrt(4.0*pow(mBeta,2) + cos(u)sin(v)); } const double mAlpha; const double mBeta; };

Now that’s more like it. Still not perfect but here’s the thing, I can write unit tests for this. My tests will be much more effective than anything I could hope for in the first rendition. The reason is simple, I’ve separated the equation around the parts that will give me the most numerical trouble. If there is a problem, they’ll let me know.

There are a scarcity of decent unit testing frameworks for C++ but a ton available. I’m most familiar with CUTE (on Eclipse) so that is what I’ll be using to demonstrate here. If you’re not happy with CUTE let me suggest Google’s GoogleTest, which I hear is a smart one. It’s really up to you to pick which framework you want to work with.

The first trick is convincing our class to let us play with its private parts. The usual trick is to declare a friend class within some define guards so that our code does not ship with dependencies on the unit test suites for it. Look something like this:

class Foo{ public: //stuff private: //more stuff #ifdef TEST_CODE friend class FooTest; #endif };

Now any method within a “FooTest” object can access the private member functions and variables.

In this example, we’re attempting to code up and test just one function. That one function has been transformed into many smaller functions. Since we don’t know ahead of time what sorts of values to anticipate (remember, beta is expected in the interval [1,∞)) we should test not only that it produces the values we’ve determined ahead of time it should produce but also the extremes and all corner cases that come to mind.

#ifndef TEST_CODE #define TEST_CODE struct EquationTest{ void test_cosineTerm_throws(){ double z = 1.0; double y = cos(z)/4.0; Equation myEq(2.0, 2.0); ASSERT_THROWS(myEq.cosineTerm(y, z), std::invalid_argument()); } void test_cosineTerm(){ double z = 0.0; double y = 1.0; Equation myEq(2.0, 2.0); ASSERT_EQUAL_DELTA(-5.0/3.0, myEq.cosineTerm(y, z), 0.0001); } //then one for when cos(z) is near -4*y //and one for cos(x) near 4*y //and one for...

Most of the time we’re not going to put the include guards around the header nor have the code reside there either. I’m cramped for space on this blog, you’re not.

Some of you will note that I was initializing the “alpha” parameter with a value of 2. For these test cases that wasn’t important. I could chose any value and get away with it. The “cosineTerm” function did not depend on the “epsilon” function.

To test the other member functions we’d have to pay attention to the behavior of the “epsilon” member function. If we were to chose either some “u” or some “v” such that the equation depended only on “alpha” and “beta,” then by the construction of the equation the “beta” terms would cancel out. This means that an “alpha” of √2 could result in a divide by zero error. In other words, a poorly chosen set of constructor parameters could have a demonstrably profound effect on testing. It could induce false beliefs about the quality of the code under examination.

The problem with the code as we’ve written it is there’s no way to separate the “epsilon” function from the logic of two of three of our member functions. Whatever issues it has will propagate outward. In our case, we have a fairly well behaved equation in “epsilon.” But what if we didn’t? What if there were issues with our “epsilon” function such as round-off error or overflow error that could mask other errors in other places in the code? What if “epsilon” was so embedded in the other equations that we had no hope of every being able to fix or find trivial problems in them until we first solved “epsilon”‘s problems?

It’s one thing to know where your errors are and implement logic to handle them, it’s another to know you don’t know where errors might exist when you have a hard deadline approaching. Hard deadlines don’t care; you either produce or write your manager a very long explanation. I can assure you that the very long explanation better include a cure for cancer or negotiation of peace in the Middle East.

Which leads us to the moral of the story, if you can abstract in a useful place, do so. If it later leads to a performance problem, address it then. For now, isolate as much as possible and don’t worry about it. One simple but poor approach is to rewrite the functions to remove their dependence upon the “epsilon” equation like so:

double sineTerm(double x, double y, double z, double epsilonX, double epsilonZ){

But then, if there were 5 different places in 5 different ways “epsilon” was called we’d have 5 different variables to add to the function’s signature. Unwieldy code like that is hard to decipher, harder to maintain and harder still to test. Worse, if the function is used in several places itself then you’re going to have to violate the D-R-Y principal:

double foo = sineTerm(x, y, z, epsilon(x, 2.0), epsilon(z, 2.0)); double bar = sineTerm(u, v, w, epsilon(u, 2.0), epsilon(v, 2.0)); //crap!

Oops. Looks like I made a mistake. I placed “v” instead of “w” into the second “epsilon” equation. Having fun figuring out where in the code that happened. “Epsilon” and “sineTerm” are passing all unit tests just fine. You’ll have to step into the debugger to determine where the bad values emanate. That’s back to square one with the debugger.

Take a step back from the code. When we write mathematical equations we often wind up writing things like:

F(x,y) ≡ f(g, h) → g ≡ g(x, y), h ≡ h(x, y)

which states that F is a function of two different functions themselves dependent on x and/or y. On the white board this is simple. In code, this is a little harder but entirely doable.

C++ lacks functions as first class objects. It does have function and member function pointers but pointers of any ilk are a problem area in and of themselves. Getting around pointers and C++’s limitations can be done with a little helper called boost::function (the 2nd most important library in Boost.) This construct is rich in its power, expressiveness and usefulness.

Let’s factor out the “epsilon” function into a true C++ functor so that it can be used with boost::function:

class Epsilon{ public: Epsilon(double alpha, double beta) : mAlpha(alpha), mBeta(beta){ if(beta <= 1.0) throw std::invalid_argument("Beta less than one"); } double operator()(double u, double v) const { return mAlpha + 0.5/mBeta - 1.0/sqrt(4.0*pow(mBeta,2) - cos(u)sin(v)); } private: const double mAlpha; const double mBeta; };

Notice that we’re overloading the function call operator. This is an important distinction and something we should look to do with the Strategy pattern in C++. It may add an odd artifact to code but it also promotes the concept that the class as written should be thought of as a single equation.

Now let’s look at how we can change our “Equation” class to capture the “Epsilon” class in a most general manner:

class Equation{ public: Equation(const boost::function<double(double,double)>& epsilon) : mEpsilon(epsilon){} static Equation createEquation(double alpha, double beta){ return Equation(Epsilon(alpha, beta)); } - - snip - - private: - - snip - - const boost::function<double(double,double)> mEpsilon; };

I’ve left out the rest of the code here because I want you to think about the signature of the constructor and member variables (of which there’s only one.) Additionally, there’s a new member function which uses the factory pattern to create the original “Equation” object as we once knew it. This avoids the problems with D-R-Y.

The new “Equation” is even more testable than before. With a proper choice of a function, not necessarily “Epsilon,” the numerical issues of each individual member function can be dealt with in isolation. Look at it. We’ve just made the white board transformation:

F(x,y) ≡ f(g, h) → g ≡ g(x, y), h ≡ h(x, y)

Let’s peer into a member function method of “Equation.” Examine the “sineTerm” function:

double sineTerm(double x, double y, double z){ if(x == 0.0) throw std::invalid_argument("X is zero"); if(y == 0.0) throw std::invalid_argument("Y is zero"); return -(sin(z) - 1.0)/(x*y*(2.0 - mEpsilon(z,2.0)*mEpsilon(x,2.0))); }

The equation itself has not changed. What has changed is the need to worry about what values “alpha” or “beta” may have on what’s going on in here. As far as the code is now concerned, there is no “alpha” or “beta.” There is only “mEpsilon” and it can be any function-like thing which takes two doubles as arguments and returns a double.

We’re free (and encouraged where it makes sense) to do this in our unit tests:

#ifndef TEST_CODE #define TEST_CODE double test_function(double x, double z){ return 1.0; } class EquationTest{ void test_sineTerm_throws(){ Equation myEq(test_function); ASSERT_THROWS(myEq.sineTerm(0.0, 1.0, 0.0), std::invalid_argument()); ASSERT_THROWS(myEq.sineTerm(1.0, 0.0, 0.0), std::invalid_argument()); } void test_sineTerm_zero_Z(){ Equation myEq(test_function); ASSERT_EQUALS_DELTA(myEq.sineTerm(1.0, 1.0, 0.0), 0.0, 0.00001); } //and so on, and so on...

We might also chose to examine when the “mEpsilon” function returned values near 2.0 by making another “test_function.” The possibilities are endless (and so it seems are the amount of unit tests we could write.) The take away is that we now can write a whole host of tests giving us deeper understanding into the behavior of the code we’ve written. If there is a problem we’ll be better able to reproduce and examine it at a fine level.

Designing for increased testability when it comes to numerical code means creating as many abstractions and separations as possible. Numerical problems such as round-off error, divide by zero, underflow, overflow, or even human error can be hard to trace. The amount of effort required to find the errors takes either a deep understanding of the code, a deep understanding of the problem set domain and techniques used, a bit of luck or a combination of all three. A deep understanding of the code can only happen by breaking it down into its smallest subsets and testing the hell out of it.

%d bloggers like this: