Statically Typed

because Hindley-Milner rocks

Interview Questions


There’s a post on StackOverflow dealing with a C++ interview question and the OP’s answer.  Unlike many candidates he’s actually trying to understand why his code was deemed sub-optimal.  Fantastic!  I wish there were more people like this.  This post is what I would have submitted.

First, the question.  Given a sorted array of integers:

  • Find the first of a randomly chosen value if it exists within the array
  • Return the count of this integer
  • Have the solution scale well for large arrays

The OP chose to use a binary search, not a bad heuristic.  I would do the same.  First I’d attempt to tackle find the value within the array assuming that I know the length of the array and that a long is big enough to cover the full length.

#define NOT_FOUND -1
long findValue(int *_array, long _startIndex, long _endIndex, int _value){
    if( _array[ _startIndex ] == _value ){
        return _startIndex;
    }
    else if( _array[ _endIndex ] == _value ){
        return _endIndex;
    }

    long midpoint = (_startIndex + _endIndex)/2;
    if( _array[ midpoint ] <= _value && _value < _array[ _endIndex ] ){
        return findValue( _array, midpoint, _endIndex, _value );
    }
    else if( _array[ _startIndex ] < _value && _value < _array[ midpoint ] ){
        return findValue( _array, _startIndex, midpoint, _value );
    }

    return NOT_FOUND;
}

This function doesn’t find the first value, only the index of a value.  Next, we need a routine to trace back to the location of the first value.  Here I’m going to play Devil’s advocate and say, “What if the array is almost everywhere the same integer?”  Now we have a problem because I’ve lost quite a bit of detail with the above binary search.  I spent time calculating the bounds of where the value in the array could be found in both directions but I’m not returning it!

Here’s a better approach:

Pair<long> notFound(){
    return make_pair(-1,0);
}

Pair<long> findValue(int *_array, long _startIndex, long _endIndex, int _value){
    if( _array[ _startIndex ] == _value ){
        return make_pair( _startIndex, _endIndex - _startIndex );
    }
    else if( _array[ _endIndex ] == _value ){
        return make_pair( _endIndex, _endIndex - _startIndex );
    }

    long midpoint = ( _startIndex + _endIndex )/2;
    if( _array[ midpoint ] <= _value && _value < _array[ _endIndex ] ){
        return findValue( _array, midpoint, _endIndex, _value );
    }
    else if( _array[ _startIndex ] < _value && _value < _array[ midpoint ] ){
        return findValue( _array, _startIndex, midpoint, _value );
    }

    return notFound();
}

Now I’m returning the information I need to speed up the performance of the counting algorithm.  I do this by reusing a binary-esque search to count the difference between the last place the value exists and the location I found it in the previous algorithm.

long countBehind(int *_array, long _location, long _range, int _value){
    if( _array[ _location ] == _value && _array[ _location + _range ] == _value ){
        return _range;
    }

    long halfRange = _range/2;
    if( halfRange == 0 ){
        return 0;
    }
    else if( _value < _array[ _location + halfRange ] ){
        return countBehind( _array, _location, halfRange, _value );
    }
    else {
        return halfRange + countBehind( _array, _location + halfRange, halfRange, _value );
    }
}

Now I’ve got the count of values which follow behind that which I retrieved.  The reverse is just as easy to do.  All together:

template<long Length> Pair<long> findFirstCount(int *_array, int _value ){
    Pair<long> interval = findValue( _array, 0, Length - 1, _value );
    switch( interval.first ){
    case -1:
        return interval;
    case 0:
        return make_pair(0, 1 + countBehind( _array, interval.first, interval.second, _value ) );
    case (Length - 1):
        long back = countBehind( _array, interval.first - interval.second, interval.second, _value );
        return make_pair(Length - back, 1 + back);
    default:
        long front = countBehind( _array, interval.first, interval.second, _value );
        long back = countBehind( _array, interval.first - interval.second, interval.second, _value );
        return make_pair(interval.first - front, front + back + 1 );
    }
}

That’s a little template meta-programming to make listing the logic in a simple switch statement.  The variable “Length” can’t actually be variable.  If it were then I’d have to break up the switch into a bunch of if-else statements, then again if I also had a choice I’d be using Boost::Array.

Advertisements

2 comments on “Interview Questions

  1. Roger Pate
    March 19, 2010

    You might’ve missed the comment on that SO question, as it was posted the day after this, but std::equal_range reduces all the code here to a single, simple function call plus some iterator arithmetic.

    • owreese
      March 19, 2010

      Yup, that beats my implementation. Totally, completely beats it. It’s so beautiful. Now I have to sit here and ask myself why didn’t I think of this?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Information

This entry was posted on March 14, 2010 by in C++, Intervew Questions.
%d bloggers like this: