> Compiler is g++ 4.2.
Yikes. That compiler is ancient. I’ve met competitive programmers younger than that compiler. If you’re planning to do any serious C++ programming, you really should look into upgrading your toolchain, if possible.
#define BIG unsigned long long int
This is absolutely the wrong way to create a new type.
First, there is a trend in C++ to avoid the preprocessor completely. So, no `#define`s, no `#if`s, and so on. It is ALMOST completely unnecessary in the most recent version (C++20), but even in C++11 it is almost completely avoidable.

Rather than a `#define`, you should use a type alias:
using BIG = unsigned long long int;
But there’s one more problem. By convention, `ALL_CAPS` names are reserved for preprocessor macros… which basically means you should never use them for anything else. That means you should do:
using big = unsigned long long int;
Or, even better, choose a more descriptive name, like:
using big_uint = unsigned long long int;
But even better still would be to create your own namespace, and put everything—this type alias and all the Fibonacci functions—in there:
namespace your_ns {

using big_t = unsigned long long int;

int fib(int n) {
    // ... etc. ...
}

} // namespace your_ns
Now I know your focus is on the memoized version of the Fibonacci algorithm, but the non-memoized one is really problematic:
int fib(int n) {
if (n < 2) {
return n;
}
return fib(n-1) + fib(n-2);
}
Imagine what happens if you call `fib(5)`. Internally, that results in two calls, to `fib(4)` and `fib(3)`:
fib(5)
↓
fib(4) + fib(3)
Except… `fib(4)` also degrades to `fib(3)` and `fib(2)`, while `fib(3)` degrades to `fib(2)` and `fib(1)`:
fib(5)
↓
fib(4) + fib(3)*
↓
fib(3)* + fib(2) + fib(2) + fib(1)
As you can see, `fib(3)` is called twice. And it gets even worse, because `fib(3)` degrades to `fib(2)` and `fib(1)`:
fib(5)
↓
fib(4) + fib(3)
↓
fib(3) + fib(2)* + fib(2)* + fib(1)
↓
fib(2)* + fib(1) + fib(1) + fib(0) + fib(1) + fib(0) + 1
Now you can see `fib(2)` is called three times. Each one of those times becomes `fib(1)` and `fib(0)`.
fib(5)
↓
fib(4) + fib(3)
↓
fib(3) + fib(2) + fib(2) + fib(1)
↓
fib(2) + fib(1) + fib(1) + fib(0) + fib(1) + fib(0) + 1
↓
fib(1) + fib(0) + 1 + 1 + 1 + 1 + 1 + 1
So the net result of a single call to `fib(5)` is:

- 1 call to `fib(4)`
- 2 calls to `fib(3)`
- 3 calls to `fib(2)`
- 5 calls to `fib(1)`
- 3 calls to `fib(0)`

And that’s just `fib(5)`. Imagine how many repeated calls you’ll get with `fib(50)`. (In fact, the total number of calls grows exponentially: roughly 1.6× more calls for each increase of `n` by one.)
(This is obviously not such a big problem with the memoized version… but it’s still a problem! Every uncached call still recurses, and the recursion still goes `n` levels deep, which risks a stack overflow for large `n`.)
What you really want to look into is the tail-recursive Fibonacci algorithm. I’m not going to work it out for you, but I’ll give you the interface:
auto fib_impl(int n, big_t a, big_t b)
{
// ...
}
auto fib(int n)
{
// are you sure you don't want to check for negative numbers?
if (n < 2)
return big_t(n); // note: this line won't compile if you used BIG; this is why the preprocessor sucks
return fib_impl(n, 0, 1);
}
Now you don’t say what version of C++ you’re working with, but if it’s one of the more recent versions, then you can make all of this `constexpr`. That means that, when possible, the function may be computed at compile time, meaning it will have (effectively) zero run-time cost.
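To give you the flavor, here’s a minimal sketch (I’ve named it `fib_naive` just so it doesn’t collide with the functions above; the single-return form keeps it valid all the way back to C++11):

constexpr big_t fib_naive(int n)
{
    // the compiler can evaluate this whole call tree at compile time
    return n < 2 ? big_t(n) : fib_naive(n - 1) + fib_naive(n - 2);
}

// zero run-time cost: the 55 is computed during compilation
static_assert(fib_naive(10) == 55, "fib(10) should be 55");

Now, on to your memoized version: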
BIG fib_with_memo(int n) {
if (umap.find(n) == umap.end()) { // if F_n not in cache, calculate, cache, and return it
BIG val = fib_with_memo(n-1) + fib_with_memo(n-2);
umap[n] = val;
return val;
}
return umap[n]; // otherwise return the cached value
}
Now, since `umap` has no purpose outside of this function, it makes more sense to hide it within the function as a static variable. If you don’t know what function static variables are: they’re basically global variables, but only visible within the function.
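A quick, made-up illustration (`next_id` isn’t part of your code; it’s just to show the mechanics):

int next_id()
{
    static int counter = 0; // created and initialized once, on the first call
    return ++counter;       // ...and it keeps its value between calls
}

// next_id() returns 1, then 2, then 3, ...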
While you’re at it, you might as well replace:
umap[0] = 0;
umap[1] = 1;
with:
umap = std::unordered_map<int, big_t>{{0, 0}, {1, 1}};
Why? Because this will be MUCH more efficient. The first way means that each line first searches the map for the key, then creates the new pair. The second way doesn’t bother with the search, but rather just creates the two pairs directly.
So now your `fib_with_memo()` could look like this:
auto fib_with_memo(int n) {
// umap is constructed only once, the first time the function is called,
// and initialized with {0, 0} and {1, 1}.
static auto umap = std::unordered_map<int, big_t>{{0, 0}, {1, 1}};
// note everything else is exactly the same (except I replaced the BIG
// with auto)
if (umap.find(n) == umap.end()) { // if F_n not in cache, calculate, cache, and return it
auto val = fib_with_memo(n-1) + fib_with_memo(n-2);
umap[n] = val;
return val;
}
return umap[n]; // otherwise return the cached value
}
Another small fix is that once you’re in that `if` block, you know for a fact that `n` isn’t in the map. So rather than doing `umap[n]`, you can emplace the new pair directly:
if (umap.find(n) == umap.end()) {
auto val = fib_with_memo(n-1) + fib_with_memo(n-2);
umap.emplace(n, val);
return val;
}
Depending on your implementation, that might be more efficient.
Another possibly major optimization comes from the fact that you use `find()`… and then just discard the result. Why not use it? (The following uses C++17’s if-with-initializer syntax.)
if (auto p = umap.find(n); p == umap.end())
{
// we didn't find n
auto val = fib_with_memo(n-1) + fib_with_memo(n-2);
umap.emplace(n, val);
return val;
}
else
{
// we found n
return p->second;
}
So this is about where we’re at right now:
auto fib_with_memo(int n) {
    static auto umap = std::unordered_map<int, big_t>{{0, 0}, {1, 1}};

    if (auto p = umap.find(n); p == umap.end())
    {
        // we didn't find n
        auto val = fib_with_memo(n-1) + fib_with_memo(n-2);
        umap.emplace(n, val);
        return val;
    }
    else
    {
        // we found n
        return p->second;
    }
}
Now we get to the real problem: the same issue as with the non-memoized version, where you’re calling the same function over and over and over. The fix is the same, too: refactor:
auto fib_with_memo_impl(int n, big_t a, big_t b) {
    static auto umap = std::unordered_map<int, big_t>{{0, 0}, {1, 1}};

    if (auto p = umap.find(n); p == umap.end())
    {
        // we didn't find n
        auto val = big_t{}; // placeholder: the actual algorithm to calculate fib goes here
        umap.emplace(n, val);
        return val;
    }
    else
    {
        // we found n
        return p->second;
    }
}
auto fib_with_memo(int n) {
// are you sure you don't want to check for n < 0?
// trivial solutions, don't even bother diving into the calculation
if (n < 2)
return big_t(n);
return fib_with_memo_impl(n, 0, 1);
}
It’s up to you to implement the actual Fibonacci calculation there (you have `a` and `b` to work with). But otherwise, that should be a drop-in replacement for your current design, except orders of magnitude faster.
An alternative design
As you said:

> I used a `std::unordered_map` (hash table) because its lookup/insertion times are \$O(1)\$ (compared to the `std::map`, which has \$O(\log n)\$ for both).

Unfortunately, this thinking is misguided.
It turns out that, due to the way modern processors work—particularly with caching—big-O notation can be WILDLY misleading. I recall that the inventor of C++ himself was floored when someone showed him that a `std::list`, which has \$O(1)\$ insertion in the middle, is slower than a `std::vector`, which has \$O(n)\$ insertion in the middle, even for huge numbers of elements.
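If you’re curious, this is easy to demonstrate for yourself. Here’s a rough sketch of that kind of experiment (unscientific, and the numbers depend entirely on your machine; the key detail is that just finding the middle of a list is a cache-hostile pointer chase):

#include <chrono>
#include <iostream>
#include <iterator>
#include <list>
#include <vector>

// inserts `count` elements, one at a time, into the middle of the container
template <typename Container>
auto time_middle_inserts(int count)
{
    auto c = Container{};

    auto const start = std::chrono::steady_clock::now();
    for (auto i = 0; i < count; ++i)
    {
        auto mid = c.begin();
        std::advance(mid, c.size() / 2); // instant for a vector, a pointer-chase for a list
        c.insert(mid, i);
    }
    auto const stop = std::chrono::steady_clock::now();

    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}

int main()
{
    // bump the count up and the gap only grows
    std::cout << "vector: " << time_middle_inserts<std::vector<int>>(20'000) << " ms\n";
    std::cout << "list:   " << time_middle_inserts<std::list<int>>(20'000) << " ms\n";
}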
Forget all that theory crap, and instead take a step back and consider your actual problem. Imagine that you’ve called `fib_with_memo(10)`. Okay, now what would you expect to see in the memo-map?
It would be this, right?:
0 → 0
1 → 1
2 → 1
3 → 2
4 → 3
5 → 5
6 → 8
7 → 13
8 → 21
9 → 34
10 → 55
Now look at the keys. It’s just 0 to 10, right? In order, with no holes. Just like an index.
So… why do you need a map? You’re just “mapping” an index to a value. That’s just what a basic array, or a vector, does.
Let’s imagine that we’re using a vector for the memoizing, with the “key” just being the index. That means after memoizing `fib_with_memo(10)`, the vector would contain:
[index 0 ] → 0
[index 1 ] → 1
[index 2 ] → 1
[index 3 ] → 2
[index 4 ] → 3
[index 5 ] → 5
[index 6 ] → 8
[index 7 ] → 13
[index 8 ] → 21
[index 9 ] → 34
[index 10] → 55
So, after `fib_with_memo(10)`, the vector contains 11 elements, and the element at index `n`—or, `vec[n]`—is the nth Fibonacci number.
Simple, right?
So how do you tell whether a number is already memoized? Again, simple: just check the vector’s size. If we call `fib_with_memo(13)`, the vector must have at least 14 elements in it for `vec[13]` to be valid. So:
if (vec.size() > n)
return vec[n];
else
// calculate the nth fibonacci number
In fact, the implementation becomes trivial, too. That’s because if the function is called with an unmemoized `n`, all you need to do is resize the vector to `n + 1`, and fill in all the numbers between the old size and the new size. Something like:
auto fib_with_memo(int n) {
    // are you sure you don't want to check for n < 0?

    // trivial solutions, don't even bother diving into the calculation
    if (n < 2)
        return big_t(n);

    static auto vec = std::vector<big_t>{0, 1}; // start with fib(0) and fib(1)
    // note that if you really want, you could start with more than just
    // 0 and 1 - after testing your real-world use case, you could determine
    // that you will need, say, up to fib(8), so you could precompute that
    // much and put it in the vector initializer list above
    //
    // if you make the non-memoized fib function constexpr, then you could
    // even be *really* clever, and do something like:
    //     static auto vec = std::vector{fib(0), fib(1), fib(2), fib(3), ...
    // for as many precomputed values as you like.
    //
    // and then you could be *REALLY* clever (if you're using a more modern
    // version of c++), and do something like:
    //     static auto vec = [](auto initial_size) {
    //         auto v = std::vector<big_t>(std::max(2, initial_size + 1));
    //         v[0] = 0;
    //         v[1] = 1;
    //
    //         for (auto p = std::next(v.begin(), 2); p != v.end(); ++p)
    //             *p = *(p - 1) + *(p - 2);
    //
    //         return v;
    //     }(4); // 4 to calculate fib(0) through fib(4), but you can use any number
    // but that's getting *really* fancy.

    auto const index = std::size_t(n); // n is non-negative here, so this is safe

    // memoized, so use cached value
    if (vec.size() > index)
        return vec[index];

    // not memoized, so calculate all the values between what's been cached,
    // and what's desired
    auto const old_size = vec.size();
    vec.resize(index + 1);

    // fill in everything from vec[old_size] to vec[n] inclusive,
    // where vec[i] = vec[i - 1] + vec[i - 2]
    for (auto i = old_size; i < vec.size(); ++i)
        vec[i] = vec[i - 1] + vec[i - 2];

    return vec[index];
}
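As a quick sanity check (assuming you’ve put everything in `your_ns`, as suggested at the top):

#include <iostream>

int main()
{
    std::cout << your_ns::fib_with_memo(10) << '\n'; // prints 55
    std::cout << your_ns::fib_with_memo(50) << '\n'; // prints 12586269025
}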
Obviously I haven’t profiled this design, but I’d bet it is hundreds, if not thousands of times faster than a similar design using a hash map.
Questions
Difference between a macro and a type alias
When you do:
#define type long double
you’re talking to the preprocessor here… not the compiler.

The preprocessor is stupid (as in, unintelligent): it doesn’t understand C++ (often the same preprocessor is used for C, C++, and sometimes even other languages, like Objective-C or Fortran). It just deals with meaningless tokens. In this case, it just does a blind textual substitution. Everywhere `type` appears in the code, it gets literally replaced with `long double`—literally as if you had manually done a copy-paste—with no regard as to whether that makes sense.
To see where that causes problems, consider the following code:
struct thing
{
std::string name;
std::string type;
};
type value =
type(0);
Everywhere `type` appears gets replaced with `long double`, so…:
struct thing
{
std::string name;
std::string long double; // compile error
};
long double value =
long double(0); // compile error
Note that it didn’t just replace `type` when it was actually naming a type—which can still cause syntax errors, as it does for the second error above—it also replaced `type` when it was the name of a variable in the class. That’s why the preprocessor is so dangerous. It’s dumb. It’s indiscriminate. It blindly replaces everything, even when it makes no sense.
Now, when you do:
using type = long double;
you’re talking to the compiler here, not the preprocessor, and that makes all the difference.

This creates a type alias that the compiler understands. The compiler understands C++. The compiler doesn’t just blindly replace text: it knows when `type` is actually naming a type (and not, for example, a variable name), and it even knows which type it’s naming (because `namespace_1::type` may not be the same as `namespace_2::type`), and it does the replacement intelligently.
So when the compiler sees the original code:
struct thing
{
std::string name;
std::string type;
};
type value =
type(0);
it will not replace the `type` in class `thing`, because that’s not a type, it’s a variable name. And it will replace the `type` in the last two lines… but not simply by literally replacing the text (which would break the cast on the last line). Instead, it will do the replacement intelligently. `value` will be a `long double`, and the last line is casting a `0` (an `int`) to a `long double`… everything just works.
In summary, the preprocessor is dumb. It just does blind text replacement, with no regard for what that text actually means… which often results in breaking code. By contrast, the compiler actually understands C++, so it knows when something is actually a type and when it’s not—and even which type it is—so it can do type substitutions intelligently, and not break code.
Refactored code
The standard on CodeReview is not to re-review code in the same question, but rather to post a new review request with the new code. So I can’t re-review your refactored code here. BUT! What I can do is give you suggestions to help you in your refactoring, and in general.
The number one most important thing you can do when coding, not just in this case, but always, is WRITE TESTS. This is the absolute most important skill you could ever develop as a programmer. If I were hiring, and I had two candidates, and one wrote tests and the other didn’t… I wouldn’t even look at the second. I wouldn’t even give them a shot, regardless of how “good” a coder they might be. A coder that doesn’t write tests would be worse than useless to me… yes, worse than useless, because they’d actually be a net maintenance burden. I’d literally rather hire nobody than hire someone who doesn’t write tests.
Even if you’re just doing hobby programming, you should get in the habit of writing tests. I swear it will change your life. It makes everything so much easier. And you no longer have to wonder “does my code work?” You’ll just know. And if someone points out a bug you missed… no problem, just add a new test case that catches the bug, then fix it, and now you’re even more sure your code works.
Testing is so important, that I even recommend writing the tests before you write the first line of “real” code.
The first thing you should do is look into a proper testing system. I use Boost.Test but… it’s extremely heavyweight. (I mean, it’s awesome—by far the most powerful testing system—but, yeah, it’s not trivial to set up.) You might like Catch2. GoogleTest is another option but… meh.
The easiest way to set up Catch2 (other than using your system package manager, if you’re on Linux) is to download the two files mentioned at the start of the tutorial—`catch_amalgamated.hpp` and `catch_amalgamated.cpp`—and just put them in the same directory as your project. In that directory you should have (at least) `fibonacci.hpp`, `fibonacci.cpp`, `fibonacci.test.cpp`, `catch_amalgamated.hpp`, and `catch_amalgamated.cpp`:
- `fibonacci.hpp`: your header
- `fibonacci.cpp`: your implementation
- `fibonacci.test.cpp`: your tests
- `catch_amalgamated.hpp`: the Catch2 header
- `catch_amalgamated.cpp`: the Catch2 code
and when you compile, you need to make sure to also compile the Catch2 code:
# only need to do this one time, which is good, because it will be slow
$ g++ --std=c++17 --pedantic -Wall -Wextra -c catch_amalgamated.cpp
# should do these every time
$ g++ --std=c++17 --pedantic -Wall -Wextra -c fibonacci.test.cpp
$ g++ --std=c++17 --pedantic -Wall -Wextra -c fibonacci.cpp
$ g++ --std=c++17 --pedantic -Wall -Wextra -o fibonacci.test fibonacci.test.o fibonacci.o catch_amalgamated.o
$ ./fibonacci.test
Or, better yet, put all those commands in a shell script or makefile. Also, you could use `--std=c++11` or `--std=c++14` or even `--std=c++20`, depending on what version of C++ you want to use.
Here’s a simple build script you could use:
#!/bin/sh
# Stop immediately if something fails.
set -e
# Only recompile Catch2 if we have to.
if [ ! -f catch_amalgamated.o ]
then
printf '%s\n' 'g++ --std=c++17 --pedantic -Wall -Wextra -c catch_amalgamated.cpp'
g++ --std=c++17 --pedantic -Wall -Wextra -c catch_amalgamated.cpp
fi
# Compile everything else.
printf '%s\n' 'g++ --std=c++17 --pedantic -Wall -Wextra -c fibonacci.test.cpp'
g++ --std=c++17 --pedantic -Wall -Wextra -c fibonacci.test.cpp
printf '%s\n' 'g++ --std=c++17 --pedantic -Wall -Wextra -c fibonacci.cpp'
g++ --std=c++17 --pedantic -Wall -Wextra -c fibonacci.cpp
# Link.
printf '%s\n' 'g++ --std=c++17 --pedantic -Wall -Wextra -o fibonacci.test fibonacci.test.o fibonacci.o catch_amalgamated.o'
g++ --std=c++17 --pedantic -Wall -Wextra -o fibonacci.test fibonacci.test.o fibonacci.o catch_amalgamated.o
Put that in `build.sh`, make it executable, and you can just do:
$ ./build.sh
$ ./fibonacci.test
That will speed up your workflow, and anything that speeds up your workflow is a good thing. (Even better than a build script would be a proper makefile, but that’s too complicated for here.)
The test file doesn’t need to be complex:
// fibonacci.test.cpp
#define CATCH_CONFIG_MAIN
#include "catch_amalgamated.hpp"
#include <random>
#include "fibonacci.hpp"
// You should always test important edge cases:
TEST_CASE("Fibonacci(0) memoized is 0")
{
CHECK(your_ns::fib_with_memo(0) == 0);
}
TEST_CASE("Fibonacci(1) memoized is 1")
{
CHECK(your_ns::fib_with_memo(1) == 1);
}
// Doesn't hurt to test a few random cases:
TEST_CASE("Fibonacci(10) memoized is 55")
{
CHECK(your_ns::fib_with_memo(10) == 55);
}
TEST_CASE("Fibonacci(20) memoized is 6765")
{
CHECK(your_ns::fib_with_memo(20) == 6765);
}
// Test the *property* of Fibonacci numbers:
TEST_CASE("Fibonacci(n) memoized is Fibonacci(n - 1) + Fibonacci(n - 2)")
{
constexpr auto lower_limit = 5;
constexpr auto upper_limit = 50; // or maybe as high as 90-ish, if you like
constexpr auto number_of_trials = 3;
auto rng = std::mt19937{std::random_device{}()};
auto dist = std::uniform_int_distribution<int>{lower_limit, upper_limit};
for (auto trial = 1; trial <= number_of_trials; ++trial)
{
// random number between lower and upper limit, inclusive
auto const n = dist(rng);
DYNAMIC_SECTION("n = " << n)
{
auto const fib_n = your_ns::fib_with_memo(n);
auto const fib_n_minus_1 = your_ns::fib_with_memo(n - 1);
auto const fib_n_minus_2 = your_ns::fib_with_memo(n - 2);
INFO("fib(n) = " << fib_n);
INFO("fib(n - 1) = " << fib_n_minus_1);
INFO("fib(n - 2) = " << fib_n_minus_2);
CHECK(fib_n == (fib_n_minus_1 + fib_n_minus_2));
}
}
}
Test cases should be as simple as possible. Indeed, the last case is pretty complex for a test case, in my opinion, but most of it is just generating random values for `n` and recording the intermediate results so you can see exactly where things went wrong.
At first this shouldn’t compile, because `fib_with_memo()` shouldn’t exist yet (assuming you’re writing the tests first, which you normally should). So you just write some placeholders:
// fibonacci.hpp
#ifndef INCLUDE_GUARD_fibonacci
#define INCLUDE_GUARD_fibonacci
namespace your_ns {
using big_t = unsigned long long int;
big_t fib_with_memo(int);
} // namespace your_ns
#endif // include guard
// fibonacci.cpp
#include "fibonacci.hpp"
namespace your_ns {
big_t fib_with_memo(int)
{
return {};
}
} // namespace your_ns
Now it should compile and run… and generate a bunch of test case failures. But that’s good! A failing test case means that at least something is being checked, and it probably means that a bug was found.
I got 2 passes and 3 failures. The 2 passes are because `fib_with_memo(0)` returns `0`… which by fluke happens to be correct, and because `fib_with_memo(anything)` returns `0`, which means `fib_with_memo(anything) == fib_with_memo(anything - 1) + fib_with_memo(anything - 2)` is always true. This happens sometimes as you’re developing: broken code happens to return the “right” answer… or at least an answer that “works”. Other tests should catch that problem, though.
So now you start to develop your Fibonacci algorithm. Let’s say as a first step, you do:
big_t fib_with_memo(int n)
{
if (n < 2)
return n;
return {};
}
Compile and run… now 3 tests pass, and 2 fail. Progress!
This is how you do it. Take tiny steps, and check as you go. It may sound tedious, but once you get in the habit, it becomes automatic.
So the next thing I did was just copy-paste your refactored code into `fib_with_memo()` (I also needed to `#include <vector>`, and I needed to rename `big` to `big_t`, because I hadn’t noticed I’d got the name wrong), and…
ALL TESTS PASSED!
Congratulations! 👍 Check it in!
So your refactored code works… but… can it be improved? Well, that’s a question for another code review. Catch2 also supports benchmarking, so you can even do performance checks in your testing code, and as you make changes, you can see if things speed up or not.
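For example, a benchmark case can be as small as this (a sketch; with the Catch2 v3 amalgamated files, the `BENCHMARK` macro is available out of the box):

TEST_CASE("Fibonacci benchmarks")
{
    BENCHMARK("fib_with_memo(30)")
    {
        // returning the value keeps the compiler from optimizing the call away
        return your_ns::fib_with_memo(30);
    };
}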
But the point is, if you adopt this style of development—testing, testing, testing, testing—you will write better code, quicker, and when it comes time to refactor, you can do so with confidence because if you “break” anything, the tests will catch it immediately. It’s such a nice way of coding that once you get in the habit, you can’t quit; you feel naked and unsteady when you’re trying to code without tests.
Some other tips:
- I suggested putting the memo map (or vector) in the function as a `static` variable, and that’s definitely what you should do for the final code. However, while developing, you could pull it out and make it a global, so you can inspect what’s in the map/vector directly, and make sure it’s working properly. It’s okay to break stuff while you’re working on it, as long as you remember to put things back in order when developing/testing is done.
- If you do pull the memo map/vector out and make it a global (for development only), then you can observe its capacity to make sure it’s not wasting space. You can also use a tracing allocator to track allocations, and see if you’re losing efficiency there.
A simple tracing allocator is:
#include <cstddef>     // std::size_t
#include <iostream>
#include <memory>      // std::allocator
#include <type_traits> // std::true_type

template <typename T>
class tracing_allocator
{
public:
    using value_type = T;

    // not needed in C++17
    using propagate_on_container_move_assignment = std::true_type;

    auto allocate(std::size_t n)
    {
        // note: n is the number of objects requested, not the number of bytes
        std::cout << "allocating space for " << n << " objects\n";
        return std::allocator<T>{}.allocate(n);
    }

    auto deallocate(T* p, std::size_t n)
    {
        std::cout << "deallocating address " << static_cast<void*>(p) << ", which held " << n << " objects\n";
        return std::allocator<T>{}.deallocate(p, n);
    }
};

template <typename T1, typename T2>
constexpr auto operator==(tracing_allocator<T1> const&, tracing_allocator<T2> const&) noexcept
{
    return true;
}

// not needed in C++20
template <typename T1, typename T2>
constexpr auto operator!=(tracing_allocator<T1> const&, tracing_allocator<T2> const&) noexcept
{
    return false;
}
And you’d use it by declaring your vector as:
static std::vector<big_t, tracing_allocator<big_t>> mem = {0, 1};
And now you will get trace statements printed to standard output for every allocation (and deallocation). This will help you observe how your memo map/vector is growing, and whether there are unnecessary/overlarge allocations happening.
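For example, if the trace shows the vector reallocating several times on its way to a large `n`, you could grab all the space in a single allocation up front (the `128` here is a made-up number; size it from your measured real-world usage):

static auto vec = [] {
    auto v = std::vector<big_t, tracing_allocator<big_t>>{};
    v.reserve(128); // one allocation up front, instead of several as it grows
    v.push_back(0); // fib(0)
    v.push_back(1); // fib(1)
    return v;
}();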