7
\$\begingroup\$

Given

void foo(std::vector<std::string>& vec, size_t target, std::mt19937& rng)
{
    std::uniform_int_distribution dist;
    while(vec.size() < target)
    {
        auto x = std::to_string(dist(rng));
        std::cout << x << '\n';
        vec.insert(vec.begin(), x);
    }
}

I want to rewrite this such that it

  1. Doesn't change the semantics, including the values of pseudo-random x
  2. Doesn't insert into the front of vec many times, because that's horrible
  3. Isn't overly complicated language-wise, because it will be read by the original author who isn't an expert in C++
  4. Preferably doesn't do index arithmetic, because that is very error prone

The best I can think of is

void bar(std::vector<std::string>& vec, size_t target, std::mt19937& rng)
{
    if(vec.size() >= target)
        return;

    auto n = target - vec.size();
    std::uniform_int_distribution dist;

    std::vector<std::string> xs;
    xs.reserve(n);
    std::generate_n(std::back_inserter(xs), n, [&]{
        auto x = std::to_string(dist(rng));
        std::cout << x << '\n';
        return x;
    });

    vec.insert(vec.begin(),
               std::make_move_iterator(xs.rbegin()),
               std::make_move_iterator(xs.rend()));
}

But I have to admit, even though I know C++ well, bar looks way worse than foo. It's absolutely horrendous for someone who doesn't. Is there a better way?

\$\endgroup\$
0

5 Answers 5

8
\$\begingroup\$

If you need to insert at one end and read from the other, use a std::dequeue.

If you genuinely need a std::vector, store the array values in reverse order. Add new elements to the end, in constant time, and read them back with a reverse iterator. All the calls to the PRNG happen in the same sequence, so you get the side-effects you wanted.

If you genuinely need a std::vector stored in that exact order in memory, maybe because you’re going to iterate over it for SIMD: build the array in reverse order, as before, then reverse the array in O(N) time.

\$\endgroup\$
3
  • \$\begingroup\$ You might find dequeue to be slower than vector even for tasks like this. Definitely time it in your use case, don’t assume it’s the better choice. \$\endgroup\$ Commented Feb 21 at 13:35
  • 1
    \$\begingroup\$ @CrisLuengo If dequeue has a block implementation, that would depend on the number and size of blocks. Another data structure that would work well for a FIFO is: implement as a vector, add elements to the end, and consume them from the front by incrementing the begin() pointer. Both operations take constant time, so long as you reserve sufficient memory. When you run out of space and need to reallocate, you then move all elements back to the start of the block, reclaiming the storage of the consumed elements. In many cases, you will empty the FIFO queue, allowing you to start over. \$\endgroup\$ Commented Feb 21 at 14:29
  • \$\begingroup\$ While possible, using a different type is a much less local change (vec is passed around a lot) and I hesitate to do that. \$\endgroup\$ Commented Feb 24 at 3:06
5
\$\begingroup\$

So there are three properties you'd ideally have:

  1. A simple syntax to insert things in reverse order into a container.
  2. Have side effects executed in the expected order.
  3. Not have unnecessary overhead, such as repeated single insertions or using intermediate storage.

Currently, you cannot do 2 and 3 at the same time with a std::vector. It would require something like insert_range_reversed() to do this, and I don't see this ever being added to the standard library. So, you have to settle for one of the possible compromises:

  • Use a temporary container. This increases memory usage and adds some CPU overhead, but at least it's all linear complexity. This is your bar().
  • Insert default-constructor values at the start of the vector, then overwrite them in reverse order. This might still be possible without index operations.
  • Use something other than std::vector to store your strings in. std::deque has already been suggested.

Note that depending on the type of value stored in the container, each of these solutions might cause different side effects.

As for the first property: you can always make a nice generic functions that abstracts away what you want.

For a container with std::string, I think inserting default-constructed elements at the start of vec would be the most efficient solution, as that won't have any side effects of itself. A generic function that prepends elements to another container using a generator function could look like this:

template<typename R, typename F>
auto prepend_reversed(R& container, std::size_t count, F&& generator)
{
    container.insert(std::begin(container), count, {});
    auto first = std::rend(container);
    std::advance(first, -count);
    std::generate_n(first, count, std::forward<F>(generator));
}

Then call it like so:

void bar(std::vector<std::string>& vec, size_t target, std::mt19937& rng)
{
    …
    prepend_reversed(vec, n, [&]{
        auto x = std::to_string(dist(rng));
        std::cout << x << '\n';
        return x;
    });
}
\$\endgroup\$
5
  • \$\begingroup\$ This doesn't insert the elements in reverse order, which is a requirement \$\endgroup\$ Commented Feb 21 at 10:10
  • \$\begingroup\$ Oh! I totally missed that. let me think how to handle that... \$\endgroup\$ Commented Feb 21 at 10:12
  • \$\begingroup\$ You can do 2 and 3 at the same time: reverse the array in-place, append, reverse again. This has already been suggested by Davislor. \$\endgroup\$ Commented Feb 21 at 16:39
  • \$\begingroup\$ @CrisLuengo That's likely way more expensive. I would only do that if the value type would not have a default constructor or if it had undesirable side effects. \$\endgroup\$ Commented Feb 21 at 17:07
  • 1
    \$\begingroup\$ Oh, it seems that I hit upon the same method as you in my answer, albeit less neatly. One thing I did differently is to clean up the unoverwritten empty strings if the generator throws. \$\endgroup\$ Commented Feb 22 at 17:27
5
\$\begingroup\$

The initial function can be optimized simply:

  1. Immediately bail out if no work is to be done.
  2. .reserve() the target size.
  3. std::move() the strings into the container.
  4. Reverse container, insert at the end, reverse container.
void foo(std::vector<std::string>& vec, size_t target, std::mt19937& rng)
{
    if (vec.size() >= target)
        return;

    std::uniform_int_distribution dist;
    vec.reserve(target);
    std::reverse(vec.begin(), vec.end());

    while(vec.size() < target)
    {
        auto x = std::to_string(dist(rng));
        std::cout << x << '\n';
        vec.emplace_back(std::move(x));
    }

    std::reverse(vec.begin(), vec.end());
}

I leave fixing the exception behavior (the final reverse) to the interested reader.

\$\endgroup\$
5
  • \$\begingroup\$ You fell into the same trap I did: OP's code actually reverses the order of the elements inserted at the start of the original vector, whereas your solution with std::rotate() does not. \$\endgroup\$ Commented Feb 21 at 22:38
  • \$\begingroup\$ @G.Sliepen Better two reverses. \$\endgroup\$ Commented Feb 21 at 22:42
  • \$\begingroup\$ Yes. But that causes each element to be moved one more time than necessary. \$\endgroup\$ Commented Feb 22 at 9:28
  • 1
    \$\begingroup\$ Well, this solution swaps each two original elements twice, and each two new elements once, for old+new/2 swaps and new move-constructions. Add old move-constructions for reallocation if needed. Yours moves each original element and each new element once, for old+new moves and new copy constructions, regardless of reallocation. Without reallocation I don't think it is actually worse, with reallocation depends... A move is generally more expensive than a straight swap, or move-construction. \$\endgroup\$ Commented Feb 22 at 16:02
  • \$\begingroup\$ Your comment on the cost is very helpful, because it isn't obvious that two reverses aren't expensive, especially that moves are more (not less!) expensive then swaps. \$\endgroup\$ Commented Feb 24 at 3:25
5
\$\begingroup\$

Both functions have poor names. Both assume some prelude, which should have been included in the review:

#include <iostream>
#include <random>
#include <string>
#include <vector>

using std::size_t;

Personally, I'd remove that using, and just write the type in full, like the other standard library types.


Both functions are pretty inflexible about the exact types of container and random number generator. That's probably okay for now, until there's a proven need in the application to use other types.


foo() is somewhat inefficient because each insert() moves every existing element (even though a good compiler can deduce that can be implemented with the equivalent of std::memmove(), we can do it a single time as bar() shows). What makes this worse is that although we know the eventual size we'll reach, we don't reserve() before we start.

bar() is inefficient because we need to allocate memory for xs and have both vectors occupying memory simultaneously.


bar() is not equivalent to foo() because if any of the output operations throws, then the vector does not contain the results of the successful invocations whose side-effects have occurred. To fix this, we need a catch that inserts the partial results:

    vec.reserve(target);  // so that vec.insert() doesn't fail

    std::vector<std::string> xs;
    xs.reserve(n);
    try {
        std::generate_n(std::back_inserter(xs), n, [&dist, &rng]{
            auto const x = std::to_string(dist(rng));
            std::cout << x << '\n';
            return x;
        });
    } catch (...) {
        vec.insert(vec.begin(),
                   std::make_move_iterator(xs.rbegin()),
                   std::make_move_iterator(xs.rend()));
        throw;
    }

    vec.insert(vec.begin(),
               std::make_move_iterator(xs.rbegin()),
               std::make_move_iterator(xs.rend()));

If performance is important, then a better implementation would insert a bunch of empty strings at the start, shifting up the elements just once (this is likely more efficient than reversing the vector, as the string objects can be memmoved as one), and overwrite the empty strings. In the event of an exception, we'll need to move the content back appropriately.

That would look like this:

#include <iterator>

void prepend_randoms(std::vector<std::string>& vec, std::size_t target, std::mt19937& rng)
{
    if (vec.size() >= target) {
        // nothing to do
        return;
    }

    auto dist = std::uniform_int_distribution{};

    // Add empty elements to start
    vec.reserve(target);
    auto const n = target - vec.size();
    vec.insert(vec.begin(), n, {});

    for (auto it = vec.rend() - n;  it != vec.rend();  ++it) {
        try {
            auto const x = std::to_string(dist(rng));
            std::cout << x << '\n';
            *it = x;
        } catch (...) {
            vec.erase(vec.begin(), it.base());
            throw;
        }
    }
}

This is still not as simple and readable as the original, but it minimises computational complexity, requires no extra storage, and retains the same semantics even in the face of exceptions.

\$\endgroup\$
2
  • 1
    \$\begingroup\$ The exception issue is very true and skipped my mind, I definitely have to revisit it. I question whether reserve is correct though. You are overriding the std::vector reallocating algorithm even though the library has all the information it needs upon reallocation, i.e. it knows that insert needs space for n more elements. Typically reserve makes sense only when the library doesn't have the information, e.g. repeated push_back. \$\endgroup\$ Commented Feb 24 at 3:14
  • \$\begingroup\$ @友人A, good point about reserve() - it's helpful in foo() but redundant in my final code. Thanks for pointing that out. \$\endgroup\$ Commented Feb 24 at 7:20
4
\$\begingroup\$

I don't think you can get much better than bar. What you can do is split it apart, so that at the top level you have meaningful names.

std::vector<std::string> generate_random_strings(size_t count, std::mt19937& rng)
{
    std::vector<std::string> result;
    result.reserve(count);
    std::generate_n(std::back_inserter(result), count, [&]{
        return std::to_string(dist(rng));
    });
    return result;
}

void print_strings(const std::vector<std::string>& strings)
{
    for (auto& str : strings)
    {
         std::cout << str << '\n';
    }
}

void prepend_reversed(std::vector<std::string>& to, std::vector<std::string> from)
{
    to.insert(to.begin(),
              std::make_move_iterator(from.rbegin()),
              std::make_move_iterator(from.rend()));
}

void bar(std::vector<std::string>& vec, size_t target, std::mt19937& rng)
{
    if(vec.size() >= target)
        return;

    auto xs = generate_random_strings(target - vec.size(), rng);
    print_strings(xs);
    prepend_reversed(vec, std::move(xs));
}
\$\endgroup\$
2
  • \$\begingroup\$ I think you can do better than bar() - see my answer for avoiding creation of the vector of results, and for inserting (only) the successful ones when printing (or string creation) throws. \$\endgroup\$ Commented Feb 22 at 17:20
  • \$\begingroup\$ @TobySpeight better by the posed criteria, which includes being understood by the author of the initial version \$\endgroup\$ Commented Feb 23 at 15:15

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.