Since you’re primarily interested in efficiency, that’s what I’m going to focus on in my review.
However, first I do need to point out that `int32_t main()` is not portable C++ — it only compiles when `int32_t` happens to be a typedef for `int` on your platform. There are only two portable forms of `main()`: `int main()` and `int main(int, char*[])`.
I should also mention that a message queue is probably better done as a class, with the mutex and other implementation details as private data members, rather than as global variables, and details like notifying the condition variable handled automatically. But that isn’t a performance issue, just a design one.
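To make that design suggestion concrete, here is a minimal sketch of what the class might look like. The name `message_queue` and its `push()`/`pop()` interface are my own illustration, not anything from your code — the point is just that the mutex and condition variable become private members, and `push()` notifies automatically:

```cpp
#include <condition_variable>
#include <list>
#include <mutex>
#include <utility>
#include <vector>

// Illustrative sketch only: the synchronization details are hidden inside
// the class, so callers can't forget to lock or to notify.
class message_queue
{
public:
    void push(std::vector<char> msg)
    {
        {
            std::scoped_lock lock{mu_};
            queue_.push_back(std::move(msg));
        } // unlock before notifying, so the woken thread isn't blocked
        cv_.notify_one();
    }

    std::vector<char> pop()
    {
        std::unique_lock lock{mu_};
        cv_.wait(lock, [this] { return not queue_.empty(); });
        auto msg = std::move(queue_.front());
        queue_.pop_front();
        return msg;
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    std::list<std::vector<char>> queue_;
};
```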
```cpp
std::list<std::vector<char>> msg_queue;
```
Using a `std::list` for the actual queue is an odd choice. It’s not wrong; it’ll work perfectly fine. `std::list` is not the most performant container, but its inefficiencies will be absolutely dwarfed by all the thread context switching and locking/unlocking.
Normally for a message queue like this you’d use a `std::vector`, or maybe a `std::deque`. Or sometimes you’d actually use a `std::array` (or a `std::vector` with a fixed size), and either treat it like a ring buffer (so if there are too many unhandled messages, older messages get lost) or block any new messages until some older messages are handled.
As I said, `std::list` isn’t wrong. It’ll work. I just don’t see any benefits to using it, because at least in the example code, you don’t really benefit from iterator stability (which is the primary reason to prefer a `std::list`), and you’ll pay a lot more for allocations and cache misses.
```cpp
char buff[2048];
```

Is there a reason you use a fixed-size buffer? I don’t see any performance gains from it, and you’ll just silently drop any message of 2 KiB or more. I’ll offer alternative suggestions later.
```cpp
while(true) {
```

You have a bug in your program in that the loops are infinite. (To be precise: infinite loops are UB only if they have no observable side effects, and these loops do have side effects, like printing to standard output and locking mutexes. But there is still no way to end the program, and — more importantly, because it’s the subject of the review — the message queue. If you tried to put this message queue as-is in an existing program, like the example one, it would never end; it would deadlock waiting for the thread to join, while the thread loops and waits on the condition variable.)
What you really need is a way for the `recv()` function to realize the program is ending. For example, you could use a `"done"` message as a signal to wind things up:
```cpp
// in recv()
while (true)
{
    // ... [snip] ...

    if (std::strcmp(buff, "done") == 0) // buff is null-terminated; needs <cstring>
        break;
}
```

```cpp
// in main()
for (auto n = 0; n < 10000; ++n) // instead of while (true)
{
    // ... [snip] ...
}

// artificial scope
{
    auto lock = std::scoped_lock(mu);
    msg_queue.push_back({'d', 'o', 'n', 'e'}); // emplace_back() can't deduce a braced list
}
cv.notify_one();

th.join();
```
It doesn’t need to be a special message to signal the end, but if you use something else, like a flag, then you might need to be sure you finish handling all the messages before ending the `recv()` loop, and you need to consider what happens if the flag is set while the queue is empty (and the condition variable is waiting on it to not be empty).
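A sketch of what that flag-based approach might look like (the `done` flag and the exact wakeup condition are my own illustration): the predicate waits on “queue not empty *or* shutting down”, and the loop drains any remaining messages before returning:

```cpp
#include <condition_variable>
#include <list>
#include <mutex>
#include <utility>
#include <vector>

std::mutex mu;
std::condition_variable cv;
std::list<std::vector<char>> msg_queue;
bool done = false; // hypothetical shutdown flag; set under the mutex, then notify

// Sketch: wake on either a new message or shutdown, and keep handling
// messages until the queue is drained AND the flag is set.
void recv()
{
    while (true)
    {
        std::unique_lock<std::mutex> ul{mu};
        cv.wait(ul, [] { return done or not msg_queue.empty(); });

        if (msg_queue.empty()) // done was set and nothing is left to handle
            return;

        auto msg = std::move(msg_queue.front());
        msg_queue.pop_front();
        ul.unlock();

        // ... handle msg ...
    }
}
```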
```cpp
while(msg_queue.empty())
    cv.wait(ul);
```

Rather than a manual loop like this, it’s cleaner and less likely to cause problems if you use the other form of `wait()` that takes a predicate:

```cpp
cv.wait(ul, [] { return not msg_queue.empty(); });
```
This removes the temptation for some later code monkey to fool with the loop in some way. Raw loops are a code smell, because you should generally prefer algorithms. They’re fine in the use cases here, but you never know what some future coder might think to do. Also, this reads much clearer, even to someone who doesn’t understand condition variables (who might ask, “why a loop and a call to a wait function?”).
```cpp
std::vector<char>& queue_msg = msg_queue.front();
if(queue_msg.size() < sizeof(buff)) {
    std::memcpy(buff, queue_msg.data(), queue_msg.size());
    buff[queue_msg.size()] = '\0';
}
msg_queue.pop_front();
```
Okay, here’s where the biggest inefficiencies lie. I don’t see a reason—at least in this code—why you copy the message from the message queue into a buffer, then delete the message from the queue. Why not just take the message right from the queue and use it?
In other words, why not do this:
```cpp
void recv()
{
    while (true)
    {
        std::unique_lock<std::mutex> ul { mu };
        cv.wait(ul, [] { return not msg_queue.empty(); });

        // *TAKE* the message from the queue. This will be *LIGHTNING* quick;
        // possibly *thousands* of times faster than copying the message into a
        // buffer.
        auto queue_msg = std::move(msg_queue.front());

        // Remove it from the list.
        msg_queue.pop_front();

        // No need to hold the lock anymore.
        ul.unlock();

        // Now you have the message as queue_msg, and you can do whatever you
        // want with it. For example, print it:
        std::cout.write(queue_msg.data(), queue_msg.size());
        std::cout << '\n';
    }
}
```
But that’s just the tip of the performance iceberg. It’s very wasteful to do all that locking and so on for ONE message at a time. What if there are several messages in the queue? Why not handle them all at once?
For example:
```cpp
void recv()
{
    while (true)
    {
        std::unique_lock<std::mutex> ul { mu };
        cv.wait(ul, [] { return not msg_queue.empty(); });

        // Take the ENTIRE QUEUE. Even if there are multiple messages, this
        // could still be thousands of times faster than copying a SINGLE
        // message into a buffer, and that's not even taking into account the
        // gains you get from avoiding multiple locking and unlocking and
        // waiting cycles.
        auto messages = std::vector<std::vector<char>>{};
        messages.resize(msg_queue.size());                               // make space for the messages
        std::move(msg_queue.begin(), msg_queue.end(), messages.begin()); // move them all out of the queue
        msg_queue.clear();                                               // clear the (now moved-from) messages

        // No need for the lock anymore.
        ul.unlock();

        // Now we can handle all the messages at our leisure.
        std::for_each(messages.begin(), messages.end(),
            [](auto&& message)
            {
                std::cout.write(message.data(), message.size());
                std::cout << '\n';
            });
    }
}
```
As an optimization, you could also move `messages` out of the loop and call `messages.clear()` at the end of the loop. The advantage of this is that `messages.resize()` will almost never need to allocate, because you can reuse the memory. That means that the entire block of code in the middle of the function above will amount to a few measly pointer copies… which will be FAST… far faster than copying entire strings, even with `std::memcpy()`. And that in turn means much less time spent with the mutex locked, which means you should be able to get a MUCH higher throughput for your messages.
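To show the capacity-reuse idea in an easily testable form, here is a sketch with the batch-draining step pulled into a helper (the name `drain_batch` and the free-standing globals are my own framing; the condition-variable wait is elided for brevity). The caller keeps `messages` alive across calls, so its allocation is reused:

```cpp
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <iterator>
#include <list>
#include <mutex>
#include <vector>

std::mutex mu;
std::list<std::vector<char>> msg_queue;

// Hypothetical helper: drains the whole queue into `messages`, prints each
// message, and clears `messages` while RETAINING its capacity, so repeated
// calls almost never allocate. Returns the number of messages handled.
std::size_t drain_batch(std::vector<std::vector<char>>& messages)
{
    {
        std::scoped_lock lock{mu};
        messages.resize(msg_queue.size());
        std::move(msg_queue.begin(), msg_queue.end(), messages.begin());
        msg_queue.clear();
    } // mutex held only for a few pointer moves

    for (auto const& message : messages)
    {
        std::cout.write(message.data(), message.size());
        std::cout << '\n';
    }

    auto const n = messages.size();
    messages.clear(); // size goes to zero; capacity is kept for the next batch
    return n;
}
```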