22

I wanted to index substrings of a std::string whose lifetime lasts for the entire program, and relate each substring to an int using a good old std::map. The substrings are delimited by blank spaces ' '. I generated the string and stored it to a file using this python code

import random
all_chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ 123456789!\"·$%&/()=?¿¡'|@#¬"
for i in range(0, 50000000):
    r = int(random.random() * len(all_chars))
    print(all_chars[r], end = '')

So, I wanted to compare the efficiency to split a std::string between using a std::string and std::string_view to store the substrings via these two (simple) functions.

[[nodiscard]]
std::vector<std::string_view> split_sv(const std::string& s, char c) noexcept {
    std::vector<std::string_view> res;
    std::size_t b = 0;
    std::size_t p = s.find(c);
    while (p != std::string_view::npos) {
        res.push_back( std::string_view{&s[b], p - b} );
        b = p + 1;
        p = s.find(c, b);
    }
    return res;
}

[[nodiscard]]
std::vector<std::string> split_s(const std::string& s, char c) noexcept {
    std::vector<std::string> res;
    std::size_t b = 0;
    std::size_t p = s.find(c);
    while (p != std::string::npos) {
        res.push_back( s.substr(b, i - b) );
        b = p + 1;
        p = s.find(c, b);
    }
    return res;
}

The split algorithm is clearly faster for std::string_view than it is for std::string (as expected):

std::string_view  28.6 ms
std::string       107.6 ms

(flags -std=c++20 -O3, gcc 11.4.0)

However, storing these substrings into a std::map is slower for std::string_view than for std::string.

{
    const std::vector<std::string_view> tokens = split(s, ' ');
    std::map<std::string_view, int> m;
    for (const std::string_view& t : tokens) {
        m.insert( {t, 0} );
    }
}
{
    const std::vector<std::string> tokens = split(s, ' ');
    std::map<std::string, int> m;
    for (const std::string& t : tokens) {
        m.insert( {t, 0} );
    }
}

The execution time of only the for loops above is:

std::string_view  445.9 ms
std::string       411.7 ms

(flags -std=c++20 -O3, gcc 11.4.0)

The difference is not that large, but I expected the results for std::string_view to be faster in both portions of the program.

Why is std::map<std::string, int> faster in gcc 11.4.0? Is std::map<std::string, int> generally faster than std::map<std::string_view, int>?

Full C++ code in compiler explorer and also in quick-bench.

I've run the code with 3 different string seeds each with a different number of blank spaces (NBS): one with just one single blank space, one with 10 blank spaces, and another with 20 blank spaces. The table below shows the execution time (in seconds) for every different seed string. I thought it would be interesting since the quick-bench results do not agree with this.

NBS Number tokens/
Average split
length (chars)
Function map

string_view
map

string
unordered_map

string_view
unordered_map

string
1 617864 / 83.97 split 0.010 0.153 0.023 0.025
store 0.441 0.408 0.106 0.167
10 4988259 / 9.35 split 0.594 0.883 0.469 0.206
store 4.430 3.806 1.361 1.470
20 8062256 / 5.20 split 0.696 0.981 0.559 0.298
store 6.568 5.590 1.820 1.892
21
  • 1
    It will not answer the question but both string and string_view has a find member function. Use that. Commented Sep 16, 2024 at 10:15
  • 1
    for (std::size_t i = 0; i < s.size(); ++i) { if (s[i] == c) { ... - what is that if not searching for c in s? Commented Sep 16, 2024 at 10:30
  • 2
    Fwiw, I was able to reproduce your surprising findings: quick-bench.com/q/Z23mIl7SgvMPLweRKuJB-nwuBiY Commented Sep 16, 2024 at 10:31
  • 1
    Thank you for the link to quick-bench. I really appreciate it :) Commented Sep 16, 2024 at 10:33
  • 3
    @TedLyngmo A quirk of the std::map implementation? With libc++, both are basically equal: https://quick-bench.com/q/jER1HpK70bIJ728Hh-7mJwgPwL0 Commented Sep 16, 2024 at 10:41

2 Answers 2

24

This is totally related to cache locality.

When you store data as std::strings, the only necessary data will be placed locally in heap for the very limited number of strings you actually use. (For smaller strings SSO (Small String Optimization) also comes to play and these strings would be placed even "more locally"). This is quite cache-friendly.

When you work with the tree in the 'std::map' most of these strings will be addressed frequently which will increase cache usage.

When you use std::string_view every reference to the actual data has to be done in a huge array (your initial huge string) and will be sparse. Since your array is large and doesn't fit into caches, intensive random jumping across large memory array will lead to inefficient cache usage.

Some details on the cause

Cache is usually loaded in lines (let’s say 64 bytes) and with values stored with high locality the chance to get many useful values with one load is higher. On the opposite, with lesser locality you will need to load more pages which will eventually lead to Thrashing and cache performance degradation.

On other causes and comparisons

Regarding other performance comparisons and concerns in the original question. I would suggest opening new questions on these specific topics, as SO recommends, since they are related to very subtle and specific performance effects and require additional measurements and research which would degrade the quality of current topic. Nevertheless, here is some input as the author requested for further research.

The unordered_map require on average less reads from memory; while std::map require on average O(log n) reads, the unordered_map requires O(1) reads (even if you have some elements in a bulk of items for the same hash code), so effect of sparse memory addressing decreases dramatically. On the other hand, it comes with the price for hash codes calculation (low) and extra memory for hash table and reads from it (not so high). So, you have at least three opposite influencers here, but overall in most cases now the effect of sparse memory addressing is quite low and it comes to get benefits from std::string_view way to avoid excessive memory allocations and object construction of 'std::string'. On the other hand, for small strings SSO saves 'std::string' here. So, this is a balance of at least five influencers which comes to play in this case.

Implementations from different libraries and compilers could differ in many terms, including approach, source code implementation, advanced CPU instructions usage, cache utilization, etc., so they definitely worth putting in separate questions with lots of research. It is not enough to measure only on one type of CPU/RAM, etc. to say that something of such matter is faster than something else; it could be much slower on another architecture. And it would be nice to separate measurements with different numbers of tokens and average length, since they would affect different influencers and in order to have full picture, they must be separated; performance would depend differently from both. Again, these effects are too subtle and too specific to cover in one question.

Sign up to request clarification or add additional context in comments.

7 Comments

Agreed - the small size of the difference suggests that only a small proportion of the strings are "small", but enough to be measurable. Someone who remembers their A Level Statistics could probably calculate the expected proportion - or OP could measure it from the file.
Cache locality is such an underappreciated aspect of micro benchmarking.
@TobySpeight and everybody else, I've added a table with the average length of the substrings.
I've accepted your answer since other people seem to agree with you and since this confirms the suspicions I had before posting. But for your answer to be perfect, it should explain why unordered_map is faster (better locality?). And maybe also explain why unordered_map is faster when using string_view than when using string, and also why libc++ seems better for map<string_view than libstdc++. Anyway, thank you for your efforts
@LluísAlemany-Puig, I've updated the answer. Hope now it covers everything you are interested in. If you are interested in further research on specific cases, you know where to start and you know the "underwater stones" which impact the performance here, so you can do the homework, preparing more data. If you like, you can open new specific questions and put here the comments with the links, so I could help with them.
|
-2

I think what's happening here is that the split algorithm is faster with string_view because substring calls are allocations. However SV is marginally slower when constructing your map due to the indirection aspect of the SV. Hope this is helpful.

1 Comment

split is called outside of measurement loop (I'm not down-voter). Question was about std::map::insert!

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.