I wanted to index substrings of a std::string whose lifetime lasts for the entire program, and relate each substring to an int using a good old std::map. The substrings are delimited by blank spaces ' '. I generated the string and stored it to a file using this python code
import random
all_chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ 123456789!\"·$%&/()=?¿¡'|@#¬"
for i in range(0, 50000000):
r = int(random.random() * len(all_chars))
print(all_chars[r], end = '')
So, I wanted to compare the efficiency to split a std::string between using a std::string and std::string_view to store the substrings via these two (simple) functions.
[[nodiscard]]
std::vector<std::string_view> split_sv(const std::string& s, char c) noexcept {
std::vector<std::string_view> res;
std::size_t b = 0;
std::size_t p = s.find(c);
while (p != std::string_view::npos) {
res.push_back( std::string_view{&s[b], p - b} );
b = p + 1;
p = s.find(c, b);
}
return res;
}
[[nodiscard]]
std::vector<std::string> split_s(const std::string& s, char c) noexcept {
std::vector<std::string> res;
std::size_t b = 0;
std::size_t p = s.find(c);
while (p != std::string::npos) {
res.push_back( s.substr(b, i - b) );
b = p + 1;
p = s.find(c, b);
}
return res;
}
The split algorithm is clearly faster for std::string_view than it is for std::string (as expected):
std::string_view 28.6 ms
std::string 107.6 ms
(flags -std=c++20 -O3, gcc 11.4.0)
However, storing these substrings into a std::map is slower for std::string_view than for std::string.
{
const std::vector<std::string_view> tokens = split(s, ' ');
std::map<std::string_view, int> m;
for (const std::string_view& t : tokens) {
m.insert( {t, 0} );
}
}
{
const std::vector<std::string> tokens = split(s, ' ');
std::map<std::string, int> m;
for (const std::string& t : tokens) {
m.insert( {t, 0} );
}
}
The execution time of only the for loops above is:
std::string_view 445.9 ms
std::string 411.7 ms
(flags -std=c++20 -O3, gcc 11.4.0)
The difference is not that large, but I expected the results for std::string_view to be faster in both portions of the program.
Why is std::map<std::string, int> faster in gcc 11.4.0? Is std::map<std::string, int> generally faster than std::map<std::string_view, int>?
Full C++ code in compiler explorer and also in quick-bench.
I've run the code with 3 different string seeds each with a different number of blank spaces (NBS): one with just one single blank space, one with 10 blank spaces, and another with 20 blank spaces. The table below shows the execution time (in seconds) for every different seed string. I thought it would be interesting since the quick-bench results do not agree with this.
| NBS | Number tokens/ Average split length (chars) |
Function | mapstring_view |
mapstring |
unordered_mapstring_view |
unordered_mapstring |
|---|---|---|---|---|---|---|
| 1 | 617864 / 83.97 | split | 0.010 | 0.153 | 0.023 | 0.025 |
| store | 0.441 | 0.408 | 0.106 | 0.167 | ||
| 10 | 4988259 / 9.35 | split | 0.594 | 0.883 | 0.469 | 0.206 |
| store | 4.430 | 3.806 | 1.361 | 1.470 | ||
| 20 | 8062256 / 5.20 | split | 0.696 | 0.981 | 0.559 | 0.298 |
| store | 6.568 | 5.590 | 1.820 | 1.892 |
stringandstring_viewhas afindmember function. Use that.for (std::size_t i = 0; i < s.size(); ++i) { if (s[i] == c) { ...- what is that if not searching forcins?std::mapimplementation? Withlibc++, both are basically equal: https://quick-bench.com/q/jER1HpK70bIJ728Hh-7mJwgPwL0