
Talk:cpp/string/byte/strlen

From cppreference.com

Should we document the complexity of the function? The C++ Programming Language by Bjarne Stroustrup claims that the complexity is log(N), but isn't it O(N)?

what edition/chapter/page of The C++ Programming Language? --Cubbi (talk) 07:45, 27 July 2017 (PDT)
Hi, it's the 4th edition, chapter 36, page 1038.
I agree, it should state complexity. Even if it's not guaranteed, typical implementations must be O(n) right? It looks like glibc's is optimized to read long words not chars, but still must be O(n) unless the hardware has some magic, right? glibc strlen source. BenFrantzDale (talk) 04:33, 4 March 2021 (PST)
Anyone who knows what big-O notation is would know that strlen is O(n); I don't think cppreference needs to spell out basic computer science concepts... --Ybab321 (talk) 08:34, 4 March 2021 (PST)
First, Anonymous is correct: that is what it says on page 1038: "Also, strlen() is a log(N) operation, whereas string::size() is a simple read."[1] But that *must* be a typo for O(n).
On the one hand, I agree. On the other hand, there's been a bit of a stir on Hacker News[2][3] over the surprising fact that glibc's implementation of sscanf is O(strlen(buffer)). Anyone who knows what big-O notation is would assume that sscanf(data, "%f", &f); for some float f is O(1). But the fact is, it's not. And that mistake has apparently wasted user-years (user-lifetimes?) of time for GTA-Online users. That leads me to conclude that documentation must highlight the complexity of all algorithms. Apparently the C standard doesn't provide many guarantees, but we can still highlight typical and best-possible performance. To flip the question: should we drop the Complexity section from std::find, since clearly it's O(n)? BenFrantzDale (talk) 09:05, 4 March 2021 (PST)

I have no issue with pointing out the surprising time complexity of widely used implementations of functions (or any surprising deficiency, for that matter), nor do I have any problem with pointing out the standard-specified time complexity of a function. I don't believe strlen falls into either category. --Ybab321 (talk) 10:29, 4 March 2021 (PST)

My feeling is that the sscanf saga emphasizes the importance of explicitly stating the time (and memory and allocation?) complexity of pretty much everything. Some things (particularly from C) don't have any guarantees, but as a consumer of documentation, I feel I should expect std::strlen's docs to say, in addition to the "possible implementation", "The C standard does not provide complexity requirements. Typical implementations are O(std::strlen(buf))." and possibly even add "Typical implementations do not allocate and use O(1) stack space.". If that sort of documentation were universal, programmers would come to expect it and look for it rather than assume that a lack of documentation implies that the behavior is what they'd guess it would be. I'll think about a format for that sort of summary. (My favorite thing about cppreference.com is the consistency of formatting across the documentation!) It would include the required complexity (if any) or the typical complexity (or a cautionary note for things like sscanf) for time, stack, heap, and probably exceptions, locks, and system calls. BenFrantzDale (talk) 04:20, 5 March 2021 (PST)
I personally don't feel the same way, though I should clarify that I'm not against the addition of these complexity statements. I will bring up some potential issues:
  • This is a lot of work: it means digging through the implementations of (let's say) libstdc++ / glibc, libc++ / llvm-libc, and MS STL / [whatever MSVC uses for C]
  • These implementations don't publish when they change the complexity of a function (AFAIK), so keeping cppreference in sync with these libraries would not be viable
  • If they do change complexity, do we note the versions in which those changes happened (a la revboxes)? If so, it would also make sense to go through the history of all of these functions to see what their complexity used to be (more work than any sane person would ever do)
And just to clarify again, I think required complexity should be mentioned and surprising behaviour ought to be noted, but run-of-the-mill statements about the implementations of functions seems excessive and unrealistic to me. --Ybab321 (talk) 07:21, 5 March 2021 (PST)
I agree. Surprising or problematic behavior in popular implementations can be mentioned in the notes when discovered, but it's not practical or maintainable for us to go look at every implementation out there and figure out what the complexity is (which can be quite nontrivial for some of those).

And really, what you want is not big-O complexity, it's "not unreasonably slow or inefficient". An implementation can do an arbitrarily expensive thing (say, spin for a second looking for a counterexample to the Collatz conjecture) without affecting the big-O. T. Canens (talk) 07:52, 5 March 2021 (PST)