I am testing the speed of different speed concatenation methods by concatenating the string representation from 1 to a large number(in my case, 20000000). The three methods that I am testing are:
import cProfile
count = 20000000
def profileWrapper(f):
def wrapper(*args, **argv):
pr = cProfile.Profile()
pr.enable()
string = f(*args, **argv)
pr.create_stats()
pr.print_stats()
return string
return wrapper
@profileWrapper
def naiveConcat():
global count
string = ''
for i in xrange(count):
string += `i`
return string
@profileWrapper
def improvedConcat():
global count
string = []
for i in xrange(count):
string.append(`i`)
return ''.join(string)
@profileWrapper
def fastestConcat():
global count
return ''.join([`i` for i in xrange(count)])
print 15 * "=", "naiveConcat", 15 * "="
naiveConcat()
print 15 * "=", "improvedConcat", 15 * "="
improvedConcat()
print 15 * "=", "fastestConcat", 15 * "="
fastestConcat()
I am expecting to see the improved method to be faster than the naive one, and it shouldn't' be much slower than the fastest method, but the result doesn't seem like that:
=============== naiveConcat ===============
3 function calls in 3.951 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 cProfile.py:90(create_stats)
1 3.951 3.951 3.951 3.951 str_concat.py:18(naiveConcat)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
=============== improvedConcat ===============
20000004 function calls in 6.990 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 cProfile.py:90(create_stats)
1 5.196 5.196 6.990 6.990 str_concat.py:26(improvedConcat)
20000000 1.322 0.000 1.322 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1 0.473 0.473 0.473 0.473 {method 'join' of 'str' objects}
=============== fastestConcat ===============
4 function calls in 3.043 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 cProfile.py:90(create_stats)
1 2.539 2.539 3.043 3.043 str_concat.py:34(fastestConcat)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1 0.504 0.504 0.504 0.504 {method 'join' of 'str' objects}
The improved method turned out to be much slower even than the naive method!
This doesn't make sense since the naive method are creating new binding and copying the previous string along with the concatenated string character by character ON EACH ITERATION, this method should take O(n^2), which should be much slower than the other methods which are O(n).
So what makes the improved method so slow? The only reason that I can think of is the append method, but according to this article, the append method takes O(1), so it's definitely not the reason. Then what takes so long in improvedConcat()? Thank you.
fastestCount, there is no need to use a list comprehension in yourjoin. You can just use the generator expression, i.e. remove the brackets. I believe this will further improve the performance of that function for you.