I am looking into optimizing an existing Python module that contains a lot of byte-copying code like the following:
byte0 = data[0]
do_something_with_rest(data[1:])
And sometimes:
first_bytes = data[:4]
last_bytes = data[-4:]
do_something_with_rest(data[4:-4])
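Each such slice allocates a fresh bytes object carrying its own copy of that region of the data, which is easy to confirm (a quick check of my own, not part of the module):
>>> data = b'0123456789'
>>> rest = data[1:]
>>> rest is data
False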
I want to avoid the many memory copies I imagine happening under the hood. To this end, I have written the BaseStrRange class below. It records slice bounds lazily, so the consumer can keep slicing without the underlying buffer actually being sliced (and copied) until it is used.
class BaseStrRange:
    def __init__(self, s, start=0, stop=None):
        self.s = s
        self.l = len(s)
        self.start = start
        self.stop = stop

    def __len__(self):
        l = self.l if self.stop is None else self.stop
        l -= self.start
        return l

    def __bool__(self):
        return len(self) > 0

    def __getitem__(self, i):
        if not isinstance(i, slice):
            # Plain integer index: translate it into the underlying buffer.
            if self.stop is not None and self.start + i >= self.stop:
                raise IndexError()
            if i < 0:
                l = self.l if self.stop is None else self.stop
                return self.s[l + i]
            return self.s[self.start + i]
        if i.step is not None:
            raise NotImplementedError()
        # Slice: combine the requested bounds with the stored ones and
        # return a new lazy range over the same buffer.
        if i.stop is None:
            stop = self.stop
        else:
            stop = i.stop
            if stop < 0:
                l = self.l if self.stop is None else self.stop
                stop = l + stop
            else:
                stop = min(self.start + stop, self.stop or self.l)
        start = self.start + (i.start or 0)
        return BaseStrRange(self.s, start, stop)

    def apply(self, value):
        # Re-apply the stored bounds to another sliceable value.
        return value[self.start:self.stop]


class BytesRange(BaseStrRange):
    def __bytes__(self):
        return self.s[self.start:self.stop]
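As an aside, the apply method re-applies a stored range to a different buffer; a small usage example of my own:
>>> r = BytesRange(b'0123456789')[2:-2]
>>> r.apply(b'abcdefghij')
b'cdefgh'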
To see the lazy slicing in action, consider the following:
>>> x = BytesRange(b'0123456789')
>>> bytes(x[1:-1][1:-1]), x[1:-1][1:-1].s
(b'234567', b'0123456789')
That is, the internal bytes object is never sliced until the BytesRange is explicitly converted to bytes. However, this code performs considerably worse than the plain copying version. (The following uses VS Code cell mode notation.)
# %%
def do_something_with_rest(s):
    if len(s) == 0:
        return 0
    s0 = s[0]
    return s0 + do_something_with_rest(s[1:])

# %%
%timeit do_something_with_rest(b'0123456789' * 100)

# %%
%timeit do_something_with_rest(BytesRange(b'0123456789' * 100))
The pure bytes-copying version gives:
1e+03 µs ± 13.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
while BytesRange fares worse:
4.78 ms ± 72.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
I assume that even if I rewrite the entire module, I will eventually have to do the same in-Python arithmetic that BytesRange does, hurting performance instead of improving it.
My question: is there a way to beat the memory-copying code?
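One candidate I know of is the standard library's memoryview, whose slices share the underlying buffer instead of copying it. A minimal sketch of dropping it into the benchmark above (my own addition; I have not measured it here):
# %%
# memoryview(b'...') wraps the bytes without copying; slicing the view
# creates another view over the same buffer, and indexing returns an int,
# so do_something_with_rest runs unchanged.
%timeit do_something_with_rest(memoryview(b'0123456789' * 100))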