0

I am looking into optimizing an existing Python module that has a lot of byte-copying code like the following going around:

byte0 = data[0]
do_something_with_rest(data[1:])

And sometimes:

first_bytes = data[:4]
last_bytes = data[-4:]
do_something_with_rest(data[4:-4])

I wanted to try and avoid the many memory copies I imagine happening under the hood. To this end, I have written the BaseStrRange class below to do lazy slicing. This will allow the consumer to do lazy slicing without actually slicing it until use.

class BaseStrRange:
    def __init__(self, s, start=0, stop=None):
        self.s = s
        self.l = len(s)
        self.start = start
        self.stop = stop

    def __len__(self):
        l = self.l if self.stop is None else self.stop
        l -= self.start
        return l

    def __bool__(self):
        return len(self) > 0

    def __getitem__(self, i):
        if not isinstance(i, slice):
            if self.stop is not None and self.start + i >= self.stop:
                raise IndexError()
            if i < 0:
                l = self.l if self.stop is None else self.stop
                return self.s[l + i]

            return self.s[self.start + i]

        if i.step is not None:
            raise NotImplementedError()
        if i.stop is None:
            stop = self.stop
        else:
            stop = i.stop
            if stop < 0:
                l = self.l if self.stop is None else self.stop
                stop = l + stop
            else:
                stop = min(self.start + stop, self.stop or self.l)
        start = self.start + (i.start or 0)
        return BaseStrRange(self.s, start, stop)

    def apply(self, value):
        return value[self.start:self.stop]

class BytesRange(BaseStrRange):
    def __bytes__(self):
        return self.s[self.start:self.stop]

To understand it, consider the following:

>>> x = BytesRange(b'0123456789')
>>> bytes(x[1:-1][1:-1]), x[1:-1][1:-1].s
(b'234567', b'0123456789')

That is, the internal bytes object is never sliced until the BytesRange is explicitly cast into bytes. However, this code performs poorly when comparing to string copies. (The following uses VS Code cell mode notation)

# %%
def do_something_with_rest(s):
    if len(s) == 0:
        return 0
    s0 = s[0]
    return s0 + do_something_with_rest(s[1:])
# %%
%timeit do_something_with_rest(b'0123456789' * 100)
# %%
%timeit do_something_with_rest(BytesRange(b'0123456789' * 100))

The pure bytes copying object gives:

1e+03 µs ± 13.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

while BytesRange fares worse:

4.78 ms ± 72.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

I assume that even if I re-write the entire module, I will eventually have to do the same in-Python math that BytesRange does thus hurting performance instead of improving it.

My question: is there a way to beat the memory-copying code?

2
  • What times do you get for let's say b'0123456789' * 1000 or b'0123456789' * 10000? Commented Nov 27, 2021 at 1:48
  • you really should use a memoryview... Commented Nov 27, 2021 at 1:54

1 Answer 1

-1

The answer is here. Using Python's memoryview you get rid of copies, which in my use-case resulted in a 5% performance increase.

0

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.