The most efficient way to store a very large 2D array in Python/MicroPython

Question

I have a project in an embedded system (NodeMCU running MicroPython), where I need to store a very large array of variables, which have values of either 0 or 1. I need to be able to read/write them individually or via loops in a convenient way. For this example, I am filling the array with random integers between 0 and 1:

import random

N = 50
table = [[random.randint(0, 1) for i in range(N)] for j in range(N)]

On my NodeMCU, even such a small array (2500 items) is enough to exceed the NodeMCU memory limits, crashing my script. I suppose this is because in Python, an int is an object with a lot of overhead. Since I do not need the full range of an int here (0 or 1 could be stored as a single bit), how can I create and fill an array using the least memory per element? Say, as in this example, filling it randomly with 0 and 1. I looked at uctypes, but as I'm new to Python, I couldn't get it to work. Or is there another way? How can I create such an array with the least memory usage possible?

You can use each bit as a separate value, which lets you store 8 times more data. If that is still not enough, you should use a file buffer. — Olvin Roght, Commented Mar 31, 2020 at 21:19
How do I use T or F? @OlvinRoght, I read that in Python, int takes 3 bytes, right? So shouldn't I be able to store 3*8 = 24 times more data? — Justin8051, Commented Mar 31, 2020 at 22:34
@JustinasRubinovas, in Python, int has no fixed size, but if you use array.array with 'B', each element of the array will consume 1 byte. — Olvin Roght, Commented Mar 31, 2020 at 22:36
Thank you very much for that clarification, @OlvinRoght. I will try to find a way to store this data as bits. — Justin8051, Commented Mar 31, 2020 at 23:42

Russ Hughes · Accepted Answer · 2021-09-23 08:14:33Z

I would use a bytearray to store the data. You can store 8 bits per byte by calculating the index for the byte and the bit in that byte that corresponds to a particular row and column. Once you have the indexes you can use bit shifts and bitwise operators to set, clear or retrieve the value for that row and column.

class bitarray2d():
    """create 2d array of bits cols wide and rows high"""
    def __init__(self, cols, rows):
        self.bits = bytearray((cols * rows + 7) >> 3)  # round up to whole bytes
        self.cols = cols
        self.rows = rows

    def set(self, column, row):
        """set the bit that corresponds to the column and row given"""
        bit = (row * self.cols) + column
        self.bits[bit >> 3] |= 1 << 7 - (bit % 8)

    def clear(self, column, row):
        """clear the bit that corresponds to the column and row given"""
        bit = (row * self.cols) + column
        self.bits[bit >> 3] &= ~(1 << 7 - (bit % 8))

    def get(self, column, row):
        """get the value of the bit that corresponds to the column and row given"""
        bit = (row * self.cols) + column
        return 1 if self.bits[bit >> 3] & (1 << (7 - (bit % 8))) else 0

    def print(self):
        """print the bitarray as a 2d array"""
        for row in range(self.rows):
            for column in range(self.cols):
                print(self.get(column, row), end="")
            print()

    def line(self, x0, y0, x1, y1):
        """draw a line starting at x0, y0 and ending at x1, y1"""
        steep = abs(y1 - y0) > abs(x1 - x0)
        if steep:
            x0, y0 = y0, x0
            x1, y1 = y1, x1
        if x0 > x1:
            x0, x1 = x1, x0
            y0, y1 = y1, y0
        dx = x1 - x0
        dy = abs(y1 - y0)
        err = dx >> 1
        ystep = 1 if y0 < y1 else -1
        while x0 <= x1:
            if steep:
                self.set(y0, x0)
            else:
                self.set(x0, y0)
            err -= dy
            if err < 0:
                y0 += ystep
                err += dx
            x0 += 1

# Test routine
columns = 64
rows = 64

bits = bitarray2d(columns, rows)
bits.line(0, 0, columns-1, rows-1)
bits.line(columns-1, 0, 0, rows-1)
bits.print()
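
As a worked illustration of the index arithmetic that set, clear and get rely on, the sketch below computes the byte index and bit mask for a single cell, plus the buffer size for the 50 x 50 table from the question. The helper names bytes_needed and locate are hypothetical and not part of the class above; the size is rounded up so that a partial final byte is still allocated.

```python
def bytes_needed(cols, rows):
    """Bytes required to hold cols * rows bits, rounded up."""
    return (cols * rows + 7) >> 3


def locate(column, row, cols):
    """Return (byte index, bit mask) for a cell, most significant bit first."""
    bit = row * cols + column
    return bit >> 3, 1 << (7 - (bit % 8))


size = bytes_needed(50, 50)            # 2500 bits
byte_index, mask = locate(3, 1, 50)    # cell in column 3, row 1
print(size, byte_index, mask)          # prints: 313 6 4
```

So the whole 2500-cell table fits in 313 bytes of payload, and the cell at column 3, row 1 is bit 53, which lives in byte 6 under the mask 4.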

You should fix a bug in the method clear(...). Otherwise it looks OK. — hynekcer, Commented Sep 23, 2021 at 7:53

juanpa.arrivillaga · Accepted Answer · 2020-03-31 23:27:23Z

You are correct that int objects have a lot of overhead. However, in this case the int objects may actually be cached (in CPython they would be), so the only per-element overhead should be the pointer...

However, a pointer still requires a machine word. To avoid relying on such implementation details and to pack things more tightly, you could use actual arrays; you are currently using a list of list objects.
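
To get a rough feel for the difference between one pointer per element and packed storage, here is a small comparison. It is a CPython-only sketch: it assumes sys.getsizeof is available, which may not be the case on MicroPython (there, comparing gc.mem_free() before and after an allocation is one possible substitute). The variable names are illustrative only.

```python
import array
import sys

n = 2500  # 50 x 50 cells, as in the question

as_list = [0] * n                       # one pointer per element
as_bytes = array.array('B', bytes(n))   # one byte per element
as_bits = bytearray((n + 7) // 8)       # one bit per element

# sys.getsizeof reports the container itself, not the objects a list points to
for name, obj in (("list", as_list),
                  ("array('B')", as_bytes),
                  ("bytearray of bits", as_bits)):
    print(name, sys.getsizeof(obj))
```

The exact numbers depend on the interpreter and platform, but the ordering (list largest, bit-packed bytearray smallest) is what matters here.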

The array module provides object-oriented wrappers around primitive, C-like numeric arrays. Unfortunately, it does not provide multidimensional arrays.

So you could try something like:

import array
import random

N = 50

def build_table(N):
    rng = range(N)
    result = []
    for _ in rng:
        arr = array.array('B') #unsigned byte
        for _ in rng:
            arr.append(random.randint(0,1))
        result.append(arr)
    return result

table = build_table(N)
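
Since array.array is one-dimensional, another option is a single flat array indexed as row * N + col, which avoids the outer list and the separate array object per row. This is a minimal sketch of that idea, not something tested on a NodeMCU; the helper names get_cell and set_cell are hypothetical.

```python
import array
import random

N = 50

# One flat array of N * N unsigned bytes instead of a list of N arrays
flat = array.array('B')
for _ in range(N * N):
    flat.append(random.randint(0, 1))


def get_cell(table, row, col):
    """Read the cell at (row, col) from the flat, row-major array."""
    return table[row * N + col]


def set_cell(table, row, col, value):
    """Write value (0 or 1) to the cell at (row, col)."""
    table[row * N + col] = value


set_cell(flat, 2, 3, 1)
print(get_cell(flat, 2, 3))  # prints: 1
```

This still spends a full byte per cell, so it only addresses the list-of-lists overhead, not the bit packing.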

If this were CPython, I would suggest the bitarray module for maximum efficiency. I have no idea whether it is available for MicroPython, but you could implement something like it yourself, likely on top of an array.array. There are many examples of this; it is a classic data structure from the times when memory was measured in bytes. Here's just one example from the Python wiki.
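
As a rough idea of what "implementing it yourself on top of an array.array" could look like, here is a minimal sketch. It is not the bitarray module's API and not the Python wiki implementation; the function names make_bits, set_bit and get_bit are hypothetical, and the bit order within each byte (least significant bit first) is an arbitrary choice.

```python
import array
import random


def make_bits(n):
    """Return an array of unsigned bytes large enough for n bits, all zero."""
    return array.array('B', bytes((n + 7) // 8))


def set_bit(bits, i, value):
    """Set bit i to 1 if value is truthy, otherwise clear it."""
    if value:
        bits[i >> 3] |= 1 << (i & 7)
    else:
        bits[i >> 3] &= ~(1 << (i & 7)) & 0xFF


def get_bit(bits, i):
    """Return bit i as 0 or 1."""
    return (bits[i >> 3] >> (i & 7)) & 1


N = 50
table = make_bits(N * N)
for row in range(N):
    for col in range(N):
        # Cell (row, col) maps to bit number row * N + col
        set_bit(table, row * N + col, random.getrandbits(1))

print(len(table))  # prints: 313
```

The 2500 cells occupy 313 bytes of payload here, compared with 2500 bytes for the array('B') table above.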

edited Mar 31, 2020 at 23:27

answered Mar 31, 2020 at 19:59

Thank you for your suggestion and this code example. Are you sure it is correct? I get a "name 'B' isn't defined" error.
– Justin8051
Commented Mar 31, 2020 at 21:23
Okay, that works now, but I still don't get it... Shouldn't it produce a 2D array? When I print the "table", console prints: [array('B', [0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1])]
– Justin8051
Commented Mar 31, 2020 at 21:58
Oh, right! I am still getting the hang of Python indentations, didn't notice that one. Alright, this is now much better. But still, I am using a byte to store just 0 or 1. Is it possible to use a bit instead? I can't find bit as a datatype that array() could take as an argument.
– Justin8051
Commented Mar 31, 2020 at 22:54
@JustinasRubinovas no, as I stated, there is a bitarray library available for CPython, but I don't know if it will work with MicroPython; I doubt it, though. You could implement your own on top of an array.array. Depending on the complexity of your task, this is relatively achievable, and there are probably lots of guides out there since this is a classic data structure. EDIT: The Python wiki even has a basic implementation.
– juanpa.arrivillaga
Commented Mar 31, 2020 at 23:05
If you're going to post an answer, you should know that the answer will work on Micropython.
– Patrick
Commented Apr 7, 2020 at 17:30
