
I'm trying to insert a NumPy array into PostgreSQL. I tried to do it like this:

def write_to_db(some_arr, some_txt):
""" insert a new array into the face_ar table """
    sql = """INSERT INTO test_db VALUES(%s,%s);"""
    conn = None
    try:
        params = config()
        conn = psycopg2.connect(**params)
        cur = conn.cursor()
        cur.execute(sql, (some_arr, some_txt))
        conn.commit()
        cur.close()

    except (Exception, psycopg2.DatabaseError) as e:
        print(e)
    finally:
        if conn is not None:
            conn.close()

Before that, I created a table in my DB:

create table test_db (encodings double precision[], link text);

Finally, I got an error: "can't adapt type 'numpy.ndarray'"

I need to write a NumPy array of 125 float64 items plus a small text (a link) in each row. There will be a few million rows in my project, so only read speed and database size matter. As I understand it, a NumPy array cannot be inserted directly and needs to be converted to another format first. My first idea was to convert it to binary data and save that in the DB, but I don't know how to do it, or how to get it back from the DB as a NumPy array.
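
A minimal sketch of what I mean by the binary route (untested; the table test_bin, the column layout with a bytea column, and both helper names are made up, and the read side assumes the dtype is known to be float64):

import numpy as np
import psycopg2
from config import config

def write_array_bytes(arr, link):
    """ store the raw bytes of arr in a bytea column (hypothetical table test_bin) """
    sql = "INSERT INTO test_bin VALUES (%s, %s);"
    conn = psycopg2.connect(**config())
    cur = conn.cursor()
    # arr.tobytes() returns a bytes object; psycopg2.Binary maps it to bytea
    cur.execute(sql, (psycopg2.Binary(arr.tobytes()), link))
    conn.commit()
    cur.close()
    conn.close()

def read_array_bytes():
    """ read one row back and rebuild the array from its raw bytes """
    conn = psycopg2.connect(**config())
    cur = conn.cursor()
    cur.execute("SELECT encodings, link FROM test_bin LIMIT 1")
    raw, link = cur.fetchone()
    cur.close()
    conn.close()
    # frombuffer needs the original dtype; here it is assumed to be float64
    return np.frombuffer(raw, dtype=np.float64), link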

2 Answers

Thanks to Vasyl Kushnir. This method turned out to work well and read data fast:

import psycopg2
from config import config
import msgpack
import msgpack_numpy as m

def write_to_db(encoding, link):
""" insert a new array into the test1_db table """
    sql = """INSERT INTO test1_db VALUES(%s,%s);"""
    conn = None
    dumped_data = msgpack.packb(encoding, default=m.encode)
    try:
        params = config()
        conn = psycopg2.connect(**params)
        cur = conn.cursor()
        cur.execute(sql, (dumped_data, link))
        conn.commit()
        cur.close()

    except (Exception, psycopg2.DatabaseError) as e:
        print(e)
    finally:
        if conn is not None:
            conn.close()

def read_from_db():
""" query data from the test1_db table """
    conn = None
    row = None
    try:
        params = config()
        conn = psycopg2.connect(**params)
        cur = conn.cursor()
        cur.execute("SELECT encodings, link FROM test1_db")
        print("The number of rows: ", cur.rowcount)
        row = cur.fetchone()
        cur.close()
    except (Exception, psycopg2.DatabaseError) as error:
        print(error)
    finally:
        if conn is not None:
            conn.close()
    # unpack outside the finally block (row stays None if the query failed)
    encoding1, somelink = row
    return msgpack.unpackb(encoding1, object_hook=m.decode), somelink
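
For completeness, a rough usage sketch (it assumes test1_db was created with a bytea column for the packed data, e.g. create table test1_db (encodings bytea, link text); the link value is just an example):

import numpy as np

encoding = np.random.rand(125)                      # stand-in for a 125-item float64 encoding
write_to_db(encoding, "http://example.com/face/1")  # made-up link

decoded, somelink = read_from_db()
print(decoded.dtype, decoded.shape, somelink)       # float64 (125,) http://example.com/face/1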

Try using Python's pickle module for binary serialization/deserialization.

Example:

import numpy as np
from pickle import dumps, loads
data = np.array([1, 2, 4, 5, 6])
dumped_data = dumps(data)
loaded_data = loads(dumped_data)
print(dumped_data)
print(loaded_data)
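
Storing the pickled bytes in PostgreSQL could then look roughly like the question's code (a sketch only; the table pickle_test, its bytea column named encodings, and the link text are assumptions):

import numpy as np
import psycopg2
from pickle import dumps, loads
from config import config

data = np.array([1, 2, 4, 5, 6])

conn = psycopg2.connect(**config())
cur = conn.cursor()

# write: pickle the array and let psycopg2 store the bytes as bytea
cur.execute("INSERT INTO pickle_test VALUES (%s, %s);",
            (psycopg2.Binary(dumps(data)), "some link"))
conn.commit()

# read: fetch the bytea value back and unpickle it
cur.execute("SELECT encodings, link FROM pickle_test LIMIT 1")
raw, link = cur.fetchone()
restored = loads(bytes(raw))   # bytes(...) handles the memoryview psycopg2 returns
print(restored, link)

cur.close()
conn.close()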

5 Comments

It is strange, but pickle works much faster than np.save and np.load.
I will try to make the code faster using msgpack-0.6.0. A test showed that it decodes 2x faster than pickle, but it is not adapted for NumPy arrays.
np.array has a tolist method if you want to use msgpack-0.6.0, but that is the same speed as pickle with np.array on my computer.
%timeit dumps(encoding): 11.5 µs ± 489 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit loads(out): 5 µs ± 37.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit msgpack.packb(encoding, default=m.encode): 19.8 µs ± 581 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit msgpack.unpackb(x_enc, object_hook=m.decode): 3.62 µs ± 43.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Loading with pickle takes 5 µs; with msgpack_numpy's unpackb it takes 3.62 µs.
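
A rough way to reproduce this kind of comparison (a sketch; encoding here is just a random 125-item float64 array standing in for a real face encoding, and the absolute numbers will differ per machine):

import timeit
import pickle
import numpy as np
import msgpack
import msgpack_numpy as m

encoding = np.random.rand(125)        # stand-in for a face encoding
pickled = pickle.dumps(encoding)
packed = msgpack.packb(encoding, default=m.encode)

print("pickle.dumps   :", timeit.timeit(lambda: pickle.dumps(encoding), number=100000))
print("pickle.loads   :", timeit.timeit(lambda: pickle.loads(pickled), number=100000))
print("msgpack.packb  :", timeit.timeit(lambda: msgpack.packb(encoding, default=m.encode), number=100000))
print("msgpack.unpackb:", timeit.timeit(lambda: msgpack.unpackb(packed, object_hook=m.decode), number=100000))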
