Properly format a SQL query when inserting into a variable number of columns

Question

I'm using psycopg2 to interact with a PostgreSQL database. I have a function whereby any number of columns (from a single column to all columns) in a table could be inserted into. My question is: how would one properly, dynamically, construct this query?

At the moment I am using string formatting and concatenation, and I know this is the absolute worst way to do this. Consider the code below where, in this case, the unknown number of columns (i.e. the keys of a dict) happens to be 2:

dictOfUnknownLength = {'key1': 3, 'key2': 'myString'}

def createMyQuery(user_ids, dictOfUnknownLength):
    fields, values = list(), list()

    for key, val in dictOfUnknownLength.items():
        fields.append(key)
        values.append(val)

    fields = str(fields).replace('[', '(').replace(']', ')').replace("'", "")
    values = str(values).replace('[', '(').replace(']', ')')

    query = f"INSERT INTO myTable {fields} VALUES {values} RETURNING someValue;"

# query == "INSERT INTO myTable (key1, key2) VALUES (3, 'myString') RETURNING someValue;"

This provides a correctly formatted query but is of course prone to SQL injections and the like and, as such, is not an acceptable method of achieving my goal.

In other queries I am using the recommended methods of query construction when handling a known number of variables (%s and separate argument to .execute() containing variables) but I'm unsure how to adapt this to accommodate an unknown number of variables without using string formatting.

How can I elegantly and safely construct a query with an unknown number of specified insert columns?

TrebledJ · Accepted Answer · 2021-02-15 17:28:25Z

To add to your worries, the current methodology using .replace() is prone to edge cases where fields or values contain [, ], or '. They will get replaced no matter what and may mess up your query.
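To make that edge case concrete, here is a minimal sketch that applies the same .replace() chain to a value containing brackets and a quote. The sample value is made up for illustration.

```python
# Hypothetical values; the second one contains brackets and an apostrophe.
values = [3, "it's [ok]"]

# Same transformation as in the question.
formatted = str(values).replace('[', '(').replace(']', ')')

# The brackets inside the string were rewritten too, and Python's repr
# switched to double quotes, which SQL reads as an identifier, not a string.
print(formatted)  # (3, "it's (ok)")
```

The stored text would no longer match the input, and the statement itself would most likely be rejected or misread by the server.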

You can use .join() to combine a variable number of items from a list. On top of that, put one %s placeholder per value after VALUES and pass the values themselves as the second argument to .execute().

Note: You may also want to consider the case where the number of fields is not equal to the number of values.

import psycopg2


conn = psycopg2.connect("dbname=test user=postgres")
cur = conn.cursor()

dictOfUnknownLength = {'key1': 3, 'key2': 'myString'}


def createMyQuery(user_ids, dictOfUnknownLength):
    # Directly assign keys/values.
    fields, values = list(dictOfUnknownLength.keys()), list(dictOfUnknownLength.values())

    if len(fields) != len(values):
        # Raise an error? SQL won't work in this case anyways...
        pass

    # Stringify the fields and values.
    fieldsParam = ', '.join(fields)                # "key1, key2"
    valuesParam = ', '.join(['%s'] * len(values))  # "%s, %s"

    # "INSERT ... (key1, key2) VALUES (%s, %s) ..."
    query = 'INSERT INTO myTable ({}) VALUES ({}) RETURNING someValue;'.format(fieldsParam, valuesParam)

    # .execute('INSERT ... (key1, key2) VALUES (%s, %s) ...', [3, 'myString'])
    cur.execute(query, values) # Anti-SQL-injection: pass placeholder
                               # values as second argument.
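One thing the code above leaves open is that the column names are still interpolated into the query text. A possible way to guard that part, sketched here under assumptions, is to check the dict keys against a known set of column names before building the string. The helper name build_insert and the allowed_columns argument are hypothetical and not part of the original answer; the table name is assumed to come from trusted code.

```python
def build_insert(table, data, allowed_columns):
    """Return (query, params) for a parameterised INSERT.

    Hypothetical helper: `table` is assumed to be a trusted, hard-coded name,
    and `allowed_columns` is the set of column names the caller accepts.
    """
    if not data:
        raise ValueError("no columns to insert")

    # Reject any key that is not a known column, so nothing arbitrary
    # can reach the query text.
    unknown = set(data) - set(allowed_columns)
    if unknown:
        raise ValueError(f"unexpected column(s): {sorted(unknown)}")

    columns = list(data)
    column_list = ", ".join(columns)                  # "key1, key2"
    placeholders = ", ".join(["%s"] * len(columns))   # "%s, %s"

    query = (
        f"INSERT INTO {table} ({column_list}) "
        f"VALUES ({placeholders}) RETURNING someValue;"
    )
    # Values stay separate from the query and are passed to .execute().
    return query, [data[c] for c in columns]


query, params = build_insert(
    "myTable",
    {"key1": 3, "key2": "myString"},
    allowed_columns={"key1", "key2", "key3"},
)
```

The result can then be passed on as cur.execute(query, params). If an allow-list is not practical, psycopg2 also ships a sql module whose Identifier class quotes identifiers when composing a query; that may be worth a look as an alternative.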

I believe this approach will lead to column names being treated as strings and thus encapsulated within ' ' and therefore being invalid SQL syntax?
Ensuring that len(fields)==len(values) is a very good point, thanks.
