
I'm using SQLAlchemy's ORM, and I find that when I return a single column I get the results like so:

[(result,), (result_2,)] # etc...

With a set like this I find that I have to do this often:

results = [r[0] for r in results] # So that I just have a list of result values

This isn't that "bad", because my result sets are usually small, but if they weren't, it could add significant overhead. The bigger issue is that I feel it clutters the source, and missing this step is a pretty common error I run into.

Is there any way to avoid this extra step?

A related aside: this behaviour of the ORM seems inconvenient in this case, but in another case, where my result set was [(id, value)], it ends up like this:

[(result_1_id, result_1_val), (result_2_id, result_2_val)]

I then can just do:

results = dict(results) # so I have a map of id to value

This one has the advantage of making sense as a useful step after returning the results.

Is this really a problem, or am I just nitpicking, and does the post-processing after getting the result set make sense for both cases? I'm sure we can think of some other common post-processing operations to make the result set more usable in the application code. Are there high-performance and convenient solutions across the board, or is post-processing unavoidable, merely varying with how the application uses the results?

When my application can actually take advantage of the objects returned by SQLAlchemy's ORM it seems extremely helpful, but in cases where I can't or don't, not so much. Is this just a common problem of ORMs in general? Am I better off not using the ORM layer in cases like this?

I suppose I should show an example of the actual ORM queries I'm talking about:

session.query(OrmObj.column_name).all()

or

session.query(OrmObj.id_column_name, OrmObj.value_column_name).all()

Of course, in a real query there'd normally be some filters, etc.
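For concreteness, the two post-processing steps described above can be sketched on plain tuples shaped like the ORM's return values:

```python
# Sample data shaped like the ORM's return values.
single = [("result",), ("result_2",)]   # one column selected
pairs = [(1, "val_1"), (2, "val_2")]    # (id, value) columns selected

# The two post-processing steps from the question:
values = [r[0] for r in single]   # flatten to a plain list of values
id_to_value = dict(pairs)         # build a map of id -> value

print(values)       # ['result', 'result_2']
print(id_to_value)  # {1: 'val_1', 2: 'val_2'}
```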



One way to decrease the clutter in the source is to iterate like this:

results = [r for (r, ) in results]

Although this solution is one character longer than using the [] operator, I think it's easier on the eyes.

For even less clutter, remove the parentheses. This makes it harder to notice, when reading the code, that you're actually handling tuples, though:

results = [r for r, in results]
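All three spellings, explicit indexing, parenthesized unpacking, and bare unpacking, produce the same list; a quick check on sample data:

```python
rows = [("result",), ("result_2",)]

by_index = [r[0] for r in rows]    # explicit indexing
by_unpack = [r for (r,) in rows]   # parenthesized unpacking
by_bare = [r for r, in rows]       # bare unpacking

assert by_index == by_unpack == by_bare == ["result", "result_2"]
```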
  • To do this internally in SQLAlchemy 1.4+, check out @b3000's answer lower down. Commented May 3, 2023 at 4:36

Python's zip combined with the * argument-unpacking operator is a pretty handy solution to this:

>>> results = [('result',), ('result_2',), ('result_3',)]
>>> list(zip(*results))  # the list() call is needed on Python 3, where zip returns an iterator
[('result', 'result_2', 'result_3')]

Then you only have to index in with [0] once. For such a short list, your comprehension is faster:

>>> timeit('result = zip(*[("result",), ("result_2",), ("result_3",)])', number=10000)
0.010490894317626953
>>> timeit('result = [ result[0] for result in [("result",), ("result_2",), ("result_3",)] ]', number=10000)
0.0028390884399414062

However, for longer lists, zip should be faster:

>>> timeit('result = zip(*[(1,)]*100)', number=10000)
0.049577951431274414
>>> timeit('result = [ result[0] for result in [(1,)]*100 ]', number=10000)
0.11178708076477051

So it's up to you to determine which is better for your situation.
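Worth noting if you adopt the zip approach: on Python 3 the result must be materialized with list(), and an empty result set needs a guard, since zip(*[]) yields nothing and indexing [0] would raise IndexError. A defensive sketch:

```python
rows = [("result",), ("result_2",), ("result_3",)]

# list() is needed on Python 3, where zip returns an iterator;
# the `if rows` guard covers an empty result set, where
# list(zip(*rows)) is [] and [0] would raise IndexError.
flat = list(zip(*rows))[0] if rows else ()
print(list(flat))  # ['result', 'result_2', 'result_3']
```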

  • The assumption here is that there might be a million elements in the list, and suddenly this doesn't look very clever. Commented May 10, 2020 at 12:23

Starting from version 1.4, SQLAlchemy provides a method to retrieve results for a single column as a list of values:

# ORM
>>> session.scalars(select(User.name)).all()
['ed', 'wendy', 'mary', 'fred']
# or
>>> query = session.query(User.name)
>>> session.scalars(query).all()
['ed', 'wendy', 'mary', 'fred']

# Core
>>> with engine.connect() as connection:
...     result = connection.execute(text("select name from users"))
...     result.scalars().all()
... 
['ed', 'wendy', 'mary', 'fred']

See the SQLAlchemy documentation.

  • This method is not present in a query object, only after running a literal SQL statement.
    – josemfc
    Commented May 13, 2022 at 12:08
  • @josemfc you can do session.scalars(session.query(MyModel.id)) or session.scalars(select(MyModel.id)) if you are using the ORM. Commented Sep 4, 2022 at 9:00
  • @b3000 it would be great if you could update your answer to cover ORM models and Core tables. Commented Sep 4, 2022 at 9:03
  • Note that the snippet submitted by @snakecharmerb raises a deprecation warning: SADeprecationWarning: Object <flask_sqlalchemy.BaseQuery ...> should not be used directly in a SQL statement context, such as passing to methods such as session.execute(). This usage will be disallowed in a future release...
    – edg
    Commented Oct 6, 2022 at 9:08
  • Good answer. Just a side note 🐉🔥: If you select multiple columns, .scalars().all() returns the first column only. So select(User.name, User.id) has the same ScalarResult as select(User.name). However, when selecting whole objects with select(User), the ScalarResult is helpful again, as it returns a flattened list of User objects instead of the Row tuples.
    – Kim
    Commented Nov 15, 2022 at 22:09

I struggled with this too until I realized it's just like any other query:

for result in results:
    print(result.column_name)
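Those rows behave like named tuples, which is why attribute access works; a rough plain-Python analogy (the Result name and its field are hypothetical):

```python
from collections import namedtuple

# Rows returned by session.query(Model.column_name) act much like this.
Result = namedtuple("Result", ["column_name"])
results = [Result("a"), Result("b")]

# Attribute access, as in the answer above...
by_attr = [r.column_name for r in results]
# ...and positional access both work:
by_index = [r[0] for r in results]

assert by_attr == by_index == ["a", "b"]
```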
  • Yeah, I agree, the NamedTuples are not immediately obvious. My most common post-processing is still having to create a dictionary of some sort. Most of my post-processing has been eliminated by better database architecture and SQLAlchemy usage, so I have the values I need attached to ORM objects, or available through methods.
    – Derek Litz
    Commented Apr 25, 2013 at 21:45

I found the following more readable, also includes the answer for the dict (in Python 2.7):

d = {id_: name for id_, name in session.query(Customer.id, Customer.name).all()}
l = [r.id for r in session.query(Customer).all()]

For the single value, borrowing from another answer:

l = [name for (name, ) in session.query(Customer.name).all()]

Compare with the built-in zip solution, adapted to produce a list (on Python 3, zip returns an iterator, so you would need something like list(next(zip(*...)))):

l = list(zip(*session.query(Customer.id).all())[0])

which in my timeits provides only about a 4% speed improvement.
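A quick check on plain tuples that the dict comprehension and dict() on the pairs agree:

```python
pairs = [(1, "a"), (2, "b")]  # stand-in for (id, name) rows

d1 = {id_: name for id_, name in pairs}  # the comprehension form
d2 = dict(pairs)                         # the direct constructor

assert d1 == d2 == {1: "a", 2: "b"}
```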


My solution looks like this ;)

def column(self):
    for column, *_ in Model.query.with_entities(Model.column).all():
        yield column

NOTE: Python 3 only (it relies on starred unpacking in the for-loop target).
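The generator above hinges on that starred unpacking; a standalone sketch over stand-in rows (the rows list is sample data):

```python
def column(rows):
    # `value, *_` binds the first element and discards the rest,
    # so this also tolerates multi-column rows.
    for value, *_ in rows:
        yield value

rows = [("a", 1), ("b", 2)]
print(list(column(rows)))  # ['a', 'b']
```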


Wow, why strain? There's a cooler way, faster and more elegant:

>>> results = [('result',), ('result_2',), ('result_3',)]
>>> sum(results, tuple())
('result', 'result_2', 'result_3')

Speed:

>>> timeit('result = zip(*[("result",), ("result_2",), ("result_3",)])', number=10000)
0.004222994000883773
>>> timeit('result = sum([("result",), ("result_2",), ("result_3",)], ())', number=10000)
0.0038205889868550003

But with more elements in the list, use zip instead: zip is faster.

  • This takes quadratic time, as you create a new tuple for every sum step. For 1000 elements, you create an empty tuple, then a tuple with 1 element, then a tuple with 2 elements, etc. Each new tuple has to copy all references from the previous tuple first, so in the end you have made on the order of N² copies of the references. Do not use sum() to concatenate sequences. Commented Aug 16, 2018 at 15:12
  • That the 3-element sum is faster is luck; you should really time this with much, much larger inputs. Commented Aug 16, 2018 at 15:13
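As the comments note, sum() concatenation is quadratic; the usual linear-time flattening is itertools.chain.from_iterable:

```python
from itertools import chain

rows = [("result",), ("result_2",), ("result_3",)]

# chain.from_iterable walks each tuple once: linear time, no
# intermediate tuples, unlike the sum() approach above.
flat = list(chain.from_iterable(rows))
print(flat)  # ['result', 'result_2', 'result_3']
```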
