Skip to content

Add a convenient way to visualize an sqlite3.Cursor in a terminal (and maybe also Jupyter) #128905

Open
@jpivarski

Description

@jpivarski

Feature or enhancement

Proposal:

I think that the sqlite3 module is a hidden gem in the Python standard library: the functionality it provides is like a mini-Pandas, similar to the array module, which is a mini-NumPy. The only problem is that it's more programmatic than interactive, in that it's hard to look at the results of a query without either writing a loop to iterate over sqlite3.Cursor results or passing the query to Pandas (which instantiates all of the results in memory).

This came up in a problem I was working on and I didn't find any standard solutions for visualizing an SQLite query, so I wrote one on StackOverflow. Just it's just complex enough that variants of code like this shouldn't be floating around the web; a single version should be built into the sqlite3.Cursor object.

I'd like a method that you can use like this:

db.execute("SELECT * FROM compleat_works").print()

that prints a table like this:

title           | type      | characters | year_low | year_high
----------------+-----------+------------+----------+----------
'The Sonnets'   | 'poetry'  |       None |     1609 |      1609
'All’s Well tha | 'comedy'  |         23 |     1604 |      1605
'The Tragedy of | 'tragedy' |         42 |     1606 |      1606
'As You Like It | 'comedy'  |         27 |     1599 |      1600
'The Comedy of  | 'comedy'  |         18 |     1594 |      1594
...             | ...       |        ... |      ... |       ...
'A Lover’s Comp | 'poetry'  |       None |     1609 |      1609
'The Passionate | 'poetry'  |       None |     1599 |      1599
'The Phoenix an | 'poetry'  |       None |     1601 |      1601
'The Rape of Lu | 'poetry'  |          2 |     1594 |      1594
'Venus and Adon | 'poetry'  |          2 |     1593 |      1593
--- 44 rows ----+-----------+------------+----------+----------

Here's my sample implementation:

def print(
    self,
    *,
    first: int = 5,
    last: int = 5,
    show_header: bool = True,
    show_count: bool = True,
    total_width: int = 80,
    max_width: int = 15,
    stream = sys.stdout,
):
    # name of each column
    header = [x[0] for x in self.description]
    # width of the repr of each column
    widths = [min(len(x), max_width) for x in header]

    # iterate over the whole Cursor, but only keep `first + 2*last` rows in memory
    first_rows = []
    last_rows = []
    row_count = 0
    for row in self:
        if len(first_rows) < first + last:
            first_rows.append(row)
        last_rows.append(row)
        if len(last_rows) > last:
            last_rows = last_rows[1:]
        row_count += 1

    if row_count <= first + last:
        # if the number of rows <= `first + last`, show them all
        rows = first_rows
    else:
        # otherwise, show the `first`, an ellipsis row, and then `last`
        rows = first_rows[:first] + [[...] * len(header)] + last_rows

    # represent rows with mutable lists so that we can replace them with reprs
    rows = [list(x) for x in rows]

    align = [">"] * len(header)
    for row in rows:
        assert len(row) == len(header), f"{len(row)} columns != {len(header)} columns"
        for i, cell in enumerate(row):
            # if all values are str or bytes (ignoring None), left-align
            if cell != ... and isinstance(cell, (str, bytes)) and cell is not None:
                align[i] = "<"
            # replace data with their repr strings (except ellipsis)
            row[i] = "..." if cell == ... else repr(cell)
            # identify the maximum (string) width of each column, up to max_width
            widths[i] = min(max(widths[i], len(row[i])), max_width)

    # if the table is too wide, replace the last column with ellipsis
    if sum(widths) + (len(widths) - 1) * 2 > total_width:
        header[-1] = "..."
        widths[-1] = 3
        for row in rows:
            row[-1] = "..."

    # if the table is still too wide, remove columns
    while sum(widths) + (len(widths) - 1) * 2 > total_width and len(header) > 1:
        del header[-2]
        del widths[-2]
        for row in rows:
            del row[-2]

    # prepare a format string for each line of text
    formatter = " | ".join(f"{{:{a}{w}s}}" for a, w in zip(align, widths))
    # prepare the horizontal line between header and data
    header_separator = "-+-".join("-" * w for w in widths)

    if show_header:
        # print the table column names and a horizontal line under it
        stream.write(formatter.format(*[x[:w] for x, w in zip(header, widths)]) + "\n")
        stream.write(header_separator + "\n")
    for row in rows:
        # print each table row
        stream.write(formatter.format(*[x[:w] for x, w in zip(row, widths)]) + "\n")
    if show_count:
        # print the number of rows in another horizontal line
        count = f"--- {row_count} rows ---"
        stream.write(count + header_separator[len(count) :] + "\n")

And/or maybe a head method that only shows the beginning, rather than both the beginning and end (to avoid iterating over millions of rows). Maybe also a _repr_html_ method for IPython/Jupyter, but that's probably more specific than the standard model should be (only useful in a third-party library, albeit a major one).

What do you think? Is this going against the direction you want sqlite3 to go? (E.g. are you trying to reduce its presence, rather than increase it?) Is the idea sound, but the method names (print, head) are bad? Is it a problem that it "eats" the Cursor, and should instead be a method on sqlite3.Connection that takes a query: str?

I'm willing to write the PR; I just want to start by checking to see if such a PR would be welcome.

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions