SQL queries in a dataframe

Question

I want to extract table names and column names from SQL queries stored in a dataframe. The dataframe looks like this:

Date         Query
29-03-2019   SELECT * FROM table WHERE ..
30-03-2019   SELECT * FROM ... JOIN ... ON ...WHERE ..
....         ....
20-05-2019   SELECT ...

I run the following functions on that dataframe to get the table names from the queries.

import re

import sqlparse
from sqlparse.tokens import Keyword, DML

# Note: isSelect() and getWord() are helper functions that are not shown in this post.

def getTableKey(parsed):
    findFrom = False
    wordKey = set(
        [
            "FROM",
            "JOIN",
            "LEFT JOIN",
            "INNER JOIN",
            "RIGHT JOIN",
            "OUTER JOIN",
            "FULL JOIN",
        ]
    )
    for word in parsed.tokens:
        if word.is_group:
            yield from getTableKey(word)
        if findFrom:
            if isSelect(word):
                yield from getTableKey(word)
            elif word.ttype is Keyword:
                findFrom = False
                StopIteration
            else:
                yield word
        if word.ttype is Keyword and word.value.upper() in wordKey:
            findFrom = True


def getTableName(sql):
    tableReg = re.compile(r"^.+?(?<=[.])")
    tableName = []
    query = sqlparse.parse(sql)
    for word in query:
        if word.get_type() != "UNKNOWN":
            stream = getTableKey(word)
            table = set(getWord(stream))
            for item in table:
                tabl = tableReg.sub("", item)
                tableName.append(tabl)
    return tableName

I also run a function to get the column names from the queries.

def getKeyword(parsed):
    kataKeyword = set(["WHERE", "ORDER BY", "ON", "GROUP BY", "HAVING", "AND", "OR"])
    from_seen = False
    for item in parsed.tokens:
        if item.is_group:
            yield from getKeyword(item)
        if from_seen:
            if isSelect(item):
                yield from getKeyword(item)
            elif item.ttype is Keyword:
                from_seen = False
                StopIteration
            else:
                yield item
        if item.ttype is Keyword and item.value.upper() in kataKeyword:
            from_seen = True


def getAttribute(sql):
    attReg = re.compile(r"asc|desc", re.IGNORECASE)
    namaAtt = []
    kueri = sqlparse.parse(sql)
    for kata in kueri:
        if kata.get_type() != "UNKNOWN":
            stream = getKeyword(kata)
            table = set(getWord(stream))
            for item in table:
                tabl = attReg.sub("", item)
                namaAtt.append(tabl)
    return namaAtt

Since this is my first attempt, I would like feedback on what I've tried, because my code runs slowly on a large file.

xander27 · Accepted Answer · 2019-07-16 07:49:12Z

These changes will not speed up your code, but they do improve the code itself:

Follow naming conventions: getAttribute -> get_attribute. See https://visualgit.readthedocs.io/en/latest/pages/naming_convention.html
You can create a set with a set literal, e.g. my_set = {1, 2, 3}, instead of set([...]).
You can compile tableReg = re.compile(r"^.+?(?<=[.])") once, at module level, instead of on every call to getTableName.
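
Put together, the three suggestions could look roughly like the sketch below. It is only an illustration: the names TABLE_KEYWORDS, TABLE_PREFIX_RE, is_table_keyword and strip_table_prefix are hypothetical and do not appear in the question, and the sqlparse parsing is left out so that the snippet runs on its own.

```python
import re

# Set literal instead of set([...]); built once when the module is imported.
TABLE_KEYWORDS = {
    "FROM",
    "JOIN",
    "LEFT JOIN",
    "INNER JOIN",
    "RIGHT JOIN",
    "OUTER JOIN",
    "FULL JOIN",
}

# Same pattern as tableReg in the question, compiled once at module level
# rather than inside the function on every call.
TABLE_PREFIX_RE = re.compile(r"^.+?(?<=[.])")


def is_table_keyword(value):
    """Return True if value is one of the keywords that introduce a table."""
    return value.upper() in TABLE_KEYWORDS


def strip_table_prefix(names):
    """Remove the leading 'qualifier.' part from each name (hypothetical helper)."""
    return [TABLE_PREFIX_RE.sub("", name) for name in names]
```

For example, strip_table_prefix(["sales.orders", "customers"]) gives ["orders", "customers"], and is_table_keyword("left join") is True. As stated above, this is about readability and avoiding repeated setup work; it is not expected to change the overall running time much.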

Comment: The canonical reference related to code style in Python is the official Style Guide for Python Code, widely known as PEP8.
– AlexV, Jul 16, 2019 at 7:55
