3
\$\begingroup\$

I want to get table names and column names from queries in a dataframe. The dataframe is like this:

Date         Query
29-03-2019   SELECT * FROM table WHERE ..
30-03-2019   SELECT * FROM ... JOIN ... ON ...WHERE ..
....         ....
20-05-2019   SELECT ...

and I run function to that dataframe to get tablename from the queries.

import sqlparse
from sqlparse.tokens import Keyword, DML

def getTableKey(parsed):
    findFrom = False
    wordKey = set(
        [
            "FROM",
            "JOIN",
            "LEFT JOIN",
            "INNER JOIN",
            "RIGHT JOIN",
            "OUTER JOIN",
            "FULL JOIN",
        ]
    )
    for word in parsed.tokens:
        if word.is_group:
            yield from getTableKey(word)
        if findFrom:
            if isSelect(word):
                yield from getTableKey(word)
            elif word.ttype is Keyword:
                findFrom = False
                StopIteration
            else:
                yield word
        if word.ttype is Keyword and word.value.upper() in wordKey:
            findFrom = True


def getTableName(sql):
    tableReg = re.compile(r"^.+?(?<=[.])")
    tableName = []
    query = sqlparse.parse(sql)
    for word in query:
        if word.get_type() != "UNKNOWN":
            stream = getTableKey(word)
            table = set(getWord(stream))
            for item in table:
                tabl = tableReg.sub("", item)
                tableName.append(tabl)
    return tableName

Also, I run function to get columnname from queries.

def getKeyword(parsed):
    kataKeyword = set(["WHERE", "ORDER BY", "ON", "GROUP BY", "HAVING", "AND", "OR"])
    from_seen = False
    for item in parsed.tokens:
        if item.is_group:
            yield from getKeyword(item)
        if from_seen:
            if isSelect(item):
                yield from getKeyword(item)
            elif item.ttype is Keyword:
                from_seen = False
                StopIteration
            else:
                yield item
        if item.ttype is Keyword and item.value.upper() in kataKeyword:
            from_seen = True


def getAttribute(sql):
    attReg = re.compile(r"asc|desc", re.IGNORECASE)
    namaAtt = []
    kueri = sqlparse.parse(sql)
    for kata in kueri:
        if kata.get_type() != "UNKNOWN":
            stream = getKeyword(kata)
            table = set(getWord(stream))
            for item in table:
                tabl = attReg.sub("", item)
                namaAtt.append(tabl)
    return namaAtt

But as this is my first try, I need an opinion about what I've tried, because my code runs slowly with a large file.

\$\endgroup\$

1 Answer 1

1
\$\begingroup\$

That will not speedup your code, but there are some code improvements:

  1. follow naming conventions getAttribute -> get_attribute https://visualgit.readthedocs.io/en/latest/pages/naming_convention.html

  2. You can create set using set literal my_set = {1, 2, 3}

  3. You can compile tableReg = re.compile(r"^.+?(?<=[.])") once

\$\endgroup\$
1
  • \$\begingroup\$ The canonical reference related to code style in Python is the official Style Guide for Python Code widely known as PEP8. \$\endgroup\$ Commented Jul 16, 2019 at 7:55

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.