How can I connect to MSSQL and Oracle databases in Python with DSN and read tables into Pandas dataframes?

Question

I want to connect from one Python program to different databases (MSSQL and Oracle for now, maybe also Postgres/MySQL later) and ideally read queries/tables into pandas DataFrames. For compatibility with some other packages I'm using Python 3.7. The connections to the databases are only available through a DSN (read from a configurable file, but that's a non-issue).

The problem is that SQLAlchemy (1.4) does not seem to support connecting to an Oracle database using a DSN (unless I just didn't find the answer online), so I tried connecting with cx_Oracle directly. That worked fine, but then I can't use pandas.read_sql_table(), so I would prefer a solution that still gives me a SQLAlchemy connection to the Oracle DB using a DSN. For MSSQL the SQLAlchemy connection works fine with pyodbc as the driver. Some sample code:

import cx_Oracle as ora
import pandas as pd
import sqlalchemy as sqla

loginuser = 'username'
loginpwd = 'password'
logindsn = 'dsnname'
dbtype = 'oracle'  # or 'MSSQL'; this is read from a file along with the other variables, set directly here to keep the code simple

if dbtype == 'oracle':
    conn = ora.connect(user=loginuser, password=loginpwd, dsn=logindsn)  # using a DSN doesn't work with sqlalchemy afaik
elif dbtype == 'MSSQL':
    engine = sqla.create_engine('mssql+pyodbc://' + loginuser + ':' + loginpwd + '@' + logindsn)
    conn = engine.connect()

testdf = pd.read_sql_table('Employees', conn)  # works for MSSQL; for Oracle it raises an error because pd.read_sql_table (which I'd like to use) only accepts a SQLAlchemy connection

I'm willing to swap to a different library that allows me to connect to both Oracle and MSSQL easily if there's a better solution than pandas+sqlalchemy...

If the fancy automated tools don't fit your scenario, then you have to do it by hand. It isn't hard. Once you have a connection, the query format is the same, and you can use fetchall to fill in the dataframe. — Tim Roberts, Commented May 25, 2023 at 6:50
@TimRoberts Well, it would also make other cases easier, such as always having the same query instead of different parameter markers ("?" and ":") to prevent SQL injection... but yeah, I guess if there's no easy solution I'll have to go the long road — Erbs, Commented May 25, 2023 at 7:40
@Erbs PEP 249 specifies a module.paramstyle attribute that denotes the parameter style used by the driver. You may use this information when building the query. — astentx, Commented May 25, 2023 at 8:02
@astentx Can you show me an example of how I would use that? I'm quite new to working with (multiple) databases in Python, so I don't quite understand whether that lets me write a single query, or whether it just saves me looking up the actual syntax and I still need to write two (or more) queries or replace characters in the string depending on the DB used — Erbs, Commented May 25, 2023 at 8:32
It's only a little bit awkward. rep = ora.paramstyle, then queries are like sql = f"SELECT name FROM users WHERE id={rep};" instead of just assuming ? or %s. — Tim Roberts, Commented May 25, 2023 at 18:16
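
To expand on the paramstyle comments: as far as I know, paramstyle holds the name of the style (for example "qmark" for pyodbc or "named" for cx_Oracle) rather than the placeholder text itself, so it has to be mapped to a marker first. Below is a minimal sketch of that idea. make_param is a hypothetical helper, not part of any library, and sqlite3 (also "qmark") stands in for the real drivers so the example runs anywhere.

```python
import sqlite3  # stand-in DB-API module; its paramstyle is "qmark"
from types import SimpleNamespace


def make_param(module, name, value, position=1):
    """Return (placeholder, params) for one bind value, based on module.paramstyle.

    Hypothetical helper: maps the five PEP 249 style names to a marker and
    to the parameter container that style expects.
    """
    style = module.paramstyle
    if style == "qmark":
        return "?", (value,)
    if style == "format":
        return "%s", (value,)
    if style == "numeric":
        return f":{position}", (value,)
    if style == "named":
        return f":{name}", {name: value}
    if style == "pyformat":
        return f"%({name})s", {name: value}
    raise ValueError(f"unknown paramstyle: {style!r}")


# Demo against an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

marker, params = make_param(sqlite3, "id", 1)
sql = f"SELECT name FROM users WHERE id = {marker}"
rows = conn.execute(sql, params).fetchall()

# A fake module object shows what a "named" driver would get.
named_marker, named_params = make_param(SimpleNamespace(paramstyle="named"), "id", 1)
```

Only the marker is interpolated into the SQL string; the value itself is still passed separately to execute(), so the query stays parameterized.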

Savan Soni · Accepted Answer · 2023-06-30 10:32:35Z

0

Create a direct connection to Oracle Database using python-oracledb.
Fetch the data rows using python-oracledb's fetch methods.
The data is returned as a list of rows. Convert the list into a pandas DataFrame locally. Please refer to the sample code here.

Note: python-oracledb is the new name for the latest release of Oracle's popular cx_Oracle driver. It runs in 'Thin' mode by default, which bypasses the Oracle Client libraries.
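
The steps above can be sketched roughly as follows. This is an illustration under assumptions, not the linked sample: fetch_table is a hypothetical helper, and sqlite3 stands in for the Oracle connection so the code runs without a database. With python-oracledb the connection would instead come from oracledb.connect(user=..., password=..., dsn=...), and the rest should work the same way since both follow the DB-API.

```python
import sqlite3  # stand-in for a python-oracledb connection


def fetch_table(conn, sql, params=()):
    """Run a query on a DB-API connection and return (column_names, rows)."""
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        # cursor.description holds one 7-item sequence per column; item 0 is the name.
        columns = [col[0] for col in cur.description]
        rows = cur.fetchall()
    finally:
        cur.close()
    return columns, rows


# Demo data in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO Employees VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])

columns, rows = fetch_table(conn, "SELECT id, name FROM Employees ORDER BY id")

# With pandas installed, the rows become a DataFrame like this:
#   import pandas as pd
#   df = pd.DataFrame(rows, columns=columns)
```

Taking the column names from cursor.description keeps the DataFrame headers in sync with the query, so nothing has to be hard-coded per table.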

edited Jun 30, 2023 at 10:32

answered Jun 30, 2023 at 5:17


I've ditched the pandas DataFrame idea by now and am just using SQLAlchemy and OOP classes, connecting directly to the database without ODBC, because that makes it easier to swap between different databases since I can use the SQLAlchemy ORM.
– Erbs
Commented Jun 30, 2023 at 6:43


Gord Thompson · Accepted Answer · 2023-06-30 14:05:50Z

0

I tried connecting with cx_oracle directly (which worked fine), but then i cant use pandas.read_sql_table()

For the record, with SQLAlchemy 1.4 and pandas 1.5.3, .read_sql_table() works fine for oracle+cx_oracle://:

import pandas as pd
import sqlalchemy as sa

engine = sa.create_engine(
    "oracle+cx_oracle://scott:[email protected]/?service_name=xepdb1"
)

df = pd.read_sql_table("table1", engine)
print(df)
"""
   id  txt
0   1  foo
"""

answered Jun 30, 2023 at 14:05


"For the record", read my question again: it's not a service, it's a DSN.
– Erbs
Commented Jul 3, 2023 at 5:29


Samuel Gottipalli · Accepted Answer · 2023-08-15 18:07:39Z

From SQLAlchemy 1.4 docs:

Connections with tnsnames.ora or Oracle Cloud

Alternatively, if no port, database name, or service_name is provided, the dialect will use an Oracle DSN “connection string”. This takes the “hostname” portion of the URL as the data source name. For example, if the tnsnames.ora file contains a Net Service Name of myalias as below:

myalias =   
  (DESCRIPTION =
     (ADDRESS = (PROTOCOL = TCP)(HOST = mymachine.example.com)(PORT = 1521))
     (CONNECT_DATA =
       (SERVER = DEDICATED)
       (SERVICE_NAME = orclpdb1)
     )
  )

The cx_Oracle dialect connects to this database service when myalias is the hostname portion of the URL, without specifying a port, database name or service_name:

create_engine("oracle+cx_oracle://scott:tiger@myalias/?encoding=UTF-8&nencoding=UTF-8")

I also noticed in the question above that there is no create_engine section for the Oracle connection, which is required whether you use a DSN or otherwise when using SQLAlchemy.
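
Applied to the question's variables, the URL for both databases could be built along these lines. This is only a sketch under assumptions: build_dsn_url and the DRIVERS mapping are hypothetical names, and the credentials are percent-encoded in case they contain characters such as "@" or "/" that would otherwise break the URL.

```python
from urllib.parse import quote

# Hypothetical mapping from the question's dbtype values to SQLAlchemy
# dialect+driver prefixes.
DRIVERS = {
    "oracle": "oracle+cx_oracle",
    "mssql": "mssql+pyodbc",
}


def build_dsn_url(dbtype, user, password, dsn):
    """Build a SQLAlchemy URL that uses a DSN name as the host portion."""
    driver = DRIVERS[dbtype.lower()]
    # safe="" makes quote() escape every reserved character, including "/" and "@".
    return f"{driver}://{quote(user, safe='')}:{quote(password, safe='')}@{dsn}"


oracle_url = build_dsn_url("oracle", "scott", "tiger", "myalias")
mssql_url = build_dsn_url("MSSQL", "username", "p@ss/word", "dsnname")

# With SQLAlchemy and pandas installed, either URL would then be used as:
#   engine = sqlalchemy.create_engine(oracle_url)
#   df = pandas.read_sql_table("Employees", engine)
```

Because both branches end in an engine, the read_sql_table call no longer needs to know which database it is talking to.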

Now I will admit that I'm not an Oracle expert in any form or fashion here, but some information about setting the Oracle TNS (if not already set) and using the DSN correctly can be found here from Oracle Docs.

Follow-up question: if you're willing to explore other Python packages, why not upgrade to SQLAlchemy 2.0? The same syntax still applies for the most part, and additional features and performance improvements have been added.

You can read more here
