
I’m working on a project where I need to automatically extract data from an Excel file and load it into an Oracle database. I’m using Python, TOAD, Oracle Client, and VS Code. The goal is to trigger the upload as soon as a new Excel file is added to a specific folder. Please assist with code for this.

import os
import pandas as pd
import cx_Oracle
from time import sleep

oracle_connection_string = 'c##chbz/excelpass@localhost:1521/XE'

# Folder where the daily Excel report will be dropped
folder_path = r'C:\Users\USERR\Desktop\FolderDailyExcels'

# Connect the Python script to the Oracle database
def connect_to_oracle():
    try:
        connection = cx_Oracle.connect(oracle_connection_string)
        print("Successfully connected:")
        return connection
    except cx_Oracle.DatabaseError as e:
        print("There was an error connecting to the database:", e)
        return None
connect_to_oracle()

# Read the Excel file into a pandas DataFrame
def read_excel(file_path):
    try:
        data = pd.read_excel(file_path)
        return data
    except Exception as e:
        print(f"Error reading Excel file: {e}")
        return None
# Now call the function like this:
file_path = r'C:\Users\USERR\Desktop\FolderDailyExcels\Excel1.xlsx'
read_excel(file_path)

#Upload Data to Oracle Database
def upload_to_oracle(data, Project_table):
    connection = connect_to_oracle()
    if connection is None:
        return
    
    cursor = connection.cursor()
    
    for index, row in data.iterrows():
        try:
            sql = f"INSERT INTO {Project_table} (FULLNAME, NOOFATTACK, DATEOFATTACK) VALUES (:1, :2, :3)"
            # These keys must match the column headers in the Excel file
            cursor.execute(sql, (row['column1'], row['column2'], row['column3']))
            connection.commit()
        except Exception as e:
            print(f"Error uploading row {index}: {e}")
    
    cursor.close()
    connection.close()

#Monitor folder and automate ETL
def monitor_folder():
    while True:
        files = os.listdir(folder_path)
        for file in files:
            if file.endswith(".xlsx"):
                file_path = os.path.join(folder_path, file)
                print(f"Found new file: {file_path}")
                data = read_excel(file_path)
                if data is not None:
                    upload_to_oracle(data, 'Project_table')  # Change 'Project_table' to your table name
        sleep(60)  # Check the folder every minute

monitor_folder()  # Start watching the folder
  • Check the link below if it's useful for your use case: stackoverflow.com/questions/46854609/…
    – Vijay
    Commented Apr 9 at 4:05
  • cx_Oracle was replaced 3 years ago by python-oracledb, so don't use cx_Oracle. Both drivers support the Python DB API so your code will be similar. And please don't iterate over cursor.execute() because this will be slow. Instead use cursor.executemany() - see the doc on batch loading: python-oracledb.readthedocs.io/en/latest/user_guide/…
    Commented Apr 9 at 6:16
  • next time use ``` to format code
    – furas
    Commented Apr 9 at 10:17
  • I don't see the part where processed files are deleted (or, preferably, archived).
    – jsf80238
    Commented Apr 9 at 14:37
  • I suggest getting rid of the try/except blocks. Unless you are going to do something with the exception, just let the main program handle it.
    – jsf80238
    Commented Apr 9 at 14:38
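
Following up on the comment about archiving processed files: one minimal way to do it is to move each file out of the watched folder after a successful upload. The Processed subfolder and the archive_file helper below are assumptions for illustration, not part of the original code.

import os
import shutil

# Assumed archive location next to the watched folder
archive_path = r'C:\Users\USERR\Desktop\FolderDailyExcels\Processed'

def archive_file(file_path):
    # Move a processed Excel file into the archive so it is not re-uploaded
    os.makedirs(archive_path, exist_ok=True)
    shutil.move(file_path, os.path.join(archive_path, os.path.basename(file_path)))

Calling archive_file(file_path) inside monitor_folder(), right after upload_to_oracle() succeeds, keeps the watched folder limited to unprocessed files.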

1 Answer


Here's the spreadsheet reading & insertion part from some old code I had around; you might want to check & tune it. The file monitoring code is left as an exercise for you.

# Code to insert spreadsheet data into an Oracle table where the table schema
# matches the spreadsheet columns (in the same order)
#
# sheet.xlsx contains:
#   ID   NAME             CITY  AGE  IQ
#   1   Anna        Singapore   20  101
#   2    Bob        Melbourne   30  101
#   3  Chris           London   32  101
#   4  Dolly   San Francisco   28  101
#
# The table schema is like:
#   create table t (id number, name varchar2(40), city varchar2(20), age number, iq number);

import getpass
import oracledb
import pandas as pd

XLSX_FILE_NAME = "sheet.xlsx"

un = 'cj'
cs = 'localhost/orclpdb1'
pw = getpass.getpass(f'Enter password for {un}@{cs}: ')

with oracledb.connect(user=un, password=pw, dsn=cs) as connection:
    df = pd.read_excel(XLSX_FILE_NAME)
    sql = f"insert into t (ID,NAME,CITY,AGE,IQ) values (:ID,:NAME,:CITY,:AGE,:IQ)"
    with connection.cursor() as cursor:
        rows = [tuple(x) for x in df.values]
        cursor.executemany(sql, rows)
    # Commit so the inserted rows persist; closing the connection rolls back
    # any uncommitted transaction
    connection.commit()

    # Verify inserted data
    with connection.cursor() as cursor:
        sql = "select * from t"
        for r in cursor.execute(sql):
            print(r)
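
Since the monitoring part is left as an exercise, here is one possible polling sketch using only the standard library; the folder path, the 60-second interval, and the process_file callback are assumptions to swap for your own paths and upload logic.

import os
import time

WATCH_FOLDER = r'C:\Users\USERR\Desktop\FolderDailyExcels'  # assumed path from the question

def watch_for_new_xlsx(process_file, poll_seconds=60):
    # Poll the folder and call process_file(path) once for each newly added .xlsx file
    seen = set(os.listdir(WATCH_FOLDER))  # files already present at startup
    while True:
        current = set(os.listdir(WATCH_FOLDER))
        for name in sorted(current - seen):
            if name.lower().endswith('.xlsx'):
                process_file(os.path.join(WATCH_FOLDER, name))
        seen = current
        time.sleep(poll_seconds)

Here process_file would wrap the read_excel and executemany insert shown above. In practice you may also want to guard against files that are still being written, and a library such as watchdog could replace the polling loop if you need to react to new files more quickly.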

For a long-running process that connects relatively frequently, a connection pool of size 1 is worth considering: it keeps connection handling in one place and gives you more flexibility to use DRCP pooling in the backend later.
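
If you go that route, a minimal sketch with python-oracledb might look like the following; the credentials, DSN, and table are the same placeholders used above.

import getpass
import oracledb

un = 'cj'
cs = 'localhost/orclpdb1'
pw = getpass.getpass(f'Enter password for {un}@{cs}: ')

# A pool of one connection keeps connection handling in one place and makes a
# later switch to DRCP on the database side a configuration change rather than a rewrite
pool = oracledb.create_pool(user=un, password=pw, dsn=cs, min=1, max=1)

with pool.acquire() as connection:
    with connection.cursor() as cursor:
        for row in cursor.execute("select count(*) from t"):
            print(row)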
