Language: Python 3.8
Platform: macOS 11 | Windows 10
Filetypes: .xlsx | .csv
Task: File/Format Conversion

Synopsis: My Excel file has cells with functions/formulas. I want to save this file as a .csv while preserving the values the formulas produce (not the formula strings themselves).

What works: Pause script, prompt user to open Excel > 'Save As' .csv // Excel processes the functions within the cells and preserves the values before saving as .csv

What hasn't worked: Using pandas or openpyxl to convert the Excel file to a .csv (such as 'wb.save' and 'df.to_csv') // The produced .csv file does not contain the results of the formula cells and instead outputs nothing within those cells.

Question: Is there any way to leverage Excel's 'process the function and save the values' behavior within the Python script?

Thank you!

Sample Code - Pandas

import pandas as pd

df = pd.read_excel('file.xlsx')
df.to_csv('file.csv')

Sample Code - Openpyxl

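import csv
import openpyxl

# data_only=True returns the values Excel cached at its last save, not the formula strings
wb = openpyxl.load_workbook('file.xlsx', data_only=True)
sheet = wb.active
with open('file.csv', 'w', newline="") as f:
    c = csv.writer(f)
    for r in sheet.iter_rows():
        c.writerow([cell.value for cell in r])
# Note: no wb.save('file.csv') here; it would overwrite the CSV with .xlsx-format data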

Sample Problem

Excel Columns:

A: ['First Initial']
B: ['Last Name']
C: ['Email']

Formula in all rows within column C:

C1: [=CONCATENATE(A1,".",B1,"@domain.net")]
C2: [=CONCATENATE(A2,".",B2,"@domain.net")]
C3: [=CONCATENATE(A3,".",B3,"@domain.net")]
etc.

Output of 'file.xlsx' in Excel & 'file.csv' (via Excel > 'Save As' .csv):

A1: ['j']
B1: ['doe']
C1: ['j.doe@domain.net']

Output of 'file.csv' after following the Pandas Sample Code:

A1: ['j']
B1: ['doe']
C1: ['']

If a cell does not contain a formula, the conversion outputs the correct value. Cells with formulas come out empty (since .csv is just plain text). Is there a way to replicate Excel's behavior: run the functions first > save the output value into the cell > save as .csv?

UPDATE:

So I found the issue, although I'm not sure how to go about solving it. Pandas works as intended when I create a fresh .xlsx and try the sample code, but it doesn't work with the .xlsx in my script. I narrowed it down to this step:

The following is a snippet from my script that copies values from one excel file into another:

import openpyxl as xl

wb1 = xl.load_workbook('/file1.xlsx')
ws1 = wb1.worksheets[0]

wb2 = xl.load_workbook('/file2.xlsx')
ws2 = wb2.active

mr = ws1.max_row
mc = ws1.max_column

# Copy every cell of ws1 into the same position in ws2
for i in range(1, mr + 1):
    for j in range(1, mc + 1):
        c = ws1.cell(row=i, column=j)
        ws2.cell(row=i, column=j).value = c.value

wb2.save('file2.xlsx')

The file ('file2.xlsx') seemingly opens and functions just like a regular Excel file, but it DOES NOT preserve the values of cells that have formulas after being converted to a .csv via pandas.

The file ('file1.xlsx'), however, converts just fine.

BUT, if I open 'file2.xlsx' in Excel and simply save it (without changing anything), and then try converting it via pandas, it DOES end up preserving the values of the formula cells.

So there's definitely something wrong in my code (surprise, surprise) that does this. Not sure why, though.
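A likely explanation, going by how openpyxl documents its own behavior: openpyxl never evaluates formulas, and a workbook it saves carries the formula strings without the cached results that Excel stores next to them. pandas reads those cached results, so it finds nothing until Excel opens and re-saves the file. The sketch below is one way to check a file for this condition. `find_uncached_formulas` is a hypothetical helper name (not part of openpyxl or pandas), and treating a cached value of None as "missing" is an assumption that holds for the email formulas above.

```python
import openpyxl


def find_uncached_formulas(path):
    """Return 'Sheet!Cell' labels for formula cells with no cached result.

    Hypothetical helper: loads the workbook twice, once to see the formula
    strings and once (data_only=True) to see the results stored at the last
    save by a spreadsheet application.
    """
    formulas = openpyxl.load_workbook(path)                # formula strings
    values = openpyxl.load_workbook(path, data_only=True)  # cached results
    missing = []
    for ws in formulas.worksheets:
        cached = values[ws.title]
        for row in ws.iter_rows():
            for cell in row:
                # data_type "f" marks a formula cell in the normal load
                if cell.data_type == "f" and cached[cell.coordinate].value is None:
                    missing.append(f"{ws.title}!{cell.coordinate}")
    return missing
```

If this returns a non-empty list for 'file2.xlsx' but an empty one for 'file1.xlsx', the values have to be computed by something that can evaluate formulas (Excel itself, for example) before the file is converted to .csv.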

  • If you only want to save an excel file as a csv then you don't need python, so can you explain why you want to do this with python? Commented Sep 29, 2021 at 17:56
  • @TaxpayersMoney It's part of a larger workflow - but I've excluded that information to simply focus on the file conversion. The product of the workflow should output a .csv (currently outputs a .xlsx) - that's the part I am trying to change. It can be done manually, but the whole point is to automate the process - and there are multiple .xlsx files that need to be turned into .csv files - the constant "Save As" process defeats the purpose of automating the task. Commented Sep 29, 2021 at 18:03
  • Couldn't you read the excel file as a pandas dataframe and then output to csv using read_excel and to_csv? Commented Sep 29, 2021 at 19:07
  • @TaxpayersMoney You are correct - that process does work. There's a flaw in another part of my code where the xlsx in need of conversion is a product of a value transfer from one workbook to another. This is where the problem happens (see updated part of my question). Any ideas on how to solve that? Commented Sep 29, 2021 at 20:04
  • Use read-only mode and stream to CSV row for row. Much faster than going via Pandas. Commented Sep 30, 2021 at 8:15
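The read-only suggestion in the last comment could look roughly like the sketch below. The function names `write_rows_to_csv` and `xlsx_to_csv` are made up for illustration, and taking the first worksheet is an assumption. Note that this only streams whatever cached values are already stored in the file; it does not evaluate formulas, so it will still produce empty cells for a workbook that was last saved by openpyxl.

```python
import csv


def write_rows_to_csv(rows, csv_path):
    """Write an iterable of row tuples to csv_path and return the row count.

    csv.writer emits None as an empty field, which is how a formula cell
    without a cached value ends up blank in the output.
    """
    count = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for row in rows:
            writer.writerow(row)
            count += 1
    return count


def xlsx_to_csv(xlsx_path, csv_path):
    """Stream the first worksheet of xlsx_path to csv_path, row by row."""
    import openpyxl  # imported here so the CSV helper works without openpyxl

    # read_only=True streams rows lazily; data_only=True asks for cached values
    wb = openpyxl.load_workbook(xlsx_path, read_only=True, data_only=True)
    try:
        ws = wb.worksheets[0]  # assumption: the data is on the first sheet
        return write_rows_to_csv(ws.iter_rows(values_only=True), csv_path)
    finally:
        wb.close()  # read-only workbooks keep the file handle open until closed
```

Calling `xlsx_to_csv('file.xlsx', 'file.csv')` for each workbook in a loop would cover the "multiple .xlsx files" case without loading whole sheets into a DataFrame.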

2 Answers

SOLVED

I was able to solve my own question. I'm posting it here for anyone else who has a similar issue (searching this problem led me to believe y'all exist, so here you go).

Note: This only works on a Windows system with Excel installed.

import win32com.client as win32
from win32com.client import Dispatch
from win32com.client import constants as c

excel = Dispatch('Excel.Application') # Calls Excel
excel.DisplayAlerts = False # Disables prompts, such as asking to overwrite files
wb = excel.Workbooks.Open("/file.xlsx") # Input File
wb.SaveAs("/file.csv", c.xlCSV) # Output File
excel.Application.Quit() # Close Excel
excel.DisplayAlerts = True # Turn alerts back on
  • FYI it's much simpler to use Pandas library to load an excel workbook and then save to csv. Commented Oct 2, 2021 at 0:20
  • Is this faster than the pandas conversion? Any ideas?
    – Rahil
    Commented May 24, 2022 at 11:28
  • This code's not as pretty but it feels much faster than importing the CSV library and running a loop to import each line one by one.
    – PowerUser
    Commented Apr 19, 2023 at 23:34

This can be done using the Pandas library. This might help: https://www.geeksforgeeks.org/convert-excel-to-csv-in-python/

  • While this works, it doesn't work if your excel file includes cells with working functions. Commented Sep 29, 2021 at 18:05
  • @TheCrazySwede - then your read_excel call is failing, because to_csv certainly works. Which is why we need an actual minimal reproducible example.
    – BigBen
    Commented Sep 29, 2021 at 18:07
  • @BigBen Updated the question with more info. The 'to_csv' works perfectly fine for everything, except every cell containing a formula comes out as blank. The rest of the cells are saved as intended. Commented Sep 29, 2021 at 18:26
  • @TheCrazySwede - unable to reproduce: i.sstatic.net/YansM.png What version of pandas?
    – BigBen
    Commented Sep 29, 2021 at 18:49
  • @BigBen You are right! Found the part of my code that is causing this issue. I updated the end of the question - would you be willing to help? Commented Sep 29, 2021 at 20:03
