7

I have multiple files which I process using Numpy and SciPy, but I am required to deliver an Excel file. How can I efficiently copy/paste a huge numpy array to Excel?

I have tried to convert to Pandas' DataFrame object, which has the very usefull function to_clipboard(excel=True), but I spend most of my time converting the array into a DataFrame.

I cannot simply write the array to a CSV file then open it in excel, because I have to add the array to an existing file; something very hard to achieve with xlrd/xlwt and other Excel tools.

2
  • What issues are you running into converting the array to a pandas.DataFrame? It should be as simple as df = pandas.DataFrame(yourarray). Commented Mar 18, 2014 at 19:38
  • There is no issue, it just takes a long time to perform both df = pandas.DataFrame(data=data), and df.to_clipboard(excel=True). Also, I don't really need the column names and row indexes. Commented Mar 18, 2014 at 20:45

6 Answers 6

14

My best solution here would be to turn the array into a string, then use win32clipboard to sent it to the clipboard. This is not a cross-platform solution, but then again, Excel is not avalable on every platform anyway.

Excel uses tabs (\t) to mark column change, and \r\n to indicate a line change.

The relevant code would be:

import win32clipboard as clipboard

def toClipboardForExcel(array):
    """
    Copies an array into a string format acceptable by Excel.
    Columns separated by \t, rows separated by \n
    """
    # Create string from array
    line_strings = []
    for line in array:
        line_strings.append("\t".join(line.astype(str)).replace("\n",""))
    array_string = "\r\n".join(line_strings)

    # Put string into clipboard (open, clear, set, close)
    clipboard.OpenClipboard()
    clipboard.EmptyClipboard()
    clipboard.SetClipboardText(array_string)
    clipboard.CloseClipboard()

I have tested this code with random arrays of shape (1000,10000) and the biggest bottleneck seems to be passing the data to the function. (When I add a print statement at the beginning of the function, I still have to wait a bit before it prints anything.)

EDIT: The previous paragraph related my experience in Python Tools for Visual Studio. In this environment, it seens like the print statement is delayed. In direct command line interface, the bottleneck is in the loop, like expected.

Sign up to request clarification or add additional context in comments.

3 Comments

Hm. Where did you put your print statement? It seems odd to me that it would take any noticeable time at all, since function parameters in Python are always just pointers. That is, you don't pass the contents of array, you pass its memory address.
You're right, when I try this in iPython (with the print right after the def ... line), there is no delay. There is only one when I try this in an interactive windows of Python Tools for Visual Studio. In iPython, the bottleneck is at the loop. I'll modify my answer accordingly.
For me this splits numbers in the array into individual digits, decimal point (.), and other symbols (e.g. e). For example, a value in the array 0.12345e-06 is pasted into Excel as 0, ., 1, ... ,e, -, 0, 6 into separate columns.
10
    import pandas as pd
    pd.DataFrame(arr).to_clipboard()

I think it's one of the easiest way with pandas package.

Comments

1

If I would need to process multiple files loaded into python and then parse into excel, I would probably make some tools using xlwt

That said, may I offer my recipe Pasting python data into a spread sheet open for any edits, complaints or feedback. It uses no third party libraries and should be cross platform.

Comments

1

As of today, you can also use xlwings. It's open source, and fully compatible with Numpy arrays and Pandas DataFrames.

Comments

0

I extended PhilMacKay answer to: - incude 1dimensional arrays and, - to allow for commas as decimal separator (decimal=","):

import win32clipboard as clipboard

def to_clipboard(array, decimal=","):
    """
    Copies an array into a string format acceptable by Excel.
    Columns separated by \t, rows separated by \n
    """
    # Create string from array
    try:
        n, m = np.shape(array)
    except ValueError:
        n, m = 1, 0
    line_strings = []
    if m > 0:
        for line in array:
            if decimal == ",":
                line_strings.append("\t".join(line.astype(str)).replace(
                    "\n","").replace(".", ","))
            else:
                line_strings.append("\t".join(line.astype(str)).replace(
                    "\n",""))
        array_string = "\r\n".join(line_strings)
    else:
        if decimal == ",":
            array_string = "\r\n".join(array.astype(str)).replace(".", ",")
        else:
            array_string = "\r\n".join(array.astype(str))
    # Put string into clipboard (open, clear, set, close)
    clipboard.OpenClipboard()
    clipboard.EmptyClipboard()
    clipboard.SetClipboardText(array_string)
    clipboard.CloseClipboard()

Comments

-1

You could also look into pyxll project.

1 Comment

Yeah, that looks great. It seems this product could really improve both Excel's usefullness and Python's userbase! But for my current purpose, I don't like the licensing...

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.