This question is about elapsed time.
It should include observations from
cProfile.
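Something like this would reveal where the time actually goes (a minimal sketch, assuming the slow loop lives in a main() function):

import cProfile

cProfile.run("main()", sort="cumtime")   # print hotspots, biggest cumulative time first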
More than two hours for a smallish number of rows?
Less than three lines per second?
Wow, that's impressively slow.
There's not a lot going on here.
So all I can imagine is that this might have O(N^2) quadratic performance:
ws = wb.active
...
for row in table.rows:
    ws.append([row.cells[-2].text])
To verify, simply comment out those two lines
(or turn the append line into pass)
and do a timing run.
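For example (a rough sketch; table is the python-docx table from the question):

import time

t0 = time.perf_counter()
for row in table.rows:
    pass                                   # ws.append() disabled for this run
print(f"{time.perf_counter() - t0:.1f} s without the append")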
If my guess is correct and that is the source of the delay, then consider accumulating rows in a plain list rather than in the worksheet, then appending them to the worksheet all at once.
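A sketch of that restructuring (openpyxl has no bulk extend, so "all at once" here just means a single tight pass over the finished list):

rows_out = [[row.cells[-2].text] for row in table.rows]  # list appends are amortized O(1)
for r in rows_out:
    ws.append(r)                                         # worksheet writes isolated in one pass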
Or, almost equivalently, use the len() of that list to pre-extend the worksheet to the proper number of rows, then store each list element in its proper row.
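A sketch of that variant, assuming a non-empty list and assuming that touching the last row up front lets openpyxl size the sheet just once:

texts = [row.cells[-2].text for row in table.rows]
ws.cell(row=len(texts), column=1)            # pre-extend to the final row count
for i, text in enumerate(texts, start=1):
    ws.cell(row=i, column=1, value=text)     # store each element in its proper row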
Theory: Some APIs suffer from the high cost of repeatedly appending a single element. We saw this, for example, in early versions of the CPython interpreter, which led to this idiom:
def bottles():
    lines = []
    for n in range(1_000_000):
        lines.append(f"{n} bottles of beer on the wall")
    return "\n".join(lines)
Nowadays the allocation behavior is better, so it is safe to write

def bottles():
    lines = ""
    for n in range(1_000_000):
        lines += f"{n} bottles of beer on the wall\n"
    return lines
What changed?
The allocator now "wastes" some memory when extending a string,
anticipating that this might not be the last such extension.
Crucially, it uses a multiplicative growth factor, such as doubling the allocation or multiplying the current length by, say, 1.3; any factor greater than one will do. We still occasionally have to do O(N) linear work to copy the existing data into a newly allocated buffer. But only O(log N) such reallocations happen across N appends, so with amortization each extension costs O(1) constant time.
The list allocator always had that behavior.
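You can watch the over-allocation happen; this little probe prints a line each time CPython reallocates the list's buffer:

import sys

lst, prev = [], sys.getsizeof([])
for n in range(100):
    lst.append(n)
    size = sys.getsizeof(lst)
    if size != prev:                        # a reallocation just occurred
        print(f"len={len(lst):3}  bytes={size}")
        prev = size

The gaps between successive prints widen as the list grows, which is the multiplicative factor at work.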
Growing a NumPy array element by element exhibits similar quadratic badness (np.append() copies the whole array on every call), so there is a strong incentive to pre-allocate an appropriate number of rows from the get-go.
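A minimal sketch contrasting the two approaches:

import numpy as np

a = np.empty(0)
for i in range(10_000):
    a = np.append(a, i)       # copies the whole array every call: O(N^2) overall

b = np.empty(10_000)          # pre-allocate once
for i in range(10_000):
    b[i] = i                  # fill in place: O(N) overall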
I don't know about worksheet behavior.
But that's my guess.
Let us know
how close I came to the mark.
As a backup plan, consider using the very fast csv module to create a giant output.csv file, and finally turn that file into the desired XLSX format in a single operation.
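A hedged sketch of that pipeline, assuming pandas (with openpyxl installed) for the one-shot CSV-to-XLSX step; table is the python-docx table from the question:

import csv
import pandas as pd

texts = [row.cells[-2].text for row in table.rows]
with open("output.csv", "w", newline="") as f:
    csv.writer(f).writerows([t] for t in texts)   # very fast row writes

# one operation turns the whole file into XLSX
pd.read_csv("output.csv", header=None).to_excel("output.xlsx", index=False, header=False)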
Guaranteed to go faster than three rows per second.