
I have a few files (the number of which is not known a priori) that I'd like to read simultaneously, line by line or in chunks, do some processing, and then move on to the next line (or chunk) in all files. I guess my requirements are more or less similar to the ones in this question. However, in my case the files can have different numbers of lines, and while trying to implement something like ExitStack, I noticed that all files will be closed as soon as one of them is closed (likely the one with the fewest lines), whereas I'd like to continue processing the other files (eventually assigning empty strings to the "lines" of the closed files). Is this something possible to accomplish? And how?

#cat f1.txt
RNvn 40
AvOp 13
yEVA 94
oNGn 10
VZQU 88

#cat f2.txt
gSNn 4
zxHP 84
ebRw 70
NaxL 2
lXUb 49
PQzn 79
aIyN 88

#cat f3.txt
XXce 5
RMIq 4
FFEi 47
wuLZ 60

With a simple implementation using ExitStack, the result only includes 4 lines, because f3.txt has only 4 lines:

from contextlib import ExitStack

flist = ['f1.txt', 'f2.txt', 'f3.txt']
with ExitStack() as stack:
    files = [stack.enter_context(open(fname)) for fname in flist]
    for lines in zip(*files):
        print(lines)

# prints
('RNvn 40\n', 'gSNn 4\n', 'XXce 5\n')
('AvOp 13\n', 'zxHP 84\n', 'RMIq 4\n')
('yEVA 94\n', 'ebRw 70\n', 'FFEi 47\n')
('oNGn 10\n', 'NaxL 2\n', 'wuLZ 60\n')

2 Answers


You can get the best of both worlds.

The code leaves the with ExitStack() as stack: block because zip() stops at the shortest file; it has nothing to do with ExitStack(). Using zip_longest() keeps iterating until all files are exhausted, and only then does the ExitStack() close the files.

from contextlib import ExitStack
from itertools import zip_longest

flist = ['f1.txt', 'f2.txt', 'f3.txt']
with ExitStack() as stack:
    # open every file and register it so ExitStack closes them all on exit
    files = [stack.enter_context(open(fname)) for fname in flist]
    # zip_longest pads exhausted files with None instead of stopping early
    for lines in zip_longest(*files):
        print(lines)
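
If you prefer empty strings instead of None for the files that are already exhausted (as mentioned in the question), zip_longest accepts a fillvalue argument; a minimal sketch:

from contextlib import ExitStack
from itertools import zip_longest

flist = ['f1.txt', 'f2.txt', 'f3.txt']
with ExitStack() as stack:
    files = [stack.enter_context(open(fname)) for fname in flist]
    # fillvalue replaces the default None for files with no lines left
    for lines in zip_longest(*files, fillvalue=''):
        print(lines)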
  • ohhh, the trick was actually in using zip_longest and it had nothing to do with ExitStack. Will update my answer as well.
    – PedroA
    Commented May 22, 2019 at 9:49

Answering my own question, but please feel free to add any notes/improvements/alternatives.

One way to get around this issue is to open all files without using the with statement, keep reading until all files are exhausted, and use zip_longest from itertools to gather the lines simultaneously. At the end, close all files. Something along the lines of the code below should work:

from itertools import zip_longest
flist = ['f1.txt', 'f2.txt', 'f3.txt']

# open all files manually (no with statement), so none are closed early
files = [open(i, 'rt') for i in flist]

# zip_longest yields None for files that have run out of lines
for lines in zip_longest(*files):
    print(lines)

# close all files once every line has been processed
for f in files:
    f.close()

# this prints all lines as expected:
('RNvn 40\n', 'gSNn 4\n', 'XXce 5\n')
('AvOp 13\n', 'zxHP 84\n', 'RMIq 4\n')
('yEVA 94\n', 'ebRw 70\n', 'FFEi 47\n')
('oNGn 10\n', 'NaxL 2\n', 'wuLZ 60\n')
('VZQU 88\n', 'lXUb 49\n', None)
(None, 'PQzn 79\n', None)
(None, 'aIyN 88\n', None)
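
One caveat with this version: if an exception is raised inside the loop, the files are never closed. A try/finally keeps the manual open() calls but still guarantees cleanup; a minimal sketch:

from itertools import zip_longest

flist = ['f1.txt', 'f2.txt', 'f3.txt']
files = [open(i, 'rt') for i in flist]
try:
    for lines in zip_longest(*files):
        print(lines)
finally:
    # runs even if processing raises, so no file handle is leaked
    for f in files:
        f.close()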
