
How To Iterate Over Arbitrary Number Of Files In Parallel In Python?

I have a list of file objects in a list called paths. I'd like to go through and read the first line of each file, do something with this n-tuple of data, then move on to the next line of each file, and so on.

Solution 1:

import itertools

# files is a list of open file objects; izip pulls one line from
# each file per iteration, yielding a tuple of corresponding lines
for line_tuple in itertools.izip(*files):
    whatever()  # process the tuple of lines here

I'd suggest the built-in zip, but on Python 2 that reads the entire contents of the files into memory, since it builds a list of all the tuples up front; izip is the lazy equivalent. Note that files should be a list of file objects; I'm not sure what you mean by "list of file handlers".
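For anyone on Python 3, itertools.izip is gone and the built-in zip is already lazy, so the same pattern can look like this (a minimal sketch; the filenames list is a hypothetical stand-in for your actual paths):

import contextlib

filenames = ["a.txt", "b.txt"]  # hypothetical example paths

# ExitStack guarantees every opened file is closed when the block exits
with contextlib.ExitStack() as stack:
    files = [stack.enter_context(open(name)) for name in filenames]
    # zip is lazy in Python 3: one line is read from each file per step,
    # and iteration stops when the shortest file is exhausted
    for line_tuple in zip(*files):
        print(line_tuple)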


Solution 2:

This depends on how "arbitrary" it actually is. As long as the number of files stays below your OS's limit on open file descriptors, itertools.izip should work just fine (or itertools.izip_longest, if you want to keep going until the longest file is exhausted).

import itertools

files = [open(f) for f in filenames]
for lines in itertools.izip(*files):
    # do something with lines, a tuple holding one line per file
    pass

for f in files:
    f.close()

If you can have more files than your OS will allow you to open, then you're out of luck (at least as far as an easy solution is concerned).
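On Unix-like systems you can at least check that per-process limit, and sometimes raise it, with the standard resource module before opening everything. A rough sketch, reusing the filenames list from above (resource is not available on Windows):

import resource

# query the per-process limit on open file descriptors
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
if len(filenames) >= soft:
    # an unprivileged process may raise its soft limit up to the hard limit
    resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))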


Solution 3:

The first idea that popped into my mind is the following code; it seems straightforward enough.

fp_list = []
for path in path_array:
    fp = open(path)
    fp_list.append(fp)

line_list = []
for fp in fp_list:
    line = fp.readline()
    line_list.append(line)

# your code here: process line_list, the first line of every file

for fp in fp_list:
    fp.close()
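If you want this approach to keep going past the first line, one way (a sketch, not part of the original answer) is to repeat the readline round until every file is exhausted:

fp_list = [open(path) for path in path_array]

while True:
    line_list = [fp.readline() for fp in fp_list]
    # readline() returns '' only at end of file, so stop once all are done
    if all(line == '' for line in line_list):
        break
    # process line_list here

for fp in fp_list:
    fp.close()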
