
64-Bit System, 8 GB of RAM, a Bit More Than 800 MB of CSV, and Reading with Python Gives MemoryError

import csv
from itertools import islice
import numpy as np

f = open('data.csv')
f.seek(0)
f_reader = csv.reader(f)
raw_data = np.array(list(islice(f_reader, 0, 10000000)), dtype=int)

The above is the code I am using to read a CSV file.

Solution 1:

could someone tell me how to read from, say the 20th million row please? I know I need to use f.seek(some number)

No, you can't (and mustn't) use f.seek() in this situation: f.seek() moves by bytes, and because CSV rows vary in byte length there is no way to compute the byte offset of the 20-millionth row without reading everything before it. Rather, you must read each of the first 20 million rows somehow.
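A minimal sketch (not from the original answer) of why a byte-based seek cannot land on a chosen row: row starts fall at irregular offsets, so the offset of row n is unknowable without reading rows 0 through n-1 first.

rows = ["1,22,333\n", "4,5,6\n", "77,8,9999\n"]  # hypothetical CSV contents
offsets = [0]
for r in rows:
    offsets.append(offsets[-1] + len(r))
print(offsets)  # [0, 9, 15, 25] -- each row's start depends on every row before it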

The Python documentation has this recipe among the itertools recipes:

import collections
from itertools import islice

def consume(iterator, n):
    "Advance the iterator n-steps ahead. If n is None, consume entirely."
    # Use functions that consume iterators at C speed.
    if n is None:
        # feed the entire iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(islice(iterator, n, n), None)
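As a quick sanity check (a sketch, not from the original answer), here is consume() advancing a small iterator; the expected output is shown as a comment:

it = iter(range(10))
consume(it, 3)    # skip the first three items at C speed
print(next(it))   # 3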

Using that, you would skip past the first 20,000,000 rows like this:

# UNTESTED
f = open("data.csv")
f_reader = csv.reader(f)
consume(f_reader, 20000000)
raw_data = np.array(list(islice(f_reader, 0, 10000000)), dtype=int)

or perhaps this might go faster, since it skips the unwanted rows as raw lines instead of paying to CSV-parse each one:

# UNTESTED
f = open("data.csv")
consume(f, 20000000)
f_reader = csv.reader(f)
raw_data = np.array(list(islice(f_reader, 0, 10000000)), dtype=int)
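If the MemoryError persists even after skipping rows, a further variation worth trying (a sketch, not part of the original answer; n_cols is an assumed, known column count) is to skip the intermediate Python list entirely and let np.fromiter fill a flat array straight from the field stream:

# UNTESTED sketch, assuming data.csv holds only integer fields
import csv
import numpy as np
from itertools import chain, islice

f = open("data.csv")
f_reader = csv.reader(f)
consume(f_reader, 20000000)

n_cols = 3  # assumption: the known number of columns in data.csv
flat = chain.from_iterable(islice(f_reader, 10000000))  # one flat stream of fields
raw_data = np.fromiter(map(int, flat), dtype=int).reshape(-1, n_cols)

This builds one contiguous integer array instead of roughly ten million temporary Python row lists, which can lower peak memory use considerably.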
