
Understanding Numpy's Interpretation Of String Data Types

Let's say I have a bytes object that represents some data, and I want to convert it to a numpy array via np.genfromtxt. I am having trouble understanding how I should handle strings.

Solution 1:

In my Python3 session:

In [568]: text = b'test, 5, 1.2'
# I don't need BytesIO since genfromtxt works with a list of
# byte strings, as from text.splitlines()

In [570]: np.genfromtxt([text], delimiter=',', dtype=None)
Out[570]: 
array((b'test', 5, 1.2), 
      dtype=[('f0', 'S4'), ('f1', '<i4'), ('f2', '<f8')])

Left to its own devices (dtype=None), genfromtxt deduces that the first field should be S4, a 4-character byte string.
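A minimal sketch of the same dtype=None deduction on two rows, assuming a hypothetical second line `longer, 6, 3.4` and UTF-8 data. Passing encoding='utf-8' is my own choice here: on the NumPy versions I know, it makes the string column come back as U rather than S, whereas the default used in the transcript above returns S (and may emit a deprecation warning on newer versions). The sketch also checks that a list from splitlines() and an io.BytesIO wrapper parse the same way.

```python
import io
import numpy as np

# Same format as `text` above, plus a hypothetical second, longer row.
data = b'test, 5, 1.2\nlonger, 6, 3.4'

# genfromtxt accepts any iterable of lines, so a list from splitlines()
# and a BytesIO wrapper should give the same result.
# encoding='utf-8' is an assumption about the data; it makes string
# columns come back as unicode ('U') instead of bytes ('S').
from_list = np.genfromtxt(data.splitlines(), delimiter=',', dtype=None,
                          encoding='utf-8')
from_file = np.genfromtxt(io.BytesIO(data), delimiter=',', dtype=None,
                          encoding='utf-8')

print(from_list.dtype)
# The deduced string width follows the longest entry in the column ('longer').
print(from_list.dtype['f0'])
print(from_list['f0'])
```

With this data the string field should come out as U6, which suggests the deduced length is the longest value seen in that column.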

I could also be explicit with the types:

In [571]: types=['S4', 'i4', 'f4']
In [572]: np.genfromtxt([text],delimiter=',',dtype=types)
Out[572]: 
array((b'test', 5, 1.2000000476837158), 
      dtype=[('f0', 'S4'), ('f1', '<i4'), ('f2', '<f4')])
In [573]: types=['S10', 'i', 'f']
In [574]: np.genfromtxt([text],delimiter=',',dtype=types)
Out[574]: 
array((b'test', 5, 1.2000000476837158), 
      dtype=[('f0', 'S10'), ('f1', '<i4'), ('f2', '<f4')])
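To see what the shorthand type codes in those calls resolve to, np.dtype can be queried directly. This sketch also shows why the f4 field prints as 1.2000000476837158: 1.2 has no exact float32 representation, and converting the float32 back to a Python float exposes the rounding. Note that 'int' maps to the platform's default integer, so the '<i4' in these transcripts may show up as '<i8' on other systems.

```python
import numpy as np

# Shorthand codes used in the transcript and the dtypes they resolve to.
for code in ['S4', 'S10', 'U10', 'i', 'f', 'i4', 'f4', 'f8']:
    dt = np.dtype(code)
    # .str shows byte order + kind + item size; U stores 4 bytes per character.
    print(f'{code!r:6} -> {dt.str:5} itemsize={dt.itemsize}')

# 'f' is float32, which cannot hold 1.2 exactly.
print(float(np.float32(1.2)))   # 1.2000000476837158
print(float(np.float64(1.2)))   # 1.2
```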

In [575]: types=['U10', 'int', 'float']
In [576]: np.genfromtxt([text],delimiter=',',dtype=types)
Out[576]: 
array(('test', 5, 1.2), 
      dtype=[('f0', '<U10'), ('f1', '<i4'), ('f2', '<f8')])

I can specify either S (bytes) or U (unicode), but I also have to specify the length. I don't think genfromtxt can deduce the length of a string field except when dtype=None is used. I'd have to dig into the code to see how it deduces the string length.
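The length requirement is easy to trip over because an explicit width that is too short truncates silently. A small sketch using np.array (not genfromtxt) with a hypothetical word list: without a dtype, numpy sizes the U type to the longest element, while dtype='U4' cuts 'longer' down to 'long'. Measuring the data first is one way to pick a safe width yourself, and np.char.decode turns an S array into a U array when you start from bytes.

```python
import numpy as np

words = ['test', 'longer']

# No dtype: the U width is taken from the longest element.
auto = np.array(words)
print(auto.dtype)      # <U6

# Explicit width that is too small: values are truncated with no error.
short = np.array(words, dtype='U4')
print(short)           # ['test' 'long']

# One option: measure the data and build the dtype string from it.
width = max(len(w) for w in words)
fitted = np.array(words, dtype=f'U{width}')

# Byte strings give an S array; decode it to get U instead.
raw = np.array([w.encode('utf-8') for w in words])
decoded = np.char.decode(raw, 'utf-8')
print(raw.dtype, decoded.dtype)
```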

I could also create this array with np.array, by making it a tuple of substrings and giving the correct dtype:

In [599]: np.array(tuple(text.split(b',')), dtype=[('f0', 'S4'), ('f1', '<i4'), ('f2', '<f8')])
Out[599]: 
array((b'test', 5, 1.2), 
      dtype=[('f0', 'S4'), ('f1', '<i4'), ('f2', '<f8')])
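The np.array route extends to several lines. This sketch assumes a hypothetical two-line payload and a hypothetical helper parse_row. It strips whitespace and converts each field in Python, rather than relying on numpy to cast byte strings such as b' 5' into numbers, and uses U10 for the name field so the result is unicode.

```python
import numpy as np

# Hypothetical multi-line payload in the same format as `text` above.
blob = b'test, 5, 1.2\nlonger, 6, 3.4'

dt = np.dtype([('f0', 'U10'), ('f1', 'i4'), ('f2', 'f8')])

def parse_row(line: bytes):
    """Split one comma-separated line and convert each field in Python."""
    name, count, value = (field.strip() for field in line.split(b','))
    return (name.decode('utf-8'), int(count), float(value))

# With a structured dtype, np.array builds one record per tuple.
rows = [parse_row(line) for line in blob.splitlines()]
arr = np.array(rows, dtype=dt)
print(arr)
```

Doing the conversion in Python keeps the parsing rules explicit; the trade-off is a Python-level loop, which only matters for large inputs.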
