Understanding Numpy's Interpretation Of String Data Types
Let's say I have a bytes object that represents some data, and I want to convert it to a numpy array via np.genfromtxt. I am having trouble understanding how I should handle strings.
Solution 1:
In my Python3 session:
In [568]: text = b'test, 5, 1.2'
# I don't need BytesIO since genfromtxt works with a list of
# byte strings, as from text.splitlines()
In [570]: np.genfromtxt([text], delimiter=',', dtype=None)
Out[570]:
array((b'test', 5, 1.2),
dtype=[('f0', 'S4'), ('f1', '<i4'), ('f2', '<f8')])
If left to its own devices, genfromtxt deduces that the 1st field should be S4, that is, a bytestring of 4 characters.
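To see where that width comes from, here is a small experiment with two rows of different string lengths. It is only a sketch: the second row and the `encoding='utf-8'` argument are my additions (the latter so the result does not depend on the NumPy version's default encoding, which also means the field comes back as a U type rather than S). As far as I can tell, with dtype=None the string field is sized to the longest entry in the column.

```python
import numpy as np

# Two rows so that the string column has entries of different lengths.
lines = ['test, 5, 1.2', 'longer, 6, 2.5']

# dtype=None asks genfromtxt to work out each column's type on its own.
# encoding='utf-8' is an assumption made here to get str (U) fields
# regardless of the NumPy version's default.
arr = np.genfromtxt(lines, delimiter=',', dtype=None, encoding='utf-8')

print(arr.dtype)   # the f0 field is a U type as wide as the longest entry
print(arr['f0'])
```

With a single row, as in the session above, the longest entry is simply 'test', which is consistent with the width of 4.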
I could also be explicit with the types:
In [571]: types=['S4', 'i4', 'f4']
In [572]: np.genfromtxt([text],delimiter=',',dtype=types)
Out[572]:
array((b'test', 5, 1.2000000476837158),
dtype=[('f0', 'S4'), ('f1', '<i4'), ('f2', '<f4')])
In [573]: types=['S10', 'i', 'f']
In [574]: np.genfromtxt([text],delimiter=',',dtype=types)
Out[574]:
array((b'test', 5, 1.2000000476837158),
dtype=[('f0', 'S10'), ('f1', '<i4'), ('f2', '<f4')])
In [575]: types=['U10', 'int', 'float']
In [576]: np.genfromtxt([text],delimiter=',',dtype=types)
Out[576]:
array(('test', 5, 1.2),
dtype=[('f0', '<U10'), ('f1', '<i4'), ('f2', '<f8')])
I can specify either S (bytestring) or U (unicode), but I also have to specify the length. I don't think there's a way with genfromtxt to let it deduce the length, except for the None type. I'd have to dig into the code to see how it deduces the string length.
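The length matters because a fixed-width string field cannot hold more characters than it was declared with. The sketch below illustrates what I would expect when the declared width is too short: the text is cut without any error. The two-row input, the variable names, and `encoding='utf-8'` are my own choices for the illustration, not part of the original session.

```python
import numpy as np

lines = ['test, 5, 1.2', 'longer, 6, 2.5']

# U2 is deliberately too short for both strings in the first column.
short = np.genfromtxt(lines, delimiter=',', dtype=['U2', 'i4', 'f8'],
                      encoding='utf-8')
print(short['f0'])   # each entry is cut to 2 characters

# A generous width keeps the full text of every entry.
wide = np.genfromtxt(lines, delimiter=',', dtype=['U20', 'i4', 'f8'],
                     encoding='utf-8')
print(wide['f0'])
```

So when the lengths are not known in advance, erring on the large side (as with S10 or U10 above) is the safer choice.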
I could also create this array with np.array, by making it a tuple of substrings and giving a correct dtype:
In [599]: np.array(tuple(text.split(b',')), dtype=[('f0', 'S4'), ('f1', '<i4'), ('f2', '<f8')])
Out[599]:
array((b'test', 5, 1.2),
dtype=[('f0', 'S4'), ('f1', '<i4'), ('f2', '<f8')])
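A slightly more explicit variant of the same idea, in case you would rather not rely on NumPy parsing the byte substrings itself: strip each piece and convert the numeric fields in Python first. This is just a sketch; the names `dt`, `name`, `count`, `value` and `rec` are made up for the example.

```python
import numpy as np

text = b'test, 5, 1.2'
dt = np.dtype([('f0', 'S4'), ('f1', '<i4'), ('f2', '<f8')])

# Split on the delimiter and strip the blanks left around each field.
name, count, value = (part.strip() for part in text.split(b','))

# Convert the numeric fields in Python, then hand NumPy a tuple that
# already matches the dtype field by field.
rec = np.array((name, int(count), float(value)), dtype=dt)
print(rec)
```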