Python Pandas From 0/1 Dataframe To An Itemset List
What is the most efficient way to go from a 0/1 pandas/numpy dataframe of this form:: >>> dd {'a': {0: 1, 1: 0, 2: 1, 3: 0, 4: 1, 5: 1}, 'b': {0: 1, 1: 1, 2: 0, 3: 0, 4:
Solution 1:
You can first multiply by the column names with mul and convert the DataFrame to a NumPy array with values:
print (df.mul(df.columns.to_series()).values)
[['a' 'b' '' '' '']
 ['' 'b' 'c' 'd' '']
 ['a' '' 'c' 'd' 'e']
 ['' '' '' 'd' '']
 ['a' 'b' 'c' '' '']
 ['a' 'b' 'c' 'd' '']]
Remove the empty strings with a nested list comprehension:
print ([[y for y in x if y != ''] for x in df.mul(df.columns.to_series()).values])
[['a', 'b'],
['b', 'c', 'd'],
['a', 'c', 'd', 'e'],
['d'],
['a', 'b', 'c'],
['a', 'b', 'c', 'd']]
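A self-contained sketch of a closely related variant, in case the string multiplication is not wanted: it indexes the column labels with each row used as a boolean mask. The frame is rebuilt from the rows printed in Solution 2, which is an assumption about the asker's data, and the names `mask` and `itemsets` are only illustrative. This is one possible variant, not the answer's own code.

```python
import pandas as pd

# Example frame, rebuilt from the rows printed in Solution 2 (an assumption,
# since the dict in the question is truncated).
df = pd.DataFrame(
    [[1, 1, 0, 0, 0],
     [0, 1, 1, 1, 0],
     [1, 0, 1, 1, 1],
     [0, 0, 0, 1, 0],
     [1, 1, 1, 0, 0],
     [1, 1, 1, 1, 0]],
    columns=list("abcde"),
)

# Treat each row as a boolean mask and keep the column labels where it is True.
mask = df.to_numpy().astype(bool)
itemsets = [df.columns[row].tolist() for row in mask]

print(itemsets)
```

Rows without any 1 simply yield an empty list here, so no separate filtering step is needed.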
Solution 2:
Here's a NumPy-based vectorized approach that returns a list of arrays:
In [47]: df
Out[47]:
   a  b  c  d  e
0  1  1  0  0  0
1  0  1  1  1  0
2  1  0  1  1  1
3  0  0  0  1  0
4  1  1  1  0  0
5  1  1  1  1  0
In [48]: cols = df.columns.values.astype(str)
In [49]: R,C = np.where(df.values==1)
In [50]: np.split(cols[C],np.unique(R,return_index=True)[1])[1:]
Out[50]:
[array(['a', 'b'], dtype='|S1'),
 array(['b', 'c', 'd'], dtype='|S1'),
 array(['a', 'c', 'd', 'e'], dtype='|S1'),
 array(['d'], dtype='|S1'),
 array(['a', 'b', 'c'], dtype='|S1'),
 array(['a', 'b', 'c', 'd'], dtype='|S1')]
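One caveat with splitting on `np.unique(R, return_index=True)`: a row that contains no 1 never appears in `R`, so it is silently dropped from the result. A minimal sketch, assuming empty itemsets should be kept, splits on the cumulative per-row counts instead and converts the arrays to plain lists. The helper name `to_itemsets` and the small example frame are hypothetical.

```python
import numpy as np
import pandas as pd


def to_itemsets(df):
    """Return one list of column labels per row of a 0/1 DataFrame."""
    cols = df.columns.to_numpy()
    mask = df.to_numpy() == 1
    # np.nonzero walks the mask in row-major order, so the column indices
    # come out already grouped by row.
    _, col_idx = np.nonzero(mask)
    # The cumulative count of 1s per row gives the split points; the last
    # one would only add a trailing empty piece, so it is dropped.
    split_points = np.cumsum(mask.sum(axis=1))[:-1]
    return [part.tolist() for part in np.split(cols[col_idx], split_points)]


# Hypothetical frame with an all-zero middle row.
example = pd.DataFrame([[1, 1, 0], [0, 0, 0], [0, 1, 1]], columns=list("abc"))
print(to_itemsets(example))  # [['a', 'b'], [], ['b', 'c']]
```

If rows without any 1 cannot occur in the data, the original `np.unique` version behaves the same way.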
Solution 3:
Simple list comprehension:
itemset = [[df.columns.values[j]  # the output, based on the following logic:
            for j in range(0, len(df.iloc[i]))
            if df.iloc[i, j] == 1]  # positional lookup of row i, column j
           for i in range(0, len(df.index))]
print (itemset)
Gives the result:
$ python test.py
[['a', 'b'], ['b', 'c', 'd'], ['a', 'c', 'd', 'e'], ['d'], ['a', 'b', 'c'], ['a', 'b', 'c', 'd']]
Here's another output format, with one itemset per line. Add this after your list comprehension:
print ('[', end='')
for i in range(0, len(itemset)):
    if i == len(itemset) - 1:
        print (itemset[i], end='')
    else:
        print (itemset[i], end=',\n ')
print (']')
Output:
$ python test.py
[['a', 'b'],
 ['b', 'c', 'd'],
 ['a', 'c', 'd', 'e'],
 ['d'],
 ['a', 'b', 'c'],
 ['a', 'b', 'c', 'd']]
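The same layout can probably be produced with a single `str.join`, which avoids special-casing the last element. A small sketch: the function name `format_itemsets` is made up for illustration, and `itemset` is hard-coded to the result shown above so the snippet runs on its own.

```python
def format_itemsets(itemsets):
    """Render a list of itemsets with one itemset per line."""
    # Join with ',\n ' so every line after the first is indented by one
    # space, matching the loop-based version.
    return '[' + ',\n '.join(str(items) for items in itemsets) + ']'


itemset = [['a', 'b'], ['b', 'c', 'd'], ['a', 'c', 'd', 'e'],
           ['d'], ['a', 'b', 'c'], ['a', 'b', 'c', 'd']]
print(format_itemsets(itemset))
```

Returning a string rather than printing directly also makes the formatting easy to reuse, for example when writing to a log file.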