
Python Pandas From 0/1 Dataframe To An Itemset List

What is the most efficient way to go from a 0/1 pandas/numpy dataframe of this form:: >>> dd {'a': {0: 1, 1: 0, 2: 1, 3: 0, 4: 1, 5: 1}, 'b': {0: 1, 1: 1, 2: 0, 3: 0, 4:

Solution 1:

You can first multiply the DataFrame by its column names with mul, then convert the result to a NumPy array with values:

print (df.mul(df.columns.to_series()).values)
[['a' 'b' '' '' '']
 ['' 'b' 'c' 'd' '']
 ['a' '' 'c' 'd' 'e']
 ['' '' '' 'd' '']
 ['a' 'b' 'c' '' '']
 ['a' 'b' 'c' 'd' '']]

Then remove the empty strings with a nested list comprehension:

print ([[y for y in x if y != ''] for x in df.mul(df.columns.to_series()).values])
[['a', 'b'], 
 ['b', 'c', 'd'],
 ['a', 'c', 'd', 'e'], 
 ['d'], 
 ['a', 'b', 'c'], 
 ['a', 'b', 'c', 'd']]
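
The same two steps can be sketched as a self-contained script. This is a variation, not the answer's exact call: it assumes the example frame printed in this post and swaps mul for np.where, which builds the same array of names and empty strings by selection instead of multiplication. The names `names` and `itemsets` are only illustrative.

```python
import numpy as np
import pandas as pd

# Example frame rebuilt from the rows printed in this post.
df = pd.DataFrame(
    [[1, 1, 0, 0, 0],
     [0, 1, 1, 1, 0],
     [1, 0, 1, 1, 1],
     [0, 0, 0, 1, 0],
     [1, 1, 1, 0, 0],
     [1, 1, 1, 1, 0]],
    columns=list("abcde"),
)

# Step 1 (np.where substituted for mul): put the column name where the
# cell is 1 and an empty string everywhere else.
names = np.where(df.to_numpy() == 1, df.columns.to_numpy(dtype=object), "")

# Step 2: drop the empty strings row by row.
itemsets = [[name for name in row if name != ""] for row in names]
print(itemsets)
```

The column names broadcast across the rows, so each row of `names` lines up with the columns of df.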

Solution 2:

Here's a NumPy-based vectorized approach that returns a list of arrays:

In [47]: df
Out[47]: 
   a  b  c  d  e
0  1  1  0  0  0
1  0  1  1  1  0
2  1  0  1  1  1
3  0  0  0  1  0
4  1  1  1  0  0
5  1  1  1  1  0

In [48]: cols = df.columns.values.astype(str)

In [49]: R,C = np.where(df.values==1)

In [50]: np.split(cols[C],np.unique(R,return_index=True)[1])[1:]
Out[50]: 
[array(['a', 'b'], 
       dtype='|S1'), array(['b', 'c', 'd'], 
       dtype='|S1'), array(['a', 'c', 'd', 'e'], 
       dtype='|S1'), array(['d'], 
       dtype='|S1'), array(['a', 'b', 'c'], 
       dtype='|S1'), array(['a', 'b', 'c', 'd'], 
       dtype='|S1')]
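
One thing to watch with the np.unique split: it only sees rows that contain at least one 1, so a row of all zeros would produce no entry at all and the output would be shorter than the frame. A minimal sketch of a variation that splits on per-row counts instead is given below; the function name `itemsets_by_row` is made up for illustration, and it assumes the frame has at least one row.

```python
import numpy as np
import pandas as pd

def itemsets_by_row(df):
    """Return one list of column names per row, including rows with no 1s."""
    cols = df.columns.values.astype(str)
    mask = df.values == 1
    # np.where scans the mask row by row, so the column indices
    # come out already grouped by row.
    _, col_idx = np.where(mask)
    # Cut after each row's count of 1s; the final cumulative total is
    # the end of the array, so it is not needed as a split point.
    split_points = np.cumsum(mask.sum(axis=1))[:-1]
    return [chunk.tolist() for chunk in np.split(cols[col_idx], split_points)]

# Hypothetical frame with an all-zero middle row.
sparse = pd.DataFrame([[1, 0], [0, 0], [1, 1]], columns=["x", "y"])
print(itemsets_by_row(sparse))
```

Calling tolist() on each chunk converts the arrays to plain Python lists, which is usually what an itemset consumer expects.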

Solution 3:

Simple list comprehension:

itemset = [[df.columns.values[j]  # the output, based on the following logic:
            for j in range(0, len(df.iloc[i]))
            if df.iloc[i, j] == 1]  # keep column j when the cell at row i is 1
           for i in range(0, len(df.index))]

print (itemset)

Gives the result:

$ python test.py
[['a', 'b'], ['b', 'c', 'd'], ['a', 'c', 'd', 'e'], ['d'], ['a', 'b', 'c'], ['a', 'b', 'c', 'd']]
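
The inner loop over j can also be replaced by a boolean mask per row. The following is a sketch under the assumption that the frame holds only 0s and 1s, as in the example from this post; it is an alternative spelling of the comprehension, not the answer's code.

```python
import pandas as pd

# Example frame rebuilt from the rows printed in this post.
df = pd.DataFrame(
    [[1, 1, 0, 0, 0],
     [0, 1, 1, 1, 0],
     [1, 0, 1, 1, 1],
     [0, 0, 0, 1, 0],
     [1, 1, 1, 0, 0],
     [1, 1, 1, 1, 0]],
    columns=list("abcde"),
)

# One boolean row per DataFrame row: True where the cell is 1.
mask = df.to_numpy() == 1

# Indexing the column Index with a boolean row keeps only the matching names.
itemset = [df.columns[row].tolist() for row in mask]
print(itemset)
```

This avoids one df.iloc lookup per cell, which tends to matter once the frame has more than a few hundred rows.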

To print one itemset per line instead, add this after the list comprehension:

print ('[', end='')
for i in range(0, len(itemset)):
    if i == len(itemset) - 1:
        print (itemset[i], end='')
    else:
        print (itemset[i], end=',\n ')
print (']')

Output:

$ python test.py
[['a', 'b'],
 ['b', 'c', 'd'],
 ['a', 'c', 'd', 'e'],
 ['d'],
 ['a', 'b', 'c'],
 ['a', 'b', 'c', 'd']]
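
The same layout can be produced without tracking the last index by joining the rows with a separator. A small sketch; the helper name `format_itemsets` is hypothetical.

```python
def format_itemsets(itemset):
    """Return the itemsets as one string, one inner list per line."""
    # The separator ends the line with a comma and indents the next row
    # by one space so it lines up under the opening bracket.
    return "[" + ",\n ".join(str(items) for items in itemset) + "]"

itemset = [['a', 'b'], ['b', 'c', 'd'], ['a', 'c', 'd', 'e'],
           ['d'], ['a', 'b', 'c'], ['a', 'b', 'c', 'd']]
print(format_itemsets(itemset))
```

Returning a string rather than printing inside the helper makes it easy to write the result to a file or a log as well.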
