Match Column Values To Dict
I have a dict and a dataframe like the examples v and df below. I want to search through the items in df and return the item that has the maximum number of field values in common with the dict.
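The example data itself is not reproduced here. As a minimal sketch, the frame below is an assumed reconstruction that is consistent with the outputs shown in the answers; the size of item 2 is a guess (any value other than 1 or 10 would give the same results).

```python
import pandas as pd

# Assumed sample data: reconstructed from the outputs in the answers,
# not copied from the original question.
v = {'size': 1, 'color': 'red'}
df = pd.DataFrame({
    'item': [2, 3],
    'size': [2, 1],           # the size of item 2 is a guess
    'color': ['red', 'red'],
})
print(df)
```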
Solution 1:
Create a one-row DataFrame from the dict and merge it with the original:
import pandas as pd

# Inner-join a one-row frame built from v against df on their shared columns
a = pd.DataFrame(v, index=[0]).merge(df)['item']
print(a)
0    3
Name: item, dtype: int64
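Why this works: with no on argument, merge joins on every column the two frames share, and the default how='inner' keeps only rows where all of those columns match. A short sketch, using assumed sample data chosen to match the outputs shown here:

```python
import pandas as pd

# Assumed sample data (hypothetical, chosen to match the outputs shown).
df = pd.DataFrame({'item': [2, 3], 'size': [2, 1], 'color': ['red', 'red']})
v = {'size': 1, 'color': 'red'}

one_row = pd.DataFrame(v, index=[0])   # single-row frame built from the dict
# With no `on=` argument, merge joins on the columns common to both frames
# ('size' and 'color'); the default inner join drops non-matching rows.
matched = one_row.merge(df)
print(matched)
```

Note that this returns only rows that match every key in the dict; it does not rank partial matches.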
Another solution uses query, but string values in the dict must first be wrapped in an extra pair of double quotes:
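# Wrap string values in double quotes so query treats them as literals
v1 = {k: '"{}"'.format(val) if isinstance(val, str) else val for k, val in v.items()}
print(v1)
{'size': 1, 'color': '"red"'}
# Assign to a new name so the original df is not overwritten
b = df.query(' & '.join(['{}=={}'.format(i, j) for i, j in v1.items()]))['item']
print(b)
1    3
Name: item, dtype: int64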
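If building a query string feels fragile (quoting, column names containing spaces), the same filter can be written as a boolean mask. This is a different technique from the query call above, sketched here with assumed sample data:

```python
import pandas as pd

# Assumed sample data (hypothetical, chosen to match the outputs shown).
df = pd.DataFrame({'item': [2, 3], 'size': [2, 1], 'color': ['red', 'red']})
v = {'size': 1, 'color': 'red'}

# Start with every row selected, then narrow the mask one dict key at a time.
mask = pd.Series(True, index=df.index)
for col, val in v.items():
    mask &= df[col] == val

b = df.loc[mask, 'item']
print(b)
```

Because the values are compared directly, no extra quoting of strings is needed.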
The result can take three forms (a Series with several values, a Series with one value, or an empty Series), so a helper function handles each case:
def get_val(v):
    x = pd.DataFrame(v, index=[0]).merge(df)['item']
    if x.empty:
        return 'Not found'
    elif len(x) == 1:
        return x.values[0]
    else:
        return x.values.tolist()
print (get_val({'size':1,'color':'red'}))
3
print (get_val({'size':10,'color':'red'}))
Not found
print (get_val({'color':'red'}))
[2, 3]
Solution 2:
An alternative solution is to work with dictionaries instead of dataframes:
v = {'size': 1, 'color': 'red'}
match_count = {}
# Compare only the columns (after 'item') that are also keys of v
fields = v.keys() & set(df.columns[1:])
for k, value in df.to_dict(orient='index').items():
    match_count[value['item']] = sum(value[i] == v[i] for i in fields)
Result
print(match_count)
# {2: 1, 3: 2}
res = max(match_count.items(), key=lambda x: x[1])
print(res)
# (3, 2)
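The same per-row match count can also be computed without an explicit loop over rows. A minimal sketch, assuming the sample data below and that every key of v is a column of df:

```python
import pandas as pd

# Assumed sample data (hypothetical, chosen to match the outputs shown).
df = pd.DataFrame({'item': [2, 3], 'size': [2, 1], 'color': ['red', 'red']})
v = {'size': 1, 'color': 'red'}

# One boolean column per dict key, True where the row matches that value.
matches = pd.DataFrame({col: df[col] == val for col, val in v.items()})
# Count matching fields per row and label the counts by item.
counts = matches.sum(axis=1)
counts.index = df['item']

best_item = counts.idxmax()   # item with the most matching fields
print(counts)
print(best_item, counts.max())
```

If several items tie for the highest count, idxmax returns the first of them.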