Pandas: I Want To Sum A Count If A String Exists In Any One Of Several Columns And Add This Count To Another Dataframe With The Searched Term
I have a dataframe of videos with several columns of tags (strings) as follows:
import pandas as pd
videos = [(1, 'cool video','drama','horror'), (2, 'great video','sports','drama'
Solution 1:
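Try this. It stacks every non-id column into one long Series, one-hot encodes the values with str.get_dummies, sums each indicator column to get a count per value, and maps those counts onto the search terms:
df_search_terms['number_matching_videos'] = (
    df_search_terms['search_term'].map(df.set_index('video_id')
                                         .stack()
                                         .str.get_dummies()
                                         .sum()))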
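To see those steps run end to end, here is a minimal sketch. The sample rows and the column names (video_id, title, tag_1, tag_2) are assumptions made for illustration, since the question's data is cut off above; the fillna(0) step is also an addition, so that a term with no matching tag gets 0 instead of NaN.

```python
import pandas as pd

# Hypothetical sample data; column names are assumed, not taken from the question.
videos = [
    (1, 'cool video', 'drama', 'horror'),
    (2, 'great video', 'sports', 'drama'),
    (3, 'scary video', 'horror', 'thriller'),
]
df = pd.DataFrame(videos, columns=['video_id', 'title', 'tag_1', 'tag_2'])
df_search_terms = pd.DataFrame({'search_term': ['drama', 'horror', 'sports']})

# One row per (video_id, column) pair, holding the cell value as a string.
stacked = df.set_index('video_id').stack()

# One indicator column per distinct value; summing gives occurrences per value.
value_counts = stacked.str.get_dummies().sum()

# Look up each search term in the counts; terms that never occur become 0.
df_search_terms['number_matching_videos'] = (
    df_search_terms['search_term'].map(value_counts).fillna(0).astype(int)
)

print(df_search_terms)
```

Because the title column is stacked too, a title that happens to equal a search term would also be counted; selecting only the tag columns before stacking avoids that.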
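Here is another way, which looks only at the columns whose name contains 'tag' and extracts just the search terms before counting: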
df_search_terms['number_matching_videos'] = (df_search_terms['search_term']
.map((df.loc[:,df.columns.str.contains('tag')]
.stack()
.str.extractall('({})'.format(df_search_terms['search_term'].str.cat(sep='|')))[0]
.str.get_dummies()
.sum())))
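Both snippets above count tag occurrences, so a video carrying the same tag in two columns is counted twice. If each video should count at most once per term, a row-wise test is one possible alternative. This is a different technique from the answer above (DataFrame.eq plus any(axis=1)), sketched on hypothetical data with assumed column names; count_videos is a helper name invented for this example.

```python
import pandas as pd

# Hypothetical sample data; video 1 repeats 'drama' on purpose.
videos = [
    (1, 'cool video', 'drama', 'drama'),
    (2, 'great video', 'sports', 'drama'),
    (3, 'scary video', 'horror', 'thriller'),
]
df = pd.DataFrame(videos, columns=['video_id', 'title', 'tag_1', 'tag_2'])
df_search_terms = pd.DataFrame(
    {'search_term': ['drama', 'horror', 'sports', 'comedy']}
)

# Assumed convention: tag columns are the ones whose name starts with 'tag'.
tag_cols = df.columns[df.columns.str.startswith('tag')]


def count_videos(term):
    """Number of videos where at least one tag column equals `term` exactly."""
    return int(df[tag_cols].eq(term).any(axis=1).sum())


df_search_terms['number_matching_videos'] = (
    df_search_terms['search_term'].apply(count_videos)
)

print(df_search_terms)
```

This loops over the search terms in Python, which is fine for a short list of terms but may be slower than the stacked approach when there are many.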
Solution 2:
Use a regex to search for and count all matches. Build one capture group per search term:
search_re = '(' + df_search_terms.search_term.str.cat(sep=')|(') + ')'
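Combine all tag columns into a single string per video and search it. str.extractall returns one column per capture group, so the per-column counts line up with the rows of df_search_terms as long as it has the default 0..n-1 index: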
df_search_terms['number_matching_videos'] = (
df.filter(regex='^tag_')
.agg(' '.join, axis=1)
.str.extractall(search_re)
.notnull().sum()
)
Output:
   search_term  number_matching_videos
0        drama                       2
1       horror                       2
2       sports                       1
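A self-contained sketch of the regex approach follows, again on hypothetical data with assumed column names. Two details are additions rather than part of the answer: each term is passed through re.escape and wrapped in \b word boundaries, so that regex metacharacters in a term or a longer tag containing the term (for example 'melodrama') do not produce false matches, and the counts are assigned by position with to_numpy() so the result does not depend on the index of df_search_terms.

```python
import re

import pandas as pd

# Hypothetical sample data; column names are assumed, not taken from the question.
videos = [
    (1, 'cool video', 'drama', 'horror'),
    (2, 'great video', 'sports', 'drama'),
    (3, 'scary video', 'horror', 'thriller'),
]
df = pd.DataFrame(videos, columns=['video_id', 'title', 'tag_1', 'tag_2'])
df_search_terms = pd.DataFrame({'search_term': ['drama', 'horror', 'sports']})

# One capture group per term, in the same order as the rows of df_search_terms.
search_re = '|'.join(
    r'\b({})\b'.format(re.escape(term))
    for term in df_search_terms['search_term']
)

# Join each video's tags into one space-separated string.
combined = df.filter(regex=r'^tag_').apply(' '.join, axis=1)

# Each match fills exactly one group column; the others stay NaN.
matches = combined.str.extractall(search_re)

# Non-null cells per column = number of matches per term, assigned by position.
df_search_terms['number_matching_videos'] = matches.notnull().sum().to_numpy()

print(df_search_terms)
```

Like the original, this counts matches rather than distinct videos, so a tag repeated within one video is counted once per occurrence.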