
Pandas: I Want To Sum A Count If A String Exists In Any One Of Several Columns And Add This Count To Another Dataframe With The Searched Term

I have a dataframe of videos with several columns of tags (strings) as follows: import pandas as pd videos = [(1, 'cool video','drama','horror'), (2, 'great video','sports','drama'

Solution 1:

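Stack the columns into one long Series, one-hot encode the values, sum each indicator column to get a count per value, and map those counts onto the search terms:

df_search_terms['number_matching_videos'] = (
    df_search_terms['search_term'].map(df.set_index('video_id')
                                         .stack()
                                         .str.get_dummies()
                                         .sum()))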
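
As a self-contained sketch of the same idea: the question's sample data is truncated here, so the third row and the column names `title`, `tag_1` and `tag_2` below are assumptions made only so the example runs. This variant also stacks only the tag columns and collapses the indicators to one row per video, so a tag repeated within a single video would be counted once.

```python
import pandas as pd

# Sample data: the first two rows follow the question; the third row and the
# column names "title", "tag_1", "tag_2" are assumptions for illustration.
df = pd.DataFrame(
    [
        (1, "cool video", "drama", "horror"),
        (2, "great video", "sports", "drama"),
        (3, "scary video", "horror", "thriller"),
    ],
    columns=["video_id", "title", "tag_1", "tag_2"],
)
df_search_terms = pd.DataFrame({"search_term": ["drama", "horror", "sports"]})

# Long Series of tags indexed by (video_id, column name).
tags = df.set_index("video_id").filter(like="tag_").stack()

# One 0/1 indicator column per distinct tag.
dummies = tags.str.get_dummies()

# Collapse to one row per video, then total the videos carrying each tag.
tag_counts = dummies.groupby(level=0).max().sum()

# Look up each search term; terms that never occur become 0 instead of NaN.
df_search_terms["number_matching_videos"] = (
    df_search_terms["search_term"].map(tag_counts).fillna(0).astype(int)
)
print(df_search_terms)
```

With this assumed data the counts come out as 2 for drama, 2 for horror and 1 for sports.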

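Here is another way, which looks only at the columns whose name contains 'tag' and extracts just the search terms with a regular expression before counting: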

df_search_terms['number_matching_videos'] = (df_search_terms['search_term']
                                             .map((df.loc[:,df.columns.str.contains('tag')]
                                                   .stack()
                                                   .str.extractall('({})'.format(df_search_terms['search_term'].str.cat(sep='|')))[0]
                                                   .str.get_dummies()
                                                   .sum())))
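
One caveat with building the pattern straight from the search terms is that a term containing regex metacharacters (for example `+` or `(`) would be interpreted as regex syntax. A minimal sketch of one way to guard against that, using `re.escape` from the standard library, follows. The data, including tags such as "c++", and the column names are hypothetical and chosen only to exercise the escaping.

```python
import re
import pandas as pd

# Hypothetical data: tags and column names are assumptions for illustration.
df = pd.DataFrame(
    [
        (1, "cool video", "drama", "sci-fi"),
        (2, "great video", "c++", "drama"),
    ],
    columns=["video_id", "title", "tag_1", "tag_2"],
)
df_search_terms = pd.DataFrame({"search_term": ["drama", "c++", "sci-fi"]})

# Escape each term so it is matched literally, then join the terms into a
# single capture group of alternatives.
pattern = "(" + "|".join(re.escape(t) for t in df_search_terms["search_term"]) + ")"

counts = (
    df.loc[:, df.columns.str.contains("tag")]  # tag columns only
    .stack()                                   # one long Series of tags
    .str.extractall(pattern)[0]                # matched term for each hit
    .str.get_dummies()                         # indicator column per term
    .sum()                                     # number of hits per term
)

df_search_terms["number_matching_videos"] = (
    df_search_terms["search_term"].map(counts).fillna(0).astype(int)
)
print(df_search_terms)
```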

Solution 2:

Build a regex with one capture group per search term, so that matches can be counted separately for each term:

search_re = '(' + df_search_terms.search_term.str.cat(sep=')|(') + ')'

Join the tag columns of each row into a single string, extract every match, and count the non-null entries in each capture group's column:

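df_search_terms['number_matching_videos'] = (
    df.filter(regex='^tag_')        # keep only the tag columns
    .agg(' '.join, axis=1)          # one space-separated string per video
    .str.extractall(search_re)      # one column per capture group (per term)
    .notnull().sum()                # matches per term, indexed 0..n-1
)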

Output

  search_term  number_matching_videos
0       drama                       2
1      horror                       2
2      sports                       1
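
For reference, here is a runnable sketch of the capture-group approach. It assumes the same kind of data as the question, with a hypothetical third row and assumed column names (`title`, `tag_1`, `tag_2`), because the original sample is truncated. It differs from the snippet above in two hedged ways: it counts each video at most once per term, and it assigns the counts by position rather than relying on `df_search_terms` having a default 0..n-1 index.

```python
import pandas as pd

# Sample data: the third row and the column names are assumptions.
df = pd.DataFrame(
    [
        (1, "cool video", "drama", "horror"),
        (2, "great video", "sports", "drama"),
        (3, "scary video", "horror", "thriller"),
    ],
    columns=["video_id", "title", "tag_1", "tag_2"],
)
df_search_terms = pd.DataFrame({"search_term": ["drama", "horror", "sports"]})

# One capture group per term: "(drama)|(horror)|(sports)".
search_re = "(" + df_search_terms["search_term"].str.cat(sep=")|(") + ")"

# One space-separated string of tags per video.
joined = df.filter(regex=r"^tag_").agg(" ".join, axis=1)

# Rows are indexed by (original row, match number); column i holds the
# matched text for capture group i and NaN otherwise.
matches = joined.str.extractall(search_re)

# Did each video match each term at least once? Then count videos per term.
per_video = matches.notnull().groupby(level=0).any()
counts = per_video.sum()

# Column i corresponds to search term i, so assign by position.
df_search_terms["number_matching_videos"] = counts.to_numpy()
print(df_search_terms)
```

Because the pattern matches substrings of the joined string, a term such as "art" would also match inside a tag like "martial arts"; adding `\b` word boundaries around each group is one possible refinement if that matters for your tags.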
