
Pandas Populate New Dataframe Column Based On Matching Columns In Another Dataframe

I have a df which contains my main data which has one million rows. My main data also has 30 columns. Now I want to add another column to my df called category. The category is a c

Solution 1:

Consider the following dataframes df and df2

import pandas as pd

df = pd.DataFrame(dict(
    AUTHOR_NAME=list('AAABBCCCCDEEFGG'),
    title=list('zyxwvutsrqponml'),
))

df2 = pd.DataFrame(dict(
    AUTHOR_NAME=list('AABCCEGG'),
    title=list('zwvtrpml'),
    CATEGORY=list('11223344'),
))

Option 1: merge

df.merge(df2, how='left')

Option 2: join

cols = ['AUTHOR_NAME', 'title']
df.join(df2.set_index(cols), on=cols)

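Both options yield the same result: every row of df is kept, and CATEGORY is filled in where AUTHOR_NAME and title match a row in df2 (NaN otherwise).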
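Since the result image is not available, here is a minimal self-contained sketch that runs both options on the sample dataframes above and prints the outcome. The explicit on=cols in the merge call is an addition for readability; without it, merge falls back to the columns the two dataframes share, which are the same two here.

```python
import pandas as pd

df = pd.DataFrame(dict(
    AUTHOR_NAME=list('AAABBCCCCDEEFGG'),
    title=list('zyxwvutsrqponml'),
))
df2 = pd.DataFrame(dict(
    AUTHOR_NAME=list('AABCCEGG'),
    title=list('zwvtrpml'),
    CATEGORY=list('11223344'),
))

cols = ['AUTHOR_NAME', 'title']

# Option 1: left merge on the shared key columns
merged = df.merge(df2, on=cols, how='left')

# Option 2: join against df2 indexed by the same key columns
joined = df.join(df2.set_index(cols), on=cols)

# Rows of df without a match in df2 get NaN in CATEGORY
print(merged)
```

Both merged and joined keep all 15 rows of df; only the rows whose (AUTHOR_NAME, title) pair also appears in df2 receive a category.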

Solution 2:

APPROACH 1:

You could use concat instead and drop the rows that are duplicated on the combination of Index and AUTHOR_NAME. After that, use isin to keep only the rows whose index also appears in df:

df_concat = pd.concat([df2, df]).reset_index().drop_duplicates(['Index', 'AUTHOR_NAME'])
df_concat.set_index('Index', inplace=True)
df_concat[df_concat.index.isin(df.index)]

Note: The column Index is assumed to be set as the index of both DataFrames.
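To make Approach 1 concrete, here is a small sketch under stated assumptions: the two frames below are hypothetical toy data (not from the question), both use an index named Index, and df2 carries the same columns as df plus CATEGORY. Because df2 comes first in the concat, its rows (the ones with a category) win when duplicates are dropped.

```python
import pandas as pd

# Hypothetical toy data; the index is named 'Index' in both frames
df = pd.DataFrame(
    {'AUTHOR_NAME': ['A', 'B', 'C', 'D'], 'title': ['z', 'y', 'x', 'w']},
    index=pd.Index([0, 1, 2, 3], name='Index'),
)
df2 = pd.DataFrame(
    {'AUTHOR_NAME': ['A', 'C', 'E'], 'title': ['z', 'x', 'v'],
     'CATEGORY': ['1', '2', '3']},
    index=pd.Index([0, 2, 5], name='Index'),
)

# Stack df2 on top of df, then keep the first row per (Index, AUTHOR_NAME)
df_concat = (
    pd.concat([df2, df])
    .reset_index()
    .drop_duplicates(['Index', 'AUTHOR_NAME'])
    .set_index('Index')
)

# Keep only rows that belong to df; sort to restore df's original order
result = df_concat[df_concat.index.isin(df.index)].sort_index()
print(result)
```

The sort_index call is an addition: without it, the rows taken from df2 come first and df's original row order is lost.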


APPROACH 2:

Use join after setting the index columns as shown. For set_index to accept these names, Index and AUTHOR_NAME must both be regular columns at this point:

df2.set_index(['Index', 'AUTHOR_NAME'], inplace=True)
df.set_index(['Index', 'AUTHOR_NAME'], inplace=True)

df.join(df2).reset_index()
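A self-contained sketch of Approach 2 follows, using hypothetical toy data. It assumes Index is an ordinary column in both frames and that df2 has no non-key columns in common with df; otherwise join raises an error about overlapping columns unless lsuffix/rsuffix are given. It also avoids inplace=True so the original frames stay unchanged.

```python
import pandas as pd

# Hypothetical toy data; 'Index' is a regular column here
df = pd.DataFrame({
    'Index': [0, 1, 2, 3],
    'AUTHOR_NAME': ['A', 'B', 'C', 'D'],
    'title': ['z', 'y', 'x', 'w'],
})
df2 = pd.DataFrame({
    'Index': [0, 2, 5],
    'AUTHOR_NAME': ['A', 'C', 'E'],
    'CATEGORY': ['1', '2', '3'],
})

keys = ['Index', 'AUTHOR_NAME']

# join defaults to a left join, so every row of df is preserved
result = df.set_index(keys).join(df2.set_index(keys)).reset_index()
print(result)
```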

Solution 3:

While the other answers here give very good solutions to the question, I have found a resource that also answers it and gives a clear, straightforward set of examples of how to join and merge dataframes, covering LEFT, RIGHT, INNER and OUTER joins.

Join And Merge Pandas Dataframe

I honestly feel any further seekers after this topic will want to also examine his examples...

Solution 4:

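You may try the following. It merges the two datasets on the specified key columns. The keys must exist in both dataframes; CATEGORY exists only in df2, so the match is done on AUTHOR_NAME and title (the column names used in Solution 1).

expected_result = pd.merge(df, df2, on=['AUTHOR_NAME', 'title'], how='left')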
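With a main dataframe of a million rows, it can be worth guarding the merge against accidental row multiplication. The sketch below is one way to do that, not part of the original answer: it reuses the sample data from Solution 1, deduplicates the lookup table on the key columns, asks pandas to verify the many-to-one relationship with validate, and uses indicator to report which rows found a match.

```python
import pandas as pd

df = pd.DataFrame(dict(
    AUTHOR_NAME=list('AAABBCCCCDEEFGG'),
    title=list('zyxwvutsrqponml'),
))
df2 = pd.DataFrame(dict(
    AUTHOR_NAME=list('AABCCEGG'),
    title=list('zwvtrpml'),
    CATEGORY=list('11223344'),
))

keys = ['AUTHOR_NAME', 'title']

# Keep one row per key so the left merge cannot duplicate rows of df
lookup = df2.drop_duplicates(subset=keys)

expected_result = pd.merge(
    df, lookup,
    on=keys,
    how='left',
    validate='many_to_one',  # raises MergeError if lookup keys are not unique
    indicator=True,          # adds a '_merge' column: 'both' or 'left_only'
)

# Count how many rows of df received a category
print(expected_result['_merge'].value_counts())
```

If the row count of expected_result differs from that of df, the lookup keys were not unique; validate makes that fail loudly instead of silently inflating the data.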
