
Replacing Few Values In A Column Based On A List In Python

This is explained well in a Stack Overflow topic: "Replacing few values in a pandas dataframe column with another value". The example is: BrandName Specialty A H B

Solution 1:

Use regex=True for substring replacement:

df['BrandName'] = df['BrandName'].replace(['ABC', 'AB'], 'A', regex=True)
print (df)
  BrandName Specialty
0         A         H
1         B         I
2       A B         J
3         D         K
4         A         L
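To make the effect of regex=True concrete, here is a minimal sketch comparing it with the default behaviour of replace, which only matches whole cell values. The sample frame is an assumption modeled on the data shown in this answer, not the questioner's actual input.

```python
import pandas as pd

# Assumed sample data, modeled on the frames printed in this answer.
df = pd.DataFrame({
    'BrandName': ['A', 'B', 'ABC B', 'D', 'AB'],
    'Specialty': ['H', 'I', 'J', 'K', 'L'],
})

# Default: a cell is replaced only if its whole value equals a list entry.
exact = df['BrandName'].replace(['ABC', 'AB'], 'A')

# regex=True: each list entry is a pattern, substituted wherever it occurs.
substring = df['BrandName'].replace(['ABC', 'AB'], 'A', regex=True)

print(exact.tolist())
print(substring.tolist())
```

Without regex=True the cell 'ABC B' is left alone because it is not exactly equal to 'ABC' or 'AB'; with regex=True it becomes 'A B'.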

A different approach is needed if matches inside longer strings must be left alone, for example so that ABCD is not replaced. In that case, wrap each pattern in regex word boundaries (\b):

print (df)
  BrandName Specialty
0    A ABCD         H
1         B         I
2     ABC B         J
3         D         K
4        AB         L


L = [r"\b{}\b".format(x) for x in ['ABC', 'AB']]

df['BrandName1'] = df['BrandName'].replace(L, 'A', regex=True)
df['BrandName2'] = df['BrandName'].replace(['ABC', 'AB'], 'A', regex=True)
print (df)
  BrandName Specialty BrandName1 BrandName2
0    A ABCD         H     A ABCD       A AD
1         B         I          B          B
2     ABC B         J        A B        A B
3         D         K          D          D
4        AB         L          A          A
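The word-boundary patterns above are built with plain string formatting, so a term containing regex metacharacters would be read as a pattern rather than as literal text. The sketch below is one possible way to guard against that, using only the standard re module: each term is passed through re.escape and the terms are joined into a single alternation. The term 'A+B', the helper name normalize and the sample strings are hypothetical and not taken from the question.

```python
import re

# 'A+B' is a hypothetical term, added to show why escaping matters.
terms = ['ABC', 'AB', 'A+B']

# Escape each term, join with |, and require word boundaries on both sides.
pattern = re.compile(r"\b(?:{})\b".format("|".join(re.escape(t) for t in terms)))

def normalize(value, repl='A'):
    """Replace every whole-word occurrence of a term in value with repl."""
    return pattern.sub(repl, value)

samples = ['A ABCD', 'ABC B', 'AB', 'A+B C']
results = [normalize(s) for s in samples]
print(results)
```

Here 'A ABCD' stays unchanged because ABCD is a longer word, while 'A+B C' becomes 'A C' because the plus sign is matched literally.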

Edit (from the questioner):

To speed it up, have a look at: Speed up millions of regex replacements in Python 3

The best one is the trie approach:

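import re

# Trie is the helper class defined in the linked answer; it is not part of
# the standard library or pandas.
def trie_regex_from_words(words):
    trie = Trie()
    for word in words:
        trie.add(word)
    # One compiled, case-insensitive pattern that matches any whole word
    return re.compile(r"\b" + trie.pattern() + r"\b", re.IGNORECASE)

# strings is assumed to be the list of values to replace, such as ['ABC', 'AB']
union = trie_regex_from_words(strings)
df['BrandName'] = df['BrandName'].replace(union, 'A', regex=True)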
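The snippet above relies on a Trie class that is not defined here. As a rough illustration of the idea only, the following is a minimal sketch of such a class, written for this example and not copied from the linked answer: words sharing a prefix share a path in a nested dict, and pattern() turns that structure into one compact regex. The internal layout (an empty-string key as the end-of-word marker) is an assumption of this sketch.

```python
import re

class Trie:
    """Illustrative trie that can emit a regex matching exactly its words."""

    def __init__(self):
        self.root = {}

    def add(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node[''] = True  # empty key marks the end of a word

    def _pattern(self, node):
        # Build one branch per child character, recursing into the subtree.
        branches = [re.escape(ch) + self._pattern(child)
                    for ch, child in sorted(
                        (k, v) for k, v in node.items() if k != '')]
        if not branches:
            return ''
        optional = '' in node  # a word ends here, so the rest is optional
        if len(branches) == 1 and not optional:
            return branches[0]
        body = '(?:' + '|'.join(branches) + ')'
        return body + '?' if optional else body

    def pattern(self):
        return self._pattern(self.root)

trie = Trie()
for word in ['ABC', 'AB']:
    trie.add(word)

union = re.compile(r"\b" + trie.pattern() + r"\b", re.IGNORECASE)
print(trie.pattern())
print(union.sub('A', 'ABC B'), '|', union.sub('A', 'A ABCD'))
```

For the two words used in this answer the shared prefix AB is emitted once, followed by an optional C, so the regex engine does not have to retry each word separately. With only a handful of words the gain is negligible; the linked discussion is about very large word lists.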
