
How To Determine If Two Rows Are Identical (similar) If Row 2 Contains Part Of The Info From Row 1?

Hope you are having a good day. I am currently working with an extremely dirty dataframe containing First Name, Last Name, and Middle Name. One of the issues that I am trying to resol

Solution 1:

If we just want to check that both elements in row2 are contained in the respective elements of row1, a single if statement is enough:

row1 = ["James", "Bond"]
row2 = ["Jam", "Bo"]

if row2[0] in row1[0] and row2[1] in row1[1]:   
    print("Similar!")
else:
    print("Not Similar!")

If you want to check the opposite case (that row1 is contained in row2), just add a second if statement with the 'row1' and 'row2' terms swapped.
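
Both directions can also be folded into one helper. The following is only a sketch: the function name is_partial_match is made up for illustration, and lower-casing and stripping whitespace before comparing is an assumption about what "dirty" means for this data.

```python
def is_partial_match(row_a, row_b):
    """Return True if every field of one row is contained in the
    corresponding field of the other row (either direction)."""
    if len(row_a) != len(row_b):
        return False
    # Assumption: case and surrounding whitespace should be ignored.
    a = [str(x).strip().lower() for x in row_a]
    b = [str(x).strip().lower() for x in row_b]
    b_in_a = all(y in x for x, y in zip(a, b))  # row_b inside row_a
    a_in_b = all(x in y for x, y in zip(a, b))  # row_a inside row_b
    return b_in_a or a_in_b


print(is_partial_match(["James", "Bond"], ["Jam", "Bo"]))   # True
print(is_partial_match(["Jam", "Bo"], ["James", "Bond"]))   # True
print(is_partial_match(["James", "Bond"], ["Jane", "Bo"]))  # False
```

Note that an empty string is contained in every string, so a blank field matches anything with this check. Whether that is desirable (for example, for a missing middle name) depends on the data.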


Solution 2:

This is not a simple problem. To check whether two strings are 'similar', you need a string distance algorithm rather than ordinary Euclidean distance. That is, you must define a similarity function that measures the distance between strings.

jellyfish is a library built to solve these problems.
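
To make the idea of a string distance concrete, here is a minimal pure-Python sketch of Levenshtein edit distance, one of the metrics that jellyfish provides (as jellyfish.levenshtein_distance). The similarity helper and its normalisation to the range 0 to 1 are assumptions chosen for illustration, not something the library or the question prescribes.

```python
def levenshtein(s, t):
    """Number of single-character insertions, deletions, or
    substitutions needed to turn s into t."""
    if len(s) < len(t):
        s, t = t, s  # keep the shorter string in the inner loop
    previous = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        current = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(s, t):
    """Normalised similarity in [0, 1]; 1.0 means identical
    (ignoring case, which is an assumption made here)."""
    s, t = s.lower(), t.lower()
    longest = max(len(s), len(t))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(s, t) / longest


print(levenshtein("James", "Jam"))  # 2
print(similarity("Bond", "Bnod"))   # 0.5
```

Two names could then be treated as similar when the score exceeds some threshold. The right threshold has to be tuned on the actual data.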

Another approach is to collect all names and bind them to a thesaurus of names like this

With some searching, I've found this
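
The thesaurus idea could be sketched as a lookup table that maps name variants to a canonical form. Everything below is hypothetical: the NAME_VARIANTS entries are a tiny hand-written sample, and a real table would have to come from an actual name thesaurus.

```python
# Hypothetical sample; a real table would come from a name thesaurus.
NAME_VARIANTS = {
    "jim": "james",
    "jimmy": "james",
    "bill": "william",
    "will": "william",
}


def canonical(name):
    """Map a name variant to its canonical form, or return it unchanged."""
    key = name.strip().lower()
    return NAME_VARIANTS.get(key, key)


def same_person(row_a, row_b):
    """Compare two rows field by field after canonicalising each name."""
    return len(row_a) == len(row_b) and all(
        canonical(x) == canonical(y) for x, y in zip(row_a, row_b))


print(same_person(["Jim", "Bond"], ["James", "Bond"]))  # True
print(same_person(["Bill", "Bond"], ["James", "Bond"]))  # False
```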

Hope this helps.

