How To Determine If Two Rows Are Identical (similar) If Row 2 Contains Part Of The Info From Row 1?
Hope you are having a good day. I am currently working with an extremely dirty dataframe containing First Name, Last Name, and Middle Name. One of the issues that I am trying to resol
Solution 1:
If we just want to check that both elements of row2 are contained in the respective elements of row1, a single if statement is enough:
row1 = ["James", "Bond"]
row2 = ["Jam", "Bo"]
# "in" on strings is a substring test, so "Jam" in "James" is True
if row2[0] in row1[0] and row2[1] in row1[1]:
    print("Similar!")
else:
    print("Not Similar!")
If you want to check the opposite case (that row1 is contained in row2), just add a second if statement with the 'row1' and 'row2' terms swapped.
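Both directions can be wrapped in one helper. This is only a minimal sketch: the function names `contains_each` and `rows_similar` are made up for illustration, and the case-insensitive comparison and whitespace stripping are assumptions about what a "dirty" dataframe needs, not something the question specifies.

```python
def contains_each(shorter, longer):
    """True if every field of `shorter` is a substring of the matching field of `longer`."""
    if len(shorter) != len(longer):
        return False
    # Note: an empty string is a substring of anything, so blank fields always match.
    return all(
        s.strip().lower() in l.strip().lower()
        for s, l in zip(shorter, longer)
    )


def rows_similar(row1, row2):
    """True if one row is contained, field by field, in the other (either direction)."""
    return contains_each(row2, row1) or contains_each(row1, row2)


if __name__ == "__main__":
    print(rows_similar(["James", "Bond"], ["Jam", "Bo"]))
    print(rows_similar(["Jam", "Bo"], ["James", "Bond"]))
    print(rows_similar(["James", "Bond"], ["Jam", "Smith"]))
```

A mixed case, where the first name is shorter in one row and the last name is shorter in the other, is deliberately reported as not similar here; relax that if your data needs it.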
Solution 2:
This is not such a simple problem. To check whether two strings are 'similar', you need a string distance algorithm rather than a Euclidean one. In other words, you must define a similarity function and decide how to measure the distance between strings.
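As a rough illustration of what "define a similarity function" can look like, here is a sketch that uses `difflib.SequenceMatcher` from the Python standard library. The helper names and the 0.6 threshold are assumptions chosen for the example, so tune the threshold against your own data.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a score between 0.0 (nothing in common) and 1.0 (identical), ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def rows_similar_fuzzy(row1, row2, threshold=0.6):
    """Treat two rows as similar if every pair of fields scores at least `threshold`."""
    return all(similarity(a, b) >= threshold for a, b in zip(row1, row2))


if __name__ == "__main__":
    print(similarity("James", "Jam"))
    print(rows_similar_fuzzy(["James", "Bond"], ["Jam", "Bo"]))
    print(rows_similar_fuzzy(["James", "Bond"], ["Mary", "Smith"]))
```

A similarity score and a distance are two views of the same idea: a high similarity corresponds to a small distance.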
jellyfish is a library built to solve this kind of problem.
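A possible way to use it is sketched below with `jellyfish.levenshtein_distance`, which counts the single-character edits needed to turn one string into the other. The pure-Python fallback, the helper name `rows_close`, and the `max_edits` value are my own assumptions, added only so the snippet still runs when jellyfish is not installed.

```python
try:
    # Third-party package: pip install jellyfish
    import jellyfish
    edit_distance = jellyfish.levenshtein_distance
except ImportError:
    def edit_distance(a, b):
        """Fallback Levenshtein distance (insertions, deletions, substitutions)."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, start=1):
            curr = [i]
            for j, cb in enumerate(b, start=1):
                cost = 0 if ca == cb else 1
                curr.append(min(prev[j] + 1,          # deletion
                                curr[j - 1] + 1,      # insertion
                                prev[j - 1] + cost))  # substitution
            prev = curr
        return prev[-1]


def rows_close(row1, row2, max_edits=1):
    """True if every pair of fields differs by at most `max_edits` edits, ignoring case."""
    return all(
        edit_distance(a.lower(), b.lower()) <= max_edits
        for a, b in zip(row1, row2)
    )


if __name__ == "__main__":
    print(edit_distance("James", "Jam"))
    print(rows_close(["James", "Bond"], ["Jame", "Bond"]))
```

Edit distance suits typos better than truncated names: "Jam" is two edits away from "James", so a strict threshold rejects it even though it is an obvious prefix.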
Another approach is to collect all the names and map them to a thesaurus of names like this
After some searching, I found this
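The thesaurus idea can be sketched with a plain dictionary that maps name variants to one canonical form. The `NICKNAMES` table below is a tiny hypothetical stand-in made up for this example, not the resource linked above; in practice you would load a much larger list.

```python
# Hypothetical sample data: variant -> canonical name (all lowercase).
NICKNAMES = {
    "jim": "james",
    "jimmy": "james",
    "bill": "william",
    "will": "william",
    "bob": "robert",
}


def canonical(name):
    """Normalise a name and replace a known variant with its canonical form."""
    key = name.strip().lower()
    return NICKNAMES.get(key, key)


def same_names(row1, row2):
    """True if both rows are equal once every field is mapped to its canonical form."""
    return [canonical(n) for n in row1] == [canonical(n) for n in row2]


if __name__ == "__main__":
    print(same_names(["Jim", "Bond"], ["James", "Bond"]))
    print(same_names(["Bill", "Bond"], ["James", "Bond"]))
```

This catches variants that no substring or distance check would link, such as "Bill" and "William", at the cost of maintaining the lookup table.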
Hope this helps.