
Find Similar Sentences In Between Two Documents And Calculate Similarity Score For Each Section In Whole Documents

I took this example from the web. My first document contains: Document 1: Purpose of visit: For physical check up. History of patient: This is the first admission for this 56 year old

Solution 1:

You can use the difflib module from the Python standard library.

This module provides classes and functions for comparing sequences. It can be used, for example, for comparing files, and can produce difference information in various formats, including HTML and context and unified diffs. For comparing directories and files, see also the filecmp module.

In your case, you need difflib.SequenceMatcher, a class for comparing pairs of sequences of any type, so long as the sequence elements are hashable.
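Because the elements only need to be hashable, the same class also works on lists of words instead of raw characters. The following is a minimal sketch under that assumption; the two sentences and the variable names are made up for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical sample sentences, split into word tokens (strings are hashable).
words_a = "the patient was admitted for a physical check up".split()
words_b = "the patient was admitted for a routine check up".split()

# None means no elements are treated as junk.
matcher = SequenceMatcher(None, words_a, words_b)

# ratio() is 2 * (number of matched tokens) / (total tokens in both lists).
word_ratio = matcher.ratio()
print(word_ratio)
```

Comparing words rather than characters makes the score less sensitive to small spelling differences inside a word and more sensitive to whole words being swapped.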

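Example:

from difflib import SequenceMatcher

text_1 = "private Thread currentThread;"
text_2 = "private volatile Thread currentThread;"

# The first argument is the "junk" predicate: spaces are treated as junk,
# so they are not used as anchor points when searching for matching blocks.
s = SequenceMatcher(lambda x: x == " ",
                    text_1,
                    text_2)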

Now for measuring the similarity of the sequences, use ratio() which returns a float in [0, 1]. As a rule of thumb, a ratio() value over 0.6 means the sequences are close matches.

>>> s.ratio()
0.8656716417910447
