
Tagging Words In Sentences Using Dictionaries

I have a corpus of more than 100k sentences and a dictionary. I want to match the dictionary words against the corpus and tag them in the sentences. Corpus file 'sentences.txt': Hello how ar

Solution 1:

If you want your output in the order of the sentence input, then you need to build your output with respect to that order. Instead, you designed your program to report results in the order of the dictionary. You need to switch your inner and outer loops.

Read the dict file into an internal data structure, so you don't have to keep resetting and rereading the file.

Then read the sentence file, one line at a time. Look for words to tag (you already do that well). Make the replacements as you're doing, and write out the altered sentence.
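The ordering described above can be sketched roughly as follows. This is only an illustration under assumptions: the dictionary is taken to be a CSV with the phrase in the second column and the tag in the third, matching is plain substring matching rather than the asker's similarity check, and the names `load_dictionary` and `tag_sentences` are hypothetical.

```python
import csv
import io


def load_dictionary(csv_file):
    """Read (phrase, tag) pairs once; columns 1 and 2 are an assumed layout."""
    return [(row[1], row[2]) for row in csv.reader(csv_file) if len(row) > 2]


def tag_sentences(lines, entries):
    """Yield each sentence in input order, with known phrases tagged."""
    for line in lines:
        line = line.rstrip("\n")
        # Inner loop: dictionary entries. Outer loop: sentences.
        for phrase, tag in entries:
            if phrase and phrase in line:
                line = line.replace(phrase, f"({phrase}, {tag})")
        yield line


# In-memory stand-ins for the dictionary and sentence files.
dict_file = io.StringIO("1,how are you,GREETING\n2,weather,TOPIC\n")
sentences = io.StringIO("Hello how are you\nNice weather today\nNo match here\n")

entries = load_dictionary(dict_file)
tagged = list(tag_sentences(sentences, entries))
for sentence in tagged:
    print(sentence)
```

Because the sentence loop is the outer one, each line is written exactly once and in its original position, whether or not anything in it was tagged.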

Solution 2:

Without changing much in your code this should make it work:

...
phrases = []
for row in reader:
    needle = row[1]
    needle_length = len(needle.split())
    max_sim_val = 0.9
    max_sim_string = u""
    for ngram in ngrams(hay.split(), needle_length + int(.2 * needle_length)):
        hay_ngram = u" ".join(ngram)

        similarity = SM(None, hay_ngram, needle).ratio()
        if similarity > max_sim_val:
            max_sim_val = similarity
            max_sim_string = hay_ngram
            str = [row[1] , ' ', max_sim_val.__str__(),' ', max_sim_string , '\n']
            str1 = max_sim_string , row[2]
            phrases.append((max_sim_string, row[2]))

for line in hay.splitlines():
    if any(max_sim_string in line for max_sim_string, _ in phrases):
        for phrase in phrases:
            max_sim_string, _ = phrase
            if max_sim_string in line:
                tag_sent = line.replace(max_sim_string, phrase.__str__())
                my3file.writelines(tag_sent + '\n')
                print(tag_sent)
                break
    else:
        my3file.writelines(line + '\n')

csvFile.close()
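
The matching step in the snippet above depends on imports that are not shown. As a self-contained sketch, assuming `SM` is `difflib.SequenceMatcher` and replacing the `ngrams` helper with a small local function (both are assumptions, as is the name `best_fuzzy_match`), the idea might look like this:

```python
from difflib import SequenceMatcher


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def best_fuzzy_match(hay, needle, threshold=0.9):
    """Return (best_ngram, ratio) above threshold, or ("", threshold) if none."""
    n = len(needle.split())
    window = n + int(0.2 * n)  # same window size as the snippet above
    best_string, best_ratio = "", threshold
    for gram in ngrams(hay.split(), window):
        candidate = " ".join(gram)
        ratio = SequenceMatcher(None, candidate, needle).ratio()
        if ratio > best_ratio:
            best_string, best_ratio = candidate, ratio
    return best_string, best_ratio


# A misspelled phrase in the sentence still matches the dictionary entry.
match, score = best_fuzzy_match("Hello how are yuo today", "how are you")
print(match)
```

Returning only the single best n-gram per dictionary entry, instead of appending every candidate that raises the running maximum, keeps the phrase list free of weaker intermediate matches.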
