Match Acronym And Their Meaning With Python Regex

I am working on a Python function that will use regular expressions to find within a sentence the acronym within parentheses and its meaning within the sentence. For example, 'The

Solution 1:

r will match the characters

The r is a Python string prefix that makes the literal a raw string, so backslashes are passed through unchanged. It is not part of the re syntax.
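To make the point about the r prefix concrete, here is a small sketch. The sentence and variable names are made up for illustration. It shows that the prefix only changes how Python reads the literal, not how the regex engine behaves:

```python
import re

# Without the r prefix, Python itself turns "\b" into a backspace character.
plain = "\bDOS\b"
# With the r prefix, the backslashes reach the regex engine untouched,
# where \b means "word boundary".
raw = r"\bDOS\b"

sentence = "The Department of State (DOS) handles foreign affairs."

plain_hit = re.search(plain, sentence)  # looks for literal backspace characters
raw_hit = re.search(raw, sentence)      # looks for the whole word DOS

print(len(plain), len(raw))
print(plain_hit, raw_hit.group(0))
```

The two literals differ in length because the non-raw version has already collapsed each backslash escape into a single character before re ever sees it.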

the ? will match zero or one times,

The ? here is part of (?:, which opens a non-capturing group: its contents take part in the match but are not returned as a captured group.
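A minimal sketch of the difference, using an illustrative sentence and patterns that are not from the question:

```python
import re

sentence = "The Department of State (DOS) handles foreign affairs."

# Capturing group: the optional "The " is returned as group 1.
capturing = re.search(r"(The )?(\w+ of \w+)", sentence)
# Non-capturing group: "The " still takes part in the match,
# but only the name is returned as a group.
non_capturing = re.search(r"(?:The )?(\w+ of \w+)", sentence)

print(capturing.groups())
print(non_capturing.groups())
print(non_capturing.group(0))  # the full match still includes "The "
```

In both cases the overall match is the same; only the tuple returned by groups() changes.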

$ asserts the position at the end

It asserts the position at the end of the entire string (or just before a trailing newline), not merely at the end of the matched portion.
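The effect of $ can be checked with a short sketch. The sentence and patterns below are assumptions chosen for illustration:

```python
import re

sentence = "The Department of State (DOS) is a federal department."

# The parenthesised acronym sits in the middle of the sentence, so a
# pattern that requires the string to end right after ")" cannot match.
anchored = re.search(r"\((\w+)\)$", sentence)
# Dropping the $ lets the pattern match wherever the acronym appears.
unanchored = re.search(r"\((\w+)\)", sentence)

print(anchored)
print(unanchored.group(1))
```

So a pattern ending in $ only finds acronyms that close the sentence, which is rarely what is wanted here.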

This pattern captures both the name and the abbreviation:

import re

# text is the sentence to search
pattern = re.compile(r"^(.*?)\((.*?)\)")
for match in pattern.finditer(text):
    name, abbrev = match.groups()
    print(name.strip(), abbrev)

Solution 2:

You can do something like this.

import re

text = "The Department of State (DOS) is the United States federal executive department responsible for international relations of the United States." 

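# Extract the text between the parentheses and lowercase it, e.g. "dos".
acronym = re.search(r"(?<=\().*?(?=\))", text).group(0).lower()

# Build a pattern with one word per acronym letter: "d.*? o.*? s.*?"
regex = r"(?<= )"
for i in range(0, len(acronym)):
    if i > 0: regex += " "
    regex += re.escape(acronym[i]) + r".*?"

regex += r"(?= )"
# The acronym was lowercased, so the search must ignore case.
meaning = re.search(regex, text, re.IGNORECASE).group(0).lower()

print("Acronym '"+acronym+"' stands for '"+meaning+"'.")

The idea is to get the string inside the parentheses, then build a regex from it that searches for consecutive words beginning with the letters of the acronym. The search has to be case-insensitive because the acronym is lowercased while the words in the text are capitalized. I'm not very experienced with Python, and .*? can run across word boundaries, so treat this as a starting point rather than a robust solution.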
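Since the question asks for a function, the same idea could be wrapped up roughly as follows. This is only a sketch: the name find_acronym_meaning is hypothetical, and it assumes the acronym is purely alphabetic and that every letter corresponds to exactly one consecutive word before the parentheses.

```python
import re

def find_acronym_meaning(sentence):
    """Return (acronym, meaning) for the first parenthesised acronym, or None."""
    found = re.search(r"\(([A-Za-z]+)\)", sentence)
    if found is None:
        return None
    acronym = found.group(1)
    # One word per letter, e.g. "DOS" -> \bD\w*\s+O\w*\s+S\w*
    words = [re.escape(letter) + r"\w*" for letter in acronym]
    pattern = r"\b" + r"\s+".join(words)
    # Only look at the text that precedes the parenthesis.
    meaning = re.search(pattern, sentence[:found.start()], re.IGNORECASE)
    if meaning is None:
        return None
    return acronym, meaning.group(0)

result = find_acronym_meaning(
    "The Department of State (DOS) is the United States federal executive department."
)
print(result)
```

Using \w* instead of .*? keeps each letter tied to a single word. The sketch returns None for acronyms that skip filler words, for example ones that leave out "and" or "of", so real-world text would need a more forgiving pattern.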

