Match Acronyms And Their Meanings With Python Regex
Solution 1:
"r will match the characters"
The r is a Python prefix that makes the string a raw string literal, so backslashes are passed to the regex engine unchanged instead of being treated as escape sequences. It is not part of the re syntax.
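A quick check with the standard re module (the sample string is made up for illustration): the r prefix only changes how Python reads the literal, so the regex engine receives exactly the same pattern as it would from an ordinary string with doubled backslashes.

```python
import re

# The raw literal and the escaped normal literal hold the same characters: \ ( D O S \ )
raw_pattern = r"\(DOS\)"
escaped_pattern = "\\(DOS\\)"
print(raw_pattern == escaped_pattern)  # True

# Either one compiles to a pattern matching a literal "(DOS)"
print(re.search(raw_pattern, "Department of State (DOS)").group(0))  # (DOS)
```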
"the ? will match zero or one times"
The ? referred to here is part of (?:, which makes the group a non-capturing group: its contents are part of the match but are not returned as a separate group.
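A small illustration (the "Mr. Smith" input is a made-up example): the non-capturing group still has to match and is included in the overall match, but it does not appear in groups().

```python
import re

text = "Mr. Smith"
capturing = re.match(r"(Mr|Ms)\. (\w+)", text)        # two capturing groups
non_capturing = re.match(r"(?:Mr|Ms)\. (\w+)", text)  # (?:...) groups without capturing

print(capturing.groups())      # ('Mr', 'Smith')
print(non_capturing.groups())  # ('Smith',)
print(non_capturing.group(0))  # Mr. Smith  (the whole match still includes "Mr.")
```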
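"$ asserts the position at the end"
It asserts the position at the end of the entire string (or just before a newline at the very end of it), not at the end of the matched portion. With the re.MULTILINE flag it also matches at the end of each line.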
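The behaviour of $ can be checked directly with the standard re module; the sample strings below are made up for illustration.

```python
import re

# $ matches only at the very end of the string (or just before a final newline)
print(re.search(r"States\.$", "relations of the United States.") is not None)    # True
print(re.search(r"States\.$", "relations of the United States.\n") is not None)  # True
print(re.search(r"State$", "Department of State (DOS)") is None)  # True: "State" is not at the end

# With re.MULTILINE, $ also matches at the end of every line
lines = "Department of State\nDepartment of Defense"
print(re.findall(r"\w+$", lines))                # ['Defense']
print(re.findall(r"\w+$", lines, re.MULTILINE))  # ['State', 'Defense']
```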
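This pattern captures both the name and the abbreviation:
import re
text = "The Department of State (DOS) is the United States federal executive department responsible for international relations of the United States."
# Group 1: text before "(" on a line; group 2: text inside the parentheses.
# re.MULTILINE makes ^ match at every line start, not just the start of text.
pattern = re.compile(r"^(.*?)\((.*?)\)", re.MULTILINE)
for match in pattern.finditer(text):
    name, abbrev = match.groups()
    print(name.strip(), abbrev)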
Solution 2:
You can do something like this.
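import re
text = "The Department of State (DOS) is the United States federal executive department responsible for international relations of the United States."
# Text between the first pair of parentheses, lowercased: "dos"
acronym = re.search(r"(?<=\().*?(?=\))", text).group(0).lower()
# One word per letter, e.g. \bd\w*\s+o\w*\s+s\w*\b for "dos"
regex = r"\b"
for i in range(len(acronym)):
    if i > 0:
        regex += r"\s+"
    regex += re.escape(acronym[i]) + r"\w*"
regex += r"\b"
# IGNORECASE: the acronym is lowercase, but the words in the text are capitalized
meaning = re.search(regex, text, re.IGNORECASE).group(0).lower()
print("Acronym '" + acronym + "' stands for '" + meaning + "'.")
The idea is to get the string inside the parentheses, then build a regex from it that searches for consecutive words beginning with the letters of the acronym. Each letter is followed by \w* (the rest of that word) and the letters are joined by \s+, so one letter cannot swallow several words. The search must be case-insensitive (re.IGNORECASE), because the acronym has been lowercased while "Department of State" is capitalized.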
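If the text contains several acronyms, the same idea can be wrapped in a function. This is a sketch, not part of the original answer: find_acronyms is a hypothetical helper name, and it assumes each meaning is written immediately before its parentheses with exactly one word per letter. Anchoring the word pattern with $ to the text just before the "(" keeps it from matching an unrelated earlier phrase.

```python
import re

def find_acronyms(text):
    """Map each parenthesized acronym in text to the words just before it.

    Hypothetical helper: assumes the meaning directly precedes the
    parentheses, one word per letter, e.g. "Department of State (DOS)".
    """
    results = {}
    for match in re.finditer(r"\((\w+)\)", text):
        acronym = match.group(1)
        # One word per letter, joined by whitespace: d\w*\s+o\w*\s+s\w*
        words = r"\s+".join(re.escape(letter) + r"\w*" for letter in acronym)
        # Only look at the text before "(", and require the words to end there
        before = text[: match.start()].rstrip()
        meaning = re.search(r"\b" + words + r"$", before, re.IGNORECASE)
        if meaning:
            results[acronym] = meaning.group(0)
    return results

text = "The Department of State (DOS) works with the Department of Defense (DOD)."
print(find_acronyms(text))
# {'DOS': 'Department of State', 'DOD': 'Department of Defense'}
```

Names that contain extra words not represented in the acronym, such as "United States of America (USA)", are not matched by this sketch.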