Split Text In Text File On The Basis Of Comma And Space (python)
Solution 1:
It would appear that you can only be certain a line contains a location if it has a comma, so it makes sense to parse the file in two passes. The first pass builds a set holding all known locations. You can seed this set with some known examples or problem cases.
The second pass again uses the comma to identify the location where there is one. If there is no comma, the line is split into a set of words, and the intersection of these words with the location set should give you the location. If there is no intersection, the line is flagged as "unknown".
import re

# `path` is assumed to hold the name of the input file.
locations = set(["London", "Faisalabad"])

with open(path, 'r') as f_input:
    unknown = 0

    # Pass 1, build a set of locations
    for line in f_input:
        line = line.strip(' ,"\n')
        if ',' in line:
            loc = line.rsplit(",", 1)[1].strip()
            locations.add(loc)

    # Pass 2, try and find location in line
    f_input.seek(0)
    for line in f_input:
        line = line.strip(' "\n')
        if ',' in line:
            uni, loc = line.rsplit(",", 1)
            loc = loc.strip()
        else:
            uni = line
            # No comma: look for any word that is a known location
            loc_matches = set(re.findall(r"\b(\w+)\b", line)).intersection(locations)
            if loc_matches:
                loc = list(loc_matches)[0]
            else:
                loc = "<unknown location>"
                unknown += 1
        uni = uni.strip()
        print("uni:", uni)
        print("Loc:", loc)

print("Unknown locations:", unknown)
Output would be:
uni: Imperial College of Business Studies
Loc: Lahore
uni: Government College University Faisalabad
Loc: Faisalabad
uni: Imperial College of Business Studies Lahore
Loc: Lahore
uni: University of Peshawar
Loc: Peshawar
uni: University of Sindh
Loc: Jamshoro
uni: London School of Economics
Loc: London
uni: Lahore School of Economics
Loc: Lahore
Unknown locations: 0
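The two-pass idea can also be tried without a file. The following is only a sketch under stated assumptions: the function name `find_locations`, its `known` parameter, and the sample lines are made up for illustration and are not the asker's data. It differs from the code above in three ways: it strips commas in both passes, it scans the words from the end of the line so the result is predictable when several words match, and it returns `None` for an unknown location instead of a placeholder string.

```python
import re


def find_locations(lines, known=()):
    """Two-pass lookup: collect locations after commas, then match bare lines against them."""
    locations = set(known)

    # Pass 1: anything after the last comma is treated as a location.
    for line in lines:
        line = line.strip(' ,"\n')
        if ',' in line:
            locations.add(line.rsplit(',', 1)[1].strip())

    # Pass 2: use the comma if present, otherwise look for a known location word.
    results = []
    for line in lines:
        line = line.strip(' ,"\n')
        if ',' in line:
            uni, loc = line.rsplit(',', 1)
        else:
            uni = line
            words = re.findall(r"\w+", line)
            # Scan from the end so the choice is deterministic when several words match.
            loc = next((w for w in reversed(words) if w in locations), None)
        results.append((uni.strip(), loc.strip() if loc else None))
    return results


# Made-up sample lines, not the asker's file.
sample = [
    '"University of Sindh, Jamshoro"',
    '"Lahore School of Economics"',
    '"University of Peshawar"',
]
for uni, loc in find_locations(sample, known=["Lahore"]):
    print(uni, "->", loc)
```

With this sample, "Peshawar" is never seen after a comma and is not in `known`, so that line comes back with `None` as its location. This is the case the original code counts as unknown.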
Solution 2:
Your input file does not have a comma on every line, which causes the code to fail. You could do something like
if ',' in line:
    loc = rep.split(',')[1].strip()
else:
    loc = rep.split()[-1].strip()
to handle the lines without a comma differently, or simply reformat the input.
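The fallback can be wrapped in a small helper so it is easy to try on individual lines. This is a minimal sketch, not the asker's code: the name `extract_location` and the sample strings are made up, and it assumes that when there is no comma the location is the last word on the line.

```python
def extract_location(line):
    """Return the text after the first comma, or the last word if there is no comma."""
    # Hypothetical helper; strip surrounding whitespace and double quotes first.
    line = line.strip(' "\n')
    if ',' in line:
        return line.split(',')[1].strip()
    words = line.split()
    # An empty line has no words, so return an empty string instead of raising IndexError.
    return words[-1] if words else ''


print(extract_location('University of Sindh, Jamshoro'))
print(extract_location('University of Peshawar'))
```

Note that the last-word assumption breaks for a line such as "London School of Economics", where the last word is not a place name.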
Solution 3:
You can split on a comma. The result is always a list, so you can check its size: if it has more than one element, the string contained at least one comma; otherwise (if the size is one) it contained no comma.
>>> word = "something without a comma"
>>> afterSplit = word.split(',')
>>> afterSplit
['something without a comma']
>>> word2 = "something with, just one comma"
>>> afterSplit2 = word2.split(',')
>>> afterSplit2
['something with', ' just one comma']
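The length check can drive the split itself. A minimal sketch, assuming you want everything before the last comma as the first part; the helper name `split_on_comma` is hypothetical:

```python
def split_on_comma(text):
    """Return (before, after) around the last comma, or (text, None) if there is no comma."""
    parts = [part.strip() for part in text.split(',')]
    # split() always returns at least one element, so more than one means a comma was found.
    if len(parts) > 1:
        return ', '.join(parts[:-1]), parts[-1]
    return parts[0], None


print(split_on_comma("something with, just one comma"))
print(split_on_comma("something without a comma"))
```

Returning `None` for the second element makes the "no comma" case explicit, so the caller can decide how to handle it.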
Solution 4:
I hope this will work, although I couldn't get 'London' with it. Maybe the data needs to be in a consistent format.
stop_words = ['school', 'Economics', 'University', 'College']
places = []

# Read all lines up front; the file is closed when the block ends
with open('places.txt') as f:
    f_data = f.readlines()

for p in f_data:
    p = p.replace('"', '')
    # Take the text after the last comma, or the last word if there is no comma
    if ',' in p:
        city = p.split(',')[-1].strip()
    else:
        city = p.split(' ')[-1].strip()
    if city not in places and city not in stop_words:
        places.append(city)

print(places)
Output:
[' Lahore', ' Faisalabad', 'Lahore', 'Peshawar', ' Jamshoro']