How To Avoid Code Repetition And Redundancy
I am trying to simplify some code which does the following: create one empty list to store the information scraped from one website; apply a function to fill the list; add the ...
Solution 1:
As far as I can tell from my_df, the list1 declaration should be inside fun; otherwise you must be emptying it elsewhere.
First, I would change fun to work on a single entry (not a whole Series):
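import requests
from bs4 import BeautifulSoup

def fun(x):
    # Scrape one entry and return the items found for it.
    list1 = []  # created per call, so results never leak between entries
    url = "my_website" + x
    soup = BeautifulSoup(requests.get(url).content, "html.parser")
    ...
    list1.append(data1)
    return list1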
Then you can do the first transformation (populating the second column, List1) with:
my_df['List1'] = my_df.Col.apply(fun)
After that, you could do something like:
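while scraping_to_do:
    # Flatten the lists first: sets need hashable values, not lists.
    my_df = my_df.explode('List1', ignore_index=True)
    newCol = pd.Series(list(set(my_df['List1']) - set(my_df['Col'])))
    newList1 = newCol.apply(fun)
    # DataFrame.append was removed in pandas 2.0; pd.concat replaces it.
    newRows = pd.DataFrame({'Col': newCol, 'List1': newList1})
    my_df = pd.concat([my_df, newRows], ignore_index=True)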
You need to figure out when to stop scraping (perhaps when the set difference is empty), as well as how to deal with the NaNs that explode produces from empty lists.
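To make the stopping condition and the NaN handling concrete, here is a minimal sketch that runs without network access. FAKE_SITE, the stubbed fun and the crawl helper are hypothetical stand-ins and not part of the original code; the sketch assumes that scraping an entry yields a list of further entries to visit, and it stops once the set difference is empty.

```python
import pandas as pd

# Hypothetical stand-in for the website: each entry maps to the entries found on its page.
FAKE_SITE = {"a": ["b", "c"], "b": ["c"], "c": [], "d": ["a"]}


def fun(x):
    """Return the list of items found for a single entry (stubbed, no network)."""
    return list(FAKE_SITE.get(x, []))


def crawl(start):
    """Keep scraping newly discovered entries until nothing new turns up."""
    my_df = pd.DataFrame({"Col": list(start)})
    my_df["List1"] = my_df["Col"].apply(fun)
    while True:
        # Flatten the lists and drop the NaNs that empty lists produce.
        found = set(my_df["List1"].explode().dropna())
        new_entries = sorted(found - set(my_df["Col"]))
        if not new_entries:  # empty set difference: nothing left to scrape
            break
        new_col = pd.Series(new_entries)
        new_rows = pd.DataFrame({"Col": new_col, "List1": new_col.apply(fun)})
        my_df = pd.concat([my_df, new_rows], ignore_index=True)
    # One row per (entry, item) pair; entries with no items are dropped here.
    return my_df.explode("List1").dropna(subset=["List1"]).reset_index(drop=True)


result = crawl(["a"])
print(result)
```

Dropping the NaN rows at the end is only one option: it also removes entries whose page yielded nothing, so keep those rows instead if you need a record of every visited entry.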