How To Avoid Code Repetition And Redundancy

I am trying to simplify some code which does the following: create one empty list in which to store the information scraped from one website; apply a function to fill the list; add the ...

Solution 1:

As far as I can tell from my_df, either the list1 declaration should be inside fun, or you are emptying list1 somewhere else in your code.
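To see why the placement of the list matters, here is a minimal sketch with no scraping involved. The names fun_shared, fun_local and shared are hypothetical stand-ins, not from the question. When the list is declared once outside the function, every call appends to and returns the same object, so every row ends up holding the accumulated results:

```python
shared = []  # declared once, outside the function

def fun_shared(x):
    # Appends to the same module-level list on every call
    shared.append(x.upper())
    return shared

def fun_local(x):
    # A fresh list is created on every call
    list1 = []
    list1.append(x.upper())
    return list1

shared_results = [fun_shared(x) for x in ["a", "b"]]
local_results = [fun_local(x) for x in ["a", "b"]]

print(shared_results)  # [['A', 'B'], ['A', 'B']]
print(local_results)   # [['A'], ['B']]
```

Both entries of shared_results are the very same list object, which is usually not what you want in a DataFrame column.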

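First, I would change fun to work on one entry only (not on a whole Series):

import requests
from bs4 import BeautifulSoup

def fun(x):
    # Build the list inside the function so every call starts empty
    list1 = []
    url = "my_website" + x
    soup = BeautifulSoup(requests.get(url).content, "html.parser")
    ...
    list1.append(data1)
    return list1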

Then, you can do the first transformation (populating second column List1) by doing:

my_df['List1'] = my_df['Col'].apply(fun)

After that, you could do something like:

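# Flatten List1 first: lists are unhashable, so set() needs scalar values
my_df = my_df.explode('List1', ignore_index=True)
while scraping_to_do:
    # Scraped values that do not yet appear in Col
    newCol = pd.Series(list(set(my_df['List1'].dropna()) - set(my_df['Col'])))
    newList1 = newCol.apply(fun)
    # DataFrame.append was removed in pandas 2.0, so use pd.concat
    my_df = pd.concat([my_df, pd.DataFrame({'Col': newCol, 'List1': newList1})], ignore_index=True)
    my_df = my_df.explode('List1', ignore_index=True)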

You need to figure out when to stop scraping (when the set difference is the empty set?), as well as deal with the NaNs that explode produces from empty lists.
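One possible stopping rule is exactly the one hinted at: keep going until the set difference is empty. The sketch below shows that logic in plain Python, without pandas or network access. FAKE_SITE, crawl and the stub fun are hypothetical stand-ins for your site and scraper, and the None entries play the role of the NaN rows that explode produces for empty lists:

```python
# Hypothetical link graph standing in for the website: page -> items found on it
FAKE_SITE = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a", "d"],  # points back to "a", so the crawl must not loop forever
    "d": [],
}

def fun(x):
    # Stand-in for the scraper: returns the list of items found for x
    return list(FAKE_SITE.get(x, []))

def crawl(start):
    rows = []          # (Col, List1) pairs, one per scraped item
    seen = set()       # values already passed to fun
    todo = set(start)
    while todo:        # stop once the set difference is empty
        found = set()
        for x in sorted(todo):
            items = fun(x)
            seen.add(x)
            if not items:
                # Mirrors the NaN row that explode yields for an empty list
                rows.append((x, None))
            for item in items:
                rows.append((x, item))
                found.add(item)
        todo = found - seen
    return rows

rows = crawl(["a"])
for row in rows:
    print(row)
```

Because every value is scraped at most once, the loop terminates even when pages reference each other. In the pandas version, the rows with a missing List1 could be dropped with my_df.dropna(subset=['List1']) if you do not need to record pages that returned nothing.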
