Pandas: Split List In Column Into Multiple Rows

I have a question regarding splitting a list in a dataframe column into multiple rows. Let's say I have this dataframe:

  Job position Job type  id
0          [6]      [1]   3

Solution 1:

Similar to Scott Boston's suggestion, I suggest you explode the columns separately, then merge them together.

For example, for 'Job position':

>>> df['Job position'].apply(pd.Series).reset_index().melt(id_vars='index').dropna()[['index', 'value']].set_index('index')
       value
index
0        6.0
1        2.0
2        1.0
1        6.0

And, all together:

import pandas as pd

df = pd.DataFrame({'Job position': [[6], [2, 6], [1]], 'Job type': [[1], [3, 6, 5], [9]], 'id': [3, 4, 43]})
jobs = df['Job position'].apply(pd.Series).reset_index().melt(id_vars='index').dropna()[['index', 'value']].set_index('index')
types = df['Job type'].apply(pd.Series).reset_index().melt(id_vars='index').dropna()[['index', 'value']].set_index('index')
>>> pd.merge(
    pd.merge(
        jobs,
        types,
        left_index=True,
        right_index=True),
    df[['id']],
    left_index=True,
    right_index=True).rename(columns={'value_x': 'Job positions', 'value_y': 'Job type'})
   Job positions  Job type  id
0            6.0       1.0   3
1            2.0       3.0   4
1            2.0       6.0   4
1            2.0       5.0   4
1            6.0       3.0   4
1            6.0       6.0   4
1            6.0       5.0   4
2            1.0       9.0  43
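
On pandas 0.25 or later, the same explode-then-merge idea can be sketched with Series.explode instead of apply(pd.Series) plus melt. This is a swapped-in variant, not the code from the answer above, and it assumes the same example dataframe. It keeps the original list elements rather than converting them to floats.

```python
import pandas as pd

df = pd.DataFrame({
    'Job position': [[6], [2, 6], [1]],
    'Job type': [[1], [3, 6, 5], [9]],
    'id': [3, 4, 43],
})

# Explode each list column on its own; the original row label is
# repeated once per list element.
jobs = df['Job position'].explode().to_frame()
types = df['Job type'].explode().to_frame()

# Merging on the (duplicated) index pairs every position with every
# type that came from the same original row, then 'id' is joined back.
result = (
    jobs.merge(types, left_index=True, right_index=True)
        .join(df[['id']])
        .reset_index(drop=True)
)
print(result)
```

The row order after the merge is not something to rely on, so sort explicitly if the order matters.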

Solution 2:

Use a comprehension

pd.DataFrame([
    [p, t, i] for P, T, i in df.values
    for p in P for t in T
], columns=df.columns)

   Job position  Job type  id
0             6         1   3
1             2         3   4
2             2         6   4
3             2         5   4
4             6         3   4
5             6         6   4
6             6         5   4
7             1         9  43

Alternatives to iterating over values

pd.DataFrame([
    [p, t, i] for P, T, i in df.itertuples(index=False)
    for p in P for t in T
], columns=df.columns)

z = zip(df['Job position'], df['Job type'], df['id'])
pd.DataFrame([
    [p, t, i] for P, T, i in z
    for p in P for t in T
], columns=df.columns)

To generalize this solution to accommodate any number of columns

pd.DataFrame([
    [p, t] + a for P, T, *a in df.values
    for p in P for t in T
], columns=df.columns)

   Job position  Job type  id
0             6         1   3
1             2         3   4
2             2         6   4
3             2         5   4
4             6         3   4
5             6         6   4
6             6         5   4
7             1         9  43
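
To see what the starred unpacking buys, here is a small sketch with one extra trailing column. The 'city' column and its values are hypothetical, added only for illustration; the comprehension itself is unchanged.

```python
import pandas as pd

df = pd.DataFrame({
    'Job position': [[6], [2, 6]],
    'Job type': [[1], [3, 5]],
    'id': [3, 4],
    'city': ['Oslo', 'Bergen'],  # hypothetical extra column
})

# P and T take the two list columns; *a collects every remaining
# column of the row into a list, whatever their number.
result = pd.DataFrame([
    [p, t] + a for P, T, *a in df.values
    for p in P for t in T
], columns=df.columns)
print(result)
```

This relies on the two list columns coming first in the dataframe; reorder the columns beforehand if they do not.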

Solution 3:

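Build the result directly with the DataFrame constructor. This assumes the columns are named Jobposition and Jobtype (no spaces), so that attribute access works.

import numpy as np

s1 = df.Jobposition.str.len()  # number of positions per row
s2 = df.Jobtype.str.len()      # number of types per row
pd.DataFrame({
    'Jobposition': np.concatenate([np.repeat(x, y) for x, y in zip(df.Jobposition, s2)]),
    'Jobtype': np.concatenate(np.repeat(df.Jobtype, s1).values),
    'id': df.id.repeat(s1 * s2)})

   Jobposition  Jobtype  id
0            6        1   3
1            2        3   4
1            2        6   4
1            2        5   4
1            6        3   4
1            6        6   4
1            6        5   4
2            1        9  43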
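
The repeat counts in this approach are easy to mix up, so here is a step-by-step sketch of the intermediate arrays. It assumes the example dataframe with the columns renamed to Jobposition and Jobtype; the variable names rows_per_id, positions and types are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Jobposition': [[6], [2, 6], [1]],
    'Jobtype': [[1], [3, 6, 5], [9]],
    'id': [3, 4, 43],
})

s1 = df['Jobposition'].str.len()  # list lengths per row
s2 = df['Jobtype'].str.len()
rows_per_id = s1 * s2             # size of each row's cross product

# Each position is repeated once per type in the same row...
positions = np.concatenate(
    [np.repeat(x, y) for x, y in zip(df['Jobposition'], s2)])
# ...while the whole type list is repeated once per position.
types = np.concatenate(df['Jobtype'].repeat(s1.to_numpy()).tolist())

result = pd.DataFrame({
    'Jobposition': positions,
    'Jobtype': types,
    'id': df['id'].repeat(rows_per_id.to_numpy()).to_numpy(),
})
print(result)
```

Unlike the constructor call above, this sketch passes plain arrays for every column, so the result gets a fresh 0..n-1 index.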

Solution 4:

import itertools
dfres = pd.DataFrame(
    [j + (i[2],) for i in df.values for j in itertools.product(*i[0:2])],
    columns=df.columns)

   Job position  Job type  id
0             6         1   3
1             2         3   4
2             2         6   4
3             2         5   4
4             6         3   4
5             6         6   4
6             6         5   4
7             1         9  43
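
The itertools.product idea can be wrapped in a small helper that takes the list columns by name instead of by position. This is only a sketch: product_rows is a hypothetical function, not part of pandas or of the answer above.

```python
import itertools
import pandas as pd

def product_rows(df, list_cols):
    """Hypothetical helper: one output row per combination of the
    elements of the list-valued columns named in list_cols."""
    rows = []
    for rec in df.to_dict('records'):
        # Cartesian product of this row's lists, in list_cols order
        for combo in itertools.product(*(rec[c] for c in list_cols)):
            # Copy the row, replacing each list with one element
            rows.append({**rec, **dict(zip(list_cols, combo))})
    return pd.DataFrame(rows, columns=df.columns)

df = pd.DataFrame({
    'Job position': [[6], [2, 6], [1]],
    'Job type': [[1], [3, 6, 5], [9]],
    'id': [3, 4, 43],
})
dfres = product_rows(df, ['Job position', 'Job type'])
print(dfres)
```

Naming the columns avoids depending on their position in the dataframe, at the cost of building one dict per output row.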
