Append Series To Empty Dataframe Column Always Results The Same After A Loop

import pandas as pd

df = pd.DataFrame(columns=['A', 'B'])
df2 = pd.DataFrame({'C': [5, 6, 7, 8, 9], 'D': [1, 2, 3, 4, 5]})

for i in range(5):
    df['A'] = df['A'].append(df2['C

Solution 1:

TL;DR: When a Series is assigned to a DataFrame column, it is conformed (reindexed) to the DataFrame's index. The result of append() has more elements than the index of df, so the extra elements are dropped and the column value doesn't change.

There is no problem with the append() function; the problem is in the df["A"] assignment.

With df["A"] = xx, we are calling __setitem__():

def __setitem__(self, key, value):
    key = com.apply_if_callable(key, self)

    # see if we can slice the rows
    indexer = convert_to_index_sliceable(self, key)
    if indexer is not None:
        # either we have a slice or we have a string that can be converted
        #  to a slice for partial-string date indexing
        return self._setitem_slice(indexer, value)

    if isinstance(key, DataFrame) or getattr(key, "ndim", None) == 2:
        self._setitem_frame(key, value)
    elif isinstance(key, (Series, np.ndarray, list, Index)):
        self._setitem_array(key, value)
    else:
        # set column
        self._set_item(key, value)

In this case, we are not slicing the dataframe (as in df[:]), so indexer is None. The key is "A", a plain string, so none of the isinstance branches match and we actually call:

self._set_item(key, value)

Let's see how _set_item() is defined:

def _set_item(self, key, value):
    """
    Add series to DataFrame in specified column.
    If series is a numpy-array (not a Series/TimeSeries), it must be the
    same length as the DataFrames index or an error will be thrown.
    Series/TimeSeries will be conformed to the DataFrames index to
    ensure homogeneity.
    """
    self._ensure_valid_index(value)
    value = self._sanitize_column(key, value)
    NDFrame._set_item(self, key, value)

    # check if we are modifying a copy
    # try to set first as we want an invalid
    # value exception to occur first
    if len(self):
        self._check_setitem_copy()

From the docstring, we can see that "Series/TimeSeries will be conformed to the DataFrames index to ensure homogeneity." This explains why the dataframe df doesn't change: after the first iteration, the result of append() is longer than the index of df, so the elements whose labels are not in df's index are dropped.

If so, why does appending to the dataframe df succeed in the first iteration? The answer lies in self._ensure_valid_index(value):
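
The alignment can be reproduced in isolation. The following is a minimal sketch, not the original poster's code: it starts from a df that already holds five rows, and it uses pd.concat as a stand-in for Series.append(), which was removed in pandas 2.0. The name longer is introduced here only for illustration.

```python
import pandas as pd

# State of df after the first iteration: five rows, index 0..4.
df = pd.DataFrame({"A": [5, 6, 7, 8, 9]})

# Stand-in for df["A"].append(..., ignore_index=True): ten values, index 0..9.
longer = pd.concat([df["A"], pd.Series([5, 6, 7, 8, 9])], ignore_index=True)

# Column assignment reindexes the Series to df.index, so labels 5..9 are dropped.
df["A"] = longer

print(len(longer))       # 10
print(len(df))           # 5
print(df["A"].tolist())  # [5, 6, 7, 8, 9]
```

The values kept are the ones whose labels (0 to 4) already exist in df, which are exactly the values the column held before, so the column looks unchanged.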

def _ensure_valid_index(self, value):
    """
    Ensure that if we don't have an index, that we can create one from the
    passed value.
    """

If the dataframe is empty, this method gives it the index of value, which expands the dataframe to len(value) rows by len(columns) columns filled with NaN. Then NDFrame._set_item(self, key, value) replaces the column key with value.

In the second example, we try to append to column B after column A:
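
A short sketch of that behaviour, assuming a recent pandas version: assigning a Series to a column of an empty dataframe creates the index from the Series and leaves the other column filled with NaN.

```python
import pandas as pd

df = pd.DataFrame(columns=["A", "B"])
print(len(df.index))         # 0

# The dataframe has no index yet, so one is created from the assigned Series.
df["A"] = pd.Series([5, 6, 7, 8, 9])

print(len(df.index))         # 5
print(df["A"].tolist())      # [5, 6, 7, 8, 9]
print(df["B"].isna().all())  # True
```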

for i in range(5):
    df["A"] = df["A"].append(df2["C"], ignore_index=True)
    df["B"] = df["B"].append(df2["D"], ignore_index=True)

In the first iteration, after the assignment to column A, column B of the dataframe df is filled with NaN. df["B"].append(df2["D"], ignore_index=True) therefore appends the new values after those NaN values. When the result is assigned to df["B"], it is conformed to the DataFrame's index, so only the leading NaN values are kept. That's why df["B"] remains NaN.

In the third example, we simply replace the dataframe df with the result of append(); DataFrame.__setitem__() is not involved.
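
The same effect can be sketched for column B. This is an illustration under two assumptions: the first-iteration assignment to A is written directly as df["A"] = df2["C"], and pd.concat replaces Series.append(), which no longer exists in pandas 2.0 and later. The name appended_b is hypothetical.

```python
import pandas as pd

df = pd.DataFrame(columns=["A", "B"])
df2 = pd.DataFrame({"C": [5, 6, 7, 8, 9], "D": [1, 2, 3, 4, 5]})

# First assignment: df gets index 0..4 and column B becomes five NaN values.
df["A"] = df2["C"]

# Stand-in for df["B"].append(df2["D"], ignore_index=True):
# five NaN values followed by 1..5, with index 0..9.
appended_b = pd.concat([df["B"], df2["D"]], ignore_index=True)
print(appended_b.iloc[5:].tolist())  # [1, 2, 3, 4, 5]

# Assignment keeps only labels 0..4, which are the NaN values.
df["B"] = appended_b
print(df["B"].isna().all())          # True
```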

for i in range(5):
    df = df.append(df2, ignore_index=True)
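
DataFrame.append() was removed in pandas 2.0, so on current versions the same idea has to be written with pd.concat. The sketch below assumes the intent is to fill columns A and B from C and D five times over; that intent, and the renamed helper frame, are assumptions rather than something stated in the question.

```python
import pandas as pd

df2 = pd.DataFrame({"C": [5, 6, 7, 8, 9], "D": [1, 2, 3, 4, 5]})

# Map the source columns onto the target column names (assumed intent).
renamed = df2.rename(columns={"C": "A", "D": "B"})

# Build the whole frame in one call instead of growing it inside a loop.
df = pd.concat([renamed] * 5, ignore_index=True)

print(df.shape)               # (25, 2)
print(df["A"].tolist()[:6])   # [5, 6, 7, 8, 9, 5]
```

Because a new dataframe is bound to df rather than assigned into a column, no alignment against an existing index takes place.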
