
Inconsistency Between DataFrame.plot.scatter And DataFrame.plot.density()?

The following example illustrates a strange difference between scatter and density plots from a pandas DataFrame, or possibly my lack of understanding: import numpy as np import p

Solution 1:

The inconsistency is not between density and scatter, but between the plotting method of a dataframe and the plotting method of a series:

  • A Series (Series.plot) is drawn on the active axes if there is one; otherwise a new figure is created.

  • A DataFrame (DataFrame.plot) is drawn on a new figure, regardless of whether one already exists, unless an existing axes is passed via the ax argument.

Example:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'x': np.random.randn(25), 'y': np.random.randn(25), 
                   'season': np.random.choice(['red', 'gold'], 25)})

# This plots the dataframe and creates two figures
for s in ['red', 'gold']:
    sdf = df[df['season'] == s]
    plot = sdf.plot(kind="line", color=s)
plt.show()

# This plots a series and creates a single figure
for s in ['red', 'gold']:
    sdf = df[df['season'] == s]
    plot = sdf["y"].plot(kind="line", color=s)
plt.show()

Here, sdf.plot creates two figures, while sdf["y"].plot plots to the same axes.
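Since the question is about DataFrame.plot.scatter, here is a minimal sketch of the same effect for scatter plots, counting open figures with plt.get_fignums(). It assumes a headless Agg backend, a seeded random generator, and a fixed alternating season column (instead of the random one) so both subsets are non-empty; count_figures is a hypothetical helper written only for this illustration. Colors are passed with c=, which pandas documents as accepting a single color name for scatter plots.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({'x': rng.standard_normal(25), 'y': rng.standard_normal(25),
                   # fixed seasons (assumption) so both subsets are non-empty
                   'season': ['red', 'gold'] * 12 + ['red']})


def count_figures(share_axes):
    """Hypothetical helper: scatter each season, return the number of open figures."""
    plt.close('all')
    ax = None
    for s in ['red', 'gold']:
        sdf = df[df['season'] == s]
        # With ax=None, DataFrame.plot.scatter creates a new figure;
        # given an existing Axes, it draws onto that Axes instead.
        drawn_on = sdf.plot.scatter(x='x', y='y', c=s, ax=ax)
        if share_axes:
            ax = drawn_on  # reuse the first plot's Axes for the next season
    return len(plt.get_fignums())


print(count_figures(share_axes=False))  # 2: one figure per season
print(count_figures(share_axes=True))   # 1: both seasons on the same axes
```

Passing the Axes explicitly makes the DataFrame behave like the Series case, without relying on whichever axes happen to be active.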


If the goal is to keep a previously plotted density in the figure, you can plot that density, add another one, save the figure, and then remove the second line. You end up with the first density plot again, ready for the next one to be added.

import numpy as np
import pandas as pd

df = pd.DataFrame({'x': np.random.randn(25), 'y': np.random.randn(25), 
                   'season': np.random.choice(['red', 'gold'], 25)})

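ax = df['y'].plot.density()  # base density of all y values (plot.density needs SciPy)
for s in ['red', 'gold']:
    sdf = df[df['season'] == s]
    # A Series plots onto the active axes, so this overlays the subset density on ax
    sdf["y"].plot.density(color=s)
    ax.get_figure().savefig("test_density_" + s + ".png")
    # Remove the overlay so ax again shows only the base density
    ax.lines[-1].remove()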
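If you would rather not depend on Series.plot picking up the active axes, the same save-and-remove idea can be written with an explicit ax= argument. The sketch below assumes a headless Agg backend; the helper save_each_overlay, its draw callback, and the prefix parameter are hypothetical names. It records how many lines the axes held before each overlay and removes only the lines added since, so a drawing call that adds more than one line is also undone. The demo uses plain line overlays on made-up data so it runs without SciPy.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so figures can be saved without a display
import matplotlib.pyplot as plt
import pandas as pd


def save_each_overlay(ax, groups, draw, out_dir, prefix="test_density_"):
    """Hypothetical helper: for each (color, series) in groups, draw the series on ax,
    save the figure to out_dir, then remove what was drawn so ax shows its base plot.

    draw(series, ax, color) is any callable that adds lines to ax.
    Returns the list of saved file paths.
    """
    paths = []
    for color, series in groups.items():
        n_before = len(ax.lines)
        draw(series, ax, color)
        path = os.path.join(out_dir, prefix + color + ".png")
        ax.get_figure().savefig(path)
        paths.append(path)
        # Remove only the lines added by this overlay; the base plot stays.
        for line in list(ax.lines[n_before:]):
            line.remove()
    return paths


# Demo with plain line overlays (no SciPy needed); the data here is made up.
fig, base_ax = plt.subplots()
pd.Series([0.0, 1.0, 0.5]).plot(ax=base_ax, color="black")  # base plot
groups = {"red": pd.Series([1.0, 2.0, 3.0]), "gold": pd.Series([3.0, 2.0, 1.0])}
with tempfile.TemporaryDirectory() as out_dir:
    saved = save_each_overlay(base_ax, groups,
                              lambda s, a, c: s.plot(ax=a, color=c), out_dir)
    files_written = [os.path.exists(p) for p in saved]
print(len(base_ax.lines), files_written)  # 1 [True, True]
```

For the density case above, the call would be save_each_overlay(ax, {s: df.loc[df['season'] == s, 'y'] for s in ['red', 'gold']}, lambda y, a, c: y.plot.density(ax=a, color=c), '.') with ax = df['y'].plot.density(); like any pandas density plot, that needs SciPy installed.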
