
Why Is Matplotlib Figure Filesize Huge Despite rasterized=True?

A simple example:

from matplotlib.pyplot import plot, savefig
from numpy.random import randn
plot(randn(100), randn(100, 500), 'k', alpha=0.03, rasterized=True)
savefig('test.pdf', dpi=…)

Solution 1:

Looks like the full answer is in the comments on this answer: https://stackoverflow.com/a/12102852/1078529

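The trick is to use set_rasterization_zorder, which rasterizes every artist whose zorder is below a given threshold together into a single bitmap. Here the threshold is 1 and the lines are drawn at zorder=0, so all 500 of them end up in one image: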

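from matplotlib.pyplot import gca, plot, savefig
from numpy.random import randn
gca().set_rasterization_zorder(1)  # rasterize artists with zorder < 1 as one bitmap
plot(randn(100), randn(100, 500), "k", alpha=0.03, zorder=0)  # all 500 lines fall below the threshold
savefig("test.pdf", dpi=90)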
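The same zorder trick can be written with the object-oriented API. This is a minimal sketch: the seeded numpy Generator (instead of randn) and saving into an in-memory buffer (instead of test.pdf) are choices made here for reproducibility, not part of the original answer. With the default rcParams, the axis ticks and labels have a zorder above the threshold, so they should remain vector output.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend; savefig still writes a PDF
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # seeded so repeated runs draw the same data

fig, ax = plt.subplots()
# Artists with zorder below 1 are rasterized when saving to a vector format.
ax.set_rasterization_zorder(1)
# zorder=0 puts all 500 lines under the threshold.
ax.plot(rng.standard_normal(100), rng.standard_normal((100, 500)),
        "k", alpha=0.03, zorder=0)

buf = io.BytesIO()
fig.savefig(buf, format="pdf", dpi=90)
plt.close(fig)
print(f"PDF size: {len(buf.getvalue()) / 1024:.0f} KiB")
```

Replace `buf` with a filename such as "test.pdf" to inspect the result with an external tool.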

Solution 2:

With rasterized=True, you get a PDF with an embedded bitmap (which can be big). With rasterized=False, you get a PDF with tons of embedded line-drawing instructions (which aren't big, but can take a while to render).

With rasterized=False, I get a 374 KiB document.
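To compare the approaches on your own installation, a sketch like the following saves the question's figure three ways and reports each size. The helper `pdf_size` and the mode names are hypothetical, introduced only for this illustration. Absolute numbers depend on the matplotlib version and backend, so they may not reproduce the 374 KiB and 7 MB quoted here.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical mode names for this sketch.
MODES = ("vector", "per_artist", "zorder")


def pdf_size(mode: str, seed: int = 0) -> int:
    """Return the size in bytes of the example figure saved as a PDF.

    "vector":     rasterized=False on the lines
    "per_artist": rasterized=True on the lines (the question's setup)
    "zorder":     set_rasterization_zorder(1) with the lines at zorder=0
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")

    rng = np.random.default_rng(seed)
    x = rng.standard_normal(100)
    y = rng.standard_normal((100, 500))  # 500 columns -> 500 lines

    fig, ax = plt.subplots()
    if mode == "zorder":
        ax.set_rasterization_zorder(1)
        ax.plot(x, y, "k", alpha=0.03, zorder=0)
    else:
        ax.plot(x, y, "k", alpha=0.03, rasterized=(mode == "per_artist"))

    buf = io.BytesIO()  # measure in memory instead of writing files
    fig.savefig(buf, format="pdf", dpi=90)
    plt.close(fig)
    return len(buf.getvalue())


sizes = {mode: pdf_size(mode) for mode in MODES}
for mode, size in sizes.items():
    print(f"{mode:>10}: {size / 1024:8.1f} KiB")
```

Using the same seed for every mode keeps the plotted data identical, so any size difference comes from how the lines are written to the PDF.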

EDIT: Digging a little deeper into the rasterized=True document (which clocks in at about 7 megabytes), it looks like every line gets its own bitmap (an RGB image plus a grayscale soft mask, "smask", carrying the transparency), and they are overlaid:

$ pdfimages -list -all test.pdf
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image     408    177  rgb     3   8  image  no       12  0    90    90 4192B 1.9%
   1     1 smask     408    177  gray    1   8  image  no       12  0    90    90 7511B  10%
   1     2 image     408    170  rgb     3   8  image  no       13  0    90    90 4472B 2.1%
   1     3 smask     408    170  gray    1   8  image  no       13  0    90    90 7942B  11%
   1     4 image     408    180  rgb     3   8  image  no       14  0    90    90 5454B 2.5%
   1     5 smask     408    180  gray    1   8  image  no       14  0    90    90 9559B  13%
   1     6 image     408    180  rgb     3   8  image  no       15  0    90    90 4554B 2.1%
   1     7 smask     408    180  gray    1   8  image  no       15  0    90    90 8077B  11%
[... 993 more images ...]

For the nonrasterized document, there are no images at all.
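As a rough cross-check of the 7-megabyte figure, the sketch below recomputes the ratio column from the pdfimages listing and extrapolates from the four lines shown to all 500. It assumes that "ratio" is the stored size divided by the uncompressed size (width * height * comp * bpc / 8), which matches every row listed, and that these four lines are representative of the rest.

```python
# Rows copied from the pdfimages listing:
# (type, width, height, comp, bpc, stored size in bytes)
rows = [
    ("image", 408, 177, 3, 8, 4192),
    ("smask", 408, 177, 1, 8, 7511),
    ("image", 408, 170, 3, 8, 4472),
    ("smask", 408, 170, 1, 8, 7942),
    ("image", 408, 180, 3, 8, 5454),
    ("smask", 408, 180, 1, 8, 9559),
    ("image", 408, 180, 3, 8, 4554),
    ("smask", 408, 180, 1, 8, 8077),
]

for kind, w, h, comp, bpc, size in rows:
    raw = w * h * comp * bpc // 8  # uncompressed bytes for this bitmap
    print(f"{kind}: raw {raw} B, stored {size} B, ratio {size / raw:.1%}")

# Each plotted line contributes one image plus one smask.
per_line = sum(r[-1] for r in rows) / (len(rows) / 2)
estimate_mb = per_line * 500 / 1e6  # the example plots 500 lines
print(f"~{per_line:.0f} B per line, ~{estimate_mb:.1f} MB for 500 lines")
```

The four lines average about 12.9 KB each, which works out to roughly 6.5 MB for 500 lines, consistent with the reported ~7 MB file.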
