Why Is Matplotlib Figure Filesize Huge Despite Rasterized=true?
A simple example:
from matplotlib.pyplot import plot, savefig
from numpy.random import randn
plot(randn(100),randn(100,500),'k',alpha=0.03,rasterized=True)
savefig('test.pdf',dpi=
Solution 1:
It looks like the full answer is in a comment on this answer: https://stackoverflow.com/a/12102852/1078529
The trick is to use set_rasterization_zorder to rasterize everything below a certain zorder together into a single bitmap:
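from matplotlib.pyplot import gca, plot, savefig
from numpy.random import randn
# Artists with a zorder below 1 are rasterized together into one bitmap.
gca().set_rasterization_zorder(1)
plot(randn(100),randn(100,500),"k",alpha=0.03,zorder=0)
savefig("test.pdf",dpi=90)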
Solution 2:
With rasterized=True, you get a PDF with an embedded bitmap (which can be big).
With rasterized=False, you get a PDF with tons of embedded line-drawing instructions (which aren't big, but can take a while to render).
With rasterized=False, I get a 374 KiB document.
EDIT: Digging a little deeper, in the rasterized=True document (which clocks in at about 7 megabytes), it looks like every line gets its own bitmap, and they are overlaid:
$ pdfimages -list -all test.pdf
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image     408   177  rgb     3   8  image  no        12  0    90    90 4192B 1.9%
   1     1 smask     408   177  gray    1   8  image  no        12  0    90    90 7511B  10%
   1     2 image     408   170  rgb     3   8  image  no        13  0    90    90 4472B 2.1%
   1     3 smask     408   170  gray    1   8  image  no        13  0    90    90 7942B  11%
   1     4 image     408   180  rgb     3   8  image  no        14  0    90    90 5454B 2.5%
   1     5 smask     408   180  gray    1   8  image  no        14  0    90    90 9559B  13%
   1     6 image     408   180  rgb     3   8  image  no        15  0    90    90 4554B 2.1%
   1     7 smask     408   180  gray    1   8  image  no        15  0    90    90 8077B  11%
[... 993 more images ...]
For the nonrasterized document, there are no images at all.