Unset Pdf Font With Script
Solution 1:
Java
It can be done with tools such as the iText library; see example here. But that is in Java.
I have actually built a very simple JAR that does just the above: it opens a Stamper and calls unused object removal. The manual says that this will remove unused fonts, so if your troublesome fonts really are unused, it ought to do the trick. If you have a PDF on which to test it, I can give it a go, or I can send you the .java and .jar files. They are built against iText 5.4.2; you can upgrade them to 5.5.3. Usage:
java -jar pdftrim.jar input.pdf output.pdf
Other languages (in theory even a bash script)
I know of no tools in Python, C or shell that can do this yet, but writing one yourself is not impossible.
As a first step you would need to uncompress the PDF file using pdftk (not coincidentally, it is built on iText). The resulting PDF is a text file (well, apart from the first line and multibyte considerations...) and can be examined at leisure. grep will work, for example.
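If the rest of the tooling is written in Python, the uncompress step can be wrapped in a small helper. This is only a sketch: it assumes pdftk is installed and on the PATH, and the function names `pdftk_command` and `run_pdftk` are made up for illustration. The `uncompress` and `compress` keywords are pdftk's own output options.

```python
import subprocess

def pdftk_command(src, dst, mode="uncompress"):
    """Build the pdftk argument list for (un)compressing page streams."""
    if mode not in ("uncompress", "compress"):
        raise ValueError("mode must be 'uncompress' or 'compress'")
    # Equivalent shell command: pdftk SRC output DST uncompress
    return ["pdftk", src, "output", dst, mode]

def run_pdftk(src, dst, mode="uncompress"):
    """Run pdftk; raises CalledProcessError if pdftk exits with an error."""
    subprocess.run(pdftk_command(src, dst, mode), check=True)

# Example (not executed here, since it needs pdftk and a real file):
# run_pdftk("input.pdf", "input-plain.pdf")
```

The same helper with `mode="compress"` covers recompressing the file after editing.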
To detect font usage, you need to check all lines in the format
/Font NNNNNN 0 R
which would tell you that font reference object NNNNNN is in use by some text. The list of font references (not fonts) is then given by
grep "^/Font " "$PDFFILE" | sort -n -k2.1 | uniq
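For anyone who prefers Python to grep, the same scan might look like the sketch below. It assumes the uncompressed file really does put each `/Font NNNNNN 0 R` entry on a line of its own, as described above; the function name and the sample data are invented for the example. The file is handled as bytes because an uncompressed PDF is still not pure text.

```python
import re

# Matches a whole line such as "/Font 123 0 R" (the layout assumed above).
FONT_REF = re.compile(rb"^/Font\s+(\d+)\s+(\d+)\s+R\s*$")

def font_reference_ids(pdf_bytes):
    """Return the sorted, de-duplicated object numbers named on /Font lines."""
    ids = set()
    for line in pdf_bytes.splitlines():
        match = FONT_REF.match(line.strip())
        if match:
            ids.add(int(match.group(1)))
    return sorted(ids)

# Made-up fragment of an uncompressed PDF with three page resource entries.
sample = b"""4 0 obj
<<
/Font 12 0 R
>>
endobj
7 0 obj
<<
/Font 12 0 R
>>
endobj
9 0 obj
<<
/Font 3 0 R
>>
endobj
"""
print(font_reference_ids(sample))  # prints [3, 12]
```

For a real file, read it with `open(path, "rb").read()` and pass the result in.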
We now look in the file for an item like this
NNNNNN 0 obj
<<
/F0 XXXXXX 0 R
/F1 YYYYYY 0 R
>>
This will give us more object numbers for different typefaces of the same font. XXXXXX might be the header for the bold font and YYYYYY the one for the bold-italicized font, say. XXXXXX and YYYYYY (and maybe ZZZZZZ...) are our "true" font numbers. And at those object offsets one would find something like
XXXXXX 0 obj
<<
/Encoding /WinAnsiEncoding
/ToUnicode AAAAAA 0 R
/FontDescriptor BBBBBB 0 R
/Widths [...]
/Subtype /TrueType
/Type /Font
/FirstChar 32
/LastChar 121
/BaseFont /Whatever+Font+Name
>>
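which would tell us that this header references a font descriptor in object BBBBBB and a ToUnicode character map in object AAAAAA (these are object numbers, not file offsets). The embedded font data itself is reached through the descriptor, via its /FontFile, /FontFile2 or /FontFile3 entry, and is stored as a stream.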
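Pulling those entries out of a font object can be sketched with two regular expressions. This is a rough illustration, not a real PDF parser: it assumes a flat dictionary like the one shown, with no nested dictionaries or arrays of names, and `describe_font` is a hypothetical helper name. The object numbers and font name in the sample are invented.

```python
import re

# "/Key 12 0 R"  -> indirect reference entries
REF_ENTRY = re.compile(rb"/(\w+)\s+(\d+)\s+\d+\s+R")
# "/Key /Name"   -> name-valued entries such as /Subtype /TrueType
NAME_ENTRY = re.compile(rb"/(\w+)\s+/([^\s/<>\[\]()]+)")

def describe_font(obj_body):
    """Extract the entries of interest from a flat font dictionary."""
    refs = {k.decode(): int(v) for k, v in REF_ENTRY.findall(obj_body)}
    names = {k.decode(): v.decode() for k, v in NAME_ENTRY.findall(obj_body)}
    return {
        "base_font": names.get("BaseFont"),
        "subtype": names.get("Subtype"),
        "descriptor": refs.get("FontDescriptor"),
        "to_unicode": refs.get("ToUnicode"),
    }

# Made-up font object body in the same shape as the listing above.
font_body = b"""<<
/Encoding /WinAnsiEncoding
/ToUnicode 41 0 R
/FontDescriptor 42 0 R
/Subtype /TrueType
/Type /Font
/FirstChar 32
/LastChar 121
/BaseFont /ABCDEF+SomeFont
>>"""
info = describe_font(font_body)
print(info["base_font"], info["descriptor"], info["to_unicode"])
```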
So we need a bit of dictionary storage to handle these levels of indirection, since a directive such as /Font refers to one object number while the corresponding /BaseFont sits in another object. With that in place we can now:
- find what fonts are present in the file (through the /BaseFont directive, following references if needed)
- find what fonts are used (through the /Font directive)
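The two lookups above can be combined into a set difference. The sketch below assumes the three inputs have already been collected from the uncompressed file; the function name, the parameter names and the sample numbers are all hypothetical.

```python
def unused_fonts(all_fonts, font_dicts, used_refs):
    """
    all_fonts:  {font object number: BaseFont name} for every font object
    font_dicts: {font reference object number: [font object numbers it lists]}
    used_refs:  font reference object numbers found on /Font lines
    Returns the fonts that no used font reference object points to.
    """
    used = set()
    for ref in used_refs:
        # One level of indirection: reference object -> actual font objects.
        used.update(font_dicts.get(ref, []))
    return {num: name for num, name in all_fonts.items() if num not in used}

# Made-up data: object 12 lists fonts 20 and 21, object 13 lists font 22,
# but only object 12 is ever named by a /Font entry.
all_fonts = {20: "ABCDEF+Body", 21: "ABCDEF+Body-Bold", 22: "GHIJKL+Leftover"}
font_dicts = {12: [20, 21], 13: [22]}
print(unused_fonts(all_fonts, font_dicts, used_refs=[12]))
# prints {22: 'GHIJKL+Leftover'}
```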
Removal is possible, though not for the faint of heart. Remove the unused font's object subtree, starting at the font object and following its /FontDescriptor and /ToUnicode references; renumber the objects that have higher ID numbers; then recalculate all file offsets (they are listed at the bottom of the PDF file). In practice this last step is achieved by copying the objects from the old PDF to the new one and reading each file offset in the new file via ftell(). Then the PDF XREF at the bottom can be rewritten:
xref                  -- start of XREF (NOT NECESSARILY AT A NEWLINE)
0 3315                -- there are 3315 objects
0000000000 65535 f    -- not an object; flags
0000000015 00000 n    -- first object is 15 bytes past the beginning of the file
0000033003 00000 n
...
0010169101 00000 n
trailer
<<
/Info 3314 0 R        -- the info table, usually just before the XREF (needs renumbering)
/Root 3259 0 R        -- the root object ID (needs renumbering)
/Size 3315            -- number of objects, again
>>
startxref
10169367              -- file offset of XREF table above.
%%EOF
pdftk can then be used to recompress the resulting PDF file.
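The copy-and-record-offsets step might be sketched in Python as follows, with `tell()` playing the role of ftell(). It assumes the surviving objects have already been renumbered 1..N without gaps and all have generation 0; `write_pdf` and its parameters are made-up names, and the two tiny objects in the example do not form a complete, viewable PDF.

```python
import io

def write_pdf(out, objects, root_id, info_id=None):
    """Write objects ({number: body bytes}) plus a matching xref and trailer."""
    out.write(b"%PDF-1.4\n")
    offsets = {}
    for num in sorted(objects):
        offsets[num] = out.tell()          # byte offset of "N 0 obj"
        out.write(b"%d 0 obj\n" % num + objects[num] + b"\nendobj\n")
    xref_pos = out.tell()
    size = len(objects) + 1                # +1 for the free entry 0
    out.write(b"xref\n0 %d\n" % size)
    out.write(b"0000000000 65535 f \n")    # each entry is exactly 20 bytes
    for num in range(1, size):
        out.write(b"%010d 00000 n \n" % offsets[num])
    trailer = b"<<\n/Root %d 0 R\n/Size %d\n" % (root_id, size)
    if info_id is not None:
        trailer += b"/Info %d 0 R\n" % info_id
    out.write(b"trailer\n" + trailer + b">>\n")
    out.write(b"startxref\n%d\n%%%%EOF\n" % xref_pos)
    return offsets, xref_pos

# Tiny in-memory example with two placeholder objects.
buf = io.BytesIO()
offsets, xref_pos = write_pdf(
    buf, {1: b"<< /Type /Catalog >>", 2: b"<< >>"}, root_id=1)
data = buf.getvalue()
print(data[xref_pos:].decode("ascii"))
```

Writing to a real file works the same way, as long as it is opened in binary mode so that the recorded offsets are exact.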
I've also tried using tools such as PDFEdit, but with little success.
Solution 2:
Typically, a font is included in the file if some of its characters have been used. A safer approach would be to embed all fonts in your PDF file. Assuming a requirement of prepress quality for output.pdf, you can use
gswin64c -dCompatibilityLevel=1.4 -dPDFSETTINGS=/prepress -dCompressFonts=true -dSubsetFonts=true -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=output.pdf -f input.pdf
You need to install Ghostscript (http://www.ghostscript.com/); the options are described at http://www.ghostscript.com/doc/9.14/Ps2pdf.htm#Options