PDF file

PDF file format, structure and editing software

PDF structure

Structure

Header:

%PDF-1.3

Body:

3 0 obj
<< /Filter /FlateDecode /Length 198 >>
stream
...
endstream
endobj
9 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
...

xref:

0 14
0000000000 65535 f
0000000292 00000 n
0000003240 00000 n
0000000022 00000 n
...

Trailer:

<< /Size 14
/Root 9 0 R
/Info 13 0 R
>>
startxref
12937
%%EOF
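
To make the link between startxref and the xref table concrete, here is a minimal Python sketch. It assumes a classic (uncompressed) xref table and a single xref section; the function name read_xref and the tiny sample file are made up for illustration, and files that use cross-reference streams or incremental updates need more work than this.

```python
def read_xref(pdf: bytes) -> dict:
    """Map object number -> (byte offset, generation, in_use) from a classic xref table."""
    # The number after the last 'startxref' is the byte offset of the xref table.
    marker = pdf.rfind(b"startxref")
    if marker < 0:
        raise ValueError("startxref not found")
    xref_pos = int(pdf[marker + len(b"startxref"):].split()[0])
    if pdf[xref_pos:xref_pos + 4] != b"xref":
        raise ValueError("no classic xref table at the startxref offset")

    lines = pdf[xref_pos:].splitlines()
    entries = {}
    i = 1  # lines[0] is the 'xref' keyword itself
    while i < len(lines) and not lines[i].startswith(b"trailer"):
        # Subsection header: first object number and entry count, e.g. "0 14"
        first, count = (int(n) for n in lines[i].split())
        for k in range(count):
            offset, generation, flag = lines[i + 1 + k].split()
            entries[first + k] = (int(offset), int(generation), flag == b"n")
        i += 1 + count
    return entries


# Hypothetical two-entry file, built in memory so the offsets are known.
body = b"%PDF-1.3\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
xref = (b"xref\n0 2\n"
        b"0000000000 65535 f \n"
        b"0000000009 00000 n \n"
        b"trailer\n<< /Size 2 /Root 1 0 R >>\n")
sample = body + xref + b"startxref\n%d\n%%%%EOF\n" % len(body)

table = read_xref(sample)
print(table)  # {0: (0, 65535, False), 1: (9, 0, True)}
```

Entry 0 is the head of the free list (flag f, generation 65535); every n entry gives the byte offset at which "N G obj" starts.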

Operators reference

CATEGORY                 OPERATORS                                      TABLE
General graphics state   w, J, j, M, d, ri, i, gs                       4.7
Special graphics state   q, Q, cm                                       4.7
Path construction        m, l, c, v, y, h, re                           4.9
Path painting            S, s, f, F, f*, B, B*, b, b*, n                4.10
Clipping paths           W, W*                                          4.11
Text objects             BT, ET                                         5.4
Text state               Tc, Tw, Tz, TL, Tf, Tr, Ts                     5.2
Text positioning         Td, TD, Tm, T*                                 5.5
Text showing             Tj, TJ, ', "                                   5.6
Type 3 fonts             d0, d1                                         5.10
Color                    CS, cs, SC, SCN, sc, scn, G, g, RG, rg, K, k   4.24
Shading patterns         sh                                             4.27
Inline images            BI, ID, EI                                     4.42
XObjects                 Do                                             4.37
Marked content           MP, DP, BMC, BDC, EMC                          10.7
Compatibility            BX, EX                                         3.29
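
Content streams are written in postfix form: operands first, then the operator. The Python sketch below splits a stream into instructions and looks up each operator's category. It is a naive illustration, not a full parser: the names TOKEN_RE, CATEGORIES and instructions are invented here, only the four text categories from the table are mapped, and inline images, nested parentheses in strings, and keywords such as true or null are not handled.

```python
import re

# Naive tokenizer for content-stream syntax (assumption: no inline image data).
TOKEN_RE = re.compile(rb"""
    \((?:\\.|[^\\()])*\)      # literal string without nested parentheses
  | <[0-9A-Fa-f\s]*>          # hex string
  | [\[\]]                    # array delimiters
  | /[^\s/\[\]()<>]*          # name, e.g. /F0
  | [^\s/\[\]()<>]+           # number or operator
""", re.VERBOSE)
NUMBER_RE = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)")

# Subset of the table above; unmapped operators are reported as "other".
CATEGORIES = {
    "Text objects": "BT ET",
    "Text state": "Tc Tw Tz TL Tf Tr Ts",
    "Text positioning": "Td TD Tm T*",
    "Text showing": "Tj TJ ' \"",
}
OPERATOR_CATEGORY = {op: cat for cat, ops in CATEGORIES.items() for op in ops.split()}


def is_operand(token: str) -> bool:
    """Strings, names, array brackets and numbers are operands; the rest are operators."""
    return token[0] in "/([<]" or NUMBER_RE.fullmatch(token) is not None


def instructions(stream: bytes) -> list:
    """Return (operator, operands, category) tuples in stream order."""
    result, operands = [], []
    for raw in TOKEN_RE.findall(stream):
        token = raw.decode("latin-1")
        if is_operand(token):
            operands.append(token)
        else:
            result.append((token, operands, OPERATOR_CATEGORY.get(token, "other")))
            operands = []
    return result


parsed = instructions(b"BT /F0 36 Tf 50 706 Td (Hello, World!) Tj ET")
for operator, operands, category in parsed:
    print(f"{operator:3} {operands!s:22} {category}")
```

For the sample stream this yields five instructions: BT and ET with no operands, Tf with a font name and size, Td with two coordinates, and Tj with one string.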

Text op

BT  
    /F0 36 Tf  
    50 706 Td  
    (Hello, World!) Tj  
ET
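
The text operators above only render once they sit inside a page's content stream and /F0 is declared in the page resources. As a rough sketch of how the header, body, xref and trailer fit together, the Python below assembles a one-page file around that stream. The object layout (five objects, Helvetica as /F0, a US Letter MediaBox) and the function name build_pdf are assumptions chosen for the example, not something the notes above prescribe.

```python
def build_pdf(content: bytes) -> bytes:
    """Assemble a minimal one-page PDF whose page content is `content`."""
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F0 4 0 R >> >> /Contents 5 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]
    out = bytearray(b"%PDF-1.3\n")                      # header
    offsets = []
    for number, obj in enumerate(objects, start=1):     # body
        offsets.append(len(out))                        # byte offset of "N 0 obj"
        out += b"%d 0 obj\n" % number + obj + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)         # xref table
    out += b"0000000000 65535 f \n"                     # entry 0: head of the free list
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset             # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


CONTENT = b"BT\n/F0 36 Tf\n50 706 Td\n(Hello, World!) Tj\nET"
pdf = build_pdf(CONTENT)
# To inspect the result in a viewer, write it out, for example:
# open("hello.pdf", "wb").write(pdf)
```

The offsets are recorded while the body is being written, which is why the xref table comes after the objects it indexes.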

CID fonts mapping

https://stackoverflow.com/questions/15721846/cidfonts-and-mapping
https://www.toughdev.com/content/2015/02/restoring-text-from-pdf-files-encoded-using-custom-cid-fonts/

QPDF

Decoding

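Either of the following commands decompresses stream data. The first also disables object streams and writes the file in QDF mode, which is meant for inspection in a text editor: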

qpdf --qdf --object-streams=disable orig.pdf expanded.pdf
qpdf --stream-data=uncompress --decode-level=all orig.pdf expanded.pdf
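
For a sense of what decompressing a /FlateDecode stream involves, here is a small Python sketch that inflates one stream object with zlib. It is an illustration under simplifying assumptions, not a description of how qpdf works internally: /Length is a direct integer, the stream keyword is followed by a bare line feed, and FlateDecode is the only filter with no predictor. The function name decode_stream_object is made up.

```python
import re
import zlib


def decode_stream_object(obj: bytes) -> bytes:
    """Return the decoded stream data of a single 'N G obj ... endobj' block."""
    head, sep, rest = obj.partition(b"stream\n")
    if not sep:
        raise ValueError("object has no stream")
    # Assumption: /Length is a direct integer, not an indirect reference like '7 0 R'.
    length = int(re.search(rb"/Length\s+(\d+)", head).group(1))
    raw = rest[:length]
    if re.search(rb"/Filter\s*/FlateDecode\b", head):
        return zlib.decompress(raw)   # FlateDecode data is a zlib stream
    return raw                        # no filter: data is stored as-is


# Round trip on a hypothetical object built in memory.
content = b"BT /F0 36 Tf 50 706 Td (Hello, World!) Tj ET"
packed = zlib.compress(content)
obj = (b"3 0 obj\n<< /Filter /FlateDecode /Length %d >>\nstream\n" % len(packed)
       + packed + b"\nendstream\nendobj\n")

decoded = decode_stream_object(obj)
print(decoded.decode("latin-1"))
```

Reading exactly /Length bytes, rather than searching for endstream, matters because compressed data can contain any byte sequence.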

Re-compress

qpdf expanded.pdf orig2.pdf

Decrypt

qpdf --password="mypass" --decrypt input.pdf output.pdf