class: center, middle

# PDF = PestesDateiFormat

Imagine being able to send full text and graphics documents (newspapers, magazine articles, technical manuals, etc.) over electronic mail distribution networks. These documents could be viewed on any machine and any selected document could be printed locally. This capability would truly change the way information is managed.

---

# PostScript

A predecessor of PDF of sorts, created by the same company, Adobe, and introduced in the 1980s.

```html
%!
/Courier findfont    % Select font
20 scalefont         % Scale to font size 20
setfont              % Set it as active font
50 50 moveto         % Set cursor to (50, 50)
(Hello World!) show  % Print text at cursor position
showpage             % Show page
```

Primarily used for vector graphics, but also a Turing-complete, stack-oriented programming language.

What the printer sees!

---

# PDF

An improvement over PostScript: no longer a programming language, a more rigid structure, more features.

Enables seeking (loading a single page without interpreting everything before it), comments, forms, video and audio playback, 3D, ...

Version 1.0 was released in 1993.

- Acrobat Reader: $50
- Acrobat Distiller (personal version): $695
- Acrobat Distiller (network version): $2495

---

| | Year | Industry | Notable features |
| ---- | :--- | :------- | :--------------- |
| v1.0 | 1993 | Adobe | **text**, **images**, **pages**, **hyperlinks**, bookmarks |
| v1.1 | 1994 | Tax | passwords, encryption, device-independent color |
| v1.2 | 1996 | Printing | radio buttons, checkboxes, forms incl. import/export, mouse events, sound, **Unicode**, color features |
| v1.3 | 2000 | Printing | digital signatures, color spaces, JavaScript, embedded file streams, image utilities, **CIDFonts**, prepress support |
| v1.4 | 2001 | | RC4 keys longer than 40 bits, transparency, better forms, metadata, accessibility, page boundaries, printer marks, predefined CMaps |
| v1.5 | 2003 | | JPEG 2000, multimedia playback, better forms, public key encryption, permissions, view/hide layers, slideshow |
| v1.6 | 2004 | | 3D, **OpenType**, SOAP over HTTP, public key encryption improvements, color spaces |
| v1.7 | 2006 | | 3D improvements, public key encryption improvements |

*Version 1.7 is ISO 32000-1:2008.*

---

# Recent developments

PDF 1.7 Extensions:

- PDF 1.7 Extension Level 1 (2008)
- PDF 1.7 Extension Level 3 (2008)
- PDF 1.7 Extension Level 5 (2009)
- PDF 1.7 Extension Level 6 (2009)
- PDF 1.7 Extension Level 8 (2011)

The newest ISO version is ISO 32000-2:2017: PDF 2.0. It clarifies the 1.7 specification and adds some new features.

---

# Even more standards

Specialized use cases = more standards:

- PDF/X for graphics exchange (print)
- PDF/A for archival
- PDF/E for engineering and technical documentation
- PDF/VT for dynamic (variable) data
- PDF/UA for accessibility

PDF/A Levels:

- Level B guarantees visual reproduction.
- Level A additionally guarantees content reproduction.

PDF/A Subversions:

- PDF/A-1 for PDF 1.4 standard
- PDF/A-2 for PDF 1.7 standard
- PDF/A-3 for PDF 1.7 standard (additionally allows arbitrary embedded files)
- PDF/A-4 for PDF 2.0 standard

---

# Why is it so popular?

A single purpose, which it achieves: **It displays content equally on all devices.**

Comparison: how web pages deal with different devices:

- Adapt font size, colors, spacing, ... to screen size
- Adapt layout to aspect ratio / screen size
- Remove or add elements depending on end device
- Test on end devices and/or use resources such as [Can I Use](https://caniuse.com/)

In short: it's painful and slow.

---
class: center, middle

# The file format

---

# Tokens

PDF is, at its core, a text format. You can open any PDF in your text editor!

Tokens:

```html
0        % Numbers
(Hello)  % Strings
5 0 R    % References (5 for the object number, 0 for its generation number, R for reference)
[2 0 R]  % Arrays
<>       % Dictionaries
/Image   % Names (any two with the same content are "equal")
```

The higher-level objects are composed of these tokens:

- Dictionaries
- Streams

```html
<>
stream
BT 1 0 0 1 22 20 cm 1 w /F 12 Tf 14.4 TL (Hello world)Tj ET
endstream
```

---

# Structure (1/2)

Header (asserts the PDF version and whether binary data is contained)

```html
%PDF-1.7
%����
```

Body (contains the actual content)

```html
1 0 obj <> endobj
2 0 obj <> endobj
% ...
```

---

# Structure (2/2)

Cross-Reference Table (CRT) (contains the byte offset of each object)

```html
xref
0 8
0000000000 65535 f
0000000015 00000 n
0000000062 00000 n
% ...
```

Trailer (contains the size of the CRT and the reader's starting points)

```html
trailer <>
startxref
574
%%EOF
```

---
class: full-width-pre

# Body

```html
1 0 obj <> endobj
2 0 obj <> endobj
3 0 obj <> endobj
4 0 obj <> /ProcSet [/PDF /Text]>> endobj
5 0 obj <> endobj
6 0 obj <>
stream
BT 1 0 0 1 22 20 cm 1 w /F 12 Tf 14.4 TL (Hello world)Tj ET
endstream
endobj
```

---

# Content

Text (`w` = line width, `Tf` = font, `TL` = leading, `Tj` = text)

```html
stream
BT 1 0 0 1 22 20 cm 1 w /F 12 Tf 14.4 TL (Hello world)Tj ET
endstream
```

Drawing (`rg` = fill color, `re` = rectangle, `b` = close, fill, and stroke the path)

```html
stream
1 0 0 1 40 20 cm 0.5 w 0.67 0.8 0.73 rg 0 0 20 30 re b
endstream
```

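---

# Sketch: writing the structure by hand

The four parts described above (header, body, cross-reference table, trailer) can be sketched in Python. This is a minimal illustration, not a full writer: the helper names `build_pdf` and `read_offsets` are hypothetical, and the dictionary contents (catalog, page tree, font, ...) are assumptions chosen to produce a small "Hello world" page.

```python
import re


def build_pdf(objects):
    """Serialize object bodies into header, body, xref table and trailer.

    `objects` is a list of bytes; entry i becomes object number i + 1.
    Assumption: object 1 is the document catalog.
    """
    out = bytearray(b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n")  # version + binary marker
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj"
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0: head of the free list
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # every entry is 20 bytes long
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)


def read_offsets(pdf):
    """Follow startxref to the xref table; return {object number: offset}."""
    start = int(re.search(rb"startxref\s+(\d+)\s+%%EOF", pdf).group(1))
    lines = pdf[start:].split(b"\n")
    if lines[0] != b"xref":
        raise ValueError("startxref does not point at an xref table")
    first, count = map(int, lines[1].split())
    offsets = {}
    for i, entry in enumerate(lines[2:2 + count]):
        offset, _generation, kind = entry.split()
        if kind == b"n":  # "n" = in use, "f" = free
            offsets[first + i] = int(offset)
    return offsets


# Assumed object contents, numbered like the objects on the "Body" slide.
content = b"1 0 0 1 22 20 cm BT /F 12 Tf 14.4 TL (Hello world) Tj ET"
objects = [
    b"<< /Type /Catalog /Pages 2 0 R >>",
    b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
    b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 200 100]"
    b" /Resources 4 0 R /Contents 6 0 R >>",
    b"<< /Font << /F 5 0 R >> /ProcSet [/PDF /Text] >>",
    b"<< /Type /Font /Subtype /Type1 /BaseFont /Courier >>",
    b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
]
pdf = build_pdf(objects)
```

Writing `pdf` to a file should give a document that common viewers can open. `read_offsets` shows why the table matters: a reader can jump straight to any object without scanning the body.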

---

# Include Image

Stream with binary image data

```html
5 0 obj <>
stream
(binary JPEG data, beginning with a JFIF header)
endstream
endobj
```

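---

# Sketch: wrapping a JPEG as an image stream

A minimal Python sketch of how such an image object could be assembled. The helper name `image_xobject` is hypothetical, and the dictionary entries are an assumption for an 8-bit RGB JPEG; `/DCTDecode` is the PDF filter name for JPEG-compressed data.

```python
def image_xobject(jpeg_bytes, width, height):
    """Return the body of an image object that embeds a JPEG unchanged.

    Assumptions: 8 bits per component, RGB color; `width` and `height`
    are the pixel dimensions of the JPEG and are supplied by the caller.
    """
    dictionary = (
        b"<< /Type /XObject /Subtype /Image"
        b" /Width %d /Height %d"
        b" /ColorSpace /DeviceRGB /BitsPerComponent 8"
        b" /Filter /DCTDecode /Length %d >>\n"
    ) % (width, height, len(jpeg_bytes))
    # /Length counts only the bytes between "stream\n" and "\nendstream".
    return dictionary + b"stream\n" + jpeg_bytes + b"\nendstream"


# Placeholder bytes standing in for a real JPEG file.
fake_jpeg = b"\xff\xd8\xff\xe0fake\xff\xd9"
image_body = image_xobject(fake_jpeg, 20, 20)
```

Because the viewer decodes the JPEG itself, the file's bytes can be copied into the stream as they are; only the dictionary has to describe them.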

---

# Print image

Declare an image alias (in `4 0 R`, which the `Page` references as its `Resources`):

```html
4 0 obj <> /ProcSet [/PDF /ImageC]>> endobj
5 0 obj (image stream)
```

Print the image (`Do` = paint the referenced image):

```html
6 0 obj <>
stream
20 0 0 20 20 20 cm 1 w /I Do
endstream
endobj
```

---
class: center, middle

# Enough foreplay: Render text!

Or: what is the most complicated way to "support" Unicode?

---

# Supporting Unicode

`WinAnsiEncoding` is a single-byte encoding; so are the other default encodings.

What if I need a character that these standard encodings do not contain (such as Chinese characters)?

Steps:

- Embed a font that supports the character, as a stream
- Declare the encoding that maps text (PDF) onto glyphs (font)
- Declare metadata (character widths, ...)

=> Requires deep knowledge of the font

---
class: center, middle

# The TTF format

---