From 2ff480a3650079a50bac4c9bf538f788fbd4a424 Mon Sep 17 00:00:00 2001
From: Florian Moser <git@famoser.ch>
Date: Sun, 28 Mar 2021 21:38:39 +0200
Subject: [PATCH] Working on PDF course

---
 pdf/css/remark.css         |   4 +
 pdf/image.pdf              |   3 +
 pdf/pdf.html               | 193 +++++++++++++++++++++++++++++++++----
 pdf/rectangle.pdf          |   3 +
 pdf/special_characters.pdf |   3 +
 pdf/text.pdf               |   3 +
 6 files changed, 190 insertions(+), 19 deletions(-)
 create mode 100644 pdf/image.pdf
 create mode 100644 pdf/rectangle.pdf
 create mode 100644 pdf/special_characters.pdf
 create mode 100644 pdf/text.pdf

diff --git a/pdf/css/remark.css b/pdf/css/remark.css
index cdb2ed8..26103fd 100644
--- a/pdf/css/remark.css
+++ b/pdf/css/remark.css
@@ -36,3 +36,7 @@ table thead th {
     border-bottom: 2px solid #dee2e6;
     vertical-align: bottom;
 }
+
+.space-3 {
+    margin-top: 5em
+}
\ No newline at end of file
diff --git a/pdf/image.pdf b/pdf/image.pdf
new file mode 100644
index 0000000..df91e0e
--- /dev/null
+++ b/pdf/image.pdf
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7cb7c68d9d8490dbb8c319f7be1fe3a2992aecb15707a6662ae7210fb9ca5b2f
+size 1411
diff --git a/pdf/pdf.html b/pdf/pdf.html
index 20288e2..590d013 100644
--- a/pdf/pdf.html
+++ b/pdf/pdf.html
@@ -56,18 +56,16 @@ Acrobat Distiller (network version): 2495$
 
 ---
 
-# Releases
-
-| | Year | Industry | Notable features |
-| --- | :--- | :------- | :--------------- |
-| 1.0 | 1993 | Adobe | **text**, **images**, **pages**, **hyperlinks**, bookmarks |
-| 1.1 | 1994 | Tax | passwords, encryption, device-independent color |
-| 1.2 | 1996 | Printing | radio buttons, checkboxes, forms incl. import/export, mouse events, sound, **unicode**, color features |
-| 1.3 | 2000 | Printing | digital signatures, color spaces, JavaScript, embedded file streams, image utilities, **CIDFonts**, prepress support |
-| 1.4 | 2001 | | RC4 > 40bits, transparency, better forms, metadata, accessibility, page boundaries, printer marks, predefined CMaps |
-| 1.5 | 2003 | | jpeg, multimedia playback, better forms, public key encryption, permissions, view/hide layers, slideshow |
-| 1.6 | 2004 | | 3D, **OpenType**, SOAP over http, public key encryption improvements, color spaces |
-| 1.7 | 2006 | | 3D improvements, public key encryption improvements |
+| Version | Year | Industry | Notable features |
+| ---- | :--- | :------- | :--------------- |
+| v1.0 | 1993 | Adobe | **text**, **images**, **pages**, **hyperlinks**, bookmarks |
+| v1.1 | 1994 | Tax | passwords, encryption, device-independent color |
+| v1.2 | 1996 | Printing | radio buttons, checkboxes, forms incl. import/export, mouse events, sound, **Unicode**, color features |
+| v1.3 | 2000 | Printing | digital signatures, color spaces, JavaScript, embedded file streams, image utilities, **CIDFonts**, prepress support |
+| v1.4 | 2001 | | RC4 > 40 bits, transparency, better forms, metadata, accessibility, page boundaries, printer marks, predefined CMaps |
+| v1.5 | 2003 | | JPEG 2000, multimedia playback, better forms, public key encryption, permissions, view/hide layers, slideshow |
+| v1.6 | 2004 | | 3D, **OpenType**, SOAP over HTTP, public key encryption improvements, color spaces |
+| v1.7 | 2006 | | 3D improvements, public key encryption improvements |
 
 *Version 1.7 is ISO 32000-1:2008.*
 
@@ -110,18 +108,175 @@ PDF/A Subversions:
 # Why is it so popular?
 
 Single purpose, which is archived:
-**It display content equally on all devices.**
+**It displays content equally on all devices.**
 
-Comparison to web:
-- Need to increase readability on smaller screens (font size, colors, spacing, ...)
-- Need to handle different aspect ratios
-- Optimize for less powerful devices
-- [Can I Use](https://caniuse.com/) tries to track browser inconsistencies
+<div class="space-3"></div>
+
+Comparison: how web pages deal with different devices:
+- Adapt font size, colors, spacing, ... to the screen size
+- Adapt the layout to the aspect ratio / screen size
+- Remove or add elements depending on the end device
+- Test on end devices and/or use resources such as [Can I Use](https://caniuse.com/)
+
+In short: it's painful and slow.
+
+---
+
+class: center, middle
+
+# The file format
+
+---
+
+# Tokens
+
+PDF is mostly a text format: you can open any PDF in your text editor!
+
+Tokens:
+```html
+0 % Numbers
+(Hello) % Strings
+5 0 R % References (5 for the object number, 0 for its generation number, R for reference)
+[2 0 R] % Arrays
+<</Key /Value>> % Dictionaries
+/Image % Names (any two with the same content are "equal")
+```
+
+Out of these tokens, higher-level objects are composed:
+- Dictionaries
+- Streams
 
 ---
 
-# The PDF format
+# Structure (1/2)
+
+Header (declares the PDF version and whether binary data is contained)
+```html
+%PDF-1.7
+%����
+```
+
+Body (contains the actual content)
+```html
+1 0 obj
+<</Type /Catalog /Pages 2 0 R>>
+endobj
+2 0 obj
+<</Type /Pages /Kids [3 0 R] /Count 1>>
+endobj
+% ...
+```
+
+---
+
+# Structure (2/2)
+
+Cross-Reference Table (CRT) (contains the byte offset of each object)
+```html
+xref
+0 8
+0000000000 65535 f
+0000000015 00000 n
+0000000062 00000 n
+% ...
+```
+
+Trailer (contains the number of CRT entries, the entry points for readers and the byte offset of the CRT)
+```html
+trailer
+<</Size 8 /Root 1 0 R /Info 7 0 R>>
+startxref
+574
+%%EOF
+```
+
+---
+
+# Body
+
+```html
+1 0 obj
+<</Type /Catalog /Pages 2 0 R>>
+endobj
+2 0 obj
+<</Type /Pages /Kids [3 0 R] /Count 1>>
+endobj
+3 0 obj
+<</Type /Page /Parent 2 0 R /Resources 4 0 R /MediaBox [0 0 210 297] /Contents [6 0 R]>>
+endobj
+4 0 obj
+<</Font <</F 5 0 R>> /ProcSet [/PDF /Text]>>
+endobj
+5 0 obj
+<</Type /Font /Subtype /Type1 /BaseFont /Helvetica /Encoding /WinAnsiEncoding>>
+endobj
+6 0 obj
+<</Length 59>>
+stream
+BT 1 0 0 1 22 20 cm 1 w /F 12 Tf 14.4 TL (Hello world)Tj ET
+endstream
+endobj
+```
+
+---
+
+# Content
+
+Text (`BT`/`ET` = begin/end text, `cm` = transformation matrix, `w` = line width, `Tf` = font and size, `TL` = leading, `Tj` = show text)
+```html
+stream
+BT 1 0 0 1 22 20 cm 1 w /F 12 Tf 14.4 TL (Hello world)Tj ET
+endstream
+```
+
+Drawing (`RG` = stroke color, `rg` = fill color, `re` = rectangle, `b` = close, fill and stroke the path)
+```html
+stream
+1 0 0 1 40 20 cm 0.5 w 0.68 0.98 0.94 RG 0.67 0.8 0.73 rg 0 0 20 30 re b
+endstream
+```
+
+---
+
+# Image
+
+Stream with binary image data
+
+```html
+5 0 obj
+<</Length 570 /Type /XObject /Subtype /Image /Width 25 /Height 16 /Filter /DCTDecode /BitsPerComponent 8 /ColorSpace /DeviceRGB>>
+stream
+�����JFIF���������#�#�#�#�%�#�'�+�+�'�6�;�4�;�6�P�J�C�C�J�P�z�W�]�W�]�W�z���s���s�s���s����������������%��������%S
+S�oo������#�#�#�#�%�#�'�+�+�'�6�;�4�;�6�P�J�C�C�J�P�z�W�]�W�]�W�z���s���s�s���s����������������%��������%S
+S�oo����������"��������������������������̊ �N�������������������������k���������������������������"������������"CRSb�����?��`W�F�X�6�G�Qy&IXFĪ��V�to������������������A���?�zе]������������������Q���?�S�����
+endstream
+endobj
+```
+
+Draw image (`cm` places it as a 20 x 20 square at (20, 20), `Do` = paint the XObject named `/I`)
+```html
+6 0 obj
+<</Length 28>>
+stream
+20 0 0 20 20 20 cm 1 w /I Do
+endstream
+endobj
+```
+
+---
+
+# Supporting Unicode
+
+`WinAnsiEncoding` is a single-byte encoding; so are the other standard encodings.
+
+What if I need a character not contained in these standard encodings (like Chinese characters)?
+
+Steps:
+- Embed a font that supports the character as a stream
+- Declare how the text encoding (PDF) maps onto the glyphs (font)
+- Declare metadata (character widths, ...)
+=> Requires deep knowledge about fonts
 
 ---
 
diff --git a/pdf/rectangle.pdf b/pdf/rectangle.pdf
new file mode 100644
index 0000000..ab73c13
--- /dev/null
+++ b/pdf/rectangle.pdf
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a36378ab20ce0579de369f809130f2b7af192870bfeef01486c6b91e192ed3df
+size 786
diff --git a/pdf/special_characters.pdf b/pdf/special_characters.pdf
new file mode 100644
index 0000000..38fefd8
--- /dev/null
+++ b/pdf/special_characters.pdf
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:fb4219346e7aff1c1f04369def800cfd88dcce46a4ac6d4f5e5ed308051e9ab8
+size 7531
diff --git a/pdf/text.pdf b/pdf/text.pdf
new file mode 100644
index 0000000..76d3afd
--- /dev/null
+++ b/pdf/text.pdf
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:ee782f47771c1a0d6cafcf4a08c8c628b41fe3ff29704db5124a5e21ca802eaa
+size 806
-- 
GitLab
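
A minimal sketch to accompany the *Structure* and *Body* slides, showing how the header, body, cross-reference table and trailer fit together and how the byte offsets in the xref table and after `startxref` are computed. `build_minimal_pdf` is a hypothetical helper written for illustration, not code from this course. It reuses the object layout of the Body slide but assumes: ASCII-only text (the single-byte limitation discussed on the Unicode slide), text positioned with `Td` instead of `cm`, an A4 `MediaBox` in points, and no `/Info` dictionary, so the trailer only carries `/Size` and `/Root`.

```python
def build_minimal_pdf(text: str = "Hello world") -> bytes:
    """Assemble a one-page PDF: header, body, cross-reference table, trailer.

    Hypothetical helper for illustration; supports ASCII text only.
    """
    # Escape the characters that are special inside a literal string (...).
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    # Content stream: begin text, font /F at 12 pt, move to (22, 20), show text, end text.
    content = f"BT /F 12 Tf 22 20 Td ({escaped}) Tj ET".encode("ascii")

    # Body objects; list index i holds object number i + 1, all with generation 0.
    objects = [
        b"<</Type /Catalog /Pages 2 0 R>>",
        b"<</Type /Pages /Kids [3 0 R] /Count 1>>",
        # Assumption: MediaBox is A4 in points (1/72 inch); the slide uses [0 0 210 297].
        b"<</Type /Page /Parent 2 0 R /Resources 4 0 R"
        b" /MediaBox [0 0 595 842] /Contents 6 0 R>>",
        b"<</Font <</F 5 0 R>> /ProcSet [/PDF /Text]>>",
        b"<</Type /Font /Subtype /Type1 /BaseFont /Helvetica"
        b" /Encoding /WinAnsiEncoding>>",
        # /Length counts only the stream data, not the EOL before "endstream".
        b"<</Length %d>>\nstream\n" % len(content) + content + b"\nendstream",
    ]

    # Header: version line, then a comment with bytes >= 128 to flag binary content.
    out = bytearray(b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where "N 0 obj" starts
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    # Cross-reference table: entry 0 is the head of the free list, then one
    # 20-byte entry (10-digit offset, 5-digit generation, "n", 2-byte EOL) per object.
    xref_offset = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset

    # Trailer: entry count, root object, and the byte offset of the "xref" keyword.
    out += b"trailer\n<</Size %d /Root 1 0 R>>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)
```

Writing the returned bytes to a file opened in binary mode (`"wb"`) should give a PDF that a viewer can display; opening that file in a text editor shows the same four sections as the slides. Note that with this header the first object lands at byte offset 15, as in the xref excerpt on the *Structure (2/2)* slide.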