Fun with pdftk

Basic PDF manipulation

A terminal emulator on Debian (Photo credit: Lukas, Unsplash License)

Summary

pdftk is a free command-line tool for manipulating PDF files. It can edit metadata and extract, concatenate, rotate, and split pages (including "bursting" a document into single-page files), among other tasks. This article describes a few real-world use cases I have encountered and how to handle them with pdftk.

Strip blank pages

One of my employers provided pay stubs as two-page PDFs in which the second page was blank. Because replicated cloud storage is cheap but not free, I stripped the blank page from each file with:

for f in paystub*.pdf; do pdftk "$f" cat 1 output "$f.tmp" && mv "$f.tmp" "$f"; done

This loop can be saved in a shell script, such as “trim_paystub.sh”, and rerun whenever new pay stubs arrive. As written, it removes the blank second page from every matching file in the current directory.
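One way to write such a script so that it accepts file names as arguments is sketched below. This is a minimal sketch, not a definitive implementation: the function name `trim_to_first_page` and the `.tmp` suffix are illustrative choices, and it assumes pdftk is on the PATH.

```shell
#!/bin/sh
# trim_paystub.sh: keep only page 1 of each PDF named on the command line.
# Hypothetical helper; the function name and ".tmp" suffix are illustrative.
trim_to_first_page() {
    for f in "$@"; do
        # Write to a temporary file first; replace the original
        # only if pdftk succeeds.
        if pdftk "$f" cat 1 output "$f.tmp"; then
            mv "$f.tmp" "$f"
        else
            echo "skipped: $f" >&2
            rm -f "$f.tmp"
        fi
    done
}

trim_to_first_page "$@"
```

It could then be run as: sh trim_paystub.sh paystub*.pdf. Replacing the original only after pdftk succeeds means a failed run leaves the source file untouched.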

A similar situation arises when a large document has several pages marked “This page is intentionally left blank.” To drop them, list only the page ranges you want to keep:

pdftk my_long_file.pdf cat 1 3-6 8-10 12-end output shorter_file.pdf

In that example, pages 2, 7, and 11 are “blank,” so the resulting PDF contains only the pages with good data.
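For longer documents, working out the ranges by hand gets tedious. The following is a minimal sketch of a helper that prints the ranges to keep, given the blank page numbers. The name `keep_ranges` is hypothetical, and the sketch assumes the blank pages are listed in ascending order and that the last page of the document is not blank.

```shell
#!/bin/sh
# keep_ranges: print pdftk page ranges that skip the given page numbers.
# Hypothetical helper; blank pages must be listed in ascending order.
keep_ranges() {
    start=1
    ranges=""
    for blank in "$@"; do
        last=$((blank - 1))
        if [ "$last" -gt "$start" ]; then
            # Several pages between two blanks: emit a range.
            ranges="$ranges $start-$last"
        elif [ "$last" -eq "$start" ]; then
            # Exactly one page between two blanks: emit it alone.
            ranges="$ranges $start"
        fi
        start=$((blank + 1))
    done
    # Everything after the last blank page is kept.
    ranges="$ranges $start-end"
    echo "${ranges# }"
}

# Pages 2, 7, and 11 are blank; prints: 1 3-6 8-10 12-end
keep_ranges 2 7 11
```

The output can be passed to pdftk with an unquoted command substitution, so that each range becomes its own argument: pdftk my_long_file.pdf cat $(keep_ranges 2 7 11) output shorter_file.pdf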

Mass-scan and split

If you have a large stack of paper documents to scan, you can arrange them in a sensible order, feed them through an automatic document feeder, and scan them into one large file. You can then split the large file into individual documents, one command per document:

pdftk big_scanned_file.pdf cat 1-2 output bank_statement_20220503.pdf

Repeat the above command as needed until you have created each document.
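Repeating the command by hand can be replaced with a short "plan" of page ranges and output names. The following is a minimal sketch under stated assumptions: the function name `split_scan` and the `PDFTK` override variable are hypothetical (neither is part of pdftk), and each plan line is assumed to hold a page range followed by an output file name.

```shell
#!/bin/sh
# split_scan: cut one scanned PDF into several documents.
# Reads "page-range output-name" pairs from standard input.
# Hypothetical helper; PDFTK is an illustrative override
# (set PDFTK=echo to print the arguments instead of running pdftk).
split_scan() {
    src=$1
    while read -r range name; do
        # Skip empty lines in the plan.
        [ -n "$range" ] || continue
        # Redirect stdin so pdftk cannot consume the rest of the plan.
        "${PDFTK:-pdftk}" "$src" cat "$range" output "$name" </dev/null
    done
}

# Example (the second and third file names are made up):
#   split_scan big_scanned_file.pdf <<'EOF'
#   1-2 bank_statement_20220503.pdf
#   3 utility_bill.pdf
#   4-9 tax_return.pdf
#   EOF
```

Keeping the plan as text makes it easy to review the page ranges before any file is written.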

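Sometimes a page needs to be rotated. The keywords north (0°), east (90°), south (180°), and west (270°) set a page's absolute rotation, while left, right, and down rotate it relative to its current orientation. For example, if page 1 is normal and page 2 is upside down: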

pdftk source_file.pdf cat 1 2south output destination_file.pdf

Extract, rotate, combine, protect

Sometimes you may wish to combine a number of PDFs into one. This could include some files in portrait, others in landscape, some with blank pages, etc. You might also need to send the file by email or other insecure means.

Example command lines

Extract page 2 and all following pages, setting their rotation to east, that is, 90° clockwise (in this case, from portrait to landscape, to match other files that are landscape):

pdftk my_file.pdf cat 2-endeast output my_new_file.pdf

Combine many files into one, protect the result with a password that is required to open it (user_pw), and allow printing:

pdftk TitlePage.pdf Chapter1.pdf Chapter1_addendum.pdf Chapter2.pdf Appendix.pdf Index.pdf cat output CombinedBookDraft.pdf user_pw My_Password123 allow printing
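A password typed on the command line ends up in the shell history. pdftk accepts the keyword PROMPT in place of a password and then asks for it interactively. The following is a minimal sketch of a wrapper that uses it; the function name `combine_protected` and the `PDFTK` override variable are hypothetical, not part of pdftk.

```shell
#!/bin/sh
# combine_protected: merge PDFs into one password-protected file.
# Hypothetical helper; PDFTK is an illustrative override
# (set PDFTK=echo to print the arguments instead of running pdftk).
combine_protected() {
    out=$1
    shift
    # PROMPT makes pdftk ask for the password interactively,
    # so it never appears in the shell history.
    "${PDFTK:-pdftk}" "$@" cat output "$out" user_pw PROMPT allow printing
}

# Example:
#   combine_protected CombinedBookDraft.pdf TitlePage.pdf Chapter1.pdf Appendix.pdf
```

The output name comes first so that any number of input files can follow it in the order they should appear.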

Further Reading

man pdftk

