Fun with pdftk
Basic PDF manipulation
Summary
pdftk is a free command-line tool for manipulating PDF files. It can edit metadata and extract, concatenate, rotate, and split (burst) pages, among other tasks. This article describes a few real-world use cases I have encountered and how to handle them with pdftk.
Strip blank pages
One of my employers provided pay stubs as two-page PDFs in which the second page was blank. Because replicated cloud storage is cheap but not free, I would strip the blank page:
for f in paystub*; do pdftk "$f" cat 1 output "${f}a" && mv "${f}a" "$f"; done
This command line can be saved in a shell script, such as “trim_paystub.sh”, and run in a directory of pay stubs to remove the blank second page from every matching file at once.
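A script version might look like the sketch below. It assumes pdftk is on the PATH; the function name trim_to_first_page and the ".tmp" suffix are illustrative choices, not anything pdftk requires. It takes file names as arguments and replaces each original only when pdftk succeeds, so a failed run never costs you the source file.

```shell
#!/bin/sh
# trim_paystub.sh: keep only page 1 of each PDF named on the command line.
# trim_to_first_page and the ".tmp" suffix are hypothetical names for this sketch.
trim_to_first_page() {
    for f in "$@"; do
        # Write to a temporary file first, then swap it in on success.
        if pdftk "$f" cat 1 output "$f.tmp"; then
            mv "$f.tmp" "$f"
        else
            echo "skipped $f: pdftk failed" >&2
            rm -f "$f.tmp"
        fi
    done
}

trim_to_first_page "$@"
```

It could then be run as: sh trim_paystub.sh paystub*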
A similar situation arises when a large document has several pages marked “This page is intentionally left blank.” To drop them, list only the pages you want to keep:
pdftk my_long_file.pdf cat 1 3-6 8-10 12-end output shorter_file.pdf
In that example, pages 2, 7, and 11 are “blank,” so the resulting PDF contains only the pages with good data.
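For a long document, writing the ranges by hand is error-prone. The helper below is a minimal sketch that builds the keep-list from the page numbers to drop. keep_ranges is a hypothetical name, and the function assumes the pages are given in ascending order and that the document's last page is not one of them.

```shell
#!/bin/sh
# keep_ranges: print pdftk page ranges that skip the given pages.
# Arguments: page numbers to drop, in ascending order (hypothetical helper).
keep_ranges() {
    start=1
    ranges=""
    for blank in "$@"; do
        last=$((blank - 1))
        if [ "$last" -gt "$start" ]; then
            # Several pages lie between the previous dropped page and this one.
            ranges="$ranges $start-$last"
        elif [ "$last" -eq "$start" ]; then
            # Exactly one page lies in between.
            ranges="$ranges $start"
        fi
        start=$((blank + 1))
    done
    # Everything after the last dropped page is kept.
    ranges="$ranges $start-end"
    echo "${ranges# }"
}
```

With the pages from the example, keep_ranges 2 7 11 prints "1 3-6 8-10 12-end", so the command could be written as: pdftk my_long_file.pdf cat $(keep_ranges 2 7 11) output shorter_file.pdf (the substitution is deliberately unquoted so that each range becomes its own argument).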
Mass-scan and split
If you have a large stack of paper documents to scan, you can arrange them in a sensible order, feed them through an automatic document feeder, and scan them into one large file. You might then want to split the large file into individual documents:
pdftk big_scanned_file.pdf cat 1-2 output bank_statement_20220503.pdf
Repeat the above command as needed until you have created each document.
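Repeating the command by hand gets tedious for a big stack. One option is a small loop driven by a list of page ranges and output names. The list format (one "pages filename" pair per line) and the function name split_by_list are assumptions made for this sketch.

```shell
#!/bin/sh
# split_by_list: cut one large PDF into several documents.
# Reads lines of the form "PAGES OUTPUT_NAME" from standard input,
# e.g. "1-2 bank_statement_20220503.pdf". This list format is hypothetical.
split_by_list() {
    source_pdf=$1
    while read -r pages name; do
        # Skip blank lines in the list.
        [ -n "$pages" ] || continue
        # </dev/null keeps pdftk from consuming the rest of the list.
        pdftk "$source_pdf" cat "$pages" output "$name" </dev/null
    done
}
```

It could be called as split_by_list big_scanned_file.pdf < ranges.txt, where ranges.txt is a hypothetical text file holding one range and file name per line.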
Sometimes a page needs to be rotated. For that, append north, south, east, or west to the page number. For example, if page 1 is normal and page 2 is upside down:
pdftk source_file.pdf cat 1 2south output destination_file.pdf
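The compass keywords correspond to clockwise angles in the pdftk manual: north is 0°, east is 90°, south is 180°, and west is 270°. If you think in degrees, a tiny lookup like the one below can do the translation. rotation_keyword is a hypothetical helper name, and the sketch covers only these four angles.

```shell
#!/bin/sh
# rotation_keyword: translate a clockwise angle into pdftk's rotation keyword.
# Hypothetical helper; only the four right-angle rotations are supported.
rotation_keyword() {
    case $1 in
        0)   echo north ;;
        90)  echo east ;;
        180) echo south ;;
        270) echo west ;;
        *)   echo "unsupported angle: $1" >&2; return 1 ;;
    esac
}
```

The earlier command could then be written as: pdftk source_file.pdf cat 1 "2$(rotation_keyword 180)" output destination_file.pdf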
Extract, rotate, combine, protect
Sometimes you may wish to combine a number of PDFs into one. This could include some files in portrait, others in landscape, some with blank pages, etc. You might also need to send the file by email or other insecure means.
Example command lines
Extract page 2 and all following pages, rotating them east (90° clockwise), in this case from portrait to landscape to match other files that are landscape:
pdftk my_file.pdf cat 2-endeast output my_new_file.pdf
Combine many files into one, protecting it with a password and allowing printing:
pdftk TitlePage.pdf Chapter1.pdf Chapter1_addendum.pdf Chapter2.pdf Appendix.pdf Index.pdf cat output CombinedBookDraft.pdf user_pw My_Password123 allow printing
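A password typed on the command line ends up in shell history and is visible in the process list while the command runs. pdftk accepts the literal word PROMPT in place of a password and then asks for it interactively. The wrapper below is a sketch of that approach; combine_protected is a hypothetical name.

```shell
#!/bin/sh
# combine_protected: merge PDFs into one password-protected file.
# Usage: combine_protected OUTPUT.pdf INPUT.pdf [INPUT.pdf ...]
# The function name is hypothetical; PROMPT is pdftk's keyword for
# asking for the password interactively instead of taking it as an argument.
combine_protected() {
    output_pdf=$1
    shift
    # Remaining arguments are the input files, concatenated in order.
    pdftk "$@" cat output "$output_pdf" user_pw PROMPT allow printing
}
```

For example: combine_protected CombinedBookDraft.pdf TitlePage.pdf Chapter1.pdf Chapter2.pdf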
Further Reading
man pdftk