We recently had a small job: chop up and reassemble a dozen or so PDFs, then include them in a website archive. Our client needed the first page of each document replaced with a new first page carrying updated information. A laborious task by hand, but PDF Toolkit (pdftk) came to our rescue. It’s a cross-platform, GPL-licensed powerhouse for PDF manipulation. We used the command line to get the job done document by document; had the task been any larger, it would have been trivial to wrap these commands in a script.
First, remove the old first page:
pdftk OurDocument.pdf cat 2-end output OurDocumentMinusFirst.pdf
Then combine the new first page with the remaining document:
pdftk NewFirst.pdf OurDocumentMinusFirst.pdf cat output NewOurDocument.pdf
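For a larger batch, the two steps can be folded into one command and wrapped in a loop. The sketch below is one way to do it, not what we actually ran: it uses pdftk's input handles (the `A=` and `B=` prefixes) to skip the intermediate file. The directory names `originals/`, `covers/` and `updated/` are hypothetical, as is the assumption that each new first page shares its file name with the document it belongs to.

```shell
#!/bin/sh
# Replace the first page of one PDF with a new first page.
# Usage: replace_first_page NEW_FIRST.pdf ORIGINAL.pdf OUTPUT.pdf
replace_first_page() {
    # A and B are pdftk input handles: "A B2-end" means all of A,
    # followed by B from page 2 to its last page.
    pdftk A="$1" B="$2" cat A B2-end output "$3"
}

# Assumed (hypothetical) layout: originals/Foo.pdf pairs with
# covers/Foo.pdf, and the result is written to updated/Foo.pdf.
replace_all() {
    mkdir -p updated
    for original in originals/*.pdf; do
        [ -e "$original" ] || continue   # the glob matched nothing
        name=$(basename "$original")
        if [ -e "covers/$name" ]; then
            replace_first_page "covers/$name" "$original" "updated/$name"
        else
            echo "no new first page for $name, skipping" >&2
        fi
    done
}
```

Calling `replace_all` from the directory that holds `originals/` and `covers/` processes every document in one go and leaves the source files untouched.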
Very simple indeed, but pdftk can do a whole lot more:
- Merge PDF Documents
- Split PDF Pages into a New Document
- Rotate PDF Pages or Documents
- Decrypt Input as Necessary (Password Required)
- Encrypt Output as Desired
- Fill PDF Forms with FDF Data or XFDF Data and/or Flatten Forms
- Apply a Background Watermark or a Foreground Stamp
- Report on PDF Metrics such as Metadata, Bookmarks, and Page Labels
- Update PDF Metadata
- Attach Files to PDF Pages or the PDF Document
- Unpack PDF Attachments
- Burst a PDF Document into Single Pages
- Uncompress and Re-Compress Page Streams
- Repair Corrupted PDF
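To give a flavour of a few of these, here is a rough sketch of four of the operations as small shell functions. The function names and arguments are our own illustration; the pdftk operations they wrap (`burst`, `dump_data`, `background` and `owner_pw`) are described in the help manual, which is worth checking against your installed version.

```shell
#!/bin/sh
# Burst a document into single pages: page_01.pdf, page_02.pdf, ...
# Usage: burst_pages INPUT.pdf
burst_pages() {
    pdftk "$1" burst output page_%02d.pdf
}

# Report metadata, bookmarks and page information as plain text.
# Usage: report_metadata INPUT.pdf REPORT.txt
report_metadata() {
    pdftk "$1" dump_data output "$2"
}

# Apply a background watermark behind every page of the input.
# Usage: add_watermark INPUT.pdf WATERMARK.pdf OUTPUT.pdf
add_watermark() {
    pdftk "$1" background "$2" output "$3"
}

# Encrypt the output with an owner password.
# Usage: encrypt_copy INPUT.pdf OUTPUT.pdf PASSWORD
encrypt_copy() {
    pdftk "$1" output "$2" owner_pw "$3"
}
```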
The help manual that comes with the package is a fairly comprehensive guide to the options, complete with examples. Simply type:
pdftk --help
Otherwise, there are plenty of guides available, like this one from segfault.