This tool allows you to check PDF properties/contents in a batch mode. Originally I developed this as a tool for a library to do a check on huge collection of PDF files from an output of OCR software or outsourced company.
This tool is able to check PDF properties/contents in a batch mode. It currently checks the following properties of PDF files:
The tool currently checks the following properties of PDF files:
pdf-checker is a free software licensed under GNU AGPL version 3.
Source code is available at http://github.com/masao/pdf-checker/.
This tool is written in Java. You can use the tool in various environments such as Windows/Mac OS/Unix.
At first, download the binary package and unpack it. And then run the jar file with specifying the targeted PDF files on command line:
% unzip pdf-checker-YYYYMMDD.zip % java -jar PdfChecker.jar pdf/2010J00*.pdf pdf/2010J0001.pdf version 3 pdf/2010J0001.pdf encryption false pdf/2010J0001.pdf creationdate D:20060627211618 pdf/2010J0001.pdf producer PDFlib 4.0.3 + PDI (SunOS 5.8) pdf/2010J0001.pdf pages 8 pdf/2010J0001.pdf page1 pagesize Rectangle: 595.0x842.0 (rot: 0 degrees) pdf/2010J0001.pdf page1 imagetype png pdf/2010J0001.pdf page1 dpi-x 346.8101 pdf/2010J0001.pdf page1 dpi-y 346.06174 pdf/2010J0001.pdf page1 text length 83 pdf/2010J0001.pdf page2 pagesize Rectangle: 595.0x842.0 (rot: 0 degrees) pdf/2010J0001.pdf page2 imagetype png pdf/2010J0001.pdf page2 dpi-x 346.8101 pdf/2010J0001.pdf page2 dpi-y 346.06174 pdf/2010J0001.pdf page2 text length 87 pdf/2010J0001.pdf page3 pagesize Rectangle: 595.0x842.0 (rot: 0 degrees) pdf/2010J0001.pdf page3 imagetype png pdf/2010J0001.pdf page3 dpi-x 346.8101 pdf/2010J0001.pdf page3 dpi-y 346.06174 pdf/2010J0001.pdf page3 text length 0 pdf/2010J0001.pdf page4 pagesize Rectangle: 595.0x842.0 (rot: 0 degrees) pdf/2010J0001.pdf page4 imagetype png pdf/2010J0001.pdf page4 dpi-x 346.8101 pdf/2010J0001.pdf page4 dpi-y 346.06174 pdf/2010J0001.pdf page4 text length 1794 .....
An example above means that the file parsed is PDF version 3, is not encrypted, is created at June 27th 2006, is produced by a tool called PDFLib, and has 8 pages. Each page of this file has a size of "595x842" without rotation, an embedded (roughly) 300 DPI resolution image with PNG-style compression, and has corresponding textual contents of 80-1800 characters.
The tool can check multiple files by specifying them as arguments. When specifying multiple files, the first column shows each filename.
As output format is a simple text file with tab-separated, you can check it further via other applications like Excel.
There are some known limitations/bugs in this tool as of September 2011.
This tool uses and bundles iText PDF Library and Legion of the Bouncy Castle Java cryptography APIs. Source codes and detailed information is available under:
Development of this tool is partially based on work in DRF Technical Workshop at Karuizawa (Sep. 2010), sponsored by CSI 2010 program of NII.