Created on 2011-01-16
Last modified on 2011-09-14

PDF check utility

This tool allows you to check PDF properties/contents in a batch mode. Originally I developed this as a tool for a library to do a check on huge collection of PDF files from an output of OCR software or outsourced company.

This tool is able to check PDF properties/contents in a batch mode. It currently checks the following properties of PDF files:

The tool currently checks the following properties of PDF files:


Version 20110914 released.
New TSV format introduced for filter-utility friendly.
Add a check for protection permissions.
Add a support for some TIFF formats (JBIG2DECODE filter).
Add supports for some more errors.
Add a support for encrypted PDF files.
Released the first version (20110116).


pdf-checker is a free software licensed under GNU AGPL version 3.

Source code is available at

How to use

This tool is written in Java. You can use the tool in various environments such as Windows/Mac OS/Unix.

At first, download the binary package and unpack it. And then run the jar file with specifying the targeted PDF files on command line:

 % unzip
 % java -jar PdfChecker.jar pdf/2010J00*.pdf
 pdf/2010J0001.pdf	version	3
 pdf/2010J0001.pdf	encryption	false
 pdf/2010J0001.pdf	creationdate	D:20060627211618
 pdf/2010J0001.pdf	producer	PDFlib 4.0.3 + PDI (SunOS 5.8)
 pdf/2010J0001.pdf	pages	8
 pdf/2010J0001.pdf	page1	pagesize	Rectangle: 595.0x842.0 (rot: 0 degrees)
 pdf/2010J0001.pdf	page1	imagetype	png
 pdf/2010J0001.pdf	page1	dpi-x	346.8101
 pdf/2010J0001.pdf	page1	dpi-y	346.06174
 pdf/2010J0001.pdf	page1	text length	83
 pdf/2010J0001.pdf	page2	pagesize	Rectangle: 595.0x842.0 (rot: 0 degrees)
 pdf/2010J0001.pdf	page2	imagetype	png
 pdf/2010J0001.pdf	page2	dpi-x	346.8101
 pdf/2010J0001.pdf	page2	dpi-y	346.06174
 pdf/2010J0001.pdf	page2	text length	87
 pdf/2010J0001.pdf	page3	pagesize	Rectangle: 595.0x842.0 (rot: 0 degrees)
 pdf/2010J0001.pdf	page3	imagetype	png
 pdf/2010J0001.pdf	page3	dpi-x	346.8101
 pdf/2010J0001.pdf	page3	dpi-y	346.06174
 pdf/2010J0001.pdf	page3	text length	0
 pdf/2010J0001.pdf	page4	pagesize	Rectangle: 595.0x842.0 (rot: 0 degrees)
 pdf/2010J0001.pdf	page4	imagetype	png
 pdf/2010J0001.pdf	page4	dpi-x	346.8101
 pdf/2010J0001.pdf	page4	dpi-y	346.06174
 pdf/2010J0001.pdf	page4	text length	1794

An example above means that the file parsed is PDF version 3, is not encrypted, is created at June 27th 2006, is produced by a tool called PDFLib, and has 8 pages. Each page of this file has a size of "595x842" without rotation, an embedded (roughly) 300 DPI resolution image with PNG-style compression, and has corresponding textual contents of 80-1800 characters.

The tool can check multiple files by specifying them as arguments. When specifying multiple files, the first column shows each filename.

As output format is a simple text file with tab-separated, you can check it further via other applications like Excel.


There are some known limitations/bugs in this tool as of September 2011.

Error handling
Reading from an invalid PDF file will causes an exception and hung on it.
Image handling assumes a scanned document and it cause weired results on normal document created by document tools such as Word/LaTeX etc.


This tool uses and bundles iText PDF Library and Legion of the Bouncy Castle Java cryptography APIs. Source codes and detailed information is available under:

Development of this tool is partially based on work in DRF Technical Workshop at Karuizawa (Sep. 2010), sponsored by CSI 2010 program of NII.

高久雅生 (Masao Takaku),