PDF validation

The validation of PDF files was a major theme of the recent PDF Association technical conferences. While there are organisational and political issues to be solved (involving who would provide the tools and what guarantees could be given as to their accuracy), it would clearly be valuable if those processing PDF files could have access to an independent arbiter of the correctness of PDF files. This could be done at various levels: testing files against the PDF specification, testing for PDF/X, PDF/UA or PDF/A conformance, or simply testing whether a file is ‘fit for purpose’. This last category might include, for example, testing whether a file is suitable for text extraction.
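
To give a sense of what the most basic level of testing against the specification might look like, here is a minimal Python sketch. It is an illustration only, not a validator, and not how any Mimotek product or PDF Association tool works. The function name `basic_pdf_checks` and the 1024-byte tail window are assumptions made for this example. It checks only three well-known structural markers: the `%PDF-x.y` header, the `startxref` entry and the closing `%%EOF` marker.

```python
import re


def basic_pdf_checks(data: bytes) -> list:
    """Return a list of problems found by a few shallow structural checks.

    Hypothetical helper for illustration; a real validator checks far more.
    """
    problems = []

    # A PDF file is expected to begin with a header such as "%PDF-1.7".
    if not re.match(rb"%PDF-\d\.\d", data):
        problems.append("missing %PDF-x.y header")

    # The end of the file should hold "startxref", a byte offset and "%%EOF".
    # The 1024-byte window is an assumption chosen for this sketch.
    tail = data[-1024:]
    if b"%%EOF" not in tail:
        problems.append("missing %%EOF marker")
    if not re.search(rb"startxref\s+\d+", tail):
        problems.append("missing startxref entry")

    return problems
```

A file that passes these checks may still be badly broken, and PDF/A, PDF/X or PDF/UA conformance and ‘fit for purpose’ tests would sit well above this level. The sketch is meant only to show that the lowest tier can be tested mechanically and reported as a simple list of problems.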

Validation is likely to be of interest to users of Mimotek software, who process PDFs from various sources and of variable quality. Mimotek software is able to report problems with particular PDF files, but an independent report on the quality of a questionable file would be valuable when providing feedback to the file’s originator.

One particular problem for users of Mimotek Structuriser is that the PDFs that they process may have come from a long supply chain. The problems that we see often occur in advertisements or special supplements, which may not have been created by the newspaper’s publisher. If originators could be persuaded to test their PDFs against a particular validator before submitting them, the overall quality of the PDF files should improve.