We need a way to run a detailed audit on PDF files that are about to be formally submitted, preferably as a batch tool. The audit would report on aspects such as:
- Check metadata and display Title, Author, Subject, Keywords. Report any custom metadata keys and their values.
- Summarize the page count, word count, sheet size(s) and page orientation(s) used.
- Are there digital signatures? Are they validated?
- Broken bookmarks
- Broken links
- Links to external sites
- Links to internal sites / mapped drives
- Are pages labeled? How many pages are labeled and how many are not labeled?
- Do pages have scales assigned? How many pages have / do not have scales assigned?
- Are there unflattened markups? If so, summarize the author(s), date(s) and page(s) that have markups on them.
- Is all text searchable? How many pages have no text on them?
- Are fonts embedded? Which ones? Are there embedded fonts that are not used?
- Are there viewports in the file?
- Are there custom markup column settings?
- How large is the file? What proportion of the file size does each component account for (as shown in the reduce file size tool)?
- Are there embedded file attachments?
- Are there any document security settings?
- (anything else?)
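As one illustration of the link checks in the list above, here is a rough Python sketch of how link targets could be sorted into external sites, internal sites, and mapped drives or network shares. The function name `classify_link`, the category names, and the rules for what counts as "internal" (private IP addresses, host names without a dot, a few common intranet suffixes) are assumptions made for this example, not how any existing tool works. It also assumes the link targets have already been extracted from the PDF.

```python
import ipaddress
import re
from urllib.parse import urlparse

# Matches Windows drive-letter paths such as S:\Projects\plan.pdf or S:/Projects/plan.pdf.
DRIVE_PATH = re.compile(r"^[A-Za-z]:[\\/]")

# Hypothetical list of suffixes treated as intranet-only host names.
INTERNAL_SUFFIXES = (".local", ".lan", ".internal")


def classify_link(target: str) -> str:
    """Return 'local', 'internal', 'external', or 'other' for a link target."""
    target = target.strip()

    # Drive letters and UNC shares (\\server\share) only resolve on the author's network.
    if DRIVE_PATH.match(target) or target.startswith("\\\\"):
        return "local"

    parsed = urlparse(target)
    scheme = parsed.scheme.lower()
    if scheme == "file":
        return "local"

    if scheme in ("http", "https", "ftp"):
        host = parsed.hostname or ""
        if not host:
            return "other"
        try:
            ip = ipaddress.ip_address(host)
        except ValueError:
            ip = None  # host is a name, not an IP address
        if ip is not None:
            return "internal" if (ip.is_private or ip.is_loopback) else "external"
        # Assumption: single-label hosts and intranet suffixes are internal sites.
        if "." not in host or host.endswith(INTERNAL_SUFFIXES):
            return "internal"
        return "external"

    # mailto:, javascript:, relative paths, and anything else are not classified here.
    return "other"


if __name__ == "__main__":
    for link in ("https://example.com/spec.pdf", r"S:\Projects\plan.pdf", "http://intranet/wiki"):
        print(link, "->", classify_link(link))
```

A receiver would probably care most about the "local" and "internal" groups, since those links cannot work outside the submitting company's network.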
Going through these steps by hand is tedious, and it is easy to miss things when cleaning up files that come from various people and companies. The audit would be useful both to the submitting company and to the receiver, who could quickly process incoming files and determine whether they contain appropriate content and settings.
Also include the ability to audit two documents and flag differences in the above data.
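To make the two-document comparison concrete, here is a minimal Python sketch. It assumes each audit has already been reduced to a flat dictionary of results; the field names (`page_count`, `fonts_embedded`, and so on) and the function `diff_audits` are hypothetical and only serve the example.

```python
# Placeholder shown when a field exists in one audit but not in the other.
ABSENT = "<absent>"


def diff_audits(first: dict, second: dict) -> dict:
    """Return {field: (value_in_first, value_in_second)} for every field that differs."""
    differences = {}
    for key in sorted(set(first) | set(second)):
        left = first.get(key, ABSENT)
        right = second.get(key, ABSENT)
        if left != right:
            differences[key] = (left, right)
    return differences


if __name__ == "__main__":
    # Hypothetical audit results for two revisions of the same submission.
    rev_a = {"page_count": 12, "fonts_embedded": True, "title": "Site Plan"}
    rev_b = {"page_count": 14, "fonts_embedded": True, "attachments": 1}

    for field, (old, new) in diff_audits(rev_a, rev_b).items():
        print(f"{field}: {old!r} -> {new!r}")
```

Fields that match in both audits are left out, so the output lists only what changed between the two files.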