The Importance of Document Image Quality

by Matt Tarpey

 

Digital transformation has gone from a potential “nice-to-have” to an absolute competitive necessity in many industries. In order to transition critical processes to digital formats, many businesses are starting by digitizing their documents.

Developing or investing in an effective document digitization process is a great entry point for a business’s broader digital transformation, as it provides an easy way to ingest new data and information into downstream digitized functions you can build up as you go. 

One of the key things to look for in a document digitization platform is image quality. When you hear the phrase “image quality,” you almost certainly think of image resolution, usually expressed in pixels per inch (ppi) or pixels per mm. Resolution indicates how sharp and clear the resulting image will be and, together with the size of the original document, determines how much digital space the file will take up. Because higher-resolution images tend to be larger files, when dealing with a large quantity of scanned documents you may reach a point of diminishing returns, where the benefits of increasing the image resolution are outweighed by the extra storage requirements.
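To make the storage trade-off concrete, here is a rough back-of-the-envelope sketch in Python. It estimates the uncompressed size of a single page; the function name and the choice of a US Letter page in 8-bit greyscale are assumptions made for illustration, and real files are usually smaller once compressed.

```python
def uncompressed_size_bytes(width_in, height_in, ppi, bits_per_pixel):
    """Estimate the uncompressed size of one scanned page, in bytes."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    return width_px * height_px * bits_per_pixel // 8


# Illustrative case: US Letter page (8.5 x 11 in), 8-bit greyscale.
for ppi in (150, 300, 600):
    size_mb = uncompressed_size_bytes(8.5, 11, ppi, 8) / 1_000_000
    print(f"{ppi} ppi: {size_mb:.1f} MB")
# 150 ppi: 2.1 MB
# 300 ppi: 8.4 MB
# 600 ppi: 33.7 MB
```

Doubling the resolution quadruples the pixel count, which is why storage requirements can climb much faster than any visible gain in legibility.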

However, image resolution isn’t the only important factor in determining the quality of a digitized image. Others include:

Skew - Improper alignment of a document as it’s being digitized can make the output document difficult to read and may even cut off critical information located near the margins of the page. It may not seem critically important to make sure documents are lined up straight, but fixing skew digitally after the fact may require pixel interpolation, which can be inaccurate and increase the size of the digital file. 

Fold - Depending on the type of scanner in use, some documents may get folded or wrinkled during scanning, making it difficult or even impossible to read large sections of the digitized document. 

White balance - For most businesses, the vast majority of the documents in need of digitization will be in black and white. But even for documents that exist only in greyscale, white balance needs to be taken into account to avoid unexpected tinting or overexposure. It also matters for properly distinguishing between different colors on charts and graphs within full-color documents.

Illumination uniformity - If you’ve ever run a standard office scanner with the top open, you’re aware of the bright light that moves across the surface as the machine runs. Proper illumination on scanned documents is vital to legibility and data capture, as well as image and color fidelity.

Field artifacts and noise - In a perfect world, the scanning process would only ever capture exactly what’s on the original page. However, in the real world, dust, dirt, scratches, and other errant particles, often referred to as “noise,” will inevitably be picked up and reproduced in digital files. While this typically isn’t a major concern, in some instances an artifact can block or obscure critical information. It’s important to monitor scanned images for these artifacts.

Geometric distortion - Warping is a common issue, especially when a scanned image needs to be scaled up. Typical camera lenses are poorly corrected for this, though scanners are usually much more capable of avoiding distortions. Additionally, digital tools designed to correct distortions created by lenses have become widely available and easy to use - some can even automate the process.
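Some of the factors above can be turned into simple numeric checks. The following is a minimal sketch, assuming a greyscale page is represented as a list of rows of 0-255 pixel values. The function names, the 2x2 grid, and the darkness threshold of 128 are illustrative assumptions, not taken from any particular scanning product.

```python
def block_means(img, rows=2, cols=2):
    """Mean brightness of each block in a rows x cols grid over the page."""
    h, w = len(img), len(img[0])
    means = []
    for r in range(rows):
        for c in range(cols):
            y0, y1 = r * h // rows, (r + 1) * h // rows
            x0, x1 = c * w // cols, (c + 1) * w // cols
            pixels = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            means.append(sum(pixels) / len(pixels))
    return means


def illumination_uniformity(img, rows=2, cols=2):
    """Ratio of darkest to brightest block mean; 1.0 means perfectly even."""
    means = block_means(img, rows, cols)
    return min(means) / max(means) if max(means) > 0 else 1.0


def count_speckles(img, dark_threshold=128):
    """Count dark pixels that have no dark neighbours (isolated specks)."""
    h, w = len(img), len(img[0])
    count = 0
    for y in range(h):
        for x in range(w):
            if img[y][x] >= dark_threshold:
                continue  # not a dark pixel
            neighbours = [
                img[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            ]
            if all(p >= dark_threshold for p in neighbours):
                count += 1
    return count


# Toy 8x8 "page": right half lit less brightly, plus one dust speck.
page = [[250] * 4 + [200] * 4 for _ in range(8)]
page[2][5] = 0
print(illumination_uniformity(page))
print(count_speckles(page))
```

A production system would work on real image data and use more robust measures, but the idea is the same: reduce each quality factor to a number that can be compared against a tolerance.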

The importance of detecting and correcting these various error types is determined in large part by the type of document and the reason it’s being digitized. For example, when digitally recreating historical documents, which may include or even be historically significant images in and of themselves, maintaining extreme fidelity might be of high importance. For most business documents, however, legibility - both to humans and to AI-run data extraction tools - is usually the most important determining factor.

 

Real-Time Image Quality Assurance (IQA)

Advanced scanning platforms have gotten really good at limiting the frequency of all types of scanning errors and distortions, but the best platforms don’t rely on prevention alone. There’s nothing worse than finishing a large job only to realize you need to start again from the beginning because of a minor error that threw off the entire batch. 

Exela’s suite of IntelliScan scanners performs in-line monitoring, testing every image for a range of potential issues while the job is underway. Problems and defects can be detected earlier in the process, allowing for timely intervention and resolution rather than costly and time-consuming overruns and re-scans.
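The general idea of in-line monitoring can be sketched in a few lines of Python. This is a generic illustration only, not a description of how IntelliScan works internally: the function names, the brightness checks, and their thresholds are all hypothetical. Each image is checked as it arrives, and the job stops at the first failure instead of running the whole batch.

```python
def mean_brightness(img):
    """Average pixel value of a greyscale image (list of rows, 0-255)."""
    pixels = [p for row in img for p in row]
    return sum(pixels) / len(pixels)


def monitor_batch(images, checks):
    """Check each image as it arrives; stop at the first failed check.

    Returns (image_index, check_name) for the first failure, or None
    if every image passes every check.
    """
    for index, image in enumerate(images):
        for name, check in checks.items():
            if not check(image):
                return index, name
    return None


# Hypothetical checks with illustrative thresholds.
checks = {
    "not_too_dark": lambda img: mean_brightness(img) >= 100,
    "not_blown_out": lambda img: mean_brightness(img) <= 250,
}

good = [[220, 220], [220, 30]]
dark = [[40, 40], [40, 40]]

# A generator stands in for images streaming off the scanner.
stream = (img for img in [good, good, dark, good])
print(monitor_batch(stream, checks))  # (2, 'not_too_dark')
```

Because the problem is reported at the third image, an operator can step in right away rather than discovering it after the full batch has been scanned.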