Digital Accessibility Tip: When to Use Optical Character Recognition (OCR) on PDFs

Looking to expand your knowledge of digital accessibility and learn quick strategies that you can easily integrate into your workflow? CATL’s blog post series on digital accessibility catalogues some helpful tips and tricks we’ve shared in our Teach Tuesday e-newsletter!

In this tip, we’ll discuss what optical character recognition (OCR) is, when to use it, and what your other options are for making accessible PDFs.

What is Optical Character Recognition (OCR)?

Optical character recognition, or OCR, is an automated process that turns an image of text into machine-readable characters that can be parsed by technologies like screen readers. OCR is an essential step for making some PDFs accessible, but it isn’t a one-size-fits-all approach. Knowing when to use it helps you make your course materials as accessible as possible.

When Do I Need to Use OCR on My PDFs?

Some PDFs are digitally created: they were designed on a computer with a word processing program and then saved as a PDF. These documents do not need to undergo optical character recognition because the text is already machine-readable. Digitally created PDFs may still need other forms of remediation to be made fully accessible, such as adding tags (e.g., headings, paragraphs), image alt text, and a document title, so be sure to use the accessibility checker in Acrobat or in the application where you created the document.

Other PDFs, though, are created from scans of a physical book or printed document. Each page of a scanned PDF is stored as an image, so screen readers and other assistive technologies cannot read its text until OCR has been run.

Not sure if your PDF is a scan? Try clicking and dragging to highlight the text – a machine-readable document will let you select individual lines of text and copy them, while a scan will not.

[Image: A scanned book page with a paragraph highlighted in blue. Caption: PDF with searchable text done through the Scan and OCR function in Adobe Acrobat.]
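
If you need to check many files at once, the highlight test can be approximated in a script. The Python sketch below is only an illustration. It assumes you have already pulled the text of each page out of the PDF with a PDF tool of your choice (that step is not shown), and the function name `looks_scanned` and the `min_chars_per_page` threshold are hypothetical choices, not features of Acrobat.

```python
def looks_scanned(page_texts, min_chars_per_page=20):
    """Return True if most pages have almost no extractable text.

    page_texts: one string per page, as produced by whatever PDF text
    extraction tool you use (assumed; not shown here).
    min_chars_per_page: hypothetical threshold below which a page is
    treated as having no machine-readable text.
    """
    if not page_texts:
        # No pages to inspect, so nothing suggests a scan.
        return False

    # Count pages whose non-whitespace text falls below the threshold.
    empty_pages = sum(
        1
        for text in page_texts
        if len("".join((text or "").split())) < min_chars_per_page
    )

    # Treat the file as a scan when more than half of its pages are empty.
    return empty_pages > len(page_texts) / 2
```

This is a heuristic: a digitally created PDF made up mostly of images would also be reported as a scan, so confirm any result by opening the file and trying to select text.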

While OCR can help make these PDFs more accessible, a document processed with OCR still requires additional review and remediation. This process can be difficult and time-consuming, so before you use OCR on a scanned document, consider the following alternatives:

  • Check the library catalog to see if there is a machine-readable e-text version of your article, book, or journal. If you cannot find your document in the library’s catalog, try contacting your department’s librarian liaison, and they may be able to help you track down or order a digital version.
  • Check online to see if there is an alternative version of your document. It is possible that the original author or publisher has created an HTML webpage version or a machine-readable PDF version that is available online.
  • Create an alternative version in Word or Canvas. For shorter excerpts of text, retyping the content as a Word document or Canvas page is often a faster and more accessible solution than trying to use OCR.
  • Consider using an alternative document or resource. Remember that you have the academic freedom to choose your course materials, but you are also legally responsible for ensuring those materials are accessible. Sometimes you may need to make compromises.

If none of the alternatives above is feasible, then OCR may be the best option for your document.

How Do I Use OCR?

OCR is an automated process, but it is just one of several steps for making a scanned PDF accessible. Follow these steps to make your document as accessible as possible:

  1. Start with a high-quality scan whenever possible. The optical character recognition process works best with high-resolution, clear scans. Keep in mind that a low-quality scan may be impossible to fully remediate.
  2. Use the Adobe Acrobat desktop app to recognize text in your PDF (faculty and staff have access to Acrobat through Adobe Creative Cloud). This process may take from a few seconds to a few minutes to complete. The video guide below walks through the process in less than three minutes.
  3. After text recognition has finished, review any suspect text flagged by Adobe and make necessary corrections.
  4. Remember to auto-tag your document (and review those tags), add alt text to any images, and set a document title.
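
Acrobat flags suspect text for you in step 3. If you are instead proofreading OCR output that has been copied out as plain text, a rough second pass might look like the Python sketch below. This is not how Acrobat identifies suspects. The function name `find_suspect_words` and the rule it applies (words that mix letters and digits, such as "c0urse") are assumptions chosen for illustration.

```python
import re

# Split text into runs of letters and digits. This simple pattern only
# covers unaccented Latin letters, which is an assumption of the sketch.
_WORD = re.compile(r"[A-Za-z0-9]+")


def find_suspect_words(text):
    """Return words that mix letters and digits, in order of appearance.

    Mixed words are a common sign that OCR confused look-alike
    characters such as O/0 or l/1.
    """
    suspects = []
    for word in _WORD.findall(text):
        has_letter = any(ch.isalpha() for ch in word)
        has_digit = any(ch.isdigit() for ch in word)
        if has_letter and has_digit:
            suspects.append(word)
    return suspects
```

Legitimate terms such as "1pm" or "B12" will also be flagged, so treat the result as a list to review by eye, not a list of confirmed errors.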

Looking for More Tips?

Explore even more quick tips in our Digital Accessibility Tips post, where you’ll find a growing list of strategies to help make your course materials more accessible.

Further Accessibility Training

Ready to dive deeper into digital accessibility? Essentials of Accessibility for Faculty and Staff is a free, self-paced, online course that will teach you the basics of digital accessibility and accessibility best practices for several key applications that UW-Green Bay employees may use in their daily work. We encourage you to self-enroll in the course to learn practical approaches for remediating digital accessibility issues in a variety of use cases and applications.

Connect with CATL

You are not alone in your accessibility journey! While CATL cannot advise on the legal specifications of Title II, instructors are always welcome to schedule a consultation with us or stop by our office (CL 405) to discuss the accessibility of your teaching materials.