Scanned PDF OCR Pipeline
Converts scanned PDF pages to images, then runs OCR (Windows or Tesseract engine) on each image, and stitches the extracted text together.
Problem this solves
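The built-in PDF text extractor returns blank output for scanned documents, because a scanned page is stored as an image with no embedded text layer. The workaround is to render each page as an image and run OCR on it.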
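The image-then-OCR workaround can be sketched in Python as below. This is a minimal illustration, not the script itself: the function names (`needs_ocr`, `stitch_ocr_text`, `ocr_scanned_pdf`) are hypothetical, and the choice of the third-party `pdf2image` and `pytesseract` packages for rendering and recognition is an assumption standing in for whichever page-to-image step and OCR engine (Windows or Tesseract) the automation actually uses.

```python
from typing import Callable, Iterable, List, TypeVar

Page = TypeVar("Page")


def needs_ocr(extracted_text: str) -> bool:
    """Treat whitespace-only extractor output as a sign the PDF is a scan."""
    return not extracted_text.strip()


def stitch_ocr_text(
    page_images: Iterable[Page],
    ocr_page: Callable[[Page], str],
    separator: str = "\n\n",
) -> str:
    """Run ocr_page on each image in order and join the non-empty results."""
    chunks: List[str] = []
    for image in page_images:
        text = ocr_page(image).strip()
        if text:  # skip pages where OCR found nothing
            chunks.append(text)
    return separator.join(chunks)


def ocr_scanned_pdf(pdf_path: str, dpi: int = 300) -> str:
    """Assumed wiring: pdf2image renders pages, pytesseract reads them."""
    # Imported lazily so the helpers above work without these packages.
    from pdf2image import convert_from_path  # needs Poppler installed
    import pytesseract  # needs the Tesseract binary installed

    images = convert_from_path(pdf_path, dpi=dpi)
    return stitch_ocr_text(images, pytesseract.image_to_string)
```

Passing the OCR step in as a callable keeps the stitching logic independent of the engine, so the same loop works whether each page is read by Tesseract or by another recognizer. A caller would typically check `needs_ocr` on the built-in extractor's output first and only fall back to `ocr_scanned_pdf` when it comes back blank.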
Dependencies
- PDF Text Extractor
- Invoice Data Extractor
- PDF Merge & Split