How to extract text from a PDF of scanned pages

PDFs are often just a collection of scanned pages or still images (.png, .jpg, .tiff, etc.), so even opening them in Adobe Acrobat might not let you select or copy the actual text. This is a recipe you can follow to extract the text.

  1. Take a screenshot of every page and save it under a sequential name (e.g. 1.jpg, 2.jpg, ...). Alternatively, export the PDF pages as still images; some PDF viewers let you do that.
  2. Upload each image into Google Drive.
  3. Right-click each image and select "Open with", then "Google Docs".
  4. Open the new Google Docs file. It contains the image and, right below it, the text recognized by OCR.
  5. Copy the text out into your favorite text editor.
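
If your PDF viewer cannot export images, step 1 can be scripted. The following is only a minimal sketch, assuming the Poppler command-line tool pdftoppm is installed and on your PATH; the function names and the "page" output prefix are made up for illustration and are not part of the recipe above. It renders every page as a JPEG and renames the results to 1.jpg, 2.jpg, and so on, ready to upload in step 2.

```python
import re
import shutil
import subprocess
from pathlib import Path

# pdftoppm names its output <prefix>-<page>.jpg, zero-padding the page number.
PAGE_PATTERN = re.compile(r"-(\d+)\.jpg$")


def build_export_command(pdf_path, out_prefix, dpi=300):
    """Return the pdftoppm argument list that renders each page as a JPEG."""
    return ["pdftoppm", "-jpeg", "-r", str(dpi), str(pdf_path), str(out_prefix)]


def page_number(path):
    """Extract the page number from a name such as page-03.jpg (returns 3)."""
    match = PAGE_PATTERN.search(Path(path).name)
    if match is None:
        raise ValueError(f"no page number in {path}")
    return int(match.group(1))


def rename_sequentially(out_dir, prefix="page"):
    """Rename page-01.jpg, page-02.jpg, ... to 1.jpg, 2.jpg, ... in page order."""
    out_dir = Path(out_dir)
    # Ignore files that match the glob but do not end in a page number.
    pages = [p for p in out_dir.glob(f"{prefix}-*.jpg") if PAGE_PATTERN.search(p.name)]
    renamed = []
    for page in sorted(pages, key=page_number):
        target = out_dir / f"{page_number(page)}.jpg"
        page.rename(target)
        renamed.append(target)
    return renamed


def export_pdf_pages(pdf_path, out_dir, dpi=300):
    """Render every page of pdf_path into out_dir as 1.jpg, 2.jpg, ..."""
    if shutil.which("pdftoppm") is None:
        raise RuntimeError("pdftoppm not found; install Poppler first")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    command = build_export_command(pdf_path, out_dir / "page", dpi)
    subprocess.run(command, check=True)
    return rename_sequentially(out_dir)
```

Calling export_pdf_pages("scan.pdf", "pages") would then leave the numbered images in a "pages" folder (both names are placeholders). A resolution of 300 DPI is a common choice for OCR input, but it is a guess here; lower it if the files get too large to upload comfortably.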