How to extract text from a PDF of scanned pages
PDFs are often just a collection of scanned pages or still images (.png, .jpg, .tiff, etc.), so even opening them in Adobe Acrobat might not let you copy the actual text. Here is a recipe you can follow to extract it.
- Take a screenshot of every page and save it under a sequential name (e.g. 1.jpg, 2.jpg, ...). Alternatively, export the PDF as still images; some PDF clients let you do that.
- Upload each image to Google Drive.
- Right-click each image and select "Open with", then "Google Docs".
- Open the new document. It contains the image and, right under it, the text recognized by OCR.
- Copy the text out into your favorite text editor.
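
If the document has many pages, the last step can get tedious, and files named 1, 2, ..., 10 sort alphabetically as 1, 10, 2. The following is a minimal Python sketch for putting the pieces back in page order. It assumes (this is not part of the recipe above) that you saved the text copied from each page as 1.txt, 2.txt, ... in a single folder. The function names and the output file name all_pages.txt are hypothetical, chosen only for illustration.

```python
from pathlib import Path


def pages_in_order(folder, suffix=".jpg"):
    """Return files named 1<suffix>, 2<suffix>, ... sorted by page number."""
    pages = [
        p for p in Path(folder).iterdir()
        # Keep only files whose name is a plain number, e.g. "12.jpg".
        if p.suffix.lower() == suffix and p.stem.isdecimal()
    ]
    # Sort numerically so that page 10 comes after page 9, not after page 1.
    return sorted(pages, key=lambda p: int(p.stem))


def merge_page_texts(folder, output_name="all_pages.txt"):
    """Join 1.txt, 2.txt, ... into one file, separated by blank lines."""
    parts = [
        p.read_text(encoding="utf-8").strip()
        for p in pages_in_order(folder, ".txt")
    ]
    output = Path(folder) / output_name
    output.write_text("\n\n".join(parts) + "\n", encoding="utf-8")
    return output
```

Calling merge_page_texts("my_scans") would write my_scans/all_pages.txt, where "my_scans" stands for whatever folder holds your per-page text files. The same pages_in_order helper can be used to list the images in page order before uploading them.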