Extract text from image or pdf
10 March 2021
How can you effectively extract text from a PDF or an image? This is commonly called OCR (optical character recognition). I found two extremely powerful tools built on the open-source engine Tesseract.
I use Windows, and both tools run on it. The first converts scanned PDFs into searchable PDFs whose text you can also copy. The second lets you capture an area of your screen, converts it to text and stores the result in your clipboard.
- OCRmyPDF
- you need to run it in Ubuntu on Windows (WSL)
- update the apt package index:
sudo apt-get update
- install it:
sudo apt install ocrmypdf
- normcap (https://github.com/dynobo/normcap)
- easy to install: just download and run the .exe
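If you prefer to stay in the Ubuntu terminal, here is a minimal sketch of two shell helpers covering both use cases. The function names `ocr_pdf` and `ocr_image_to_clipboard` are hypothetical. The sketch assumes OCRmyPDF was installed as above (the Ubuntu package also installs the `tesseract` command) and that the Windows `clip.exe` tool is reachable from WSL, which it normally is.

```shell
#!/usr/bin/env bash
# Sketch only: helpers meant to be sourced inside Ubuntu on Windows (WSL).
# Assumes ocrmypdf (and therefore tesseract) is installed via apt.

# ocr_pdf INPUT.pdf OUTPUT.pdf   (hypothetical helper name)
# Makes a scanned PDF searchable and also writes the recognized text
# to a .txt file with the same base name as OUTPUT.
ocr_pdf() {
  local input="$1" output="$2"
  if [ ! -f "$input" ]; then
    echo "error: '$input' not found" >&2
    return 1
  fi
  # -l eng      : Tesseract language code used for recognition
  # --skip-text : leave pages that already contain text untouched
  # --sidecar   : save the recognized text to a separate text file
  ocrmypdf -l eng --skip-text --sidecar "${output%.pdf}.txt" "$input" "$output"
}

# ocr_image_to_clipboard IMAGE   (hypothetical helper name)
# Runs Tesseract on an image file and copies the text to the Windows
# clipboard through clip.exe, roughly what normcap does from its GUI.
ocr_image_to_clipboard() {
  local image="$1"
  if [ ! -f "$image" ]; then
    echo "error: '$image' not found" >&2
    return 1
  fi
  # Using "stdout" as the output base makes tesseract print the text
  # instead of writing it to a .txt file.
  tesseract "$image" stdout -l eng | clip.exe
}
```

For example, after `source ocr-helpers.sh` (any file name works), `ocr_pdf scan.pdf scan_ocr.pdf` should produce `scan_ocr.pdf` and `scan_ocr.txt`. Unlike normcap, the image helper works on a file you have already saved, not on a live screen selection.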
Give them a try :)