Matthew M.
Freelance Translator Spanish to English at Self-Employed

What is the best way to get good quality conversions from PDFs to Word documents that have complicated pictures and text?

I have a translation or two to do where the source PDFs have a good number of random marks on them. Is there a good way to convert these to Word documents while preserving the formatting? I am not interested in the random marks, of course, only in the text.
2 comments
SF
CEO at SCS Computer Consultants, Inc.
0
The OCR does a phenomenal job of handling stray marks, creases in the paper, etc. They will be turned into graphics, while the text is handled as separate blocks. Normally, the document can be edited as-is by deleting the extraneous graphic blocks, but I've found that if the OCRed document is too cluttered, you will have to resort to copying and pasting the text. Several scans I made had graphics UNDER the text. The last time this happened, it was a picture of a crease on the page. I had to resort to copying and pasting because the graphic could not be removed without removing most of the text along with it.
Gary F.
Independent Publishing Professional
0
It's unlikely you'll find anything that can perfectly differentiate between legitimate text and random blemishes or marks. I would suggest using PDFelement's OCR module (under the "Convert" menu in PDFelement Pro) to extract as much of the text as possible, then making corrections as needed to the text (while still in PDFelement), and finally saving as a Word document (also under "Convert"). I tried this on my own system before writing the above; it works fine.
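If you end up copying the OCRed text out and cleaning it by hand, a small script can do a rough first pass on the stray marks. The following is only a sketch under stated assumptions, not a feature of PDFelement or any other OCR tool: it assumes the text has already been pasted into a plain string, and the function names `looks_like_text` and `clean_ocr_text` and the 0.6 threshold are made up for illustration. It keeps a line only when most of its characters are letters or digits, so lines of specks and dashes are dropped. It will not be perfect, and you would still want to proofread the result.

```python
def looks_like_text(line: str, min_ratio: float = 0.6) -> bool:
    """Return True if most non-space characters are letters or digits.

    min_ratio is an assumed, tunable threshold, not a standard value.
    """
    chars = [c for c in line if not c.isspace()]
    if not chars:
        # Empty or whitespace-only lines are treated as non-text.
        return False
    alnum = sum(c.isalnum() for c in chars)
    return alnum / len(chars) >= min_ratio


def clean_ocr_text(raw: str, min_ratio: float = 0.6) -> str:
    """Drop lines that look like OCR noise (stray marks, specks, creases).

    Note: blank lines are dropped as well, so paragraph breaks are lost.
    """
    kept = [
        line.strip()
        for line in raw.splitlines()
        if looks_like_text(line, min_ratio)
    ]
    return "\n".join(kept)


if __name__ == "__main__":
    # Hypothetical OCR output: two real sentences mixed with noise lines.
    sample = (
        "Estimado señor García:\n"
        "~ . ' ` | _\n"
        "Le escribo para confirmar la reunión.\n"
        ".. -- ,\n"
    )
    print(clean_ocr_text(sample))
```

Because `str.isalnum()` is Unicode-aware, accented Spanish letters such as "ñ" and "ó" count as text. Lowering `min_ratio` keeps more borderline lines; raising it removes more noise at the risk of dropping lines that contain heavy punctuation.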