Freelance Translator Spanish to English at Self-Employed
What is the best way to get good quality conversions from PDFs to Word documents that have complicated pictures and text?
I have a translation or two to do from PDFs that have a good number of random marks on them. Is there a good way to convert these to Word documents while preserving the formatting? I am not interested in the random marks, of course, only in the text.
The OCR does a phenomenal job of handling stray marks, creases in the paper, etc. They will be turned into graphics, while the text is handled as separate blocks.
Normally, the document can be edited as-is by deleting the extraneous graphic blocks. But I've found that if the OCRed document is too cluttered, you will have to resort to copying and pasting the text.
Several scans I made had graphics UNDER the text. The last time this happened, it was a picture of a crease on the page. I had to resort to copying and pasting because the graphic refused to be removed without taking most of the text along with it.
It's unlikely you'll find anything that can perfectly differentiate between legitimate text and random blemishes or marks. I would suggest using PDFelement's OCR module (under the "Convert" menu in PDFelement Pro) to extract as much of the text as possible, then making corrections to the text as needed (while still in PDFelement), and finally saving as a Word document (also under "Convert").
I tried this on my own system before writing the above; it works fine.
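If you do end up copying and pasting the OCRed text, a small script can take a first pass at the leftover junk lines before you correct the rest by hand. This is only a rough sketch, not anything built into PDFelement: the function names and the 0.6 threshold are my own assumptions, and it expects the text to be pasted in or saved as plain text. It will not catch every stray mark, so still proofread the result.

```python
import re


def looks_like_text(line: str, min_alnum_ratio: float = 0.6) -> bool:
    """Heuristic guess: True if a line is mostly letters/digits and has a word.

    min_alnum_ratio is an assumed cutoff, not a tuned value.
    """
    visible = [ch for ch in line if not ch.isspace()]
    if not visible:
        return False
    alnum_ratio = sum(ch.isalnum() for ch in visible) / len(visible)
    # At least two consecutive letters (accented letters count too).
    has_word = re.search(r"[^\W\d_]{2,}", line) is not None
    return alnum_ratio >= min_alnum_ratio and has_word


def drop_stray_marks(ocr_text: str) -> str:
    """Remove lines that look like OCR noise; keep blank lines as paragraph breaks."""
    kept = [
        line
        for line in ocr_text.splitlines()
        if not line.strip() or looks_like_text(line)
    ]
    return "\n".join(kept)


if __name__ == "__main__":
    sample = "El contrato entra en vigor hoy.\n~ . ' ; |\n\nFirmado en Madrid, 2019."
    print(drop_stray_marks(sample))
```

Short legitimate lines made mostly of punctuation or numbers (a bare page number, for instance) would be dropped too, so lower the threshold or review what was removed if that matters for your document.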