I have a PDF of an academic paper which looks as though it's been scanned, and is not searchable. I wanted to convert it to a Word file in which the pages are text rather than images. I tried to do so using PDF24's OCR and then its Convert function. However, although the OCR step generates a searchable PDF, when I convert that to DOCX, I get a file whose pages are images. How can I get round this?

I'll finish by inserting screenshots of what I did. This is using a copy of PDF24 installed yesterday onto Windows 10.

    1. The original PDF. As I said, this looks like someone's scan. The first word on the page is "purple", but Find can't recognise it, showing that the file is not searchable.
    2. The OCR'd PDF. This shows an earlier page. The paper is about Roman textiles, and a search for "Roman" found 30 occurrences, of which the second is shown. This PDF clearly is searchable.
    3. The DOCX file to which the OCR'd PDF was converted. This shows it open in LibreOffice Writer (I don't have Word). You can see a box around the content. Dragging its top left vertex pulls the content along as if it were an image. Searching for the word "Roman" (and for other words I know are there, such as "wool") finds nothing.
Stefan Ziegler Answered question 2021-01-11

This is Phil van Kleur, who wrote the question above. I spent over half an hour preparing screenshots to include in the question, only to find when I'd uploaded them all that the system refused to accept the question. It kept saying "There is an error in your image", but would not explain what the error was. By trial and error, I found I had to remove all the images before I could submit. I was expecting I could then go back and remove the text referring to them, but I can't find a control for editing submitted questions. Unlike StackExchange, do you not allow that, or am I missing something?

Anyway, I would really appreciate it if the site would explain why it won't accept my images, instead of just saying there's an error. I prepared them carefully to show exactly what each file looked like and which settings I used.