The screenshot below shows the stages of converting a PDF to Word and then trying to edit the result. Although the PDF contains real text, the Word version has come across as images. I'm very confused by this. I may be using the wrong settings, so I've put up a temporary copy of the PDF at http://www.filedropper.com/ddwild2002thetextileindustriesofromanbritain . It's an academic paper with book-style pagination, containing text and a few diagrams. I'd be very grateful for advice.

Below, I explain how I converted it in PDF24 and why the result isn't what I hoped for. In the top-left corner of the screenshot, I've opened my local copy in Microsoft Edge, which I'm using only as a viewer. You can see that I've selected some text. To prove that it is text, I pasted it into Notepad, shown in the top-right corner. The result may be a bit small to read, but text has definitely been copied from the PDF.

In the bottom-left corner is what PDF24 shows after converting the original PDF to Word.

Below the Notepad window is the result of opening the converted file in Word. Note the green square and red circle at the top of the page. These show that Word is treating the page as an image.

Under that window is another Word conversion of the same PDF, this one made with a converter called SmallPDF. You can see that it is text, because I was able to edit the title. I have also converted the original PDF with Acrobat DC, Word's own PDF importer, and Kofax Power PDF. They all generate Word files containing text I can edit. Only PDF24 generates images. It does this whether or not I also ask it to OCR the PDF, even though none of the other converters needed OCR at all.

I can't find a version number on PDF24, but it's running on Windows 10 and was installed less than a month ago.

Thanks!

Stefan Ziegler Answered question 2021-01-18

This is similar to my question at https://help.pdf24.org/en/questions/question/why-does-ocring-and-converting-to-docx-leave-the-docx-file-as-images/ , except that there, the PDF being converted consisted of images of text. Here, it's real text, as shown by my being able to copy from it. As PDF24 gave me images in both cases, I wonder: is it designed to fall back to images when it can't work out how to convert the PDF to editable Word text?