OCR Serbian

Question

558 views · 2025-06-05 · PDF24 Creator

mikolajddd 14 2025-06-05 0 Comments

Hi,

I want to OCR a document written in Serbian (Cyrillic script), Latin and German. It's a 19th-century book. After I start the OCR process, the pages are optimized and then an error appears. It says:

Could not open data file srp_latin.traineddata. Try resetting!

I tried resetting, but the error still appears. To be clear, I don't need Serbian-Latin OCR, but normal, Cyrillic Serbian.

Stefan Ziegler Answered question 2025-06-05

3 Answers

score 0 · Answer 1 · 2025-06-05T13:43:11+00:00

Stefan Ziegler 1.85K Posted 2025-06-05 8 Comments

Please send me your trainDataList.txt file to forum@pdf24.org, which is located in the %LOCALAPPDATA%\PDF24\tesseract\5.4.1 directory.
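Before sending anything, it may help to check what is actually inside that folder. The following is a minimal sketch, not part of PDF24: the function name `inspect_tesseract_dir` is hypothetical, and the folder layout (a version subfolder holding trainDataList.txt, plus `*.traineddata` language files somewhere below) is an assumption based on the path mentioned above.

```python
import os
from pathlib import Path


def inspect_tesseract_dir(base: Path) -> dict:
    """Summarise what is present under a PDF24 tesseract folder.

    Hypothetical helper: the layout is assumed, not documented by PDF24.
    """
    if not base.is_dir():
        return {"exists": False, "train_data_lists": [], "languages": []}
    # Look for trainDataList.txt anywhere below base (e.g. in a version subfolder).
    lists = sorted(str(p.relative_to(base)) for p in base.rglob("trainDataList.txt"))
    # Language models are files named <code>.traineddata; keep just the code part.
    languages = sorted({p.stem for p in base.rglob("*.traineddata")})
    return {"exists": True, "train_data_lists": lists, "languages": languages}


if __name__ == "__main__":
    # %LOCALAPPDATA% is only set on Windows; an empty value yields a relative path.
    base = Path(os.environ.get("LOCALAPPDATA", "")) / "PDF24" / "tesseract"
    report = inspect_tesseract_dir(base)
    print("Folder exists:       ", report["exists"])
    print("trainDataList.txt at:", report["train_data_lists"] or "not found")
    print("Language files:      ", report["languages"] or "none")
```

If the report shows no trainDataList.txt, or the language named in the error message is missing from the list, that narrows down whether the download of language data failed or the file list itself is missing.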

Stefan Ziegler Posted new comment 2025-06-06

mikolajddd commented 2025-06-05

I don't have the file. The PDF24\tesseract folder only contains a tessdata folder, LICENSE and tesseract.exe.

Stefan Ziegler commented 2025-06-05

Are you using the latest version of PDF24 Creator? Which version do you use?

mikolajddd commented 2025-06-05

11.26.1. Should I reinstall?

Stefan Ziegler commented 2025-06-05

That's fine. Try deleting the %LOCALAPPDATA%\PDF24\tesseract folder and then run OCR again.

mikolajddd commented 2025-06-05

After I deleted the folder, I get an error saying something is wrong with the internet connection and that it can't download the language files. But my connection is fine.
