Hi,
I want to OCR a document written in Serbian (Cyrillic script), Latin and German. It's a 19th century book. After I start the OCR process the pages are optimized, and then an error appears. It says:
Could not open data file srp_latin.traineddata. Try resetting!
I tried resetting, but the error still appears. To be clear, I don't need Serbian-Latin OCR, but normal, Cyrillic Serbian.
Please send me your trainDataList.txt file to forum@pdf24.org, which is located in the %LOCALAPPDATA%\PDF24\tesseract\5.4.1 directory.
Do you use the latest version of pdf24 creator? Which version do you use?
11.26.1. Should I reinstall?
That's fine. Try deleting the %LOCALAPPDATA%\PDF24\tesseract folder and then run the OCR again.
After I deleted the folder, I get an error saying something is wrong with the internet connection and that the language files can't be downloaded. But my connection is fine.
The file name srp_latin.traineddata is wrong; it should be srp_latn.traineddata, without the i.
Sorry, where is the file supposed to be located?
Which language files did you select?
Language files are downloaded from the internet if you do not have them installed locally, so for the download, an internet connection is required.
I have a stable internet connection. I already OCRed documents in Russian, Ukrainian, Belarusian, Polish, and the only error I get is with Serbian.
I don't have the file. The PDF24\tesseract folder only contains a tessdata folder, LICENSE and tesseract.exe.
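In case it helps with checking which language files are actually present, here is a minimal Python sketch that lists the .traineddata files in a folder and reports which of the needed ones are missing. It is not part of PDF24; the function names are made up for illustration, and the tessdata location is an assumption pieced together from the paths quoted in this thread. The codes srp (Serbian, Cyrillic), lat (Latin) and deu (German) are the usual Tesseract language codes.

```python
import os
from pathlib import Path

# Tesseract language codes for the languages mentioned in this thread.
# "srp" is Serbian in Cyrillic script; "srp_latn" (no "i") is Serbian in Latin script.
WANTED = ["srp", "lat", "deu"]


def installed_languages(tessdata_dir):
    """Return the sorted language codes that have a .traineddata file in tessdata_dir."""
    folder = Path(tessdata_dir)
    if not folder.is_dir():
        return []
    return sorted(p.stem for p in folder.glob("*.traineddata"))


def missing_languages(tessdata_dir, wanted=WANTED):
    """Return the codes from wanted that have no .traineddata file yet."""
    have = set(installed_languages(tessdata_dir))
    return [code for code in wanted if code not in have]


if __name__ == "__main__":
    # Assumed location, based on the paths quoted in this thread; adjust if yours differs.
    tessdata = Path(os.environ.get("LOCALAPPDATA", "")) / "PDF24" / "tesseract" / "tessdata"
    print("Installed:", installed_languages(tessdata))
    print("Missing:", missing_languages(tessdata))
```

If srp shows up under "Missing" while the other languages are installed, that would at least confirm the Serbian file never arrived in the folder.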