PDF-OCR command line
2023-05-26 at 12:10 #63211 dolezelt (Participant)
Hi
The PDF OCR app does not let me drop a whole directory (with its subdirectories) onto it; I can only drag individual files. So I tried a command-line batch file instead.
I have this batch, which looks OK to me, but it produces no output file:
@For /R "C:\Users\dolez\Desktop\SO 03" %%G In (*.pdf) Do @For %%H In ("%%~dpG.") Do @"%ProgramFiles%\PDF24\pdf24-Ocr.exe" -outputFile "%%~nxH-%%~nG_ocred%%~xG" -language ces -dpi 300 -deskew -autoRotatePages "%%G"
The batch printed this output:
================
"C:\Program Files\PDF24\tesseract\tesseract.exe" -v
----------------
================
Optimizing PDF
================
"C:\Program Files\PDF24\jre\bin\java.exe" -cp "C:\Program Files\PDF24\lib\jar\*" -Dwindows.acp=1252 "org.pdf24.OcrPdfOptimizer" "-deskew" "C:\Users\dolez\Desktop\SO 03\Mar.pdf" "C:\Users\dolez\AppData\Local\Temp\PDF24\ocr_0_1211166406_1581469258_opt.pdf"
----------------
OPTIMIZE> PDF24 PDF OCR Optimizer
OPTIMIZE>
OPTIMIZE> 2023.05.26 14:09:35 main: Start
OPTIMIZE> 2023.05.26 14:09:35 main: Skipping page 1; images collected: 9
OPTIMIZE> 2023.05.26 14:09:35 main: Skipping page 2; images collected: 3
OPTIMIZE> 2023.05.26 14:09:35 main: Skipping page 3; images collected: 4
OPTIMIZE> 2023.05.26 14:09:35 main: Skipping page 4; images collected: 4
OPTIMIZE> 2023.05.26 14:09:35 main: Images collected: 0
OPTIMIZE> 2023.05.26 14:09:35 main: Done
================
================
"C:\Program Files\PDF24\gs\bin\gswinc.exe" -dBATCH -dNOPAUSE -dSAFER -dALLOWPSTRANSPARENCY "-sFONTPATH=C:\WINDOWS\Fonts" -dNEWPDF=true -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r300 -dFirstPage=1 -dLastPage=1 -sDEVICE=png16m -dDownScaleFactor=1 "-sOutputFile=C:\Users\dolez\AppData\Local\Temp\PDF24\ocr_1_1211167078_1229304252.png" "C:\Users\dolez\AppData\Local\Temp\PDF24\ocr_0_1211166406_1581469258_opt.pdf"
----------------
GPL Ghostscript 10.01.1 (2023-03-27)
Copyright (C) 2023 Artifex Software, Inc. All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
Processing pages 1 through 1.
Page 1
================
================
"C:\Program Files\PDF24\tesseract\tesseract.exe" "--tessdata-dir" "C:\Users\dolez\AppData\Local\PDF24\tesseract\5.3.0\tessdata" "C:\Users\dolez\AppData\Local\Temp\PDF24\ocr_1_1211167078_1229304252.png" "C:\Users\dolez\AppData\Local\Temp\PDF24\ocr_2_1211167859_1337565084" "-l" "ces" "-c" "textonly_pdf=1" "--dpi" "300" "--oem" "3" "--psm" "1" "pdf" "txt"
----------------
================
================
"C:\Program Files\PDF24\gs\bin\gswinc.exe" -dBATCH -dNOPAUSE -dSAFER -dALLOWPSTRANSPARENCY "-sFONTPATH=C:\WINDOWS\Fonts" -dNEWPDF=true -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r300 -dFirstPage=2 -dLastPage=2 -sDEVICE=png16m -dDownScaleFactor=1 "-sOutputFile=C:\Users\dolez\AppData\Local\Temp\PDF24\ocr_3_1211169906_2639222066.png" "C:\Users\dolez\AppData\Local\Temp\PDF24\ocr_0_1211166406_1581469258_opt.pdf"
----------------
GPL Ghostscript 10.01.1 (2023-03-27)
Copyright (C) 2023 Artifex Software, Inc. All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
Processing pages 2 through 2.
Page 2
================
================
"C:\Program Files\PDF24\tesseract\tesseract.exe" "--tessdata-dir" "C:\Users\dolez\AppData\Local\PDF24\tesseract\5.3.0\tessdata" "C:\Users\dolez\AppData\Local\Temp\PDF24\ocr_3_1211169906_2639222066.png" "C:\Users\dolez\AppData\Local\Temp\PDF24\ocr_4_1211170640_2528216152" "-l" "ces" "-c" "textonly_pdf=1" "--dpi" "300" "--oem" "3" "--psm" "1" "pdf" "txt"
----------------
================
================
"C:\Program Files\PDF24\gs\bin\gswinc.exe" -dBATCH -dNOPAUSE -dSAFER -dALLOWPSTRANSPARENCY "-sFONTPATH=C:\WINDOWS\Fonts" -dNEWPDF=true -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r300 -dFirstPage=3 -dLastPage=3 -sDEVICE=png16m -dDownScaleFactor=1 "-sOutputFile=C:\Users\dolez\AppData\Local\Temp\PDF24\ocr_5_1211171921_18012576.png" "C:\Users\dolez\AppData\Local\Temp\PDF24\ocr_0_1211166406_1581469258_opt.pdf"
----------------
GPL Ghostscript 10.01.1 (2023-03-27)
Copyright (C) 2023 Artifex Software, Inc. All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
Processing pages 3 through 3.
Page 3
================
================
"C:\Program Files\PDF24\tesseract\tesseract.exe" "--tessdata-dir" "C:\Users\dolez\AppData\Local\PDF24\tesseract\5.3.0\tessdata" "C:\Users\dolez\AppData\Local\Temp\PDF24\ocr_5_1211171921_18012576.png" "C:\Users\dolez\AppData\Local\Temp\PDF24\ocr_6_1211172625_3814948804" "-l" "ces" "-c" "textonly_pdf=1" "--dpi" "300" "--oem" "3" "--psm" "1" "pdf" "txt"
----------------
================
================
"C:\Program Files\PDF24\gs\bin\gswinc.exe" -dBATCH -dNOPAUSE -dSAFER -dALLOWPSTRANSPARENCY "-sFONTPATH=C:\WINDOWS\Fonts" -dNEWPDF=true -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r300 -dFirstPage=4 -dLastPage=4 -sDEVICE=png16m -dDownScaleFactor=1 "-sOutputFile=C:\Users\dolez\AppData\Local\Temp\PDF24\ocr_7_1211173906_2698855182.png" "C:\Users\dolez\AppData\Local\Temp\PDF24\ocr_0_1211166406_1581469258_opt.pdf"
----------------
GPL Ghostscript 10.01.1 (2023-03-27)
Copyright (C) 2023 Artifex Software, Inc. All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
Processing pages 4 through 4.
Page 4
================
================
"C:\Program Files\PDF24\tesseract\tesseract.exe" "--tessdata-dir" "C:\Users\dolez\AppData\Local\PDF24\tesseract\5.3.0\tessdata" "C:\Users\dolez\AppData\Local\Temp\PDF24\ocr_7_1211173906_2698855182.png" "C:\Users\dolez\AppData\Local\Temp\PDF24\ocr_8_1211174609_2574622620" "-l" "ces" "-c" "textonly_pdf=1" "--dpi" "300" "--oem" "3" "--psm" "1" "pdf" "txt"
----------------
ERR> Too few characters. Skipping this page
TESS> Too few characters. Skipping this page
ERR> OSD: Weak margin (0.00) for 13 blob text block, but using orientation anyway: 0
TESS> OSD: Weak margin (0.00) for 13 blob text block, but using orientation anyway: 0
================
Auto Rotating PDF Pages
================
"C:\Program Files\PDF24\jre\bin\java.exe" -cp "C:\Program Files\PDF24\lib\jar\*" -Dwindows.acp=1252 "org.pdf24.PdfPagesAutoRotator" "C:\Users\dolez\AppData\Local\Temp\PDF24\ocr_9_1211175609_2384946173_ocred.pdf" "C:\Users\dolez\AppData\Local\Temp\PDF24\ocr_10_1211175625_419331387_rot.pdf"
----------------
AUTOROT> PDF24 PDF Pages Auto Rotator
AUTOROT>
================
C:\Users\dolez\Desktop>pause
Press any key to continue . . .
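One thing worth checking in the batch above is where the output file would land. The `-outputFile` value is built only from file-name parts (`%%~nxH`, `%%~nG`, `%%~xG`), so it carries no drive or directory. The sketch below reproduces that expansion in Python for the file seen in the log. The helper names are hypothetical, and resolving the relative name against the current directory is an assumption about how pdf24-Ocr.exe treats it, not something the log confirms.

```python
from pathlib import PureWindowsPath


def batch_output_name(input_pdf: str) -> str:
    """Mimic "%%~nxH-%%~nG_ocred%%~xG" from the batch line."""
    g = PureWindowsPath(input_pdf)  # %%G: full path of the PDF found by For /R
    h = g.parent                    # %%H: "%%~dpG." i.e. the folder holding the PDF
    # %%~nxH = folder name, %%~nG = file name without extension, %%~xG = extension
    return f"{h.name}-{g.stem}_ocred{g.suffix}"


def resolved_output(input_pdf: str, current_dir: str) -> PureWindowsPath:
    """Assumption: a relative output name is resolved against the current directory."""
    return PureWindowsPath(current_dir) / batch_output_name(input_pdf)


name = batch_output_name(r"C:\Users\dolez\Desktop\SO 03\Mar.pdf")
print(name)  # SO 03-Mar_ocred.pdf
print(resolved_output(r"C:\Users\dolez\Desktop\SO 03\Mar.pdf", r"C:\Users\dolez\Desktop"))
```

If that assumption holds, any result would be written to the directory the batch was started from (`C:\Users\dolez\Desktop` according to the prompt in the log), not next to the source PDF.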
2023-05-29 at 08:16 #63213 dolezelt (Participant)
Hi
My previous question may have been hard to follow because the batch is complicated. All the batch should do is run OCR over the whole content of a directory (including subdirectories). I also tried a simplified batch, with the same result.
The command is (simplified): pdf24-Ocr.exe -outputFile -language ces -dpi 300 -deskew -autoRotatePages -inputfile
I tested the command on one specific file: the job starts but produces no output PDF file (I also tried running the batch as administrator on Win 11). No temporary output files were left in the directory C:\Users\dolez\AppData\Local\Temp\PDF24\ either.
Any help?
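For comparison, here is a minimal Python sketch of the same recursive job, which may be easier to debug than the nested `For` one-liner. It uses only the flags that appear in the batch above and passes the input file as the last argument, as the batch does. The install path is taken from the log, the function names are hypothetical, and writing the output next to the input as an absolute path is an assumption made for diagnosis, not a confirmed fix.

```python
import subprocess
from pathlib import Path

# Assumed install location, inferred from the "C:\Program Files\PDF24" paths in the log.
OCR_EXE = r"C:\Program Files\PDF24\pdf24-Ocr.exe"


def build_command(pdf: Path, exe: str = OCR_EXE) -> list[str]:
    """Build one OCR command line using only the flags from the original batch."""
    # Assumption: an absolute output path next to the input, so the result
    # does not depend on the directory the script is started from.
    out = pdf.with_name(f"{pdf.stem}_ocred{pdf.suffix}")
    return [exe, "-outputFile", str(out), "-language", "ces",
            "-dpi", "300", "-deskew", "-autoRotatePages", str(pdf)]


def find_pdfs(root: Path) -> list[Path]:
    """Return every PDF under root, recursively, skipping earlier *_ocred results."""
    return sorted(
        p.resolve()
        for p in root.rglob("*.pdf")
        if p.is_file() and not p.stem.endswith("_ocred")
    )


def run_all(root: Path, dry_run: bool = True) -> list[list[str]]:
    """Print the commands (dry run) or execute them one by one."""
    commands = [build_command(p) for p in find_pdfs(root)]
    for cmd in commands:
        if dry_run:
            print(subprocess.list2cmdline(cmd))  # show the exact quoting that would be used
        else:
            result = subprocess.run(cmd, check=False)
            print(f"exit code {result.returncode}: {cmd[-1]}")
    return commands
```

Calling `run_all(Path(r"C:\Users\dolez\Desktop\SO 03"))` only prints the command lines; passing `dry_run=False` runs them and reports each exit code, which shows whether pdf24-Ocr.exe itself signals a failure.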