PDF-OCR command line

  • #63211
    dolezelt
    Participant

    Hi

    The PDF OCR app does not let me drop a whole directory (with its subdirectories) onto it; I can only drag in individual files. So I tried a command-line batch file instead.

    I have this batch file, which looks OK to me, but it produces no output file:

    @For /R "C:\Users\dolez\Desktop\SO 03" %%G In (*.pdf) Do @For %%H In ("%%~dpG.") Do @"%ProgramFiles%\PDF24\pdf24-Ocr.exe" -outputFile "%%~nxH-%%~nG_ocred%%~xG" -language ces -dpi 300 -deskew -autoRotatePages "%%G"
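
To make the output name in the one-liner easier to read, here is a minimal Python sketch of what the `for` modifiers expand to: `%%~nxH` is the name of the folder containing the PDF, `%%~nG` is the file name without extension, and `%%~xG` is the extension. The function names `output_name` and `absolute_output` are made up for illustration and are not part of PDF24. The sketch only builds strings and does not call pdf24-Ocr.exe.

```python
from pathlib import PureWindowsPath


def output_name(pdf_path: str) -> str:
    """Rebuild the name that "%%~nxH-%%~nG_ocred%%~xG" expands to."""
    p = PureWindowsPath(pdf_path)
    folder = p.parent.name  # %%~nxH: name of the folder containing the PDF
    stem = p.stem           # %%~nG: file name without its extension
    ext = p.suffix          # %%~xG: extension, including the dot
    return f"{folder}-{stem}_ocred{ext}"


def absolute_output(pdf_path: str) -> str:
    """Hypothetical variant: put the output next to the input PDF."""
    p = PureWindowsPath(pdf_path)
    return str(p.with_name(output_name(pdf_path)))


source = r"C:\Users\dolez\Desktop\SO 03\Mar.pdf"
print(output_name(source))      # relative name, no drive or folder
print(absolute_output(source))  # same name with the input's folder prepended
```

The name passed to -outputFile therefore carries no drive or folder. Assuming pdf24-Ocr.exe resolves relative paths in the usual way (not confirmed here), the file would be looked for in the directory the batch file runs from rather than next to the input. In batch terms, the absolute variant corresponds to prefixing the name with `%%~dpG`. Whether this accounts for the missing output is not shown by the log.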

    The batch file printed this output:

    ================

    "C:\Program Files\PDF24\tesseract\tesseract.exe" -v

    ----------------

    ================

    Optimizing PDF

    ================

    "C:\Program Files\PDF24\jre\bin\java.exe" -cp "C:\Program Files\PDF24\lib\jar\*" -Dwindows.acp=1252 "org.pdf24.OcrPdfOptimizer" "-deskew" "C:\Users\dolez\Desktop\SO 03\Mar.pdf" "C:\Users\dolez\AppData\Local\Temp\PDF24\ocr_0_1211166406_1581469258_opt.pdf"

    ----------------

    OPTIMIZE> PDF24 PDF OCR Optimizer

    OPTIMIZE>

    OPTIMIZE> 2023.05.26 14:09:35 main: Start

    OPTIMIZE> 2023.05.26 14:09:35 main: Skipping page 1; images collected: 9

    OPTIMIZE> 2023.05.26 14:09:35 main: Skipping page 2; images collected: 3

    OPTIMIZE> 2023.05.26 14:09:35 main: Skipping page 3; images collected: 4

    OPTIMIZE> 2023.05.26 14:09:35 main: Skipping page 4; images collected: 4

    OPTIMIZE> 2023.05.26 14:09:35 main: Images collected: 0

    OPTIMIZE> 2023.05.26 14:09:35 main: Done

    ================

    ================

    "C:\Program Files\PDF24\gs\bin\gswinc.exe" -dBATCH -dNOPAUSE -dSAFER -dALLOWPSTRANSPARENCY "-sFONTPATH=C:\WINDOWS\Fonts" -dNEWPDF=true  -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r300 -dFirstPage=1 -dLastPage=1 -sDEVICE=png16m -dDownScaleFactor=1 "-sOutputFile=C:\Users\dolez\AppData\Local\Temp\PDF24\ocr_1_1211167078_1229304252.png" "C:\Users\dolez\AppData\Local\Temp\PDF24\ocr_0_1211166406_1581469258_opt.pdf"

    ----------------

    GPL Ghostscript 10.01.1 (2023-03-27)

    Copyright (C) 2023 Artifex Software, Inc.  All rights reserved.

    This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:

    see the file COPYING for details.

    Processing pages 1 through 1.

    Page 1

    ================

    ================

    "C:\Program Files\PDF24\tesseract\tesseract.exe" "--tessdata-dir" "C:\Users\dolez\AppData\Local\PDF24\tesseract\5.3.0\tessdata" "C:\Users\dolez\AppData\Local\Temp\PDF24\ocr_1_1211167078_1229304252.png" "C:\Users\dolez\AppData\Local\Temp\PDF24\ocr_2_1211167859_1337565084" "-l" "ces" "-c" "textonly_pdf=1" "--dpi" "300" "--oem" "3" "--psm" "1" "pdf" "txt"

    ----------------

    ================

    ================

    "C:\Program Files\PDF24\gs\bin\gswinc.exe" -dBATCH -dNOPAUSE -dSAFER -dALLOWPSTRANSPARENCY "-sFONTPATH=C:\WINDOWS\Fonts" -dNEWPDF=true  -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r300 -dFirstPage=2 -dLastPage=2 -sDEVICE=png16m -dDownScaleFactor=1 "-sOutputFile=C:\Users\dolez\AppData\Local\Temp\PDF24\ocr_3_1211169906_2639222066.png" "C:\Users\dolez\AppData\Local\Temp\PDF24\ocr_0_1211166406_1581469258_opt.pdf"

    ----------------

    GPL Ghostscript 10.01.1 (2023-03-27)

    Copyright (C) 2023 Artifex Software, Inc.  All rights reserved.

    This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:

    see the file COPYING for details.

    Processing pages 2 through 2.

    Page 2

    ================

    ================

    "C:\Program Files\PDF24\tesseract\tesseract.exe" "--tessdata-dir" "C:\Users\dolez\AppData\Local\PDF24\tesseract\5.3.0\tessdata" "C:\Users\dolez\AppData\Local\Temp\PDF24\ocr_3_1211169906_2639222066.png" "C:\Users\dolez\AppData\Local\Temp\PDF24\ocr_4_1211170640_2528216152" "-l" "ces" "-c" "textonly_pdf=1" "--dpi" "300" "--oem" "3" "--psm" "1" "pdf" "txt"

    ----------------

    ================

    ================

    "C:\Program Files\PDF24\gs\bin\gswinc.exe" -dBATCH -dNOPAUSE -dSAFER -dALLOWPSTRANSPARENCY "-sFONTPATH=C:\WINDOWS\Fonts" -dNEWPDF=true  -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r300 -dFirstPage=3 -dLastPage=3 -sDEVICE=png16m -dDownScaleFactor=1 "-sOutputFile=C:\Users\dolez\AppData\Local\Temp\PDF24\ocr_5_1211171921_18012576.png" "C:\Users\dolez\AppData\Local\Temp\PDF24\ocr_0_1211166406_1581469258_opt.pdf"

    ----------------

    GPL Ghostscript 10.01.1 (2023-03-27)

    Copyright (C) 2023 Artifex Software, Inc.  All rights reserved.

    This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:

    see the file COPYING for details.

    Processing pages 3 through 3.

    Page 3

    ================

    ================

    "C:\Program Files\PDF24\tesseract\tesseract.exe" "--tessdata-dir" "C:\Users\dolez\AppData\Local\PDF24\tesseract\5.3.0\tessdata" "C:\Users\dolez\AppData\Local\Temp\PDF24\ocr_5_1211171921_18012576.png" "C:\Users\dolez\AppData\Local\Temp\PDF24\ocr_6_1211172625_3814948804" "-l" "ces" "-c" "textonly_pdf=1" "--dpi" "300" "--oem" "3" "--psm" "1" "pdf" "txt"

    ----------------

    ================

    ================

    "C:\Program Files\PDF24\gs\bin\gswinc.exe" -dBATCH -dNOPAUSE -dSAFER -dALLOWPSTRANSPARENCY "-sFONTPATH=C:\WINDOWS\Fonts" -dNEWPDF=true  -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r300 -dFirstPage=4 -dLastPage=4 -sDEVICE=png16m -dDownScaleFactor=1 "-sOutputFile=C:\Users\dolez\AppData\Local\Temp\PDF24\ocr_7_1211173906_2698855182.png" "C:\Users\dolez\AppData\Local\Temp\PDF24\ocr_0_1211166406_1581469258_opt.pdf"

    ----------------

    GPL Ghostscript 10.01.1 (2023-03-27)

    Copyright (C) 2023 Artifex Software, Inc.  All rights reserved.

    This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:

    see the file COPYING for details.

    Processing pages 4 through 4.

    Page 4

    ================

    ================

    "C:\Program Files\PDF24\tesseract\tesseract.exe" "--tessdata-dir" "C:\Users\dolez\AppData\Local\PDF24\tesseract\5.3.0\tessdata" "C:\Users\dolez\AppData\Local\Temp\PDF24\ocr_7_1211173906_2698855182.png" "C:\Users\dolez\AppData\Local\Temp\PDF24\ocr_8_1211174609_2574622620" "-l" "ces" "-c" "textonly_pdf=1" "--dpi" "300" "--oem" "3" "--psm" "1" "pdf" "txt"

    ----------------

    ERR> Too few characters. Skipping this page

    TESS> Too few characters. Skipping this page

    ERR> OSD: Weak margin (0.00) for 13 blob text block, but using orientation anyway: 0

    TESS> OSD: Weak margin (0.00) for 13 blob text block, but using orientation anyway: 0

    ================

    Auto Rotating PDF Pages

    ================

    "C:\Program Files\PDF24\jre\bin\java.exe" -cp "C:\Program Files\PDF24\lib\jar\*" -Dwindows.acp=1252 "org.pdf24.PdfPagesAutoRotator" "C:\Users\dolez\AppData\Local\Temp\PDF24\ocr_9_1211175609_2384946173_ocred.pdf" "C:\Users\dolez\AppData\Local\Temp\PDF24\ocr_10_1211175625_419331387_rot.pdf"

    ----------------

    AUTOROT> PDF24 PDF Pages Auto Rotator

    AUTOROT>

    ================

    C:\Users\dolez\Desktop>pause

    Press any key to continue . . .

    #63213
    dolezelt
    Participant

    Hi

    My previous question may have been hard to follow because of the complicated batch file. All the batch file should do is run OCR on every PDF in a directory, including its subdirectories. I also tried a simplified command, with the same result.
    The simplified command is:

    pdf24-Ocr.exe -outputFile -language ces -dpi 300 -deskew -autoRotatePages -inputfile

     

    I tested the command with a specific file: the job starts but produces no output PDF file (I also tried running the batch file as administrator on Windows 11). There were no temporary output files in the directory C:\Users\dolez\AppData\Local\Temp\PDF24\ either.

    Can anyone help?
