I have created 82 PDFs using PDFSharp, building the pages from individual JPG files. All of them open successfully in every reader I have tried. I then programmatically convert all 82 to searchable PDFs using OCR:
Private Function OCRDocumentAsync(OCRList As List(Of NewCaseDocument), report As DocumentOCRProgressReport) As Task
    Dim Success As Boolean = True
    Dim Err As Exception = Nothing
    Dim sb As New StringBuilder
    Dim caseDocument As NewCaseDocument
    For Each caseDocument In OCRList
        Dim outPath As String = Path.Combine(caseDocument.ParentCase.CaseDocumentsFolder, caseDocument.SearchablePath)
        Dim inPath As String = Path.Combine(caseDocument.ParentCase.CaseDocumentsFolder, caseDocument.OriginalPath)

        process = New Process()

        ' Build the command line: options first, then the quoted input path
        sb.Clear()
        sb.Append($" -outputFile ""{outPath}""")
        sb.Append(" -language eng")
        sb.Append(" -dpi 300")
        sb.Append(" -deskew")
        sb.Append(" -autoRotatePages")
        sb.Append(" """)
        sb.Append(inPath)
        sb.Append("""")

        With process.StartInfo
            .FileName = Pdf24ExecutablePath
            .Arguments = sb.ToString
            .UseShellExecute = False
            .RedirectStandardOutput = True
            .RedirectStandardError = True
            .CreateNoWindow = True
        End With

        process.Start()
        process.BeginOutputReadLine()
        process.BeginErrorReadLine()

        process.WaitForExit()
        process.Close()
        process.Dispose()
        process = Nothing

        ' Update the progress counters for the document just processed
        report.OCRedDocuments += 1
        report.NotOCRedDocuments -= 1
        report.OCRedPages += caseDocument.DocumentPages.Count
        report.NotOCRedPages -= caseDocument.DocumentPages.Count

        UpdateOCRProgress(report)
        'cmp.CompressFile(outPath)
    Next

    ' The loop runs synchronously, so hand back an already-completed task
    Return Task.CompletedTask
End Function
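The loop above counts every document as OCRed without looking at the exit code or at the file that was written, and the log further down shows standard output lines split by standard error text (for example "maGPL Ghostscript"). The following is a minimal sketch, not the original code, of one way to keep the two streams apart and flag suspicious results. `RunOcrTool`, `OcrRunResult` and `LooksLikePdf` are hypothetical names, and whether the PDF24 tool returns a non-zero exit code on failure is an assumption that would need checking.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.IO
Imports System.Text

Module OcrRunSketch

    ' Hypothetical result type; not part of the original code.
    Public Class OcrRunResult
        Public Property ExitCode As Integer
        Public Property StdOutText As String
        Public Property StdErrText As String
        Public Property OutputLooksLikePdf As Boolean
    End Class

    ' True when the file exists and starts with the "%PDF-" signature.
    ' This is only a coarse check; a file with a valid header can still be damaged.
    Public Function LooksLikePdf(filePath As String) As Boolean
        If Not File.Exists(filePath) Then Return False
        Dim header(4) As Byte ' five bytes, indexes 0 to 4
        Using fs As New FileStream(filePath, FileMode.Open, FileAccess.Read)
            If fs.Read(header, 0, header.Length) < header.Length Then Return False
        End Using
        Return Encoding.ASCII.GetString(header) = "%PDF-"
    End Function

    ' Runs the tool once, collecting stdout and stderr in separate buffers.
    Public Function RunOcrTool(exePath As String, arguments As String, outPath As String) As OcrRunResult
        Dim outBuffer As New StringBuilder()
        Dim errBuffer As New StringBuilder()

        Using proc As New Process()
            With proc.StartInfo
                .FileName = exePath
                .Arguments = arguments
                .UseShellExecute = False
                .RedirectStandardOutput = True
                .RedirectStandardError = True
                .CreateNoWindow = True
            End With

            ' Each stream gets its own handler and buffer, so lines cannot interleave.
            AddHandler proc.OutputDataReceived,
                Sub(sender, e)
                    If e.Data IsNot Nothing Then
                        SyncLock outBuffer
                            outBuffer.AppendLine(e.Data)
                        End SyncLock
                    End If
                End Sub
            AddHandler proc.ErrorDataReceived,
                Sub(sender, e)
                    If e.Data IsNot Nothing Then
                        SyncLock errBuffer
                            errBuffer.AppendLine(e.Data)
                        End SyncLock
                    End If
                End Sub

            proc.Start()
            proc.BeginOutputReadLine()
            proc.BeginErrorReadLine()
            proc.WaitForExit() ' the overload without a timeout also waits for the handlers to drain

            Return New OcrRunResult With {
                .ExitCode = proc.ExitCode,
                .StdOutText = outBuffer.ToString(),
                .StdErrText = errBuffer.ToString(),
                .OutputLooksLikePdf = LooksLikePdf(outPath)
            }
        End Using
    End Function
End Module
```

Inside the loop this could be called as `RunOcrTool(Pdf24ExecutablePath, sb.ToString, outPath)`, with the progress counters updated only when the result looks acceptable. Logging `StdErrText` per document would also make it easier to see which input file each Ghostscript warning belongs to.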
The issue is that some percentage of these new documents cannot be opened.
Here is my processing log. The first file produced a correct PDF. The second did not.
► OCR services provided by PDF24 Creator Version 11.18.0
GPL Ghostscript 10.03.1 (2024-05-02)
Copyright (C) 2024 Artifex Software, Inc. All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
Processing pages 1 through 1.
Page 1
Error:warning: ignoring zlib error: incorrect data check
OCR page 1
Error:warning: ignoring zlib error: incorrect data check
Error:warning: ignoring zlib error: incorrect data check
GPL Ghostscript 10.03.1 (2024-05-02)
Copyright (C) 2024 Artifex Software, Inc. All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
Processing pages 2 through 2.
Page 2
OCR page 2
Error:warning: ignoring zlib error: incorrect data check
Error:warning: ignoring zlib error: incorrect data check
================
"C:\Program Files\PDF24\tesseract\tesseract.exe" -v
----------------
================
Optimizing PDF
================
"C:\Program Files\PDF24\jre\bin\java.exe" -cp "C:\Program Files\PDF24\lib\jar\*" -Dwindows.acp=1252 -XX:MaxRAMPercentage=80 "org.pdf24.OcrPdfOptimizer" "-deskew" "H:\Spokane Court Cases\SN_2320522932\Documents\SN_2320522932_0101 Sub-1.0 2023-12-06 CASE INFORMATION COVER SHEET.pdf" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_0_521603312_2282591701_opt.pdf"
----------------
OPTIMIZE> PDF24 PDF OCR Optimizer
OPTIMIZE>
OPTIMIZE> 2024.06.17 18:25:09 main: Start
OPTIMIZE> 2024.06.17 18:25:09 main: Images collected: 2
OPTIMIZE> 2024.06.17 18:25:09 main: Processing image: 1/2
OPTIMIZE> 2024.06.17 18:25:09 main: Skew angle: -0.01 deg
OPTIMIZE> 2024.06.17 18:25:10 main: Processing image: 2/2
OPTIMIZE> 2024.06.17 18:25:10 main: Skew angle: 0.07 deg
OPTIMIZE> 2024.06.17 18:25:11 main: JPG quality: 0.95
OPTIMIZE> 2024.06.17 18:25:11 main: JPG quality: 0.85
OPTIMIZE> 2024.06.17 18:25:11 main: JPG quality: 0.75
OPTIMIZE> 2024.06.17 18:25:11 main: JPG quality: 0.65
OPTIMIZE> 2024.06.17 18:25:11 main: JPG quality: 0.55
OPTIMIZE> 2024.06.17 18:25:11 main: JPG quality: 0.45
OPTIMIZE> 2024.06.17 18:25:11 main: Done
================
================
"C:\Program Files\PDF24\gs\bin\gswinc.exe" -dBATCH -dNOPAUSE -dSAFER -dALLOWPSTRANSPARENCY "-sFONTPATH=C:\WINDOWS\Fonts;C:\Users\vulca\AppData\Local\Microsoft\Windows\Fonts" -dNEWPDF=true -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r300 -dFirstPage=1 -dLastPage=1 -sDEVICE=png16m -dDownScaleFactor=1 "-sOutputFile=C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_1_521605640_1849103157.png" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_0_521603312_2282591701_opt.pdf"
----------------
================
================
"C:\Program Files\PDF24\tesseract\tesseract.exe" "--tessdata-dir" "C:\Users\vulca\AppData\Local\PDF24\tesseract\5.3.0\tessdata" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_1_521605640_1849103157.png" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_2_521606234_1511152801" "-l" "eng" "-c" "textonly_pdf=1" "--dpi" "300" "--oem" "3" "--psm" "1" "pdf" "txt"
----------------
================
================
"C:\Program Files\PDF24\gs\bin\gswinc.exe" -dBATCH -dNOPAUSE -dSAFER -dALLOWPSTRANSPARENCY "-sFONTPATH=C:\WINDOWS\Fonts;C:\Users\vulca\AppData\Local\Microsoft\Windows\Fonts" -dNEWPDF=true -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r300 -dFirstPage=2 -dLastPage=2 -sDEVICE=png16m -dDownScaleFactor=1 "-sOutputFile=C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_3_521607093_842780324.png" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_0_521603312_2282591701_opt.pdf"
----------------
================
================
"C:\Program Files\PDF24\tesseract\tesseract.exe" "--tessdata-dir" "C:\Users\vulca\AppData\Local\PDF24\tesseract\5.3.0\tessdata" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_3_521607093_842780324.png" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_4_521607921_209992862" "-l" "eng" "-c" "textonly_pdf=1" "--dpi" "300" "--oem" "3" "--psm" "1" "pdf" "txt"
----------------
================
Auto Rotating PDF Pages
================
"C:\Program Files\PDF24\jre\bin\java.exe" -cp "C:\Program Files\PDF24\lib\jar\*" -Dwindows.acp=1252 -XX:MaxRAMPercentage=80 "org.pdf24.PdfPagesAutoRotator" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_5_521611093_2964690093_ocred.pdf" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_6_521611125_4089732727_rot.pdf"
----------------
AUTOROT> PDF24 PDF Pages Auto Rotator
AUTOROT>
AUTOROT> Jun 17, 2024 6:25:17 PM org.apache.fontbox.ttf.PostScriptTable read
AUTOROT> WARNING: No PostScript name data is provided for the font null
AUTOROT> Jun 17, 2024 6:25:17 PM org.apache.fontbox.ttf.PostScriptTable read
AUTOROT> WARNING: No PostScript name data is provided for the font null
================
-->
================
"C:\Program Files\PDF24\tesseract\tesseract.exe" -v
----------------
================
Optimizing PDF
================
"C:\Program Files\PDF24\jre\bin\java.exe" -cp "C:\Program Files\PDF24\lib\jar\*" -Dwindows.acp=1252 -XX:MaxRAMPercentage=80 "org.pdf24.OcrPdfOptimizer" "-deskew" "H:\Spokane Court Cases\SN_2320522932\Documents\SN_2320522932_0102 Sub-2.0 2023-12-06 SUMMONS AND COMPLAINT.pdf" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_0_521612437_3271843913_opt.pdf"
----------------
OPTIMIZE> PDF24 PDF OCR Optimizer
OPTIMIZE>
OPTIMIZE> 2024.06.17 18:25:18 main: Start
OPTIMIZE> 2024.06.17 18:25:18 main: Images collected: 16
OPTIMIZE> 2024.06.17 18:25:18 main: Processing image: 1/16
OPTIMIZE> 2024.06.17 18:25:19 main: Skew angle: -0.01 deg
OPTIMIZE> 2024.06.17 18:25:19 main: Processing image: 2/16
OPTIMIZE> 2024.06.17 18:25:19 main: Skew angle: -0.02 deg
OPTIMIZE> 2024.06.17 18:25:20 main: JPG quality: 0.95
OPTIMIZE> 2024.06.17 18:25:20 main: JPG quality: 0.85
OPTIMIZE> 2024.06.17 18:25:20 main: JPG quality: 0.75
OPTIMIZE> 2024.06.17 18:25:20 main: JPG quality: 0.65
OPTIMIZE> 2024.06.17 18:25:20 main: JPG quality: 0.55
OPTIMIZE> 2024.06.17 18:25:20 main: Processing image: 3/16
OPTIMIZE> 2024.06.17 18:25:20 main: Skew angle: 0.12 deg
OPTIMIZE> 2024.06.17 18:25:21 main: JPG quality: 0.95
OPTIMIZE> 2024.06.17 18:25:21 main: JPG quality: 0.85
OPTIMIZE> 2024.06.17 18:25:21 main: JPG quality: 0.75
OPTIMIZE> 2024.06.17 18:25:21 main: JPG quality: 0.65
OPTIMIZE> 2024.06.17 18:25:21 main: JPG quality: 0.55
OPTIMIZE> 2024.06.17 18:25:21 main: JPG quality: 0.45
OPTIMIZE> 2024.06.17 18:25:21 main: Processing image: 4/16
OPTIMIZE> 2024.06.17 18:25:21 main: Skew angle: -0.02 deg
OPTIMIZE> 2024.06.17 18:25:22 main: JPG quality: 0.95
OPTIMIZE> 2024.06.17 18:25:22 main: JPG quality: 0.85
OPTIMIZE> 2024.06.17 18:25:22 main: JPG quality: 0.75
OPTIMIZE> 2024.06.17 18:25:22 main: JPG quality: 0.65
OPTIMIZE> 2024.06.17 18:25:22 main: JPG quality: 0.55
OPTIMIZE> 2024.06.17 18:25:22 main: Processing image: 5/16
OPTIMIZE> 2024.06.17 18:25:22 main: Skew angle: 0.00 deg
OPTIMIZE> 2024.06.17 18:25:22 main: Processing image: 6/16
OPTIMIZE> 2024.06.17 18:25:22 main: Skew angle: 0.12 deg
OPTIMIZE> 2024.06.17 18:25:23 main: JPG quality: 0.95
OPTIMIZE> 2024.06.17 18:25:23 main: JPG quality: 0.85
OPTIMIZE> 2024.06.17 18:25:23 main: JPG quality: 0.75
OPTIMIZE> 2024.06.17 18:25:23 main: JPG quality: 0.65
OPTIMIZE> 2024.06.17 18:25:23 main: JPG quality: 0.55
OPTIMIZE> 2024.06.17 18:25:23 main: JPG quality: 0.45
OPTIMIZE> 2024.06.17 18:25:23 main: Processing image: 7/16
OPTIMIZE> 2024.06.17 18:25:23 main: Skew angle: 0.00 deg
OPTIMIZE> 2024.06.17 18:25:23 main: Processing image: 8/16
OPTIMIZE> 2024.06.17 18:25:23 main: Skew angle: 0.00 deg
OPTIMIZE> 2024.06.17 18:25:23 main: Processing image: 9/16
OPTIMIZE> 2024.06.17 18:25:24 main: Skew angle: 0.00 deg
OPTIMIZE> 2024.06.17 18:25:24 main: Processing image: 10/16
OPTIMIZE> 2024.06.17 18:25:24 main: Skew angle: 0.04 deg
OPTIMIZE> 2024.06.17 18:25:24 main: JPG quality: 0.95
OPTIMIZE> 2024.06.17 18:25:24 main: JPG quality: 0.85
OPTIMIZE> 2024.06.17 18:25:24 main: JPG quality: 0.75
OPTIMIZE> 2024.06.17 18:25:24 main: JPG quality: 0.65
OPTIMIZE> 2024.06.17 18:25:25 main: JPG quality: 0.55
OPTIMIZE> 2024.06.17 18:25:25 main: JPG quality: 0.45
OPTIMIZE> 2024.06.17 18:25:25 main: Processing image: 11/16
OPTIMIZE> 2024.06.17 18:25:25 main: Skew angle: 0.05 deg
OPTIMIZE> 2024.06.17 18:25:25 main: JPG quality: 0.95
OPTIMIZE> 2024.06.17 18:25:25 main: JPG quality: 0.85
OPTIMIZE> 2024.06.17 18:25:25 main: JPG quality: 0.75
OPTIMIZE> 2024.06.17 18:25:25 main: JPG quality: 0.65
OPTIMIZE> 2024.06.17 18:25:25 main: JPG quality: 0.55
OPTIMIZE> 2024.06.17 18:25:25 main: JPG quality: 0.45
OPTIMIZE> 2024.06.17 18:25:26 main: Processing image: 12/16
OPTIMIZE> 2024.06.17 18:25:26 main: Skew angle: 0.00 deg
OPTIMIZE> 2024.06.17 18:25:26 main: Processing image: 13/16
OPTIMIZE> 2024.06.17 18:25:26 main: Skew angle: 0.00 deg
OPTIMIZE> 2024.06.17 18:25:26 maGPL Ghostscript 10.03.1 (2024-05-02)
Copyright (C) 2024 Artifex Software, Inc. All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
Processing pages 1 through 1.
Page 1
Error:warning: ignoring zlib error: incorrect data check
OCR page 1
Error:warning: ignoring zlib error: incorrect data check
Error:warning: ignoring zlib error: incorrect data check
GPL Ghostscript 10.03.1 (2024-05-02)
Copyright (C) 2024 Artifex Software, Inc. All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
Processing pages 2 through 2.
Page 2
OCR page 2
Error:warning: ignoring zlib error: incorrect data check
Error:warning: ignoring zlib error: incorrect data check
GPL Ghostscript 10.03.1 (2024-05-02)
Copyright (C) 2024 Artifex Software, Inc. All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
Processing pages 3 through 3.
Page 3
OCR page 3
Error:warning: ignoring zlib error: incorrect data check
Error:warning: ignoring zlib error: incorrect data check
in: Processing image: 14/16
OPTIMIZE> 2024.06.17 18:25:26 main: Skew angle: 0.00 deg
OPTIMIZE> 2024.06.17 18:25:26 main: Processing image: 15/16
OPTIMIZE> 2024.06.17 18:25:26 main: Skew angle: 0.01 deg
OPTIMIZE> 2024.06.17 18:25:27 main: JPG quality: 0.95
OPTIMIZE> 2024.06.17 18:25:27 main: JPG quality: 0.85
OPTIMIZE> 2024.06.17 18:25:27 main: JPG quality: 0.75
OPTIMIZE> 2024.06.17 18:25:27 main: JPG quality: 0.65
OPTIMIZE> 2024.06.17 18:25:27 main: JPG quality: 0.55
OPTIMIZE> 2024.06.17 18:25:27 main: Processing image: 16/16
OPTIMIZE> 2024.06.17 18:25:27 main: Skew angle: 0.12 deg
OPTIMIZE> 2024.06.17 18:25:27 main: JPG quality: 0.95
OPTIMIZE> 2024.06.17 18:25:28 main: JPG quality: 0.85
OPTIMIZE> 2024.06.17 18:25:28 main: JPG quality: 0.75
OPTIMIZE> 2024.06.17 18:25:28 main: JPG quality: 0.65
OPTIMIZE> 2024.06.17 18:25:28 main: JPG quality: 0.55
OPTIMIZE> 2024.06.17 18:25:28 main: Done
================
================
"C:\Program Files\PDF24\gs\bin\gswinc.exe" -dBATCH -dNOPAUSE -dSAFER -dALLOWPSTRANSPARENCY "-sFONTPATH=C:\WINDOWS\Fonts;C:\Users\vulca\AppData\Local\Microsoft\Windows\Fonts" -dNEWPDF=true -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r300 -dFirstPage=1 -dLastPage=1 -sDEVICE=png16m -dDownScaleFactor=1 "-sOutputFile=C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_1_521622312_1804220203.png" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_0_521612437_3271843913_opt.pdf"
----------------
================
================
"C:\Program Files\PDF24\tesseract\tesseract.exe" "--tessdata-dir" "C:\Users\vulca\AppData\Local\PDF24\tesseract\5.3.0\tessdata" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_1_521622312_1804220203.png" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_2_521622859_496003721" "-l" "eng" "-c" "textonly_pdf=1" "--dpi" "300" "--oem" "3" "--psm" "1" "pdf" "txt"
----------------
================
================
"C:\Program Files\PDF24\gs\bin\gswinc.exe" -dBATCH -dNOPAUSE -dSAFER -dALLOWPSTRANSPARENCY "-sFONTPATH=C:\WINDOWS\Fonts;C:\Users\vulca\AppData\Local\Microsoft\Windows\Fonts" -dNEWPDF=true -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r300 -dFirstPage=2 -dLastPage=2 -sDEVICE=png16m -dDownScaleFactor=1 "-sOutputFile=C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_3_521623687_2842099538.png" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_0_521612437_3271843913_opt.pdf"
----------------
================
================
"C:\Program Files\PDF24\tesseract\tesseract.exe" "--tessdata-dir" "C:\Users\vulca\AppData\Local\PDF24\tesseract\5.3.0\tessdata" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_3_521623687_2842099538.png" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_4_521624328_3083702292" "-l" "eng" "-c" "textonly_pdf=1" "--dpi" "300" "--oem" "3" "--psm" "1" "pdf" "txt"
----------------
================
================
"C:\Program Files\PDF24\gs\bin\gswinc.exe" -dBATCH -dNOPAUSE -dSAFER -dALLOWPSTRANSPARENCY "-sFONTPATH=C:\WINDOWS\Fonts;C:\Users\vulca\AppData\Local\Microsoft\Windows\Fonts" -dNEWPDF=true -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r300 -dFirstPage=3 -dLastPage=3 -sDEVICE=png16m -dDownScaleFactor=1 "-sOutputFile=C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_5_521626046_2867351540.png" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_0_521612437_3271843913_opt.pdf"
----------------
================
================
"C:\Program Files\PDF24\tesseract\tesseract.exe" "--tessdata-dir" "C:\Users\vulca\AppData\Local\PDF24\tesseract\5.3.0\tessdata" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_5_521626046_2867351540.png" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_6_521626640_2826299966" "-l" "eng" "-c" "textonly_pdf=1" "--dpi" "300" "--oem" "3" "--psm" "1" "pdf" "txt"
----------------
================
================
"C:\Program Files\PDF24\gs\bin\gswinc.exe" -dBATCH -dNOPAUSE -dSAFER -dALLOWPSTRANSPARENCY "-sFONTPATH=C:\WINDOWS\Fonts;C:\Users\vulca\AppData\Local\Microsoft\Windows\Fonts" -dNEWPDF=true -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r300 -dFirstPage=4 -dLastPage=4 -sDEVICE=png16m -dDownScaleFactor=1 "-sOutputFile=C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_7_521627906_2967988794.png" "C:\Users\vGPL Ghostscript 10.03.1 (2024-05-02)
Copyright (C) 2024 Artifex Software, Inc. All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
Processing pages 4 through 4.
Page 4
OCR page 4
Error:warning: ignoring zlib error: incorrect data check
Error:warning: ignoring zlib error: incorrect data check
GPL Ghostscript 10.03.1 (2024-05-02)
Copyright (C) 2024 Artifex Software, Inc. All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
Processing pages 5 through 5.
Page 5
OCR page 5
Error:warning: ignoring zlib error: incorrect data check
Error:warning: ignoring zlib error: incorrect data check
GPL Ghostscript 10.03.1 (2024-05-02)
Copyright (C) 2024 Artifex Software, Inc. All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
Processing pages 6 through 6.
Page 6
OCR page 6
Error:warning: ignoring zlib error: incorrect data check
Error:warning: ignoring zlib error: incorrect data check
GPL Ghostscript 10.03.1 (2024-05-02)
Copyright (C) 2024 Artifex Software, Inc. All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
Processing pages 7 through 7.
Page 7
OCR page 7
Error:warning: ignoring zlib error: incorrect data check
Error:warning: ignoring zlib error: incorrect data check
GPL Ghostscript 10.03.1 (2024-05-02)
Copyright (C) 2024 Artifex Software, Inc. All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
Processing pages 8 through 8.
Page 8
OCR page 8
Error:warning: ignoring zlib error: incorrect data check
Error:warning: ignoring zlib error: incorrect data check
ulca\AppData\Local\Temp\PDF24\ocr_0_521612437_3271843913_opt.pdf"
----------------
================
================
"C:\Program Files\PDF24\tesseract\tesseract.exe" "--tessdata-dir" "C:\Users\vulca\AppData\Local\PDF24\tesseract\5.3.0\tessdata" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_7_521627906_2967988794.png" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_8_521628484_2791072866" "-l" "eng" "-c" "textonly_pdf=1" "--dpi" "300" "--oem" "3" "--psm" "1" "pdf" "txt"
----------------
================
================
"C:\Program Files\PDF24\gs\bin\gswinc.exe" -dBATCH -dNOPAUSE -dSAFER -dALLOWPSTRANSPARENCY "-sFONTPATH=C:\WINDOWS\Fonts;C:\Users\vulca\AppData\Local\Microsoft\Windows\Fonts" -dNEWPDF=true -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r300 -dFirstPage=5 -dLastPage=5 -sDEVICE=png16m -dDownScaleFactor=1 "-sOutputFile=C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_9_521629812_3462384554.png" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_0_521612437_3271843913_opt.pdf"
----------------
================
================
"C:\Program Files\PDF24\tesseract\tesseract.exe" "--tessdata-dir" "C:\Users\vulca\AppData\Local\PDF24\tesseract\5.3.0\tessdata" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_9_521629812_3462384554.png" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_10_521630546_3336302832" "-l" "eng" "-c" "textonly_pdf=1" "--dpi" "300" "--oem" "3" "--psm" "1" "pdf" "txt"
----------------
================
================
"C:\Program Files\PDF24\gs\bin\gswinc.exe" -dBATCH -dNOPAUSE -dSAFER -dALLOWPSTRANSPARENCY "-sFONTPATH=C:\WINDOWS\Fonts;C:\Users\vulca\AppData\Local\Microsoft\Windows\Fonts" -dNEWPDF=true -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r300 -dFirstPage=6 -dLastPage=6 -sDEVICE=png16m -dDownScaleFactor=1 "-sOutputFile=C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_11_521632312_4100019467.png" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_0_521612437_3271843913_opt.pdf"
----------------
================
================
"C:\Program Files\PDF24\tesseract\tesseract.exe" "--tessdata-dir" "C:\Users\vulca\AppData\Local\PDF24\tesseract\5.3.0\tessdata" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_11_521632312_4100019467.png" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_12_521633062_1620023086" "-l" "eng" "-c" "textonly_pdf=1" "--dpi" "300" "--oem" "3" "--psm" "1" "pdf" "txt"
----------------
================
================
"C:\Program Files\PDF24\gs\bin\gswinc.exe" -dBATCH -dNOPAUSE -dSAFER -dALLOWPSTRANSPARENCY "-sFONTPATH=C:\WINDOWS\Fonts;C:\Users\vulca\AppData\Local\Microsoft\Windows\Fonts" -dNEWPDF=true -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r300 -dFirstPage=7 -dLastPage=7 -sDEVICE=png16m -dDownScaleFactor=1 "-sOutputFile=C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_13_521635015_3754366077.png" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_0_521612437_3271843913_opt.pdf"
----------------
================
================
"C:\Program Files\PDF24\tesseract\tesseract.exe" "--tessdata-dir" "C:\Users\vulca\AppData\Local\PDF24\tesseract\5.3.0\tessdata" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_13_521635015_3754366077.png" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_14_521635734_2318769125" "-l" "eng" "-c" "textonly_pdf=1" "--dpi" "300" "--oem" "3" "--psm" "1" "pdf" "txt"
----------------
================
================
"C:\Program Files\PDF24\gs\bin\gswinc.exe" -dBATCH -dNOPAUSE -dSAFER -dALLOWPSTRANSPARENCY "-sFONTPATH=C:\WINDOWS\Fonts;C:\Users\vulca\AppData\Local\Microsoft\Windows\Fonts" -dNEWPDF=true -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r300 -dFirstPage=8 -dLastPage=8 -sDEVICE=png16m -dDownScaleFactor=1 "-sOutputFile=C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_15_521637500_3928551793.png" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_0_521612437_3271843913_opt.pdf"
----------------
================
================
"C:\Program Files\PDF24\tesseract\tesseract.exe" "--tessdata-dir" "C:\Users\vulca\AppData\Local\PDF24\tesseract\5.3.0\tessdata" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_15_521637500_3928551793.png" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_16_521638250_458412466" "-l" "eng" "-c" "textonly_pdf=1GPL Ghostscript 10.03.1 (2024-05-02)
Copyright (C) 2024 Artifex Software, Inc. All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
Processing pages 9 through 9.
Page 9
OCR page 9
Error:warning: ignoring zlib error: incorrect data check
Error:warning: ignoring zlib error: incorrect data check
GPL Ghostscript 10.03.1 (2024-05-02)
Copyright (C) 2024 Artifex Software, Inc. All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
Processing pages 10 through 10.
Page 10
OCR page 10
Error:warning: ignoring zlib error: incorrect data check
Error:warning: ignoring zlib error: incorrect data check
GPL Ghostscript 10.03.1 (2024-05-02)
Copyright (C) 2024 Artifex Software, Inc. All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
Processing pages 11 through 11.
Page 11
OCR page 11
Error:warning: ignoring zlib error: incorrect data check
Error:warning: ignoring zlib error: incorrect data check
GPL Ghostscript 10.03.1 (2024-05-02)
Copyright (C) 2024 Artifex Software, Inc. All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
Processing pages 12 through 12.
Page 12
OCR page 12
Error:warning: ignoring zlib error: incorrect data check
Error:warning: ignoring zlib error: incorrect data check
" "--dpi" "300" "--oem" "3" "--psm" "1" "pdf" "txt"
----------------
================
================
"C:\Program Files\PDF24\gs\bin\gswinc.exe" -dBATCH -dNOPAUSE -dSAFER -dALLOWPSTRANSPARENCY "-sFONTPATH=C:\WINDOWS\Fonts;C:\Users\vulca\AppData\Local\Microsoft\Windows\Fonts" -dNEWPDF=true -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r300 -dFirstPage=9 -dLastPage=9 -sDEVICE=png16m -dDownScaleFactor=1 "-sOutputFile=C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_17_521640140_3230207094.png" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_0_521612437_3271843913_opt.pdf"
----------------
================
================
"C:\Program Files\PDF24\tesseract\tesseract.exe" "--tessdata-dir" "C:\Users\vulca\AppData\Local\PDF24\tesseract\5.3.0\tessdata" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_17_521640140_3230207094.png" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_18_521640906_576005624" "-l" "eng" "-c" "textonly_pdf=1" "--dpi" "300" "--oem" "3" "--psm" "1" "pdf" "txt"
----------------
================
================
"C:\Program Files\PDF24\gs\bin\gswinc.exe" -dBATCH -dNOPAUSE -dSAFER -dALLOWPSTRANSPARENCY "-sFONTPATH=C:\WINDOWS\Fonts;C:\Users\vulca\AppData\Local\Microsoft\Windows\Fonts" -dNEWPDF=true -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r300 -dFirstPage=10 -dLastPage=10 -sDEVICE=png16m -dDownScaleFactor=1 "-sOutputFile=C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_19_521642812_2030004272.png" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_0_521612437_3271843913_opt.pdf"
----------------
================
================
"C:\Program Files\PDF24\tesseract\tesseract.exe" "--tessdata-dir" "C:\Users\vulca\AppData\Local\PDF24\tesseract\5.3.0\tessdata" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_19_521642812_2030004272.png" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_20_521643515_4149302819" "-l" "eng" "-c" "textonly_pdf=1" "--dpi" "300" "--oem" "3" "--psm" "1" "pdf" "txt"
----------------
================
================
"C:\Program Files\PDF24\gs\bin\gswinc.exe" -dBATCH -dNOPAUSE -dSAFER -dALLOWPSTRANSPARENCY "-sFONTPATH=C:\WINDOWS\Fonts;C:\Users\vulca\AppData\Local\Microsoft\Windows\Fonts" -dNEWPDF=true -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r300 -dFirstPage=11 -dLastPage=11 -sDEVICE=png16m -dDownScaleFactor=1 "-sOutputFile=C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_21_521645515_3442746883.png" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_0_521612437_3271843913_opt.pdf"
----------------
================
================
"C:\Program Files\PDF24\tesseract\tesseract.exe" "--tessdata-dir" "C:\Users\vulca\AppData\Local\PDF24\tesseract\5.3.0\tessdata" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_21_521645515_3442746883.png" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_22_521646218_3074943341" "-l" "eng" "-c" "textonly_pdf=1" "--dpi" "300" "--oem" "3" "--psm" "1" "pdf" "txt"
----------------
================
================
"C:\Program Files\PDF24\gs\bin\gswinc.exe" -dBATCH -dNOPAUSE -dSAFER -dALLOWPSTRANSPARENCY "-sFONTPATH=C:\WINDOWS\Fonts;C:\Users\vulca\AppData\Local\Microsoft\Windows\Fonts" -dNEWPDF=true -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r300 -dFirstPage=12 -dLastPage=12 -sDEVICE=png16m -dDownScaleFactor=1 "-sOutputFile=C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_23_521648062_516607773.png" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_0_521612437_3271843913_opt.pdf"
----------------
================
================
"C:\Program Files\PDF24\tesseract\tesseract.exe" "--tessdata-dir" "C:\Users\vulca\AppData\Local\PDF24\tesseract\5.3.0\tessdata" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_23_521648062_516607773.png" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_24_521648812_3910480455" "-l" "eng" "-c" "textonly_pdf=1" "--dpi" "300" "--oem" "3" "--psm" "1" "pdf" "txt"
----------------
================
================
"C:\Program Files\PDF24\gs\bin\gswinc.exe" -dBATCH -dNOPAUSE -dSAFER -dALLOWPSTRANSPARENCY "-sFONTPATH=C:\WINDOWS\Fonts;C:\Users\vulca\AppData\Local\Microsoft\Windows\Fonts" -dNEWPDF=true -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r300 -dFirstPage=13 -dLastPage=13 -sDEVICE=png16m -dDownScaleFactor=1 "-sOutputFGPL Ghostscript 10.03.1 (2024-05-02)
Copyright (C) 2024 Artifex Software, Inc. All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
Processing pages 13 through 13.
Page 13
OCR page 13
Error:warning: ignoring zlib error: incorrect data check
Error:warning: ignoring zlib error: incorrect data check
GPL Ghostscript 10.03.1 (2024-05-02)
Copyright (C) 2024 Artifex Software, Inc. All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
Processing pages 14 through 14.
Page 14
OCR page 14
Error:warning: ignoring zlib error: incorrect data check
Error:warning: ignoring zlib error: incorrect data check
GPL Ghostscript 10.03.1 (2024-05-02)
Copyright (C) 2024 Artifex Software, Inc. All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
Processing pages 15 through 15.
Page 15
OCR page 15
Error:warning: ignoring zlib error: incorrect data check
Error:warning: ignoring zlib error: incorrect data check
GPL Ghostscript 10.03.1 (2024-05-02)
Copyright (C) 2024 Artifex Software, Inc. All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
Processing pages 16 through 16.
Page 16
OCR page 16
Error:warning: ignoring zlib error: incorrect data check
Error:warning: ignoring zlib error: incorrect data check
ile=C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_25_521650796_1918327845.png" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_0_521612437_3271843913_opt.pdf"
----------------
================
================
"C:\Program Files\PDF24\tesseract\tesseract.exe" "--tessdata-dir" "C:\Users\vulca\AppData\Local\PDF24\tesseract\5.3.0\tessdata" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_25_521650796_1918327845.png" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_26_521651500_2346727403" "-l" "eng" "-c" "textonly_pdf=1" "--dpi" "300" "--oem" "3" "--psm" "1" "pdf" "txt"
----------------
================
================
"C:\Program Files\PDF24\gs\bin\gswinc.exe" -dBATCH -dNOPAUSE -dSAFER -dALLOWPSTRANSPARENCY "-sFONTPATH=C:\WINDOWS\Fonts;C:\Users\vulca\AppData\Local\Microsoft\Windows\Fonts" -dNEWPDF=true -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r300 -dFirstPage=14 -dLastPage=14 -sDEVICE=png16m -dDownScaleFactor=1 "-sOutputFile=C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_27_521653296_3034046508.png" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_0_521612437_3271843913_opt.pdf"
----------------
================
================
"C:\Program Files\PDF24\tesseract\tesseract.exe" "--tessdata-dir" "C:\Users\vulca\AppData\Local\PDF24\tesseract\5.3.0\tessdata" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_27_521653296_3034046508.png" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_28_521654031_3145676435" "-l" "eng" "-c" "textonly_pdf=1" "--dpi" "300" "--oem" "3" "--psm" "1" "pdf" "txt"
----------------
================
================
"C:\Program Files\PDF24\gs\bin\gswinc.exe" -dBATCH -dNOPAUSE -dSAFER -dALLOWPSTRANSPARENCY "-sFONTPATH=C:\WINDOWS\Fonts;C:\Users\vulca\AppData\Local\Microsoft\Windows\Fonts" -dNEWPDF=true -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r300 -dFirstPage=15 -dLastPage=15 -sDEVICE=png16m -dDownScaleFactor=1 "-sOutputFile=C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_29_521655984_1101515219.png" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_0_521612437_3271843913_opt.pdf"
----------------
================
================
"C:\Program Files\PDF24\tesseract\tesseract.exe" "--tessdata-dir" "C:\Users\vulca\AppData\Local\PDF24\tesseract\5.3.0\tessdata" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_29_521655984_1101515219.png" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_30_521656578_1760887125" "-l" "eng" "-c" "textonly_pdf=1" "--dpi" "300" "--oem" "3" "--psm" "1" "pdf" "txt"
----------------
================
================
"C:\Program Files\PDF24\gs\bin\gswinc.exe" -dBATCH -dNOPAUSE -dSAFER -dALLOWPSTRANSPARENCY "-sFONTPATH=C:\WINDOWS\Fonts;C:\Users\vulca\AppData\Local\Microsoft\Windows\Fonts" -dNEWPDF=true -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r300 -dFirstPage=16 -dLastPage=16 -sDEVICE=png16m -dDownScaleFactor=1 "-sOutputFile=C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_31_521658156_3219498737.png" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_0_521612437_3271843913_opt.pdf"
----------------
================
================
"C:\Program Files\PDF24\tesseract\tesseract.exe" "--tessdata-dir" "C:\Users\vulca\AppData\Local\PDF24\tesseract\5.3.0\tessdata" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_31_521658156_3219498737.png" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_32_521658718_287257631" "-l" "eng" "-c" "textonly_pdf=1" "--dpi" "300" "--oem" "3" "--psm" "1" "pdf" "txt"
----------------
TESS> Detected 20 diacritics
================
Auto Rotating PDF Pages
================
"C:\Program Files\PDF24\jre\bin\java.exe" -cp "C:\Program Files\PDF24\lib\jar\*" -Dwindows.acp=1252 -XX:MaxRAMPercentage=80 "org.pdf24.PdfPagesAutoRotator" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_33_521659859_1356719856_ocred.pdf" "C:\Users\vulca\AppData\Local\Temp\PDF24\ocr_34_521659906_1668589547_rot.pdf"
----------------
AUTOROT> PDF24 PDF Pages Auto Rotator
AUTOROT>
AUTOROT> Jun 17, 2024 6:26:06 PM org.apache.fontbox.ttf.PostScriptTable read
AUTOROT> WARNING: No PostScript name data is provided for the font null
AUTOROT> Jun 17, 2024 6:26:06 PM org.apache.fontbox.ttf.PostScriptTable read
AUTOROT> WARNING: No PostScript name data is provided for the font null
AUTOROT> Jun 17, 2024 6:26:06 PM org.apache.fontbox.ttf.PostScriptTable read
AUTOROT> WARNING: No PostScript name data is provided for the font null
AUTOROT> Jun 17, 2024 6:26:06 PM org.apache.fontbox.ttf.PostScriptTable read
AUTOROT> WARNING: No PostScript name data is provided for the font null
AUTOROT> Jun 17, 2024 6:26:06 PM org.apache.fontbox.ttf.PostScriptTable read
AUTOROT> WARNING: No PostScript name data is provided for the font null
AUTOROT> Jun 17, 2024 6:26:06 PM org.apache.fontbox.ttf.PostScriptTable read
AUTOROT> WARNING: No PostScript name data is provided for the font null
AUTOROT> Jun 17, 2024 6:26:06 PM org.apache.fontbox.ttf.PostScriptTable read
AUTOROT> WARNING: No PostScript name data is provided for the font null
AUTOROT> Jun 17, 2024 6:26:06 PM org.apache.fontbox.ttf.PostScriptTable read
AUTOROT> WARNING: No PostScript name data is provided for the font null
AUTOROT> Jun 17, 2024 6:26:06 PM org.apache.fontbox.ttf.PostScriptTable read
AUTOROT> WARNING: No PostScript name data is provided for the font null
AUTOROT> Jun 17, 2024 6:26:06 PM org.apache.fontbox.ttf.PostScriptTable read
AUTOROT> WARNING: No PostScript name data is provided for the font null
AUTOROT> Jun 17, 2024 6:26:06 PM org.apache.fontbox.ttf.PostScriptTable read
AUTOROT> WARNING: No PostScript name data is provided for the font null
AUTOROT> Jun 17, 2024 6:26:06 PM org.apache.fontbox.ttf.PostScriptTable read
AUTOROT> WARNING: No PostScript name data is provided for the font null
AUTOROT> Jun 17, 2024 6:26:06 PM org.apache.fontbox.ttf.PostScriptTable read
AUTOROT> WARNING: No PostScript name data is provided for the font null
AUTOROT> Jun 17, 2024 6:26:06 PM org.apache.fontbox.ttf.PostScriptTable read
AUTOROT> WARNING: No PostScript name data is provided for the font null
AUTOROT> Jun 17, 2024 6:26:06 PM org.apache.fontbox.ttf.PostScriptTable read
AUTOROT> WARNING: No PostScript name data is provided for the font null
AUTOROT> Jun 17, 2024 6:26:06 PM org.apache.fontbox.ttf.PostScriptTable read
AUTOROT> WARNING: No PostScript name data is provided for the font null
================
-->
GPL Ghostscript 10.03.1 (2024-05-02)
Copyright (C) 2024 Artifex Software, Inc. All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
Processing pages 1 through 1.
Page 1
Error:warning: ignoring zlib error: incorrect data check
OCR page 1
Error:warning: ignoring zlib error: incorrect data check
Error:warning: ignoring zlib error: incorrect data check
================
"C:\Program Files\PDF24\tesseract\tesseract.exe" -v
----------------
================
Optimizing PDF
================
I recently installed the current version in an update (shown in the above log), but the prior version had this problem as well.
Any hints or ideas would be greatly appreciated.
What immediately strikes me here is the following:
Error:warning: ignoring zlib error: incorrect data check
It seems that some data objects have deflate encoding issues.
Can you send me such a file at forum@pdf24.org?
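For context on the warning quoted above: FlateDecode streams in a PDF use the zlib format (RFC 1950), which ends with an Adler-32 checksum of the uncompressed data, and "incorrect data check" is zlib's message when that checksum does not match. The sketch below illustrates the check on a single zlib stream that has already been extracted into a byte array. It assumes the stream has no preset dictionary and that the array holds exactly one stream; locating streams inside a PDF is not covered, and `Adler32` and `ZlibChecksumMatches` are hypothetical names. Note that the same warning also appears in the log for the first file, which opened correctly, so it may or may not be the cause of the corruption.

```vbnet
Imports System
Imports System.IO
Imports System.IO.Compression

Module ZlibCheckSketch

    ' Adler-32 checksum as defined in RFC 1950.
    Public Function Adler32(data As Byte()) As UInteger
        Const ModAdler As UInteger = 65521UI
        Dim a As UInteger = 1UI
        Dim b As UInteger = 0UI
        For Each value As Byte In data
            a = (a + value) Mod ModAdler
            b = (b + a) Mod ModAdler
        Next
        Return (b << 16) Or a
    End Function

    ' Returns True when the Adler-32 trailer stored in the zlib stream
    ' matches the checksum of the inflated data.
    ' Layout assumed: 2 header bytes, raw deflate data, 4 trailer bytes (big-endian).
    Public Function ZlibChecksumMatches(zlibData As Byte()) As Boolean
        If zlibData Is Nothing OrElse zlibData.Length < 6 Then Return False

        Dim inflated As Byte()
        ' Skip the 2-byte header and leave out the 4-byte trailer.
        Using source As New MemoryStream(zlibData, 2, zlibData.Length - 6)
            Using inflater As New DeflateStream(source, CompressionMode.Decompress)
                Using target As New MemoryStream()
                    inflater.CopyTo(target)
                    inflated = target.ToArray()
                End Using
            End Using
        End Using

        Dim n As Integer = zlibData.Length
        Dim stored As UInteger =
            (CUInt(zlibData(n - 4)) << 24) Or (CUInt(zlibData(n - 3)) << 16) Or
            (CUInt(zlibData(n - 2)) << 8) Or CUInt(zlibData(n - 1))

        Return stored = Adler32(inflated)
    End Function
End Module
```

As a quick check, the nine bytes `78 9C 4B 04 00 00 62 00 62` are the zlib encoding of the single letter "a" and pass the test; changing the last byte makes it fail, which is the situation Ghostscript reports and then ignores. Damage inside the deflate data itself would instead surface as an exception from `DeflateStream`.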
Thank you, Stefan. I have e-mailed 2 files that OCR just fine and 2 files that become corrupted. - Lee