Asked by: Tanmay Sharma · Asked: 5/23/2023 · Updated: 5/23/2023 · Views: 297
PDFBox Preflight parser is not able to detect PDF/A-1b file
Q:
I am using the following code to detect whether a file is a PDF/A-1b file:
public boolean isPDF_A1BFile(File file) throws IOException {
    PreflightParser parser = new PreflightParser(file);
    parser.parse(Format.PDF_A1B);
    PreflightDocument preflightDocument = parser.getPreflightDocument();
    preflightDocument.validate();
    ValidationResult validationResult = preflightDocument.getResult();
    return validationResult.isValid(); // returns false in every case
}
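However, it always returns false, whether or not the file is PDF/A-1b. I am using this PDF/A-1b file. I have verified it with the Preflight tool in Acrobat, which says the file complies with the PDF/A-1b standard. I am sharing a screenshot of that result. Can anyone tell me what is wrong with my code, or whether I am missing something?
Also, is there any way to check whether a file complies with the PDF/A-2b standard?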
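A:
Some PDF applications tolerate this file, since many of them silently repair such discrepancies, but PDFBox detects numerous oddities. I did not spend much time on it, but the complaints look like they are probably valid, so the file may well not be compliant.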
The file Doc1-withHelvetica-pdfa1b.pdf is not a valid PDF/A-1b file, error(s) :
1.2.2 : Body Syntax error, Expected 'EOL' before the endstream keyword at offset 32264 but found '101'
1.2.5 : Body Syntax error, Stream length is invalid [dic=COSDictionary{COSName{Length}:COSInt{8702};COSName{Subtype}:COSName{XML};COSName{Type}:COSName{Metadata};}; defined length=8702; actual length=8702, starting offset=23561
1.2.2 : Body Syntax error, Expected 'EOL' before the endstream keyword at offset 35134 but found '101'
1.2.5 : Body Syntax error, Stream length is invalid [dic=COSDictionary{COSName{Filter}:COSName{FlateDecode};COSName{Length}:COSInt{2574};COSName{N}:COSInt{3};COSName{Range}:COSArray{COSFloat{0.0};COSFloat{1.0};0;1065353216;0;1065353216;};}; defined length=2574; actual length=2574, starting offset=32559
1.2.2 : Body Syntax error, Expected 'EOL' before the endstream keyword at offset 1562 but found '101'
1.2.5 : Body Syntax error, Stream length is invalid [dic=COSDictionary{COSName{Filter}:COSName{FlateDecode};COSName{Length}:COSInt{202};}; defined length=202; actual length=202, starting offset=1359
1.2.2 : Body Syntax error, Expected 'EOL' before the endstream keyword at offset 4486 but found '101'
1.2.5 : Body Syntax error, Stream length is invalid [dic=COSDictionary{COSName{Alternate}:COSName{DeviceRGB};COSName{Filter}:COSName{FlateDecode};COSName{Length}:COSInt{2612};COSName{N}:COSInt{3};}; defined length=2612; actual length=2612, starting offset=1873
1.2.2 : Body Syntax error, Expected 'EOL' before the endstream keyword at offset 4640 but found '101'
1.2.5 : Body Syntax error, Stream length is invalid [dic=COSDictionary{COSName{Filter}:COSName{FlateDecode};COSName{Length}:COSInt{17};}; defined length=17; actual length=17, starting offset=4622
1.2.2 : Body Syntax error, Expected 'EOL' before the endstream keyword at offset 15067 but found '101'
1.2.5 : Body Syntax error, Stream length is invalid [dic=COSDictionary{COSName{Filter}:COSName{FlateDecode};COSName{Length}:COSInt{10342};COSName{Length1}:COSInt{27968};}; defined length=10342; actual length=10342, starting offset=4724
1.2.2 : Body Syntax error, Expected 'EOL' before the endstream keyword at offset 16081 but found '101'
1.2.5 : Body Syntax error, Stream length is invalid [dic=COSDictionary{COSName{Filter}:COSName{FlateDecode};COSName{Length}:COSInt{407};}; defined length=407; actual length=407, starting offset=15673
1.2.2 : Body Syntax error, Expected 'EOL' before the endstream keyword at offset 22792 but found '101'
1.2.5 : Body Syntax error, Stream length is invalid [dic=COSDictionary{COSName{Filter}:COSName{FlateDecode};COSName{Length}:COSInt{6627};COSName{Length1}:COSInt{15080};}; defined length=6627; actual length=6627, starting offset=16164
1.2.2 : Body Syntax error, Expected 'EOL' before the endstream keyword at offset 23435 but found '101'
1.2.5 : Body Syntax error, Stream length is invalid [dic=COSDictionary{COSName{Filter}:COSName{FlateDecode};COSName{Length}:COSInt{355};}; defined length=355; actual length=355, starting offset=23079
1.2.2 : Body Syntax error, Expected 'EOL' before the endstream keyword at offset 822 but found '101'
1.2.5 : Body Syntax error, Stream length is invalid [dic=COSDictionary{COSName{Filter}:COSName{FlateDecode};COSName{I}:COSInt{93};COSName{Length}:COSInt{85};COSName{S}:COSInt{39};}; defined length=85; actual length=85, starting offset=736
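Every 1.2.2 entry above ends with "found '101'". 101 is the decimal ASCII code for the letter e, which would be consistent with the endstream keyword following the stream data directly, with no end-of-line marker in between (PDF/A-1 requires an EOL before endstream). The following is a minimal sketch for spotting that pattern in the raw bytes. It is not PDFBox's own check: the class EndstreamEolCheck and the method findMissingEol are hypothetical names, the scan is naive (it will also match the text "endstream" inside uncompressed stream data), and the offsets it prints are not guaranteed to match the ones PDFBox reports.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox: reports every "endstream" keyword
// that is not directly preceded by an end-of-line byte (CR or LF).
final class EndstreamEolCheck {

    private static final byte[] KEYWORD =
            "endstream".getBytes(StandardCharsets.US_ASCII);

    // Returns the byte offsets of "endstream" keywords that lack a preceding EOL.
    static List<Integer> findMissingEol(byte[] pdf) {
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i + KEYWORD.length <= pdf.length; i++) {
            if (!matchesAt(pdf, i)) {
                continue;
            }
            boolean eolBefore = i > 0 && (pdf[i - 1] == '\n' || pdf[i - 1] == '\r');
            if (!eolBefore) {
                offsets.add(i);
            }
            i += KEYWORD.length - 1; // continue scanning after the keyword
        }
        return offsets;
    }

    // True if the bytes starting at pos spell out the keyword.
    private static boolean matchesAt(byte[] data, int pos) {
        for (int k = 0; k < KEYWORD.length; k++) {
            if (data[pos + k] != KEYWORD[k]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path of the PDF to inspect.
        byte[] pdf = Files.readAllBytes(Paths.get(args[0]));
        for (int offset : findMissingEol(pdf)) {
            System.out.println("No EOL before endstream at byte offset " + offset);
        }
    }
}
```

Run against the original file, a list of hits would support the reading that the producer wrote endstream without a preceding EOL; an empty list would suggest looking elsewhere.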
So, taking that at face value, I simply rebuilt the file with "clean" in MuPDF and re-ran it through PDFBox for validation.
C:\Apps\PDF\inspectors\Apache\preflight-app-3.0.0-alpha3.jar Doc1-withHelvetica-pdfa1ba.pdf
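The file Doc1-withHelvetica-pdfa1ba.pdf is a valid PDF/A-1b file
However, it is a catch-22: the rebuilt file now fails the other validation, which reports:
The PDF structure was damaged but has been repaired. Depending on the extent of the damage, some data could theoretically have been lost (although this is usually unlikely).
So I recycled the file by stripping its PDF/A conformance and regenerating it as PDF/A to see where the problems were. The report now says that Calibri has at least one bad font definition (not surprising, since the file was originally a printout of a Word document). What is not obvious is that there is a rogue Calibri space character at the end of a line that is otherwise set in Helvetica Bold. After removing it, other problems were reported, so I ran the editor again and finally removed all the dross, and both checkers now agree that there are no more problems.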