PDFBox: parse PDF table data with blank fields as well

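Question:

Requirement: read tabular data from a PDF document, including the columns that are blank.

Problem: when a particular column has no data, the parser skips that column instead of emitting a blank in its place when the data is parsed and printed to the console. As a result, I cannot tell from the output whether a value is the buy quantity or the sell quantity.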

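Research:

The problem is similar to the one described in this link, but no solution was given there.
I also checked this link on a similar issue, but it did not help in my case.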

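Query: any pointers on which settings would let me extract the data with blanks preserved, as required, would help.

Note: I cannot share the actual PDF document because it contains sensitive data.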

Table data in the PDF document:

Order No	Time	Buy Qty	Sell Qty	Price
1234567	09:23:45	250		354.4
1234589	13:38:10	300		400
1234677	14:28:15		100	980
1234722	15:45:50		265	770

Current output:

"1234567 09:23:45 250 354.4"

"1234589 13:38:10 300 400"

"1234677 14:28:15 100 980"

"1234722 15:45:50 265 770"

Desired output (empty fields left blank):

"1234567 09:23:45 250   354.4  "

"1234589 13:38:10 300   400  "

"1234677 14:28:15    100 980  "

"1234722 15:45:50    265 770  "

Code:

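import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

String pdfText = "";
// try-with-resources closes the document even if extraction fails
try (PDDocument pdDocument = Loader.loadPDF(new File("C:\\data.pdf"))) {
    PDFTextStripper pdfStripper = new PDFTextStripper();
    // Tried the below option as well, but it does not help
    pdfStripper.setSortByPosition(true);
    pdfText = pdfStripper.getText(pdDocument);
} catch (IOException e) {
    e.printStackTrace();
}

// Print the extracted text line by line
String[] arrayPDF = pdfText.split("\\r?\\n");
for (String data : arrayPDF) {
    System.out.println(data);
}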
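To make the desired behaviour concrete, here is a minimal sketch of the column assignment I am after, written in plain Java with no PDFBox dependency. It assumes each text fragment comes with the x coordinate of its left edge; in PDFBox such coordinates could presumably be collected by overriding `PDFTextStripper.writeString(String, List<TextPosition>)` and reading `TextPosition.getXDirAdj()`. The `ColumnAligner` and `Fragment` names and the column start positions are hypothetical and only for illustration; real boundaries would have to be measured from the actual document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ColumnAligner {

    /** Hypothetical holder: a text fragment and the x coordinate of its left edge. */
    public static final class Fragment {
        final double x;
        final String text;

        public Fragment(double x, String text) {
            this.x = x;
            this.text = text;
        }
    }

    /**
     * Puts each fragment into the last column whose start is not greater than
     * the fragment's x. Columns that receive no fragment stay as empty strings.
     * columnStarts must be sorted in ascending order.
     */
    public static List<String> alignRow(List<Fragment> row, double[] columnStarts) {
        List<String> cells = new ArrayList<>();
        for (int i = 0; i < columnStarts.length; i++) {
            cells.add("");
        }
        for (Fragment f : row) {
            int col = 0; // fragments left of the first boundary fall into column 0
            for (int i = 0; i < columnStarts.length; i++) {
                if (f.x >= columnStarts[i]) {
                    col = i;
                }
            }
            String existing = cells.get(col);
            cells.set(col, existing.isEmpty() ? f.text : existing + " " + f.text);
        }
        return cells;
    }

    public static void main(String[] args) {
        // Assumed left edges of the five columns (illustrative values only)
        double[] columnStarts = {50, 130, 210, 290, 370};
        List<Fragment> row = Arrays.asList(
                new Fragment(50, "1234677"),
                new Fragment(130, "14:28:15"),
                new Fragment(292, "100"),   // lands in the sell-quantity column
                new Fragment(371, "980"));
        System.out.println(String.join("|", alignRow(row, columnStarts)));
    }
}
```

With this approach a row that has no buy quantity keeps an empty cell in the third position, so the buy and sell columns stay distinguishable. Right-aligned numeric columns may need boundaries based on the right edge instead, which this sketch does not handle.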

java pdfbox
