Extracting specific data via coordinates using php pdfParser

Question:

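I want to extract specific data from various PDFs, each of which has 3-4 pages. I don't want to parse everything (all the text on every page) and then use something like a regular expression to match the data I want.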

So I was looking through the documentation, and php pdfParser has this function, which returns an array. The docs (https://github.com/smalot/pdfparser/blob/master/doc/Usage.md) say:

$data = $pdf->getPages()[0]->getDataTm();

"You can extract transformation matrix (indexes 0-3) and x,y position of text objects (indexes 4,5)."

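So I tried it, and it returned an array containing all the data I want, along with the coordinates of each item.

Here is an example, if you would like to try it yourself.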

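<?php
require_once __DIR__ . '/vendor/autoload.php';
use Smalot\PdfParser\Parser;

// Parse the PDF file into a document object.
$parser = new Parser();
$pdf = $parser->parseFile('pdfFile.pdf');

// For the first page, get each text object together with its
// transformation matrix (indexes 0-3) and x,y position (indexes 4,5).
$data = $pdf->getPages()[0]->getDataTm();
print_r($data);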

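Now suppose I have the coordinates, but I don't know how to use them to find the exact data I want. I am looking for documentation of a function that takes coordinates like these and returns what I want from my PDF, but I can't find anything. Something like:

functionXYcoordinates("260", "120")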

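If anyone knows whether pdfParser has such a feature, please let me know. Likewise, if you think extracting data by coordinates is a bad idea and that it would be better to parse all the pages and then use regular expressions to match the specific data, please say so.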
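To make the idea concrete, here is a minimal sketch of the kind of lookup described above, done by filtering the array that getDataTm() returns. The function name findTextNear and its tolerance parameter are hypothetical and not part of pdfParser. The sketch assumes each entry has the shape shown in Usage.md: a matrix array whose indexes 4 and 5 hold x and y, followed by the text. The sample data is made up so the code runs without a PDF.

```php
<?php
// Hypothetical helper (not a pdfParser function): return the text of every
// getDataTm()-style entry whose x,y position is within $tolerance of ($x, $y).
function findTextNear(array $dataTm, float $x, float $y, float $tolerance = 1.0): array
{
    $matches = [];
    foreach ($dataTm as $entry) {
        // Assumed shape: $entry[0] = matrix (indexes 0-3 transformation,
        // 4 = x, 5 = y), $entry[1] = text of the object.
        [$matrix, $text] = $entry;
        if (abs((float) $matrix[4] - $x) <= $tolerance
            && abs((float) $matrix[5] - $y) <= $tolerance) {
            $matches[] = $text;
        }
    }
    return $matches;
}

// Made-up sample in the same shape as getDataTm() output.
$sample = [
    [['1', '0', '0', '1', '260', '120'], 'Invoice total'],
    [['1', '0', '0', '1', '72.5', '700.1'], 'Page header'],
];
print_r(findTextNear($sample, 260.0, 120.0));
```

With a real file, the same helper would be called on $pdf->getPages()[0]->getDataTm(). A tolerance is used because positions in a PDF are often fractional and rarely match a round number exactly. Newer releases of the library also appear to document a getTextXY method on pages, so it is worth checking Usage.md for its exact signature before writing a custom filter.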

Tags: PHP, text, PDF, PDF parser
