Get substrings from text which start with one of an array of keywords and the substring must not include a second keyword

Asked by: WatcherD | Asked: 10/12/2023 | Last edited by: mickmackusa, WatcherD | Updated: 10/16/2023 | Views: 100

Question:

I want to write a function that accepts two parameters, $text and $keys. $keys is an array of keys.

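The output should be an array whose keys are the keys passed to the function (those that are found in the text). Each value should be the text that follows that key, up to the next key or the end of the text. If a key occurs more than once in the text, only its last value should be stored in the array.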

For example:

$text = 'Lorem Ipsum is simply one dummy text of the printing and  two typesetting industry. Lorem Ipsum has been the industry\'s one standard dummy text ever since the three 1500s.';

$keys = ['one', 'two', 'three'];

Expected output:

[
    'one' => 'standard dummy text ever since the',
    'two' => 'typesetting industry. Lorem Ipsum has been the industry\'s',
    'three' => '1500s.'
]
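As a cross-check of this specification, here is a minimal sketch that takes a different route from the regex attempts below. It does not match each segment directly. Instead, it splits the text on the keys with preg_split() and PREG_SPLIT_DELIM_CAPTURE, then pairs each key with the chunk that follows it. The function name keyedSectionsBySplit is hypothetical. The sketch assumes plain-word keys matched case-sensitively, and PHP 7.4+ for the arrow function.

```php
<?php
// Hypothetical helper: split the text on whole-word keys, keep the keys as
// captured delimiters, then pair each key with the chunk that follows it.
function keyedSectionsBySplit(string $text, array $keys): array
{
    // Escape each key, passing "/" because "/" is the pattern delimiter.
    $alternation = implode('|', array_map(fn($k) => preg_quote($k, '/'), $keys));

    // PREG_SPLIT_DELIM_CAPTURE keeps the captured key in the result, giving
    // [textBefore, key1, textAfter1, key2, textAfter2, ...].
    $parts = preg_split('/\b(' . $alternation . ')\b/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    $result = [];
    // Odd indexes hold keys; the following even index holds that key's text.
    for ($i = 1; $i < count($parts); $i += 2) {
        // A repeated key overwrites its earlier value, so the last one wins.
        $result[$parts[$i]] = trim($parts[$i + 1]);
    }
    return $result;
}

$text = 'Lorem Ipsum is simply one dummy text of the printing and  two typesetting industry. Lorem Ipsum has been the industry\'s one standard dummy text ever since the three 1500s.';
$keys = ['one', 'two', 'three'];

$sections = keyedSectionsBySplit($text, $keys);
var_export($sections);
```

Overwriting an existing array key in PHP keeps its original position, so the result lists one, two, three in order of first appearance, as in the expected output.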

I tried to write a regular expression for this task, but without success.

My latest attempt:

function getKeyedSections($text, $keys) {
    $keysArray = explode(',', $keys);
    $pattern = '/(?:' . implode('|', array_map('preg_quote', $keysArray)) . '):\s*(.*?)(?=\s*(?:' . implode('|', array_map('preg_quote', $keysArray)) . '):\s*|\z)/s';
    preg_match_all($pattern, $text, $matches);

    $keyedSections = [];
    foreach ($keysArray as $key) {
        foreach ($matches[1] as $index => $value) {
            if (stripos($matches[0][$index], $key) !== false) {
                $keyedSections[trim($key)] = trim($value);
                break;
            }
        }
    }

    return $keyedSections;
}
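Reading the attempt, three things keep it from producing the expected result:

- explode(',', $keys) expects a string, but $keys is an array. In PHP 8 this throws a TypeError.
- The pattern requires a colon after each key ("one:", "two:"), which the sample text never contains.
- The inner break keeps the first segment that contains a key, not the last.

The sketch below isolates the second point. It assumes PHP 7.4+ and uses hypothetical variable names. It builds the alternation straight from the array and counts matches for the attempt's pattern and for a variant that uses word boundaries instead of colons.

```php
<?php
$text = 'Lorem Ipsum is simply one dummy text of the printing and  two typesetting industry. Lorem Ipsum has been the industry\'s one standard dummy text ever since the three 1500s.';
$keys = ['one', 'two', 'three'];

// Build the alternation from the array directly (no explode() needed).
$alt = implode('|', array_map(fn($k) => preg_quote($k, '/'), $keys));

// The attempt's pattern: every key must be followed by ":".
$withColon = '/(?:' . $alt . '):\s*(.*?)(?=\s*(?:' . $alt . '):\s*|\z)/s';
// Hypothetical variant: whole-word keys, no colon required.
$withoutColon = '/\b(?:' . $alt . ')\b\s*(.*?)(?=\s*\b(?:' . $alt . ')\b|\z)/s';

$countWithColon = preg_match_all($withColon, $text);       // no "one:" etc. in the text
$countWithoutColon = preg_match_all($withoutColon, $text); // one, two, one, three
echo $countWithColon, ' ', $countWithoutColon, PHP_EOL;    // prints "0 4"
```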
Tags: php arrays string preg-match-all text-extraction

Comments:

0 votes | mickmackusa | 10/13/2023
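preg_quote() has no default delimiter argument. You must explicitly pass / as the pattern delimiter to ensure that forward slashes in the input strings are escaped. ...Alternatively, you could change the pattern delimiter to # (which preg_quote() escapes by default).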
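A short illustration of the comment's point, using a hypothetical key that contains both characters. It assumes PHP 7.3 or later, which is when preg_quote() started escaping # by default.

```php
<?php
$key = 'a/b#c';

$noDelimiter = preg_quote($key);         // "/" is left unescaped
$slashDelimiter = preg_quote($key, '/'); // "/" is escaped as well

echo $noDelimiter, PHP_EOL;    // a/b\#c
echo $slashDelimiter, PHP_EOL; // a\/b\#c

// Option 1: "/" delimiter, so "/" must be passed to preg_quote().
$matchesWithSlash = preg_match('/' . $slashDelimiter . '/', $key);
// Option 2: "#" delimiter, which preg_quote() already escapes without a second argument.
$matchesWithHash = preg_match('#' . $noDelimiter . '#', $key);
// By contrast, '/' . $noDelimiter . '/' would end the pattern at the first "/"
// and fail with an "Unknown modifier" warning.
```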

Answers:

1 vote | mickmackusa | 10/13/2023 | #1

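Here is an approach that uses preg_match_all() to extract every segment that starts with any key and ends before any key. The array_column() call then simply discards earlier matches in favor of later ones and builds the desired associative result.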

$text = "Lorem Ipsum is simply one dummy text of the printing and  two typesetting industry. Lorem Ipsum has been the industry's one standard dummy text ever since the three 1500s.";

$keys = ['one', 'two', 'three'];

$escaped = implode('|', array_map('preg_quote', $keys));

preg_match_all('#\b(' . $escaped . ')\b\s*\K.*?(?=\s*(?:$|\b(?:' . $escaped . ')\b))#', $text, $m, PREG_SET_ORDER);

var_export(array_column($m, 0, 1));

Output:

array (
  'one' => 'standard dummy text ever since the',
  'two' => 'typesetting industry. Lorem Ipsum has been the industry\'s',
  'three' => '1500s.',
)
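This example shows why array_column() keeps only the last value for a repeated key. The rows below were traced by hand, not captured from a run. They mirror what preg_match_all() with PREG_SET_ORDER should return for the sample text: column 0 holds the text after \K, and column 1 holds the captured key.

```php
<?php
// Hand-written copy of $m: one row per match, [0] => segment, [1] => key.
$m = [
    ['dummy text of the printing and', 'one'],
    ['typesetting industry. Lorem Ipsum has been the industry\'s', 'two'],
    ['standard dummy text ever since the', 'one'],
    ['1500s.', 'three'],
];

// Values come from column 0 and indexes from column 1. When an index repeats,
// the later row overwrites the earlier one but keeps the original position.
$result = array_column($m, 0, 1);
var_export($result);
```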
0 votes | WatcherD | 10/16/2023 | #2

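I found a solution but didn't have time to post it. To clarify, the arbitrary text and the keys are both sent to me from the front end, with the keys as a comma-separated string. Here is how I handle them: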

public function getKeysArray(string $text, string $stringKeys): array
{
    $keysForPattern  = $this->prepareKeys($stringKeys);

    $matches = [];
    preg_match_all("/\b($keysForPattern)\s+(.*?)(?=\s+(?:$keysForPattern)\s+|$)/i", $text, $matches);

    $keys = array_values(array_filter($matches[1]));
    $values = array_map('trim', array_values(array_filter($matches[2])));

    return array_combine($keys, $values);
}

public function prepareKeys(string $keys): string
{
    $keysArray = explode(',', $keys);

    $keysArray = array_map(function($key) {
        return trim($key);
    }, $keysArray);

    return implode('|', $keysArray);
}
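A possible hardening of this answer, sketched as a standalone function with the hypothetical name keyedSectionsFromCsv. It makes three changes:

- Each key goes through preg_quote(), as the comment on the question suggests.
- Empty entries from stray commas are dropped.
- PREG_SET_ORDER with array_column() replaces the two array_filter() calls.

The last change matters because the original pairing can break. If a captured value were "" or "0" (for example, a key followed only by trailing spaces), array_filter() would drop it. The key and value arrays would then differ in length, and array_combine() would fail.

The sketch assumes PHP 7.4+ and keys that begin and end with word characters, since it wraps them in \b. Like the original, it matches case-insensitively, so each key comes back spelled as it appears in the text.

```php
<?php
// Hypothetical standalone variant of getKeysArray()/prepareKeys().
function keyedSectionsFromCsv(string $text, string $stringKeys): array
{
    // Split the comma-separated keys, trim them, and drop empty entries.
    $keys = array_filter(
        array_map('trim', explode(',', $stringKeys)),
        fn(string $k) => $k !== ''
    );
    if ($keys === []) {
        return [];
    }

    // Escape regex metacharacters and the "/" delimiter in every key.
    $alt = implode('|', array_map(fn(string $k) => preg_quote($k, '/'), $keys));

    // A whole-word key, then the shortest text up to the next key or the end.
    $pattern = '/\b(' . $alt . ')\b\s*(.*?)(?=\s*\b(?:' . $alt . ')\b|\s*$)/is';
    preg_match_all($pattern, $text, $m, PREG_SET_ORDER);

    // Value from group 2, index from group 1; a repeated key keeps its last value.
    return array_column($m, 2, 1);
}

$text = 'Lorem Ipsum is simply one dummy text of the printing and  two typesetting industry. Lorem Ipsum has been the industry\'s one standard dummy text ever since the three 1500s.';
$sections = keyedSectionsFromCsv($text, 'one, two ,three');
var_export($sections);
```

Called this way, it should return the same three entries as the expected output in the question. A key such as "a+b" is matched literally instead of being read as a quantifier.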