Asked by: WatcherD | Asked: 10/12/2023 | Last edited by: mickmackusa, WatcherD | Updated: 10/16/2023 | Views: 100
Get substrings from text which start with one of an array of keywords and the substring must not include a second keyword
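Q:
I want to write a function that accepts two parameters, $text and $keys. $keys is an array of keys.
The output should be an array whose keys are the keys passed to the function (if they are found in the text) and whose values are the text that follows each key, up to the next key or the end of the text. If a key is repeated in the text, only the last value is written to the array.
For example:
Text for visualization: Lorem Ipsum is simply one dummy text of the printing and two typesetting industry. Lorem Ipsum has been the industry's one standard dummy text ever since the three 1500s.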
$text = 'Lorem Ipsum is simply one dummy text of the printing and two typesetting industry. Lorem Ipsum has been the industry\'s one standard dummy text ever since the three 1500s.';
$keys = ['one', 'two', 'three'];
Expected output:
[
'one' => 'standard dummy text ever since the',
'two' => 'typesetting industry. Lorem Ipsum has been the industry\'s',
'three' => '1500s.'
]
I tried to write a regular expression for this task, but without success.
My latest attempt:
function getKeyedSections($text, $keys) {
    $keysArray = explode(',', $keys);
    $pattern = '/(?:' . implode('|', array_map('preg_quote', $keysArray)) . '):\s*(.*?)(?=\s*(?:' . implode('|', array_map('preg_quote', $keysArray)) . '):\s*|\z)/s';
    preg_match_all($pattern, $text, $matches);
    $keyedSections = [];
    foreach ($keysArray as $key) {
        foreach ($matches[1] as $index => $value) {
            if (stripos($matches[0][$index], $key) !== false) {
                $keyedSections[trim($key)] = trim($value);
                break;
            }
        }
    }
    return $keyedSections;
}
A:
1 upvote
mickmackusa
10/13/2023
#1
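Here is a preg_match_all() approach that extracts every segment that starts with any key and ends before any key. The array_column() call simply discards earlier matches in favor of later ones and builds the desired associative result.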
$text = "Lorem Ipsum is simply one dummy text of the printing and two typesetting industry. Lorem Ipsum has been the industry's one standard dummy text ever since the three 1500s.";
$keys = ['one', 'two', 'three'];
$escaped = implode('|', array_map('preg_quote', $keys));
preg_match_all('#\b(' . $escaped . ')\b\s*\K.*?(?=\s*(?:$|\b(?:' . $escaped . ')\b))#', $text, $m, PREG_SET_ORDER);
var_export(array_column($m, 0, 1));
Output:
array (
'one' => 'standard dummy text ever since the',
'two' => 'typesetting industry. Lorem Ipsum has been the industry\'s',
'three' => '1500s.',
)
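To see why the later occurrence of a key wins, here is a minimal sketch of what array_column($m, 0, 1) does with rows in PREG_SET_ORDER shape. The $rows array is hand-written to mimic the shape of $m for the sample text; it is an illustration under that assumption, not output captured from the answer.

```php
<?php
// Hand-written rows in the shape produced by PREG_SET_ORDER:
// index 0 is the overall match (the text after \K), index 1 is the captured key.
$rows = [
    ['dummy text of the printing and', 'one'],
    ["typesetting industry. Lorem Ipsum has been the industry's", 'two'],
    ['standard dummy text ever since the', 'one'],
    ['1500s.', 'three'],
];

// Column 0 supplies the values and column 1 supplies the keys.
// A repeated key is overwritten in place, so the last occurrence wins
// while the key keeps the position of its first appearance.
$result = array_column($rows, 0, 1);

var_export($result);
```

Because the overwrite happens in place, the keys come out in the order they first appear in the text, which for this sample happens to match the order of $keys.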
0 upvotes
WatcherD
10/16/2023
#2
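I found a solution but did not have time to post it. To clarify, both the arbitrary text and the keys are sent to me from the front end, with the keys given as a comma-separated string. Here is how I handle them: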
public function getKeysArray(string $text, string $stringKeys): array
{
    $keysForPattern = $this->prepareKeys($stringKeys);
    $matches = [];
    preg_match_all("/\b($keysForPattern)\s+(.*?)(?=\s+(?:$keysForPattern)\s+|$)/i", $text, $matches);
    $keys = array_values(array_filter($matches[1]));
    $values = array_map('trim', array_values(array_filter($matches[2])));
    return array_combine($keys, $values);
}

public function prepareKeys(string $keys): string
{
    $keysArray = explode(',', $keys);
    $keysArray = array_map(function($key) {
        return trim($key);
    }, $keysArray);
    return implode('|', $keysArray);
}
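Comments
preg_quote() has no default delimiter argument -- you must explicitly pass / as the pattern delimiter to ensure that forward slashes in the input string are escaped. ...Alternatively, you can change the pattern delimiter to #, which preg_quote() escapes by default.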
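The difference can be checked directly. The sketch below uses a made-up key list containing a forward slash, and quoteKeys() is a hypothetical helper name introduced only for this illustration; it is one possible way to apply the advice, not code from either answer.

```php
<?php
// Hypothetical helper: escape every key for use inside a pattern
// delimited by $delimiter, then join the keys into an alternation.
function quoteKeys(array $keys, string $delimiter = '/'): string
{
    $quoted = array_map(
        function (string $key) use ($delimiter): string {
            // The second argument makes preg_quote() escape the delimiter too.
            return preg_quote(trim($key), $delimiter);
        },
        $keys
    );

    return implode('|', $quoted);
}

// Without the second argument the forward slash is left alone,
// which would end a /.../ pattern early.
$unsafe = preg_quote('and/or');      // and/or
$safe   = preg_quote('and/or', '/'); // and\/or

// Assumed sample keys, including one with a slash and one with a dot.
$alternation = quoteKeys(['one', ' and/or ', 'a.b']);
$matched = preg_match('/\b(' . $alternation . ')\b/', 'this and/or that', $m);

var_export([$unsafe, $safe, $alternation, $matched, $m[1] ?? null]);
```

A closure is used here because array_map('preg_quote', $keys) has no way to pass the delimiter as the second argument.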