Tetris-ing an array

Q:

Consider the following array:

/www/htdocs/1/sites/lib/abcdedd
/www/htdocs/1/sites/conf/xyz
/www/htdocs/1/sites/conf/abc/def
/www/htdocs/1/sites/htdocs/xyz
/www/htdocs/1/sites/lib2/abcdedd

What is the shortest and most elegant way of detecting the common base path, in this case

/www/htdocs/1/sites/

and removing it from all elements in the array?

lib/abcdedd
conf/xyz
conf/abc/def
htdocs/xyz
lib2/abcdedd

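php string algorithm

4 votes Richard Knop 7/19/2010

This might be worth a try: en.wikibooks.org/wiki/Algorithm_implementation/Strings/... (I tried it and it works).

1 vote Pekka 7/19/2010

Wow! So much great input. I'll take one to solve my problem at hand, but I feel that to really pick a justified accepted answer, I'll have to compare the solutions. It may take a while until I get around to doing that, but I certainly will.

0 votes The Surrican 1/20/2011

Interesting title :D By the way: why can't I find you on the list of nominated moderators? @Pekka

2 votes Gordon 6/19/2012

Two years and no accepted answer?

1 vote Camilo Martin 4/19/2013

@Pekka It's been almost three years and there is still no accepted answer :( It's such a great title; I just remembered it and googled "tetrising an array".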

A:

10 votes Sjoerd 7/18/2010 #1

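// $a holds the input array of paths
$common = PHP_INT_MAX;
foreach ($a as $item) {
        // shrink $common to the last '/' shared by $a[0] and $item
        $common = min($common, str_common($a[0], $item, $common));
}

$result = array();
foreach ($a as $item) {
        $result[] = substr($item, $common);
}
print_r($result);

// Returns the position of the last '/' inside the common prefix of $a and $b,
// looking at no more than $max + 1 characters.
function str_common($a, $b, $max)
{
        $pos = 0;
        $last_slash = 0;
        $len = min(strlen($a), strlen($b), $max + 1);
        while ($pos < $len) {
                // [] string offsets; the old {} syntax was removed in PHP 8
                if ($a[$pos] != $b[$pos]) return $last_slash;
                if ($a[$pos] == '/') $last_slash = $pos;
                $pos++;
        }
        return $last_slash;
}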

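0 votes Gabe 7/18/2010

This is the best solution posted so far, but it needed improvement. It didn't take the previous longest common path into account (possibly iterating over more of the string than necessary), and it didn't take paths into account (so for /usr/lib and /usr/lib2 it gave /usr/lib as the longest common path, rather than /usr/). I (hopefully) fixed both.

23 votes bragboy 7/18/2010 #2

Load them into a trie data structure. Starting from the parent node, see which node has a child count greater than one. Once you find that magic node, just dismantle the parent node structure and make the current node the root.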
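A minimal PHP sketch of the trie idea, for illustration only. It assumes the trie is built over path components (the pieces between slashes) rather than single characters and is stored as nested arrays; buildTrie(), commonPathFromTrie() and the '#end' marker key are hypothetical names, and '#end' is assumed never to occur as a real path component. Walking down from the root while a node has exactly one child and no path ends there yields the common base path.

```php
<?php
// Hypothetical helper: builds a trie of path components as nested arrays.
// The '#end' key (assumed not to be a real component) marks where a path ends.
function buildTrie(array $paths): array
{
    $root = array();
    foreach ($paths as $path) {
        $node = &$root;
        foreach (explode('/', $path) as $part) {
            if (!isset($node[$part])) {
                $node[$part] = array();
            }
            $node = &$node[$part];
        }
        $node['#end'] = true;
    }
    unset($node); // drop the last reference into the trie
    return $root;
}

// Hypothetical helper: descends while the current node has exactly one child
// and no path terminates there; the components passed form the common path.
function commonPathFromTrie(array $trie): string
{
    $parts = array();
    $node = $trie;
    while (count($node) === 1 && !isset($node['#end'])) {
        $part = array_key_first($node); // PHP >= 7.3
        $parts[] = $part;
        $node = $node[$part];
    }
    return implode('/', $parts);
}

$paths = array(
    '/www/htdocs/1/sites/lib/abcdedd',
    '/www/htdocs/1/sites/conf/xyz',
    '/www/htdocs/1/sites/conf/abc/def',
    '/www/htdocs/1/sites/htdocs/xyz',
    '/www/htdocs/1/sites/lib2/abcdedd',
);

$common = commonPathFromTrie(buildTrie($paths)); // "/www/htdocs/1/sites"
$stripped = array_map(function ($p) use ($common) {
    return substr($p, strlen($common) + 1); // +1 also drops the '/' after the prefix
}, $paths);
print_r($stripped);
```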

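10 votes Ben Schwehn 7/18/2010

Doesn't loading the data into the trie structure you describe already involve the algorithm for finding the longest common prefix, which makes actually using a tree structure unnecessary? That is, why check the tree for nodes with multiple children when you could detect that while building the tree? Why a tree at all, then? I mean, if you are starting from an array already. If you could change the storage to use a trie instead of arrays, I guess it would make sense.

2 votes starblue 7/25/2010

I think that if you are careful, my solution is more efficient than building a trie.

0 votes Ari Ronen 8/8/2010

This answer is wrong. Trivial O(n) solutions have been posted in my answer and in others.

0 votes Billy ONeal 5/9/2012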

@el.pescado: In the worst case, tries are quadratic in size in the length of the source string.

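1 vote Brendan Bullen 7/18/2010 #3

I would explode the values on the / and then use array_intersect_assoc to detect the common elements and make sure they have the correct corresponding index in the array. The resulting array can then be recombined to produce the common path.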

function getCommonPath($pathArray)
{
    $pathElements = array();

    foreach($pathArray as $path)
    {
        $pathElements[] = explode("/",$path);
    }

    $commonPath = $pathElements[0];

    for($i=1;$i<count($pathElements);$i++)
    {
        // keep only elements that match in both value and position (key)
        $commonPath = array_intersect_assoc($commonPath,$pathElements[$i]);
    }

    // array_intersect_assoc can also keep matches after a gap (e.g. "c" for
    // /a/b/c and /a/x/c), so keep only the unbroken run starting at key 0
    $prefix = array();
    foreach($commonPath as $key => $part)
    {
        if($key !== count($prefix)) break;
        $prefix[] = $part;
    }

    return implode("/",$prefix);
}

function removeCommonPath($pathArray)
{
    $commonPath = getCommonPath($pathArray);

    for($i=0;$i<count($pathArray);$i++)
    {
        $pathArray[$i] = substr($pathArray[$i],strlen($commonPath));
    }

    return $pathArray;
}

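This is untested, but the idea is that the $commonPath array only ever contains the path elements that were present in all the path arrays it has been compared against. When the loop is complete, we simply recombine it with / to get the true $commonPath.

Update: As pointed out by Felix Kling, array_intersect won't take into account paths that have common elements but in a different order... To solve this, I used array_intersect_assoc instead of array_intersect.

Update: Added code to remove the common path (or tetris it!) from the array as well.

0 votes Felix Kling 7/18/2010

This probably won't work. Consider /a/b/c/d and /d/c/b/a. Same elements, different paths.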
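Felix Kling's counterexample can be checked directly. A small PHP snippet (the variable names are just for illustration) comparing the two built-in functions on his two paths:

```php
<?php
// Felix Kling's counterexample: the same elements in a different order.
$a = explode('/', '/a/b/c/d'); // ['', 'a', 'b', 'c', 'd']
$b = explode('/', '/d/c/b/a'); // ['', 'd', 'c', 'b', 'a']

// array_intersect() compares values only, so all five elements "match".
$byValue = array_intersect($a, $b);

// array_intersect_assoc() also requires equal keys (positions),
// so only the leading empty string at key 0 survives.
$byPosition = array_intersect_assoc($a, $b);

print_r($byValue);
print_r($byPosition);
```

Position-aware matching on its own still keeps equal components that come after a gap (for /a/b/c and /a/x/c it also keeps c), so the result still has to be cut at the first missing key.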

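0 votes Brendan Bullen 7/18/2010

@Felix Kling, I've updated it to use array_intersect_assoc, which also performs an index check.

3 votes Felix Kling 7/18/2010 #4

A naive approach would be to explode the paths at the / and successively compare every element in the arrays. So, e.g., the first element would be empty in all arrays, so it gets removed; the next element will be www, which is the same in all arrays, so it gets removed too, and so on.

Something like (~~untested~~):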

$exploded_paths = array();

foreach($paths as $path) {
    $exploded_paths[] = explode('/', $path);
}

$equal = true;
$ref = &$exploded_paths[0]; // compare against the first path for simplicity

while($equal) {
    foreach($exploded_paths as $path_parts) {
        // stop at the first differing element, or once any path runs out
        if(!isset($path_parts[0], $ref[0]) || $path_parts[0] !== $ref[0]) {
            $equal = false;
            break;
        }
    }
    if($equal) {
        foreach($exploded_paths as &$path_parts) {
            array_shift($path_parts); // remove the first element
        }
        // $path_parts still references the last path; unset it so the
        // by-value foreach above does not overwrite that path
        unset($path_parts);
    }
}

After that, you just have to implode the elements in $exploded_paths again:

function impl($arr) {
    return '/' . implode('/', $arr);
}
$paths = array_map('impl', $exploded_paths);

This gives me:

Array
(
    [0] => /lib/abcdedd
    [1] => /conf/xyz
    [2] => /conf/abc/def
    [3] => /htdocs/xyz
    [4] => /lib2/abcdedd
)

This probably doesn't scale well ;)

2 votes Mark Baker 7/18/2010 #5

$values = array('/www/htdocs/1/sites/lib/abcdedd',
                '/www/htdocs/1/sites/conf/xyz',
                '/www/htdocs/1/sites/conf/abc/def',
                '/www/htdocs/1/sites/htdocs/xyz',
                '/www/htdocs/1/sites/lib2/abcdedd'
);


function splitArrayValues($r) {
    return explode('/',$r);
}

function stripCommon($values) {
    $testValues = array_map('splitArrayValues',$values);

    $i = 0;
    foreach($testValues[0] as $key => $value) {
        foreach($testValues as $arraySetValues) {
            if (!isset($arraySetValues[$key]) || $arraySetValues[$key] !== $value) break 2;
        }
        $i++;
    }

    $returnArray = array();
    foreach($testValues as $value) {
        $returnArray[] = implode('/',array_slice($value,$i));
    }

    return $returnArray;
}


$newValues = stripCommon($values);

echo '<pre>';
var_dump($newValues);
echo '</pre>';

Edit: a variant of my original method, using array_walk to rebuild the array

$values = array('/www/htdocs/1/sites/lib/abcdedd',
                '/www/htdocs/1/sites/conf/xyz',
                '/www/htdocs/1/sites/conf/abc/def',
                '/www/htdocs/1/sites/htdocs/xyz',
                '/www/htdocs/1/sites/lib2/abcdedd'
);


function splitArrayValues($r) {
    return explode('/',$r);
}

function rejoinArrayValues(&$r,$d,$i) {
    $r = implode('/',array_slice($r,$i));
}

function stripCommon($values) {
    $testValues = array_map('splitArrayValues',$values);

    $i = 0;
    foreach($testValues[0] as $key => $value) {
        foreach($testValues as $arraySetValues) {
            if (!isset($arraySetValues[$key]) || $arraySetValues[$key] !== $value) break 2;
        }
        $i++;
    }

    array_walk($testValues, 'rejoinArrayValues', $i);

    return $testValues;
}


$newValues = stripCommon($values);

echo '<pre>';
var_dump($newValues);
echo '</pre>';

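Edit

The most efficient and elegant answer will probably involve taking functions and methods from each of the answers provided.

2 votes Artefacto 7/18/2010 #6

This has the disadvantage of not having linear time complexity (the sort is O(n log n)); however, in most cases the sort will certainly not be the operation that takes the most time.

Basically, the clever part here (at least I couldn't find a fault with it) is that after sorting, you only have to compare the first path with the last one.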

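sort($a); // $a is the input array of paths
$a = array_map(function ($el) { return explode("/", $el); }, $a);
$first = reset($a);
$last = end($a);
// count the leading components shared by the sorted first and last path;
// isset() stops at the end of the shorter one (identical paths would loop forever)
for ($eqdepth = 0; isset($first[$eqdepth], $last[$eqdepth]) && $first[$eqdepth] === $last[$eqdepth]; $eqdepth++) {}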
array_walk($a,
    function (&$el) use ($eqdepth) {
        for ($i = 0; $i < $eqdepth; $i++) {
            array_shift($el);
        }
     });
$res = array_map(function ($el) { return implode("/", $el); }, $a);

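35 votes starblue 7/18/2010 #7

Write a function longest_common_prefix that takes two strings as input. Then apply it to the strings in any order to reduce them to their common prefix. Since it is associative and commutative, the order doesn't matter for the result.

This is the same as for other binary operations such as addition or the greatest common divisor.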
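A minimal PHP sketch of this, assuming a character-by-character longest_common_prefix (the name comes from the answer; the body is illustrative) folded over the array with array_reduce(). The final cut back to the last / is an extra step so that only whole directory names are removed.

```php
<?php
// Character-level longest common prefix of two strings.
function longest_common_prefix(string $a, string $b): string
{
    $n = min(strlen($a), strlen($b));
    $i = 0;
    while ($i < $n && $a[$i] === $b[$i]) {
        $i++;
    }
    return substr($a, 0, $i);
}

$paths = array(
    '/www/htdocs/1/sites/lib/abcdedd',
    '/www/htdocs/1/sites/conf/xyz',
    '/www/htdocs/1/sites/conf/abc/def',
    '/www/htdocs/1/sites/htdocs/xyz',
    '/www/htdocs/1/sites/lib2/abcdedd',
);

// Fold the binary operation over the array, seeded with the first element;
// since it is associative and commutative, the visiting order is irrelevant.
$prefix = array_reduce($paths, 'longest_common_prefix', $paths[0]);

// Cut back to the last '/' so only whole path components are removed.
$cut = strrpos($prefix, '/');
$prefix = ($cut === false) ? '' : substr($prefix, 0, $cut + 1);

$stripped = array_map(function ($p) use ($prefix) {
    return substr($p, strlen($prefix));
}, $paths);

print_r($stripped);
```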

8 votes Milan Babuškov 7/18/2010

+1. After comparing the first 2 strings, use the result (the common path) to compare with the 3rd string, and so on.

0 votes KoolKabin 7/18/2010 #8

$arrMain = array(
            '/www/htdocs/1/sites/lib/abcdedd',
            '/www/htdocs/1/sites/conf/xyz',
            '/www/htdocs/1/sites/conf/abc/def',
            '/www/htdocs/1/sites/htdocs/xyz',
            '/www/htdocs/1/sites/lib2/abcdedd'
);
function explodePath( $strPath ){ 
    return explode("/", $strPath);
}

function removePath( $strPath)
{
    global $strCommon;
    return str_replace( $strCommon, '', $strPath );
}
$arrExplodedPaths = array_map( 'explodePath', $arrMain ) ;

//Check for common and skip first 1
$strCommon = '';
for( $i=1; $i< count( $arrExplodedPaths[0] ); $i++)
{
    for( $j = 0; $j < count( $arrExplodedPaths); $j++ )
    {
        if( $arrExplodedPaths[0][ $i ] !== $arrExplodedPaths[ $j ][ $i ] )
        {
            break 2;
        } 
    }
    $strCommon .= '/'.$arrExplodedPaths[0][$i];
}
print_r( array_map( 'removePath', $arrMain ) );

This works fine... similar to Mark Baker's, but it uses str_replace.

3 votes Gordon 7/18/2010 #9

OK, I'm not sure this is bulletproof, but I think it works:

echo array_reduce($array, function($reducedValue, $arrayValue) {
    if($reducedValue === NULL) return $arrayValue;
    for($i = 0; $i < strlen($reducedValue); $i++) {
        if(!isset($arrayValue[$i]) || $arrayValue[$i] !== $reducedValue[$i]) {
            return substr($reducedValue, 0, $i);
        }
    }
    return $reducedValue;
});

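This takes the first value in the array as the reference string. It then iterates over the reference string and compares each char with the char at the same position in the second string. If a char doesn't match, the reference string is shortened to the position of that char, and the next string is compared. The function then returns the shortest matching string.

Performance depends on the given strings. The earlier the reference string gets shortened, the faster the code finishes. I really have no clue how to put that into a formula, though.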
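One way to put that into a formula, as a rough sketch that counts iterations of the closure's for loop rather than time. Here s_1, ..., s_n are the strings in the order array_reduce() visits them, p_j is the reference string after s_j has been processed, and lcp is the longest common prefix of two strings (notation introduced for this note). The first call just returns s_1; for every later string the loop either runs over the whole reference string or stops at the first mismatch after |p_j| + 1 iterations:

```latex
C \;=\; \sum_{j=2}^{n} \min\bigl(|p_{j-1}|,\; |p_j| + 1\bigr) \;\le\; (n-1)\,|s_1|,
\qquad p_1 = s_1,\quad p_j = \operatorname{lcp}(p_{j-1}, s_j)
```

Each term is bounded by the current reference length |p_{j-1}|, so the sooner the reference shrinks, the cheaper all remaining strings become. With only the smallest and largest string left after sorting, the sum has a single term, at most the length of the common prefix plus one, on top of the O(n log n) sort.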

I found that Artefacto's approach of sorting the strings improves performance. Adding

asort($array);
$array = array(array_shift($array), array_pop($array));

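before the array_reduce call will significantly increase performance.

Also note that this returns the longest matching initial substring, which is more versatile but won't give you the common path. You have to run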

substr($result, 0, strrpos($result, '/'));

on the result and store it in $path. Then you can use it to remove the common path from the values:

print_r(array_map(function($v) use ($path){
    return str_replace($path, '', $v);
}, $array));

This should give:

[0] => /lib/abcdedd
[1] => /conf/xyz
[2] => /conf/abc/def
[3] => /htdocs/xyz
[4] => /lib2/abcdedd

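Feedback welcome.

1 vote mario 7/18/2010 #10

The problem can be simplified if you look at it purely as a string comparison. This is probably faster than splitting into arrays: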

$longest = $tetris[0];  # or array_pop()
foreach ($tetris as $cmp) {
        // PHP concatenates with "."; "+" would be numeric addition
        while (strncmp($longest."/", $cmp, strlen($longest)+1) !== 0) {
                $longest = substr($longest, 0, strrpos($longest, "/"));
        }
}

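0 votes Artefacto 7/18/2010

Doesn't work, e.g., with this set: array('/www/htdocs/1/sites/conf/abc/def', '/www/htdocs/1/sites/htdocs/xyz', '/www/htdocs/1/sitesjj/lib2/abcdedd',).

0 votes mario 7/18/2010

@Artefacto: You're right. So I simply modified it to always include a trailing slash "/" in the comparison. That makes it unambiguous.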
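Artefacto's counterexample can be replayed against the trailing-slash version. A sketch that wraps mario's loop in a hypothetical commonDir() helper; the $longest !== '' guard and the strrpos() === false check are additions of this sketch, so the loop cannot spin forever on input that does not start with /:

```php
<?php
// mario's trailing-slash comparison, wrapped in a hypothetical helper.
function commonDir(array $tetris): string
{
    $longest = $tetris[0];
    foreach ($tetris as $cmp) {
        // Require the prefix to be followed by '/' in $cmp, so that
        // "sites" no longer matches the start of "sitesjj".
        while ($longest !== ''
            && strncmp($longest . "/", $cmp, strlen($longest) + 1) !== 0) {
            $cut = strrpos($longest, "/");
            $longest = ($cut === false) ? '' : substr($longest, 0, $cut);
        }
    }
    return $longest;
}

$set = array(
    '/www/htdocs/1/sites/conf/abc/def',
    '/www/htdocs/1/sites/htdocs/xyz',
    '/www/htdocs/1/sitesjj/lib2/abcdedd',
);

echo commonDir($set), "\n"; // /www/htdocs/1
```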

0 votes Richard Knop 7/19/2010 #11

Maybe too naive and noobish, but it works. I've used this algorithm:

<?php

function strlcs($str1, $str2){
    $str1Len = strlen($str1);
    $str2Len = strlen($str2);
    $ret = array();

    if($str1Len == 0 || $str2Len == 0)
        return $ret; //no similarities

    $CSL = array(); //Common Sequence Length array
    $intLargestSize = 0;

    //initialize the CSL array to assume there are no similarities
    for($i=0; $i<$str1Len; $i++){
        $CSL[$i] = array();
        for($j=0; $j<$str2Len; $j++){
            $CSL[$i][$j] = 0;
        }
    }

    for($i=0; $i<$str1Len; $i++){
        for($j=0; $j<$str2Len; $j++){
            //check every combination of characters
            if( $str1[$i] == $str2[$j] ){
                //these are the same in both strings
                if($i == 0 || $j == 0)
                    //it's the first character, so it's clearly only 1 character long
                    $CSL[$i][$j] = 1; 
                else
                    //it's one character longer than the string from the previous character
                    $CSL[$i][$j] = $CSL[$i-1][$j-1] + 1; 

                if( $CSL[$i][$j] > $intLargestSize ){
                    //remember this as the largest
                    $intLargestSize = $CSL[$i][$j]; 
                    //wipe any previous results
                    $ret = array();
                    //and then fall through to remember this new value
                }
                if( $CSL[$i][$j] == $intLargestSize )
                    //remember the largest string(s)
                    $ret[] = substr($str1, $i-$intLargestSize+1, $intLargestSize);
            }
            //else, $CSL should be set to 0, which it was already initialized to
        }
    }
    //return the list of matches
    return $ret;
}


$arr = array(
'/www/htdocs/1/sites/lib/abcdedd',
'/www/htdocs/1/sites/conf/xyz',
'/www/htdocs/1/sites/conf/abc/def',
'/www/htdocs/1/sites/htdocs/xyz',
'/www/htdocs/1/sites/lib2/abcdedd'
);

// find the common substring
$longestCommonSubstring = strlcs( $arr[0], $arr[1] );

// remove the common substring
foreach ($arr as $k => $v) {
    $arr[$k] = str_replace($longestCommonSubstring[0], '', $v);
}
var_dump($arr);

Output:

array(5) {
  [0]=>
  string(11) "lib/abcdedd"
  [1]=>
  string(8) "conf/xyz"
  [2]=>
  string(12) "conf/abc/def"
  [3]=>
  string(10) "htdocs/xyz"
  [4]=>
  string(12) "lib2/abcdedd"
}

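0 votes Richard Knop 7/23/2010

@Doomsday There's a link to Wikipedia in my answer... please try reading it before commenting.

0 votes Jan Fabry 8/9/2010

I think in the end you only compare the first two paths. In your example that works, but if you remove the first path, it will find /www/htdocs/1/sites/conf/ as a common match. Also, the algorithm searches for substrings starting anywhere in the string, but for this question you know you can start at position 0, which makes it a lot simpler.

1 vote AKX 7/19/2010 #12

Maybe porting the algorithm Python uses for os.path.commonprefix(m) would work?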

def commonprefix(m):
    "Given a list of pathnames, returns the longest common leading component"
    if not m: return ''
    s1 = min(m)
    s2 = max(m)
    n = min(len(s1), len(s2))
    for i in range(n):
        if s1[i] != s2[i]:
            return s1[:i]
    return s1[:n]

That is, uh... something like

function commonprefix($m) {
  if(!$m) return "";
  $s1 = min($m);
  $s2 = max($m);
  $n = min(strlen($s1), strlen($s2));
  for($i=0;$i<$n;$i++) if($s1[$i] != $s2[$i]) return substr($s1, 0, $i);
  return substr($s1, 0, $n);
}

After that, you can just substr each element of the original list, using the length of the common prefix as the start offset.
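A minimal sketch of that last step. The commonprefix() port is restated (with a strict !== comparison) so the block runs standalone, and it assumes non-numeric strings such as these paths, for which PHP's min()/max() compare byte by byte; $paths is the question's example.

```php
<?php
// Restatement of the commonprefix() port above (character-based).
function commonprefix($m) {
    if (!$m) return "";
    // The common prefix of the lexicographically smallest and largest
    // strings is shared by every string in between.
    $s1 = min($m);
    $s2 = max($m);
    $n = min(strlen($s1), strlen($s2));
    for ($i = 0; $i < $n; $i++) {
        if ($s1[$i] !== $s2[$i]) return substr($s1, 0, $i);
    }
    return substr($s1, 0, $n);
}

$paths = array(
    '/www/htdocs/1/sites/lib/abcdedd',
    '/www/htdocs/1/sites/conf/xyz',
    '/www/htdocs/1/sites/conf/abc/def',
    '/www/htdocs/1/sites/htdocs/xyz',
    '/www/htdocs/1/sites/lib2/abcdedd',
);

// Use the length of the common prefix as the start offset for substr().
$offset = strlen(commonprefix($paths));
$stripped = array_map(function ($p) use ($offset) {
    return substr($p, $offset);
}, $paths);

print_r($stripped);
```

Because the comparison is per character, a set such as /x/lib/a and /x/lib2/b would be cut inside lib; trimming the prefix back to its last / first avoids that.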

3 votes Doomsday 7/22/2010 #13

You can remove the prefix in the fastest way, reading each character only once:

function findLongestWord($lines, $delim = "/")
{
    $max = 0;
    $len = strlen($lines[0]);

    // read first string once
    for($i = 0; $i < $len; $i++) {
        for($n = 1; $n < count($lines); $n++) {
            if(!isset($lines[$n][$i]) || $lines[0][$i] != $lines[$n][$i]) {
                // we've found a difference (or the end of a shorter line)
                // at the current token: stop search
                return $max;
            }
        }
        if($lines[0][$i] == $delim) {
            // we've found a complete token:
            $max = $i + 1;
        }
    }
    return $max;
}

// $lines is the input array of paths
$max = findLongestWord($lines);
// cut prefix of len "max"
for($n = 0; $n < count($lines); $n++) {
    $lines[$n] = substr($lines[$n], $max);
}

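0 votes Jan Fabry 8/9/2010

Indeed, character-based comparison will be the fastest. All the other solutions use "expensive" operators that, in the end, also do (multiple) character comparisons. It was even mentioned in the scriptures of Holy Joel!

1 vote rik 11/29/2010 #14

I'll throw my hat into the ring...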

function longestCommonPrefix($a, $b) {
    $i = 0;
    $end = min(strlen($a), strlen($b));
    while ($i < $end && $a[$i] == $b[$i]) $i++;
    return substr($a, 0, $i);
}

function longestCommonPrefixFromArray(array $strings) {
    $count = count($strings);
    if (!$count) return '';
    $prefix = reset($strings);
    for ($i = 1; $i < $count; $i++)
        $prefix = longestCommonPrefix($prefix, $strings[$i]);
    return $prefix;
}

function stripPrefix(&$string, $foo, $length) {
    $string = substr($string, $length);
}

Usage:

$paths = array(
    '/www/htdocs/1/sites/lib/abcdedd',
    '/www/htdocs/1/sites/conf/xyz',
    '/www/htdocs/1/sites/conf/abc/def',
    '/www/htdocs/1/sites/htdocs/xyz',
    '/www/htdocs/1/sites/lib2/abcdedd',
);

$longComPref = longestCommonPrefixFromArray($paths);
array_walk($paths, 'stripPrefix', strlen($longComPref));
print_r($paths);

1 vote acm 3/31/2011 #15

Well, there are already some solutions here, but just because it was fun:

$values = array(
    '/www/htdocs/1/sites/lib/abcdedd',
    '/www/htdocs/1/sites/conf/xyz',
    '/www/htdocs/1/sites/conf/abc/def', 
    '/www/htdocs/1/sites/htdocs/xyz',
    '/www/htdocs/1/sites/lib2/abcdedd' 
);

function findCommon($values){
    $common = false;
    foreach($values as &$p){
        $p = explode('/', $p);
        if($common === false){ // an empty intersection must not reset $common
            $common = $p;
        } else {
            $common = array_intersect_assoc($common, $p);
        }
    }
    return $common;
}
function removeCommon($values, $common){
    foreach($values as &$p){
        $p = explode('/', $p);
        $p = array_diff_assoc($p, $common);
        $p = implode('/', $p);
    }

    return $values;
}

echo '<pre>';
print_r(removeCommon($values, findCommon($values)));
echo '</pre>';

Output:

Array
(
    [0] => lib/abcdedd
    [1] => conf/xyz
    [2] => conf/abc/def
    [3] => htdocs/xyz
    [4] => lib2/abcdedd
)

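8 votes ircmaxell 9/20/2011 #16

Well, you can use XOR in this situation to find the common part of the strings. Any time you XOR two bytes that are the same, you get a null byte as the output. So we can use that to our advantage: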

$first = $array[0];
$length = strlen($first);
$count = count($array);
for ($i = 1; $i < $count; $i++) {
    $length = min($length, strspn($array[$i] ^ $first, chr(0)));
}

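After that single loop, the $length variable will be equal to the length of the longest common base part of the strings in the array. Then we can extract the common part from the first element: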

$common = substr($array[0], 0, $length);

And there you have it. As a function:

function commonPrefix(array $strings) {
    $first = $strings[0];
    $length = strlen($first);
    $count = count($strings);
    for ($i = 1; $i < $count; $i++) {
        $length = min($length, strspn($strings[$i] ^ $first, chr(0)));
    }
    return substr($first, 0, $length);
}

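Note that it does use more than one iteration, but those iterations are done inside the library, so in interpreted languages this gives a huge efficiency gain...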
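A quick check of the XOR mechanism on the question's paths (the function is repeated from above so the block runs on its own): equal bytes XOR to a null byte, so strspn() with chr(0) as the mask counts how many leading bytes two strings share.

```php
<?php
function commonPrefix(array $strings) {
    $first = $strings[0];
    $length = strlen($first);
    $count = count($strings);
    for ($i = 1; $i < $count; $i++) {
        // XOR yields null bytes where the strings agree; count the leading run
        $length = min($length, strspn($strings[$i] ^ $first, chr(0)));
    }
    return substr($first, 0, $length);
}

$a = '/www/htdocs/1/sites/lib/abcdedd';
$b = '/www/htdocs/1/sites/conf/xyz';
echo strspn($a ^ $b, chr(0)), "\n"; // 20: the first 20 bytes are identical

$paths = array(
    '/www/htdocs/1/sites/lib/abcdedd',
    '/www/htdocs/1/sites/conf/xyz',
    '/www/htdocs/1/sites/conf/abc/def',
    '/www/htdocs/1/sites/htdocs/xyz',
    '/www/htdocs/1/sites/lib2/abcdedd',
);
echo commonPrefix($paths), "\n"; // /www/htdocs/1/sites/
```

The XOR of two strings is only as long as the shorter one, so a string that is a prefix of another one also limits $length correctly.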

Now, if you only want full paths, we need to truncate at the last / character. So:

$prefix = preg_replace('#/[^/]*$#', '', commonPrefix($paths));

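Now, it may over-cut two strings such as /foo/bar and /foo/bar/baz, which will be cut to /foo. But short of adding another iteration round to determine whether the next character is either / or end-of-string, I can't see a way around that...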
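That extra round only has to look at one character per string. A sketch of it, assuming absolute paths like the question's; commonPathPrefix() is a hypothetical name, and commonPrefix() is repeated from the answer so the block runs standalone:

```php
<?php
// commonPrefix() from the answer above (XOR + strspn).
function commonPrefix(array $strings) {
    $first = $strings[0];
    $length = strlen($first);
    $count = count($strings);
    for ($i = 1; $i < $count; $i++) {
        $length = min($length, strspn($strings[$i] ^ $first, chr(0)));
    }
    return substr($first, 0, $length);
}

// Hypothetical helper: turns the character prefix into a whole-component path.
// The prefix is kept only if, in every string, the next character is '/'
// or the end of the string; otherwise it is cut back to the last '/'.
function commonPathPrefix(array $paths) {
    $prefix = commonPrefix($paths);
    $len = strlen($prefix);
    foreach ($paths as $path) {
        if (isset($path[$len]) && $path[$len] !== '/') {
            return preg_replace('#/[^/]*$#', '', $prefix);
        }
    }
    return rtrim($prefix, '/');
}

echo commonPathPrefix(array('/foo/bar', '/foo/bar/baz')), "\n"; // /foo/bar
echo commonPathPrefix(array('/foo/bar', '/foo/barbaz')), "\n";  // /foo
```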
