任何框架函数都有助于找到多个字符串的最长公共起始子字符串?

Any Framework functions helping to find the longest common starting substring of multiple strings?

提问人:sbi 提问时间:9/21/2010 更新时间:12/26/2012 访问量:1574

问:

我有一个字符串列表(表示路径和),它们都应该有一个共同的开头(根路径)。我需要得到那个共同的开始。

这只是几行,但我有一种唠叨的感觉,它必须每年拼凑一百万次,并且框架中可能有一种算法可用于此,但找不到一些东西。
另外,我想以前在 SO 上有人问过这个问题,但我干巴巴的。

有什么提示吗?

c#

评论


答:

0赞 Tomas Petricek 9/21/2010 #1

简化实现的一个想法是,只编写一个方法来获取两个字符串中最长的子字符串,然后使用 LINQ 中的方法。像这样:Aggregate

strings.Skip(1).Aggregate(strings.First(), GetCommonSubString);

我不认为有任何优雅的方法来使用标准方法来处理字符串。如果你关心性能,那么你可能必须以“直接”的方式实现它。使用 LINQ 的较慢但较短的替代方法可能如下所示:GetCommonSubstring

var chars = 
  str1.Zip(str2, (c1, c2) => new { Match = c1 == c2, Char = c1 })
      .TakeWhile(c => c.Match).Select(c => c.Char).ToArray();
return new string(chars);

这首先“压缩”两个字符串,然后使用 .其余部分生成一个字符数组,可用于创建包含结果的字符串。TakeWhile

0赞 Jonas Elfström 9/21/2010 #2

也许我把你的问题简化得太简单了,但是呢

var rootPath = paths.Select(s => new {path = s, depth = s.Split('\\').Length}). Aggregate((memo, curr) => curr.depth < memo.depth ? curr : memo).path;

绝望,很可能是缓慢的,而且到处都是非常愚蠢的尝试

var paths = new List<string> { @"C:\Ruby19\lib\ruby\gems",
                               @"C:\Ruby19\lib\ruby\gems\1.9.2",
                               @"C:\Ruby19\lib\ruby\gems",
                               @"C:\Ruby19\lib\test\fest\hest"};

var rootPath = paths.Select(s => new { p = s.Split('\\') })
                    .Aggregate((memo, curr) => new { p = curr.p.TakeWhile((stp, ind) => stp == memo.p.ElementAtOrDefault(ind)).ToArray() })
                    .p.Join("\\");

=> rootPath = “C:\Ruby19\lib”

评论

0赞 sbi 9/22/2010
仅当最短路径为基本路径时,这才有效。
2赞 sbi 9/22/2010 #3

如果有人感兴趣,这是我想出的:

    public static string GetCommonStartingSubString(IList<string> strings)
    {
        if (strings.Count == 0)
            return "";
        if (strings.Count == 1)
            return strings[0];
        int charIdx = 0;
        while (IsCommonChar(strings, charIdx))
            ++charIdx;
        return strings[0].Substring(0, charIdx);
    }
    private static bool IsCommonChar(IList<string> strings, int charIdx)
    {
        if(strings[0].Length <= charIdx)
            return false;
        for (int strIdx = 1; strIdx < strings.Count; ++strIdx)
            if (strings[strIdx].Length <= charIdx 
             || strings[strIdx][charIdx] != strings[0][charIdx])
                return false;
        return true;
    }

评论

1赞 Rush Frisby 2/27/2013
例如:C:\Two 和 C:\Three ...此解决方案将返回 C:\t 而不是 C:\ - 因此您必须将输出修剪回最后一个 \。
1赞 Thomas Levesque 9/22/2010 #4

此方法应该有效:

string GetLongestCommonPrefix(IEnumerable<string> items)
{
    return items.Aggregate(default(string), GetLongestCommonPrefix);
}

string GetLongestCommonPrefix(string s1, string s2)
{
    if (s1 == null || s2 == null)
        return s1 ?? s2;

    int n = Math.Min(s1.Length, s2.Length);
    int i;
    for (i = 0; i < n; i++)
    {
        if (s1[i] != s2[i])
            break;
    }
    return s1.Substring(0, i);
}
0赞 Oliver 9/23/2010 #5

前段时间我遇到了同样的问题(和许多其他人一样)。这是我想出的解决方案。我没有进行任何性能测量,但我对 100 个元素的列表没有任何问题。

using System;
using System.Collections.Generic;
using System.Linq;

namespace FEV.TOPexpert.Common.Extensions
{
    public static class IEnumerableOfStringExtension
    {
        /// <summary>
        /// Finds the most common left string in a sequence of strings.
        /// </summary>
        /// <param name="source">The sequence to search in.</param>
        /// <returns>The most common left string in the sequence.</returns>
        public static string MostCommonLeftString(this IEnumerable<string> source)
        {
            return source.MostCommonLeftString(StringComparison.InvariantCulture);
        }

        /// <summary>
        /// Finds the most common left string in a sequence of strings.
        /// </summary>
        /// <param name="source">The sequence to search in.</param>
        /// <param name="comparisonType">Type of the comparison.</param>
        /// <returns>The most common left string in the sequence.</returns>
        public static string MostCommonLeftString(this IEnumerable<string> source, StringComparison comparisonType)
        {
            if (source == null)
                throw new ArgumentNullException("source");

            string mcs = String.Empty;

            using (var e = source.GetEnumerator())
            {
                if (!e.MoveNext())
                    return mcs;

                mcs = e.Current;
                while (e.MoveNext())
                    mcs = mcs.MostCommonLeftString(e.Current, comparisonType);
            }
            return mcs;
        }

        /// <summary>
        /// Returns a sequence with the most common left strings from a sequence of strings.
        /// </summary>
        /// <param name="source">A sequence of string to search through.</param>
        /// <returns>A sequence of the most common left strings ordered in descending order.</returns>
        public static IEnumerable<string> MostCommonLeftStrings(this IEnumerable<string> source)
        {
            return MostCommonLeftStrings(source, StringComparison.InvariantCulture);
        }

        /// <summary>
        /// Returns a sequence with the most common left strings from a sequence of strings.
        /// </summary>
        /// <param name="source">A sequence of string to search through.</param>
        /// <param name="comparisonType">Type of comparison.</param>
        /// <returns>A sequence of the most common left strings ordered in descending order.</returns>
        public static IEnumerable<string> MostCommonLeftStrings(this IEnumerable<string> source, StringComparison comparisonType)
        {
            if (source == null)
                throw new ArgumentNullException("source");

            var listOfMcs = new List<string>();

            using (var e = source.GetEnumerator())
            {
                while (e.MoveNext())
                {
                    if (e.Current == null)
                        continue;

                    string removeFromList = String.Empty;
                    string addToList = String.Empty;

                    foreach (var element in listOfMcs)
                    {
                        addToList = e.Current.MostCommonLeftString(element, comparisonType);

                        if (addToList.Length > 0)
                        {
                            removeFromList = element;
                            break;
                        }
                    }

                    if (removeFromList.Length <= 0)
                    {
                        listOfMcs.Add(e.Current);
                        continue;
                    }

                    if (addToList != removeFromList)
                    {
                        listOfMcs.Remove(removeFromList);
                        listOfMcs.Add(addToList);
                    }
                }
            }

            return listOfMcs.OrderByDescending(item => item.Length);
        }

        /// <summary>
        /// Returns a string that both strings have in common started from the left.
        /// </summary>
        /// <param name="first">The first string.</param>
        /// <param name="second">The second string.</param>
        /// <returns>Returns a string that both strings have in common started from the left.</returns>
        public static string MostCommonLeftString(this string first, string second)
        {
            return first.MostCommonLeftString(second, StringComparison.InvariantCulture);
        }

        /// <summary>
        /// Returns a string that both strings have in common started from the left.
        /// </summary>
        /// <param name="first">The first string.</param>
        /// <param name="second">The second string.</param>
        /// <param name="comparisonType">Type of comparison.</param>
        /// <returns>Returns a string that both strings have in common started from the left.</returns>
        public static string MostCommonLeftString(this string first, string second, StringComparison comparisonType)
        {
            if (first == null
                || second == null)
                return null;

            int length = Math.Min(first.Length, second.Length);
            first = first.Substring(0, length);
            second = second.Substring(0, length);

            while (!first.Equals(second, comparisonType))
            {
                first = first.Substring(0, first.Length - 1);
                second = second.Substring(0, second.Length - 1);
            }

            return first;
        }

        private static bool MatchesWithList(string match, IList<string> elements, StringComparison comparisonType)
        {
            string removeFromList = String.Empty;
            string addToList = String.Empty;

            foreach (var element in elements)
            {
                addToList = match.MostCommonLeftString(element, comparisonType);

                if (addToList.Length > 0)
                {
                    removeFromList = element;
                }
            }

            if (removeFromList.Length > 0)
            {
                if (addToList != removeFromList)
                {
                    elements.Remove(removeFromList);
                    elements.Add(addToList);
                }
                return true;
            }

            return false;
        }
    }
}
1赞 Kirk Broadhurst 9/23/2010 #6

请原谅我的普通变量命名,它不是很快,但这应该可以:

// your list of strings...
List<string> strings;    

string shortestString = strings.First(x => x.Length == 
    strings.Select(y => y.Length).Min());
while (!strings.All(s => s.StartsWith(shortestString)))
{
    shortestString = shortestString.Substring(0, shortestString.Length - 1);
}
0赞 cdiggins 12/26/2012 #7

下面返回任何一组(而不仅仅是字符串)的最长公共前缀。IEnumerable<T>

public static bool Same<T>(this IEnumerable<T> xs) {
  return !xs.Any() || !xs.Skip(!xs.Skip(1).All(x => x.Equals(xs.First()));
}

public static IEnumerable<T> CommonPrefix<T>(this IEnumerable<IEnumerable<T>> xss) {
  var r = new List<T>();
  var es = xss.Select(x => x.GetEnumerator()).ToList();
  while (es.Select(x => x.MoveNext()).All(x => x))
     if (!es.Select(x => x.Current).Same())
        return r;
     return r;
  }
}