C 语言中的自然排序顺序#

Natural Sort Order in C#

提问人:Michael Kniskern 提问时间:10/30/2008 最后编辑:Michael Kniskern 更新时间:9/1/2022 访问量:69818

问:

有人有好的资源或在 C# 中为数组提供自然顺序排序的示例吗?我正在以我的排序实现接口。FileInfoIComparer

C# 排序 文件 natural-sort

评论


答:

164赞 Greg Beech 10/30/2008 #1

最简单的方法是 P/Invoke Windows 中的内置函数,并将其用作 your 中的比较函数:IComparer

[DllImport("shlwapi.dll", CharSet = CharSet.Unicode)]
private static extern int StrCmpLogicalW(string psz1, string psz2);

迈克尔·卡普兰(Michael Kaplan)在这里提供了一些示例来说明此功能的工作原理,以及为使Vista更直观地工作而进行的更改。此功能的优点是它将具有与运行它的 Windows 版本相同的行为,但这确实意味着它在 Windows 版本之间有所不同,因此您需要考虑这对您来说是否是一个问题。

因此,一个完整的实现应该是这样的:

[SuppressUnmanagedCodeSecurity]
internal static class SafeNativeMethods
{
    [DllImport("shlwapi.dll", CharSet = CharSet.Unicode)]
    public static extern int StrCmpLogicalW(string psz1, string psz2);
}

public sealed class NaturalStringComparer : IComparer<string>
{
    public int Compare(string a, string b)
    {
        return SafeNativeMethods.StrCmpLogicalW(a, b);
    }
}

public sealed class NaturalFileInfoNameComparer : IComparer<FileInfo>
{
    public int Compare(FileInfo a, FileInfo b)
    {
        return SafeNativeMethods.StrCmpLogicalW(a.Name, b.Name);
    }
}

评论

9赞 Chris Charabaruk 10/30/2008
很好的答案。注意:这不适用于 Win2000,对于那些仍在该操作系统上运行东西的少数人来说。另一方面,Kaplan 的博客和 MSDN 文档之间有足够的提示来创建类似的函数。
13赞 linquize 7/29/2012
这不是可移植的,仅适用于 Win32,但不适用于 Linux / MacOS / Silverlight / Windows Phone / Metro
24赞 Greg Beech 7/30/2012
@linquize - 他说 .NET 不是 Mono,所以 Linux/OSX 并不是一个真正的问题。2008 年发布此答案时,Windows Phone/Metro 并不存在。您多久在 Silverlight 中执行一次文件操作?因此,对于OP以及大多数其他人来说,这是一个合适的答案。无论如何,您可以自由地提供更好的答案;这就是这个网站的工作方式。
8赞 linquize 7/30/2012
这并不意味着原来的答案是错误的。我只是添加带有最新信息的其他信息
2赞 Joe Amenta 6/27/2015
仅供参考,如果您继承而不是实现,您将获得调用泛型方法的(非泛型)接口的内置实现,以便在使用该方法的 API 中使用。它基本上也是免费的:只需删除“I”并更改为 .与 和 相同。Comparer<T>IComparer<T>IComparerpublic int Compare(...)public override int Compare(...)IEqualityComparer<T>EqualityComparer<T>
13赞 Jonathan Gilbert 9/2/2009 #2

你确实需要小心 -- 我依稀记得读过 StrCmpLogicalW 或类似的东西,不是严格传递的,而且我观察到了。如果比较函数破坏了该规则,NET 的排序方法有时会陷入无限循环。

传递比较将始终报告,如果 a < b 和 b < c,则 a < c。存在一个函数,它执行自然排序顺序比较,但并不总是满足该标准,但我不记得它是 StrCmpLogicalW 还是其他东西。

评论

0赞 mhenry1384 9/1/2010
你有这个说法的证据吗?在谷歌上搜索之后,我找不到任何迹象表明这是真的。
3赞 thd 12/29/2010
我用 StrCmpLogicalW 体验过那些无限循环。
1赞 Jonathan Gilbert 5/9/2015
Visual Studio 反馈项 236900 不再存在,但下面是一个更新的反馈项,可以确认该问题:connect.microsoft.com/VisualStudio/feedback/details/774540/...它还给出了一个解决方法:有一个属性,它返回的对象可以为您提供对象。反过来,这些可以进行比较并保证传递性。CultureInfoCompareInfoSortKey
6赞 Wilka 9/23/2009 #3

再加上 Greg Beech 的答案(因为我刚刚一直在寻找),如果你想从 Linq 中使用它,你可以使用 that takes an .例如:OrderByIComparer

var items = new List<MyItem>();

// fill items

var sorted = items.OrderBy(item => item.Name, new NaturalStringComparer());
24赞 James McCormack 3/12/2010 #4

适用于 linq orderby 的纯 C# 解决方案:

http://zootfroot.blogspot.com/2009/09/natural-sort-compare-with-linq-orderby.html

public class NaturalSortComparer<T> : IComparer<string>, IDisposable
{
    private bool isAscending;

    public NaturalSortComparer(bool inAscendingOrder = true)
    {
        this.isAscending = inAscendingOrder;
    }

    #region IComparer<string> Members

    public int Compare(string x, string y)
    {
        throw new NotImplementedException();
    }

    #endregion

    #region IComparer<string> Members

    int IComparer<string>.Compare(string x, string y)
    {
        if (x == y)
            return 0;

        string[] x1, y1;

        if (!table.TryGetValue(x, out x1))
        {
            x1 = Regex.Split(x.Replace(" ", ""), "([0-9]+)");
            table.Add(x, x1);
        }

        if (!table.TryGetValue(y, out y1))
        {
            y1 = Regex.Split(y.Replace(" ", ""), "([0-9]+)");
            table.Add(y, y1);
        }

        int returnVal;

        for (int i = 0; i < x1.Length && i < y1.Length; i++)
        {
            if (x1[i] != y1[i])
            {
                returnVal = PartCompare(x1[i], y1[i]);
                return isAscending ? returnVal : -returnVal;
            }
        }

        if (y1.Length > x1.Length)
        {
            returnVal = 1;
        }
        else if (x1.Length > y1.Length)
        { 
            returnVal = -1; 
        }
        else
        {
            returnVal = 0;
        }

        return isAscending ? returnVal : -returnVal;
    }

    private static int PartCompare(string left, string right)
    {
        int x, y;
        if (!int.TryParse(left, out x))
            return left.CompareTo(right);

        if (!int.TryParse(right, out y))
            return left.CompareTo(right);

        return x.CompareTo(y);
    }

    #endregion

    private Dictionary<string, string[]> table = new Dictionary<string, string[]>();

    public void Dispose()
    {
        table.Clear();
        table = null;
    }
}

评论

3赞 mhenry1384 9/1/2010
该代码最终来自 codeproject.com/KB/recipes/NaturalComparer.aspx(不是面向 LINQ 的)。
3赞 James McCormack 9/1/2010
这篇博文将 Justin Jones (codeproject.com/KB/string/NaturalSortComparer.aspx) 归功于 IComparer,而不是 Pascal Ganaye。
1赞 Michael Parker 3/8/2014
需要注意的是,此解决方案忽略了空格,这与Windows的功能不同,并且不如下面的Matthew Horsley代码好。因此,您可能会得到 'string01', 'string 01', 'string 02', 'string02'(看起来很丑)。如果删除空格的剥离,它会向后排列字符串,即“string01”在“string 01”之前,这可能是可接受的,也可能是不可接受的。
0赞 Piotr Kula 8/12/2015
这适用于地址,即“1 Smith Rd”、“10 Smith Rd”、“2 Smith Rd”等 - 自然排序。是的!好东西!
0赞 jv-dev 6/3/2016
顺便说一句,我注意到(该链接页面上的评论似乎也表明)Type 参数 <T> 是完全不必要的。
38赞 J.D. 8/13/2011 #5

现有的实现看起来都不好,所以我编写了自己的实现。结果与现代版本的 Windows 资源管理器 (Windows 7/8) 使用的排序几乎相同。我看到的唯一区别是 1) 虽然 Windows 曾经(例如 XP)处理任何长度的数字,但现在它被限制为 19 位数字 - 我的是无限的,2) Windows 给出了与某些 Unicode 数字集不一致的结果 - 我的工作正常(尽管它不会对代理项对中的数字进行数字比较;Windows 也没有),以及 3) 如果不同类型的非主要排序权重出现在不同的部分(例如“e-1é”与“é1e-” -数字前后的部分有变音符号和标点符号权重差异)。

public static int CompareNatural(string strA, string strB) {
    return CompareNatural(strA, strB, CultureInfo.CurrentCulture, CompareOptions.IgnoreCase);
}

public static int CompareNatural(string strA, string strB, CultureInfo culture, CompareOptions options) {
    CompareInfo cmp = culture.CompareInfo;
    int iA = 0;
    int iB = 0;
    int softResult = 0;
    int softResultWeight = 0;
    while (iA < strA.Length && iB < strB.Length) {
        bool isDigitA = Char.IsDigit(strA[iA]);
        bool isDigitB = Char.IsDigit(strB[iB]);
        if (isDigitA != isDigitB) {
            return cmp.Compare(strA, iA, strB, iB, options);
        }
        else if (!isDigitA && !isDigitB) {
            int jA = iA + 1;
            int jB = iB + 1;
            while (jA < strA.Length && !Char.IsDigit(strA[jA])) jA++;
            while (jB < strB.Length && !Char.IsDigit(strB[jB])) jB++;
            int cmpResult = cmp.Compare(strA, iA, jA - iA, strB, iB, jB - iB, options);
            if (cmpResult != 0) {
                // Certain strings may be considered different due to "soft" differences that are
                // ignored if more significant differences follow, e.g. a hyphen only affects the
                // comparison if no other differences follow
                string sectionA = strA.Substring(iA, jA - iA);
                string sectionB = strB.Substring(iB, jB - iB);
                if (cmp.Compare(sectionA + "1", sectionB + "2", options) ==
                    cmp.Compare(sectionA + "2", sectionB + "1", options))
                {
                    return cmp.Compare(strA, iA, strB, iB, options);
                }
                else if (softResultWeight < 1) {
                    softResult = cmpResult;
                    softResultWeight = 1;
                }
            }
            iA = jA;
            iB = jB;
        }
        else {
            char zeroA = (char)(strA[iA] - (int)Char.GetNumericValue(strA[iA]));
            char zeroB = (char)(strB[iB] - (int)Char.GetNumericValue(strB[iB]));
            int jA = iA;
            int jB = iB;
            while (jA < strA.Length && strA[jA] == zeroA) jA++;
            while (jB < strB.Length && strB[jB] == zeroB) jB++;
            int resultIfSameLength = 0;
            do {
                isDigitA = jA < strA.Length && Char.IsDigit(strA[jA]);
                isDigitB = jB < strB.Length && Char.IsDigit(strB[jB]);
                int numA = isDigitA ? (int)Char.GetNumericValue(strA[jA]) : 0;
                int numB = isDigitB ? (int)Char.GetNumericValue(strB[jB]) : 0;
                if (isDigitA && (char)(strA[jA] - numA) != zeroA) isDigitA = false;
                if (isDigitB && (char)(strB[jB] - numB) != zeroB) isDigitB = false;
                if (isDigitA && isDigitB) {
                    if (numA != numB && resultIfSameLength == 0) {
                        resultIfSameLength = numA < numB ? -1 : 1;
                    }
                    jA++;
                    jB++;
                }
            }
            while (isDigitA && isDigitB);
            if (isDigitA != isDigitB) {
                // One number has more digits than the other (ignoring leading zeros) - the longer
                // number must be larger
                return isDigitA ? 1 : -1;
            }
            else if (resultIfSameLength != 0) {
                // Both numbers are the same length (ignoring leading zeros) and at least one of
                // the digits differed - the first difference determines the result
                return resultIfSameLength;
            }
            int lA = jA - iA;
            int lB = jB - iB;
            if (lA != lB) {
                // Both numbers are equivalent but one has more leading zeros
                return lA > lB ? -1 : 1;
            }
            else if (zeroA != zeroB && softResultWeight < 2) {
                softResult = cmp.Compare(strA, iA, 1, strB, iB, 1, options);
                softResultWeight = 2;
            }
            iA = jA;
            iB = jB;
        }
    }
    if (iA < strA.Length || iB < strB.Length) {
        return iA < strA.Length ? 1 : -1;
    }
    else if (softResult != 0) {
        return softResult;
    }
    return 0;
}

签名与委托匹配:Comparison<string>

string[] files = Directory.GetFiles(@"C:\");
Array.Sort(files, CompareNatural);

下面是一个包装类,用作:IComparer<string>

public class CustomComparer<T> : IComparer<T> {
    private Comparison<T> _comparison;

    public CustomComparer(Comparison<T> comparison) {
        _comparison = comparison;
    }

    public int Compare(T x, T y) {
        return _comparison(x, y);
    }
}

例:

string[] files = Directory.EnumerateFiles(@"C:\")
    .OrderBy(f => f, new CustomComparer<string>(CompareNatural))
    .ToArray();

以下是我用于测试的一组很好的文件名:

Func<string, string> expand = (s) => { int o; while ((o = s.IndexOf('\\')) != -1) { int p = o + 1;
    int z = 1; while (s[p] == '0') { z++; p++; } int c = Int32.Parse(s.Substring(p, z));
    s = s.Substring(0, o) + new string(s[o - 1], c) + s.Substring(p + z); } return s; };
string encodedFileNames =
    "KDEqLW4xMiotbjEzKjAwMDFcMDY2KjAwMlwwMTcqMDA5XDAxNyowMlwwMTcqMDlcMDE3KjEhKjEtISox" +
    "LWEqMS4yNT8xLjI1KjEuNT8xLjUqMSoxXDAxNyoxXDAxOCoxXDAxOSoxXDA2NioxXDA2NyoxYSoyXDAx" +
    "NyoyXDAxOCo5XDAxNyo5XDAxOCo5XDA2Nio9MSphMDAxdGVzdDAxKmEwMDF0ZXN0aW5nYTBcMzEqYTAw" +
    "Mj9hMDAyIGE/YTAwMiBhKmEwMDIqYTAwMmE/YTAwMmEqYTAxdGVzdGluZ2EwMDEqYTAxdnNmcyphMSph" +
    "MWEqYTF6KmEyKmIwMDAzcTYqYjAwM3E0KmIwM3E1KmMtZSpjZCpjZipmIDEqZipnP2cgMT9oLW4qaG8t" +
    "bipJKmljZS1jcmVhbT9pY2VjcmVhbT9pY2VjcmVhbS0/ajBcNDE/ajAwMWE/ajAxP2shKmsnKmstKmsx" +
    "KmthKmxpc3QqbTAwMDNhMDA1YSptMDAzYTAwMDVhKm0wMDNhMDA1Km0wMDNhMDA1YSpuMTIqbjEzKm8t" +
    "bjAxMypvLW4xMipvLW40P28tbjQhP28tbjR6P28tbjlhLWI1Km8tbjlhYjUqb24wMTMqb24xMipvbjQ/" +
    "b240IT9vbjR6P29uOWEtYjUqb245YWI1Km/CrW4wMTMqb8KtbjEyKnAwMCpwMDEqcDAxwr0hKnAwMcK9" +
    "KnAwMcK9YSpwMDHCvcK+KnAwMipwMMK9KnEtbjAxMypxLW4xMipxbjAxMypxbjEyKnItMDAhKnItMDAh" +
    "NSpyLTAwIe+8lSpyLTAwYSpyLe+8kFwxIS01KnIt77yQXDEhLe+8lSpyLe+8kFwxISpyLe+8kFwxITUq" +
    "ci3vvJBcMSHvvJUqci3vvJBcMWEqci3vvJBcMyE1KnIwMCEqcjAwLTUqcjAwLjUqcjAwNSpyMDBhKnIw" +
    "NSpyMDYqcjQqcjUqctmg2aYqctmkKnLZpSpy27Dbtipy27Qqctu1KnLfgN+GKnLfhCpy34UqcuClpuCl" +
    "rCpy4KWqKnLgpasqcuCnpuCnrCpy4KeqKnLgp6sqcuCppuCprCpy4KmqKnLgqasqcuCrpuCrrCpy4Kuq" +
    "KnLgq6sqcuCtpuCtrCpy4K2qKnLgrasqcuCvpuCvrCpy4K+qKnLgr6sqcuCxpuCxrCpy4LGqKnLgsasq" +
    "cuCzpuCzrCpy4LOqKnLgs6sqcuC1puC1rCpy4LWqKnLgtasqcuC5kOC5lipy4LmUKnLguZUqcuC7kOC7" +
    "lipy4LuUKnLgu5UqcuC8oOC8pipy4LykKnLgvKUqcuGBgOGBhipy4YGEKnLhgYUqcuGCkOGClipy4YKU" +
    "KnLhgpUqcuGfoOGfpipy4Z+kKnLhn6UqcuGgkOGglipy4aCUKnLhoJUqcuGlhuGljCpy4aWKKnLhpYsq" +
    "cuGnkOGnlipy4aeUKnLhp5UqcuGtkOGtlipy4a2UKnLhrZUqcuGusOGutipy4a60KnLhrrUqcuGxgOGx" +
    "hipy4bGEKnLhsYUqcuGxkOGxlipy4bGUKnLhsZUqcuqYoFwx6pilKnLqmKDqmKUqcuqYoOqYpipy6pik" +
    "KnLqmKUqcuqjkOqjlipy6qOUKnLqo5UqcuqkgOqkhipy6qSEKnLqpIUqcuqpkOqplipy6qmUKnLqqZUq" +
    "cvCQkqAqcvCQkqUqcvCdn5gqcvCdn50qcu+8kFwxISpy77yQXDEt77yVKnLvvJBcMS7vvJUqcu+8kFwx" +
    "YSpy77yQXDHqmKUqcu+8kFwx77yO77yVKnLvvJBcMe+8lSpy77yQ77yVKnLvvJDvvJYqcu+8lCpy77yV" +
    "KnNpKnPEsSp0ZXN02aIqdGVzdNmi2aAqdGVzdNmjKnVBZS0qdWFlKnViZS0qdUJlKnVjZS0xw6kqdWNl" +
    "McOpLSp1Y2Uxw6kqdWPDqS0xZSp1Y8OpMWUtKnVjw6kxZSp3ZWlhMSp3ZWlhMip3ZWlzczEqd2Vpc3My" +
    "KndlaXoxKndlaXoyKndlacOfMSp3ZWnDnzIqeSBhMyp5IGE0KnknYTMqeSdhNCp5K2EzKnkrYTQqeS1h" +
    "Myp5LWE0KnlhMyp5YTQqej96IDA1MD96IDIxP3ohMjE/ejIwP3oyMj96YTIxP3rCqTIxP1sxKl8xKsKt" +
    "bjEyKsKtbjEzKsSwKg==";
string[] fileNames = Encoding.UTF8.GetString(Convert.FromBase64String(encodedFileNames))
    .Replace("*", ".txt?").Split(new[] { "?" }, StringSplitOptions.RemoveEmptyEntries)
    .Select(n => expand(n)).ToArray();

评论

1赞 SOUser 2/19/2013
数字部分需要按部分进行比较,即“abc12b”应小于“abc123”。
0赞 SOUser 2/21/2013
You could try the following data: public string[] filenames = { "-abc12.txt", "abc12.txt", "1abc_2.txt", "a0000012.txt", "a0000012c.txt", "a000012.txt", "a000012b.txt", "a012.txt", "a0000102.txt", "abc1_2.txt", "abc12.txt", "abc12b.txt", "abc123.txt", "abccde.txt", "b0000.txt", "b00001.txt", "b0001.txt", "b001.txt", "c0000.txt", "c0000c.txt", "c00001.txt", "c000b.txt", "d0.20.2b.txt", "d0.1000c.txt", "d0.2000y.txt", "d0.20000.2b.txt", "e0000120000012", "e0000012000012"};
0赞 J.D. 2/21/2013
@XichenLi 感谢您提供良好的测试用例。如果让 Windows 资源管理器对这些文件进行排序,则会根据所使用的 Windows 版本获得不同的结果。我的代码对这些名称的排序与 Server 2003(大概是 XP)相同,但与 Windows 8 不同。如果有机会,我会尝试弄清楚 Windows 8 是如何做到的,并更新我的代码。
3赞 linquize 3/18/2013
有错误。索引超出范围
3赞 kuroki 12/8/2015
很棒的解决方案!当我在大约有 10,000 个文件的正常场景中对其进行基准测试时,它比 Matthew 的正则表达式示例更快,并且与 StrCmpLogicalW() 的性能大致相同。上面的代码中有一个小错误:“while (strA[jA] == zeroA) jA++;” 和 “while (strB[jB] == zeroB) jB++;” 应该是 “while (jA < strA.Length && strA[jA] == zeroA) jA++;” 和 “while (jB < strB.Length && strB[jB] == zeroB) jB++;”。否则,仅包含零的字符串将引发异常。
17赞 mpen 7/24/2012 #6

我的解决方案:

void Main()
{
    new[] {"a4","a3","a2","a10","b5","b4","b400","1","C1d","c1d2"}.OrderBy(x => x, new NaturalStringComparer()).Dump();
}

public class NaturalStringComparer : IComparer<string>
{
    private static readonly Regex _re = new Regex(@"(?<=\D)(?=\d)|(?<=\d)(?=\D)", RegexOptions.Compiled);

    public int Compare(string x, string y)
    {
        x = x.ToLower();
        y = y.ToLower();
        if(string.Compare(x, 0, y, 0, Math.Min(x.Length, y.Length)) == 0)
        {
            if(x.Length == y.Length) return 0;
            return x.Length < y.Length ? -1 : 1;
        }
        var a = _re.Split(x);
        var b = _re.Split(y);
        int i = 0;
        while(true)
        {
            int r = PartCompare(a[i], b[i]);
            if(r != 0) return r;
            ++i;
        }
    }

    private static int PartCompare(string x, string y)
    {
        int a, b;
        if(int.TryParse(x, out a) && int.TryParse(y, out b))
            return a.CompareTo(b);
        return x.CompareTo(y);
    }
}

结果:

1
a2
a3
a4
a10
b4
b5
b400
C1d
c1d2

评论

0赞 7/3/2016
我喜欢。它易于理解,不需要 Linq。
0赞 isanae 3/30/2021
这会抛出一个数组。IndexOutOfRangeException{"1", "01"}
1赞 teedyay 11/18/2021
是的,我们绝对需要而不是.然后,当比较“1”和“01”时,它会从“1”中掉下来,而且在比较“1”和“1 a”时也会掉下来,所以我们之后也需要考虑一下决胜局。while (i < a.Length && i < b.Length)while(true)while
1赞 Chuck Savage 1/14/2022
添加了一些完成,如其他评论中所述,以及在此要点中对文件的处理 - gist.github.com/ChuckSavage/e1c61839300abd2d90873b01da9d7575
85赞 Matthew Horsley 7/30/2012 #7

只是想我会补充一下(用我能找到的最简洁的解决方案):

public static IOrderedEnumerable<T> OrderByAlphaNumeric<T>(this IEnumerable<T> source, Func<T, string> selector)
{
    int max = source
        .SelectMany(i => Regex.Matches(selector(i), @"\d+").Cast<Match>().Select(m => (int?)m.Value.Length))
        .Max() ?? 0;

    return source.OrderBy(i => Regex.Replace(selector(i), @"\d+", m => m.Value.PadLeft(max, '0')));
}

上面将字符串中的任何数字填充到所有字符串中所有数字的最大长度,并使用生成的字符串进行排序。

强制转换为 () 是为了允许没有任何数字的字符串集合(在空的可枚举中抛出一个 )。int?.Max()InvalidOperationException

评论

1赞 Gene S 10/5/2012
+1 它不仅是最简洁的,而且是我见过的最快的。除了公认的答案,但由于机器依赖性,我不能使用那个答案。它在大约 35 秒内对超过 400 万个值进行了分类。
4赞 Ian Grainger 1/16/2014
这既美丽又无法阅读。我认为 Linq 的好处意味着(至少)最佳的平均性能和最佳情况的性能,所以我想我会选择它。尽管缺乏明确性。非常感谢@Matthew霍斯利
4赞 devzero 3/10/2016
这很好,但是某些十进制数存在错误,我的例子是 k8.11 与 k8.2 的排序。为了解决这个问题,我实现了以下正则表达式:\d+([\.,]\d)?
2赞 devzero 3/10/2016
在填充此代码时,您还需要考虑第二组的长度(小数点 + 小数) m.Value.PadLeft(max, '0')
4赞 Teejay 2/22/2018
我认为您可以使用而不是投射到 .此外,值得执行操作以避免重新枚举可枚举对象。.DefaultIfEmpty().Max()int?source.ToList()
0赞 Eric Liprandi 3/22/2013 #8

我们需要一种自然的排序来处理具有以下模式的文本:

"Test 1-1-1 something"
"Test 1-2-3 something"
...

出于某种原因,当我第一次查看 SO 时,我没有找到这篇文章并实现我们自己的帖子。与这里介绍的一些解决方案相比,虽然在概念上相似,但它的好处可能是更简单、更容易理解。但是,虽然我确实尝试查看性能瓶颈,但它的实现速度仍然比默认的要慢得多。OrderBy()

这是我实现的扩展方法:

public static class EnumerableExtensions
{
    // set up the regex parser once and for all
    private static readonly Regex Regex = new Regex(@"\d+|\D+", RegexOptions.Compiled | RegexOptions.Singleline);

    // stateless comparer can be built once
    private static readonly AggregateComparer Comparer = new AggregateComparer();

    public static IEnumerable<T> OrderByNatural<T>(this IEnumerable<T> source, Func<T, string> selector)
    {
        // first extract string from object using selector
        // then extract digit and non-digit groups
        Func<T, IEnumerable<IComparable>> splitter =
            s => Regex.Matches(selector(s))
                      .Cast<Match>()
                      .Select(m => Char.IsDigit(m.Value[0]) ? (IComparable) int.Parse(m.Value) : m.Value);
        return source.OrderBy(splitter, Comparer);
    }

    /// <summary>
    /// This comparer will compare two lists of objects against each other
    /// </summary>
    /// <remarks>Objects in each list are compare to their corresponding elements in the other
    /// list until a difference is found.</remarks>
    private class AggregateComparer : IComparer<IEnumerable<IComparable>>
    {
        public int Compare(IEnumerable<IComparable> x, IEnumerable<IComparable> y)
        {
            return
                x.Zip(y, (a, b) => new {a, b})              // walk both lists
                 .Select(pair => pair.a.CompareTo(pair.b))  // compare each object
                 .FirstOrDefault(result => result != 0);    // until a difference is found
        }
    }
}

这个想法是将原始字符串拆分为数字和非数字块()。由于这是一项可能代价高昂的任务,因此每个条目仅执行一次。然后,我们使用可比较对象的比较器(对不起,我找不到更合适的表达方式)。它将每个块与另一个字符串中的相应块进行比较。"\d+|\D+"

我想就如何改进以及主要缺陷是什么提供反馈。请注意,在这一点上,可维护性对我们很重要,我们目前没有在非常大的数据集中使用它。

评论

1赞 jimrandomh 10/30/2013
当它试图比较结构不同的字符串时,这会崩溃 - 例如,将“a-1”与“a-2”进行比较效果很好,但将“a”与“1”进行比较则不然,因为“a”。CompareTo(1) 抛出异常。
0赞 Eric Liprandi 11/9/2013
@jimrandomh,你是对的。这种方法是特定于我们的模式的。
26赞 Michael Parker 3/11/2014 #9

Matthews Horsleys 的答案是最快的方法,它不会根据运行程序的 Windows 版本而改变行为。但是,通过创建一次正则表达式并使用 RegexOptions.Compiled,它可以更快。我还添加了插入字符串比较器的选项,以便您可以在需要时忽略大小写,并稍微提高了可读性。

    public static IEnumerable<T> OrderByNatural<T>(this IEnumerable<T> items, Func<T, string> selector, StringComparer stringComparer = null)
    {
        var regex = new Regex(@"\d+", RegexOptions.Compiled);

        int maxDigits = items
                      .SelectMany(i => regex.Matches(selector(i)).Cast<Match>().Select(digitChunk => (int?)digitChunk.Value.Length))
                      .Max() ?? 0;

        return items.OrderBy(i => regex.Replace(selector(i), match => match.Value.PadLeft(maxDigits, '0')), stringComparer ?? StringComparer.CurrentCulture);
    }

使用者

var sortedEmployees = employees.OrderByNatural(emp => emp.Name);

对 100,000 个字符串进行排序需要 450 毫秒,而默认的 .net 字符串比较需要 300 毫秒 - 非常快!

评论

2赞 mungflesh 6/14/2016
这值得一读 - 正则表达式中的编译和重用
1赞 Voxpire 9/24/2014 #10

扩展了前面的几个答案并使用扩展方法,我提出了以下内容,这些方法没有潜在的多个可枚举的警告,也没有与使用多个正则表达式对象有关的性能问题,或者不必要地调用正则表达式,话虽如此,它确实使用了 ToList(),这可能会抵消较大集合中的好处。

选择器支持泛型类型以允许分配任何委托,源集合中的元素由选择器改变,然后使用 ToString() 转换为字符串。

    private static readonly Regex _NaturalOrderExpr = new Regex(@"\d+", RegexOptions.Compiled);

    public static IEnumerable<TSource> OrderByNatural<TSource, TKey>(
        this IEnumerable<TSource> source, Func<TSource, TKey> selector)
    {
        int max = 0;

        var selection = source.Select(
            o =>
            {
                var v = selector(o);
                var s = v != null ? v.ToString() : String.Empty;

                if (!String.IsNullOrWhiteSpace(s))
                {
                    var mc = _NaturalOrderExpr.Matches(s);

                    if (mc.Count > 0)
                    {
                        max = Math.Max(max, mc.Cast<Match>().Max(m => m.Value.Length));
                    }
                }

                return new
                {
                    Key = o,
                    Value = s
                };
            }).ToList();

        return
            selection.OrderBy(
                o =>
                String.IsNullOrWhiteSpace(o.Value) ? o.Value : _NaturalOrderExpr.Replace(o.Value, m => m.Value.PadLeft(max, '0')))
                     .Select(o => o.Key);
    }

    public static IEnumerable<TSource> OrderByDescendingNatural<TSource, TKey>(
        this IEnumerable<TSource> source, Func<TSource, TKey> selector)
    {
        int max = 0;

        var selection = source.Select(
            o =>
            {
                var v = selector(o);
                var s = v != null ? v.ToString() : String.Empty;

                if (!String.IsNullOrWhiteSpace(s))
                {
                    var mc = _NaturalOrderExpr.Matches(s);

                    if (mc.Count > 0)
                    {
                        max = Math.Max(max, mc.Cast<Match>().Max(m => m.Value.Length));
                    }
                }

                return new
                {
                    Key = o,
                    Value = s
                };
            }).ToList();

        return
            selection.OrderByDescending(
                o =>
                String.IsNullOrWhiteSpace(o.Value) ? o.Value : _NaturalOrderExpr.Replace(o.Value, m => m.Value.PadLeft(max, '0')))
                     .Select(o => o.Key);
    }
13赞 Picsonald 10/28/2016 #11

这是我对同时具有字母和数字字符的字符串进行排序的代码。

首先,这个扩展方法:

public static IEnumerable<string> AlphanumericSort(this IEnumerable<string> me)
{
    return me.OrderBy(x => Regex.Replace(x, @"\d+", m => m.Value.PadLeft(50, '0')));
}

然后,只需在代码中的任何位置使用它,如下所示:

List<string> test = new List<string>() { "The 1st", "The 12th", "The 2nd" };
test = test.AlphanumericSort();

它是如何工作的?通过用零替换:

  Original  | Regex Replace |      The      |   Returned
    List    | Apply PadLeft |    Sorting    |     List
            |               |               |
 "The 1st"  |  "The 001st"  |  "The 001st"  |  "The 1st"
 "The 12th" |  "The 012th"  |  "The 002nd"  |  "The 2nd"
 "The 2nd"  |  "The 002nd"  |  "The 012th"  |  "The 12th"

适用于多个数字:

 Alphabetical Sorting | Alphanumeric Sorting
                      |
 "Page 21, Line 42"   | "Page 3, Line 7"
 "Page 21, Line 5"    | "Page 3, Line 32"
 "Page 3, Line 32"    | "Page 21, Line 5"
 "Page 3, Line 7"     | "Page 21, Line 42"

希望这会有所帮助。

评论

0赞 ruffin 11/13/2020
"它是如何工作的?通过用零替换...“第001号”我的意思是,公平地说,你使用的零比这多得多(),它基本上与马修·霍斯利的答案相同(afaict,尽管你对最大数字长度很幼稚,这可能很糟糕),但它是很好的代码高尔夫和一个很好的解释。;^D"The 00000000000000000000000000000000000000000000000001st"
0赞 Matthew M. 10/30/2021
好消息是可以将其变体添加到 LINQ 查询中:。OrderBy(j => Regex.Replace(j.MyField, @“\d+”, m => m.Value.PadLeft(50, '0'))) 就像一个魅力。
7赞 Drew Noakes 12/15/2016 #12

下面是一个相对简单的示例,它不使用 P/Invoke,并且避免在执行期间进行任何分配。

随意使用此处的代码,或者如果更简单,可以使用 NuGet 包:

https://www.nuget.org/packages/NaturalSort

https://github.com/drewnoakes/natural-sort

internal sealed class NaturalStringComparer : IComparer<string>
{
    public static NaturalStringComparer Instance { get; } = new NaturalStringComparer();

    public int Compare(string x, string y)
    {
        // sort nulls to the start
        if (x == null)
            return y == null ? 0 : -1;
        if (y == null)
            return 1;

        var ix = 0;
        var iy = 0;

        while (true)
        {
            // sort shorter strings to the start
            if (ix >= x.Length)
                return iy >= y.Length ? 0 : -1;
            if (iy >= y.Length)
                return 1;

            var cx = x[ix];
            var cy = y[iy];

            int result;
            if (char.IsDigit(cx) && char.IsDigit(cy))
                result = CompareInteger(x, y, ref ix, ref iy);
            else
                result = cx.CompareTo(y[iy]);

            if (result != 0)
                return result;

            ix++;
            iy++;
        }
    }

    private static int CompareInteger(string x, string y, ref int ix, ref int iy)
    {
        var lx = GetNumLength(x, ix);
        var ly = GetNumLength(y, iy);

        // shorter number first (note, doesn't handle leading zeroes)
        if (lx != ly)
            return lx.CompareTo(ly);

        for (var i = 0; i < lx; i++)
        {
            var result = x[ix++].CompareTo(y[iy++]);
            if (result != 0)
                return result;
        }

        return 0;
    }

    private static int GetNumLength(string s, int i)
    {
        var length = 0;
        while (i < s.Length && char.IsDigit(s[i++]))
            length++;
        return length;
    }
}

它不会忽略前导零,所以在 之后。012

对应单元测试:

public class NumericStringComparerTests
{
    [Fact]
    public void OrdersCorrectly()
    {
        AssertEqual("", "");
        AssertEqual(null, null);
        AssertEqual("Hello", "Hello");
        AssertEqual("Hello123", "Hello123");
        AssertEqual("123", "123");
        AssertEqual("123Hello", "123Hello");

        AssertOrdered("", "Hello");
        AssertOrdered(null, "Hello");
        AssertOrdered("Hello", "Hello1");
        AssertOrdered("Hello123", "Hello124");
        AssertOrdered("Hello123", "Hello133");
        AssertOrdered("Hello123", "Hello223");
        AssertOrdered("123", "124");
        AssertOrdered("123", "133");
        AssertOrdered("123", "223");
        AssertOrdered("123", "1234");
        AssertOrdered("123", "2345");
        AssertOrdered("0", "1");
        AssertOrdered("123Hello", "124Hello");
        AssertOrdered("123Hello", "133Hello");
        AssertOrdered("123Hello", "223Hello");
        AssertOrdered("123Hello", "1234Hello");
    }

    private static void AssertEqual(string x, string y)
    {
        Assert.Equal(0, NaturalStringComparer.Instance.Compare(x, y));
        Assert.Equal(0, NaturalStringComparer.Instance.Compare(y, x));
    }

    private static void AssertOrdered(string x, string y)
    {
        Assert.Equal(-1, NaturalStringComparer.Instance.Compare(x, y));
        Assert.Equal( 1, NaturalStringComparer.Instance.Compare(y, x));
    }
}
4赞 Tom Pažourek 11/21/2017 #13

我实际上已经将其作为扩展方法实现,以便您可以执行例如:StringComparer

  • StringComparer.CurrentCulture.WithNaturalSort()
  • StringComparer.OrdinalIgnoreCase.WithNaturalSort().

结果可用于所有地方,如 、 、 、 等。您仍然可以轻松调整区分大小写、区域性等。IComparer<string>OrderByOrderByDescendingThenByThenByDescendingSortedSet<string>

实现相当简单,即使在大型序列上也应该表现良好。


我还将其发布为一个小型 NuGet 包,因此您可以执行以下操作:

Install-Package NaturalSort.Extension

代码包括 XML 文档注释和测试套件,可在 NaturalSort.Extension GitHub 存储库中找到。


整个代码如下(如果还不能使用 C# 7,只需安装 NuGet 包):

public static class StringComparerNaturalSortExtension
{
    public static IComparer<string> WithNaturalSort(this StringComparer stringComparer) => new NaturalSortComparer(stringComparer);

    private class NaturalSortComparer : IComparer<string>
    {
        public NaturalSortComparer(StringComparer stringComparer)
        {
            _stringComparer = stringComparer;
        }

        private readonly StringComparer _stringComparer;
        private static readonly Regex NumberSequenceRegex = new Regex(@"(\d+)", RegexOptions.Compiled | RegexOptions.CultureInvariant);
        private static string[] Tokenize(string s) => s == null ? new string[] { } : NumberSequenceRegex.Split(s);
        private static ulong ParseNumberOrZero(string s) => ulong.TryParse(s, NumberStyles.None, CultureInfo.InvariantCulture, out var result) ? result : 0;

        public int Compare(string s1, string s2)
        {
            var tokens1 = Tokenize(s1);
            var tokens2 = Tokenize(s2);

            var zipCompare = tokens1.Zip(tokens2, TokenCompare).FirstOrDefault(x => x != 0);
            if (zipCompare != 0)
                return zipCompare;

            var lengthCompare = tokens1.Length.CompareTo(tokens2.Length);
            return lengthCompare;
        }
        
        private int TokenCompare(string token1, string token2)
        {
            var number1 = ParseNumberOrZero(token1);
            var number2 = ParseNumberOrZero(token2);

            var numberCompare = number1.CompareTo(number2);
            if (numberCompare != 0)
                return numberCompare;

            var stringCompare = _stringComparer.Compare(token1, token2);
            return stringCompare;
        }
    }
}
2赞 mshsayem 4/23/2018 #14

下面是一个朴素的单行无正则表达式 LINQ 方式(借自 python):

var alphaStrings = new List<string>() { "10","2","3","4","50","11","100","a12","b12" };
var orderedString = alphaStrings.OrderBy(g => new Tuple<int, string>(g.ToCharArray().All(char.IsDigit)? int.Parse(g) : int.MaxValue, g));
// Order Now: ["2","3","4","10","11","50","100","a12","b12"]

评论

0赞 Arne S 10/9/2018
删除了 Dump() 并分配给 var,这就像一个魅力!
0赞 mshsayem 10/9/2018
@ArneS:它是用 LinQPad 编写的;我忘了删除.谢谢你的指出。Dump()
2赞 Oliver 9/14/2018 #15

受 Michael Parker 解决方案的启发,下面是一个实现,您可以将其放入任何 linq 排序方法中:IComparer

private class NaturalStringComparer : IComparer<string>
{
    public int Compare(string left, string right)
    {
        int max = new[] { left, right }
            .SelectMany(x => Regex.Matches(x, @"\d+").Cast<Match>().Select(y => (int?)y.Value.Length))
            .Max() ?? 0;

        var leftPadded = Regex.Replace(left, @"\d+", m => m.Value.PadLeft(max, '0'));
        var rightPadded = Regex.Replace(right, @"\d+", m => m.Value.PadLeft(max, '0'));

        return string.Compare(leftPadded, rightPadded);
    }
}
-3赞 girishkatta9 11/16/2018 #16

让我解释一下我的问题以及我是如何解决的。

问题:- 根据从目录检索的 FileInfo 对象中的 FileName 对文件进行排序。

解决方案:- 我从FileInfo中选择了文件名并修剪了文件名的“.png”部分。现在,只需执行 List.Sort(),它按自然排序顺序对文件名进行排序。根据我的测试,我发现使用 .png 会打乱排序顺序。请看下面的代码

var imageNameList = new DirectoryInfo(@"C:\Temp\Images").GetFiles("*.png").Select(x =>x.Name.Substring(0, x.Name.Length - 4)).ToList();
imageNameList.Sort();

评论

0赞 girishkatta9 11/16/2018
我能知道这个答案上 -1 的原因吗?
1赞 Kelly Elton 10/11/2019 #17

一个更易于阅读/维护的版本。

public class NaturalStringComparer : IComparer<string>
{
    public static NaturalStringComparer Instance { get; } = new NaturalStringComparer();

    public int Compare(string x, string y) {
        const int LeftIsSmaller = -1;
        const int RightIsSmaller = 1;
        const int Equal = 0;

        var leftString = x;
        var rightString = y;

        var stringComparer = CultureInfo.CurrentCulture.CompareInfo;

        int rightIndex;
        int leftIndex;

        for (leftIndex = 0, rightIndex = 0;
             leftIndex < leftString.Length && rightIndex < rightString.Length;
             leftIndex++, rightIndex++) {
            var leftChar = leftString[leftIndex];
            var rightChar = rightString[leftIndex];

            var leftIsNumber = char.IsNumber(leftChar);
            var rightIsNumber = char.IsNumber(rightChar);

            if (!leftIsNumber && !rightIsNumber) {
                var result = stringComparer.Compare(leftString, leftIndex, 1, rightString, leftIndex, 1);
                if (result != 0) return result;
            } else if (leftIsNumber && !rightIsNumber) {
                return LeftIsSmaller;
            } else if (!leftIsNumber && rightIsNumber) {
                return RightIsSmaller;
            } else {
                var leftNumberLength = NumberLength(leftString, leftIndex, out var leftNumber);
                var rightNumberLength = NumberLength(rightString, rightIndex, out var rightNumber);

                if (leftNumberLength < rightNumberLength) {
                    return LeftIsSmaller;
                } else if (leftNumberLength > rightNumberLength) {
                    return RightIsSmaller;
                } else {
                    if(leftNumber < rightNumber) {
                        return LeftIsSmaller;
                    } else if(leftNumber > rightNumber) {
                        return RightIsSmaller;
                    }
                }
            }
        }

        if (leftString.Length < rightString.Length) {
            return LeftIsSmaller;
        } else if(leftString.Length > rightString.Length) {
            return RightIsSmaller;
        }

        return Equal;
    }

    public int NumberLength(string str, int offset, out int number) {
        if (string.IsNullOrWhiteSpace(str)) throw new ArgumentNullException(nameof(str));
        if (offset >= str.Length) throw new ArgumentOutOfRangeException(nameof(offset), offset, "Offset must be less than the length of the string.");

        var currentOffset = offset;

        var curChar = str[currentOffset];

        if (!char.IsNumber(curChar))
            throw new ArgumentException($"'{curChar}' is not a number.", nameof(offset));

        int length = 1;

        var numberString = string.Empty;

        for (currentOffset = offset + 1;
            currentOffset < str.Length;
            currentOffset++, length++) {

            curChar = str[currentOffset];
            numberString += curChar;

            if (!char.IsNumber(curChar)) {
                number = int.Parse(numberString);

                return length;
            }
        }

        number = int.Parse(numberString);

        return length;
    }
}
13赞 Thomas Levesque 2/25/2021 #18

下面是 .NET Core 2.1+ / .NET 5.0+ 的版本,使用 spans 来避免分配

public class NaturalSortStringComparer : IComparer<string>
{
    public static NaturalSortStringComparer Ordinal { get; } = new NaturalSortStringComparer(StringComparison.Ordinal);
    public static NaturalSortStringComparer OrdinalIgnoreCase { get; } = new NaturalSortStringComparer(StringComparison.OrdinalIgnoreCase);
    public static NaturalSortStringComparer CurrentCulture { get; } = new NaturalSortStringComparer(StringComparison.CurrentCulture);
    public static NaturalSortStringComparer CurrentCultureIgnoreCase { get; } = new NaturalSortStringComparer(StringComparison.CurrentCultureIgnoreCase);
    public static NaturalSortStringComparer InvariantCulture { get; } = new NaturalSortStringComparer(StringComparison.InvariantCulture);
    public static NaturalSortStringComparer InvariantCultureIgnoreCase { get; } = new NaturalSortStringComparer(StringComparison.InvariantCultureIgnoreCase);

    private readonly StringComparison _comparison;

    public NaturalSortStringComparer(StringComparison comparison)
    {
        _comparison = comparison;
    }

    public int Compare(string x, string y)
    {
        // Let string.Compare handle the case where x or y is null
        if (x is null || y is null)
            return string.Compare(x, y, _comparison);

        var xSegments = GetSegments(x);
        var ySegments = GetSegments(y);

        while (xSegments.MoveNext() && ySegments.MoveNext())
        {
            int cmp;

            // If they're both numbers, compare the value
            if (xSegments.CurrentIsNumber && ySegments.CurrentIsNumber)
            {
                var xValue = long.Parse(xSegments.Current);
                var yValue = long.Parse(ySegments.Current);
                cmp = xValue.CompareTo(yValue);
                if (cmp != 0)
                    return cmp;
            }
            // If x is a number and y is not, x is "lesser than" y
            else if (xSegments.CurrentIsNumber)
            {
                return -1;
            }
            // If y is a number and x is not, x is "greater than" y
            else if (ySegments.CurrentIsNumber)
            {
                return 1;
            }

            // OK, neither are number, compare the segments as text
            cmp = xSegments.Current.CompareTo(ySegments.Current, _comparison);
            if (cmp != 0)
                return cmp;
        }

        // At this point, either all segments are equal, or one string is shorter than the other

        // If x is shorter, it's "lesser than" y
        if (x.Length < y.Length)
            return -1;
        // If x is longer, it's "greater than" y
        if (x.Length > y.Length)
            return 1;

        // If they have the same length, they're equal
        return 0;
    }

    private static StringSegmentEnumerator GetSegments(string s) => new StringSegmentEnumerator(s);

    private struct StringSegmentEnumerator
    {
        private readonly string _s;
        private int _start;
        private int _length;

        public StringSegmentEnumerator(string s)
        {
            _s = s;
            _start = -1;
            _length = 0;
            CurrentIsNumber = false;
        }

        public ReadOnlySpan<char> Current => _s.AsSpan(_start, _length);
        
        public bool CurrentIsNumber { get; private set; }

        public bool MoveNext()
        {
            var currentPosition = _start >= 0
                ? _start + _length
                : 0;

            if (currentPosition >= _s.Length)
                return false;

            int start = currentPosition;
            bool isFirstCharDigit = Char.IsDigit(_s[currentPosition]);

            while (++currentPosition < _s.Length && Char.IsDigit(_s[currentPosition]) == isFirstCharDigit)
            {
            }

            _start = start;
            _length = currentPosition - start;
            CurrentIsNumber = isFirstCharDigit;

            return true;
        }
    }
}

评论

0赞 Dazfl 4/30/2022
这太棒了。为什么它不是 .NET 库的一部分?我想在进行排序时,自然排序将是一个相当普遍的要求。
0赞 Charlieface 7/22/2022
保留为字段似乎是合乎逻辑的,那么您就不需要不必要地保留。此外,大系列中的序列可以简化为,最终长度检查可以简化为isFirstCharDigitStringSegmentEnumeratorTryParseifwhilecmp = yIsNumber.CompareTo(xIsNumber); if (cmp != 0) return cmp; cmp = xValue.CompareTo(yValue); if (cmp != 0) return cmp; cmp = xSegments.Current.CompareTo(....return x.Length.CompareTo(y.Length);
0赞 Thomas Levesque 7/28/2022
@Charlieface感谢您的建议。我会调查一下,但请不要在未经事先同意的情况下对代码进行重大更改。
0赞 Charlieface 7/28/2022
对不起。在几天没有回应我的评论后,我以为这个答案已经被放弃了。然后检查更改并自行决定。我认为可能还有其他改进,例如删除和计算它,但不想全力以赴。_currentPosition_start + _length
0赞 Thomas Levesque 7/29/2022
@Charlieface 我明白你的观点,但它只是因为你添加了方法才有意义。事实上,我认为直接存储号码可能会更好。关于其他更改:技巧很聪明,但老实说,我认为它使代码更难理解。你说得对_currentPosition,它可以很容易地计算出来,使结构更小。isFirstCharDigitParseNumber.HasValue.CompareTo(...)