Word comparison in a line of text in C#

Asked by: Niranjan · Asked: 9/30/2023 · Last edited by: Dmitry Bychenko · Updated: 9/30/2023 · Views: 57

Q:

Hi, I am using C# in my project and I am trying to get the output shown below.

 string str1 = "Cat meet's a dog has";
 string str2 = "Cat meet's a dog and a bird";

 string[] str1Words = str1.ToLower().Split(' ');
 string[] str2Words = str2.ToLower().Split(' ');

 var uniqueWords = str2Words
   .Except(str1Words)
   .Concat(str1Words.Except(str2Words))
   .ToList();

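This gives me has, and, a, bird, which is correct, but what I want is the following:

has - present in first string, not present in second string

and a bird - not present in first string but present in second string

For example, a second use case:

String S1 = "Added";
String S2 = "Edited";

Here the output should be

Added - present in first string, not present in second string

Edited - not present in first string but present in second string

I want some indication of which words are present in the first string but not in the second, and which are present in the second but not in the first. The comparison should be word by word, not letter by letter. Can someone help me with this? Any help would be appreciated. Thanks.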

Tags: C#, ASP.NET Core, string comparison


A:

0 votes · Lajos Arpad · 9/30/2023 · #1
str2Words.Except(str1Words)

finds the words in str2Words that are not in str1Words.

str1Words.Except(str2Words)

finds the words in str1Words that are not in str2Words.

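Since you need these two results separately, avoid concatenating them. Instead, use Join on each of them to get a space-separated result, and append the "present" / "not present" description you planned for each.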
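The suggestion above can be sketched as follows. This is only an illustration, under the assumption that "Join" refers to `string.Join` (LINQ has no `Join(separator)` extension on sequences); the class name `WordDifference` and the method `OnlyIn` are hypothetical names introduced here, not part of the original answer.

```csharp
using System;
using System.Linq;

public static class WordDifference
{
    // Hypothetical helper: returns the distinct words of `source` that do not
    // occur in `other` (ignoring case), joined by single spaces.
    public static string OnlyIn(string source, string other)
    {
        string[] sourceWords = source.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        string[] otherWords = other.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        // Except is a set operation: duplicates are dropped and the order
        // of first appearance in `source` is kept.
        return string.Join(" ", sourceWords.Except(otherWords, StringComparer.OrdinalIgnoreCase));
    }
}
```

With the strings from the question, `OnlyIn(str1, str2)` returns "has" and `OnlyIn(str2, str1)` returns "and bird". Because `Except` works on sets, the second "a" of str2 is not reported, since "a" also occurs in str1; getting "and a bird" needs a position-aware comparison rather than a set difference.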

Comments

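0 votes · Niranjan · 9/30/2023
Hi @Lajos Arpad, I did this and it finds the differences, but it does not tell me which word is present in the first string and not in the second, and the other way round.
0 votes · Lajos Arpad · 9/30/2023
@Niranjan did you try something like this? If so, what was the result? str2Words.Except(str1Words).Join(" ") + " present in first string not present in second string"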
2 votes · Dmitry Bychenko · 9/30/2023 · #2

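I suggest matching words (let a word be a sequence of letters and apostrophes) with the help of regular expressions, and then querying the matches for both given strings. Note that Split doesn't take punctuation into account, so cat, cat, and cat! would be treated as three different words: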

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions; 

...

private static readonly Regex WordsRegex = new Regex(@"[\p{L}']+"); 

// 1 - in text1, 2 - in text2, 3 - in both text1 and text2 
private static List<(string word, int presentAt)> MyWords(string text1, string text2) {
  HashSet<string> words1 = WordsRegex
    .Matches(text1)
    .Cast<Match>()
    .Select(match => match.Value)
    .ToHashSet(StringComparer.OrdinalIgnoreCase);

  HashSet<string> words2 = WordsRegex
    .Matches(text2)
    .Cast<Match>()
    .Select(match => match.Value)
    .ToHashSet(StringComparer.OrdinalIgnoreCase);

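  return words1
    // same comparer as the sets, so "Cat" and "cat" are not listed twice
    .Union(words2, StringComparer.OrdinalIgnoreCase)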
    .Select(word => (word, presentAt: (words1.Contains(word) ? 1 : 0) | 
                                      (words2.Contains(word) ? 2 : 0)))
    .ToList();
}

Demo:

string str1 = "Cat meet's a dog has";
string str2 = "Cat meet's a dog and a bird";
    
var result = MyWords(str1, str2);
    
var report = string.Join(Environment.NewLine, result);
    
Console.Write(report);

Output:

(Cat, 3)         # 3: in both str1 and str2 
(meet's, 3)      # 3: in both str1 and str2
(a, 3)           # 3: in both str1 and str2
(dog, 3)         # 3: in both str1 and str2 
(has, 1)         # 1: in str1 only
(and, 2)         # 2: in str2 only
(bird, 2)        # 2: in str2 only 


If you want a verbose output:

string str1 = "Cat meet's a dog has";
string str2 = "Cat meet's a dog and a bird";

// result comes from the same MyWords call as in the demo above
var result = MyWords(str1, str2);
    
string[] options = new string[] {
  "not present",
  "present in first string not present in second string",
  "not present in first string but present in second string",
  "present in first string and present in second string"
};
        
var report = string.Join(Environment.NewLine, result
  .Select(pair => $"{pair.word} - {options[pair.presentAt]}"));

Console.Write(report);

Output:

Cat - present in first string and present in second string
meet's - present in first string and present in second string
a - present in first string and present in second string
dog - present in first string and present in second string
has - present in first string not present in second string
and - not present in first string but present in second string
bird - not present in first string but present in second string
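The remark about Split and punctuation can be checked with a small comparison. This is a minimal sketch: the class `Tokenizers` and its two methods are hypothetical names introduced here, and the regular expression is the same pattern used in the answer's code.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class Tokenizers
{
    // Same pattern as in the answer: a word is a run of letters and apostrophes.
    private static readonly Regex WordsRegex = new Regex(@"[\p{L}']+");

    // Splits on spaces only, so punctuation stays attached to the word.
    public static string[] BySplit(string text) =>
        text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

    // Extracts regex matches, so punctuation is left out of the words.
    public static string[] ByRegex(string text) =>
        WordsRegex.Matches(text).Cast<Match>().Select(m => m.Value).ToArray();
}
```

For the input "cat cat, cat!", `BySplit` yields the three distinct tokens "cat", "cat," and "cat!", while `ByRegex` yields "cat" three times.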

Comments

0 votes · Niranjan · 9/30/2023
Hi @Dmitry, suppose these are my two texts: string str1 = "It is agreed on the date shown in box 3 between the party named in box 2 as"; string str2 = "It is agreed on the date shown in box 2 removed the party named in box 3 as"; The result I expect is (It, 3) (It, 3) (is, 3) (is, 3) (agreed, 3) (on, 3) (the, 3) (date, 3) (shown, 3) (in, 3) (box, 3) (between, 1) (removed, 2) (party, 3) (named, 3) (as, 3), so I also want to preserve the order.
0 votes · Dmitry Bychenko · 9/30/2023
@Niranjan: are you looking for an edit sequence, then? Having the original string "Agreed on new date shown" and the target string "Agreed on the old datetime shown", you want to get Agreed (keep) on (keep) new (delete) the (insert) old (insert) datetime (edit from date) shown (keep)?
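The thread does not include code for the order-preserving variant. One possible sketch, assuming a word-level longest-common-subsequence (LCS) diff is acceptable, is shown below. It is not the answerer's method; the class `WordDiff` and the method `Diff` are hypothetical names, and the 1/2/3 codes reuse the answer's convention (1 = only in the first text, 2 = only in the second, 3 = in both).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordDiff
{
    // Same word pattern as in the answer: letters and apostrophes.
    private static readonly Regex WordsRegex = new Regex(@"[\p{L}']+");

    private static string[] Words(string text) =>
        WordsRegex.Matches(text).Cast<Match>().Select(m => m.Value).ToArray();

    // presentAt: 1 = only in text1 (delete), 2 = only in text2 (insert), 3 = in both (keep)
    public static List<(string word, int presentAt)> Diff(string text1, string text2)
    {
        string[] a = Words(text1);
        string[] b = Words(text2);
        var cmp = StringComparer.OrdinalIgnoreCase;

        // lcs[i, j] = length of the longest common subsequence of a[i..] and b[j..]
        int[,] lcs = new int[a.Length + 1, b.Length + 1];
        for (int i = a.Length - 1; i >= 0; i--)
            for (int j = b.Length - 1; j >= 0; j--)
                lcs[i, j] = cmp.Equals(a[i], b[j])
                    ? lcs[i + 1, j + 1] + 1
                    : Math.Max(lcs[i + 1, j], lcs[i, j + 1]);

        // Walk both word lists in order, emitting keep / delete / insert.
        var result = new List<(string word, int presentAt)>();
        int x = 0, y = 0;
        while (x < a.Length && y < b.Length)
        {
            if (cmp.Equals(a[x], b[y])) { result.Add((a[x], 3)); x++; y++; }
            else if (lcs[x + 1, y] >= lcs[x, y + 1]) { result.Add((a[x], 1)); x++; }
            else { result.Add((b[y], 2)); y++; }
        }
        while (x < a.Length) result.Add((a[x++], 1)); // leftover words of text1
        while (y < b.Length) result.Add((b[y++], 2)); // leftover words of text2
        return result;
    }
}
```

For the strings from the question, this keeps Cat, meet's, a and dog as 3, reports has as 1, and reports and, a, bird as 2, in that order. The sketch has no separate "edit" operation: for the pair in the last comment, date comes out as a deletion and datetime as an insertion.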