How should I read in strings from 2 separate text files and compare for matches?

Question:

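I have 2 text files with strings (a few hundred per file). The idea is to compare the contents of each: if a match is found, the string is output to one file; if no match is found, the string is output to a different file. Essentially, input file 1 contains a master list of names, and I am comparing it against input file 2. So we take name 1 on the master list and compare it against every name in the other input file.

The main part I'm stuck on is making an algorithm that will properly iterate through the files. I'm also not sure I'm comparing the strings correctly, but that may be a consequence of other errors. I'm new to C++, so I'm not fully aware of all the rules of the language.

There are 480 names on the master list and 303 names on the second list, so there should be 303 names that match and 177 that don't; I made counters to check that the numbers add up. First I tried a simple while loop that runs as long as there is input from the master file, but I wasn't matching against the whole second file (which makes sense). So I figured I needed something more elaborate and tried reading all the values from each input file into their own array and comparing the elements of the arrays. I read in and printed the arrays successfully, but I ran into some other problems, namely a segmentation fault, which is apparently caused by sizeof() and which I am still troubleshooting. This is what I tried: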

//Had problems with making empty arrays
string arrMidasMaster[480];
string arrMidasMath[303];

for (int i = 0; i < sizeof(arrMidasMaster); ++i)
    {
        for (int j = 0; j < sizeof(arrMidasMath); ++j)
        {
            if (arrMidasMaster[i] == arrMidasMath[j]) //match
            {
                outData_Elig << arrMidasMaster[i] << endl;
                num_eligible ++; //counter
            }
            else                                      //No match
            {
                continue;
                //Where can I put these statements?
                //outData_Ineli << arrMidasMaster[i] << endl;
                //num_ineligible ++; //counter
            }
        }
    }

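In the grand scheme of things it looks like it should do what I need, but it still needs work. Besides the segmentation fault, the if/else statement needs fixing. I need the continue so that the loop keeps going until a match is found, but if a match is never found, it looks like control just goes back to the outer loop and tests the next name, whereas I want it to execute the 2 statements shown in the comments above. I hope this is enough to go on.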
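A minimal sketch of the loop structure described above, not the asker's final code: a name can only be called a non-match after the whole second list has been checked, so the decision moves out of the inner loop and is driven by a flag. The helper name splitByMatch and the use of std::vector instead of fixed-size arrays are assumptions made for illustration. Note also that sizeof applied to an array yields its size in bytes, not its element count, which is why the original loop bounds run past the end of the arrays.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not from the question): returns the names of `master`
// that also occur in `other` (first) and those that do not (second).
std::pair<std::vector<std::string>, std::vector<std::string>>
splitByMatch(const std::vector<std::string>& master,
             const std::vector<std::string>& other)
{
    std::vector<std::string> eligible, ineligible;

    // size() is the number of elements; sizeof(array) would be a byte count.
    for (std::size_t i = 0; i < master.size(); ++i)
    {
        bool found = false;
        for (std::size_t j = 0; j < other.size(); ++j)
        {
            if (master[i] == other[j])
            {
                found = true;
                break; // stop at the first match
            }
        }

        // Decide only after the whole second list has been checked.
        if (found)
            eligible.push_back(master[i]);
        else
            ineligible.push_back(master[i]);
    }
    return { eligible, ineligible };
}
```

In the original program, the two push_back calls would be replaced by the writes to outData_Elig and outData_Ineli together with their counters.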

c++ arrays string io

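"The main part I'm stuck on is making an algorithm that will properly iterate through the files" -- Read all the contents of file 1 into a std::vector<std::string>. Read all the contents of file 2 into a std::vector<std::string>. Then std::sort the two vectors. Then use std::set_intersection to get all the common strings, and std::set_difference to get the list of strings that are in one but not in the other.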
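The steps in this comment can be sketched roughly as follows, assuming both files have already been read into vectors. The names MatchResult and compareSorted are hypothetical and only serve the illustration.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical result type for this sketch.
struct MatchResult
{
    std::vector<std::string> common;       // present in both lists
    std::vector<std::string> onlyInMaster; // present in master only
};

// Takes both lists by value because the set algorithms need sorted input
// and the caller's vectors should stay untouched.
MatchResult compareSorted(std::vector<std::string> master,
                          std::vector<std::string> other)
{
    std::sort(master.begin(), master.end());
    std::sort(other.begin(), other.end());

    MatchResult result;
    std::set_intersection(master.begin(), master.end(),
                          other.begin(), other.end(),
                          std::back_inserter(result.common));
    std::set_difference(master.begin(), master.end(),
                        other.begin(), other.end(),
                        std::back_inserter(result.onlyInMaster));
    return result;
}
```

One consequence of this design is that both outputs come back in sorted order rather than in the order of the master file, which may or may not matter for the output files.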

1 upvote Andreas Wenzel 6/20/2023

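I suggest that, instead of working with files of several hundred strings, you first try files with about 5 strings. That way you will be able to run your program line by line in a debugger while monitoring the values of all variables. Once your program works with 5 strings, you can increase the number of strings in the files.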

1 upvote PaulMcKenzie 6/20/2023

string arrMidasMaster[480]; -- What if there aren't 480 names? The way to do this is to use a type that grows with each name in the file, i.e. std::vector<std::string>.

Answers:

1 upvote Anakin 6/20/2023 #1

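Your sample code has a lot of problems, and going through them one by one would be a long task. You need to read lines from the files and fill a vector dynamically with the text; then you don't need to specify the size of an array. You should also review how loops work in C++. Below is sample code for what you are trying to do. Read through the code and its comments to understand it, and adapt it to your use case. Try it with a small example first, then with the actual text files.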

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <algorithm>

void compareFiles(const std::string& fileA, const std::string& fileB, const std::string& matchFile, const std::string& nonMatchFile) 
{
    std::ifstream inputFile1(fileA);
    std::ifstream inputFile2(fileB);
    std::ofstream outputFileMatch(matchFile);
    std::ofstream outputFileNonMatch(nonMatchFile);

    std::vector<std::string> masterList;  // to save the master list
    std::string line;

    // Read contents of fileA into a vector assuming each line has the text you want to match
    while (std::getline(inputFile1, line)) // reading until end of file
    {
        masterList.push_back(line); // push into the vector
    }

    // Compare contents of fileB with text from fileA in vector masterList
    while (std::getline(inputFile2, line)) // reading until end of file
    {
        if (std::find(masterList.begin(), masterList.end(), line) != masterList.end()) // check if the text in fileB is present in the fileA vector
        {
            outputFileMatch << line << "\n"; // write if match found
        } 
        else 
        {
            outputFileNonMatch << line << "\n"; // write if match not found
        }
    }

}

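Edit: As @john rightly pointed out in the comments, there is no need to close the files explicitly, since the streams close them automatically when they go out of scope. So I removed that code.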

#include <algorithm>
#include <filesystem>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <unordered_set>

class LineHelper {
    std::string data;

public:
    friend std::istream& operator>>(std::istream& is, LineHelper& l);
    operator std::string() const { return data; }
};

std::istream& operator>>(std::istream& is, LineHelper& l)
{
    return std::getline(is, l.data);
}

using LineInputStreamIter = std::istream_iterator<LineHelper>;

std::unordered_set<std::string> loadDictionary(std::istream& in)
{
    return { LineInputStreamIter { in }, {} };
}

std::unordered_set<std::string> loadDictionary(const std::filesystem::path& p)
{
    std::ifstream f { p };
    return loadDictionary(f);
}

void partitionCopy(std::istream& in, const std::unordered_set<std::string>& dic, std::ostream& trueOut, std::ostream& falseOut)
{
    std::partition_copy(
        LineInputStreamIter { in }, {},
        std::ostream_iterator<std::string> { trueOut, "\n" },
        std::ostream_iterator<std::string> { falseOut, "\n" },
        [&dic](const auto& s) { return dic.count(s) != 0; });
}

void partitionCopy(const std::filesystem::path& in, const std::filesystem::path& dic, const std::filesystem::path& trueOut, const std::filesystem::path& falseOut)
{
    std::ifstream inF { in };
    std::ofstream trueOutF { trueOut };
    std::ofstream falseOutF { falseOut };
    partitionCopy(inF, loadDictionary(dic), trueOutF, falseOutF);
}

int main()
{
    partitionCopy("in.txt", "dic.txt", "foundItems.txt", "remainingItems.txt");
    return 0;
}

https://godbolt.org/z/73e4h46d3

Not tested properly, but it should work.

0 upvotes Matthew Maisonave 6/24/2023 #3

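With everyone's help I was able to find the right answer to my problem, thank you. The find() function looks for any match of string line within the range from the first element of vComp to its last element. The order matters: if the files are compared the other way around, only matches will be found. This is because string line would then only ever hold the matching strings and never the rest of the non-matching data. This will always be true when comparing a selected group against the whole.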

//Variable Declaration
ifstream inData_Master, inData_Comp;
ofstream outData_Match, outData_NonMatch;
int num_Matches = 0, num_NonMatches = 0; // counter
vector<string> vComp; // saves the compare list
string line;

inData_Master.open("Input_File_Master.txt");
inData_Comp.open("Input_File_Comp.txt");
outData_Match.open("Output_File_Match.txt");
outData_NonMatch.open("Output_File_NonMatch.txt");

// Reads & saves contents of our Comp input file, which will be compared against the Master file, store in line
while (getline(inData_Comp,line)) 
{
    vComp.push_back(line); // Adds new element to the end of the Vector via line
}
// Reads the contents of the master file 
while (getline(inData_Master, line))
{
    if (find(vComp.begin(),vComp.end(),line) != vComp.end()) // if nonMatch, find = vComp.end()  
    {
        outData_Match << line << endl; // match found
        num_Matches++; // counts match
    }
    else
    {
        outData_NonMatch << line << endl; // match not found
        num_NonMatches++; // counts non match
    }
}
cout << "Matches: " << num_Matches << endl;
cout << "NonMatches: " << num_NonMatches << endl;
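The point about comparison order can be checked on a small in-memory example. This is only an illustrative sketch: the helper countMatches is hypothetical and uses vectors in place of the files.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: counts how many entries of `scanned` occur in
// `lookup` (first) and how many do not (second).
std::pair<int, int> countMatches(const std::vector<std::string>& scanned,
                                 const std::vector<std::string>& lookup)
{
    int matches = 0, nonMatches = 0;
    for (const std::string& line : scanned)
    {
        if (std::find(lookup.begin(), lookup.end(), line) != lookup.end())
            ++matches;    // line is present in the lookup list
        else
            ++nonMatches; // line is absent from the lookup list
    }
    return { matches, nonMatches };
}
```

With a master list of ann, bob, cid, dee and a compare list of bob, dee, calling countMatches(master, comp) gives 2 matches and 2 non-matches, while countMatches(comp, master) gives 2 matches and 0 non-matches, because every scanned line comes from the subset.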
