提问人:πάντα ῥεῖ 提问时间:4/14/2014 最后编辑:John Kugelmanπάντα ῥεῖ 更新时间:1/9/2021 访问量:3008
为什么从 std::istream 读取记录结构字段会失败,我该如何解决?
Why does reading a record struct fields from std::istream fail, and how can I fix it?
问:
假设我们遇到以下情况:
记录结构声明如下
struct Person { unsigned int id; std::string name; uint8_t age; // ... };
记录使用以下格式存储在文件中:
ID Forename Lastname Age ------------------------------ 1267867 John Smith 32 67545 Jane Doe 36 8677453 Gwyneth Miller 56 75543 J. Ross Unusual 23 ...
应读入该文件以收集任意数量的上述记录:Person
std::istream& ifs = std::ifstream("SampleInput.txt");
std::vector<Person> persons;
Person actRecord;
while(ifs >> actRecord.id >> actRecord.name >> actRecord.age) {
persons.push_back(actRecord);
}
if(!ifs) {
std::err << "Input format error!" << std::endl;
}
问题:
我该怎么做才能读取单独的值,将其值存储到一个变量的字段中?actRecord
上面的代码示例最终会出现运行时错误:
Runtime error time: 0 memory: 3476 signal:-1
stderr: Input format error!
答:
名字和姓氏之间有空格。将您的类更改为将 firstname 和 lastname 作为单独的字符串,它应该可以工作。您可以做的另一件事是读取两个单独的变量,例如 and 并将其赋值为name1
name2
actRecord.name = name1 + " " + name2;
评论
>>
std::getline
来获取名称:std::ifs >> actRecord.id >> actRecord.age && std::getline(ifs, actRecord.name)
一个可行的解决方案是重新排序输入字段(如果可能的话)
ID Age Forename Lastname
1267867 32 John Smith
67545 36 Jane Doe
8677453 56 Gwyneth Miller
75543 23 J. Ross Unusual
...
并在记录中阅读如下内容
#include <iostream>
#include <vector>
struct Person {
unsigned int id;
std::string name;
uint8_t age;
// ...
};
int main() {
std::istream& ifs = std::cin; // Open file alternatively
std::vector<Person> persons;
Person actRecord;
unsigned int age;
while(ifs >> actRecord.id >> age &&
std::getline(ifs, actRecord.name)) {
actRecord.age = uint8_t(age);
persons.push_back(actRecord);
}
return 0;
}
评论
一种解决方案是将第一个条目读入变量。
然后读出该行中的所有其他单词(只需将它们推入临时向量中),并使用所有元素构建个人的名称,除了最后一个条目是 Age。ID
这样一来,您仍然可以将年龄放在最后一个位置,但能够处理诸如“J. Ross Unusual”之类的名称。
更新以添加一些代码来说明上述理论:
#include <memory>
#include <string>
#include <vector>
#include <iterator>
#include <fstream>
#include <sstream>
#include <iostream>
struct Person {
unsigned int id;
std::string name;
int age;
};
int main()
{
std::fstream ifs("in.txt");
std::vector<Person> persons;
std::string line;
while (std::getline(ifs, line))
{
std::istringstream iss(line);
// first: ID simply read it
Person actRecord;
iss >> actRecord.id;
// next iteration: read in everything
std::string temp;
std::vector<std::string> tempvect;
while(iss >> temp) {
tempvect.push_back(temp);
}
// then: the name, let's join the vector in a way to not to get a trailing space
// also taking care of people who do not have two names ...
int LAST = 2;
if(tempvect.size() < 2) // only the name and age are in there
{
LAST = 1;
}
std::ostringstream oss;
std::copy(tempvect.begin(), tempvect.end() - LAST,
std::ostream_iterator<std::string>(oss, " "));
// the last element
oss << *(tempvect.end() - LAST);
actRecord.name = oss.str();
// and the age
actRecord.age = std::stoi( *(tempvect.end() - 1) );
persons.push_back(actRecord);
}
for(std::vector<Person>::const_iterator it = persons.begin(); it != persons.end(); it++)
{
std::cout << it->id << ":" << it->name << ":" << it->age << std::endl;
}
}
我该怎么做才能将构成名称的单独单词读入一个变量?
actRecord.name
一般的答案是:不,如果没有额外的分隔符规范和对形成预期内容的部分的特殊解析,就无法做到这一点。
这是因为将分析字段,直到下一个空格字符出现。actRecord.name
std::string
值得注意的是,某些标准格式(例如)可能需要支持区分空格 () 和制表符 () 或其他字符,以分隔某些记录字段(乍一看可能不可见)。.csv
' '
'\t'
另请注意:
要将值读取为数字输入,您必须使用临时值进行偏差。只读取一个 (aka ) 会搞砸流解析状态。uint8_t
unsigned int
unsigned char
uint8_t
这是我想出的一个操纵器的实现,它通过每个提取的字符来计算分隔符。使用您指定的分隔符数量,它将从输入流中提取单词。这是一个工作演示。
template<class charT>
struct word_inserter_impl {
word_inserter_impl(std::size_t words, std::basic_string<charT>& str, charT delim)
: str_(str)
, delim_(delim)
, words_(words)
{ }
friend std::basic_istream<charT>&
operator>>(std::basic_istream<charT>& is, const word_inserter_impl<charT>& wi) {
typename std::basic_istream<charT>::sentry ok(is);
if (ok) {
std::istreambuf_iterator<charT> it(is), end;
std::back_insert_iterator<std::string> dest(wi.str_);
while (it != end && wi.words_) {
if (*it == wi.delim_ && --wi.words_ == 0) {
break;
}
dest++ = *it++;
}
}
return is;
}
private:
std::basic_string<charT>& str_;
charT delim_;
mutable std::size_t words_;
};
template<class charT=char>
word_inserter_impl<charT> word_inserter(std::size_t words, std::basic_string<charT>& str, charT delim = charT(' ')) {
return word_inserter_impl<charT>(words, str, delim);
}
现在你可以做:
while (ifs >> actRecord.id >> word_inserter(2, actRecord.name) >> actRecord.age) {
std::cout << actRecord.id << " " << actRecord.name << " " << actRecord.age << '\n';
}
评论
另一种解决方案是要求特定字段使用某些分隔符,并为此目的提供特殊的提取操纵器。
假设我们定义分隔符字符,输入应如下所示:"
1267867 "John Smith" 32
67545 "Jane Doe" 36
8677453 "Gwyneth Miller" 56
75543 "J. Ross Unusual" 23
一般需要包括:
#include <iostream>
#include <vector>
#include <iomanip>
记录声明:
struct Person {
unsigned int id;
std::string name;
uint8_t age;
// ...
};
支持与全局运算符重载一起使用的代理类(结构)的声明/定义:std::istream& operator>>(std::istream&, const delim_field_extractor_proxy&)
struct delim_field_extractor_proxy {
delim_field_extractor_proxy
( std::string& field_ref
, char delim = '"'
)
: field_ref_(field_ref), delim_(delim) {}
friend
std::istream& operator>>
( std::istream& is
, const delim_field_extractor_proxy& extractor_proxy);
void extract_value(std::istream& is) const {
field_ref_.clear();
char input;
bool addChars = false;
while(is) {
is.get(input);
if(is.eof()) {
break;
}
if(input == delim_) {
addChars = !addChars;
if(!addChars) {
break;
}
else {
continue;
}
}
if(addChars) {
field_ref_ += input;
}
}
// consume whitespaces
while(std::isspace(is.peek())) {
is.get();
}
}
std::string& field_ref_;
char delim_;
};
std::istream& operator>>
( std::istream& is
, const delim_field_extractor_proxy& extractor_proxy) {
extractor_proxy.extract_value(is);
return is;
}
将所有连接在一起的东西连接在一起,并实例化:delim_field_extractor_proxy
int main() {
std::istream& ifs = std::cin; // Open file alternatively
std::vector<Person> persons;
Person actRecord;
int act_age;
while(ifs >> actRecord.id
>> delim_field_extractor_proxy(actRecord.name,'"')
>> act_age) {
actRecord.age = uint8_t(act_age);
persons.push_back(actRecord);
}
for(auto it = persons.begin();
it != persons.end();
++it) {
std::cout << it->id << ", "
<< it->name << ", "
<< int(it->age) << std::endl;
}
return 0;
}
请参阅此处的工作示例。
注意:
此解决方案还可以很好地指定 TAB 字符 () 作为分隔符,这在解析标准格式时很有用。\t
.csv
由于我们可以轻松地在空格上拆分一行,并且我们知道唯一可以分隔的值是名称,因此可能的解决方案是为包含该行的空格分隔元素的每行使用一个 deque。可以很容易地从 deque 中检索 id 和 age,其余元素可以连接起来以检索名称:
#include <iostream>
#include <fstream>
#include <deque>
#include <vector>
#include <sstream>
#include <iterator>
#include <string>
#include <algorithm>
#include <utility>
struct Person {
unsigned int id;
std::string name;
uint8_t age;
};
int main(int argc, char* argv[]) {
std::ifstream ifs("SampleInput.txt");
std::vector<Person> records;
std::string line;
while (std::getline(ifs,line)) {
std::istringstream ss(line);
std::deque<std::string> info(std::istream_iterator<std::string>(ss), {});
Person record;
record.id = std::stoi(info.front()); info.pop_front();
record.age = std::stoi(info.back()); info.pop_back();
std::ostringstream name;
std::copy
( info.begin()
, info.end()
, std::ostream_iterator<std::string>(name," "));
record.name = name.str(); record.name.pop_back();
records.push_back(std::move(record));
}
for (auto& record : records) {
std::cout << record.id << " " << record.name << " "
<< static_cast<unsigned int>(record.age) << std::endl;
}
return 0;
}
评论
解决解析问题的另一种尝试。
int main()
{
std::ifstream ifs("test-115.in");
std::vector<Person> persons;
while (true)
{
Person actRecord;
// Read the ID and the first part of the name.
if ( !(ifs >> actRecord.id >> actRecord.name ) )
{
break;
}
// Read the rest of the line.
std::string line;
std::getline(ifs,line);
// Pickup the rest of the name from the rest of the line.
// The last token in the rest of the line is the age.
// All other tokens are part of the name.
// The tokens can be separated by ' ' or '\t'.
size_t pos = 0;
size_t iter1 = 0;
size_t iter2 = 0;
while ( (iter1 = line.find(' ', pos)) != std::string::npos ||
(iter2 = line.find('\t', pos)) != std::string::npos )
{
size_t iter = (iter1 != std::string::npos) ? iter1 : iter2;
actRecord.name += line.substr(pos, (iter - pos + 1));
pos = iter + 1;
// Skip multiple whitespace characters.
while ( isspace(line[pos]) )
{
++pos;
}
}
// Trim the last whitespace from the name.
actRecord.name.erase(actRecord.name.size()-1);
// Extract the age.
// std::stoi returns an integer. We are assuming that
// it will be small enough to fit into an uint8_t.
actRecord.age = std::stoi(line.substr(pos).c_str());
// Debugging aid.. Make sure we have extracted the data correctly.
std::cout << "ID: " << actRecord.id
<< ", name: " << actRecord.name
<< ", age: " << (int)actRecord.age << std::endl;
persons.push_back(actRecord);
}
// If came here before the EOF was reached, there was an
// error in the input file.
if ( !(ifs.eof()) ) {
std::cerr << "Input format error!" << std::endl;
}
}
当看到这样的输入文件时,我认为它不是一个(新方式)分隔的文件,而是一个很好的旧的固定大小字段文件,就像 Fortran 和 Cobol 程序员过去处理的那样。所以我会这样解析它(注意我把名字和姓氏分开了):
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
struct Person {
unsigned int id;
std::string forename;
std::string lastname;
uint8_t age;
// ...
};
int main() {
std::istream& ifs = std::ifstream("file.txt");
std::vector<Person> persons;
std::string line;
int fieldsize[] = {8, 9, 9, 4};
while(std::getline(ifs, line)) {
Person person;
int field = 0, start=0, last;
std::stringstream fieldtxt;
fieldtxt.str(line.substr(start, fieldsize[0]));
fieldtxt >> person.id;
start += fieldsize[0];
person.forename=line.substr(start, fieldsize[1]);
last = person.forename.find_last_not_of(' ') + 1;
person.forename.erase(last);
start += fieldsize[1];
person.lastname=line.substr(start, fieldsize[2]);
last = person.lastname.find_last_not_of(' ') + 1;
person.lastname.erase(last);
start += fieldsize[2];
std::string a = line.substr(start, fieldsize[3]);
fieldtxt.str(line.substr(start, fieldsize[3]));
fieldtxt >> age;
person.age = person.age;
persons.push_back(person);
}
return 0;
}
评论