提问人: 提问时间:10/25/2008 最后编辑:28 revs, 15 users 28%Ashwin Nanjappa 更新时间:6/15/2023 访问量:2386940
如何遍历字符串的单词?
How do I iterate over the words of a string?
问:
如何遍历由空格分隔的单词组成的字符串的单词?
请注意,我对 C 字符串函数或那种字符操作/访问不感兴趣。比起效率,我更喜欢优雅。我目前的解决方案:
#include <iostream>
#include <sstream>
#include <string>
using namespace std;
int main() {
string s = "Somewhere down the road";
istringstream iss(s);
do {
string subs;
iss >> subs;
cout << "Substring: " << subs << endl;
} while (iss);
}
答:
STL 还没有这样的方法。
但是,您可以通过使用 std::string::c_str
() 成员来使用 C 的 strtok()
函数,也可以编写自己的函数。这是我在快速 Google 搜索(“STL 字符串拆分”)后找到的代码示例:
void Tokenize(const string& str,
vector<string>& tokens,
const string& delimiters = " ")
{
// Skip delimiters at beginning.
string::size_type lastPos = str.find_first_not_of(delimiters, 0);
// Find first "non-delimiter".
string::size_type pos = str.find_first_of(delimiters, lastPos);
while (string::npos != pos || string::npos != lastPos)
{
// Found a token, add it to the vector.
tokens.push_back(str.substr(lastPos, pos - lastPos));
// Skip delimiters. Note the "not_of"
lastPos = str.find_first_not_of(delimiters, pos);
// Find next "non-delimiter"
pos = str.find_first_of(delimiters, lastPos);
}
}
摘自:http://oopweb.com/CPP/Documents/CPPHOWTO/Volume/C++Programming-HOWTO-7.html
如果您对代码示例有疑问,请发表评论,我会解释。
仅仅因为它没有实现被调用的迭代器或重载,运算符并不意味着它是糟糕的代码。我经常使用 C 函数。例如,printf
和 scanf
都比 std::cin 和 std::cout
快(明显),fopen
语法对二进制类型更友好,而且它们也倾向于产生更小的 EXE。typedef
<<
不要被这种“优雅胜于性能”的交易所吸引。
评论
这是我最喜欢的遍历字符串的方式。你可以对每个单词做任何你想做的事。
string line = "a line of text to iterate through";
string word;
istringstream iss(line, istringstream::in);
while( iss >> word )
{
// Do something on `word` here...
}
评论
word
char
stringstream ss("Hello World, this is*@#&$(@ a string"); char c; while(ss >> c) cout << c;
使用现有工作非常有效,并且完全按照您的意愿行事。如果你只是在寻找不同的做事方式,你可以使用 std::find()/std::find_first_of() 和 std
::string::substr()。
std::stringstream
下面是一个示例:
#include <iostream>
#include <string>
int main()
{
std::string s("Somewhere down the road");
std::string::size_type prev_pos = 0, pos = 0;
while( (pos = s.find(' ', pos)) != std::string::npos )
{
std::string substring( s.substr(prev_pos, pos-prev_pos) );
std::cout << substring << '\n';
prev_pos = ++pos;
}
std::string substring( s.substr(prev_pos, pos-prev_pos) ); // Last word
std::cout << substring << '\n';
return 0;
}
评论
prev_pos = pos += delimiter.length();
对于一个大得离谱且可能冗余的版本,请尝试大量 for 循环。
string stringlist[10];
int count = 0;
for (int i = 0; i < sequence.length(); i++)
{
if (sequence[i] == ' ')
{
stringlist[count] = sequence.substr(0, i);
sequence.erase(0, i+1);
i = 0;
count++;
}
else if (i == sequence.length()-1) // Last word
{
stringlist[count] = sequence.substr(0, i+1);
}
}
它并不漂亮,但总的来说(除了标点符号和许多其他错误)它有效!
评论
我喜欢以下内容,因为它将结果放入向量中,支持字符串作为 delim,并控制保留空值。但是,它看起来并不那么好。
#include <ostream>
#include <string>
#include <vector>
#include <algorithm>
#include <iterator>
using namespace std;
vector<string> split(const string& s, const string& delim, const bool keep_empty = true) {
vector<string> result;
if (delim.empty()) {
result.push_back(s);
return result;
}
string::const_iterator substart = s.begin(), subend;
while (true) {
subend = search(substart, s.end(), delim.begin(), delim.end());
string temp(substart, subend);
if (keep_empty || !temp.empty()) {
result.push_back(temp);
}
if (subend == s.end()) {
break;
}
substart = subend + delim.size();
}
return result;
}
int main() {
const vector<string> words = split("So close no matter how far", " ");
copy(words.begin(), words.end(), ostream_iterator<string>(cout, "\n"));
}
当然,Boost 有一个 split()
可以部分地工作。而且,如果“空白”,您确实是指任何类型的空白,那么使用 Boost 的拆分效果很好。is_any_of()
评论
这类似于堆栈溢出问题如何在 C++ 中标记字符串?。需要 Boost 外部库
#include <iostream>
#include <string>
#include <boost/tokenizer.hpp>
using namespace std;
using namespace boost;
int main(int argc, char** argv)
{
string text = "token test\tstring";
char_separator<char> sep(" \t");
tokenizer<char_separator<char>> tokens(text, sep);
for (const string& t : tokens)
{
cout << t << "." << endl;
}
}
评论
我用它来用分隔符拆分字符串。第一个将结果放在预先构造的向量中,第二个返回一个新向量。
#include <string>
#include <sstream>
#include <vector>
#include <iterator>
template <typename Out>
void split(const std::string &s, char delim, Out result) {
std::istringstream iss(s);
std::string item;
while (std::getline(iss, item, delim)) {
*result++ = item;
}
}
std::vector<std::string> split(const std::string &s, char delim) {
std::vector<std::string> elems;
split(s, delim, std::back_inserter(elems));
return elems;
}
请注意,此解决方案不会跳过空令牌,因此以下将找到 4 个项目,其中一个是空的:
std::vector<std::string> x = split("one:two::three", ':');
评论
empty()
if (!item.empty()) elems.push_back(item)
->
f(split(s, d, v))
vector
使用 Boost 的可能解决方案可能是:
#include <boost/algorithm/string.hpp>
std::vector<std::string> strs;
boost::split(strs, "string to split", boost::is_any_of("\t "));
这种方法可能比这种方法更快。由于这是一个通用的模板函数,因此可以使用各种分隔符来拆分其他类型的字符串(wchar 等或 UTF-8)。stringstream
有关详细信息,请参阅文档。
评论
值得一提的是,这是另一种从输入字符串中提取标记的方法,仅依赖于标准库工具。这是STL设计背后的力量和优雅的一个例子。
#include <iostream>
#include <string>
#include <sstream>
#include <algorithm>
#include <iterator>
int main() {
using namespace std;
string sentence = "And I feel fine...";
istringstream iss(sentence);
copy(istream_iterator<string>(iss),
istream_iterator<string>(),
ostream_iterator<string>(cout, "\n"));
}
无需将提取的令牌复制到输出流中,而是可以使用相同的通用复制
算法将它们插入容器中。
vector<string> tokens;
copy(istream_iterator<string>(iss),
istream_iterator<string>(),
back_inserter(tokens));
...或直接创建:vector
vector<string> tokens{istream_iterator<string>{iss},
istream_iterator<string>{}};
评论
使用模板函数的高效、小巧、优雅的解决方案:
template <class ContainerT>
void split(const std::string& str, ContainerT& tokens,
const std::string& delimiters = " ", bool trimEmpty = false)
{
std::string::size_type pos, lastPos = 0, length = str.length();
using value_type = typename ContainerT::value_type;
using size_type = typename ContainerT::size_type;
while (lastPos < length + 1)
{
pos = str.find_first_of(delimiters, lastPos);
if (pos == std::string::npos)
pos = length;
if (pos != lastPos || !trimEmpty)
tokens.emplace_back(value_type(str.data() + lastPos,
(size_type)pos - lastPos));
lastPos = pos + 1;
}
}
我通常选择使用类型作为我的第二个参数()...但有时可能比 .std::vector<std::string>
ContainerT
list<...>
vector<...>
它还允许您指定是否通过最后一个可选参数从结果中剪裁空标记。
它所需要的只是通过 .它不显式使用流或 boost 库,但能够接受其中一些类型。std::string
<string>
此外,从 C++-17 开始,您可以使用它比使用 .下面是一个修订版本,它也支持容器作为返回类型:std::vector<std::string_view>
std::string
#include <vector>
#include <string_view>
#include <utility>
template < typename StringT,
typename DelimiterT = char,
typename ContainerT = std::vector<std::string_view> >
ContainerT split(StringT const& str, DelimiterT const& delimiters = ' ', bool trimEmpty = true, ContainerT&& tokens = {})
{
typename StringT::size_type pos, lastPos = 0, length = str.length();
while (lastPos < length + 1)
{
pos = str.find_first_of(delimiters, lastPos);
if (pos == StringT::npos)
pos = length;
if (pos != lastPos || !trimEmpty)
tokens.emplace_back(str.data() + lastPos, pos - lastPos);
lastPos = pos + 1;
}
return std::forward<ContainerT>(tokens);
}
我们已注意不要制作任何不需要的副本。
这将允许:
for (auto const& line : split(str, '\n'))
艺术
auto& lines = split(str, '\n');
两者都返回默认模板容器类型 。std::vector<std::string_view>
若要获取特定容器类型,或传递现有容器,请将输入参数与类型化的初始容器或现有容器变量一起使用:tokens
auto& lines = split(str, '\n', false, std::vector<std::string>());
艺术
std::vector<std::string> lines;
split(str, '\n', false, lines);
评论
typedef ContainerT Base; typedef typename Base::value_type ValueType; typedef typename ValueType::size_type SizeType;
trimEmpty = true
"abo"
str.find_first_of
str.find_first
这是另一种方法..
void split_string(string text,vector<string>& words)
{
int i=0;
char ch;
string word;
while(ch=text[i++])
{
if (isspace(ch))
{
if (!word.empty())
{
words.push_back(word);
}
word = "";
}
else
{
word += ch;
}
}
if (!word.empty())
{
words.push_back(word);
}
}
评论
word.clear()
word = ""
另一种灵活快捷的方式
template<typename Operator>
void tokenize(Operator& op, const char* input, const char* delimiters) {
const char* s = input;
const char* e = s;
while (*e != 0) {
e = s;
while (*e != 0 && strchr(delimiters, *e) == 0) ++e;
if (e - s > 0) {
op(s, e - s);
}
s = e + 1;
}
}
将它与字符串向量一起使用(编辑:由于有人指出不要继承 STL 类......人力资源管理;) ):
template<class ContainerType>
class Appender {
public:
Appender(ContainerType& container) : container_(container) {;}
void operator() (const char* s, unsigned length) {
container_.push_back(std::string(s,length));
}
private:
ContainerType& container_;
};
std::vector<std::string> strVector;
Appender v(strVector);
tokenize(v, "A number of words to be tokenized", " \t");
就是这样!这只是使用分词器的一种方式,就像如何使用 计算字数:
class WordCounter {
public:
WordCounter() : noOfWords(0) {}
void operator() (const char*, unsigned) {
++noOfWords;
}
unsigned noOfWords;
};
WordCounter wc;
tokenize(wc, "A number of words to be counted", " \t");
ASSERT( wc.noOfWords == 7 );
受限于想象力;)
评论
有一个名为 strtok
的函数。
#include<string>
using namespace std;
vector<string> split(char* str,const char* delim)
{
char* saveptr;
char* token = strtok_r(str,delim,&saveptr);
vector<string> result;
while(token != NULL)
{
result.push_back(token);
token = strtok_r(NULL,delim,&saveptr);
}
return result;
}
评论
strtok
来自 C 标准库,而不是 C++。在多线程程序中使用是不安全的。它修改输入字符串。
strtok
我使用这个傻瓜是因为我们的 String 类是“特殊的”(即不是标准的):
void splitString(const String &s, const String &delim, std::vector<String> &result) {
const int l = delim.length();
int f = 0;
int i = s.indexOf(delim,f);
while (i>=0) {
String token( i-f > 0 ? s.substring(f,i-f) : "");
result.push_back(token);
f=i+l;
i = s.indexOf(delim,f);
}
String token = s.substring(f);
result.push_back(token);
}
以 ' ' 作为标记循环。getline
到目前为止,我在 Boost 中使用了那个,但我需要一些不依赖于它的东西,所以我想到了这个:
static void Split(std::vector<std::string>& lst, const std::string& input, const std::string& separators, bool remove_empty = true)
{
std::ostringstream word;
for (size_t n = 0; n < input.size(); ++n)
{
if (std::string::npos == separators.find(input[n]))
word << input[n];
else
{
if (!word.str().empty() || !remove_empty)
lst.push_back(word.str());
word.str("");
}
}
if (!word.str().empty() || !remove_empty)
lst.push_back(word.str());
}
一个好的点是,你可以传递多个字符。separators
#include <vector>
#include <string>
#include <sstream>
int main()
{
std::string str("Split me by whitespaces");
std::string buf; // Have a buffer string
std::stringstream ss(str); // Insert the string into a stream
std::vector<std::string> tokens; // Create vector to hold our words
while (ss >> buf)
tokens.push_back(buf);
return 0;
}
评论
getline
while
while(getline(ss, buff, ','))
如果您喜欢使用 boost,但希望使用整个字符串作为分隔符(而不是像之前提出的大多数解决方案那样使用单个字符),则可以使用 .boost_split_iterator
示例代码,包括方便的模板:
#include <iostream>
#include <vector>
#include <boost/algorithm/string.hpp>
template<typename _OutputIterator>
inline void split(
const std::string& str,
const std::string& delim,
_OutputIterator result)
{
using namespace boost::algorithm;
typedef split_iterator<std::string::const_iterator> It;
for(It iter=make_split_iterator(str, first_finder(delim, is_equal()));
iter!=It();
++iter)
{
*(result++) = boost::copy_range<std::string>(*iter);
}
}
int main(int argc, char* argv[])
{
using namespace std;
vector<string> splitted;
split("HelloFOOworldFOO!", "FOO", back_inserter(splitted));
// or directly to console, for example
split("HelloFOOworldFOO!", "FOO", ostream_iterator<string>(cout, "\n"));
return 0;
}
以下是执行此操作的更好方法。它可以接受任何字符,除非你愿意,否则不会拆分线。不需要特殊的库(好吧,除了,但谁真的认为这是一个额外的库)不需要指针或引用,它是静态的。只是简单的普通 C++。std
#pragma once
#include <vector>
#include <sstream>
using namespace std;
class Helpers
{
public:
static vector<string> split(string s, char delim)
{
stringstream temp (stringstream::in | stringstream::out);
vector<string> elems(0);
if (s.size() == 0 || delim == 0)
return elems;
for(char c : s)
{
if(c == delim)
{
elems.push_back(temp.str());
temp = stringstream(stringstream::in | stringstream::out);
}
else
temp << c;
}
if (temp.str().size() > 0)
elems.push_back(temp.str());
return elems;
}
//Splits string s with a list of delimiters in delims (it's just a list, like if we wanted to
//split at the following letters, a, b, c we would make delims="abc".
static vector<string> split(string s, string delims)
{
stringstream temp (stringstream::in | stringstream::out);
vector<string> elems(0);
bool found;
if(s.size() == 0 || delims.size() == 0)
return elems;
for(char c : s)
{
found = false;
for(char d : delims)
{
if (c == d)
{
elems.push_back(temp.str());
temp = stringstream(stringstream::in | stringstream::out);
found = true;
break;
}
}
if(!found)
temp << c;
}
if(temp.str().size() > 0)
elems.push_back(temp.str());
return elems;
}
};
我喜欢将 boost/regex 方法用于此任务,因为它们为指定拆分条件提供了最大的灵活性。
#include <iostream>
#include <string>
#include <boost/regex.hpp>
int main() {
std::string line("A:::line::to:split");
const boost::regex re(":+"); // one or more colons
// -1 means find inverse matches aka split
boost::sregex_token_iterator tokens(line.begin(),line.end(),re,-1);
boost::sregex_token_iterator end;
for (; tokens != end; ++tokens)
std::cout << *tokens << std::endl;
}
这个呢:
#include <string>
#include <vector>
using namespace std;
vector<string> split(string str, const char delim) {
vector<string> v;
string tmp;
for(string::const_iterator i; i = str.begin(); i <= str.end(); ++i) {
if(*i != delim && i != str.end()) {
tmp += *i;
} else {
v.push_back(tmp);
tmp = "";
}
}
return v;
}
评论
这是我对 Kev 来源的看法:
#include <string>
#include <vector>
void split(vector<string> &result, string str, char delim ) {
string tmp;
string::iterator i;
result.clear();
for(i = str.begin(); i <= str.end(); ++i) {
if((const char)*i != delim && i != str.end()) {
tmp += *i;
} else {
result.push_back(tmp);
tmp = "";
}
}
}
之后,调用该函数并对其执行操作:
vector<string> hosts;
split(hosts, "192.168.1.2,192.168.1.3", ',');
for( size_t i = 0; i < hosts.size(); i++){
cout << "Connecting host : " << hosts.at(i) << "..." << endl;
}
如果需要通过非空格符号解析字符串,则字符串流会很方便:
string s = "Name:JAck; Spouse:Susan; ...";
string dummy, name, spouse;
istringstream iss(s);
getline(iss, dummy, ':');
getline(iss, name, ';');
getline(iss, dummy, ':');
getline(iss, spouse, ';')
这是另一种解决方案。它结构紧凑且效率合理:
std::vector<std::string> split(const std::string &text, char sep) {
std::vector<std::string> tokens;
std::size_t start = 0, end = 0;
while ((end = text.find(sep, start)) != std::string::npos) {
tokens.push_back(text.substr(start, end - start));
start = end + 1;
}
tokens.push_back(text.substr(start));
return tokens;
}
它可以很容易地模板化以处理字符串分隔符、宽字符串等。
请注意,拆分会产生一个空字符串,而拆分(即 sep)会产生两个空字符串。""
","
它还可以轻松扩展以跳过空令牌:
std::vector<std::string> split(const std::string &text, char sep) {
std::vector<std::string> tokens;
std::size_t start = 0, end = 0;
while ((end = text.find(sep, start)) != std::string::npos) {
if (end != start) {
tokens.push_back(text.substr(start, end - start));
}
start = end + 1;
}
if (end != start) {
tokens.push_back(text.substr(start));
}
return tokens;
}
如果需要在跳过空标记的同时在多个分隔符处拆分字符串,则可以使用此版本:
std::vector<std::string> split(const std::string& text, const std::string& delims)
{
std::vector<std::string> tokens;
std::size_t start = text.find_first_not_of(delims), end = 0;
while((end = text.find_first_of(delims, start)) != std::string::npos)
{
tokens.push_back(text.substr(start, end - start));
start = text.find_first_not_of(delims, end);
}
if(start != std::string::npos)
tokens.push_back(text.substr(start));
return tokens;
}
评论
最近,我不得不将一个骆驼大小写的单词拆分为子单词。没有分隔符,只有大写字符。
#include <string>
#include <list>
#include <locale> // std::isupper
template<class String>
const std::list<String> split_camel_case_string(const String &s)
{
std::list<String> R;
String w;
for (String::const_iterator i = s.begin(); i < s.end(); ++i) { {
if (std::isupper(*i)) {
if (w.length()) {
R.push_back(w);
w.clear();
}
}
w += *i;
}
if (w.length())
R.push_back(w);
return R;
}
例如,这会将“AQueryTrades”拆分为“A”、“Query”和“Trades”。该函数适用于窄字符串和宽字符串。因为它尊重当前的语言环境,所以将“RaumfahrtÜberwachungsVerordnung”拆分为“Raumfahrt”、“Überwachungs”和“Verordnung”。
注意应该真正作为函数模板参数传递。然后,这个函数的更广义的可以在分隔符处分裂,例如 或 too。std::upper
","
";"
" "
评论
std::isupper
std::upper
typename
String::const_iterator
下面的代码用于将字符串拆分为标记,并将标记存储在向量中。strtok()
#include <iostream>
#include <algorithm>
#include <vector>
#include <string>
using namespace std;
char one_line_string[] = "hello hi how are you nice weather we are having ok then bye";
char seps[] = " ,\t\n";
char *token;
int main()
{
vector<string> vec_String_Lines;
token = strtok( one_line_string, seps );
cout << "Extracting and storing data in a vector..\n\n\n";
while( token != NULL )
{
vec_String_Lines.push_back(token);
token = strtok( NULL, seps );
}
cout << "Displaying end result in vector line storage..\n\n";
for ( int i = 0; i < vec_String_Lines.size(); ++i)
cout << vec_String_Lines[i] << "\n";
cout << "\n\n\n";
return 0;
}
用作基类的快速版本,提供对其所有运算符的完全访问权限:vector
// Split string into parts.
class Split : public std::vector<std::string>
{
public:
Split(const std::string& str, char* delimList)
{
size_t lastPos = 0;
size_t pos = str.find_first_of(delimList);
while (pos != std::string::npos)
{
if (pos != lastPos)
push_back(str.substr(lastPos, pos-lastPos));
lastPos = pos + 1;
pos = str.find_first_of(delimList, lastPos);
}
if (lastPos < str.length())
push_back(str.substr(lastPos, pos-lastPos));
}
};
用于填充 STL 集的示例:
std::set<std::string> words;
Split split("Hello,World", ",");
words.insert(split.begin(), split.end());
评论
我使用以下内容
void split(string in, vector<string>& parts, char separator) {
string::iterator ts, curr;
ts = curr = in.begin();
for(; curr <= in.end(); curr++ ) {
if( (curr == in.end() || *curr == separator) && curr > ts )
parts.push_back( string( ts, curr ));
if( curr == in.end() )
break;
if( *curr == separator ) ts = curr + 1;
}
}
PlasmaHH,我忘了包含额外的检查( curr > ts) 用于删除带有空格的令牌。
评论
下面是一个拆分函数:
- 是通用的
- 使用标准 C++(无提升)
- 接受多个分隔符
忽略空令牌(可以轻松更改)
template<typename T> vector<T> split(const T & str, const T & delimiters) { vector<T> v; typename T::size_type start = 0; auto pos = str.find_first_of(delimiters, start); while(pos != T::npos) { if(pos != start) // ignore empty tokens v.emplace_back(str, start, pos - start); start = pos + 1; pos = str.find_first_of(delimiters, start); } if(start < str.length()) // ignore trailing delimiter v.emplace_back(str, start, str.length() - start); // add what's left of the string return v; }
用法示例:
vector<string> v = split<string>("Hello, there; World", ";,");
vector<wstring> v = split<wstring>(L"Hello, there; World", L";,");
评论
我的代码是:
#include <list>
#include <string>
template<class StringType = std::string, class ContainerType = std::list<StringType> >
class DSplitString:public ContainerType
{
public:
explicit DSplitString(const StringType& strString, char cChar, bool bSkipEmptyParts = true)
{
size_t iPos = 0;
size_t iPos_char = 0;
while(StringType::npos != (iPos_char = strString.find(cChar, iPos)))
{
StringType strTemp = strString.substr(iPos, iPos_char - iPos);
if((bSkipEmptyParts && !strTemp.empty()) || (!bSkipEmptyParts))
push_back(strTemp);
iPos = iPos_char + 1;
}
}
explicit DSplitString(const StringType& strString, const StringType& strSub, bool bSkipEmptyParts = true)
{
size_t iPos = 0;
size_t iPos_char = 0;
while(StringType::npos != (iPos_char = strString.find(strSub, iPos)))
{
StringType strTemp = strString.substr(iPos, iPos_char - iPos);
if((bSkipEmptyParts && !strTemp.empty()) || (!bSkipEmptyParts))
push_back(strTemp);
iPos = iPos_char + strSub.length();
}
}
};
例:
#include <iostream>
#include <string>
int _tmain(int argc, _TCHAR* argv[])
{
DSplitString<> aa("doicanhden1;doicanhden2;doicanhden3;", ';');
for each (std::string var in aa)
{
std::cout << var << std::endl;
}
std::cin.get();
return 0;
}
评论
我使用以下代码:
namespace Core
{
typedef std::wstring String;
void SplitString(const Core::String& input, const Core::String& splitter, std::list<Core::String>& output)
{
if (splitter.empty())
{
throw std::invalid_argument(); // for example
}
std::list<Core::String> lines;
Core::String::size_type offset = 0;
for (;;)
{
Core::String::size_type splitterPos = input.find(splitter, offset);
if (splitterPos != Core::String::npos)
{
lines.push_back(input.substr(offset, splitterPos - offset));
offset = splitterPos + splitter.size();
}
else
{
lines.push_back(input.substr(offset));
break;
}
}
lines.swap(output);
}
}
// gtest:
class SplitStringTest: public testing::Test
{
};
TEST_F(SplitStringTest, EmptyStringAndSplitter)
{
std::list<Core::String> result;
ASSERT_ANY_THROW(Core::SplitString(Core::String(), Core::String(), result));
}
TEST_F(SplitStringTest, NonEmptyStringAndEmptySplitter)
{
std::list<Core::String> result;
ASSERT_ANY_THROW(Core::SplitString(L"xy", Core::String(), result));
}
TEST_F(SplitStringTest, EmptyStringAndNonEmptySplitter)
{
std::list<Core::String> result;
Core::SplitString(Core::String(), Core::String(L","), result);
ASSERT_EQ(1, result.size());
ASSERT_EQ(Core::String(), *result.begin());
}
TEST_F(SplitStringTest, OneCharSplitter)
{
std::list<Core::String> result;
Core::SplitString(L"x,y", L",", result);
ASSERT_EQ(2, result.size());
ASSERT_EQ(L"x", *result.begin());
ASSERT_EQ(L"y", *result.rbegin());
Core::SplitString(L",xy", L",", result);
ASSERT_EQ(2, result.size());
ASSERT_EQ(Core::String(), *result.begin());
ASSERT_EQ(L"xy", *result.rbegin());
Core::SplitString(L"xy,", L",", result);
ASSERT_EQ(2, result.size());
ASSERT_EQ(L"xy", *result.begin());
ASSERT_EQ(Core::String(), *result.rbegin());
}
TEST_F(SplitStringTest, TwoCharsSplitter)
{
std::list<Core::String> result;
Core::SplitString(L"x,.y,z", L",.", result);
ASSERT_EQ(2, result.size());
ASSERT_EQ(L"x", *result.begin());
ASSERT_EQ(L"y,z", *result.rbegin());
Core::SplitString(L"x,,y,z", L",,", result);
ASSERT_EQ(2, result.size());
ASSERT_EQ(L"x", *result.begin());
ASSERT_EQ(L"y,z", *result.rbegin());
}
TEST_F(SplitStringTest, RecursiveSplitter)
{
std::list<Core::String> result;
Core::SplitString(L",,,", L",,", result);
ASSERT_EQ(2, result.size());
ASSERT_EQ(Core::String(), *result.begin());
ASSERT_EQ(L",", *result.rbegin());
Core::SplitString(L",.,.,", L",.,", result);
ASSERT_EQ(2, result.size());
ASSERT_EQ(Core::String(), *result.begin());
ASSERT_EQ(L".,", *result.rbegin());
Core::SplitString(L"x,.,.,y", L",.,", result);
ASSERT_EQ(2, result.size());
ASSERT_EQ(L"x", *result.begin());
ASSERT_EQ(L".,y", *result.rbegin());
Core::SplitString(L",.,,.,", L",.,", result);
ASSERT_EQ(3, result.size());
ASSERT_EQ(Core::String(), *result.begin());
ASSERT_EQ(Core::String(), *(++result.begin()));
ASSERT_EQ(Core::String(), *result.rbegin());
}
TEST_F(SplitStringTest, NullTerminators)
{
std::list<Core::String> result;
Core::SplitString(L"xy", Core::String(L"\0", 1), result);
ASSERT_EQ(1, result.size());
ASSERT_EQ(L"xy", *result.begin());
Core::SplitString(Core::String(L"x\0y", 3), Core::String(L"\0", 1), result);
ASSERT_EQ(2, result.size());
ASSERT_EQ(L"x", *result.begin());
ASSERT_EQ(L"y", *result.rbegin());
}
我这样做是因为我需要一种简单的方法来拆分字符串和基于 C 的字符串。希望其他人也能发现它有用。此外,它不依赖于令牌,您可以使用字段作为分隔符,这是我需要的另一个键。
我相信可以进行改进以进一步提高其优雅性,请一定这样做。
StringSplitter.hpp:
#include <vector>
#include <iostream>
#include <string.h>
using namespace std;
class StringSplit
{
private:
void copy_fragment(char*, char*, char*);
void copy_fragment(char*, char*, char);
bool match_fragment(char*, char*, int);
int untilnextdelim(char*, char);
int untilnextdelim(char*, char*);
void assimilate(char*, char);
void assimilate(char*, char*);
bool string_contains(char*, char*);
long calc_string_size(char*);
void copy_string(char*, char*);
public:
vector<char*> split_cstr(char);
vector<char*> split_cstr(char*);
vector<string> split_string(char);
vector<string> split_string(char*);
char* String;
bool do_string;
bool keep_empty;
vector<char*> Container;
vector<string> ContainerS;
StringSplit(char * in)
{
String = in;
}
StringSplit(string in)
{
size_t len = calc_string_size((char*)in.c_str());
String = new char[len + 1];
memset(String, 0, len + 1);
copy_string(String, (char*)in.c_str());
do_string = true;
}
~StringSplit()
{
for (int i = 0; i < Container.size(); i++)
{
if (Container[i] != NULL)
{
delete[] Container[i];
}
}
if (do_string)
{
delete[] String;
}
}
};
字符串拆分器.cpp:
#include <string.h>
#include <iostream>
#include <vector>
#include "StringSplit.hpp"
using namespace std;
void StringSplit::assimilate(char*src, char delim)
{
int until = untilnextdelim(src, delim);
if (until > 0)
{
char * temp = new char[until + 1];
memset(temp, 0, until + 1);
copy_fragment(temp, src, delim);
if (keep_empty || *temp != 0)
{
if (!do_string)
{
Container.push_back(temp);
}
else
{
string x = temp;
ContainerS.push_back(x);
}
}
else
{
delete[] temp;
}
}
}
void StringSplit::assimilate(char*src, char* delim)
{
int until = untilnextdelim(src, delim);
if (until > 0)
{
char * temp = new char[until + 1];
memset(temp, 0, until + 1);
copy_fragment(temp, src, delim);
if (keep_empty || *temp != 0)
{
if (!do_string)
{
Container.push_back(temp);
}
else
{
string x = temp;
ContainerS.push_back(x);
}
}
else
{
delete[] temp;
}
}
}
long StringSplit::calc_string_size(char* _in)
{
long i = 0;
while (*_in++)
{
i++;
}
return i;
}
bool StringSplit::string_contains(char* haystack, char* needle)
{
size_t len = calc_string_size(needle);
size_t lenh = calc_string_size(haystack);
while (lenh--)
{
if (match_fragment(haystack + lenh, needle, len))
{
return true;
}
}
return false;
}
bool StringSplit::match_fragment(char* _src, char* cmp, int len)
{
while (len--)
{
if (*(_src + len) != *(cmp + len))
{
return false;
}
}
return true;
}
int StringSplit::untilnextdelim(char* _in, char delim)
{
size_t len = calc_string_size(_in);
if (*_in == delim)
{
_in += 1;
return len - 1;
}
int c = 0;
while (*(_in + c) != delim && c < len)
{
c++;
}
return c;
}
int StringSplit::untilnextdelim(char* _in, char* delim)
{
int s = calc_string_size(delim);
int c = 1 + s;
if (!string_contains(_in, delim))
{
return calc_string_size(_in);
}
else if (match_fragment(_in, delim, s))
{
_in += s;
return calc_string_size(_in);
}
while (!match_fragment(_in + c, delim, s))
{
c++;
}
return c;
}
void StringSplit::copy_fragment(char* dest, char* src, char delim)
{
if (*src == delim)
{
src++;
}
int c = 0;
while (*(src + c) != delim && *(src + c))
{
*(dest + c) = *(src + c);
c++;
}
*(dest + c) = 0;
}
void StringSplit::copy_string(char* dest, char* src)
{
int i = 0;
while (*(src + i))
{
*(dest + i) = *(src + i);
i++;
}
}
void StringSplit::copy_fragment(char* dest, char* src, char* delim)
{
size_t len = calc_string_size(delim);
size_t lens = calc_string_size(src);
if (match_fragment(src, delim, len))
{
src += len;
lens -= len;
}
int c = 0;
while (!match_fragment(src + c, delim, len) && (c < lens))
{
*(dest + c) = *(src + c);
c++;
}
*(dest + c) = 0;
}
vector<char*> StringSplit::split_cstr(char Delimiter)
{
int i = 0;
while (*String)
{
if (*String != Delimiter && i == 0)
{
assimilate(String, Delimiter);
}
if (*String == Delimiter)
{
assimilate(String, Delimiter);
}
i++;
String++;
}
String -= i;
delete[] String;
return Container;
}
vector<string> StringSplit::split_string(char Delimiter)
{
do_string = true;
int i = 0;
while (*String)
{
if (*String != Delimiter && i == 0)
{
assimilate(String, Delimiter);
}
if (*String == Delimiter)
{
assimilate(String, Delimiter);
}
i++;
String++;
}
String -= i;
delete[] String;
return ContainerS;
}
vector<char*> StringSplit::split_cstr(char* Delimiter)
{
int i = 0;
size_t LenDelim = calc_string_size(Delimiter);
while(*String)
{
if (!match_fragment(String, Delimiter, LenDelim) && i == 0)
{
assimilate(String, Delimiter);
}
if (match_fragment(String, Delimiter, LenDelim))
{
assimilate(String,Delimiter);
}
i++;
String++;
}
String -= i;
delete[] String;
return Container;
}
vector<string> StringSplit::split_string(char* Delimiter)
{
do_string = true;
int i = 0;
size_t LenDelim = calc_string_size(Delimiter);
while (*String)
{
if (!match_fragment(String, Delimiter, LenDelim) && i == 0)
{
assimilate(String, Delimiter);
}
if (match_fragment(String, Delimiter, LenDelim))
{
assimilate(String, Delimiter);
}
i++;
String++;
}
String -= i;
delete[] String;
return ContainerS;
}
例子:
int main(int argc, char*argv[])
{
StringSplit ss = "This:CUT:is:CUT:an:CUT:example:CUT:cstring";
vector<char*> Split = ss.split_cstr(":CUT:");
for (int i = 0; i < Split.size(); i++)
{
cout << Split[i] << endl;
}
return 0;
}
将输出:
这是一个
cstring
示例
int main(int argc, char*argv[])
{
StringSplit ss = "This:is:an:example:cstring";
vector<char*> Split = ss.split_cstr(':');
for (int i = 0; i < Split.size(); i++)
{
cout << Split[i] << endl;
}
return 0;
}
int main(int argc, char*argv[])
{
string mystring = "This[SPLIT]is[SPLIT]an[SPLIT]example[SPLIT]string";
StringSplit ss = mystring;
vector<string> Split = ss.split_string("[SPLIT]");
for (int i = 0; i < Split.size(); i++)
{
cout << Split[i] << endl;
}
return 0;
}
int main(int argc, char*argv[])
{
string mystring = "This|is|an|example|string";
StringSplit ss = mystring;
vector<string> Split = ss.split_string('|');
for (int i = 0; i < Split.size(); i++)
{
cout << Split[i] << endl;
}
return 0;
}
要保留空条目(默认情况下,将排除空条目):
StringSplit ss = mystring;
ss.keep_empty = true;
vector<string> Split = ss.split_string(":DELIM:");
目标是使其类似于 C# 的 Split() 方法,其中拆分字符串就像以下方法一样简单:
String[] Split =
"Hey:cut:what's:cut:your:cut:name?".Split(new[]{":cut:"}, StringSplitOptions.None);
foreach(String X in Split)
{
Console.Write(X);
}
我希望其他人能像我一样发现这有用。
我有一个 2 行解决方案来解决这个问题:
char sep = ' ';
std::string s="1 This is an example";
for(size_t p=0, q=0; p!=s.npos; p=q)
std::cout << s.substr(p+(p!=0), (q=s.find(sep, p+1))-p-(p!=0)) << std::endl;
然后,您可以将其放在矢量中,而不是打印。
评论
这是我编写的一个函数,可以帮助我做很多事情。它在为.WebSockets
using namespace std;
#include <iostream>
#include <vector>
#include <sstream>
#include <string>
vector<string> split ( string input , string split_id ) {
vector<string> result;
int i = 0;
bool add;
string temp;
stringstream ss;
size_t found;
string real;
int r = 0;
while ( i != input.length() ) {
add = false;
ss << input.at(i);
temp = ss.str();
found = temp.find(split_id);
if ( found != string::npos ) {
add = true;
real.append ( temp , 0 , found );
} else if ( r > 0 && ( i+1 ) == input.length() ) {
add = true;
real.append ( temp , 0 , found );
}
if ( add ) {
result.push_back(real);
ss.str(string());
ss.clear();
temp.clear();
real.clear();
r = 0;
}
i++;
r++;
}
return result;
}
int main() {
string s = "S,o,m,e,w,h,e,r,e, down the road \n In a really big C++ house. \n Lives a little old lady. \n That no one ever knew. \n She comes outside. \n In the very hot sun. \n\n\n\n\n\n\n\n And throws C++ at us. \n The End. FIN.";
vector < string > Token;
Token = split ( s , "," );
for ( int i = 0 ; i < Token.size(); i++) cout << Token.at(i) << endl;
cout << endl << Token.size();
int a;
cin >> a;
return a;
}
我写了以下一段代码。您可以指定分隔符,该分隔符可以是字符串。 结果类似于 Java 的 String.split,结果中包含空字符串。
例如,如果我们调用 split(“ABCPICKABCANYABCTWO:ABC”, “ABC”),结果如下:
0 <len:0>
1 PICK <len:4>
2 ANY <len:3>
3 TWO: <len:4>
4 <len:0>
法典:
vector <string> split(const string& str, const string& delimiter = " ") {
vector <string> tokens;
string::size_type lastPos = 0;
string::size_type pos = str.find(delimiter, lastPos);
while (string::npos != pos) {
// Found a token, add it to the vector.
cout << str.substr(lastPos, pos - lastPos) << endl;
tokens.push_back(str.substr(lastPos, pos - lastPos));
lastPos = pos + delimiter.size();
pos = str.find(delimiter, lastPos);
}
tokens.push_back(str.substr(lastPos, str.size() - lastPos));
return tokens;
}
下面是一个仅使用标准正则表达式库的正则表达式解决方案。(我有点生疏,所以可能会有一些语法错误,但这至少是一般的想法)
#include <regex.h>
#include <string.h>
#include <vector.h>
using namespace std;
vector<string> split(string s){
regex r ("\\w+"); //regex matches whole words, (greedy, so no fragment words)
regex_iterator<string::iterator> rit ( s.begin(), s.end(), r );
regex_iterator<string::iterator> rend; //iterators to iterate thru words
vector<string> result<regex_iterator>(rit, rend);
return result; //iterates through the matches to fill the vector
}
评论
void splitString(string str, char delim, string array[], const int arraySize)
{
int delimPosition, subStrSize, subStrStart = 0;
for (int index = 0; delimPosition != -1; index++)
{
delimPosition = str.find(delim, subStrStart);
subStrSize = delimPosition - subStrStart;
array[index] = str.substr(subStrStart, subStrSize);
subStrStart =+ (delimPosition + 1);
}
}
评论
获取 Boost !: -)
#include <boost/algorithm/string/split.hpp>
#include <boost/algorithm/string.hpp>
#include <iostream>
#include <vector>
using namespace std;
using namespace boost;
int main(int argc, char**argv) {
typedef vector < string > list_type;
list_type list;
string line;
line = "Somewhere down the road";
split(list, line, is_any_of(" "));
for(int i = 0; i < list.size(); i++)
{
cout << list[i] << endl;
}
return 0;
}
此示例给出的输出 -
Somewhere
down
the
road
我相信还没有人发布这个解决方案。它不是直接使用分隔符,而是基本上与 boost::split() 相同,即它允许您传递一个谓词,如果 char 是分隔符,则返回 true,否则返回 false。我认为这给了程序员更多的控制权,最棒的是你不需要提升。
template <class Container, class String, class Predicate>
void split(Container& output, const String& input,
const Predicate& pred, bool trimEmpty = false) {
auto it = begin(input);
auto itLast = it;
while (it = find_if(it, end(input), pred), it != end(input)) {
if (not (trimEmpty and it == itLast)) {
output.emplace_back(itLast, it);
}
++it;
itLast = it;
}
}
然后你可以像这样使用它:
struct Delim {
bool operator()(char c) {
return not isalpha(c);
}
};
int main() {
string s("#include<iostream>\n"
"int main() { std::cout << \"Hello world!\" << std::endl; }");
vector<string> v;
split(v, s, Delim(), true);
/* Which is also the same as */
split(v, s, [](char c) { return not isalpha(c); }, true);
for (const auto& i : v) {
cout << i << endl;
}
}
我刚刚写了一个很好的例子,说明如何按符号拆分字符,然后将每个字符数组(由符号分隔的单词)放入一个向量中。为简单起见,我制作了 std 字符串的向量类型。
我希望这对您有所帮助并且可读。
#include <vector>
#include <string>
#include <iostream>
void push(std::vector<std::string> &WORDS, std::string &TMP){
WORDS.push_back(TMP);
TMP = "";
}
std::vector<std::string> mySplit(char STRING[]){
std::vector<std::string> words;
std::string s;
for(unsigned short i = 0; i < strlen(STRING); i++){
if(STRING[i] != ' '){
s += STRING[i];
}else{
push(words, s);
}
}
push(words, s);//Used to get last split
return words;
}
int main(){
char string[] = "My awesome string.";
std::cout << mySplit(string)[2];
std::cin.get();
return 0;
}
我的实现可以是另一种解决方案:
std::vector<std::wstring> SplitString(const std::wstring & String, const std::wstring & Seperator)
{
std::vector<std::wstring> Lines;
size_t stSearchPos = 0;
size_t stFoundPos;
while (stSearchPos < String.size() - 1)
{
stFoundPos = String.find(Seperator, stSearchPos);
stFoundPos = (stFoundPos == std::string::npos) ? String.size() : stFoundPos;
Lines.push_back(String.substr(stSearchPos, stFoundPos - stSearchPos));
stSearchPos = stFoundPos + Seperator.size();
}
return Lines;
}
测试代码:
std::wstring MyString(L"Part 1SEPsecond partSEPlast partSEPend");
std::vector<std::wstring> Parts = IniFile::SplitString(MyString, L"SEP");
std::wcout << L"The string: " << MyString << std::endl;
for (std::vector<std::wstring>::const_iterator it=Parts.begin(); it<Parts.end(); ++it)
{
std::wcout << *it << L"<---" << std::endl;
}
std::wcout << std::endl;
MyString = L"this,time,a,comma separated,string";
std::wcout << L"The string: " << MyString << std::endl;
Parts = IniFile::SplitString(MyString, L",");
for (std::vector<std::wstring>::const_iterator it=Parts.begin(); it<Parts.end(); ++it)
{
std::wcout << *it << L"<---" << std::endl;
}
测试代码的输出:
The string: Part 1SEPsecond partSEPlast partSEPend
Part 1<---
second part<---
last part<---
end<---
The string: this,time,a,comma separated,string
this<---
time<---
a<---
comma separated<---
string<---
#include<iostream>
#include<string>
#include<sstream>
#include<vector>
using namespace std;
vector<string> split(const string &s, char delim) {
vector<string> elems;
stringstream ss(s);
string item;
while (getline(ss, item, delim)) {
elems.push_back(item);
}
return elems;
}
int main() {
vector<string> x = split("thi is an sample test",' ');
unsigned int i;
for(i=0;i<x.size();i++)
cout<<i<<":"<<x[i]<<endl;
return 0;
}
LazyStringSplitter:
#include <string>
#include <algorithm>
#include <unordered_set>
using namespace std;
class LazyStringSplitter
{
string::const_iterator start, finish;
unordered_set<char> chop;
public:
// Empty Constructor
explicit LazyStringSplitter()
{}
explicit LazyStringSplitter (const string cstr, const string delims)
: start(cstr.begin())
, finish(cstr.end())
, chop(delims.begin(), delims.end())
{}
void operator () (const string cstr, const string delims)
{
chop.insert(delims.begin(), delims.end());
start = cstr.begin();
finish = cstr.end();
}
bool empty() const { return (start >= finish); }
string next()
{
// return empty string
// if ran out of characters
if (empty())
return string("");
auto runner = find_if(start, finish, [&](char c) {
return chop.count(c) == 1;
});
// construct next string
string ret(start, runner);
start = runner + 1;
// Never return empty string
// + tail recursion makes this method efficient
return !ret.empty() ? ret : next();
}
};
- 我之所以称这种方法为“,是因为一个原因 - 它不会一次性拆分字符串。
LazyStringSplitter
- 从本质上讲,它的行为类似于 python 生成器
- 它公开了一个方法,该方法返回从原始字符串拆分的下一个字符串
next
- 我使用了 c++11 STL 中的unordered_set,因此分隔符的查找速度要快得多
- 这是它的工作原理
测试程序
#include <iostream>
using namespace std;
int main()
{
LazyStringSplitter splitter;
// split at the characters ' ', '!', '.', ','
splitter("This, is a string. And here is another string! Let's test and see how well this does.", " !.,");
while (!splitter.empty())
cout << splitter.next() << endl;
return 0;
}
输出
This
is
a
string
And
here
is
another
string
Let's
test
and
see
how
well
this
does
改善这一点的下一个计划是实现和方法,以便可以做类似的事情:begin
end
vector<string> split_string(splitter.begin(), splitter.end());
评论
我使用 strtok 滚动了自己的字符串,并使用 boost 来拆分字符串。我发现的最好的方法是 C++ 字符串工具包库。它非常灵活和快速。
#include <iostream>
#include <vector>
#include <string>
#include <strtk.hpp>
const char *whitespace = " \t\r\n\f";
const char *whitespace_and_punctuation = " \t\r\n\f;,=";
int main()
{
{ // normal parsing of a string into a vector of strings
std::string s("Somewhere down the road");
std::vector<std::string> result;
if( strtk::parse( s, whitespace, result ) )
{
for(size_t i = 0; i < result.size(); ++i )
std::cout << result[i] << std::endl;
}
}
{ // parsing a string into a vector of floats with other separators
// besides spaces
std::string s("3.0, 3.14; 4.0");
std::vector<float> values;
if( strtk::parse( s, whitespace_and_punctuation, values ) )
{
for(size_t i = 0; i < values.size(); ++i )
std::cout << values[i] << std::endl;
}
}
{ // parsing a string into specific variables
std::string s("angle = 45; radius = 9.9");
std::string w1, w2;
float v1, v2;
if( strtk::parse( s, whitespace_and_punctuation, w1, v1, w2, v2) )
{
std::cout << "word " << w1 << ", value " << v1 << std::endl;
std::cout << "word " << w2 << ", value " << v2 << std::endl;
}
}
return 0;
}
该工具包比这个简单示例具有更大的灵活性,但它在将字符串解析为有用元素方面的实用性令人难以置信。
我一直在寻找一种将字符串按任意长度的分隔符拆分的方法,所以我开始从头开始编写它,因为现有的解决方案不适合我。
这是我的小算法,仅使用 STL:
//use like this
//std::vector<std::wstring> vec = Split<std::wstring> (L"Hello##world##!", L"##");
template <typename valueType>
static std::vector <valueType> Split (valueType text, const valueType& delimiter)
{
std::vector <valueType> tokens;
size_t pos = 0;
valueType token;
while ((pos = text.find(delimiter)) != valueType::npos)
{
token = text.substr(0, pos);
tokens.push_back (token);
text.erase(0, pos + delimiter.length());
}
tokens.push_back (text);
return tokens;
}
据我测试过,它可以与任何长度和形式的分离器一起使用。 使用 string 或 wstring 类型进行实例化。
该算法所做的只是搜索分隔符,获取字符串中与分隔符相关的部分,删除分隔符并再次搜索,直到找不到它。
当然,您可以为分隔符使用任意数量的空格。
我希望它有所帮助。
评论
没有Boost,没有字符串流,只有标准的C库配合使用:C库函数便于分析,C++数据类型便于内存管理。std::string
std::list
空格被认为是换行符、制表符和空格的任意组合。空格字符集由变量建立。wschars
#include <string>
#include <list>
#include <iostream>
#include <cstring>
using namespace std;
const char *wschars = "\t\n ";
list<string> split(const string &str)
{
const char *cstr = str.c_str();
list<string> out;
while (*cstr) { // while remaining string not empty
size_t toklen;
cstr += strspn(cstr, wschars); // skip leading whitespace
toklen = strcspn(cstr, wschars); // figure out token length
if (toklen) // if we have a token, add to list
out.push_back(string(cstr, toklen));
cstr += toklen; // skip over token
}
// ran out of string; return list
return out;
}
int main(int argc, char **argv)
{
list<string> li = split(argv[1]);
for (list<string>::iterator i = li.begin(); i != li.end(); i++)
cout << "{" << *i << "}" << endl;
return 0;
}
跑:
$ ./split ""
$ ./split "a"
{a}
$ ./split " a "
{a}
$ ./split " a b"
{a}
{b}
$ ./split " a b c"
{a}
{b}
{c}
$ ./split " a b c d "
{a}
{b}
{c}
{d}
尾递归版本(本身分为两个函数)。所有对变量的破坏性操作都消失了,除了将字符串推送到列表中!split
void split_rec(const char *cstr, list<string> &li)
{
if (*cstr) {
const size_t leadsp = strspn(cstr, wschars);
const size_t toklen = strcspn(cstr + leadsp, wschars);
if (toklen)
li.push_back(string(cstr + leadsp, toklen));
split_rec(cstr + leadsp + toklen, li);
}
}
list<string> split(const string &str)
{
list<string> out;
split_rec(str.c_str(), out);
return out;
}
评论
下面是一个仅使用标准正则表达式库的简单解决方案
#include <regex>
#include <string>
#include <vector>
std::vector<string> Tokenize( const string str, const std::regex regex )
{
using namespace std;
std::vector<string> result;
sregex_token_iterator it( str.begin(), str.end(), regex, -1 );
sregex_token_iterator reg_end;
for ( ; it != reg_end; ++it ) {
if ( !it->str().empty() ) //token could be empty:check
result.emplace_back( it->str() );
}
return result;
}
正则表达式参数允许检查多个参数(空格、逗号等)
我通常只检查空格和逗号的拆分,所以我也有这个默认函数:
std::vector<string> TokenizeDefault( const string str )
{
using namespace std;
regex re( "[\\s,]+" );
return Tokenize( str, re );
}
检查空格 () 和逗号 ()。"[\\s,]+"
\\s
,
注意,如果要拆分而不是 ,wstring
string
- 全部更改为
std::regex
std::wregex
- 全部更改为
sregex_token_iterator
wsregex_token_iterator
请注意,您可能还希望通过引用来获取字符串参数,具体取决于您的编译器。
评论
R"([\s,]+)"
这是我的方法,剪切和拆分:
string cut (string& str, const string& del)
{
string f = str;
if (in.find_first_of(del) != string::npos)
{
f = str.substr(0,str.find_first_of(del));
str = str.substr(str.find_first_of(del)+del.length());
}
return f;
}
vector<string> split (const string& in, const string& del=" ")
{
vector<string> out();
string t = in;
while (t.length() > del.length())
out.push_back(cut(t,del));
return out;
}
顺便说一句,如果我能做些什么来优化它.
// adapted from a "regular" csv parse
std::string stringIn = "my csv is 10233478 NOTseparated by commas";
std::vector<std::string> commaSeparated(1);
int commaCounter = 0;
for (int i=0; i<stringIn.size(); i++) {
if (stringIn[i] == " ") {
commaSeparated.push_back("");
commaCounter++;
} else {
commaSeparated.at(commaCounter) += stringIn[i];
}
}
最后,您将得到一个字符串向量,句子中的每个元素都用空格分隔。只有非标准资源是 std::vector(但由于涉及 std::string,我认为这是可以接受的)。
空字符串将另存为单独的项目。
这是我的版本
#include <vector>
inline std::vector<std::string> Split(const std::string &str, const std::string &delim = " ")
{
std::vector<std::string> tokens;
if (str.size() > 0)
{
if (delim.size() > 0)
{
std::string::size_type currPos = 0, prevPos = 0;
while ((currPos = str.find(delim, prevPos)) != std::string::npos)
{
std::string item = str.substr(prevPos, currPos - prevPos);
if (item.size() > 0)
{
tokens.push_back(item);
}
prevPos = currPos + 1;
}
tokens.push_back(str.substr(prevPos));
}
else
{
tokens.push_back(str);
}
}
return tokens;
}
它适用于多字符分隔符。它可以防止空令牌进入您的结果。它使用单个标头。当您不提供分隔符时,它会将字符串作为一个标记返回。如果字符串为空,它还返回空结果。不幸的是,由于副本很大,除非您使用 C++11 进行编译,否则效率低下,这应该使用移动原理图。在 C++11 中,此代码应该很快。std::vector
这是我使用 C++11 和 STL 的解决方案。它应该是相当有效的:
#include <vector>
#include <string>
#include <cstring>
#include <iostream>
#include <algorithm>
#include <functional>
std::vector<std::string> split(const std::string& s)
{
std::vector<std::string> v;
const auto end = s.end();
auto to = s.begin();
decltype(to) from;
while((from = std::find_if(to, end,
[](char c){ return !std::isspace(c); })) != end)
{
to = std::find_if(from, end, [](char c){ return std::isspace(c); });
v.emplace_back(from, to);
}
return v;
}
int main()
{
std::string s = "this is the string to split";
auto v = split(s);
for(auto&& s: v)
std::cout << s << '\n';
}
输出:
this
is
the
string
to
split
评论
end
s.end()
pos
end
done
find_first_of
find_first_not_of
ptr_fun
std::isspace
find_first_of
std::search
我知道很晚才参加这里的聚会,但我正在考虑最优雅的方法,如果你得到一系列分隔符而不是空格,并且只使用标准库。
以下是我的想法:
要通过一系列分隔符将单词拆分为字符串向量:
template<class Container>
std::vector<std::string> split_by_delimiters(const std::string& input, const Container& delimiters)
{
std::vector<std::string> result;
for (auto current = begin(input) ; current != end(input) ; )
{
auto first = find_if(current, end(input), not_in(delimiters));
if (first == end(input)) break;
auto last = find_if(first, end(input), is_in(delimiters));
result.emplace_back(first, last);
current = last;
}
return result;
}
要以另一种方式拆分,请通过提供有效字符序列:
template<class Container>
std::vector<std::string> split_by_valid_chars(const std::string& input, const Container& valid_chars)
{
std::vector<std::string> result;
for (auto current = begin(input) ; current != end(input) ; )
{
auto first = find_if(current, end(input), is_in(valid_chars));
if (first == end(input)) break;
auto last = find_if(first, end(input), not_in(valid_chars));
result.emplace_back(first, last);
current = last;
}
return result;
}
is_in和not_in定义如下:
namespace detail {
template<class Container>
struct is_in {
is_in(const Container& charset)
: _charset(charset)
{}
bool operator()(char c) const
{
return find(begin(_charset), end(_charset), c) != end(_charset);
}
const Container& _charset;
};
template<class Container>
struct not_in {
not_in(const Container& charset)
: _charset(charset)
{}
bool operator()(char c) const
{
return find(begin(_charset), end(_charset), c) == end(_charset);
}
const Container& _charset;
};
}
template<class Container>
detail::not_in<Container> not_in(const Container& c)
{
return detail::not_in<Container>(c);
}
template<class Container>
detail::is_in<Container> is_in(const Container& c)
{
return detail::is_in<Container>(c);
}
当处理空格作为分隔符时,使用显而易见的答案已经给出并投票了很多。当然,元素可能不是用空格分隔的,而是用一些分隔符分隔的。我没有发现任何答案,只是重新定义了空格的含义,然后使用传统方法。std::istream_iterator<T>
更改流认为空格的方法,您只需使用一个 facet 来更改流的 using (),该分面具有自己对空格含义的定义(也可以为 ,但它实际上略有不同,因为它是由表驱动的,而是由虚函数驱动的)。std::locale
std::istream::imbue()
std::ctype<char>
std::ctype<wchar_t>
std::ctype<char>
std::ctype<wchar_t>
#include <iostream>
#include <algorithm>
#include <iterator>
#include <sstream>
#include <locale>
struct whitespace_mask {
std::ctype_base::mask mask_table[std::ctype<char>::table_size];
whitespace_mask(std::string const& spaces) {
std::ctype_base::mask* table = this->mask_table;
std::ctype_base::mask const* tab
= std::use_facet<std::ctype<char>>(std::locale()).table();
for (std::size_t i(0); i != std::ctype<char>::table_size; ++i) {
table[i] = tab[i] & ~std::ctype_base::space;
}
std::for_each(spaces.begin(), spaces.end(), [=](unsigned char c) {
table[c] |= std::ctype_base::space;
});
}
};
class whitespace_facet
: private whitespace_mask
, public std::ctype<char> {
public:
whitespace_facet(std::string const& spaces)
: whitespace_mask(spaces)
, std::ctype<char>(this->mask_table) {
}
};
struct whitespace {
std::string spaces;
whitespace(std::string const& spaces): spaces(spaces) {}
};
std::istream& operator>>(std::istream& in, whitespace const& ws) {
std::locale loc(in.getloc(), new whitespace_facet(ws.spaces));
in.imbue(loc);
return in;
}
// everything above would probably go into a utility library...
int main() {
std::istringstream in("a, b, c, d, e");
std::copy(std::istream_iterator<std::string>(in >> whitespace(", ")),
std::istream_iterator<std::string>(),
std::ostream_iterator<std::string>(std::cout, "\n"));
std::istringstream pipes("a b c| d |e e");
std::copy(std::istream_iterator<std::string>(pipes >> whitespace("|")),
std::istream_iterator<std::string>(),
std::ostream_iterator<std::string>(std::cout, "\n"));
}
大部分代码用于打包一个提供软分隔符的通用工具:合并连续的多个分隔符。无法生成空序列。当流中需要不同的分隔符时,可能会使用不同的设置流,使用共享流缓冲区:
void f(std::istream& in) {
std::istream pipes(in.rdbuf());
pipes >> whitespace("|");
std::istream comma(in.rdbuf());
comma >> whitespace(",");
std::string s0, s1;
if (pipes >> s0 >> std::ws // read up to first pipe and ignore sequence of pipes
&& comma >> s1 >> std::ws) { // read up to first comma and ignore commas
// ...
}
}
谢谢@Jairo Abdiel Toribio Cisneros。它对我有用,但是您的函数返回一些空元素。因此,对于不为空的返回,我编辑了以下内容:
std::vector<std::string> split(std::string str, const char* delim) {
std::vector<std::string> v;
std::string tmp;
for(std::string::const_iterator i = str.begin(); i <= str.end(); ++i) {
if(*i != *delim && i != str.end()) {
tmp += *i;
} else {
if (tmp.length() > 0) {
v.push_back(tmp);
}
tmp = "";
}
}
return v;
}
用:
std::string s = "one:two::three";
std::string delim = ":";
std::vector<std::string> vv = split(s, delim.c_str());
我们可以在 c++ 中使用 strtok ,
#include <iostream>
#include <cstring>
using namespace std;
int main()
{
char str[]="Mickey M;12034;911416313;M;01a;9001;NULL;0;13;12;0;CPP,C;MSC,3D;FEND,BEND,SEC;";
char *pch = strtok (str,";,");
while (pch != NULL)
{
cout<<pch<<"\n";
pch = strtok (NULL, ";,");
}
return 0;
}
作为一个业余爱好者,这是我想到的第一个解决方案。我有点好奇为什么我还没有在这里看到类似的解决方案,我的做法有什么根本性的问题吗?
#include <iostream>
#include <string>
#include <vector>
std::vector<std::string> split(const std::string &s, const std::string &delims)
{
std::vector<std::string> result;
std::string::size_type pos = 0;
while (std::string::npos != (pos = s.find_first_not_of(delims, pos))) {
auto pos2 = s.find_first_of(delims, pos);
result.emplace_back(s.substr(pos, std::string::npos == pos2 ? pos2 : pos2 - pos));
pos = pos2;
}
return result;
}
int main()
{
std::string text{"And then I said: \"I don't get it, why would you even do that!?\""};
std::string delims{" :;\".,?!"};
auto words = split(text, delims);
std::cout << "\nSentence:\n " << text << "\n\nWords:";
for (const auto &w : words) {
std::cout << "\n " << w;
}
return 0;
}
这是我的条目:
template <typename Container, typename InputIter, typename ForwardIter>
Container
split(InputIter first, InputIter last,
ForwardIter s_first, ForwardIter s_last)
{
Container output;
while (true) {
auto pos = std::find_first_of(first, last, s_first, s_last);
output.emplace_back(first, pos);
if (pos == last) {
break;
}
first = ++pos;
}
return output;
}
template <typename Output = std::vector<std::string>,
typename Input = std::string,
typename Delims = std::string>
Output
split(const Input& input, const Delims& delims = " ")
{
using std::cbegin;
using std::cend;
return split<Output>(cbegin(input), cend(input),
cbegin(delims), cend(delims));
}
auto vec = split("Mary had a little lamb");
第一个定义是采用两对迭代器的 STL 样式泛型函数。第二个是便利功能,让您不必自己做所有的事情。例如,如果要使用 ,还可以将输出容器类型指定为模板参数。begin()
end()
list
它之所以优雅 (IMO) 是因为与大多数其他答案不同,它不限于字符串,而是适用于任何与 STL 兼容的容器。在不对上面的代码进行任何更改的情况下,您可以说:
using vec_of_vecs_t = std::vector<std::vector<int>>;
std::vector<int> v{1, 2, 0, 3, 4, 5, 0, 7, 8, 0, 9};
auto r = split<vec_of_vecs_t>(v, std::initializer_list<int>{0, 2});
每次遇到 A 或 A 时,都会将向量拆分为单独的向量。v
0
2
(还有一个额外的好处是,对于字符串,这种实现比基于和基于的版本都快,至少在我的系统上是这样。strtok()
getline()
如果你想通过一些字符拆分字符串,你可以使用
#include<iostream>
#include<string>
#include<vector>
#include<iterator>
#include<sstream>
#include<string>
using namespace std;
void replaceOtherChars(string &input, vector<char> ÷rs)
{
const char divider = dividers.at(0);
int replaceIndex = 0;
vector<char>::iterator it_begin = dividers.begin()+1,
it_end= dividers.end();
for(;it_begin!=it_end;++it_begin)
{
replaceIndex = 0;
while(true)
{
replaceIndex=input.find_first_of(*it_begin,replaceIndex);
if(replaceIndex==-1)
break;
input.at(replaceIndex)=divider;
}
}
}
vector<string> split(string str, vector<char> chars, bool missEmptySpace =true )
{
vector<string> result;
const char divider = chars.at(0);
replaceOtherChars(str,chars);
stringstream stream;
stream<<str;
string temp;
while(getline(stream,temp,divider))
{
if(missEmptySpace && temp.empty())
continue;
result.push_back(temp);
}
return result;
}
int main()
{
string str ="milk, pigs.... hot-dogs ";
vector<char> arr;
arr.push_back(' '); arr.push_back(','); arr.push_back('.');
vector<string> result = split(str,arr);
vector<string>::iterator it_begin= result.begin(),
it_end= result.end();
for(;it_begin!=it_end;++it_begin)
{
cout<<*it_begin<<endl;
}
return 0;
}
这是热门答案之一的延伸。它现在支持设置返回元素的最大数量 N。字符串的最后一位将出现在第 N 个元素中。MAXELEMENTS 参数是可选的,如果设置为默认值 0,它将返回无限数量的元素。:-)
.h:
class Myneatclass {
public:
static std::vector<std::string>& split(const std::string &s, char delim, std::vector<std::string> &elems, const size_t MAXELEMENTS = 0);
static std::vector<std::string> split(const std::string &s, char delim, const size_t MAXELEMENTS = 0);
};
.cpp:
std::vector<std::string>& Myneatclass::split(const std::string &s, char delim, std::vector<std::string> &elems, const size_t MAXELEMENTS) {
std::stringstream ss(s);
std::string item;
while (std::getline(ss, item, delim)) {
elems.push_back(item);
if (MAXELEMENTS > 0 && !ss.eof() && elems.size() + 1 >= MAXELEMENTS) {
std::getline(ss, item);
elems.push_back(item);
break;
}
}
return elems;
}
std::vector<std::string> Myneatclass::split(const std::string &s, char delim, const size_t MAXELEMENTS) {
std::vector<std::string> elems;
split(s, delim, elems, MAXELEMENTS);
return elems;
}
短小精悍
#include <vector>
#include <string>
using namespace std;
vector<string> split(string data, string token)
{
vector<string> output;
size_t pos = string::npos; // size_t to avoid improbable overflow
do
{
pos = data.find(token);
output.push_back(data.substr(0, pos));
if (string::npos != pos)
data = data.substr(pos + token.size());
} while (string::npos != pos);
return output;
}
可以使用任何字符串作为分隔符,也可以与二进制数据一起使用(std::string 支持二进制数据,包括 null)
用:
auto a = split("this!!is!!!example!string", "!!");
输出:
this
is
!example!string
评论
#include <iostream>
#include <vector>
using namespace std;
int main() {
string str = "ABC AABCD CDDD RABC GHTTYU FR";
str += " "; //dirty hack: adding extra space to the end
vector<string> v;
for (int i=0; i<(int)str.size(); i++) {
int a, b;
a = i;
for (int j=i; j<(int)str.size(); j++) {
if (str[j] == ' ') {
b = j;
i = j;
break;
}
}
v.push_back(str.substr(a, b-a));
}
for (int i=0; i<v.size(); i++) {
cout<<v[i].size()<<" "<<v[i]<<endl;
}
return 0;
}
为方便起见:
template<class V, typename T>
bool in(const V &v, const T &el) {
return std::find(v.begin(), v.end(), el) != v.end();
}
基于多个分隔符的实际拆分:
std::vector<std::string> split(const std::string &s,
const std::vector<char> &delims) {
std::vector<std::string> res;
auto stuff = [&delims](char c) { return !in(delims, c); };
auto space = [&delims](char c) { return in(delims, c); };
auto first = std::find_if(s.begin(), s.end(), stuff);
while (first != s.end()) {
auto last = std::find_if(first, s.end(), space);
res.push_back(std::string(first, last));
first = std::find_if(last + 1, s.end(), stuff);
}
return res;
}
用法:
int main() {
std::string s = " aaa, bb cc ";
for (auto el: split(s, {' ', ','}))
std::cout << el << std::endl;
return 0;
}
对于那些需要使用字符串分隔符拆分字符串的人,也许您可以尝试我的以下解决方案。
std::vector<size_t> str_pos(const std::string &search, const std::string &target)
{
std::vector<size_t> founds;
if(!search.empty())
{
size_t start_pos = 0;
while (true)
{
size_t found_pos = target.find(search, start_pos);
if(found_pos != std::string::npos)
{
size_t found = found_pos;
founds.push_back(found);
start_pos = (found_pos + 1);
}
else
{
break;
}
}
}
return founds;
}
std::string str_sub_index(size_t begin_index, size_t end_index, const std::string &target)
{
std::string sub;
size_t size = target.length();
const char* copy = target.c_str();
for(size_t i = begin_index; i <= end_index; i++)
{
if(i >= size)
{
break;
}
else
{
char c = copy[i];
sub += c;
}
}
return sub;
}
std::vector<std::string> str_split(const std::string &delimiter, const std::string &target)
{
std::vector<std::string> splits;
if(!delimiter.empty())
{
std::vector<size_t> founds = str_pos(delimiter, target);
size_t founds_size = founds.size();
if(founds_size > 0)
{
size_t search_len = delimiter.length();
size_t begin_index = 0;
for(int i = 0; i <= founds_size; i++)
{
std::string sub;
if(i != founds_size)
{
size_t pos = founds.at(i);
sub = str_sub_index(begin_index, pos - 1, target);
begin_index = (pos + search_len);
}
else
{
sub = str_sub_index(begin_index, (target.length() - 1), target);
}
splits.push_back(sub);
}
}
}
return splits;
}
这些代码段由 3 个函数组成。坏消息是使用您将需要其他两个功能的功能。是的,这是一大块代码。但好消息是,这两个额外的功能能够独立工作,有时也很有用。:)str_split
在块中测试了函数,如下所示:main()
int main()
{
std::string s = "Hello, world! We need to make the world a better place. Because your world is also my world, and our children's world.";
std::vector<std::string> split = str_split("world", s);
for(int i = 0; i < split.size(); i++)
{
std::cout << split[i] << std::endl;
}
}
它将产生:
Hello,
! We need to make the
a better place. Because your
is also my
, and our children's
.
我相信这不是最有效的代码,但至少它有效。希望它有所帮助。
这是我对这个问题的解决方案:
vector<string> get_tokens(string str) {
vector<string> dt;
stringstream ss;
string tmp;
ss << str;
for (size_t i; !ss.eof(); ++i) {
ss >> tmp;
dt.push_back(tmp);
}
return dt;
}
此函数返回字符串向量。
#include <iostream>
#include <regex>
using namespace std;
int main() {
string s = "foo bar baz";
regex e("\\s+");
regex_token_iterator<string::iterator> i(s.begin(), s.end(), e, -1);
regex_token_iterator<string::iterator> end;
while (i != end)
cout << " [" << *i++ << "]";
}
IMO,这是最接近 python 的 re.split() 的东西。有关regex_token_iterator的详细信息,请参阅 cplusplus.com。-1(regex_token_iterator ctor 中的第 4 个参数)是序列中不匹配的部分,使用匹配作为分隔符。
这是我对此的看法。我必须逐字处理输入字符串,这可以通过使用空格来计算单词来完成,但我觉得这会很乏味,我应该将单词拆分为向量。
#include<iostream>
#include<vector>
#include<string>
#include<stdio.h>
using namespace std;
int main()
{
char x = '\0';
string s = "";
vector<string> q;
x = getchar();
while(x != '\n')
{
if(x == ' ')
{
q.push_back(s);
s = "";
x = getchar();
continue;
}
s = s + x;
x = getchar();
}
q.push_back(s);
for(int i = 0; i<q.size(); i++)
cout<<q[i]<<" ";
return 0;
}
- 不处理多个空间。
- 如果最后一个单词后面没有紧跟换行符,则它包括最后一个单词的最后一个字符和换行符之间的空格。
根据加利克的回答,我做了这个。这主要是在这里,所以我不必一遍又一遍地写它。C++仍然没有原生的拆分函数,这太疯狂了。特征:
- 应该非常快。
- 易于理解(我认为)。
- 合并空白部分。
- 使用多个分隔符(例如
"\r\n"
)
#include <string>
#include <vector>
#include <algorithm>
std::vector<std::string> split(const std::string& s, const std::string& delims)
{
using namespace std;
vector<string> v;
// Start of an element.
size_t elemStart = 0;
// We start searching from the end of the previous element, which
// initially is the start of the string.
size_t elemEnd = 0;
// Find the first non-delim, i.e. the start of an element, after the end of the previous element.
while((elemStart = s.find_first_not_of(delims, elemEnd)) != string::npos)
{
// Find the first delem, i.e. the end of the element (or if this fails it is the end of the string).
elemEnd = s.find_first_of(delims, elemStart);
// Add it.
v.emplace_back(s, elemStart, elemEnd == string::npos ? string::npos : elemEnd - elemStart);
}
// When there are no more non-spaces, we are done.
return v;
}
我的一般实现 for 和 ~,使用签名。string
u32string
boost::algorithm::split
template<typename CharT, typename UnaryPredicate>
void split(std::vector<std::basic_string<CharT>>& split_result,
const std::basic_string<CharT>& s,
UnaryPredicate predicate)
{
using ST = std::basic_string<CharT>;
using std::swap;
std::vector<ST> tmp_result;
auto iter = s.cbegin(),
end_iter = s.cend();
while (true)
{
/**
* edge case: empty str -> push an empty str and exit.
*/
auto find_iter = find_if(iter, end_iter, predicate);
tmp_result.emplace_back(iter, find_iter);
if (find_iter == end_iter) { break; }
iter = ++find_iter;
}
swap(tmp_result, split_result);
}
template<typename CharT>
void split(std::vector<std::basic_string<CharT>>& split_result,
const std::basic_string<CharT>& s,
const std::basic_string<CharT>& char_candidate)
{
std::unordered_set<CharT> candidate_set(char_candidate.cbegin(),
char_candidate.cend());
auto predicate = [&candidate_set](const CharT& c) {
return candidate_set.count(c) > 0U;
};
return split(split_result, s, predicate);
}
template<typename CharT>
void split(std::vector<std::basic_string<CharT>>& split_result,
const std::basic_string<CharT>& s,
const CharT* literals)
{
return split(split_result, s, std::basic_string<CharT>(literals));
}
是的,我浏览了所有 30 个示例。
我找不到适用于多字符分隔符的版本,所以这是我的:split
#include <string>
#include <vector>
using namespace std;
vector<string> split(const string &str, const string &delim)
{
const auto delim_pos = str.find(delim);
if (delim_pos == string::npos)
return {str};
vector<string> ret{str.substr(0, delim_pos)};
auto tail = split(str.substr(delim_pos + delim.size(), string::npos), delim);
ret.insert(ret.end(), tail.begin(), tail.end());
return ret;
}
可能不是最有效的实现,但它是一个非常直接的递归解决方案,仅使用 和 .<string>
<vector>
啊,它是用 C++11 编写的,但这段代码没有什么特别之处,所以你可以很容易地将其改编为 C++98。
并不是说我们需要更多的答案,但这是我在受到埃文·特兰(Evan Teran)的启发后想出的。
std::vector <std::string> split(const string &input, auto delimiter, bool skipEmpty=true) {
/*
Splits a string at each delimiter and returns these strings as a string vector.
If the delimiter is not found then nothing is returned.
If skipEmpty is true then strings between delimiters that are 0 in length will be skipped.
*/
bool delimiterFound = false;
int pos=0, pPos=0;
std::vector <std::string> result;
while (true) {
pos = input.find(delimiter,pPos);
if (pos != std::string::npos) {
if (skipEmpty==false or pos-pPos > 0) // if empty values are to be kept or not
result.push_back(input.substr(pPos,pos-pPos));
delimiterFound = true;
} else {
if (pPos < input.length() and delimiterFound) {
if (skipEmpty==false or input.length()-pPos > 0) // if empty values are to be kept or not
result.push_back(input.substr(pPos,input.length()-pPos));
}
break;
}
pPos = pos+1;
}
return result;
}
#include <iostream>
#include <string>
#include <deque>
std::deque<std::string> split(
const std::string& line,
std::string::value_type delimiter,
bool skipEmpty = false
) {
std::deque<std::string> parts{};
if (!skipEmpty && !line.empty() && delimiter == line.at(0)) {
parts.push_back({});
}
for (const std::string::value_type& c : line) {
if (
(
c == delimiter
&&
(skipEmpty ? (!parts.empty() && !parts.back().empty()) : true)
)
||
(c != delimiter && parts.empty())
) {
parts.push_back({});
}
if (c != delimiter) {
parts.back().push_back(c);
}
}
if (skipEmpty && !parts.empty() && parts.back().empty()) {
parts.pop_back();
}
return parts;
}
void test(const std::string& line) {
std::cout << line << std::endl;
std::cout << "skipEmpty=0 |";
for (const std::string& part : split(line, ':')) {
std::cout << part << '|';
}
std::cout << std::endl;
std::cout << "skipEmpty=1 |";
for (const std::string& part : split(line, ':', true)) {
std::cout << part << '|';
}
std::cout << std::endl;
std::cout << std::endl;
}
int main() {
test("foo:bar:::baz");
test("");
test("foo");
test(":");
test("::");
test(":foo");
test("::foo");
test(":foo:");
test(":foo::");
return 0;
}
输出:
foo:bar:::baz
skipEmpty=0 |foo|bar|||baz|
skipEmpty=1 |foo|bar|baz|
skipEmpty=0 |
skipEmpty=1 |
foo
skipEmpty=0 |foo|
skipEmpty=1 |foo|
:
skipEmpty=0 |||
skipEmpty=1 |
::
skipEmpty=0 ||||
skipEmpty=1 |
:foo
skipEmpty=0 ||foo|
skipEmpty=1 |foo|
::foo
skipEmpty=0 |||foo|
skipEmpty=1 |foo|
:foo:
skipEmpty=0 ||foo||
skipEmpty=1 |foo|
:foo::
skipEmpty=0 ||foo|||
skipEmpty=1 |foo|
评论
这个答案接受字符串并将其放入字符串向量中。它使用 boost 库。
#include <boost/algorithm/string.hpp>
std::vector<std::string> strs;
boost::split(strs, "string to split", boost::is_any_of("\t "));
使用和 Eric Niebler 的库:std::string_view
range-v3
https://wandbox.org/permlink/kW5lwRCL1pxjp2pW
#include <iostream>
#include <string>
#include <string_view>
#include "range/v3/view.hpp"
#include "range/v3/algorithm.hpp"
int main() {
std::string s = "Somewhere down the range v3 library";
ranges::for_each(s
| ranges::view::split(' ')
| ranges::view::transform([](auto &&sub) {
return std::string_view(&*sub.begin(), ranges::distance(sub));
}),
[](auto s) {std::cout << "Substring: " << s << "\n";}
);
}
通过使用范围循环而不是算法:for
ranges::for_each
#include <iostream>
#include <string>
#include <string_view>
#include "range/v3/view.hpp"
int main()
{
std::string str = "Somewhere down the range v3 library";
for (auto s : str | ranges::view::split(' ')
| ranges::view::transform([](auto&& sub) { return std::string_view(&*sub.begin(), ranges::distance(sub)); }
))
{
std::cout << "Substring: " << s << "\n";
}
}
评论
我的方法与其他解决方案截然不同,它以其他解决方案所缺乏的方式提供了很多价值,但当然也有其自身的缺点。这是工作实现,以单词为例。<tag></tag>
首先,这个问题可以通过一个循环来解决,无需额外的内存,并且只考虑四种逻辑情况。从概念上讲,我们对边界感兴趣。我们的代码应该反映这一点:让我们遍历字符串并一次查看两个字符,请记住,字符串的开头和结尾都有特殊情况。
缺点是我们必须编写实现,这有点冗长,但主要是方便的样板。
好处是我们编写了实现,因此很容易根据特定需求对其进行自定义,例如区分左字和书写字边界,使用任何一组分隔符,或处理其他情况,例如非边界或错误位置。
using namespace std;
#include <iostream>
#include <string>
#include <cctype>
typedef enum boundary_type_e {
E_BOUNDARY_TYPE_ERROR = -1,
E_BOUNDARY_TYPE_NONE,
E_BOUNDARY_TYPE_LEFT,
E_BOUNDARY_TYPE_RIGHT,
} boundary_type_t;
typedef struct boundary_s {
boundary_type_t type;
int pos;
} boundary_t;
bool is_delim_char(int c) {
return isspace(c); // also compare against any other chars you want to use as delimiters
}
bool is_word_char(int c) {
return ' ' <= c && c <= '~' && !is_delim_char(c);
}
boundary_t maybe_word_boundary(string str, int pos) {
int len = str.length();
if (pos < 0 || pos >= len) {
return (boundary_t){.type = E_BOUNDARY_TYPE_ERROR};
} else {
if (pos == 0 && is_word_char(str[pos])) {
// if the first character is word-y, we have a left boundary at the beginning
return (boundary_t){.type = E_BOUNDARY_TYPE_LEFT, .pos = pos};
} else if (pos == len - 1 && is_word_char(str[pos])) {
// if the last character is word-y, we have a right boundary left of the null terminator
return (boundary_t){.type = E_BOUNDARY_TYPE_RIGHT, .pos = pos + 1};
} else if (!is_word_char(str[pos]) && is_word_char(str[pos + 1])) {
// if we have a delimiter followed by a word char, we have a left boundary left of the word char
return (boundary_t){.type = E_BOUNDARY_TYPE_LEFT, .pos = pos + 1};
} else if (is_word_char(str[pos]) && !is_word_char(str[pos + 1])) {
// if we have a word char followed by a delimiter, we have a right boundary right of the word char
return (boundary_t){.type = E_BOUNDARY_TYPE_RIGHT, .pos = pos + 1};
}
return (boundary_t){.type = E_BOUNDARY_TYPE_NONE};
}
}
int main() {
string str;
getline(cin, str);
int len = str.length();
for (int i = 0; i < len; i++) {
boundary_t boundary = maybe_word_boundary(str, i);
if (boundary.type == E_BOUNDARY_TYPE_LEFT) {
// whatever
} else if (boundary.type == E_BOUNDARY_TYPE_RIGHT) {
// whatever
}
}
}
正如你所看到的,代码非常容易理解和微调,代码的实际使用非常简短和简单。使用 C++ 不应该阻止我们编写最简单、最容易定制的代码,即使这意味着不使用 STL。我认为这是 Linus Torvalds 可能称之为“品味”的一个例子,因为我们已经消除了所有我们不需要的逻辑,同时以一种自然而然的风格写作,当需要处理它们时,可以处理更多案例。
可以改进此代码的可能是使用 ,接受指向 in 的函数指针而不是直接调用,并传递 lambda。enum class
is_word_char
maybe_word_boundary
is_word_char
没有任何内存分配的 C++17 版本(除了std::function
)
void iter_words(const std::string_view& input, const std::function<void(std::string_view)>& process_word) {
auto itr = input.begin();
auto consume_whitespace = [&]() {
for(; itr != input.end(); ++itr) {
if(!isspace(*itr))
return;
}
};
auto consume_letters = [&]() {
for(; itr != input.end(); ++itr) {
if(isspace(*itr))
return;
}
};
while(true) {
consume_whitespace();
if(itr == input.end())
return;
auto word_start = itr - input.begin();
consume_letters();
auto word_end = itr - input.begin();
process_word(input.substr(word_start, word_end - word_start));
}
}
int main() {
iter_words("foo bar", [](std::string_view sv) {
std::cout << "Got word: " << sv << '\n';
});
return 0;
}
评论
C++20 终于为我们带来了一个函数。或者更确切地说,是一个范围适配器。Godbolt 链接。split
#include <iostream>
#include <ranges>
#include <string_view>
namespace ranges = std::ranges;
namespace views = std::views;
using str = std::string_view;
auto view =
"Multiple words"
| views::split(' ')
| views::transform([](auto &&r) -> str {
return str(r.begin(), r.end());
});
auto main() -> int {
for (str &&sv : view) {
std::cout << sv << '\n';
}
}
评论
每个人都回答了预定义的字符串输入。 我认为这个答案将帮助某人进行扫描输入。
我使用令牌向量来保存字符串令牌。这是可选的。
#include <bits/stdc++.h>
using namespace std ;
int main()
{
string str, token ;
getline(cin, str) ; // get the string as input
istringstream ss(str); // insert the string into tokenizer
vector<string> tokens; // vector tokens holds the tokens
while (ss >> token) tokens.push_back(token); // splits the tokens
for(auto x : tokens) cout << x << endl ; // prints the tokens
return 0;
}
示例输入:
port city international university
示例输出:
port
city
international
university
请注意,默认情况下,这仅适用于空格作为分隔符。 您可以使用自定义分隔符。为此,您已经自定义了代码。设分隔符为“,”。所以使用
char delimiter = ',' ;
while(getline(ss, token, delimiter)) tokens.push_back(token) ;
而不是
while (ss >> token) tokens.push_back(token);
尽管有一些答案提供了 C++20 解决方案,但自从发布以来,进行了一些更改并将其作为缺陷报告应用于 C++20。正因为如此,解决方案更短、更好:
#include <iostream>
#include <ranges>
#include <string_view>
namespace views = std::views;
using str = std::string_view;
constexpr str text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit.";
auto splitByWords(str input) {
return input
| views::split(' ')
| views::transform([](auto &&r) -> str {
return {r.begin(), r.end()};
});
}
auto main() -> int {
for (str &&word : splitByWords(text)) {
std::cout << word << '\n';
}
}
截至今天,它仍然仅在 GCC 的主干分支(Godbolt 链接)上可用。它基于两个更改:P1391 迭代器构造函数和 P2210 DR 修复以保留范围类型。std::string_view
std::views::split
在 C++23 中,不需要任何样板,因为 P1989 向 std::string_view 添加了一个范围构造函数:transform
#include <iostream>
#include <ranges>
#include <string_view>
namespace views = std::views;
constexpr std::string_view text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit.";
auto main() -> int {
for (std::string_view&& word : text | views::split(' ')) {
std::cout << word << '\n';
}
}
有一种更简单的方法可以做到这一点!
#include <vector>
#include <string>
std::vector<std::string> splitby(std::string string, char splitter) {
int splits = 0;
std::vector<std::string> result = {};
std::string locresult = "";
for (unsigned int i = 0; i < string.size(); i++) {
if ((char)string.at(i) != splitter) {
locresult += string.at(i);
}
else {
result.push_back(locresult);
locresult = "";
}
}
if (splits == 0) {
result.push_back(locresult);
}
return result;
}
void printvector(std::vector<std::string> v) {
std::cout << '{';
for (unsigned int i = 0; i < v.size(); i++) {
if (i < v.size() - 1) {
std::cout << '"' << v.at(i) << "\",";
}
else {
std::cout << '"' << v.at(i) << "\"";
}
}
std::cout << "}\n";
}
最小解决方案是一个函数,该函数将 a 和一组分隔符字符(作为 a)作为输入,并返回 的 。std::string
std::string
std::vector
std::strings
#include <string>
#include <vector>
std::vector<std::string>
tokenize(const std::string& str, const std::string& delimiters)
{
using ssize_t = std::string::size_type;
const ssize_t str_ln = str.length();
ssize_t last_pos = 0;
// container for the extracted tokens
std::vector<std::string> tokens;
while (last_pos < str_ln) {
// find the position of the next delimiter
ssize_t pos = str.find_first_of(delimiters, last_pos);
// if no delimiters found, set the position to the length of string
if (pos == std::string::npos)
pos = str_ln;
// if the substring is nonempty, store it in the container
if (pos != last_pos)
tokens.emplace_back(str.substr(last_pos, pos - last_pos));
// scan past the previous substring
last_pos = pos + 1;
}
return tokens;
}
使用示例:
#include <iostream>
int main()
{
std::string input_str = "one + two * (three - four)!!---! ";
const char* delimiters = "! +- (*)";
std::vector<std::string> tokens = tokenize(input_str, delimiters);
std::cout << "input = '" << input_str << "'\n"
<< "delimiters = '" << delimiters << "'\n"
<< "nr of tokens found = " << tokens.size() << std::endl;
for (const std::string& tk : tokens) {
std::cout << "token = '" << tk << "'\n";
}
return 0;
}
我不敢相信这些答案中的大多数都过于复杂。为什么没有人提出这么简单的东西?
#include <iostream>
#include <sstream>
std::string input = "This is a sentence to read";
std::istringstream ss(input);
std::string token;
while(std::getline(ss, token, ' ')) {
std::cout << token << endl;
}
还有一种方式——延续传递样式、零分配、基于函数的定界。
void split( auto&& data, auto&& splitter, auto&& operation ) {
using std::begin; using std::end;
auto prev = begin(data);
while (prev != end(data) ) {
auto&&[prev,next] = splitter( prev, end(data) );
operation(prev,next);
prev = next;
}
}
现在我们可以基于此编写特定的拆分函数。
auto anyOfSplitter(auto delimiters) {
return [delimiters](auto begin, auto end) {
while( begin != end && 0 == std::string_view(begin, end).find_first_of(delimiters) ) {
++begin;
}
auto view = std::string_view(begin, end);
auto next = view.find_first_of(delimiters);
if (next != view.npos)
return std::make_pair( begin, begin + next );
else
return std::make_pair( begin, end );
};
}
我们现在可以生成一个传统的 std 字符串拆分,如下所示:
template<class C>
auto traditional_any_of_split( std::string_view<C> str, std::string_view<C> delim ) {
std::vector<std::basic_string<C>> retval;
split( str, anyOfSplitter(delim), [&](auto s, auto f) {
retval.emplace_back(s,f);
});
return retval;
}
或者我们可以用 find 代替
auto findSplitter(auto delimiter) {
return [delimiter](auto begin, auto end) {
while( begin != end && 0 == std::string_view(begin, end).find(delimiter) ) {
begin += delimiter.size();
}
auto view = std::string_view(begin, end);
auto next = view.find(delimiter);
if (next != view.npos)
return std::make_pair( begin, begin + next );
else
return std::make_pair( begin, end );
};
}
template<class C>
auto traditional_find_split( std::string_view<C> str, std::string_view<C> delim ) {
std::vector<std::basic_string<C>> retval;
split( str, findSplitter(delim), [&](auto s, auto f) {
retval.emplace_back(s,f);
});
return retval;
}
通过更换分流器部分。
这两者都分配了返回值的缓冲区。我们可以将返回值交换到字符串视图,但代价是手动管理生存期。
我们还可以采用一个延续,一次传递一个字符串视图,避免甚至分配视图向量。
这可以通过 abort 选项进行扩展,这样我们就可以在读取一些前缀字符串后中止。
一些 C++20 编译器和大多数 C++23 编译器(和ranges
string_view
)
for (auto word : std::views::split("Somewhere down the road", ' '))
std::cout << std::string_view{ word.begin(), word.end() } << std::endl;
上一个:显式关键字是什么意思?
下一个:如何设置、清除和切换单个位
评论
while (iss) { string subs; iss >> subs; cout << "Substring: " << sub << endl; }
string sub; while (iss >> sub) cout << "Substring: " << sub << '\n';