Asked by: Szabolcs  Asked: 11/5/2023  Last edited by: John Gordon, Szabolcs  Updated: 11/7/2023  Views: 75
Reading lines with different data types from txt file in c
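Q:
How can I read multiple lines from a txt file if I want to store the data in different variables? Each line contains the same sequence of data types: int string string char string, separated by tabs.
For example, one line in the txt file looks like this:
11 \t I would like an apple \t What is your favourite car brand? \t b \t elephant
Thank you in advance for your help.
I tried fscanf("%d\t%s\t%s\t%c\t%s\n", ..); but I could not read the strings, because %s cut my sentence at the first space. It also read only the first line, and I could not move on to the next one.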
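A:
This is a csv file, and scanf was written to consume this type of data: large files of tabular data. And it is really good at it. But scanf (and also sscanf and fscanf) skips white space: before every conversion except %c, %[ and %n, and wherever the format string itself contains a white space character. White space includes tabs, spaces and newlines.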
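The difference between %s and a scanset can be seen on a single line. This is a minimal sketch, not the code from this answer: read_with_scanset and read_with_s are hypothetical helper names, and the 80-byte buffer is an assumption that matches the record layout used further down.

```c
#include <stdio.h>

// Reads the leading int and the first text field of a tab-delimited
// line. "%79[^\t]" accepts up to 79 characters that are not a tab,
// so spaces inside the field are kept.
// Returns the number of items converted (2 on success).
int read_with_scanset(const char* line, int* number, char text[80])
{
    return sscanf(line, "%d\t%79[^\t]", number, text);
}

// Same call using %s: the conversion stops at the first white space,
// which is what cut the sentence in the question.
int read_with_s(const char* line, int* number, char text[80])
{
    return sscanf(line, "%d\t%79s", number, text);
}
```

For the line "11\tI would like an apple\tb" the scanset version stores the whole sentence, while the %s version stores only "I".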
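So with a tab as the delimiter your file is problematic. Another problem is that many editors convert tabs to spaces up to the next tab-stop column, so the tabs may not be recorded at all. Tabs shown as symbols are more common in document editors, like Microsoft's Word, which can print a symbol for each tab, newline, paragraph mark and so on. On Unix/Linux/Mac you can use set list in vi to show each TAB as ^I.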
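If no such editor is at hand, the same check can be done in a few lines of C. This is only a sketch: show_tabs is a hypothetical helper that copies a line and writes every tab as the two characters ^I, the way vi displays it.

```c
#include <stddef.h>

// Copies src to dst, writing each tab as the two characters "^I".
// Returns the number of tabs found, or -1 if dst is too small.
int show_tabs(const char* src, char* dst, size_t dst_size)
{
    size_t used = 0;
    int    tabs = 0;
    if (dst_size == 0) return -1;
    for (; *src != '\0'; src += 1)
    {
        size_t need = (*src == '\t') ? 2 : 1;
        // always keep one byte free for the terminator
        if (used + need + 1 > dst_size) return -1;
        if (*src == '\t')
        {
            dst[used++] = '^';
            dst[used++] = 'I';
            tabs += 1;
        }
        else
            dst[used++] = *src;
    }
    dst[used] = '\0';
    return tabs;
}
```

A line that was saved with spaces instead of tabs gives a count of 0, which is a quick way to find out what the editor really wrote.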
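Example
I will show an example using 2 alternatives: [1] use sscanf and [2] parse the line in code.
As usual, it is easier to use encapsulation and pointers and to treat each record as some kind of object, so I will go that way here.
Here is the definition used for each record in the example: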
typedef struct
{
int f_int;
char f_string_1[80];
char f_string_2[80];
char f_char;
char f_string_3[80];
} Record;
In practice this is better:
typedef struct
{
int f_int;
char* f_string_1;
char* f_string_2;
char f_char;
char* f_string_3;
} P_Record;
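Converting one into the other is trivial, and the code includes functions that do it. The main reason to use pointers in the record is to use only the amount of RAM actually needed, instead of 240 bytes for each set of strings.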
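To put a number on that, here is a small sketch that counts the bytes the three strings need when each one is stored as an exactly sized copy. so_string_bytes is a hypothetical helper, not one of the functions in this answer, and it deliberately ignores the three pointers and any allocator overhead.

```c
#include <stddef.h>
#include <string.h>

typedef struct
{
    int  f_int;
    char f_string_1[80];
    char f_string_2[80];
    char f_char;
    char f_string_3[80];
} Record;

// Bytes needed by the three strings when each is stored as a
// separately allocated copy, terminators included
size_t so_string_bytes(const Record* r)
{
    return strlen(r->f_string_1) + 1 +
           strlen(r->f_string_2) + 1 +
           strlen(r->f_string_3) + 1;
}
```

For the first sample record the strings need 65 bytes, against the 240 bytes reserved by the fixed arrays.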
File used in the example
11\tI would like an apple\tWhat is your favourite car brand?\tb\telephant
-11\tI would like an apple\tWhat is your favourite car brand?\tb\telephant
0\t \t \t \tStack Overflow
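This is almost the same as the original example, but I removed the extra spaces. The blank fields in the last line contain at least one space, to test how scanf consumes them. For the local parser this makes no difference.
Of course, in the real file \t is replaced by the delimiter in use.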
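To produce the same test lines for any delimiter, something like the following can be used. It is a sketch under assumptions: so_make_line is a hypothetical helper, and the caller provides the buffer.

```c
#include <stdio.h>

// Formats one record as a delimited line, without a trailing newline.
// Returns 0 on success, -1 if the buffer is too small.
int so_make_line(char* out, size_t size, char delim, int n,
                 const char* s1, const char* s2, char c,
                 const char* s3)
{
    int len = snprintf(out, size, "%d%c%s%c%s%c%c%c%s",
                       n, delim, s1, delim, s2, delim, c, delim, s3);
    // snprintf reports the length it wanted to write
    if (len < 0 || (size_t)len >= size) return -1;
    return 0;
}
```

Calling it once with ',' and once with '\t' gives the comma and tab versions of the same record, so both parsers can be tested on identical data.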
Functions used in the code
Record* so_free(Record*);
char so_get_delim(const char*, const char);
Record* so_parse(const char*, const size_t, const char);
Record* so_parse_sc(const char*, const size_t, const char);
int so_show(Record*, const char*);
int so_show_parms(const char* f_name, const char delim);
// conversion helpers
P_Record* so_free_pack(P_Record*);
P_Record* so_pack(Record*);
int so_show_pack(P_Record*, const char*);
Record* so_unpack(P_Record*);
These are obvious, but:
so_parse
takes a line and returns a Record with the fields extracted by parsing the line.
so_parse_sc
does the same, but uses sscanf.
main
is used for testing.
int main(int argc, char** argv)
{
const char* df_file = "input.txt";
const char df_delim = ',';
char line[1024] = {0};
if (argc > 1)
strcpy(line, argv[1]);
else
strcpy(line, df_file);
char delim = df_delim;
if (argc > 2) delim = so_get_delim(argv[2], df_delim);
so_show_parms(line, delim);
FILE* in = fopen(line, "r");
if (in == NULL) return -1;
char* p = NULL;
size_t n_line = 0;
char r_msg[40];
while (NULL != (p = fgets(line, sizeof(line) - 1, in)))
{
// fgets keeps the '\n' when the whole line fits
line[strcspn(line, "\n")] = 0;
n_line += 1;
// local parser
sprintf(r_msg, "\nRecord %zu\n", n_line); // n_line is size_t
Record* one = so_parse(line, 1023, delim);
if (one == NULL)
{
fprintf(stderr, "Ignored: %s", r_msg);
continue;
}
so_show(one, r_msg);
one = so_free(one);
// using sscanf
sprintf(
r_msg, "\n[using sscanf]\nRecord %zu\n",
n_line);
one = so_parse_sc(line, 1023, delim);
if (one == NULL)
{
fprintf(stderr, "Ignored: %s", r_msg);
continue;
}
so_show(one, r_msg);
one = so_free(one);
}; // while
fclose(in);
return 0;
}
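The program takes two arguments: the file name and the delimiter. The defaults are "input.txt" and a comma for the delimiter. The delimiter can be entered as the character itself, like ";" or ; for a semicolon, as "\t" for a tab, or as a decimal value in the form \nnn, like \064 for @.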
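The conversion of that argument can be sketched with strtol, which also rejects input that is not a number. This is an alternative to so_get_delim in the full code below, not a copy of it; parse_delim is a hypothetical name, and limiting the code to 1..127 is an assumption made here to stay within plain ASCII.

```c
#include <stdlib.h>

// Converts a command-line argument into a delimiter:
//   "\t"    gives a tab
//   "\nnn"  gives the character with decimal code nnn
//   other   gives the first character of the argument
// Falls back to df_delim for an empty argument, a malformed number
// or a code outside 1..127.
char parse_delim(const char* arg, char df_delim)
{
    if (arg == NULL || arg[0] == '\0') return df_delim;
    if (arg[0] != '\\') return arg[0];
    if (arg[1] == 't' && arg[2] == '\0') return '\t';
    char* end  = NULL;
    long  code = strtol(arg + 1, &end, 10); // base 10: "064" is 64
    if (end == arg + 1 || *end != '\0') return df_delim;
    if (code < 1 || code > 127) return df_delim;
    return (char)code;
}
```

With this version \064 gives @, while something like \abc or \999 silently falls back to the default delimiter.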
As expected, when TAB is the delimiter, so_parse works fine but so_parse_sc fails to parse some lines.
Output using comma as delimiter
C: SO> p input.txt ","
file is "input.txt", delimiter is ',' = 0x2C
Record 1
int: 11
string 1: "I would like an apple"
string 2: "What is your favourite car brand?"
char: 'b'
string 3: "elephant"
[using sscanf]
Record 1
int: 11
string 1: "I would like an apple"
string 2: "What is your favourite car brand?"
char: 'b'
string 3: "elephant"
Record 2
int: -11
string 1: "I would like an apple"
string 2: "What is your favourite car brand?"
char: 'b'
string 3: "elephant"
[using sscanf]
Record 2
int: -11
string 1: "I would like an apple"
string 2: "What is your favourite car brand?"
char: 'b'
string 3: "elephant"
Record 3
int: 0
string 1: " "
string 2: " "
char: ' '
string 3: "Stack Overflow"
[using sscanf]
Record 3
int: 0
string 1: " "
string 2: " "
char: ' '
string 3: "Stack Overflow"
C: SO>
Output using TAB as delimiter
C: SO> ..\x64\debug\soc23-1104-fread.exe input-tab.txt "\t"
file is "input-tab.txt", delimiter is 0x9
Record 1
int: 11
string 1: "I would like an apple"
string 2: "What is your favourite car brand?"
char: 'b'
string 3: "elephant"
[using sscanf]
Record 1
int: 11
string 1: "I would like an apple"
string 2: "What is your favourite car brand?"
char: 'b'
string 3: "elephant"
Record 2
int: -11
string 1: "I would like an apple"
string 2: "What is your favourite car brand?"
char: 'b'
string 3: "elephant"
[using sscanf]
Record 2
int: -11
string 1: "I would like an apple"
string 2: "What is your favourite car brand?"
char: 'b'
string 3: "elephant"
Record 3
int: 0
string 1: "."
string 2: "."
char: '.'
string 3: "Stack Overflow"
[using sscanf]
Record 3
int: 0
string 1: "."
string 2: "."
char: '.'
string 3: "Stack Overflow"
C: SO>
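And scanf fails to read the last record when its blank fields hold only spaces, since the delimiter is white space too, just like the content of those fields.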
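The effect is easy to reproduce in isolation. The sketch below uses the same mask as so_parse_sc, but only reports what sscanf returns; count_converted is a hypothetical helper written for this illustration.

```c
#include <stdio.h>

// Parses one 5-field line with the mask used in this answer, after
// replacing every 'x' placeholder with the delimiter.
// Returns the value returned by sscanf.
int count_converted(const char* line, char delim)
{
    char mask[] = "%dx%79[^x]x%79[^x]x%cx%79[^x\n]";
    for (size_t i = 0; mask[i] != '\0'; i += 1)
        if (mask[i] == 'x') mask[i] = delim;
    int  n = 0;
    char s1[80], s2[80], s3[80];
    char c = 0;
    return sscanf(line, mask, &n, s1, s2, &c, s3);
}
```

With a comma, the line "0, , , ,Stack Overflow" gives 5. With a tab, "0\t \t \t \tStack Overflow" gives only 2: the first tab in the mask swallows every tab and space up to "Stack Overflow", that text lands in the first string, and nothing is left for the remaining fields.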
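So why use scanf
This function can, in a single call, parse the delimiters and convert strings and types like char, float, double and int. scanf is generally able to consume and convert any valid csv file. As for what valid means, we can try any online csv validator or read RFC4180. There is no truly formal definition, since the format predates the internet and the W3C by some time.
The mask used here is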
char mask[] =
"%dx%79[^x]x%79[^x]x%cx%79[^x\n]";
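where x is the delimiter in use. It can parse strings, char and int values. In production code:
- the mask can be built more precisely, instead of hard-coding 79 for 80-byte fields. :)
- we need to know whether the first line has the field names, and whether they are needed (see the RFC). Here the first line has normal data.
- we need to know whether fields are escaped, and with which character if so. It is common, for example, for fields to be enclosed in " (see the RFC). Here fields are not escaped, so they cannot have delimiters inside.
- we have 5 specifiers, for 5 fields, so scanf can return anything from EOF (typically -1) to 5, and only 5 means the whole record was read.
- a csv is a giant MxN table of M records with N fields, so here all M lines must have N - 1 = 4 delimiters.
- x%79[^x]x means up to 79 characters that are not x, between two x delimiters.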
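The first point can be sketched as follows: the mask is generated from the real capacity of the string fields, so the width can never drift away from the array size. build_mask is a hypothetical helper, and it assumes the delimiter is not one of the characters that are special inside a mask (%, ^ or ]).

```c
#include <stdio.h>

// Builds the sscanf mask for a delimiter and a string capacity.
// capacity counts the terminator, so the width is capacity - 1.
// Returns 0 on success, -1 if the mask does not fit in out.
int build_mask(char* out, size_t size, char delim, size_t capacity)
{
    if (capacity < 2) return -1;
    size_t w   = capacity - 1;
    int    len = snprintf(
        out, size,
        "%%d%c%%%zu[^%c]%c%%%zu[^%c]%c%%c%c%%%zu[^%c\n]",
        delim, w, delim, delim, w, delim, delim, delim, w, delim);
    if (len < 0 || (size_t)len >= size) return -1;
    return 0;
}
```

Called as build_mask(mask, sizeof mask, ',', sizeof(((Record*)0)->f_string_1)) it would produce the same mask that so_parse_sc gets after replacing the x placeholders by a comma.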
Complete code (C)
#define _CRT_SECURE_NO_WARNINGS
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct
{
int f_int;
char f_string_1[80];
char f_string_2[80];
char f_char;
char f_string_3[80];
} Record;
typedef struct
{
int f_int;
char* f_string_1;
char* f_string_2;
char f_char;
char* f_string_3;
} P_Record;
Record* so_free(Record*);
char so_get_delim(const char*, const char);
Record* so_parse(const char*, const size_t, const char);
Record* so_parse_sc(const char*, const size_t, const char);
int so_show(Record*, const char*);
int so_show_parms(const char* f_name, const char delim);
// conversion helpers
P_Record* so_free_pack(P_Record*);
P_Record* so_pack(Record*);
int so_show_pack(P_Record*, const char*);
Record* so_unpack(P_Record*);
/// <summary>
/// defaults are "input.txt" for file name and
/// ',' comma for the delimiter
/// </summary>
/// <param name="argc"></param>
/// <param name="argv">
/// argv[1] is the file name
/// argv[2] is the delimiter. can be \nnn decimal or \t or
/// the delimiter itself
/// </param>
/// <returns></returns>
int main(int argc, char** argv)
{
const char* df_file = "input-tab.txt";
const char df_delim = '\t';
char line[1024] = {0};
if (argc > 1)
strcpy(line, argv[1]);
else
strcpy(line, df_file);
char delim = df_delim;
if (argc > 2) delim = so_get_delim(argv[2], df_delim);
so_show_parms(line, delim);
FILE* in = fopen(line, "r");
if (in == NULL) return -1;
char* p = NULL;
size_t n_line = 0;
char r_msg[40];
while (NULL != (p = fgets(line, sizeof(line) - 1, in)))
{
// fgets keeps the '\n' when the whole line fits
line[strcspn(line, "\n")] = 0;
n_line += 1;
// local parser
sprintf(r_msg, "\nRecord %zu\n", n_line); // n_line is size_t
Record* one = so_parse(line, 1023, delim);
if (one == NULL)
{
fprintf(stderr, "Ignored: %s", r_msg);
continue;
}
so_show(one, r_msg);
one = so_free(one);
// using sscanf
sprintf(
r_msg, "\n[using sscanf]\nRecord %zu\n",
n_line);
one = so_parse_sc(line, 1023, delim);
if (one == NULL)
{
fprintf(stderr, "Ignored: %s", r_msg);
continue;
}
so_show(one, r_msg);
one = so_free(one);
}; // while
fclose(in);
return 0;
}
/// <summary>
/// free...
/// </summary>
/// <param name="one"></param>
/// <returns>returns NULL</returns>
Record* so_free(Record* one)
{
if (one == NULL) return NULL;
free(one);
return NULL;
}
/// <summary>
/// free a packed record
/// </summary>
/// <param name="one"></param>
/// <returns>returns NULL</returns>
P_Record* so_free_pack(P_Record* one)
{
if (one == NULL) return NULL;
free(one->f_string_1);
free(one->f_string_2);
free(one->f_string_3);
free(one);
return NULL;
}
/// <summary>
/// get the delimiter from a command line argument
/// </summary>
/// <param name="arg">can be a char or \t for a tab
/// or \nnn for a decimal value</param>
/// <param name="df_delim">default delimiter</param>
/// <returns>delimiter</returns>
char so_get_delim(const char* arg, const char df_delim)
{ // argument should be \t or \nnn decimal
char delim = df_delim;
if (arg[0] == '\\')
{
if (arg[1] == 't')
delim = '\t';
else
{
if (strlen(arg) > 3)
delim = (arg[1] - '0') * 100 +
(arg[2] - '0') * 10 +
(arg[3] - '0');
else
delim = df_delim;
}
}
else
delim = arg[0];
return delim;
}
/// <summary>
/// returns a new packed record from a record
/// </summary>
/// <param name="src"></param>
/// <returns></returns>
P_Record* so_pack(Record* src)
{
size_t len = 0;
if (src == NULL) return NULL;
// allocate the struct itself, not just the size of a pointer
P_Record* one = malloc(sizeof(P_Record));
if (one == NULL) return NULL;
one->f_int = src->f_int; // field 1
len = strlen(src->f_string_1);
one->f_string_1 = malloc(1 + len);
if (one->f_string_1 == NULL)
{
free(one);
return NULL;
}
strcpy(one->f_string_1, src->f_string_1); // field 2
// now for the 2nd string
len = strlen(src->f_string_2);
one->f_string_2 = malloc(1 + len);
if (one->f_string_2 == NULL)
{
free(one->f_string_1);
free(one);
return NULL;
}
strcpy(one->f_string_2, src->f_string_2); // field 3
// now for the single char
one->f_char = src->f_char; // field 4;
// now for the last string
len = strlen(src->f_string_3);
one->f_string_3 = malloc(1 + len);
if (one->f_string_3 == NULL)
{
free(one->f_string_1);
free(one->f_string_2);
free(one);
return NULL;
}
strcpy(one->f_string_3, src->f_string_3); // field 5
return one;
}
/// <summary>
/// parse a line to get a Record
/// </summary>
/// <param name="line"></param>
/// <param name="limit"></param>
/// <param name="delim"></param>
/// <returns>pointer to a new Record</returns>
Record* so_parse(
const char* line, size_t limit, const char delim)
{
if (line == NULL) return NULL;
size_t len = strlen(line);
if (len > limit) return NULL;
const size_t n_tabs = 4; // 5 fields
size_t tabs[5] = {0};
const char* p = line;
// check line format
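for (size_t i = 0; i < len; i += 1)
{
if (*p == delim)
{
// a 5th delimiter means the line is not a valid
// record, and tabs[] has no room for its offset
if (tabs[0] == n_tabs) return NULL;
tabs[0] += 1;
tabs[tabs[0]] = i;
}
p++;
}
if (tabs[0] != n_tabs) return NULL;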
// line has 5 fields:
// create record
// extract fields
Record* nr = malloc(sizeof(Record));
if (nr == NULL) return NULL;
// first field is int
nr->f_int = atoi(line);
char* begin = NULL;
char* end = NULL;
size_t fl = 0;
// fl counts the leading delimiter plus the field, so the
// field has fl - 1 chars and needs fl bytes with its '\0'
// now for the 1st string
begin = (char*)line + tabs[1];
end = (char*)line + tabs[2];
fl = end - begin;
if (fl > sizeof(nr->f_string_1)) return so_free(nr);
*(nr->f_string_1 + fl - 1) = 0; // terminate string
memcpy(nr->f_string_1, begin + 1, fl - 1);
// now for the 2nd string
begin = (char*)line + tabs[2];
end = (char*)line + tabs[3];
fl = end - begin;
if (fl > sizeof(nr->f_string_2)) return so_free(nr);
*(nr->f_string_2 + fl - 1) = 0; // terminate string
memcpy(nr->f_string_2, begin + 1, fl - 1);
// now for the single char
// format: <tab3><field><tab4>
nr->f_char =
*(line + tabs[3] + 1); // 1st char after the delimiter
// now for the last string
begin = (char*)line + tabs[4];
end = (char*)line + len;
fl = end - begin;
if (fl > sizeof(nr->f_string_3)) return so_free(nr);
*(nr->f_string_3 + fl - 1) = 0; // terminate string
memcpy(nr->f_string_3, begin + 1, fl - 1);
return nr;
}
/// <summary>
/// build a record from a line, using sscanf
/// </summary>
/// <param name="line"></param>
/// <param name="limit"></param>
/// <returns>pointer to Record</returns>
Record* so_parse_sc(
const char* line, size_t limit, const char delim)
{
if (line == NULL) return NULL;
// should use the size of the strings and not fix 79
// the question tried ("%d\t%s\t%s\t%c\t%s\n", ..)
char mask[] =
"%dx%79[^x]x%79[^x]x%cx%79[^x\n]";
// change mask for delimiter in use: every 'x' is a
// placeholder, so scan up to the end of the string
for (size_t i = 0; mask[i] != '\0'; i += 1)
if (mask[i] == 'x') mask[i] = delim;
size_t len = strlen(line);
if (len > limit) return NULL;
Record lcl;
int res = sscanf(
line, mask, &lcl.f_int, lcl.f_string_1, lcl.f_string_2,
&lcl.f_char, lcl.f_string_3);
if (res != 5) return NULL;
Record* nr = malloc(sizeof(Record));
if (nr == NULL) return NULL;
*nr = lcl;
return nr;
}
/// <summary>
/// display Record contents
/// </summary>
/// <param name="one"></param>
/// <param name="msg"></param>
/// <returns>0 for success or -1</returns>
int so_show(Record* one, const char* msg)
{
if (one == NULL) return -1;
if (msg != NULL) printf("%s", msg);
printf("\t int: %d\n", one->f_int);
printf("\tstring 1: \"%s\"\n", one->f_string_1);
printf("\tstring 2: \"%s\"\n", one->f_string_2);
printf("\t char: '%c' \n", one->f_char);
printf("\tstring 3: \"%s\"\n", one->f_string_3);
return 0;
}
/// <summary>
/// display P_Record contents
/// </summary>
/// <param name="one"></param>
/// <param name="msg"></param>
/// <returns></returns>
int so_show_pack(P_Record* one, const char* msg)
{
if (one == NULL) return -1;
if (msg != NULL) printf("%s", msg);
printf("\t int: %d\n", one->f_int);
printf("\tstring 1: \"%s\"\n", one->f_string_1);
printf("\tstring 2: \"%s\"\n", one->f_string_2);
printf("\t char: '%c' \n", one->f_char);
printf("\tstring 3: \"%s\"\n", one->f_string_3);
return 0;
}
/// <summary>
/// show file name and delimiter in use
/// </summary>
/// <param name="f_name"></param>
/// <param name="delim"></param>
/// <returns>0</returns>
int so_show_parms(const char* f_name, const char delim)
{
if (f_name == NULL) return -1;
if (isprint((unsigned char)delim)) // isprint needs a non-negative value
printf(
"\f file is \"%s\", delimiter is '%c' = "
"0x%X\n",
f_name, delim, delim);
else
printf(
"\f file is \"%s\", delimiter is 0x%x\n",
f_name, delim);
return 0;
}
/// <summary>
/// convert from P_Record to Record
/// </summary>
/// <param name="src"></param>
/// <returns>pointer</returns>
Record* so_unpack(P_Record* src)
{
size_t len = 0;
if (src == NULL) return NULL;
Record* one = malloc(sizeof(Record));
if (one == NULL) return NULL;
one->f_int = src->f_int;
// each string plus its '\0' must fit in the fixed array
if (strlen(src->f_string_1) >= sizeof(one->f_string_1))
{
free(one);
return NULL;
}
strcpy(one->f_string_1, src->f_string_1); // field 2
// now for the 2nd string
if (strlen(src->f_string_2) >= sizeof(one->f_string_2))
{
free(one);
return NULL;
}
strcpy(one->f_string_2, src->f_string_2); // field 3
// now for the single char
one->f_char = src->f_char; // field 4;
// now for the last string
if (strlen(src->f_string_3) >= sizeof(one->f_string_3))
{
free(one);
return NULL;
}
strcpy(one->f_string_3, src->f_string_3); // field 5
return one;
}
// https://stackoverflow.com/questions/77423959/
// reading-lines-with-different-data-types-from-
// txt-file-in-c