Why does it result in a segmentation fault? (A program in C that counts occurrences of a word in the file)

Asked by: SUPERoya · Asked: 10/29/2023 · Last edited by: chqrlie, SUPERoya · Updated: 10/29/2023 · Views: 56

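Q:

countSH counts how many times a word occurs in a file, attributing each occurrence to ham or spam according to the first character of the line.

I also have a function that gets the user input.

I wrote them separately, and when I tested each function on its own, both worked fine.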

#include <stdio.h>
#include <string.h>

int *countSH(char *filename, char *SentWord) {
    //opening file and checking whether it opens
    FILE *file = fopen(filename, "r");

    if (file == NULL) {
        perror("error while opening file");
        return NULL;
    }

    char line[500]; //line lenght = 500
    int hams = 0;
    int spams = 0;
    char *token = NULL;
    const char *word = SentWord;
    char *lineDupe;

    while(1) {
        if (feof(file)) { 
            break; //breaking the loop when getting to the end of the file
        } 

        if (fgets(line, sizeof(line), file) != NULL) { //if line != NULL
            lineDupe = strdup(line);
            token = strtok(lineDupe, " \t\n");
            while (token != NULL) {
                if (strcmp(token, word) == 0) {
                    if (line[0] == 'h') { //if line starts with h (ham)
                        hams++;
                    } else { //else it starts with s (spam)
                        spams = spams + 1;
                    }
                }
                token = strtok(NULL, " \t\n");
            }
        }
    }

    int countsh[2] = { hams, spams };

    if (hams == 0 && spams == 0) {
        printf("This word doesn’t occur in the text!");
    } else {
        printf("The word '%s' appears %d times in ham messages and %d times in spam messages.");
    }
    return countsh;
}

//this function gets the user input and saves the word for searching
char *getWord() {
    static char word[256];

    printf("Please enter a word to search for:\t");

    scanf("%255s", word);

    return word;
}

int main() {
    
    char *searchWord = getWord(); //assigning the searchword we got from the user to a var
    char *file = "preprocessed_dataset.txt";
    int *word_in_sh = countSH(file, searchWord);

    return 0;
}

This is the only warning I get, and I really don't understand where the problem is:

try1_3.c: In function ‘countSH’:
try1_3.c:50:2: warning: function returns address of local variable [-Wreturn-local-addr]
  return countsh;
Tags: c, segmentation-fault

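Comments

0 votes · pmg · 10/29/2023
countsh is a variable that "belongs" to countSH() (it is a local variable). It ceases to exist when countSH() returns, so it cannot be accessed from the calling function.
0 votes · Frankie_C · 10/29/2023
If you want the variable countsh to exist after the function ends, declare it static, as you did for word.
0 votes · Tom Karzes · 10/29/2023
Declaring countsh static and returning its address is generally frowned upon. It means that whenever the function is called, all previously returned results are overwritten. It is better to have the caller pass the address of a result array, or in this case, it could pass two pointers to two result variables.
0 votes · Fe2O3 · 10/29/2023
OT: As it stands, there is no need to burn heap memory for this, never mind that memory leaks like a sieve because nothing is being free()'d. Get rid of lineDupe and strdup() and just use line[] with strtok(). And close the opened FILE* pointer, too. Develop good habits early...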

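A:

0 votes · chqrlie · 10/29/2023 · #1

The statement return countsh; returns the address of the array countsh to the caller, but this array is defined as a local object with automatic storage, so it becomes invalid as soon as the function returns. This is a real problem: the caller cannot reliably read values from this array, and doing so has undefined behavior.

You can fix this:

  • by passing the destination array as an argument (the preferred solution);
  • or by defining the array as a static object, but this would be a quick and dirty fix with side effects of its own: for example, if you call countSH multiple times, earlier results are overwritten by the next call.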
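
The side effect of the static approach can be seen in a minimal sketch. The names make_pair_static and make_pair are hypothetical and exist only for illustration; they are not part of the original program.

```c
/* Hypothetical helper: returns a pointer to a static two-element array.
   The array outlives the call, but every call shares the same storage. */
int *make_pair_static(int first, int second) {
    static int pair[2];
    pair[0] = first;
    pair[1] = second;
    return pair;
}

/* Preferred shape: the caller owns the destination array, so results
   from different calls cannot overwrite each other. */
void make_pair(int first, int second, int *pair) {
    pair[0] = first;
    pair[1] = second;
}
```

Calling make_pair_static(1, 2) and then make_pair_static(3, 4) yields the same pointer twice, and the array it points to then holds 3 and 4. With make_pair, each caller passes its own array and keeps its own result.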

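Note these further remarks:

  • You allocate a copy of every line you read with strdup(line) but never free these blocks, which causes a systematic memory leak. You do not actually need to allocate memory for this task: just save the ham/spam status before the strtok loop.

  • The logic of the parsing loop is cumbersome: the feof() test should be removed and the loop should be written as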

      while (fgets(line, sizeof(line), file)) {
          /* parse the line */
      }
    
  • The second printf statement needs arguments for the search word and the counts.

  • The file opened in countSH is never closed.
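
The remark about tokenizing without strdup can be sketched on a single line of text. The function name count_word_in_line is an assumption made for this illustration and does not appear in the original code.

```c
#include <string.h>

/* Hypothetical helper: counts how many whitespace-separated tokens in
   `line` are equal to `word`. strtok() overwrites delimiters in `line`
   with '\0', so it is simplest for the caller to read whatever it needs
   from the line (such as the leading 'h' or 's') before tokenizing. */
int count_word_in_line(char *line, const char *word) {
    int count = 0;
    for (char *token = strtok(line, " \t\n"); token != NULL;
         token = strtok(NULL, " \t\n")) {
        if (strcmp(token, word) == 0)
            count++;
    }
    return count;
}
```

Because the line buffer is refilled by fgets on every iteration anyway, modifying it in place is harmless here and no heap copy is needed.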

Here is a modified version:

#include <errno.h>
#include <stdio.h>
#include <string.h>

// count words and update counts in the destination array
// return 1 for success, 0 in case of error
int countSH(const char *filename, const char *word, int *countsh) {
    //opening file and checking whether it opens
    FILE *file = fopen(filename, "r");
    if (file == NULL) {
        fprintf(stderr, "error while opening file %s: %s\n",
                filename, strerror(errno));
        return 0;
    }

    char line[500]; //line length is 498
    int hams = 0;
    int spams = 0;

    while (fgets(line, sizeof line, file)) {
        char c0 = line[0];
        char *token = strtok(line, " \t\n");
        while (token != NULL) {
            if (strcmp(token, word) == 0) {
                if (c0 == 'h') { // if line starts with h (ham)
                    hams++;
                } else {         // else it starts with s (spam)
                    spams++;
                }
            }
            token = strtok(NULL, " \t\n");
        }
    }
    fclose(file);
    countsh[0] = hams;
    countsh[1] = spams;
    return 1;
}

// get the user input and save the word for searching
int getWord(char *word, size_t size) {
    char format[20];
    // the field width must leave room for the terminating '\0'
    snprintf(format, sizeof format, "%%%ds", (int)size - 1);
    printf("Please enter a word to search for: ");
    return scanf(format, word) == 1;
}

int main(void) {
    char searchWord[256];
    
    if (!getWord(searchWord, sizeof searchWord))
        return 1;

    const char *filename = "preprocessed_dataset.txt";
    int word_in_sh[2];
    if (!countSH(filename, searchWord, word_in_sh))
        return 1;

    if (word_in_sh[0] == 0 && word_in_sh[1] == 0) {
        printf("This word doesn’t occur in the text!\n");
    } else {
        printf("The word '%s' appears %d times in ham messages and %d times in spam messages.\n",
               searchWord, word_in_sh[0], word_in_sh[1]);
    }
    return 0;
}
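
As an alternative to a two-element destination array, the counts can be stored through two separate pointers, which is the other option raised in the comments on the question. The sketch below is an assumption-laden variant, not the answer's code: the name count_sh_stream is hypothetical, and it takes an already opened stream so that it can be tried on any FILE *, including one obtained from tmpfile().

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical variant: reads from an already opened stream and stores
   the results through two separate pointers.
   Returns 1 on success, 0 if any argument is NULL. */
int count_sh_stream(FILE *fp, const char *word, int *hams, int *spams) {
    if (fp == NULL || word == NULL || hams == NULL || spams == NULL)
        return 0;

    char line[500];
    *hams = 0;
    *spams = 0;

    while (fgets(line, sizeof line, fp)) {
        char c0 = line[0];  /* save the ham/spam marker before tokenizing */
        for (char *token = strtok(line, " \t\n"); token != NULL;
             token = strtok(NULL, " \t\n")) {
            if (strcmp(token, word) == 0) {
                if (c0 == 'h')
                    ++*hams;    /* line starts with h (ham) */
                else
                    ++*spams;   /* otherwise treated as spam */
            }
        }
    }
    return 1;
}
```

A caller would declare int hams, spams; and pass &hams and &spams. Opening and closing the file stays the caller's responsibility in this variant.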

Comments

1 vote · SUPERoya · 10/29/2023
Damn, thank you so much, man. I spent 2 days on this and still couldn't figure out what the problem was. Thanks a lot.