C Programming - Counting Frequencies Of Characters In A String (Problem In My Code)

Asked by: Thriller  Asked: 9/13/2023  Last edited by: Thriller  Updated: 9/13/2023  Views: 91

Q:

I'm trying to do this exercise in C: "Write a program that takes a string as input and counts the frequency of each character."

I wrote this function:

void printCharactersFrequenciesOf(char s[]){
    size_t stringlength = strlen(s); // length of string
    char chars[stringlength]; // variable for the different characters in the string
    int charsFrequencies[stringlength], charAlreadyExists, differentCharsNumber = 0; // variables for the frequency of each character in the string, a flag to know whether the character already exists in the characters array, and for the different characters number in the string
    // putting the different characters of the string in the different characters array
    for (int i = 0; i < stringlength; i++){
        charAlreadyExists = 0;
        for (int j = 0; j < i; j++){
            if (s[i] == chars[j]){
                charAlreadyExists = 1;
                j = i; // break loop
            }
        }
        if (charAlreadyExists == 0){
            chars[differentCharsNumber] = s[i];
            differentCharsNumber++;
        }
    }
    chars[differentCharsNumber] = charsFrequencies[differentCharsNumber] = '\0'; // terminating the different characters array and the characters frequencies array with a null terminator if they're shorter than the length of the string
    int charCount; // a counter variable for the number of appearance of each existing character
    // getting character frequencies into the character frequencies array
    for (int i = 0; i < differentCharsNumber; i++){
        charCount = 0;
        for (int j = 0; j < stringlength; j++){
            if (chars[i] == s[j]){
                charCount++;
            }
        }
        charsFrequencies[i] += charCount;
    }
    // printing the frequencies of the different characters
    for (int i = 0; i < differentCharsNumber; i++){
        printf("Frequency of '%c': %d\n", chars[i], charsFrequencies[i]);
    }
}

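In this function, I first put all the distinct characters of the source string into an array. Then I go through the array of distinct characters and, for each one, scan the string to count how often it occurs.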

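Unfortunately, this code doesn't seem to work. It does collect and print the distinct characters of the string, but the frequencies come out as garbage.

For example, for the string "Temme", I get: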

Frequency of 'T': -1920988639
Frequency of 'e': -23
Frequency of 'm': -606004806

when I expect to get:

Frequency of 'T': 1
Frequency of 'e': 2
Frequency of 'm': 2

However, for the string "bb", I get:

Frequency of 'b': 2

as expected.

I would like to know what I did wrong, even if this solution isn't ideal.

Thanks in advance.

arrays c string char frequency

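Comments

0 upvotes Shawn 9/13/2023
The length of the string is not the value to use for the length of the frequency array.
0 upvotes Thriller 9/13/2023
Why? The maximum number of characters (and so the maximum number of frequencies) is the length of the string
5 upvotes infinitezero 9/13/2023
You should make a histogram. There are 127 or 255 possible characters. Just create an array of integers and increment the corresponding position in the array.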

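A:

1 upvote 0___________ 9/13/2023 #1

You are overcomplicating a very simple function (and your code is hard to read because the algorithm isn't very logical). Keep it simple: use an array long enough to hold a count for every character.

In this example code I count the characters with codes from 32 to 127. You can change the range to include, for example, control characters.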

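#include <stdio.h>
#include <string.h>

#define MAX_ASCII 127
#define MIN_ASCII 32

/* Counts the characters of str whose codes lie in [MIN_ASCII, MAX_ASCII].
   arr must have room for MAX_ASCII - MIN_ASCII + 1 counters.
   Returns the length of the string, or 0 if either pointer is NULL. */
size_t count(const char *str, size_t *arr)
{
    size_t len = 0;
    if(str && arr)
    {
        /* start every counter at zero */
        memset(arr, 0, (MAX_ASCII - MIN_ASCII + 1) * sizeof(*arr));
        while(*str)
        {
            /* skip control characters and anything outside 7-bit ASCII */
            if(*str >= MIN_ASCII && (unsigned char)*str <= MAX_ASCII)
            {
                arr[*str - MIN_ASCII] += 1;
            }
            str++;
            len++;
        }
    }
    return len;
}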


int main(void)
{
    size_t freq[MAX_ASCII - MIN_ASCII + 1];
    char *str = "Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.";
    size_t len = count(str, freq);
    
    printf("Total length of the string is: %zu\n", len);
    for(int i = 0; i <= MAX_ASCII - MIN_ASCII; i++)
    {
        /* print only the characters that actually occur */
        if(freq[i])
            printf("Char %03d ('%c') was found %4zu times (%6.2f%%)\n", i + MIN_ASCII,
                i + MIN_ASCII, freq[i], (100.0 * freq[i]) / len);
    }
}

https://godbolt.org/z/53corfj19

Result:

Total length of the string is: 574
Char 032 (' ') was found   90 times ( 15.68%)
Char 039 (''') was found    1 times (  0.17%)
Char 044 (',') was found    4 times (  0.70%)
Char 046 ('.') was found    4 times (  0.70%)
Char 048 ('0') was found    3 times (  0.52%)
Char 049 ('1') was found    2 times (  0.35%)
Char 053 ('5') was found    1 times (  0.17%)
Char 054 ('6') was found    1 times (  0.17%)
Char 057 ('9') was found    1 times (  0.17%)
Char 065 ('A') was found    1 times (  0.17%)
Char 073 ('I') was found    6 times (  1.05%)
Char 076 ('L') was found    5 times (  0.87%)
Char 077 ('M') was found    1 times (  0.17%)
Char 080 ('P') was found    1 times (  0.17%)
Char 097 ('a') was found   28 times (  4.88%)
Char 098 ('b') was found    5 times (  0.87%)
Char 099 ('c') was found   10 times (  1.74%)
Char 100 ('d') was found   16 times (  2.79%)
Char 101 ('e') was found   59 times ( 10.28%)
Char 102 ('f') was found    6 times (  1.05%)
Char 103 ('g') was found   11 times (  1.92%)
Char 104 ('h') was found   14 times (  2.44%)
Char 105 ('i') was found   32 times (  5.57%)
Char 107 ('k') was found    7 times (  1.22%)
Char 108 ('l') was found   17 times (  2.96%)
Char 109 ('m') was found   18 times (  3.14%)
Char 110 ('n') was found   38 times (  6.62%)
Char 111 ('o') was found   25 times (  4.36%)
Char 112 ('p') was found   18 times (  3.14%)
Char 114 ('r') was found   24 times (  4.18%)
Char 115 ('s') was found   39 times (  6.79%)
Char 116 ('t') was found   43 times (  7.49%)
Char 117 ('u') was found   17 times (  2.96%)
Char 118 ('v') was found    5 times (  0.87%)
Char 119 ('w') was found    6 times (  1.05%)
Char 120 ('x') was found    2 times (  0.35%)
Char 121 ('y') was found   13 times (  2.26%)

Comments

0 upvotes 0___________ 9/15/2023
@Thriller If this answered your question, please accept the answer