Remove Stop Words/Connecting Words From Text File

Asked by: Katie Cook Asked: 5/15/2023 Last edited by: Katie Cook Updated: 5/16/2023 Views: 56

Q:

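I am developing a program that reads a text file, finds the 10 most frequently used words, and prints them in ascending order. I have defined a list of stop words/connecting words and written code to remove them from the frequency analysis, but the stop words are still included in the results.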

% Prints the words
print_top_words(File, N):-
    read_file_to_string(File, String, [encoding(utf8)]),
    re_split("\\w+", String, Words),
    lower_case(Words, Lower),
    sort(1, @=<, Lower, Sorted),
    exclude(word_to_ignore, Sorted, RelevantWords),
    merge_words(RelevantWords, Counted),
    sort(2, @>, Counted, Top_words),
    writef("Top %w words:\nRank\tCount\tWord\n", [N]),
    print_top_words(Top_words, N, 1).

% Predicate to filter out words that are to be ignored.
word_to_ignore(Word) :-
    ignore_words(IgnoreWords),
    member(Word, IgnoreWords).

% Defines the words to ignore.
ignore_words(['', 'a', 'an', 'the', 'for', 'of', 'and', 'to', 'in', 'is', 'it', 'on', 'that', 'with', 'this', 'you', 'be', 'are', 'at', 'or', 'as', 'if', 'not', 'from']).

lower_case([_], []):-!.
lower_case([_, Word|Words], [Lower - 1|Rest]):-
    string_lower(Word, Lower),
    lower_case(Words, Rest).

merge_words([], []):-!.
merge_words([Word - C1, Word - C2|Words], Result):-
    !,
    C is C1 + C2,
    merge_words([Word - C|Words], Result).
merge_words([W|Words], [W|Rest]):-
    merge_words(Words, Rest).

print_top_words([], _, _):-!.
print_top_words(_, 0, _):-!.
print_top_words([Word - Count|Rest], N, R):-
    writef("%w\t%w\t%w\n", [R, Count, Word]),
    N1 is N - 1,
    R1 is R + 1,
    print_top_words(Rest, N1, R1).

main:-
    print_top_words("SuspiciousEmail.txt", 10).
prolog swi-prolog stop-words

Comments

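0 votes TessellatingHeckler 5/15/2023
In ignore_words(['', 'a', 'an', ... the single quotes mean these are not strings but quoted atoms. member('an', ["a", "an"]) will not find a match, but member("an", ["a", "an"]) will, so try changing the ignored words to use double quotes throughout.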
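The distinction drawn in the comment above can be checked in isolation. The following is a minimal sketch, assuming SWI-Prolog 7 or later with the default double_quotes flag (so that double-quoted text is read as a string object). The predicate names is_stop_atom/1 and is_stop_string/1 and the shortened word lists are hypothetical and exist only for this illustration.

```prolog
:- use_module(library(lists)).   % memberchk/2

% Hypothetical stop list written with single quotes: a list of atoms.
is_stop_atom(Word) :-
    memberchk(Word, ['a', 'an', 'the']).

% The same hypothetical list written with double quotes: a list of strings.
is_stop_string(Word) :-
    memberchk(Word, ["a", "an", "the"]).

% string_lower/2 always produces a string, so its result can only
% match the double-quoted list; a string never unifies with an atom.
demo :-
    string_lower("An", Lower),
    (   is_stop_atom(Lower)
    ->  writeln(found_in_atom_list)
    ;   writeln(not_found_in_atom_list)
    ),
    (   is_stop_string(Lower)
    ->  writeln(found_in_string_list)
    ;   writeln(not_found_in_string_list)
    ).
```

Running demo should report that the lower-cased word is missing from the atom list and present in the string list, which is why the stop list needs to use the same text type that string_lower/2 returns.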
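0 votes Katie Cook 5/15/2023
Hi, thank you for your reply. I updated the list to use double quotes, but unfortunately the words to ignore still show up in the frequency counts.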
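0 votes TessellatingHeckler 5/15/2023
OK, lower_case(["Cat","Dog","Cow"], Lower) outputs words paired with numbers (and misses some of the words), so it produces an item "Dog"-1, which can also be written -("Dog", 1), and that pair will not be found in the list of excluded words, which have no numbers. You need to do the exclusion before adding the count, or handle it some other way.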
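One way to read the suggestion above is to filter the bare words first and attach counts only afterwards. The following is a minimal sketch, not the asker's program: it assumes SWI-Prolog 7 or later, assumes the input list already contains only words (the separators that re_split/3 returns have been dropped), and uses a shortened, hypothetical stop list. The names stop_word/1, relevant_pairs/2, count_runs/2 and take_run/5 are made up for this illustration.

```prolog
:- use_module(library(apply)).   % maplist/3, exclude/3
:- use_module(library(lists)).   % memberchk/2

% Hypothetical, shortened stop list; strings rather than atoms.
stop_words(["", "a", "an", "the", "of", "and"]).

stop_word(Word) :-
    stop_words(Stops),
    memberchk(Word, Stops).

% Lower-case the bare words, drop stop words, and only then build
% Word-Count pairs, so exclude/3 compares strings with strings.
relevant_pairs(Words, Pairs) :-
    maplist(string_lower, Words, Lower),
    exclude(stop_word, Lower, Relevant),
    msort(Relevant, Sorted),          % msort/2 keeps duplicates
    count_runs(Sorted, Pairs).

% count_runs(+SortedWords, -Pairs): collapse each run of equal words
% into a single Word-Count pair.
count_runs([], []).
count_runs([W|Ws], [W-N|Rest]) :-
    take_run(W, Ws, 1, N, Remaining),
    count_runs(Remaining, Rest).

% take_run(+W, +List, +Count0, -Count, -Remaining): consume the
% leading copies of W from List, counting them.
take_run(W, [X|Xs], N0, N, Remaining) :-
    X == W, !,
    N1 is N0 + 1,
    take_run(W, Xs, N1, N, Remaining).
take_run(_, Remaining, N, N, Remaining).
```

For example, the query relevant_pairs(["The", "cat", "and", "the", "Cat"], Pairs) gives Pairs = ["cat"-2]. The resulting pairs could then be passed to the existing sort(2, @>, ...) step to rank them by count.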
0 votes Katie Cook 5/16/2023
Thank you very much. Could you please give me some guidance on how to do this? I am very new to Prolog.

A: No answers yet