TM TF-IDF Summary Max Value is Above 1

Asked by: ningawater | Asked: 11/1/2023 | Last edited by: Jon Spring, ningawater | Updated: 11/1/2023 | Views: 24

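Q:

I apologize in advance: I am new to R and am using code from my school as a reference. I followed the example I was given closely and normalized my values, so I don't understand why the maximum TF-IDF value can be above 1. Any help is appreciated, and please let me know if more information is needed. Thank you.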

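# Load the tm package (provides DocumentTermMatrix, weightTfIdf and inspect)
library(tm)

# Create Document-Term Matrix (bumble is assumed to be a tm corpus built earlier)
dtm_bumble <- DocumentTermMatrix(bumble)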

# Find the unique row (document) indices that have at least one entry
ui <- unique(dtm_bumble$i)

# If dtm$i does not contain a particular row index p, then row p is empty,
# so subsetting by ui drops the empty documents
new_dtm_bumble <- dtm_bumble[ui, ]

# Create Document-Term Matrix with TF-IDF values
dtm_tfidf_bumble <- weightTfIdf(new_dtm_bumble, normalize=TRUE)

# Info on DTM
inspect(new_dtm_bumble)

<<DocumentTermMatrix (documents: 84146, terms: 23016)>>
Non-/sparse entries: 645486/1936058850
Sparsity           : 100%
Maximal term length: 277
Weighting          : term frequency (tf)
Sample             :
       Terms
Docs    date good match messag money pay peopl profil swipe time
  33615    0    1     2      0     3   0     0      0     3    0
  36782    0    0     0      1     1   0     0      0     0    1
  37333    0    0     0      0     2   0     1      0     0    0
  40474    1    2     1      0     1   2     0      0     0    1
  49551    1    0     1      0     2   1     0      2     0    2
  58630    3    0     3      0     2   2     0      0     3    0
  63130    1    0    12      0     1   1     0      3     4    8
  66277    2    2     0      0     1   0     1      0     0    1
  73764    0    1     3      1     0   0     2      2     1    2
  83079    0    0     1      0     0   0     0      0     0    0

# Retrieve statistical summary of TF-IDF
summary(dtm_tfidf_bumble$v)

    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
 0.01849  0.30264  0.50189  0.86867  0.91498 16.36061 
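
For reference, here is a minimal base-R sketch of how the weighting appears to be computed, assuming `weightTfIdf(..., normalize = TRUE)` follows the definition in the tm documentation: each count is divided by its document's length, then multiplied by `log2(N / df)`, where `N` is the number of documents and `df` is the number of documents containing the term. The toy matrix and its term names are hypothetical. Under that assumption only the term-frequency part is bounded by 1, so the product can go as high as `log2(N)`.

```r
# Toy counts: rows are documents, columns are terms (hypothetical data).
counts <- matrix(
  c(2, 1, 0,
    1, 1, 0,
    0, 1, 0,
    0, 0, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("doc", 1:4), c("swipe", "match", "rare"))
)

n_docs <- nrow(counts)

# Normalized term frequency: each count divided by its document's length,
# so every row sums to 1 and every entry lies in [0, 1].
tf <- counts / rowSums(counts)

# Inverse document frequency: log2(N / number of documents containing the term).
idf <- log2(n_docs / colSums(counts > 0))

# Multiply each column of tf by the idf of the corresponding term.
tfidf <- sweep(tf, 2, idf, `*`)

# doc4 consists only of "rare", which occurs in no other document,
# so its weight is 1 * log2(4 / 1) = 2, which is above 1.
max(tfidf)

# The same bound for a matrix with 84146 documents, as in the question.
log2(84146)
```

If that assumption holds, the reported maximum of 16.36061 is consistent with `log2(84146)`, which is about 16.36: a document whose only term appears in no other document would get exactly that weight.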

Tags: r, tf-idf, tm, series

A: No answers yet