Asked by: Dez Miller  Asked: 10/15/2023  Last edited by: Dez Miller  Updated: 10/17/2023  Views: 44
LDA Topic Modeling Producing Identical/Empty Topics
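Q:
I am running topic modeling on two large text documents (roughly 500-750 KB each) and asking for ten topics. I keep getting the same two topics repeated. Could this be caused by the small number of documents? Or should I change the alpha/beta parameters?
Here is the code for the model part: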
`lda_model = gensim.models.ldamodel.LdaModel(
    corpus=corpus,
    id2word=id2word,
    num_topics=10,
    random_state=100,
    update_every=1,
    chunksize=2,
    passes=10,
    alpha='auto',
    per_word_topics=True,
)`
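On the question of document count: LDA infers topics from how words co-occur across documents, so a corpus of only two documents may give it very little to separate. One common workaround is to split each large file into smaller pseudo-documents before building the corpus. The following is a minimal sketch of that idea, not the preprocessing used in the question; `split_into_chunks`, `min_words`, and the sample strings are hypothetical names and data chosen for illustration.

```python
import re


def split_into_chunks(text, min_words=50):
    """Split text on blank lines and merge short paragraphs so that each
    chunk holds at least `min_words` words (the last chunk may be shorter)."""
    chunks, current = [], []
    for paragraph in re.split(r"\n\s*\n", text):
        words = paragraph.split()
        if not words:
            continue
        current.extend(words)
        if len(current) >= min_words:
            chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return chunks


# Stand-in for the two large files (assumption: real text would be read from disk).
raw_documents = [
    "first paragraph about the city\n\nsecond paragraph about the police",
    "a speaker thanks the council\n\nanother speaker asks for a vote",
]

# Each chunk becomes one tokenized pseudo-document.
pseudo_docs = [
    chunk
    for text in raw_documents
    for chunk in split_into_chunks(text, min_words=4)
]
print(len(pseudo_docs), "pseudo-documents")
```

The resulting token lists would then go through whatever cleaning, dictionary, and bag-of-words steps produced `id2word` and `corpus` (those steps are not shown in the question). Since gensim's `chunksize` is the number of documents per training chunk, a value larger than 2 would probably make sense once the corpus contains many pseudo-documents.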
Here are the topics:
[(0,
'0.005*"city" + 0.004*"police" + 0.003*"people" + 0.003*"thank" + '
'0.003*"know" + 0.003*"want" + 0.002*"go" + 0.002*"say" + 0.002*"time" + '
'0.002*"cop"'),
(1,
'0.001*"people" + 0.001*"cop" + 0.001*"city" + 0.001*"want" + 0.001*"go" + '
'0.001*"police" + 0.001*"thank" + 0.001*"time" + 0.001*"know" + 0.001*"say"'),
(2,
'0.001*"people" + 0.001*"police" + 0.001*"city" + 0.001*"thank" + '
'0.001*"want" + 0.001*"cop" + 0.001*"go" + 0.001*"know" + 0.001*"say" + '
'0.001*"make"'),
(3,
'0.002*"city" + 0.002*"people" + 0.001*"know" + 0.001*"want" + '
'0.001*"police" + 0.001*"go" + 0.001*"say" + 0.001*"vote" + 0.001*"time" + '
'0.001*"cop"'),
(4,
'0.001*"city" + 0.001*"police" + 0.001*"cop" + 0.001*"people" + 0.001*"go" + '
'0.001*"thank" + 0.001*"want" + 0.001*"vote" + 0.001*"make" + 0.001*"time"'),
(5,
'0.020*"city" + 0.014*"people" + 0.013*"police" + 0.011*"cop" + 0.010*"go" + '
'0.010*"thank" + 0.009*"want" + 0.009*"know" + 0.008*"say" + 0.006*"time"'),
(6,
'0.001*"city" + 0.001*"go" + 0.001*"know" + 0.001*"people" + 0.001*"police" '
'+ 0.001*"cop" + 0.001*"want" + 0.001*"vote" + 0.000*"say" + 0.000*"time"'),
(7,
'0.002*"city" + 0.001*"people" + 0.001*"police" + 0.001*"thank" + 0.001*"go" '
'+ 0.001*"want" + 0.001*"know" + 0.001*"cop" + 0.001*"vote" + 0.001*"say"'),
(8,
'0.003*"city" + 0.003*"people" + 0.003*"police" + 0.002*"thank" + 0.002*"go" '
'+ 0.002*"know" + 0.002*"vote" + 0.002*"want" + 0.002*"say" + 0.002*"time"'),
(9,
'0.017*"people" + 0.014*"city" + 0.012*"police" + 0.010*"go" + 0.010*"thank" '
'+ 0.010*"want" + 0.009*"know" + 0.009*"say" + 0.009*"vote" + 0.008*"time"')]
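To put a number on how similar these topics are, one option is the Jaccard overlap of their top words. The sketch below is only an illustration: it parses topic strings in the format printed above (topics 0, 1 and 2 are copied from that output), and the helper names `top_words` and `jaccard` are hypothetical, not part of gensim.

```python
import re
from itertools import combinations


def top_words(topic_string):
    """Extract the quoted words from a gensim-style topic string."""
    return set(re.findall(r'"([^"]+)"', topic_string))


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Topics 0, 1 and 2 copied from the output above.
topics = [
    (0,
     '0.005*"city" + 0.004*"police" + 0.003*"people" + 0.003*"thank" + '
     '0.003*"know" + 0.003*"want" + 0.002*"go" + 0.002*"say" + 0.002*"time" + '
     '0.002*"cop"'),
    (1,
     '0.001*"people" + 0.001*"cop" + 0.001*"city" + 0.001*"want" + 0.001*"go" + '
     '0.001*"police" + 0.001*"thank" + 0.001*"time" + 0.001*"know" + 0.001*"say"'),
    (2,
     '0.001*"people" + 0.001*"police" + 0.001*"city" + 0.001*"thank" + '
     '0.001*"want" + 0.001*"cop" + 0.001*"go" + 0.001*"know" + 0.001*"say" + '
     '0.001*"make"'),
]

word_sets = {topic_id: top_words(text) for topic_id, text in topics}

# Pairwise overlap between every two topics.
overlaps = {
    (i, j): jaccard(word_sets[i], word_sets[j])
    for i, j in combinations(sorted(word_sets), 2)
}
for pair, score in overlaps.items():
    print(pair, round(score, 2))
```

Scores close to 1.0 across most pairs would confirm that the model is returning essentially one topic several times, which makes it easier to compare runs after changing the corpus or the priors.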
Visualization:
`# Visualize the topics
pyLDAvis.enable_notebook()
vis = pyLDAvis.gensim.prepare(lda_model, corpus, id2word)
vis
`
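I tried changing some of the parameters but saw no difference in the results. It is hard to find typical ranges for the alpha and beta parameters.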
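For orientation on the scale of these priors: gensim exposes beta as the `eta` argument, and its documentation describes `alpha='symmetric'` as a fixed prior of 1.0 / num_topics and `alpha='asymmetric'` as a normalized 1.0 / (topic_index + sqrt(num_topics)). The sketch below reproduces those documented formulas in plain Python so the magnitudes are visible for ten topics. It does not call gensim, and the helper names are hypothetical.

```python
import math


def symmetric_prior(num_topics, size):
    """Fixed symmetric prior: every entry equals 1 / num_topics."""
    return [1.0 / num_topics] * size


def asymmetric_prior(num_topics):
    """Normalized asymmetric prior: 1 / (topic_index + sqrt(num_topics))."""
    raw = [1.0 / (i + math.sqrt(num_topics)) for i in range(num_topics)]
    total = sum(raw)
    return [value / total for value in raw]


num_topics = 10
alpha_symmetric = symmetric_prior(num_topics, num_topics)
alpha_asymmetric = asymmetric_prior(num_topics)

print("symmetric alpha: ", [round(a, 3) for a in alpha_symmetric])
print("asymmetric alpha:", [round(a, 3) for a in alpha_asymmetric])
```

Explicit values can be passed to `LdaModel` through `alpha=` and `eta=` as a float or a list. Concentration values well below 1 favor sparser distributions, so trying a few values on a log scale (for example 0.01, 0.1, 1.0) is a reasonable way to explore, though no single range is guaranteed to suit every corpus.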
A: No answers yet